
This work presents foundational efforts on SautiDB-Naija, a novel corpus of non-native (L2) Nigerian English speech. We describe how the corpus was created and curated, as well as preliminary experiments with accent classification and learning Nigerian accent embeddings.

The initial version of the corpus includes over 900 recordings from L2 English speakers whose first languages are Nigerian languages such as Yoruba, Igbo, Edo, Efik-Ibibio, and Igala. We further demonstrate how fine-tuning a pre-trained model such as wav2vec can yield representations suitable for related speech tasks such as pronunciation classification, accent translation, or voice conversion. SautiDB-Naija has been published to Zenodo for general use under a flexible Creative Commons licence.
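As a rough illustration of the kind of representation learning the abstract refers to, here is a minimal sketch of pulling an utterance-level embedding from a pre-trained wav2vec 2.0 checkpoint. It assumes the Hugging Face transformers implementation and a mono 16 kHz recording; the authors' actual fine-tuning setup may differ.

```python
# Minimal sketch: mean-pool frame-level wav2vec 2.0 features into one embedding.
# Assumes the Hugging Face `transformers` and `torchaudio` packages; this is an
# illustrative setup, not the authors' exact pipeline.
import torch
import torchaudio
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")
model.eval()

waveform, sample_rate = torchaudio.load("recording.wav")  # hypothetical mono file
waveform = torchaudio.functional.resample(waveform, sample_rate, 16_000)

inputs = extractor(waveform.squeeze(0).numpy(), sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    frames = model(**inputs).last_hidden_state  # (1, num_frames, 768)

utterance_embedding = frames.mean(dim=1)        # (1, 768) mean-pooled representation
print(utterance_embedding.shape)
```

An embedding of this kind is what downstream tasks such as accent classification or voice conversion would build on.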

ABOUT THE SPEAKERS

Tejumade is a computer scientist and research assistant, and the co-founder of AI Saturdays Lagos, an artificial intelligence community in Lagos, Nigeria. She co-founded the community in 2018; it offers 16 consecutive Saturdays of free courses on Data Science, Machine Learning, and Deep Learning in structured learning groups.

Tejumade is passionate about the field of artificial intelligence but is wary of its potential to widen or exacerbate inequalities. She therefore believes that, in order to innovate better in this field and address many of its shortcomings, we as a community need to expand the places where AI is practised and set ambitious goals to democratize AI education, so that a broader audience can discuss and contribute to the success of an AI-enabled world.

Find out more about AI Saturdays Lagos


Session Transcript

Tejumade Afonja
Hi, everyone. My name is Tejumade Afonja, and today I will talk about my research on learning Nigerian accent embeddings from speech. I'll show some preliminary results based on our SautiDB-Naija corpus. This work was done in collaboration with my co-authors, listed on this slide. Before we dive into the research, I'd like to share some details about myself. I am a computer science master's student at Saarland University. I'm also a research assistant at CISPA, where my research is focused on learning generative models for tabular data.

Tejumade Afonja
In 2018, I co-founded AI Saturdays Lagos, a community dedicated to teaching machine learning and related subjects to African youths. Our teaching activity happens twice a year, with each cohort running for 16 weeks. Every Saturday we meet from 10am to 4pm to learn different topics in data science and machine learning, based on the curriculum for that cohort. I grew up in Ibadan, Nigeria, but I moved to Saarbrücken for my master's three years ago. The outline of my talk is as follows. I'm going to motivate why accent matters, and I'll frame this from the perspective of online learning. Specifically, I'll talk about our survey results and about our project, the Sauti project. In the second part of my presentation, I'll talk about the SautiDB corpus, and finally I'll discuss the accent classification task we pose for learning accent embeddings and show some preliminary results. To begin: "She could not understand me." This is what Sonia's mom told her after speaking to Alexa. Sonia's mom was born in the Philippines; her father is from India. Both of them speak English as a third language, and in the nearly 50 years that they have lived in the United States, they have spoken English daily and fluently, but with their own distinct accents, and sometimes different phrasing than a native speaker.

Tejumade Afonja
So in their experience, Siri or Alexa or basically any device that uses speech technology will struggle to recognise their accent or their commands, largely because the datasets used to train these speech technologies may not be representative of the audience they seek to serve. So there is certainly a tension between accent and comprehensibility. There are currently more than 7,000 languages in the world, with only 23 of those covering more than half of the world's population. English is one of those languages. English is the most widely spoken language in the world, with over 1 billion speakers; 753 million of those are non-native speakers. English is the dominant language on the internet: nearly 54% of the content currently on the internet is in English. English is recognised as an official language in 59 countries, 24 of which are African countries, due to colonialism (even that word is a big one for me to say) or globalisation. So one way or another, we speak English. What I'm getting at is that speaking English seems to be one of the most crucial skills for communicating, learning, interacting, and going about our daily lives today.

Tejumade Afonja
Since we all have different accents, we all speak English differently, which means that sometimes we struggle to understand one another. And certainly, like I said, not all of those speakers speak English in the same way. We all have our own unique speaking style, and it's referred to as an accent: a distinctive way of pronouncing a language, especially one associated with a particular country, area, or social class. An accent is often a reflection of an individual's unique background; it encodes a lot of information about a person. Accents differ in voice quality, pronunciation, the distinction of vowels and stress, intonation, and so on. What this means is that accent is also one of those problems we can all relate to, and it shows up in different areas of our lives. It might be in the form of a two-way conversation with a colleague, for example, who is from another country, where we are struggling to understand them or to be understood by them. Or it may be one-way communication, as in the case of an instructor and students in an online setting, where the student is struggling to understand the accent of the instructor well enough to follow the content being taught. So clearly, accent plays an important role in communication.

Tejumade Afonja
And indeed, a lot of research has shown a negative correlation between accent and comprehensibility, which points to the need to be familiar with a speaker's accent in order to better understand them. It is true that it's possible to get used to an accent with more exposure; perhaps you move to another country and immerse yourself in another culture. But how do you get used to accents that you seldom interact with, which is the case in online learning? Online learning has become mainstream, with many institutions offering courses, and these courses are mostly developed by institutions that are not in developing regions, so to say. What I'm trying to get at is that online learning set out to break barriers and democratise knowledge, to help us access any information we want at any point in time. But it's still not accessible to everyone equally. What I mean is that the way these courses are delivered depends on the instructor presenting the course, who has their own unique way of speaking. And if a certain group of people is not used to that way of speaking, it can delay their understanding or impede the way they assimilate the content the instructor is trying to pass along. So if the bulk of these online courses have been created by instructors in certain geographical locations, it means we are still going to be faced with this problem of understanding whenever we are not used to the accent in which the content is presented. So the problem remains unaddressed; in fact, I'd say it's even more pronounced in this setting.

Tejumade Afonja
But we don't know for sure, right? How does accent really influence understanding? Or is this just something that some of us experience more than others? This brings us to our survey. To better answer this question and better motivate this research, we conducted a survey asking learners what they think about accents and whether or not they impact understanding. We received responses from 374 people from over 58 countries. The first question we asked, which we can see on the left, is: do you think the accent of a speaker has an influence on the understanding of the listener? About 70% of our respondents said yes, they do think this is a problem. We have probably all faced it at some point; it's one of those things that just follows you around. Then we were curious whether or not they think it impacts their own understanding: do you think differences in accent affect your ability to learn? Here we can see that more people do not think that this accent disparity impacts their understanding. You know, if you start watching a YouTube video and you don't like the accent, you just find another YouTube video you can relate to, or you do something else; somehow you get used to it or you figure it out.

Tejumade Afonja
But there are still about 145 people who think that accent affects their ability to learn, and this number is non-negligible, because it means these people live in a world where accent gets in the way of their learning. So we asked the people who said yes to share their stories of how accent has impacted their ability to learn, so we could better understand their perspective. As we can see, one of our respondents said that if they don't understand a specific word when someone speaks, all their focus moves to just understanding that word, and then they lose whatever comes next. Another said: "If I hear an accent that I'm drawn to, then I'm drawn to listen more; on the other end, some accents have me trying to decipher what exactly has been said, hence I miss out on the entire content." What we can deduce from this is that people often lose concentration when they are not familiar with an accent. Another person says that it is a slow process and sometimes it takes a long time to figure out what was said; they had a hard time in one of their undergrad classes where the lecturer spoke with a southern accent, and this impacted their success in that course: they didn't do well, perhaps because they did not understand the content being passed along. And another person says that technical explanations are much harder to follow if the instructor has an unfamiliar accent, so they often need to reduce the playback speed to be able to catch up. What this means is that there is an increase in learning or assimilation time. We had a lot more stories that I'm not presenting in this talk, for brevity. What shows up in our findings is that there is a knowledge-transfer gap between learners and instructors with different accents, and people often expend more time and exert more energy trying to understand the accent of the person instead of assimilating the content itself. So this made us ask ourselves, and we posed this question within our research community: can we develop tools that can translate one accent to another without significantly distorting the speech content of the speaker?

Tejumade Afonja
Now, to introduce our project: it is in four parts. One aspect of our project is data collection; we want to collect speech from non-native English speakers. Another aspect is to have tools with which we can visualise audio embeddings and pre-process the speech that we collect. A third part lies in building tools that can classify the different varieties of accent. And the last part is how to translate from one accent to another. In the second part of my talk, I'm going to focus on two of these: the SautiDB project and the Sauti classifier project. For the Sauti classifier project, the research question we pose is this: given two non-native English speakers who speak English in the same way, that is, who have the same accent, can we show that they are closer to each other in some embedding space? So for these two speakers here, we want their accents to be similar to one another in the embedding space, and we want to maximise the difference between these accents and that other accent over there (a sketch of this idea follows below). But in order to do this, we need access to some non-native speech; we need data. This brings us to the first part of our project, which is to collect this data. So we set out to collect non-native English speech from Nigerian people. I'm Nigerian, so we started from our community. SautiDB-Naija is a non-native English accent speech corpus; it consists of 919 speech recordings of short sentences, grouped by the speaker's first language, that is, Yoruba, Igbo, Edo, Efik-Ibibio, and Igala. For example, I come from the Yoruba tribe, and I speak English like a Yoruba person.
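The research question above can be phrased as a measurable quantity: embeddings of two speakers with the same accent should be more similar to each other than to embeddings of speakers with a different accent. Here is a minimal sketch of that check using cosine similarity; the random tensors are stand-ins for real utterance embeddings (e.g. from the wav2vec sketch earlier), and the speaker names are hypothetical.

```python
# Sketch only: intra-accent similarity should exceed inter-accent similarity
# for a useful accent embedding. The tensors below are placeholders, not data
# from the SautiDB-Naija corpus.
import torch
import torch.nn.functional as F

def accent_similarity(emb_x: torch.Tensor, emb_y: torch.Tensor) -> float:
    """Cosine similarity between two utterance-level accent embeddings."""
    return F.cosine_similarity(emb_x, emb_y, dim=-1).item()

yoruba_speaker_1 = torch.randn(768)  # placeholder embedding, accent A
yoruba_speaker_2 = torch.randn(768)  # placeholder embedding, accent A
igbo_speaker = torch.randn(768)      # placeholder embedding, accent B

# With a useful embedding we would expect the first value to be larger:
print(accent_similarity(yoruba_speaker_1, yoruba_speaker_2))  # same accent
print(accent_similarity(yoruba_speaker_1, igbo_speaker))      # different accent
```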

Tejumade Afonja
This is what the distribution of our corpus looks like: we had more contributions from Yoruba speakers than, for example, Efik speakers. Let me talk a little more about the way we curated our corpus. We created text prompts of short sentences drawn from 1,132 phonetically balanced sentences. Then we built a simple web application interface to enable contributors to record their voice, and we persisted the audio recordings on Google Cloud. This is what the platform looks like. On the left you can see the speech data collection interface; if you go to the SautiDB web app, you should be able to see exactly what I'm showing here. The way we know what you sound like is that you have to tell us: you first complete a form where we ask for your nationality, then we ask what your native language is, and here you can see some of the native languages we've highlighted. Once you're done filling in this form, it takes you to the text prompt page, where you can record your voice. For example, the text prompt here says "it was a pariah, a wanderer without a friend or home." A contributor that comes to our platform is expected to read the sentence, and they can listen to it again, delete it if they don't like it, or submit it. This was how we collected our speech.

Tejumade Afonja
The statistics of our corpus look like this. After collecting the speech samples, we had to manually listen to all of the speech to remove any distortions, such as repetitions or recordings with too much noise. After doing all of that cleaning, we published our dataset on Zenodo with this distribution: about 65.7% of the recordings come from the major ethnic groups. The data is now available for the entire community to do research on, not just ourselves. We then built a model to learn speaker-independent embeddings that represent the accent of a speaker. I'll just give a high-level description of this model. The task is accent classification: we want to classify which accent the audio is, for example a Yoruba accent or an Igbo accent. The model is an encoder-classifier model. It consumes the speech data; a feature extractor extracts information from the speech and represents it as an embedding, and a classifier head then gives the probability of a speech sample belonging to an accent. The results of our work are shown on this slide; I've removed some of the technical details for brevity, but we can see that the embeddings learned by our model are able to cluster the different accents into different regions of the embedding space. This plot shows the first two components of our embeddings, and we can see that the Igbo speakers seem to be nicely clustered together, as do the Yoruba speakers. What is even more interesting about these preliminary results is that what we see in the embedding space seems very close to what we would expect geographically. This is the map of Nigeria, with Cameroon to the right-hand side. You can see that Yoruba speakers are geographically close to Edo speakers, and we see that same kind of relationship in the embedding space. So this result is very promising.
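To make the encoder-classifier idea concrete, here is a minimal sketch of such a model in Python. The encoder is assumed to be a Hugging Face wav2vec-style model, and the embedding dimension, the accent label list, and the single linear head are illustrative assumptions rather than the authors' exact architecture.

```python
# Minimal sketch of an encoder-classifier for accent classification:
# a speech encoder produces an utterance embedding, and a small head
# maps that embedding to accent logits. Illustrative only.
import torch
import torch.nn as nn

ACCENTS = ["yoruba", "igbo", "edo", "efik_ibibio", "igala"]  # first-language groups in the corpus

class AccentClassifier(nn.Module):
    def __init__(self, encoder: nn.Module, embed_dim: int = 768, num_accents: int = len(ACCENTS)):
        super().__init__()
        self.encoder = encoder                    # e.g. a pre-trained wav2vec 2.0 model
        self.head = nn.Linear(embed_dim, num_accents)

    def forward(self, input_values: torch.Tensor):
        frames = self.encoder(input_values).last_hidden_state  # (batch, frames, embed_dim)
        embedding = frames.mean(dim=1)                          # utterance-level accent embedding
        logits = self.head(embedding)                           # softmax over logits gives accent probabilities
        return logits, embedding
```

The two-dimensional plot described above can be reproduced in spirit by projecting the learned embeddings, for example with PCA (`sklearn.decomposition.PCA(n_components=2)`), and colouring points by first language.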

Tejumade Afonja
In conclusion, based on our survey, the accent of a speaker can negatively impact understanding, and as a result we would like to develop tools that can translate from one accent to another. So far, we have presented SautiDB-Naija, a non-native Nigerian English speech database of short sentences. It covers five major languages, and it can support accent classification, conversion, and translation tasks. Our preliminary results on accent classification validate the significance of our corpus and point to possible future use cases for SautiDB-Naija. We will continue to work on this research and expand our corpus significantly. This is my team; I'm very thankful for their work. We acknowledge the support of Endeavour X for the first version of our project, and we deeply thank everyone who contributed to our work. You can find the references here. Thank you so much. I'm happy to take any questions you might have.

Georgia Harris
Tejumade, that was brilliant. Thank you so much for sharing that research with us. It really shed a light on the importance of cultural representation when it comes to developing tech. We have a couple of questions for you, if you don't mind answering them. The first one is: when it comes to a piece of research like this, what are the challenges you face, and how do you overcome them?

Tejumade Afonja
Well, for this research, the first challenge is that you have to be able to narrow the problem down significantly. When we started this project, we just wanted to improve the online learning experience; that was how we posed the question, improving the online learning experience. But as we continued to think about the problem, we realised there are many aspects to this research, and that was how we came up with the four projects we highlighted, which include accent classification, data collection and cleaning, and translation tasks. So when it comes to research that has broad use cases and a lot of social components, one has to define it much more clearly, and then tackle the different aspects and understand the problem, so that you end up answering the actual problem, not a proxy to the problem.

Tejumade Afonja
As for how someone could get involved in conducting a piece of research like this: if you're asking specifically about my research, I would really encourage you to check out our website, sautiproject.com, where you will find information on how you can collaborate with us. Again, it's a very big research project; we are excited about it, my team is excited about it, and we are open to collaborating with a lot of people to bring about a positive change. Hopefully we will be able to build tools that give users or students more flexibility and more access, so they can toggle or translate to any accent they find familiar and learn in a much more effective way.

Georgia Harris
Awesome, thank you. We will share the link to the Sauti project in the lobby chat so that everybody can see it. And you're back this afternoon for a panel discussion, where we will be talking more about research and its importance when it comes to IoT. Thank you so much for joining us; it was brilliant. We look forward to having you back this afternoon. The next session is actually lunch, so feel free to take a break; you've got half an hour until we come back with the welcome session for part two, and then we'll kick off our panel discussions.

Tejumade Afonja
Well, thank you so much for having me.

Thank you to our sponsors

The IoT Podcast Team

The IoT Podcast is powered by Paratus People, a leading organisation in IoT Talent Solutions.

Innovation is at the heart of IoT. It is our passion to explore and learn more about this fast-paced and transforming sector.
