Data is the new commodity in our digital world
Dr Karen Calteaux, from the Council for Scientific and Industrial Research (CSIR), has never forgotten a comment a Google speech technologist made at a conference eight years ago. At the time, Google was busy with its voice search. The technologist said that in one day, they were collecting the equivalent speech of one person speaking continuously for 13 years.

"If you can imagine that," said Dr Calteaux, "you can imagine the kind of infrastructure required to process that. And you can imagine how far we are in South Africa from having that volume of data to process to build systems. This is why the work we're doing at the CSIR, and the work that the Data Science for Social Impact group (at the University of Pretoria) is doing is extremely important, because we have to find ways to do similar quality language technology development, but with far less data."
Dr Calteaux is the Research Group Leader of Voice Computing, which is part of the Next Generation Enterprises and Institutions (NGEI) Cluster at the CSIR. She presented Human Language Technologies (HLTs) for African Languages at Universities South Africa's (USAf's) recent virtual Round Table that was themed African Languages in the Age of 4IR. It was hosted by USAf's Community of Practice for the teaching and learning of African Languages (CoPAL).
Calteaux presented another example which showed how African languages are lagging behind in terms of availability of online data. She said there are groups that test over 100 different voices before they select one to build a text-to-speech voice. They have more than 300 hours of speech available from a single voice artist. "We are, at this stage, building a text-to-speech voice from five hours of data," she said.
She was the last of the three panelists to speak at the event, and so was able to comment on what had been said before. "What are the unique challenges that we face?" she asked, noting that everybody had given the same answer that morning: "Data, Data, Data. Data, really, is the new commodity."
We need to ensure that relying more on technology doesn't widen the digital divide
4IR is rapidly leading to more automation, she said. Lockdown has led to the prominence of online meeting platforms. Even if some people didn't know how to use these before, the pandemic compelled us to become very familiar with various platforms. Lockdown has also led to much more interconnectedness on smart devices, with platforms such as Twitter and WhatsApp helping to disseminate news about the coronavirus.
Language is a crucial part of information exchange. This makes natural language processing and language technology development extremely important. However, we also need to be careful, she warned, that proliferation of technology doesn't end up creating a larger digital divide than the one we have in South Africa already. Language barriers in our society can exacerbate other existing socio-economic barriers, and we need to find a way to address this, she said.
Barriers in information exchange can either be between people directly, or in how they access the information that is made available. "Fortunately, language technology applications can help to address both these types of barriers," she said.
Human language technology, she explained, is about applying computers to process human language. It is divided broadly into:
- text technology, for example, spell checkers and machine translation systems; and
- speech technology, for example, speech recognition, also known as speech-to-text, and text-to-speech.
SA boasts abundant human language technology research and development projects
Data resources may be a challenge, but the South African language technology landscape is not all bleak. Although she stressed the list was by no means exhaustive, Dr Calteaux gave some insight into the players working on human language technology research and development projects in South Africa.
Her list included the following research centres or companies, and their focus areas:
- The Centre for Text Technology (CTexT) at North-West University (NWU) – focusing mostly on text processing and statistical machine translation, and working on Autshumato, an integrated translation environment that has machine translation capabilities;
- CSIR – focusing on speech technology localisation and, more recently, rule-based machine translation; also the developers of Qfrency – a commercial text-to-speech product;
- The Multilingual Speech Technologies group (MuST), part of the engineering faculty at NWU – pattern recognition with some focus on speech;
- Praekelt – building a natural language-understanding engine that is the nucleus of a chatbot or virtual assistant for local languages, already being deployed for commercial clients;
- Data Science for Social Impact, a research group at the University of Pretoria, led by Dr Vukosi Marivate (also a speaker at the Round Table) – machine learning and natural language processing; and
- The South African Centre for Digital Language Resources (SADiLaR), at NWU – a national centre supported by the Department of Science and Innovation (DSI) as part of the new South African Research Infrastructure Roadmap (SARIR). SADiLaR has an enabling function, with a focus on all official languages of South Africa, supporting research and development in the domains of language technologies and language-related studies in the humanities and social sciences. The Centre supports the creation, management and distribution of digital language resources, as well as applicable software, which are freely available for research purposes through the Language Resource Catalogue, and has strong collaboration with international bodies.
She described SADiLaR, whose director Prof Langa Khumalo facilitated the Round Table, as "the centre on which everything turns around in the country, when it comes to developing the resources we require to be able to develop these language technologies".
SADiLaR funded the Human Language Technologies Audit of 2017/2018 that revealed the resources available across all languages in South Africa, and the percentage of their availability. English, Afrikaans and isiZulu are the languages with the most resources.
The CSIR built a mobile app to help healthcare workers' communication
The CSIR has been working on a speech-to-speech translation application. It recognises a user's speech in language 1 and converts it into text in the same language, then sends that text to a machine translation engine, which translates it into text in language 2. Finally, a text-to-speech engine converts the language 2 text into audio.
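The three-stage chain described above (speech recognition, then machine translation, then text-to-speech) can be sketched in a few lines of Python. This is only an illustration of the idea, not the CSIR's implementation: the class name, the function names and the toy stand-ins for each engine are all hypothetical, and "audio" is faked with plain bytes.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SpeechToSpeechPipeline:
    """Chains three engines; each one is a plain callable (hypothetical interface)."""
    recognise: Callable[[bytes, str], str]      # (audio, source language) -> text
    translate: Callable[[str, str, str], str]   # (text, source, target) -> text
    synthesise: Callable[[str, str], bytes]     # (text, target language) -> audio

    def run(self, audio: bytes, source: str, target: str) -> bytes:
        source_text = self.recognise(audio, source)                 # speech -> text (language 1)
        target_text = self.translate(source_text, source, target)   # text 1 -> text 2
        return self.synthesise(target_text, target)                 # text 2 -> speech


# Toy stand-ins so the sketch runs; real engines would replace these.
PHRASES = {("en", "zu"): {"hello": "sawubona"}}  # tiny illustrative phrase table


def fake_recognise(audio: bytes, language: str) -> str:
    return audio.decode("utf-8")  # pretend the audio bytes are already the words


def fake_translate(text: str, source: str, target: str) -> str:
    return PHRASES[(source, target)].get(text, text)  # fall back to the input text


def fake_synthesise(text: str, language: str) -> bytes:
    return text.encode("utf-8")  # pretend these bytes are synthesised audio


pipeline = SpeechToSpeechPipeline(fake_recognise, fake_translate, fake_synthesise)
result = pipeline.run(b"hello", "en", "zu")
```

Keeping each stage behind its own callable means any one engine can be swapped, for example for a new language pair, without touching the other two.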
They have built this into a mobile app called AwezaMed, a communication tool intended to support, not replace, the interaction between healthcare providers and patients. There are two versions:
- the initial one for midwifery and obstetrics, in English, Afrikaans, isiZulu and isiXhosa; and
- a COVID-19 version for screening and triage situations, available in all 11 official languages.
The app had to meet several demanding requirements. The speech-to-text system had to recognise speech accurately in the noisy environment of a hospital or clinic. The app needed natural language understanding so that users could vary their speech input yet still find the phrases they were looking for. The translation had to be highly accurate, and the text-to-speech output had to sound as natural as possible, given that the end users might not be used to this kind of technology. The app is available on the Play Store for Android phones.
Augmented ebooks are available in all 11 official languages
Dr Calteaux believes the augmented ebook system the CSIR is developing is a very important initiative for teaching and learning. The system ingests text in various formats, including PDF and docx, converts it into the EPUB 3 format, and augments the converted text with audio, either human-narrated (if available) or synthesised. The output can be highlighted at word, sentence or paragraph level, and a unique feature is that users can search for words or other content in the augmented ebook. Augmented ebooks can be made for ebooks in any of the official languages (using Qfrency text-to-speech). A main aim of the project is to enhance accessibility to information for all, so a bespoke Reader has been built which enables users to control various reading settings, including the colours of the background or text, the speed of the audio, and the levels and colours of the highlighting. The product will be known as Qfrency ebooks.
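To give a feel for what highlighting at word, sentence or paragraph level involves, the short Python sketch below splits a passage into the units a reader could highlight at each level. It is a simplified assumption, not the Qfrency ebooks implementation: the function name is hypothetical, and the regular expressions are a rough stand-in for proper language-aware segmentation.

```python
import re


def highlight_units(text: str, level: str) -> list[str]:
    """Split text into the units to highlight at the chosen level (hypothetical helper)."""
    if level == "paragraph":
        # Paragraphs are assumed to be separated by one or more blank lines.
        return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    if level == "sentence":
        # Naive rule: a sentence ends at ., ! or ? followed by whitespace.
        return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if level == "word":
        return re.findall(r"\w+", text)
    raise ValueError(f"unknown highlight level: {level}")


sample = "Molo. Unjani?\n\nNdiphilile."
paragraphs = highlight_units(sample, "paragraph")
sentences = highlight_units(sample, "sentence")
words = highlight_units(sample, "word")
```

In a real augmented ebook each unit would also need to be paired with the matching stretch of audio, which this sketch leaves out.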
The CSIR's future work
The CSIR is working on conversational technologies, more commonly known as chatbots or virtual assistants, like Apple's Siri and Google Assistant. They want to create these spoken dialogue systems in local languages but they cannot simply use European or American interfaces or speech technologies. These systems need to reflect our distinctive social, linguistic and cultural practices, and incorporate the way we ask questions and expect to receive answers.
The CSIR has also started a new project on captioning. Aimed at government speeches viewed on television and other devices, the project will ultimately provide live captioning, or subtitles, in African languages, running across the bottom of screens while, for example, the president is talking.
"Smart technologies are with us, that's what 4IR is about, so we need to be focusing on man-machine interfaces, and on how we implement them. Our main focus is to make these technologies available on a platform where anybody can implement them into their own applications or solutions," said Dr Calteaux.
This is the third piece in a series of four articles being published from the Round Table on African Languages in the Age of 4IR.
The organising body, the Community of Practice on the Teaching of African Languages (CoPAL), was formed in 2015 to enable academics and other relevant university staff members to collaborate, network and share knowledge on issues of common interest. CoPAL is just one of numerous communities of practice operating under USAf's banner.
In addition to influencing and contributing to national and institutional language policies, the CoPAL seeks to benchmark, develop, advocate for and share good practices and relevant information needed to advance the teaching of African Languages in schools and universities. It does so by contributing to teacher training initiatives for African Languages and to the development of approaches to the teaching of African Languages that use African Languages in the teaching process.
This CoP also seeks to enhance regional collaboration among African Languages scholars, as well as to actively contribute to the establishment of linguistic networks to ensure that information and common understandings are shared.
The final article from the Round Table will follow shortly on this platform.
Written by Gillian Anstey, a freelance writer commissioned by Universities South Africa.