Nē-mu is a tool that helps English speakers pronounce diverse names correctly by generating audio and phonetic notations in different languages. It's an experiment in making both the digital and real world more inclusive and linguistically diverse.
Why this tool?
Because the way someone's name is pronounced can deeply affect their sense of identity and belonging. I shared my personal experiences in a blog post, and I thought that others might benefit from a tool that addresses this challenge.
In doing so, I wanted to challenge the dominance of English in AI development. As AI technologies continue to evolve primarily in English and primarily for English speakers, they risk reinforcing linguistic hierarchies. This tool represents a small step towards celebrating and preserving linguistic diversity.
How does it work?
This tool processes your input (a name and language combination) through two AI models to generate audio pronunciation and an English-friendly phonetic notation. Initially, I aimed to explore 'Minimum Viable Models': using the smallest possible computational models to accomplish specific tasks. While the current version relies on mainstream solutions, the long-term goal is to transition to more lightweight, specialised models.
Audio generation
This tool leverages an interesting quirk in how Google’s text-to-speech (TTS) AI processes different languages: when you input Latin characters while specifying a non-English language, the TTS AI adopts that language's accent patterns. This phenomenon extends to non-Latin languages as well, perhaps reflecting the way words in English and other languages that use the Latin alphabet are exported to the rest of the world.
For instance, Japanese has numerous "borrowed English words" that follow Japanese pronunciation rules. 'コンピュータ', the Japanese word for 'computer', is pronounced [kon-pyu-ta], applying Japanese phonetics to the original pronunciation. I use this principle and force Google's TTS AI to pronounce names in various languages. The results vary in accuracy, but it's a fun experiment and an exploration into cross-linguistic name pronunciation.
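The trick described above can be sketched with Google's Cloud Text-to-Speech REST API (the text:synthesize endpoint): keep the input text in Latin characters, but select a non-English voice. The function name and defaults below are illustrative, not Nē-mu's actual implementation.

```python
import json

def build_tts_request(name, language_code):
    """Build a synthesis request that feeds Latin characters to a
    non-English voice, so the TTS engine applies that language's
    accent patterns to the name."""
    return {
        "input": {"text": name},
        # Selecting e.g. "ja-JP" while the text stays in Latin
        # characters is what triggers the accented pronunciation.
        "voice": {"languageCode": language_code},
        "audioConfig": {"audioEncoding": "MP3"},
    }

payload = build_tts_request("Yosuke", "ja-JP")
print(json.dumps(payload, indent=2))
```

POSTing this payload to `https://texttospeech.googleapis.com/v1/text:synthesize` (with valid credentials) returns base64-encoded MP3 audio of the name spoken with, in this example, a Japanese accent.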
Phonetic notation
With platforms like Slack now including name pronunciation fields, there's growing awareness of this need. As someone whose romanised name (Yosuke) is often difficult for English speakers to pronounce, I appreciate this trend. I now use [yō-ské] in places like my Slack profile, email signature, and Zoom username.
On the other hand, there is a more academic notation called the International Phonetic Alphabet (IPA). It’s a standardised way to describe how words sound regardless of the language. For example, the correct pronunciation of my name ‘Yosuke’ is /joːsɯ̥ke/ in IPA. I explored the possibility of generating IPA notations and letting a TTS engine pronounce them, but didn’t pursue this direction because a) [yō-ské] is much more accessible and useful than /joːsɯ̥ke/ for an English-speaking audience, and b) no TTS engine I found could pronounce IPA reliably.
Currently, there isn't a dedicated model for generating these pronunciation guides. While I’m using GPT-4 for this purpose, the growing collection of name pronunciation data could eventually support the development of a specialised model.
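A minimal sketch of how one might prompt a model like GPT-4 for this kind of respelling: the prompt wording and function name below are illustrative assumptions, not the exact prompt Nē-mu uses.

```python
def build_respelling_prompt(name, language):
    """Compose a prompt asking for an English-friendly phonetic
    respelling of a name as pronounced in a given language.
    (Illustrative wording, not Nē-mu's actual prompt.)"""
    return (
        f"Give an English-friendly phonetic respelling of the name "
        f"'{name}' as pronounced in {language}. Use hyphen-separated "
        f"syllables with simple accent marks, for example "
        f"'Yosuke' -> [yō-ské]. Reply with the respelling only."
    )

prompt = build_respelling_prompt("Yosuke", "Japanese")
print(prompt)
```

The resulting string would then be sent as a user message to OpenAI's chat completions endpoint, with the model's reply used directly as the pronunciation guide.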
What now?
Future development could follow two paths: creating dedicated models for name pronunciation and identifying other opportunities to use AI for promoting linguistic diversity online. I welcome all forms of collaboration and feedback.
Who’s making this?
I'm Yosuke Ushigome (yō-ské • he/him), a London-based interaction designer. I explore new and sustainable interactions with data and AI. Visit my website for more.