Bible Translation and WhatsApp

This entry is crossposted at https://www.etenlab.org/post/bible-translation-and-whatsapp

Though not the most popular means of communication in the United States, WhatsApp is one of the most popular mobile applications in the world (estimated to have more than 1.5B users!). I personally use it to keep in touch with friends and family across the world (though I do prefer Signal). It is primarily an online mobile chat application that supports text, audio and video over an encrypted channel (for the most part). Beyond chatting directly with individuals, WhatsApp also offers Business Accounts, which allow programmatic interaction with other WhatsApp users.
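As a rough illustration of what "programmatic interaction" through a Business Account can look like, the sketch below builds a text-message request for the WhatsApp Business Cloud API (the `messages` endpoint on Meta's Graph API). It is a minimal sketch, assuming the Cloud API's JSON message format; the API version string, phone number ID and access token are placeholders you would get from your own Meta developer setup, and the request is only prepared, never sent.

```python
import json
import urllib.request

# Assumption: WhatsApp Business Cloud API hosted on Meta's Graph API.
# The version string is a placeholder; check Meta's docs for the current one.
GRAPH_API_VERSION = "v19.0"


def build_text_message(to_phone: str, body: str) -> dict:
    """Build the JSON body for a plain-text message to one WhatsApp user."""
    return {
        "messaging_product": "whatsapp",
        "recipient_type": "individual",
        "to": to_phone,  # recipient number in international format, digits only
        "type": "text",
        "text": {"preview_url": False, "body": body},
    }


def build_send_request(phone_number_id: str, access_token: str,
                       payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) the HTTP POST for the messages endpoint."""
    url = (f"https://graph.facebook.com/{GRAPH_API_VERSION}/"
           f"{phone_number_id}/messages")
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    msg = build_text_message(
        "15551234567",
        "Please share what you understand from the word 'Baptism'.",
    )
    req = build_send_request("YOUR_PHONE_NUMBER_ID", "YOUR_ACCESS_TOKEN", msg)
    print(req.get_method(), req.full_url)
    # Actually sending would be urllib.request.urlopen(req), which needs real credentials.
```

Keeping payload construction separate from sending makes it easy to inspect or log what the bot is about to say before any network call is made.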

With Large Language Models (LLMs) becoming good at managing a conversation (at least in Languages of Wider Communication (LWCs)), we now have the opportunity to explore how they could be combined with WhatsApp to support some Bible Translation (BT) needs. I see these approaches as complementary to (and not replacing) existing methods of Bible Translation. One obvious application of this idea is crowdsourcing: using WhatsApp to gather, review and contribute to various stages of a Bible Translation project. The key insight is that LLMs are good at carrying on conversations and can be steered to intelligently guide users toward providing data or feedback, which the system can then file away in an appropriate data store (such as a database). In other words, a WhatsApp conversation with an LLM can serve as the User Interface (UI) for interacting with users.

For example, I recently made 'friends' on WhatsApp with an AI-bot that helps me learn Spanish. There is no single right way to learn Spanish, and it would not be helpful if the AI-bot only spat out Spanish sentences without taking my current level of fluency into account. Yet, because LLMs are able to gauge my abilities and answer my questions pertinently, they can, through multiple rounds of back-and-forth, steer me towards exploring and learning more Spanish. Now imagine a similar AI-bot built for answering relevant questions and gathering data! Such an AI-bot could become a helpful assistant for translation teams, answering individual translation and theological questions and linking back to the actual source content, such as Bible commentaries and dictionaries (Retrieval-Augmented Generation).
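One way to picture "the conversation as the UI" is as a form that the bot fills in one question at a time and then files away. The sketch below is hypothetical: the field names, prompts, and the `Conversation`, `open_store` and `save` helpers are illustrative, an in-memory SQLite table stands in for the project's data store, and the LLM step (phrasing questions naturally and extracting values from free-form replies) is replaced by fixed prompts and raw replies.

```python
import sqlite3
from dataclasses import dataclass, field

# Hypothetical fields the bot wants to collect. In the envisioned system an LLM
# would phrase each question conversationally and extract the value from a
# free-form reply; a fixed prompt per field stands in for that step here.
REQUIRED_FIELDS = {
    "language": "Which language do you speak at home?",
    "term": "Which word from the passage would you like to explain?",
    "meaning": "In your own words, what does that word mean to you?",
}


@dataclass
class Conversation:
    user_id: str
    answers: dict = field(default_factory=dict)

    def next_question(self):
        """Return the prompt for the first field still missing, or None when done."""
        for name, prompt in REQUIRED_FIELDS.items():
            if name not in self.answers:
                return prompt
        return None

    def record_reply(self, reply: str) -> None:
        """File the reply under the first missing field (the one just asked about)."""
        for name in REQUIRED_FIELDS:
            if name not in self.answers:
                self.answers[name] = reply.strip()
                return


def open_store() -> sqlite3.Connection:
    """In-memory SQLite stands in for the project's real data store."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE responses "
                 "(user_id TEXT, language TEXT, term TEXT, meaning TEXT)")
    return conn


def save(conn: sqlite3.Connection, convo: Conversation) -> None:
    """Persist a completed conversation as one row."""
    conn.execute(
        "INSERT INTO responses (user_id, language, term, meaning) "
        "VALUES (?, ?, ?, ?)",
        (convo.user_id, convo.answers["language"],
         convo.answers["term"], convo.answers["meaning"]),
    )
    conn.commit()


if __name__ == "__main__":
    store = open_store()
    convo = Conversation("user-42")
    # Made-up replies standing in for what a user might type on WhatsApp.
    for reply in ["Hindi", "Baptism", "Being washed to show a new life"]:
        print("BOT:", convo.next_question())
        convo.record_reply(reply)
    assert convo.next_question() is None
    save(store, convo)
    print(store.execute("SELECT * FROM responses").fetchall())
```

The point of the structure is that the bot always knows what is still missing, so the conversation can wander and still end with a complete, structured record.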

Getting larger numbers of contributors involved is generally difficult. In the BT process it is common (and, in Church-Based Bible Translation, necessary) to involve the local church and gather their feedback (Community Checking). But this is usually treated as a separate stage in the process, typically after a couple of rounds of internal checks have already been done on a larger body of text (e.g. a Bible book). What if we could design an AI-bot that proactively looks at translator/reviewer questions and pings connected contacts for real-time feedback (e.g. "Please share what you understand from the word 'Baptism'", asked in their target language)? Conveniently, the answers could come in different formats (one word, short text, long text, an audio recording or a video), span multiple rounds of back-and-forth, and still be post-processed by an AI into a distilled summary that is actionable for the translation team. The unique advantage is that this could help teams break out of the linear mold and enable proactive, asynchronous Community Checking on smaller sections of content, without the need to coordinate and gather the group in one physical location (which may sometimes be infeasible or unsafe).
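The proactive Community Checking loop could be modelled as: pick a handful of contacts who speak the target language, ping each at most once per question, collect their replies in whatever format they send, and hand the bundle to a summarizer. The sketch below assumes hypothetical `Contact` and `CheckQuestion` records and leaves the AI summarization step out; for audio and video, `content` is assumed to be a media reference rather than a transcript.

```python
from dataclasses import dataclass, field
from enum import Enum


class Format(Enum):
    TEXT = "text"
    AUDIO = "audio"
    VIDEO = "video"


@dataclass
class Contact:
    phone: str
    languages: set  # languages this contact can give feedback in


@dataclass
class CheckQuestion:
    qid: str
    target_language: str
    prompt: str  # e.g. "Please share what you understand from the word ..."
    asked: set = field(default_factory=set)       # phones already pinged
    responses: list = field(default_factory=list)


def pick_recipients(q: CheckQuestion, contacts: list, limit: int) -> list:
    """Choose up to `limit` not-yet-asked contacts who speak the target language."""
    chosen = []
    for c in contacts:
        if len(chosen) >= limit:
            break
        if q.target_language in c.languages and c.phone not in q.asked:
            chosen.append(c)
            q.asked.add(c.phone)  # never ping the same person twice for one question
    return chosen


def record_response(q: CheckQuestion, phone: str, fmt: Format, content: str) -> None:
    """Store a reply as-is; formats are kept so later steps can treat them differently."""
    q.responses.append({"phone": phone, "format": fmt, "content": content})


def bundle_for_summary(q: CheckQuestion) -> dict:
    """Group replies by format for a downstream (not shown) transcriber/summarizer."""
    bundle = {f: [] for f in Format}
    for r in q.responses:
        bundle[r["format"]].append(r["content"])
    return bundle
```

Calling `pick_recipients` repeatedly with a small `limit` lets the bot widen the circle gradually, only asking more people while a question remains open.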

Typical crowdsourcing methods involve some kind of internal verification, where the response from one user is shown to another for their feedback. An AI-bot that is aware of the global state of all inputs shared and stored by the different users could also gather feedback (e.g. votes) on each of the responses. This would mean asking the user a question like "Please rate this answer between 1 and 5". The AI could then use such ratings and votes to estimate the quality of the responses and dynamically decide which questions to keep asking and which ones are resolved, based on an agreed-upon policy (e.g. an answer with 3 or more upvotes is considered good enough).

Another area where I think such a system would help is gathering seed data for training AI models for drafting. For languages that do not yet have any data available, some data must first be gathered before AI models can be fine-tuned. Leveraging WhatsApp bots to gather translations of sentences and words into a seed dataset would help start producing AI drafts (AKA "Draft Zero") for human translators to build on. Currently, this process is bottlenecked by not having enough participants on the same application platform, or by a lack of motivation across the wider community. Both of these may be addressed by using WhatsApp (already popular and stable software) together with AI-bots that make gathering such data more interesting and even fun (gamification)!
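The voting policy described above (ratings from 1 to 5, and "an answer with 3 or more upvotes is considered good enough") can be expressed as a small piece of bookkeeping that the bot consults between conversations. This is a sketch under stated assumptions: the `AnswerBoard` class is hypothetical, a rating of 4 or 5 is assumed to count as an upvote, and ties between passing answers are broken by mean rating.

```python
from collections import defaultdict
from statistics import mean

# Example policy from the text: an answer with 3 or more upvotes is good enough.
UPVOTE_THRESHOLD = 3
# Assumption: on the 1-5 scale, a rating of 4 or 5 counts as an upvote.
UPVOTE_MIN_RATING = 4


class AnswerBoard:
    """Tracks crowd answers and peer ratings for a set of questions (hypothetical model)."""

    def __init__(self):
        self.answers = defaultdict(dict)  # question_id -> {answer_id: (author, text)}
        self.ratings = defaultdict(list)  # answer_id -> ratings on a 1..5 scale

    def add_answer(self, qid, aid, author, text):
        self.answers[qid][aid] = (author, text)

    def rate(self, aid, score):
        if not 1 <= score <= 5:
            raise ValueError("rating must be between 1 and 5")
        self.ratings[aid].append(score)

    def upvotes(self, aid):
        return sum(1 for s in self.ratings.get(aid, []) if s >= UPVOTE_MIN_RATING)

    def accepted_answer(self, qid):
        """Best answer meeting the policy (most upvotes, then highest mean), else None."""
        passing = [a for a in self.answers.get(qid, {})
                   if self.upvotes(a) >= UPVOTE_THRESHOLD]
        if not passing:
            return None
        return max(passing, key=lambda a: (self.upvotes(a), mean(self.ratings[a])))

    def open_questions(self):
        """Unresolved questions, least-reviewed first, so the bot knows what to keep asking."""
        pending = [q for q in self.answers if self.accepted_answer(q) is None]
        return sorted(pending, key=lambda q: sum(len(self.ratings.get(a, []))
                                                 for a in self.answers[q]))

    def next_answer_to_review(self, qid, reviewer):
        """Pick the least-rated answer not written by `reviewer` to show them next."""
        others = [a for a, (author, _) in self.answers.get(qid, {}).items()
                  if author != reviewer]
        if not others:
            return None
        return min(others, key=lambda a: len(self.ratings.get(a, [])))
```

Because `next_answer_to_review` never shows users their own answers and prefers the least-rated ones, reviews spread across responses instead of piling onto the first answer submitted.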

Though a bit far-fetched, in an extremely resource-strapped or sensitive region, a WhatsApp AI-bot could be the application for doing Bible Translation. More people own smartphones than computers, and I know of places with differential pricing for Meta (formerly Facebook) services. The AI-bot could be prompted to guide the translation process (in audio, video or text) and then use the same techniques mentioned above to draft and check. There is also an opportunity to view the whole process of Bible translation as a series of questions and answers (a "Socratic method"?!).

A disadvantage of this system is that it assumes internet connectivity. Even though internet coverage keeps increasing (especially on mobile phones), this system still needs to work in conjunction with other existing techniques. A lot of details need to be figured out and iterated on, but I believe this idea is worth exploring further.