When hosting an online meeting with people from all over the world, it isn’t realistic to expect everyone to be fluent in the same language. As discussed in a previous post about multilingual events, the challenge of making sure that everyone understands each other can be a formidable one.
Equal Entry has a lot of experience working with human captioners to create accessible online events. With artificial intelligence (AI) generated content, there will always be a need for human oversight and correction. Real AI Solutions for Accessibility talks about the partnership between AI and humans. This project illustrates a practical example of how to achieve this partnership.
Our team has hosted online global events for a long time. We have tried many tools and plugins to find the one that works best for our attendees, with varying results. For example, an application called UD Talk worked well for Japanese to English translation. However, it did not work well for English to Japanese translation.
As we considered the alternatives, we worked with one of our talented engineers, Kevin Vaghasiya, to design something better. During weekly meetings over the course of a month, we built our own plugin called Polly.
How Polly Works with AI and Human Translators
Polly uses the StreamText API, the Google Translate API, and the Google Docs API. It runs on Google App Engine, which is free for our purposes.
Typically, a meeting platform such as Zoom uses AI to perform real-time translation. It allows attendees to listen to the speaker while reading captions in their native language. During the meeting, human translators check the AI’s real-time translation and correct it as needed.
Here’s an example of the process.
Imagine you’re setting up an event in New York. While the event is hosted in English, you want the event to be available to Spanish speakers. First, you hire a captioner who speaks English. The professional captioner will listen to the presentation and transcribe the speech into text to create the captions and transcript.
Events that rely only on automatic captions often contain many mistakes in the original language. The more mistakes there are in the original language, the more mistakes there will be in the translation.
The event speaker starts, “Welcome to our Accessibility New York City Meetup. Thank you for being here.” The captioner types these words into StreamText. Polly picks up the captions through the StreamText API and passes them to the Google APIs. The words, along with the AI-generated Spanish translation of those words, are automatically posted into a Google Document for human translators to review in real time.
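To make the flow concrete, here is a minimal Python sketch of the pairing step: each captioned sentence is matched with a machine translation and laid out with the English line first and the Spanish line under it. This is not Polly's actual code. The names `CaptionPair`, `build_pairs`, and `render_for_review` are hypothetical, and `fake_translate` is a stand-in lookup table rather than a call to the Google Translate API.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class CaptionPair:
    """One captioned sentence and its machine translation."""
    source: str
    translation: str


def build_pairs(sentences: List[str], translate: Callable[[str], str]) -> List[CaptionPair]:
    """Pair each non-empty captioned sentence with its translation."""
    return [CaptionPair(s, translate(s)) for s in sentences if s.strip()]


def render_for_review(pairs: List[CaptionPair]) -> str:
    """Source line first, translated line directly under it, blank line between pairs."""
    return "\n\n".join(f"{p.source}\n{p.translation}" for p in pairs)


# Stand-in for a machine translation service (hypothetical lookup table,
# NOT the Google Translate API).
FAKE_TRANSLATIONS = {
    "Thank you for being here.": "Gracias por estar aquí.",
}


def fake_translate(text: str) -> str:
    # Unknown sentences are tagged instead of translated.
    return FAKE_TRANSLATIONS.get(text, f"[es] {text}")


pairs = build_pairs(["Thank you for being here."], fake_translate)
print(render_for_review(pairs))
```

Passing the translator in as a function keeps the sketch runnable offline. In a real setup, that argument is where a call to a translation service would go.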
In the online meeting room, the following text will appear.
Fixing Errors in AI Translation
As mentioned earlier, we’ve hosted meetups for a long time. Until now, we hadn’t found a way to fix mistakes that happened live during an event. Our Polly plugin makes this possible: mistakes can now be edited in real time.
If you plan to support multiple languages, you need a solution like this. Even if you have a live human captioner, you’ll probably only have one for the speaker’s or event’s native language. You won’t have one for each language available. We believe that Polly’s functionality is important because it gives volunteers a way to correct AI-generated content.
The corrections happen in Google Docs. Polly writes each spoken sentence to a Google Doc in both English and Spanish, and volunteers open that document to review it.
For example, the AI translation makes a common mistake by confusing the conjugation of the verb “to have” in Spanish. Volunteer James Herndon navigates to the sentence and corrects the error, as the following video demonstrates.
The Google Doc shows four sets of lines where the first line is in English and the second line is in Spanish. James Herndon’s name appears with a cursor over “Tienes” to edit it to “Tiene.”
Google Docs is useful for this because multiple volunteers can edit the document at the same time. The document shows where each volunteer’s cursor is located. This allows volunteers to identify who is fixing a specific sentence. Instead of fixing one that already has a cursor, the volunteer can jump to a different sentence that needs editing.
After the volunteer makes the correction in Google Docs, the text is simultaneously corrected in the public transcript that appears for attendees. The updated text is highlighted in purple and underlined so that viewers notice the change, as the next clip demonstrates.
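One way to picture this step is as a comparison between what attendees currently see and what the review document now says. The Python sketch below is only an illustration under that assumption, not how Polly is implemented. The function names, the line ids, and the sample sentences are all made up for the example.

```python
from typing import Dict, List


def find_corrections(displayed: Dict[int, str], reviewed: Dict[int, str]) -> List[int]:
    """Return ids of lines whose reviewed text differs from the displayed text."""
    return sorted(
        line_id
        for line_id, text in reviewed.items()
        if line_id in displayed and displayed[line_id] != text
    )


def apply_corrections(displayed: Dict[int, str], reviewed: Dict[int, str]) -> List[int]:
    """Copy corrected text into the displayed transcript.

    Returns the ids of the changed lines so the caller can highlight them.
    """
    changed = find_corrections(displayed, reviewed)
    for line_id in changed:
        displayed[line_id] = reviewed[line_id]
    return changed


# Sample data (invented): line 2 has the "Tienes" / "Tiene" conjugation error.
displayed = {1: "Bienvenidos.", 2: "Tienes una pregunta."}
reviewed = {1: "Bienvenidos.", 2: "Tiene una pregunta."}

highlighted = apply_corrections(displayed, reviewed)
print(highlighted)
print(displayed[2])
```

Returning the changed ids separately from the text lets the display layer decide how to mark them, such as the purple underline described above.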
This process requires displaying 10 to 12 lines of text in the meeting room instead of the standard two or three lines of text. This is because it takes time for the human translator to make the correction to the generated text. We need the audience to have enough time to see the update, which improves their comprehension of the content.
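The longer display can be thought of as a rolling window over the transcript. Here is a small Python sketch of that idea using a fixed-size queue. The constant `VISIBLE_LINES` and the function `visible_lines` are names invented for this example, and 12 is simply the upper end of the range mentioned above.

```python
from collections import deque
from typing import Iterable, List

VISIBLE_LINES = 12  # assumption: upper end of the 10 to 12 lines mentioned above


def visible_lines(transcript: Iterable[str], limit: int = VISIBLE_LINES) -> List[str]:
    """Keep only the most recent `limit` lines, oldest first."""
    window = deque(maxlen=limit)  # older lines fall off automatically
    for line in transcript:
        window.append(line)
    return list(window)


# Fifteen placeholder lines; only the last twelve stay on screen.
lines = [f"line {i}" for i in range(1, 16)]
on_screen = visible_lines(lines)
print(len(on_screen), on_screen[0], on_screen[-1])
```

With a two- or three-line window, a sentence would usually scroll away before a volunteer could fix it. A larger window keeps the sentence on screen long enough for the correction to be seen.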
Our multilingual plugin provides a tangible example of people and artificial intelligence working together to create a more accessible experience. As AI solutions become increasingly prevalent, it’s important to know that people will not be displaced. Every AI solution is going to require human supervision and control, just as our plugin shows.
Want to give Equal Entry’s Polly plugin a try? Reach out to us at contact@equalentry.com and we’ll work with you!