How to Host Accessible Multilingual Events

Summary

Image Description: Reginé Gilbert at the podium holding a microphone. To her left is a screen with her presentation. To her right is a large monitor that shows captions in English on one side and Japanese on the other side.

It’s not realistic to expect that everyone attending a meeting will speak the same language. That’s why it’s important to discuss how to make events accessible in more than one language. For example, someone wanted to participate in a Japanese language exchange online. However, it turned out to be a Japanese-Chinese language exchange.

The person tried Google Translate on their phone to translate the Chinese conversation on the go. It helped them get the gist of the topics, but not much more. Part of this was due to the translation accuracy, and part was due to the speaker, who was not clear and would have been hard to follow even in English. In a language exchange, participants usually have a general idea of the languages spoken. But what do you do if you have a meeting or event where not everyone communicates in the same language?

Live captions (subtitles) produced by automated systems using artificial intelligence (AI) or by human captioners can be helpful. But there are still challenges. So, how can you facilitate a successful multilingual event?

At the Accessibility Tokyo (A11yTokyo) Meetup, the team aims to provide presentations followed by conversations in English and Japanese. The purpose of the meetup is to advance accessibility and inclusive design.

How to Facilitate an Accessible Multilingual Event

First, here’s a look at what the A11yTokyo Meetup is trying to achieve and the challenges to overcome.

The goal

Host an event at a physical venue or online where participants speaking different languages can follow the presentation, ask questions, and join conversations.
Participants ask questions in their preferred language, either in a written chat or verbally. They should be able to understand the answer.

Challenges to overcome

When human captioners create live captions, the accuracy tends to be high. But these services are expensive, so they may not be an option for events with a limited budget.
Some automated solutions exist that work well for some languages. But the captions can be inaccurate depending on the microphone quality.
The software providing the captions and translations does not always automatically integrate with common video conferencing tools. Thus, participants may need to use multiple applications at the same time. This is hard to do.
The quality of the translation strongly depends on the quality of the original captions. If the original captions are inaccurate, then the translation won’t make sense.
Currently, there isn’t a way to support sign language for input or output unless someone translates live.

Possible solutions

The two major solutions are:

Using human-generated captions and live translation.
Using technology to help with captions and translations.

However, when holding a multilingual event, it is important to remember that technology can only bridge part of the gap. A lot depends on the speaker, the way they are presenting, and whether other people like volunteers are available to support the conversation and translations.

Many mobile apps and web applications exist that can more or less accurately create subtitles. They fall into the following three categories.

1. Automatic captions for recorded content

Software in this category analyzes the audio of a video file and automatically creates a caption file. It is useful for a pre-recorded video in one language. Often the software has limited support for different speakers presenting in different languages.

It also does not typically provide automated translations between different languages. However, several video platforms can automatically generate captions for uploaded content or automatically translate one subtitle file into other languages.

2. Display live captions during a conversation

Software in this category displays live captions during a conversation. It usually receives audio input from video conferencing software or from the microphone on a mobile phone. Often the apps are standalone or only integrate with specific proprietary APIs. They can’t readily be used on a shared screen in a meeting room or in a video conferencing platform.

3. Display live captions and automatic translation

Recently, several applications have become available that display live captions during a conversation and can translate them into another language within a few seconds.

Google Meet can translate English captions to or from several other languages using its translated captions feature. The use of translated captions requires a business or education plan.

Microsoft Teams can generate captions for more than 25 languages and translate them to more than 25 languages. The meeting language needs to be pre-selected and only one language is supported at a time.

Zoom can translate captions from the language being spoken into another language. However, Zoom can only handle one spoken language for captions, so accuracy drops sharply when more than one language is spoken.

Another app is UD Talk, which specializes in live captions for conversations in Japanese. UD Talk uses Google Translate to translate Japanese into English. Another benefit of this app is that it can be integrated with video conferencing software like Zoom to display automated captions and translations to participants in near real time.

Challenges with all three categories

The software’s speech recognition is typically tailored to specific accents of a language. Any deviation from the standard accent reduces the accuracy of the captions.
The software usually assumes the conversation occurs in the same language (and often the same accent) for the entire time, so the video platform cannot support more than one spoken language for the event. However, speakers at a multilingual event might frequently switch between languages.
It can be confusing to follow the captions when they don’t identify the speaker, the display is too small, or the captions show in a bad location on the screen. Location is less of an issue in Zoom and Google Meet because the captions can be moved. Movement within the captions themselves can cause motion sickness.
Live captions are often not in sync with the speech. They tend to appear 2 to 5 seconds after the speaker says the line.
Captions tend to capture industry terms inaccurately. These terms need to be edited or added to the custom vocabulary.
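
As a rough illustration of the custom vocabulary idea in the last point, the sketch below applies a small glossary of known mis-transcriptions to a caption line before it is displayed or translated. The glossary entries, the function name apply_custom_vocabulary, and the sample caption are all hypothetical. This is not how UD Talk or any video conferencing platform implements custom vocabulary. It is only a minimal post-correction pass under those assumptions.

```python
import re

# Hypothetical glossary: mis-transcribed form -> preferred term.
# Real entries would come from the event's speakers, product names, and jargon.
CUSTOM_VOCABULARY = {
    "ally tokyo": "A11yTokyo",
    "you dee talk": "UD Talk",
    "wick ag": "WCAG",
}


def apply_custom_vocabulary(caption: str, vocabulary: dict[str, str]) -> str:
    """Replace known mis-transcriptions with preferred terms, ignoring case."""
    corrected = caption
    # Longest keys first, so a longer phrase wins over a shorter one it contains.
    for wrong in sorted(vocabulary, key=len, reverse=True):
        replacement = vocabulary[wrong]
        # \b keeps the match on whole words, so "really tokyo" is left alone.
        pattern = re.compile(r"\b" + re.escape(wrong) + r"\b", re.IGNORECASE)
        corrected = pattern.sub(lambda _match: replacement, corrected)
    return corrected


if __name__ == "__main__":
    line = "Welcome to Ally Tokyo, captions by you dee talk."
    print(apply_custom_vocabulary(line, CUSTOM_VOCABULARY))
```

Whole-word matching like this suits English captions. Japanese text has no spaces between words, so a glossary for Japanese captions would likely need a different matching strategy.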

While the current software solutions can be improved, it is already possible to use some at your event. They allow people to participate and communicate with each other when they otherwise would not be able to.

How UD Talk Works in a Multilingual Event Like A11yTokyo

The A11yTokyo Meetup is a multilingual event for speakers of Japanese and English. The event uses UD Talk to allow everyone to communicate in their preferred language.

UD Talk provides several applications that allow people to generate live captions of spoken content. The “UD” in UD Talk stands for universal design. The applications aim to be useful for as wide an audience as possible, supporting different communication and language needs. The applications consist of a desktop editor for Windows and macOS to edit the captions, and mobile apps for iOS and Android to display the captions. The captions can also be viewed in a web browser.

The A11yTokyo team chose to use this technology at its events in Tokyo so Japanese speakers can speak Japanese and the display shows their speech in English. Using UD Talk requires an account and selecting a UD Talk payment plan.

UD Talk has a delay of only a few seconds, and it can display the captions on different devices. For example, participants can view the captions in a mobile app, web browser, or in Zoom. UD Talk can also automatically translate between English and Japanese.

Here are the basic steps to use UD Talk in a multilingual, hybrid event where some attendees are in person and others are virtual. The event contains both Japanese and English speakers.

Before the event, an event organizer adds custom vocabulary and terminology in the UD Talk desktop editor. These terms can include names of people, software, and industry jargon that will be mentioned. Users can include both the writing and phonetic pronunciation for the terms. This feature is currently only available in Japanese.

The event organizer then sets up a talk in UD Talk. This is similar to creating a chat room. The process generates links for the web and the app chat room. The organizer sends the links to the participants. Then, participants choose how they want to view captions. They can access them in the UD Talk app, web browser, or on a screen at the venue if they’re there in person.

At the venue, the organizer uses a smartphone or tablet to capture the presenters’ voices. They can do this by connecting a microphone to the phone or by opening Zoom on the phone. It’s recommended to use a high-quality microphone to improve the quality of the captions in Japanese.

The next step is for the organizer to use another smartphone or tablet to display the captions on a big monitor in a larger font. Unfortunately, the UD Talk laptop app does not have a feature to pick up voices.

It is possible to edit the captions during the presentation in the UD Talk desktop editor. Both Japanese and English captions are supported. This will automatically update the displayed captions in the mobile apps and web browser. It is not possible to edit the translation at this point.

How to Ensure Good User Experiences at Multilingual Events

Communicating across different languages comes with challenges. Even with current technology, an app or software cannot guarantee a good user experience.

It can be hard to follow or understand when the translations are inaccurate, don’t make sense, or are delayed compared to the presentation. It also creates a frustrating experience for participants when the caption display is poorly designed.

The quality of the experience depends on the speakers’ speech, microphones, and whether there are volunteers willing to manually support the event and live captions. That said, here are some things for speakers and presenters to consider.

Speaking and microphone quality

As with any video platform, the accuracy of the captions increases when speakers use a high-quality microphone and speak clearly. However, accents may affect the quality of the automatic captions.

Language of content

To encourage people who speak different languages to attend, it helps to have the presentation slides, handouts, agenda, or event invite in multiple languages. When the content is in only one language, people who don’t speak that language are unlikely to attend the presentation. It helps to provide participants with a link to the presentation slides in multiple languages at the beginning of the event.

If the presenter describes the content of every slide including the text and images, then the caption app and translation engine can pick it up. This increases the chances of captioning and translation accuracy while making the content more accessible for participants who don’t have visual access to the event.

Displaying content

For in-person events, captions should be displayed on a screen that is large enough to be seen by participants regardless of where they sit. For online events, some video meeting platforms allow users to customize the caption settings more than others. Depending on the platform, viewers may be able to change the size, colors, and font of the captions.

It’s important to consider how much content appears on the screen and how it appears. For example, the event captions tend to roll up. This can be distracting for some people, especially when the captions are in more than one language. The movement of the captions can also potentially cause motion sickness.

As with all things in accessibility, there’s no one perfect solution. The key is to be intentional. It can help to create a checklist for speakers with reminders to speak clearly and describe content on the slides.

Networking and breakout sessions

For online participants, it can be helpful to have separate breakout rooms, one for English speakers and one for Japanese speakers, with an event organizer or participant facilitating the conversation in each breakout room.

The Future of Accessible Multilingual Events

The A11yTokyo team believes it will become more and more common to have these solutions in place as technology evolves. The team tries new solutions in the meetups to accelerate the normalization of the technology.

Equal Entry

Accessibility technology company that offers services including accessibility audits, training, and expert witness on cases related to digital accessibility.