Editor’s note: For the purpose of this article, captions and subtitles are interchangeable. Captions are the text that appears on the screen capturing the audio.
Almost a year ago, I attended my first-ever virtual reality presentation. And it was captioned. Now, I’m writing about my first virtual reality presentation in AltspaceVR that also had captions. Two very different VR environments. Two very different experiences.
After I left AltspaceVR’s VR world, I felt like I had attended an exciting yet exhausting tennis match. Because this was my first-ever foray into AltspaceVR, I had to learn how to maneuver and manage my presence. I’ll share some of those experiences as well.
Adventures of Exploring AltspaceVR
Attending a virtual reality webinar in AltspaceVR requires downloading the software. You can either go to the 2D world through a computer or the 3D world through a VR headset. I took the 2D route as I don’t have a headset … yet.
Thankfully, my colleague Thomas Logan walked me through the AltspaceVR world the day before. The most important tip is to right-click for menu options. That’s where you can change the settings. It also allows me to stay put. The menu is where you add friends. Lucky Thomas became my first AltspaceVR friend.
Thomas noticed my avatar and thought I’d like to change it. The default gave me an avatar with a bob cut. As a curly-haired girl, a bob cut would not go over well! I made a few tweaks until my avatar resembled a decent generic representation of me.
Entering the Event
Now I was ready for Educators in VR’s main event with Thomas as the guest speaker. I entered the room, which resembled an auditorium with a large screen. Like a movie theater with flat benches. After situating myself, I spotted Educators in VR’s Lorelle VanFossen and Thomas at the front of the auditorium. I keyboarded my way to them.
They noticed me and I heard my name with my bionic ear. (It’s a cochlear implant. That doesn’t sound as cool.) Thankfully, Lorelle sent a chat message telling me how to turn on the captions in the settings.
Done. Lo and behold, what do I see? Caption bubbles or bullhorn captions on the screen. (I’ll explain the difference shortly.) Wowza. It was like nothing I’d ever seen before. Well, sure, I’ve seen bubbles in video games. But that’s different because it’s not an immersive VR world with spontaneous conversations. These are real live humans engaging in a virtual world.
During the presentation, I looked around the room at the audience. Amazing. I could see captioned conversations.
I sat there taking it all in. Watching the captions, figuring out the controls, and searching for a chatbox. I spent a lot of time looking for that because I wanted to send messages to Lorelle and Thomas without turning on the audio. It wasn’t until the end that I found an easier way to chat with them.
The only place I knew how to use the chatbox was through the menu. It was hard to read the transparent chats because of all the content behind them. And I could only send a chat to Lorelle or Thomas as they were the only friends I had in the AltspaceVR world. For now.
The easier way to chat is by selecting a person’s avatar and then the chat icon. You can also friend them using this method. As the next image shows, the text in the chat is hard to see because it’s transparent.
I went back into the captioning settings to check out the other languages available for captions. Selecting another language translates the English captions into that language. It works similarly to any online language translator tool. It won’t be perfect, but it allows people to follow in their preferred language.
Since Thomas lives in Japan, I tried Japanese first. And then I randomly selected German. The captions switched to Japanese, then German, and then back to English. How cool is that?
Here’s a short clip showing how captions work. It shows an audience seating area in virtual reality with five avatars. Makoto’s avatar is speaking and the captions display the conversation in Japanese.
And this clip translates Japanese into English. The video is in a virtual reality world that shows Evans’s avatar. Captions and the speaker’s name appear on the screen in English.
Imagine if I meet up with a German speaker in AltspaceVR. The friend can select German captions. When I speak, it’ll be in English. When the friend reads the caption, it will convert what I said into German. Unfortunately, I have an accent that automatic captions don’t like. So, my translated captions will be quirky German.
The automatic captions work well with Thomas’s speech. So, my German friend would see an automatic translation of what Thomas says in English converted to German.
A Disappearing Magic Trick No One Wanted
It came time for Thomas’s presentation. Lorelle welcomed everyone and introduced Thomas. She passed the baton to Thomas. He started speaking.
Doh! No captions!
Thomas’s captions never showed up in the presentation. Alas, the show must go on and he continued.
Fortunately, the team figured out the problem in the post-mortem. Thomas mentioned that he had re-entered the auditorium in 2D mode on a Mac. Eureka! Problem solved. It turned out captions didn’t work on Macs yet.
Where Are Those Emojis?
Looking around the room, I noticed people clapping, waving, sending love, and performing other actions via the emojis appearing above their heads. I selected the emoji button on the menu. Nothing happened.
Select. Deselect. Select. Repeat for the entire hour. OK, maybe that’s an exaggeration. At the end of the presentation, I finally figured it out. I had positioned myself just right and a faint circle of emojis appeared.
Aha! They had shown up whenever I selected the emoji button, but they didn’t stand out on the scene. In the next image, look around the word “Questions” to locate the emojis. Once I selected that, the emojis appeared clear as day.
Captioning in the VR World
The captions are generated from automatic speech recognition (ASR). It makes sense to use automatic captions especially with all the conversations going on. It’s not perfect. Like all other autocraptioning apps, VR autocraptions don’t like my accent.
My captions say: “How about the name dot in italics? They don’t need to be in italic. The bracket. A girl a talent can make it hard for some people to read, not me personally, but I know a lot of people who have trouble breathing attacks.”
What I actually said (paraphrased): The names don’t need to be in italics because they’re already in brackets. Italics can make it hard for some people to read, not me personally, but I know a lot of people who have trouble reading italics.
Automatic captions have come a long way. However, they still have a long way to go. They don’t work for all accents and voices. Many automatic speech recognition apps tend to be optimized for English and not other languages.
It takes one wrong word, one wrong letter, or no punctuation to confuse the reader.
Several companies have created accessible VR guidelines for captions and subtitles. At the moment, there is no single guideline for captions and subtitles in a VR world. If I were to create the guide, here are the points I’d cover.
There is no universal font for captions. However, here’s Captioning Key’s recommendation: “They need to be medium weight, be sans serif, have a drop or rim shadow, and be proportionally spaced.”
I’d like to see VR captioning guidelines recommend Captioning Key’s advice. Otherwise, creators may decide to run wild and use fancy fonts and harder-to-read serif fonts. I would not mention the drop or rim shadow. I haven’t seen it make a difference in captions. Keep it simple.
Plain captions remove tension in reading. The No. 1 rule of great captions is readability. If you can’t read it, nothing else matters. It could have all kinds of typos and be out of sync. People won’t notice if they can’t read the captions.
As for font size, in an ideal world, the VR participant should have a say in font size. Offer at least three settings: small, medium, and large. Medium would be the default.
The standard captions contain a slightly off-black background with slightly off-white text. This is based on research and talking to people with different disabilities who depend on captions. Captions need a background for contrast. Too often, the text is hard to read because it blends in with the VR setting.
The best captions are boring. Black and white work. They always have a strong contrast without distracting the viewer from the content.
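One way to put a number on “strong contrast” is the contrast ratio defined in the Web Content Accessibility Guidelines (WCAG). The sketch below computes it for an off-white on off-black pairing. The two color values are assumptions picked for illustration. They are not AltspaceVR’s colors, and they don’t come from the research mentioned above.

```python
def _linearize(channel_8bit):
    """Convert one 8-bit sRGB channel to linear light (WCAG definition)."""
    c = channel_8bit / 255
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """WCAG relative luminance of an (R, G, B) color with 0-255 channels."""
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(color_a, color_b):
    """WCAG contrast ratio; ranges from 1 (identical) to 21 (black on white)."""
    lum_a, lum_b = relative_luminance(color_a), relative_luminance(color_b)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


# Hypothetical caption colors, chosen only for this example.
OFF_WHITE_TEXT = (235, 235, 235)
OFF_BLACK_BACKGROUND = (20, 20, 20)

if __name__ == "__main__":
    ratio = contrast_ratio(OFF_WHITE_TEXT, OFF_BLACK_BACKGROUND)
    print(f"Contrast ratio: {ratio:.1f}:1")
```

WCAG asks for at least 4.5:1 for normal-size text at level AA and 7:1 at level AAA. Those thresholds were written for flat web content, so treat them as a starting point for VR rather than a settled answer.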
AltspaceVR has two different text types. One is the bubble. It’s similar to comic books. The bubble appears by the person’s avatar with the captions as the next image shows.
Notice the captions can be at an angle and too small to read. I had to back up to see everyone’s captions. Yet, I could barely see the captions of the two people furthest from my view. And the captions of the person closest to me are partly off-screen.
The other type of captions is the bullhorn. This appears somewhere on the scene rather than in a bubble. This one contains no background. It blended a lot with whatever was behind it. This adds friction to the reading experience. They often contain many rows of captions. The next two images show how captions can get lost in the scene.
The bubble is more readable because it has a transparent grey background. But you can still see through it. Whatever is behind the bubble can cause strain while reading. I recommend darkening the bubble or eliminating the transparency to improve readability. Transparent captions can be a distraction. Err on the side of caution and skip the transparency.
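To see why transparency hurts, it helps to look at what a see-through bubble does to the color behind the text. The sketch below uses a simple source-over blend to estimate the background a reader actually sees. The grey value, the opacity levels, and the two scene colors are all assumptions for illustration, not values taken from AltspaceVR.

```python
def composite(bubble_rgb, alpha, scene_rgb):
    """Blend a bubble color over the scene behind it (simple source-over).

    alpha is the bubble's opacity: 0.0 is fully see-through, 1.0 is solid.
    """
    return tuple(
        round(alpha * bubble + (1 - alpha) * scene)
        for bubble, scene in zip(bubble_rgb, scene_rgb)
    )


# Hypothetical values for this example only.
GREY_BUBBLE = (80, 80, 80)
BRIGHT_SCENE = (240, 240, 240)  # e.g., a lit wall behind the avatar
DARK_SCENE = (20, 20, 20)       # e.g., a shadowed corner

if __name__ == "__main__":
    for alpha in (0.5, 1.0):
        over_bright = composite(GREY_BUBBLE, alpha, BRIGHT_SCENE)
        over_dark = composite(GREY_BUBBLE, alpha, DARK_SCENE)
        print(f"alpha={alpha}: over bright {over_bright}, over dark {over_dark}")
```

At half opacity, the same bubble ends up much lighter over the bright scene than over the dark one, so the text contrast changes as avatars move around. At full opacity, the bubble color is the same everywhere, which is the point of skipping the transparency.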
If I had to choose, I’d go with the bullhorn style. It keeps all the content together and minimizes the tennis match problem, which comes up shortly. I’m not sure how bubble captions can be revised to allow attendees to read them all without moving their eyes from person to person, bubble to bubble. Besides, some of the captions can be small when you’re in a group conversation.
The hardest part about creating accessible VR guidelines for captions is placement. It’s easy on 2D videos. In most cases, the captions belong on the bottom. But VR? You’re in a 360-degree world. There’s not always a logical “bottom.”
The other thing to consider is how close or far away to display the caption from the viewer. As I mentioned before, I could not read the captions of some of the people in our small group conversation. I’d have to move forward and back constantly. And that causes problems as revealed next.
Captioning Key includes recommendations for speed or timing of the captions. The easiest thing to do is to set targets for line length and the number of lines. My captioned videos follow the guideline for 2D videos. The recommendation is one to two lines at a time with up to 32 characters per line.
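As a rough sketch of that 2D guideline, the snippet below splits a transcript into caption frames of at most two lines with at most 32 characters each. The function name and constants are made up for this example. It only handles line breaking, not how long each frame stays on screen.

```python
import textwrap

MAX_CHARS_PER_LINE = 32    # from the 2D guideline quoted above
MAX_LINES_PER_CAPTION = 2  # one to two lines at a time


def chunk_caption(text, width=MAX_CHARS_PER_LINE, max_lines=MAX_LINES_PER_CAPTION):
    """Split text into caption frames, each a list of short lines."""
    # textwrap.wrap breaks on whitespace and keeps each line within `width`.
    lines = textwrap.wrap(text, width=width)
    # Group the wrapped lines into frames of at most `max_lines` lines.
    return [lines[i:i + max_lines] for i in range(0, len(lines), max_lines)]


if __name__ == "__main__":
    sample = (
        "Italics can make it hard for some people to read, not me personally, "
        "but I know a lot of people who have trouble reading italics."
    )
    for frame in chunk_caption(sample):
        print("\n".join(frame))
        print("---")
```

Raising `max_lines` is an easy way to experiment with the longer captions that VR might tolerate.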
VR may get away with more than two lines. How many lines? I’m not sure yet. If the captions are bullhorn style with the speaker’s name in brackets, it can contain more lines because the spacing between speakers will help with tracking.
The length of the lines in AltspaceVR’s bullhorn captions worked well. They probably didn’t exceed 36 characters.
Size matters in captions. If captions are long in terms of width and contain more than three lines, it causes cognitive overload. Short lines enhance scanning and improve comprehension.
“Operational Overhead Caused by Horizontal Scrolling Text” research from Wayne E. Dick, Ph.D. practically justifies the importance of short and crisp captions. Although the research is about horizontal scrolling on websites, it’s very similar to reading long lines in captions. “Since scrolling is overhead to reading comprehension, the scrolling by itself, is a serious disruption,” Dick writes.
Instead of scrolling, captions with long lines expend more energy in the physical reading of the text. This takes away the ability to comprehend. And you’re missing a lot of the action in the environment. Plus, you can easily lose your place reading on-screen text. The following image shows a long block of text with 17 rows of captions.
I digress. The point is that size matters in captions and they should be scannable. There’s a difference between reading and scanning. Reading means spending more time looking at the captions. Scanning means glancing at the captions for a split second while looking around the VR setting. In the meantime, the number of acceptable rows in captions will need more investigation to determine what will be effective.
Sounds and Speaker Identification
The VR accessibility guide needs to cover sounds and speaker identification. Sound matters. This side-by-side video shows the importance of captioning sounds.
One thing AltspaceVR does well is speaker identification. This is especially true in bullhorn captions. The speaker’s name would appear in brackets and italics like this:
I suggested the team drop the italics. The brackets work well by themselves. Besides, italics are hard for many people to read.
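Here is a minimal sketch of bracket-only speaker labels for bullhorn-style captions. It is not how AltspaceVR builds its captions. The function name is hypothetical, and showing the name only when the speaker changes is my own assumption to keep the lines short.

```python
def format_bullhorn(utterances):
    """Turn (speaker, text) pairs into caption lines with bracketed names.

    The name appears in plain brackets, with no italics, and only when
    the speaker changes from the previous line.
    """
    lines = []
    previous_speaker = None
    for speaker, text in utterances:
        if speaker != previous_speaker:
            lines.append(f"[{speaker}] {text}")
        else:
            lines.append(text)
        previous_speaker = speaker
    return lines


if __name__ == "__main__":
    conversation = [
        ("Lorelle", "Welcome, everyone."),
        ("Lorelle", "Please welcome Thomas."),
        ("Thomas", "Thank you for having me."),
    ]
    print("\n".join(format_bullhorn(conversation)))
```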
There were a couple of instances of bubble captions where I wasn’t sure whose bubble it was. There’d be two people close enough. The caption bubble would show up in a way that you didn’t know who was talking.
A Painful Tennis Match
I have a history of vertigo. Whenever I played VR games (Duke Nukem!) on a computer or game console, it didn’t agree with me. I’ve attended A11yVR Meetups in a VR setting. It rarely went well.
During the presentation, I minimized my movements to prevent vertigo. Little did I know what waited for me afterward. I had no idea I’d be watching tennis without the tennis ball and racquets. I joined Thomas and the Educators in VR team for a post-mortem. Because of the bubble captions, I had to put myself in a spot to see everyone’s captions.
We had between three and six people in the conversation. I darted my eyes to follow the bubble captions. Some popped up at nearly the same time, so I didn’t know which to look at first.
It didn’t take long for my eyes to start hurting. That’s why I prefer the bullhorn captions. They keep all of the text in one place and identify the speaker. If they add a dark box behind the captions, it will be marvelous.
Next Steps for Captioning VR
I’d like to see VR accessibility guidelines for captions and subtitles similar to Captioning Key’s. They’re a stellar example of captioning guidelines. No jargon, short sentences, and clear explanations accompanied by examples.
In closing, the Educators in VR experience resembled Wimbledon. Part of the excitement was being in the stands and discovering new players. Except, in this case, the players were two different styles of captions. During the game, my head moved back and forth, side to side to follow the tennis ball in the form of bubble captions.
Considering the captions are in beta, this is amazing work so early in the game. VR is a work in progress and evolving every day. I look forward to providing input on captions for VR, seeing AltspaceVR grow, and checking it out with a VR headset.
- Altspace VR announcement
- BBC Subtitles for VR
- Oculus accessibility VR guidelines
- XR Association Developers Guide