Using Microsoft’s Seeing AI in Day-to-Day Life

Sofia Gallo, Accessibility Consultant and Screen Reader User

by Sofia Gallo

An Introduction

Last month, Microsoft introduced Seeing AI, an app that uses artificial intelligence to describe the world for people who are visually impaired.

Because of my limited vision, I am always looking for new tools that help me capture important visual information around me, so I decided to try Seeing AI and find out what it could do in real-life situations where visual information would be useful.

The app’s layout is very intuitive. The camera view takes up most of the screen. At the bottom, there are several channels you can select depending on what you want Seeing AI to recognize: Document, Short Text (which reads text as you move up and down the page), Product (which recognizes barcodes), Person, and Scene (a beta feature). At the top of the screen, a “quick help” button explains what the selected channel does and how to use the app. I tried Seeing AI in four different situations, each of which called for a different channel.

Scenario 1: Document

Going to a restaurant is often frustrating because I have no way to read the menu, and if it is very long, it takes a while for someone else to read it all to me. The next time I went out, I tried Seeing AI on the restaurant menu, hoping it could solve this problem. Taking a picture of the document was the hardest part, because I had no idea what the camera was pointing at. While I was able to take a decent picture with my limited vision, this may be more difficult for someone who is totally blind. Once I had taken the picture, however, I had a text-only version of the menu on my phone, with the names of the dishes, their descriptions, and their prices. The text came out clearly, even though the restaurant was relatively dark and the menu had decorative drawings around the page that could have made text recognition difficult.

Scenario 2: Product

I often order everything I need online to avoid navigating the grocery store. However, when multiple products come in the same box, it can be difficult to tell what each item is, a problem Seeing AI’s Product channel could solve. The app uses the barcode to recognize the product. As you move the camera around, the app starts to beep, and the faster the beeps, the closer the app is to recognizing the barcode. Unfortunately, I could not get this feature to work. I got to the point where the app was beeping quickly, indicating that I was close, but I did not get a result even after holding the camera for about a minute at the spot where the fast beeps began. I also tried a different product, tried to position the camera more carefully once the fast beeping started, and even asked a sighted person to watch and make sure the entire barcode was visible and in focus, but Seeing AI still did not recognize the product.

Scenario 3: Person

After focusing mainly on text-based features (document and barcode), I decided to try person recognition. When this channel is selected, the app tells you whether a face is visible, where it is on the screen (center, right edge, or left edge), and how far away it is. All of this information makes it easier to take an accurate picture without having to see what the camera is pointing at. When I took a picture of myself smiling, Seeing AI described me as a “24 year old woman with black hair looking happy.” The estimated age varied a little depending on how close I was, ranging between 20 and 27 across different tries. (I am 22, so that was actually a pretty good estimate.) The description was very impressive. Most face recognition technology I have used only tells you whether there is a face on the screen and what expression it has, but not age, gender, or hair color, which are very useful pieces of information. I wish eye color were mentioned too, but that is a minor point given how much more Seeing AI can say about a face than other technology. Too bad it is probably not wise to whip out your phone and take a picture of your parents as they sternly tell you they must talk to you immediately, when your teacher is about to hand out grades, or in the middle of an argument. In all of these situations, it would be extremely helpful to know what expression people have on their faces!

Here is a video of me trying the Person channel.

Scenario 4: Scene (Beta)

Even though I am visually impaired, I have a strange obsession with pictures. There is something special about visiting a grand historical mansion or a panoramic natural landscape, taking a picture to remember the moment (even if I can only see some of its details), and sharing the experience with friends on social media. It would be even better, however, if I could get a detailed description of my pictures so I could remember smaller details later. Given my fascination with panoramic views, I decided to try Seeing AI’s Scene channel, even though it is still in beta. On an outing to the Brooklyn Bridge with my family, I took a picture of part of the bridge with the people walking in front of me. Seeing AI’s description was amazingly detailed: “people walking on a bridge over a body of water.” To put this in perspective, the iPhone’s photo recognition technology, which is fairly advanced, only told me that the picture was in portrait mode and that there was a bridge. When I took a picture of the Manhattan skyline, however, Seeing AI only told me there was a white building. Since there were multiple buildings and the sky was clearly visible, this description was not as useful.

Seeing AI is definitely an improvement when it comes to describing scenes, but for now, my dream of having pictures described in detail is not yet a reality. Still, the fact that Seeing AI can already describe the basic features of a scene while it is only in beta is a good sign of things to come.

Here is a video of me trying the Scene channel.

Overall Takeaway

Except for some difficulty with the Product channel, Seeing AI gave me a lot of useful information that I would otherwise miss. The Person and Scene channels were especially impressive, because other established apps do not even come close to providing so much information. The Document channel was a useful addition, but apps have been able to recognize text in documents for a while now, so it was not as surprising as the others.

Even with photo recognition apps, I still miss many important visual cues as a visually impaired person: street signs, information on grocery store aisles, and other people’s body language are just a few aspects of the world that would be useful to know but are often inaccessible. However, artificial intelligence technology is still fairly new, so it will probably become more accurate over time. Seeing AI is at the forefront of this technology, and it has already shown how much AI can offer people with visual impairments as it continues to improve.

Sofia Gallo graduated from Princeton University with a Bachelor of Arts in Politics in 2017. She has used screen reader technology for multiple years and is passionate about learning to use new technology, increasing accessibility of applications and websites (especially mainstream commercial products), and using technology to remove barriers to independence for people with disabilities. She lives with her guide dog and best friend, a black lab named Karleen.

2 comments:

  1. What if this technology could be blended with something like Aira’s use of Google Glass? That service connects the user with a live operator, who views the feed from the Google Glass camera and relays the information verbally. If this Microsoft technology could be put into wearable technology via an API, it would be easier to use than snapping a photo with a mobile device. Maybe the user could simply ask out loud for a description of the video feed, rather than manually taking a still photo each time. Here is the link to the Aira technology: https://aira.io/.

    This was a great demonstration of the technology and thank you for sharing!
