Dragon NaturallySpeaking: Then & Now

Equal Entry
July 11, 2017

by Sam Berman

The first time I tried Dragon NaturallySpeaking, I was in middle school in the early 2000s. I immediately saw the potential. I was already computer literate, but because I lack fine motor skills in my right hand, I was not an efficient typist. I tried Dragon a few times, training it to understand me, but ultimately gave up after growing frustrated with its sensitivity to background noise. Even in a classroom with only an occupational therapist and me, it would frequently transcribe noise from the hallway as text, even with the door closed. It would also interpret a cough or a cleared throat as text.

But having always seen its potential, when Equal Entry offered me a Dragon license this year, I jumped at the opportunity to try it again. And what a difference fifteen or so years makes. When I tried the software as a teenager, I knew that hardware limitations were holding it back. Playing with Dragon as an adult in the age of voice interfaces, I can tell that the software is finally coming into its prime. Ever more powerful CPUs not only improve the effectiveness and accuracy of voice processing but also let the program work with a wider variety of microphones. As the hardware catches up, it provides an increasingly reliable platform for the application.

I have had mixed results using Dragon on my Mac. It works most effectively for dictation. I have dictated most of this article with a very low error rate. It is smooth and responsive, and there have been only a few instances of the software misinterpreting background noise as text. It makes a few errors when I mumble or stumble over my words, but I am confident that the more I use it, the less this will happen. When I first tried Dragon as a teenager, I spent days training the software to understand my voice with a dedicated microphone. By contrast, in 2017 I trained it to understand my voice in a matter of minutes using the microphone built into my computer. Overall, I am impressed with the improvements the Nuance team has made to the dictation portion of the software. There is still a learning curve, but it is not nearly as steep as it used to be. Time has vastly improved the software’s performance, and I am confident it will continue to do so.

Although dictation has improved over time, there is still much room for improvement in Dragon’s other functionality. For example, my confidence in its capabilities drops when I need to edit dictated text: moving the cursor and making corrections is cumbersome. I can type messages in Google Hangouts and even use some common keyboard commands such as “create a new tab” or “press Enter”, but browser compatibility seems questionable, even though I am running the most up-to-date version of the software. Some days I have no trouble browsing the web and dictating text into Safari but have difficulty operating Chrome; other days Chrome is semi-operable but Safari gives me trouble. Some common commands, such as “scroll” and “click”, require a very specific sequence of words. I would love to be able to use synonyms for these commands. That would make the software much easier to learn and incorporate into my workflow.

Navigating a web browser is doable but confusing. More often than not, Dragon simply freezes when I work in Safari. While I am able to navigate the web, doing so is unreliable, and many of the most commonly needed commands just don’t work consistently. Many times I recited the command “do a web search” to no avail. Other times it would scroll down the page but never actually put the address bar in focus. Once the address bar was in focus (either because I navigated to it with the trackpad or said “create a new tab”), dictating a URL was extremely unintuitive. The documentation says to begin a web address with the phrase “www dot” to tell the software that I am giving it a web address rather than plain text. Half the time I recite a web address, Dragon writes “W W W .” Not only do I have to correct this, but if I recite even a moderately long address, it frequently inserts spaces between the words, and I have to correct that as well.

I am able to dictate a message in Gmail, but navigating between the fields of the “compose” form has so far proven impossible. Scrolling webpages is a constant headache. The command “scroll down” doesn’t work, while “arrow down” does. “Scroll up” always takes me to the very top of the webpage, but “arrow up” does nothing. I found the phrase “scroll one screen up” the most reliable way to scroll up a page gradually, but even that doesn’t work consistently.

Let’s take a minute to talk about competition in the speech-to-text space. Right now, Nuance has its fair share of competitors. Big tech companies such as Google, Apple, and Amazon have voice assistants built into many of their hardware products. These assistants are not only built on artificial intelligence but also have speech-to-text capabilities baked in. They haven’t been around nearly as long as Dragon, but in my experience they are already much more capable. I can tell my phone “Siri, send a message to Bill”, and it will prompt me: “Sure, what would you like me to say?” Then I can give it the message and tell it to send it. This serves essentially the same purpose as Dragon does on my computer. So what’s stopping these companies from bringing their virtual assistants to computers? In fact, this has already started to happen: Google Docs, for example, has a built-in “voice typing” option. Newer companies are moving into Dragon’s territory at an ever faster pace, and given that voice is Dragon’s specialty, I wonder why its developers haven’t gotten it right yet.

As a lifelong one-handed typist, I would love to rely on my voice to fully operate my computer. Dragon has the potential to be a great platform for this purpose, but to make it a compelling option, the features of the software beyond dictation must work more consistently and reliably. If Nuance achieves this, many other people, with and without disabilities, will benefit as well.

Sam Berman has two years of experience as an educator and advocate for people with disabilities and as a computer literacy teacher in the aging community. In the spring of 2016, he attended a web development bootcamp at the New York Code & Design Academy, hoping to combine his passions for technology and advocacy into a career. As a consultant for Equal Entry, he draws on these experiences to make the web easier to use for people with disabilities.

Equal Entry
Accessibility technology company that offers services including accessibility audits, training, and expert witness services for cases related to digital accessibility.

5 comments:

  1. Great article, Sam! I like your overview of the speech-to-text space and how quickly it is being integrated into our daily digital interactions. It makes me wonder how much longer we will keep translating from one format to another, like speech to text, as we build toward future conversational interfaces that interpret a wide variety of human inputs and outputs: spoken, written, gestural, etc. However, given the input errors and editing challenges you described with Dragon, I can only imagine the learning curve and the changes in communication that these systems would require.

    1. Thank you for your comment, Shannon! The learning curve for voice user interfaces is so steep, especially for users with cognitive impairments. For example, most of the virtual assistants I have interacted with require requests to be made verbally and in complete sentences. Some people with certain types of impairments may find it hard to formulate a thought as a full sentence before verbalizing it. Others may be able to form the thought in complete or partial sentences but have difficulty saying it aloud. Developers of these platforms need to do a better job of accommodating the broad array of communication styles people use. Google stands out in my mind as one of the leaders in embracing this idea. The Google Assistant can not only be spoken to but also typed to! I love this in part because typing to these assistants benefits not only users who have disabilities but all users: as these technologies become more widely adopted, people may want to communicate with them in situations where it wouldn’t be appropriate to speak aloud.

  2. Scrolling on Safari would never work for me, but thank you for the “arrow down” suggestion; that is good to know. My workaround was to create a command that made saying “scroll down” have the same effect as saying “press the space bar”, because hitting space in Safari usually scrolls the page down, but it only works some of the time.

    I wonder if it is easier to use Dragon to navigate the Internet on a PC instead of a MacBook. I have never used Dragon on a PC, but I suspect the difference might be significant, because Nuance.com sells a lot more options for PCs, and voice-to-text software is often used by people in careers that typically rely on PCs (such as hospital employees), so I can’t help but think it was optimized for that setting.

    Also, as far as I can tell, there is no way to write a command using the function key, a key that MacBooks have but PCs do not. I think that is actually why the scroll down command never works: on a MacBook, scrolling requires the function key plus an arrow key, but the command is written using just an arrow key.

    It might be something worth researching before you buy your next computer. I’ve developed cubital tunnel syndrome and tennis elbow, and am in the middle of writing my thesis, and I have found Dragon on my Mac significantly better than the dictation option that is already available on my computer. But you’re right, navigating the Internet is a headache, and I find the program kind of buggy. Restarting my computer regularly seems to help, but it is still frustrating.

    1. Thanks for your comment, Brittany. You’ve brought up several points about cross-platform functionality between PCs and Macs that are definitely worth exploring.

  3. I love your resourcefulness, Brittany! As an assistive technology user myself, I have also learned to implement workarounds or “hacks” to make my workflow more efficient. I think there is some legitimacy, at least anecdotally, to your suggestion that using Dragon to navigate the internet is more effective on a PC than on a Mac. Based on my knowledge of the company, I think Nuance has more experience developing the application for PC than for Mac. I’m glad to hear it’s helping you, though, and I agree that having to restart your computer every time the program stops working is far from an optimal solution. Hopefully, with time, the Mac application will improve and become more reliable.
