Dragon NaturallySpeaking: Then & Now

by Sam Berman

The first time I tried Dragon NaturallySpeaking, I was in middle school in the early 2000s. I immediately saw the potential. I was already computer literate, but because I lack fine motor skills in my right hand, I was not an efficient typist. I tried Dragon a few times, training it to understand me, but ultimately gave it up after growing frustrated with its sensitivity to background noise. Even in a classroom with only myself and an occupational therapist, and with the door closed, noise from the hallway would frequently be transcribed as text. It would hear me cough or clear my throat and transcribe that as well.

Because I had always seen its potential, when Equal Entry offered me a license for Dragon this year, I jumped at the opportunity to try it again. And what a difference fifteen or so years makes. When I tried the software as a teenager, I knew that hardware limitations were holding it back. Playing with Dragon as an adult in the age of voice interfaces, I can tell that the software is finally coming into its prime. CPUs are getting ever more powerful, which not only increases the accuracy of voice processing, but also lets the program work with a larger variety of microphones. As the hardware catches up, it provides an increasingly reliable platform for the application.

I have had mixed results using Dragon on my Mac. It works most effectively when used for dictation. I have been able to dictate most of this article with a very low error rate. It is smooth and responsive, and there have been only a few instances of the software misinterpreting background noise as text. It has made a few errors when I mumble or stumble over my own words, but I am confident that the more I use it, the less this will happen. When I first tried Dragon as a teenager, I recall spending days training the software to understand my voice with a dedicated microphone. By contrast, in 2017, I trained the software to understand my voice in a matter of minutes using the microphone built into my computer. Overall, I am impressed with the improvements the Nuance team has made to the dictation portion of the software. There is still a learning curve, but it is not nearly as steep as it used to be. Time has vastly improved the performance of the software, and I am confident it will continue to do so.

Although dictation has improved over time, there is still much room for improvement in Dragon’s other functionality. For example, my confidence in its capabilities decreases when I need to edit text that I have dictated. The process for moving the cursor and making corrections to text is cumbersome.

I can type messages in Google Hangouts and even operate some common keyboard commands such as “create a new tab” or “press Enter”. I am running the most up-to-date version of the software, but the browser compatibility seems questionable. Some days, I have no trouble browsing the web and dictating text into Safari but have difficulty operating Chrome; other days, Chrome is semi-operable but I have difficulty using Safari. More often than not, Dragon simply freezes when working in Safari. Some common commands such as “scroll” and “click” require a very specific sequence of words in order to operate. I would love to be able to use synonymous terms to perform these commands. That would make the software much easier to learn and incorporate into my workflow.

Navigating the web is doable, but it is confusing and unreliable. Many of the most commonly needed commands just don’t work consistently. Many times, I recited the command “do a web search” to no avail. Other times, it would scroll down the page, but it would never actually put the address bar in focus. Once the focus is on the address bar (either by my navigating to it with the trackpad or by saying the phrase “create a new tab”), the process of dictating a URL is extremely unintuitive. The documentation says to use the phrase “www dot” at the beginning of a web address to indicate to the software that I am in fact giving it a web address rather than plain text. Half the time I try to recite a web address, Dragon writes “W W W .” Not only do I have to correct this, but if I recite even a moderately long address, it frequently puts spaces between the words and I have to correct that as well.

I am able to dictate a message in Gmail, but navigating between the fields in the “compose” form has thus far proven impossible. Scrolling webpages is a constant headache. The command “scroll down” doesn’t work, while “arrow down” does. “Scroll up” always takes me to the very top of the webpage, but “arrow up” does nothing. I found the phrase “scroll one screen up” the most reliable way to slowly scroll up a page, but even that doesn’t work consistently.

Let’s take a minute to talk about competition in the speech-to-text space. Right now, Nuance has its fair share of competitors. Many of the big tech companies such as Google, Apple, and Amazon have voice assistants built into many of their hardware products. These assistants are not only built on artificial intelligence, but also have speech-to-text capabilities baked in. They haven’t been around nearly as long as Dragon, but they are already much more powerful. I can tell my phone “Siri, send a message to Bill” and my phone will prompt me, saying “sure, what would you like me to say?” Then I can give it the message and tell it to send the message. This essentially serves the same purpose as Dragon does on my computer. It is natural to wonder what is stopping these companies from bringing their virtual assistants to computers, but in fact this has already started to happen. Even in Google Docs, there is a “voice typing” option built in. Newer companies are overtaking Dragon’s territory at an ever faster pace, and seeing as voice is Dragon’s specialty, I wonder why the developers haven’t gotten it right yet.

As a lifelong one-handed typist, I would love to rely on my voice to fully operate my computer. Dragon has the potential to be a great platform for this purpose, but in order to make this a compelling option, the features of the software beyond dictation must function more consistently and reliably. If Nuance achieves this, there will be many other people with and without disabilities who will benefit as well.

Sam Berman has two years of experience as an educator and advocate for people with disabilities, and as a computer literacy teacher in the aging community. In the spring of 2016, he attended a web development bootcamp at the New York Code & Design Academy in the hope of combining his passions for technology and advocacy into a career. As a consultant for Equal Entry, he taps into these experiences to make the web easier to use for people with disabilities.

One comment:

  1. Great article, Sam! I like your overview of the speech-to-text space and how quickly this is being integrated into our daily digital interactions. It makes me wonder how much longer we will translate from one format to another, like speech to text, building towards future conversational interfaces that interpret a wide variety of human inputs/outputs – spoken, written, gestural, etc. However, given the challenges of input errors and editing like you described with using Dragon, I can only imagine the learning curve and changes in communication that would be required to use these systems.

