Summary
Mobile accessibility remains challenging, with 60–80% of users encountering barriers across common apps. Jason Tan frames the challenge through four apocalyptic horsemen: lack of mobile-specific standards, weak automation tools, opaque UI structures, and restrictive platform APIs. He proposes a new automation layer to bridge these gaps.
Michael Bervell adds a human-tech lens, arguing that AI can amplify accessibility professionals’ impact when used wisely. A Harvard Business School study found that “cyborg” workflows, which weave AI into every step, outperform “centaur” workflows that split the work between human and AI. While AI shows promise for automating much of WCAG remediation (model estimates range up to 95%), human expertise remains irreplaceable. Be a cyborg, not a centaur.

This article is based on a presentation by Jason Tan and Michael Bervell, co-founders of TestParty, titled Digital Accessibility’s Gap: AI to Bridge Mobile and Web Barriers, given at A11yNYC.
The four horsemen of mobile accessibility
Jason Tan reveals that around 20% of iOS users use larger text. He references a mobile accessibility survey published by the American Foundation for the Blind. The survey looked at diverse app categories, from crypto to banking to ordering. Overall, mobile accessibility remains a big barrier: for most apps, 60% to 80% of respondents ran into a barrier.
That leads to the four main issues of mobile accessibility, which are like four apocalyptic horsemen. The first one is around standards. Many may be familiar with WCAG, the Web Content Accessibility Guidelines. So, where are the accessibility guidelines for mobile?
A thriving ecosystem of developers and open source tools requires a unified standard. The W3C has a working draft on mobile accessibility that addresses controls that don’t exist on the web.
The second thing that makes mobile accessibility hard is the lack of automation. There are some open source tools like Appium, but they tend to be hard to use. In fact, some companies choose to skip user interface (UI) tests because the frameworks are hard to maintain.
Without the right open source automation tools, it’s hard to start testing systematically. With OpenAI’s Operator, you can tell it to do something; it uses Puppeteer and Playwright, takes screenshots, and navigates. This can’t be done on mobile unless you’re crafty and willing to overengineer a lot of things.
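To make the contrast concrete, here is a minimal sketch of the kind of scripted web automation that tools like Operator build on, written with Playwright. The URL and link name are placeholders, and this is an illustration rather than Operator’s actual code; there is no mobile equivalent this simple.

```typescript
// Sketch: navigate a page, screenshot it, and click a link by its
// accessible role and name. Placeholder site: example.com.
import { chromium } from 'playwright';

async function main() {
  const browser = await chromium.launch();
  const page = await browser.newPage();

  await page.goto('https://example.com');
  await page.screenshot({ path: 'step-1.png' }); // what an agent "sees"

  // Locate by accessibility semantics, the same tree screen readers use.
  await page.getByRole('link', { name: /more information/i }).click();
  await page.screenshot({ path: 'step-2.png' });

  await browser.close();
}

main().catch(console.error);
```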
The third issue is the opaque element structure. Deque’s axe-core, one of the best open source accessibility frameworks, works primarily by embedding itself within the HTML DOM. Yet there is no such DOM on mobile. If you’ve tried cross-platform mobile development, like React Native, you might have seen the React Native debugging tree. It is technically available for you to crawl, but it is nowhere near as rich as the HTML DOM.
The Document Object Model (DOM) shows the structure of how a page is laid out, and it is crucial for constructing the relationships between elements on a screen. Without this sort of representation, you must rely on simple visual analysis.
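As a rough illustration of what that DOM embedding looks like in practice, the sketch below runs axe-core against a page through Deque’s published Playwright integration (`@axe-core/playwright`). The URL is a placeholder.

```typescript
// Sketch: a DOM-based accessibility scan with Deque's axe-core,
// via the @axe-core/playwright integration.
import { chromium } from 'playwright';
import AxeBuilder from '@axe-core/playwright';

async function scan(url: string) {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  await page.goto(url);

  // axe-core injects itself into the page and walks the DOM,
  // using the element relationships the DOM makes explicit.
  const results = await new AxeBuilder({ page }).analyze();
  for (const violation of results.violations) {
    console.log(`${violation.id}: ${violation.description}`);
  }

  await browser.close();
}

scan('https://example.com').catch(console.error);
```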
For mobile, you can get a view hierarchy. That’s helpful for debugging, but it won’t help with elements that don’t appear in it, and it isn’t useful for automated testing.
Hence, the fourth issue is platform limitations. Apple and Google restrict what you can and can’t do with their APIs. While they offer a lot of prebuilt components, they expose only a couple of fields for accessibility. Some are useful, while others are restrictive. That makes it hard to navigate how to make things accessible.
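For a sense of how few knobs there are, here is a small React Native example in TypeScript showing the handful of cross-platform accessibility props the platforms expose. The component and labels are invented for illustration.

```tsx
// Sketch: the small set of accessibility fields exposed through
// React Native's cross-platform props. Component and labels are invented.
import React from 'react';
import { Pressable, Text } from 'react-native';

export function SubmitButton({ onPress }: { onPress: () => void }) {
  return (
    <Pressable
      onPress={onPress}
      accessible={true}
      accessibilityRole="button"        // traits on iOS, widget class on Android
      accessibilityLabel="Submit order" // what a screen reader announces
      accessibilityHint="Sends your order for processing"
    >
      <Text>Submit</Text>
    </Pressable>
  );
}
```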
Automating mobile accessibility testing
TestParty has built an automation framework that strips away the ugly parts of Appium to create a new standard that is closer to Puppeteer. Developers can write a YAML-based test file, navigate through screens, and create hierarchies around them.
It lets you capture the element types and the various relationships inside a hierarchy, and that hierarchy can be used for automated testing. The tool works detached from the Apple and Google ecosystems, running strictly within VS Code. This lets you take control of a running device and test things remotely.
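The talk doesn’t show the file format itself, so the schema below is entirely hypothetical; it only sketches what a YAML-based flow of this kind might look like: launch, navigate, capture the hierarchy, and assert against it.

```yaml
# Hypothetical sketch only; TestParty's actual schema is not public.
name: checkout-flow
platform: ios
steps:
  - launch: com.example.shop
  - tap:
      label: "Cart"
  - capture_hierarchy: cart-screen   # snapshot the element tree
  - assert:
      element: { role: button, label: "Check out" }
      accessible: true
```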
Combining AI and humans
Michael tells the story of one of his favorite superheroes, who is half cybernetic robot and half human. The combination of the person’s technology and humanness makes them a superhero. Along these lines, people can use AI to turn themselves into something like an accessibility superhero. It’s a combination of assistive AI tools plus the human heart.
Where we are today with WCAG remediation automation
Where are we today? This is season 1, episode 1: the pilot. Today, 94% of the million most visited home pages are inaccessible, per the WebAIM Million. There’s been a little progress, which is good.
Nonetheless, the web is also becoming more complex. This year, there were 11.8% more elements on these home pages, which means nearly 12% more things to test than last year. The web will continue to grow more complex as people get more creative and build more interesting new components.
The reality is that manual audits are slow; it takes months to complete a digital accessibility audit. Michael’s company went through SOC 2 (System and Organization Controls) compliance: the audit took eight weeks to complete, then another three months to deliver the full report.
Accessibility professionals have a lot on their plates, as nearly 61% of accessibility employees do not have the resources they need to do their work effectively. They’re giving 110% to make the web 5% more accessible than last year. How can we do better? By using artificial intelligence.
What can AI do?
Artificial intelligence in this context means frontier large language models (LLMs) like Gemini and ChatGPT, trained on massive data sets. These models essentially work by taking huge amounts of data, weighting that data according to how they interpret it, and making predictions. Those predictions are getting good.
These models can score well on many standardized tests. You might think you don’t need a lawyer when an AI model can score high on the LSAT. Benchmarks show that these AI tools are getting better and better at general logic and reasoning.
Michael was a lead research assistant on a Harvard Business School study that came out in September 2023. It delineates the types of tasks AI is best and worst at.
In the experiment, the researchers worked with the Boston Consulting Group’s (BCG) global workforce. They had consultants do 18 tasks around creativity, analytics, and persuasion. Half had access to GPT-4 through ChatGPT, and the other half had no access.
The study found that those who had access to AI, even with no training, did about 12.5% more work, 25% faster, and at roughly 40% higher quality.
Some AI users were about 10% better than their AI-using counterparts, and the researchers wanted to know what the difference was. One group, whom the researchers called centaurs after the mythological half-horse, half-human creatures, divided the work between themselves and the AI.
It’s almost as if you were to use a calculator and ask, “What is 2+2?” It tells you “4,” and you use that 4 to do the rest of your work. That’s centaur work: the AI is an assistant to your thinking.
The consultants who performed better were called cyborgs. They integrated AI into their workflow and continually interacted with the technology. On average, the cyborgs prompted the AI 3.5 times more than the centaurs.
A centaur might say, “Hey, I have to design this new marketing campaign for Nike. Give me 10 example names.”
Whereas a cyborg would ask that question and then say, “I like two of the names. Give me more that are like that.” They kept refining toward a smaller subset, using the AI almost like a sounding board or a teammate.
There are two takeaways. The first is that expert AI will be better than the average human, and the expert human will be better than the average AI. A frontier AI model can turn an average individual into a near-expert in almost any field. However, experts in specific fields still beat AI.
The second takeaway is that how someone uses AI matters more than just using AI. In other words, be an AI cyborg, not a centaur.
One way to be more effective is to integrate AI into your everyday work. Good examples of this in action are Grammarly, Be My Eyes, GitHub Copilot, and Cephable.
Three practical WCAG AI remediations
Can you automate WCAG compliance? This applies to remediation, not testing. Michael asked several AI models how much of WCAG they could automatically remediate:
- DeepSeek said 70%: 20% fully automated, plus 50% with a human in the loop.
- Claude said 75%: 17% fully automated, plus 60% with a human in the loop.
- ChatGPT said 80%: 50% fully automated, plus 30% with a human in the loop.
- Gemini said 95%: 56% fully automated, plus 39% with a human in the loop.
What can’t AI remediate when it comes to the Web Content Accessibility Guidelines? In short, accessibility experts will always be valuable, because AI can’t replace their expertise.
AI can’t replace what humans do at their best, which is to be empathetically human. However, AI can help you always operate at your best. An accessibility superhero has the AI tools, the heart, and the knowledge. The key is to keep the human in the loop; this is the best way to moderate AI systems and ensure the outputs are accurate.
This is where the centaur and cyborg mentalities differ most. A cyborg is constantly involved in the loop of the AI interaction. When someone prompts eight times in a row, they’re validating, re-prompting, and re-validating.
Human-in-the-loop AI technology includes spellcheck, form autocomplete, and fraud alerts. A fraud alert texts a human to ask whether they made a purchase that seems out of the ordinary.
Now apply this concept to an AI agent in the loop. Can we have an agent looking at the results and validating them? How do we use technology to validate technology to create better and more effective models within a specific niche like accessibility, security, design, and usability testing?
The question then becomes: how good is your validation agent? Perhaps you can have a positive validation agent that says something looks right and a negative validation agent that says it looks wrong. They give feedback differently. Put them together and it turns into a cyborg AI system rather than a human-in-the-loop centaur system, as sketched below.
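As a concrete sketch of that idea, the TypeScript below pairs a positive and a negative validation agent around a proposed accessibility fix and only accepts output both agree on. Everything here is hypothetical: `callModel` is a stand-in for whatever LLM API you use, and the prompts and loop shape are invented for illustration.

```typescript
// Hypothetical sketch of dual validation agents around an AI-proposed fix.
type Verdict = { ok: boolean; feedback: string };

// Stand-in for any LLM API call; wire this to your provider of choice.
async function callModel(prompt: string): Promise<string> {
  throw new Error('Connect callModel to a real LLM API');
}

// Positive agent: argues the fix satisfies the relevant criterion.
async function positiveAgent(fix: string): Promise<Verdict> {
  const answer = await callModel(
    'Does this fix satisfy the relevant WCAG criterion? ' +
      `Start with PASS or FAIL, then explain.\n\n${fix}`,
  );
  return { ok: answer.startsWith('PASS'), feedback: answer };
}

// Negative agent: hunts for ways the fix could be wrong.
async function negativeAgent(fix: string): Promise<Verdict> {
  const answer = await callModel(
    'Find any way this fix could be wrong or break the UI. ' +
      `Start with CLEAN if you find nothing.\n\n${fix}`,
  );
  return { ok: answer.startsWith('CLEAN'), feedback: answer };
}

// Validate, re-prompt with feedback, re-validate: the cyborg loop.
async function validateFix(fix: string, maxRounds = 3): Promise<string> {
  let candidate = fix;
  for (let round = 0; round < maxRounds; round++) {
    const [pos, neg] = await Promise.all([
      positiveAgent(candidate),
      negativeAgent(candidate),
    ]);
    if (pos.ok && neg.ok) return candidate; // both agents accept
    candidate = await callModel(
      `Revise the fix using this feedback.\n\nFix:\n${candidate}\n\n` +
        `Feedback:\n${pos.feedback}\n${neg.feedback}`,
    );
  }
  throw new Error('Agents never agreed; escalate to a human reviewer');
}
```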
Video Highlights
- Mobile accessibility statistics
- Four issues with mobile accessibility
- Questions for Jason Tan
- Where are we today
- What AI can do
- Questions for Michael Bervell
Watch the Presentation
Bios
Jason Tan is the co-founder and CTO of TestParty, a startup automating digital accessibility testing across platforms. Jason studied Computer Science, Economics, and Latin at Princeton, and previously worked as an iOS engineer at Twitch, where he encountered accessibility challenges during a live lawsuit.
At TestParty, he brings a uniquely technical and humanistic lens to mobile and web accessibility. Jason is passionate about building developer-first tools that don’t just detect problems but help fix them.
Michael Bervell is the CEO and co-founder of TestParty, an AI-powered digital accessibility platform that automates WCAG remediation across web and mobile. He previously consulted on accessibility for Google and the United Nations and was awarded an NSF SBIR grant for advancing automated compliance technologies.
A published author and Harvard graduate, Michael’s work sits at the intersection of AI, education, and inclusion. He’s passionate about building tools that make the internet equitable for everyone.