How Do You Add Alternative Text and Metadata to glTF Objects?

Image Description: A box of Pocky with sticks dipped in chocolate. Next to the box is the text: alt = "Pocky"

The Web Content Accessibility Guidelines require all non-text content to have a text alternative. Though this requirement has existed for over 20 years, there is no official way to add one to 3D content. This article proposes embedding the alternative text in the file’s metadata so that different software platforms can recognize it. Here’s how to do that using the glTF file format.

Let’s talk about robustness for a moment. We don’t want screen readers to need custom code for every application they work with. To be interoperable, there needs to be a single defined way to provide a text alternative that can be accessed consistently and easily.

New to WebXR?

If you want to start working with 3D on the web, look into WebXR and the A-Frame framework. This article covers the glTF format and how to embed metadata in 3D models.

Model Viewer Accessibility

The ModelViewer web component is a 3D model viewer designed to help developers quickly display 3D models on the web. It loads models in glTF (GL Transmission Format), either as .gltf files or as binary .glb files. It also lets developers control the model’s camera, lighting, materials, and animations, so they can give the model a unique look and feel.

The glTF file format is used for 3D scenes and objects that can be used for web-based content. These include gaming, augmented reality, and virtual reality. It is a royalty-free format that is rapidly becoming the industry standard for 3D content.

The format provides an efficient, reliable way to deliver 3D content with high-quality graphics. glTF is a JSON-based file format that supports geometry, materials, animation, and other properties. The format can be used to create interactive 3D experiences across multiple platforms.

Why should someone have to enter alt text every time they use a glTF file on a website? (glTF files use the .gltf or .glb extension.) With embedded metadata, this information can be included once in the file and overridden on the page when necessary.
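The "embedded default, page-level override" idea can be sketched in a few lines of Python. This is only an illustration: the function name `resolve_alt_text` is hypothetical, and it assumes the title is stored under asset.extras.title, which is the convention described later in this article rather than anything the glTF specification defines.

```python
from typing import Optional


def resolve_alt_text(gltf: dict, page_override: Optional[str] = None) -> Optional[str]:
    """Return the page-level override if one is given, else the embedded title."""
    if page_override:
        return page_override
    # Assumption: the title lives under asset.extras.title (not a glTF core field).
    extras = gltf.get("asset", {}).get("extras", {})
    return extras.get("title")


# A parsed .gltf document, reduced to the part that matters here.
model = {"asset": {"version": "2.0", "extras": {"title": "Pocky"}}}

print(resolve_alt_text(model))                             # falls back to the embedded title
print(resolve_alt_text(model, "Pocky, chocolate flavor"))  # the page override wins
```

A page author would only pass `page_override` when the embedded text does not fit the context; otherwise the text travels with the file.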

Model Viewer

Below, a 3D model of Pocky is displayed using the model-viewer element. Use the mouse or the arrow keys to rotate the object.

Here is the code snippet of the model viewer above with the label and description added manually.

<model-viewer aria-label="Pocky"
  aria-description="It consists of coated biscuit sticks. It was named after the Japanese onomatopoeic word pokkiri, which is supposed to resemble the sound of the snack being cracked.">
</model-viewer>

How to Embed Metadata in a glTF

There are two ways to embed metadata in a glTF.

1. Using extras

In this technique, you open the glTF file in a text editor such as Notepad. A .gltf file is plain JSON, so you can edit it in any text editor (a .glb file is binary and cannot be edited this way). The top-level JSON object contains all the information necessary to generate the 3D model. The format also allows you to add custom data under the ‘extras’ property.
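The text-editor approach applies to .gltf files. A .glb file packs the same JSON into a binary container, so it needs a few lines of code to read. The sketch below assumes the glTF 2.0 binary layout (a 12-byte header followed by a first chunk of type JSON); the function name `read_glb_json` is hypothetical, and the tiny in-memory file exists only to make the example runnable.

```python
import json
import struct

GLB_MAGIC = 0x46546C67   # ASCII "glTF" read as a little-endian uint32
JSON_CHUNK = 0x4E4F534A  # ASCII "JSON" read as a little-endian uint32


def read_glb_json(data: bytes) -> dict:
    """Return the JSON document stored in the first chunk of a GLB file."""
    magic, version, _total_length = struct.unpack_from("<III", data, 0)
    if magic != GLB_MAGIC or version != 2:
        raise ValueError("not a glTF 2.0 binary file")
    chunk_length, chunk_type = struct.unpack_from("<II", data, 12)
    if chunk_type != JSON_CHUNK:
        raise ValueError("first chunk is not JSON")
    return json.loads(data[20:20 + chunk_length].decode("utf-8"))


# Build a minimal GLB in memory (JSON chunk only) to exercise the reader.
payload = json.dumps(
    {"asset": {"version": "2.0", "extras": {"title": "Pocky"}}}
).encode("utf-8")
payload += b" " * (-len(payload) % 4)  # JSON chunks are padded with spaces to 4 bytes
glb = (
    struct.pack("<III", GLB_MAGIC, 2, 20 + len(payload))
    + struct.pack("<II", len(payload), JSON_CHUNK)
    + payload
)

print(read_glb_json(glb)["asset"]["extras"]["title"])
```

For a real file you would pass `open(path, "rb").read()` instead of the in-memory bytes. Writing metadata back into a .glb also means updating the chunk and file lengths, which this sketch does not attempt.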

For example, with the .gltf file open in a text editor, find the top-level “asset” object and add an “extras” property like the following:

"asset": {
  "extras": {
    "title": "Shiba Inu",
    "accessibility": {"description": "Object description", "Price": 10}
  }
}
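The manual edit above can also be scripted, which helps when many models need the same treatment. The following is a minimal sketch using only Python's standard library. The function names are hypothetical, and the "title" and "accessibility" keys simply mirror the example in this article; they are not defined by the glTF specification.

```python
import json
from pathlib import Path


def embed_accessibility(gltf: dict, title: str, description: str) -> dict:
    """Add a title and description under asset.extras, keeping existing extras."""
    asset = gltf.setdefault("asset", {"version": "2.0"})
    extras = asset.setdefault("extras", {})
    extras["title"] = title
    extras.setdefault("accessibility", {})["description"] = description
    return gltf


def embed_in_file(path: str, title: str, description: str) -> None:
    """Rewrite a .gltf (JSON) file in place with the metadata added."""
    file = Path(path)
    gltf = json.loads(file.read_text(encoding="utf-8"))
    embed_accessibility(gltf, title, description)
    file.write_text(json.dumps(gltf, indent=2, ensure_ascii=False), encoding="utf-8")


# Demonstrate on an in-memory document instead of a real file.
example = embed_accessibility(
    {"asset": {"version": "2.0", "extras": {"Price": 10}}},
    title="Pocky",
    description="A box of Pocky, biscuit sticks dipped in chocolate.",
)
print(json.dumps(example["asset"], indent=2))
```

Because the helper uses `setdefault`, anything already stored under extras (here the "Price" entry) is left untouched.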

However, for a permanent accessibility solution, these attributes should not live under extras. They should have a defined location so that developers can set them as attributes of the “asset” property. Because the glTF format does not currently allow a title or description anywhere outside of extras, we logged an issue against the glTF specification proposing that “title” become a core attribute of “asset.”

2. Using the XMP extension

The Khronos Group recently published a guide on using XMP metadata with 3D models. Using the XMP extension is similar to the previous method, except that it requires many more lines in the JSON. You will also have to follow the Dublin Core specification. Dublin Core is a set of metadata elements used to describe digital resources.

The core elements are Title, Creator, Subject, Description, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage, and Rights. These elements can be used to describe any type of digital resource, such as web pages, images, video, and audio. Dublin Core is widely used by libraries, archives, and other organizations to provide a consistent way to describe digital resources and make them easier to find in search results.

You can add information such as a title and description using glTF-Transform, although setting it up demands a bit of coding knowledge. Another option is to add it manually, as in the previous section.
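To give a feel for the manual route, here is a sketch in Python that adds a Dublin Core title and description as an XMP packet. It is based on our reading of the Khronos KHR_xmp_json_ld extension, so treat the exact packet layout (the "@context" entries and the rdf:Alt wrapper for language alternatives) as an assumption and check it against the extension's specification before relying on it. The function name `add_xmp_packet` is hypothetical.

```python
import json

EXTENSION = "KHR_xmp_json_ld"  # extension name as we understand it; verify against the spec


def add_xmp_packet(gltf: dict, title: str, description: str, lang: str = "en") -> dict:
    """Append an XMP packet with dc:title and dc:description and link it to the asset."""

    def lang_alt(text: str) -> dict:
        # Assumed encoding of a language alternative in the JSON-LD packet.
        return {"@type": "rdf:Alt", "rdf:_1": {"@language": lang, "@value": text}}

    packet = {
        "@context": {
            "dc": "http://purl.org/dc/elements/1.1/",
            "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
        },
        "@id": "",
        "dc:title": lang_alt(title),
        "dc:description": lang_alt(description),
    }

    # Packets are stored once at the root of the document...
    root_ext = gltf.setdefault("extensions", {}).setdefault(EXTENSION, {})
    packets = root_ext.setdefault("packets", [])
    packets.append(packet)

    # ...the extension is declared as used...
    used = gltf.setdefault("extensionsUsed", [])
    if EXTENSION not in used:
        used.append(EXTENSION)

    # ...and the asset points at its packet by index.
    asset = gltf.setdefault("asset", {"version": "2.0"})
    asset.setdefault("extensions", {})[EXTENSION] = {"packet": len(packets) - 1}
    return gltf


doc = add_xmp_packet(
    {"asset": {"version": "2.0"}},
    title="Pocky",
    description="A box of Pocky, biscuit sticks dipped in chocolate.",
)
print(json.dumps(doc, indent=2))
```

Compared with the extras approach, the result is noticeably longer, but the field names come from a shared vocabulary instead of a private convention.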

Here is a tweet from Don McCurdy with a demo of the glTF-Transform tool.

Text Alternative Is Important!

As mentioned in the introduction, it is crucial for any media to have descriptive metadata, whether it is 2D or 3D. This ensures people with disabilities using assistive technologies can understand the content.

In conclusion, the glTF format has the potential to store descriptive metadata. Currently, however, you have to create the property yourself using one of the two approaches described in this article: “extras” or the XMP extension. Alternative text should be a built-in property that all content providers must fill in and all 3D platforms should support.

Virtual Reality (VR) Accessibility Consulting Services

Our years of experience working with virtual reality and speaking on the topic have given us a unique perspective when it comes to consulting on VR projects. If you’d like to innovate in the accessibility of VR, please contact us to discuss how we can help you.

Celso Yamashita


  1. Your idea sounds very good for accessibility, though I don’t know anything about 3D content myself. You certainly need a way to describe such things. Just one small point, alt texts are there to tell blind people what an image or other content shows, not just to give the name of it. So the alt text for the image of Pocky sticks should say something like “A box of Pocky, long biscuit sticks dipped in chocolate, by Glico.” That describes what is actually shown in the image, giving a blind person (who may not know what Pocky is) the same information that sighted people are seeing.

    (To be strictly comprehensive, you could also add the advertising slogan text shown at the foot of the pack, though that blurb isn’t quite so important as the rest of the description.)

    1. Hi Guy, thank you for your thoughts on this topic. One thing we want to keep exploring with the community is how best to make more information available from the objects. Ideas that have been explored include names, long descriptions, and descriptions of weight, height, color, and texture. One thing we want to be careful about in 3D environments is overloading the user with too much description at first. If you try our convenience store example, you will see that it illustrates multiple products on a shelf. The idea is to have a very short name so that the user can then query the object for a longer, more verbose description.
