Sometimes AI can describe images correctly, but other times it makes mistakes.
We all share content online, making each of us a content creator. By presenting information through both text and images, creators can connect with a wider audience.
Providing text descriptions of images makes content in documents, presentations, websites and social media posts accessible to people who are blind or have low vision, but many images are uploaded without text descriptions. Content creators—and remember that’s everyone—have choices about how they share information in text and images.
The best practice is to include information in both text and image formats. For example, if you are promoting an event, write the details in the text of your post. You can upload a screenshot of a flier, but on its own that is inaccessible content. Someone who needs the information in text would have to use their own software to scan the image for text, or send the image to an AI model, to extract the information from the event flier. These workarounds can introduce mistakes into your content. For example, an AI model might misidentify the number 2 or 5 in an address or a date, and that error could make a person miss your event.
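To make that workaround concrete, here is a minimal sketch of scanning an image for text with OCR. It assumes the pytesseract package and the underlying Tesseract engine are installed; the file name is hypothetical.

```python
from PIL import Image
import pytesseract

# Extract whatever text the OCR engine can find in the event flier.
# "flier.png" is a hypothetical file name for this sketch.
text = pytesseract.image_to_string(Image.open("flier.png"))
print(text)

# OCR output is not guaranteed to be accurate: visually similar
# characters (2 vs. 5, O vs. 0, l vs. 1) are common misreads, so an
# extracted date or address still needs a human to verify it.
```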
Add descriptions, called Alt Text, to photos that you post on social media. Most platforms publish instructions for this in their help documentation. For example, you could label your photos of birthday parties or other family occasions.
Suppose you write “thanks for the lovely gift” and give your photo the description “person unwrapping a present”. If you wrote “thanks!” without adding Alt Text, then someone who can’t see the image of the present would not know what you are talking about.
If you forget to add Alt Text descriptions, friends who want to know about an image will have to find other ways to access that information. Blind people use various AI tools to describe images when content creators—ordinary people posting online—do not write their own descriptions. AI-generated image descriptions can be helpful, but they sometimes contain mistakes that do not accurately represent the image the content creator intended to share—and that creator could be you!
I will begin by explaining how people who are blind use assistive technologies like screen readers (voice output) to access digital content including articles, social media posts, documents, and presentations. Next, I will define Alt Text and image descriptions. Then, I will explore how AI-generated descriptions are created.
Finally, I strongly recommend that content creators provide text descriptions of images that they post online. AI can be used as a starting point, but there’s no substitute for a human checking AI-generated image descriptions for accuracy. This approach aims to make more digital content accessible to a wider audience by ensuring that images are described and that the information contained in images is available to everyone.
The accessibility settings on many devices control the appearance of text and images on screen. People who are blind or have low vision may change the way that text and images are displayed on their phones, tablets, and computers. Accessibility settings allow users to change the color contrast and font size. People with low vision may use screen magnification to enlarge both the text and images that appear on their screen.
Blind people rely on screen readers, software that provides voice output in synthetic speech for the text that is displayed on their devices. Additionally, they may connect their device to a refreshable braille display and read the text with their fingers. To learn more, read this article by Access Lab, which links to videos of people using screen readers.
Screen readers cannot directly interact with images. For this reason, the Web Content Accessibility Guidelines (WCAG) require text alternatives for images, whether they are pictures, icons, or buttons.
I use both the terms Alt Text and image description in this story because they have similar functions. Alt Text should be a short summary. An image description can be longer, which is particularly helpful for complex images such as works of art.
Content creators—everyone who posts online—should provide text descriptions of images to make them accessible. Alt Text, short for alternative text, is an HTML attribute that labels an image with a description. Many websites and apps allow users to enter Alt Text into a metadata field that is associated with each image that they post.
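For readers who have not seen the attribute itself, here is a minimal sketch of what Alt Text looks like in HTML markup, along with one way to audit a page for images that are missing it. It assumes the beautifulsoup4 package is installed; the sample markup and file names are hypothetical.

```python
from bs4 import BeautifulSoup

# Two hypothetical images: one with Alt Text, one without.
html = """
<img src="flier.png" alt="Event flier: Book Fair, May 3, 10am to 4pm">
<img src="IMG_4528.jpg">
"""

soup = BeautifulSoup(html, "html.parser")
for img in soup.find_all("img"):
    if img.get("alt"):
        print(f'{img["src"]}: described as "{img["alt"]}"')
    else:
        print(f'{img["src"]}: missing Alt Text')
```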
Medium staff explain how to add Alt Text to images in your stories; the instructions appear about halfway through their article Tips and Tricks for Medium Writers. Follow the link they provided for more information.
Now, I will expand on the explanation provided by Medium.
- Accessibility: Screen readers rely on Alt Text to convey the meaning and purpose of images to users who are blind or have low vision.
- SEO: Search engines use Alt Text to understand the context and content of images, which can improve a website’s ranking in search results.
- Usability: In cases where images fail to load or are blocked, Alt Text ensures that users can understand the content.
I copied these examples from The Importance of Alt Text for Screen Reader Users: A Guide to Best Practices and Accessibility.
- “Woman in a wheelchair smiling and waving at the camera”
- “Pie chart illustrating the proportion of renewable energy sources, with solar energy being the highest at 40%”
- “Scenic landscape of the Rocky Mountains with a blue sky and a forest in the foreground”
People often confuse Alt Text with photo captions. Compare the Alt Text and the photo captions from a report about inaccessible ed tech.
| Alt Text | Photo Caption |
| --- | --- |
| Donut Chart Showing 60% | 60% of educators reported that their blind and low vision students could not access at least one classroom digital learning tool. |
| Donut Chart Showing 35% | 35% of educators reported that their students could not access at least two tools. |
These examples demonstrate that the Alt Text says what the graph looks like, and the photo caption tells the reader why the graph matters.
Now that you have read examples of Alt Text, think about how a blind person would experience images when Alt Text is not provided. When Alt Text is missing, a blind person using a screen reader would hear “graphic” or “unlabeled graphic.” At times, the screen reader will read the file name, which might contain words or a long string of letters and numbers. In some cases, they may not hear anything at all.
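Here is a simplified sketch of that fallback behavior. It is a toy model, not how any particular screen reader (JAWS, NVDA, VoiceOver) is actually implemented, and the sample file name is made up.

```python
def announce(src: str, alt: str | None) -> str:
    """Approximate what a screen reader might say for an image."""
    if alt:
        return f"{alt}, graphic"   # Alt Text present: read the description
    if src:
        return f"{src}, graphic"   # no Alt Text: fall back to the file name
    return "unlabeled graphic"     # nothing usable at all

# The intended experience versus the fallback.
print(announce("gift.jpg", "person unwrapping a present"))
print(announce("IMG_20240512_093144.jpg", None))
```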
Alt Text is often missing from digital content. The 2025 report on the accessibility of the top 1,000,000 home pages found that 18.5% of all home page images had missing Alt Text, an average of 11 undescribed images per page.
Despite the ongoing efforts of blind people and our allies, there are millions of undescribed images on the web and in social media posts. A few years ago, technology companies began providing automatically generated descriptions for images that lacked Alt Text as a temporary solution to this problem. I call it a temporary solution because AI models are not completely accurate. Here is a simple explanation of how machine learning can be used to analyze images.
Initially, AI models for image analysis require human input. This training data consists of thousands of images, each paired with a human-created annotation naming the objects in the image. Eventually, the models are tasked with identifying objects in new images that are not paired with human annotations.
Software algorithms use computer vision and natural language processing to analyze images. Image Analysis includes content tags for thousands of recognizable objects, living beings, scenery, and actions. Tags are generated for objects in both the foreground and the background of the image.
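As a rough illustration, here is a sketch of automatic tagging using an off-the-shelf image classifier. It assumes the transformers and Pillow packages are installed; the file name is hypothetical, and commercial tagging services work on the same principle at a much larger scale.

```python
from transformers import pipeline

# Load a general-purpose image classifier (downloads a default model).
classifier = pipeline("image-classification")

# Tag a hypothetical local photo; each tag comes with a confidence score.
for tag in classifier("cannon.jpg"):
    print(f'{tag["label"]}: {tag["score"]:.2f}')

# The model can only choose among labels that appeared in its training
# data, which is exactly the limitation discussed later in this story.
```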
AI-generated descriptions have improved with time. For example, Facebook used to identify photos of my friend Joe as “glasses.” The automated image detection did not describe other characteristics like his blond hair or the color of his shirt. Now, when an AI examines an image, it identifies and labels specific components within the picture. At some point, Facebook updated its algorithm to associate my friend’s name with his photo.
When prompted to describe an image, AI models organize the list of tags that were generated when the image was analyzed, and then respond in complete sentences. Descriptions are generated quickly, but they may contain inaccurate statements.
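Here is a toy illustration of that last step, with made-up tags and confidence scores. Real captioning models generate text directly rather than filling in a template, but the failure mode is the same: wrong tags in, wrong sentence out.

```python
# Made-up tags and confidence scores for a misidentified photo.
tags = [("pipe", 0.81), ("metal", 0.77), ("grass", 0.64), ("cannon", 0.12)]

# Keep only the confident tags, then compose a fluent sentence.
confident = [label for label, score in tags if score > 0.5]
description = f"A photo of a {', '.join(confident[:-1])} and {confident[-1]}."
print(description)  # A photo of a pipe, metal and grass.

# The sentence sounds confident, but the low-scoring "cannon" tag was
# the correct one, so the fluent description is simply wrong.
```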
Think of AI as that co-worker who has more confidence than knowledge. AI may not tag images correctly because of the limitations of its training data. Yet it will confidently describe an image, presenting mistakes alongside the factual information.
When I refer to the limitations of training data, I mean that an AI model cannot respond well to objects that were not included in the data used to create it. The specialized images found in museum collections are a good example. Museum collections contain many unique items that may not appear in any of the training images.
I first noticed this problem in 2022 when I compared three automatically generated descriptions of a photo of a historic cannon. I wrote a more detailed blog post at the time, but for now, I will summarize my findings.
| Description | Reason |
| --- | --- |
| green and brown metal pipe | barrel of the cannon is hollow |
| a close-up of a sword | tagged as a weapon |
| sports equipment | wrong tags for the barrel and wheels |
The first two examples, algorithms developed by Google and Microsoft, focused on the long metal barrel but ignored the cannon’s wheels. In the third example, the algorithm developed by Apple described the photo as “sports equipment”. It correctly identified the parts of the cannon, the wheels and the long metal barrel, but it categorized these elements as sports equipment rather than as parts of a historic cannon. The three descriptions were inaccurate because many different objects can have similar shapes; the long cannon barrel is similar in shape to a long metal pipe.
Here is the human-generated description for this photo: “A deep green colored bronze cannon sits on a wooden carriage with two small wooden wheels.” Photo credit: Fort Ticonderoga Museum Collection.
This description was written by the museum staff, who are subject-matter experts on the objects in their collections. They describe the bronze (metal) cannon and the wooden wheels of the carriage that it rests upon. There’s no doubt that the object in the photo is a historic cannon. It is not a pipe, a sword, or sports equipment, as the software algorithms suggested.
The algorithms developed by Google, Microsoft, and Apple, among other technology companies, are designed for general use: identifying common objects that occur in photos, like people, clothing, and trees. It is not surprising that commercially available software algorithms do not recognize images that contain unique objects like a historic cannon. For this reason, image descriptions written by subject-matter experts are more accurate than AI-generated descriptions.
Although AI-generated descriptions offer a useful starting point, they require careful verification. These tools can support content creators in improving accessibility, but the benefits are realized only when the output is thoroughly fact-checked. Writing Alt Text for images makes digital content accessible to blind people using screen readers.
Content creators can describe these images accurately because they are experts in the subject matter of their own articles, social media posts, documents, and presentations. You understand your material better than an AI model.
Content creators can check for accuracy, and they can correct mistakes made by over-confident AI models. They may opt to write their own Alt Text and avoid the problems with automatically generated image descriptions.
I will end this story by returning to example Alt Text. For a hypothetical family vacation, you might write the post “Check out my selfie with the big cannon” with Alt Text reading “person stands behind a cannon”. Then, in another post, you might write “We celebrated Mom’s birthday” with Alt Text reading “person blowing out candles on a birthday cake”.
I hope these examples give you a good idea of the importance of creating accessible content with Alt Text, and I hope that I have inspired you to add Alt Text to your images so everyone can appreciate your content. This gives you control: blind people using screen readers won’t have to ask AI to describe your images and risk receiving a description full of mistakes.
Here are some resources about accessibility, Alt Text, screen readers, and AI-generated image description.
Diverse Abilities and Barriers, How People with Disabilities Use the Web
The Importance of Alt Text for Screen Reader Users: A Guide to Best Practices and Accessibility
Image Tagging article from Microsoft
How Facebook is using AI to improve photo descriptions for people who are blind or visually impaired
This is not an apple! Benefits and challenges of applying computer vision to museum collections