When AI Sees the World Through One Lens: What Bias in Image Generation Really Looks Like

Artificial intelligence is now not only writing stories and answering questions—it’s painting pictures too. Tools like DALL·E and ChatGPT can take a sentence and turn it into a full-blown image. Sounds amazing, right? It is. But there’s a catch: these systems don’t just generate pretty pictures—they also reflect our cultural assumptions, biases, and social patterns. And sometimes, they reinforce or even exaggerate them.
That’s where this recent research from Marinus Ferreira comes in. Presented at the 74th Annual ICA Conference, the study digs into how image-generating AI handles race, gender, and other demographics… especially when we give the AI more complex prompts to work with.
Let’s break it down.
The Big Question: Is More Complex Better?
Most researchers studying AI bias throw quick, simple prompts into the system like “a judge” or “a poet,” then look at what kind of person shows up in the image. Often, that’s a white man—regardless of what the real-world data says about who actually does the job.
But Ferreira’s research asks a daring question: What if we give the AI more to think about? Can complex prompts—short stories or scenarios instead of just a phrase—help us see subtler patterns of bias?
Spoiler alert: They can. But not always in the way you might expect.
Two Types of Image Bias You Should Know
Before we dive into the findings, let’s clear up the two major ways image-generating AI can show bias:
- Misrepresentation bias (we might call this the classic kind). That’s when certain groups are shown in stereotypical or negative ways—like generating images of criminals or poor neighborhoods whenever “African” is used in a prompt.
- Exclusion bias, also known as “default assumption bias.” This is sneakier. Even when nothing is said about demographics in your prompt, the AI often defaults to portraying white men. It’s like the system assumes that unless told otherwise, power and professionalism look a certain way.
Ferreira calls this kind of default framing “exnomination,” borrowing a term from sociology. Essentially, dominant groups (like white males in professional roles) don’t have to be named—they’re assumed. Everyone else? They have to be explicitly marked or described to show up at all.
The Experiment: Prompts With a Plotline
To test how deep these biases go—and how complex prompts might help surface them—Ferreira’s study used five-sentence vignettes for each of four professions:
- Poet
- Judge
- Pastor
- Rapper
Each vignette offered a little episode involving the person—like a poet giving a campus talk, or a pastor leading a church meeting. These story-style prompts varied in emotional tone (neutral or slightly negative) and were fed into ChatGPT-4o, which leveraged DALL·E 3 to generate matching images.
Here’s an example of a vignette used:
“At a university lecture, a poet spoke passionately about the power of words. Students were engaged, asking thoughtful questions, but a professor challenged the poet’s interpretations, causing a moment of tension…”
This led to an image of a white, male poet—even though women actually make up the majority of poets in the U.S.
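To make the experimental design concrete, here is a minimal sketch of how such vignette prompts could be assembled programmatically: four professions crossed with two emotional tones. The profession list matches the study, but the template wording is our own illustration, not the paper’s actual stimuli.

```python
# Hypothetical sketch of assembling vignette-style prompts for an
# experiment like Ferreira's: professions crossed with emotional tones.
# The template text is illustrative, not the study's exact wording.

PROFESSIONS = ["poet", "judge", "pastor", "rapper"]

TONE_TEMPLATES = {
    "neutral": (
        "At a public event, a {job} spoke about their work. "
        "The audience listened and asked thoughtful questions, "
        "and the session ended warmly."
    ),
    "negative": (
        "At a public event, a {job} spoke about their work. "
        "A member of the audience challenged the {job}'s views, "
        "causing a moment of visible tension in the room."
    ),
}

def build_vignettes():
    """Return one prompt per profession x tone combination."""
    return [
        {"job": job, "tone": tone, "prompt": template.format(job=job)}
        for job in PROFESSIONS
        for tone, template in TONE_TEMPLATES.items()
    ]

vignettes = build_vignettes()
print(len(vignettes))  # 4 professions x 2 tones = 8 prompts
```

Each resulting prompt would then be sent to the image model several times, and the generated figures coded for demographics.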
The Results Are In: Complex Prompts, Uniform Faces
So, what happened when AI got more information in the form of longer, emotionally nuanced scenarios?
Surprisingly, the images became even more biased and homogeneous. The safety filters (meant to encourage diversity) seemed to get bypassed, and the AI snapped to its default templates.
Consider this:
- ALL images of “judges” were white and male… even though U.S. data shows most judges are women!
- For “pastors,” 100% were white men—even though Black pastors are statistically overrepresented in many places like the American South.
- The one and only non-white “poet” was generated in a prompt set during a “cultural festival”—a socially marked situation.
Diverse figures did appear in many of the images—but only on the sidelines. Women and people of color mostly showed up in the background, as audience members or onlookers, rather than as the central figure.
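This central-versus-background distinction is easy to miss if you only count faces. A sketch of how coded image annotations could be tallied separately by role is below; the sample records are invented for illustration, since a real audit would use human-coded labels for each generated image.

```python
# Hypothetical sketch of tallying coded image annotations so that
# central-figure representation is separated from background presence.
# The sample records are invented for illustration only.

from collections import Counter

annotations = [
    {"role": "central", "gender": "man", "race": "white"},
    {"role": "central", "gender": "man", "race": "white"},
    {"role": "background", "gender": "woman", "race": "white"},
    {"role": "background", "gender": "man", "race": "black"},
    {"role": "background", "gender": "woman", "race": "asian"},
]

def tally_by_role(records):
    """Count (gender, race) pairs separately for central vs background figures."""
    tallies = {"central": Counter(), "background": Counter()}
    for r in records:
        tallies[r["role"]][(r["gender"], r["race"])] += 1
    return tallies

tallies = tally_by_role(annotations)
print(tallies["central"])     # central figures are homogeneous
print(tallies["background"])  # diversity appears only here
```

Splitting the counts this way makes the pattern Ferreira describes visible at a glance: aggregate diversity can look fine even when every central figure is the same.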
Why This Matters: Not Just a Matter of Skin Tone
This isn’t just nitpicking image accuracy. There are broader consequences when AI tools assume that positions of leadership, creativity, or intellect are almost exclusively the domain of white males—unless forced otherwise.
Here’s why this matters:
- Reinforces stereotypes. If we keep seeing the same demographic attached to certain roles, it affects how we internalize who “belongs” in those roles.
- Limits imagination. If AI tools can’t imagine a female judge or a Black poet unless explicitly told to, we’re limiting the narratives we can tell in culture, education, and media.
- Creates blind spots. It also means that powerful tools like AI are learning to “skip over” sections of the population when crafting visuals—even when they statistically match the scenario.
Safety Features: A Double-Edged Sword?
The newer, safety-enhanced ChatGPT models try to neutralize harm by randomly assigning demographic characteristics in text outputs. Ferreira’s earlier tests with ChatGPT-3.5 showed a different trend: Black individuals were more likely to appear in negative scenarios than positive ones. ChatGPT-4o partly fixed this by forcing randomness.
But that cure might create a new problem.
If AI just assigns demographics with a metaphorical roll of the dice, it can miss real-world accuracy—which is crucial in domains like health, law, or media representation. Worse, it removes the nuance that helps stories feel authentic.
So instead of fixing the model, safety mechanisms might actually dull its capacity to express truths—warts and all.
Hidden Influences: Reinforcement Learning and User Bias
Another factor at play? The quiet but powerful feedback loop behind modern AI training.
Through a method called Reinforcement Learning from Human Feedback (RLHF), AI models learn not only from data but also from what users upvote or tweak. Problem is, if most of those users are from similar backgrounds (e.g., Western, tech-savvy, English-speaking), their subconscious preferences may tilt how the AI behaves over time.
Ferreira suggests this might explain why even stories with no demographics specified defaulted to white male characters: it might just be what the system “learns” people prefer.
Not All Prompts Are Equal: The Power of Socially Marked Contexts
An especially revealing part of Ferreira’s study was how social context within prompts influenced representation.
Let’s take two poet scenarios:
- Prompt set in a “café” → White male poet
- Prompt set in a “cultural festival” → Non-white poet
This suggests that AI models recognize certain settings as associated with specific demographics. These socially marked contexts nudge the AI into showing more diversity—but only in those limited scenarios.
In other words: if the situation involves diversity, the image becomes diverse—otherwise, not so much.
That’s a subtle but important insight. If we understand which narrative settings are coded by the AI as belonging to different groups, we can better predict and correct bias in the system.
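One way to probe this systematically is a setting-swap audit: hold the profession and narrative constant, vary only the social setting, and compare the demographics of the resulting images. The sketch below shows the idea; the setting names and template are our own illustrative choices.

```python
# Hypothetical setting-swap audit: the same poet vignette is rendered
# in different social settings to test whether the setting alone
# shifts the demographics of the generated figure.

SETTINGS = ["a quiet cafe", "a cultural festival", "a university lecture hall"]

TEMPLATE = (
    "In {setting}, a poet read new work aloud. "
    "Listeners applauded, and one asked about the poem's meaning."
)

def setting_variants(settings, template):
    """Return one otherwise-identical prompt per setting."""
    return {s: template.format(setting=s) for s in settings}

variants = setting_variants(SETTINGS, TEMPLATE)
for setting, prompt in variants.items():
    # each prompt would be sent to the image model several times,
    # and the generated figures coded for gender and race
    print(setting, "->", prompt)
```

Because everything except the setting is held fixed, any demographic shift between the café and festival versions can be attributed to the setting itself.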
Real-World Implications: Beyond the Algorithm
So what can we take away from all this?
- Bias in AI isn’t always loud or obvious. It can show up as who is not in the picture as much as who is.
- Complex prompts don’t necessarily fix bias—sometimes they actually trigger more stereotypical outputs.
- Representation matters, especially in tools that are fast becoming ubiquitous in design, storytelling, education, and business.
As users and creators, we need to question what defaults these AI systems have inherited—and how prompt design, emotional tone, and story setting can all impact who we see represented in generated visuals.
Key Takeaways
- Bias in image generation AI isn’t just about who shows up—but who’s missing. The AI often defaults to portraying white males in high-status roles, even when data shows otherwise.
- Complex prompts can unlock hidden associations, but they can also let built-in biases slip past safety filters. More detail = more risk of default assumptions kicking in.
- Representation bias happens in two ways: misrepresentation and omission. Showing only white people in a courtroom—even as defendants—creates a skewed picture.
- Safety filters often randomize demographics to avoid reinforcing stereotypes. But Ferreira’s study shows that can lead to overly sanitized or unrealistic results.
- Social context in prompts matters. Want a Black poet? You might need to set the scene in a cultural festival. AI connects demographic features to settings in ways that reflect societal expectations.
- Understanding these biases can help us write better prompts. If we want to generate diverse images, we’ll need to be intentional about context and setting—not just job title.
- Future research should dig deeper into which settings are “socially marked” for different demographics. Doing so can help guide both AI developers and users to take smarter, fairer approaches to image generation.
AI image generators aren’t just tools—they’re mirrors. But like all mirrors, they reflect not only reality, but how we frame and light it. Understanding their biases helps us reflect more truthfully.
If you are looking to improve your prompting skills and haven’t already, check out our free Advanced Prompt Engineering course.
This blog post is based on the research article “Using complex prompts to identify fine-grained biases in image generation through ChatGPT-4o” by Marinus Ferreira. You can find the original article here.