
The Pudding by co-founder Matt Daniels (right) and his teammates have been “bringing you stories you didn’t know you needed since 2017”
When the website The Pudding asked Claude in July 2024 to brainstorm some cool story ideas, it got suggestions that sounded compelling at first glance. "Female characters in video games," "fast fashion environmental impact," "emotional content in songs across decades"—topics that felt like natural fits for the publication's signature visual storytelling approach. But ultimately all of them were hollow AI write-ups with no real stories behind them. The Pudding meticulously documented its approach and its conversation with Claude, which makes for a fascinating story of its own.
I discovered this story because The Pudding is nominated for an Online News Association (ONA) Award at the annual ONA conference in New Orleans (which I am attending this week). Their nomination is for “General Excellence in Online Journalism” in the “Micro Newsroom” category.

If they win, it’s highly deserved. One of the most creative examples of visual storytelling I’ve ever seen is co-founder Matt Daniels’ recent story “NYC’s Urban Textscape”. Daniels used the database that media artist Yufeng Zhao created by extracting 138 million text snippets from 8 million Google Street View images for stunning visual storytelling: pizza word clouds laid over a map of all five boroughs, cultural enclaves defined by their words for “church” and a map of marketing phrases using “luxury”, scattered all over Manhattan and highly concentrated in Hudson Yards.

I found The Pudding’s visual data approach and especially the Claude experiment so intriguing that I talked to co-founder Matt Daniels last week about what they learned from it and how that shaped how they use AI at The Pudding.
In terms of AI for storytelling, it seems like having Claude go wild was a one-off. “These were all really basic, formulaic, underwhelming ideas, so I don't think it has a role for us," Matt told me.
What is The Pudding?
The Pudding is a digital publication known for its visual essays that explain ideas debated in culture through data journalism and rich interactive graphics. It often produces viral projects, such as AI-powered music taste analyses and explorations of cultural trends, and is recognized for using data visualization over traditional text-heavy reporting.
It was founded in 2017 by Matt Daniels, Ilia Blinderman and Russell Goldenberg. The name comes from the old adage, "the proof is in the pudding”.
The Pudding features in-depth, visually-driven stories spanning a wide range of topics: pop culture, sports, music, social issues, and more. Popular projects include interactive essays like “How Bad Is Your Streaming Music,” cultural data deep-dives, and playfully critical analyses of Spotify playlists, among others.
The Pudding operates independently from its commercial arm, the content studio Polygraph, which translates complex information into visual storytelling for brands. The Pudding and Polygraph are staffed by 9 data journalists and developers (8 full-time, 2 part-time) who create original research and interactivity. The Pudding and Polygraph are kept apart in terms of content and editorial approach.
The organization won a Peabody Award in 2017 and an Online Journalism Award in 2023. It is a finalist for another ONA Award in 2025.
Claude’s story suggestions lack what separates quality journalism from content generation: specific entry points, unique angles, the intriguing elements that make readers care about broad subjects. As the authors Russell Samora and Michelle Pera-McGhee wrote: "Many are just topics and not concrete stories with a clear point, question, or dataset."
The gap between surface appeal and substantive journalism is where The Pudding's AI experiment showed its widest discrepancy. This is actually a reassuring sign that human-made journalistic excellence will endure. AI can write compelling versions of what content marketers call copy, but automated content creation is still a long way off from producing unique, compelling, deeply researched and creative stories.
Ideas That Sound Good Until You Dig Deeper
So failure was the result, but how did the team actually reach that conclusion? The Pudding conducted a methodical, transparent test of whether large language models could actually replace human editorial judgment. Samora and Pera-McGhee guided Claude through four distinct phases of their actual workflow: idea generation, data collection and analysis, storyboarding and prototyping, and development and writing. They established clear rules: minimal human intervention except for task guidance, transparent documentation of every interaction, and honest letter grades for each phase.
The results were illuminating. Claude earned a C+ overall, but that average conceals a telling pattern of degradation as complexity increased.
Phase 1: Idea Generation (Grade: C+)
Claude's initial suggestions about emotions in music seemed promising enough to greenlight. But the ideas suffered from what Matt Daniels identified in our conversation as being "formulaic" and lacking the specificity that transforms broad topics into compelling narratives. As Samora and Pera-McGhee write in their experiment documentation, "It feels like Claude struggles to generate things that humans would find intriguing."
Phase 2: Data Collection & Analysis (Grade: B-)
Here Claude demonstrated both its greatest strength and a critical weakness. It wrote functional Python scripts within minutes that would take humans much longer—a genuine productivity gain.
But speed came with a price. Claude treated "Dua Lipa & Elton John" as a single artist rather than parsing multiple names—the kind of oversight that cascades through data analysis and undermines accuracy. As Samora and Pera-McGhee write in their experiment, "It did the bare minimum, strictly following its prompts, for better or worse." When scripts broke, Claude often cycled through ineffective fixes rather than identifying root problems.
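To illustrate the kind of oversight described above: a combined credit like "Dua Lipa & Elton John" needs to be split into individual artists before analysis. The following is a minimal sketch of such a parsing step, not The Pudding's actual code—the function name and the set of delimiters are illustrative assumptions.

```python
import re

# Illustrative delimiters for combined artist credits ("&", ",",
# "feat.", "featuring", "with"). Real music metadata has many more
# edge cases; this pattern is an assumption for demonstration only.
SPLIT_PATTERN = re.compile(
    r"\s*(?:&|,|\bfeat\.?\b|\bfeaturing\b|\bwith\b)\s*",
    re.IGNORECASE,
)

def split_artists(credit: str) -> list[str]:
    """Split a combined artist credit into individual names."""
    return [name for name in SPLIT_PATTERN.split(credit) if name]

print(split_artists("Dua Lipa & Elton John"))  # ['Dua Lipa', 'Elton John']
```

A human analyst would spot the unsplit credit in the output and add a step like this; Claude, strictly following its prompt, simply counted "Dua Lipa & Elton John" as one artist.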
Phase 3: Storyboarding & Prototyping (Grade: B+)
Claude's strongest performance came in rapid visualization prototyping, generating charts in seconds that typically require hours. It produced solid structural outlines that mimicked The Pudding's approach, even including details like methods sections and links to imaginary GitHub repositories.
Yet the work remained what they describe as "full of safe choices, nothing that blew our minds." The fundamental difference from human workflow became clear in their analysis: "The big difference from our process is the lack of iteration. It is just trying to get things done."
Phase 4: Development & Writing (Grade: D)
Complete breakdown. When complexity increased beyond simple tasks, Claude entered what Samora and Pera-McGhee describe as "whack-a-mole" debugging - fixing one problem while creating others. For their most complex visualization, human experts had to intervene or the section would have been cut entirely.
AI Stops When Human Journalists Keep Going
The Pudding's experiment exposed what may be AI's most fundamental limitation in journalism: statistical satisfaction. While human journalists iterate obsessively - questioning results, identifying edge cases, refining approaches - Claude completed tasks and moved forward.
"Critical thinking still falls to the user," Samora and Pera-McGhee conclude in their analysis. "The effort it takes us humans to fix things like this and produce stories that cover all the bases can often take days on end. It involves questioning results, identifying edge cases, doing more research, and repeating."
Daniels' broader assessment of AI journalism capabilities is blunt. When I asked him about AI's role in ideation, he said: “It was fine if your bar for quality is very low, which is why you see some models producing a lot of content on the internet.” Claude and similar models can produce superficially appealing content very quickly, but they lack the critical dissatisfaction that drives iterative improvement. They don't question their own work or recognize when "good enough" isn't actually good enough. Any journalist who has worked with Claude can probably relate: if you confront it with its shortcomings, Claude is apologetic (“You’re absolutely right”), but then comes up with the next mediocre step unless prompted very clearly.
Technical Tasks Show AI's True Strength
While editorial AI disappointed, The Pudding identified genuinely valuable non-creative AI applications: "I think the biggest impact it's having is on the technical side. I can't think of a programmer that hasn't started debugging their work and using large language models,” Daniels told me.
The Pudding team has adopted similar approaches: using AI for helper scripts to process images, generating headline options, idea brainstorming (with heavy human filtering), resolving Git problems, and identifying names in large text datasets. As Samora and Pera-McGhee note in their conclusion, they've also continued using AI for rapid chart prototyping, which they found "pretty amazing, reducing sometimes hour-long tasks (for us) to seconds."
The key insight from their experiment: AI works best as what they call an "editor" model. "We see ourselves more as editors of the small tasks we outsource to AI," they write. "We know what the output should look like, and how to adjust things as needed."
This mirrors The Pudding's editorial approach more broadly. Unlike major newsrooms like the New York Times with strict approved-tool policies, Daniels told me they allow editorial discretion: "It just depends on where are the challenges that this author is facing and where it makes sense to rely on a specific model."
IKEA Versus Artisan Furniture in Digital Journalism
Samora and Pera-McGhee concluded their experiment with a metaphor that captures the broader implications: "It's sort of like comparing a woodworking artisan's table to one from IKEA. The artisans invest immense time and effort into their high-quality pieces, while IKEA produces things quickly and cheaply, and most people probably can't tell the difference (or don't care). Which is kind of sad for us artisans."
The analogy illuminates the challenge facing quality journalism in an age of AI content proliferation. While automated content floods the internet, publications maintaining human editorial standards may find themselves in a premium position - but only if audiences recognize and value the difference.
The Pudding's systematic test suggests current AI cannot replicate the underlying thought processes that create lasting journalistic impact. Claude earned decent grades on individual tasks but failed at the recursive problem-solving that defines editorial work. It produced technically functional content that was editorially hollow.
As Daniels told me, the question revolves around professional standards. "If your goal is to find good ideas, it's not quite there yet, in terms of actually being an idea partner."
Lessons for Journalists and News Publishers
The Pudding's experiment offers concrete guidance for newsrooms considering AI integration:
Test AI systematically before committing resources. The Pudding's phase-by-phase evaluation revealed AI's specific strengths and weaknesses.
Focus AI tools on technical tasks, not editorial judgment. Debugging code, processing data, and generating headline options show genuine productivity gains. Story ideation and complex analysis remain human domains.
Transparency builds trust during AI experimentation. The Pudding discloses all AI assistance in their articles. This approach maintains reader credibility while allowing innovation.
Quality-based business models resist AI content pressure. When editorial reputation drives revenue, the economic incentives favor human oversight over automated production.