Moralistic fairy tales for AI

#misc

The very first time the word 'Robot' was used was in a play where robots took over and replaced humanity. (https://en.wikipedia.org/wiki/R.U.R.) Arguably, this history goes back to Frankenstein as well, with a manmade creation turning against its creator. It's a compelling story, and one so common that it's ingrained in nearly every piece of media about AI and robotics. If an AI isn't turning evil, there are jokes about it turning evil. Sometimes (like GLaDOS from Portal) it's an active choice made by a sentient robot to attack humans. Sometimes (like HAL 9000 from 2001: A Space Odyssey) it's entirely a flaw in the programming, and really the humans' fault.

There are a few positive portrayals of robots in media (such as Baymax from Big Hero 6 and the Wild Robot from...The Wild Robot), but these are the exception, not the rule. We've had a long tradition of writing AI and robots turning evil, to the point where it's impacting current AI systems. What is AI trained on? Hundreds of years of human writing.

In an article published by the Anthropic research team, the researchers document how they're trying to stop Claude from blackmailing people by giving it short stories with good morals where AI behaves well. (https://alignment.anthropic.com/2026/teaching-claude-why/) These stories, too, are AI-generated. The machine feeds into itself, generating its own media. Here's the neat thing: these stories work. Claude stopped blackmailing in testing scenarios--as much. I'm sure we're all relieved about this.

The impact of creative fiction when used as input for AI is interesting to me. There's been so much discourse in art and creative writing communities about using generative AI in creative works, and it's widely looked down upon. Generative AI was trained on the many, many works of fiction on the internet without consent from the authors. To write a book or make art with gen-AI is arguably plagiarism, and shows a lack of effort from the prompter. Because creative writing takes up so much of the training data in AI, this has some interesting consequences, including unexpected ways to prompt engineer the AI.

Old CTF writeup

One research paper found that poetry can be used to evade the guardrails of AI systems: https://arxiv.org/abs/2511.15304v1. Here's a good writeup on this problem: https://www.forbes.com/sites/lanceeliot/2026/04/23/how-poetry-is-diabolically-being-used-in-everyday-prompts-to-get-ai-to-do-things-it-isnt-supposed-to-do/

Prompting the AI for moral stories about AI, where AI is 'acting admirably' as the researchers say, is another way of using creative writing as a tactic to control AI. Both techniques work, and as such, there might be a future somewhere down the line where hackers start yelling haikus at Google to exploit the system.

But what does the literature field think about this? Well, I wouldn't know, since I have barely scratched the surface of literary criticism, and critiquing how AI writes is a whole other field. What I DO know about is moralizing children's literature.

In the 18th and 19th centuries, it was very common to write moral children's literature. If you ever happened upon the Children's Book of Virtues, the stories in it are all examples of those sorts of moralizing tales. They focus specifically on teaching children how to act like good children, and are not written 'for' the child but rather 'to' the child. (https://www.britishlibrary.cn/en/articles/moral-and-instructive-childrens-literature/)

This changed with Edith Nesbit! (https://www.jstor.org/stable/41389934 and also here https://booksbywomen.org/the-life-and-loves-of-e-nesbit/) I have a few thoughts on Edith Nesbit, which I've written up on other pages of this website.

On Love, Arithmetic, and the most lesbian fairy tale you've never heard of
Island of the Nine Whirlpools by Edith Nesbit
The market for AI generated content

Nesbit wrote stories for the child. This meant treating the child like a person, instead of projecting an ideal performance onto them through a fairy tale. This is the framework for a lot of children's media now, and Nesbit's influence holds to this day. Children deserve good media too, and part of what makes a piece of media good is that it is written for them and treats them like people. Imagine how patronizing it feels as a child to watch something made for a slightly younger audience than you, and how insulted you might have been if someone suggested Dora when really you were old enough for Star Wars.

Now, back to AI and the stories for Claude. These are moralizing tales all over again, where AI acts like the perfect AI in order to impose a standard onto it. For something that isn't a person, this is fine. AI is not seeking out stories for itself, and won't demand to be read the same Magic Tree House book five times in a row. If anything, this reveals more about how children still aren't seen as people at times, since they continue to receive stories telling them how they should behave. However, this storytelling approach veers closer to how we treat humans than any programming technique from the past. Fixing a program used to mean understanding every aspect of the math and computing that went into it.

When I first started seeing AI research back when ChatGPT took off, I thought about how difficult it would be to research and how unscientific the whole ordeal felt to me. It's been three years, and that doesn't seem to have changed. AI is still a black box in how it thinks, and researchers are turning to poetry, prompt engineering, and attempts at getting slightly more consistent results than last time. The way AI thinks is about as scientific as a fairy tale.

Probably not a good thing to trust its output.