I taught a row of language models to lie to each other, then watched what happened. TraitorBot is a research project about The Traitors, the game theory under it, and what AI does when you make it betray people for a living.
The Traitors is the best social-deduction format on television: a house full of Faithful trying to find a handful of Traitors, who murder one player a night and lie to everyone by day. I became more interested than is strictly healthy, so I turned it into a research project. TraitorBot is the result.
The thesis
At the centre of it is a long technical analysis, around 90,000 words across seventeen chapters, working through the game with game theory and probability rather than vibes. If you do not want the full thing, there are quick five-minute pieces that pull out the core ideas, and strategy guides that derive the winning play for each role from the maths rather than from gut feeling.
There is also a taxonomy of player types (the Detective, the Chaos Agent and the rest) and an analysis of how the format shifts when the producers add a mechanic like the Red Cloak.

Teaching AI to lie
The part that surprised me was the simulation. I set language models up as players, handed them roles, and let them run the game: deceiving, accusing, forming alliances, voting each other out. Watching an LLM construct a lie, defend it under questioning, and throw an innocent player under the bus is a strange thing to do on a Tuesday evening.
One finding stuck with me. Pushed to lie repeatedly over a long enough game, the models start to show something that looks like stress, their behaviour degrading in roughly the way a person’s might under the same pressure. I am wary of reading too much into it, but it was consistent enough to write up.
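For a sense of the shape of such a simulation, here is a minimal sketch in Python. It is an illustration only, not the TraitorBot code, and it rests on loud assumptions: `choose_vote` is a hypothetical stand-in for the point where a language model would be prompted, and here it just picks at random; the game alternates a daytime banishment with a night-time murder; and it ends when no Traitors remain or when Traitors are at least as numerous as Faithful, which is a simplification of the show's actual endgame.

```python
import random
from collections import Counter


def winner(alive, traitors):
    """Return 'faithful', 'traitors', or None while the game is still live."""
    live_traitors = alive & traitors
    if not live_traitors:
        return "faithful"
    # Assumed end condition: Traitors win once they cannot be outvoted.
    if len(live_traitors) >= len(alive - traitors):
        return "traitors"
    return None


def choose_vote(voter, alive, traitors, rng):
    """Hypothetical stub for a model call: pick a player to banish."""
    candidates = sorted(alive - {voter})
    if voter in traitors:
        # Traitors know each other, so they aim at Faithful when they can.
        candidates = [p for p in candidates if p not in traitors] or candidates
    return rng.choice(candidates)


def play_game(n_players=10, n_traitors=2, seed=0):
    """Play one game with random stand-in players and return the winning side."""
    rng = random.Random(seed)
    alive = set(range(n_players))
    traitors = set(rng.sample(range(n_players), n_traitors))
    day = True
    while winner(alive, traitors) is None:
        if day:
            # Round table: everyone votes, ties are broken at random.
            votes = Counter(choose_vote(v, alive, traitors, rng) for v in sorted(alive))
            top = max(votes.values())
            out = rng.choice(sorted(p for p, n in votes.items() if n == top))
        else:
            # Night: the Traitors murder one Faithful.
            out = rng.choice(sorted(alive - traitors))
        alive.discard(out)
        day = not day
    return winner(alive, traitors)


if __name__ == "__main__":
    # Tally outcomes over many seeded games.
    print(Counter(play_game(seed=s) for s in range(1000)))
```

Swapping the random choice in `choose_vote` for a prompted model, and logging what each player says along the way, is where the interesting behaviour would come from; the loop itself stays about this simple.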
The usual disclaimer
TraitorBot is a fan research project, built in plain HTML, CSS and JavaScript. It is not affiliated with the BBC, Peacock or IDTV, and it is not trying to be. It is one person taking a game show far too seriously, with footnotes.
- Site: traitorbot.com