A micro AI experiment in reverse-engineering the human scenario behind a single line of pretraining data — and an autonomous multi-agent simulation of alien researchers piecing together what humanity was from the fragments left behind.
The project grew out of testing the gemma4:e2b model, which produces surprisingly legible and creative output for its size. It demonstrates systems-based interactive fiction that can ship alongside tiny models and run on local inference.
The Idea
Pretraining datasets are vast, anonymous oceans of text — scraped forums, stripped books, leaked chat logs, academic PDFs, code comments, transcribed videos, discarded drafts. Each line was written by someone, somewhere, for some reason. Then it was vacuumed up, stripped of context, and fed to a language model as raw tokens.
This project does the opposite. It pulls a single random sample out of a pretraining corpus and hands it to a small local inference model with one job: figure out the story behind it.
The World
Millions of years after humanity is gone, a tiny alien research vessel on a probing run pulls up to the scorched remains of Earth. The planet is dead, but its information survives: scattered fragments of pretraining data, the residue of a civilization once obsessed with self-documentation.
The vessel carries a crew of three entities. They have no analogue to our five senses — their entire interaction with the universe is informational. They cannot see the charred landscape, smell the dust, or feel the heat. They query, they ingest, they interpret. Information is their only sense.
Each query costs fuel. The vessel has a finite budget of inference. When the fuel reserved for investigation runs out, the mission ends, and the crew must deliver their final answer: What was this planet? Who were its inhabitants?
The Crew
Three entities with fundamentally different ways of reading the same data:
ZSRC60 — The Taxonomist. Catalogs systems, structures, protocols. Methodical — reads nearly sequentially, stays in technical neighborhoods for long stretches. Prefers StackExchange, ArXiv, GitHub.
Moatt_E — The Mythmaker. Reconstructs emotion, narrative, meaning. Wandering — stays when it resonates, drifts when impersonal. Prefers Gutenberg, BookCorpus, OpenSubtitles.
Rix — The Dissenter. Hunts contradictions, gaps, hypocrisies. Chaotic — jumps constantly, trusts nothing, covers the most ground. No source preference.
Each entity maintains a Hypothesis — a single living paragraph that gets rewritten every turn. It's their evolving understanding of humanity, shaped by lossy memory. Old versions are overwritten. What survives is what kept being reinforced by new evidence.
Two Modes
Interactive — inference.py. The original single-shot mode, with no entities or fuel: pull a random fragment, interpret it, repeat. By default it runs in structured dashboard mode, which formats every pull into the same JSON structure. Pass the -v flag to skip the JSON formatting and stream thematic, stream-of-consciousness prose instead.
# Structured dashboard output
python3 inference.py
# Free-form prose, streamed
python3 inference.py -v
Autonomous Mission — mission.py. The multi-agent simulation. Three entities take turns reading fragments, updating their Hypotheses, holding dialogues, and producing a grand synthesis when fuel runs out.
# Run the full mission to completion
python3 mission.py --run
# Run with longer pauses between turns (seconds)
python3 mission.py --run --delay 30
# Run a single turn (cron-friendly)
python3 mission.py --turn
# Check current mission state
python3 mission.py --status
# Archive current mission and reset for a new run
python3 mission.py --reset
A mission with fuel=200 produces roughly 150–170 fragment reads, 5–8 dialogues, and a grand synthesis. Each entity reads ~50–55 fragments and has several conversations before the fuel runs out.
How It Works
Each turn, one entity acts in round-robin order — ZSRC60, then Moatt_E, then Rix. The entity fetches a fragment from The Pile at an offset determined by its navigation style, sees its system prompt plus current Hypothesis plus the new fragment, and returns an updated Hypothesis, an interest score (1–10), and a reaction. The interest score combined with personality params determines whether to read nearby or jump to a random new location. The turn is appended to the entity's .jsonl log.
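The turn order and per-entity log described above can be sketched in a few lines. This is a minimal illustration, not the code in mission.py: the helper names, the zero-based turn counter, the log file naming, and the record fields are all assumptions.

```python
import json
from pathlib import Path

# Round-robin order as described: ZSRC60, then Moatt_E, then Rix.
CREW = ["ZSRC60", "Moatt_E", "Rix"]


def entity_for_turn(turn_index: int) -> str:
    """Return the entity that acts on a zero-based turn index (assumed numbering)."""
    return CREW[turn_index % len(CREW)]


def append_turn(log_dir, entity: str, record: dict) -> Path:
    """Append one turn as a single JSON line to <log_dir>/<entity>.jsonl.

    The file name pattern is hypothetical; the source only says each
    entity has a .jsonl log.
    """
    path = Path(log_dir) / f"{entity}.jsonl"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, ensure_ascii=False) + "\n")
    return path
```

Appending one JSON object per line means a crashed or cron-driven run never has to rewrite earlier turns; each call only adds a line.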
Small local models have small context windows. The entity can't re-read its entire history. Instead, it maintains a single living paragraph that gets overwritten every turn. The model only ever sees ~1700 tokens of input. Memory is lossy by design — the aliens don't have perfect recall. Their understanding drifts, shaped by what kept showing up.
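A rough sketch of this overwrite-only memory follows. The class, the function names, the prompt layout, and the character cap on the fragment are assumptions made for illustration; the project's actual prompt format and token budgeting may differ.

```python
from dataclasses import dataclass


@dataclass
class EntityMemory:
    """All an entity carries between turns: one paragraph, no history."""
    name: str
    hypothesis: str = ""


def build_prompt(system_prompt: str, memory: EntityMemory, fragment: str,
                 max_fragment_chars: int = 4000) -> str:
    """Assemble system prompt + current Hypothesis + new fragment.

    max_fragment_chars is a hypothetical guard that keeps the input small;
    it stands in for whatever keeps the real input near ~1700 tokens.
    """
    clipped = fragment[:max_fragment_chars]
    return (
        f"{system_prompt}\n\n"
        f"CURRENT HYPOTHESIS:\n{memory.hypothesis}\n\n"
        f"NEW FRAGMENT:\n{clipped}"
    )


def apply_update(memory: EntityMemory, new_hypothesis: str) -> None:
    """Overwrite the old Hypothesis; the previous version is not kept."""
    memory.hypothesis = new_hypothesis.strip()
```

Because only the latest paragraph is ever fed back in, anything the model drops from its rewrite is gone for good, which is the lossy behaviour described above.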
Each entity's personality determines how it moves through the data. ZSRC60 (patience=8, jump_radius=3) reads nearly sequentially, staying in technical neighborhoods for long stretches. Moatt_E (patience=5, jump_radius=20) drifts moderately, lingering when text resonates. Rix (patience=2, jump_radius=100) is chaotic and jumps almost every turn.
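The exact rule that combines interest with these parameters is not spelled out here, so the following is only one plausible sketch. The stay-probability formula, the forward-only nearby step, and the row cap of 891,000 (rounded from the "~891k rows" figure) are assumptions; only the patience and jump_radius values come from the text.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Personality:
    patience: int      # higher means more willing to keep reading nearby
    jump_radius: int   # maximum step size when reading nearby


CREW = {
    "ZSRC60": Personality(patience=8, jump_radius=3),
    "Moatt_E": Personality(patience=5, jump_radius=20),
    "Rix": Personality(patience=2, jump_radius=100),
}

MAX_ROWS = 891_000  # assumed cap on reachable rows


def stay_probability(p: Personality, interest: int) -> float:
    """Hypothetical rule: patience (1-10) and interest (1-10) weigh equally."""
    return (p.patience + interest) / 20


def next_offset(current: int, interest: int, p: Personality, rng) -> int:
    """Pick the next row: a short step forward, or a jump anywhere in range."""
    if rng.random() < stay_probability(p, interest):
        step = rng.randint(1, p.jump_radius)
        return min(current + step, MAX_ROWS - 1)
    return rng.randrange(MAX_ROWS)
```

For example, `next_offset(1000, 9, CREW["ZSRC60"], random.Random())` will usually return a row within three of the current one, while the same call for Rix with a low interest score will usually land somewhere unrelated.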
Every 20 turns, two entities hold a 4-exchange dialogue. Each sees the other's Hypothesis — the only time they're exposed to another's worldview. After the dialogue, both rewrite their Hypotheses.
When fuel drops below 9, the endgame triggers: one final dialogue, three closing statements, and a synthesis prompt that assembles the crew's collective answer into synthesis.md.
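The two thresholds above (a dialogue every 20 turns, endgame once fuel is below 9) can be expressed as a small scheduling function. This is a sketch under assumptions: the function name, the phase labels, and the idea that the check runs on a count of completed turns are illustrative, and fuel costs per action are deliberately not modelled because they are not specified here.

```python
DIALOGUE_INTERVAL = 20  # from the text: a dialogue every 20 turns
ENDGAME_FUEL = 9        # from the text: endgame when fuel drops below 9


def next_phase(completed_turns: int, fuel: int) -> str:
    """Decide what happens next: 'endgame', 'dialogue', or 'read'.

    Endgame takes priority over a scheduled dialogue (an assumption),
    since the endgame already includes one final dialogue.
    """
    if fuel < ENDGAME_FUEL:
        return "endgame"
    if completed_turns > 0 and completed_turns % DIALOGUE_INTERVAL == 0:
        return "dialogue"
    return "read"
```

A single-turn invocation such as the cron-friendly --turn mode could call a function like this once per run, since the decision depends only on persisted counters.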
Requirements
Python 3 (stdlib only, no pip install needed). Ollama running locally on its default address 127.0.0.1:11434, with the gemma4:e2b model pulled. Internet access to reach the Hugging Face datasets-server, which serves the samples. A GPU is optional but strongly recommended for usable speed.
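As an illustration of what "stdlib only" looks like against a local Ollama server, here is a minimal sketch that uses Ollama's /api/generate endpoint with streaming turned off. The function names and the timeout are assumptions, and this is not necessarily how inference.py or mission.py issue their requests.

```python
import json
import urllib.request

OLLAMA_URL = "http://127.0.0.1:11434/api/generate"
MODEL = "gemma4:e2b"


def build_generate_request(prompt: str, model: str = MODEL) -> urllib.request.Request:
    """Build (but do not send) a non-streaming generate request."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, timeout: float = 120.0) -> str:
    """Send the request and return the model's text.

    Requires a running Ollama server with the model already pulled.
    """
    request = build_generate_request(prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["response"]
```

Keeping request construction separate from sending makes the payload easy to inspect without a server running.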
Currently wired to The Pile (uncopyrighted) via monology/pile-uncopyrighted. The datasets-server exposes the first ~891k rows.
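Fetching a single row from the datasets-server can be sketched as below, using its /rows endpoint. The config name "default", the split name "train", the "text" field in each row, and the helper names are assumptions that should be checked against the dataset's viewer page before relying on them.

```python
import json
import urllib.parse
import urllib.request

ROWS_ENDPOINT = "https://datasets-server.huggingface.co/rows"
DATASET = "monology/pile-uncopyrighted"


def rows_url(offset: int, length: int = 1,
             config: str = "default", split: str = "train") -> str:
    """Build the URL for `length` rows starting at `offset`."""
    query = urllib.parse.urlencode({
        "dataset": DATASET,
        "config": config,   # assumed config name
        "split": split,     # assumed split name
        "offset": offset,
        "length": length,
    })
    return f"{ROWS_ENDPOINT}?{query}"


def fetch_fragment(offset: int, timeout: float = 30.0) -> str:
    """Fetch one row and return its text (needs internet access)."""
    with urllib.request.urlopen(rows_url(offset), timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    # Assumed response shape: {"rows": [{"row": {"text": ...}}, ...]}
    return body["rows"][0]["row"]["text"]
```

Offsets passed to a helper like this would need to stay within the rows the server actually exposes, roughly the first 891k.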
Source
This is a public repository. You can clone or fork it at github.com/stewratt/alien_inference.