How Many Tokens Define a Human Life?

June 14, 2026

Machine learning papers love to compare context windows to books, codebases, or the internet. One million tokens roughly covers the entire Harry Potter series. Ten million tokens is a large code repository. A trillion tokens is a training corpus.

The comparison I find more interesting is a human lifetime.

Think for a moment: how many tokens do you think you have processed in your lifetime? And more interestingly, is it tractable for a language model to model your life experience as represented by a contiguous stream of tokens?

For this thought experiment, I do not want to count images or sounds. After all, people without the ability to see or hear still process the world in equivalently human ways. I want to count the discrete symbolic traces that define the core of our conscious being: words heard, seen, said, written, and thought.

The rough answer is smaller than I expected:

For a language-heavy knowledge worker, a reasonable upper-bound estimate is roughly 381,000 tokens/day.

Over 30 adult-equivalent years, that is about 4 billion tokens. A quieter life can plausibly be much lower.

Why does this matter? Well, if a life is measured in memories, it might one day be modelable in tokens. While few would argue that a raw transcript defines a person, many would agree that our life experiences and context are what separate us from language models. As long-context pretraining and active distillation research continues, we might have to adjust our priors on what it means to be human.

Defining the Question

Let's start with a toy version of the problem. Suppose I gave you a compressed day in the life of a software engineer:

morning:
  read issue, repro steps, stack trace
  skimmed source and wrote a patch
  exchanged slack messages
  thought through failure case

afternoon:
  reviewed PR comments
  watched a short talk
  read docs
  rehearsed next debugging step
...

If this log were expanded into words, how big would it be? Not how meaningful it would be, not how much of it would be remembered, and not how much information is latent in the person's physical state. Just how many text-like tokens pass through the system, because that count is an upper bound on the context that could be retrieved at any point in time.

Obviously, more goes on in the head than this. We form latent thoughts, spatial intuitions, emotional associations, and mental maps that are not tokenizable in the same way words and images currently are. But for this comparison, what matters is the stream of discrete ideas that survive latent processing and become expressible as words.

For English, the usual tokenizer approximation is:

1 token ~= 4 characters ~= 0.75 words
1 word ~= 1.33 tokens
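As a minimal sketch, the rule of thumb above can be written as two helper functions. The function names are my own, and the constants are the approximations quoted above rather than the behavior of any particular tokenizer.

```python
# Rule-of-thumb English conversions from the text above.
# Real tokenizers differ by model and by content (code, names, other languages).
CHARS_PER_TOKEN = 4
TOKENS_PER_WORD = 1.33


def words_to_tokens(words: float) -> float:
    """Approximate tokens for a given number of English words."""
    return words * TOKENS_PER_WORD


def chars_to_tokens(chars: float) -> float:
    """Approximate tokens for a given number of characters."""
    return chars / CHARS_PER_TOKEN


print(f"{words_to_tokens(1_000):,.0f} tokens per 1,000 words")
print(f"{chars_to_tokens(4_000):,.0f} tokens per 4,000 characters")
```

Every estimate in this post is a word count multiplied by the same 1.33 factor, so a different tokenizer ratio rescales the final answer linearly.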

Daily Estimate

The estimate comes from three dominant buckets: external media, conversation, and inner speech. A modern media day mixes text, audio, captions, docs, code, Slack, TV, music, and short-form video. So I model the media budget directly.

symbolic transcript = external media
                    + conversation
                    + inner speech

The cleanest daily estimate is:

External media: ~197,000 tokens/day
Conversation: ~51,000 tokens/day
Inner speech: ~133,000 tokens/day (upper-bound case)
Total: ~381,000 tokens/day
30 adult-equivalent years: ~4 billion tokens

This is not a confidence interval. It is an activity-budget estimate. The point is to make each term legible enough that you can swap in your own day and see how much the answer moves.

External Media

The UCSD Measuring Consumer Information study tackled a closely related question. Using 2008 data, it estimated that the average American received about 100,500 media words/day outside work, or roughly 134,000 tokens/day (100,500 * 1.33). That included TV, radio, print, phone, computer use, games, recorded music, and movies. It is a useful sanity check, but it predates the current shape of social video, always-on messaging, and modern screen work.

For the version of the question I care about here, I want to model a language-heavy knowledge worker, for example a software engineer. So I build the external budget from current activity estimates:

TV / movies:
  2.6 hours/day * 60 * 150 wpm ~= 23,000 words/day

work text:
  4.7 hours/day * 60 * 260 wpm ~= 73,000 words/day

music:
  20.7 hours/week * 60 / 7 * 147 wpm ~= 26,000 words/day

leisure reading:
  17 minutes/day * 238 wpm ~= 4,000 words/day

social video:
  141 minutes/day * 150 wpm ~= 21,000 words/day

external media total ~= 148,000 words/day
148,000 words/day * 1.33 ~= 197,000 tokens/day
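The budget above is easy to reproduce. The sketch below recomputes each line from its inputs (minutes per day and words per minute) without intermediate rounding; the dictionary layout and names are my own, and the constants are the ones listed above.

```python
TOKENS_PER_WORD = 1.33

# activity -> (minutes per day, words per minute), using the constants above
MEDIA_BUDGET = {
    "tv_movies": (2.6 * 60, 150),
    "work_text": (4.7 * 60, 260),
    "music": (20.7 * 60 / 7, 147),  # weekly hours converted to daily minutes
    "leisure_reading": (17, 238),
    "social_video": (141, 150),
}


def media_words_per_day(budget):
    """Words/day for each activity: minutes * words-per-minute."""
    return {name: minutes * wpm for name, (minutes, wpm) in budget.items()}


words = media_words_per_day(MEDIA_BUDGET)
total_words = sum(words.values())
total_tokens = total_words * TOKENS_PER_WORD

for name, count in words.items():
    print(f"{name:16s} {count:>9,.0f} words/day")
print(f"{'total':16s} {total_words:>9,.0f} words/day")
print(f"{'total':16s} {total_tokens:>9,.0f} tokens/day")
```

To model your own day, change the minutes or the words-per-minute for any activity and rerun.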

A few notes on those constants:

The Bureau of Labor Statistics reports 2.6 hours/day of TV for U.S. adults in the 2024 American Time Use Survey. For TV and social video, I use 150 wpm as a normal spoken-language rate.
For work text, I use 260 wpm from the adult silent-reading meta-analysis and 4.7 hours/day from a workplace-reading study of office workers. That is still aggressive for code and dense docs; the point is to model the kind of day where a coder is continuously reading issues, code, compiler errors, PR comments, docs, chat, and search results.
As for music, the IFPI reports 20.7 hours/week of music engagement, and a lyrics study found recent hip-hop/rap averages around 147 words/minute.
For leisure reading, BLS reports about 17 minutes/day, and the reading-rate meta-analysis gives about 238 wpm for nonfiction.
The social line uses DataReportal/GWI's 2025 mean of 2 hours 21 minutes/day on social media and treats the feed as speech-heavy video at 150 wpm.

Conversation

The daily speech estimate comes from studies using the Electronically Activated Recorder (EAR): participants wear a device during normal life, it periodically records short snippets of ambient audio, and researchers later code the sampled recordings. That matters because people are not very good at estimating how much they talk. A large registered replication of daily word use pooled 2,197 participants from 22 samples using this kind of naturalistic audio sampling and estimated 12,792 spoken words/day with a standard deviation of 9,154.

EAR studies measure spoken output, not the full back-and-forth of conversation. Since conversation also includes the words a person hears from other people, I multiply the spoken average by three to estimate total conversational language throughput. That multiplier is an assumption rather than a measurement: it amounts to hearing about two words for every word spoken.

conversation_words_per_day = 12,792 * 3
                           ~= 38,000 words/day

conversation_tokens_per_day = 38,000 * 1.33
                            ~= 51,000 tokens/day
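Because the multiplier of three is the softest number in this term, it helps to see how the result moves with it. This is a small sketch; the function and parameter names are hypothetical, and the alternative multipliers of 2 and 4 are illustrative only.

```python
TOKENS_PER_WORD = 1.33
SPOKEN_WORDS_PER_DAY = 12_792  # mean spoken words/day from the replication above


def conversation_tokens_per_day(spoken_words=SPOKEN_WORDS_PER_DAY,
                                throughput_multiplier=3.0):
    """Spoken plus heard words, expressed in tokens/day.

    throughput_multiplier is total conversational words divided by words
    spoken; 3.0 is the value assumed in the text.
    """
    return spoken_words * throughput_multiplier * TOKENS_PER_WORD


for multiplier in (2.0, 3.0, 4.0):
    tokens = conversation_tokens_per_day(throughput_multiplier=multiplier)
    print(f"multiplier {multiplier:.0f}: {tokens:,.0f} tokens/day")
```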

Inner Speech

Inner speech is the hardest term to measure. The frequency estimate comes from Descriptive Experience Sampling (DES). In DES, participants carry a random beeper in everyday life. When it beeps, they note what was in experience immediately before the beep. They are then interviewed in detail, usually soon afterward, to distinguish inner speaking from visual imagery, unsymbolized thought, feelings, remembered recent speech, or a general sense of "I was thinking."

Across the studies discussed by the UNLV inner-experience group, inner speaking appears in roughly 20-23% of sampled moments, with large individual differences across participants. I use 23% as the empirical frequency anchor.

For the rate of active inner speech, I use an expanded-inner-speech experiment that measured inner speech in syllables/second; converted with an English word-length corpus estimate, that is about 260 words/minute. Combining the DES frequency estimate with 16 waking hours gives:

inner_speech_words_per_day = 16 hours/day * 60 * 0.23 * 260
                           ~= 57,000 words/day

inner_speech_tokens_per_day = 57,000 * 1.33
                            ~= 76,000 tokens/day

For a language-heavy upper bound, I use 40% of waking time at the same measured inner-speech rate:

upper_inner_speech_words_per_day = 16 hours/day * 60 * 0.40 * 260
                                 ~= 100,000 words/day

upper_inner_speech_tokens_per_day = 100,000 * 1.33
                                  ~= 133,000 tokens/day
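Both inner-speech figures come from the same formula with a different fraction of waking time. A minimal sketch, with names of my own choosing and the constants taken from the text:

```python
TOKENS_PER_WORD = 1.33
WAKING_MINUTES_PER_DAY = 16 * 60
INNER_SPEECH_WPM = 260  # rate of active inner speech used in the text


def inner_speech_tokens_per_day(fraction_of_waking_time,
                                wpm=INNER_SPEECH_WPM,
                                waking_minutes=WAKING_MINUTES_PER_DAY):
    """Tokens/day of inner speech for a given share of waking time."""
    words = waking_minutes * fraction_of_waking_time * wpm
    return words * TOKENS_PER_WORD


des_anchor = inner_speech_tokens_per_day(0.23)   # DES frequency estimate
upper_bound = inner_speech_tokens_per_day(0.40)  # language-heavy upper bound

print(f"DES anchor:  {des_anchor:,.0f} tokens/day")
print(f"upper bound: {upper_bound:,.0f} tokens/day")
```

The result is linear in the fraction, so the gap between the two cases is entirely the jump from 23% to 40% of waking time.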

Putting It All Together

Using the external media and conversation estimates together with the upper-bound inner speech term:

external media: ~197,000 tokens/day
conversation:    ~51,000 tokens/day
inner speech:   ~133,000 tokens/day

total:          ~381,000 tokens/day

381,000 tokens/day * 365 * 30 ~= 4.2B tokens

So the upper-bound estimate for a language-heavy knowledge worker is about 4 billion tokens per 30 adult-equivalent years.

The useful thing about this framing is that the answer is easy to audit. If you think 4.7 hours of work text at 260 wpm is too high, cut that term in half and the total drops by about 49,000 tokens/day. If you think the social feed is mostly silent images, drop the social term entirely and the total drops by about 28,000 tokens/day. If you think inner speech is much more frequent than DES suggests, the inner speech term is the next obvious place to move the estimate.
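The audit described above can be sketched as a function that rescales individual terms and reports the new total. The term names and the `scale` argument are hypothetical; the inputs are the constants used earlier in this post, kept unrounded.

```python
TOKENS_PER_WORD = 1.33

# words/day for each term, built from the constants in this post
WORDS_PER_DAY = {
    "tv_movies": 2.6 * 60 * 150,
    "work_text": 4.7 * 60 * 260,
    "music": 20.7 * 60 / 7 * 147,
    "leisure_reading": 17 * 238,
    "social_video": 141 * 150,
    "conversation": 12_792 * 3,
    "inner_speech": 16 * 60 * 0.40 * 260,  # upper-bound case
}


def total_tokens_per_day(words, scale=None):
    """Total tokens/day after multiplying selected terms by a scale factor."""
    scale = scale or {}
    return TOKENS_PER_WORD * sum(
        count * scale.get(name, 1.0) for name, count in words.items()
    )


baseline = total_tokens_per_day(WORDS_PER_DAY)
half_work = total_tokens_per_day(WORDS_PER_DAY, {"work_text": 0.5})
no_social = total_tokens_per_day(WORDS_PER_DAY, {"social_video": 0.0})

print(f"baseline:       {baseline:,.0f} tokens/day")
print(f"half work text: -{baseline - half_work:,.0f} tokens/day")
print(f"no social feed: -{baseline - no_social:,.0f} tokens/day")
```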

Calculator

Here is the same arithmetic as a small calculator. I included it mostly because it is easy to lose intuition for the scale once the units switch from days to decades.

tokens/day: 381,000
years: 30
total ~= 4.2B tokens
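For readers without the interactive version, here is a minimal sketch of the same calculator. The function names are my own, and it assumes a 365-day year, as the arithmetic above does.

```python
def lifetime_tokens(tokens_per_day, years, days_per_year=365):
    """Total tokens over a span of years at a constant daily rate."""
    return tokens_per_day * days_per_year * years


def humanize(count):
    """Format a count with a K/M/B/T suffix and one decimal place."""
    for threshold, suffix in ((1e12, "T"), (1e9, "B"), (1e6, "M"), (1e3, "K")):
        if count >= threshold:
            return f"{count / threshold:.1f}{suffix}"
    return f"{count:.0f}"


total = lifetime_tokens(tokens_per_day=381_000, years=30)
print(f"total ~= {humanize(total)} tokens")
```

Swap in your own daily rate and number of years to see how far the total moves.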

Why Not Pixels?

The obvious objection is that words are not the whole input stream. Humans see, hear, touch, move, and feel. If you naively counted raw visual or auditory bandwidth, the number would be much larger.

But raw sensory bandwidth is not the thing I am trying to measure. People without the ability to see or hear still think in fully human ways. Their conscious lives are not missing the essence of human cognition because one sensory channel is absent. That is evidence that the part of experience most relevant to thought is not raw pixels or sound waves themselves, but the structured representations we build from them.

Language is not all of those representations, but it is the most portable and shared one. It is how we name things, explain them, remember them, plan around them, and pass them to other minds. For this comparison, I care about that symbolic layer: words read, heard, spoken, written, and internally rehearsed. Conveniently, it is also the layer language models are trained to process.

Why You Should Care

We already have models that can operate over million-token context windows. Dario Amodei has said that 100 million words of context is not really blocked by ML fundamentals so much as inference cost. If that is roughly right, then the scale changes completely.

At the estimate above, 100 million tokens is months of symbolic life. A few billion tokens is a 30-year symbolic life. That does not mean a model has a body, a childhood, or human consciousness. But it does mean that the durable text-like residue of a life -- what someone read, heard, said, wrote, rehearsed, decided, and explained -- is no longer an obviously impossible amount of context. We're entering a world where your LLM can experience a contiguous human lifetime of context in a matter of days.
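The conversion also runs in the other direction: given a context size, how much symbolic life does it hold at the daily rate above? A small sketch, assuming the 381,000 tokens/day upper-bound figure and a 365-day year; the function name is hypothetical.

```python
TOKENS_PER_DAY = 381_000
DAYS_PER_YEAR = 365


def context_as_lifetime(context_tokens, tokens_per_day=TOKENS_PER_DAY):
    """Return (days, years) of symbolic life that fit in a context size."""
    days = context_tokens / tokens_per_day
    return days, days / DAYS_PER_YEAR


for label, size in (("1M", 1_000_000),
                    ("100M", 100_000_000),
                    ("4.2B", 4_200_000_000)):
    days, years = context_as_lifetime(size)
    print(f"{label:>5} tokens ~= {days:,.0f} days ~= {years:.1f} years")
```

At this rate a million-token window holds under three days, which is why the jump to hundreds of millions of tokens changes the picture.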

Ilya Sutskever tweeted in 2023: "If you value intelligence above all other human qualities, you're gonna have a bad time." The same may be true of context. If the distinction you care about is that humans have vastly more lived context than models, that distinction may not survive long-context scaling.

The obvious objection is context rot: at long context, thoughts compose to a depth that a model may not be able to follow. A fixed-depth transformer can hold the tokens without necessarily connecting them. In my Divide and Conquer post, I show that dependency resolution can in principle scale exponentially with layer count; assuming a base of 2, 32 layers gives 2^32 ~= 4.3B, about the size of the lifetime estimate above. As that post shows, current models may still only make on the order of 10 semantic hops per generated token, but longer-context pretraining or dynamic-depth architectures like HRM and Sakana's DiffusionBlocks will likely close that gap.
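Taking the exponential-scaling assumption at face value, the layer arithmetic can be checked directly. This sketch only illustrates the base-2 counting argument; it is not a model of how any real transformer resolves dependencies, and the function names are my own.

```python
def reach(layers, base=2):
    """Tokens that could in principle be connected with this many layers."""
    return base ** layers


def layers_needed(num_tokens, base=2):
    """Smallest layer count whose reach covers num_tokens."""
    layers, covered = 0, 1
    while covered < num_tokens:
        covered *= base
        layers += 1
    return layers


lifetime = 381_000 * 365 * 30  # upper-bound lifetime estimate from above
print(f"reach(32) = {reach(32):,}")
print(f"layers needed for {lifetime:,} tokens: {layers_needed(lifetime)}")
```

Integer arithmetic is used instead of a floating-point logarithm so that the result stays exact near powers of the base.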

So what do we do in the face of this fact: that LLMs will soon experience orders of magnitude more token-years than we do? When long context is a mechanized commodity, isolating the right context to consume and understand deeply is where human ingenuity has room to thrive. Isaac Newton began developing calculus around age 22, which is about 3.1 billion tokens under the rough upper-bound arithmetic above, counting every year from birth at the full adult rate. Mary Shelley began Frankenstein at 18, perhaps 2.5 billion tokens on the same basis, and finished it at 19. The scarce thing was never raw context. It was the taste, compression, and courage to turn a small, finite slice of experience into insight large enough to matter.

Sources

Token conversion and scale examples: OpenAI token-counting guidance; Harry Potter word count; CPython source listing. Links: OpenAI tokens; Harry Potter; Debian Sources CPython listing.
External media estimate: UCSD media-word estimate; BLS American Time Use Survey TV time; spoken-language rate; workplace-reading estimate; adult silent-reading meta-analysis; IFPI music engagement; lyrics words/minute by genre; BLS leisure-reading time; DataReportal/GWI social-media time. Links: UCSD media words; BLS time use; speaking rate; workplace reading; reading-rate meta-analysis; IFPI music; lyrics by genre; BLS reading; DataReportal social media.
Conversation estimate: Large registered replication of daily word use using naturalistic audio sampling; original daily-speech paper. Links: registered replication; original paper.
Inner speech estimate: UNLV Descriptive Experience Sampling work; individual-difference discussion; expanded inner-speech rate experiment; English word-length corpus estimate. Links: DES paper; individual differences; expanded inner-speech rate; word length.
Long context and context depth: Anthropic context-window documentation; Dario Amodei interview transcript; Ilya Sutskever tweet; divide-and-conquer context-depth argument; HRM and DiffusionBlocks dynamic-depth references. Links: Anthropic context windows; Dario transcript; Ilya tweet; Divide and Conquer; HRM; DiffusionBlocks.
Historical examples: Newton's calculus timeline; Mary Shelley's Frankenstein timeline. Links: Britannica Newton; Isaac Newton Institute; Britannica Shelley; New Yorker Shelley.