The Search for Salience
As a software engineer, I’ve always been struck by something that feels deeply wrong.
Human technologies like language, mathematics, and logic are extraordinarily precise. Numbers are exact. Words are discrete. Logical propositions are either true or false. There is very little room for fuzziness.
But the human brain is nothing like that. It is an analog system built from living cells, communicating through slow, noisy signals, coordinating their activity across time. There are no registers, no counters, no symbols being shuffled around.
And yet somehow this messy biological substrate reliably supports arithmetic, formal logic, and mathematical proof.
For a long time, this arrangement struck me as grotesquely inefficient, almost an insult to my engineering instincts.
The turning point came from a much more commonplace example.
Consider the words on the page of a book. At the level we read them, they are clean and discrete. A letter is a letter. A word is a word. Typography works precisely because small variations don’t matter.
But zoom in. The letter A dissolves into irregular splotches of ink. Zoom further and it becomes fibers, pigments, molecular interactions, nothing even vaguely “alphabetic.”
Discreteness appears at the level of description where it becomes useful.
For a long time, my mental picture was essentially hierarchical but still symbolic. I imagined abstract concepts living at the “top” of the hierarchy. Somewhere up there would be a neuron, or perhaps a small cluster of neurons, whose job was to represent the number five, or the word tree, or the concept of justice.
Lower levels would handle sensory detail and motor control, while higher levels would converge toward these increasingly abstract, symbol-like representations.
But this picture doesn’t really survive contact with what we know about brains. Neurons die. Connections change. Context matters. The same concept is used differently in different situations. There is no stable place for a single neuron, or even a fixed group of neurons, to permanently “be” the number five.
Once that picture collapses, the question changes.
The problem is no longer how the brain manages to store discrete symbols using an unsuitable, analog substrate. The problem is why I expected those symbols to exist anywhere in the first place.
Abstract concepts like five are not things the brain contains. They are patterns of activity that recur across many contexts, supported by overlapping and constantly changing neural resources. What remains stable is not a symbol, but a relationship: how a pattern behaves when it is used, combined, or contrasted with others.
From this perspective, the brain is not inefficiently implementing digital objects. It is doing something much more subtle. It is learning which distinctions matter, which similarities can be ignored, and which patterns are worth stabilizing because they continue to be useful.
Once we abandon the idea of abstract concepts as discrete symbols stored somewhere in the brain, a different question takes its place. If nothing inside the system is inherently symbolic, how does any structure remain stable long enough to be useful at all?
The answer is prediction.
In a universe that changes over time, persistence is never free. Any system that lasts must, in some sense, anticipate the conditions it will encounter. For biological organisms, this means building internal models of the world that can be consulted to stabilize perception in the present and to guide action into the future.
But such a model cannot include everything. The world contains vastly more detail than any system could track. Some features must be preserved, while others must be ignored. The crucial task is not accuracy in the abstract, but usefulness: identifying which regularities are stable enough to support prediction and which variations can be safely discarded.
This process, continually revising an internal model to reflect what remains relevant across changing circumstances, is what we call intelligence: the capacity to construct and maintain a world model that supports effective prediction and adaptation over time, by abstracting the features of the environment that matter for persistence.
A world model that supports prediction cannot afford to start from scratch each time. The situations a system encounters are never identical, but they often overlap. Features recur. Relationships repeat. Regularities appear again and again in slightly different forms.
When the same internal resources are recruited across multiple successful predictions, reuse becomes inevitable. Whatever participates in many such predictions is reinforced simply because it is useful in more than one place. Over time, these reused patterns become easier to activate, more stable, and more likely to be recruited again.
This process does not require the system to “notice” anything abstract. It does not decide to form concepts. It merely retains what continues to work.
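To make this dynamic concrete, here is a deliberately crude toy simulation. It is purely illustrative, not a model of any real neural mechanism; the feature names and the reinforcement and decay constants are arbitrary assumptions of mine. Features that happen to participate in successful predictions are strengthened slightly; everything else decays.

```python
import random

random.seed(0)

# Five hypothetical features: three recur usefully across contexts,
# two are situation-specific noise that never supports a prediction.
features = ["edge", "color", "motion", "noise-a", "noise-b"]
strength = {f: 1.0 for f in features}

for context in range(1000):
    # In each context, some random subset of features happens to be active.
    active = random.sample(features, k=2)
    # By construction, only the non-noise features ever contribute
    # to a successful prediction in this toy world.
    useful = [f for f in active if not f.startswith("noise")]

    for f in features:
        if f in useful:
            strength[f] += 0.1   # reinforced by successful reuse
        else:
            strength[f] *= 0.95  # unused or unhelpful: it fades

print({f: round(s, 2) for f, s in strength.items()})
# Features reused across many contexts settle at a stable strength;
# the noise features decay toward zero.
```

Nothing in the loop decides what matters. Persistence under reuse is the only criterion, and the stable features are simply what is left.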
As reuse accumulates, something recognizable emerges. Patterns that were once tied to specific situations begin to detach from their original contexts. They no longer encode particular events or sensory details, but relationships that remain relevant across many circumstances. What we later describe as abstraction is nothing more than this gradual shift: from representing what happened once to representing what tends to happen.
This is why abstraction is inseparable from generalization. A pattern that cannot be reused is not an abstraction. It is a memory. Only patterns that survive repeated use across changing contexts earn a place in the model.
The same logic applies at every scale. At the level of synapses, connections that consistently participate in successful activity are strengthened, while others fade. At the level of circuits, motifs that prove useful are reused in new combinations. At the level of cognition, concepts are assembled from overlapping components that have already earned their stability elsewhere.
Reuse, not precision, is what gives abstractions their power. A representation that is too specific may be accurate in one situation, but it will fail as soon as conditions change. A representation that is slightly coarser, that ignores some detail in favor of stability, can be carried forward and applied again.
Seen this way, abstraction is a natural consequence of building a predictive model under constraint. Anything that persists within such a model does so because it has proven itself reusable.
There is a simple, ancient algorithm for identifying prime numbers known as the Sieve of Eratosthenes. It does not begin by defining what a prime is and then testing each number against that definition. Instead, it works by repeatedly applying the same local operation and observing what survives.
Begin with a list of integers greater than one. Start with the smallest number on the list and mark it as significant. Then move forward through the list in steps of that size, removing every number you land on. These removed numbers all share a common structure: they can be expressed as multiples of the number you started with.
Once you reach the end of the list, return to the smallest number that has been neither marked nor removed, and repeat the process. Mark it, step through the list again in increments of that size, and remove what you encounter. Continue until no further removals are possible.
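For concreteness, here is that procedure as a short program. This is a minimal sketch in Python; the names sieve, candidates, marked, and removed are mine, and it follows the prose literally, beginning each pass at twice the marked number rather than using the usual optimization of starting at its square.

```python
def sieve(limit):
    # Begin with a list of integers greater than one.
    candidates = range(2, limit + 1)
    removed = set()
    marked = []

    for n in candidates:
        if n in removed:
            continue
        # Mark the smallest number that is neither marked nor removed...
        marked.append(n)
        # ...then step through the list in increments of that size,
        # removing every number landed on (the multiples of n).
        for multiple in range(n + n, limit + 1, n):
            removed.add(multiple)

    # Nothing here ever tests a number for primality; the marked
    # numbers are simply the ones that survived every pass.
    return marked

print(sieve(30))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```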
What remains at the end of this process are the prime numbers. But notice how they were discovered. No number was ever tested directly for “primeness.” Instead, the same operation was reused in many contexts. Each pass eliminated numbers that shared a predictable relationship with the one under consideration. Primality emerged as the property of numbers that consistently failed to be eliminated.
The sieve does not construct primes. It reveals them by repeatedly discarding what does not generalize. What survives is not defined in advance; it is identified by persistence under reuse.
What makes the sieve so effective is not the sophistication of its operation, but the fact that the same operation is reused across many contexts.
Intelligence works in much the same way. A world model is not built by cataloging every detail of experience. It is shaped by repeatedly applying the same internal machinery across different situations, reinforcing what continues to matter and allowing the rest to fall away. What we call abstraction is what remains after enough contexts have been traversed.
Seen this way, the construction of a world model does not require any supervisory process that decides, in advance, what is important. Salience emerges from the same dynamics that shape the model itself. Patterns that continue to participate in successful prediction are reinforced; patterns that stop contributing quietly fall away. Over time, what stands out is not what was declared relevant, but what proved itself so through repeated use.
Intelligence, on this view, is not a centralized faculty or a special algorithm. It is the cumulative result of a simple, unguided process: retaining what continues to matter, and discarding what does not, as a system learns how to persist in an ever-changing world.
Modern language models make this process unusually visible. Trained entirely through prediction, they develop internal structure by retaining what continues to matter across contexts and discarding what does not. Human development is vastly more complex, but the same logic applies: infants are not taught relevance explicitly. It is discovered through repeated interaction with a changing world.