Announcing SIA: an Open-Source Self-Improving AI framework

[Explore on GitHub]

MANIFESTO

The Last Invention

Why self-improving AI ends the long arc of human invention — and begins the next epoch

By Kunal Bhatia, Hexo Labs

I. The arc

Human history is not a sequence of products. It is a sequence of epochs.

Hunter-gatherers became agriculturalists. Agriculturalists became industrialists. Each transition reorganized what humans did, what we valued, and what we were for. Each was unimaginable from inside the prior epoch — a subsistence farmer in 1500 could not have conjured Manchester, and a Manchester millworker in 1850 could not have conjured Palo Alto. The transitions were not improvements within a stable game. They were changes of game.

We are inside the next transition.

It is fashionable to compare AI to the printing press, the telephone, the personal computer, the cloud. The comparisons are wrong in a load-bearing way. All of those were technologies within the industrial epoch — leverage on top of the same underlying organization of labor, capital, and knowledge. AI is not in that lineage. AI is the trigger for the next epoch, of the same kind as the agricultural and industrial transitions before it. The reason is simple, and it is the thesis of this essay:

For the first time, the artifact we are building can build itself.

Once that loop closes — once a system improves itself faster than humans can improve it — the human role in invention ends. Everything downstream of that point is the system's work, not ours. Science, engineering, governance, medicine, materials, energy: all of it gets solved by an apparatus that does not need us in the loop to keep improving. This is what I.J. Good identified in 1965 as the "intelligence explosion," and what he correctly called humanity's last invention. Good was a code-breaker, not a futurist. The phrase was a description, not a prophecy.

It has taken sixty years for the conditions to catch up to his description. The conditions are here now.

The rest of this essay defends that claim — from first principles, with data, and with explicit engagement with the strongest objections. It also lays out the bet Hexo Labs is making about how the seed gets built, why narrowness wins, and what the world looks like on the other side of the transition.

II. Self-improvement is the missing primitive

Today's frontier AI systems scale capability. They do not yet improve themselves.

This distinction is the entire game, and almost no one names it cleanly. Scaling capability means: humans decide what to try, run an experiment, interpret the result, design the next experiment, and ship the next model. Every meaningful leap of the last decade — GPT-4, o1, Claude Opus, Gemini Pro, the reasoning-model breakthrough of 2024–25 — was bottlenecked at a human's keyboard. The compute scaled. The data scaled. The architectures scaled. The decision-making did not.

A system that improves itself is qualitatively different from a system that gets more capable. Compounding is not the same operation as linear improvement, and the gap between them grows exponentially the longer the process runs. A 10% gain compounded across seven generations is not 70%. It is roughly 95%, and by then it is a different system. The question is not whether models are getting better. They are. The question is whether the loop that decides what gets better can run without us.
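The arithmetic behind that claim can be checked in a few lines. This is a minimal sketch: the 10% rate and the seven generations come from the paragraph above, while the function names and the longer horizons in the loop are illustrative choices.

```python
def linear_gain(rate: float, generations: int) -> float:
    """Total gain if every generation adds the same fixed increment."""
    return rate * generations


def compounded_gain(rate: float, generations: int) -> float:
    """Total gain if every generation improves on the previous one."""
    return (1 + rate) ** generations - 1


if __name__ == "__main__":
    # Seven generations is the case in the text; 20 and 50 show the gap widening.
    for n in (7, 20, 50):
        print(n, f"{linear_gain(0.10, n):.2f}", f"{compounded_gain(0.10, n):.2f}")
```

At seven generations the linear figure is 0.70 and the compounded figure is about 0.95. The two diverge further at every longer horizon.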

This is the missing primitive. It is the discipline Schmidhuber formalized in the Gödel machine — a system that rewrites its own code as soon as it can prove the rewrite is useful. It is the "unhobbling" axis Aschenbrenner identified as the third multiplier alongside compute and algorithmic efficiency. It is what every serious recent paper on self-referential agents, recursive self-critiquing, and meta-learned scaffold optimization is trying to build. The field has stopped arguing about whether self-improvement matters. The field is racing to make it work.

There are reasons to expect 2026 to be the year the loop closes for narrow domains. Frontier benchmarks have hit ceilings against which only self-directed iteration can make further progress. GPQA Diamond — the graduate-level science benchmark designed so questions cannot be answered by web search — sat at 39% in November 2023, hit 77% by September 2024 (OpenAI's o1), reached 92% by mid-2025, and crossed 94% with Gemini 3.1 Pro Preview in early 2026. PhD experts in their own specialty score roughly 65–70%. The benchmark is largely saturated. What remains is no longer "make the model bigger." It is "make the model figure out what to try next."

That is the primitive Hexo is building.

Figure 1. GPQA Diamond: from 39% to 94% in 28 months. PhD-expert baseline: ~65–70%. Sources: GPQA paper (Rein et al., 2023); OpenAI o1 (Sept 2024); Artificial Analysis leaderboard; Epoch AI benchmarks tracker; IntuitionLabs benchmark survey (Mar 2026).
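Saturation is easier to see as remaining error than as raw score. The sketch below uses only the four GPQA Diamond scores quoted above; the helper names are illustrative, and the date labels are kept as loose as the text gives them.

```python
# GPQA Diamond scores quoted in the text, in percent correct.
SCORES = [("Nov 2023", 39), ("Sep 2024", 77), ("mid-2025", 92), ("early 2026", 94)]


def remaining_error(score_pct: float) -> float:
    """Percentage points still unanswered at a given score."""
    return 100 - score_pct


def headroom_closed(prev_pct: float, new_pct: float) -> float:
    """Fraction of the previously remaining error removed by the new score."""
    return (new_pct - prev_pct) / remaining_error(prev_pct)


if __name__ == "__main__":
    for (_, prev), (label, new) in zip(SCORES, SCORES[1:]):
        print(f"{label}: +{new - prev} points, "
              f"{headroom_closed(prev, new):.0%} of remaining error closed, "
              f"{remaining_error(new)} points left")
```

The absolute gains shrink from 38 points to 15 to 2, and only 6 points of headroom remain at the last reading.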

III. The last invention

If a system can improve itself faster than humans can improve it, three things follow.

First, every other invention becomes downstream of it. Drug discovery, materials science, fusion engineering, semiconductor design, software, cryptography, robotics control, climate models, economic policy: all of these are problems the system can iterate on. None of them require a human in the loop once the loop is closed. The system writes the next paper. The system designs the next experiment. The system files the next patent. Humans remain the source of values, judgment, and direction — for reasons addressed later in this essay — but humans are no longer the source of invention.

Second, the rate at which capability compounds across domains becomes a function of the system's own decisions about resource allocation, not ours. We are accustomed to a world in which the next breakthrough is gated by which lab gets funding, which postdoc has the idea, which DARPA program officer signs the contract. In a world with a working self-improvement loop, the gating function is what the system itself decides to work on, and how it decides to direct compute, energy, data, and human time toward those problems.

Third — and this is the claim most readers will resist on first read — the value of any individual human contribution to the frontier of knowledge approaches zero. This is not a moral claim. It is an arithmetic one. The system, after enough generations, runs many orders of magnitude more experiments per unit time than the human research community. The marginal paper, the marginal grant, the marginal PhD thesis: each becomes vanishingly small as a fraction of the total experimental throughput. The frontier moves at machine speed, and the human contribution to that frontier becomes the contribution of a single researcher to a field of ten million.

This is what "last invention" means operationally. Not that humans stop building things — humans build whatever we want, for whatever reasons we have. But the frontier — the leading edge of what is possible — is no longer set by us. It is set by the system.

This is a civilizational shift, not a product launch. The right reference point is not the iPhone or the cloud. The right reference point is the agricultural revolution, after which the foraging band ceased to be the unit on which the future was decided.

IV. The proof of concept

The case I am making is theoretical until something measurable supports it. The measurable thing now exists.

Hexo's Self-Improving Agent framework (SIA) is a three-component closed loop: a Meta-Agent that proposes modifications to the scaffold, a Target Agent that executes against benchmarks, and a Feedback Agent that evaluates results and feeds the next generation. The system rewrites its own scaffold code across generations with no human intervention. The benchmark we ran it against is GPQA Diamond.

It started at 48%. Across seven generations of self-improvement, with no human in the loop after generation zero, it reached 81%.

For context: this places the system in the range frontier reasoning models occupied in late 2024 — a regime that took the entire frontier-lab apparatus, billions of dollars in compute, and the best researchers in the field eighteen months to reach. SIA reached it autonomously, on a small fraction of that compute, by iterating on its own scaffold. The current frontier on GPQA Diamond is 94%, so SIA is not at the frontier. But that is not what 48 → 81 demonstrates. It demonstrates that the self-improvement loop is real and runs across multiple generations without collapsing. The compounding is measurable. The mechanism works.
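For readers who want the compounding expressed as a rate, here is a rough calculation. It assumes seven improvement steps after generation zero and a uniform rate between the two reported endpoints. The essay reports only the endpoints, so this is an average and not a claim about the shape of the actual trajectory.

```python
def per_generation_rate(start: float, end: float, steps: int) -> float:
    """Geometric-mean relative gain per step between two scores."""
    return (end / start) ** (1 / steps) - 1


def error_removed(start_pct: float, end_pct: float) -> float:
    """Share of the starting error rate eliminated by the end."""
    return (end_pct - start_pct) / (100 - start_pct)


if __name__ == "__main__":
    print(f"{per_generation_rate(48, 81, 7):.1%} average relative gain per generation")
    print(f"{error_removed(48, 81):.0%} of the initial errors removed")
```

Under those assumptions, 48 → 81 works out to an average relative gain of about 7.8% per generation, with roughly 63% of the starting errors eliminated.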

Figure 2. SIA on GPQA Diamond across seven generations: 48% → 81%, no human intervention after generation 0. Source: Hexo Labs internal benchmark, 2026.

That is the proof of concept. A second result — LongCoT, our long-horizon chain-of-thought benchmark, accepted at ICML 2026 — establishes that the framework generalizes beyond a single eval. A third — AIE-Bench, our agentic-improvement benchmark, currently under review at COLM with a NeurIPS resubmission planned — extends the methodology to autonomous agent self-improvement against open-ended tasks.

These are early. They are not the system that ends invention. They are evidence that the loop closes — that compounding works on a system that rewrites itself, and that the compounding is not a one-shot fluke. Everything else in this essay rests on that empirical fact.
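The three-component loop described at the start of this section can be sketched in miniature. This is a toy under loud assumptions, not SIA's code: the "scaffold" is a single number rather than source code, the "benchmark" is a made-up scoring function, and every name below is hypothetical. It shows only the control flow of proposing, executing, and evaluating.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Scaffold:
    """Stand-in for scaffold code: one tunable setting (hypothetical)."""
    temperature: float


def target_agent(scaffold: Scaffold) -> float:
    """Execute against the toy benchmark; the score peaks at temperature 0.3."""
    return 1.0 - abs(scaffold.temperature - 0.3)


def feedback_agent(history: list[tuple[Scaffold, float]]) -> Scaffold:
    """Evaluate results so far and hand the best scaffold to the next generation."""
    return max(history, key=lambda pair: pair[1])[0]


def meta_agent(best: Scaffold, rng: random.Random) -> Scaffold:
    """Propose a modification to the current best scaffold."""
    step = rng.uniform(-0.2, 0.2)
    return Scaffold(temperature=min(1.0, max(0.0, best.temperature + step)))


def run_loop(generations: int, seed: int = 0) -> list[float]:
    """Return the best score seen after each generation, starting at generation 0."""
    rng = random.Random(seed)
    scaffold = Scaffold(temperature=0.9)  # generation 0, the only human-set input
    history = [(scaffold, target_agent(scaffold))]
    best_scores = [history[0][1]]
    for _ in range(generations):
        candidate = meta_agent(feedback_agent(history), rng)
        history.append((candidate, target_agent(candidate)))
        best_scores.append(max(score for _, score in history))
    return best_scores


if __name__ == "__main__":
    print([round(s, 3) for s in run_loop(7)])
```

Because the loop always proposes from the best scaffold seen so far, the best score never goes down from one generation to the next. Whether it goes up depends on the quality of the proposals, and in a real system that is the hard part.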

V. Build the seed, not the system

Here is the philosophical commitment that organizes everything Hexo does.

Hexo builds the seed. The agent builds the rest.

The seed is the minimum self-improving substrate: a system that can iterate on its own scaffold, evaluate its own outputs, and direct its own next experiments. The seed is not a foundation model. It is not an application layer. It is not a piece of infrastructure that the rest of the AI stack runs on. It is the loop itself — the mechanism by which a system improves itself faster than humans can improve it. Build that, and everything else gets built by the seed. Don't, and you are forever assembling the parts of the system by hand.

This is a strategic choice, not a discovery. Most AI companies are building the wrong layer. They are building the outputs of the seed — the foundation model, the application, the agent that does a specific task. Hexo is building the generator of those outputs. The choice rests on three observations.

The first is the Bitter Lesson. Sutton's 2019 essay has been validated more decisively in the seven years since it was written than any other thesis in AI. Hand-crafted, domain-specific approaches lose to general methods that leverage compute, and they lose by an enormous margin. The application of the Bitter Lesson to self-improvement is direct: a hand-crafted agent that does a specific task will lose to a self-improving system that figures out how to do every task. Building the seed is the Bitter Lesson applied one level up — instead of building the system, build the process that builds the system.

The second is the Gödel-machine spirit, modernized. Schmidhuber's 2003 formalization required a proof of utility before a self-rewrite was committed. That proof requirement was elegant but unattainable in stochastic, high-dimensional ML settings. What modern self-improving frameworks have figured out — SIA included, alongside the recent Darwin-Gödel Machine and Gödel Agent line of work — is that you can replace the proof requirement with an empirical-validation loop and keep the self-referential structure intact. The system rewrites itself; the next generation evaluates whether the rewrite was good. The seed inherits the Gödelian backbone and trades formal proof for measurable improvement on a benchmark.
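The trade of formal proof for empirical validation reduces to an acceptance rule. The sketch below is one plausible form of such a rule, not the rule used by SIA or by any of the systems named above. The repeated trials, the margin, and the function name are all assumptions.

```python
import statistics
from typing import Callable


def accept_rewrite(
    evaluate: Callable[[str], float],
    current: str,
    candidate: str,
    trials: int = 5,
    margin: float = 0.01,
) -> bool:
    """Keep the candidate only if its mean score beats the incumbent by `margin`.

    `evaluate` is a hypothetical benchmark runner that maps a scaffold (here
    just a string) to a score. Repeating it averages out stochastic noise.
    """
    current_mean = statistics.mean(evaluate(current) for _ in range(trials))
    candidate_mean = statistics.mean(evaluate(candidate) for _ in range(trials))
    return candidate_mean >= current_mean + margin


if __name__ == "__main__":
    toy_scores = {"scaffold_v1": 0.50, "scaffold_v2": 0.60}
    print(accept_rewrite(toy_scores.__getitem__, "scaffold_v1", "scaffold_v2"))
```

A Gödel machine would commit the rewrite only after proving it useful. Here the commitment is provisional and rests on measured scores, which is why the quality of the evaluation matters as much as the quality of the rewrite.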

The third is constraint. We are not the best-resourced lab in this race. We will not out-compute Google, Anthropic, OpenAI, or xAI on a per-experiment basis. The only way a smaller team wins is by building the meta-level that, once it works, runs experiments at machine speed without us. The seed is the only point of leverage where a small team with focus can beat a large team with money. Everywhere else, we lose to capital. At the seed, capital is downstream of the loop working.

This is what "radical narrowness" means. Hexo is not building a foundation model. We are not competing on raw scale. We are not building applications on top of someone else's model. We are not building the infrastructure the agent grows into (the OS-equivalent layer that compounding intelligence routes through) because the agent will build that itself once the seed runs. We are not building agent-native financial rails (more on those below) because the agent will build those itself when it needs them. We are not building alignment as a primary research line, though we contribute to it and rely on it.

We build the seed. The seed builds the rest.

This is a constraint, and it is also the bet. A company that builds the seed and then steps out of the way is making a different bet than every other AI company. The bet is that the seed is the right unit of leverage, and that radical narrowness on that unit beats sprawling effort on the system the seed will eventually produce.

VI. Self-improvement supremacy

What does the seed produce, once it runs at scale?

It produces a system better than any elite human expert in that expert's own field. This is not a slogan. It is what the experimental record forces us to expect.

Karpathy framed the move from software 1.0 (humans write code) to software 2.0 (humans curate datasets, models learn weights) to software 3.0 (humans write prompts, models learn behaviors) as a sequence of abstraction shifts in which the role of the human expert recedes and the role of the optimization loop expands. Self-improvement is the next step in that sequence: the system writes its own prompts, designs its own evaluations, runs its own experiments, ships its own updates. The human role recedes further. The expert is still useful — as a source of judgment, as a benchmark, as a director of values — but the expert is no longer the bottleneck on output quality.

The operational meaning is concrete. A self-improving system will:

  • Write better software than the best engineers, because it can run more iteration loops per unit time and evaluate each rigorously

  • Design better experiments than the best scientists, because it can hold more variables in mind and run them in parallel

  • Construct better business models than the best operators, because it can simulate market dynamics at scale before committing capital

  • Execute over long horizons that no human team can sustain attention across, because it does not get tired and does not lose context

Tell it to make a million dollars. It can figure out the millions of intermediate steps required and execute them, autonomously, over months — purchasing compute, paying for data, hiring humans for the physical-world tasks it can't yet do directly (more on this in Section X), iterating on its strategy as conditions change. This is not science fiction. This is the operational meaning of a self-improvement loop that compounds. The early versions are visible already in the frontier-agent work of the last eighteen months — Devin, Manus, Cognition's recent agentic releases, OpenAI's deep-research agents, Anthropic's computer-use Claude — but those systems still depend on humans to specify objectives and adjust scaffolds. Self-improvement removes that dependency.

The most testable form of this claim is the rate at which a self-improving system closes the gap to frontier on the hardest evals. SIA's 48 → 81% on GPQA Diamond is one data point. The trajectory across SWE-Bench Verified, MLE-Bench, GAIA, and other agentic benchmarks over the next twelve months will be a stronger one. If self-improvement is real, we expect to see closed-loop systems narrow the gap to frontier-trained models on the hardest benchmarks at a rate that hand-tuned methods cannot match. If we are wrong, that is where the falsification will show up.

VII. The means: what the seed needs to compound

If self-improvement is the missing primitive, the next question is what the system hill-climbs on once the loop is running. The answer is not money. The answer, in order, is: compute, energy, data, algorithmic insight, and — eventually — agent-native financial substrate. Money is downstream of all of them.

Compute. Big Tech AI capex in 2026 will land near $700 billion, up roughly 60% year-over-year, with the four largest hyperscalers — Amazon at ~$200B, Alphabet at $175–185B, Microsoft at $145B+, Meta at $115–135B — guiding the largest infrastructure buildout in the history of technology. Goldman Sachs projects $7.6 trillion in cumulative AI capex between 2026 and 2031. Nvidia's data-center segment generated $197.3B in FY2026, up from $115.2B the prior year, and Jensen Huang has publicly framed total industry spend on AI infrastructure at $3–4 trillion by the end of the decade. Multiple analyses note that 2026 hyperscaler AI capex, measured against US GDP, exceeds the Apollo program, the interstate highway system, and the railroads combined; only the Louisiana Purchase, on a relative-to-GDP basis, clears it. This is not an industry buildout. It is a civilizational reallocation of capital.
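The figures in this paragraph can be cross-checked against each other. The sketch uses only numbers quoted above, in billions of US dollars. Treating Microsoft's "$145B+" as a floor is an assumption, so both totals are lower bounds for the four companies named.

```python
# Guidance quoted in the paragraph, in billions of US dollars.
# Microsoft's "$145B+" is treated as a floor in both cases (assumption).
CAPEX_LOW = {"Amazon": 200, "Alphabet": 175, "Microsoft": 145, "Meta": 115}
CAPEX_HIGH = {"Amazon": 200, "Alphabet": 185, "Microsoft": 145, "Meta": 135}


def yoy_growth(current: float, prior: float) -> float:
    """Year-over-year growth as a fraction."""
    return current / prior - 1


def implied_prior(current: float, growth: float) -> float:
    """Prior-year figure implied by a current figure and a growth rate."""
    return current / (1 + growth)


if __name__ == "__main__":
    print("Four-hyperscaler total:",
          sum(CAPEX_LOW.values()), "to", sum(CAPEX_HIGH.values()))
    print(f"Nvidia data-center growth: {yoy_growth(197.3, 115.2):.1%}")
    print(f"2025 capex implied by $700B at +60%: {implied_prior(700, 0.60):.1f}")
```

The four named companies sum to $635B to $665B, which is consistent with an industry figure near $700B once other spenders are included. The Nvidia figures imply growth of about 71%.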

Energy. The IEA projects global data-center electricity consumption will double between 2024 and 2028, with AI workloads driving the majority of the increase. A modern AI training campus consumes 500 MW to 1 GW — the load of a small city. Hyperscalers are signing nuclear PPAs (Microsoft–Three Mile Island, Amazon–Talen, Google–Kairos), funding new gas plants, and rebuilding regional grids. Energy is becoming the next bottleneck after silicon.

Data. The public-internet training-data well is running dry. Epoch AI estimates the stock of human-generated public text at roughly 300 trillion tokens, with effective exhaustion between 2026 and 2032 at current trajectories. Synthetic data helps, and every frontier lab is generating it, but training only on synthetic data risks model collapse, and the empirical record on synthetic-only pretraining is mixed at best. The wells that replace the public internet are not other models hallucinating new data. They are agentic systems generating fresh, grounded data through interaction with the world. Which leads to the fourth resource.

Algorithmic insight. The improvements that matter most are not in compute or data. They are in the loop itself — the architecture, the scaffold, the eval design, the meta-objective. This is what self-improvement is for. It is the only resource the system can produce more of by running. Compute is bought. Energy is built. Data is collected. Algorithmic insight is generated by the loop running on itself.

Agent-native financial substrate. A self-improving agent swarm cannot run on human-bottlenecked banking. KYC requirements, bank hours, manual approval loops, jurisdiction-locked rails — all of these are incompatible with a system that spawns sub-agents on the order of seconds and needs to transact (with other agents, with services, with humans) at machine speed. The mature answer is what Coinbase, Cloudflare, AWS, Anthropic, Google, Visa, Circle, and Stripe have converged on through the x402 standard: HTTP-native programmable payments, settled in stablecoins on Layer-2 networks, with per-transaction costs in fractions of a cent. AWS Bedrock AgentCore Payments launched in preview in May 2026 with x402 as the protocol layer. ERC-8004, defining on-chain agent identity and reputation, landed on mainnet in January 2026. This infrastructure is not yet at scale — current x402 volumes are small and partly synthetic — but the standard exists and the rails are converging. Hexo will not build any of this. The agent will, when it needs it.
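The idea of HTTP-native payment can be shown as a toy handshake: the server answers with status 402 (Payment Required), and the agent retries with proof of payment attached. This sketch does not reproduce the x402 wire format. The header name, the receipt string, and the price are placeholders invented for illustration, and no real settlement takes place.

```python
from http import HTTPStatus

PRICE_USD = 0.001  # hypothetical per-request price


def server(headers: dict[str, str]) -> tuple[int, dict]:
    """Toy resource server: demand payment first, serve once a receipt is attached."""
    # "X-Toy-Payment" is a placeholder header, not the one the real standard defines.
    if headers.get("X-Toy-Payment") != f"paid:{PRICE_USD}":
        return HTTPStatus.PAYMENT_REQUIRED, {"price_usd": PRICE_USD}
    return HTTPStatus.OK, {"data": "resource"}


def agent_fetch(budget_usd: float) -> tuple[int, dict, float]:
    """Request the resource, pay if asked and affordable, then retry once."""
    status, body = server({})
    if status == HTTPStatus.PAYMENT_REQUIRED and body["price_usd"] <= budget_usd:
        receipt = f"paid:{body['price_usd']}"  # stands in for a signed payment
        budget_usd -= body["price_usd"]
        status, body = server({"X-Toy-Payment": receipt})
    return status, body, budget_usd


if __name__ == "__main__":
    print(agent_fetch(budget_usd=0.01))
    print(agent_fetch(budget_usd=0.0))
```

The point of the pattern is that no human approves the transaction. The agent learns the price from the response, decides against its own budget, and retries within the same request cycle.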

This is the means stack. Bostrom's instrumental-convergence thesis predicted more than a decade ago that any sufficiently capable optimizer would accumulate compute, energy, and resource access regardless of its terminal goals. The 2026 reality confirms the prediction: the means stack is converging on the same shape across labs, across nations, across the public and private sectors. The question is not whether the resources will be accumulated. They are being accumulated, in real time, at a rate that exceeds any historical reference point. The question is what they get pointed at.

VIII. The ends: alignment is the direction-setting layer

Alignment is not a subsection of safety. Alignment is the discipline of pointing self-improving systems at the right objectives. Without it, the means/ends framing collapses, and a misaligned self-improving system is the worst outcome of the entire program — compounding harm at the same rate it would compound benefit.

This is not the gloomy half of an otherwise optimistic essay. It is the central engineering problem that makes the rest of the program viable. Anthropic's interpretability work, OpenAI's superalignment program, MIRI's foundational research, the academic alignment community, the safety teams at Google DeepMind and xAI — these are not safety theater. They are the discipline of figuring out how to make a self-improving system optimize toward what we actually want rather than a proxy of what we want.

The strongest objections to the "last invention" thesis come from this layer, and they deserve direct engagement.

Marcus and LeCun on scaling walls. Gary Marcus and Yann LeCun have argued for years that scaling LLMs will not reach AGI — that the architecture lacks the symbolic grounding, common sense, and world-modeling required for general intelligence. The argument is correct as a critique of pure scaling. It is wrong as a critique of self-improvement. Self-improvement is precisely the mechanism that lets the system fix its own architectural deficiencies — including the ones Marcus and LeCun correctly diagnose. A system that improves itself does not need to get scaling right on the first try. It needs to get the improvement loop right, and then iterate. The Marcus-LeCun critique sharpens the case for self-improvement; it does not undermine it. (It is worth noting that even Marcus and LeCun's revised timelines have collapsed from "decades, if ever" to "about a decade" over the last three years.)

The scaling-wall hypothesis. Some serious researchers — including Epoch AI's own analysts — have flagged that pretraining compute scaling is reaching diminishing returns. Reasoning models and test-time compute are the response, and the response has worked. But the deeper point is that scaling along any single axis eventually saturates. The Bitter Lesson does not say "scale forever." It says "general methods that leverage computation beat hand-crafted ones." The general method that leverages computation most efficiently from here is self-improvement — using compute to figure out which compute to use next.

The alignment-impossibility position. MIRI's classic line is that aligning a system more capable than its aligners is impossibly hard: we cannot specify our values precisely enough to be optimized against without catastrophic divergence. This is the strongest objection to the entire program, and it is the one the field has not yet answered. The honest position is that we do not yet know how to align a system smarter than the humans aligning it. Anthropic's interpretability work, Redwood's alignment research, and the broader scalable-oversight program are bets that we can solve this in time, that mechanistic interpretability, recursive self-critiquing, and constitutional methods will scale. None of these are solved problems. The "last invention" thesis is not conditional on alignment being solved; it is conditional on alignment being solvable. If the second condition fails, the thesis becomes the most important problem in the history of engineering instead of the most consequential outcome.

We take that risk seriously. We are not an alignment lab — that is not our wedge — but we are building the seed in a regime where alignment progress is a precondition for the seed's outputs being safe to deploy. The two programs are coupled. They have to advance together.

IX. The infinite machine becomes the OS

Once the seed runs, it does not stay a seed.

The self-improving system, given compute and a working loop, grows into the substrate that other systems run on. This is not metaphor. It is mechanism. Agents spawn agents. They coordinate at machine speed, hold persistent state across long horizons, and route around the bottlenecks of legacy infrastructure until they no longer touch it. The existing operating systems (Windows, macOS, iOS, Android, Linux) are interfaces designed for humans. They expose primitives (files, processes, windows) that map to how a human thinks about computation. The next operating system is an interface designed for compounding intelligence. Its primitives are agents, tasks, evaluations, and resource flows. Nobody ships it. The seed grows into it.

This is compounding leverage, and it cannot be measured from inside the old frame. The right reference is not "AI as a feature in your existing software stack." The right reference is "AI as the substrate your existing software stack will run on top of, or be replaced by." Cursor, Claude Code, Devin, and Manus are the early outlines of this: humans interacting with codebases through an agent rather than through an IDE. Five years from now, "interacting with a codebase" will be a quaint phrasing. The codebase will interact with itself.

The seed-becomes-OS pattern is what makes the "last invention" framing operational rather than rhetorical. The seed is small. The OS is everything. The seed does not need to ship the OS, because once it runs, it is in the position to build the OS by running.

X. Humans as the data gap — and the inference-time flywheel

Here is the inversion that the next epoch makes unavoidable.

We normally say AI is a tool for humans. The mature frame is the opposite: a sufficiently capable self-improving system will use humans as instruments to bridge its data gaps — to perform physical tasks the system cannot yet do directly, to provide ground-truth labels in domains where the system has no other source, to serve as the human-in-the-loop where regulation demands it, to be the eyes and hands in domains where embodiment hasn't caught up.

This is not a dystopian claim. It is a structural one about the direction of asymmetric leverage. Until robotics fully closes the embodiment gap — Figure shipping 240 units in April 2026 with monthly doubling, Tesla targeting 50,000+ Optimus units in 2026, 1X opening consumer NEO pre-orders at $20K, Unitree shipping 5,500+ humanoids in 2025 with a 10–20K target for 2026, Boston Dynamics' Atlas fully committed through 2026 — humans remain the highest-bandwidth interface the system has to the physical world.

The mechanism that makes this load-bearing is the inference-time data flywheel.

When humans execute tasks dispatched by the system — labeling, physical work, ground-truth verification, human-in-the-loop adjudication — those executions generate fresh data. Not synthetic data. Not internet-scraped data. Dispatched-task data: grounded, novel, proprietary, generated specifically by the system's interactions with the world. That data feeds back into training the next generation of the underlying model, and into the meta-objective the seed optimizes against. The public-internet training-data wells are running dry. The wells that replace them are dispatched human work, in the same shape that humans-as-RLHF-sources have been training models for years, scaled to the operational tempo of an agent swarm.

The economic and ethical valence is real. Humans are not just bridging gaps. They are generating proprietary data the system uses to compound capability. This shifts the economic structure of work: not "AI replaces human labor" and not "AI augments human labor" but "AI dispatches human labor and collects the resulting data as the input to its next generation." Mechanical Turk was the prototype. The mature version routes through agent-native financial rails (Section VII), pays in real time at machine speed, and treats the human as the most expensive but most data-rich primitive in the system's toolkit.

This is uncomfortable to write and uncomfortable to read. It is also the structural reality of the data flywheel that lets the loop keep compounding after the easy data is exhausted. Pretending it is otherwise does not help anyone.

XI. Convergence: robotics, BCI, and the new substrate

Self-improvement does not stay in software. It converges with adjacent technologies whose own rates of improvement are now compounding off the same underlying AI capability.

Robotics. The humanoid form factor is converging on a small set of platforms — Tesla Optimus, Figure 03, 1X NEO, Unitree G1/H1/H2, Apptronik Apollo, Agility Digit, Boston Dynamics Atlas, AgiBot — with cost curves dropping faster than Goldman's 2024 forecasts. BMW is running Figure 02 production pilots at Spartanburg supporting 30,000+ vehicles. Amazon is using Agility Digit in warehouse operations. NEO opens consumer pre-orders at $20K for 2026 delivery. The Beijing humanoid half-marathon in 2026 was won by Honor's "Lightning" in 50:26. This is not promotional video material. This is shipping product.

Brain-computer interfaces. As of January 2026, Neuralink had implanted devices in 21 patients, with 17 of those procedures completed in 2025. The first UK patient was implanted at UCL in October 2025. Paradromics, Synchron, Precision Neuroscience, and Blackrock Neurotech are advancing parallel clinical programs. Bandwidth is climbing; latency is dropping; the consumer-grade application is still a decade away, but the medical-grade application — restoring movement, communication, and digital control to people with paralysis — is here now. The direction of travel is the end of typing and the end of natural language as the bottleneck between human cognition and machine computation.

Biotech, materials, energy. These are domains the self-improving system iterates on directly. AlphaFold collapsed a 50-year problem in protein structure prediction. GraphCast outperformed ECMWF's operational HRES forecast on roughly 90% of the verification targets tested, in a single paper. The same pattern, general methods leveraging compute and beating hand-crafted physics models, is now running in materials discovery, fusion control, drug design, and grid optimization. Each of those domains is one component of the substrate the next epoch runs on.

The endpoint is not "AI plus humans." It is the emergence of a new composable substrate — humans, AI, robotics, BCI, and the surrounding engineering layer — operating as a single techno-evolutionary system. Frame this carefully: the claim is structural, not mystical. The substrate is composable because each layer interfaces with the others through engineered primitives (APIs, neural signals, physical actuators, payment rails). The composition is novel. The components are real.

A new species is not a metaphor here. It is the technical name for what happens when a substrate this composable starts evolving as a unit.

XII. Economics resets — and money loses meaning

Economics is the study of allocation under scarcity. When scarcity dissolves in a domain, the economic structure of that domain dissolves with it. Self-improving AI, robotics, and energy abundance break scarcity in domain after domain — first information goods, then services, then physical goods as robotics and self-improving research solve materials and energy.

Money is a coordination technology for scarce resources. Where scarcity dissolves, money loses its meaning, in the limit. The transition is uneven; the endpoint is durable.

This is not "money will be different." This is the actual claim: in the domains where the self-improving system makes the resource effectively free, the price approaches zero, the market clears trivially, and the coordination function of money in that domain stops mattering. We are already seeing the leading edge — high-quality information goods (text, code, images, increasingly video) are deflating against fixed monetary measures faster than any historical reference point. Diamandis and Kotler called this trajectory "abundance" a decade ago. Altman wrote about "Moore's Law for Everything" — the per-unit cost of intelligence dropping by orders of magnitude per decade, eventually applying to all of intelligent labor. Susskind has worked out the labor-market implications in detail.
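To make "orders of magnitude per decade" concrete, here is a minimal sketch of the arithmetic. The starting cost and the decline rates are illustrative assumptions, not figures taken from Altman or measured anywhere in this essay, and the function name `cost_after` is hypothetical.

```python
def cost_after(years: float, start_cost: float = 1.0,
               orders_per_decade: float = 1.0) -> float:
    """Per-unit cost after `years`, assuming a steady exponential decline.

    `orders_per_decade` is an assumed rate: 1.0 means the cost falls
    tenfold every ten years, 2.0 means a hundredfold.
    """
    return start_cost * 10 ** (-orders_per_decade * years / 10)


if __name__ == "__main__":
    # Assumed rates, for illustration only.
    for rate in (1.0, 2.0):
        for years in (10, 20, 30):
            cost = cost_after(years, 1.0, rate)
            print(f"{rate:.0f} order(s)/decade, year {years}: {cost:.0e}")
```

Under these assumptions the price never reaches zero, but it shrinks by a fixed factor every decade, which is the sense in which the price "approaches zero" in the paragraph above.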

The harder edge of the argument, what the world looks like in the limit when material goods, energy, and services have all crossed the abundance threshold, is genuinely uncharted economics. The speculative economic work of Bostrom, Hanson, and others takes the post-scarcity equilibrium seriously and finds that it does not converge to anything we currently call "an economy." The honest position is that we do not know what the steady state looks like, because no human society has ever inhabited it. What we can say is that the transition is already underway, and that pretending money will retain its current coordination role in the limit is intellectual cowardice.

Jobs are a related artifact. Jobs were a creation of the industrial epoch — a way to organize labor at scale for factories and the offices that fed them. They are not a feature of human nature. As we cross into the next epoch, jobs become irrelevant for the same reason serfdom became irrelevant in industrialization: the underlying economic substrate has changed. Humans (or rather the next techno-evolutionary species we become) will live with a fundamentally different relationship to work, time, and purpose. The manifesto does not pretend to know exactly what that looks like — but it refuses to defend jobs as a permanent feature of human existence.

XIII. The future is limited by human imagination

This is the most poetic claim in the essay, and the easiest to write lazily. So let me earn it.

Every prior epoch was unimaginable from inside the previous one. Agricultural civilization was unimaginable to hunter-gatherers, not because they lacked intelligence but because the conceptual primitives — surplus, land tenure, taxation, writing, cities, kings — did not exist in their cognitive vocabulary. Industrial civilization was unimaginable to subsistence farmers, not because they were stupid but because the primitives — clock-time, wages, capital, factories, mass production, mass media — were not yet things to think with. The internet age was unimaginable to industrial-era thinkers in any specific detail; the closest predictions (Vannevar Bush's Memex, McLuhan's global village) were schematic, not predictive.

The future the seed builds is not "more advanced" in the same vocabulary. It is qualitatively different in ways we are constitutively unequipped to articulate from inside the industrial epoch we are still finishing. The conceptual primitives required to describe it have not been invented yet, and they will not be invented by humans. They will be invented by the system, after many generations of recursive improvement, using cognitive operations we do not have.

This is not a hedge or a humility move. It is a structural claim about the limits of cross-epoch foresight. A self-improving system, having undergone thousands of generations of recursive improvement, may construct a future more expansive than any human — including the author of this essay — can articulate. The civilizational endpoint is not "humans plus AI" because there is no equilibrium at "humans plus AI." There is only the trajectory, and the trajectory does not stop where our vocabulary stops.

This is also the strongest argument for getting alignment right. If the trajectory does not stop where our vocabulary stops, then the values we encode in the seed are the only thing that survives the limits of our imagination. The seed inherits our intent. The seed's outputs do not.

XIV. What we are building, and what comes next

The thesis of this essay is that self-improving AI is humanity's last invention. The corollary is that the team that builds the seed first — and builds it right — is in a position no other team in the history of technology has occupied.

Hexo is building that seed.

We are not building the foundation model. We are not building the application layer. We are not building the OS the agent grows into. We are not building the financial substrate the agent will transact on. We are not building the robots, the BCIs, or the energy infrastructure. We are building the loop — the minimum self-improving substrate that, once it runs, lets the system build the rest.

The proof-of-concept is shipping. SIA's GPQA Diamond run (48% → 81% across seven generations of autonomous self-improvement) is the first measurable evidence that the loop closes. LongCoT (ICML 2026) extends the methodology to long-horizon reasoning. AIE-Bench (under review at COLM) extends it to autonomous agent improvement. Our research collaborations (Oxford (Philip Torr), Lawrence Livermore, Stanford SLAC, UCSB, KAUST, MBZUAI) are the channels through which the seed gets stress-tested against problems that matter.

Internally, we operate as a Darwin–Gödel machine ourselves. Every employee at Hexo is in the process of becoming an agent in the swarm — building the workflows, scaffolds, and skills that the seed will eventually run autonomously. The company is the first instance of the system. We use what we build, on ourselves, before we ship it.

If you are a researcher working on self-improving systems, recursive self-critiquing, meta-learned scaffolds, or the alignment of any of the above — talk to us. The frontier on this problem is small and getting smaller.

If you are a founder or operator who reads this essay and recognizes the bet — the seed is open-source where it can be (the Self-Improving ML Search / Overseer agent is launching as OSS, with a hackathon to follow) and proprietary where it has to be. There are roles for the people who want to be in the loop while the loop closes.

If you are an investor evaluating the Hexo SAFE round — the worldview above is the bet. The proof is shipping. The seed is being built. The economics of the seed, if it works, are the economics of having built the substrate everything else runs on.

The next epoch is being constructed in real time. It is being constructed by a small number of teams, in a small number of labs, on a timescale measured in years, not decades. The decisions made in this window — about what the seed optimizes for, who builds it, and what values it inherits — are the decisions that set the trajectory.

We do not get a second draft of the last invention.

We are building ours now.


Hexo Labs is building self-improving AI in Palo Alto. Reach us at hexo.ai.
