Abstract
Since 2017, the “X Is All You Need” title construction convention in machine learning has been all paper authors have needed to avoid expending precious mental energy introducing their contributions to the field. Indeed, over 300 such papers have been published between 2017 and 2025. The set of things claimed to be “all you need” is large enough to be self-evidently self-contradictory and yet is still growing. Among the things that are purportedly all you need: attention, patches, morphisms, Hopfield networks, a 23-megawatt data center, sex, propaganda, dreaming, procrastination, a dictator, and pretraining on the test set (the latter result somewhat undermining the gravitas of the form). We present an empirical and philosophical analysis of the phenomenon and conclude with recommendations (“please stop”). We further demonstrate, via an argument we concede proves its own futility, that the question “what is all you need?” is not merely unanswered but formally unanswerable. Naturally, we do not resolve this. We are, however, quite pleased with it.
1. Introduction
Vaswani et al. (2017) published “Attention Is All You Need”, introducing the transformer architecture. Their contribution proved genuinely transformative (a word we use in both the colloquial and the algebraic senses, the latter of which the authors would presumably prefer), and the paper has since been cited over 173,000 times, ranking it amongst the 10 most cited papers of the 21st century so far and sharing its infamy with historic milestones of scientific progress like cancer genomics and the Higgs boson, neither of which had the temerity to claim that a single mechanism was all you need.
The title proved arguably (but arguably not convincingly) more consequential than the architecture. It launched a naming convention that has since metastasized across the field with concerning vigour. As of early 2026, a curated repository (Nishi, 2024) tracking papers with “all you need” in their titles contains over 300 entries.
Academic paper titles are, at their best, one of the most rewarding literary forms available to scientists. In fewer than twenty words (for the reasonable authors among us) one must attempt to convey the domain, hint at the result, distinguish the work from prior efforts, and make the reader want to cancel their afternoon meetings. Bonus points are available for effective SEO and elegant wordplay. A great title is a miniature work of art.
This paper asks two questions. First, can we please stop using the all you need title template? And second, does advocating against the depressing futility and lack of imagination of the titular template while adopting the very convention we are critiquing undermine our fundamental thesis? We will not fully answer this question. We will however be very precise about the ways in which we fail.
2. The title of the art
Paper titles have been a source of creativity and elegant intellectual achievement for much of humanity’s post-Renaissance trajectory1. Consider Einstein’s understated masterpiece of special relativity “On the Electrodynamics of Moving Bodies” (Einstein, 1905). Or Turing’s world-record lede-burying sub-clause introduction of the Halting problem in “On Computable Numbers, with an Application to the Entscheidungsproblem” (Turing, 1936). Claude Shannon’s foundation of information theory was merely entitled “A Mathematical Theory of Communication” (Shannon, 1948)—the indefinite article “A” doing some heavy lifting permitting other such theories, and not even for a moment hinting that the theory therein described might be exhaustive with respect to one’s information theoretical needs. “Reflections on Trusting Trust” (Thompson, 1984); “Go To Statement Considered Harmful” (Dijkstra, 1968). To the point, descriptive, a gold standard for science communication. “Consequences of Erudite Vernacular Utilized Irrespective of Necessity: Problems with Using Long Words Needlessly” (Oppenheimer, 2006), “You Probably Think This Paper’s About You: Narcissists’ Perceptions of Their Personality and Reputation” (Carlson et al., 2011), “Carbon Monoxide: To Boldly Go Where NO Has Gone Before” (Ryter et al., 2004). Some titles intrigue, some provoke, some delight. All carry specific information about the work they describe, and they help the reader decide in mere seconds whether this paper is relevant to their interests or not.
The very best titles, in our opinion, embody the Ig Nobel philosophy: they make you laugh, and then they make you think. “Object Personification in Autism: This Paper Will Be Very Sad If You Don’t Read It” (White & Remington, 2019), for example, or “Will Any Crap We Put into Graphene Increase Its Electrocatalytic Effect?” (Wang et al., 2020), or “Fuck Nuance” (Healy, 2017).
The medical domain is a rich source of creativity and comedic impulses needing an outlet, resulting in thousands of Shakespearean allusions and hundreds of Back to the Future allusions (Goodman, 2005). (“Head and Neck Injury Risks in Heavy Metal: Head Bangers Stuck Between Rock and a Hard Bass” (Patton & McIntosh, 2008) is particularly noteworthy). “Fantastic Yeasts and Where to Find Them” (Van Dyke & Wormley, 2019), “The Mouth, the Anus, and the Blastopore—Open Questions About Questionable Openings” (Hejnol & Martindale, 2009), “Premature Speculation Concerning Pornography’s Effects on Relationships” (Kohut & Campbell, 2019), and many more—all papers that give you a sense of the topic at hand and the satisfaction of the scientists behind them.
Abuse of titular templates is not new to the “all you need” era. Computer science especially is littered with the carcasses of once-original phrases ridden into the ground by subsequent authors with less imagination and tighter deadlines. Dijkstra’s “Considered Harmful” (Dijkstra, 1968) is the ur-example. Wikipedia catalogues at least 65 papers using the template, and the true count (if one includes blog posts and manifestos) exceeds 4,700. “Global Variable Considered Harmful”, “Csh Programming Considered Harmful”, “Ethnography Considered Harmful”; indeed, a response to Dijkstra entitled “ ‘GOTO Considered Harmful’ Considered Harmful” (Rubin, 1987) utilised the form well, but by 2002 Meyer’s “ ‘Considered Harmful’ Essays Considered Harmful” (Meyer, 2002), the spiritual ancestor of the present pontification, had come to feel necessary.
The memetic “Unreasonable Effectiveness of …,” originating with Wigner’s celebrated essay on mathematics and physics (Wigner, 1960), represents another joy-terminator for the field, siring among others “The Unreasonable Effectiveness of Recurrent Neural Networks” (Karpathy, 2015), “The Unreasonable Effectiveness of Easy Training Data for Hard Tasks” (Hase et al., 2024), “The Unreasonable Effectiveness of Pattern Matching” (Lupyan & Aguera y Arcas, 2026), and the utterly depressing “Untidy Data: The Unreasonable Effectiveness of Tables” (Bartram et al., 2022).
The drive to attune titles to trends in lieu of communicating effectively leads to a plethora of dubious assertions. Attention itself is simultaneously all you need (Vaswani et al., 2017), not all you need (Dong et al., 2021), not all you need anymore (Chen, 2023), and the subject of at least three “considered harmful” critiques of self-attention mechanisms.
The “all you need” convention surpasses its predecessors in one crucial respect: the rate of proliferation. “Considered harmful” took 34 years to reach 65 papers. “Unreasonable effectiveness” has generated a few dozen in 65 years. “All you need” has produced over 300 papers in eight years, an alarming rate of infection. Unlike its predecessors, which at least convey a specific rhetorical stance (disapproval; wonderment), “all you need” conveys nothing but undifferentiated gentle positivity.
The “all you need” trend specifically undermines the properties of good titles: it preserves noise while discarding signal. The $X$ is the only variable, while everything else is boilerplate. “Patches Are All You Need” (Trockman & Kolter, 2022), “Morphism Is All You Need” (Sheshmani & You, 2022), “Procrastination Is All You Need” (Liguori, 2024)—identical except for a single noun, telling the reader nothing beyond a vague gesture at the topic at hand. The approach? The result? The domain? The scale? The users of this titular approach do not care for these apparent irrelevancies of the scientific oeuvre.
Academic paper titles are ultimately one of the few remaining spaces in scientific discourse where a human is permitted to be clever. Abstracts are functional, methodologies are constrained (and unrewarding at best). The title is the one place where one can wink knowingly at the reader. Collapsing this space into the LaTeX equivalent of an IMDb genre tag is a grave loss to both the culture of science and the ability to use more than 4/5ths of a paper title to search for it. It is this loss that motivates the present paper.
3. A survey of alleged sufficiency
We analysed a dataset of 307 papers containing “all you need” in their titles, published on arXiv between November 2015 and January 2026. There was 1 paper in 2017 (the original sin), 5 in 2018, 16 in 2019, 26 in 2020, 45 in 2021, 55 in 2022, 80 in 2023, and over 90 in 2024.
The dataset makes evident the extent of the issue. “Attention Is All You Need” (Vaswani et al., 2017) was the original; given that the transformer also requires feedforward layers, residual connections, layer normalisation, positional encodings, and several petadollars of cloud compute, it was something of a bold claim already. “CNN Is All You Need” (Chen & Wu, 2017) appeared the same year, entailing that both attention and convolution are individually all you need, meaning at least one of them is not. “Attention Is Not All You Need” (Dong et al., 2021), “Not All Attention Is All You Need” (Wu et al., 2021), and “Attention Is Not All You Need Anymore” (Chen, 2023) followed several years later, forming a dialectical trilogy we definitively did not need. “Hopfield Networks Is All You Need” (Ramsauer et al., 2020), “All You Need Is Sex for Diversity” (Simões et al., 2023), “A Random Dictator Is All You Need” (Arieli et al., 2025), “Propaganda Is All You Need” (Kronlund-Drouault, 2024), and “Procrastination Is All You Need” (Liguori, 2024) pushed the format unnecessarily to its limits.
A serendipitous feature of the corpus is that it provides rigorous empirical bounds on the cost of all you need. At the lower extreme, Picard (2021) claims that Torch.manual_seed(3407) is all you need. At the upper extreme, Albanie et al. (2022) claim that a 23-megawatt data centre is all you need. These two papers, taken together, define what we presently term the Sufficiency Cost Spectrum (SCS).
For the lower bound—the cost of running the Torch library manual_seed(3407) function—we calculate a cost of roughly 3.7 nanodollars: the integer 3407 requires $\lceil\log_2(3407)\rceil = 12$ bits to store. At current DRAM commodity pricing of approximately US$2.50 per gigabyte, 12 bits cost roughly $3.7 \times 10^{-9}$ dollars. The energy required to write the 12-bit representation of 3407 to DRAM, using the Landauer limit (Landauer, 1961) as a lower bound on the thermodynamic cost of information storage ($kT \ln 2$ per bit at room temperature), is approximately $3.4 \times 10^{-20}$ joules, or roughly 34 zeptojoules—less energy than is released by a single molecule of ATP during hydrolysis (roughly 50 zeptojoules), meaning that the lower bound of energy expenditure to attain all you need is within reach of a bacterium, provided you can get it to run PyTorch. At the other end of the scale, a 23 MW data centre operating continuously consumes approximately 201,480 megawatt-hours per year. Using the US industrial average electricity rate of approximately US$0.075/kWh, the annual electricity cost alone is roughly US$15.1 million. Adding cooling infrastructure (typically 30–40% overhead via PUE), hardware amortisation, real estate, networking, and so on, a conservative estimate of total annual operating cost is US$60–80 million. At the midpoint of that estimate, that translates to roughly US$133/minute—or, for the average reader, a cost of $400 in hypothetical all-you-need data-centre time to read this paragraph, a sum we hope you feel was well-invested.
The upper bound for the cost of all you need therefore exceeds the lower bound by around sixteen orders of magnitude—more precisely, the estimate for the upper bound of the cost of all you need is roughly nineteen quadrillion times more than the estimate for the lower bound. For comparison, the ratio between the gross world product and the price of a single mass-market paperback is approximately $10^{14}$: the Sufficiency Cost Spectrum comfortably spans, by two orders of magnitude, the entire range of human economic activity. A rational agent seeking “all it needs” is therefore faced with a decision space spanning 16 orders of magnitude in cost, with no guidance from the corpus as to where on this spectrum the true sufficiency lies. We are unaware of search algorithms that would be all you need to resolve this problem.
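For the sceptical reader, the arithmetic behind the Sufficiency Cost Spectrum can be reproduced in a few lines of Python. The constants (the DRAM price, the electricity rate, the US$70M operating-cost midpoint) are the assumptions stated above, not measured quantities.

```python
import math

# Lower bound: the cost of Torch.manual_seed(3407) (Picard, 2021).
bits = math.ceil(math.log2(3407))                 # 12 bits to store the seed
dram_usd_per_gb = 2.50                            # assumed commodity DRAM price
lower_usd = (bits / 8) / 1e9 * dram_usd_per_gb    # ~3.7 nanodollars

k_B, T_room = 1.380649e-23, 298                   # Boltzmann constant; room temp (K)
landauer_joules = bits * k_B * T_room * math.log(2)  # ~3.4e-20 J for 12 bits

# Upper bound: a 23 MW data centre (Albanie et al., 2022).
annual_mwh = 23 * 24 * 365                        # ~201,480 MWh/year
electricity_usd = annual_mwh * 1000 * 0.075       # ~US$15.1M/year at $0.075/kWh
total_usd = 70e6                                  # midpoint of the US$60-80M estimate
usd_per_minute = total_usd / (365 * 24 * 60)      # ~US$133/minute

spectrum_ratio = total_usd / lower_usd            # ~1.9e16
print(f"SCS spans {math.log10(spectrum_ratio):.1f} orders of magnitude")
```

Running it confirms the headline figure: roughly sixteen orders of magnitude separate the two bounds.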
4. Taxonomies Are All You Need
Having examined the empirical manifestation of this phenomenon, we now turn our attention to offending the theorists as well. We are confronted with a methodological crisis: how does one organise a corpus of several hundred mutually contradictory claims of universal sufficiency? Before we can formally model this insufficiency in subsequent sections, we must first consider the mechanism by which machine learning researchers sort their world.
We contend—and will presently argue, at some length and with admittedly questionable rigour—that the act of taxonomisation is the irreducible base operation of all intellectual inquiry, and that therefore taxonomies are, in a sense we will make precise, actually all you need.
Consider any domain of knowledge $D$. To say anything about $D$, one must first distinguish its objects from one another: this is classification. To say anything useful, one must group those objects by shared properties: this is categorization. To say anything publishable, one must arrange those groups into a hierarchy with labels: this is taxonomy. Every act of scientific inquiry—from Linnaeus’s binomial nomenclature, to the Standard Model’s particle zoo, to the DSM-5’s catalogue of disorders—is, in its basest, purest form, an exercise in sorting things into boxes and then arguing about the boxes. Physics is a taxonomy of forces. Chemistry is a taxonomy of bonds. Economics is a taxonomy of failure.
More formally: let $\mathcal{K}$ denote the space of bodies of knowledge, and let $T\colon \mathcal{K} \to \mathcal{K}$ be the operator that, given any body of knowledge, produces a taxonomy of that knowledge. We claim $T$ is idempotent—a taxonomy of taxonomies is still a taxonomy—and moreover that $T$ is a projection operator onto the subspace of communicable understanding. That which cannot be taxonomized cannot be discussed2; that which can be discussed has already been implicitly taxonomized; therefore $T$ is both necessary and sufficient for discourse. Taxonomies are all you need. With this established, we can now apply the idempotent operator $T$ to the field of machine learning itself, constructing the formal categories of tasks and architectures required to prove that the authors of the aforementioned 307 papers are, categorically speaking, wrong.
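The idempotence claim admits a one-screen sanity check. The taxonomise function below is a hypothetical stand-in for $T$ (it merely sorts a corpus of strings into alphabetical buckets); nothing in the argument depends on it, but it does satisfy $T \circ T = T$ by construction.

```python
def taxonomise(corpus):
    """A toy taxonomy operator: sort a corpus of labels into buckets.

    Applied to something already taxonomised (a dict of buckets), it
    changes nothing -- the operator is idempotent by construction.
    """
    if isinstance(corpus, dict):
        return corpus
    buckets = {}
    for item in sorted(corpus):
        buckets.setdefault(item[0].upper(), []).append(item)
    return buckets

papers = ["attention", "patches", "procrastination", "propaganda"]
once = taxonomise(papers)
twice = taxonomise(once)
assert once == twice  # a taxonomy of taxonomies is still a taxonomy
```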
5. A Category-Theoretic Formalization of Insufficiency
Now let us turn to the second pillar of our thesis.
Definition 1.
Let $\mathcal{P}$ denote the category whose objects are ML tasks (image classification, language modeling, protein folding, etc.) and whose morphisms are task reductions: a morphism $f\colon A \to B$ exists iff a solution to $B$ yields a solution to $A$ via some computable transformation. Composition is given by chaining reductions; identity morphisms are trivial self-reductions. That $\mathcal{P}$ satisfies the category axioms is left as an exercise the reader will not perform.
Definition 2.
Let $\mathcal{S}$ denote the category whose objects are ML techniques, architectures, and mechanisms, and whose morphisms are subsumption relations: $g\colon X \to Y$ exists iff $Y$ reproduces the functionality of $X$, as demonstrated by ablation study. Associativity of subsumption is immediate; reflexivity provides identities. We note that $\mathcal{S}$ is not a preorder, as two techniques may subsume each other without being identical. This phenomenon will feel familiar to anyone who has attended a NeurIPS poster session.
Proposition 3.
Each “X Is All You Need” paper implicitly asserts the existence of a functor $\mathcal{F}\colon \mathcal{P} \to \mathcal{S}$ such that $\mathcal{F}(P) = X$ for all objects $P \in \mathcal{P}$. This is a constant functor. The additional claim that $X$ is “all” you need—that $X$ is uniquely sufficient—asserts that $X$ is a terminal object in $\mathcal{S}$: an object $\top$ such that for every $Y \in \mathcal{S}$ there exists a unique morphism $Y \to \top$.
Corollary 4.
Terminal objects, where they exist, are unique up to unique isomorphism. The corpus asserts the existence of 307 distinct candidates for $\top$ in $\mathcal{S}$. Since $|\top/{\cong}| \leq 1$ by definition and the candidates are pairwise non-isomorphic (attention $\ncong$ procrastination; we trust this requires no proof), we have a contradiction.
Proposition 5.
The constant functor $\mathcal{F}$ does not preserve limits.
Proof.
Consider objects $A$ = “image segmentation” and $B$ = “text generation” in $\mathcal{P}$, with morphisms $A \to C$ and $B \to C$ for some common downstream task $C$ (e.g. “multimodal understanding”). The pullback $A \times_C B$ exists in $\mathcal{P}$; its construction is standard and we omit it here for reasons of space and because we have not actually constructed it. Under $\mathcal{F}$ we obtain $\mathcal{F}(A) = \mathcal{F}(B) = \mathcal{F}(C) = \mathcal{F}(A \times_C B) = X$, collapsing the pullback square to the identity span $X \leftarrow X \to X$, which is the trivial pullback of $X \to X \leftarrow X$. All structural information has been annihilated.
∎
Remark 6.
We propose instead that the honest mapping from $\mathcal{P}$ to $\mathcal{S}$ is not a constant functor but a Grothendieck fibration: for each problem $P$, the fibre $\mathcal{F}^{-1}(P)$ is a nontrivial groupoid of partial solutions, none of which is terminal, all of which require composition via colimits in $\mathcal{S}$. This is less catchy. It is, however, technically correct. This, as has been established in the literature (Cohen & Dietter, 2001), is the best kind of correct.
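The structural annihilation described in the proof of Proposition 5 can be sketched without any category-theory library: model objects as strings and morphisms as ordered pairs, then watch a constant functor (here mapping everything to the hypothetical terminal candidate "attention") erase every distinction.

```python
# A fragment of the problem category P: objects and morphisms.
objects = {"image segmentation", "text generation", "multimodal understanding"}
morphisms = {
    ("image segmentation", "multimodal understanding"),
    ("text generation", "multimodal understanding"),
}

def constant_functor(obj, target="attention"):
    """Send every object of P to the single object `target` in S."""
    return target

image_objects = {constant_functor(o) for o in objects}
image_morphisms = {(constant_functor(a), constant_functor(b)) for a, b in morphisms}

# Three distinct objects and two distinct morphisms, all collapsed:
assert image_objects == {"attention"}
assert image_morphisms == {("attention", "attention")}
```

All structural information is, as promised, annihilated: the image of the entire pullback square is the identity span on a single object.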
6. Does the set of things that are all you need contain itself?
We now extend the formalism of the previous section.
Definition 7.
Let $\mathcal{T} \subseteq \mathcal{S}$ be the full subcategory of alleged terminal objects—that is, the subcategory whose objects are exactly those $X \in \mathrm{Ob}(\mathcal{S})$ for which some paper asserts $\mathcal{F}(P) = X$ for all $P \in \mathcal{P}$. By the survey of Section 3, $|\mathrm{Ob}(\mathcal{T})| \geq 307$.
The present paper contributes a new object to $\mathcal{S}$. Our thesis—that no single thing is all you need—is itself a proposed solution to the meta-problem “what should ML researchers believe about sufficiency?” Call this object $\mathcal{N} \in \mathrm{Ob}(\mathcal{S})$, the insufficiency claim. The question is whether $\mathcal{N} \in \mathrm{Ob}(\mathcal{T})$.
Proposition 8.
The characteristic function $\chi_{\mathcal{T}}(\mathcal{N})$ is undefined.
Proof.
Suppose $\mathcal{N} \in \mathrm{Ob}(\mathcal{T})$. Then this paper asserts a terminal object in $\mathcal{S}$: insufficiency is all you need. But Corollary 4 established that $\mathcal{S}$ cannot support 308 terminal objects any more than it could support 307. Worse: $\mathcal{N}$’s defining property is the non-existence of terminal objects in $\mathcal{S}$, so its membership in $\mathcal{T}$ entails its own negation. Contradiction.
Suppose instead $\mathcal{N} \notin \mathrm{Ob}(\mathcal{T})$. Then this paper’s conclusion is not sufficient—the paper has failed. But if the paper has failed, then the 307 prior sufficiency claims stand unchallenged, and since they are mutually contradictory (Corollary 4 again), no consistent resolution exists, which is precisely what this paper claimed, which vindicates $\mathcal{N}$, which places it back in $\mathcal{T}$. Contradiction.
∎
The structure here is classical. $\mathcal{N}$ is the Russell object of $\mathcal{T}$: the entity whose membership in the subcategory is determined by a predicate that references its own negation. In set-theoretic terms, we are asking whether the set of all sufficient things contains “the knowledge that nothing is sufficient,” and the answer oscillates. In categorical terms, we are looking for a fixed point of the endofunctor $\lnot\colon \mathcal{T} \to \mathcal{T}$ that maps each sufficiency claim to its own negation. No such fixed point exists. The paper is not an object of $\mathcal{S}$ at all; it is a morphism from the corpus to its own contradiction, trapped in the comma category $(\mathcal{T} \downarrow \mathcal{T})$ with no way to project down to either factor.
One could resolve this via a Tarski-style hierarchy of meta-sufficiency claims, stratifying $\mathcal{S}$ into levels $\mathcal{S}_0, \mathcal{S}_1, \mathcal{S}_2, \ldots$ where each level may only make claims about objects in the level below. We will not do this, partly because it would add twelve pages, and partly because it might work and we’ve spent too long writing this already.
7. The Prestige
Recall our initial foundation, in which we established with—we are sure readers will agree—admirable rigour that, in fact, it is taxonomies that are all you need. Every act of inquiry reduces to classification, categorization, and hierarchical arrangement. To understand anything is to taxonomise it. To claim $X$ is “all you need” is to claim that $X$ exhaustively covers its domain; that the taxonomy of relevant tools has exactly one entry.
Recall further that the Russell object $\mathcal{N}$ oscillates: this paper’s own insufficiency claim can neither belong to the subcategory $\mathcal{T}$ of alleged terminal objects nor be excluded from it without contradiction. The taxonomy of sufficient methods is therefore impredicative, containing an object whose membership depends on the answer to the very question the taxonomy was constructed to settle.
Combine these recollections: if (a) all inquiry is taxonomy, and (b) the class of sufficiency claims resists consistent taxonomy, then the act of identifying what is needed is rendered formally impossible by the very act of attempting it. The taxonomy cannot be completed, for adding new papers merely adds new contradictions, and therefore the inquiry itself cannot be completed. Consequently, the project of establishing what “all you need” is cannot be completed. The claim is not merely that no single thing is all you need, but that the question “what is all you need?” is undecidable.
Consider the claim as a decision problem: given an arbitrary ML technique $X$ and an arbitrary task $T$, determine whether $X$ is all you need for $T$. We argue this is at least as hard as the halting problem, by the following informal reduction. To verify that $X$ is “all” you need, you must verify that no other technique $Y$ could supplement or replace $X$—that is, you must enumerate all possible alternatives and demonstrate their redundancy. But the space of possible techniques is recursively enumerable and unbounded (one can always compose, modify, or augment existing methods to produce new ones), and determining whether an arbitrary program $P$ (here, an ML pipeline using technique $Y$) produces output equivalent to another program $Q$ (here, the pipeline using only $X$) on all inputs is reducible to the halting problem by Rice’s theorem (Rice, 1953)3. Specifically, “$X$ is all you need for $T$” is a claim about the input–output behaviour of every possible pipeline for $T$: namely, that none improves on $X$ alone. This is a non-trivial semantic property of programs in the sense required by Rice’s theorem: it is neither vacuously true nor vacuously false (some techniques genuinely are sufficient for some tasks; most are not), and it cannot be determined by syntactic inspection of the pipeline, only by evaluating its behaviour. Therefore it follows directly that determining whether $X$ is all you need is undecidable.
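The asymmetry in the argument above—refutable in finite time, confirmable only after an infinite enumeration—can be made concrete with a toy verifier. Everything here is hypothetical: scalar scores stand in for pipeline quality, and a finite budget stands in for mortality.

```python
from itertools import count, islice

def certify_sufficiency(x_score, rival_scores, budget):
    """Attempt to certify that X is all you need for a task.

    Within any finite budget the verifier can refute the claim (by
    finding a rival that improves on X) but can never confirm it:
    the space of rival pipelines is unbounded.
    """
    for y_score in islice(rival_scores, budget):
        if y_score > x_score:
            return False          # refuted: some Y improves on X
    return None                   # not refuted -- but not confirmed either

# An unbounded stream of rival techniques, none of which beats X.
rivals = (0.5 for _ in count())
verdict = certify_sufficiency(0.9, rivals, budget=10_000)
print(verdict)  # None: the best attainable verdict is "unknown"
```

No choice of budget changes the outcome: sufficiency claims admit only refutation or agnosticism, never confirmation.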
Gödel (Gödel, 1931), we believe, would similarly abhor the “all you need” trend. Suppose a sufficiently expressive formal system $F$ could enumerate all true “all you need” statements. By Gödel’s first incompleteness theorem, if $F$ is consistent, it is incomplete: there exist true sufficiency claims that $F$ cannot prove. If $F$ is complete, it is inconsistent: it proves contradictory sufficiency claims, which feels like a perfectly valid description of arXiv but not a sound way to communicate scientific ideas.
We hope that the combined force of these results demonstrates the disdain with which we, as a community, should treat such bold, unfounded, and unimaginative claims as embodied by these papers. Every “all you need” paper is not merely empirically dubious but logically ineradicable from the category of unverifiable claims. No experiment can confirm them, because confirmation would require exhaustive enumeration of an unbounded alternative space. No proof can establish them, because any formal system powerful enough to express them is too powerful to be both consistent and complete. The entire oeuvre of three hundred and seven papers, and the combined human effort, endeavour, reviewer time, and conference badges behind them, are collectively wasted by this choice of title format.
We refrain from explicitly recommending the field adopt more modest titles. This, of course, would itself be a sufficiency claim (“modesty is all you need”), which we have just shown to be unverifiable. Rational readers are left with no choice but to draw no conclusions or insights from the present paper, and treat it with the reverence an entity entirely devoid of information content must surely deserve.
8. Conclusion
Please can we stop all-you-needing now. I hate it.