The Cult of Intelligence (& the Ultimate Levity), pt. 2

Today’s regnant paradigm (one hesitates to call it a method) of bigger-is-better, of statistics-as-intelligence, or even of “it-from-bit”, surely requires vast technical proficiency to summon up its signs and wonders. To develop an LLM (Large Language Model) takes a team of high-level specialists (or intelligence-cultists) to collect, collate, and curate the gigabytes, terabytes and even petabytes of training data, to set up the necessary parallel computing resources to handle the training, to specify the “transformer” architecture to be trained, to pick out the hyperparameters to be used within that architecture, and so on.

Yet even this proficiency is itself remarkable more for its mindless character than for its actual intelligence. The objective is to create a kind of pipeline–a one-size-fits-all mechanism designed to ingest certain types of data in vast quantities, perform some series of operations on them, and chuck out simulated examples in great quantity. Equally mindlessly, no one can cogently explain why this needs to be pursued in the first place: it is simply judged inevitable.

One of the basic points made by the veteran linguist Noam Chomsky and other critics of the current situation is that, being in essence statistical inference pipelines, LLMs are so general that they can be trained to very high accuracy on literally anything, whether it is truly linguistic or not. In Chomsky’s estimation, “…deep learning is useful, but it doesn’t tell you anything about human language.”

The LLM, in other words, differs profoundly from actual brains in that it makes no intrinsic distinction between learning language and learning gibberish, and thus adds nothing to understanding the difference between sense and nonsense. The pipeline is agnostic about content, whether that of its input or its output; instead, content is reconceived purely in terms of volume and style.

It has, for instance, been known for some time that neural networks can be trained on datasets of purely random labels–effectively gibberish–and still reach very high training accuracies. Furthermore, if trained on data that is corrupted or markedly different from that in their original training set, even the most well-trained artificial neural network models will rapidly “melt” into something else; they do not, for example, have any robustness against false, dangerous or corrupted data and cannot exercise “common sense” or “skepticism” to prevent such data from fundamentally rewiring them–as brains can.
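
To make the random-labels point concrete, here is a minimal sketch (assuming scikit-learn and NumPy are installed; the dataset sizes and network shape are arbitrary illustrations, not any particular published experiment): a small network is fit to inputs whose labels were assigned by coin flip, and it nonetheless reaches essentially perfect accuracy on its own training set.

```python
# Minimal sketch: a small neural network memorizes randomly labeled data.
# Assumes scikit-learn and NumPy; all sizes and settings are illustrative.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))       # 200 arbitrary input vectors
y = rng.integers(0, 2, size=200)     # labels assigned by coin flip: pure "gibberish"

# A small multilayer perceptron, trained until it fits its own training set.
net = MLPClassifier(hidden_layer_sizes=(256,), max_iter=5000, random_state=0)
net.fit(X, y)

print("training accuracy on random labels:", net.score(X, y))
# Typically at or near 1.0: the network memorizes structureless noise just as
# readily as it would memorize language, which is exactly the critics' point.
```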

Indeed, it may be best to think of the modern artificial neural network less as a form of intelligence than as the equivalent of hyperdimensional silly-putty: during the “training” of such a network, it is “squeezed” against the dataset, filling in the textures and features of its surface and smoothly curving (interpolating) between them. On this analogy, then, it is no surprise that the neural network has no innate preference for any kind of structure, linguistic or otherwise, any more than it is a surprise that a ball of silly putty has no “preference” for being pressed against a quarter, a figurine, or a page of newsprint–or all three in succession. The artificial neural network is therefore very close to the ultimate tabula rasa–history-less and without intrinsic qualities, hence something which Chomsky, Gary Marcus and many others argue inherently contrasts with the highly structured, substantially genetically-encoded, and above all highly improbable characteristics of actual brains and actual languages.

Chomsky has been on the record in recent weeks with unsparing criticisms of ChatGPT, LLMs, and the whole current approach to “AI”. He has warned that LLMs such as ChatGPT represent only “high-tech plagiarism” and “a way of avoiding learning”–thus implicating current AI as a kind of anti-intelligence, something that not only lacks understanding, but may have the effect of fundamentally disrupting or distorting intellectual development in humans. And in a recent op-ed in the New York Times, Chomsky, Ian Roberts, and Jeffrey Watumull give possibly their most urgent and complete summary thus far of their concerns over these dangers of the intelligence-cult. These include the inability of LLMs to distinguish between linguistic and non-linguistic training data and their lack of any discernible structure or guiding principles.

But in this op-ed, Chomsky et al. also introduced a much less discussed perspective. They concede that LLMs, when trained, become “increasingly proficient at generating statistically probable outputs”. But this turns out to be just the problem. Chomsky et al. quote Karl Popper, the eminent philosopher of science:

…we do not seek highly probable theories but explanations; that is to say, powerful and highly improbable theories.

From this, they conclude the following:

True intelligence is demonstrated in the ability to think and express improbable but insightful things.

This last statement could be taken as the exact maxim of the variance in its struggle against the mean–a struggle that leads not only away from the mean, but from statistics altogether.

As we consider this maxim, we begin to realize how much the relationship between probability and intelligence lies at the heart of the question of what intelligence is. As we reflect still further, the whole conception that intelligence can spontaneously arise (or in the current argot from nonlinear dynamics, “emerge”) from a purely agnostic, “silly-putty-like” statistical approach, suddenly appears uniquely vulnerable.

On the one hand, the intelligence-cultists’ dedication to a statistical view of intelligence inevitably leads them to the notion that intelligence may be found by optimizing probability: this is encapsulated in the gradient-descent method by which neural networks are trained, which amounts to a search for a (locally) maximum-likelihood model–the model under which the training data is most probable. This is very clearly taking the mean’s side of the battle: understanding is nothing more than the search for central tendencies, which in turn are found by (howsoever cleverly) sifting through aggregates of “data”. The larger the aggregates, the better the estimate of the mean. Intelligence, strangely enough, is thus understood as being based on mediocrity.
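
A toy illustration of this point, using only NumPy (the data, step size, and iteration count are assumptions made for the sketch): gradient descent on a squared-error loss, which is the negative log-likelihood of a simple fixed-variance Gaussian model, does nothing but home in on the sample mean of the data. The “optimal” parameter it discovers is literally the central tendency.

```python
# Toy sketch: gradient descent on squared error converges to the sample mean.
# Assumes NumPy; the data, learning rate, and iteration count are illustrative.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=3.0, scale=1.0, size=1000)   # any aggregate of observations

theta = 0.0              # the "model": a single constant prediction
learning_rate = 0.1
for _ in range(200):
    grad = np.mean(2 * (theta - data))             # gradient of mean squared error
    theta -= learning_rate * grad                  # step toward the most probable model

print("gradient-descent estimate:", round(theta, 4))
print("plain sample mean:        ", round(float(np.mean(data)), 4))
# The two numbers agree: the "learned" parameter is just the central tendency.
```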

On the other hand, those who instead insist on the inherent deficiencies of these models, such as Chomsky et al., Marcus, Filip Piekniewski, and many others, are fighting on behalf of the variance: they deal not in large numbers, but in counterexamples, adversarial cases, and “fat tails”, designed to show how models like the LLMs, once nudged out of the “mediocristan” defined by their training data, make wildly absurd errors no half-sane (or even half-intelligent) human ever would. These counterexamples are unique, flukey, even heroic in their defiance of the mean’s relentless gravity.

For these critics, and with reference to Chomsky et al.’s maxim, gradient-descent alone can never yield intelligent systems, precisely because it looks for what is most probable and therefore completely misses the statistically negligible yet innumerable “heroic” cases that would nullify the mindset of central tendency and pipelines. Moreover, because the greatest advances in scientific understanding have always depended not on huge aggregates of raw data sucked through one-size-fits-all pipelines, but on the exquisitely purposeful design of very unnatural, totally improbable combinations of observations and phenomena–that is, on experiments–it follows that the widespread adoption of purely statistical methods will mean no less than the end of scientific understanding.

Given these considerations, any actual intelligence on the part of the deep learning engineers–to say nothing of the corporations blindly rolling the LLMs out with almost no transparency as to how they are built and trained–effectively averages to zero. For it is manifest neither in the final result of these engineers’ labors–the trained LLM pipeline–nor in the outputs of that pipeline. It appears, instead, in these engineers’ handling of the improbable, the intrusions of the variance, the various technical snags handled en route to the final product.

But any such intrusions of actual, variance-delving intelligence are bracketed out of the final product. The goal, clearly, is to eliminate any evidence of them (ultimately, perhaps, by having LLMs replace programming jobs altogether, at which point the entire process will be, paradoxically, truly intelligence-free). The only concern seems to be that the pipeline run smoothly enough to generate plausible illusions on demand, in great quantity, and seemingly without any outside tweaking or know-how.

Again, what is singular here is not the capability of the model, nor the indefatigable troubleshooting and exception-handling brilliance of the engineers, but the way the whole process, from end to end, admits effectively no insights. The snowballing complexity of the neural nets masks a massive intellectual simplification in the other direction, since the only “experiment” we can talk about is “what happens if we build indefinitely larger transformers, etc.?”

The resulting model itself is unfathomable, an “emergent” artifact of self-organized nonlinear complexity, a “black box” as black as any ever devised (or in the phraseology of Yudkowsky, “inscrutable matrices of floating-point numbers”). Even leaving aside the oddly circular justification of its “inevitability”, the present pursuit of AI, even as it presumes to bottle the secret of intelligence and even to knock on the door of sentience, is, paradoxically, a profoundly unconscious activity.

Famed complexity researcher and physicist Stephen Wolfram summarizes this situation judiciously:

“[Designing and training neural networks] is basically an art. Sometimes—especially in retrospect—one can see at least a glimmer of a “scientific explanation” for something that’s being done. But mostly things have been discovered by trial and error, adding ideas and tricks that have progressively built a significant lore about how to work with neural nets.”

The accumulation of such “lore”, without insight into any inner workings, may in itself be a kind of intelligent activity, much as the building of the LLM pipeline itself requires some real intelligence. It may even be an “art”. But it is very far from anything we could consider science. Instead, the LLMs, as well as the myriads of other hypercomplex statistical models that fall under the term “artificial neural networks”, continue to exemplify what the late biologist Sydney Brenner described as “low input, high throughput, no output science”.

Many retort that these are just the insecurities of old fuddy-duddies whose time in the sun has long passed. Chomsky is 96 and long retired, and by his own admission his conception of language as an intricately structured whole–with innate, computationally formulable features stemming from precise cognitive and physiological considerations–has fallen out of favor in much of the linguistics community, more and more of which instead embraces the essentially behaviorist/Skinnerian paradigm of the mind-as-black-box.

Under this paradigm, results supersede theory, and the tabula rasa is practically a given. The researcher’s aim should be not understanding inner states or innate endowments so much as “behavioral engineering”: controlling and conditioning external stimuli to achieve desired behaviors (under which Skinner included language as “verbal behavior”). The similarities between the process of imprinting or “nudging” behaviors via statistically-determined operants, and that of training neural nets by “pressing the silly putty against the training data”, are obvious, particularly in their indifference to mechanism, explanation, or mind. Behaviorism, in fact, is deliberately mindless.

Chomsky’s intellectual disdain for Skinner’s conception of language-as-behavior, and the intellectual animosity between their respective camps, is rather legendary. Further, given the strong and widely acknowledged similarities between behaviorism and the current AI paradigm, we should not be surprised at Chomsky et al.’s near-revulsion at recent developments. Yet given the range of howling errors produced by LLMs–these essentially behaviorist magnum opuses of the cult of intelligence–Chomsky’s remarks should still give us pause.

On the other side, there is a curious lack of principled reply. This is not necessarily surprising, since the paradigm of current AI basically contains no principles as such (nor any great interest in finding them). It instead returns always to the emergentist hope that, if artificial neural networks of sufficiently high complexity are built and trained with enough of the “right” data, there will simply “self-assemble” or “emerge” an intelligence comparable in scope, if not exact character, to that of the brain. The mean, if made large enough, will simply create its own exceptions and variances, in just the right structure.

These behaviorist-emergentists sometimes suggest that cognition is probably too complicated to be understood theoretically anyhow–which, given the well-entrenched diminishing returns on research in all sorts of complex fields, may have some truth. If the human mind is “cognitively closed” to understanding things as complex as how intelligence comes about, then indeed the only possible way to create AI is to hope that it will simply build itself. (Note that Chomsky himself is not averse to the idea of cognitive closure, or “mysterianism”, yet maintains his support for the principled, rather than the black-box statistical, approach to scientific understanding.)

Note that with this admission, we have moved boldly over the line from scientific work to magical work. The pipeline-builders, at once clever and clueless, sophisticated and yet completely simplified, now robe themselves as wizards, chanting invocations whose workings they know not. Yet the dimensions of this shift are not appreciated, least of all by the behaviorists, who generally seem too swept away by their own spells to concern themselves much with the potential risks should the long-awaited “emergence” actually happen; astonishingly, their concerns, when voiced at all, are mostly confined to the worry that the AI might simply be rude, or could produce politically incorrect conclusions or language.

What else would we expect from intelligence-cultists, the purveyors of statistical, mindless intelligence so-called?
