Much ink has been spilled on the question of what Large Language Models (LLMs) like ChatGPT can and can’t do. (In fact, we've written a little ourselves about using ChatGPT for psalm recommendations.) Unfortunately, the spilled ink tends to declare either that LLMs are capable of thought (or at least make good “thought partners”, whatever that means) or that they are “just statistics”, as if this explains them. The first approach supposes that intellect can be an emergent property of math, which is difficult to reconcile with the notion of a rational soul or with the experience of most people who’ve tried to use ChatGPT; the second is reductive to the point of failing to communicate anything. After all, speech-to-text systems, search engines, and the various tools for extracting text from images are all “just statistics” too. Thus, being dissatisfied with the present ink puddle, I shall pour a bit more in by proposing that there are exactly two human-scale tasks that (unaugmented) LLMs are good at: being thesauri[1] and being templates.
LLMs Make Good Thesauri
A thesaurus exists to answer the question, “Could I use a different word here?” Constrained by the demands of covering the vocabulary of an entire language while fitting into a physical book, pre-internet thesauri were more or less restricted to providing a list of words sans context, making them the bane of many an English teacher’s existence as students low on time neglected to acquire that context from other sources. Post-internet thesauri, meanwhile, offer basically the same data, except that the words are now at least hyperlinked to their definitions.
LLMs, on the other hand, know nothing of the meaning of words. To an LLM, a word is just a list of numbers representing which other words it tends to occur near. Using these representations, LLMs can improve on thesauri by effectively encoding equivalences between whole phrases, a task which, up to this point, has been both prohibitively tedious and not obviously beneficial for thesaurus makers. LLMs can then go one step further and rewrite the whole sentence to better fit the changed word when necessary.
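The “list of numbers” intuition can be sketched with toy figures. The three-dimensional vectors below are invented for illustration (real systems learn vectors with hundreds of dimensions from data), but the mechanism is the same: words that occur in similar contexts end up with similar vectors, and a similarity score over those vectors acts as an automatic thesaurus lookup.

```python
import math

# Toy word vectors: each dimension loosely tracks co-occurrence with
# some kind of context. These numbers are made up for illustration;
# real embeddings are learned and much higher-dimensional.
vectors = {
    "happy":  [0.9, 0.1, 0.2],
    "joyful": [0.85, 0.15, 0.25],
    "formal": [0.1, 0.9, 0.6],
}

def cosine(a, b):
    """Cosine similarity: close to 1.0 when two vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# "joyful" lands much nearer to "happy" than "formal" does, so a
# vector-based thesaurus would rank it as the better substitute.
print(cosine(vectors["happy"], vectors["joyful"]) >
      cosine(vectors["happy"], vectors["formal"]))
```

Nothing here requires knowing what “happy” means; the ranking falls out of the numbers alone, which is the sense in which an LLM’s thesaurus abilities are “just statistics” that nonetheless work.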
LLMs can also incorporate information that doesn’t usually fit in a thesaurus, such as how formal or academic a particular phrasing sounds, or whether one version of a sentence is more associated with a particular emotion than another. This is the basis for two other things that LLMs have occasionally been used for: summarization and search. A lot of the work of summarization can be achieved by having the model replace specialized jargon with more informal terms. Search is a little trickier, since you need to augment the LLM with a normal search engine to actually get useful results; but once you’ve done that, the resulting system can often rephrase queries into ones that will find what the user is looking for even when they don’t know the appropriate term.
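The division of labor in that augmented setup can be sketched as follows. Everything here is a toy stand-in: the `rewrites` table plays the role that an actual LLM would play (mapping the user’s phrasing onto the vocabulary the index uses), and the `index` dictionary plays the role of a real search engine.

```python
# Stand-in for the LLM's contribution: rephrasing lay queries into
# the terms the corpus actually uses. A real system would call a
# model here rather than consult a hard-coded table.
rewrites = {
    "my heart races": "tachycardia",
}

# Stand-in for the search engine: a keyword-to-documents index.
index = {
    "tachycardia": ["Article: Causes of elevated heart rate"],
}

def search(query):
    # Rewrite the query if we know a better phrasing, then look it up.
    term = rewrites.get(query, query)
    return index.get(term, [])

print(search("my heart races"))
```

The point of the sketch is that the LLM never needs to know any facts about hearts; it only needs to know that the two phrasings tend to occur in the same contexts, which is exactly the thesaurus skill again.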
LLMs Make Good Templates
The second thing I think LLMs can do is generate templates or examples. As context-sensitive autocompletes trained on most of the internet, they have encountered pretty much any document structure - say, a cover letter - that a user might wish to write, and they can mash such examples together into a sample document that is a few steps closer to what the user is ultimately trying to write than any static template they might look up.
I can, for example, look up templates for professional emails, but they’re likely to be extremely generic. An LLM, meanwhile, can construct an example of a professional email that asks a question relatively close to the one I want to ask, which makes it a lot easier to visualize what I actually want to write. The same approach applies to programming, where the LLM’s function is to write example code nearer to your current use case than would be reasonable to find in the documentation – or, sometimes, example code that should be in the documentation, but currently isn’t.
LLMs Make Bad Everything Else
In one sense, the claim that thesauri and templates are the only uses of LLMs follows more or less directly from their structure as glorified autocomplete engines.[2] They can generate output in the shape of the documents they were trained on, and the only way to really control that output is to have them generate multiple variants and throw away the bad ones. Rephrasing and templating are thus the only uses of an LLM in the sense that you have to restructure a problem to look like one of the two in order to feed it into the LLM in the first place.
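The “generate variants and throw away the bad ones” loop can be sketched in a few lines. Both pieces below are toy stand-ins: `generate_variants` plays the role of sampling several completions from a model (the candidates are hard-coded here), and `looks_formal` plays the role of whatever filter, human or automatic, judges them.

```python
def generate_variants(prompt):
    # A real system would sample these from the model; hard-coded
    # candidates stand in for that here.
    return [
        "yo whats the deadline lol",
        "Dear team, could you confirm the deadline for the report?",
        "deadline??",
    ]

def looks_formal(text):
    # Toy filter: keep only variants with a formal salutation.
    return text.startswith("Dear")

def best_variant(prompt):
    kept = [v for v in generate_variants(prompt) if looks_formal(v)]
    return kept[0] if kept else None

print(best_variant("ask about the deadline"))
```

Notice that the filter judges shape (does it open like a professional email?) rather than truth, which is why this loop works for templates and rephrasing but can’t rescue factual question-answering.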
On the other hand, there is a broader statement to be made. Rephrasing a sentence, or taking a few bullet points and fitting them into a rough template, are distinct from other potential uses of LLMs, such as answering questions, in that they are ways of interacting with text that don’t require external knowledge beyond the statistical relationships between words. If the task at hand is answering questions, then an input such as “How far is the Moon from the Earth in duck beaks?” requires me to know something about both ducks and the orbit of the Moon. But if the task is simply to generate a template of some sort, then all that is needed is for the response to have the right structure – that is, it need only be a number, not the correct number.
Another way to say this is that thesauri and templates are the two uses where “hallucinations” (saying things that aren’t true) don’t invalidate the output. For most of the things we might want computers to help us with, the outputs of a system are judged by how they correspond to the real world, and in those domains the current generation of LLMs will always fail beyond a certain point.[3] Yet in a small handful of situations, it is perfectly reasonable to manipulate text as text, without immediate reference to external reality, and in those instances an LLM can indeed be a useful tool.
This post grew out of a discussion on our Discord server, which you can join if you are so inclined.
[1] Mark objected to “thesauri” as the plural of “thesaurus”, but part of being a linguist is understanding that the speakers of a language have a natural right to make up words.
[2] One might wonder here why I haven’t included autocomplete as a use of LLMs. The answer is quite simple: to be worth anything, an autocomplete system has to be faster than typing, and the structure of current LLMs makes that basically impossible on current hardware (or possibly any hardware).
[3] On the other hand, there are still substantial areas where they don’t fail, and this is a testament to just how much of our perception of the world is encoded in our words.