AI chatbots like GPT-3 and its new descendant ChatGPT show some impressive abilities. (See When AI was old-fashioned.)
My writer friends always ask me: what makes these AI programs so darned quick? How come they don't get writer's block? As soon as you give them a prompt, they respond with a stream of grammatical sentences. In fact, you often need to instruct them to keep their answer short, otherwise they produce reams of words.
Speaking coaches and clubs like Toastmasters will tell you that whether you call it "extempore" or "impromptu", good speakers are always well prepared. What seems like a last-minute speech has been researched, composed, and rehearsed several times.
Of course, you've got to be quick on your feet, able to take your prepared topics and connect them to your given subject and your audience spontaneously, but what makes it look effortless is solid preparation.
The AI chatbots are no different.
Their authors, like presidential speechwriters, have taken pains to do the research and prepare the model. When you give the chatbot a prompt, it uses the model to generate its response.
What is the nature of these language models?
Human languages like English have built-in patterns that you naturally learn as you read and listen. A lot of our ability to write text comes from our knowledge of these patterns.
Look at this fill-in-the-blanks puzzle:
Cleanliness is next to —
It's an eighteenth-century quote from Church of England revivalist John Wesley, but you don't need to be a Methodist to complete the sentence with "godliness." Even if you've only seen it once, you probably have no trouble remembering the sentence; it's that memorable.
Contrast this with long passages full of run-of-the-mill cliches that fill too many official or business reports (or Church sermons). Those are not memorable. Even if the individual cliches and tired turns of phrase are easy to regurgitate, prose infested with them gets repetitive at the level of sentences and paragraphs. A hack writer can drone on and on for entire chapters without conveying much information. His readers can get the gist of the plot even if they sleep through most of the page.
During World War II, American mathematician Claude Shannon designed secret codes for the government. After the war, he quantified the above observation about efficient communication in a couple of papers.
Shannon was interested in calculating communication capacity and efficiency in terms of how much information can be conveyed in a given number of symbols. He showed how the characteristics of the language in which the messages are written limit this capacity. Some languages are more concise than others, both in terms of having shorter words and in requiring fewer words to convey a message.
Claude Shannon's results have to do with statistical probabilities, and they are foundational for most of the work on natural language generation since. Yes, we can use the term "seminal" for his papers.
It turns out that in a mathematical sense, to communicate information succinctly is to increase the degree of surprise for your listener. In English, some words are much more frequent than others, and certain combinations of words frequently appear together, like "kith and kin." Mathematically speaking, it's a fairly redundant language. Very frequently appearing words like the article "the" could often be left out entirely with no ill effect.
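To put a number on "surprise," here is a minimal Python sketch. It assumes the standard information-theory measure, in which a word that occurs with probability p carries -log2(p) bits. The toy sentence and the function name surprisal_bits are invented for illustration; they are not real English statistics.

```python
import math
from collections import Counter

def surprisal_bits(word, counts):
    """Return -log2(p), where p is the word's relative frequency in counts."""
    total = sum(counts.values())
    return -math.log2(counts[word] / total)

# Toy 16-word sample, invented for illustration only.
sample = "the cat sat on the mat and the dog sat by the door of a house".split()
counts = Counter(sample)

# "the" is 4 of the 16 words (p = 1/4); "door" is 1 of 16 (p = 1/16).
print(f"{surprisal_bits('the', counts):.1f}")   # 2.0
print(f"{surprisal_bits('door', counts):.1f}")  # 4.0
```

The common word carries fewer bits than the rare one, which is the sense in which a frequent word like "the" adds little information.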
Now, there's more to life than efficiency. Redundancy can save you if messages get garbled and you lose a portion.
Take our fill-in-the-blank sentence above. Rarely does the word "godliness" appear in English prose. It's even more unusual to spot it paired with "cleanliness" in a sentence, with the phrase "is next to" connecting them.
You could effectively convey that sentence even if you left out the beginning or end of it; your listener could fill in the missing word.
In learning English by reading, one of the patterns we learn is how frequently particular word combinations occur. Similarly, we learn patterns in the frequencies of letters, syllables, words, and how they appear in entire sentences and paragraphs.
Shannon needed to analyze the language of the messages mathematically. Given the statistical characteristics of English, he could build a mathematical model that could generate messages that were similar to English text. The more features of English that he accounted for in the model, the more realistic its output would be.
Here's an example that Shannon used.
THE HEAD AND IN FRONTAL ATTACK ON AN ENGLISH WRITER THAT THE CHARACTER OF THIS POINT IS THEREFORE ANOTHER METHOD FOR THE LETTERS THAT THE TIME OF WHO EVER TOLD THE PROBLEM FOR AN UNEXPECTED
The above message contains pieces that look as if they could be parts of sentences, although it doesn't make sense overall.
Shannon generated these words using a mechanical procedure. First, he wrote down a random word, THE. Then, he opened a book at a random page and searched for THE. When he found it, he wrote down the word immediately after it in the book, which was HEAD. Then he started at another random page looking for HEAD. When he found that, he wrote down the following word in the book, which was AND, and so on.
This mechanical procedure produces pairs of words, called digrams, that follow each other with a probability similar to that found in English text.
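The book-and-random-page procedure can be approximated in a few lines of Python. This is only a sketch under stated assumptions: the "book" is a short made-up string rather than a real text, the names build_followers and generate are hypothetical, and picking a random entry from a list of followers stands in for opening the book at a random page.

```python
import random
from collections import defaultdict

def build_followers(words):
    """Map each word to the list of words that follow it in the text."""
    followers = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        followers[current].append(nxt)
    return followers

def generate(followers, start, length, rng):
    """Repeatedly pick a random follower of the last word written down."""
    out = [start]
    while len(out) < length:
        choices = followers.get(out[-1])
        if not choices:  # the word never has a successor in the text
            break
        out.append(rng.choice(choices))
    return out

# Made-up stand-in for the book; not Shannon's source text.
corpus = ("the head of the class and the head of the school "
          "met in front of the class and the school").split()
followers = build_followers(corpus)
text = generate(followers, "the", 12, random.Random(0))
print(" ".join(text).upper())
```

Because a follower that appears more often in the text appears more often in the list, each pair is drawn with roughly the frequency it has in the source.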
Once researchers in the field of natural language processing got hold of computers, they analyzed many books and prepared tables, not only of digrams, but also three-word sequences called "trigrams", and so on. These tables made it quick to generate text that was even more faithful to English.
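A table of this kind might look like the following Python sketch. The function names build_ngram_table and next_word_probs and the tiny sample text are assumptions made for illustration; real tables were compiled from many books.

```python
from collections import Counter, defaultdict

def build_ngram_table(words, n):
    """Count which word follows each run of n - 1 consecutive words."""
    table = defaultdict(Counter)
    for i in range(len(words) - n + 1):
        context = tuple(words[i:i + n - 1])
        table[context][words[i + n - 1]] += 1
    return table

def next_word_probs(table, context):
    """Turn the counts for one context into probabilities."""
    counts = table.get(tuple(context), Counter())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}

# Made-up sample text for illustration.
words = ("he went to town and she went to work "
         "but cleanliness is next to godliness").split()

digrams = build_ngram_table(words, 2)   # one word of context
trigrams = build_ngram_table(words, 3)  # two words of context

probs2 = next_word_probs(digrams, ["to"])
probs3 = next_word_probs(trigrams, ["next", "to"])
print(probs2)  # "town", "work" and "godliness" are equally likely after "to"
print(probs3)  # only "godliness" follows "next to" in this sample
```

With one word of context the table cannot tell the three continuations apart; with two words it can, which is why longer sequences give output that looks more like the source.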
Today, digrams and trigrams are child's play. To build ChatGPT and similar language models, powerful computers consume vast amounts of training text and glean from it the frequencies of various terms and relationships.
Using not just statistical methods but also human-produced examples, these artificial neural network models can be trained to guess which parts of the sentence are verbs or nouns. They can guess that two words are synonymous to the degree that they are used interchangeably, and whether one statement entails another (that is, connect "it was raining" with "he took an umbrella").
Researchers spend a lot of time, effort, and computing power to build a model containing many billions of numerical parameters that capture these sorts of relationships. In a way, the model is a compressed form of all the text that the model has seen during training.
Once the model has been trained, all that remains is for the chatbot to take a prompt and generate plausible-looking text that's relevant to the prompt. No writer's block.