Language models — AI’s way of talking to us

Image by Rajashree Rajadhyax

Language models are a buzzword these days. The current state of the art in language modelling is the transformer-based GPT-3, which was trained on a staggering amount of text data. Here’s a quick look at what exactly a language model is.

Researchers believe that language began somewhere between 50,000 and 100,000 years ago. Language evolved from the human need to communicate with one another, and the ability to communicate using language has given the human species a better chance at survival. Language is an incredibly important tool for passing on knowledge and communicating thought. To acquire knowledge we read or simply listen to others, and all of this is possible because of language.

As my friend puts it, the human species has always remained a step ahead because of its ability to augment its own capabilities. I’m adding a link to his article here. AI is one more such attempt: the attempt to augment intelligence.

Recent developments in AI have introduced autonomous and human-like robots into numerous aspects of everyday life. The attempt has been to develop robots that can communicate with humans using the language that humans speak. This is no easy task! Natural language processing (NLP) and natural language understanding (NLU) are disciplines of AI that focus on helping computers work with human language. NLP is concerned with processing language and facilitating “natural” back-and-forth communication between computers and humans. NLU, on the other hand, is focused on a machine’s ability to understand human language.

So how does a computer learn a language, and, in fact, how do we humans do it? Human language is essentially a system of symbols and their phonetics used for communication. All humans are experts at statistical learning. We don’t realize it, but we do this every time we learn something new. As babies, for example, we are presented with overwhelming amounts of linguistic information (the symbols, their arrangement and their sounds) every day when people talk to us. Without realizing it, we make exceptionally accurate generalizations about the patterns we deduce. Slowly we build a repertoire, and before we know it we have a good command of the language. Do computers work on a similar principle? The answer is both a yes and a no. Computers cannot understand, but they can definitely learn.

There is, however, a difference between a machine learning a language and humans learning it. When we learn a language, our learning is not restricted to just the words in that language. We learn by building associations. Not only are we learning the language, we are learning the concepts too. The learning involves a beautiful weaving together of information gathered from our other senses, such as the visuals and the phonetics. Using this unified information, we create a complete picture of the world around us. Unfortunately, machines do not have this ability and advantage; they can only use text to learn.

Whether the task is language or any other kind of prediction, computers work by looking at large amounts of data and detecting patterns in that data. Once the pattern is learnt, the learning is compiled into a form that can be reused for similar tasks. This is called the model.
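As a toy illustration (not from this article), the Python sketch below uses the scikit-learn and joblib libraries to detect a pattern in made-up data, save the learned model to disk, and reuse it later. The data, labels and file name are purely illustrative.

# A minimal sketch: fit a simple classifier on toy data, save the learned
# "pattern" to disk, and reuse it later on a similar task.
from sklearn.linear_model import LogisticRegression
import joblib

# Toy data: each row is an example, the label says which pattern it follows.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 0, 1, 1]   # the pattern here: the first feature decides the label

model = LogisticRegression()
model.fit(X, y)                      # detect the pattern in the data

joblib.dump(model, "model.joblib")   # compile the learning into a reusable form

reloaded = joblib.load("model.joblib")
print(reloaded.predict([[1, 0]]))    # reuse the model on a new, similar example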

To learn this pattern, the words or phrases in the language first have to be brought into a form that the machine can interpret. Machines work best with numbers, and hence the words and phrases have to be represented as sequences of numbers, i.e. vectors. The vector representations should be such that two vectors lie close together if the words or phrases they represent are similar in meaning. These representations are called word embeddings. Word embeddings are just representations, i.e. the machine’s way of looking at the words and phrases. The representation is based on the meaning of that word or phrase in the input dataset. This representation can be general purpose or specific to a task. If it is general purpose, it can be used for different types of NLP tasks such as question answering, summarisation and so on. One such popular general-purpose embedding model is word2vec. Another is GloVe, which was trained on large corpora including the entire Wikipedia text. Specific word embeddings can be used only for the task that they have been trained for.
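As a small, hedged sketch of what learning such embeddings can look like, the snippet below uses the gensim library (assuming gensim 4.x) to train word2vec-style vectors on a tiny made-up corpus. A real model would need vastly more text; the corpus and parameters here are only placeholders.

# Learn word embeddings from a (very) small corpus with gensim's Word2Vec.
from gensim.models import Word2Vec

corpus = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["the", "dog", "chases", "the", "ball"],
    ["the", "cat", "chases", "the", "mouse"],
]

model = Word2Vec(sentences=corpus, vector_size=50, window=2, min_count=1, seed=1)

vec = model.wv["king"]                 # the word "king" as a vector of 50 numbers
print(vec[:5])
print(model.wv.most_similar("king"))   # words whose vectors lie close to "king"

With so little text the similarities will be noisy, but the idea is the same as in large models: words that appear in similar contexts end up with nearby vectors.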

This approach of treating words like vectors with mathematical properties helps in preserving the meaning of words and their similarity to one another. The easiest way to think about how words can be added and subtracted like vectors is with an example. The most famous one is the following: king - man + woman ≈ queen. In other words, adding the vectors associated with the words king and woman while subtracting man gives a vector very close to the one associated with queen. This captures a gender relationship. Interestingly, vector relationships that appear in English generally also hold in Spanish, German and many other languages. For the machine to learn the semantic and syntactic relationships between words well, it is fed large text datasets; some pipelines also enrich the text with parts-of-speech tags and other derived features such as grammatical relations, but embeddings like word2vec and GloVe are learned from raw, unlabelled text.
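The analogy can be tried directly with pre-trained vectors. The sketch below assumes gensim’s downloader and its “glove-wiki-gigaword-100” vectors are available (the download is sizeable); the exact nearest neighbour returned depends on the embedding used.

# king - man + woman with pre-trained GloVe vectors.
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-100")   # pre-trained word vectors

# Positive terms are added, negative ones subtracted.
result = vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1)
print(result)   # typically [('queen', ...)] for this embedding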

Once these word embeddings are ready, the machine has a representation of the language. The next step is to use this representation for the task at hand. For example, if the task is sentiment analysis, the machine has to further learn to classify phrases as carrying positive or negative sentiment. A simpler analogy could be useful here: a child who has learnt to speak a language is now being taught to write an essay.
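To make this second, task-specific step concrete, here is a minimal made-up sketch: each phrase is represented by the average of its word vectors (toy numbers standing in for real embeddings), and a simple scikit-learn classifier learns to separate positive from negative phrases. The vocabulary, vectors and labels are all illustrative.

# Sentiment classification on top of (toy) word embeddings.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Pretend these word vectors came from a pre-trained embedding model.
embedding = {
    "great":    np.array([0.9, 0.8]),
    "love":     np.array([0.8, 0.9]),
    "terrible": np.array([-0.9, -0.7]),
    "hate":     np.array([-0.8, -0.9]),
    "movie":    np.array([0.0, 0.1]),
}

def phrase_vector(phrase):
    # Represent a phrase as the average of its word vectors.
    return np.mean([embedding[w] for w in phrase.split()], axis=0)

phrases = ["great movie", "love movie", "terrible movie", "hate movie"]
labels  = [1, 1, 0, 0]          # 1 = positive sentiment, 0 = negative

X = np.array([phrase_vector(p) for p in phrases])
clf = LogisticRegression().fit(X, labels)

print(clf.predict([phrase_vector("love great movie")]))   # expect [1]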

Training a language model is about making the machine learn this representation, followed by the task-specific learning. These word embeddings, together with the learning for the task at hand, make up what is called the language model, and this is at the heart of NLP. It is important to note that the representations capture the meaning of a word by looking at the context in which it appears and at its co-occurrence with other words in the training dataset. So a general-purpose pre-trained model will work well only if the subsequent NLP task is on a dataset similar to the one on which the representations were learned.

I hope you have got a glimpse of what language models are. A lot of effort goes into training a machine learning model. Whether it is a language model or a model tasked with identifying faces or objects, the training process is quite laborious, but rewarding too! It is very interesting to know how training works and what goes on behind the scenes. I’ll be writing about it in another article.

By Rajashree Rajadhyax

Co-Founder, Cere Labs


