Gen AI Glossary
Large language models (LLMs) are becoming a big part of our everyday lives, but the jargon surrounding them can be confusing. To make things easier, I've created a glossary of the 10 most common terms you'll encounter in the world of LLMs. Starting with the basics and moving to more advanced concepts, this guide will help you understand these terms better. Whether you're a tech enthusiast or someone looking to integrate the latest technologies into your daily work, I hope this glossary proves useful.
Model
Computer programs can find patterns in large data sets: they take in information, analyze it, and make decisions based on what they learn. That learning is encapsulated in a set of numbers, a representation of what the algorithm has learned, and this numerical representation is called a model. During training, these numbers, also called parameters, are adjusted so that the model gets better at the task it is being trained on.
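To make the idea concrete, here is a minimal sketch in Python of the simplest possible model: just two parameters, a weight and a bias, nudged step by step during training. The data values are made up purely for illustration.

```python
# A "model" reduced to its essence: two parameters (weight and bias)
# that are adjusted during training so that predictions improve.

# Toy data: y is roughly 2*x + 1 (illustrative values only)
data = [(1.0, 3.1), (2.0, 4.9), (3.0, 7.2), (4.0, 8.8)]

weight, bias = 0.0, 0.0  # the entire "model" is these two numbers
learning_rate = 0.01

for step in range(1000):
    for x, y in data:
        prediction = weight * x + bias
        error = prediction - y
        # Nudge the parameters in the direction that reduces the error
        weight -= learning_rate * error * x
        bias -= learning_rate * error

print(f"Learned parameters: weight={weight:.2f}, bias={bias:.2f}")
```

A large language model works on the same principle, only its set of numbers runs into the billions.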
Generative model
A model learns patterns and uses them to predict new outcomes. For example, it might predict the next word given a starting phrase, or generate the data representing an image or video from a description. Models that can generate text, images, or videos in this way are known as generative models.
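Here is a toy sketch of the core idea in Python: learn which word tends to follow which from a tiny made-up corpus, then generate text by repeatedly predicting the next word. Real generative models are vastly more sophisticated, but the predict-the-next-word loop is the same.

```python
import random
from collections import defaultdict, Counter

# A tiny made-up corpus; real models learn from billions of words.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# "Learn" which word tends to follow which (a bigram model).
follows = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    follows[current][nxt] += 1

# Generate: start from a seed word and repeatedly predict the next word.
word = "the"
output = [word]
for _ in range(6):
    counts = follows[word]
    word = random.choices(list(counts), weights=list(counts.values()))[0]
    output.append(word)

print(" ".join(output))
```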
Pre-trained model
A pre-trained model is one that has already been trained on a large, general dataset. It can be used directly, or fine-tuned for specific tasks with far less training data. Pre-trained models save time and resources, because they let you build on the knowledge gained from the initial training instead of starting from scratch.
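For example, here is a minimal sketch using the Hugging Face transformers library (my choice for illustration; the concept isn't tied to any particular library) to download the small, openly available GPT-2 model instead of training one yourself:

```python
# Requires: pip install transformers torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Download a model someone else has already spent the compute to train.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

print(f"GPT-2 arrives with {model.num_parameters():,} ready-trained parameters")
```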
Transformer
The algorithms behind generative models follow a particular architecture, and the transformer is the fundamental building block of most of them. Transformers are designed to handle data that comes in a sequence, like sentences or time series. They can model the relationships between words and produce text, making them very powerful for tasks like language translation, text generation, and even creating images from descriptions.
GPT
GPT stands for Generative Pre-trained Transformer. This is a class of models trained to predict the next word, or to generate images and videos, based on what they have learned from a very large general dataset. The fundamental design component of this class of models is the transformer, hence the name GPT.
Tokens
When training language models, words and phrases are encoded in a form that machines can interpret. This encoding takes the form of long lists of numbers, known as vectors. These vectors represent the meaning of the words or phrases in a high-dimensional space: vectors for words or phrases with similar meanings are closer together, while those with different meanings are farther apart. Before creating these vectors, the text is broken down into smaller units called tokens. Tokens can be as small as individual characters, subwords, or whole words, depending on the specific tokenization method used. Breaking text down into tokens helps the model handle and understand various linguistic structures and nuances.
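To see tokenization in action, here is a short sketch using GPT-2's tokenizer from the Hugging Face transformers library (one concrete example among many tokenization methods):

```python
# Requires: pip install transformers
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

text = "Tokenization helps models understand language."
tokens = tokenizer.tokenize(text)
ids = tokenizer.encode(text)

print(tokens)  # subword pieces, e.g. 'Token' and 'ization' as separate tokens
print(ids)     # the numeric IDs the model actually operates on
```

Notice how an uncommon word gets split into subwords, while common words tend to stay whole.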
Hallucination
Large language models are trained on vast datasets that encompass a broad spectrum of knowledge about the world. While their responses to questions can seem almost magical, fundamentally, they operate as next-word predictors. This means that the answers they generate are based on predicting the most likely next word or sequence of words in a given context. Although it may appear that the models understand the questions they are asked, in reality, the question serves merely as a seed for the prediction process. This prediction is primarily a statistical phenomenon.
Despite efforts to maintain a tight context and ensure relevant answers, sometimes the generated responses can deviate from the intended context or may not be entirely factual. This phenomenon, where the answer might not be fully relevant to the question or may contain inaccuracies, is known as hallucination. Various guardrails and measures are implemented to minimize this issue, but models are still susceptible to hallucinations due to the inherent limitations of the prediction process.
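You can watch this statistical process directly. The sketch below (again using GPT-2 via the transformers library as a stand-in for larger models) prints the probabilities the model assigns to a few candidate next words; the answer is drawn from a probability distribution, not looked up in a fact base, which is why it can drift from the truth.

```python
# Requires: pip install transformers torch
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Probabilities for the very next token
probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(probs, 5)
for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode([int(idx)])!r}: {p.item():.3f}")
```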
Attention
The core architecture in many large language models is a transformer. Transformers work on the principle of attention. Attention helps the model focus on the most important parts of the input text when predicting the next word.
When you ask a question, your question is the starting point (or seed) for the model to begin predicting an answer. The attention mechanism looks at the entire question and figures out which parts are most important and how they connect. This helps the model create a context, which is a way to remember what it's already predicted and what comes next. By focusing on the most important parts and their connections, the model can make better guesses about what word should come next, resulting in a meaningful and accurate answer.
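Here is a minimal sketch of scaled dot-product attention, the mechanism at the heart of transformers, written in Python with NumPy and made-up toy vectors. Real models learn separate query, key, and value projections; this strips the idea down to the weighting logic.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(queries, keys, values):
    # Score how relevant each word (key) is to each word (query)
    scores = queries @ keys.T / np.sqrt(keys.shape[-1])
    # Turn the scores into weights that sum to 1...
    weights = softmax(scores)
    # ...and blend the values accordingly: important words contribute more
    return weights @ values, weights

# Toy input: 3 "words", each represented by a 4-dimensional vector
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))

# Self-attention: queries, keys, and values all come from the same input
output, weights = attention(x, x, x)
print(np.round(weights, 2))  # each row: how much one word attends to the others
```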
Prompt
For a large language model to produce its output, it needs a starting point, known as a prompt. This prompt gives the model a basis to start its work on a given task. For example, if you are using a large language model like ChatGPT, your question is the prompt. The output of the model is highly dependent on this prompt, as it guides the model on what type of response to generate.
Different language models are trained for various tasks. Some models allow you to chat with them, others follow instructions, and some can generate images, videos, or even code. Your input to the model, whether it’s a question, a description to generate an image, video, or code, or a command to perform a specific action, is called a prompt.
A well-crafted prompt can significantly influence the quality and relevance of the model's output. Clear and specific prompts help the model understand what you are asking for, leading to better responses. On the other hand, vague or ambiguous prompts can result in less accurate or useful outputs.
Prompts can also be used creatively to explore different capabilities of language models. For instance, you can use prompts to generate stories, write essays, solve math problems, or even compose music. The versatility of prompts makes them a powerful tool in interacting with large language models.
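As a concrete illustration (again assuming the Hugging Face transformers library and the small GPT-2 model, neither of which this article depends on), the same model heads in very different directions depending on the prompt:

```python
# Requires: pip install transformers torch
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

for prompt in ["Once upon a time", "The recipe calls for"]:
    result = generator(prompt, max_new_tokens=20)
    print(result[0]["generated_text"])
    print("-" * 40)
```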
Fine-tuning
Fine-tuning is the process of specializing a large language model (LLM) for a specific task. LLMs are initially trained on large, general datasets such as Wikipedia articles and books, enabling them to perform general language tasks such as question answering, translation, and summarization. However, to excel in specific domains such as law, medicine, or finance, these models require further training.
Fine-tuning involves taking the pre-trained LLM and training it further on a smaller dataset that is highly relevant to the target domain. This additional training refines the model's parameters, allowing it to better understand and generate text within the specified area. The benefits of fine-tuning include improved accuracy on the target task while requiring far less data, time, and compute than training a model from scratch.
By tailoring LLMs to specific applications, you can unlock their potential for a wide range of personal tasks, such as language learning, writing assistance, or research support.
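Here is a heavily simplified sketch of the fine-tuning loop, assuming PyTorch and the transformers library, with a hypothetical two-example "legal" dataset purely to show the shape of the process. A real fine-tune would use thousands of examples, batching, and evaluation.

```python
# Requires: pip install transformers torch
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")  # start from pre-trained weights

# Hypothetical domain-specific examples (a real dataset is far larger)
examples = [
    "The party of the first part agrees to indemnify and hold harmless...",
    "This agreement shall be governed by the laws of the State of...",
]

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()

for epoch in range(3):
    for text in examples:
        batch = tokenizer(text, return_tensors="pt")
        # For a next-word predictor, the text itself provides the labels
        outputs = model(**batch, labels=batch["input_ids"])
        outputs.loss.backward()   # measure how wrong the predictions were
        optimizer.step()          # nudge the pre-trained parameters
        optimizer.zero_grad()

print("final training loss:", outputs.loss.item())
```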
While I've limited myself to just these 10 terms for this article, I plan to delve into more complex concepts in future pieces. When I explore a new topic, having a glossary of key terms is incredibly helpful. It simplifies the subject and enhances my understanding. I hope you found this glossary useful as well.