Should companies train their own LLM?



Image courtesy: Rajashree Rajadhyax

Enterprises all over the globe have started using Generative AI. They are using it to improve communication, enhance the efficiency of their people and serve their customers better, to name just a few of the many areas where GenAI is being applied.

The adoption of GenAI by industry is much faster than earlier AI techniques such as machine learning and computer vision. The credit must go to the capability and versatility of foundation models, especially the large language models. The chief attraction of these models lies in the fact that they can be used directly, without training or modification. This allows users to try them for various applications and settle on the use cases most suitable for them.

In this article, we will discuss an important question about LLMs: should companies train their own? It is true that many LLMs can be used without training, but training can bring some special advantages. We will look at both sides of this debate. The discussion should help data science and AI teams decide what is right for their companies.

What are the different LLMs that companies use?

When it comes to LLMs, enterprises now have a wide variety of options to choose from. A vast array of LLMs is available, and new models are being introduced at a frantic pace. The models fall into two main categories:

  1. Proprietary models: These are also called ‘closed models’. They are created by large AI companies and made available through an API. Well-known examples are OpenAI’s GPT-4 series, Google’s Gemini series and Anthropic’s Claude series.
  2. Open source models: These models are available from repositories such as HuggingFace. Anyone can run these models (subject to their license terms) on their machines. Popular examples are the Llama series and the Mistral series.

While the proprietary models are available only as APIs, open source models can be used in two different ways:

  1. Cloud providers such as Microsoft Azure and Amazon AWS offer what they call Model-as-a-Service. They host the open source models on their servers and make them available to users as APIs. This is very similar to the proprietary case, except that you choose the model.
  2. Teams can host open source models on their own servers (cloud or on-premises). In this case, they have to take care of everything, including management of the model and the server. Frameworks such as Ollama make this deployment easier.
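For the self-hosting route, here is a minimal sketch of calling a model served by Ollama through its HTTP API. It assumes Ollama is running locally with a model already pulled; the model name "llama3" and the default endpoint are illustrative:

```python
import json

# Default endpoint for a local Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(model: str, prompt: str) -> dict:
    # Ollama's /api/generate endpoint accepts a JSON body like this;
    # "stream": False asks for one complete response instead of chunks.
    return {"model": model, "prompt": prompt, "stream": False}

def ask(model: str, prompt: str) -> str:
    # Deferred import so the sketch can be read without a running server.
    import urllib.request
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_request(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Note that with this setup, the team owns everything the API provider would otherwise handle: uptime, scaling and model updates.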

So far, we have spoken about using models as they come, without any training. But large language models are, after all, neural networks, and we should be able to train them for our purposes. Before we see where training fits into the above scenarios, let us discuss what training LLMs entails.

How are LLMs trained?

LLMs are not trained in one big session; their training happens in multiple stages. Broadly, there are two main types of training:

  1. Pre-training
  2. Fine-tuning

Let’s see the difference between these two.

Pre-training takes an untrained model architecture (typically a Transformer) and trains it on a huge amount of text. This training gives the model its language capabilities as well as its knowledge of the world. After this basic training, models are also trained for instruction following, which makes them capable of answering questions. Models further receive special training to respond in a manner that humans find helpful and to avoid objectionable responses. The popular models that we all know and use have gone through all these stages of training.

Fine-tuning is performed on models that have been pre-trained, not on empty shells. It uses a small amount of training data, but this data is carefully chosen for a particular purpose. For example, a pre-trained model may be fine-tuned using financial statements so that it understands financial data well.

Pre-training is an expensive and time-consuming proposition. Most LLMs are very large networks, containing billions of parameters. Learning all these parameters from an equally huge amount of data requires an enormous amount of computing power. To give a recent example, Meta’s Llama 3.1 405B model was trained using around 16,000 NVIDIA H100 GPUs.

Fine-tuning, however, is a much more manageable affair. To be sure, it also requires GPUs, but cheaper, consumer-grade GPUs can do the job. Fine-tuning does not change all the parameters of the model. It either adds a few new parameters or changes some selected parameters of the model. Thus the computing power and time required are much less.
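To make the "adds a few new parameters" point concrete, here is a toy calculation in the spirit of LoRA, one popular fine-tuning technique that trains a pair of small low-rank matrices instead of updating a full weight matrix. The layer width and rank below are illustrative, not taken from any particular model:

```python
def lora_param_counts(d_model: int, rank: int) -> tuple[int, int]:
    # A full square weight matrix has d_model * d_model parameters.
    full = d_model * d_model
    # A LoRA adapter trains two thin matrices, A (rank x d_model) and
    # B (d_model x rank), so only 2 * d_model * rank parameters change.
    adapter = 2 * d_model * rank
    return full, adapter

full, adapter = lora_param_counts(4096, 8)
# At width 4096 and rank 8, the adapter is under 0.4% of the full matrix.
```

This is why a job that would need a GPU cluster for full training can often run on a single consumer-grade card.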

In short, the expensive pre-training adds fundamental and general capabilities to the LLMs. Fine-tuning is more affordable and makes the models more suitable for a particular purpose.

With this background, we will now see the arguments for and against training your own model. We will begin with why companies should NOT train models.

Why companies should not train LLMs

The reason why companies should not pre-train LLMs is obvious: it is far too costly. An organization that plans to pre-train an LLM must not only buy or lease expensive infrastructure but also hire highly specialized talent. The return on such an investment may not be justified for most companies, especially when so many pre-trained models are already available.

Most companies may not even need fine-tuning. A lot of use cases work well with ready, pre-trained models. Though much smaller than pre-training, fine-tuning also involves considerable effort. Here, the major part of the effort goes into creating the training data, which has to be carefully selected to suit the intended purpose. The company might have to allocate senior, knowledgeable staff to create such data. That is a serious commitment for any organization.

Unlike pre-training, fine-tuning is not a one-time effort. LLM technology is continuously evolving, and newer, more capable models become available all the time. The users of a fine-tuned model will naturally want to upgrade, but the new model must be fine-tuned before it can serve the use case. This means the organization has to spend time and money on fine-tuning frequently.

These are some arguments against training your own model. Now let us look at the motivations for training one.

Why companies should train their own LLM

There are some special cases where training an LLM can solve problems that cannot be solved otherwise. Let’s look at some of these cases. I will only discuss fine-tuning in this section; pre-training a model is a rare requirement for a company that is not in the business of making models, so it is not covered here.

You know that to get the right response from an LLM, we have to prompt it properly. This activity is called prompt engineering. However, most users in an organization will not be skilled at writing proper prompts. Fine-tuning can help overcome this challenge: when fine-tuned on the kinds of queries expected from our users, the LLM can respond properly to prompts that would otherwise be inadequate.
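One way to prepare for such fine-tuning is to collect pairs of the rough prompts users actually type and the responses we want the model to give. A sketch of assembling this data in the widely used JSONL format follows; the example records are hypothetical:

```python
import json

# Hypothetical training pairs: terse user prompts mapped to the full
# responses we want the fine-tuned model to produce.
examples = [
    {"prompt": "q3 numbers?",
     "response": "Here is the Q3 financial summary for your region: ..."},
    {"prompt": "late shipments last wk",
     "response": "The following shipments were delayed last week: ..."},
]

def to_jsonl(records: list[dict]) -> str:
    # One JSON object per line -- the format many fine-tuning tools accept.
    return "\n".join(json.dumps(r) for r in records)
```

The exact field names vary by tool, but the idea is the same: show the model the messy inputs it will really receive.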

The most common method of using an LLM in an enterprise is called Retrieval Augmented Generation (RAG). In this method, the LLM is given material selected from the data inside the organization and told to use that material to form its answers. RAG thus makes the organization’s data more accessible to users, which is why it has become very popular with corporate LLM users.
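The retrieve-then-prompt flow can be sketched in a few lines. Real RAG systems use vector search over embeddings; here, simple keyword overlap stands in for the retriever, and the documents are made up:

```python
def retrieve(question: str, documents: list[str], k: int = 2) -> list[str]:
    # Score each document by how many question words it shares with it;
    # a stand-in for a real embedding-based similarity search.
    q_words = set(question.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(question: str, documents: list[str]) -> str:
    # Stuff the retrieved material into the prompt and instruct the
    # model to answer only from it.
    context = "\n".join(retrieve(question, documents))
    return (
        "Answer the question using only the material below.\n\n"
        f"Material:\n{context}\n\n"
        f"Question: {question}"
    )
```

The key design point is visible in `build_prompt`: the model’s own knowledge is sidelined, and the answer is grounded in the supplied material.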

While using RAG, the model’s built-in knowledge is largely unused; only its language and common sense abilities are applied. This means that we can use much smaller models in RAG configurations.

What is the advantage of using smaller models? Larger models such as Llama 70B require huge computational resources to run. This can lead to significant expenses, especially if the models are used at large scale. With smaller models, the costs can be reduced.

But smaller models might not be good at some of these tasks. To tackle this, we can fine-tune them to be good at the right things. A good formula for a company is therefore to select a smaller model, fine-tune it for the task it is supposed to perform (such as summarization or extraction) and use it in a RAG pipeline. This way the company can achieve good performance at a lower cost.

There are certain tasks that LLMs find hard to perform without fine-tuning. One example is natural language querying of databases. In this task, the model is expected to take a natural language sentence and convert it into an SQL query that can run on a particular database. The common method for this task is to include the schema of the database in the prompt. However, it has been observed that fine-tuning the model on expected queries improves the accuracy of the generated SQL by a large margin. A similar application is converting a language query into an API call.

Summary

We have seen arguments both for and against companies training their own model. The general conclusion seems to be that you should not train a model if your use cases are well served without training. However, there are use cases where fine-tuning becomes advantageous and sometimes even necessary. There are also advanced applications, such as agentic workflows, in which fine-tuning plays a big role. We will study these applications in a future article.

By Devesh Rajadhyax

Co-Founder, Cere Labs

