From theory to practice: Implementing domain-specific LLMs efficiently and effectively.

Large language models (LLMs) are hard to beat when it comes to instantly parsing reams of publicly available data to generate responses to general knowledge queries. Things get quite a bit more complicated, however, when those models – which were designed and trained based on information that is broadly accessible via the internet – are applied to complex, industry-specific use cases.
That’s a problem when you consider that most of the GenAI development work being done today is focused on highly specialized use cases. In fact, by 2027, more than 50% of the GenAI models used by large businesses are predicted to be designed specifically for focused industry or business process functions – up from about 1% in 2023, according to Gartner.
The challenge, as many businesses are now learning the hard way, is that simply applying black-box, off-the-shelf LLMs such as GPT-4 will not deliver the accuracy and consistency needed for professional-grade solutions. Meanwhile, efforts to re-engineer these models to perform specific tasks with retrieval-augmented generation (RAG) frameworks or customized small language models can quickly add complexity, significant cost, and maintenance overhead to the AI initiative.
Fine-tuning GenAI for cost, accuracy, and latency without compromising privacy
The hard truth is that optimizing a GenAI system for the trifecta of cost, accuracy, and latency is an “art” that has still not been perfected. Moreover, challenges around data privacy and recognition of intellectual property often require a level of transparency that simply does not exist in many off-the-shelf models. However, significant progress has been made on a new approach to fine-tuning LLMs, which builds domain-specific context and explainability directly into the LLM, as opposed to working around the edges with complex engineering projects.
In fact, in our efforts to develop highly specialized GenAI solutions for the financial services, insurance, utilities, retail, and healthcare industries, we’re finding the most effective way to do this is by integrating proprietary domain data into existing, pre-trained LLMs.
The key to this approach is developing a solid data foundation to support the GenAI model. For example, EXL recently led a project helping a multinational property and casualty (P&C) insurance provider deploy an LLM to support the claims adjustment process. We were able to fine-tune Mistral’s open-source Mixtral 8x7B model with proprietary client data sets that had been assembled and manually curated over the past nine years. By leveraging this strong data foundation, anonymizing any personally identifiable information, tokenizing the data, and layering it into the open-source model, we were able to quickly and efficiently launch a highly accurate, fine-tuned LLM tailored to the client’s specific needs.
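To make the data-foundation step concrete, below is a minimal sketch of the kind of pre-processing this involves: scrubbing personally identifiable information and tokenizing the cleaned text for fine-tuning. It assumes the Hugging Face datasets and transformers libraries; the regex patterns, file path, and column name are illustrative placeholders rather than EXL’s production pipeline, and a real deployment would rely on NER-based PII detection rather than a handful of patterns.

```python
# Sketch: anonymize PII in a curated claims corpus, then tokenize it for
# fine-tuning an open-source model. Patterns, paths, and column names are
# illustrative assumptions, not a production pipeline.
import re

from datasets import load_dataset       # pip install datasets
from transformers import AutoTokenizer  # pip install transformers

PII_PATTERNS = {
    r"\b\d{3}-\d{2}-\d{4}\b": "[SSN]",                    # US SSNs
    r"\b[\w.+-]+@[\w-]+\.[\w.]+\b": "[EMAIL]",            # email addresses
    r"\b\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}\b": "[PHONE]",  # US phone numbers
}

def anonymize(text: str) -> str:
    """Replace recognizable PII with neutral placeholder tokens."""
    for pattern, placeholder in PII_PATTERNS.items():
        text = re.sub(pattern, placeholder, text)
    return text

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mixtral-8x7B-v0.1")

def preprocess(example):
    clean = anonymize(example["claim_note"])  # hypothetical column name
    return tokenizer(clean, truncation=True, max_length=2048)

# "claims_corpus.jsonl" is a stand-in for the curated client data set.
dataset = load_dataset("json", data_files="claims_corpus.jsonl")["train"]
tokenized = dataset.map(preprocess, remove_columns=dataset.column_names)
```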
Similarly, we recently worked on a project with a multinational bank that was moving away from a legacy SAS system to Python in a Google Cloud Platform (GCP) data estate. Historically, such a migration would have been a multi-year, multi-million-dollar exercise. By fine-tuning LLMs for SAS-to-Python translation tasks, and testing the translations against detailed, domain-specific synthetic data generated by LLMs, we were able to accelerate this transition significantly.
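As an illustration of that translate-and-test loop, here is a hedged sketch: a fine-tuned model is prompted to translate a SAS DATA step to pandas, and the generated code is executed against a synthetic fixture to confirm it behaves as expected. The translate() stub and the test data are hypothetical placeholders, not the bank’s actual system.

```python
# Sketch: translate a SAS DATA step to pandas with a fine-tuned model,
# then sanity-check the result on synthetic data. The model call is
# stubbed out; the verification pattern is the point here.
import pandas as pd

SAS_SNIPPET = """
data work.high_value;
    set work.claims;
    if paid_amount > 10000;
run;
"""

PROMPT_TEMPLATE = (
    "Translate the following SAS code to idiomatic pandas. "
    "Return only Python code.\n\nSAS:\n{sas}\nPython:\n"
)

def translate(sas_code: str) -> str:
    prompt = PROMPT_TEMPLATE.format(sas=sas_code)
    # A real pipeline would call the fine-tuned model here, e.g. pass
    # `prompt` to model.generate(). We return the expected translation
    # so the check below is runnable.
    return "high_value = claims[claims['paid_amount'] > 10000]"

# Synthetic fixture standing in for LLM-generated test data.
claims = pd.DataFrame({"claim_id": [1, 2, 3],
                       "paid_amount": [5000, 25000, 12000]})
namespace = {"claims": claims}
exec(translate(SAS_SNIPPET), namespace)  # run the translated code
assert list(namespace["high_value"]["claim_id"]) == [2, 3]
```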
Critical considerations when building domain-specific LLMs
It is important to note, however, that these LLM fine-tuning projects were not push-button processes that simply applied proprietary data to commercial models. The path to balancing the demands of cost, accuracy, speed, and data security is filled with critical steps and considerations that need to be factored into the process. The following are some of the important lessons we’ve learned along the way.
- Select the right business benefit: While it may be tempting to tackle everything all at once, the elegance and efficiency of a domain-specific LLM come from its focus on specific business outcomes. In insurance claims processing, for example, our goal was to help claims adjusters process more claims faster and with higher accuracy, thus lowering claims-processing costs for insurers. By focusing and training our models on that specific goal, we were able to quickly drive measurable value.
- Treat unstructured data as a first-class citizen: Tooling is also a major challenge in building domain-specific LLMs. The first step is building a data pre-processing pipeline suitable for LLMs. Typically, this involves handling unstructured data from PDFs, which requires a robust tokenization pipeline; our insurance LLM, for example, was trained on a corpus of two billion tokens. Identifying and anonymizing sensitive data, and handling that data with the utmost care, is essential (see the PDF-processing sketch after this list).
- There is no substitute for domain expertise: Involving subject matter experts to review and label training data as well as to evaluate LLM outputs for accuracy is a critical component in the data pipeline. This is not a place where tech generalists can replace industry experts. Businesses need to nurture collaboration between the business and technology sides of the equation, and both need an intimate understanding of the workflows being transformed with LLMs.
- Recognize legal and ethical considerations: Legal and ethical considerations are also paramount when using client data in the data preparation phase. Obtaining legal approval for data-use rights and ensuring data anonymization are critical steps to comply with privacy regulations and build client trust. Together, these measures protect sensitive information, align with regulatory requirements and ethical standards, and facilitate the responsible use of client information in training LLMs.
- Start small, think big: While a highly focused use case is key to getting a domain-specific LLM launched and quickly adding value, it is also important to leverage the work done in that specific area for other parts of the business. We advocate a Mixture of Experts (MoE) architecture, which divides an LLM into separate sub-networks, each specializing in a subset of the input data, that jointly perform a task. This allows us to carry forward the core functionality of the domain-specific LLM across other business use cases without incurring increased computational costs every step of the way (see the routing sketch after this list).
- Always be optimizing: To optimize LLMs for summarization and question-answering tasks during model training, parameter-efficient fine-tuning (PEFT) methods, such as low-rank adaptation (LoRA), are highly effective. This approach preserves the LLM’s existing knowledge while enabling its adaptation to new domains in a compute- and cost-efficient manner. By leveraging PEFT, organizations can fine-tune models by training only a small fraction of their parameters, reducing the computational burden and costs associated with training. Employing complementary optimization techniques, such as hyperparameter tuning and regularization, further enhances the model’s performance and efficiency (see the LoRA configuration sketch after this list).
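On the unstructured-data point above, the following is a minimal sketch of one pre-processing step: extracting text from a PDF and splitting it into overlapping chunks ready for tokenization. The pypdf library choice, file name, and chunk sizes are assumptions for illustration, not the pipeline used in the projects described here.

```python
# Sketch: turn an unstructured PDF into overlapping text chunks sized
# for downstream tokenization. Chunk sizes are illustrative defaults.
from pypdf import PdfReader  # pip install pypdf

def pdf_to_chunks(path: str, chunk_chars: int = 4000, overlap: int = 400):
    """Yield overlapping character windows of a PDF's extracted text."""
    text = "\n".join(page.extract_text() or "" for page in PdfReader(path).pages)
    step = chunk_chars - overlap
    for start in range(0, max(len(text) - overlap, 1), step):
        yield text[start:start + chunk_chars]

chunks = list(pdf_to_chunks("sample_claim_file.pdf"))  # hypothetical file
```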
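On the Mixture of Experts point, the sketch below shows the core routing idea in PyTorch: a small gating network sends each token to its top two experts, so only a fraction of the layer’s parameters is active for any given token. The dimensions and expert count are illustrative; this is a toy layer, not Mixtral’s implementation.

```python
# Sketch: sparse Mixture-of-Experts layer with top-2 routing. Each token
# activates only 2 of 8 expert feed-forward networks, keeping per-token
# compute roughly constant as experts are added.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)  # gating network
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (tokens, d_model)
        weights, idx = self.router(x).topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)  # normalize the top-k gates
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e  # tokens routed to expert e at rank k
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out

y = MoELayer()(torch.randn(16, 512))  # 16 tokens through the sparse layer
```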
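And on the optimization point, this is roughly what a LoRA setup looks like with the Hugging Face peft library: small low-rank adapter matrices are injected into the attention projections and are the only weights trained, while the base model stays frozen. The hyperparameters shown are common starting values, not tuned settings from our projects.

```python
# Sketch: wrap a pre-trained causal LM with LoRA adapters so that only a
# small set of low-rank matrices is trained. Note that loading an 8x7B
# model requires substantial memory; the structure is the point here.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, TaskType  # pip install peft

base = AutoModelForCausalLM.from_pretrained("mistralai/Mixtral-8x7B-v0.1")

lora = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,                                 # rank of the update matrices
    lora_alpha=32,                        # scaling factor for the updates
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
)

model = get_peft_model(base, lora)
model.print_trainable_parameters()  # typically well under 1% of all weights
```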
Separating GenAI leaders from laggards
Building domain-specific LLMs is a complex process with a wide range of tooling options available, but the benefits in terms of cost, accuracy, and latency are substantial. The fact is that many GenAI projects being implemented right now are struggling to scale across the enterprise, and many are already failing to meet their goals for return on investment. The key to getting past those hurdles, and to ensuring that a project delivers the right benefits for the right reasons, is taking a more focused, domain-specific approach to LLMs, one that addresses the complexities of tooling, data management, and compliance requirements.
To learn more, visit us here.
About the authors:
Anand Logani is chief digital officer, Arturo Devesa is vice president and chief AI architect, and Shubham Jain is vice president, data and analytics, UK and Europe, at EXL, a leading data analytics and digital operations and solutions company.