@w0rd-driven
Forked from hamelsmu/fine-tuning.md
Created March 1, 2025 18:35
From OpenAI Deep Research, in response to https://x.com/simonw/status/1895301139819860202

Success Stories of Fine-Tuning LLMs Across Industries

Below is a summary of diverse use cases where companies fine-tuned large language models (LLMs) to solve business challenges that previous methods struggled with. Each case highlights the challenge, the fine-tuning approach, and the key results achieved.

Summary of Fine-Tuning Success Cases

| Use Case | Key Results | Source |
| --- | --- | --- |
| Wealth Management Assistant (Finance) | 98% advisor adoption; document access up from 20% to 80% | OpenAI & Morgan Stanley |
| Insurance Claims AI (Insurance) | ~30% accuracy improvement vs. generic LLMs | Insurance News (EXL) |
| Review Summarization at Scale (Retail) | 100K reviews distilled into 5K summaries in months (vs. ~11 years manually), with an 80% approval rate | Microsoft (CarMax) |
| Radiology & Treatment Planning (Healthcare) | Outperformed general LLMs (higher ROUGE, domain-specific accuracy) | Mayo Clinic (RadOnc-GPT) |

Finance: Fine-Tuned Wealth Management Assistant (Morgan Stanley)

Morgan Stanley fine-tuned GPT-4 to create an internal AI assistant that helps financial advisors quickly retrieve information from a vast knowledge base (Shaping the future of financial services | OpenAI).

Challenge: Morgan Stanley’s wealth management division manages a huge repository of research and guidance for advisors. Traditional search tools had low efficiency – advisors accessed only ~20% of relevant documents, struggling to find answers quickly. This meant wasted time and missed opportunities to assist clients.

Fine-Tuning Approach: In partnership with OpenAI, Morgan Stanley deployed a GPT-4-powered assistant fine-tuned (and retrieval-augmented) on the firm’s internal content. The LLM was trained exclusively on thousands of proprietary research reports, investment insights, and operational manuals to ground its answers in Morgan Stanley’s knowledge base. Fine-tuning aligned the model with the company’s compliance and domain-specific terminology.
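The "fine-tuned and retrieval-augmented" pattern described above can be sketched in a few lines. This is a toy illustration only: the corpus snippets, keyword-overlap scoring, and prompt template are assumptions for demonstration, not Morgan Stanley's actual pipeline (which uses embeddings-based retrieval over proprietary content).

```python
# Toy sketch of retrieval-augmented prompting: score documents by query-word
# overlap, then ground the model's answer in the top matches. All data here
# is invented for illustration.

def score(query: str, doc: str) -> int:
    """Count query words that appear in the document (toy relevance score)."""
    doc_words = set(doc.lower().split())
    return sum(1 for w in query.lower().split() if w in doc_words)

def build_grounded_prompt(query: str, corpus: list[str], k: int = 2) -> str:
    """Retrieve the top-k documents and ask the model to answer only from them."""
    top = sorted(corpus, key=lambda d: score(query, d), reverse=True)[:k]
    context = "\n---\n".join(top)
    return (
        "Answer using ONLY the research excerpts below.\n\n"
        f"{context}\n\nQuestion: {query}\nAnswer:"
    )

corpus = [
    "Municipal bonds offer tax-exempt income for high-bracket clients.",
    "Equity research: large-cap tech earnings beat expectations.",
    "Operational manual: account transfers require form 407.",
]
prompt = build_grounded_prompt("tax exempt bonds income", corpus)
```

In a production system the prompt would then be sent to the fine-tuned model; grounding the answer in retrieved excerpts is what keeps responses tied to the firm's own documents.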

Results: The AI Assistant achieved over 98% adoption by advisor teams, a dramatic uptake in usage. Advisors can now retrieve information in seconds, with document access jumping from 20% to 80% of relevant content (Shaping the future of financial services | OpenAI). This fine-tuned LLM succeeded where previous search methods failed: it dramatically reduced search time and allowed advisors to spend more time with clients instead of digging through manuals. By delivering precise, context-aware answers, the assistant improved productivity and unlocked new use cases firmwide (now being scaled to other divisions). The success validates that a domain-tuned LLM can transform knowledge management in finance when earlier tools provided limited help.

Insurance: Specialized Claims Processing LLM (EXL)

Challenge: In the insurance sector, off-the-shelf general language models struggled with industry-specific jargon and processes. Claims handling involves nuanced tasks (e.g. coverage interpretation, anomaly detection) that generic AI often misinterpreted, leading to errors, compliance risks, and slow settlement cycles. Insurance companies had inefficiencies and “claims leakage” because previous AI methods couldn’t fully understand policy language or complex medical notes.

Fine-Tuning Approach: EXL, an insurance analytics firm, developed a fine-tuned insurance LLM using a large proprietary dataset of claims, policies, and underwriting documents. They leveraged NVIDIA’s generative AI platform to update a base model’s weights with insurance-specific terminology and workflows. The fine-tuning incorporated thousands of Q&A pairs, tagged clauses, and domain knowledge so the model could handle tasks like claims reconciliation and coverage Q&A with expert-level accuracy.
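Converting domain Q&A pairs into training records is the common first step in this kind of fine-tune. The sketch below shows one example serialized in the chat-style JSONL format used by several fine-tuning APIs; the policy numbers, field names, and answer text are invented, since EXL's actual schema is not public.

```python
import json

# Illustrative sketch: turn claims Q&A pairs into chat-format JSONL
# fine-tuning records. All content here is invented example data.
qa_pairs = [
    {
        "question": "Does policy P-1102 cover water damage from a burst pipe?",
        "answer": "Yes. Sudden accidental discharge is covered under Section II; "
                  "gradual seepage is excluded.",
    },
]

def to_training_record(pair: dict) -> str:
    """Serialize one Q&A pair as a single JSONL fine-tuning example."""
    record = {
        "messages": [
            {"role": "system", "content": "You are an insurance claims assistant."},
            {"role": "user", "content": pair["question"]},
            {"role": "assistant", "content": pair["answer"]},
        ]
    }
    return json.dumps(record)

jsonl = "\n".join(to_training_record(p) for p in qa_pairs)
```

Thousands of such records, plus tagged policy clauses, would form the training set that teaches the model insurance-specific language.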

Results: The specialized model outperforms generic LLMs by a wide margin – about 30% higher accuracy on insurance tasks compared to a general model (EXL rolls out ‘fine-tuned’ insurance LLM - Insurance News - insuranceNEWS.com.au). It can aggregate and interpret hundreds of thousands of claims and medical records, automatically flagging anomalies and summarizing chronologies. This led to faster, more accurate claim decisions and even “real-time conversations” with customers about their claims status. Importantly, the fine-tuned LLM maintained compliance with strict insurance regulations while doing so. By focusing exclusively on insurance data, EXL’s model succeeded where previous methods failed – generic AI and rule-based systems couldn’t parse the subtleties of claims, but the fine-tuned LLM drastically reduced processing time and errors in this highly specialized domain.

Retail: AI-Powered Review Summarization at CarMax

CarMax used a fine-tuned GPT-3 model to summarize 100,000 customer reviews into concise insights, greatly accelerating content creation for their website (CarMax takes gen AI for spin).

Challenge: CarMax, the largest used-car retailer in the U.S., faced a content scalability problem. To help buyers, CarMax wanted to display summaries of customer reviews for each car model – but there were over 100,000 reviews across thousands of models. Manually writing these summaries would have taken an estimated 11 years of effort. Traditional text mining techniques weren’t up to the task of producing readable, high-quality summaries, and leaving the reviews unsummarized meant customers might be overwhelmed by raw feedback.

Fine-Tuning Approach: Using OpenAI’s API via Azure, CarMax fine-tuned GPT-3 on their extensive corpus of vehicle reviews. The model was trained to produce a short paragraph highlighting common sentiments and pros/cons for each car model year. CarMax’s team refined the model outputs with a bit of additional tuning and human-in-the-loop review to align with the company’s tone. Essentially, the LLM “learned” the language of car shoppers and what aspects mattered (e.g. handling, design, mileage).
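GPT-3-era fine-tuning used a JSONL file of prompt/completion pairs. The sketch below prepares one such training example from a handful of reviews; the reviews and target summary are invented, and the real CarMax dataset and prompt wording are assumptions here.

```python
import json

# Sketch of preparing a legacy GPT-3 fine-tuning example (JSONL with
# "prompt" and "completion" keys). Reviews and summary are invented.
reviews = [
    "Great mileage and smooth handling, but the trunk is small.",
    "Love the design; fuel economy is excellent.",
]
target_summary = (
    "Owners praise the fuel economy, handling, and design; "
    "cargo space draws some complaints."
)

def make_example(reviews: list[str], summary: str) -> str:
    """Build one prompt/completion training record as a JSONL line."""
    prompt = (
        "Summarize these reviews:\n"
        + "\n".join(f"- {r}" for r in reviews)
        + "\n\nSummary:"
    )
    # Legacy GPT-3 convention: completion starts with a space and ends
    # with a stop token (here, a newline).
    return json.dumps({"prompt": prompt, "completion": " " + summary + "\n"})

line = make_example(reviews, target_summary)
```

Each car model year would contribute one record like this, with the human-approved summary as the completion.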

Results: The AI solution generated 5,000 review summaries in just a few months, a task that was previously infeasible by manual methods. Quality exceeded expectations – after a little fine-tuning, 80% of the AI-generated summaries were approved by editors without major changes. The summaries were deployed on CarMax’s site, providing shoppers with instant insights and improving SEO rankings by adding fresh content (CarMax puts customers first with car research tools powered by Azure OpenAI Service | Microsoft Customer Stories). Fine-tuning succeeded where earlier attempts failed: it maintained CarMax’s brand voice and accuracy at scale, something that basic templates or generic models could not achieve. This not only saved massive labor hours but also enhanced customer experience by condensing crowd wisdom into readable advice.

Healthcare: Domain-Tuned LLM for Radiology and Treatment Planning

Medical institutions fine-tune LLMs on specialized data (like patient records and radiology reports) to assist in diagnoses and treatment planning with greater accuracy (Case Studies of Successful LLM Fine-Tuning in Healthcare - IT Supply Chain).

Challenge: Healthcare data is complex and jargon-heavy. Generic LLMs like GPT-4, while knowledgeable, fell short in specialized fields like radiation oncology. For example, creating a treatment plan for a cancer patient requires interpreting medical imaging reports, lab results, and clinical notes with absolute precision. Traditional expert systems or out-of-the-box models weren’t reliable enough – they might miss subtle indicators or suggest unsafe treatments, as they weren’t tuned to the intricacies of this medical domain.

Fine-Tuning Approach: Researchers at Mayo Clinic developed RadOnc-GPT, a large language model fine-tuned specifically for radiation oncology. They started with Meta’s LLaMA-2 model and instruction-tuned it on a trove of de-identified radiation therapy records and oncology guidelines. The fine-tuning dataset included tasks such as generating detailed radiotherapy regimens, choosing optimal radiation modalities, and assigning diagnostic codes from patient cases. By training on this domain-specific data (while respecting privacy), the model learned the terminology and cause-effect patterns (e.g. how a certain tumor type influences dose planning) far better than a general model.
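Instruction-tuning a LLaMA-2 base model typically means rendering each example into LLaMA-2's `[INST]` chat template before training. The sketch below shows that formatting step; the clinical text is invented placeholder content (real training data would be de-identified records), and the exact template RadOnc-GPT used is an assumption.

```python
# Sketch of the LLaMA-2 chat template used for instruction tuning.
# The system/instruction/response strings are invented placeholders.

def llama2_example(system: str, instruction: str, response: str) -> str:
    """Render one training example in the LLaMA-2 [INST] template."""
    return (
        f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n"
        f"{instruction} [/INST] {response} </s>"
    )

text = llama2_example(
    "You are a radiation oncology planning assistant.",
    "Suggest a radiotherapy regimen for the de-identified case described.",
    "A standard external-beam course would be one documented option here.",
)
```

Thousands of such rendered examples, drawn from treatment records and guidelines, would then be fed to a supervised fine-tuning loop.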

Results: The specialized model demonstrated significantly higher accuracy (measured by ROUGE score) on oncology tasks compared to baseline GPT-4 outputs. For instance, RadOnc-GPT’s treatment recommendations aligned more closely with expert plans, capturing critical details that generic LLM answers missed. This shows that fine-tuning gave the model the clarity and specificity needed for clinical use. While still under evaluation for safety, the fine-tuned LLM shows promise in reducing the time doctors spend drafting treatment plans and in checking that no relevant factor is overlooked. It succeeded where previous methods failed by handling niche medical knowledge: earlier AI tools and non-specialized LLMs either ignored up-to-date research or produced too many errors for practical use. With fine-tuning, healthcare LLMs can now support diagnoses, early risk detection (e.g. flagging high-risk sepsis patients from health records), and even patient Q&A, with accuracy that meaningfully augments physician decision-making.
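For readers unfamiliar with the metric cited above, ROUGE-1 measures unigram overlap between a generated text and a reference. The sketch below is a minimal illustration; real evaluations (including, presumably, the RadOnc-GPT study) use the full `rouge-score` package with stemming and multiple ROUGE variants.

```python
from collections import Counter

# Minimal ROUGE-1 F1: unigram overlap between candidate and reference.
# Illustrative only; production use should rely on the rouge-score package.

def rouge1_f1(candidate: str, reference: str) -> float:
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # shared unigrams, counted once per min occurrence
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

score = rouge1_f1(
    "deliver 60 gy in 30 fractions",
    "deliver 60 gy in 30 fractions daily",
)
```

A higher ROUGE score means the model's output shares more wording with the expert-written reference, which is why it serves as a rough proxy for alignment with expert plans.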
