Case Study

Cutting Fitch Ratings' LLM Costs by 66% with a Rebuilt RAG Pipeline

Roko Labs cut token usage and operational costs by 66% on a RAG-based enterprise AI system at Fitch Ratings. The work targeted document chunking, retrieval precision, and prompt construction. Output quality held.

Client

Fitch Ratings

Industry

Financial Services

Services

AI Services
AI Due Diligence

Project Duration

<1 month

The Challenge

Fitch used a Retrieval-Augmented Generation (RAG) system to generate insights from a large corpus of financial articles and documents. The enterprise AI system worked, but high LLM token usage and rapidly rising costs made it inefficient to operate at scale.

A closer technical evaluation revealed that the issue was not the underlying language model, but how data was being processed, retrieved, and passed into it. The RAG pipeline relied on inefficient document chunking and limited retrieval precision, resulting in large volumes of irrelevant or low‑value context being included in each query.

This led to several downstream issues:

• Excessive token usage caused by oversized prompts
• Retrieval of low-relevance content that reduced output quality
• Rising operational costs without meaningful improvements in results
• An inefficient RAG pipeline design that did not prioritize retrieval precision

As a result, the system became more expensive to run and less effective for a production-grade enterprise AI environment.

One key insight from the engagement sums it up: “If your retrieval is wrong, everything after it becomes expensive.”

Our Vision

The goal of the engagement was to demonstrate that we could reduce LLM operating costs and improve RAG efficiency without compromising output quality. A technical evaluation confirmed that inefficient retrieval was the primary cost driver, as unnecessary context was consistently being passed into the model.

To solve this, the approach focused on optimizing how documents were segmented, retrieved, and assembled within the RAG pipeline, placing a strong emphasis on precision in retrieval.

The document chunking strategy was redesigned to ensure content was broken into smaller, semantically coherent chunks, improving retrieval accuracy and relevance. Retrieval logic was then refined to prioritize high-quality, contextually relevant matches over volume, ensuring only the most relevant information was selected for each query.
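The case study does not publish the chunking code, so the following is only a minimal sketch of what "smaller, semantically coherent chunks" can look like. It assumes that paragraph and sentence boundaries are a reasonable proxy for semantic coherence and that whitespace-separated words approximate tokens. The function name `chunk_text` and the `max_words` parameter are hypothetical and do not come from the engagement.

```python
import re


def chunk_text(text: str, max_words: int = 120) -> list[str]:
    """Split text into chunks that never cross a paragraph boundary
    and only break between sentences.

    Assumption: word count stands in for a real tokenizer's token count.
    A single sentence longer than max_words is kept whole.
    """
    chunks: list[str] = []
    # Paragraphs are separated by one or more blank lines.
    for paragraph in re.split(r"\n\s*\n", text.strip()):
        # Naive sentence split: break after ., ! or ? followed by whitespace.
        sentences = re.split(r"(?<=[.!?])\s+", paragraph.strip())
        current: list[str] = []
        count = 0
        for sentence in sentences:
            if not sentence:
                continue
            n = len(sentence.split())
            # Close the current chunk before it would exceed the budget.
            if current and count + n > max_words:
                chunks.append(" ".join(current))
                current, count = [], 0
            current.append(sentence)
            count += n
        if current:
            chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    sample = "Rates rose in Q2. Issuers delayed new debt.\n\nOutlook remains stable."
    for chunk in chunk_text(sample, max_words=5):
        print(repr(chunk))
```

Keeping each chunk inside a single paragraph trades some packing efficiency for chunks that stay on one topic, which is what makes them easier to retrieve precisely.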

In parallel, prompt construction was optimized to eliminate redundant, low-value content. This further reduced token usage while maintaining consistent output quality. Together, these improvements aligned the system with modern enterprise RAG best practices, ensuring the LLM received focused, high-signal inputs instead of large, unfocused datasets.
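As an illustration of the prompt construction step, here is a hedged sketch that drops duplicate chunks and enforces a context budget before the prompt is assembled. It assumes the chunks arrive ordered from most to least relevant and that word count approximates tokens. The function `build_prompt`, the `max_context_words` parameter, and the prompt wording are all hypothetical, not the template used in the engagement.

```python
def build_prompt(question: str, chunks: list[str], max_context_words: int = 200) -> str:
    """Assemble a prompt from retrieved chunks without redundant context.

    Assumptions: chunks are ordered most relevant first, and word count
    is a rough stand-in for token count.
    """
    seen: set[str] = set()
    selected: list[str] = []
    used = 0
    for chunk in chunks:
        # Normalize case and whitespace so near-identical copies are caught.
        key = " ".join(chunk.lower().split())
        if key in seen:
            continue
        n = len(chunk.split())
        # Stop at the first chunk that would exceed the budget, so lower
        # ranked chunks never displace higher ranked ones.
        if used + n > max_context_words:
            break
        seen.add(key)
        selected.append(chunk.strip())
        used += n
    context = "\n\n".join(selected)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

The budget caps prompt size regardless of how many chunks retrieval returns, which makes token usage per call predictable.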

By restructuring the flow of data through the RAG pipeline, the system became more efficient, more cost-effective, and better aligned with scalable enterprise AI design principles.

Solution

We implemented targeted optimizations across the RAG pipeline, including:

• Redesigned document chunking strategy to improve semantic relevance
• Refined retrieval logic to increase retrieval precision
• Reduced unnecessary context passed to the LLM
• Optimized prompt construction for more efficient model usage

These changes improved how information was selected, structured, and delivered to the model, reducing waste while maintaining high-quality outputs.
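The retrieval refinement can be sketched as a relevance threshold combined with a small top-k, so that weak matches are dropped rather than padded into the prompt. This is an illustrative sketch only: it uses bag-of-words cosine similarity as a stand-in for whatever embedding model and vector store the real system used, and the names `retrieve`, `top_k`, and `min_score` are hypothetical.

```python
import math
import re
from collections import Counter


def _vector(text: str) -> Counter:
    """Bag-of-words term counts (a stand-in for a real embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(
    query: str, chunks: list[str], top_k: int = 3, min_score: float = 0.2
) -> list[tuple[float, str]]:
    """Return at most top_k chunks whose score meets min_score, best first."""
    q = _vector(query)
    scored = [(cosine(q, _vector(chunk)), chunk) for chunk in chunks]
    # Precision over volume: discard weak matches instead of filling top_k.
    kept = [pair for pair in scored if pair[0] >= min_score]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept[:top_k]
```

With a threshold in place, a query with only one strong match sends one chunk to the model rather than a fixed number of increasingly irrelevant ones.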

[Figure: RAG pipeline before and after optimization, highlighting retrieval precision, semantic chunks, and efficient pipeline design that reduces LLM token usage]

Results: Significant Cost Reduction and Improved RAG Efficiency

The optimized RAG system cut LLM token usage by 66%. Operational costs dropped with it. Output quality held.

Retrieval precision and reduced context per call drove the improvement, with no changes to the model itself. The new chunking and retrieval strategy feeds the LLM more relevant inputs, producing more consistent results across queries.
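To show why a token reduction carries through to cost when the model stays the same, here is a small arithmetic sketch. It assumes purely per-token billing at a single flat rate and a constant call volume. Every number in the usage section is hypothetical and chosen only to illustrate a 66% reduction; none of them are Fitch Ratings figures.

```python
def monthly_llm_cost(
    tokens_per_call: int, calls_per_month: int, usd_per_1k_tokens: float
) -> float:
    """Monthly spend under flat per-token pricing (an assumed billing model)."""
    return tokens_per_call / 1000 * calls_per_month * usd_per_1k_tokens


def relative_reduction(before: float, after: float) -> float:
    """Fractional reduction from before to after, e.g. 0.5 for a 50% cut."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    before = monthly_llm_cost(6000, 100_000, 0.01)
    after = monthly_llm_cost(2040, 100_000, 0.01)
    print(f"cost reduction: {relative_reduction(before, after):.0%}")
```

Because price and call volume cancel out of the ratio, the cost reduction equals the token reduction under these assumptions.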

The optimized system delivered:

• 66% reduction in token usage and LLM costs
• Improved efficiency across the RAG pipeline
• Greater precision in the retrieved context
• Better alignment between retrieval and generation
• Improved scalability for enterprise AI workloads

Retrieval, chunking, and prompt construction are where RAG cost lives. The model is rarely the problem.

1250 Broadway, 36th Floor, New York, NY, 10001
