
import React from 'react'
import { mdx } from '@mdx-js/react'

/* @jsxRuntime classic */
/* @jsx mdx */
import { RyuImage, RyuFlex } from '@ramp/ryu'
import ryne_photo from './ryne_photo.jpg'
import rag_banner from './rag_banner.jpg'
import wizehire_result from './wizehire_result.jpg'
import old_vs_new_system_same_naics from './old_vs_new_system_same_naics.png'
import old_vs_new_system_diff_naics from './old_vs_new_system_diff_naics.png'
import naics_system_design from './naics_system_design.png'
import embedding_performance from './embedding_performance.png'
export const meta = {
  date: '2025-01-15T17:00:00.000Z',
  title: 'From RAG to Richness: How Ramp Revamped Industry Classification',
  description: 'How Ramp used Retrieval-Augmented Generation (RAG) to build a state-of-the-art in-house industry classification model.',
  authors: [{
    name: 'Ryne Carbone',
    website: 'https://runthedata.dev',
    twitter: '',
    position: 'Staff Machine Learning Engineer',
    bio: 'Ryne is a Machine Learning Engineer at Ramp where he works on Risk and ML Infrastructure.',
    photo: ryne_photo
  }]
};

const layoutProps = {
  meta
};
const MDXLayout = "wrapper"
export default function MDXContent({
  components,
  ...props
}) {
  return <MDXLayout {...layoutProps} {...props} components={components} mdxType="MDXLayout">



    <hr></hr>
    <RyuImage src={rag_banner} alt="Revamping Industry Classification with RAG" mdxType="RyuImage" />
    <hr></hr>
    <p><em parentName="p">{`For data professionals and decision-makers alike, classifying customers is important and challenging.
At Ramp, industry classification used to rely on homegrown taxonomies patched together with translation layers,
resulting in multiple sources of truth that were not auditable. Below, we show how
migrating to a standardized system, powered by an in-house Retrieval-Augmented Generation (RAG) model,
simplified workflows, improved data quality, and unlocked performance gains.`}</em></p>
    <hr></hr>
    <h2>{`The Industry Classification Challenge`}</h2>
    <p>{`Ramp's mission is to save our customers time and money.
A precise understanding of a customer's industry is vital to serving them well through many cross-cutting initiatives:
from compliance, to portfolio monitoring, to sales targeting and product analytics.
Industry classification, however, is challenging for a number of reasons.
Industry boundaries are fuzzy, and the lack of a ground truth makes it hard to evaluate predictions.
What's more, the data used to generate predictions can be sparse and have a non-uniform distribution.
In this article, we'll discuss how we built an in-house industry classification model using
`}<a parentName="p" {...{
        "href": "https://en.wikipedia.org/wiki/Retrieval-augmented_generation"
      }}>{`Retrieval-Augmented Generation`}</a>{` (RAG)
and improved our understanding of our customers.`}</p>
    <h2>{`Industry Taxonomies`}</h2>
    <p>{`Having a consistent and accurate industry classification system is crucial. For example,
if the Risk team and Sales team use separate taxonomies, they cannot have quick feedback loops on targeting
and segmentation. Likewise, any communication with an external partner will require a translation
to their preferred industry mappings.`}</p>
    <h3>{`Standard Systems`}</h3>
    <p>{`There are two standard taxonomies for industry classification in the US:`}</p>
    <ul>
      <li parentName="ul">{`(Newer) `}<a parentName="li" {...{
          "href": "https://www.census.gov/naics/"
        }}>{`North American Industry Classification System`}</a>{` (NAICS)`}</li>
      <li parentName="ul">{`(Older) `}<a parentName="li" {...{
          "href": "https://www.sec.gov/search-filings/standard-industrial-classification-sic-code-list"
        }}>{`Standard Industrial Classification`}</a>{` (SIC)`}</li>
    </ul>
    <p>{`Four-digit SIC codes were developed by the US government in 1937 and later replaced by six-digit NAICS codes in the 1990s.
Both systems are hierarchical: truncating a code to its leading digits yields a broader classification.
They attempt to group industries by similar production processes. While a business can have more than
one applicable code, the line of business that generates the most income is generally chosen as its primary code.`}</p>
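    <p>{`Because the hierarchy is encoded in the digits themselves, rolling a NAICS code up to a broader category is plain string truncation. The minimal Python sketch below assumes six-digit codes stored as strings; the level names follow the published NAICS structure, while the function name `}<inlineCode parentName="p">{`roll_up`}</inlineCode>{` is hypothetical.`}</p>
    <pre><code parentName="pre" {...{
        "className": "language-python"
      }}>{`# Standard NAICS levels, keyed by the number of leading digits that define them.
NAICS_LEVELS = {
    2: "sector",
    3: "subsector",
    4: "industry_group",
    5: "naics_industry",
    6: "national_industry",
}


def roll_up(code: str) -> dict[str, str]:
    """Return every ancestor of a six-digit NAICS code by truncating it.

    Caveat: a few sectors span several two-digit prefixes (Manufacturing is
    31-33, for example), so the two-digit prefix is not always the sector's
    official label.
    """
    if len(code) != 6 or not code.isdigit():
        raise ValueError(f"expected a six-digit NAICS code, got {code!r}")
    return {name: code[:length] for length, name in NAICS_LEVELS.items()}


print(roll_up("561311"))
# {'sector': '56', 'subsector': '561', 'industry_group': '5613', 'naics_industry': '56131', 'national_industry': '561311'}
`}</code></pre>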
    <h3>{`Ramp's (Old) System`}</h3>
    <p>{`In the past, Ramp mainly used a third, homegrown, non-standard industry classification system.
Businesses were classified using a stitched-together web of third-party data, Sales-entered data, and customer self-reporting.
The Homegrown system had four common issues:`}</p>
    <ul>
      <li parentName="ul">{`Categories were sometimes plainly incorrect`}</li>
      <li parentName="ul">{`Categories could be so generic that they were unhelpful`}</li>
      <li parentName="ul">{`Similar businesses could be assigned different categories`}</li>
      <li parentName="ul">{`The system was not auditable or interpretable`}</li>
    </ul>
    <p>{`Consider an actual Ramp customer: `}<a parentName="p" {...{
        "href": "https://ramp.com/customers/wizehire-case-study"
      }}>{`WizeHire`}</a>{`, a hiring platform
that helps small businesses grow. In the Homegrown system, WizeHire was classified as "Professional Services". This category
is overly broad and can capture a wide spectrum of businesses like law firms, dating apps, and consulting firms.
For Ramp's Sales and Marketing teams, this made it hard to understand what businesses like WizeHire need
and how best to serve them. For Ramp's Risk team, this made it difficult to profile credit risk and satisfy compliance
requirements in this segment.`}</p>
    <p>{`Additionally, some teams would convert these Homegrown industry labels to SIC codes, while others would directly use NAICS codes.
To go from the Homegrown system to NAICS or SIC codes, we had to apply many-to-many mappings
from 100+ internal levels to thousands of codes. It was not unusual for a single internal industry level
to map to 50 potential NAICS codes.`}</p>
    <h3>{`Solution`}</h3>
    <p>{`We tamed this complexity by migrating all Ramp industry classification to NAICS codes. This allows internal
teams to have a consistent, expressive taxonomy while also enabling easier communication with external partners who were already
using NAICS codes.`}</p>
    <p>{`Revisiting the previous example, within NAICS, WizeHire is classified as "561311 - Employment Placement Agencies".
Furthermore, because NAICS codes are hierarchical, we can extract more general categories from this code. The full hierarchy
is displayed below:`}</p>
    <hr></hr>
    <RyuImage src={wizehire_result} alt="NAICS Hierarchy" width={"100%"} mdxType="RyuImage" />
    <hr></hr>
    <p>{`The combination of precise and pertinent labels with the ability to roll up into more general categories
gives teams at Ramp the flexibility to decide which level of granularity is best for each use case.
To enable the migration to NAICS, we needed a classification model that could predict six-digit NAICS codes
for all Ramp businesses. Third-party solutions are a quick way to get good general performance; however,
Ramp has unique needs and complex data, so we decided to build an in-house model.`}</p>
    <h2>{`Building A Classification Model with RAG`}</h2>
    <p>{`We chose a RAG system as our in-house industry classification model. RAG systems have three main stages:`}</p>
    <ol>
      <li parentName="ol">{`Calculate text embeddings of a query and a knowledge base`}</li>
      <li parentName="ol">{`Calculate similarity scores to generate recommendations from the knowledge base`}</li>
      <li parentName="ol">{`Use an LLM to make a final prediction from the filtered recommendations`}</li>
    </ol>
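    <p>{`Stages 1 and 2 amount to a nearest-neighbor search over embedding vectors. Below is a minimal Python sketch that assumes cosine similarity as the score (a common choice, assumed here rather than confirmed). The toy three-dimensional vectors stand in for the output of a real embedding model, stage 3 (the LLM call) is omitted, and all names are illustrative.`}</p>
    <pre><code parentName="pre" {...{
        "className": "language-python"
      }}>{`import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend(query_vec: list[float], kb_vecs: dict[str, list[float]], k: int) -> list[str]:
    """Stage 2: return the k knowledge-base codes most similar to the query embedding."""
    ranked = sorted(
        kb_vecs,
        key=lambda code: cosine_similarity(query_vec, kb_vecs[code]),
        reverse=True,
    )
    return ranked[:k]


# Stage 1 stand-in: toy 3-dimensional vectors instead of a real embedding model.
kb = {
    "561311": [0.9, 0.1, 0.0],  # Employment Placement Agencies
    "541110": [0.1, 0.9, 0.1],  # Offices of Lawyers
    "722511": [0.0, 0.1, 0.9],  # Full-Service Restaurants
}
print(recommend([0.8, 0.2, 0.1], kb, k=2))  # ['561311', '541110']
`}</code></pre>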
    <p>{`One of the main benefits of using a RAG system is the ability to constrain the output of an LLM
to the domain of a knowledge base (in our case, valid NAICS codes). Instead of asking an open-ended question,
we give the LLM a multiple-choice question.`}</p>
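    <p>{`Concretely, the retrieved candidates become the answer options. Here is one hypothetical way to phrase such a prompt in Python; the wording and the function name are assumptions for illustration, not a production prompt.`}</p>
    <pre><code parentName="pre" {...{
        "className": "language-python"
      }}>{`def build_multiple_choice_prompt(business: str, candidates: dict[str, str]) -> str:
    """Present retrieved NAICS codes as a closed set of answer options."""
    options = "; ".join(f"{code} ({title})" for code, title in candidates.items())
    return (
        f"Business description: {business}. "
        f"Choose exactly one NAICS code from these options: {options}. "
        "Answer with the six-digit code only."
    )


prompt = build_multiple_choice_prompt(
    "Hiring platform that helps small businesses grow",
    {
        "561311": "Employment Placement Agencies",
        "561320": "Temporary Help Services",
        "541612": "Human Resources Consulting Services",
    },
)
print(prompt)
`}</code></pre>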
    <h3>{`Metrics`}</h3>
    <p>{`When developing any machine learning model, it is important to identify a set of relevant metrics to
evaluate performance. Because we are working with a multi-stage system, we chose to
break the problem down into two components and identified metrics for each stage. We took care
to ensure that the metrics for each stage didn't interfere with each other and were aligned with
the overall goal of the system.`}</p>
    <p>{`The first stage was to generate curated recommendations from the knowledge base.
We chose accuracy at k (`}<inlineCode parentName="p">{`acc@k`}</inlineCode>{`) as the primary metric for this stage: how often is the correct NAICS code in
the top `}<inlineCode parentName="p">{`k`}</inlineCode>{` recommendations? This is a sensible metric because it represents a ceiling on the performance of the
full system. If the correct code is not in the top `}<inlineCode parentName="p">{`k`}</inlineCode>{` recommendations, the LLM will not be able to select it.`}</p>
    <p>{`The second stage was to select a final prediction from the recommendations. We chose to define a custom fuzzy-accuracy
metric. Because NAICS codes are hierarchical, we want to make sure that predictions that are correct for part of the hierarchy
are scored better than predictions that are completely wrong. For example, if the correct code is `}<inlineCode parentName="p">{`123456`}</inlineCode>{`,
a prediction of `}<inlineCode parentName="p">{`123499`}</inlineCode>{` should be scored better than `}<inlineCode parentName="p">{`999999`}</inlineCode>{` because the first four digits are correct.`}</p>
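    <p>{`One simple version of such a metric, assumed here for illustration, awards credit for the fraction of leading digits that match before the first mismatch. Under this rule the example above scores 4/6 for `}<inlineCode parentName="p">{`123499`}</inlineCode>{` and 0 for `}<inlineCode parentName="p">{`999999`}</inlineCode>{`. The function names are hypothetical.`}</p>
    <pre><code parentName="pre" {...{
        "className": "language-python"
      }}>{`def fuzzy_score(true_code: str, predicted: str) -> float:
    """Share of leading digits that match, stopping at the first mismatch.

    Stopping early means credit is earned only for shared ancestors in the
    hierarchy, not for coincidental matches in later digits.
    """
    matched = 0
    for t, p in zip(true_code, predicted):
        if t != p:
            break
        matched += 1
    return matched / len(true_code)


def fuzzy_accuracy(labels: list[str], predictions: list[str]) -> float:
    """Mean fuzzy score over a labeled evaluation set."""
    return sum(fuzzy_score(t, p) for t, p in zip(labels, predictions)) / len(labels)


print(fuzzy_score("123456", "123499"))  # 0.6666666666666666
print(fuzzy_score("123456", "999999"))  # 0.0
`}</code></pre>
    <p>{`A variant could weight the NAICS levels (two through six digits) rather than individual digits; the right weighting depends on how much partial credit each level deserves for a given use case.`}</p>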
    <h3>{`Generating Recommendations`}</h3>
    <p>{`Generating recommendations involves identifying the most relevant items from the knowledge base (NAICS codes) given a
query (business). This part of the system has a variety of hyperparameters to choose:`}</p>
    <ul>
      <li parentName="ul">{`Knowledge base field to embed`}</li>
      <li parentName="ul">{`Query field to embed`}</li>
      <li parentName="ul">{`Embedding model`}</li>
      <li parentName="ul">{`Number of recommendations`}</li>
    </ul>
    <p>{`For each parameter there are tradeoffs to consider. For example, certain business attributes may be more informative than
others but may have higher missing rates. Additionally, different embedding models have different resource requirements
that don't necessarily correlate with performance on the specific data we have.`}</p>
    <p>{`In the end, we profiled the performance of different configurations and created `}<inlineCode parentName="p">{`acc@k`}</inlineCode>{` curves. Note that we can't
determine the optimal number of recommendations to generate without considering the downstream LLM performance. If we
naively optimize for `}<inlineCode parentName="p">{`acc@k`}</inlineCode>{`, we would end up with a system that recommends the whole knowledge base, since recommending
every possible label guarantees that the correct one is present.`}</p>
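    <p>{`The degenerate case is easy to see on a toy example: computing acc@k for every k from full rankings of a four-code knowledge base, the curve necessarily reaches 1.0 at k = 4. The helper that picks the smallest k meeting a target is a hypothetical selection rule; in practice, k also has to be weighed against downstream LLM performance.`}</p>
    <pre><code parentName="pre" {...{
        "className": "language-python"
      }}>{`from typing import Optional


def acc_at_k_curve(ranked: list[list[str]], labels: list[str], max_k: int) -> list[float]:
    """acc@k for every k from 1 to max_k, given full rankings of the knowledge base."""
    curve = []
    for k in range(1, max_k + 1):
        hits = sum(label in ranks[:k] for ranks, label in zip(ranked, labels))
        curve.append(hits / len(labels))
    return curve


def smallest_k_reaching(curve: list[float], target: float) -> Optional[int]:
    """Hypothetical rule: the smallest k whose acc@k meets the target, else None."""
    for k, acc in enumerate(curve, start=1):
        if acc >= target:
            return k
    return None


# Toy knowledge base of four codes; every business's reference code is "A".
ranked = [
    ["A", "B", "C", "D"],
    ["B", "A", "D", "C"],
    ["C", "D", "A", "B"],
    ["D", "C", "B", "A"],
]
labels = ["A", "A", "A", "A"]
curve = acc_at_k_curve(ranked, labels, max_k=4)
print(curve)  # [0.25, 0.5, 0.75, 1.0]: recommending all four codes trivially hits 100%
print(smallest_k_reaching(curve, target=0.7))  # 3
`}</code></pre>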
    <p>{`We found that optimizations in this stage led to significant performance boosts of up to 60% in `}<inlineCode parentName="p">{`acc@k`}</inlineCode>{`. We also
identified economical embedding models that could be used in production without sacrificing performance compared to
the largest models.`}</p>
    <hr></hr>
    <RyuImage src={embedding_performance} alt="Text Embedding Performance" width={"70%"} mdxType="RyuImage" />
    <p><em parentName="p">{`We profiled performance with `}<inlineCode parentName="em">{`acc@k`}</inlineCode>{` curves. We looked for groupings with the best performance (purple and pink curves)
and selected those with the lowest resource requirements and best data coverage.`}</em></p>
    <hr></hr>
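    <p>{`The selection rule in the caption above can be written as a small tie-breaking procedure. This is a hypothetical formalization in Python: the tolerance, field names, model names, and cost units are placeholders, not measured values.`}</p>
    <pre><code parentName="pre" {...{
        "className": "language-python"
      }}>{`from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievalConfig:
    query_field: str  # business attribute that gets embedded (hypothetical names)
    model: str        # embedding model identifier (placeholder)
    acc_at_k: float   # offline acc@k at the chosen k
    coverage: float   # share of businesses where query_field is present
    cost: float       # relative resource cost of the embedding model


def pick_config(configs: list[RetrievalConfig], tolerance: float = 0.01) -> RetrievalConfig:
    """Among configs within tolerance of the best acc@k, prefer low cost, then high coverage."""
    best = max(c.acc_at_k for c in configs)
    contenders = [c for c in configs if c.acc_at_k >= best - tolerance]
    return min(contenders, key=lambda c: (c.cost, -c.coverage))


configs = [
    RetrievalConfig("website_text", "large-model", 0.91, 0.70, 10.0),
    RetrievalConfig("description", "large-model", 0.91, 0.95, 10.0),
    RetrievalConfig("description", "small-model", 0.905, 0.95, 1.0),
    RetrievalConfig("name", "small-model", 0.62, 1.00, 1.0),
]
print(pick_config(configs))  # the cheap "description" + "small-model" config wins
`}</code></pre>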
    <h3>{`Selecting Predictions`}</h3>
    <p>{`The second stage of the RAG system involves selecting a final prediction from the recommendations using an LLM. This part
also has a variety of hyperparameters to choose:`}</p>
    <ul>
      <li parentName="ul">{`Number of recommendations`}</li>
      <li parentName="ul">{`Fields to include in the prompt (business and knowledge base)`}</li>
      <li parentName="ul">{`Prompt variations`}</li>
      <li parentName="ul">{`Number of prompts`}</li>
      <li parentName="ul">{`Structured output class and fields`}</li>
    </ul>
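    <p>{`For the structured output, the LLM is asked to reply in a fixed schema that code can parse and check. Below is a minimal Python sketch using a dataclass and JSON; the class and field names are hypothetical, and the justification field keeps each prediction auditable.`}</p>
    <pre><code parentName="pre" {...{
        "className": "language-python"
      }}>{`import json
from dataclasses import dataclass


@dataclass
class NaicsPrediction:
    """Hypothetical structured-output schema for the LLM's final answer."""
    naics_code: str     # six-digit code chosen by the LLM
    justification: str  # short rationale, stored so each decision can be audited


def parse_prediction(raw: str) -> NaicsPrediction:
    """Parse a JSON reply into the schema, rejecting codes that are not six digits."""
    data = json.loads(raw)
    pred = NaicsPrediction(
        naics_code=str(data["naics_code"]),
        justification=str(data["justification"]),
    )
    if len(pred.naics_code) != 6 or not pred.naics_code.isdigit():
        raise ValueError(f"not a six-digit NAICS code: {pred.naics_code!r}")
    return pred


reply = '{"naics_code": "561311", "justification": "Places job candidates with employers."}'
print(parse_prediction(reply).naics_code)  # 561311
`}</code></pre>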
    <p>{`Just like the first stage, there are a number of tradeoffs to consider. For example, including more recommendations
in the prompt gives the LLM a better chance at finding the correct code, but it also increases the context size and can
lead to degraded performance if the LLM is unable to focus on the most relevant recommendations. Likewise, longer or more
descriptive information can help the LLM better understand a business or a NAICS code, but will also greatly increase the
context size.`}</p>
    <p>{`In the end, we chose a two-prompt system to get the best of both worlds. In the first prompt, we include many recommendations
but omit the most detailed descriptions, and ask the LLM to return a short list of the most relevant codes. In the
second prompt, we provide more context for each shortlisted code and ask the LLM to choose the best one. For each parameter
we searched, we found a 5% to 15% improvement in fuzzy accuracy after optimization.`}</p>
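    <p>{`The two-prompt flow can be sketched in Python with the LLM abstracted as a function from prompt text to reply text. The prompt wording, the function names, and the `}<inlineCode parentName="p">{`fake_llm`}</inlineCode>{` stub are assumptions for illustration; a real system would call a model service instead.`}</p>
    <pre><code parentName="pre" {...{
        "className": "language-python"
      }}>{`from typing import Callable

# An LLM call modeled as a function from prompt text to reply text.
LLM = Callable[[str], str]


def shortlist(llm: LLM, business: str, titles: dict[str, str], n: int) -> list[str]:
    """Prompt 1: many candidates with short titles only; ask for the n most relevant codes."""
    options = "; ".join(f"{code} ({title})" for code, title in titles.items())
    reply = llm(
        f"Business: {business}. Options: {options}. "
        f"Return the {n} most relevant codes, comma-separated."
    )
    codes = [c.strip() for c in reply.split(",") if c.strip()]
    return codes[:n]  # validity of each code is checked by a separate guardrail


def select(llm: LLM, business: str, codes: list[str], descriptions: dict[str, str]) -> str:
    """Prompt 2: a few candidates with full descriptions; ask for the single best code."""
    options = "; ".join(f"{c}: {descriptions.get(c, 'no description')}" for c in codes)
    return llm(f"Business: {business}. Options: {options}. Return the single best code.").strip()


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call, for demonstration only."""
    return "561311, 561320" if "comma-separated" in prompt else "561311"


titles = {
    "561311": "Employment Placement Agencies",
    "561320": "Temporary Help Services",
    "541612": "Human Resources Consulting Services",
}
short = shortlist(fake_llm, "Hiring platform for small businesses", titles, n=2)
print(short)  # ['561311', '561320']
print(select(fake_llm, "Hiring platform for small businesses", short, {}))  # 561311
`}</code></pre>
    <p>{`Splitting the work this way keeps the long, detailed descriptions out of the first prompt, where the candidate list is largest.`}</p>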
    <h3>{`Final Design`}</h3>
    <p>{`Piecing our findings together, we designed an online RAG system as shown in the diagram below. We have internal services
that handle embeddings for new businesses and LLM prompt evaluations. Knowledge base embeddings are pre-computed and
stored in ClickHouse for fast retrieval of recommendations using similarity scores. We log intermediate results using Kafka
so that we can diagnose pathological cases and iterate on prompts.`}</p>
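    <p>{`Logging intermediate results boils down to emitting one structured event per stage. The Python sketch below serializes events as JSON bytes and hands them to a pluggable sink; an in-memory list stands in for the Kafka topic, and the event fields are hypothetical.`}</p>
    <pre><code parentName="pre" {...{
        "className": "language-python"
      }}>{`import json
import time
from typing import Callable


def log_stage(sink: Callable[[bytes], None], business_id: str, stage: str, payload: dict) -> None:
    """Serialize one intermediate result as JSON and hand it to a sink."""
    event = {"business_id": business_id, "stage": stage, "ts": time.time(), "payload": payload}
    sink(json.dumps(event).encode("utf-8"))


events: list[bytes] = []  # in-memory stand-in for a Kafka topic
log_stage(events.append, "biz_123", "retrieval", {"candidates": ["561311", "561320"]})
log_stage(events.append, "biz_123", "prompt_1", {"shortlist": ["561311"]})
print([json.loads(e)["stage"] for e in events])  # ['retrieval', 'prompt_1']
`}</code></pre>
    <p>{`Swapping the list for a real producer changes only the sink; the event schema stays the same.`}</p>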
    <p>{`Although RAG helps constrain LLM outputs, we have also added guardrails. While hallucinations are generally undesirable,
we've also found cases where the LLM predicts the correct code despite it not being present in the recommendations.
To filter out only the "bad" hallucinations, we check that every NAICS code returned by each LLM prompt is a valid code.`}</p>
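    <p>{`The guardrail is a set-membership check against the full list of NAICS codes rather than just the recommendations, so a correct code that retrieval missed can survive. A minimal Python sketch; the tiny `}<inlineCode parentName="p">{`valid_codes`}</inlineCode>{` set is a placeholder for the official code list, and the function name is hypothetical.`}</p>
    <pre><code parentName="pre" {...{
        "className": "language-python"
      }}>{`def apply_guardrail(predicted: list[str], valid_codes: set[str]) -> list[str]:
    """Keep LLM outputs that are real NAICS codes, even if they were not recommended.

    Anything outside valid_codes (an invented or malformed number) is dropped.
    """
    return [code for code in predicted if code in valid_codes]


# Placeholder: a real system would load the complete official NAICS code list.
valid_codes = {"561311", "561320", "541612", "722511"}

# Suppose the recommendations were ["561320", "541612"]. The LLM answered 561311
# (valid, though not recommended: kept) and 999999 (not a NAICS code: dropped).
print(apply_guardrail(["561311", "999999"], valid_codes))  # ['561311']
`}</code></pre>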
    <hr></hr>
    <RyuImage src={naics_system_design} alt="Industry Classification RAG System Design" hasBorder={true} width={"100%"} mdxType="RyuImage" />
    <p><em parentName="p">{`Our RAG system design (dashed black box). Embeddings and LLM prompts are handled by internal services (green). Similarity
scores are calculated using ClickHouse (orange). Intermediate results are logged using Kafka (orange).`}</em></p>
    <hr></hr>
    <h2>{`Ramp's (New) System`}</h2>
    <p>{`Since deploying the RAG system, we've already realized a number of benefits.`}</p>
    <h3>{`Ownership and Control`}</h3>
    <p>{`Besides increased accuracy, we have control over the algorithm.
We can (and do) make tweaks to any of the dozens of hyperparameters we searched to address concerns as they come up.
Because we log all intermediate steps, we can pinpoint where issues are cropping up (retrieval vs re-ranking).
From performance degradation to latency requirements to cost sensitivity, we can adjust the model on the fly.
In contrast, with a third-party solution, we would be stuck with their roadmap, pricing, and iteration speed.
Furthermore, we can audit and interpret the model's decisions. We ask the LLM for justifications to clarify the reasoning
behind each prediction.`}</p>
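    <p>{`With the retrieved candidates logged alongside each prediction, attributing an error to retrieval or re-ranking becomes a simple rule whenever a reference label is available. A hypothetical Python sketch:`}</p>
    <pre><code parentName="pre" {...{
        "className": "language-python"
      }}>{`def diagnose(label: str, recommendations: list[str], prediction: str) -> str:
    """Attribute one prediction to a stage, given its logged recommendations."""
    if prediction == label:
        return "correct"
    if label not in recommendations:
        return "retrieval miss"  # the correct code was never offered to the LLM
    return "re-ranking miss"  # the correct code was offered but not chosen


print(diagnose("561311", ["561320", "541612"], "561320"))  # retrieval miss
print(diagnose("561311", ["561311", "561320"], "561320"))  # re-ranking miss
`}</code></pre>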
    <h3>{`Before and After`}</h3>
    <p>{`To demonstrate the impact of the new model, we've included examples below of how businesses
were classified in the Homegrown system compared to how they are classified in the NAICS-based RAG system. In the
first table, we see three cases where the businesses were all very similar but were classified into different categories
in the Homegrown system. Using our RAG model, however, these businesses are all categorized under the same NAICS code. In
the second table, we see three instances where the Homegrown system categorized all businesses in the same, overly broad
category. In contrast, the RAG model was able to correctly assign these businesses to more descriptive NAICS codes.`}</p>
    <hr></hr>
    <RyuImage src={old_vs_new_system_same_naics} alt="Old vs New Industry Classification System - Same NAICS Code" width={"100%"} mdxType="RyuImage" />
    <RyuImage src={old_vs_new_system_diff_naics} alt="Old vs New Industry Classification System - Different NAICS Code" width={"100%"} mdxType="RyuImage" />
    <p><em parentName="p">{`Examples of how businesses were categorized in the old, Homegrown system compared to the
NAICS-based RAG system. In the first table we see cases where the Homegrown system classified
similar businesses into separate categories, while the NAICS system correctly classifies them together.
In the second table, we see cases where an overly broad category in the Homegrown system is split
into more apt and descriptive categories with NAICS.`}</em></p>
    <hr></hr>
    <h3>{`Reception`}</h3>
    <p>{`This model has greatly improved our data quality and solved many pain points across Ramp. Our teams are thrilled to
start using it, as evidenced by comments we've gotten from affected stakeholders:`}</p>
    <blockquote>
      <p parentName="blockquote">{`"This is a big deal — it will significantly upgrade our data quality and understanding of our customers."`}</p>
    </blockquote>
    <blockquote>
      <p parentName="blockquote">{`"I've waited years for this."`}</p>
    </blockquote>
    <blockquote>
      <p parentName="blockquote">{`"The existing classification wasn't nuanced enough to satisfy industry exclusion requirements. This is perfect."`}</p>
    </blockquote>
    <blockquote>
      <p parentName="blockquote">{`"As we diversify our customer base, this will be an incredible driver of our business success."`}</p>
    </blockquote>
    <h2>{`Wrap Up`}</h2>
    <p>{`To migrate from a tangle of taxonomies to a standardized industry classification system, we built an in-house RAG model.
Ultimately, our model led to increased accuracy in industry classification, with full control over updates, tuning, and costs.
This model now helps Ramp’s internal teams work more cohesively, and enables more precise communication with external partners.
Our teams are excited about how this model has brought an increase in clarity and understanding of our customers, and
how it's helping us better serve them.`}</p>
    </MDXLayout>;
}

;
MDXContent.isMDXComponent = true;