
import React from 'react'
import { mdx } from '@mdx-js/react'

/* @jsxRuntime classic */
/* @jsx mdx */
import { RyuImage, RyuFlex } from '@ramp/ryu'
import banner from './banner.jpg'
import beta_dists from './beta_dist.png'
import multiplexing_simulation from './ts_simulation.png'
import provider_diagram from './provider_diagram.png'
import ryne_photo from './ryne_photo.jpg'
import ts_example from './ts_example.png'
export const meta = {
  date: '2024-06-11T17:00:00.000Z',
  title: 'Make Better Decisions by Embracing Uncertainty',
  description: 'How Ramp applied Thompson Sampling to improve bank linking success rates',
  authors: [{
    name: 'Ryne Carbone',
    website: 'https://runthedata.dev',
    twitter: '',
    position: 'Machine Learning Engineer',
    bio: 'Ryne is a Machine Learning Engineer at Ramp where he works on Risk and ML Infrastructure.',
    photo: ryne_photo
  }]
};

const layoutProps = {
  meta
};
const MDXLayout = "wrapper"
export default function MDXContent({
  components,
  ...props
}) {
  return <MDXLayout {...layoutProps} {...props} components={components} mdxType="MDXLayout">



    <hr></hr>
    <RyuImage src={banner} alt="Make Better Decisions by Embracing Uncertainty" mdxType="RyuImage" />
    <hr></hr>
    <p>{`When making decisions with incomplete, noisy, or stale information, how can we make the most of
the data we have and weigh the underlying uncertainty? In this post, we'll explore how Ramp navigates these decisions
using `}<a parentName="p" {...{
        "href": "https://en.wikipedia.org/wiki/Multi-armed_bandit"
      }}>{`multi-armed bandits`}</a>{` to help customers directly link their
bank accounts. We've talked about `}<a parentName="p" {...{
        "href": "https://engineering.ramp.com/making-big-bets-in-business"
      }}>{`asymmetric outcomes`}</a>{`
before, and we'll dig into how our solution tackles the classic
`}<a parentName="p" {...{
        "href": "https://en.wikipedia.org/wiki/Exploration-exploitation_dilemma"
      }}>{`exploration vs exploitation tradeoff`}</a>{`.`}</p>
    <hr></hr>
    <h2>{`Bank Linking at Ramp`}</h2>
    <p>{`Ramp is a modern finance platform with corporate cards, automated expense management, bill pay, travel,
procurement, and more. To get started, we lay out the context for the problem we are solving: connecting to a
business's bank account.`}</p>
    <h3>{`Show Me The Money`}</h3>
    <p>{`As part of our underwriting process, Ramp offers credit limits based on bank account data like recent banking transactions.
To collect this data, we ask the customer to link their bank account during the application. The customer benefits from
directly linking their account by receiving faster decisions, and they save time by avoiding continued manual document uploads.`}</p>
    <p>{`Behind the scenes, we leverage third-party `}<em parentName="p">{`bank linking providers`}</em>{` that allow the customer to
search for financial institutions (i.e. the customer's bank: Chase, TD Bank, etc.) and securely link their
associated bank accounts.`}</p>
    <p>{`For each financial institution, we need to determine which provider to use in order to maximize
the likelihood of a successful bank linking connection, taking into account noisy and imperfect information.
Under the hood, providers use different strategies, and we don't know a priori which provider will have the highest
success rate for which financial institutions. Historical volume and recency of
attempts affect how precisely we can measure these rates and how much uncertainty there is.`}</p>
    <hr></hr>
    <RyuImage src={provider_diagram} alt="Bank linking providers diagram" mdxType="RyuImage" />
    <hr></hr>
    <h3>{`Testing, Testing`}</h3>
    <p>{`A natural instinct is to run A/B tests to determine which provider is best. While technically possible, running A/B
tests has several shortcomings:`}</p>
    <ul>
      <li parentName="ul"><strong parentName="li">{`Static`}</strong>{`: Measuring performance at a specific point in time leads to making decisions with outdated information.`}</li>
      <li parentName="ul"><strong parentName="li">{`Multiple Testing`}</strong>{`: We need to support thousands of financial institutions. This requires a separate A/B test for each.`}</li>
      <li parentName="ul"><strong parentName="li">{`Interpreting Results`}</strong>{`: When the experiment is over, there are still decisions to be made: measurements must be turned
into actions. For example: route all traffic through provider A, or split traffic 50/50 between both providers.`}</li>
    </ul>
    <hr></hr>
    <h2>{`Multi-Armed Bandits`}</h2>
    <p>{`Enter the `}<a parentName="p" {...{
        "href": "https://en.wikipedia.org/wiki/Multi-armed_bandit"
      }}>{`multi-armed bandit`}</a>{`: a reinforcement learning framework that
dynamically balances exploration of options with exploitation of historically top-performing choices. Multi-armed bandits
address the shortcomings of A/B testing:`}</p>
    <ul>
      <li parentName="ul"><strong parentName="li">{`Dynamic`}</strong>{`: Multi-armed bandits are adaptable and continuously update their decisions based on new data.`}</li>
      <li parentName="ul"><strong parentName="li">{`Extensible`}</strong>{`: Multi-armed bandits can be applied to any number of options, and independent
bandits can be run in parallel.`}</li>
      <li parentName="ul"><strong parentName="li">{`Automated`}</strong>{`: Decisions are made automatically by the algorithm: there is no need for interpretation.`}</li>
    </ul>
    <p>{`Furthermore, multi-armed bandits are viable long-term solutions: we can keep the algorithm running indefinitely, and
it will continue to learn and make decisions while incorporating new data.`}</p>
    <h3>{`What's For Dinner?`}</h3>
    <p>{`Multi-armed bandits get their name from slot machines in casinos, which used to be called "one-armed bandits". A gambler
playing slot machines does not know the probability of winning at any given machine, and so must trade off exploration
(gathering more data on the expected payoff of each machine) with exploitation (playing the machine that seems to have
the highest expected payoff so far).`}</p>
    <p>{`Multi-armed bandits aren't always the right choice. They can be helpful in situations where any of the following are true:`}</p>
    <ul>
      <li parentName="ul">{`Outcomes are probabilistic`}</li>
      <li parentName="ul">{`Measurements are noisy`}</li>
      <li parentName="ul">{`Circumstances change over time`}</li>
    </ul>
    <p>{`For example, let's imagine we want to decide where to get dinner. We have a favorite burger joint: it has consistent
quality, and we know we'll enjoy the meal. However, we're also curious about a new bistro that just opened up. Finally,
there is a ramen place that we tried once and didn't like, but now there is a new chef. Using a multi-armed bandit strategy
in this situation would help to maximize our long-term restaurant satisfaction by balancing our known preferences
(burger joint) with exploration of new options (bistro) and re-evaluation of previously suboptimal choices (ramen place).`}</p>
    <p>{`On the other hand, multi-armed bandits would not be useful for deciding which switch turns on a light in a room. Once we
try every switch, we can be extremely confident that we know what will happen next time we use each switch. The outcomes
are not probabilistic (each switch either turns on the light of interest or it does not), measurement is not noisy (we can easily tell if
the light is on or not), and circumstances do not change over time (the effectiveness does not evolve, hopefully).`}</p>
    <h3>{`Thompson Sampling`}</h3>
    <p>{`There are many solutions to the multi-armed bandit problem, but we chose one of the most popular, Thompson
Sampling, because it is simple to implement and effective in practice. The crux of the algorithm involves four steps:`}</p>
    <ol>
      <li parentName="ol">{`As we collect data, model the probability distribution of success for each action`}</li>
      <li parentName="ol">{`At each decision point, draw a sample from the current success distributions for each action`}</li>
      <li parentName="ol">{`Choose the action with the highest sampled value`}</li>
      <li parentName="ol">{`Update the modeled distribution for the chosen action after observing the result`}</li>
    </ol>
    <p>{`We use the `}<a parentName="p" {...{
        "href": "https://en.wikipedia.org/wiki/Beta_distribution"
      }}>{`Beta distribution`}</a>{` to model the success probability
of each provider. The Beta distribution is parameterized by two values: `}{`α`}{` (number of successes + 1) and `}{`β`}{`
(number of failures + 1). When there are few total attempts, the distribution is wide, which encodes our uncertainty
about the true success rate. As we collect more data, the distribution becomes more peaked, indicating less uncertainty.`}</p>
    <hr></hr>
    <RyuImage src={beta_dists} alt="Example beta distributions" hasBorder={true} width={"100%"} mdxType="RyuImage" />
    <hr></hr>
    <p>{`In the image above, we plot two Beta distributions: one with large uncertainty (orange, 1 success and 2 failures), and
one with small uncertainty (green, 11 successes and 9 failures). Although the green distribution has a higher average,
the orange distribution has a wider spread, which gives it a chance of producing the higher sampled value. This is a key
result of Thompson Sampling: `}<em parentName="p">{`we can leverage uncertainty to explore high-upside options`}</em>{`.`}</p>
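    <p>{`To put rough numbers on this, the sketch below computes the mean and standard deviation of each Beta distribution
in closed form, then uses Monte Carlo sampling to estimate how often a draw from the orange distribution beats a draw from
the green one. The helper names (beta_summary, prob_first_wins), the sample count, and the seed are illustrative
assumptions, not part of our production code.`}</p>
    <pre><code parentName="pre" {...{
        "className": "language-python"
      }}>{`import numpy as np

def beta_summary(successes: int, failures: int) -> tuple[float, float]:
    """Mean and standard deviation of Beta(successes + 1, failures + 1)."""
    a, b = successes + 1, failures + 1
    mean = a / (a + b)
    variance = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, variance ** 0.5

def prob_first_wins(first, second, n_samples=100_000, seed=0) -> float:
    """
    Monte Carlo estimate of how often a draw for first exceeds a draw for second

    :param first: (successes, failures) for the first option
    :param second: (successes, failures) for the second option
    """
    rng = np.random.default_rng(seed)
    draws_first = rng.beta(first[0] + 1, first[1] + 1, n_samples)
    draws_second = rng.beta(second[0] + 1, second[1] + 1, n_samples)
    return float((draws_first > draws_second).mean())

orange = (1, 2)   # 1 success, 2 failures
green = (11, 9)   # 11 successes, 9 failures
print('Orange mean/std: ', beta_summary(*orange))
print('Green mean/std: ', beta_summary(*green))
print('P(orange draw > green draw): ', prob_first_wins(orange, green))
`}</code></pre>
    <p>{`Under these assumptions, the orange distribution has about twice the standard deviation of the green one
(0.20 versus roughly 0.10) and wins roughly a quarter of head-to-head draws, despite its lower mean.`}</p>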
    <p>{`Below we show a simple Python implementation of Thompson Sampling where we simulate one day of bank linking attempts.
We have a total volume of attempts that we need to divide between the providers. We input the historical total counts
and total successes for each provider, then simulate how much traffic is routed to each provider.`}</p>
    <hr></hr>
    <pre><code parentName="pre" {...{
        "className": "language-python"
      }}>{`import numpy as np

def simulate_day(
    volume: int,
    trials_success_list: list[tuple[int, int]],
    debug: bool = False
) -> list[int]:
    """
    Given counts/successes for list of variants, simulate one day with volume v

    Returns counts of draws for each variant

    :param volume: volume for day
    :param trials_success_list: list of (trials, success)
    :param debug: print draws for debugging
    :return: counts of draws for each variant
    """
    # Draw #(volume) samples from each variant's Beta(successes + 1, failures + 1),
    # reusing a single random generator for all variants
    rng = np.random.default_rng()
    draws = np.array([
        rng.beta(success_i + 1, trial_i - success_i + 1, volume)
        for trial_i, success_i in trials_success_list
    ])
    # For each draw in #(volume), select variant with the highest value
    draw_results = np.argmax(draws, axis=0)
    # Count total results for each variant (cast NumPy integers to plain int)
    result_counts = [
        int((draw_results == i).sum())
        for i in range(len(trials_success_list))
    ]
    if debug:
        print('Draws: \\n', draws)
        print('Results: ', draw_results)
        print('Counts: ', result_counts)

    return result_counts
`}</code></pre>
    <hr></hr>
    <p>{`Below, we run this simulation with five attempts using the Beta distributions from above.
The first provider (orange) has a lower historical success rate (33%) compared to the second provider (green, 55%)
but we end up choosing it 3 out of 5 times. Its larger uncertainty makes this possible, and in this small run we end up
exploring more than exploiting.`}</p>
    <hr></hr>
    <RyuImage src={ts_example} alt="Example Thompson Sampling simulation for one day" mdxType="RyuImage" />
    <hr></hr>
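    <p>{`The function above draws every sample from a fixed set of historical counts. A per-attempt variant that also
applies the fourth step (updating the chosen action's distribution after each observed result) could be sketched as
follows. The class name BetaBandit, the seeds, and the true success rates are made up for illustration; this is a minimal
sketch, not our production implementation.`}</p>
    <pre><code parentName="pre" {...{
        "className": "language-python"
      }}>{`import numpy as np

class BetaBandit:
    """Per-attempt Thompson Sampling over a fixed set of arms (hypothetical helper)."""

    def __init__(self, n_arms: int, seed=None):
        self.successes = np.zeros(n_arms, dtype=int)
        self.failures = np.zeros(n_arms, dtype=int)
        self.rng = np.random.default_rng(seed)

    def choose(self) -> int:
        # Steps 2 and 3: sample once per arm, then pick the largest sample
        samples = self.rng.beta(self.successes + 1, self.failures + 1)
        return int(np.argmax(samples))

    def update(self, arm: int, success: bool) -> None:
        # Step 4: only the chosen arm's counts change
        if success:
            self.successes[arm] += 1
        else:
            self.failures[arm] += 1

# Toy run: these true success rates are invented for the example
true_rates = [0.55, 0.70]
bandit = BetaBandit(n_arms=len(true_rates), seed=42)
outcome_rng = np.random.default_rng(7)
for _ in range(2000):
    arm = bandit.choose()
    bandit.update(arm, bool(outcome_rng.random() < true_rates[arm]))

# Attempts routed to each arm
print('Attempts per arm: ', bandit.successes + bandit.failures)
`}</code></pre>
    <p>{`In this toy run, most of the 2,000 attempts should end up routed to the arm with the higher true rate, while
the weaker arm still receives some traffic early on when both distributions are wide.`}</p>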
    <h2>{`Applied Science`}</h2>
    <p>{`Before we could roll out Thompson Sampling to production, we needed to understand our data. We used simulation to
anticipate how the algorithm would behave with the data we had. In order to apply the algorithm, we built a data
collection feedback loop and crafted a measurement plan.`}</p>
    <h3>{`In a World ...`}</h3>
    <p>{`To understand how the algorithm behaves, we simulated Thompson Sampling for two providers under a variety of conditions.`}</p>
    <ul>
      <li parentName="ul">{`First, we generated true success rate trends for each provider, and then added noise on top.`}</li>
      <li parentName="ul">{`Next, we simulated daily bank linking attempt volume, and used the algorithm to decide how to split the volume
between the providers.`}</li>
      <li parentName="ul">{`Finally, we used the generated success rate of each provider to simulate outcomes for each attempt that was routed to it.`}</li>
    </ul>
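    <p>{`The three steps above could be sketched roughly as follows. This is a simplified illustration rather than the
simulation we actually ran: the function name simulate_rollout, the Gaussian noise model, the daily volume, and the example
trends are all assumptions, and the Beta parameters here use cumulative counts instead of a rolling window.`}</p>
    <pre><code parentName="pre" {...{
        "className": "language-python"
      }}>{`import numpy as np

def simulate_rollout(true_rates, daily_volume=200, noise_sd=0.03, seed=0):
    """
    Simulate daily Thompson Sampling routing between providers

    :param true_rates: array of shape (n_days, n_providers) with underlying success rates
    :param daily_volume: number of bank linking attempts per day
    :param noise_sd: standard deviation of day-to-day noise added to the rates
    :return: (attempts routed per day and provider, overall observed success rate)
    """
    rng = np.random.default_rng(seed)
    true_rates = np.asarray(true_rates, dtype=float)
    n_days, n_providers = true_rates.shape
    successes = np.zeros(n_providers, dtype=int)
    failures = np.zeros(n_providers, dtype=int)
    routed = np.zeros((n_days, n_providers), dtype=int)
    for day in range(n_days):
        # Step 1: add noise on top of the underlying trend, keeping rates in [0, 1]
        rates = np.clip(true_rates[day] + rng.normal(0, noise_sd, n_providers), 0, 1)
        # Step 2: split the day's volume with one Beta draw per provider per attempt
        draws = rng.beta(successes[:, None] + 1, failures[:, None] + 1,
                         size=(n_providers, daily_volume))
        routed[day] = np.bincount(draws.argmax(axis=0), minlength=n_providers)
        # Step 3: simulate outcomes for the attempts routed to each provider
        wins = rng.binomial(routed[day], rates)
        successes += wins
        failures += routed[day] - wins
    overall_rate = float(successes.sum() / (successes.sum() + failures.sum()))
    return routed, overall_rate

# Hypothetical trends: provider 0 is flat, provider 1 improves linearly
n_days = 60
trends = np.column_stack([np.full(n_days, 0.60), np.linspace(0.40, 0.80, n_days)])
routed, overall_rate = simulate_rollout(trends)
print('First 5 days: ', routed[:5].tolist())
print('Last 5 days: ', routed[-5:].tolist())
print('Overall success rate: ', round(overall_rate, 3))
`}</code></pre>
    <p>{`Because this sketch never forgets old data, it can be slow to notice that the second provider has improved.
That behavior is one reason the length of the trailing window matters when success rates drift over time.`}</p>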
    <hr></hr>
    <RyuImage src={multiplexing_simulation} alt="Thompson sampling simulation with changing success rates" width={"75%"} mdxType="RyuImage" />
    <hr></hr>
    <p>{`In the example above, we simulated a situation where one provider has a stable success rate while the other has a linearly
increasing rate. The dots are the daily generated success rates. The solid green line is the overall observed success rate
of Thompson Sampling. The solid purple and orange lines are the observed success rates for each provider. In general, we
found that Thompson Sampling was quick to exploit the best provider by diverting nearly all the volume to the one that
was clearly better. On the other hand, when both providers were close in success rate, Thompson Sampling would adapt and
split the volume more evenly.`}</p>
    <p>{`To gauge the robustness of the algorithm and anticipate how long it would take to converge on the best provider, we
investigated the impact of six factors through simulation. For all parameters, we used historical data to determine
the feasible search space.`}</p>
    <ul>
      <li parentName="ul"><strong parentName="li">{`Effect Size`}</strong>{`: Magnitude of the difference in success rate between providers`}</li>
      <li parentName="ul"><strong parentName="li">{`Noise Level`}</strong>{`: Day-to-day random variation in success rate`}</li>
      <li parentName="ul"><strong parentName="li">{`Volume`}</strong>{`: Number of daily bank linking attempts`}</li>
      <li parentName="ul"><strong parentName="li">{`Window Size`}</strong>{`: Length of trailing rolling window used to calculate parameters in Beta distribution`}</li>
      <li parentName="ul"><strong parentName="li">{`Trend Type`}</strong>{`: How success rate changes over time (flat, linear, sinusoidal, step change)`}</li>
      <li parentName="ul"><strong parentName="li">{`Initial Conditions`}</strong>{`: Parameters to seed Beta distribution at the start (cold start, correct rates with full volume,
incorrect rates)`}</li>
    </ul>
    <p>{`The most important finding was that the time for the algorithm to converge on the best provider was similar
across a broad range of parameters. There were some notable exceptions to this, however:`}</p>
    <ul>
      <li parentName="ul">{`Shorter window sizes enabled quicker adaptation, even at the expense of lower total volume`}</li>
      <li parentName="ul">{`Incorrect initial conditions were the most detrimental to performance, but the algorithm was still able to recover eventually`}</li>
    </ul>
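    <p>{`As an illustration of the window size parameter, the sketch below keeps only the most recent days of attempts
and successes for a single provider and derives the Beta parameters from that window. The class name WindowedCounts and
the daily numbers are hypothetical; our actual bookkeeping may differ.`}</p>
    <pre><code parentName="pre" {...{
        "className": "language-python"
      }}>{`from collections import deque

class WindowedCounts:
    """Trailing-window attempt and success totals for one provider (hypothetical helper)."""

    def __init__(self, window_days: int):
        # A deque with maxlen drops the oldest day automatically
        self.days = deque(maxlen=window_days)

    def add_day(self, attempts: int, successes: int) -> None:
        self.days.append((attempts, successes))

    def beta_params(self) -> tuple[int, int]:
        """Return (alpha, beta) = (successes + 1, failures + 1) over the window."""
        attempts = sum(a for a, _ in self.days)
        successes = sum(s for _, s in self.days)
        return successes + 1, attempts - successes + 1

counts = WindowedCounts(window_days=3)
for attempts, successes in [(10, 2), (10, 3), (10, 8), (10, 9)]:
    counts.add_day(attempts, successes)
# Only the last three days contribute to the parameters
print('Beta parameters: ', counts.beta_params())
`}</code></pre>
    <p>{`A shorter window forgets stale outcomes sooner, at the cost of fewer attempts behind each estimate and
therefore a wider distribution.`}</p>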
    <h3>{`Measuring Success`}</h3>
    <p>{`When deciding on a rollout plan, we landed on a quasi-experimental design. Our primary metric measured the rate of
businesses that successfully linked their bank accounts.`}</p>
    <ul>
      <li parentName="ul"><strong parentName="li">{`Baseline`}</strong>{`: status quo (a hardcoded, deterministic mapping from each financial
institution to a provider)`}</li>
      <li parentName="ul"><strong parentName="li">{`Treatment`}</strong>{`: Thompson Sampling`}</li>
    </ul>
    <p>{`The baseline and treatment were sequential in time, in contrast to a true randomized experiment. We chose this
design because ultimately we were not choosing between two viable variants: we knew the status quo was not maintainable
long-term. We still wanted to measure the impact of Thompson Sampling, however, and using a sequential design afforded
us a quicker measurement.`}</p>
    <p>{`In the end, we observed a 10% increase in financial institution linking success rate, and a 25%
decrease in Ramp customers with any manually uploaded bank statements. We took a deeper look at individual
financial institutions and found that the algorithm adapted to changes as expected, and diverted traffic to
the top-performing provider.`}</p>
    <hr></hr>
    <h2>{`Wrap Up`}</h2>
    <p>{`We've just seen how Ramp successfully applied Thompson Sampling to improve bank linking success rates, thereby
helping customers link their bank accounts more easily and speeding up the underwriting process.
More direct connections translate to hours of customer time saved searching for documentation and waiting for
Ramp to review it. This framework of continuous learning and automated
decision-making removed friction, increased velocity, and introduced adaptability. By implementing Thompson Sampling,
we were able to account for uncertainty and capitalize on high-upside but uncertain options.`}</p>
    </MDXLayout>;
}

;
MDXContent.isMDXComponent = true;