Building an Optimal Poker Bot

Published Dec 19, 2023

I recently built a poker bot which simultaneously played several $2000 buyin tables. Most of its time was spent in the $200 buyin fast-fold format where it broke even over a sample of 70k hands.

This post examines technical decisions made during development of a game theory optimal (GTO) poker bot. It assumes the reader already has basic poker knowledge.

Exploitative vs GTO Strategy

Before getting into technical details, I want to very briefly cover poker strategy and how it relates to the bot:

An exploitative bot would make decisions which maximize earnings against suboptimal opponents. The catch is that it would leave the bot open to counter-exploits by experienced players.

A GTO bot would play a fixed, unexploitable strategy. Against human opponents, it would not maximize earnings but it would prevent loss in the long run.

I didn’t want to write custom (brittle) code to exploit specific player types or worry about the bot losing money by being exploited. So connecting the bot to an existing GTO solver and letting it earn passively whenever opponents made mistakes seemed like a good starting point.

Birds-Eye View

I’ll go over each component of the bot in detail below. But the end result looks something like this:

With client-server architecture, we can run the solver on powerful remote hardware while running less intensive bot logic on a local machine.

The local machine is responsible for:

Using simple computer vision to scrape poker client game state
Building game state slices into a sequence of coherent player actions
Making preflop playing decisions using local precomputed solutions
Augmenting postflop ranges ¹ using player statistics
Requesting solutions from the solver server for postflop play
UI interactions

The solver server is responsible for:

Solving postflop games on the fly or transparently loading solutions from the database if available
Maintaining solver game state in memory for each bot client simultaneously (one client per poker table)

Let’s look at a few techniques needed for handling the preflop stage first.

Preflop

Solving the preflop stage of poker is too time-consuming to fit within a player’s time to act window so we’ll have to use precomputed GTO solutions.

Serialized Game Trees

Internally, solvers represent a solution as a game tree data structure where each node describes a game state and each edge corresponds to a player action (fold, call or raise). Each node contains a strategy: probability distributions for taking a particular action by an optimal player.

Here’s a simplified game tree for the first three player actions of a game:

The path from root to leaf represents player actions taken during a game:

Under the gun (UTG) folds
Hijack (HJ) raises 2.5 BB
Cutoff (CO) raises 8 BB

Each node contains probability distributions for playing each hole card pair. For example, here’s cutoff before raising to 8 BB:

AKs

AQs

AJs

ATs

A9s

A8s

A7s

A6s

A5s

A4s

A3s

A2s

AKo

KQs

KJs

KTs

K9s

K8s

K7s

K6s

K5s

K4s

K3s

K2s

AQo

KQo

QJs

QTs

Q9s

Q8s

Q7s

Q6s

Q5s

Q4s

Q3s

Q2s

AJo

KJo

QJo

JTs

J9s

J8s

J7s

J6s

J5s

J4s

J3s

J2s

ATo

KTo

QTo

JTo

T9s

T8s

T7s

T6s

T5s

T4s

T3s

T2s

A9o

K9o

Q9o

J9o

T9o

98s

97s

96s

95s

94s

93s

92s

A8o

K8o

Q8o

J8o

T8o

98o

87s

86s

85s

84s

83s

82s

A7o

K7o

Q7o

J7o

T7o

97o

87o

76s

75s

74s

73s

72s

A6o

K6o

Q6o

J6o

T6o

96o

86o

76o

65s

64s

63s

62s

A5o

K5o

Q5o

J5o

T5o

95o

85o

75o

65o

54s

53s

52s

A4o

K4o

Q4o

J4o

T4o

94o

84o

74o

64o

54o

43s

42s

A3o

K3o

Q3o

J3o

T3o

93o

83o

73o

63o

53o

43o

32s

A2o

K2o

Q2o

J2o

T2o

92o

82o

72o

62o

52o

42o

32o

Raise 12.1%

Call 3.3%

Fold 84.6%

We want our bot to have this kind of data readily available for playing decisions. So we’ll scrape GTO solutions from a third-party service, build them into our own game trees in memory using Python and pickle them into files.

When a preflop round begins, the bot will select a game tree and deserialize it. As the game progresses, it will traverse through the tree in order to access strategies and make optimal playing decisions. Once the preflop stage ends, the bot will use the last visited nodes (exit nodes) as range inputs to the postflop solver.

Postflop

Unlike preflop, there are some cases in the postflop stage where we can solve on the fly. But we’ll want to host a third-party solver on a remote server in order to quickly compute solutions.

Looking at open-source offerings on GitHub, I eventually landed on postflop-solver. It’s well-documented and claims to be performant. It also has a permissive license which allows integration with our bot.

Solver Limitations

Since computing a complete solution requires enormous amounts of time and disk space, solvers make various concessions which affect the bot’s design:

Solving requires limiting each player’s bet sizes to discrete amounts. The bot needs to be able to handle cases where a real opponent uses an unexpected bet size.
Postflop solving requires specifying ranges for each player. In practice, the range we pass to the solver for the opponent could be far from what they might play in reality.
Postflop solving is often restricted to two-player games. Luckily, higher stakes in online poker tend to go postflop with only two players anyways. Nonetheless, the bot needs to handle the rare case where the flop is dealt to more than two players.

Solving Postflop

In a perfect world, we would just solve every postflop scenario on the fly. Unfortunately, CFR-based solvers often cannot generate an accurate solution within the time limit to act in online poker.

Parallelization helps but still doesn’t bring us close enough: experiments with postflop-solver showed that beyond a certain number of CPU cores, solve time improvements reached a point of diminishing returns because memory I/O became the bottleneck. Regardless of what hardware I tried, I couldn’t find a way to consistently get fast solutions.

Ultimately, the bot needs to rely on a mixture of both precomputed solutions and live solving. We’ll need to figure out specifically which scenarios require precomputing and which permit live solving.

Building a Solution Dataset

To figure out which solutions need to be precomputed, we’ll use the fact that only two pieces of game state significantly affect solve time: stack-to-pot ratio (SPR) and street.

The smaller the effective player stacks are in relation to the pot size, the faster the solver can reach an accurate solution. In addition, solving from the turn or river is much faster than solving from the flop.

In fact, the following scenarios can be solved within a few seconds to a high degree of accuracy:

Flops with a shallow SPR (e.g., 6.0 or lower)
Any scenario which begins at the turn or river

In poker, SPR is largely dictated by preflop raises — if a re-raise happens, we end up in shallow SPR territory. So, if we assume standard starting stack sizes, the only scenarios we need to presolve are single-raised flops.

Let’s get a sense for the space/time requirements of solving one game. And then figure out how many games in total we’ll need to solve.

Spoiler alert: it’s going to require many terabytes and months of compute time.

Only two inputs to postflop-solver have a significant effect on solve time: bet sizes and target exploitability.

Bet Sizes

Our bot needs to be able to respond to any bet size the opponent chooses. Unfortunately, solvers require restricting each players’ bet sizes to discrete values. And more bet sizes means longer solve time. So we’ll need to strike a balance between adequate coverage across the full range of possible bets and keeping the solve time reasonable:

Flop bet sizes: 33%, 50%, 66%, 125%, a
Turn bet sizes: 33%, 50%, 66%, 125%, a
River bet sizes: 33%, 50%, 66%, 125%, 200%, a
Raise sizes: 3.0x, 6.0x, a

The 125% and 200%² overbet sizes are necessary to include since they’re commonly used in higher stakes.

The a bet size allows the bot to move all-in.

How do we handle cases where the opponent uses an unexpected sizing such as 80% on the flop? We’ll just have to round the opponent’s bet size to the nearest predetermined value: 66%. Doing so may affect the bot’s profitability and decisions on further streets which I’ll discuss below as an open question.

Luckily, once the game progresses beyond the flop, we no longer have to worry about rounding bet sizes to fixed intervals. We’ve already learned that solving from the turn or river is fast. So we can re-solve on the fly with more granular intervals and even include the opponent’s exact bet size as one of the options.

It’s also worth pointing out that we’re precomputing single-raised flops where bet sizes tend to be cut-and-dry and fall cleanly into our preselected sizings: 33% or 66% for the in-position (IP) player and 3x raise for the out-of-position (OOP) player.

Once the bot is permitted to solve live, it can use much more granular sizings:

Flop bet sizes: 10%, 25%, 33%, 50%, 66%, 125%, a
Turn bet sizes: 10%, 33%, 50%, 66%, 125%, e, a
River bet sizes: 10%, 33%, 50%, 66%, 125%, 200%, e, a
Raise sizes: 3.0x, 6.0x, 12.0x, a

I’ve included 10% probe bets, 25% bets (common in 4-bet pots), geometric sizing (e) and oversized raises (sometimes seen on the river).

Target Exploitability

The closer we set this value to zero, the less exploitable the bot’s poker strategy will be but the longer our solutions will take to precompute.

For poker study, PioSOLVER recommends setting this to 0.01 BB. postflop-solver accepts this value as a percentage of the pot. For our single-raised 5.5 BB pots, that works out to 0.18% of the pot.

To give a rough sense of time required, I computed a few solutions using the bet sizes above along with various target exploitability: ³

0.18%: 5516.65 seconds ⁴
0.5%: 3130.44 seconds
5%: 699.96 seconds
50%: 198.76 seconds

We notice a major reduction in solve time from 0.5% to 5%. Since we’ll be precomputing tens of thousands of solutions, that’s a huge improvement in total time saved. Can the bot afford to operate with solutions that are exploitable for 5% of a given pot? I discuss this below as an open question but we’ll use 5% exploitability going forward.

To figure out how many solutions we need to compute, we have to understand two more solver inputs: player ranges and flop cards.

Player Ranges

In poker theory, a player’s starting range on the flop is determined by two things: their preflop actions and their configuration (position at the table respective to an opponent).

In higher stakes fast-fold poker, we frequently encounter only five configurations during single-raised pots:

BB (Big Blind) vs BTN (Button)
BB vs SB (Small Blind)
BB vs CO (Cutoff)
BB vs HJ (Hijack)
BB vs UTG (Under the Gun)

For now, we can skip solving the remaining 10 configurations to cut down on initial compute time. If players behave erratically and the bot arrives at the flop with an unexpected configuration, we’ll just bite the bullet and use a low quality live-solve.

What about multi-way configurations (three or more players)? The open questions section discusses presolving the most common multi-way configurations. But in general, we can handle that rare case with a simple non-GTO strategy.

It turns out that each postflop configuration can only be achieved through one sequence of preflop player actions. The most commonly occurring is BB vs BTN: three players fold, the button position open raises, the next player folds and the big blind player defends by calling.

Now, to give something useful to the solver, we need to know the actual ranges for button’s raise and big blind’s call. For now, we can get away with just hardcoding them based on existing preflop GTO solutions. A more complete solution using a game tree is discussed in the preflop section.

Flop Cards

Along with player ranges and stack sizes, the other solver input to consider is the flop cards dealt. In poker, there are 22,100 unique flop combinations. With that many flops, precomputation becomes prohibitively expensive.

But we can use a technique which reduces the problem space to just 1755 strategically unique flop subsets. So instead of solving every flop, we can just solve one flop within a subset.

The bot can then use the presolved flop for its playing decisions while the actual flop dealt by the poker site may be different. The bot will need an algorithm for mapping between actual and presolved flops which I discuss below.

Effective Stack Size

Another solver input parameter is stack size. So far, we’ve assumed a normal 100 BB stack size but what about scenarios with short or deep stacks? We can repeat all precomputations with all common stack sizes (e.g. from 30 BB up to 300 BB in regular intervals). As you can probably guess, recomputing everything for each stack size multiplies the total time and space requirements.

Luckily, we can get by with an initial version of the bot which does not support deep stacks: fast-fold poker on our poker site permits ratholing. We can code the bot to leave and rejoin the table to ensure it stays within a stack size which we have solutions for.

Rake Rate & Rake Cap

Long-term, we’d like to have separate solutions for various rake structures. But to keep things simple, we can get away with just targeting the $1/$2 fast-fold game type on our poker site. For this game, they list the rake rate as 5% and rake cap as $3.00. Note that the rake cap input should be specified in big blinds. So we have:

Rake cap: 1.5 BB ($3.00 in a 200NL game)
Rake rate: 0.05

Partial Solutions

I want to highlight one space-saving solver feature which we need to make use of: partial solution files. postflop-solver allows saving smaller solution files by discarding information after the river deal. The benefit is a massive reduction in file size: from roughly 10 GB down to 100 MB! Without partial solutions, our total storage requirement would be too large for a home server (several petabytes).

The downside of partial solutions is that we’ll need to re-solve on the fly when the game progresses from the flop to the turn. But as we learned above, those types of solutions easily fit within the time to act window (15 seconds).

Opening Raise Size

TODO: Add this. Explain how we assume 2.5 BB open raise but really our dataset should support 3 more common types: 2 BB, 2.25 BB, 3 BB.

Dataset Time/Space Requirements

Finally, we can put everything together to figure out total disk space and compute time required. At minimum, we need to precompute 5 configurations with one stack size and 1755 flops each. And we know that each 100 MB solution takes roughly 11.6 minutes to generate.

We can build up our dataset over time in stages:

5 common configurations, 100 BB stack size, 1755 flops per configuration: 71 days, 875.5 GB (8755 solution files)
Remaining 10 configurations: 142 days, 1755 GB (17,550 solution files)
5 common configurations with short stack sizes ⁵: 427 days, 5265 GB (52,650 solution files)
Remaining 10 configurations: 711 days, 8775 GB (87,750 solution files)
5 common configurations with deep stack sizes ⁶: 284 days, 3510 GB (35,100 solution files)
Remaining 10 configurations: 569 days, 7020 GB (70,200 solution files)

After stage 1, we’ll have a functional bot which handles remaining stages using low-quality live-solving or ratholing. Getting to this point requires 71 days and 875.5 GB disk space.

Things looks dismal when we consider all stages: the complete dataset would require a total of 6 years and 87 TB!

Clearly, we’ll need to split computation over several machines to avoid waiting years. So we’ll design our generation script to allow splitting work into subsets.

Generation Script

To start building our solution dataset, we’ll write a Python script that iterates over all combinations of inputs and runs the solver each time. It should have the following features:

Rerunning the script should never waste time recomputing existing solutions.
It should allow computing subsets of data which will allow us to parallelize work across multiple machines.
It should allow computing data in stages of quality. Since we’re impatient, we’d like to quickly generate an initial dataset and iteratively improve on it with better exploitability and bet size choices.

TODO: Finish this

Playing Postflop

TODO: Add this

Mapping Isomorphic Flops

The 1755 strategically unique flops enumerate all rank combinations but are grouped according to suit patterns:

Monotone (all suits equal: 4♣3♣2♣, 5♣3♣2♣, 6♣3♣2♣, …)
Rainbow (all suits different: 2♣2♦2♥, 3♥2♣2♦, 4♥2♣2♦, …)
Two-tone hi-low (high and low cards same suit: 3♣3♦2♣, 4♣3♦2♣, 4♣4♦2♣, …)
Two-tone hi-mid (high and middle cards same suit: 3♣2♣2♦, 4♣2♣2♦, 5♣2♣2♦, …)
Two-tone mid-low (middle and low cards same suit: 4♦3♣2♣, 5♦3♣2♣, 6♦3♣2♣, …)

What this means for the bot is that all flops within a subset will be played the same way. It will play 4♣3♣2♣ in exactly the same way as 4♥3♥2♥ because both belong to the subset 432_monotone. We just need to store a precomputed solution for 4♣3♣2♣.

Similarly, it would play 2♣2♦2♥ in the same way as 2♠2♦2♥ since both belong to the subset 222_rainbow.

We just need an algorithm for mapping from the suit dealt on the poker site to the suit we have a precomputed solution for.

Let’s say the bot is dealt the flop K♣Q♥9♠ but our precomputed solution is for K♥Q♦9♣. By examining the two flops, we can come up with the following suit mapping:

♣ → ♥

♥ → ♦

♠ → ♣

There is one issue though. We’ve converted hearts to diamonds but we did not map the now-displaced diamonds to another suit. So our deck of 52 cards effectively ends up with duplicates. The issue can be immediately noticed if you assume the bot has the hole cards 5♥5♦. The above mapping would result in an impossible hand: 5♦5♦

Which suit do we map diamonds to? The only one which is currently displaced in our mapping above: spades.

♣ → ♥

♥ → ♦

♠ → ♣

♦ → ♠

TODO: Finish this

Multi-way Strategy

TODO: Add this

Open Questions

Can the bot still be profitable while using solutions with 0.5% exploitability? How about 5% exploitability?
How much effect does rounding opponent bet sizes to fixed presolved sizes have on the bot? Is there an effect on overall EV or ranges on later streets?
Can the most common multi-way single-raised spots be presolved? For example, BB vs BTN vs SB.

A range is a collection of possible hands a player could have at any given moment.↩︎
Bet sizes are specified in terms of percentage of the pot and raises are multiples of the opponent’s bet size.↩︎
All solutions are computed on a c6a.24xlarge EC2 instance (96 vCPU and 192 GB memory). The flop used is A♥5♦2♣ with standard GTO ranges and 100 BB stacks.↩︎
Terminated early at a 1000 iteration limit with 0.25% exploitability.↩︎
40 BB, 50 BB, 60 BB, 70 BB, 80 BB, 90 BB↩︎
125 BB, 150 BB, 200 BB, 300 BB↩︎