Building an Optimal Poker Bot

Published Dec 19, 2023

I recently built a poker bot which simultaneously played several $2000 buyin tables. Most of its time was spent in the $200 buyin fast-fold format where it broke even over a sample of 70k hands.

This post examines technical decisions made during development of a game theory optimal (GTO) poker bot. It assumes the reader already has basic poker knowledge.

Exploitative vs GTO Strategy

Before getting into technical details, I want to very briefly cover poker strategy and how it relates to the bot:

An exploitative bot would make decisions which maximize earnings against suboptimal opponents. The catch is that it would leave the bot open to counter-exploits by experienced players.

A GTO bot would play a fixed, unexploitable strategy. Against human opponents, it would not maximize earnings but it would prevent loss in the long run.

I didn’t want to write custom (brittle) code to exploit specific player types or worry about the bot losing money by being exploited. So connecting the bot to an existing GTO solver and letting it earn passively whenever opponents made mistakes seemed like a good starting point.

Birds-Eye View

I’ll go over each component of the bot in detail below. But the end result looks something like this:

High-level bot architecture

With client-server architecture, we can run the solver on powerful remote hardware while running less intensive bot logic on a local machine.

The local machine is responsible for:

The solver server is responsible for:

Let’s look at a few techniques needed for handling the preflop stage first.

Preflop

Solving the preflop stage of poker is too time-consuming to fit within a player’s time to act window so we’ll have to use precomputed GTO solutions.

Serialized Game Trees

Internally, solvers represent a solution as a game tree data structure where each node describes a game state and each edge corresponds to a player action (fold, call or raise). Each node contains a strategy: probability distributions for taking a particular action by an optimal player.

Here’s a simplified game tree for the first three player actions of a game:

Game tree

The path from root to leaf represents player actions taken during a game:

  1. Under the gun (UTG) folds
  2. Hijack (HJ) raises 2.5 BB
  3. Cutoff (CO) raises 8 BB

Each node contains probability distributions for playing each hole card pair. For example, here’s cutoff before raising to 8 BB:

AA
AKs
AQs
AJs
ATs
A9s
A8s
A7s
A6s
A5s
A4s
A3s
A2s
AKo
KK
KQs
KJs
KTs
K9s
K8s
K7s
K6s
K5s
K4s
K3s
K2s
AQo
KQo
QQ
QJs
QTs
Q9s
Q8s
Q7s
Q6s
Q5s
Q4s
Q3s
Q2s
AJo
KJo
QJo
JJ
JTs
J9s
J8s
J7s
J6s
J5s
J4s
J3s
J2s
ATo
KTo
QTo
JTo
TT
T9s
T8s
T7s
T6s
T5s
T4s
T3s
T2s
A9o
K9o
Q9o
J9o
T9o
99
98s
97s
96s
95s
94s
93s
92s
A8o
K8o
Q8o
J8o
T8o
98o
88
87s
86s
85s
84s
83s
82s
A7o
K7o
Q7o
J7o
T7o
97o
87o
77
76s
75s
74s
73s
72s
A6o
K6o
Q6o
J6o
T6o
96o
86o
76o
66
65s
64s
63s
62s
A5o
K5o
Q5o
J5o
T5o
95o
85o
75o
65o
55
54s
53s
52s
A4o
K4o
Q4o
J4o
T4o
94o
84o
74o
64o
54o
44
43s
42s
A3o
K3o
Q3o
J3o
T3o
93o
83o
73o
63o
53o
43o
33
32s
A2o
K2o
Q2o
J2o
T2o
92o
82o
72o
62o
52o
42o
32o
22
Raise 12.1%
Call 3.3%
Fold 84.6%

We want our bot to have this kind of data readily available for playing decisions. So we’ll scrape GTO solutions from a third-party service, build them into our own game trees in memory using Python and pickle them into files.

When a preflop round begins, the bot will select a game tree and deserialize it. As the game progresses, it will traverse through the tree in order to access strategies and make optimal playing decisions. Once the preflop stage ends, the bot will use the last visited nodes (exit nodes) as range inputs to the postflop solver.

Postflop

Unlike preflop, there are some cases in the postflop stage where we can solve on the fly. But we’ll want to host a third-party solver on a remote server in order to quickly compute solutions.

Looking at open-source offerings on GitHub, I eventually landed on postflop-solver. It’s well-documented and claims to be performant. It also has a permissive license which allows integration with our bot.

Solver Limitations

Since computing a complete solution requires enormous amounts of time and disk space, solvers make various concessions which affect the bot’s design:

Solving Postflop

In a perfect world, we would just solve every postflop scenario on the fly. Unfortunately, CFR-based solvers often cannot generate an accurate solution within the time limit to act in online poker.

Parallelization helps but still doesn’t bring us close enough: experiments with postflop-solver showed that beyond a certain number of CPU cores, solve time improvements reached a point of diminishing returns because memory I/O became the bottleneck. Regardless of what hardware I tried, I couldn’t find a way to consistently get fast solutions.

Ultimately, the bot needs to rely on a mixture of both precomputed solutions and live solving. We’ll need to figure out specifically which scenarios require precomputing and which permit live solving.

Building a Solution Dataset

To figure out which solutions need to be precomputed, we’ll use the fact that only two pieces of game state significantly affect solve time: stack-to-pot ratio (SPR) and street.

The smaller the effective player stacks are in relation to the pot size, the faster the solver can reach an accurate solution. In addition, solving from the turn or river is much faster than solving from the flop.

In fact, the following scenarios can be solved within a few seconds to a high degree of accuracy:

In poker, SPR is largely dictated by preflop raises — if a re-raise happens, we end up in shallow SPR territory. So, if we assume standard starting stack sizes, the only scenarios we need to presolve are single-raised flops.

Let’s get a sense for the space/time requirements of solving one game. And then figure out how many games in total we’ll need to solve.

Spoiler alert: it’s going to require many terabytes and months of compute time.

Only two inputs to postflop-solver have a significant effect on solve time: bet sizes and target exploitability.

Bet Sizes

Our bot needs to be able to respond to any bet size the opponent chooses. Unfortunately, solvers require restricting each players’ bet sizes to discrete values. And more bet sizes means longer solve time. So we’ll need to strike a balance between adequate coverage across the full range of possible bets and keeping the solve time reasonable:

The 125% and 200%2 overbet sizes are necessary to include since they’re commonly used in higher stakes.

The a bet size allows the bot to move all-in.

How do we handle cases where the opponent uses an unexpected sizing such as 80% on the flop? We’ll just have to round the opponent’s bet size to the nearest predetermined value: 66%. Doing so may affect the bot’s profitability and decisions on further streets which I’ll discuss below as an open question.

Luckily, once the game progresses beyond the flop, we no longer have to worry about rounding bet sizes to fixed intervals. We’ve already learned that solving from the turn or river is fast. So we can re-solve on the fly with more granular intervals and even include the opponent’s exact bet size as one of the options.

It’s also worth pointing out that we’re precomputing single-raised flops where bet sizes tend to be cut-and-dry and fall cleanly into our preselected sizings: 33% or 66% for the in-position (IP) player and 3x raise for the out-of-position (OOP) player.

Once the bot is permitted to solve live, it can use much more granular sizings:

I’ve included 10% probe bets, 25% bets (common in 4-bet pots), geometric sizing (e) and oversized raises (sometimes seen on the river).

Target Exploitability

The closer we set this value to zero, the less exploitable the bot’s poker strategy will be but the longer our solutions will take to precompute.

For poker study, PioSOLVER recommends setting this to 0.01 BB. postflop-solver accepts this value as a percentage of the pot. For our single-raised 5.5 BB pots, that works out to 0.18% of the pot.

To give a rough sense of time required, I computed a few solutions using the bet sizes above along with various target exploitability: 3

We notice a major reduction in solve time from 0.5% to 5%. Since we’ll be precomputing tens of thousands of solutions, that’s a huge improvement in total time saved. Can the bot afford to operate with solutions that are exploitable for 5% of a given pot? I discuss this below as an open question but we’ll use 5% exploitability going forward.

To figure out how many solutions we need to compute, we have to understand two more solver inputs: player ranges and flop cards.

Player Ranges

In poker theory, a player’s starting range on the flop is determined by two things: their preflop actions and their configuration (position at the table respective to an opponent).

In higher stakes fast-fold poker, we frequently encounter only five configurations during single-raised pots:

For now, we can skip solving the remaining 10 configurations to cut down on initial compute time. If players behave erratically and the bot arrives at the flop with an unexpected configuration, we’ll just bite the bullet and use a low quality live-solve.

What about multi-way configurations (three or more players)? The open questions section discusses presolving the most common multi-way configurations. But in general, we can handle that rare case with a simple non-GTO strategy.

It turns out that each postflop configuration can only be achieved through one sequence of preflop player actions. The most commonly occurring is BB vs BTN: three players fold, the button position open raises, the next player folds and the big blind player defends by calling.

Now, to give something useful to the solver, we need to know the actual ranges for button’s raise and big blind’s call. For now, we can get away with just hardcoding them based on existing preflop GTO solutions. A more complete solution using a game tree is discussed in the preflop section.

Flop Cards

Along with player ranges and stack sizes, the other solver input to consider is the flop cards dealt. In poker, there are 22,100 unique flop combinations. With that many flops, precomputation becomes prohibitively expensive.

But we can use a technique which reduces the problem space to just 1755 strategically unique flop subsets. So instead of solving every flop, we can just solve one flop within a subset.

The bot can then use the presolved flop for its playing decisions while the actual flop dealt by the poker site may be different. The bot will need an algorithm for mapping between actual and presolved flops which I discuss below.

Effective Stack Size

Another solver input parameter is stack size. So far, we’ve assumed a normal 100 BB stack size but what about scenarios with short or deep stacks? We can repeat all precomputations with all common stack sizes (e.g. from 30 BB up to 300 BB in regular intervals). As you can probably guess, recomputing everything for each stack size multiplies the total time and space requirements.

Luckily, we can get by with an initial version of the bot which does not support deep stacks: fast-fold poker on our poker site permits ratholing. We can code the bot to leave and rejoin the table to ensure it stays within a stack size which we have solutions for.

Rake Rate & Rake Cap

Long-term, we’d like to have separate solutions for various rake structures. But to keep things simple, we can get away with just targeting the $1/$2 fast-fold game type on our poker site. For this game, they list the rake rate as 5% and rake cap as $3.00. Note that the rake cap input should be specified in big blinds. So we have:

Partial Solutions

I want to highlight one space-saving solver feature which we need to make use of: partial solution files. postflop-solver allows saving smaller solution files by discarding information after the river deal. The benefit is a massive reduction in file size: from roughly 10 GB down to 100 MB! Without partial solutions, our total storage requirement would be too large for a home server (several petabytes).

The downside of partial solutions is that we’ll need to re-solve on the fly when the game progresses from the flop to the turn. But as we learned above, those types of solutions easily fit within the time to act window (15 seconds).

Opening Raise Size

TODO: Add this. Explain how we assume 2.5 BB open raise but really our dataset should support 3 more common types: 2 BB, 2.25 BB, 3 BB.

Dataset Time/Space Requirements

Finally, we can put everything together to figure out total disk space and compute time required. At minimum, we need to precompute 5 configurations with one stack size and 1755 flops each. And we know that each 100 MB solution takes roughly 11.6 minutes to generate.

We can build up our dataset over time in stages:

  1. 5 common configurations, 100 BB stack size, 1755 flops per configuration: 71 days, 875.5 GB (8755 solution files)

  2. Remaining 10 configurations: 142 days, 1755 GB (17,550 solution files)

  3. 5 common configurations with short stack sizes 5: 427 days, 5265 GB (52,650 solution files)

  4. Remaining 10 configurations: 711 days, 8775 GB (87,750 solution files)

  5. 5 common configurations with deep stack sizes 6: 284 days, 3510 GB (35,100 solution files)

  6. Remaining 10 configurations: 569 days, 7020 GB (70,200 solution files)

After stage 1, we’ll have a functional bot which handles remaining stages using low-quality live-solving or ratholing. Getting to this point requires 71 days and 875.5 GB disk space.

Things looks dismal when we consider all stages: the complete dataset would require a total of 6 years and 87 TB!

Clearly, we’ll need to split computation over several machines to avoid waiting years. So we’ll design our generation script to allow splitting work into subsets.

Generation Script

To start building our solution dataset, we’ll write a Python script that iterates over all combinations of inputs and runs the solver each time. It should have the following features:

TODO: Finish this

Playing Postflop

TODO: Add this

Mapping Isomorphic Flops

The 1755 strategically unique flops enumerate all rank combinations but are grouped according to suit patterns:

What this means for the bot is that all flops within a subset will be played the same way. It will play 4♣3♣2♣ in exactly the same way as 4♥3♥2♥ because both belong to the subset 432_monotone. We just need to store a precomputed solution for 4♣3♣2♣.

Similarly, it would play 2♣2♦2♥ in the same way as 2♠2♦2♥ since both belong to the subset 222_rainbow.

We just need an algorithm for mapping from the suit dealt on the poker site to the suit we have a precomputed solution for.

Let’s say the bot is dealt the flop K♣Q♥9♠ but our precomputed solution is for K♥Q♦9♣. By examining the two flops, we can come up with the following suit mapping:

There is one issue though. We’ve converted hearts to diamonds but we did not map the now-displaced diamonds to another suit. So our deck of 52 cards effectively ends up with duplicates. The issue can be immediately noticed if you assume the bot has the hole cards 5♥5♦. The above mapping would result in an impossible hand: 5♦5♦

Which suit do we map diamonds to? The only one which is currently displaced in our mapping above: spades.

TODO: Finish this

Multi-way Strategy

TODO: Add this

Open Questions


  1. A range is a collection of possible hands a player could have at any given moment.↩︎

  2. Bet sizes are specified in terms of percentage of the pot and raises are multiples of the opponent’s bet size.↩︎

  3. All solutions are computed on a c6a.24xlarge EC2 instance (96 vCPU and 192 GB memory). The flop used is A♥5♦2♣ with standard GTO ranges and 100 BB stacks.↩︎

  4. Terminated early at a 1000 iteration limit with 0.25% exploitability.↩︎

  5. 40 BB, 50 BB, 60 BB, 70 BB, 80 BB, 90 BB↩︎

  6. 125 BB, 150 BB, 200 BB, 300 BB↩︎