Estimating Skill in Short-run Poker: Card Luck Factors

The following is an article from Two Plus Two Internet Magazine, Vol. 4, No. 6 (= June 2008 Edition). Parts of it are clearly outdated, as poker bots have massively improved since then.

As we all know, short-term success in poker depends in large part on luck. This is true for other sports as well, and at least poker commentators understand it whereas soccer commentators, for example, often don't. However, the luck component in poker is much larger. Roger Federer won the past five Wimbledon tournaments, a feat that would be unthinkable if Wimbledon were a poker rather than a tennis tournament.

If someone does manage to win a number of poker tournaments, say Phil Hellmuth, a debate may ensue regarding how much of it was luck and how much was skill. There is one type of discussion in poker forums that seems to go on forever, with no progress being made. On one side we find the Hellmuth-LivePro-faction, arguing that Hellmuth's results prove he is very good. On the other side is the AntiHellmuth-InternetPro-faction, retorting that there are so few live tournaments that their results can't tell us much about poker skill, that there is bound to be someone running hot over these few tournaments, and that Hellmuth is bad simply because we can see him making bad plays.

The last argument "this successful player is bad as we can see him/her making bad plays" is fundamentally problematic. Important aspects of poker strategy are not settled enough to argue in such a way. Furthermore, even if some concepts are really obvious, someone could deliberately and skillfully ignore them in order to confuse opponents or to obtain a certain table image for future gains on the same table. Therefore, to estimate someone's skill there is no alternative to using their results. But what can we do if not enough results data is available to constitute a "long run"?

When it comes to examination of the luck factor, poker actually offers an important advantage. The short-run luck element is large, but much of it is "explicit" and (sometimes) exactly reproducible: the cards, which can only fall in a known finite number of ways.

For the sake of simplicity, this article will focus on the currently most popular variant of poker, No-Limit Hold 'em. In a televised NLHE tournament, players' hole cards become visible. We have to find out how players "should" fare for a given fall of the cards. This being no-limit, it seems sensible to express the result as a card luck factor, denoting the multiple of his/her stack at the start of the hand that a player would be expected to hold after its conclusion. As an extreme example, consider a scenario where two players' stacks are of the same size and not too deep, one is dealt pocket aces and the other pocket kings, and the board comes non-threatening king-high. Here, for the unfortunate player holding aces the card luck factor should be zero, whereas the lucky player with kings should at least double up (depending on who else comes along in the hand): a card luck factor equal to or greater than two.

For every combination of board cards dealt and players' hole cards we would like to have a collection of card luck factors, one for each player. A player's card luck factors could then be compared with how he/she did in the real game, either for a single hand or over the course of a longer game, by multiplying the factors of each hand and comparing the product to the actual development of the player's stack. All card luck factors could be put into a card luck table, in a manner akin to the compilation of logarithm tables.
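The comparison over a longer game can be sketched in a few lines of Python. This is only an illustration: it treats each factor as a multiple of the stack at the start of the hand (so 2.0 means doubling up, as in the aces-versus-kings example), and the function names, the "skill ratio" and all numbers are invented for this sketch rather than taken from any real data.

```python
from math import prod

def expected_stack_multiple(factors):
    """Multiply per-hand card luck factors into one overall multiple."""
    return prod(factors)

def skill_ratio(start_stack, end_stack, factors):
    """Actual stack growth divided by the growth the cards alone predict.

    Hypothetical measure: above 1 means the player did better than the
    cards suggested, below 1 means worse.
    """
    expected = expected_stack_multiple(factors)
    if expected == 0:
        raise ValueError("cards predicted elimination; ratio undefined")
    return (end_stack / start_stack) / expected

# Made-up session of three hands.
factors = [1.10, 0.90, 2.00]
print(expected_stack_multiple(factors))      # about 1.98
print(skill_ratio(10_000, 24_000, factors))  # about 1.21
```

In this made-up session the cards alone would have nearly doubled the stack, and the player turned 10,000 into 24,000, somewhat better than the cards predicted.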

How would card luck factors be determined? For a given card constellation, a poker expert may have a good idea of what the factors should be. Yet as there are so many combinations we need to automate the process. Also, we may want "objective", reproducible results. So we feed the cards and stack sizes to suitable poker bots (poker-playing computer programs), one representing each player; let them play out the hand; and take the result. But, as the term "bot" is often associated with cheating, a better name for them might be hand simulator.

If the actual hand didn't reach the river then we face the problem that not all board cards are known. So we should calculate card luck factors for every possible completion of the board and take the average. Unfortunately, that may be a lot to calculate. Another strain on resources is the size of card luck tables, as there are so many possible card combinations. And what's more, every combination of stack sizes (measured in relation to the blinds) needs its own card luck table. (Incidentally, if we deal with limit rather than no-limit, stack sizes won't usually matter, so we need only one card luck table and want to add/subtract rather than multiply, i.e. instead of something like "this card constellation should cost you ten percent of your stack" we might be saying something like "this card constellation should cost you three big bets".) Where storage is a bigger hurdle than computing power, we can resort to determining card luck factors on the fly, whenever their card combinations occur. If both present insurmountable hurdles, we need to somehow simplify the methodology, sacrificing some accuracy in favor of efficiency. But that's a topic I will leave to others.
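Averaging over every possible completion of the board might look roughly like the following Python sketch. The `simulate` argument is a hypothetical stand-in for the hand simulators (it is assumed to take the hole cards and a full five-card board and return one factor per player), and the two-character card strings such as "As" are likewise just a convention chosen here.

```python
from itertools import permutations

RANKS = "23456789TJQKA"
SUITS = "cdhs"
DECK = [r + s for r in RANKS for s in SUITS]

def board_completions(hole_cards, board):
    """Yield every ordered run-out completing the known board cards.

    Order is kept because turn and river arrive on separate streets,
    so the same two cards in a different order can play differently.
    """
    seen = set(board)
    for hand in hole_cards:
        seen.update(hand)
    remaining = [c for c in DECK if c not in seen]
    for extra in permutations(remaining, 5 - len(board)):
        yield list(board) + list(extra)

def average_factors(hole_cards, board, simulate):
    """Average each player's factor over all equally likely run-outs."""
    totals = [0.0] * len(hole_cards)
    count = 0
    for full_board in board_completions(hole_cards, board):
        for i, factor in enumerate(simulate(hole_cards, full_board)):
            totals[i] += factor
        count += 1
    return [t / count for t in totals]

# Aces against kings on a king-high flop: 45 unseen cards remain,
# giving 45 * 44 = 1980 ordered turn-and-river run-outs.
hole = [("As", "Ah"), ("Ks", "Kh")]
flop = ["Kd", "7c", "2s"]
print(sum(1 for _ in board_completions(hole, flop)))
```

With the flop known this is cheap, but a hand that ended before the flop leaves five cards open, and the number of run-outs explodes. That is the resource problem described above.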

The poker expert's role is to verify that the card luck factors arrived at in this way make sense, and to adjust the hand simulators if they don't. What must hand simulators know? Most importantly, of course, they need an idea of how much equity they can assume for their hand; and whether the pot size, compared to the stack sizes, means they are committed to the pot with this equity. If not, they should only be prepared to invest up to a certain limit and fold if faced with more bets. In the example above with aces, kings, shallow stacks and a safe board, both have to know it's time to get all-in. But if stacks are deep, they have to be able to fold a high overpair in the face of extreme strength.
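One simple way to make "committed to the pot" concrete is the usual break-even arithmetic. The Python sketch below assumes a heads-up pot, equal remaining stacks and no further folding, and it is only one possible rule a hand simulator might use. The function names and chip amounts are invented.

```python
def required_equity(pot, stack):
    """Break-even equity for getting `stack` all-in heads-up into `pot`.

    Winning gains pot + stack, losing costs stack, so the break-even
    point is equity * (pot + 2 * stack) = stack.
    """
    return stack / (pot + 2 * stack)

def is_committed(equity, pot, stack):
    """True if the assumed equity justifies putting the rest in."""
    return equity >= required_equity(pot, stack)

# Shallow: 1000 in the pot, 1000 behind -> one third equity suffices.
print(required_equity(1000, 1000))
print(is_committed(0.40, 1000, 1000))

# Deep: 200 in the pot, 5000 behind -> almost half is needed.
print(required_equity(200, 5000))
print(is_committed(0.40, 200, 5000))
```

The same 40 percent equity commits the shallow stack but not the deep one. An overpair can go all-in with short stacks for the same reason, yet must sometimes be folded when stacks are deep.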

This leaves many questions as to how exactly they should play. How much should they bet? Which hands would they regard as drawing hands that are better to flat-call than to raise with? This article won't try to answer such questions in detail. Instead it will point out how to decide what is important for hand simulators.

Creating good hand simulators is not necessarily the same task as creating good poker-playing programs. Hand simulators don't have to compete; their purpose is to help calculate meaningful card luck factors. So, if someone asks us whether hand simulators need to know a certain weapon out of the poker player's arsenal, our answer is: no, by default they don't, except if you show us card combinations that would get assigned wrong card luck factors otherwise.

For example, if they merrily bet and raise up to a certain limit and then give up, won't that make them exploitable? What about pot control? Show us a fall of cards where lack of pot control causes hand simulators to produce wrong card luck factors, then and only then do we have to teach them pot control. And as for exploitability, hand simulators playing optimal unexploitable game-theory strategy would obviously be ideal for the job (in fact, they could be used to theoretically define card luck factors: in that sense good hand simulation and good poker play converge at the top); but failing that, nobody is out to exploit them anyway as they don't compete.

Now, imagine deep stacks and a fall of cards where a small pocket pair is up against a collection of unpaired non-premium hands, all missing the board. If hand simulators only play straightforwardly, the pair will always take down the hand and its card luck factor won't depend on whether it flops a set or not. But it should. Without a set the pair is so vulnerable to getting bluffed out of the hand that a lower factor would usually seem in order. To capture this effect, hand simulators are required to bluff occasionally. That's a weapon they can't do without. "Occasionally" means a random component, thus we have to take the average of the results after playing the hand many times instead of just once, so often that this average will come out roughly the same if we rerun the whole experiment. An alternative route might be to classify all "bluffing constellations" (who bluffs when), play out each of them only once, and use a suitable weighted average.
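Both routes can be illustrated with a toy Python sketch. Everything in it is invented for illustration: a real hand simulator would play out the hand, whereas here a single random draw decides whether the small pair gets bluffed off the pot, and the bluff probability and the two factors are made-up numbers.

```python
import random
from statistics import mean

BLUFF_PROB = 0.30       # assumed chance that someone bluffs the pair out
FACTOR_BLUFFED = 0.95   # assumed factor when the pair folds to a bluff
FACTOR_UNBLUFFED = 1.10  # assumed factor when the pair takes the pot

def play_once(rng):
    """Toy stand-in for one simulated hand from the pair's seat."""
    return FACTOR_BLUFFED if rng.random() < BLUFF_PROB else FACTOR_UNBLUFFED

def estimate_factor(n_runs, seed):
    """Average the factor over many randomized plays of the same hand."""
    rng = random.Random(seed)
    return mean(play_once(rng) for _ in range(n_runs))

def weighted_factor(constellations):
    """Alternative route: probability-weighted average over
    (probability, factor) pairs, one per bluffing constellation."""
    return sum(p * f for p, f in constellations)

# Rerun the whole experiment with a different seed; the two averages
# should roughly agree once n_runs is large enough.
print(estimate_factor(20_000, seed=1))
print(estimate_factor(20_000, seed=2))
print(weighted_factor([(BLUFF_PROB, FACTOR_BLUFFED),
                       (1 - BLUFF_PROB, FACTOR_UNBLUFFED)]))
```

If two reruns disagree noticeably, the number of plays is too small. The weighted average avoids that sampling noise, but only if every bluffing constellation and its probability can be listed explicitly.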

In the end, it may well turn out that hand simulators do indeed need to know quite a bit about poker strategy. In addition to made hands, they must be able to bet very strong draws: otherwise hands flopping these draws will get assigned wrong card luck factors. They should know about position and certain common moves such as stealing on the button or re-stealing from the big blind: otherwise aces under the gun could get the same card luck factor as aces on the button although they are usually worth even more on the button, from where a raise may invite suspicion and desirable resistance. To capture the effect of a river card giving someone a second-best hand who hadn't any hand before, some kind of slowplaying (or maybe alternatively bluff-catching) seems required. And so on.

Apart from the fall of the cards, one important luck factor in a poker tournament is the table draw. But once we had card luck factors at our disposal, we could undertake further calculations on players' performances and ultimately arrive at a situation comparable to chess, where Elo ratings indicate players' skill. In such a world it would be easy to measure the luck factor of a particular table draw.

Of course, we cannot really hope ever to measure all imaginable components of poker luck. For example, not all hands are equal: there are particular hands in which you choose to make certain plays. You don't want your opponent to sit on a monster hand at exactly the time you are bluffing big. Card luck factors cannot capture the special importance of such chosen moments.

Also, there is the little matter of implementing it all. As indicated above, trade-offs between accuracy and efficiency may be necessary. Still, I think an approach somewhere along these lines is the way forward if we want to estimate the skill element in short-run poker. And maybe, at some point in the future, a television presentation of a no-limit poker game will not only show players' hole cards, and equity percentages for each street of each hand, but also card luck factors; and let viewers compare them to players' results.