# Optimal Blackjack Strategy Using MDP Solvers

On the other hand, the naive strategy has a much larger expected loss of \$1.60, with a standard deviation of 1¢ on that estimate. Using `ttest2` (a two-sample t-test), we can assert with near-perfect confidence that the MDP strategy is better than the naive one.

The difference between the MDP's performance and that of the cheat sheet is marginal. The likeliest explanation for why the MDP strategy performs slightly worse is that I made a small error while painstakingly filling out the almost two million state transition probabilities in $P$. What is clearly evident, however, is that the exercise does generate a near-optimal strategy for playing Blackjack, one that performs much better than a naive decision policy.

## Conclusion

The purpose of this project was to get familiar with using Markov Decision Processes to provide optimal strategies in discrete, finite-state stochastic environments. Blackjack seemed to be a perfect candidate to try the algorithm on.

To formulate the game in a manner acceptable to an MDP solver, I first specified the probabilities for transitioning from each game state to every other under all actions, using either explicit analysis or Monte Carlo simulation. I then specified the rewards for transitioning from game states to terminal states under the different actions. With these matrices specified, I used the MDP solver to provide the optimal strategy.

I compared this strategy to a widely available “Blackjack Cheat Sheet”, which claims to be the optimal blackjack strategy. Repeatedly simulating the strategies in the game of Blackjack, I found that both the MDP’s strategy and that of the cheat sheet lose only about 5¢ per \$10 bet on average, while a more naive strategy loses \$1.60 on the same bet.

According to wizardofodds.com, the “house advantage” for a game of Blackjack with the assumed rules is around 0.5%. The house advantage is the dealer's expected gain as a fraction of the bet, assuming the player plays optimally. A loss of 5¢ on a \$10 bet is exactly 0.5%, so this figure provides evidence that the strategy given by the MDP is indeed close to optimal.
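The formulation summarized above (a transition array $P$, a reward array, then a solver) can be illustrated on a much smaller problem. The following is a minimal Python sketch, not the code used in this project: it assumes a toy hit-or-stand game in which cards 1 to 10 are equally likely, standing pays `total / 21`, and busting pays nothing. All names (`build_toy_mdp`, `value_iteration`, `toy_policy`) are hypothetical, and plain value iteration stands in for whatever solver a toolbox might provide.

```python
# Toy illustration only: NOT the Blackjack model or solver from this project.
# Assumed rules: cards 1..10 are equally likely, "stand" pays total/21,
# and busting (total > 21) pays 0.

TOTALS = list(range(4, 22))      # non-terminal states: hand totals 4..21
TERMINAL = len(TOTALS)           # index of the absorbing end-of-hand state
N_STATES = len(TOTALS) + 1
STAND, HIT = 0, 1
ACTION_NAMES = {STAND: "stand", HIT: "hit"}


def build_toy_mdp():
    """Return (P, R): P[a][s][s2] are transition probabilities,
    R[s][a] is the expected immediate reward of action a in state s."""
    P = [[[0.0] * N_STATES for _ in range(N_STATES)] for _ in range(2)]
    R = [[0.0, 0.0] for _ in range(N_STATES)]
    for s, total in enumerate(TOTALS):
        # Standing ends the hand and pays the toy reward.
        P[STAND][s][TERMINAL] = 1.0
        R[s][STAND] = total / 21.0
        # Hitting draws one of ten equally likely cards; a bust ends the hand.
        for card in range(1, 11):
            new_total = total + card
            s2 = TOTALS.index(new_total) if new_total <= 21 else TERMINAL
            P[HIT][s][s2] += 0.1
    # The terminal state is absorbing and pays nothing.
    for a in (STAND, HIT):
        P[a][TERMINAL][TERMINAL] = 1.0
    return P, R


def value_iteration(P, R, gamma=1.0, tol=1e-12, max_sweeps=1000):
    """Synchronous value iteration; returns (values, greedy policy)."""
    n_states, n_actions = len(R), len(P)
    V = [0.0] * n_states
    policy = [0] * n_states
    for _ in range(max_sweeps):
        new_V = [0.0] * n_states
        for s in range(n_states):
            # Bellman backup: immediate reward plus expected future value.
            q = [
                R[s][a] + gamma * sum(P[a][s][s2] * V[s2] for s2 in range(n_states))
                for a in range(n_actions)
            ]
            policy[s] = q.index(max(q))
            new_V[s] = q[policy[s]]
        delta = max(abs(x - y) for x, y in zip(new_V, V))
        V = new_V
        if delta < tol:
            break
    return V, policy


P, R = build_toy_mdp()
V, policy = value_iteration(P, R)
toy_policy = {total: ACTION_NAMES[policy[s]] for s, total in enumerate(TOTALS)}
```

In this toy game the resulting `toy_policy` hits on every total up to 13 and stands from 14 upward. Because totals only increase, the state graph has no cycles, so an undiscounted `gamma=1.0` still converges after a finite number of sweeps. The real Blackjack state space additionally has to track the dealer's card, usable aces, and actions such as doubling, none of which this sketch attempts.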

I did this project with the help of Dilip Ravindran, a close friend and graduate student at Columbia Economics.