Compute expected (remaining) wins in a playoff series, given the current state of the series

Consider a basketball series, best 4 out of 7. In R, we have the following function for computing the expected number of wins in such a series for a team with a certain single-game win probability wp_a:

get_expected_wins <- function(wp_a = 0.50, num_games = 7, to_win = 4) {
  # compute expected wins for team_a
  # wp_a: a team's probability of winning a single game
  # num_games: the maximum number of possible games remaining in the series
  # to_win: how many more games a team needs to win the series
  # 7,4 correspond to winning a best 4 out of 7 series
  
  # expected wins for the team
  prob_to_win_n_games <- dbinom(x = 0:num_games, size = num_games, prob = wp_a)
  num_wins <- c(0:to_win, rep(to_win, num_games - to_win))
  ewins <- sum(prob_to_win_n_games * num_wins)
  
  # and return
  return(ewins)
}

In the function, prob_to_win_n_games should be the team's probability of winning 0, 1, 2, ..., up to num_games games. Consider a playoff series where a team is trailing 0-3, and we are trying to compute their expected remaining number of wins in the series. Keep in mind that one more loss by the team would end the series. We want to call get_expected_wins(0.5, 4, 4).

In this series, the team has a 50% chance of winning 0 more games (lose the next game), 25% to win 1 game (win, then lose), 12.5% to win 2 games (win, win, then lose), 6.25% to win 3 games (win, win, win, then lose) and 6.25% to win 4 games (win 4x). Their expected number of wins in the series is then 0.5*0 + 0.25*1 + 0.125*2 + 0.0625*3 + 0.0625*4 = 0.9375.
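
The arithmetic above can be checked directly in R. This is only a minimal sketch for the 0-3 case; the variable names (p, more_wins, prob) are illustrative and not part of the original function.

```r
# Team trails 0-3 with single-game win probability p = 0.5.
# The series ends at the team's first loss, or after four straight wins.
p <- 0.5
more_wins <- 0:4

# P(k wins then a loss) for k = 0..3, followed by P(four straight wins)
prob <- c(p^(0:3) * (1 - p), p^4)

sum(prob)              # 1: the five outcomes cover every possibility
sum(prob * more_wins)  # 0.9375
```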

In this example, num_games = 4 and to_win = 4, and prob_to_win_n_games is incorrectly computed as 0.0625 0.2500 0.3750 0.2500 0.0625. The binomial fails to account for the series ending after an additional loss. It computes a 25% chance of 3 wins, based on the calculation (4 choose 3) * (0.5 ^ 4). However, 3 of the 4 possible sequences (L W W W, W L W W, W W L W) cannot occur in our theoretical playoff series, where one additional loss by the team would end the series. Only W W W L gets the team to 3 wins.
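
To make the discrepancy concrete, here is a small base-R sketch that enumerates all 16 win/loss sequences for the 0-3 case and counts only the wins that come before the first loss. The names (seqs, wins_before_loss, seq_prob, dist_truncated) are made up for this illustration.

```r
p <- 0.5

# All 2^4 win/loss sequences over four games (1 = win, 0 = loss)
seqs <- as.matrix(expand.grid(rep(list(c(1, 0)), 4)))

# The series stops at the first loss, so only wins before it count
wins_before_loss <- apply(seqs, 1, function(s) {
  first_loss <- match(0, s)
  if (is.na(first_loss)) length(s) else first_loss - 1
})

# Probability of each full sequence
seq_prob <- apply(seqs, 1, function(s) prod(ifelse(s == 1, p, 1 - p)))

# Distribution that respects the early stop: 0.5 0.25 0.125 0.0625 0.0625
dist_truncated <- tapply(seq_prob, wins_before_loss, sum)
dist_truncated

# The plain binomial, for comparison: 0.0625 0.25 0.375 0.25 0.0625
dbinom(0:4, size = 4, prob = p)
```

With the binomial weights, the expected value comes out as 2 rather than 0.9375.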

How can we update this function to correctly compute a team's probability of winning a certain number of games, given the parameters we set for the playoff series?

jblood94 On BEST ANSWER

If the number of wins is less than to_win, subtract 1 from the top number in the binomial coefficient (the first argument of choose) relative to what dbinom would use.

The reason for this is that the only way to lose a series is to lose the final game of the series. There is no other restriction on the ordering of the wins/losses for the loser. This means the wins for the series loser can be distributed among all but the last game, which is why we must subtract one from the top number in the binomial coefficient.
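
Written out, the argument above gives the following probabilities. The notation is introduced here for illustration only: p is wp_a, n is num_games, w is to_win, \ell is the number of losses that eliminates the team, and K is the team's number of wins.

```latex
\ell = n - w + 1, \qquad
P(K = k) = \binom{k + \ell - 1}{k}\, p^{k} (1 - p)^{\ell} \quad (0 \le k < w), \qquad
P(K = w) = 1 - \sum_{k=0}^{w-1} P(K = k).
```

For comparison, a plain binomial over the k + \ell games played would use \binom{k + \ell}{k} in place of \binom{k + \ell - 1}{k}.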

This will return the probability of each possible win total, 0:to_win (the distribution of wins rather than the expected value):

get_expected_wins <- function(wp_a = 0.50, num_games = 7, to_win = 4) {
  i <- to_win:1
  wins <- choose(num_games - i, to_win - i)*wp_a^(to_win - i)*(1 - wp_a)^(num_games - to_win + 1)
  setNames(c(wins, 1 - sum(wins)), 0:to_win)
}

get_expected_wins(0.5, 7, 4)
#>       0       1       2       3       4 
#> 0.06250 0.12500 0.15625 0.15625 0.50000
get_expected_wins(0.5, 6, 3)
#>       0       1       2       3 
#> 0.06250 0.12500 0.15625 0.65625
get_expected_wins(0.5, 4, 4)
#>      0      1      2      3      4 
#> 0.5000 0.2500 0.1250 0.0625 0.0625

Alternatively,

get_expected_wins <- function(wp_a = 0.50, num_games = 7L, to_win = 4L) {
  k <- 0:(to_win - 1L)
  n <- (num_games - to_win + 1L):num_games
  wins <- dbinom(k, n, wp_a)*(n - k)/n
  setNames(c(wins, 1 - sum(wins)), 0:to_win)
}
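
Since the functions above return the distribution of wins, the expected number of wins asked for in the question still needs one more step. A minimal sketch follows; win_dist and expected_wins are hypothetical names chosen here, and win_dist simply repeats the first formula above.

```r
# Hypothetical helper: distribution of wins (same formula as the first version above)
win_dist <- function(wp_a = 0.50, num_games = 7, to_win = 4) {
  i <- to_win:1
  wins <- choose(num_games - i, to_win - i) * wp_a^(to_win - i) *
    (1 - wp_a)^(num_games - to_win + 1)
  setNames(c(wins, 1 - sum(wins)), 0:to_win)
}

# Expected wins = sum over k of k * P(k wins)
expected_wins <- function(wp_a = 0.50, num_games = 7, to_win = 4) {
  probs <- win_dist(wp_a, num_games, to_win)
  sum(as.numeric(names(probs)) * probs)
}

expected_wins(0.5, 4, 4)  # 0.9375, the hand-computed value from the question
expected_wins(0.5, 7, 4)  # 2.90625
```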
Canovice On
library(dplyr)  # provides %>%, mutate() and sym() used below

df_approach <- function(p, num_games, to_win) {
  to_lose <- num_games - to_win + 1
  
  rows <- 2^num_games
  games_df <- data.frame(rows = 1:rows, wp = 1, wins = 0, loss = 0)
  for (i in 1:num_games) {
    games_df[paste0('gm', i)] <- c(rep(1, 2^(i-1)), rep(0, 2^(i-1)))
    games_df[paste0('p', i)] <- c(rep(p, 2^(i-1)), rep(1-p, 2^(i-1)))
    if (i == 1) {
      games_df$wp <- c(rep(p, 2^(i-1)), rep(1-p, 2^(i-1)))
    } else {
      games_df$wp <- games_df$wp * c(rep(p, 2^(i-1)), rep(1-p, 2^(i-1)))
    }
  }
  
  for (i in 1:num_games) {
    col_key <- paste0('gm', i)
    games_df <- games_df %>%
      dplyr::mutate(wins = ifelse(wins == to_win | loss == to_lose, wins, wins + !!sym(col_key) )) %>%
      dplyr::mutate(loss = ifelse(wins == to_win | loss == to_lose, loss, loss + (1 - !!sym(col_key)) ))
  }
  
  ewins <- sum(games_df$wp * games_df$wins)
  return(ewins)
}

Before @jblood94 posted his solution, I took a pass at a brute-force approach. It builds a data frame with every possible playoff sequence, computes the probability of each sequence, counts the wins in each sequence subject to the series ending once either side clinches (to_win wins or to_lose losses), and takes the sum-product of wins and probabilities across sequences.

I just compared df_approach() to get_expected_wins() and the outputs are the same for every different set of parameters that I have tried.
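
For anyone who wants to repeat that kind of comparison without dplyr, here is a rough base-R sketch. It is not the df_approach() code above: closed_form and brute_force are hypothetical names, closed_form uses the binomial-coefficient formula from the accepted answer, and the parameter grid is an arbitrary choice.

```r
# Closed-form expected wins (formula from the accepted answer)
closed_form <- function(p, num_games, to_win) {
  k <- 0:(to_win - 1)
  to_lose <- num_games - to_win + 1
  probs <- choose(k + to_lose - 1, k) * p^k * (1 - p)^to_lose
  sum(k * probs) + to_win * (1 - sum(probs))
}

# Brute force: enumerate all 2^num_games sequences and stop counting
# once either side has clinched
brute_force <- function(p, num_games, to_win) {
  to_lose <- num_games - to_win + 1
  seqs <- as.matrix(expand.grid(rep(list(c(1, 0)), num_games)))
  wins <- apply(seqs, 1, function(s) {
    w <- 0
    l <- 0
    for (g in s) {
      if (w == to_win || l == to_lose) break
      if (g == 1) w <- w + 1 else l <- l + 1
    }
    w
  })
  prob <- apply(seqs, 1, function(s) prod(ifelse(s == 1, p, 1 - p)))
  sum(prob * wins)
}

# Arbitrary grid of parameters to compare on
grid <- expand.grid(p = c(0.3, 0.5, 0.7), num_games = 4:7, to_win = 1:4)
diffs <- mapply(function(p, n, w) closed_form(p, n, w) - brute_force(p, n, w),
                grid$p, grid$num_games, grid$to_win)
max(abs(diffs))  # expected to be zero up to floating-point error
```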