Win Odds Confidence Intervals in R
Continuous distributions
Mann-Whitney estimate for the win probability
Consider two independent, continuous RVs (random variables) \(\xi\) and \(\eta\). The following probability is called the win probability of \(\eta\) against \(\xi\):
\[\begin{align*} \theta=P(\eta>\xi). \end{align*}\]
Given an i.i.d. (independent, identically distributed) random sample from \(\xi\), denoted by \(X=(X_1,\ldots,X_{n_1})\), and an i.i.d. sample from \(\eta\), denoted by \(Y=(Y_1,\ldots,Y_{n_2})\), we are interested in estimating the unknown parameter \(\theta.\)
The following estimator is called the Mann-Whitney estimator (or the win proportion) for the win probability:
\[\begin{align*} \hat\theta_N=\frac{1}{n_1n_2}\sum_{i=1}^{n_1}\sum_{j=1}^{n_2}I(X_i<Y_j). \end{align*}\]
Here \(N=n_1+n_2\) is the total sample size and \(I(\cdot)\) is the indicator function, which takes the value 1 if the inequality holds and 0 otherwise. As \(n_1\rightarrow+\infty\) and \(n_2\rightarrow+\infty\), the win proportion is a consistent estimator of the win probability, that is, it converges in probability:
\[\begin{align*} \hat\theta_N\longrightarrow\theta. \end{align*}\]
If, in addition, \(\frac{n_1}{N}\rightarrow \alpha\in(0,1)\), then the win proportion is also asymptotically normal (Van der Vaart 2000):
\[\begin{align*} \sqrt{N}(\hat\theta_N-\theta)\Longrightarrow{\mathcal N}\left(0,\frac{1}{1-\alpha}\sigma_{10}^2+\frac{1}{\alpha}\sigma_{01}^2\right). \end{align*}\]
Here,
\[\begin{align*} \sigma_{10}^2=Cov(I(\xi<\eta),I(\xi'<\eta))=P(\xi<\eta,\xi'<\eta)-P(\xi<\eta)^2,\\ \sigma_{01}^2=Cov(I(\xi<\eta),I(\xi<\eta'))=P(\xi<\eta,\xi<\eta')-P(\xi<\eta)^2, \end{align*}\]
where \(\xi'\) has the same distribution as \(\xi\), \(\eta'\) has the same distribution as \(\eta\), and \(\xi,\xi',\eta,\eta'\) are mutually independent.
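As a small illustration of the estimator defined above, the double sum over all pairs can be computed with `outer()`. This is only a sketch: the function name `win_proportion` and the toy data are made up for the example, and ties are ignored because the variables are assumed to be continuous.

```r
# Mann-Whitney estimate (win proportion) of P(eta > xi) for continuous data.
# x: sample from xi, y: sample from eta.
# `win_proportion` is a hypothetical helper name, not part of any package.
win_proportion <- function(x, y) {
  # outer() builds the n1 x n2 logical matrix of I(X_i < Y_j);
  # its mean is the double sum divided by n1 * n2.
  mean(outer(x, y, "<"))
}

# Toy data: 2 of the 6 pairs satisfy X_i < Y_j
x_toy <- c(1, 2, 3)
y_toy <- c(2.5, 0.5)
theta_hat_toy <- win_proportion(x_toy, y_toy)
theta_hat_toy
```

For large samples the `outer()` matrix has \(n_1n_2\) entries, so a rank-based computation may be preferable; the version here is chosen because it mirrors the formula directly.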
Application to exponential distributions
Suppose now that \(\xi\sim{\mathbb E}(\lambda)\) and \(\eta\sim{\mathbb E}(\mu)\) are exponentially distributed with rates \(\lambda\) and \(\mu\), so that \(\theta=P(\xi<\eta)=\frac{\lambda}{\lambda+\mu}.\) In this case,
\[\begin{align*} \sigma_{10}^2=\frac{2\lambda^2}{(\lambda+\mu)(2\lambda+\mu)}-\frac{\lambda^2}{(\lambda+\mu)^2}=\frac{\lambda^2\mu}{(\lambda+\mu)^2(2\lambda+\mu)},\\ \sigma_{01}^2=\frac{\lambda}{\lambda+2\mu}-\frac{\lambda^2}{(\lambda+\mu)^2}=\frac{\lambda\mu^2}{(\lambda+\mu)^2(\lambda+2\mu)}, \end{align*}\]
therefore, the asymptotic variance of the win proportion is
\[\begin{align*} \sigma^2=\frac{\lambda\mu}{(\lambda+\mu)^2}\left(\frac{1}{1-\alpha}\frac{\lambda}{2\lambda+\mu}+\frac{1}{\alpha}\frac{\mu}{\lambda+2\mu}\right). \end{align*}\]
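The closed forms for \(\sigma_{10}^2\) and \(\sigma_{01}^2\) can be cross-checked numerically. The sketch below assumes the rate parametrization of the exponential distribution and evaluates the two joint probabilities with `integrate()`: \(P(\xi<\eta,\xi'<\eta)=E[(1-e^{-\lambda\eta})^2]\) and \(P(\xi<\eta,\xi<\eta')=E[e^{-2\mu\xi}]\). The variable names and the values \(\lambda=2,\ \mu=10\) are chosen only for illustration.

```r
lambda <- 2   # rate of xi (illustrative value)
mu <- 10      # rate of eta (illustrative value)
theta <- lambda / (lambda + mu)   # P(xi < eta)

# P(xi < eta, xi' < eta): condition on eta = y, with density mu * exp(-mu * y)
p_share_eta <- integrate(
  function(y) (1 - exp(-lambda * y))^2 * mu * exp(-mu * y), 0, Inf
)$value

# P(xi < eta, xi < eta'): condition on xi = x, with density lambda * exp(-lambda * x)
p_share_xi <- integrate(
  function(x) exp(-2 * mu * x) * lambda * exp(-lambda * x), 0, Inf
)$value

sigma10_num <- p_share_eta - theta^2
sigma01_num <- p_share_xi - theta^2

# Closed-form expressions from the text
sigma10_closed <- lambda^2 * mu / ((lambda + mu)^2 * (2 * lambda + mu))
sigma01_closed <- lambda * mu^2 / ((lambda + mu)^2 * (lambda + 2 * mu))

rbind(numeric = c(sigma10_num, sigma01_num),
      closed  = c(sigma10_closed, sigma01_closed))
```

The two rows should agree up to the numerical integration error.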
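We can check this with the following simulation, which compares the histogram of \(\sqrt{N}(\hat\theta_N-\theta)\) over repeated samples with the limiting normal density: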
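# Simulation settings
n1 <- 700                      # sample size for xi
n2 <- 100                      # sample size for eta
N <- n1 + n2
m <- 1000                      # number of simulated data sets
lambda <- 2                    # rate of xi
mu <- 10                       # rate of eta
alpha <- n1 / N
k <- lambda / (lambda + mu)    # true win probability P(xi < eta)
WR <- numeric(m)               # preallocate the results vector
for (i in seq_len(m)) {
  set.seed(i)
  X1 <- rexp(n1, lambda)
  X2 <- rexp(n2, mu)
  # All n1 * n2 pairs; a tie counts as half a win
  d <- expand.grid(x = X1, y = X2)
  d$w <- ifelse(d$y > d$x, 1, ifelse(d$y == d$x, 0.5, 0))
  WR[i] <- sqrt(N) * (mean(d$w) - k)
}
# Theoretical asymptotic variance of sqrt(N) * (win proportion - k)
x0 <- 3
int <- seq(-x0, x0, 0.001)
Coeff0 <- mu * lambda / (lambda + mu)^2
Coeff <- Coeff0 * (1 / (1 - alpha) * lambda / (2 * lambda + mu) +
                   1 / alpha * mu / (lambda + 2 * mu))
# Histogram of the simulated values with the limiting normal density overlaid
hist(WR, nclass = 20, freq = FALSE, xlim = c(-x0, x0),
     ylim = c(0, 1.1), col = "lightblue", border = "blue")
lines(int, dnorm(int, mean = 0, sd = sqrt(Coeff)), col = "red")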
Definition of the win odds
Consider two independent, continuous RVs (random variables) \(\xi\) and \(\eta\). The odds corresponding to the win probability are called the win odds of \(\eta\) against \(\xi\):
\[\begin{align*} \omega=\frac{P(\eta>\xi)}{P(\eta<\xi)}=\frac{\theta}{1-\theta}. \end{align*}\]
The Mann-Whitney estimate of the win probability can be transformed by the function \(f(x)=\frac{x}{1-x},\ \ x\in(0,1)\) to get an estimate for the win odds. Using the same transformation and the asymptotic normality of the Mann-Whitney estimate it is possible to construct asymptotic confidence intervals for the win odds, for a given asymptotic confidence level.
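The transformation step can be sketched as follows. This is a minimal illustration, assuming a Wald-type interval \(\hat\theta\pm z\cdot SE\) for the win probability whose limits lie strictly inside \((0,1)\); the function name `win_odds_ci` and the input values are hypothetical. Because \(f(x)=\frac{x}{1-x}\) is increasing on \((0,1)\), applying it to the limits preserves the asymptotic confidence level.

```r
# Confidence interval for the win odds obtained by transforming a
# Wald-type interval for the win probability with f(x) = x / (1 - x).
# `win_odds_ci` is a hypothetical helper, not a function from any package.
win_odds_ci <- function(theta_hat, se, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)
  ci_theta <- c(estimate = theta_hat,
                lower = theta_hat - z * se,
                upper = theta_hat + z * se)
  # The transformation is only defined for values strictly inside (0, 1)
  stopifnot(all(ci_theta > 0), all(ci_theta < 1))
  ci_theta / (1 - ci_theta)
}

# Illustrative inputs (not taken from real data)
wo <- win_odds_ci(theta_hat = 0.6, se = 0.05)
wo
```

Note that the resulting interval is not symmetric around the win odds estimate, since the transformation is nonlinear.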
If \(\omega=1\) then the random variables \(\xi\) and \(\eta\) are stochastically equivalent, while \(\omega>1\) means that \(\eta\) is stochastically greater than (wins against) \(\xi.\) The case \(\omega<1\) means that \(\eta\) loses against \(\xi\). The asymptotic confidence interval for the win odds can be used to test the hypothesis \(\omega=1\): the hypothesis is rejected when the interval does not contain 1.
To use the asymptotic normality result described above we need to estimate the asymptotic variance. The R package sanon provides an estimate of the asymptotic standard error of the Mann-Whitney estimate.
Ordinal random variables
The package sanon
We will be using the package sanon (Kawaguchi and Koch 2015)
#install.packages("sanon")
library(sanon)
The dataset resp contains data from a randomized clinical trial to compare a test treatment to placebo for a respiratory disorder.
data(resp, package = "sanon")
head(resp)
## center treatment sex age baseline visit1 visit2 visit3 visit4
## 1 1 A F 32 1 2 2 4 2
## 2 2 A F 37 1 3 4 4 4
## 3 1 A F 47 2 2 3 4 4
## 4 2 A F 39 2 3 4 4 4
## 5 1 A M 11 4 4 4 4 2
## 6 2 A F 60 4 4 3 3 4
The column visit4 is a numeric vector of patient global ratings of symptom control on 5 categories (4 = excellent, 3 = good, 2 = fair, 1 = poor, 0 = terrible), measured at visit 4. To compare the effect of the active treatment against placebo we will use the win probability, which, as defined previously, is an unknown theoretical quantity. The null hypothesis is that there is no treatment difference, which can be written in terms of the win probability as \(\theta=0.5.\) The Mann-Whitney estimate of the win probability can be calculated as follows:
fit <- sanon(visit4 ~ grp(treatment, ref="P"), data = resp)
fit
## Call:
## sanon.formula(formula = visit4 ~ grp(treatment, ref = "P"), data = resp)
##
## Sample size: 111
##
## Response levels:
## [visit4; 5 levels] (lower) 0, 1, 2, 3, 4 (higher)
##
## Design Matrix:
## [,1]
## visit4 1
##
## Mann-Whitney Estimate
## for comparison [ A / P ] :
## visit4
## 0.6174
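With ordinal responses, ties between the two groups occur with positive probability. A common convention, also used in the simulation code earlier in this document, is to count a tie as half a win. The sketch below applies that convention to made-up ratings (not the resp data); the function name `win_proportion_ties` is hypothetical, and it is only an assumption here that this matches how sanon handles ties.

```r
# Win proportion with ties counted as 1/2.
# x: reference group (e.g. placebo), y: comparison group (e.g. active).
# `win_proportion_ties` is a hypothetical helper, not part of sanon.
win_proportion_ties <- function(x, y) {
  # Score each pair: 1 if x < y, 0.5 if x == y, 0 otherwise
  scores <- outer(x, y, function(a, b) (a < b) + 0.5 * (a == b))
  mean(scores)
}

# Made-up ordinal ratings on the 0-4 scale
placebo_toy <- c(0, 1, 2, 2, 3)
active_toy <- c(2, 3, 4, 4)
wp_toy <- win_proportion_ties(placebo_toy, active_toy)
wp_toy
```

With this convention, swapping the two groups gives one minus the original value, so the win probability of 0.5 still corresponds to no difference.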
confint(fit)
## M-W Estimate and 95% Confidence Intervals
## :
## Estimate Lower Upper
## visit4 0.6174 0.5173 0.7176
fit$p
## [,1]
## [1,] 0.02150601
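As a rough plausibility check, the reported p-value can be approximately recovered from the printed estimate and confidence limits. The sketch assumes that the interval is a symmetric Wald-type interval and that the p-value comes from a two-sided normal test of \(\theta=0.5\); both are assumptions about sanon's output, not statements taken from its documentation. The inputs are the rounded values printed above, so only approximate agreement can be expected.

```r
# Rounded values copied from the confint(fit) output
est <- 0.6174
lower <- 0.5173
upper <- 0.7176

# Back out the standard error from the half-width of the 95% interval
z_crit <- qnorm(0.975)
se_approx <- (upper - lower) / (2 * z_crit)

# Two-sided Wald test of theta = 0.5 (assumed form of the test)
z_stat <- (est - 0.5) / se_approx
p_approx <- 2 * pnorm(-abs(z_stat))
p_approx
```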
The p-value (0.0215) is less than 0.05, hence the null hypothesis of no treatment difference is rejected at the 5% significance level. This agrees with the 95% confidence interval, which does not contain 0.5. The Mann-Whitney estimate and its confidence limits can be transformed to obtain an estimate and a confidence interval for the win odds:
confint(fit)$ci/(1-confint(fit)$ci)
## Estimate Lower Upper
## visit4 1.614013 1.071762 2.540746
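For the win odds, the null hypothesis of no treatment difference is \(\omega=1.\) The estimated win odds of 1.61, with 95% confidence interval (1.07, 2.54), quantifies the treatment effect. Since the interval does not contain 1, the conclusion agrees with the test based on the win probability.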