Ken Kleinman

Biostatistician

Predicted Probabilities

Several models are commonly used for dichotomous outcomes in medicine.  Most of these are versions of the generalized linear model.  These include the ubiquitous logistic regression (binomial or Bernoulli outcome with the canonical logit link), the so-called log-binomial model (binomial or Bernoulli outcome with a log link), and the linear-binomial model (binomial or Bernoulli outcome with an identity link).  The binomial model with either the log or identity link often fails to converge, and it is sometimes suggested that a Poisson model be used in this case, either the log-Poisson or the linear-Poisson model.

The wisdom of using the Poisson model aside, the log link model for either binomial or Poisson outcomes has the advantage that (when it fits) the parameter estimates reflect the log relative risk, rather than the log relative odds found in logistic regression.  Odds ratios are notoriously hard to get an intuitive feel for: they are a ratio of a ratio, after all.  The risk ratio is much easier to grasp.  The results of the linear (identity) link model are even more attractive, as the parameter estimates are simple differences in probabilities.
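To make the interpretation of the three links explicit, here is a short sketch for a single covariate.  The symbols Y (the outcome), x (the covariate), and p(x) = Pr[Y = 1 | x] are notation assumed for this illustration only.

```latex
% Logit link: the coefficient is a log odds ratio
\log\frac{p(x)}{1-p(x)} = \beta_0 + \beta_1 x
\quad\Longrightarrow\quad
\frac{p(x+1)/\{1-p(x+1)\}}{p(x)/\{1-p(x)\}} = e^{\beta_1}

% Log link: the coefficient is a log risk ratio
\log p(x) = \beta_0 + \beta_1 x
\quad\Longrightarrow\quad
\frac{p(x+1)}{p(x)} = e^{\beta_1}

% Identity link: the coefficient is a risk difference
p(x) = \beta_0 + \beta_1 x
\quad\Longrightarrow\quad
p(x+1) - p(x) = \beta_1
```

In each case the intercept cancels, which is why a single coefficient summarizes the comparison but says nothing about the baseline probability.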

Nonetheless, for all of these models, the best way to present and understand the results is through the predicted probabilities.  These are just the probabilities of the outcome implied by the model for some chosen values of the covariates: multiply the covariate values by the parameter estimates, add them up, and pass the sum back through the inverse of the link function.
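As a minimal sketch of that recipe, the Python below maps a linear predictor back to a probability under each of the three links.  The function names `inverse_link` and `predicted_probability`, and the convention that the first coefficient is the intercept, are assumptions made for this illustration and do not come from any particular package.

```python
import math


def inverse_link(eta, link):
    """Map a linear predictor eta back to the probability scale."""
    if link == "logit":
        # expit, written in a form that avoids overflow for large |eta|
        if eta >= 0:
            return 1.0 / (1.0 + math.exp(-eta))
        e = math.exp(eta)
        return e / (1.0 + e)
    if link == "log":
        # can exceed 1 for some covariate values
        return math.exp(eta)
    if link == "identity":
        # can fall outside [0, 1] for some covariate values
        return eta
    raise ValueError(f"unknown link: {link}")


def predicted_probability(coefs, covariates, link):
    """coefs[0] is the intercept; coefs[1:] pair up with covariates."""
    eta = coefs[0] + sum(b * x for b, x in zip(coefs[1:], covariates))
    return inverse_link(eta, link)


# Hypothetical estimates: intercept 0.1 and slope 0.2 under an identity link
print(predicted_probability([0.1, 0.2], [1.0], "identity"))
```

Note that only the logit link guarantees a result between 0 and 1, which is one way to see why the log and identity link models sometimes fail to fit.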

To be concrete, I'll illustrate with the logistic model.  Suppose we want to model homeless status as a function of age and gender.  We can write the logistic regression as:

logit(Pr[homeless|age, gender]) = \beta_0 + \beta_1 age + \beta_2 male

where logit(p) = log\left(\frac{p}{1-p}\right) and male = 1 if the subject is male and 0 otherwise.  When we fit the model, we'll find estimates for the \beta parameters, which we'll designate as \hat{\beta}.

If I want to estimate the probability based on the model, I invert the logit function:

Pr[homeless|age, gender] = expit[\hat{\beta}_0 + \hat{\beta}_1 age + \hat{\beta}_2 male].

You can find the expit function through a little algebra: expit(x) = \frac{e^x}{1+e^x}, so that

Pr[homeless|age, gender] = \frac{e^{\hat{\beta}_0+ \hat{\beta}_1 age + \hat{\beta}_2 male}} {1 + e^{\hat{\beta}_0+ \hat{\beta}_1 age + \hat{\beta}_2 male}}

 

So suppose the intercept has an estimated value of -7 and that \hat{\beta}_1 = 0.01.  Then for a female subject aged 50, the linear predictor is -7 + 0.01 \times 50 = -6.5, and the predicted probability of homelessness would be e^{-6.5}/(1 + e^{-6.5}) = 0.0015.  Since the type of subject we're predicting for is female, \hat{\beta}_2 is multiplied by 0, and we didn't even need to know its value.  Now, suppose that \hat{\beta}_2 = 2, so that in this population men are much more likely to be homeless than women.  For a 50-year-old man, the linear predictor is -7 + 0.5 + 2 = -4.5, and the predicted probability of homelessness is e^{-4.5}/(1 + e^{-4.5}) = 0.011.
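The arithmetic for these two subjects can be checked with a few lines of Python.  This is only a sketch using the illustrative estimates from the text (-7, 0.01, and 2); the names `expit` and `pr_homeless` are chosen for this example.

```python
import math


def expit(x):
    """Inverse of the logit function."""
    return 1.0 / (1.0 + math.exp(-x))


# Illustrative estimates from the text, not results from a fitted model
b0, b_age, b_male = -7.0, 0.01, 2.0


def pr_homeless(age, male):
    """Predicted probability of homelessness; male is 1 for men, 0 for women."""
    return expit(b0 + b_age * age + b_male * male)


p_woman = pr_homeless(50, 0)   # linear predictor -6.5
p_man = pr_homeless(50, 1)     # linear predictor -4.5

print(round(p_woman, 4), round(p_man, 4))   # prints 0.0015 0.011
```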

The model also implies that the estimated odds ratio of homelessness for men vs. women is e^2 = 7.39.  That's a very large odds ratio.  (Because the outcome is rare here, it's also similar to the risk ratio, which is about 7.3 when computed from the unrounded predicted probabilities for a 50-year-old man and woman.)  If you're used to looking at odds ratios, you might even be surprised that the probability of homelessness for men is so small.  The strength of the predicted probabilities is that they reveal not just the ratio, but the actual proportion of people we'd expect to have the outcome, for a given set of covariates.
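To see the odds ratio, the risk ratio, and the underlying probabilities side by side, here is a small Python sketch for the 50-year-old man and woman in the example.  The linear predictors -4.5 and -6.5 follow from the illustrative estimates above; the variable names are assumptions made for this illustration.

```python
import math


def expit(x):
    """Inverse of the logit function."""
    return 1.0 / (1.0 + math.exp(-x))


def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)


p_woman = expit(-6.5)   # 50-year-old woman
p_man = expit(-4.5)     # 50-year-old man

odds_ratio = odds(p_man) / odds(p_woman)   # equals exp(2), the coefficient for male
risk_ratio = p_man / p_woman               # ratio of predicted probabilities
risk_difference = p_man - p_woman          # absolute difference in probabilities

print(f"odds ratio      {odds_ratio:.2f}")
print(f"risk ratio      {risk_ratio:.2f}")
print(f"risk difference {risk_difference:.4f}")
```

The two ratios are close because both probabilities are small, while the risk difference shows that the gap is about one percentage point in absolute terms.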

 

Here is a presentation I've given several times about predicted probabilities, odds ratios, and risk ratios.

Here is a poster (delivered at the 2011 North American Congress of Epidemiology in Montreal) outlining a diagnostic plot (with SAS and R code) I developed to help decide whether to use logistic, log-binomial, or linear-binomial regression for dichotomous outcomes.

 

The following papers illustrate some approaches to using and reporting the predicted probabilities.

 

Smith LA, Bokhour B, Hohman KH, Miroshnik I, Kleinman KP, Cohn E, Cortes DE, Galbraith A, Rand C, Lieu TA. Modifiable risk factors for suboptimal control and controller medication underuse among children with asthma. Pediatrics 2008; 122:760-769

Huang SH, Hinrichsen VL, Stevenson AE, Rifas-Shiman SL, Kleinman K, Pelton SI, Lipsitch M, Hanage WP, Lee GM. Continued impact of pneumococcal conjugate vaccine on carriage in young children. Pediatrics 2009; 124:e1-e11

Gillman MW, Rifas-Shiman SL, Kleinman K, Oken E, Rich-Edwards JW, Taveras EM. Developmental origins of childhood overweight: Potential public health impact. Obesity (Silver Spring) 2008; 16:1651-1656

Oken E, Kleinman KP, Belfort MB, Hammitt JK, Gillman MW. Associations of gestational weight gain with short- and longer-term maternal and child health outcomes. American Journal of Epidemiology 2009; 170:173-180

 

 
