Background to logistic regression

In the last chapter we started to look at how we fit models of the relationships between categorical variables. We have also seen throughout the book how we can use categorical variables to predict continuous outcomes. However, we haven't looked at the reverse process: predicting categorical outcomes from continuous or categorical predictors. In a nutshell, logistic regression is multiple regression but with an outcome variable that is categorical and predictor variables that are continuous or categorical. In its simplest form, this means that we can predict which of two categories a person is likely to belong to given certain other information. A trivial example is to look at which variables predict whether a person is male or female. We might measure laziness, pig-headedness, alcohol consumption and daily flatulence. Using logistic regression, we might find that all of these variables predict the gender of the person. More importantly, the model we build will enable us to predict whether a new person is likely to be male or female based on these variables. So, if we picked a random person and discovered that they scored highly on laziness, pig-headedness, alcohol consumption and flatulence, then our model might tell us that this person is likely to be male.

Logistic regression can have life-saving applications. In medical research it is used to generate models that predict, for example, how likely it is that a tumour is cancerous rather than benign. A database of patients is used to establish which variables are influential in predicting the malignancy of a tumour. These variables can then be measured for a new patient and their values placed in a logistic regression model, from which a probability of malignancy can be estimated. If the probability of the tumour being malignant is low then the doctor may decide not to carry out expensive and painful surgery that in all likelihood is unnecessary. We might not face such life-threatening decisions, but logistic regression can nevertheless be a very useful tool. When we are trying to predict membership of one of only two categories the analysis is known as binary logistic regression, but when we want to predict membership of more than two categories we use multinomial (or polychotomous) logistic regression.
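
To make the tumour example a little more concrete, here is a minimal sketch in Python of the two steps described above: building a binary logistic regression model from existing cases, then placing a new patient's value into it to estimate a probability. Everything specific in it is an assumption made for illustration: the tumour sizes and diagnoses are invented, not real patient data; there is only one predictor; and the function names (fit_logistic, predict_probability) are hypothetical. The model is fitted with simple gradient ascent on the log-likelihood because it is easy to read; statistical software typically uses more efficient estimation methods.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.1, n_iter=5000):
    """Estimate an intercept (b0) and slope (b1) for one predictor.

    Uses plain gradient ascent on the log-likelihood (an illustrative
    choice, not how statistical packages usually do it).
    """
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(n_iter):
        grad0 = 0.0
        grad1 = 0.0
        for x, y in zip(xs, ys):
            error = y - sigmoid(b0 + b1 * x)  # observed minus predicted
            grad0 += error
            grad1 += error * x
        b0 += learning_rate * grad0 / n
        b1 += learning_rate * grad1 / n
    return b0, b1


def predict_probability(b0, b1, x):
    """Estimated probability that the outcome is 1 for predictor value x."""
    return sigmoid(b0 + b1 * x)


# Invented data for illustration only: tumour size (cm) and whether the
# tumour turned out to be malignant (1) or benign (0).
sizes = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
malignant = [0, 0, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(sizes, malignant)

# Place a new patient's value into the fitted model.
p_small = predict_probability(b0, b1, 1.0)
p_large = predict_probability(b0, b1, 4.5)
print(f"Estimated P(malignant) for a 1.0 cm tumour: {p_small:.2f}")
print(f"Estimated P(malignant) for a 4.5 cm tumour: {p_large:.2f}")
```

With these made-up data, larger tumours are more often malignant, so the fitted model should give a higher estimated probability of malignancy for the 4.5 cm tumour than for the 1.0 cm one. A real model would use several predictors and would be estimated from a proper database of patients.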