# Logistic regression

In this next exercise, we will again use five independent variables to predict the probability of good health (health), but instead of using bmi as a continuous variable, we will use the variable bmicat, which classifies the women based on BMI values as normal weight, overweight, obese, or morbidly obese. In the opening logistic regression dialog box, put health into the Dependent slot, and move smoker, bmicat, stressed, worknow, and age into the slot for Covariates. Click the Categorical pushbutton and move bmicat into the slot for Categorical Covariates, leaving the default options of indicator coding using the last category (morbidly obese) as the reference group. Click the Options pushbutton and select Hosmer-Lemeshow goodness of fit and CI for Exp(B), 95%. Then click Continue and OK to run the analysis and answer the following questions:

(a) How many new variables were created to be predictors for the bmicat variable? How were morbidly obese women coded on these new variables? How many cases were classified as morbidly obese? (b) What do the results suggest about the goodness of fit of the overall model, using both the likelihood ratio and the Hosmer-Lemeshow tests? (c) What was the value of the Nagelkerke R² in this analysis? How does this compare to the value obtained in Exercise B3 when the continuous bmi was used as a predictor? (d) What percentage of cases was correctly classified in this analysis? Compare the classification success obtained here with that obtained using the continuous BMI variable (Exercise B3). (e) Which independent variables were significant predictors of good health? (f) Comment on the pattern of odds ratios for the bmicat variables. What does the pattern suggest about the assumption of linearity between the original BMI values and the logit for being in good-to-excellent health?
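To make the idea of indicator coding with the last category as the reference group concrete, here is a minimal Python sketch. It is an illustration only, not what SPSS runs internally: the function name `indicator_code` is hypothetical, and the category order is assumed to match the order in which the exercise lists the bmicat levels.

```python
# Category labels come from the exercise text. Their order is an assumption:
# the last entry is treated as the reference group.
CATEGORIES = ["normal weight", "overweight", "obese", "morbidly obese"]


def indicator_code(value, categories=CATEGORIES):
    """Return 0/1 indicators for every category except the last (the reference)."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    non_reference = categories[:-1]  # the reference level gets no indicator of its own
    return [1 if value == level else 0 for level in non_reference]


# Print the coding for each level so the scheme can be compared with the
# "Categorical Variables Codings" table in the SPSS output.
for level in CATEGORIES:
    print(f"{level:>15}: {indicator_code(level)}")
```

Each non-reference level gets a 1 on exactly one indicator, so each coefficient for a bmicat indicator compares that level with the reference group.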

Exercise B3

In this next exercise, we will use five predictors to predict the probability of good health (health) in a standard logistic regression. The predictors are smoking status (smoker), the woman’s age (age), whether or not she is currently employed (worknow), her body mass index (bmi), and how much stress she has been experiencing (stressed). In the opening logistic regression dialog box (Analyze ➜ Regression ➜ Binary Logistic), move health into the Dependent slot, and move the five predictors into the slot for Covariates. Click the Options pushbutton and in the next dialog box select Hosmer-Lemeshow goodness of fit, Casewise listing of residuals, and CI for Exp(B), 95%. Then click Continue and OK to run the analysis and answer the following questions:

(a) In the null model, what percentage of the cases was correctly classified? Comment on the nature of the misclassifications. (b) In the null model, what was the odds ratio for being in good-to-excellent health? (c) What is the value of the likelihood ratio chi-square statistic for the omnibus test of the model? Was this statistically significant? (d) What was the value of −2LL for the full model? Using this information and the value of the model chi-square statistic, compute the value of −2LL for the null model. (e) What were the values of the pseudo R² statistics? (f) What was the value of the Hosmer-Lemeshow chi-square test? Was this value statistically significant? What does this suggest? (g) What percentage of cases was correctly classified with the full model? Comment on the degree of improvement over the null model. (h) Based on the Wald statistics, which independent variables were significantly predictive of the women’s health status? (i) Interpret the meaning of the OR for age in this analysis. (j) Interpret the meaning of the OR for worknow. (k) According to the Casewise listing panel, how many cases were outliers that exceeded the criterion of 2.58 (absolute value)? Were these cases correctly classified?
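The arithmetic behind several of these questions can be sketched in a few lines of Python. This is an illustration, not part of the SPSS procedure. The function names are hypothetical, and every number below is invented for demonstration; none is a result from the exercise data set. The sketch relies on two standard relationships: the model chi-square equals the drop in −2LL from the null model to the full model, and Exp(B) is the exponentiated logistic coefficient.

```python
import math


def null_minus_2ll(full_minus_2ll, model_chi_square):
    """-2LL(null) = -2LL(full) + model chi-square (the chi-square is the drop in -2LL)."""
    return full_minus_2ll + model_chi_square


def odds(probability):
    """Convert a probability (or sample proportion) to odds."""
    if not 0 < probability < 1:
        raise ValueError("probability must be strictly between 0 and 1")
    return probability / (1 - probability)


def odds_ratio(b):
    """Exp(B): the multiplicative change in odds per one-unit increase in a predictor."""
    return math.exp(b)


def percent_change_in_odds(b):
    """Percentage change in the odds per one-unit increase in a predictor."""
    return (math.exp(b) - 1) * 100


# Hypothetical values, NOT results from the exercise data.
print(null_minus_2ll(500.0, 40.0))  # 540.0
print(odds(0.75))                   # 3.0
print(odds_ratio(0.0))              # 1.0 (a coefficient of 0 leaves the odds unchanged)
```

For a dichotomous predictor such as worknow, the OR compares the odds for the group coded 1 with the odds for the group coded 0. For a continuous predictor such as age, it describes the change in odds per one-unit (here, one-year) increase.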