The Book of R (Question 20.2). Please answer using R code.
Continue using the survey data frame from the package MASS for the next few exercises.
- The survey data set has a variable named Exer, a factor with k = 3 levels describing the amount of physical exercise time each student gets: none, some, or frequent. Obtain a count of the number of students in each category and produce side-by-side boxplots of student height split by exercise.
- Assuming independence of the observations and normality as usual, fit a linear regression model with height as the response variable and exercise as the explanatory variable (dummy coding). What’s the default reference level of the predictor? Produce a model summary.
- Draw a conclusion based on the fitted model from (b): does it appear that exercise frequency has any impact on mean height? What is the nature of the estimated effect?
- Predict the mean heights of one individual in each of the three exercise categories, accompanied by 95 percent prediction intervals.
- Do you arrive at the same result and interpretation for the height-by-exercise model if you construct an ANOVA table using aov?
- Is there any change to the outcome of (e) if you alter the model so that the reference level of the exercise variable is “none”? Would you expect there to be?
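The survey exercises above could be approached along the following lines. This is a minimal sketch rather than a definitive answer: the object names fit, new_exer, pred, survey_none, aov_default, and aov_none are illustrative, and the level labels "Freq", "None", and "Some" are the ones stored in the MASS data (the question text spells them out as frequent, none, and some). Rows with a missing Height are dropped by lm and aov by default.

```r
library(MASS)  # provides the survey data frame

# Counts per exercise category and side-by-side boxplots of height
table(survey$Exer)
boxplot(Height ~ Exer, data = survey,
        xlab = "Exercise", ylab = "Height (cm)")

# Dummy-coded linear model; the reference level is the first factor level
levels(survey$Exer)
fit <- lm(Height ~ Exer, data = survey)
summary(fit)

# One new individual per category, with 95% prediction intervals
new_exer <- data.frame(
  Exer = factor(levels(survey$Exer), levels = levels(survey$Exer))
)
pred <- predict(fit, newdata = new_exer,
                interval = "prediction", level = 0.95)
cbind(new_exer, pred)

# One-way ANOVA table for the same height-by-exercise model
aov_default <- aov(Height ~ Exer, data = survey)
summary(aov_default)

# Same ANOVA after making "None" the reference level (on a copy of the data)
survey_none <- survey
survey_none$Exer <- relevel(survey_none$Exer, ref = "None")
aov_none <- aov(Height ~ Exer, data = survey_none)
summary(aov_none)
```

When reading the output, the coefficients in summary(fit) are differences in mean height relative to the reference level, while the ANOVA table tests all group means jointly. Changing the reference level only reparameterizes the same fitted means, so the overall F test should not be expected to change.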
Now, turn back to the ready-to-use mtcars data set. One of the variables in this data frame is qsec, described as the time in seconds it takes to race a quarter mile; another is gear, the number of forward gears (cars in this data set have either 3, 4, or 5 gears).
- Using the vectors straight from the data frame, fit a simple linear regression model with qsec as the response variable and gear as the explanatory variable and interpret the model summary.
- Explicitly convert gear to a factor vector and refit the model. Compare the model summary with that from (g). What do you find?
- Explain, with the aid of a relevant plot in the same style as the right image of Figure 20-6, why you think there is a difference between the two models (g) and (h).
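A possible starting point for the mtcars exercises is sketched below. The names fit_num, fit_fac, and grp_means are illustrative, and the plot assumes that the right image of Figure 20-6 is a scatterplot of the response against the numerically coded predictor with the fitted line overlaid; that figure is not reproduced here, so treat the plotting style as a guess.

```r
# gear used as stored (numeric): a single slope for a one-gear increase
fit_num <- lm(qsec ~ gear, data = mtcars)
summary(fit_num)

# gear treated as a factor: separate dummy-coded effects for 4 and 5 gears,
# each relative to the reference level of 3 gears
fit_fac <- lm(qsec ~ factor(gear), data = mtcars)
summary(fit_fac)

# Raw data with the straight-line fit from the numeric model
plot(mtcars$gear, mtcars$qsec,
     xlab = "Number of forward gears", ylab = "Quarter-mile time (s)")
abline(fit_num, lwd = 2)

# Overlay the group means, which are the fitted values of the factor model
grp_means <- tapply(mtcars$qsec, mtcars$gear, mean)
points(as.numeric(names(grp_means)), grp_means, pch = 19, cex = 1.5)
```

The numeric model forces the three group means onto one straight line, whereas the factor model estimates each group mean freely. If the overlaid means do not rise or fall steadily from 3 to 5 gears, a single slope can miss a difference that the factor model picks up.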