Question:
3. Identify all the variables and their types (i.e., nominal, ordinal, count or continuous; don’t worry about distinguishing ratio versus interval).
4. Construct an appropriate histogram of initial HDL levels. Comment on any features you see.
5. The drugs are designed to raise HDL (which correspondingly lowers LDL cholesterol, the “bad” kind). Create an appropriate plot comparing initial to final HDL measurements.
6. The line of best fit relating final HDL to initial HDL is: final.HDL = 18.56 + 0.74*initial.HDL, and the correlation between them is 0.6945.
a. What proportion of variability in final HDL is attributable to initial HDL measurements?
b. For someone with initial HDL of 50 mg/dL, what final HDL would you predict for them if they took one of these drugs?
8. An analysis of variance (ANOVA) procedure is suggested to assess differences between the drugs’ effects on HDL levels. Calculate the standard deviations of the HDL differences derived in Question 5 for the groups of patients taking each drug. What do these standard deviations help us assess, in terms of the appropriateness of an ANOVA for this data?
Answer:
1. The suggestion of assigning Drug C to all the older patients is not a good one. If every older patient receives Drug C, the junior researcher's claim cannot be tested, because drug and age would be confounded: any effect observed could be due to either. If Drug C does have a positive effect in older people, it still has to be compared with the other drugs, which may be equally effective or even more so. It is also possible that Drug C works best not for older patients but for younger age groups. This has to be established through statistical research and experimentation, since the junior researcher's statement is an assertion with no supporting evidence.
2. If all the recruited patients are women, the external validity of the results is severely curtailed: the results cannot be generalised to men, so the conclusions would apply to women only. It is therefore important to include a balanced mix of both genders, so that the findings can be extended to the wider population rather than to one gender.
3. The various variables in the given case along with their type are listed below.
Drug – This is a nominal data type, as the values are the labels A, B, C, D and E, which do not denote any particular order.
AgeGp – This is an ordinal data type, as there are three divisions, namely 40-49, 50-64 and 65+, and these can be arranged in order from lower to higher age.
Initial HDL – This is a continuous data type. HDL is a concentration measured in mg/dL, so it can in principle take any value within a range; the values appear as whole numbers only because they were recorded to the nearest unit, not because HDL is a count.
Final HDL – This is a continuous data type, for the same reason as Initial HDL: it is a measurement in mg/dL that has been recorded to the nearest whole number.
4. The requisite histogram is shown below.
From the above, it is apparent that the initial HDL distribution is not normal: it is skewed to the right (a positive skew), with a longer tail towards the higher values. The data appear unimodal, as only a single peak exists.
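A numerical check can back up a visual reading of skew. The sketch below computes the moment-based sample skewness and compares the mean with the median; a positive skewness and a mean above the median are both consistent with a right-skewed distribution. The function name `sample_skewness` and the values in `illustrative_hdl` are made-up placeholders for illustration only, not the study's initial HDL data.

```python
from statistics import mean, median


def sample_skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (population moments)."""
    n = len(values)
    centre = mean(values)
    m2 = sum((x - centre) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - centre) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical values with a long upper tail (NOT the study data).
illustrative_hdl = [38, 40, 41, 43, 45, 47, 52, 60, 75]

skew = sample_skewness(illustrative_hdl)
print(f"skewness = {skew:.2f}")
print(f"mean = {mean(illustrative_hdl):.1f}, median = {median(illustrative_hdl)}")
```

Replacing `illustrative_hdl` with the actual initial HDL column would give the skewness for the data plotted in the histogram.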
5. The requisite plot to summarise the given data is shown below.
6. It is known that the correlation coefficient is 0.6945.
Hence, coefficient of determination = (0.6945)^{2} = 0.4823
The above value indicates that about 48.23% of the variability in final HDL is attributable to the initial HDL measurements.
The line of best fit is stated as shown below.
Final HDL = 18.56 + 0.74*Initial HDL
As per the question, Initial HDL = 50 mg/dL
Hence, Final HDL = 18.56 + 0.74*50 = 55.56 mg/dL
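The two calculations for this question can be reproduced with a few lines of Python. This is a minimal sketch using only the intercept, slope and correlation quoted in the question; the function names are illustrative choices, not part of the original solution.

```python
# Values quoted in the question.
INTERCEPT = 18.56      # mg/dL
SLOPE = 0.74           # change in final HDL per 1 mg/dL of initial HDL
CORRELATION = 0.6945   # correlation between initial and final HDL


def coefficient_of_determination(r):
    """Proportion of variability explained in simple linear regression."""
    return r ** 2


def predict_final_hdl(initial_hdl):
    """Predicted final HDL (mg/dL) from the fitted line."""
    return INTERCEPT + SLOPE * initial_hdl


r_squared = coefficient_of_determination(CORRELATION)
predicted = predict_final_hdl(50)

print(f"R^2 = {r_squared:.4f}")                        # R^2 = 0.4823
print(f"Predicted final HDL = {predicted:.2f} mg/dL")  # 55.56 mg/dL
```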
7. The new variable is called “Change in HDL Levels”. The requisite plot is shown below.
8. The requisite standard deviation has been computed for each drug as shown below.
Differences in HDL level (mg/dL), by drug
Drug A | Drug B | Drug C | Drug D | Drug E |
6 | -2 | 2 | 7 | -4 |
12 | 3 | 12 | 5 | 2 |
10 | -2 | 11 | 12 | -19 |
9 | 4 | 0 | -9 | -9 |
4 | 19 | 11 | 13 | 3 |
-4 | -8 | 6 | -8 | -1 |
-5 | -7 | 23 | 10 | 5 |
6 | 14 | 10 | 1 | 8 |
12 | 7 | 17 | 11 | -6 |
-6 | 23 | 8 | 1 | 1 |
Sample standard deviation of the differences for each drug:
7.00 | 10.65 | 6.73 | 7.96 | 7.87 |
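These standard deviations help us assess the equal-variance (homogeneity of variance) assumption of ANOVA, which requires the spread of the HDL differences to be similar in every drug group. The standard deviations range from 6.73 (Drug C) to 10.65 (Drug B), so the largest is only about 1.6 times the smallest. Under the common rule of thumb that this ratio should be below 2, the spreads are similar enough, and on this count ANOVA appears appropriate for comparing the effects of the drugs based on the sample data given.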
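The standard deviations in the table can be recomputed from the listed differences. The sketch below uses Python's `statistics.stdev` (the sample standard deviation, with an n - 1 denominator) and then forms the ratio of the largest to the smallest value, which is one common informal check of ANOVA's equal-variance assumption. The threshold of 2 is a rule of thumb, not a formal test, and the variable names are illustrative.

```python
from statistics import stdev

# HDL differences per drug, read column by column from the table above.
differences = {
    "A": [6, 12, 10, 9, 4, -4, -5, 6, 12, -6],
    "B": [-2, 3, -2, 4, 19, -8, -7, 14, 7, 23],
    "C": [2, 12, 11, 0, 11, 6, 23, 10, 17, 8],
    "D": [7, 5, 12, -9, 13, -8, 10, 1, 11, 1],
    "E": [-4, 2, -19, -9, 3, -1, 5, 8, -6, 1],
}

# Sample standard deviation of the differences within each drug group.
sds = {drug: stdev(values) for drug, values in differences.items()}

# Informal equal-variance check: largest SD relative to smallest SD.
ratio = max(sds.values()) / min(sds.values())

for drug, sd in sds.items():
    print(f"Drug {drug}: SD = {sd:.2f}")
print(f"Largest / smallest SD = {ratio:.2f}")
```

A formal alternative would be a test such as Levene's, but with ten patients per group the simple ratio is usually the check asked for at this level.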
9. The requisite plot to assess the age group distribution in each of the five drug groups is shown below.