In this exercise, we will look at descriptive statistics and how to explore and summarize data sets. For this, we use the Heart Disease dataset from the UCI data repository. This dataset consists of 4 small datasets of patients admitted to 4 hospitals.
For now, we only work with the file. This data consists of 271 instances with 7 attributes. The attributes are described below:
Age: age in years
sex: 1 = male; 0 = female
cp: chest pain type
Value 1: typical angina
Value 2: atypical angina
Value 3: non-anginal pain
Value 4: asymptomatic
Trestbps: resting blood pressure (mm Hg)
Chol: serum cholesterol level (mg/dl)
Thalach: maximum heart rate achieved
heart_problem: 1 = has a heart problem; 0 = no heart problem
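If you want to cross-check your Excel results in code, the attributes above map naturally onto one record per patient. The following is a minimal Python sketch using only the standard library. It assumes the data has been exported to a CSV file whose header row uses exactly the attribute names listed above; the names `load_rows`, `INT_COLUMNS`, `FLOAT_COLUMNS` and `CP_LABELS` are hypothetical helpers for illustration and are not part of the assignment.

```python
import csv

# Assumption: the data has been exported to CSV with a header row whose
# column names match the attribute names listed above.
INT_COLUMNS = {"Age", "sex", "cp", "heart_problem"}
FLOAT_COLUMNS = {"Trestbps", "Chol", "Thalach"}

# Chest pain type codes (cp), as listed above.
CP_LABELS = {
    1: "typical angina",
    2: "atypical angina",
    3: "non-anginal pain",
    4: "asymptomatic",
}


def load_rows(lines):
    """Parse CSV lines (an open file or a list of strings) into dicts.

    Known numeric columns are converted; any other column is kept as text.
    """
    rows = []
    for raw in csv.DictReader(lines):
        row = {}
        for key, value in raw.items():
            if key in INT_COLUMNS:
                row[key] = int(float(value))  # tolerates "1" and "1.0"
            elif key in FLOAT_COLUMNS:
                row[key] = float(value)
            else:
                row[key] = value
        rows.append(row)
    return rows
```

Typical use would be `with open("heart.csv", newline="") as f: rows = load_rows(f)`, where `heart.csv` is a hypothetical file name. For the full file described above, `len(rows)` should then be 271.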
Instructions: Use Microsoft Excel to do your work. Please submit your work as ONE MS Excel file and create one tab for each question. Show your work as rigorously as possible. Name the file lastname_firstname_hw1.xlsx.
Using the attached data, answer the following questions:
1. How many patients have heart disease? (0.5)
2. What is the average cholesterol level of people with heart disease and of people without heart disease? What is the standard deviation? (1)
3. What are the median and average ages of people with:
a. cholesterol higher than 240.0? (0.5)
b. cholesterol higher than 240.0 with heart disease? (0.5)
c. cholesterol higher than 240.0 without heart disease? (0.5)
4. Create a histogram of resting blood pressure. (1)
5. Create boxplots based on the sex of the patients for the following attributes:
a. cholesterol level (1.5)
b. maximum heart rate achieved (1.5)
6. For each boxplot, answer the following questions:
a. What is the H-spread (Q3 - Q1) of cholesterol level for males and females? (0.5)
b. What are the lower hinge and upper hinge values of maximum heart rate for males and females? (0.5)
7. To find out whether two attributes are related and their values change together, we can use a scatter plot. Follow the instructions below and answer the questions:
a. Create two scatter plots of age versus resting blood pressure, one for people with heart disease and one for people without heart disease. Is there any visual correlation? (1+1)
b. Calculate the average resting blood pressure for each age (HINT: use Groupby on age) for people with heart disease. (1)
c. Calculate the average resting blood pressure for each age (HINT: use Groupby on age) for people without heart disease. (1)
d. Now create two scatter plots using the previous results. Do you observe a correlation now? Do people without heart disease have higher blood pressure as they age than people with heart disease? (2)
8. Compare the resting blood pressure of people with heart disease and people without heart disease. (1)
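The assignment asks for Excel, but the calculations behind questions 1 to 8 can also be cross-checked in code. Below is a minimal Python sketch using only the standard library. It assumes each patient is a dict keyed by the attribute names listed above (for example, read from a CSV export), and all function names are hypothetical. The sketch uses the sample standard deviation (as Excel's STDEV.S does) and Tukey's hinges for the box plot questions. Excel's QUARTILE functions and its box-and-whisker chart may use a different quartile convention, so small differences from your spreadsheet are expected. The histogram helper uses left-closed bins, which may also differ from Excel's default binning.

```python
from collections import defaultdict
from statistics import mean, median, stdev

# Assumption: each row is a dict keyed by the attribute names listed above,
# e.g. {"Age": 63, "sex": 1, "Chol": 233.0, "Trestbps": 145.0,
#       "Thalach": 150.0, "heart_problem": 1}.


def split_by_disease(rows):
    """Q1/Q2/Q8: partition rows on heart_problem (1 = yes, 0 = no)."""
    with_hd = [r for r in rows if r["heart_problem"] == 1]
    without_hd = [r for r in rows if r["heart_problem"] == 0]
    return with_hd, without_hd


def mean_and_sd(rows, column):
    """Q2/Q8: mean and sample standard deviation (as Excel's STDEV.S)."""
    values = [r[column] for r in rows]
    return mean(values), stdev(values)


def age_summary(rows, chol_threshold=240.0):
    """Q3: (median, mean) age of rows with Chol strictly above the threshold."""
    ages = [r["Age"] for r in rows if r["Chol"] > chol_threshold]
    return median(ages), mean(ages)


def histogram_counts(values, bin_width=10.0, start=None):
    """Q4: counts per left-closed bin [start + k*w, start + (k+1)*w)."""
    start = min(values) if start is None else start
    counts = defaultdict(int)
    for v in values:
        k = int((v - start) // bin_width)
        counts[start + k * bin_width] += 1
    return dict(sorted(counts.items()))


def tukey_hinges(values):
    """Q6: Tukey's lower and upper hinges (medians of the two halves).

    When n is odd, the overall median is included in both halves.
    """
    xs = sorted(values)
    n = len(xs)
    half = (n + 1) // 2
    return median(xs[:half]), median(xs[n - half:])


def h_spread(values):
    """Q6a: H-spread = upper hinge - lower hinge."""
    lower, upper = tukey_hinges(values)
    return upper - lower


def mean_by_age(rows, column="Trestbps"):
    """Q7b/c: average of `column` for each distinct age (a group-by on Age)."""
    groups = defaultdict(list)
    for r in rows:
        groups[r["Age"]].append(r[column])
    return {age: mean(vals) for age, vals in sorted(groups.items())}


def pearson_r(xs, ys):
    """Q7: Pearson correlation coefficient (as Excel's CORREL)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5
```

For question 1, `len(with_hd)` gives the count. For the boxplots by sex, filter first, for example `tukey_hinges([r["Chol"] for r in rows if r["sex"] == 1])`. For question 7d, `by_age = mean_by_age(with_hd)` followed by `pearson_r(list(by_age), list(by_age.values()))` correlates age with the per-age average blood pressure. This gives a number to set beside the visual impression from the scatter plot.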
Link to the dataset:
https://docs.google.com/document/d/1KYER8cMeWPcOlMJpegWNIDAF4maIAthKTM3Hrpr8rxk/edit?usp=sharing