Capstone-predict-diabetes - Jupyter Notebook
Course: python programming (cs8151)
University: Anna University
7/18/23, 12:14 PM
Healthcare
Problem Statement
NIDDK (National Institute of Diabetes and Digestive and Kidney Diseases) research creates knowledge
about and treatments for the most chronic, costly, and consequential diseases. The dataset used in this
project is originally from NIDDK. The objective is to predict whether or not a patient has diabetes, based on
certain diagnostic measurements included in the dataset. Build a model to accurately predict whether the
patients in the dataset have diabetes or not.
Dataset Description
The dataset consists of several medical predictor variables and one target variable
(Outcome). The predictor variables include the number of pregnancies the patient has had, their BMI, insulin
level, age, and more.
Variable                  Description
Pregnancies               Number of times pregnant
Glucose                   Plasma glucose concentration in an oral glucose tolerance test
BloodPressure             Diastolic blood pressure (mm Hg)
SkinThickness             Triceps skinfold thickness (mm)
Insulin                   Two hour serum insulin
BMI                       Body Mass Index
DiabetesPedigreeFunction  Diabetes pedigree function
Age                       Age in years
Outcome                   Class variable (either 0 or 1). 268 of 768 values are 1, and the others are 0
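Before any analysis, it can help to confirm that a loaded table matches the variables listed above. The following is a minimal sketch, assuming the data arrives as a CSV with exactly these column names. `check_schema` is a hypothetical helper, and the two rows in `sample_csv` are invented stand-ins for the real file.

```python
import io

import pandas as pd

# Column names taken from the variable table above.
EXPECTED_COLUMNS = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness", "Insulin",
    "BMI", "DiabetesPedigreeFunction", "Age", "Outcome",
]


def check_schema(df):
    """Return a list of problems; an empty list means the frame matches the table."""
    problems = []
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
    # Outcome is described as a class variable that is either 0 or 1.
    if "Outcome" in df.columns and not df["Outcome"].isin([0, 1]).all():
        problems.append("Outcome contains values other than 0 and 1")
    return problems


# Two invented rows standing in for the real CSV file (an assumption).
sample_csv = io.StringIO(
    "Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,"
    "DiabetesPedigreeFunction,Age,Outcome\n"
    "2,120,70,25,80,28.5,0.4,33,0\n"
    "4,150,85,30,130,33.1,0.6,45,1\n"
)
df = pd.read_csv(sample_csv)
problems = check_schema(df)
```

With the real file, `pd.read_csv` would be given its path instead of `sample_csv`, and `df.describe()` would then give the per-column summary statistics.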
Data Exploration:
1. Perform descriptive analysis. Understand the variables and their corresponding values. In the columns
below, a value of zero is not a plausible measurement and therefore indicates a missing value:
• Glucose
• BloodPressure
• SkinThickness
• Insulin
• BMI
2. Visually explore these variables using histograms. Treat the missing values accordingly.
3. There are integer and float data type variables in this dataset. Create a count (frequency) plot
describing the data types and the count of variables.
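The three steps above could be sketched as follows. This is a minimal sketch rather than the notebook's own code: the rows in `toy` are invented, `mark_and_impute` is a hypothetical helper name, and median imputation is only one assumed reading of "treat the missing values accordingly".

```python
import numpy as np
import pandas as pd

# Columns in which 0 stands for a missing measurement (from the list above).
ZERO_AS_MISSING = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]


def mark_and_impute(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy in which placeholder zeros are replaced by the column median."""
    out = df.copy()
    # Turn the placeholder zeros into real missing values.
    out[ZERO_AS_MISSING] = out[ZERO_AS_MISSING].replace(0, np.nan)
    # Fill each column with its own median (computed while ignoring NaN).
    # Median imputation is an assumption, not something the task prescribes.
    out[ZERO_AS_MISSING] = out[ZERO_AS_MISSING].fillna(out[ZERO_AS_MISSING].median())
    return out


# Toy rows with invented values, only to exercise the steps.
toy = pd.DataFrame({
    "Pregnancies": [1, 0, 3, 2],
    "Glucose": [100, 0, 120, 140],
    "BloodPressure": [70, 80, 0, 60],
    "SkinThickness": [20, 0, 30, 0],
    "Insulin": [0, 90, 0, 110],
    "BMI": [25.0, 30.0, 0.0, 35.0],
    "DiabetesPedigreeFunction": [0.5, 0.3, 0.7, 0.2],
    "Age": [30, 40, 50, 25],
    "Outcome": [0, 1, 1, 0],
})

# Step 1: how many placeholder zeros does each affected column hold?
missing_counts = (toy[ZERO_AS_MISSING] == 0).sum()

# Step 2: treat the missing values (here: median imputation).
clean = mark_and_impute(toy)

# Step 3: number of variables per data type in the raw table.
dtype_counts = toy.dtypes.astype(str).value_counts()
```

If matplotlib is installed, `clean[ZERO_AS_MISSING].hist()` draws the histograms and `dtype_counts.plot(kind="bar")` gives the count plot of data types. Note that replacing zeros with NaN converts the affected integer columns to floats, which is why the data types are counted on the raw table.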
Data Exploration:
1. Check the balance of the data by plotting the count of outcomes by their value. Describe your findings
and plan the future course of action.
2. Create scatter charts between pairs of variables to understand their relationships. Describe your
findings.
3. Perform correlation analysis. Visually explore it using a heat map.
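The numbers behind these plots could be computed as in the sketch below. The class counts (268 ones out of 768 rows) come from the dataset description. The three-column `toy` frame holds invented values, and `class_balance` is a hypothetical helper name.

```python
import pandas as pd


def class_balance(outcome: pd.Series) -> pd.DataFrame:
    """Count and share of each class label, ordered by label."""
    counts = outcome.value_counts().sort_index()
    return pd.DataFrame({"count": counts, "share": counts / counts.sum()})


# Outcome column rebuilt from the stated counts: 268 ones, 500 zeros.
outcome = pd.Series([1] * 268 + [0] * 500, name="Outcome")
balance = class_balance(outcome)

# Invented values, only to show the correlation step.
toy = pd.DataFrame({
    "Glucose": [90, 100, 150, 160],
    "Age": [25, 50, 30, 45],
    "Outcome": [0, 0, 1, 1],
})

# Pearson correlation between every pair of columns.
corr = toy.corr()

# Correlation of each feature with the target, strongest first.
with_target = corr["Outcome"].drop("Outcome").sort_values(ascending=False)
```

About 35% of the rows are positive, so the classes are moderately imbalanced. One possible course of action is to use stratified splits and to report metrics beyond accuracy. For the plots, assuming matplotlib and seaborn are installed, `balance["count"].plot(kind="bar")` gives the count plot, `pd.plotting.scatter_matrix(toy)` the pairwise scatter charts, and `seaborn.heatmap(corr, annot=True)` the heat map.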