This project uses the Pima Indians Diabetes Dataset to build a machine learning classification model that predicts whether a person has diabetes based on medical attributes.
The goal is to use patient medical records to classify whether an individual has diabetes. Early prediction can help with timely intervention and better management of the disease.
- Clone this repository.
- Place the dataset file `2. Diagnose Diabetes.csv` in the same directory.
- Open the Python script or Jupyter Notebook (if applicable).
- Run all the cells or execute the script to train the model and evaluate performance.
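Before running the full script, it can help to confirm that the dataset is where the code expects it. The following is a minimal sketch, not the project's own loader: `load_dataset` is a hypothetical helper, and the file name is assumed to be exactly the one given in the steps above.

```python
from pathlib import Path

import pandas as pd

# Assumption: the file name from the setup steps, placed next to the script.
DATA_PATH = Path("2. Diagnose Diabetes.csv")


def load_dataset(path=DATA_PATH):
    """Read the CSV and fail early with a clear message if it is missing or empty."""
    path = Path(path)
    if not path.exists():
        raise FileNotFoundError(f"Dataset not found: {path.resolve()}")
    df = pd.read_csv(path)
    if df.empty:
        raise ValueError(f"Dataset at {path} contains no rows")
    return df
```

Calling `load_dataset()` with no arguments reads the default file; passing another path lets the same check be reused if the CSV lives elsewhere.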
- Data Preprocessing: Loaded and cleaned the CSV data.
- Train-Test Split: 80-20 split for training and testing.
- Model Used: Random Forest Classifier.
- Evaluation:
  - Confusion Matrix
  - Accuracy
  - Precision
  - Recall
- Visualization: Confusion matrix plotted using Seaborn heatmap.
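The steps above could be sketched roughly as follows. This is an illustrative outline rather than the project's actual script: the target column name `Outcome`, the `train_and_evaluate` helper, the fixed seed, and the stratified split are all assumptions, and the cleaning step is omitted because its details are not described here. Synthetic stand-in data is used so the sketch runs without the CSV.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
)
from sklearn.model_selection import train_test_split


def train_and_evaluate(df, target_col="Outcome", seed=42):
    """Split 80-20, fit a random forest, and return the evaluation metrics."""
    X = df.drop(columns=[target_col])
    y = df[target_col]

    # 80-20 split; stratifying on the label is an assumption, not stated in the README.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y
    )

    model = RandomForestClassifier(random_state=seed)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

    return {
        "confusion_matrix": confusion_matrix(y_test, y_pred),
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred, zero_division=0),
        "recall": recall_score(y_test, y_pred, zero_division=0),
    }


# Synthetic stand-in data (NOT the Pima dataset) so the sketch is runnable as-is.
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=0)
demo_df = pd.DataFrame(X_demo, columns=[f"feature_{i}" for i in range(8)])
demo_df["Outcome"] = y_demo

results = train_and_evaluate(demo_df)
print(results["confusion_matrix"])
```

To run it on the real data, pass the loaded DataFrame instead of `demo_df` and adjust `target_col` to match the label column in the CSV.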
- Accuracy: 0.72
- Precision: 0.61
- Recall: 0.62
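For readers less familiar with these metrics, the sketch below shows how each one is derived from the four confusion matrix counts. The counts used are hypothetical and chosen only to make the arithmetic easy to follow; they are not the counts behind the results reported above, and `metrics_from_counts` is an illustrative helper.

```python
def metrics_from_counts(tp, fp, fn, tn):
    """Compute accuracy, precision, and recall from confusion matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total                      # share of all predictions that are correct
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # share of predicted positives that are correct
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # share of actual positives that are found
    return accuracy, precision, recall


# Hypothetical counts for illustration only (not from this project's run).
acc, prec, rec = metrics_from_counts(tp=30, fp=10, fn=20, tn=40)
print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f}")
# prints: accuracy=0.70 precision=0.75 recall=0.60
```

Precision and recall are computed for the positive (diabetic) class, which is why they can sit well below accuracy when that class is harder to predict.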
The project generates:
- A confusion matrix heatmap showing model performance.
- Accuracy, Precision, and Recall metrics.
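A heatmap like the one described can be produced with a few lines of Seaborn. The sketch below is one possible way to do it, under stated assumptions: `plot_confusion_matrix`, the class labels, the colour map, and the output file name are illustrative choices, and the example matrix holds hypothetical counts.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend for scripts; drop this line in a notebook

import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns


def plot_confusion_matrix(cm, labels=("No diabetes", "Diabetes"), path="confusion_matrix.png"):
    """Draw an annotated heatmap of a 2x2 confusion matrix and save it to disk."""
    fig, ax = plt.subplots(figsize=(4, 3))
    sns.heatmap(
        cm,
        annot=True,   # write the count inside each cell
        fmt="d",      # integer formatting for the annotations
        cmap="Blues",
        xticklabels=labels,
        yticklabels=labels,
        ax=ax,
    )
    ax.set_xlabel("Predicted")
    ax.set_ylabel("Actual")
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
    return path


# Hypothetical counts in scikit-learn's layout: rows are actual, columns are predicted.
example_cm = np.array([[40, 10], [20, 30]])
saved_to = plot_confusion_matrix(example_cm)
print(f"Heatmap saved to {saved_to}")
```

In a notebook, the `matplotlib.use("Agg")` line can be removed and `plt.show()` used in place of saving to a file.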
- Dataset: Pima Indians Diabetes Dataset (Source: UCI Repository)
- Libraries:
  - Pandas
  - Seaborn
  - scikit-learn
  - Matplotlib
Vaishnavi Mishra
B.Tech – CSE (AI)
KIET Group of Institutions