roeggealissa / Credit_Risk_Analysis Public

Determining potential credit risk with machine learning

Credit_Risk_Analysis

Loan Prediction Risk Analysis

Introduction

Project Goals

The objective of this project is to determine which machine learning method is best for predicting credit risk. Six methods are used and compared.

Results

We ran six machine learning algorithms on the loan data provided by Fast Lending. The first three use RandomOverSampler, SMOTE, and ClusterCentroids, each paired with the LogisticRegression classifier. The fourth uses SMOTEENN, which combines over- and undersampling, also with the LogisticRegression classifier. The final two use the BalancedRandomForestClassifier and the EasyEnsembleClassifier.

Oversampling

Balance Score

Confusion Matrix

Classification Report

SMOTE

Balance Score

Confusion Matrix

Classification Report

Undersampling

Balance Score

Confusion Matrix

Classification Report

SMOTEENN

Balance Score

Confusion Matrix

Classification Report

Random Forest

Balance Score

Confusion Matrix

Classification Report

Easy Ensemble

Balance Score

Confusion Matrix

Classification Report

Conclusions

All models have low precision for high-risk loans. The undersampling model has the worst recall, with an avg/total of 0.40; the high-risk recall, also 0.40, contributes the most to it. The Easy Ensemble method has the highest recall at 0.94, with recalls of 0.94 and 0.91 for the high-risk and low-risk classes respectively. It also has the highest F1 score at 0.97 and the highest balanced accuracy score at 0.925. SMOTEENN, SMOTE, and oversampling all have similar results but fare worse than the Easy Ensemble and Random Forest methods. The Random Forest method has a high avg/total recall of 0.91, but its recall for the high-risk class is low at 0.67.
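To see why precision for the high-risk class can stay low even when recall and balanced accuracy are high, it helps to compute the metrics directly from a confusion matrix. The counts below are hypothetical, chosen only to mimic a heavily imbalanced dataset; they are not this project's results, and binary_metrics is an illustrative helper rather than a library function.

```python
def binary_metrics(tp, fn, fp, tn):
    """Compute metrics for the positive (high-risk) class from raw counts."""
    recall_pos = tp / (tp + fn)      # share of high-risk loans that were caught
    recall_neg = tn / (tn + fp)      # share of low-risk loans correctly passed
    precision_pos = tp / (tp + fp)   # share of high-risk flags that were right
    f1_pos = 2 * precision_pos * recall_pos / (precision_pos + recall_pos)
    return {
        "recall_high_risk": recall_pos,
        "recall_low_risk": recall_neg,
        "precision_high_risk": precision_pos,
        "f1_high_risk": f1_pos,
        # Balanced accuracy is the mean of the per-class recalls.
        "balanced_accuracy": (recall_pos + recall_neg) / 2,
    }


# Hypothetical counts: 100 high-risk loans and 17,000 low-risk loans.
metrics = binary_metrics(tp=90, fn=10, fp=1700, tn=15300)
for key, value in metrics.items():
    print(f"{key}: {value:.3f}")
```

With these counts both recalls are 0.90, yet only about 5% of the loans flagged as high risk actually are, because even a 10% false-positive rate on the much larger low-risk class swamps the true positives.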

Overall, the suggestion is to use the Easy Ensemble Classifier method due to its overall superiority in every category.

Issues

sklearn has a known issue where many of the larger machine learning algorithms will kill the kernel if too much memory is allotted to the process. The ClusterCentroids portion of the code had to be run in Google Colab to ensure adequate disk space and RAM. No other algorithm had this issue.
