roeggealissa/Credit_Risk_Analysis

Credit_Risk_Analysis

Loan Prediction Risk Analysis

Introduction

Project Goals

The objective of this project is to determine which machine learning method best predicts credit risk. Six methods are used and compared.

Results

We ran six machine learning algorithms on the loan data provided by Fast Lending. The first three resample the data with RandomOverSampler, SMOTE, and ClusterCentroids, respectively, and then fit a LogisticRegression classifier. The fourth uses SMOTEENN, which combines over- and undersampling, also with the LogisticRegression classifier. The final two use the BalancedRandomForestClassifier and the EasyEnsembleClassifier.
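To make the resampling idea concrete, here is a minimal from-scratch sketch of random oversampling using only the Python standard library. It is an illustration of the concept, not the `RandomOverSampler` implementation used in the project; the function name `random_oversample` and the toy data are hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=1):
    """Duplicate minority-class rows at random until every class
    has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        # Draw with replacement to fill the gap to the majority-class size.
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical toy data: 8 low-risk loans and 2 high-risk loans.
rows = [[i, i * 10] for i in range(10)]
labels = ["low_risk"] * 8 + ["high_risk"] * 2

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

After resampling, both classes have 8 rows, so a classifier fitted on `balanced_rows` no longer sees the high-risk class as rare. SMOTE differs in that it synthesizes new minority rows rather than duplicating existing ones, and ClusterCentroids shrinks the majority class instead.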

Oversampling

Balance Score [image]

Confusion Matrix [image]

Classification Report [image]

SMOTE

Balance Score [image]

Confusion Matrix [image]

Classification Report [image]

Undersampling

Balance Score [image]

Confusion Matrix [image]

Classification Report [image]

SMOTEENN

Balance Score [image]

Confusion Matrix [image]

Classification Report [image]

Random Forest

Balance Score [image]

Confusion Matrix [image]

Classification Report [image]

Easy Ensemble

Balance Score [image]

Confusion Matrix [image]

Classification Report [image]

Conclusions

All models have low precision for the high-risk class. The undersampling model has the worst recall, with an avg/total of 0.40; the high-risk recall, also 0.40, contributes the most to it. The Easy Ensemble method has the highest avg/total recall at 0.94, with recalls of 0.94 and 0.91 for the high-risk and low-risk classes respectively. It also has the highest F1 score at 0.97 and the highest balanced accuracy score at 0.925. SMOTEENN, SMOTE, and oversampling all produce similar results, but fare worse than the Easy Ensemble and Random Forest methods. The Random Forest method has a high avg/total recall of 0.91, but its recall for the high-risk class is low at 0.67.

Overall, the suggestion is to use the Easy Ensemble Classifier method due to its superiority in every category.
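The metrics compared above can be reproduced by hand, which helps when reading the reports. The sketch below uses the standard definitions (balanced accuracy as the unweighted mean of per-class recall). The first calculation uses the Easy Ensemble recalls quoted in the text; the confusion-matrix counts that follow are hypothetical and are not the project's results.

```python
def precision(tp, fp):
    """Share of predicted positives that are truly positive."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Share of actual positives that the model catches."""
    return tp / (tp + fn)


def f1(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


def balanced_accuracy(recalls):
    """Unweighted mean of the per-class recalls."""
    return sum(recalls) / len(recalls)


# Per-class recalls quoted in the text for Easy Ensemble.
easy_ensemble_bal_acc = balanced_accuracy([0.94, 0.91])
print(round(easy_ensemble_bal_acc, 3))  # 0.925

# Hypothetical counts (NOT the project's results):
# 100 high-risk loans and 17,000 low-risk loans.
tp, fn = 90, 10        # high-risk loans caught / missed
fp, tn = 1000, 16000   # low-risk loans wrongly flagged / correctly passed

high_risk_precision = precision(tp, fp)
high_risk_recall = recall(tp, fn)
low_risk_recall = recall(tn, fp)
high_risk_f1 = f1(high_risk_precision, high_risk_recall)

print(round(high_risk_precision, 3), high_risk_recall, round(high_risk_f1, 3))
print(round(balanced_accuracy([high_risk_recall, low_risk_recall]), 3))
```

In the hypothetical counts, the model catches most high-risk loans, yet high-risk precision stays below 0.1 because even a small false-positive rate on the much larger low-risk class outnumbers the true positives. This is one plausible reason for the low high-risk precision across all six models.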

Issues

sklearn has a known issue where many of the larger machine learning algorithms will kill the kernel if too much memory is allotted to the process. The ClusterCentroids portion of the code had to be run in Google Colab to ensure adequate disk space and RAM for the process. No other algorithm had this issue.
