
Python and PySpark - Credit score - Loan propensity

The sample data file provides numerical and categorical variables for each case id.

We can calculate a final score for each case id and see which case ids are suitable for loan approval.

About 10% of case ids (3000 of 30 000) are clearly suitable for a loan.



# There are 22500 case ids with a low propensity for loan approval

# There are 4500 case ids with a medium propensity for loan approval

# There are 3000 case ids with a high propensity for loan approval, suitable for a loan
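The three counts above correspond to 75%, 15% and 10% of 30 000 case ids. A minimal sketch of how such a split could be produced from a percentile rank is shown here. The cutoffs at the 75th and 90th percentiles are an assumption inferred from those counts, and the scores are synthetic; the function names `percentile_scores` and `classify` are hypothetical and not taken from the original notebook.

```python
import random
from collections import Counter


def percentile_scores(scores):
    """Return each score's percentile rank in (0, 100]; ties are broken by input order."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    pctile = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        pctile[idx] = 100.0 * rank / n
    return pctile


def classify(pctile, medium_cutoff=75.0, high_cutoff=90.0):
    """Map a percentile rank to a propensity segment (cutoffs are assumed, not from the source)."""
    if pctile > high_cutoff:
        return "high"
    if pctile > medium_cutoff:
        return "medium"
    return "low"


# Synthetic final scores standing in for the real per-case scores.
rng = random.Random(0)
final_scores = [rng.random() for _ in range(30000)]

segments = [classify(p) for p in percentile_scores(final_scores)]
counts = Counter(segments)
print(counts["low"], counts["medium"], counts["high"])
```

Because the segments depend only on rank, any set of 30 000 scores yields the same three group sizes under these assumed cutoffs.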


Provided metrics:


30 000 rows, 1 year data file, daily case id scores


Target:


There are 3 output variables: ap090, ct090 and PCTILE_SCORE


Methodologies used:


MEAN imputation

PYSPARK RANDOM FOREST ITERATIVE multivariate imputation

Ridge cross validation

Random Forest Regressor cross validation

Percentile score classification
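As an illustration of the first method in the list, here is a minimal sketch of mean imputation in plain Python. The column names `income` and `age`, the sample rows and the helper `mean_impute` are hypothetical; the original analysis may well have used a library routine instead.

```python
from statistics import mean


def mean_impute(rows, columns):
    """Replace None in the given numeric columns with that column's observed mean.

    Assumes every listed column has at least one observed (non-None) value.
    """
    means = {}
    for col in columns:
        observed = [r[col] for r in rows if r[col] is not None]
        means[col] = mean(observed)
    imputed = [
        {k: (means[k] if k in means and v is None else v) for k, v in r.items()}
        for r in rows
    ]
    return imputed, means


# Hypothetical cases with missing values marked as None.
cases = [
    {"case_id": 1, "income": 1000.0, "age": 30},
    {"case_id": 2, "income": None, "age": 40},
    {"case_id": 3, "income": 3000.0, "age": None},
]

imputed_cases, column_means = mean_impute(cases, ["income", "age"])
for row in imputed_cases:
    print(row)
```

Mean imputation leaves each column's mean unchanged but shrinks its variance, which is one reason to compare it against a multivariate approach such as the iterative imputation listed next.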



import sqlite3

# create connection
conn = sqlite3.connect("dataset.db")
cursor = conn.cursor()

# list the 2 table names in the connected database: dataset and metadata
tables = cursor.execute("SELECT DISTINCT name FROM sqlite_master WHERE type='table'")
for row in tables.fetchall():
    # each row is a 1-tuple holding the table name
    print(row)


# metadata - display variables description in data frame
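The code for this step is not included in the post. A minimal sketch of reading and printing a `metadata` table with the standard `sqlite3` module could look like the following. It uses an in-memory database as a stand-in for `dataset.db`, and the column names `varname` and `description`, as well as the placeholder descriptions, are hypothetical.

```python
import sqlite3

# In-memory stand-in for dataset.db; the metadata schema below is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metadata (varname TEXT, description TEXT)")
conn.executemany(
    "INSERT INTO metadata VALUES (?, ?)",
    [
        ("ap090", "placeholder description"),
        ("ct090", "placeholder description"),
    ],
)

# Read the whole table and recover the column names from the cursor.
cursor = conn.execute("SELECT * FROM metadata")
columns = [d[0] for d in cursor.description]
rows = cursor.fetchall()

print(" | ".join(columns))
for row in rows:
    print(" | ".join(str(value) for value in row))

conn.close()
```

If pandas is available, `pandas.read_sql_query("SELECT * FROM metadata", conn)` returns the same rows as a data frame.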









