The sample data file provides numerical and categorical variables for each case id.
We can calculate a final score for each case id and see which case ids are suitable for loan approval.
Approximately 10% of case ids are clearly suitable for a loan.
# There are 22500 case ids with low propensity to loan approval
# There are 4500 case ids with medium propensity to loan approval
# There are 3000 case ids with high propensity to loan approval, suitable for loan
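The three counts above correspond to a 75% / 15% / 10% split of the 30 000 case ids. A minimal sketch of how such a split could be produced from a score column is shown here. It assumes the tiers are cut at the 75th and 90th rank percentiles, which is inferred from the counts and not stated in the source; the function name `classify_by_percentile` and the random demo scores are hypothetical.

```python
import random
from collections import Counter


def classify_by_percentile(scores, medium_cut=75, high_cut=90):
    """Label each score 'low', 'medium' or 'high' by its rank percentile.

    The cut-offs (75 and 90) are assumptions inferred from the counts
    in the text, not values taken from the original notebook.
    """
    n = len(scores)
    # Indices of the scores in ascending score order.
    order = sorted(range(n), key=lambda i: scores[i])
    labels = [None] * n
    for rank, i in enumerate(order):
        # Integer arithmetic: rank / n >= cut / 100 without float rounding.
        if rank * 100 >= high_cut * n:
            labels[i] = "high"
        elif rank * 100 >= medium_cut * n:
            labels[i] = "medium"
        else:
            labels[i] = "low"
    return labels


# Demo with 30 000 made-up scores (not the real data).
random.seed(0)
demo_scores = [random.random() for _ in range(30000)]
tier_counts = Counter(classify_by_percentile(demo_scores))
# counts: low=22500, medium=4500, high=3000
print(tier_counts["low"], tier_counts["medium"], tier_counts["high"])
```

Because the labels depend only on rank, the split stays 22500 / 4500 / 3000 for any 30 000 scores; tied scores are separated by their position in the input.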
Provided metrics:
30 000 rows, 1 year of data, daily case id scores
Target:
There are 3 output variables: ap090, ct090 and PCTILE_SCORE
Methodologies used:
Mean imputation
PySpark random forest iterative multivariate imputation
Ridge cross validation
Random Forest Regressor cross validation
Percentile score classification
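As an illustration of the first method in the list, mean imputation can be sketched in a few lines. This is a standard-library sketch, not the notebook's own code: it assumes missing values are represented as `None`, and the helper name `impute_mean` and the demo column are hypothetical.

```python
from statistics import mean


def impute_mean(column):
    """Replace None entries in a numeric column with the mean of the observed values."""
    observed = [value for value in column if value is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = mean(observed)
    return [fill if value is None else value for value in column]


# Made-up column with two missing entries; the observed mean is 4.0.
demo_column = [2.0, None, 4.0, 6.0, None]
imputed = impute_mean(demo_column)
print(imputed)  # [2.0, 4.0, 4.0, 6.0, 4.0]
```

Mean imputation fills each column independently, so it ignores relationships between variables and reduces the column's variance. Iterative multivariate imputation, the second method in the list, instead predicts each missing value from the other variables.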
import sqlite3

# create connection
conn = sqlite3.connect("dataset.db")
cursor = conn.cursor()
# list the 2 table names in the connected database: dataset and metadata
x = cursor.execute("SELECT DISTINCT name FROM sqlite_master WHERE type='table'")
for y in x.fetchall():
    print(y)  # each row is a 1-tuple holding the table name
# metadata - display variables description in data frame
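The code for this step is not included in the text. A minimal sketch of reading and displaying the `metadata` table with the standard `sqlite3` module follows. To stay self-contained it builds an in-memory stand-in for `dataset.db`; the column names `varname` and `description` and the placeholder rows are assumptions, since the real schema is not shown.

```python
import sqlite3

# In-memory stand-in for dataset.db so the sketch runs on its own.
conn = sqlite3.connect(":memory:")
# Hypothetical schema: the real metadata columns are not given in the text.
conn.execute("CREATE TABLE metadata (varname TEXT, description TEXT)")
conn.executemany(
    "INSERT INTO metadata VALUES (?, ?)",
    [
        ("ap090", "placeholder description"),
        ("ct090", "placeholder description"),
    ],
)

cursor = conn.execute("SELECT * FROM metadata")
# cursor.description holds one 7-tuple per column; item 0 is the column name.
columns = [col[0] for col in cursor.description]
rows = cursor.fetchall()

# Print a simple header line followed by one line per variable.
print(" | ".join(columns))
for row in rows:
    print(" | ".join(str(value) for value in row))

conn.close()
```

If pandas is available, `pandas.read_sql_query("SELECT * FROM metadata", conn)` would return the same rows as a data frame, which is likely closer to what the comment above describes.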