
Python - Basic regression comparison

Regression models are among the fundamental models of machine learning and help in understanding how a dataset is distributed. The objective of this article is to describe the relationship between Tenure and Cashback Amount, compare a few basic regression models and their predictions, and plot trendlines over the existing dataset. Multiple and polynomial regression explain the Tenure-Cashback relationship better than an unweighted exponential trendline. Tenure length versus average Cashback Amount is a good example of an increasing, roughly linear dependency.


Compared models: Linear, Multiple, Polynomial, Logarithmic, Exponential, Ridge, Quantile, Lasso, Bayesian

Compared metrics: average Cashback Amount per Tenure (and Customer Churn for the log-type models)


Dataset: a freely available online E-Commerce dataset for customer churn calculation, from 2010


1. Theory


bayesian linear regression - it reflects the Bayesian framework: the model forms an initial estimate (the prior) and improves that estimate as more data comes in. The Bayesian viewpoint is an intuitive way of looking at the problem; the results are posterior distributions over the coefficients rather than single point estimates. Bayesian regression does not choose the prior for you, and in practice the posterior is often approximated by random sampling (e.g. MCMC).
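
A minimal sketch with scikit-learn's BayesianRidge (synthetic data standing in for the Tenure/Cashback pair; all names and numbers here are illustrative, not the article's actual listing):

```python
import numpy as np
from sklearn.linear_model import BayesianRidge

# Synthetic stand-in for the Tenure/Cashback pair
rng = np.random.default_rng(0)
X = rng.uniform(0, 30, 200).reshape(-1, 1)            # "tenure" in months
y = 120 + 4.0 * X.ravel() + rng.normal(0, 10, 200)    # "cashback" amounts

model = BayesianRidge()        # weakly informative hyperpriors by default
model.fit(X, y)

# return_std=True also returns the standard deviation of the posterior
# predictive distribution, i.e. an uncertainty band around each prediction
y_pred, y_std = model.predict(X, return_std=True)
print(model.coef_, model.intercept_)
```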


exponential regression - used when the data follow a non-linear, exponential-shaped curve of the form y = a * exp(b * x), where y is the response variable, x is the predictor, and a, b are coefficients describing the relationship between x and y. numpy.polyfit (on log-transformed data) and scipy.optimize.curve_fit are the two obvious Python methods for fitting the exponential regression curve.
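
Both routes can be sketched in a few lines (synthetic data, for illustration):

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(1)
x = np.linspace(1, 10, 50)
y = 2.0 * np.exp(0.3 * x) + rng.normal(0, 0.5, 50)   # exponential-shaped data

# Method 1: np.polyfit on log(y) -- a linear fit in log space; note this
# implicitly gives small y values more weight (the "unweighted" caveat)
b1, log_a = np.polyfit(x, np.log(y), 1)
a1 = np.exp(log_a)

# Method 2: curve_fit fits y = a * exp(b * x) directly in the original space
def exp_model(x, a, b):
    return a * np.exp(b * x)

(a2, b2), _ = curve_fit(exp_model, x, y, p0=(1.0, 0.1))
print((a1, b1), (a2, b2))
```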


lasso regression - an extension of linear regression that adds a regularisation penalty to the loss function during training. It performs L1 regularization, i.e. adds a penalty equivalent to the absolute value of the magnitude of the coefficients. It provides coefficient shrinkage, reduces model complexity, and also selects the most important features (irrelevant coefficients are driven exactly to zero). Lasso also exposes several hyperparameters that can be tuned to boost its performance.
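
A minimal sketch of the zeroing-out behaviour (synthetic data; the alpha value is illustrative):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 5))                 # 5 candidate features
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.1, 200)   # only 2 matter

# alpha controls the strength of the L1 penalty; larger alpha drives more
# coefficients exactly to zero (implicit feature selection)
model = Lasso(alpha=0.1)
model.fit(X, y)
print(model.coef_)   # coefficients of the irrelevant features end up at 0
```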


linear regression (OLS, ordinary least squares) - models the relationship between a scalar response, the dependent variable (y), and one or more explanatory variables (x). In 1973 the statistician F. Anscombe constructed four datasets with nearly identical linear-regression statistics but very different distributions; they are known as Anscombe's quartet.
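
A minimal OLS sketch (synthetic data standing in for Tenure vs. Cashback); Anscombe's quartet itself ships with seaborn:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
X = rng.uniform(0, 30, 100).reshape(-1, 1)
y = 150 + 5.0 * X.ravel() + rng.normal(0, 15, 100)

model = LinearRegression()      # ordinary least squares
model.fit(X, y)
print(model.coef_[0], model.intercept_)

# Anscombe's quartet is bundled with seaborn:
# import seaborn as sns; anscombe = sns.load_dataset("anscombe")
```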


logarithmic regression - used to model situations where growth or decay is rapid at first and then slows, by fitting a curve of the form y = a + b * ln(x).
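
Fitting y = a + b * ln(x) reduces to a linear fit against log-transformed x, for example (synthetic data):

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.linspace(1, 30, 100)
y = 100 + 40 * np.log(x) + rng.normal(0, 5, 100)   # fast early growth, then flattening

# Linear least squares on (ln x, y); polyfit returns [slope, intercept]
b, a = np.polyfit(np.log(x), y, 1)
y_hat = a + b * np.log(x)      # fitted logarithmic trendline
print(a, b)
```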


logistic regression - assumes the response is binomially distributed (whereas log-linear regression assumes a Poisson-distributed response). It is a logit model and its output is a probability. Logistic regression belongs to the family of supervised ML methods and is prone to overfitting when dimensionality is high. Typical use cases are fraud detection, disease prediction and churn prediction.
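
A minimal churn-style sketch with scikit-learn (the labels and the tenure-churn link are synthetic, for illustration only):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
tenure = rng.uniform(0, 30, 300).reshape(-1, 1)
# Synthetic labels: churn probability decreases with tenure
p = 1 / (1 + np.exp(0.3 * (tenure.ravel() - 10)))
churn = rng.binomial(1, p)

model = LogisticRegression()
model.fit(tenure, churn)
print(model.predict_proba(tenure[:5]))   # outputs are probabilities, not labels
```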


multiple regression - regression with many independent variables. There are several types: standard, hierarchical, stepwise and setwise. Hierarchical regression can model data better than regular multiple linear regression: the researcher decides which terms enter at which stage of modelling, whereas in stepwise regression (forward or backward) the computer decides which predictor variables to keep. In the setwise type the interest is in sets of variables rather than in single variables.
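
One way to sketch the hierarchical idea is to fit nested models block by block and watch how the fit (R^2) changes as each block of predictors enters (synthetic data, illustrative feature names):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(6)
X = rng.normal(size=(200, 3))      # e.g. tenure, order count, complaints
y = 2.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(0, 0.5, 200)

# Block 1: first predictor only; Block 2: researcher adds the second predictor
step1 = LinearRegression().fit(X[:, :1], y)
step2 = LinearRegression().fit(X[:, :2], y)
print(step1.score(X[:, :1], y), step2.score(X[:, :2], y))   # R^2 per step
```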


polynomial regression - the relationship between the independent variable x and the dependent variable y is modelled as an nth-degree polynomial in x. (Polynomial interpolation, by contrast, finds the polynomial of lowest possible degree that passes exactly through every point of the dataset.)
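
A degree-2 sketch with numpy (synthetic data):

```python
import numpy as np

rng = np.random.default_rng(7)
x = np.linspace(0, 30, 100)
y = 100 + 6.0 * x - 0.1 * x**2 + rng.normal(0, 5, 100)

coeffs = np.polyfit(x, y, deg=2)   # least-squares fit of a degree-2 polynomial
y_hat = np.polyval(coeffs, x)      # evaluate the fitted curve
print(coeffs)                      # highest-degree coefficient first
```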


quantile regression - models the relationship between a set of predictor (x, independent) variables and specific percentiles (quantiles) of a target (y, dependent) variable, most often the median. Tree-based learning algorithms are also available for quantile regression (e.g. Quantile Regression Forests).
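
A median-regression sketch with scikit-learn's QuantileRegressor (available from scikit-learn 1.0; synthetic data):

```python
import numpy as np
from sklearn.linear_model import QuantileRegressor

rng = np.random.default_rng(8)
X = rng.uniform(0, 30, 200).reshape(-1, 1)
y = 120 + 4.0 * X.ravel() + rng.normal(0, 10, 200)

# quantile=0.5 fits the conditional median; alpha=0 disables the L1 penalty;
# the "highs" solver needs scipy >= 1.6
model = QuantileRegressor(quantile=0.5, alpha=0.0, solver="highs")
model.fit(X, y)
print(model.coef_, model.intercept_)
```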


ridge regression - a linear method for estimating the coefficients of multiple-regression models in scenarios where the independent variables are highly correlated; it has been used in many fields including econometrics, chemistry, and engineering. Ridge regression penalizes large coefficients: it performs L2 regularization, i.e. adds a penalty equivalent to the square of the magnitude of the coefficients. It provides coefficient shrinkage and reduces model complexity.
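
A sketch of the stabilising effect under multicollinearity (synthetic, deliberately collinear features):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(9)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(0, 0.01, 200)    # nearly collinear with x1
X = np.column_stack([x1, x2])
y = 3.0 * x1 + rng.normal(0, 0.1, 200)

# The L2 penalty (alpha) keeps the coefficients of the two correlated
# columns from exploding in opposite directions, as plain OLS would allow
model = Ridge(alpha=1.0)
model.fit(X, y)
print(model.coef_)
```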




2. Python code
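
The sketch below is a minimal, self-contained stand-in for the kind of comparison described above, run on synthetic per-tenure cashback averages (this is not the article's original listing; all names and values are illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression, Ridge, Lasso, BayesianRidge

# Synthetic stand-in for the E-Commerce data: average cashback per tenure,
# growing roughly linearly with tenure
rng = np.random.default_rng(42)
tenure = np.arange(1, 31, dtype=float)
cashback = 120 + 4.0 * tenure + rng.normal(0, 8, tenure.size)
X = tenure.reshape(-1, 1)

models = {
    "Linear": LinearRegression(),
    "Ridge": Ridge(alpha=1.0),
    "Lasso": Lasso(alpha=0.1),
    "Bayesian": BayesianRidge(),
}

plt.scatter(tenure, cashback, color="black", label="avg cashback per tenure")
for name, model in models.items():
    model.fit(X, cashback)
    plt.plot(tenure, model.predict(X), label=name)

# Degree-2 polynomial trendline for comparison
poly = np.polyfit(tenure, cashback, 2)
plt.plot(tenure, np.polyval(poly, tenure), "--", label="Polynomial (deg 2)")

plt.xlabel("Tenure")
plt.ylabel("Average Cashback Amount")
plt.legend()
plt.show()
```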






3. References



https://stackoverflow.com/questions/33177548/typeerror-expected-1d-vector-for-x

https://www.geeksforgeeks.org/how-to-do-expo

https://www.kaggle.com/code/dssant85/linear-regression-with-logarithmic-transformation/notebook

https://www.geeksforgeeks.org/implementation-of-bayesian-regression/

https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.BayesianRidge.html

https://towardsdatascience.com/how-to-build-a-bayesian-ridge-regression-model-with-full-hyperparameter-integration-f4ac2bdaf329

https://scikit-learn.org/stable/auto_examples/gaussian_process/plot_compare_gpr_krr.html#sphx-glr-auto-examples-gaussian-process-plot-compare-gpr-krr-py

https://www.geeksforgeeks.org/log-and-natural-logarithmic-value-of-a-column-in-pandas-python/

https://machinelearningmastery.com/lasso-regression-with-python/

https://en.wikipedia.org/wiki/Linear_regression

https://seaborn.pydata.org/examples/anscombes_quartet.html

https://files.eric.ed.gov/fulltext/ED534385.pdf

https://www3.nd.edu/~rwilliam/stats1/x95.pdf

https://journals.sagepub.com/doi/10.1177/001316447103100315

https://rowannicholls.github.io/python/curve_fitting/exponential.html

https://en.wikipedia.org/wiki/Polynomial_regression

https://www.statology.org/logarithmic-regression-in-r/

https://www.ibm.com/topics/logistic-regression

https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Ridge.html

https://towardsdatascience.com/introduction-to-bayesian-linear-regression-e66e60791ea7

https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.4/statug/statug_introbayes_sect015.htm

https://www.analyticsvidhya.com/blog/2016/01/ridge-lasso-regression-python-complete-tutorial/

https://en.wikipedia.org/wiki/Quantile_regression

https://en.wikipedia.org/wiki/Exponential_function

https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Lasso.html

https://registry.khronos.org/OpenCL/sdk/1.2/docs/man/xhtml/log.html

https://rowannicholls.github.io/python/curve_fitting/exponential.html#method-2-curve_fit

https://stackoverflow.com/questions/8409095/set-markers-for-individual-points-on-a-line-in-matplotlib




