These days I had the chance to meet participants of a mathematical conference at the highest mountain of the Czech Republic. It inspired me to run a few Kolmogorov tests in Python. The KS test is one of the general goodness-of-fit tests: such a test checks how well sample data fit a hypothesized distribution by comparing the empirical CDF with the target CDF. The KS statistic follows the Kolmogorov (KS) distribution; Kolmogorov complexity, discussed below, is a separate concept from the same Andrey Kolmogorov. Other well-known tests of this kind include the Chi-Square test, Student's t-test, Fisher's exact test, ANOVA, Kruskal-Wallis, Pearson, Spearman, …
Kolmogorov complexity: the absolute value is not computable; there is no single function that returns the complexity of an arbitrary string, picture, or system. A compressible string can be shortened by C symbols by a compression program; a string that cannot be shortened by even one symbol is said to be incompressible.
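Compressed length gives a computable upper bound on the complexity of a concrete string. A minimal sketch using Python's zlib (the exact byte counts are illustrative and will vary):
import zlib
import random
random.seed(0)
regular = b'ab' * 500  # highly regular, very compressible string
noisy = bytes(random.getrandbits(8) for _ in range(1000))  # random bytes, barely compressible
# compressed length is only an upper-bound proxy for Kolmogorov complexity
print(len(zlib.compress(regular)), len(zlib.compress(noisy)))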
Measuring the complexity of networks: measure the entropy of network invariants, such as adjacency matrices or degree sequences. Entropy and all entropy-based measures have several vulnerabilities.
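A minimal sketch of this idea, computing the Shannon entropy of the degree sequence of a small adjacency matrix (the matrix is a hypothetical example for illustration):
import numpy as np
# hypothetical 5-node undirected graph, no self-loops
A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0],
              [1, 1, 0, 1, 1],
              [0, 1, 1, 0, 1],
              [0, 0, 1, 1, 0]])
degrees = A.sum(axis=0)  # degree sequence
values, counts = np.unique(degrees, return_counts=True)
p = counts / counts.sum()  # empirical degree distribution
print(-np.sum(p * np.log2(p)))  # Shannon entropy in bits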
Complexity versus entropy: Shannon entropy represents the average complexity over all strings emitted by a random string generator, whereas Kolmogorov complexity represents the complexity of a particular string.
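A short sketch of the distinction, with a Bernoulli source as the random string generator (the alternating string is a hypothetical example):
import numpy as np
def bernoulli_entropy_bits(prob):
    # average bits per symbol emitted by a Bernoulli(prob) source
    return -(prob * np.log2(prob) + (1 - prob) * np.log2(1 - prob))
print(bernoulli_entropy_bits(0.5))  # 1.0 bit per symbol on average
# yet a particular emitted string such as '0101...01' is highly regular:
# its Kolmogorov complexity is tiny even though the source entropy is maximal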
KS Test: a goodness-of-fit test. The most accurate CDF/SF/PDF/PPF/ISF computations for the limiting KS distribution are provided by the stats.kstwobign distribution.
CDF: Cumulative Distribution Function (Empirical CDF: computed from the observed sample; Target CDF: the hypothesized distribution the sample is tested against).
scipy.special.kolmogorov: the complementary cumulative distribution function, CCDF (survival function), of the Kolmogorov distribution.
scipy.special.kolmogi: the inverse survival function of the Kolmogorov distribution; returns y such that kolmogorov(y) == p.
1. One-sample Kolmogorov test
First, import the required libraries:
from numpy.random import seed
from numpy.random import poisson
from scipy.stats import kstest
Since the p-value is less than .05, we reject the null hypothesis: we have sufficient evidence to say that the sample data do not come from a normal distribution. This is expected, because we generated the sample data with the poisson() function, so the values follow a Poisson distribution.
# set seed (e.g. make this example reproducible)
seed(0)
# generate dataset of 100 values that follow a Poisson distribution with mean=5
data = poisson(5, 100)
# perform Kolmogorov-Smirnov test
kstest(data, 'norm')
KstestResult(statistic=0.9072498680518208, pvalue=1.0908062873170218e-103)
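For contrast, a sample that really is drawn from a standard normal distribution should not be rejected. A minimal sketch (the exact numbers depend on the seed, but the p-value should be large):
from numpy.random import seed, randn
from scipy.stats import kstest
# generate 100 values from N(0, 1) and test against 'norm'
seed(0)
normal_data = randn(100)
kstest(normal_data, 'norm')  # expect a large p-value: fail to reject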
2. Two-sample Kolmogorov test
First, import the required libraries:
from numpy.random import seed
from numpy.random import randn
from numpy.random import lognormal
from scipy.stats import ks_2samp
Since the p-value is less than .05, we reject the null hypothesis: we have sufficient evidence to say that the two sample datasets do not come from the same distribution. The first sample is drawn from the standard normal distribution; the values for the second sample are drawn from a lognormal distribution.
# set seed (e.g. make this example reproducible)
seed(0)
# generate two datasets
data1 = randn(100)
data2 = lognormal(3, 1, 100)
# perform Kolmogorov-Smirnov test
ks_2samp(data1, data2)
Ks_2sampResult(statistic=0.99, pvalue=4.417521386399011e-57)
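As a sanity check, two samples drawn from the same distribution should not be rejected. A minimal sketch (exact numbers depend on the seed):
from numpy.random import seed, randn
from scipy.stats import ks_2samp
# both samples drawn from N(0, 1)
seed(0)
a = randn(100)
b = randn(100)
ks_2samp(a, b)  # expect a large p-value: fail to reject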
3. scipy kolmogorov
First, import the required libraries:
from scipy.special import kolmogorov
from scipy.stats import kstwobign
import numpy as np
kolmogorov([0, 0.5, 1.0])
array([1. , 0.96394524, 0.26999967])
Compare a sample of size 1000 drawn from a Laplace(0, 1) distribution against the target distribution, a Normal(0, 1) distribution.
from scipy.stats import norm, laplace
rng = np.random.default_rng()
n = 1000
lap01 = laplace(0, 1)
x = np.sort(lap01.rvs(n, random_state=rng))
np.mean(x), np.std(x)
(-0.00591602532125853, 1.365355645380573)
Generate the empirical CDF and the KS statistic Dn.
target = norm(0,1) # Normal mean 0, stddev 1
cdfs = target.cdf(x)
ecdfs = np.arange(n+1, dtype=float)/n
gaps = np.column_stack([cdfs - ecdfs[:n], ecdfs[1:] - cdfs])
Dn = np.max(gaps)
Kn = np.sqrt(n) * Dn
print('Dn=%f, sqrt(n)*Dn=%f' % (Dn, Kn))
Dn=0.054133, sqrt(n)*Dn=1.711848
Print results:
print('\n'.join(['For a sample of size n drawn from a N(0, 1) distribution:',
    ' the approximate Kolmogorov probability that sqrt(n)*Dn>=%f is %f' % (Kn, kolmogorov(Kn)),
    ' the approximate Kolmogorov probability that sqrt(n)*Dn<=%f is %f' % (Kn, kstwobign.cdf(Kn))]))
For a sample of size n drawn from a N(0, 1) distribution:
the approximate Kolmogorov probability that sqrt(n)*Dn>=1.711848 is 0.005698
the approximate Kolmogorov probability that sqrt(n)*Dn<=1.711848 is 0.994302
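The hand-computed statistic can be cross-checked against scipy.stats.kstest, which accepts the target CDF as a callable. A sketch reusing x and target from above (the p-value may differ slightly, since kstest may use an exact rather than asymptotic method):
from scipy.stats import kstest
# same Dn as computed by hand above
res = kstest(x, target.cdf)
print(res.statistic, res.pvalue)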
Plot the Empirical CDF against the target N(0, 1) CDF.
import matplotlib.pyplot as plt
plt.step(np.concatenate([[-3], x]), ecdfs, where='post', label='Empirical CDF', color='grey')
x3 = np.linspace(-3, 3, 100)
iminus, iplus = np.argmax(gaps, axis=0)
plt.vlines([x[iminus]], ecdfs[iminus], cdfs[iminus], color='black', linestyle='dashed', lw=4)
plt.vlines([x[iplus]], cdfs[iplus], ecdfs[iplus+1], color='black', linestyle='dashed', lw=4)
plt.plot(x3, target.cdf(x3), label='CDF for N(0, 1)', color='lightgrey')
plt.ylim([0, 1]); plt.grid(True); plt.legend();
plt.show()
4. scipy kolmogi
We can evaluate the inverse survival function of the Kolmogorov distribution used above.
from scipy.special import kolmogi
kolmogi([0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0])
array([ inf, 1.22384787, 1.01918472, 0.82757356, 0.67644769,
0.57117327, 0. ])
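A quick round-trip check that kolmogorov and kolmogi invert each other on the interior of the domain:
import numpy as np
from scipy.special import kolmogorov, kolmogi
p = np.array([0.1, 0.25, 0.5, 0.75, 0.9])
print(np.allclose(kolmogorov(kolmogi(p)), p))  # True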
References:
https://github.com/MLWave/koolmogorov
https://towardsdatascience.com/face-recognition-through-kolmogorov-complexity-16ac5542235b
https://en.wikipedia.org/wiki/Kolmogorov_complexity
https://en.wikipedia.org/wiki/Andrey_Kolmogorov
https://docs.scipy.org/doc/scipy/reference/generated/scipy.special.kolmogi.html
https://www.i-programmer.info/programming/theory/13793-programmers-guide-to-theory-kolmogorov-complexity.html?start=1
https://en.wikipedia.org/wiki/Kolmogorov–Smirnov_test
https://www.quora.com/What-is-the-relationship-between-Kolmogorov-complexity-and-Shannon-entropy
https://www.youtube.com/watch?v=KyB13PD-UME
http://www.neilconway.org/talks/kolmogorov.pdf
http://people.cs.uchicago.edu/~fortnow/papers/quaderni.pdf
https://www.youtube.com/watch?v=QkwPf3fcxBs
https://www.statology.org/kolmogorov-smirnov-test-python/
https://towardsdatascience.com/comparing-sample-distributions-with-the-kolmogorov-smirnov-ks-test-a2292ad6fee5
https://docs.scipy.org/doc/scipy/reference/generated/scipy.special.kolmogorov.html