Source code of kurtosis in R and python
Publish date: Sep 13, 2019Tags: Programming Stats
Why I can’t see the source code of function kurtosis
in R? And how is kurtosis calculated?
in R
kurtosis()
is a function from library “timeDate”
It is worth noting that some methods are actually hidden inside some packages’ namespace. Those methods will have a *
next to their names when calling methods()
to list all of the methods of a function. For example, when inputting methods("kurtosis")
as following:
> methods(kurtosis)
[1] kurtosis.data.frame* kurtosis.default* kurtosis.POSIXct* kurtosis.POSIXlt*
see '?methods' for accessing help and source code
As is shown above, all the methods of function kurtosis
are hidden, so when we cannot call >kurtosis.default()
directly, nor see its source code by calling kurtosis.default
. We need getS3method()
to see the source code as following:
> getS3method("kurtosis","default")
function (x, na.rm = FALSE, method = c("excess", "moment", "fisher"),
...)
{
method = match.arg(method)
stopifnot(NCOL(x) == 1)
if (!is.numeric(x) && !is.complex(x) && !is.logical(x)) {
warning("argument is not numeric or logical: returning NA")
return(as.numeric(NA))
}
if (na.rm)
x = x[!is.na(x)]
n = length(x)
if (is.integer(x))
x = as.numeric(x)
if (method == "excess") {
kurtosis = sum((x - mean(x))^4/as.numeric(var(x))^2)/length(x) -
3
}
if (method == "moment") {
kurtosis = sum((x - mean(x))^4/as.numeric(var(x))^2)/length(x)
}
if (method == "fisher") {
kurtosis = ((n + 1) * (n - 1) * ((sum(x^4)/n)/(sum(x^2)/n)^2 -
(3 * (n - 1))/(n + 1)))/((n - 2) * (n - 3))
}
attr(kurtosis, "method") <- method
kurtosis
}
<bytecode: 0x7faf948e2190>
<environment: namespace:timeDate>
Then we can see there are three different fomula to calculate kurtosis. We’ve learned the method of the fourth moment in class, and we can see the function use excess method by default. And there is another method to calculate the unbiased estimator of kurtosis named fisher .
>kurtosis(c(1,2,3,4,4,4,4))
[1] -1.364331
attr(,"method")
[1] "excess"
>kurtosis(c(1,2,3,4,4,4,4),method = "moment")
[1] 1.635669
attr(,"method")
[1] "moment"
>kurtosis(c(1,2,3,4,4,4,4),method = "fisher")
[1] -2.301775
attr(,"method")
[1] "fisher"
in Python
While in Python package Scipy v.1.3.0, the source code is as following:
def kurtosis(a, axis=0, fisher=True, bias=True, nan_policy='propagate'):
"""
Compute the kurtosis (Fisher or Pearson) of a dataset.
Kurtosis is the fourth central moment divided by the square of the
variance. If Fisher's definition is used, then 3.0 is subtracted from
the result to give 0.0 for a normal distribution.
If bias is False then the kurtosis is calculated using k statistics to
eliminate bias coming from biased moment estimators
Use `kurtosistest` to see if result is close enough to normal.
Parameters
----------
a : array
data for which the kurtosis is calculated
axis : int or None, optional
Axis along which the kurtosis is calculated. Default is 0.
If None, compute over the whole array `a`.
fisher : bool, optional
If True, Fisher's definition is used (normal ==> 0.0). If False,
Pearson's definition is used (normal ==> 3.0).
bias : bool, optional
If False, then the calculations are corrected for statistical bias.
nan_policy : {'propagate', 'raise', 'omit'}, optional
Defines how to handle when input contains nan. 'propagate' returns nan,
'raise' throws an error, 'omit' performs the calculations ignoring nan
values. Default is 'propagate'.
Returns
-------
kurtosis : array
The kurtosis of values along an axis. If all values are equal,
return -3 for Fisher's definition and 0 for Pearson's definition.
References
----------
.. [1] Zwillinger, D. and Kokoska, S. (2000). CRC Standard
Probability and Statistics Tables and Formulae. Chapman & Hall: New
York. 2000.
Examples
--------
>>> from scipy.stats import kurtosis
>>> kurtosis([1, 2, 3, 4, 5])
-1.3
"""
a, axis = _chk_asarray(a, axis)
contains_nan, nan_policy = _contains_nan(a, nan_policy)
if contains_nan and nan_policy == 'omit':
a = ma.masked_invalid(a)
return mstats_basic.kurtosis(a, axis, fisher, bias)
n = a.shape[axis]
m2 = moment(a, 2, axis)
m4 = moment(a, 4, axis)
zero = (m2 == 0)
olderr = np.seterr(all='ignore')
try:
vals = np.where(zero, 0, m4 / m2**2.0)
finally:
np.seterr(**olderr)
if not bias:
can_correct = (n > 3) & (m2 > 0)
if can_correct.any():
m2 = np.extract(can_correct, m2)
m4 = np.extract(can_correct, m4)
nval = 1.0/(n-2)/(n-3) * ((n**2-1.0)*m4/m2**2.0 - 3*(n-1)**2.0)
np.place(vals, can_correct, nval + 3.0)
if vals.ndim == 0:
vals = vals.item() # array scalar
return vals - 3 if fisher else vals
Here, the parameter Fisher
here only depends whether to minus 3 and the estimator of kurtosis is always unbiased.
References:
https://rdrr.io/rforge/timeDate/src/R/stats-kurtosis.R
https://stackoverflow.com/questions/6583265/what-does-s3-methods-mean-in-r
https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kurtosis.html