Source code of kurtosis in R and python

Publish date: Sep 13, 2019
Tags: Programming Stats

Why I can’t see the source code of function kurtosis in R? And how is kurtosis calculated?

in R

kurtosis() is a function from library “timeDate”

It is worth noting that some methods are actually hidden inside some packages’ namespace. Those methods will have a * next to their names when calling methods() to list all of the methods of a function. For example, when inputting methods("kurtosis") as following:

> methods(kurtosis)
[1] kurtosis.data.frame* kurtosis.default*    kurtosis.POSIXct*    kurtosis.POSIXlt*   
see '?methods' for accessing help and source code

As is shown above, all the methods of function kurtosis are hidden, so when we cannot call >kurtosis.default() directly, nor see its source code by calling kurtosis.default. We need getS3method() to see the source code as following:

> getS3method("kurtosis","default")
function (x, na.rm = FALSE, method = c("excess", "moment", "fisher"), 
    ...) 
{
    method = match.arg(method)
    stopifnot(NCOL(x) == 1)
    if (!is.numeric(x) && !is.complex(x) && !is.logical(x)) {
        warning("argument is not numeric or logical: returning NA")
        return(as.numeric(NA))
    }
    if (na.rm) 
        x = x[!is.na(x)]
    n = length(x)
    if (is.integer(x)) 
        x = as.numeric(x)
    if (method == "excess") {
        kurtosis = sum((x - mean(x))^4/as.numeric(var(x))^2)/length(x) - 
            3
    }
    if (method == "moment") {
        kurtosis = sum((x - mean(x))^4/as.numeric(var(x))^2)/length(x)
    }
    if (method == "fisher") {
        kurtosis = ((n + 1) * (n - 1) * ((sum(x^4)/n)/(sum(x^2)/n)^2 - 
            (3 * (n - 1))/(n + 1)))/((n - 2) * (n - 3))
    }
    attr(kurtosis, "method") <- method
    kurtosis
}
<bytecode: 0x7faf948e2190>
<environment: namespace:timeDate>

Then we can see there are three different fomula to calculate kurtosis. We’ve learned the method of the fourth moment in class, and we can see the function use excess method by default. And there is another method to calculate the unbiased estimator of kurtosis named fisher .

>kurtosis(c(1,2,3,4,4,4,4))
[1] -1.364331
attr(,"method")
[1] "excess"

>kurtosis(c(1,2,3,4,4,4,4),method = "moment")
[1] 1.635669
attr(,"method")
[1] "moment"

>kurtosis(c(1,2,3,4,4,4,4),method = "fisher")
[1] -2.301775
attr(,"method")
[1] "fisher"

in Python

While in Python package Scipy v.1.3.0, the source code is as following:


def kurtosis(a, axis=0, fisher=True, bias=True, nan_policy='propagate'):
    """
    Compute the kurtosis (Fisher or Pearson) of a dataset.
    Kurtosis is the fourth central moment divided by the square of the
    variance. If Fisher's definition is used, then 3.0 is subtracted from
    the result to give 0.0 for a normal distribution.
    If bias is False then the kurtosis is calculated using k statistics to
    eliminate bias coming from biased moment estimators
    Use `kurtosistest` to see if result is close enough to normal.
    Parameters
    ----------
    a : array
        data for which the kurtosis is calculated
    axis : int or None, optional
        Axis along which the kurtosis is calculated. Default is 0.
        If None, compute over the whole array `a`.
    fisher : bool, optional
        If True, Fisher's definition is used (normal ==> 0.0). If False,
        Pearson's definition is used (normal ==> 3.0).
    bias : bool, optional
        If False, then the calculations are corrected for statistical bias.
    nan_policy : {'propagate', 'raise', 'omit'}, optional
        Defines how to handle when input contains nan. 'propagate' returns nan,
        'raise' throws an error, 'omit' performs the calculations ignoring nan
        values. Default is 'propagate'.
    Returns
    -------
    kurtosis : array
        The kurtosis of values along an axis. If all values are equal,
        return -3 for Fisher's definition and 0 for Pearson's definition.
    References
    ----------
    .. [1] Zwillinger, D. and Kokoska, S. (2000). CRC Standard
       Probability and Statistics Tables and Formulae. Chapman & Hall: New
       York. 2000.
    Examples
    --------
    >>> from scipy.stats import kurtosis
    >>> kurtosis([1, 2, 3, 4, 5])
    -1.3
    """
    a, axis = _chk_asarray(a, axis)

    contains_nan, nan_policy = _contains_nan(a, nan_policy)

    if contains_nan and nan_policy == 'omit':
        a = ma.masked_invalid(a)
        return mstats_basic.kurtosis(a, axis, fisher, bias)

    n = a.shape[axis]
    m2 = moment(a, 2, axis)
    m4 = moment(a, 4, axis)
    zero = (m2 == 0)
    olderr = np.seterr(all='ignore')
    try:
        vals = np.where(zero, 0, m4 / m2**2.0)
    finally:
        np.seterr(**olderr)

    if not bias:
        can_correct = (n > 3) & (m2 > 0)
        if can_correct.any():
            m2 = np.extract(can_correct, m2)
            m4 = np.extract(can_correct, m4)
            nval = 1.0/(n-2)/(n-3) * ((n**2-1.0)*m4/m2**2.0 - 3*(n-1)**2.0)
            np.place(vals, can_correct, nval + 3.0)

    if vals.ndim == 0:
        vals = vals.item()  # array scalar

    return vals - 3 if fisher else vals

Here, the parameter Fisher here only depends whether to minus 3 and the estimator of kurtosis is always unbiased.

References:

https://rdrr.io/rforge/timeDate/src/R/stats-kurtosis.R

https://stackoverflow.com/questions/6583265/what-does-s3-methods-mean-in-r

https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kurtosis.html