25.1 Basic Statistical Functions

— Function File: mean (x, dim, opt)

If x is a vector, compute the mean of the elements of x

          mean (x) = SUM_i x(i) / N
     

If x is a matrix, compute the mean for each column and return them in a row vector.

With the optional argument opt, the kind of mean computed can be selected. The following options are recognized:

"a"
Compute the (ordinary) arithmetic mean. This is the default.
"g"
Compute the geometric mean.
"h"
Compute the harmonic mean.

If the optional argument dim is supplied, work along dimension dim.

Both dim and opt are optional. If both are supplied, either may appear first.
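
The three kinds of mean can be written out with elementary operations. The following is a minimal sketch, not the implementation of mean itself. The variable names are illustrative, and the geometric and harmonic formulas are the usual textbook definitions, assumed here for positive data.

```octave
x = [1 2 4];                     % illustrative positive data
n = numel (x);

m_arith = sum (x) / n;           % arithmetic mean: SUM_i x(i) / N
m_geom  = prod (x) ^ (1 / n);    % geometric mean: N-th root of the product
m_harm  = n / sum (1 ./ x);      % harmonic mean: N / SUM_i (1 / x(i))
```

For positive data that are not all equal, the harmonic mean is smaller than the geometric mean, which in turn is smaller than the arithmetic mean.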

— Function File: median (x)

If x is a vector, compute the median value of the elements of x. With the elements of x sorted in ascending order and N the number of elements, the median is

                      x(ceil(N/2)),             N odd
          median(x) =
                      (x(N/2) + x((N/2)+1))/2,  N even
     

If x is a matrix, compute the median value for each column and return them in a row vector.
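
The two cases of the median formula can be sketched as follows. The data are illustrative, and the sort step makes explicit that the formula indexes the sorted elements.

```octave
x = [7 1 5 3];                   % illustrative data, N even
s = sort (x);                    % the formula indexes the sorted elements
n = numel (s);

if (mod (n, 2) == 1)
  med = s(ceil (n / 2));                 % N odd: the middle element
else
  med = (s(n / 2) + s(n / 2 + 1)) / 2;   % N even: average of the two middle elements
endif
```

Here the sorted data are [1 3 5 7], so the result is (3 + 5) / 2 = 4.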

— Function File: std (x)
— Function File: std (x, opt)
— Function File: std (x, opt, dim)

If x is a vector, compute the standard deviation of the elements of x.

          std (x) = sqrt (sumsq (x - mean (x)) / (N - 1))
     

If x is a matrix, compute the standard deviation for each column and return them in a row vector.

The argument opt determines the type of normalization to use. Valid values are

0:
normalizes with N-1; this provides the square root of the best unbiased estimator of the variance [default]
1:
normalizes with N; this provides the square root of the second moment around the mean

The third argument dim determines the dimension along which the standard deviation is calculated.
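
The effect of the two normalizations can be checked by hand. This is a minimal sketch with illustrative data, written with elementary operations rather than by calling std with opt.

```octave
x = [2 4 4 4 5 5 7 9];           % illustrative data
n = numel (x);
d = x - mean (x);                % deviations from the mean

s0 = sqrt (sum (d .^ 2) / (n - 1));   % opt = 0: normalize with N-1
s1 = sqrt (sum (d .^ 2) / n);         % opt = 1: normalize with N
```

The sum of squared deviations is 32 here, so s1 is sqrt (32 / 8) = 2, while s0 is sqrt (32 / 7) and therefore slightly larger.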

— Function File: cov (x, y)

If each row of x and y is an observation and each column is a variable, the (i, j)-th entry of cov (x, y) is the covariance between the i-th variable in x and the j-th variable in y. If called with one argument, compute cov (x, x).

— Function File: corrcoef (x, y)

If each row of x and y is an observation and each column is a variable, the (i, j)-th entry of corrcoef (x, y) is the correlation between the i-th variable in x and the j-th variable in y. If called with one argument, compute corrcoef (x, x).
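
For two single-variable samples the correlation reduces to one number. The sketch below assumes the usual Pearson product-moment definition, which the text above does not spell out, and uses illustrative data rather than a call to corrcoef.

```octave
x = [1; 2; 3; 4];                % observations of one variable
y = [2; 4; 5; 9];                % observations of another variable

dx = x - mean (x);               % deviations from the mean
dy = y - mean (y);

% Pearson correlation: cross product of deviations, scaled by their norms
r = sum (dx .* dy) / sqrt (sum (dx .^ 2) * sum (dy .^ 2));
```

Because the same normalization factor appears in the numerator and the denominator, the result does not depend on whether N or N-1 is used.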

— Function File: kurtosis (x, dim)

If x is a vector of length N, return the kurtosis

          kurtosis (x) = N^(-1) std(x)^(-4) sum ((x - mean(x)).^4) - 3
     

of x. If x is a matrix, return the kurtosis over the first non-singleton dimension. The optional argument dim can be given to force the kurtosis to be given over that dimension.

— Function File: mahalanobis (x, y)

Return the Mahalanobis' D-square distance between the multivariate samples x and y, which must have the same number of components (columns), but may have a different number of observations (rows).

— Function File: skewness (x, dim)

If x is a vector of length N, return the skewness

          skewness (x) = N^(-1) std(x)^(-3) sum ((x - mean(x)).^3)
     

of x. If x is a matrix, return the skewness along the first non-singleton dimension of the matrix. If the optional dim argument is given, operate along this dimension.
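
The skewness formula above, like the kurtosis formula given earlier in this section, transcribes directly into elementary operations. This is a minimal sketch with illustrative data. It takes std (x) with its default N-1 normalization, which is an assumption about what the formulas intend.

```octave
x = [1 2 3 4 10];                % illustrative data with a long right tail
n = numel (x);
d = x - mean (x);                % deviations from the mean
s = std (x);                     % default normalization (N-1), assumed here

skew = sum (d .^ 3) / (n * s ^ 3);        % N^(-1) std(x)^(-3) sum ((x - mean(x)).^3)
kurt = sum (d .^ 4) / (n * s ^ 4) - 3;    % N^(-1) std(x)^(-4) sum ((x - mean(x)).^4) - 3
```

The skewness is positive for this sample because the largest value lies far above the mean.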

— Function File: values (x)

Return the distinct values of x in a column vector, arranged in ascending order.

— Function File: var (x)

For vector arguments, return the (real) variance of the values. For matrix arguments, return a row vector containing the variance for each column.

The optional argument opt determines the type of normalization to use. Valid values are

0:
normalizes with N-1; this provides the best unbiased estimator of the variance [default]
1:
normalizes with N; this provides the second moment around the mean

The optional third argument dim determines the dimension along which the variance is calculated.

— Function File: [t, l_x] = table (x)
— Function File: [t, l_x, l_y] = table (x, y)

Create a contingency table t from data vectors. The vectors l_x and l_y are the corresponding levels.

Currently, only 1- and 2-dimensional tables are supported.

— Function File: studentize (x, dim)

If x is a vector, subtract its mean and divide by its standard deviation.

If x is a matrix, do the above along the first non-singleton dimension. If the optional argument dim is given then operate along this dimension.
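
For a vector, the operation amounts to one line. This is a minimal sketch with illustrative data, using std with its default normalization.

```octave
x = [2 4 4 4 5 5 7 9];           % illustrative data
z = (x - mean (x)) / std (x);    % subtract the mean, divide by the standard deviation
```

The result z has mean 0 and standard deviation 1, up to rounding.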

— Function File: statistics (x)

If x is a matrix, return a matrix with the minimum, first quartile, median, third quartile, maximum, mean, standard deviation, skewness and kurtosis of the columns of x as its rows.

If x is a vector, treat it as a column vector.

— Function File: spearman (x, y)

Compute Spearman's rank correlation coefficient rho for each of the variables specified by the input arguments.

For matrices, each row is an observation and each column a variable; vectors are always observations and may be row or column vectors.

spearman (x) is equivalent to spearman (x, x).

For two data vectors x and y, Spearman's rho is the correlation of the ranks of x and y.

If x and y are drawn from independent distributions, rho has zero mean and variance 1 / (n - 1), and is asymptotically normally distributed.
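
For two vectors without ties, rho can be computed by ranking each vector and correlating the ranks. This is a minimal sketch with illustrative data. Ranking through sort is only valid when all entries are distinct, and the correlation is taken to be the usual Pearson correlation.

```octave
x = [1 4 2 3 5];                 % illustrative data, all entries distinct
y = [2 5 1 4 3];
n = numel (x);

[~, ix] = sort (x);              % ix(k) is the position of the k-th smallest entry
rx = zeros (1, n);  rx(ix) = 1:n;   % rank of each entry of x
[~, iy] = sort (y);
ry = zeros (1, n);  ry(iy) = 1:n;   % rank of each entry of y

dx = rx - mean (rx);
dy = ry - mean (ry);
rho = sum (dx .* dy) / sqrt (sum (dx .^ 2) * sum (dy .^ 2));   % correlation of the ranks
```

For these data rho is 0.6.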

— Function File: run_count (x, n)

Count the upward runs along the first non-singleton dimension of x of length 1, 2, ..., n-1 and greater than or equal to n. If the optional argument dim is given, operate along this dimension.

— Function File: ranks (x, dim)

If x is a vector, return the (column) vector of ranks of x adjusted for ties.

If x is a matrix, do the above along the first non-singleton dimension. If the optional argument dim is given, operate along this dimension.
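
A minimal sketch of tie-adjusted ranking for a vector follows. It assumes that "adjusted for ties" means tied values share the average of the positions they occupy (midranks), a convention the text does not state explicitly. The data and variable names are illustrative.

```octave
x = [10 20 20 30];               % illustrative data with one tie
n = numel (x);
r = zeros (size (x));

for i = 1:n
  n_less  = sum (x < x(i));      % entries strictly smaller than x(i)
  n_equal = sum (x == x(i));     % entries tied with x(i), including itself
  r(i) = n_less + (n_equal + 1) / 2;   % average of the positions the tie occupies
endfor
```

The two tied values would occupy positions 2 and 3, so each receives rank 2.5 and r is [1 2.5 2.5 4].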

— Function File: range (x)
— Function File: range (x, dim)

If x is a vector, return the range, i.e., the difference between the maximum and the minimum, of the input data.

If x is a matrix, do the above for each column of x.

If the optional argument dim is supplied, work along dimension dim.

— Function File: [q, s] = qqplot (x, dist, params)

Perform a QQ-plot (quantile plot).

If F is the CDF of the distribution dist with parameters params and G its inverse, and x a sample vector of length n, the QQ-plot graphs ordinate s(i) = i-th largest element of x versus abscissa q(i) = G((i - 0.5)/n).

If the sample comes from F except for a transformation of location and scale, the pairs will approximately follow a straight line.

The default for dist is the standard normal distribution. The optional argument params contains a list of parameters of dist. For example, for a quantile plot of the uniform distribution on [2,4] and x, use

          qqplot (x, "uniform", 2, 4)
     

If no output arguments are given, the data are plotted directly.
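
The coordinates of a QQ-plot can be computed by hand for the uniform distribution on [2,4], whose inverse CDF is G(p) = 2 + 2p. This is a minimal sketch with an illustrative sample, not a call to qqplot. It takes s as the sample sorted in ascending order, which is an assumption about the ordering intended above.

```octave
x = [3.1 2.2 3.8 2.9];           % illustrative sample
n = numel (x);
a = 2;  b = 4;                   % uniform distribution on [2,4]

s = sort (x);                    % ordinates: the ordered sample
p = ((1:n) - 0.5) / n;           % plotting positions (i - 0.5)/n
q = a + (b - a) * p;             % abscissae: G(p) for the uniform distribution
```

Here q is [2.25 2.75 3.25 3.75]. Plotting s against q gives points close to a straight line when the sample is compatible with the distribution.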

— Function File: probit (p)

For each component of p, return the probit (the quantile of the standard normal distribution) of p.
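
The standard normal quantile can be expressed through the inverse error function as sqrt (2) * erfinv (2p - 1). This is a minimal sketch of that identity with illustrative probabilities, not necessarily the way probit is implemented.

```octave
p = [0.5 0.975];                        % illustrative probabilities
z = sqrt (2) * erfinv (2 * p - 1);      % standard normal quantiles

p_back = 0.5 * (1 + erf (z / sqrt (2)));   % standard normal CDF recovers p
```

The first quantile is 0, and the second is approximately 1.96.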

— Function File: [p, y] = ppplot (x, dist, params)

Perform a PP-plot (probability plot).

If F is the CDF of the distribution dist with parameters params and x a sample vector of length n, the PP-plot graphs ordinate y(i) = F (i-th largest element of x) versus abscissa p(i) = (i - 0.5)/n. If the sample comes from F, the pairs will approximately follow a straight line.

The default for dist is the standard normal distribution. The optional argument params contains a list of parameters of dist. For example, for a probability plot of the uniform distribution on [2,4] and x, use

          ppplot (x, "uniform", 2, 4)
     

If no output arguments are given, the data are plotted directly.

— Function File: moment (x, p, opt, dim)

If x is a vector, compute the p-th moment of x.

If x is a matrix, return the row vector containing the p-th moment of each column.

With the optional string opt, the kind of moment to be computed can be specified. If opt contains "c" or "a", central and/or absolute moments are returned. For example,

          moment (x, 3, "ac")
     

computes the third central absolute moment of x.

If the optional argument dim is supplied, work along dimension dim.
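
The third central absolute moment from the example above can be written out by hand. The sketch assumes that a moment is the average of the p-th powers, that is, the sum divided by N. The data are illustrative.

```octave
x = [1 2 3 6];                   % illustrative data
p = 3;

d = x - mean (x);                % "c": center the data
d = abs (d);                     % "a": take absolute values
m = sum (d .^ p) / numel (x);    % average of the p-th powers (assumed normalization)
```

The centered absolute values are [2 1 0 3], so m is (8 + 1 + 0 + 27) / 4 = 9.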

— Function File: meansq (x)
— Function File: meansq (x, dim)

For vector arguments, return the mean square of the values. For matrix arguments, return a row vector containing the mean square of each column. With the optional dim argument, return the mean square of the values along this dimension.

— Function File: logit (p)

For each component of p, return the logit log (p / (1-p)) of p.
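
A minimal sketch of the logit and its inverse, the logistic function, with illustrative probabilities:

```octave
p = [0.25 0.5 0.75];             % illustrative probabilities
l = log (p ./ (1 - p));          % logit: logarithm of the odds p / (1-p)
p_back = 1 ./ (1 + exp (-l));    % the logistic function recovers p
```

The logit is 0 at p = 0.5 and antisymmetric about that point, so l is [-log(3) 0 log(3)] here.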

— Function File: kendall (x, y)

Compute Kendall's tau for each of the variables specified by the input arguments.

For matrices, each row is an observation and each column a variable; vectors are always observations and may be row or column vectors.

kendall (x) is equivalent to kendall (x, x).

For two data vectors x, y of common length n, Kendall's tau is the correlation of the signs of all rank differences of x and y; i.e., if both x and y have distinct entries, then

                   1
          tau = -------   SUM sign (q(i) - q(j)) * sign (r(i) - r(j))
                n (n-1)   i,j
     

in which the q(i) and r(i) are the ranks of x and y, respectively.

If x and y are drawn from independent distributions, Kendall's tau is asymptotically normal with mean 0 and variance (2 * (2n+5)) / (9 * n * (n-1)).
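
The double sum in the formula for tau can be evaluated directly. This is a minimal sketch with illustrative data in which all entries are distinct. In that case the sign of a rank difference equals the sign of the corresponding value difference, so the loop uses the values themselves.

```octave
x = [1 4 2 3 5];                 % illustrative data, all entries distinct
y = [2 5 1 4 3];
n = numel (x);

acc = 0;
for i = 1:n
  for j = 1:n
    % +1 when the pair (i, j) is ordered the same way in x and y, -1 otherwise
    acc = acc + sign (x(i) - x(j)) * sign (y(i) - y(j));
  endfor
endfor
tau = acc / (n * (n - 1));
```

Of the 10 unordered pairs, 7 agree in order and 3 disagree. Each pair is counted twice in the double sum, so tau = 2 * (7 - 3) / 20 = 0.4.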

— Function File: iqr (x, dim)

If x is a vector, return the interquartile range, i.e., the difference between the upper and lower quartile, of the input data.

If x is a matrix, do the above along the first non-singleton dimension of x. If the optional argument dim is given, operate along this dimension.

— Function File: cut (x, breaks)

Create categorical data out of numerical or continuous data by cutting into intervals.

If breaks is a scalar, the data is cut into that many equal-width intervals. If breaks is a vector of break points, the data is cut into length (breaks) - 1 groups.

The returned value is a vector of the same size as x telling which group each point in x belongs to. Groups are labelled from 1 to the number of groups; points outside the range of breaks are labelled by NaN.
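
Grouping by a vector of break points can be sketched as follows. Which end of each interval is closed is not stated above, so the half-open intervals used here are an assumption, and the illustrative data avoid the break points themselves.

```octave
x = [0.5 1.5 2.5 3.5 9];         % illustrative data, none on a break point
breaks = [1 2 3 4];              % 4 break points give 3 groups

group = NaN (size (x));          % points outside the breaks keep the label NaN
for k = 1:(numel (breaks) - 1)
  in_k = (x > breaks(k)) & (x <= breaks(k + 1));   % assumed interval convention
  group(in_k) = k;
endfor
```

The result is [NaN 1 2 3 NaN]: the first and last points lie outside the range of breaks.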

— Function File: cor (x, y)

The (i, j)-th entry of cor (x, y) is the correlation between the i-th variable in x and the j-th variable in y.

For matrices, each row is an observation and each column a variable; vectors are always observations and may be row or column vectors.

cor (x) is equivalent to cor (x, x).

— Function File: cloglog (x)

Return the complementary log-log function of x, defined as

          - log (- log (x))
     

— Function File: center (x)
— Function File: center (x, dim)

If x is a vector, subtract its mean. If x is a matrix, do the above for each column. If the optional argument dim is given, perform the above operation along this dimension.
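
For a matrix, centering each column can be sketched with elementary operations. The data are illustrative, and repmat is used only to make the column-wise subtraction explicit.

```octave
x = [1 10; 3 20];                    % rows are observations, columns are variables
m = mean (x);                        % row vector of column means
c = x - repmat (m, rows (x), 1);     % subtract each column's mean from that column
```

The column means are [2 15], so c is [-1 -5; 1 5] and every column of c has mean 0.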