
### 25.1 Basic Statistical Functions

— Function File: mean (x, dim, opt)

If x is a vector, compute the mean of the elements of x:

```
mean (x) = SUM_i x(i) / N
```

If x is a matrix, compute the mean for each column and return them in a row vector.

With the optional argument opt, the kind of mean computed can be selected. The following options are recognized:

`"a"`
Compute the (ordinary) arithmetic mean. This is the default.
`"g"`
Compute the geometric mean.
`"h"`
Compute the harmonic mean.

If the optional argument dim is supplied, work along dimension dim.

Both dim and opt are optional. If both are supplied, either may appear first.
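The three kinds of mean can be restated directly in Octave. The sketch below computes them by hand rather than calling `mean` with `"g"` or `"h"`, assuming the usual definitions (geometric mean as the N-th root of the product, harmonic mean as N divided by the sum of reciprocals); the sample vector is arbitrary.

```octave
% Hand-computed versions of the means selected by opt (a sketch, not
% the library implementation). The geometric mean needs positive data.
x = [1 2 4 8];
N = numel (x);

am = sum (x) / N;              % arithmetic mean, opt = "a"
gm = exp (sum (log (x)) / N);  % geometric mean, opt = "g": N-th root of prod (x)
hm = N / sum (1 ./ x);         % harmonic mean, opt = "h"

printf ("arithmetic %g, geometric %g, harmonic %g\n", am, gm, hm);
```

For positive data the three always satisfy arithmetic >= geometric >= harmonic, which is a quick sanity check on any implementation.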

— Function File: median (x)

If x is a vector, compute the median value of the elements of x. With the N elements of x sorted in ascending order,

```
             x(ceil(N/2)),             N odd
median (x) =
             (x(N/2) + x((N/2)+1))/2,  N even
```

If x is a matrix, compute the median value for each column and return them in a row vector.
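The odd/even cases of the median formula can be checked with a short loop. This is a sketch that sorts each vector first, as the formula assumes, and then applies the matching case; the two sample vectors are arbitrary.

```octave
% Apply the median formula to one odd-length and one even-length vector.
data = {[3 1 4 1 5], [3 1 4 1]};
m = zeros (1, numel (data));
for k = 1:numel (data)
  s = sort (data{k});                  % the formula assumes ascending order
  N = numel (s);
  if (mod (N, 2) == 1)
    m(k) = s(ceil (N/2));              % N odd: the middle element
  else
    m(k) = (s(N/2) + s(N/2 + 1)) / 2;  % N even: mean of the two middle elements
  endif
endfor
```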

— Function File: std (x)
— Function File: std (x, opt)
— Function File: std (x, opt, dim)

If x is a vector, compute the standard deviation of the elements of x.

```
std (x) = sqrt (sumsq (x - mean (x)) / (N - 1))
```

If x is a matrix, compute the standard deviation for each column and return them in a row vector.

The argument opt determines the type of normalization to use. Valid values are:

0:
normalize with N-1; this provides the square root of the best unbiased estimator of the variance [default]
1:
normalize with N; this provides the square root of the second moment around the mean

The third argument dim determines the dimension along which the standard deviation is calculated.
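The two normalizations differ only in the divisor applied to the sum of squared deviations. A minimal sketch computing both by hand for a small sample (chosen so that the sum of squared deviations is 32):

```octave
x = [2 4 4 4 5 5 7 9];
N = numel (x);
ss = sumsq (x - mean (x));   % sum of squared deviations from the mean

s0 = sqrt (ss / (N - 1));    % opt = 0 (default): divide by N-1
s1 = sqrt (ss / N);          % opt = 1: divide by N
```

Here `s1` is exactly 2, while `s0` is slightly larger because N-1 < N.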

— Function File: cov (x, y)

If each row of x and y is an observation and each column is a variable, the (i, j)-th entry of `cov (x, y)` is the covariance between the i-th variable in x and the j-th variable in y. If called with one argument, compute `cov (x, x)`.
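The column-by-column definition can be written as one matrix product of centered data. In this sketch `covxy` is a hypothetical helper (not part of Octave), and the N-1 divisor is assumed to match the default normalization used elsewhere in this section.

```octave
% Hypothetical helper: covariance of the columns of x with the columns of y,
% normalized by (number of observations - 1). Uses automatic broadcasting
% to subtract the column means.
covxy = @(x, y) (x - mean (x))' * (y - mean (y)) / (rows (x) - 1);

x = [1 2; 2 4; 3 9];   % 3 observations (rows) of 2 variables (columns)
c = covxy (x, x);      % the one-argument case, cov (x, x)
```

Entry `c(1,2)` is the covariance between the first and second columns; the diagonal holds each column's variance.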

— Function File: corrcoef (x, y)

If each row of x and y is an observation and each column is a variable, the (i, j)-th entry of `corrcoef (x, y)` is the correlation between the i-th variable in x and the j-th variable in y. If called with one argument, compute `corrcoef (x, x)`.
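A correlation is a covariance rescaled by the two standard deviations. The sketch below builds the correlation matrix of one data matrix that way; it illustrates the definition and is not the `corrcoef` source.

```octave
x = [1 2; 2 4; 3 9];
xc = x - mean (x);               % center each column
c = xc' * xc / (rows (x) - 1);   % covariance matrix
s = sqrt (diag (c));             % standard deviation of each column
r = c ./ (s * s');               % r(i,j) = c(i,j) / (s(i) * s(j))
```

The diagonal of `r` is 1, and every entry lies in [-1, 1].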

— Function File: kurtosis (x, dim)

If x is a vector of length N, return the kurtosis

```
kurtosis (x) = N^(-1) std(x)^(-4) sum ((x - mean(x)).^4) - 3
```

of x. If x is a matrix, return the kurtosis along the first non-singleton dimension. If the optional argument dim is given, compute the kurtosis along that dimension instead.
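The formula above can be transcribed term by term. This sketch follows the displayed definition exactly (default `std` with N-1, minus 3); because other Octave versions may define `kurtosis` with a different normalization, it deliberately does not compare against the built-in.

```octave
x = [2 4 4 4 5 5 7 9];
N = numel (x);
% N^(-1) * std(x)^(-4) * sum ((x - mean(x)).^4) - 3, as displayed above
k = sum ((x - mean (x)).^4) / (N * std (x)^4) - 3;
```

For this sample the fourth-power deviations sum to 356 and `std (x)^2` is 32/7, so `k` is negative (flatter than a normal distribution).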

— Function File: mahalanobis (x, y)

Return the Mahalanobis D-squared distance between the multivariate samples x and y, which must have the same number of components (columns) but may have different numbers of observations (rows).
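The text does not spell out the formula, so the sketch below assumes the common two-sample definition: the difference of the sample means measured against the pooled within-sample covariance. Treat it as an illustration of that assumption, not as the `mahalanobis` source.

```octave
x = [1 0; 0 1; 2 1; 1 2];   % sample 1: 4 observations, 2 components
y = [3 0; 2 1; 4 1; 3 2];   % sample 2: same number of components
nx = rows (x);  ny = rows (y);

d  = mean (x) - mean (y);                   % difference of sample means
xc = x - mean (x);  yc = y - mean (y);      % center each sample
w  = (xc' * xc + yc' * yc) / (nx + ny - 2); % pooled covariance (assumed)
D2 = d / w * d';                            % d * inv (w) * d'
```

Using `d / w` instead of `inv (w)` solves the linear system directly, which is the usual Octave idiom.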

— Function File: skewness (x, dim)

If x is a vector of length N, return the skewness

```
skewness (x) = N^(-1) std(x)^(-3) sum ((x - mean(x)).^3)
```

of x. If x is a matrix, return the skewness along the first non-singleton dimension of the matrix. If the optional dim argument is given, operate along this dimension.
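A direct transcription of the skewness formula, using the default `std` (N-1 normalization) exactly as displayed. The sample has one large value on the right, so the result should be positive.

```octave
x = [1 2 3 10];
N = numel (x);
% N^(-1) * std(x)^(-3) * sum ((x - mean(x)).^3), as displayed above
sk = sum ((x - mean (x)).^3) / (N * std (x)^3);
```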

— Function File: values (x)

Return the distinct values of x in a column vector, arranged in ascending order.

— Function File: var (x)
— Function File: var (x, opt)
— Function File: var (x, opt, dim)

For vector arguments, return the (real) variance of the values. For matrix arguments, return a row vector containing the variance for each column.

The argument opt determines the type of normalization to use. Valid values are:

0:
normalize with N-1; this provides the best unbiased estimator of the variance [default]
1:
normalize with N; this provides the second moment around the mean

The third argument dim determines the dimension along which the variance is calculated.
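The variance uses the same two divisors as `std`, without the square root. A minimal sketch computing both normalizations by hand:

```octave
x = [2 4 4 4 5 5 7 9];
N = numel (x);
ss = sumsq (x - mean (x));   % sum of squared deviations (32 for this sample)

v0 = ss / (N - 1);           % opt = 0 (default): unbiased estimator
v1 = ss / N;                 % opt = 1: second moment around the mean
```

Taking `sqrt` of either value gives the corresponding `std` result.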

— Function File: [t, l_x] = table (x)
— Function File: [t, l_x, l_y] = table (x, y)

Create a contingency table t from data vectors. The vectors l_x and l_y are the corresponding levels.

Currently, only 1- and 2-dimensional tables are supported.
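A two-dimensional contingency table counts how often each pair of levels occurs together. The sketch below reproduces that idea with `unique` and `accumarray`; it is an illustration of the concept, not the `table` implementation, and its output names only mirror `t`, `l_x`, `l_y`.

```octave
x = [1 1 2 2];
y = [1 2 2 2];
[lx, ~, ix] = unique (x(:));   % levels of x and the level index of each point
[ly, ~, iy] = unique (y(:));   % levels of y and the level index of each point
% t(i,j) counts observations with x == lx(i) and y == ly(j)
t = accumarray ([ix(:), iy(:)], 1, [numel(lx), numel(ly)]);
```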

— Function File: studentize (x, dim)

If x is a vector, subtract its mean and divide by its standard deviation.

If x is a matrix, do the above along the first non-singleton dimension. If the optional argument dim is given, operate along this dimension.

— Function File: statistics (x)

If x is a matrix, return a matrix with the minimum, first quartile, median, third quartile, maximum, mean, standard deviation, skewness and kurtosis of the columns of x as its rows.

If x is a vector, treat it as a column vector.

— Function File: spearman (x, y)

Compute Spearman's rank correlation coefficient rho for each of the variables specified by the input arguments.

For matrices, each row is an observation and each column a variable; vectors are always observations and may be row or column vectors.

`spearman (x)` is equivalent to `spearman (x, x)`.

For two data vectors x and y, Spearman's rho is the correlation of the ranks of x and y.

If x and y are drawn from independent distributions, rho has zero mean and variance `1 / (n - 1)`, where n is the number of observations, and is asymptotically normally distributed.
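Since rho is the correlation of the ranks, it can be computed in two steps: rank each vector, then take the ordinary correlation of the ranks. This sketch assumes there are no ties, so a rank is simply a position in sorted order.

```octave
x = [10 20 30 40 50];
y = [1 3 2 5 4];
n = numel (x);

% Rank each vector (assumes no ties): rx(i) is the sorted position of x(i).
rx = zeros (1, n);  ry = zeros (1, n);
[~, ix] = sort (x);  rx(ix) = 1:n;
[~, iy] = sort (y);  ry(iy) = 1:n;

% Ordinary (Pearson) correlation of the ranks.
dx = rx - mean (rx);  dy = ry - mean (ry);
rho = sum (dx .* dy) / sqrt (sumsq (dx) * sumsq (dy));
```

With ties, the ranks would need to be averaged (as `ranks` does) before correlating.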

— Function File: run_count (x, n)
— Function File: run_count (x, n, dim)

Count the upward runs of length 1, 2, ..., n-1 and of length greater than or equal to n along the first non-singleton dimension of x. If the optional argument dim is given, operate along this dimension.

— Function File: ranks (x, dim)

If x is a vector, return the (column) vector of ranks of x adjusted for ties.

If x is a matrix, do the above along the first non-singleton dimension. If the optional argument dim is given, operate along this dimension.
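"Adjusted for ties" means that tied values share the average of the ranks they occupy. A compact sketch of that rule for a column vector (an illustration, not the `ranks` source):

```octave
x = [10; 20; 20; 30];
% rank(v) = (number of values smaller than v) + (number of copies of v + 1) / 2
r = arrayfun (@(v) sum (x < v) + (sum (x == v) + 1) / 2, x);
```

The two 20s occupy ranks 2 and 3, so each receives 2.5. This approach compares every pair of elements, which is fine for illustration but quadratic in the length of x.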

— Function File: range (x)
— Function File: range (x, dim)

If x is a vector, return the range, i.e., the difference between the maximum and the minimum, of the input data.

If x is a matrix, do the above for each column of x.

If the optional argument dim is supplied, work along dimension dim.

— Function File: [q, s] = qqplot (x, dist, params)

Perform a QQ-plot (quantile plot).

If F is the CDF of the distribution dist with parameters params and G its inverse, and x is a sample vector of length n, the QQ-plot graphs the ordinate s(i) = i-th smallest element of x against the abscissa q(i) = G((i - 0.5)/n).

If the sample comes from F except for a transformation of location and scale, the pairs will approximately follow a straight line.

The default for dist is the standard normal distribution. The optional argument params contains a list of parameters of dist. For example, for a quantile plot of the uniform distribution on [2,4] and x, use

```
qqplot (x, "uniform", 2, 4)
```

If no output arguments are given, the data are plotted directly.
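The two coordinate vectors of a normal QQ-plot can be computed without plotting. This sketch assumes the standard normal case and writes its inverse CDF G with the core function `erfinv`, as G(p) = sqrt (2) * erfinv (2p - 1); the sample values are arbitrary.

```octave
x = [2.1 -0.4 0.9 -1.3];
n = numel (x);
s = sort (x(:));                  % ordinates: order statistics of the sample
p = ((1:n)' - 0.5) / n;           % plotting positions (i - 0.5)/n
q = sqrt (2) * erfinv (2*p - 1);  % abscissae: standard normal quantiles G(p)
% plot (q, s, "o");               % uncomment to draw the QQ-plot
```

Because the plotting positions are symmetric about 0.5, the normal quantiles `q` are symmetric about zero.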

— Function File: probit (p)

For each component of p, return the probit (the quantile of the standard normal distribution) of p.
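The standard normal quantile can be expressed with the error function, which gives a simple way to check probit values by hand. The sketch below uses the identity probit (p) = sqrt (2) * erfinv (2p - 1).

```octave
p = [0.5 0.975];
z = sqrt (2) * erfinv (2*p - 1);   % standard normal quantiles of p
```

`z(1)` is 0 (the median of the standard normal), and `z(2)` is the familiar two-sided 95% critical value, about 1.96.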

— Function File: [p, y] = ppplot (x, dist, params)

Perform a PP-plot (probability plot).

If F is the CDF of the distribution dist with parameters params and x is a sample vector of length n, the PP-plot graphs the ordinate y(i) = F (i-th smallest element of x) against the abscissa p(i) = (i - 0.5)/n. If the sample comes from F, the pairs will approximately follow a straight line.

The default for dist is the standard normal distribution. The optional argument params contains a list of parameters of dist. For example, for a probability plot of the uniform distribution on [2,4] and x, use

```
ppplot (x, "uniform", 2, 4)
```

If no output arguments are given, the data are plotted directly.
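The PP-plot coordinates follow directly from the definition above. This sketch assumes the standard normal distribution and writes its CDF with the core function `erfc`; `F` is a local anonymous function, not a library name.

```octave
x = [2.1 -0.4 0.9 -1.3];
n = numel (x);
p = ((1:n)' - 0.5) / n;              % abscissae (i - 0.5)/n
F = @(t) 0.5 * erfc (-t / sqrt (2)); % standard normal CDF
y = F (sort (x(:)));                 % ordinates: F (i-th smallest element of x)
% plot (p, y, "o");                  % uncomment to draw the PP-plot
```

Both coordinates lie in (0, 1), so unlike a QQ-plot the axes never need rescaling.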

— Function File: moment (x, p, opt, dim)

If x is a vector, compute the p-th moment of x.

If x is a matrix, return the row vector containing the p-th moment of each column.

With the optional string opt, the kind of moment to be computed can be specified. If opt contains `"c"` or `"a"`, central and/or absolute moments are returned. For example,

```
moment (x, 3, "ac")
```

computes the third central absolute moment of x.

If the optional argument dim is supplied, work along dimension dim.
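The four combinations of `"c"` and `"a"` differ only in whether the mean is subtracted and whether absolute values are taken. The sketch below assumes each moment is normalized by N (that is, computed with `mean`), which the text does not state explicitly.

```octave
x = [1 2 3 10];
p = 3;
raw      = mean (x.^p);                   % p-th moment
central  = mean ((x - mean (x)).^p);      % opt = "c"
absolute = mean (abs (x).^p);             % opt = "a"
cabs     = mean (abs (x - mean (x)).^p);  % opt = "ac", as in the example above
```

Since this sample is positive, the raw and absolute moments coincide; the central absolute moment exceeds the central one because the negative deviations no longer cancel.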

— Function File: meansq (x)
— Function File: meansq (x, dim)

For vector arguments, return the mean square of the values. For matrix arguments, return a row vector containing the mean square of each column. With the optional dim argument, return the mean square of the values along this dimension.

— Function File: logit (p)

For each component of p, return the logit `log (p / (1-p))` of p.
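The logit maps probabilities in (0, 1) onto the whole real line, and the logistic function 1 / (1 + exp (-l)) maps them back. A short round-trip sketch:

```octave
p = [0.5 0.75];
l = log (p ./ (1 - p));      % logit, element by element
back = 1 ./ (1 + exp (-l));  % logistic function: inverse of the logit
```

`l(1)` is 0 and `l(2)` is `log (3)`, since the odds at 0.75 are 3 to 1.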

— Function File: kendall (x, y)

Compute Kendall's tau for each of the variables specified by the input arguments.

For matrices, each row is an observation and each column a variable; vectors are always observations and may be row or column vectors.

`kendall (x)` is equivalent to `kendall (x, x)`.

For two data vectors x, y of common length n, Kendall's tau is the correlation of the signs of all rank differences of x and y; i.e., if both x and y have distinct entries, then

```
                 1
tau = -------   SUM sign (q(i) - q(j)) * sign (r(i) - r(j))
      n (n-1)   i,j
```

in which the q(i) and r(i) are the ranks of x and y, respectively.

If x and y are drawn from independent distributions, Kendall's tau is asymptotically normal with mean 0 and variance `(2 * (2*n + 5)) / (9 * n * (n - 1))`.
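The double sum can be evaluated in one step by building the matrix of pairwise sign products with broadcasting. This sketch assumes distinct entries, as the formula does; in that case the signs of value differences equal the signs of rank differences, so the data need not be ranked first.

```octave
x = [1 2 3 4];
y = [1 3 2 4];
n = numel (x);
q = x(:);  r = y(:);   % column vectors; distinct entries assumed
% S(i,j) = sign (q(i) - q(j)) * sign (r(i) - r(j)), built by broadcasting
S = sign (q - q') .* sign (r - r');
tau = sum (S(:)) / (n * (n - 1));

% Null variance of tau from the asymptotic result above.
v = (2 * (2*n + 5)) / (9 * n * (n - 1));
```

Five of the six unordered pairs agree in order and one disagrees, giving tau = (5 - 1) / 6 = 2/3.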

— Function File: iqr (x, dim)

If x is a vector, return the interquartile range, i.e., the difference between the upper and lower quartile, of the input data.

If x is a matrix, do the above along the first non-singleton dimension of x. If the optional argument dim is given, operate along this dimension.

— Function File: cut (x, breaks)

Create categorical data out of numerical or continuous data by cutting into intervals.

If breaks is a scalar, the data is cut into that many equal-width intervals. If breaks is a vector of break points, the category has `length (breaks) - 1` groups.

The returned value is a vector of the same size as x telling which group each point in x belongs to. Groups are labelled from 1 to the number of groups; points outside the range of breaks are labelled by `NaN`.

— Function File: cor (x, y)

The (i, j)-th entry of `cor (x, y)` is the correlation between the i-th variable in x and the j-th variable in y.

For matrices, each row is an observation and each column a variable; vectors are always observations and may be row or column vectors.

`cor (x)` is equivalent to `cor (x, x)`.

— Function File: cloglog (x)

Return the complementary log-log function of x, defined as

```
- log (- log (x))
```

— Function File: center (x)
— Function File: center (x, dim)

If x is a vector, subtract its mean. If x is a matrix, do the above for each column. If the optional argument dim is given, perform the above operation along this dimension.