combine - Spectral combination utilities

Optimal combination in presence of outliers.

See also:

Requirements:

outlier_clipping(x, dx[, method, clip, …]) Iterative outlier-clipping of x, dx along last axis.
inspec.combine.sample_mean(x, dx)[source]

Standard (unweighted) mean and associated error along last axis.

inspec.combine.sample_weighted_mean(x, dx, intrinsic=0)[source]

Inverse-variance weighted mean (including intrinsic dispersion if any) and associated error along last axis.

inspec.combine.sample_median(x, dx)[source]

Median and associated error along last axis.

Reference: Kendall & Stuart (1977), via SWarp v2.21 User’s Guide

inspec.combine.weights_AS(x, dx, mu)[source]
inspec.combine.mean_AS(x, dx=None, mu=None, itermax=0, fracmax=0.1, verbose=False)[source]

Robust mean estimate using Artifical Skepticism (Stetson 1989).

  • Initial mu defaults to median estimate.
  • If itermax > 0 or fracmax > 0, use iterative Expectation-Maximization algorithm up to convergence (step < fracmax × estimated error) or maximum iterations
  • Error on output mu neglect the covariance between weighted points (through previous estimate of mean).

Warning

native support for 1D-arrays only; multi-dimensional array support through sub-optimal index trickery.

inspec.combine.zscore(x, dx=None, modified=False, intrinsic=0)[source]

Compute the z-score of observations x along last axis.

inspec.combine.pull(x, dx, fixed_mean=None, fixed_mean_error=0)[source]

Pull distribution from x, dx along last axis.

  • Input data: x = \(x_i\), associated error dx = \(s_i\)
  • Optimal (variance-weighted) mean: \(X = (\sum_i x_i/s_i^2) / (\sum_i 1/s_i^2)\)
  • Variance on weighted mean: \(\sigma_X^2 = 1/(\sum_i 1/s_i^2)\)
  • Pull: \(p_i = (x_i - X_i(x))/\sqrt{\sigma_{X_i}^2 + s_i^2}\) where \(X_i\) and \(\sigma_{X_i}^2\) are computed without point i.

If errors dx are correct, the pull distribution is centered on 0 with standard deviation of 1.

Pull is related to z-score \(z_i = (x_i - \bar{X_i})/s_i\) with \(\bar{X_i}\) and \(s_i\) being the sample mean and sample standard deviation, respectively.

inspec.combine.outlier_clipping(x, dx, method='pull', clip=5, side=0, maxiter=None, verbose=False, nmin=2)[source]

Iterative outlier-clipping of x, dx along last axis.

Availaible methods:

  • sigma: unweighted sigma-clipping
  • weighted: weighted sigma-clipping
  • robust: nMAD-clipping
  • pull: pull-clipping

Clipping is symmetric (two-tailed) if side = 0, or asymmetric (lower or upper one-tailed) if side = ±1.

Warning

N.ma.amax(keepdims=True) is buggy in numpy v1.11 (https://github.com/numpy/numpy/issues/7720). Use explicit N.ma.expand_dims.

Warning

if input x is a masked array, the mask will be updated with detected outliers. Provide a copy if you don’t want it to be modified.

inspec.combine.grubbs_gmax(n, alpha=0.05, one_sided=False)[source]

Critical value for Grubbs’ test for outliers.

inspec.combine.grubbs_test(x, dx=None, side=0, alpha=0.05, modified=False, verbose=False)[source]

Grubbs’ test for outliers (Grubbs 1969 and Stefansky 1972, also known as maximum normed residual test) is used to detect a single outlier in a univariate data set that follows an approximately normal distribution.

>>> t = N.array([199.31, 199.53, 200.19, 200.82,
                 201.92, 201.95, 202.18, 245.57])
>>> grubbs_test(t, side=+1, alpha=0.05, modified=False, verbose=True)
H0:  there are no outliers in the data
Ha:  the maximum value is an outlier
<BLANKLINE>
Test statistic:      G = [ 2.46876461]
Significance level:  α = 0.05
Critical value for an upper one-tailed test: G_max = 2.03165200155
Critical region:     Reject H0 if G > G_max
array([False, False, False, False, False, False, False,  True])

Source: NIST/SEMATECH e-Handbook of Statistical Methods

Warning

Because of potential screening effects (multiple outliers weaken the single-outlier test performance), it is not appropriate to apply a single-outlier test sequentially in order to detect multiple outliers.

inspec.combine.chauvenet_dmax(n, p=0.5, one_sided=False)[source]

Critical value for Chauvenet criterion for outliers.

inspec.combine.chauvenet_criterion(x, dx=None, side=0, p=0.5, modified=False, verbose=False)[source]

Chauvenet criterion to identify outliers in a univariate data set that follows an approximately normal distribution.

>>> t = N.array([199.31, 199.53, 200.19, 200.82,
                 201.92, 201.95, 202.18, 245.57])
>>> chauvenet_criterion(t, side=0, p=0.5, modified=False, verbose=True)
H0:  there are no outliers in the data
Ha:  the extremal value are outliers
<BLANKLINE>
Test statistic:         D = [0.45  0.44  0.39  0.35  0.28  0.28  0.27  2.47]
Rejection probability:  P = 0.5
Critical value for a two-tailed test: D_max = 2.15387469406
Critical region:        Reject H0 if D > D_max
array([False, False, False, False, False, False, False,  True])
inspec.combine.detection_rates(outliers, identified, rates=('TPR', 'TNR', 'MCC'), verbose=True)[source]

Compute various success and failure rates by comparing true outliers to identified ones.

Source: https://en.wikipedia.org/wiki/Confusion_matrix

inspec.combine.plot_pull(x, dx, fixed_mean=None, ax=None, **kwargs)[source]

Pull histogram.

inspec.combine.probplot(x, dist='norm', sparams=(), fit=True, ax=None, **kwargs)[source]

Probability plot (observed vs. expected quantiles).

Note

Just a wrapper to scipy.stats.probplot.

inspec.combine.plot_rv(rv, eps=0.001, ax=None, **kwargs)[source]

Plot (frozen) random variable PDF.

Plot (x, PDF(x)) for (frozen) random variable ‘rv’, in PPF range (eps, 1 - eps).

inspec.combine.plot_samples(x, dx=None, outliers=None, identified=None)[source]

Plot sample points, along with marginalized histogram.

Flag outliers in red if any.

inspec.combine.plot_comparison(x, y, dx=None, dy=None, xlabel='x', ylabel='y')[source]