combine - Spectral combination utilities¶
Optimal combination in presence of outliers.
See also:
Requirements:
- python >= 3.6 (f-string)
- numpy >= v1.12 (https://github.com/numpy/numpy/issues/7720)
- matplotllib >= 2.2 (hist density kwarg)
outlier_clipping(x, dx[, method, clip, …]) |
Iterative outlier-clipping of x, dx along last axis. |
-
inspec.combine.sample_mean(x, dx)[source]¶ Standard (unweighted) mean and associated error along last axis.
-
inspec.combine.sample_weighted_mean(x, dx, intrinsic=0)[source]¶ Inverse-variance weighted mean (including intrinsic dispersion if any) and associated error along last axis.
-
inspec.combine.sample_median(x, dx)[source]¶ Median and associated error along last axis.
Reference: Kendall & Stuart (1977), via SWarp v2.21 User’s Guide
-
inspec.combine.mean_AS(x, dx=None, mu=None, itermax=0, fracmax=0.1, verbose=False)[source]¶ Robust mean estimate using Artifical Skepticism (Stetson 1989).
- Initial
mudefaults to median estimate. - If
itermax > 0orfracmax > 0, use iterative Expectation-Maximization algorithm up to convergence (step < fracmax × estimated error) or maximum iterations - Error on output
muneglect the covariance between weighted points (through previous estimate of mean).
Warning
native support for 1D-arrays only; multi-dimensional array support through sub-optimal index trickery.
- Initial
-
inspec.combine.zscore(x, dx=None, modified=False, intrinsic=0)[source]¶ Compute the z-score of observations x along last axis.
-
inspec.combine.pull(x, dx, fixed_mean=None, fixed_mean_error=0)[source]¶ Pull distribution from x, dx along last axis.
- Input data: x = \(x_i\), associated error dx = \(s_i\)
- Optimal (variance-weighted) mean: \(X = (\sum_i x_i/s_i^2) / (\sum_i 1/s_i^2)\)
- Variance on weighted mean: \(\sigma_X^2 = 1/(\sum_i 1/s_i^2)\)
- Pull: \(p_i = (x_i - X_i(x))/\sqrt{\sigma_{X_i}^2 + s_i^2}\) where \(X_i\) and \(\sigma_{X_i}^2\) are computed without point i.
If errors dx are correct, the pull distribution is centered on 0 with standard deviation of 1.
Pull is related to z-score \(z_i = (x_i - \bar{X_i})/s_i\) with \(\bar{X_i}\) and \(s_i\) being the sample mean and sample standard deviation, respectively.
-
inspec.combine.outlier_clipping(x, dx, method='pull', clip=5, side=0, maxiter=None, verbose=False, nmin=2)[source]¶ Iterative outlier-clipping of x, dx along last axis.
Availaible methods:
- sigma: unweighted sigma-clipping
- weighted: weighted sigma-clipping
- robust: nMAD-clipping
- pull: pull-clipping
Clipping is symmetric (two-tailed) if side = 0, or asymmetric (lower or upper one-tailed) if side = ±1.
Warning
N.ma.amax(keepdims=True)is buggy in numpy v1.11 (https://github.com/numpy/numpy/issues/7720). Use explicitN.ma.expand_dims.Warning
if input
xis a masked array, the mask will be updated with detected outliers. Provide a copy if you don’t want it to be modified.
-
inspec.combine.grubbs_gmax(n, alpha=0.05, one_sided=False)[source]¶ Critical value for Grubbs’ test for outliers.
-
inspec.combine.grubbs_test(x, dx=None, side=0, alpha=0.05, modified=False, verbose=False)[source]¶ Grubbs’ test for outliers (Grubbs 1969 and Stefansky 1972, also known as maximum normed residual test) is used to detect a single outlier in a univariate data set that follows an approximately normal distribution.
>>> t = N.array([199.31, 199.53, 200.19, 200.82, 201.92, 201.95, 202.18, 245.57]) >>> grubbs_test(t, side=+1, alpha=0.05, modified=False, verbose=True) H0: there are no outliers in the data Ha: the maximum value is an outlier <BLANKLINE> Test statistic: G = [ 2.46876461] Significance level: α = 0.05 Critical value for an upper one-tailed test: G_max = 2.03165200155 Critical region: Reject H0 if G > G_max array([False, False, False, False, False, False, False, True])
Source: NIST/SEMATECH e-Handbook of Statistical Methods
Warning
Because of potential screening effects (multiple outliers weaken the single-outlier test performance), it is not appropriate to apply a single-outlier test sequentially in order to detect multiple outliers.
-
inspec.combine.chauvenet_dmax(n, p=0.5, one_sided=False)[source]¶ Critical value for Chauvenet criterion for outliers.
-
inspec.combine.chauvenet_criterion(x, dx=None, side=0, p=0.5, modified=False, verbose=False)[source]¶ Chauvenet criterion to identify outliers in a univariate data set that follows an approximately normal distribution.
>>> t = N.array([199.31, 199.53, 200.19, 200.82, 201.92, 201.95, 202.18, 245.57]) >>> chauvenet_criterion(t, side=0, p=0.5, modified=False, verbose=True) H0: there are no outliers in the data Ha: the extremal value are outliers <BLANKLINE> Test statistic: D = [0.45 0.44 0.39 0.35 0.28 0.28 0.27 2.47] Rejection probability: P = 0.5 Critical value for a two-tailed test: D_max = 2.15387469406 Critical region: Reject H0 if D > D_max array([False, False, False, False, False, False, False, True])
-
inspec.combine.detection_rates(outliers, identified, rates=('TPR', 'TNR', 'MCC'), verbose=True)[source]¶ Compute various success and failure rates by comparing true outliers to identified ones.
-
inspec.combine.probplot(x, dist='norm', sparams=(), fit=True, ax=None, **kwargs)[source]¶ Probability plot (observed vs. expected quantiles).
Note
Just a wrapper to
scipy.stats.probplot.
-
inspec.combine.plot_rv(rv, eps=0.001, ax=None, **kwargs)[source]¶ Plot (frozen) random variable PDF.
Plot (x, PDF(x)) for (frozen) random variable ‘rv’, in PPF range (eps, 1 - eps).