statistics - Statistical utilities

RobustFitter([loss, optimizer])
fit_adaptive_polynom(x, y[, v, pmax, dmax, …]) Adaptive continuum fit.
fit_adaptive_emilines(x, y[, v, pmax, nmax, …]) Adaptive emission line fit.
fit_adaptive_spectrum(x, y[, v, pmax, dmax, …]) Adaptive continuum + emission line fit.
inspec.statistics.rms(arr, weights=None, axis=None)[source]

Root mean square (RMS) of arr along the given axis, weighted if weights is given.
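As an illustration only, a weighted RMS can be sketched as the square root of the weighted mean of squares. The helper name weighted_rms is hypothetical, and the definition is the usual textbook one, which is assumed here rather than taken from the inspec source.

```python
import numpy as np

def weighted_rms(arr, weights=None, axis=None):
    """Hypothetical helper: sqrt of the [weighted] mean of the squares."""
    arr = np.asarray(arr, dtype=float)
    # np.average falls back to a plain mean when weights is None
    return np.sqrt(np.average(arr ** 2, weights=weights, axis=axis))

rms_plain = weighted_rms([3.0, 4.0])                         # sqrt((9 + 16) / 2)
rms_weighted = weighted_rms([3.0, 4.0], weights=[1.0, 0.0])  # only the first value counts
```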

inspec.statistics.mad(a, axis=None)[source]

Median Absolute Deviation of an array along given axis.
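A minimal sketch of the median absolute deviation, median(|a - median(a)|), is given below. The name mad_sketch is hypothetical; whether the inspec function applies a normalization factor (such as 1.4826 for consistency with a Gaussian standard deviation) is not stated here, so none is assumed.

```python
import numpy as np

def mad_sketch(a, axis=None):
    """Hypothetical helper: unnormalized median absolute deviation."""
    a = np.asarray(a, dtype=float)
    # keepdims lets the median broadcast back against a
    med = np.median(a, axis=axis, keepdims=True)
    return np.median(np.abs(a - med), axis=axis)
```

For instance, mad_sketch([1, 2, 3, 4, 100]) is 1.0: the outlier at 100 has no effect on the result.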

inspec.statistics.sample_mean(x, vx=None, axis=None)[source]

Sample mean and associated error.

Compute [weighted] arithmetic mean and variance on mean of x along given axis. If variance vx is given, the computations are inverse variance-weighted.
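The two cases can be sketched as follows, assuming the standard estimators: the unweighted variance on the mean is s²/N, and the inverse variance-weighted mean has variance 1/Σ(1/vx). The name sample_mean_sketch and the (mean, variance on mean) return convention are assumptions made for illustration.

```python
import numpy as np

def sample_mean_sketch(x, vx=None, axis=None):
    """Hypothetical helper returning (mean, variance on the mean)."""
    x = np.asarray(x, dtype=float)
    if vx is None:
        n = x.size if axis is None else x.shape[axis]
        mean = x.mean(axis=axis)
        var_mean = x.var(axis=axis, ddof=1) / n   # s^2 / N
    else:
        w = 1.0 / np.asarray(vx, dtype=float)     # inverse-variance weights
        mean = np.sum(w * x, axis=axis) / np.sum(w, axis=axis)
        var_mean = 1.0 / np.sum(w, axis=axis)
    return mean, var_mean
```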

inspec.statistics.sample_std(x, axis=None, normal=False)[source]

Sample standard deviation and associated error.

Compute the standard deviation of x along the given axis, together with its standard error. The error is derived from the variance of the sample variance:

\[\begin{split}\mathrm{Var}(\sigma^2) &= \frac{1}{N} \left(m_4 - \frac{N-3}{N-1}\sigma^4\right) \\ &\simeq \frac{1}{N} \times 2\sigma^4 \quad\text{in the normal approximation}\end{split}\]

where m_4 is the fourth central moment of x.

See also: scipy.stats.bayes_mvs
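One way to turn the variance formula above into an error on the standard deviation is first-order error propagation, err(σ) ≈ sqrt(Var(σ²)) / (2σ). The sketch below assumes this propagation step and a 1D input; the name sample_std_sketch is hypothetical and the code is not the inspec implementation.

```python
import numpy as np

def sample_std_sketch(x, normal=False):
    """Hypothetical helper returning (std, standard error on std)."""
    x = np.asarray(x, dtype=float).ravel()
    n = x.size
    s2 = x.var(ddof=1)                      # unbiased variance estimate
    if normal:
        var_s2 = 2.0 * s2 ** 2 / n          # normal approximation
    else:
        m4 = np.mean((x - x.mean()) ** 4)   # fourth central moment
        var_s2 = (m4 - (n - 3) / (n - 1) * s2 ** 2) / n
    s = np.sqrt(s2)
    # First-order propagation from sigma^2 to sigma (assumption)
    return s, np.sqrt(var_s2) / (2.0 * s)
```

In the normal approximation this reduces to err(σ) ≈ σ / sqrt(2N).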

inspec.statistics.get_range(x, range=None, log=False, percentiles=False)[source]

Range utility.

Get the range (xmin, xmax) from x and range = (min, max) or None. If min is None, xmin is set to the minimum of x (or of the strictly positive values of x if log is set); if max is None, xmax is set to the maximum of x. If percentiles is set, min and max are interpreted as percentiles of x (in percent).

>>> import numpy as N
>>> from inspec.statistics import get_range
>>> get_range(N.linspace(0, 10, 101), range=(5, 95), percentiles=True)
(0.5, 9.5)
inspec.statistics.hist_binwidth(x, choice='FD', range=None, percentiles=False)[source]

Optimal histogram binwidth.

Choices are:

  • 'S': Scott's choice
  • 'FD': Freedman & Diaconis (1981), fast, fair if single-peaked [default]
  • 'SS': Shimazaki and Shinomoto (2007), slow, best choice if double-peaked
  • 'BR': Birgé and Rozenholc (2006), slow

Analysis is restricted to range=(min, max) if not None (full range by default).

See also: similar functions in Choosing Histogram Bins in Astropy v1.1.

Warning

Deprecated: use N.histogram_bin_edges (numpy.histogram_bin_edges) instead.
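For reference, the Freedman & Diaconis rule sets the bin width to 2·IQR / N^(1/3). The sketch below computes it by hand and compares it with numpy.histogram_bin_edges(x, bins='fd'); the helper fd_binwidth and the random test sample are illustrative assumptions only.

```python
import numpy as np

def fd_binwidth(x):
    """Hypothetical helper: Freedman-Diaconis bin width, 2 * IQR / N**(1/3)."""
    x = np.asarray(x, dtype=float)
    q75, q25 = np.percentile(x, [75, 25])
    return 2.0 * (q75 - q25) / x.size ** (1.0 / 3.0)

rng = np.random.default_rng(0)
x = rng.normal(size=1000)                     # illustrative sample

width = fd_binwidth(x)
edges = np.histogram_bin_edges(x, bins="fd")  # NumPy's built-in equivalent
step = edges[1] - edges[0]
```

NumPy rounds the number of bins up to an integer, so the actual spacing step is at most width.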

inspec.statistics.hist_nbin(x, choice='FD', range=None, percentiles=False)[source]

Optimal number of bins. See hist_binwidth() for details.

inspec.statistics.hist_bins(x, choice='FD', range=None, percentiles=False, log=False)[source]

Optimal binning. See hist_binwidth() for details.

inspec.statistics.savitzky_golay(data, kernel=11, order=4, derivative=0)[source]

Savitzky-Golay filter.

Parameters:
  • data – input numpy 1D-array
  • kernel – a positive odd integer > order+2 giving the kernel size
  • order – order of the polynomial
  • derivative – 0 for plain smoothing (default), 1 or 2 for the 1st or 2nd derivative
Returns:

smoothed data as a numpy array
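The principle of the filter can be sketched with NumPy alone: a polynomial of the given order is least-squares fitted in each sliding window, and the fitted value (or derivative) at the window center is kept. The function savgol_sketch and its edge handling (padding with the end values) are assumptions for illustration and may differ from the inspec implementation; scipy.signal.savgol_filter provides a comparable filter.

```python
import numpy as np
from math import factorial

def savgol_sketch(data, kernel=11, order=4, derivative=0):
    """Hypothetical Savitzky-Golay filter built from a local polynomial fit."""
    if kernel % 2 != 1 or kernel < order + 2:
        raise ValueError("kernel must be an odd integer >= order + 2")
    data = np.asarray(data, dtype=float)
    half = kernel // 2
    # Design matrix: powers 0..order of the window offsets -half..half
    offsets = np.arange(-half, half + 1)
    design = np.vander(offsets, order + 1, increasing=True)
    # Row d of the pseudo-inverse yields polynomial coefficient c_d;
    # the d-th derivative at the window center is d! * c_d
    coeffs = np.linalg.pinv(design)[derivative] * factorial(derivative)
    # Pad with the end values so the output has the same length as the input
    padded = np.pad(data, half, mode="edge")
    return np.correlate(padded, coeffs, mode="valid")
```

Away from the edges, any polynomial of degree at most order is reproduced exactly, which makes a convenient sanity check.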

inspec.statistics.p_better_fit(dchi2, ddof)[source]

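p-value associated with a chi2 improvement dchi2 obtained with ddof additional degrees of freedom. The more complex fit is considered better if p < pmax.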
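A plausible reading, assumed here and not confirmed by the inspec source, is that p is the survival function of a chi2 distribution with ddof degrees of freedom evaluated at dchi2, as in a likelihood-ratio test. The standard-library sketch below computes that quantity for integer ddof; the name chi2_sf is hypothetical, and scipy.stats.chi2.sf(dchi2, ddof) returns the same value.

```python
import math

def chi2_sf(dchi2, ddof):
    """Hypothetical helper: P(chi2_ddof > dchi2) for integer ddof >= 1."""
    z = 0.5 * dchi2
    # Closed forms for 1 and 2 degrees of freedom
    if ddof % 2:
        a, q = 0.5, math.erfc(math.sqrt(z))
    else:
        a, q = 1.0, math.exp(-z)
    # Recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
    while a < 0.5 * ddof:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q

pmax = 0.05
is_better = chi2_sf(10.0, 2) < pmax   # large improvement for 2 extra parameters
```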

class inspec.statistics.RobustFitter(loss='pseudo_huber', optimizer=<class 'astropy.modeling.optimizers.Simplex'>)[source]
classmethod get_loss(loss)[source]

loss='xxx' is either the name of a loss function (method loss_xxx) or a compound 'loss_neg/loss_pos' name.

classmethod loss_squared(a)[source]
classmethod loss_squaredtop(a, ascale=1.0)[source]
classmethod loss_biweight(a, ascale=1.0)[source]
classmethod loss_pseudo_huber(a, ascale=1.0)[source]
classmethod loss_cauchy(a, ascale=1.0)[source]
classmethod derivative(loss_fn, eps=1e-06)[source]
static residuals(measured_vals, updated_model, y_sigma, x)[source]
statistic(measured_vals, updated_model, y_sigma, x)[source]
classmethod plot_losses(amax=3, loss=None)[source]

loss can be a list of valid loss function names.
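To give an idea of how such robust losses behave, the sketch below uses common textbook forms of the squared, pseudo-Huber and Cauchy losses, scaled so that all three behave like a² for small residuals a. The exact normalizations used by RobustFitter are not documented here, so these definitions and the _sketch names are assumptions.

```python
import numpy as np

def loss_squared_sketch(a):
    """Standard least-squares loss."""
    return np.asarray(a, dtype=float) ** 2

def loss_pseudo_huber_sketch(a, ascale=1.0):
    """Quadratic for |a| << ascale, linear for |a| >> ascale."""
    a = np.asarray(a, dtype=float)
    return 2.0 * ascale ** 2 * (np.sqrt(1.0 + (a / ascale) ** 2) - 1.0)

def loss_cauchy_sketch(a, ascale=1.0):
    """Quadratic for |a| << ascale, logarithmic for |a| >> ascale."""
    a = np.asarray(a, dtype=float)
    return ascale ** 2 * np.log1p((a / ascale) ** 2)

a = np.array([0.001, 1.0, 100.0])
table = {
    "squared": loss_squared_sketch(a),
    "pseudo_huber": loss_pseudo_huber_sketch(a),
    "cauchy": loss_cauchy_sketch(a),
}
```

Outliers (large |a|) are penalized far less by the pseudo-Huber and Cauchy losses than by the squared loss, which is what makes the fit robust.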

inspec.statistics.fit_adaptive_polynom(x, y, v=None, pmax=0.05, dmax=3, loss='pseudo_huber', verbose=0)[source]

Adaptive continuum fit.

Fit polynomials of increasing order until a higher order no longer significantly improves the objective function (according to the likelihood-ratio test). Use the specified loss function, or standard (weighted) least-squares if loss is None.
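The adaptive procedure can be sketched with plain weighted least-squares, in the spirit of the non-robust case. Each step adds one polynomial coefficient, so the chi2 improvement is tested against a chi2 distribution with one degree of freedom. The function adaptive_polyfit_sketch, its return value (coefficients only) and the use of numpy.polyfit are assumptions made for illustration, not the inspec implementation.

```python
import math
import numpy as np

def adaptive_polyfit_sketch(x, y, v=None, pmax=0.05, dmax=3):
    """Hypothetical helper: raise the degree while the chi2 gain is significant."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    sigma = np.ones_like(y) if v is None else np.sqrt(np.asarray(v, dtype=float))

    def fit(deg):
        # np.polyfit expects weights 1/sigma (not 1/sigma**2)
        coeffs = np.polyfit(x, y, deg, w=1.0 / sigma)
        chi2 = np.sum(((y - np.polyval(coeffs, x)) / sigma) ** 2)
        return coeffs, chi2

    best, chi2 = fit(0)
    for deg in range(1, dmax + 1):
        trial, trial_chi2 = fit(deg)
        dchi2 = max(chi2 - trial_chi2, 0.0)
        # Survival function of a chi2 with 1 dof (one extra coefficient)
        p = math.erfc(math.sqrt(dchi2 / 2.0))
        if p >= pmax:
            break               # no significant improvement: keep previous fit
        best, chi2 = trial, trial_chi2
    return best                 # coefficients, highest power first

xs = np.linspace(-1.0, 1.0, 50)
ys = 1.0 + 2.0 * xs + 3.0 * xs ** 2          # noiseless quadratic
coeffs = adaptive_polyfit_sketch(xs, ys, v=np.full_like(xs, 0.01))
```

On this noiseless quadratic the loop stops at degree 2, since a cubic term brings no chi2 improvement.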

inspec.statistics.find_peak(y, sgfilter=(11, 4), dy=None)[source]

Find significant emission lines.

Locate the maximum of y, optionally after Savitzky-Golay filtering.

Return the position (index), the peak amplitude and an estimate of the stddev.

inspec.statistics.fit_adaptive_emilines(x, y, v=None, pmax=0.05, nmax=3, sgfilter=(11, 4), verbose=0)[source]

Adaptive emission line fit.

Fit an increasing number of emission lines until an additional line no longer significantly improves the chi2 (according to the likelihood-ratio test).

inspec.statistics.fit_adaptive_spectrum(x, y, v=None, pmax=0.05, dmax=3, nmax=3, verbose=0)[source]

Adaptive continuum + emission line fit.

Fit an adaptive polynomial continuum and emission lines until the fit no longer significantly improves the objective function (according to the likelihood-ratio test).

Return continuum, lines.