F-statistics

To compare the quality of fits obtained with different parameter values, the variance of the fit (chi-square) is a very useful statistical quantity. The ratio of the chi-square values of two fits follows a Fisher (F) distribution. This distribution can therefore be used to judge whether a given increase in variance (e.g. after changing a parameter value) lies within the range that could arise from statistical error in the data alone (assuming normally distributed noise), or whether the increase is significant.

Sedfit has an F-statistics calculator in the Options menu. The calculator is used in the examples on testing the significance of a contaminating species and on determining the confidence limit of an s-value.

The principle most commonly adopted when using F-statistics for parameter error analysis is to constrain one parameter (the one for which the error estimate is being calculated) while optimizing (floating) all others, so as to achieve the best fit under this one constraint. Because the first parameter is constrained away from its optimal value, the chi-square of the fit increases. F-statistics can predict the increase of the sum of squares that is associated with, for example, the one-standard-deviation contour of the parameters. This increase also depends on the number of data points and on the overall best-fit sum of squares. The procedure, as described in Bevington: Data Reduction and Error Analysis for the Physical Sciences and in Press et al.: Numerical Recipes in C, incorporates the correlation of the fit parameters into the error estimates and makes no assumptions about the shape of the error contour map. The F-statistics is implemented as described in Johnson & Straume (1).

A second use of the F-statistics is to decide whether the signal contribution from a species is significant.
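
Both uses come down to comparing an observed chi-square ratio with a critical value taken from the F distribution. The following Python sketch illustrates that comparison using only the standard library. It is not Sedfit's code: the function names (`f_cdf`, `f_quantile`, `critical_chi_square`) are hypothetical, and the choice of using the same number of degrees of freedom for both fits is an assumption made for illustration. The degrees-of-freedom convention used by the Sedfit calculator is not stated here.

```python
import math


def _beta_continued_fraction(a, b, x, max_iter=1000, eps=1e-13):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the recurrence.
        num = m * (b - m) * x / ((a - 1.0 + m2) * (a + m2))
        d = 1.0 + num * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + num / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the recurrence.
        num = -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))
        d = 1.0 + num * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + num / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularized_incomplete_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the symmetry relation where the continued fraction converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def f_cdf(f, dof1, dof2):
    """Cumulative F distribution with (dof1, dof2) degrees of freedom."""
    if f <= 0.0:
        return 0.0
    x = dof1 * f / (dof1 * f + dof2)
    return regularized_incomplete_beta(dof1 / 2.0, dof2 / 2.0, x)


def f_quantile(confidence, dof1, dof2):
    """Variance ratio below which a fraction `confidence` of the F distribution lies."""
    lo, hi = 0.0, 1.0
    while f_cdf(hi, dof1, dof2) < confidence:  # expand the bracket
        hi *= 2.0
    for _ in range(200):  # plain bisection
        mid = 0.5 * (lo + hi)
        if f_cdf(mid, dof1, dof2) < confidence:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def critical_chi_square(best_chi_square, confidence, dof):
    """Chi-square above which the increase is significant at `confidence`.

    Assumption: both fits are assigned the same degrees of freedom `dof`.
    """
    return best_chi_square * f_quantile(confidence, dof, dof)


if __name__ == "__main__":
    # Hypothetical numbers: best-fit chi-square 1.00, 2000 degrees of freedom.
    limit = critical_chi_square(1.00, 0.683, 2000)
    print(f"chi-square at the one-standard-deviation level: {limit:.4f}")
```

In a parameter error analysis along these lines, the constrained parameter would be stepped away from its best-fit value, with all other parameters floated, until the chi-square of the fit reaches the returned limit. For the species test, the chi-square of the fit without the species would be compared against the same kind of limit.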

Runs Test

The display shown after a simulation and during the fitting procedure includes the result of a runs test. It gives a measure, Z, of how random the residuals are, based on the sequence of the signs (positive or negative) of the residuals. The Z-value is the number of standard deviations by which the observed number of runs differs from the expectation value for normally distributed noise. For Z-values smaller than about 2 to 4, the residuals can be considered random; very large Z-values indicate non-random residuals with systematic deviations (long runs of positive or negative residuals).
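
As an illustration of how such a Z-value can be obtained from the signs of the residuals, here is a minimal Python sketch of the standard (Wald-Wolfowitz) runs test. It is not Sedfit's code: the function name `runs_test_z` is hypothetical, and returning the absolute value of Z and skipping residuals that are exactly zero are assumptions made for this example.

```python
import math


def runs_test_z(residuals):
    """Return |Z| of the runs test on the signs of the residuals.

    Assumptions: residuals that are exactly zero are skipped, and the
    magnitude of Z is returned regardless of whether there are too many
    or too few runs.
    """
    signs = [r > 0 for r in residuals if r != 0]
    n_pos = sum(signs)
    n_neg = len(signs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both positive and negative residuals")

    # A new run starts wherever the sign changes.
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)

    n = n_pos + n_neg
    expected = 2.0 * n_pos * n_neg / n + 1.0
    variance = (2.0 * n_pos * n_neg * (2.0 * n_pos * n_neg - n)
                / (n * n * (n - 1.0)))
    if variance <= 0.0:
        raise ValueError("too few residuals for a runs test")
    return abs(runs - expected) / math.sqrt(variance)


if __name__ == "__main__":
    # Hypothetical residuals: one long positive run followed by one long negative run.
    systematic = [0.01] * 10 + [-0.01] * 10
    print(f"Z = {runs_test_z(systematic):.2f}")
```

Because the test uses only the signs, it is insensitive to the magnitude of the residuals; it flags the ordering of the misfit, not its size.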

This feature is useful for assessing the statistical quality of the fit, in particular when the number of scans is so large that visual inspection of the residuals for randomness becomes difficult.

This test is implemented as described in M.L. Johnson, Methods Enzymol. 1992 (Numerical Methods, volume 210), p. 96.

References:

(1) M.L. Johnson and M. Straume (1994). Comments on the analysis of sedimentation equilibrium experiments. In: Modern Analytical Ultracentrifugation (T.M. Schuster and T.M. Laue, eds.), Birkhäuser, Boston, pp. 37-65.