Functional forms can be fitted to data points read from files by using the fit command. A simple example might be:
f(x) = a*x+b
fit f() 'data.dat' index 1 using 2:3 via a,b
The first line specifies the functional form to be used. The coefficients within this function that are to be varied during the fitting process are listed after the keyword via in the fit command. The modifiers index, every and using have the same meanings here as in the plot command. For example, given the following data file, entitled “square.dat”, which contains a sampled square wave:
0.314159 1
0.942478 1
1.570796 1
2.199115 1
2.827433 1
3.455752 -1
4.084070 -1
4.712389 -1
5.340708 -1
5.969026 -1
the following script fits a truncated Fourier series to it. The output can be found in Figure 2.3.
f(x) = a1*sin(x) + a3*sin(3*x) + a5*sin(5*x)
fit f() 'square.dat' via a1, a3, a5
set xlabel '$x$' ; set ylabel '$y$'
plot 'square.dat' title 'data' with points pointsize 2, \
     f(x) title 'Fitted function' with lines
This is useful for producing best-fit lines, and can also be used to estimate the gradients of datasets. The syntax is essentially identical to that used by Gnuplot, though a few points are worth noting:
When fitting a function of $n$ variables, at least $n+1$ columns (or rows; see Section 4.4) must be specified after the using modifier. By default, the first $n+1$ columns are used. These correspond to the values of each of the $n$ inputs to the function, followed by the value that the output from the function is aiming to match.
If an additional column is specified, then this is taken to contain the standard error in the value that the output from the function is aiming to match, and can be used to weight the data points which are input into the fit command.
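As a sketch of how these column rules might look in practice, the first fit below weights each point by a standard error read from a third column, and the second fits a function of two variables, which needs three columns. The file names 'errors.dat' and 'surface.dat', their column layouts, and the function g with its parameters p, q and r are all hypothetical, chosen only for illustration.

```pyxplot
# One input variable: columns 1 and 2 hold x and y; the additional
# column 3 is taken as the standard error on y and weights the fit.
f(x) = a*x+b
fit f() 'errors.dat' using 1:2:3 via a,b

# Two input variables: columns 1 and 2 hold x and y, and column 3
# holds the value that g(x,y) is aiming to match.
g(x,y) = p*x + q*y + r
fit g() 'surface.dat' using 1:2:3 via p,q,r
```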
By default, the starting value for each of the fitting parameters is $1.0$. However, if the variables to be used in the fitting process are already set before the fit command is called, these initial values are used instead. For example, the following would use the initial values $a=100$, $b=50$:
f(x) = a*x+b
a = 100
b = 50
fit f() 'data.dat' index 1 using 2:3 via a,b
As with all numerical fitting procedures, the fit command comes with caveats. It uses a generic fitting algorithm, and may not work well with poorly behaved or ill-constrained problems. It works best when all of the values it is attempting to fit are of order unity. For example, in a problem where $a$ was of order $10^{10}$, the following might fail:
f(x) = a*x
fit f() 'data.dat' via a
However, better results might be achieved if $a$ were artificially made of order unity, as in the following script:
f(x) = 1e10*a*x
fit f() 'data.dat' via a
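When a parameter is rescaled in this way, the quantity of physical interest is the fitted value multiplied by the scale factor. One way this might be written is sketched below; the variable name scale is hypothetical, and it stays fixed during the fit because it is not listed after via. The use of the print command here is an assumption about how one would inspect the result.

```pyxplot
scale = 1e10              # fixed scale factor, not varied by the fit
f(x) = scale*a*x
fit f() 'data.dat' via a  # a itself should now be of order unity
print scale*a             # the coefficient in the original units
```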
A series of ranges may be specified after the fit command, using the same syntax as in the plot command, as described in Section 2.10. If ranges are specified then only data points falling within these ranges are used in the fitting process; the ranges refer to each of the variables of the fitted function in order.
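A sketch of how such ranges might be written, assuming the bracketed [min:max] form used by the plot command. The numerical limits, the file 'surface.dat' and the two-variable function g are hypothetical.

```pyxplot
# Fit a straight line using only data points with 0 <= x <= 10.
f(x) = a*x+b
fit [0:10] f() 'data.dat' via a,b

# For a function of two variables, the first range applies to x
# and the second to y.
g(x,y) = p*x + q*y + r
fit [0:10][0:5] g() 'surface.dat' using 1:2:3 via p,q,r
```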
For those interested in the mathematical details, the workings of the fit command are discussed in more detail in Chapter D.
At the end of the fitting process, the best-fitting values of each parameter are output to the terminal, along with an estimate of the uncertainty in each. Additionally, the Hessian, covariance and correlation matrices are output in both human-readable and machine-readable formats, allowing a more complete assessment of the probability distribution of the parameters.
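The fitted parameters are ordinary variables (the square-wave script above plots f(x) straight after the fit), so their best-fit values can be reused in later commands. A minimal sketch, reusing the straight-line fit from the first example and assuming the print command is used to display them:

```pyxplot
f(x) = a*x+b
fit f() 'data.dat' index 1 using 2:3 via a,b
# After the fit, a holds the best-fit gradient and b the intercept.
print a
print b
```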