In my last post, where I wrote about sequential parameter optimization, I asked whether you made any assumptions about the underlying function when trying to find the optimal spot. (If you missed it and you’re interested in an easy introduction to sequential parameter optimization, you might like to have a look here.)
But back to the assumptions. Unless you are doing a purely random search, you are definitely making some assumptions about the underlying function. That might be something simple, like ‘following the steepest path leads to the highest spot’, or a more complex assumption about the nature of the function, like ‘it might be polynomial’.
The former assumption can be used for a simple search strategy, the latter for building a surrogate model that hopefully resembles the function.
A good surrogate model makes it possible to find better spots with fewer function evaluations, which might be expensive. In this post I want to introduce a very simple surrogate model to give a general idea of how surrogate models work.
Hands-on ‘Least Squares Regression’
Imagine you have done some experiments in which you changed only one parameter and measured the results. Your results might be a little noisy, since you have already exhausted your possibilities in terms of measurement accuracy. But still, after only a small set of experiment setups you get the idea that there might be a linear relation between the parameter and the resulting value. If that were the case, a straight line would be the best model to fit the data. The least squares method is a technique to find the line that fits the data best, minimizing the error between the line and the actual data.
In this little demo you can experiment yourself with how this method works. By clicking into the white area you add points that represent the results of the experiment. At least two points are needed to fit a line. By clicking the button, the algorithm finds the best line through these points by minimizing the squared error that results from the deviations from the line in the y-direction.
Theory of the method
So how does it work? We started with the assumption that there is a linear relation in the data, so we want to find the line that fits the data best.
A line can be described as a function of $x$ with y-intercept $b$ and slope $m$:

$$f(x) = m\,x + b$$
We said we want to find the line that fits the data ‘best’. In this case, ‘best’ means the line with the least sum of squared errors. Given $n$ known data points $(x_i, y_i)$ we can define the error $e_i$ between the line and the actual points as:

$$e_i = f(x_i) - y_i = m\,x_i + b - y_i$$
Since we are aiming to minimize the sum of squared errors, we are looking at this term:

$$\sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left(m\,x_i + b - y_i\right)^2$$
With this term we obtain a function of the parameters $m$ and $b$ which, for a set of $n$ given points $(x_i, y_i)$, returns the sum of squared errors this line would produce:

$$E(m, b) = \sum_{i=1}^{n} \left(m\,x_i + b - y_i\right)^2$$
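As an aside, here is what this error function looks like in code – a minimal TypeScript sketch (the `Point` type and the name `sumSquaredErrors` are my own, this is not the demo’s actual source):

```typescript
// One measured experiment: parameter value x and measured result y.
type Point = { x: number; y: number };

// E(m, b): sum of squared errors of the line y = m*x + b over all points.
function sumSquaredErrors(points: Point[], m: number, b: number): number {
  return points.reduce((sum, p) => {
    const error = m * p.x + b - p.y; // deviation from the line in y-direction
    return sum + error * error;
  }, 0);
}
```

For a fixed set of points this is a function of $m$ and $b$ only, and it is exactly this function we want to minimize.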
Hence, we are searching for the values of the parameters $m$ and $b$ that minimize the error of the line; in other words, we want to find the minimum of this function. To solve this minimization problem we determine the position where both partial derivatives of the function are zero:

$$\frac{\partial E}{\partial b} = \sum_{i=1}^{n} 2\left(m\,x_i + b - y_i\right) = 0, \qquad \frac{\partial E}{\partial m} = \sum_{i=1}^{n} 2\,x_i\left(m\,x_i + b - y_i\right) = 0$$
This way we obtain a system of two linear equations (the factor 2 cancels):

$$m \sum_{i=1}^{n} x_i + n\,b = \sum_{i=1}^{n} y_i, \qquad m \sum_{i=1}^{n} x_i^2 + b \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} x_i y_i$$

Now the first row is divided by $n$ and the linear system is written in matrix notation:

$$\begin{pmatrix} \frac{1}{n}\sum_{i=1}^{n} x_i & 1 \\ \sum_{i=1}^{n} x_i^2 & \sum_{i=1}^{n} x_i \end{pmatrix} \begin{pmatrix} m \\ b \end{pmatrix} = \begin{pmatrix} \frac{1}{n}\sum_{i=1}^{n} y_i \\ \sum_{i=1}^{n} x_i y_i \end{pmatrix}$$
We simplify the notation by replacing the average values:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$

So our linear system now looks like this:

$$\begin{pmatrix} \bar{x} & 1 \\ \sum_{i=1}^{n} x_i^2 & \sum_{i=1}^{n} x_i \end{pmatrix} \begin{pmatrix} m \\ b \end{pmatrix} = \begin{pmatrix} \bar{y} \\ \sum_{i=1}^{n} x_i y_i \end{pmatrix}$$
A solution for $b$ depending on $m$ can already be read off the first row of this system:

$$b = \bar{y} - m\,\bar{x}$$
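A small observation that follows directly from this formula: plugging $x = \bar{x}$ into the line gives

$$f(\bar{x}) = m\,\bar{x} + b = m\,\bar{x} + \bar{y} - m\,\bar{x} = \bar{y},$$

so whatever slope we end up with, the fitted line always passes through the point of averages $(\bar{x}, \bar{y})$.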
We obtain the complete solution by multiplying both sides of the equation with the inverse of our matrix:

$$\begin{pmatrix} m \\ b \end{pmatrix} = \begin{pmatrix} \bar{x} & 1 \\ \sum_{i=1}^{n} x_i^2 & \sum_{i=1}^{n} x_i \end{pmatrix}^{-1} \begin{pmatrix} \bar{y} \\ \sum_{i=1}^{n} x_i y_i \end{pmatrix}$$

The inverse of the matrix is:

$$\begin{pmatrix} \bar{x} & 1 \\ \sum_{i=1}^{n} x_i^2 & \sum_{i=1}^{n} x_i \end{pmatrix}^{-1} = \frac{1}{n\,\bar{x}^2 - \sum_{i=1}^{n} x_i^2} \begin{pmatrix} \sum_{i=1}^{n} x_i & -1 \\ -\sum_{i=1}^{n} x_i^2 & \bar{x} \end{pmatrix}$$
And with this we get as the solution for $m$ and $b$:

$$\begin{pmatrix} m \\ b \end{pmatrix} = \frac{1}{n\,\bar{x}^2 - \sum_{i=1}^{n} x_i^2} \begin{pmatrix} \sum_{i=1}^{n} x_i & -1 \\ -\sum_{i=1}^{n} x_i^2 & \bar{x} \end{pmatrix} \begin{pmatrix} \bar{y} \\ \sum_{i=1}^{n} x_i y_i \end{pmatrix}$$
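If you want to see this matrix route as code, here is a minimal TypeScript sketch (the name `fitLineViaInverse` is mine and this is not the demo’s actual source) that builds the matrix entries from the data and multiplies the right-hand side with the inverse:

```typescript
// Least squares fit of y = m*x + b by solving the 2x2 linear system
// with the explicit inverse of the matrix, exactly as in the derivation above.
// Assumes at least two points with distinct x values (otherwise the matrix is singular).
function fitLineViaInverse(points: { x: number; y: number }[]): { m: number; b: number } {
  const n = points.length;
  const sumX = points.reduce((s, p) => s + p.x, 0);
  const sumXX = points.reduce((s, p) => s + p.x * p.x, 0);
  const sumXY = points.reduce((s, p) => s + p.x * p.y, 0);
  const meanX = sumX / n;
  const meanY = points.reduce((s, p) => s + p.y, 0) / n;

  // Matrix: [ meanX  1    ]   right-hand side: [ meanY ]
  //         [ sumXX  sumX ]                    [ sumXY ]
  const det = n * meanX * meanX - sumXX;

  // Multiply the right-hand side with the inverse of the matrix.
  const m = (sumX * meanY - sumXY) / det;
  const b = (-sumXX * meanY + meanX * sumXY) / det;
  return { m, b };
}

// Example with three hand-made points: prints roughly { m: 0.5, b: 0.6667 }.
console.log(fitLineViaInverse([{ x: 1, y: 1 }, { x: 2, y: 2 }, { x: 3, y: 2 }]));
```

With any set of points this returns the same slope and intercept as the simplified formulas derived in the next steps.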
After the multiplication the result for $m$ can still be simplified. Using Steiner’s theorem (the displacement law), $\sum_{i}(x_i - \bar{x})^2 = \sum_{i} x_i^2 - n\,\bar{x}^2$ and $\sum_{i}(x_i - \bar{x})(y_i - \bar{y}) = \sum_{i} x_i y_i - n\,\bar{x}\,\bar{y}$, we end up with a term that is quite easy to calculate:

$$m = \frac{\sum_{i=1}^{n} x_i y_i - n\,\bar{x}\,\bar{y}}{\sum_{i=1}^{n} x_i^2 - n\,\bar{x}^2} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$
We also get a solution for $b$ from the matrix multiplication, but now, already knowing the value of $m$, it is easier to use the short formula we already found in the first steps:

$$b = \bar{y} - m\,\bar{x}$$
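As a quick worked example with made-up numbers (they are not from the demo): take the three points $(1, 1)$, $(2, 2)$ and $(3, 2)$. Then

$$\bar{x} = 2, \qquad \bar{y} = \tfrac{5}{3}, \qquad \sum_{i=1}^{3} x_i y_i = 11, \qquad \sum_{i=1}^{3} x_i^2 = 14,$$

and the two formulas give

$$m = \frac{11 - 3 \cdot 2 \cdot \tfrac{5}{3}}{14 - 3 \cdot 2^2} = \frac{1}{2}, \qquad b = \frac{5}{3} - \frac{1}{2} \cdot 2 = \frac{2}{3},$$

so the fitted line is $f(x) = \tfrac{1}{2}\,x + \tfrac{2}{3}$ (the matrix sketch above returns the same values).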
To actually fit the line you of course only need these two formulas – calculating these values is actually nearly all the little demo does 😀 Hope you liked it.