6.5 Example Optimisers
A few example optimisation solutions are provided in optimization/kick/examples/, such as DoNothing.py and RandomSearch.py, as well as a more complex implementation of simulated annealing in SA.py. These optimisers all inherit from the OptimMethod base class, which maintains a consistent interface across the different optimisers.
6.5.1 DoNothing.py
Code: examples/DoNothing.py.
This performs no optimisation at all: it takes the input individual, calculates its score, and returns it unchanged. It is the simplest possible use of the interface.
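To make the idea concrete, here is a hypothetical sketch of a "do nothing" optimiser. The real OptimMethod interface is defined in the repo; the class name, the optimise method, and the score callable used here are assumptions for illustration only.

```python
# Hypothetical sketch only; the actual OptimMethod base class in the
# repo defines the real interface. Names here are illustrative.
from typing import Callable, List


class DoNothingSketch:
    def __init__(self, score: Callable[[List[float]], float]):
        self.score = score  # fitness function: parameters -> score

    def optimise(self, individual: List[float]):
        # Evaluate the input individual once and return it unchanged.
        return individual, self.score(individual)


# Example: score is the (negated) squared norm of the parameters.
opt = DoNothingSketch(score=lambda params: -sum(x * x for x in params))
best, best_score = opt.optimise([1.0, 2.0])
# best is the unchanged input; best_score is its fitness (-5.0 here)
```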
6.5.2 RandomSearch.py
Code can be found in examples/RandomSearch.py.
This method starts from a basic solution and then iteratively changes it slightly. Whenever the updated solution is better than the current one, it is adopted as the new current best solution.
The mutation is the line new_indiv = current_indiv.mutate_and_copy(0.05), which randomly perturbs 5% of the values in the parameter list.
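The implementation of mutate_and_copy is not shown here; a plausible sketch, assuming each value is perturbed independently with probability 0.05 (the noise scale sigma is also an assumption):

```python
import random


def mutate_and_copy(params, rate=0.05, sigma=0.1):
    # Sketch of what a mutate_and_copy(0.05) might do: return a copy of
    # the parameter list in which each value is perturbed with
    # probability `rate`. The Gaussian noise scale `sigma` is an
    # assumption; the real method lives on the individual class.
    return [p + random.gauss(0.0, sigma) if random.random() < rate else p
            for p in params]


random.seed(0)
original = [1.0] * 100
mutated = mutate_and_copy(original)
# roughly 5 of the 100 values should differ from the original
```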
In pseudocode:
current = input parameters
loop:
    new = slightly_change(current)
    if new is better than current:
        current = new
return current
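The pseudocode above can be sketched as a runnable function. The inline mutation is a stand-in for mutate_and_copy(0.05); the test function and all parameter values are illustrative, not from the repo.

```python
import random


def random_search(score, params, iterations=5000, rate=0.05, sigma=0.3):
    """Hill-climbing sketch of the pseudocode above: a candidate is
    adopted only if it scores strictly better. `score` is maximised."""
    current = list(params)
    current_score = score(current)
    for _ in range(iterations):
        # stand-in for mutate_and_copy(0.05): perturb ~5% of the values
        new = [p + random.gauss(0.0, sigma) if random.random() < rate else p
               for p in current]
        new_score = score(new)
        if new_score > current_score:
            current, current_score = new, new_score
    return current, current_score


random.seed(1)
# maximise -(x - 3)^2 - (y + 1)^2, whose optimum is at (3, -1)
best, best_score = random_search(
    lambda p: -(p[0] - 3) ** 2 - (p[1] + 1) ** 2, [0.0, 0.0])
```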
6.5.3 SA.py
Code can be found in examples/SA.py.
Simulated annealing (https://en.wikipedia.org/wiki/Simulated_annealing) is an optimisation algorithm that minimises a function (which is why we negate the fitness, since we want to maximise it). It works similarly to random search, but more cleverly. The fundamental idea is that you still have a current solution and a candidate (obtained via mutate_and_copy). In random search, the current solution only changes if the candidate is strictly better. SA also accepts a strictly better candidate, but it additionally has a mechanism for accepting the candidate even when it is worse. This helps a lot with escaping local minima.
There are two key parts to this:
Temperature (T): This controls how random the search is. When the temperature is high, we accept worse solutions more often (think of gas in a container: it moves around much more frequently / faster when it is hot). The temperature is slowly reduced throughout the run, so that at the beginning the search is very random and swaps to worse solutions often (i.e. exploration). As the temperature decreases, swapping to a worse solution becomes ever more unlikely, and the method approaches the random search above.
Acceptance probability: The maths is not terribly important, but the intuition is that swapping to a much worse solution should be less likely.
Both are quantified in the function should_swap. We swap immediately if the new score is better (the if new_score > 0: check rejects invalid scores, typically caused by a simulator issue or really bad parameters). Otherwise, we swap with probability exp(-(new_score - current_score) / T), which is higher the smaller the gap between new_score and current_score, and higher the larger T is (if T = infinity, exp(-p/T) = exp(0) = 1, i.e. a 100% probability).
def should_swap(self, current_score: float, new_score: float):
    if new_score < current_score:
        return True
    if new_score > 0:
        return False
    p = new_score - current_score
    K = -p / self.T
    # swap with probability exp(K) = exp(-(new_score - current_score) / T)
    should = np.random.rand() < np.exp(K)
    print(f"new score = {new_score}, old = {current_score}, K = {K}, P = {np.exp(K)}, should swap = {should}")
    return should
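A tiny numeric check of the acceptance probability exp(-(new_score - current_score) / T) illustrates both effects: the same worse candidate is accepted more often at a higher temperature, and a much worse candidate is accepted less often at the same temperature. (The scores below are made up; remember scores are negated fitness, so "worse" means a larger score.)

```python
import math


def acceptance_probability(current_score, new_score, T):
    # Probability of swapping to a worse solution, as in should_swap:
    # exp(-(new_score - current_score) / T).
    return math.exp(-(new_score - current_score) / T)


# Same worse candidate, hot vs. cold temperature:
p_hot = acceptance_probability(-10.0, -9.0, T=2.0)   # exp(-0.5) ~ 0.61
p_cold = acceptance_probability(-10.0, -9.0, T=0.1)  # exp(-10) ~ 0.00005
# A much worse candidate at the same (hot) temperature:
p_much_worse = acceptance_probability(-10.0, -5.0, T=2.0)  # exp(-2.5) ~ 0.08
```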
The structure of this algorithm is:
current = input parameters
T = 1  # any start temp
loop:
    new = slightly_change(current)
    if new is better than current:
        current = new
    else:
        choose new, with probability P
        # P is higher the smaller the difference between new and current is.
        # P is higher when the temperature is higher.
    T *= 0.99  # make temperature lower
return current
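The structure above can be sketched as a runnable loop. This is not the repo's SA.py; the mutation, cost function, and all hyperparameters below are illustrative stand-ins, and the cost is minimised directly rather than via a negated fitness.

```python
import math
import random


def simulated_annealing(cost, params, T=1.0, cooling=0.99,
                        iterations=3000, rate=0.2, sigma=0.3):
    """Sketch of the pseudocode above. `cost` is minimised; the inline
    mutation is a stand-in for mutate_and_copy."""
    current = list(params)
    current_cost = cost(current)
    for _ in range(iterations):
        new = [p + random.gauss(0.0, sigma) if random.random() < rate else p
               for p in current]
        new_cost = cost(new)
        # Always accept improvements; accept worse candidates with
        # probability exp(-(new_cost - current_cost) / T).
        if (new_cost < current_cost
                or random.random() < math.exp(-(new_cost - current_cost) / T)):
            current, current_cost = new, new_cost
        T *= cooling  # cool down: less exploration over time
    return current, current_cost


random.seed(2)
# minimise (x - 2)^2 + y^2, starting far from the optimum at (2, 0)
best, best_cost = simulated_annealing(
    lambda p: (p[0] - 2) ** 2 + p[1] ** 2, [5.0, 5.0])
```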
6.5.4 Bonus - Genetic Algorithm (GA)
The code is in optimization/lib/optim/ga.py. Beware though.
Usually, genetic algorithms (https://en.wikipedia.org/wiki/Genetic_algorithm) consist of a population of individuals, each possessing a genotype (e.g. the list of parameters), which can be thought of as the individual's genes. This genotype determines the phenotype, which can be thought of as the manifestation of the genotype in the problem domain (e.g. the behaviour of the robot when using these parameters).
High-performing individuals are combined using crossover, the process of combining two parents to form new individuals for the next generation. This new generation is also mutated slightly to facilitate exploration and prevent stagnation. In general, 'high performing' is quantified by the fitness function (e.g. how far the agent kicks, in our case). For example, when maximising a function f(x), the fitness function could simply be the function value.
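The generational loop described above can be sketched as follows. This is not the repo's ga.py: the selection scheme (truncation), one-point crossover, Gaussian mutation, and all hyperparameters are illustrative choices.

```python
import random


def genetic_algorithm(fitness, genome_len, pop_size=30, generations=60,
                      mutation_sigma=0.2):
    """Minimal GA sketch: truncation selection, one-point crossover,
    Gaussian mutation. `fitness` is maximised."""
    pop = [[random.uniform(-5, 5) for _ in range(genome_len)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:pop_size // 2]  # keep the fitter half (elitism)
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, genome_len) if genome_len > 1 else 0
            child = a[:cut] + b[cut:]  # one-point crossover
            # mutate every gene slightly to keep exploring
            child = [g + random.gauss(0.0, mutation_sigma) for g in child]
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)


random.seed(3)
# maximise f(x) = -(x - 1)^2: the fitness is just the function value
best = genetic_algorithm(lambda g: -(g[0] - 1.0) ** 2, genome_len=1)
```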