6.4 Codebase Explanation

The following video serves as an outline of the optimisation codebase, its key features, and how they work together.


All skills, such as WALK, KICK, or even GETUP, are defined by a set of parameters which govern the way in which the agent performs these tasks. It is these parameters that we need to optimise for better performance. The codebase came with a framework for creating optimisation tasks, and we have built upon it.

6.4.1 Important Files and Folders

In general, the important things regarding optimisation will be found in the optimization and paramfiles folders.

6.4.1.1 ParamFiles

Parameter files contain parameter values that the agent can load in at runtime. A file consists of a set of parameters as key-value pairs mapping strings to floats. The parameter name must be separated from its value with a tab (not a space), and parameters must be separated from each other with a single newline. Parameter files support C++-style comments (// and /* */) as well as #.

When agents are loaded onto the server, their parameter file needs to be specified with the --paramsfile <parameter_file> command line argument. Multiple parameter files can be loaded one after the other, with newly loaded parameter values replacing the values of previously loaded parameters with the same name (key). Parameters are loaded into an std::map called namedParams. If you open start.sh or sample_start-optimisation.sh you will see references to this; however, for our optimisation you do not need to worry about it.
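The override behaviour can be sketched in a few lines of Python. Note that load_params is an illustrative helper, not part of the codebase; the real parsing happens in C++ and also handles // and /* */ comments.

```python
# Illustrative sketch (not part of the codebase) of how repeated
# --paramsfile arguments combine: later files override earlier ones.
def load_params(*file_contents: str) -> dict:
    params = {}
    for text in file_contents:
        for line in text.splitlines():
            line = line.split('#')[0].strip()   # strip '#' comments
            if not line:
                continue
            key, value = line.split('\t', 1)    # tab-separated key/value
            params[key] = float(value)          # later files win
    return params

first = "kick_ik_0_angle\t0\nkick_ik_0_wait\t0.066"
second = "kick_ik_0_angle\t15"
merged = load_params(first, second)
# merged["kick_ik_0_angle"] now holds 15.0, from the second file
```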

For example, below are some of the kick parameters used for the KICK_IK skill, which can be found in the defaultParams_t.txt file. It is important to note that, when performing optimisation, in most cases it is not necessary to optimise all parameters but rather a subset of them. This will not result in better performance, but it does reduce training times.

##########################
### IK KICK PARAMETERS ###
##########################

kick_ik_0_xoffset   -0.18184725746865413
kick_ik_0_yoffset   -0.007990019340567048
kick_ik_0_x0            0.09855534262963274
kick_ik_0_y0            0.04897226608420107
kick_ik_0_z0            0.06004895070570849
kick_ik_0_x1            -0.13267256199213984
kick_ik_0_y1            0.15055665409986765
kick_ik_0_z1            0.3048635084962904
kick_ik_0_x2            -0.075918848350498
kick_ik_0_y2            0.010843367764323163
kick_ik_0_z2            -0.03228058151402973
kick_ik_0_x3            0.3514121512894722
kick_ik_0_y3            -0.0915098467211551
kick_ik_0_z3            0.2932735025335922
kick_ik_0_a0            -2.0713675817098482
kick_ik_0_b0            4.168030311789961
kick_ik_0_c0            -0.17712625804502546
kick_ik_0_a1            -2.3258316746549554
kick_ik_0_b1            9.39335694003392
kick_ik_0_c1            -5.4878969788579175
kick_ik_0_a2            2.254184572289742
kick_ik_0_b2            0.014404161833793745
kick_ik_0_c2            -16.34929405684522
kick_ik_0_a3            -0.1703513663364682
kick_ik_0_b3            77.12670393386878
kick_ik_0_c3            -21.212384580007893
kick_ik_0_wait          0.06679452466769868
kick_ik_0_scale         2.434596016520202
kick_ik_0_off3_0        6.8002354818317885
kick_ik_0_off4_0        23.957167469656504
kick_ik_0_off5_0        -7.433399813693172
kick_ik_0_off3_1        -16.624470935986754
kick_ik_0_off4_1        20.351676522363075
kick_ik_0_off5_1        -25.63678390762887
kick_ik_0_off3_2        -50.00201321637502
kick_ik_0_off4_2        -39.33897746613399
kick_ik_0_off5_2        54.047464010320134

kick_ik_0_max_displacement_right    0.025
kick_ik_0_max_displacement_left 0.025
kick_ik_0_max_displacement_top  0.025
kick_ik_0_max_displacement_bottom   0.025
kick_ik_0_cw_angle_thresh   2
kick_ik_0_ccw_angle_thresh  2
kick_ik_0_angle         0

6.4.1.2 sample_start-optimisation.sh

sample_start-optimisation.sh works in a similar way to start.sh for the normal team of agents. However, in addition to loading agents, it also runs the server and, if specified, the visualiser. It is essentially the default way to start an optimisation process, as it handles the running of the tasks and the loading of the parameter files.

6.4.1.3 run.sh

This script simply runs the run_optimisation.py file while setting up the correct PYTHONPATH.

6.4.1.4 run_optimisation.py

This Python file initialises our optimisation algorithm and starts the optimisation process with the call to optimise. First, though, our starting individual is initialised using the OptimMethod.from_filename(filename_to_start_with) method. filename_to_start_with is chosen when the main method is called, as follows:

if __name__ == '__main__':
    # TODO: You can change this filepath to optimise other params
    # Or to start with a file that you previously optimised.
    main('../../paramfiles/new_Params_0_Kick.txt')

The parameter file you choose here needs to contain the selected parameters you want to optimise. Which parameters you choose is up to you, but we recommend using the IK parameters when optimising the kick.

start_indiv is of type Individual and is a simple numpy array of the parameters loaded from the file.

6.4.1.5 Individual

This class is defined in kick/lib/optim/ga.py and is used to hold the current parameter values of an individual in population-based optimisation. The class comes with built-in functions for crossover and mutation.

import copy
import numpy as np

class Individual:
    def __init__(self, x: np.ndarray, range: np.ndarray):
        self.x = x
        self.fitness = 0
        self.range = range

    def crossover(self, other) -> "Individual":
        # Single-point crossover: genes [0, k) come from self and
        # genes [k, n) come from other, e.g. for k = 3:
        #   self:  000|000
        #   other: 111|111  ->  child: 000111
        if len(self.x) <= 2:
            k = np.random.randint(0, len(self.x) - 1)
        else:
            k = np.random.randint(1, len(self.x) - 1)
        out = np.zeros_like(self.x)
        out[:k] += self.x[:k]
        out[k:] += other.x[k:]
        return Individual(out, self.range)

    def mutate(self, prob):
        # Perturb each gene with probability prob by a uniform offset
        # in [-range, +range].
        t = np.random.rand(self.x.shape[0]) <= prob
        if t.sum():
            self.x[t] += (2 * np.random.rand(self.x[t].shape[0]) - 1) * self.range[t]

    def mutate_single_variable(self):
        # Perturb exactly one randomly chosen gene.
        t = np.random.randint(0, self.x.shape[0])
        self.x[t] += (2 * np.random.rand() - 1) * self.range[t]

    def mutate_and_copy(self, prob: float = 0.05) -> "Individual":
        """Returns a mutated copy of an individual without modifying the original."""
        c = copy.deepcopy(self)
        c.mutate(prob)
        return c
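As a quick illustration of these operators, the sketch below applies single-point crossover and mutation to plain numpy arrays. It is a standalone mock of what Individual does internally, not a use of the class itself:

```python
import numpy as np

# Two toy "parents" with six genes each and a mutation range of 0.1 per gene.
parent_a = np.zeros(6)
parent_b = np.ones(6)
gene_range = np.ones(6) * 0.1

# Single-point crossover at k = 3: first k genes from parent_a,
# the rest from parent_b (mirrors Individual.crossover).
k = 3
child = np.concatenate([parent_a[:k], parent_b[k:]])

# Mutation (mirrors Individual.mutate): each selected gene is shifted by a
# uniform offset in [-range, +range]. Here every gene is selected.
mask = np.ones(6, dtype=bool)
child[mask] += (2 * np.random.rand(mask.sum()) - 1) * gene_range[mask]
# child[:3] stays within 0.1 of parent_a, child[3:] within 0.1 of parent_b
```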

6.4.1.6 OptimMethod

This class serves as the base class for the optimisation methods we have implemented. If you wish to create your own optimisation algorithm, you should base it on the examples which inherit from this class.

In general this class implements a few functions, namely:

  1. __init__ - initialises the optimisation method with important information, in particular init_x, which is the initial individual, and the config_runner, which is explained further down. Here you can also specify where to read from and where to save results to.

  2. optimise - starts the process of learning better parameters. Most importantly, it executes the run method and saves the best individual, which is returned. For the most part this function does not need to be tampered with or overloaded.

  3. run - the most important method, as it is where you will find the implementation of your optimisation algorithm. Each of the example optimisation algorithms overloads this function with its own specific functionality.

  4. get_fitness - runs the RoboCup task for a given individual and reads the resulting fitness value from the output file it produces.

  5. from_filename - constructs an Individual (and its list of parameter labels) from a parameter file.

  6. save_indiv - writes the best individual found so far to the results files.

class OptimMethod():
    """
        Base class for optimisation approach. This should provide utilities and things to do common operations like:
        1. Reading a file
        2. Getting fitness
        3. etc.
    """
    def __init__(self, init_x: Individual,
                       labels: List[str],
                       unique_identifier: str = 'latest_run',
                       verbose: bool=True,
                       task_name: str = 'kick_ik') -> None:
        self.config_runner: ConfigRunner = ConfigRunner(task_name, 0, False, relative_path='..')
        self.verbose = verbose
        self.index = int(time.time())
        self.labels = labels
        self.init_x: Individual = init_x
        self.best_score_ever: float = -1
        
        self.save_dir = os.path.join('results', 'sa', f"mode_{self.config_runner.optim_task}", f"type_{self.config_runner.unit_type}", f"{get_date()}-norm_True_{self.config_runner.optim_task}")

        self.unique_identifier = unique_identifier
        os.makedirs('results/recent', exist_ok=True)
        self.overwrite_filename = os.path.join('results/recent', f'{unique_identifier}.txt')

        os.makedirs(self.save_dir, exist_ok=True)
        self.filename = os.path.join(self.save_dir, 'best_params.txt')

    def optimise(self, N: int) -> Tuple[Individual, float]:
        """Run this when doing optimisation.

        Args:
            N (int): The number of steps to perform.

        Returns:
            Tuple[Individual, float]: Best indiv, best score
        """
        best_indiv, best_score = self.run(N)
        self.save_indiv(best_indiv, best_score)
        return best_indiv, best_score


    def run(self, N: int) -> Tuple[Individual, float]:
        """You need to implement this. Return the best individual and score.
            After doing optimisation
        """
        raise NotImplementedError("Do this in the subclass")

    def get_fitness(self, agent: Individual) -> float:
        idx = self.index
        val = self.config_runner.denorm(agent.x, self.labels)
        write_to_file('out_params%d.txt' % (idx), self.labels, val)
        p = Process(target=lambda: self.config_runner.run_robocup(idx, val))
        s = timer()
        p.start()
        p.join()
        e = timer()
        if self.verbose:
            print(f"One run took {round((e - s)*100)/100}s")

        a_file = "out%d.txt" % idx
        a_exist = os.path.exists(a_file)
        total_time_waiting = 0
        P = 30
        while not a_exist:
            a_exist = os.path.exists(a_file)
            time.sleep(5)
            total_time_waiting += 5
            if total_time_waiting >= P:
                break
        a_exist = os.path.exists(a_file)
        if total_time_waiting >= P and not a_exist:
            print("Waited very long for out.txt to load. Going to kill all simulators and do something with the fitnesses")
            kill_all_simulators()
            # The run produced no output file, so assign a large negative fitness
            fitness = -1000
        else:
            S = read_from_file("./out%d.txt" % (idx))
            fitness = float(S[0])
        try:
            os.unlink(a_file)
            os.unlink('out_params%d.txt' % (idx))
        except Exception as e:
            print("Error deleting", e)
        return fitness

    @staticmethod
    def from_filename(filename: str, range: float = 0.1) -> Tuple[Individual, List[str]]:
        labels, values = get_labels_and_values(read_from_file(filename))
        values = np.array(values)
        # assert type(values) == np.ndarray
        range = np.ones_like(values) * range
        return (Individual(values, range=range), labels)

    def save_indiv(self, agent: Individual, score: float=-1):
        if self.best_score_ever < score:
            self.best_score_ever = score
            self.best_agent = copy.deepcopy(agent)
            # write to a file.
            print(f"\tWriting best results (with score {self.best_score_ever} now) to {self.filename} and {self.overwrite_filename}")
            write_to_file(self.filename, self.labels, self.config_runner.denorm(agent.x, self.labels))
            write_to_file(self.overwrite_filename, self.labels, self.config_runner.denorm(agent.x, self.labels))

6.4.1.7 ConfigRunner

6.4.2 Workflow

  1. You run run.sh

  2. This runs run_optimisation.py with the correct paths

  3. This initialises our defined OptimMethod and loads our initial parameter file

  4. During this initialisation a ConfigRunner is set up, which contains a function called run_robocup that will be used to actually set up and run the RoboCup environment, from which we can then obtain fitness information.

  5. Inside run_optimisation.py we call optimise

  6. The optimise method calls the run method, whose base implementation is found in OptimMethod; however, it is overloaded in your own implementation, as can be seen in examples such as RandomSearch.py

  7. Inside run we call get_fitness on the current Individual, after which we enter the training loop, as seen here in RandomSearch.py's run:

for i in range(N):
    print(f"Step {i + 1} / {N}. Current score = {np.round(current_score, 2)}. Overall best = {self.best_score_ever}")

    # get a new, random individual and score
    new_indiv = current_indiv.mutate_and_copy(0.05)
    new_score = self.get_fitness(new_indiv)

    # Swap if better
    if current_score < new_score:
        current_indiv, current_score = new_indiv, new_score

    # Save every now and then
    if i % 100 == 0:
        self.save_indiv(current_indiv, current_score)
  8. Inside get_fitness, which is implemented in the base OptimMethod class, we define a new Process which runs the run_robocup method defined by the ConfigRunner, as mentioned above.

  9. Inside run_robocup we run the sample_start-optimization.sh script, which starts the server (and the visualiser, if selected) and loads the agent defined to complete the designated task.

  10. Inside sample_start-optimization.sh we start the server as follows:

rcssserver3d --agent-port $SPARK_AGENTPORT --server-port $SPARK_SERVERPORT & 
PID=$!

We then start the visualiser if the flag is set:

# To view the task while it runs, enable the do_vis flag
if [ "$do_vis" == "true" ]; then
  echo "DOING VIS"
  /roboviz/roboviz.sh --serverPort=$SPARK_SERVERPORT &
fi

Lastly, we load the desired agent, whose behaviour type represents the task we want to optimise:

  $DIR_SCRIPT/../agentspark --unum 2 --type $TYPE --paramsfile $DIR_SCRIPT/../paramfiles/defaultParams.txt --paramsfile $DIR_SCRIPT/../paramfiles/defaultParams_t$TYPE.txt --paramsfile $PARAMS_FILE --experimentout $OUTPUT_FILE --optimize $task --port $SPARK_AGENTPORT --mport $SPARK_SERVERPORT &  
  11. The agent's main loop will then begin executing, as defined in main.cc:
while (gLoop) {
    GetMessage(msg);
    string msgToServer = behavior->Think(msg);
    // To support agent sync mode
    msgToServer.append("(syn)");
    PutMessage(msgToServer);
    if (mPort != -1) {
        PutMonMessage(behavior->getMonMessage());
    }
}
  12. The agent's state will be initialised as it would be if you had simply run start.sh, which results in the agent's beam function being called. This function is defined in optimizationbehaviors.cc for the corresponding agent type you have chosen for this optimisation.

  13. Inside this loop the Think behaviour is called, which in turn calls this->updateFitness(), our overloaded updateFitness defined for the particular task inside optimizationbehaviors.cc.

  14. In addition to updateFitness(), the Think method calls act, which in turn calls selectSkill(). This is, once again, the overloaded selectSkill() defined for the particular task within optimizationbehaviors.cc.

  15. Inside this same updateFitness() method you will, in most cases, find a counter called kick. This variable keeps track of the number of optimisation runs that have occurred, because for each training step we perform a number of runs of the optimisation task. The fitness which is ultimately returned is the average over these runs, as can be seen here:

if (kick == 10) {
    writeFitnessToOutputFile(totalFitness / (double(kick)));
    return;
}

In this example we average over 10 runs. One can also consider the updateFitness() function to be quite similar to a reward function in reinforcement learning.

  16. writeFitnessToOutputFile will then write the average fitness value to an output file.

  17. The gLoop will then end, after which the Process p will finish and return to the get_fitness method.

  18. get_fitness will then read the output file that was written and return the fitness value stored in it to the run method, repeating the process from step 7 until the total number of training episodes has been completed.
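The workflow above can be condensed into a toy version in which the expensive get_fitness call (launching rcssserver3d) is replaced by a cheap stand-in. This is useful for sanity-checking an optimisation algorithm before running it against the simulator. Everything here is illustrative; toy_fitness and the hidden target are not part of the codebase:

```python
import numpy as np

# Stand-in for get_fitness: score an individual by (negative) distance to a
# hidden target parameter vector instead of running a RoboCup episode.
target = np.array([0.2, -0.1, 0.5])

def toy_fitness(x: np.ndarray) -> float:
    return -float(np.sum((x - target) ** 2))

rng = np.random.default_rng(0)
current = np.zeros(3)
current_score = toy_fitness(current)

# Same accept-if-better loop as RandomSearch.run, with mutate_and_copy
# replaced by a small Gaussian perturbation.
for _ in range(200):
    candidate = current + rng.normal(0.0, 0.1, size=3)
    score = toy_fitness(candidate)
    if score > current_score:
        current, current_score = candidate, score
```

Swapping the toy fitness back out for the real get_fitness recovers the structure of the actual training loop.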