6.4 Codebase Explanation
The following video serves as an outline of the optimisation codebase, its key features, and how they work together.
All skills, such as WALK, KICK, or even GETUP, are defined by a set of parameters which govern the way in which the agent performs these tasks. It is these parameters which we need to optimise for better performance. The codebase came with a framework for creating optimisation tasks, and we have built upon that.
6.4.1 Important Files and Folders
In general, the important things regarding optimisation can be found in the optimization and paramfiles folders.
6.4.1.1 ParamFiles
Parameter files contain parameter values that the agent can load in at runtime. Files should be formatted as a set of parameters given as key-value pairs from strings to floats. The parameter name should be separated from its value with a tab (not a space), and parameters should be separated from each other with a single newline. Parameter files support C++ style comments (// and /* */) as well as # comments.
When agents are loaded onto the server, their parameter file needs to be specified with the --paramsfile <parameter_file> command line argument. Multiple parameter files can be loaded one after the other, with newly loaded parameter values replacing the values of previously loaded parameters with the same name (key). Parameters are loaded into an std::map called namedParams. If you were to open start.sh or sample_start-optimisation.sh you would see references to this. However, for the purposes of our optimisation you do not need to worry about it.
For example, below are some of the kick parameters used for the KICK_IK skill, which can be found in the defaultParams_t.txt file. It is important to note that when performing optimisation it is, in most cases, not necessary to optimise all parameters, but rather a subset of them. Optimising a subset will not in itself result in better performance, but it will reduce training times.
##########################
### IK KICK PARAMETERS ###
##########################
kick_ik_0_xoffset -0.18184725746865413
kick_ik_0_yoffset -0.007990019340567048
kick_ik_0_x0 0.09855534262963274
kick_ik_0_y0 0.04897226608420107
kick_ik_0_z0 0.06004895070570849
kick_ik_0_x1 -0.13267256199213984
kick_ik_0_y1 0.15055665409986765
kick_ik_0_z1 0.3048635084962904
kick_ik_0_x2 -0.075918848350498
kick_ik_0_y2 0.010843367764323163
kick_ik_0_z2 -0.03228058151402973
kick_ik_0_x3 0.3514121512894722
kick_ik_0_y3 -0.0915098467211551
kick_ik_0_z3 0.2932735025335922
kick_ik_0_a0 -2.0713675817098482
kick_ik_0_b0 4.168030311789961
kick_ik_0_c0 -0.17712625804502546
kick_ik_0_a1 -2.3258316746549554
kick_ik_0_b1 9.39335694003392
kick_ik_0_c1 -5.4878969788579175
kick_ik_0_a2 2.254184572289742
kick_ik_0_b2 0.014404161833793745
kick_ik_0_c2 -16.34929405684522
kick_ik_0_a3 -0.1703513663364682
kick_ik_0_b3 77.12670393386878
kick_ik_0_c3 -21.212384580007893
kick_ik_0_wait 0.06679452466769868
kick_ik_0_scale 2.434596016520202
kick_ik_0_off3_0 6.8002354818317885
kick_ik_0_off4_0 23.957167469656504
kick_ik_0_off5_0 -7.433399813693172
kick_ik_0_off3_1 -16.624470935986754
kick_ik_0_off4_1 20.351676522363075
kick_ik_0_off5_1 -25.63678390762887
kick_ik_0_off3_2 -50.00201321637502
kick_ik_0_off4_2 -39.33897746613399
kick_ik_0_off5_2 54.047464010320134
kick_ik_0_max_displacement_right 0.025
kick_ik_0_max_displacement_left 0.025
kick_ik_0_max_displacement_top 0.025
kick_ik_0_max_displacement_bottom 0.025
kick_ik_0_cw_angle_thresh 2
kick_ik_0_ccw_angle_thresh 2
kick_ik_0_angle 0
6.4.1.2 sample_start-optimisation.sh
sample_start-optimisation.sh works in a similar way to how start.sh does for the normal team of agents. However, in addition to loading agents, it also runs the server and, if specified, the visualiser. It is essentially the default way to start an optimisation process, as it handles the running of the tasks and the loading of the paramfiles.
6.4.1.3 run.sh
This script simply runs the run_optimisation.py file while setting up the correct PYTHONPATH.
6.4.1.4 run_optimisation.py
This Python file initialises our optimisation algorithm and starts the optimisation process with the call to optimise. First, though, our starting individual is initialised using the OptimMethod.from_filename(filename_to_start_with) method. filename_to_start_with is chosen when the main method is called, as follows:
if __name__ == '__main__':
# TODO: You can change this filepath to optimise other params
# Or to start with a file that you previously optimised.
main('../../paramfiles/new_Params_0_Kick.txt')
The paramfile you choose here needs to contain the selected parameters you want to optimise for. Which parameters you choose is up to you, but we recommend using the IK parameters when optimising the kick.
start_indiv is of type Individual and is a simple numpy array of the parameters loaded from the file.
6.4.1.5 Individual
This class is defined in kick/lib/optim/__pycache/ga.py and is used to hold the current parameter values of the individual used for population-based optimisation. The class comes with built-in functions for crossover and mutation.
import copy

import numpy as np


class Individual:
    def __init__(self, x: np.ndarray, range: np.ndarray):
        self.x = x
        self.fitness = 0
        self.range = range

    def crossover(self, other) -> "Individual":
        # Single-point crossover: take the first k genes from self
        # and the rest from other, e.g. parents 000000 and 111111
        # with k = 2 produce the child 001111.
        if len(self.x) <= 2:
            k = np.random.randint(0, len(self.x) - 1)
        else:
            k = np.random.randint(1, len(self.x) - 1)
        out = np.zeros_like(self.x)
        out[:k] += self.x[:k]
        out[k:] += other.x[k:]
        return Individual(out, self.range)

    def mutate(self, prob):
        # Perturb each gene with probability `prob` by a uniform
        # offset in [-range, range].
        t = np.random.rand(self.x.shape[0]) <= prob
        if t.sum():
            self.x[t] += (2 * np.random.rand(self.x[t].shape[0]) - 1) * self.range[t]

    def mutate_single_variable(self):
        # Perturb exactly one randomly chosen gene.
        t = np.random.randint(0, self.x.shape[0])
        self.x[t] += (2 * np.random.rand() - 1) * self.range[t]

    def mutate_and_copy(self, prob: float = 0.05) -> "Individual":
        """Returns a mutated copy of an individual without modifying the original."""
        c = copy.deepcopy(self)
        c.mutate(prob)
        return c
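To make the crossover and mutation mechanics concrete, here is a small self-contained sketch of the same operations on plain numpy arrays (illustrative only; the real class additionally tracks fitness and per-parameter ranges):

```python
import numpy as np

rng = np.random.default_rng(0)

parent_a = np.zeros(6)
parent_b = np.ones(6)

# Single-point crossover at k = 2: first k genes from parent_a,
# the rest from parent_b.
k = 2
child = np.concatenate([parent_a[:k], parent_b[k:]])

# Mutation: each gene is perturbed with probability 0.5 by a
# uniform offset in [-range, range].
ranges = np.full(6, 0.1)
mask = rng.random(6) <= 0.5
mutated = child.copy()
mutated[mask] += (2 * rng.random(int(mask.sum())) - 1) * ranges[mask]
```

With k = 2 the child here is [0, 0, 1, 1, 1, 1], and every mutated gene stays within 0.1 of its original value, matching the per-parameter range used by from_filename.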
6.4.1.6 OptimMethod
This class serves as the base class for the optimisation methods we have been implementing. If you wish to create your own optimisation algorithm, you should base it off of the examples which inherit from this class.
In general, this class implements a few functions, namely:
- __init__ : initialises the optimisation method with important information, in particular init_x (the initial individual) and the config_runner (explained further down). Here you can also specify where to read from and where to save results to.
- optimise : starts the process of learning better parameters. Most importantly, it needs to execute the run method and save the best individual, which is returned. This function does not need to be tampered with or overloaded for the most part.
- run : the most important method, as it is where you will find the implementation of your optimisation algorithm. Each of the example optimisation algorithms overloads this function with its own specific functionality.
- get_fitness : runs the task for a given Individual and returns its fitness.
- from_filename : builds an Individual (and its labels) from a parameter file.
- save_indiv : writes the best individual's parameters to file.
class OptimMethod():
    """
    Base class for optimisation approach. This should provide utilities
    for common operations like:
    1. Reading a file
    2. Getting fitness
    3. etc.
    """
    def __init__(self, init_x: Individual,
                 labels: List[str],
                 unique_identifier: str = 'latest_run',
                 verbose: bool = True,
                 task_name: str = 'kick_ik') -> None:
        self.config_runner: ConfigRunner = ConfigRunner(task_name, 0, False, relative_path='..')
        self.verbose = verbose
        self.index = int(time.time())
        self.labels = labels
        self.init_x: Individual = init_x
        self.best_score_ever: float = -1
        self.save_dir = os.path.join('results', 'sa', f"mode_{self.config_runner.optim_task}", f"type_{self.config_runner.unit_type}", f"{get_date()}-norm_True_{self.config_runner.optim_task}")
        self.unique_identifier = unique_identifier
        os.makedirs('results/recent', exist_ok=True)
        self.overwrite_filename = os.path.join('results/recent', f'{unique_identifier}.txt')
        os.makedirs(self.save_dir, exist_ok=True)
        self.filename = os.path.join(self.save_dir, 'best_params.txt')

    def optimise(self, N: int) -> Tuple[Individual, float]:
        """Run this when doing optimisation.

        Args:
            N (int): The number of steps to perform.

        Returns:
            Tuple[Individual, float]: Best indiv, best score
        """
        best_indiv, best_score = self.run(N)
        self.save_indiv(best_indiv, best_score)
        return best_indiv, best_score

    def run(self, N: int) -> Tuple[Individual, float]:
        """You need to implement this. Return the best individual and
        score after doing optimisation.
        """
        raise NotImplementedError("Do this in the subclass")

    def get_fitness(self, agent: Individual) -> float:
        idx = self.index
        val = self.config_runner.denorm(agent.x, self.labels)
        write_to_file('out_params%d.txt' % (idx), self.labels, val)
        p = Process(target=lambda: self.config_runner.run_robocup(idx, val))
        s = timer()
        p.start()
        p.join()
        e = timer()
        if self.verbose:
            print(f"One run took {round((e - s) * 100) / 100}s")
        a_file = "out%d.txt" % idx
        a_exist = os.path.exists(a_file)
        total_time_waiting = 0
        P = 30
        # Poll for the output file, giving up after P seconds.
        while not a_exist:
            time.sleep(5)
            total_time_waiting += 5
            a_exist = os.path.exists(a_file)
            if total_time_waiting >= P:
                break
        if total_time_waiting >= P and not a_exist:
            print("Waited very long for out.txt to load. Going to kill all simulators and do something with the fitnesses")
            kill_all_simulators()
            # now for the files that weren't available
            fitness = -1000
        else:
            S = read_from_file("./out%d.txt" % (idx))
            fitness = float(S[0])
            try:
                os.unlink(a_file)
                os.unlink('out_params%d.txt' % (idx))
            except Exception as e:
                print("Error deleting", e)
        return fitness

    @staticmethod
    def from_filename(filename: str, range: float = 0.1) -> Tuple[Individual, List[str]]:
        labels, values = get_labels_and_values(read_from_file(filename))
        values = np.array(values)
        range = np.ones_like(values) * range
        return (Individual(values, range=range), labels)

    def save_indiv(self, agent: Individual, score: float = -1):
        if self.best_score_ever < score:
            self.best_score_ever = score
            self.best_agent = copy.deepcopy(agent)
            # write to a file.
            print(f"\tWriting best results (with score {self.best_score_ever} now) to {self.filename} and {self.overwrite_filename}")
            write_to_file(self.filename, self.labels, self.config_runner.denorm(agent.x, self.labels))
            write_to_file(self.overwrite_filename, self.labels, self.config_runner.denorm(agent.x, self.labels))
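When implementing your own method, only run needs to be overloaded. The following is a minimal, self-contained sketch of that pattern as a simple hill climber. All names here are our own illustration: Individual is a cut-down stand-in for the codebase's class, and get_fitness is replaced by a toy objective instead of launching the simulator:

```python
import copy
from typing import Tuple

import numpy as np


class Individual:
    """Minimal stand-in for the codebase's Individual class."""
    def __init__(self, x, range):
        self.x = x
        self.range = range

    def mutate_and_copy(self, prob=0.05):
        c = copy.deepcopy(self)
        mask = np.random.rand(c.x.shape[0]) <= prob
        c.x[mask] += (2 * np.random.rand(int(mask.sum())) - 1) * c.range[mask]
        return c


class HillClimb:
    """Sketch of an OptimMethod-style subclass: only run() is new,
    and get_fitness is stubbed with a toy objective."""
    def __init__(self, init_x: Individual):
        self.init_x = init_x
        self.best_score_ever = -np.inf

    def get_fitness(self, agent: Individual) -> float:
        # Toy objective: prefer parameters close to 1.
        return -float(np.sum((agent.x - 1.0) ** 2))

    def run(self, N: int) -> Tuple[Individual, float]:
        current = self.init_x
        current_score = self.get_fitness(current)
        for _ in range(N):
            candidate = current.mutate_and_copy(0.5)
            score = self.get_fitness(candidate)
            if score > current_score:  # keep only improvements
                current, current_score = candidate, score
        self.best_score_ever = current_score
        return current, current_score
```

In the real codebase get_fitness stays in the base OptimMethod and launches run_robocup; your subclass would only supply the search loop in run.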
6.4.1.7 ConfigRunner
6.4.2 Workflow
1. You run run.sh.
2. This runs run_optimisation.py with the correct paths.
3. This initialises our defined OptimMethod and loads our initial paramfile.
4. During this initialisation a ConfigRunner is set up, which contains a function called run_robocup. This is used to actually set up and run the RoboCup environment, from which we can then obtain fitness information.
5. Inside run_optimisation.py we call optimise.
6. The optimise method calls the run method, whose base implementation is found in OptimMethod; however, it is overloaded in your own implementation, as can be seen in examples such as RandomSearch.py.
7. Inside run we call get_fitness on the current Individual, after which we enter the training loop, as seen here in the run method of RandomSearch.py:
for i in range(N):
    print(f"Step {i + 1} / {N}. Current score = {np.round(current_score, 2)}. Overall best = {self.best_score_ever}")
    # get a new, random individual and score
    new_indiv = current_indiv.mutate_and_copy(0.05)
    new_score = self.get_fitness(new_indiv)
    # Swap if better
    if current_score < new_score:
        current_indiv, current_score = new_indiv, new_score
    # Save every now and then
    if i % 100 == 0:
        self.save_indiv(current_indiv, current_score)
8. Inside get_fitness, which is implemented in the base OptimMethod class, we define a new Process which runs the run_robocup method defined by the ConfigRunner, as mentioned above.
9. Inside run_robocup we run the sample_start-optimization.sh script, which starts the server (and the visualiser, if selected) and loads the agent defined to complete the designated task.
10. Inside sample_start-optimization.sh we start the server as follows:
rcssserver3d --agent-port $SPARK_AGENTPORT --server-port $SPARK_SERVERPORT &
PID=$!
We then start the visualiser if the flag is chosen:
#To view task while it runs uncomment the following line
if [ "$do_vis" == "true" ]; then
echo "DOING VIS"
/roboviz/roboviz.sh --serverPort=$SPARK_SERVERPORT &
fi
#
Lastly, we load the desired agent, which is the behaviour type representing the task we want to optimise:
$DIR_SCRIPT/../agentspark --unum 2 --type $TYPE --paramsfile $DIR_SCRIPT/../paramfiles/defaultParams.txt --paramsfile $DIR_SCRIPT/../paramfiles/defaultParams_t$TYPE.txt --paramsfile $PARAMS_FILE --experimentout $OUTPUT_FILE --optimize $task --port $SPARK_AGENTPORT --mport $SPARK_SERVERPORT &
11. The agent's main loop will then begin executing, as defined in main.cc:
while (gLoop)
{
    GetMessage(msg);
    string msgToServer = behavior->Think(msg);
    // To support agent sync mode
    msgToServer.append("(syn)");
    PutMessage(msgToServer);
    if (mPort != -1) {
        PutMonMessage(behavior->getMonMessage());
    }
}
12. The agent's state will be initialised as it would be had you simply run start.sh, which results in the agent's beam function being called. This function is defined in optimizationbehaviors.cc for the corresponding agent type you have chosen for this optimisation.
13. Inside this loop the Think behaviour is called, which in turn calls this->updateFitness(); this is our overloaded updateFitness defined for the particular task inside optimizationbehaviors.cc.
14. In addition to updateFitness(), the Think method calls act, which in turn calls selectSkill(). This is once again the overloaded selectSkill() defined for the particular task within optimizationbehaviors.cc.
15. Inside this same updateFitness() method you will find a counter, in most cases called kick, which keeps track of how many runs of the optimisation task have occurred. This is because for each training step we perform a number of runs of the optimisation task; the fitness which is ultimately returned is the average over these runs, as can be seen here:
if (kick == 10) {
    writeFitnessToOutputFile(totalFitness/(double(kick)));
    return;
}
For this example we average over 10 runs. One can also consider the updateFitness() function to be quite similar to a reward function in reinforcement learning.
16. writeFitnessToOutputFile will then write the average fitness value to an output file.
17. The gLoop will then end, after which the Process p will finish and return to the get_fitness method.
18. get_fitness will then read the output file that was written and return the fitness value stored in it back to the run method, repeating the process from step 7 until the total number of training episodes has been completed.