Sunday, 1 January 2023

RLlib: Collecting metrics for different variations of the same experiment

I have a multi-agent reinforcement learning experiment that I'm running with RLlib. I would like to take a certain measurement, and I'm not sure which tool to use or how to use it.

I wrote a multi-agent environment that implements a certain game for two agents. I train two PPO policies on that environment for 200 training iterations, collecting metrics about the agents' performance in each iteration. As training progresses, the mean reward rises until it reaches a plateau. Besides the reward, I measure other interesting metrics, one of them being "Metric X".

Now, when writing this environment, one of the decisions I make is what kind of data I expose to the agents in their observations, and how that data is expressed. I would like to compare the Metric X results I get for different ways of representing the observation.

I came up with 4 different ways of setting up the observation; let's call them 4 "arrangements". Right now I switch between the arrangements by commenting out different sections of the code that computes the observation. This is clunky, of course.
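One common way to avoid commenting code in and out is to make the arrangement a parameter of the environment. RLlib passes the `env_config` dictionary from the algorithm config to the environment's constructor, so the environment can read a key from it and pick an observation function. The sketch below is plain Python and assumes a hypothetical `"arrangement"` key, hypothetical arrangement names `"a"` to `"d"`, and made-up state fields; in a real environment the observation space would also have to match the chosen arrangement.

```python
from typing import Callable, Dict, List

# Hypothetical game state; the real environment would build this from the game.
GameState = Dict[str, float]


# Each "arrangement" is one way of turning the game state into an observation.
# The fields used here are placeholders, not the original environment's data.
def arrangement_a(state: GameState) -> List[float]:
    return [state["my_score"], state["opponent_score"]]


def arrangement_b(state: GameState) -> List[float]:
    return [state["my_score"] - state["opponent_score"]]


def arrangement_c(state: GameState) -> List[float]:
    return [state["my_score"], state["opponent_score"], state["turn"]]


def arrangement_d(state: GameState) -> List[float]:
    return [state["turn"]]


ARRANGEMENTS: Dict[str, Callable[[GameState], List[float]]] = {
    "a": arrangement_a,
    "b": arrangement_b,
    "c": arrangement_c,
    "d": arrangement_d,
}


class ObservationBuilder:
    """Selects the observation function once, from the env_config dict."""

    def __init__(self, env_config: dict):
        # "arrangement" is a hypothetical key; "a" is used when it is absent.
        name = env_config.get("arrangement", "a")
        if name not in ARRANGEMENTS:
            raise ValueError(f"Unknown arrangement: {name!r}")
        self.arrangement = name
        self._build = ARRANGEMENTS[name]

    def observe(self, state: GameState) -> List[float]:
        return self._build(state)
```

An environment's constructor could create an `ObservationBuilder(config)` and call `observe` wherever it currently computes the observation, so switching arrangements becomes a config change rather than a code edit.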

I would like to do many runs of 200 training iterations on this environment under each of the 4 arrangements. For each arrangement, I'd like to see the values it produces for Metric X on the 200th training iteration. I'm not just looking for the arrangement with the highest Metric X, nor even the mean Metric X per arrangement. I want a list of all the Metric X results for each arrangement, so that I can compute the mean and standard deviation and plot histograms of them.
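Once each run's final Metric X is known, the per-arrangement analysis described above is a group-by followed by summary statistics. The sketch below assumes the results arrive as hypothetical `(arrangement, metric_x)` pairs; it uses only the standard library, and the histogram is computed as simple bin counts (plotting them, e.g. with matplotlib's `hist`, would be a separate step).

```python
import statistics
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_arrangement(results: Iterable[Tuple[str, float]]) -> Dict[str, List[float]]:
    """Collect every final Metric X value into one list per arrangement."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for arrangement, metric_x in results:
        grouped[arrangement].append(metric_x)
    return dict(grouped)


def summarize(values: List[float], bins: int = 10) -> dict:
    """Count, mean, sample standard deviation and equal-width histogram counts."""
    mean = statistics.mean(values)
    # stdev needs at least two values; report 0.0 for a single run.
    std = statistics.stdev(values) if len(values) > 1 else 0.0
    lo, hi = min(values), max(values)
    # Fall back to a width of 1.0 when all values are equal.
    width = (hi - lo) / bins or 1.0
    counts = [0] * bins
    for v in values:
        # Clamp so that the maximum value lands in the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return {"n": len(values), "mean": mean, "std": std, "hist": counts}
```

Keeping the full list per arrangement (rather than a running mean) is what makes the standard deviation and histogram possible afterwards.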

What tool, framework or method should I use to collect this information? Is Ray Tune fit for this job? I need a way to programmatically tell Ray to include certain values in the observation in some runs and not in others. I need to run Ray many times and have these metrics automatically collected and aggregated by arrangement. I could write the logic for this myself, but I want to know how it is usually done.
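Ray Tune can express this kind of sweep: `tune.grid_search` over the arrangement values, combined with `num_samples`, repeats the whole grid that many times, and each trial's last reported result can then be grouped by its config. Where these options are set differs between Ray versions, so the sketch below stays framework-free and only shows the grid-and-repeat logic. `train_fn` is a hypothetical stand-in for one full 200-iteration RLlib training run that returns one result dict per iteration, and the `"metric_x"` key and `"seed"` field are assumptions.

```python
import itertools
from collections import defaultdict
from typing import Callable, Dict, List, Sequence

# Hypothetical stand-in for one full training run: it receives a config dict
# and returns one result dict per training iteration (e.g. 200 of them).
TrainFn = Callable[[dict], List[dict]]


def run_sweep(
    train_fn: TrainFn,
    arrangements: Sequence[str],
    num_samples: int,
    metric: str = "metric_x",
) -> Dict[str, List[float]]:
    """Run every arrangement num_samples times; keep the metric from the last iteration."""
    final_values: Dict[str, List[float]] = defaultdict(list)
    # Like grid_search + num_samples: the full grid is repeated num_samples times.
    for sample, arrangement in itertools.product(range(num_samples), arrangements):
        # Each repetition gets its own seed; within one repetition all
        # arrangements share that seed.
        config = {"env_config": {"arrangement": arrangement}, "seed": sample}
        history = train_fn(config)
        # The last entry corresponds to the final (e.g. 200th) training iteration.
        final_values[arrangement].append(history[-1][metric])
    return dict(final_values)
```

The returned dict maps each arrangement to the list of final Metric X values, which is the shape needed for per-arrangement means, standard deviations and histograms.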


