This will be very hastily written because I've been on the road for two days and did not get the chance to do the work I wanted to complete. This also means I won't reveal the stupid inspiration I have for this game until next week, but as a teaser, it has to do with penguins.
I reviewed the footage of the best-performing agents from last week and realized they were not performing as well as I thought. They were collecting food, but would only think to return it to their nest if they happened to be close by.
In the video below, the agents have different colors so they can be recognized easily; each shares the color of its respective nest.
Clearly, most of the agents don't do exactly what's desired. This week (or the two days of it I was able to work), I tried training them again to see what the issue was, and it seems they pursue a piece of food whenever the opportunity presents itself, even when they are full and thus cannot pick up any more food. I introduced a much more gradual curriculum of some fourteen lessons. That curriculum initially allows the agents to collect five pieces of food, so they can learn to collect food and avoid poison, and then gradually reduces the food drop-off radius (how close they have to be to their nest to "drop off" food) from 35 meters to zero. Even so, they weren't doing so hot.
I went back and played the game myself to see how much food I could collect, playing really well, in the five thousand steps an agent is allotted before the game resets. I managed to get eight or nine pieces of food back to my nest, so I figured a decently performing agent should manage at least five. I once again redesigned the curriculum to reflect these (MUCH) higher expectations (my earlier, lower expectations being, I presume, part of the reason the agents weren't performing adequately):
The new curriculum includes thirteen lessons. The first one just requires the agents to learn that food = good. The next several only allow an agent to collect one piece of food before it must drop that piece off at its nest to acquire more. The drop-off radius gradually decreases to zero before the agents are allowed to collect two, three, four, and finally five pieces of food between pit stops at their nest.
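To make the progression concrete, here's a minimal sketch of a thirteen-lesson schedule like the one described above. This is illustrative Python, not the actual Unity/ML-Agents training config: the lesson names, the specific radius values, and the `Lesson` structure are all my own invention for the example.

```python
# Hypothetical sketch of a thirteen-lesson curriculum: one "food = good"
# warm-up, several one-food lessons with a shrinking drop-off radius,
# then a growing carrying capacity. Values are illustrative only.
from dataclasses import dataclass


@dataclass
class Lesson:
    name: str
    max_carried_food: int   # food an agent may hold before it must drop off
    drop_off_radius: float  # meters from the nest that count as a drop-off


def build_curriculum() -> list[Lesson]:
    # Lesson 1: generous limits, just learn that food is rewarding.
    lessons = [Lesson("food_is_good", max_carried_food=5, drop_off_radius=35.0)]
    # Lessons 2-9: carry one piece while the radius shrinks to zero.
    for radius in (35.0, 30.0, 25.0, 20.0, 15.0, 10.0, 5.0, 0.0):
        lessons.append(Lesson(f"carry_1_radius_{radius:g}", 1, radius))
    # Lessons 10-13: raise the carrying capacity one piece at a time.
    for capacity in (2, 3, 4, 5):
        lessons.append(Lesson(f"carry_{capacity}", capacity, 0.0))
    return lessons
```

In a real ML-Agents setup these thresholds would live in the trainer's curriculum configuration rather than in code; the sketch just shows the shape of the progression.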
Before trying out this new curriculum, I thought a bit about how I could reduce the input space for the neural networks. I decided that the poison balls wouldn't really have a function in my final game, so I removed them entirely. I also decided that since the agents are already passed the direction and distance to their own nest as float observations, they don't need to see the nest via the RayPerceptionSensorComponent3D, and that seeing frozen agents wasn't useful either. Together, these changes removed three observable tags from the RayPerceptionSensorComponent3D.
These modifications unfortunately make the game somewhat similar to the penguin example, the differences being the presence of other, competitive agents, and the ability to become aggressive. That being said, on some initial training attempts before I had fully implemented all of these changes, I was happy to find that the agents would sometimes deliberately pursue and attack other agents.
This was especially satisfying because they weren't receiving any kind of reward for attacking other agents. They were just being assholes.
Next week, I'll have a playable web build up of the game complete with intelligent and aggressive agents. Until then, here I am, sitting in a hotel, about 70 thousand iterations into my last attempt at training these magnanimously stupid cube children before setting out on the road for another six hour drive.