For better or worse, another week without a dumb title
This week, I continued training the agents to connect ice slabs to their nests. With the reward setup I had last week, they weren't improving much past the point where I left off in last week's blog post, so I tried various methods to improve their performance:
- I removed the other agents from the scene and introduced a curriculum: a single agent started out in a play area of reduced size, with fewer ice slabs. This was to encourage association of ice slabs and the nest with rewards, since there wasn't much space to do anything else. As the agent hit reward thresholds, both the size of the play area and the number of ice slabs in it increased. This setup didn't work much better than the one I had last week.
- I tried to allow the agents to "grab" an ice slab and hold it in front of them. I was hoping that holding onto an object and then just having to move toward the nest would be simpler to learn than the delicate physics of aiming and pushing a square-shaped ice slab across a frictionless surface. Despite training with this setup and the aforementioned curriculum overnight (and only hitting 450 thousand iterations across those seven hours), this, too, performed poorly. To my dismay, the agents also went back to running face first into the wall and staying there for the rest of the training episode. This wastes a lot of training time, so I reduced the agent's max step count from 5000 to 2000 and added a punishment for touching or sticking to a wall. I retrained the agents with this new punishment, and as if to mock me, they learned to wedge a single ice slab between themselves and the wall, thus avoiding the penalty while still being as close as possible to their beloved. A forbidden romance that agonizes only the one who wrote it. Or trained it, in this case.
- I was pretty frustrated by this point, as I typically have been throughout this project, and considered the fact that the game is now fairly different from the original ML-Agents example on which it was based (FoodCollector). I recalled that there's an ML-Agents example called PushBlock, where the agent's only goal is to move a single block to a green target strip. It can see all of these things through raycasts, and the play area is VERY small. This example is considerably less complex than mine due to the reduced size of the play area and the smaller number of interactable objects and actions the agent can take. Even so, I tried to learn from it by turning the nest into a strip (like the goal strip in the PushBlock example) that gradually reduces in size through a curriculum. I also checked out the "Visual" variations on many of the ML-Agents examples (for instance, VisualPushBlock). The "Visual" version of an ML-Agents example replaces raycast input with input from an 84 by 84 camera. My agents were set up with both raycasts and a camera, so I not only got rid of the raycasts, but also removed the vector observations that told the agent the direction and distance to their nest. After that, the only input they'd receive was the camera input, the color of their nest as three floats, and whether or not they're aggressive.
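For anyone curious what the curriculum from the first attempt boils down to, here's a minimal sketch in plain Python. It is not the actual ML-Agents configuration I used, and it isn't meant to mirror the ML-Agents API: the `Lesson` and `Curriculum` names, the area sizes, the slab counts, the thresholds and the averaging window are all made up for illustration. The only idea taken from above is that the play area and the number of ice slabs grow once the agent's reward passes a threshold.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Lesson:
    area_size: float   # side length of the play area (hypothetical units)
    num_slabs: int     # ice slabs spawned per episode
    threshold: float   # mean episode reward needed to move on (hypothetical)


# Hypothetical lesson values, chosen only for illustration.
LESSONS = [
    Lesson(area_size=10.0, num_slabs=1, threshold=0.5),
    Lesson(area_size=20.0, num_slabs=3, threshold=1.0),
    Lesson(area_size=40.0, num_slabs=6, threshold=float("inf")),  # final lesson
]


class Curriculum:
    """Tracks recent episode rewards and advances to harder lessons."""

    def __init__(self, lessons, window=100):
        self.lessons = lessons
        self.index = 0
        self.window = window
        self.rewards = deque(maxlen=window)

    @property
    def current(self):
        return self.lessons[self.index]

    def record_episode(self, episode_reward):
        """Store one episode's reward and return the lesson to use next.

        The lesson advances only when the window is full and the mean
        reward over that window reaches the current lesson's threshold.
        """
        self.rewards.append(episode_reward)
        window_full = len(self.rewards) == self.window
        mean_reward = sum(self.rewards) / len(self.rewards)
        is_last = self.index == len(self.lessons) - 1
        if window_full and not is_last and mean_reward >= self.current.threshold:
            self.index += 1
            self.rewards.clear()  # measure the harder lesson from scratch
        return self.current
```

Averaging over a window rather than reacting to a single good episode keeps one lucky run from bumping the agent into a bigger play area before it has actually learned anything.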
I cut that final attempt at training short after receiving much-needed discouragement from the sparse online inquiries into the poorly documented features of ML-Agents, specifically concerning the use of camera input. I was already well aware that introducing such a massive input space would make the game much tougher to learn. Now, with incredibly limited time left, I'm certain this is not the way I should go.
My latest attempt is set up the same as the first attempt above, but without the use of a camera. Last week, I used a similar setup, but without the curriculum or the punishment for touching the wall. I'm hoping the combination of these two things will produce better results than last week's.
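The wall punishment and the shorter episode are simple enough to sketch in a few lines of Python. This is only an illustration, not my actual agent code: the 2000-step cap is the real number from above, but the penalty and reward values and the function names are hypothetical.

```python
from itertools import repeat

MAX_STEPS = 2000        # episode cap from the post (lowered from 5000)
WALL_PENALTY = -0.01    # hypothetical per-step punishment for touching a wall
SLAB_REWARD = 1.0       # hypothetical reward per ice slab connected to the nest


def step_reward(touching_wall, slabs_connected):
    """Reward for one step: slab bonus plus a penalty while touching a wall."""
    reward = SLAB_REWARD * slabs_connected
    if touching_wall:
        # Only direct agent-to-wall contact is checked here, so an agent
        # with an ice slab wedged between itself and the wall is not punished.
        reward += WALL_PENALTY
    return reward


def episode_return(steps):
    """Sum rewards over (touching_wall, slabs_connected) pairs, up to MAX_STEPS."""
    total = 0.0
    for step_index, (touching_wall, slabs_connected) in enumerate(steps):
        if step_index >= MAX_STEPS:
            break  # the episode ends at the step cap
        total += step_reward(touching_wall, slabs_connected)
    return total


# An agent that hugs the wall forever is cut off at the step cap.
wall_hugger_return = episode_return(repeat((True, 0)))
```

With the cap in place, an agent stuck on a wall burns at most 2000 steps before the episode resets, and the per-step penalty makes that stretch clearly worse than doing nothing at all.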
In this final week, I'll prioritize polishing the game build, and I'll retrain the agents to see if I can get them to perform better.