Thank you so much for any help! It would be greatly appreciated!

You may be very interested to know that there was a bug in the v2 lidar tracing. It made the agent think there were phantom objects, and the lidar sometimes intersected with the walker's own legs.

Before fix: the lidar traces through the ground and hits the side of a pit. This gives the agent the impression of a "phantom canyon" in front of the pit, which only appears as it approaches the pit.

After fix: the lidar is stopped by terrain, even when another object is behind it.

After triple checking the docs, I've submitted a minor tweak (returning -1 instead of 1 for an object that should be ignored). Legs now seem to be correctly ignored, and the traces are accurate in all situations!

Finding this bug makes me even more impressed that anyone has solved BipedalWalkerHardcore-v2: the lidar observations have been inconsistent and incorrect, returning the furthest hit instead of the closest. It seems to me that solutions to BipedalWalkerHardcore-v2 have not just learned to deal with the complex environment. They have gone a step further and learned to deal with sensory hallucinations as well. They jump at the slightest hint of a cube and keep running even when it looks like there is no ground below their feet, relying more on the touch sensor than the lidar, or perhaps recognising the difference in "shape" between a real pit and a "fake pit" (a real pit has a floor).

BipedalWalkerHardcore-v2 has been bumped to BipedalWalkerHardcore-v3 with these fixes as of Jan 31, 2020. You might want to try retraining your agent now! (Although it is still a difficult environment to solve.)

To expand on why DDPG doesn't solve it, even though the buggy BipedalWalkerHardcore-v2 is solvable: the solution landscape of this problem is as full of pits as the environment itself. To leap over a pit, for example, the agent must perform a complex sequence of actions that is difficult to discover by random chance, not to mention all of the other things it needs to learn to be successful. Each time it fails, it learns that being close to a pit is highly likely to result in a large penalty. In an effort to maximise its rewards, an agent trained with a naive method like DDPG will often remain stationary, because the rewards for doing that are higher than for trying and falling into the pit once more. In short, vanilla DDPG lacks the exploration power to find the complex series of actions required before it converges on not going near the pit.

Of the very few published examples that have solved it, one used Evolution Strategies, a gradient-free method that essentially tries millions of policies and evaluates them, and the other used a custom A3C method tailored to solve this particular environment. Both had high computational and sample requirements.
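The -1 versus 1 detail in the lidar fix comes down to how Box2D ray casts use the callback's return value: a negative value ignores the fixture, 0 terminates the cast, and a positive value clips the ray to that fraction, so returning 1 means "don't clip, keep going". Below is a simplified pure-Python model of that protocol, not Box2D or the gym source itself. It assumes the lidar callback clips the ray by returning the hit fraction for solid fixtures; the names `Hit`, `ray_cast` and `lidar_reading`, and the hit ordering, are hypothetical and chosen to reproduce the "furthest hit instead of closest" symptom.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Hit:
    name: str
    fraction: float  # position along the ray: 0.0 at the sensor, 1.0 at max range
    ignored: bool    # True for fixtures the lidar should skip (e.g. the walker's legs)


def ray_cast(hits: List[Hit], report_fixture: Callable[[Hit], float]) -> None:
    """Simplified model of Box2D's ray-cast clipping protocol.

    Hits are visited in broad-phase order, which is NOT sorted by distance.
    The callback's return value controls the ray:
      < 0  -> ignore this fixture and keep the current clip
      == 0 -> terminate the ray cast
      > 0  -> clip the ray to this fraction (1.0 means "don't clip")
    """
    max_fraction = 1.0
    for hit in hits:
        if hit.fraction > max_fraction:
            continue  # beyond the clipped segment, so it is never reported
        value = report_fixture(hit)
        if value == 0.0:
            return
        if value > 0.0:
            max_fraction = value


def lidar_reading(hits: List[Hit], ignore_value: float) -> Optional[str]:
    """Return the name of the hit the lidar ends up reporting."""
    result: Dict[str, str] = {}

    def report_fixture(hit: Hit) -> float:
        if hit.ignored:
            return ignore_value  # the value under discussion: 1 (v2) vs -1 (fix)
        result["hit"] = hit.name
        return hit.fraction  # assumption: solid hits clip the ray to this point

    ray_cast(hits, report_fixture)
    return result.get("hit")


# Hypothetical reporting order: ground first, then a leg, then a far pit wall.
hits = [
    Hit("ground", 0.3, ignored=False),
    Hit("own leg", 0.1, ignored=True),
    Hit("far pit wall", 0.8, ignored=False),
]
print(lidar_reading(hits, ignore_value=1.0))   # -> far pit wall
print(lidar_reading(hits, ignore_value=-1.0))  # -> ground
```

In this model, returning 1 for the ignored leg resets the ray to full length after it has already been clipped at the ground, so the farther pit wall gets reported and overwrites the reading. Returning -1 leaves the clip in place, so the closest solid hit wins.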
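For readers unfamiliar with Evolution Strategies, here is a minimal, generic ES loop showing the "try many policies and evaluate them" idea: perturb the parameters with Gaussian noise, score each perturbed version, and nudge the parameters towards the perturbations that scored above average. This is only a sketch, not the method from the published BipedalWalkerHardcore solution; the function names, the toy quadratic `toy_fitness` standing in for an episode return, and all hyperparameters are illustrative assumptions.

```python
import random
from typing import Callable, List


def evolution_strategy(
    fitness: Callable[[List[float]], float],
    dim: int,
    population: int = 50,
    sigma: float = 0.1,
    learning_rate: float = 0.05,
    iterations: int = 200,
    seed: int = 0,
) -> List[float]:
    """Gradient-free search: no backprop, only fitness evaluations."""
    rng = random.Random(seed)
    theta = [0.0] * dim  # current parameter vector (the "policy")
    for _ in range(iterations):
        # Sample Gaussian perturbations and evaluate each perturbed policy.
        noises = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(population)]
        scores = [fitness([t + sigma * e for t, e in zip(theta, eps)]) for eps in noises]
        # Subtract the mean score as a baseline to reduce variance.
        baseline = sum(scores) / population
        # Move theta towards perturbations that scored above average.
        for i in range(dim):
            grad_i = sum((s - baseline) * eps[i] for s, eps in zip(scores, noises))
            theta[i] += learning_rate * grad_i / (population * sigma)
    return theta


TARGET = [0.5, -0.3, 0.8]  # hypothetical optimum for the toy problem


def toy_fitness(params: List[float]) -> float:
    # Stand-in for an episode return: highest (0.0) when params == TARGET.
    return -sum((p - t) ** 2 for p, t in zip(params, TARGET))


best = evolution_strategy(toy_fitness, dim=3)
print([round(b, 2) for b in best])
```

Each iteration evaluates `population` perturbed parameter vectors, so even this toy run performs 10,000 fitness evaluations. With a neural-network policy and a full episode per evaluation, that hints at why such methods need so many samples.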