Those who remain hardcore

11/11/2022


To learn to leap over a pit in the environment, for example, the agent must perform a complex sequence of actions that is difficult to discover by random chance, not to mention all of the other things it needs to learn to be successful. Each time it fails, it learns that being close to a pit is highly likely to result in a large penalty. In an effort to maximise its rewards, an agent trained with a naive method like DDPG will often remain stationary, as the rewards for doing that are higher than for trying and falling into the pit once more. In short, vanilla DDPG lacks enough exploration power to find the complex series of actions required before it converges on not going near the pit.

Of the very few published examples that have solved this environment, one used an Evolutionary Strategy, a gradient-free method that essentially tries millions of policies and evaluates them, and another used a custom A3C method tailored to this particular environment. Both had high computational and sample requirements.
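The stand-still behaviour described above can be illustrated with a rough expected-return comparison. This is only a sketch: the function names and every number in it (the reward for crossing, the penalty for falling, the reward for idling, the success probabilities) are hypothetical values chosen for illustration, not figures from the environment.

```python
def expected_return(p_success, reward_cross, penalty_fall):
    """Expected return of attempting the jump with success probability p_success."""
    return p_success * reward_cross + (1.0 - p_success) * penalty_fall


def break_even_probability(reward_cross, penalty_fall, reward_idle):
    """Success probability at which attempting the jump matches standing still."""
    return (reward_idle - penalty_fall) / (reward_cross - penalty_fall)


def prefers_standing_still(p_success, reward_cross, penalty_fall, reward_idle):
    """True when idling has a higher expected return than attempting the jump."""
    return reward_idle > expected_return(p_success, reward_cross, penalty_fall)


# Hypothetical numbers: a modest reward for crossing, a large penalty for falling.
REWARD_CROSS = 20.0
PENALTY_FALL = -100.0
REWARD_IDLE = 0.0

if __name__ == "__main__":
    print(break_even_probability(REWARD_CROSS, PENALTY_FALL, REWARD_IDLE))
    print(prefers_standing_still(0.05, REWARD_CROSS, PENALTY_FALL, REWARD_IDLE))
```

Under these assumed numbers the agent would need to clear the pit roughly five times out of six before attempting the jump pays off, which is hard to reach when early attempts succeed only by chance.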


To expand on why DDPG doesn't solve it even though BipedalWalkerHardcore-v2, although buggy, is solvable: the solution landscape to this problem is as full of pits as the environment itself.

You may be very interested to know that there was a bug in the v2 lidar tracing, making the agent think there were phantom objects, and sometimes intersecting with its own legs.

After triple checking the docs, I've submitted a minor tweak (returning -1 instead of 1 for an object that should be ignored). It now seems legs are correctly ignored, and the traces are accurate in all situations!

Before the fix, lidar traces through the ground and hits the side of a pit, giving the agent the impression of a "phantom canyon" in front of the pit that only appears as it approaches the pit.

After the fix, lidar is stopped by terrain, even when another object is behind it.

Finding this bug makes me even more impressed anyone has solved BipedalWalkerHardcore-v2. It seems the observations from lidar have been inconsistent and incorrect, returning the furthest hit result instead of the closest. It seems to me that solutions to BipedalWalkerHardcore-v2 have not just learned to deal with the complex environment, but have advanced a step further and are able to deal with both the complex environment and sensory hallucinations causing them to jump at the slightest hint of a cube and keep running even when it looks like the ground is not visible below their feet, relying more on the touch sensor than the lidar, or perhaps recognising the difference in "shape" between a real pit and a "fake pit" (a real pit has a floor).

BipedalWalkerHardcore-v2 has been bumped to BipedalWalkerHardcore-v3 with these fixes as of Jan 31, 2020. You might want to try retraining your agent now (although it is still a difficult environment to solve)!
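To make the "furthest hit instead of closest" symptom concrete, here is a toy model of how the return value used for an ignored object can change what a lidar ray reports. It is not the gym or Box2D code: `cast_ray`, `Lidar`, `lidar_reading`, and the fixture list are hypothetical, and the return-value convention (a negative value ignores the object, 0 stops the cast, a fraction clips the ray, 1 leaves the ray unclipped) is my understanding of Box2D-style ray cast callbacks.

```python
def cast_ray(fixtures, report):
    """Toy ray cast over (is_terrain, fraction) pairs, visited in list order."""
    max_fraction = 1.0  # the ray starts at full length
    for is_terrain, fraction in fixtures:
        if fraction > max_fraction:
            continue  # hit lies beyond the clipped ray, so it is never reported
        result = report(is_terrain, fraction)
        if result == 0:
            break  # 0 terminates the ray cast
        if result > 0:
            max_fraction = result  # clip the ray (a value of 1 un-clips it)
        # a negative result ignores the object and leaves the ray unchanged


class Lidar:
    """Records the last terrain hit it was told about (hypothetical helper)."""

    def __init__(self, ignore_value):
        self.ignore_value = ignore_value  # what to return for non-terrain objects
        self.fraction = 1.0  # 1.0 means nothing was hit

    def report(self, is_terrain, fraction):
        if not is_terrain:
            return self.ignore_value
        self.fraction = fraction
        return fraction  # clip the ray to this hit


def lidar_reading(fixtures, ignore_value):
    lidar = Lidar(ignore_value)
    cast_ray(fixtures, lidar.report)
    return lidar.fraction


# Hypothetical visit order: ground, then the walker's own leg, then a pit wall.
FIXTURES = [(True, 0.3), (False, 0.1), (True, 0.8)]

if __name__ == "__main__":
    print("ignore with  1:", lidar_reading(FIXTURES, 1))
    print("ignore with -1:", lidar_reading(FIXTURES, -1))
```

In this model, returning 1 for the leg resets the ray to full length, so the pit wall behind the ground is reported and overwrites the closer ground hit. Returning -1 leaves the ray clipped at the ground, so the closer hit survives.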

As the question suggests, I'm trying to see if I can solve OpenAI's hardcore version of their gym's bipedal walker using OpenAI's DDPG algorithm.

Below are the hyperparameters from my latest attempt, along with some other attempts I've made. I realise it has been solved using other custom implementations (also utilising only dense layers in TensorFlow, not convolution), but I don't understand why it seems so difficult to solve using OpenAI's implementation of DDPG. Can anyone please point out where I might be going wrong? Thank you so much for any help!

Env interacts: about 8.4 million (around 2600 epochs).
Network: 512, 256 (relu activation on inputs, tanh on outputs).

Similar experiments yielded similar scores (or less), and included:

Number of epochs ranging from 500 all the way to 100000.
Replay memory sizes all the way up to 5000000.
All of the above, with both DDPG as well as TD3.
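For readers unfamiliar with the replay memory whose size is varied in the experiments above, here is a minimal sketch of the idea: a fixed-capacity buffer of past transitions that drops the oldest entries and is sampled uniformly at random. The `ReplayMemory` class and its method names are hypothetical and are not taken from OpenAI's DDPG implementation.

```python
import random
from collections import deque


class ReplayMemory:
    """Fixed-capacity store of (obs, action, reward, next_obs, done) transitions."""

    def __init__(self, capacity):
        # A deque with maxlen silently discards the oldest item once full.
        self.buffer = deque(maxlen=capacity)

    def add(self, obs, action, reward, next_obs, done):
        self.buffer.append((obs, action, reward, next_obs, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement from the stored transitions.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)


if __name__ == "__main__":
    memory = ReplayMemory(capacity=3)
    for step in range(5):
        memory.add(obs=step, action=0.0, reward=-1.0, next_obs=step + 1, done=False)
    print(len(memory), memory.sample(2))
```

A larger capacity keeps older experience around for longer, but it does not by itself make the agent explore new behaviour, which fits with the larger replay memories not changing the scores much.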
