Highest Rated Comments


tchlux · 1 karma

AFAIK most model-based reinforcement learning algorithms are more data-efficient than model-free ones (which don't build an explicit model of the environment). However, all the model-based techniques I've seen eventually "throw away" old data and stop using it for model training.

Could we do better (lower sample complexity) if we didn't throw away old data? I imagine an algorithm that keeps track of all past observations as "paths" through perception space, and can use something akin to nearest neighbor to identify when it is seeing a similar "path" again in the future.

I.e., if the model learned a compression from perception space into a lower-dimensional representation (like the first 10 principal components), could we then record all data and make predictions about future states with nearest neighbor? This method would also benefit from "immediate learning": a new observation becomes usable for prediction as soon as it is stored, with no retraining. Does this direction sound promising?
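
For concreteness, here is a minimal sketch of what that could look like, assuming PCA for the compression and a k-nearest-neighbor lookup over all stored transitions (scikit-learn is used purely for illustration; the class name `NonParametricModel` and every parameter value below are hypothetical, not part of any published algorithm):

```python
# Minimal sketch of the idea above: compress observations with PCA,
# keep every (state, next_state) pair forever, and predict the next
# state by averaging where the nearest past observations went.
# All names and constants are illustrative assumptions.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import NearestNeighbors

class NonParametricModel:
    def __init__(self, n_components=10, k=5):
        self.pca = PCA(n_components=n_components)  # e.g. first 10 principal components
        self.knn = NearestNeighbors(n_neighbors=k)
        self.states = None       # raw past observations, never discarded
        self.next_states = None  # the observation one step later

    def fit(self, states, next_states):
        # Learn the compression, then index every past observation.
        self.states = np.asarray(states)
        self.next_states = np.asarray(next_states)
        z = self.pca.fit_transform(self.states)
        self.knn.fit(z)

    def predict(self, state):
        # Find the most similar past states in compressed space and
        # average their recorded successors.
        z = self.pca.transform(np.asarray(state).reshape(1, -1))
        _, idx = self.knn.kneighbors(z)
        return self.next_states[idx[0]].mean(axis=0)

# Example on toy random transitions:
rng = np.random.default_rng(0)
states = rng.normal(size=(500, 64))                 # 500 observations, 64-dim
next_states = states + 0.1 * rng.normal(size=(500, 64))
model = NonParametricModel()
model.fit(states, next_states)
print(model.predict(states[0]).shape)               # (64,)
```

In this sketch the "immediate learning" property comes from the model being nonparametric: appending a new transition and re-indexing makes it available to `predict` right away, at the cost of lookup time and memory that grow with the size of the stored dataset.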