Highest Rated Comments


thosehippos (2 karma)

On the note of exploration: even if we were able to get provably correct exploration strategies from tabular settings (like R-max) to work in function-approximation settings, the number of states in any realistic domain seems far too large to explore exhaustively. How do you think priors play into this, especially with respect to provability and guarantees?
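To make the tabular baseline concrete, here's roughly the kind of optimism-under-uncertainty that R-max relies on (a minimal sketch, not anyone's actual implementation; the table sizes and constants like M_KNOWN are placeholders):

```python
import numpy as np

# Minimal tabular R-max sketch (illustrative; table sizes and constants
# like M_KNOWN and R_MAX are placeholders, not from any specific paper).
N_STATES, N_ACTIONS = 10, 4
M_KNOWN = 5      # visits before an (s, a) pair counts as "known"
R_MAX = 1.0      # optimistic reward assigned to unknown pairs
GAMMA = 0.95

counts = np.zeros((N_STATES, N_ACTIONS))
reward_sums = np.zeros((N_STATES, N_ACTIONS))
trans_counts = np.zeros((N_STATES, N_ACTIONS, N_STATES))

def record(s, a, r, s_next):
    """Update the empirical model after one observed transition."""
    counts[s, a] += 1
    reward_sums[s, a] += r
    trans_counts[s, a, s_next] += 1

def optimistic_model():
    """Empirical MDP where unknown (s, a) pairs self-loop with reward
    R_MAX, so planning drives the agent toward unexplored pairs."""
    R = np.full((N_STATES, N_ACTIONS), R_MAX)
    P = np.zeros((N_STATES, N_ACTIONS, N_STATES))
    for s in range(N_STATES):
        for a in range(N_ACTIONS):
            if counts[s, a] >= M_KNOWN:
                R[s, a] = reward_sums[s, a] / counts[s, a]
                P[s, a] = trans_counts[s, a] / counts[s, a]
            else:
                P[s, a, s] = 1.0  # optimistic self-loop
    return R, P

def plan(n_iters=200):
    """Value iteration on the optimistic model; act greedily w.r.t. Q."""
    R, P = optimistic_model()
    Q = np.zeros((N_STATES, N_ACTIONS))
    for _ in range(n_iters):
        Q = R + GAMMA * (P @ Q.max(axis=1))
    return Q
```

The guarantee hinges on enumerating every (s, a) entry in these tables, which is exactly what stops scaling past tabular settings.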

Thanks!

thosehippos (1 karma)

- Inductive Bias: Awesome! Thanks!

- https://arxiv.org/pdf/1911.05815.pdf (edit: will read this in more detail! Very interesting!): Block MDPs like the ones used in your paper (and extensions of current work beyond them) are of particular interest to me. I also have some work on latent-state learning in Block MDPs (https://arxiv.org/pdf/2006.03465.pdf) that focuses on generalization capability.

Do you have thoughts on which Block MDP assumptions (e.g. uniqueness of the underlying state given an observation; see the first sketch below) are reasonable in realistic tasks and which are potentially limiting?

- Go-Explore/State Abstraction: That's very true; I hadn't thought of it that way before. I'm trying to determine whether there exists some general representation function (like image downsampling; see the second sketch below) that's "good enough" for a whole set of tasks (e.g. household robotics or Atari games), or whether we need to learn task-specific representations. I suppose this is somewhat in line with a generalization vs. adaptation argument.
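For concreteness, the uniqueness assumption from the Block MDP bullet above amounts to disjoint emission sets, something like this (a toy sketch; the latent states and observation names are made up):

```python
import random

# Toy Block MDP sketch (illustrative; the latent states and observation
# names below are made up to show the disjoint-emission assumption).
LATENT_STATES = ["left", "right"]

# Disjoint emission blocks: each observation is emitted by exactly ONE
# latent state, so a perfect decoder f: observation -> state exists.
EMISSION_BLOCKS = {
    "left":  ["x0", "x1", "x2"],
    "right": ["x3", "x4"],
}

DECODER = {x: s for s, block in EMISSION_BLOCKS.items() for x in block}

def emit(state):
    """Sample an observation from the state's emission block."""
    return random.choice(EMISSION_BLOCKS[state])

# The uniqueness assumption in one line: decoding always recovers the state.
for s in LATENT_STATES:
    assert all(DECODER[emit(s)] == s for _ in range(10))
```

The potentially limiting part is that real sensors can emit the same observation from two different underlying states (perceptual aliasing), which breaks the DECODER construction above.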
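And here's the kind of generic "good enough" representation I have in mind, in the spirit of Go-Explore's downsampled Atari cells (a rough sketch; the grid size and number of intensity levels are placeholders):

```python
import numpy as np

def downsample_cell(frame, out_h=8, out_w=11, n_levels=8):
    """Map a grayscale frame (an (H, W) uint8 array) to a tiny, coarsely
    quantized grid and hash it into a discrete exploration "cell".
    The grid size and level count here are placeholders."""
    h, w = frame.shape
    # Crop so the frame divides evenly, then average-pool to out_h x out_w.
    small = frame[: h - h % out_h, : w - w % out_w].astype(np.float32)
    small = small.reshape(out_h, h // out_h, out_w, w // out_w).mean(axis=(1, 3))
    # Quantize to n_levels intensity buckets.
    quantized = (small / 256.0 * n_levels).astype(np.uint8)
    return quantized.tobytes()  # hashable cell identifier

# e.g. with a 210 x 160 Atari screen (gray_frame is hypothetical):
# cell = downsample_cell(np.asarray(gray_frame))
```

Two frames that hash to the same bytes land in the same exploration cell, so the open question is whether one fixed downsampling like this transfers across a whole task family.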