-Ulkurz-30 karma

Thank you for doing this AMA! My question is about applying RL to real-world problems. As we already know, it's often difficult to build a simulator or a digital twin for real-world processes or environments, which largely rules out online RL.

But this is where offline/batch RL can help: a policy is learned offline from a large dataset collected via some existing process. We've already seen a lot of success in the supervised learning setting, where a model is learned offline from large volumes of data.

Although there has been a lot of fundamental research on offline/batch RL, I have not seen many real-world applications. Could you please share some of your own experience with this, if possible with some use cases of batch/offline RL in the real world? Thanks!

-Ulkurz-2 karma

Aren't both of these examples related to learning in a simulated environment? Any use cases around offline/batch RL?