Research and development - Seminars
In this seminar, Juan Pablo introduces a theoretical model for learning in high-risk environments, where mistakes can have irreversible consequences. Instead of allowing the reinforcement learning agent to explore blindly, the model incorporates the option to ask for help from a mentor, enabling safe learning in one-shot, non-reversible Markov decision processes.
The project, conducted at UC Berkeley's CHAI lab with Benjamin Plummer and Stuart Russell, extends standard MDPs by incorporating a mentor policy and studies the conditions under which an agent can learn near-optimal behavior while making only a sublinear number of queries. The approach provides a formal foundation for incorporating risk, safety, and human guidance into machine learning.
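To make the idea of a sublinear number of queries concrete, here is a minimal Python sketch. It is not the algorithm presented in the seminar. It is a deliberately simple toy under assumed names (`AskForHelpAgent`, `mentor_policy`, `step`, and `run` are all hypothetical). The agent queries the mentor only the first time it visits a state and then reuses the answer, so in this finite toy environment the number of queries stays bounded while the horizon grows.

```python
from dataclasses import dataclass, field


@dataclass
class AskForHelpAgent:
    """Hypothetical agent: asks the mentor the first time it sees a state."""
    known_actions: dict = field(default_factory=dict)
    queries: int = 0

    def act(self, state, mentor):
        # Query the mentor only for states never seen before; never explore blindly.
        if state not in self.known_actions:
            self.known_actions[state] = mentor(state)
            self.queries += 1
        return self.known_actions[state]


def mentor_policy(state):
    # Assumed mentor: always moves right, which keeps the agent out of the trap.
    return +1


def step(state, action, n_states=5):
    """Toy dynamics (an assumption): states 0..n_states-1 form a ring,
    and -1 is an absorbing trap reached by moving left from state 0."""
    if state == -1:
        return -1  # irreversible: once trapped, always trapped
    nxt = state + action
    if nxt < 0:
        return -1
    return nxt % n_states


def run(horizon):
    """Run the agent for `horizon` steps; return (mentor queries, final state)."""
    agent = AskForHelpAgent()
    state = 0
    for _ in range(horizon):
        action = agent.act(state, mentor_policy)
        state = step(state, action)
    return agent.queries, state


if __name__ == "__main__":
    for horizon in (10, 100, 1000):
        queries, final_state = run(horizon)
        print(horizon, queries, final_state)
```

In this sketch the query count is capped by the number of distinct states, which is the simplest possible way to obtain sublinear growth. The setting described in the seminar is more general, and the conditions under which such a guarantee holds there are the subject of the talk.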