Research and development - Seminars
This seminar studies policy evaluation in Reinforcement Learning in high-dimensional settings or under uncertainty. In this setting, the value function of the policy being evaluated is approximated linearly, and the approximation is estimated using Linear Stochastic Approximation with Markovian noise. The classical methods, Temporal Difference (TD) learning and Gradient Temporal Difference (GTD) learning, are inefficient at estimating the value function. We therefore study the alternative offered by the Online Bootstrap Inference algorithm, which promises to improve on the existing methods.
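To make the setting concrete, the following is a minimal sketch of the classical baseline mentioned above: TD(0) policy evaluation with a linear value approximation, fed by a single trajectory so that the noise is Markovian. It is not the Online Bootstrap Inference algorithm and not the seminar's code. The toy chain `P`, rewards `r`, features `phi`, the step size, and the tail averaging of the iterates are all assumptions chosen for illustration.

```python
import random


def td0_linear(P, r, phi, gamma=0.5, alpha=0.05, n_steps=20_000, seed=0):
    """TD(0) with a linear value approximation V(s) ~ phi[s] . theta.

    P[s] holds the transition probabilities out of state s under the fixed
    policy, r[s] is the reward received in state s, phi[s] is its feature vector.
    Samples come from one trajectory of the chain, so the noise is Markovian.
    Returns theta averaged over the second half of the run (tail averaging,
    an assumption of this sketch, used only to reduce variance).
    """
    rng = random.Random(seed)
    n_states, d = len(phi), len(phi[0])
    theta = [0.0] * d
    theta_sum = [0.0] * d
    burn_in = n_steps // 2
    s = 0
    for t in range(n_steps):
        # Sample the next state of the Markov chain.
        s_next = rng.choices(range(n_states), weights=P[s])[0]
        v_s = sum(f * w for f, w in zip(phi[s], theta))
        v_next = sum(f * w for f, w in zip(phi[s_next], theta))
        # Temporal-difference error for the observed transition.
        td_error = r[s] + gamma * v_next - v_s
        # Linear stochastic approximation step along the current features.
        theta = [w + alpha * td_error * f for w, f in zip(theta, phi[s])]
        if t >= burn_in:
            theta_sum = [a + w for a, w in zip(theta_sum, theta)]
        s = s_next
    return [a / (n_steps - burn_in) for a in theta_sum]


# Hypothetical 3-state chain with one-hot features, so theta[s] estimates V(s).
P = [[0.5, 0.5, 0.0],
     [0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5]]
r = [1.0, 0.0, -1.0]
phi = [[1.0, 0.0, 0.0],
       [0.0, 1.0, 0.0],
       [0.0, 0.0, 1.0]]

theta_hat = td0_linear(P, r, phi)
print(theta_hat)
```

With one-hot features the exact value function solves v = r + gamma P v, which for this toy chain and gamma = 0.5 gives v = (16/13, -4/13, -12/13), so the estimate can be checked against a known target. The sketch returns only a point estimate and says nothing about its uncertainty.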
YouTube – Quantil Matemáticas Aplicadas
1. Presentation