Reinforcement Learning and Adaptation in Competitive Environments, May 2002 (Technion)

Click here for Introduction, background, and summary (almost no proofs !).

Abstract

Learning to act optimally in a dynamic, noisy, and competitive environment is a difficult problem. Various threads of research from Reinforcement Learning, Machine Learning, Operations Research, Statistics, Game Theory, and Optimal Control are concerned with several variants of this problem. The existence of non-stationary elements in the environment precludes the application of standard learning and control techniques and calls for new ideas. The mere definition of optimality in this case is not obvious anymore. In this thesis we present and analyze solution concepts that have the capability of coping with the non-stationary and dynamic nature of the environment.

Specifically, we introduce two solution concepts. The first concept is based on the empirical measurement of statistics related to the non-stationary elements in the environment. We take a worst-case estimate over unknown information and re-define the problem as a vector-valued stochastic game. We define a concrete goal, the convex Bayes envelope, and show that it is attainable using approachability theory. This envelope is shown to be safe (above the worst-case performance guarantees) and adaptive (strictly above the worst-case performance guarantees if the non-stationary elements appear non-hostile). The second concept is based on viewing the stochastic game as a repeated super-game with partial monitoring. The super-actions are the unobserved strategies of the opponent and the stage game is a long interval of the stochastic game. By considering this problem as an adversarial multi-armed bandit problem, we show that the super-game approach leads to performance guarantees which are also safe and adaptive.

In the process of developing the first solution concept we generalize current results regarding approachability theory. We also devise learning algorithms for approaching a prescribed target set in vector-valued stochastic games and vector-valued Markov decision processes. These algorithms are extended for approaching target sets which are not prescribed, and are demonstrated to be effective for learning in constrained games and constrained Markov decision processes. The building blocks of the learning algorithms for vector-valued games are Reinforcement Learning algorithms for scalar average reward zero-sum games. Two Q-learning style algorithms for scalar average reward zero-sum games are suggested.

A different topic of research is the existence of effective linear weak learners. We provide explicit conditions for the validity of the weak learner assumption, which is a key assumption required for the convergence of the generalization error of Boosting algorithms. Based on Geometric Discrepancy theory, we suggest conditions under which sufficiently strong linear learners exist, and hence Boosting algorithms are consistent.