Click here for Introduction, background, and summary (almost no proofs !).
Abstract
Learning to act optimally in a dynamic, noisy, and
competitive environment is a difficult problem. Various threads of research from
Reinforcement Learning, Machine Learning, Operations Research, Statistics, Game
Theory, and Optimal Control are concerned with several variants of this problem.
The existence of non-stationary elements in the environment precludes the
application of standard learning and control techniques and calls for new ideas.
The mere definition of optimality in this case is not obvious anymore. In this
thesis we present and analyze solution concepts that have the capability of
coping with the non-stationary and dynamic nature of the environment.
Specifically, we introduce two solution concepts. The first concept is based on
the empirical measurement of statistics related to the non-stationary elements
in the environment. We take a worst-case estimate over unknown information and
re-define the problem as a vector-valued stochastic game. We define a concrete
goal, the convex Bayes envelope, and show that it is attainable using
approachability theory. This envelope is shown to be safe (above the worst-case
performance guarantees) and adaptive (strictly above the worst-case performance
guarantees if the non-stationary elements appear non-hostile). The second
concept is based on viewing the stochastic game as a repeated super-game with
partial monitoring. The super-actions are the unobserved strategies of the
opponent and the stage game is a long interval of the stochastic game. By
considering this problem as an adversarial multi-armed bandit problem, we show
that the super-game approach leads to performance guarantees which are also safe
and adaptive.
In the process of developing the first solution concept we generalize current
results regarding approachability theory. We also devise learning algorithms for
approaching a prescribed target set in vector-valued stochastic games and
vector-valued Markov decision processes. These algorithms are extended for
approaching target sets which are not prescribed, and are demonstrated to be
effective for learning in constrained games and constrained Markov decision
processes. The building blocks of the learning algorithms for vector-valued
games are Reinforcement Learning algorithms for scalar average reward zero-sum
games. Two Q-learning style algorithms for scalar average reward zero-sum games
are suggested.
A different topic of research is the existence of effective linear weak
learners. We provide explicit conditions for the validity of the weak learner
assumption, which is a key assumption required for the convergence of the
generalization error of Boosting algorithms. Based on Geometric Discrepancy
theory, we suggest conditions under which sufficiently strong linear learners
exist, and hence Boosting algorithms are consistent.