Jalal Arabneydi, "Optimal Control of Partially Observable Markov Processes over a Finite Horizon"
Report based on
- R.D. Smallwood and E.J. Sondik, “The Optimal Control of Partially Observable Markov Processes Over a Finite Horizon”, Operations Research, Vol. 21, No. 5, pp. 1071-1088, Sept-Oct 1973.
Mohamad Aziz and Ali Pakniyat, "Partially nested information structures"
Report based on
- Y-C. Ho and K-C. Chu, “Team decision theory and information structures in optimal control problems: Part I,” IEEE Transactions on Automatic Control, vol. 17, no. 1, pp. 15-22, Feb. 1972.
Marc-Etienne Brunet and Hadrien Baillargeon Legault, "Discrete Time Stochastic Adaptive Control"
Report based on
- G.C. Goodwin, P.J. Ramadge, and P.E. Caines, “Discrete Time Stochastic Adaptive Control,” SIAM J. Control and Optimization, Vol. 19, No. 6, Nov. 1981.
Ayman Elkasrawy and Hussam Nosair, "Multi-armed bandit problems"
Report based on
- J. Gittins, K. Glazebrook, R. Weber, "Multi-armed Bandit Allocation Indices", Wiley 2011.
- J.N. Tsitsiklis, "A short proof of the Gittins index Theorem", Annals of Applied Probability, Vol. 4, No. 1, pp. 194-199, 1994.
Mehdi Abedinpour Fallah and Atousa Assadi, "Stochastic Stability of the Extended Kalman Filter: Extension to Intermittent Observations"
Report based on
- K. Reif, S. Günther, E. Yaz, and R. Unbehauen, “Stochastic stability of the discrete time extended Kalman filter,” IEEE Trans. Autom. Control, vol. 44, no. 4, pp. 714–728, Apr. 1999.
- S. Kluge, K. Reif, and M. Brokate, “Stochastic stability of the extended Kalman filter with intermittent observations,” IEEE Trans. Autom. Control, vol. 55, no. 2, pp. 514–518, Feb. 2010.
Yi Feng, "Stochastic control viewpoint in coding and information theory for communication"
Report based on
- J.P.M. Schalkwijk and T. Kailath, “A Coding Scheme for Additive Noise Channels with Feedback – Part I: No Bandwidth Constraint,” IEEE Trans. on Information Theory, vol. 12, pp. 172-177, April 1966.
- M. Horstein, “Sequential Transmission Using Noiseless Feedback,” IEEE Trans. on Information Theory, vol. 9, pp. 136-143, July 1963.
- T. P. Coleman, “A Stochastic Control Viewpoint on Posterior Matching-style Feedback Communication Schemes,” in Proc. of IEEE International Symposium on Information Theory, pp. 1520-1524, Seoul, South Korea, June/July 2009.
- S. K. Gorantla, T. P. Coleman, “Information-Theoretic Viewpoints on Optimal Causal Coding-Decoding Problems”, IEEE Trans. on Information Theory, 24 pages, submitted, Jan. 2011.
- J. Omura, “On the Viterbi Decoding Algorithm,” IEEE Trans. on Information Theory, vol. 15, pp. 177-179, Jan. 1969.
Wei Huang and Jun Zhang, "Constrained Markov Decision Process and Optimal Policies"
Report based on
- F.J. Beutler and K.W. Ross, “Optimal policies for controlled Markov chains with a constraint”, Journal of Mathematical Analysis and Applications, vol. 112, no. 1, pp. 236-252, 1985.
Khoa Phan and Sandeep Manjanna, "Q-learning for Markov decision processes"
Report based on
- Watkins, "Learning from Delayed Rewards", PhD thesis, University of Cambridge, UK, 1989.
- J.N. Tsitsiklis, “Asynchronous Stochastic Approximation and Q-learning,” Machine Learning, vol. 16, pp. 185–202, 1994.