SDSC6007 Course Information
Course Overview
Course Code: SDSC6007 & SDSC8006
Course Name: Dynamic Programming and Reinforcement Learning
Semester: First Semester, 2025 Academic Year
Instructor: Clint Chin Pang Ho
Email: clint.ho@cityu.edu.hk
Office: LAU-16-228
Consultation Hours: By appointment
Teaching Assistants (TAs):
- Yanbo He (yanbohe3-c@my.cityu.edu.hk)
- Ellen Yi Wong (ywong692-c@my.cityu.edu.hk)
- Yuqi Zha (charlie.yqzha@my.cityu.edu.hk)
Assessment
| Component | Weight | Details |
|---|---|---|
| Assignments | 20% | Two assignments (10% each). Submit online via Canvas; accepted formats: .pdf, .py, .mp4, .txt. |
| Midterm Exam | 20% | Closed-book exam. |
| Group Project | 30% | One group project. Details below. |
| Final Exam | 30% | Closed-book exam. |
- Late Submission Policy: If submitted d days late, the maximum attainable score is capped at a value that decreases with d.
- GenAI Policy: Allowed for non-exam tasks (assignments and the project), but use must be properly cited. Students are responsible for all submitted content.
Schedule and Teaching
| Topic Area | Core Content | Remarks |
|---|---|---|
| Dynamic Programming Algorithms | Basic concepts and framework. | Includes historical background of Richard Bellman. |
| Deterministic Systems and Shortest Path | Modeling of sequential decision problems. | |
| Markov Decision Processes (MDPs) | Theory and applications in operations research and control. | |
| Value Iteration, Policy Iteration, Linear Programming | Solution methods for MDPs. | |
| Model-Free Prediction and Control | Learning without explicit environment models. | |
| Value Function Approximation | Techniques for large-scale problems. | |
| Policy Gradient | Optimization methods. | |
| Multi-Armed Bandit | Balancing exploration and exploitation. |
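As a concrete taste of the "Value Iteration, Policy Iteration, Linear Programming" row above, here is a minimal value-iteration sketch in Python. The toy MDP (transition tensor P, reward matrix R, discount gamma) is a made-up illustration, not a course-provided example:

```python
import numpy as np

# Hypothetical toy MDP: 2 states, 2 actions.
# P[a, s, s'] = probability of moving to s' from s under action a.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],   # action 0
              [[0.5, 0.5], [0.0, 1.0]]])  # action 1
R = np.array([[1.0, 0.0],                 # R[s, a]: reward in state 0 under actions 0, 1
              [0.0, 2.0]])                # rewards in state 1
gamma = 0.95                              # discount factor

V = np.zeros(2)                           # initial value estimates
for _ in range(1000):
    # Bellman optimality backup: V(s) <- max_a [ R(s,a) + gamma * sum_s' P(s'|s,a) V(s') ]
    Q = R + gamma * np.einsum('ast,t->sa', P, V)
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:  # stop at a near fixed point
        break
    V = V_new

policy = Q.argmax(axis=1)                 # greedy policy w.r.t. the converged values
print("V* ≈", V, "greedy policy:", policy)
```

The loop applies the Bellman optimality backup until it reaches a near fixed point; acting greedily with respect to the converged values then yields an optimal policy for this toy MDP.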
Project Requirements (30% of Total Score)
- Nature: A group project focused on solving practical problems using dynamic programming or reinforcement learning.
- Key Rules:
  - The project must be originally designed for this course; reusing content from other courses or papers is prohibited.
  - Use of GenAI tools must be clearly cited.
  - Submit via Canvas; accepted formats: .pdf, .py, .mp4, .txt.
- Grading Criteria: Based on scientific value and innovation; plagiarism will be penalized.
Project Components and Requirements
1. Project Components
The project includes the following required submissions:
| Component | Description | Submission Format and Naming Convention |
|---|---|---|
| Presentation Slides | Slides for a 15-minute presentation; an unedited Zoom video recording of the presentation is required. | Group_[GroupName]_slides.pdf (e.g., Group_A_slides.pdf) |
| Project Report | Detailed report whose content is consistent with the presentation; may include appendices; at most 15 pages. | Group_[GroupName]_report.pdf |
| Code | Code to reproduce experimental results (Python preferred, C++ etc. acceptable with prior notice). | Group_[GroupName]_c_[filename].txt or .py (e.g., Group_A_c_main.py) |
| Demo Video | Optional, showing performance comparison before, during, and after training. | Group_[GroupName]_demo_[number]_[phase].mp4 (e.g., Group_A_demo_1_before.mp4) |
| Compressed File | All files packaged into a single zip file for submission. | Group_[GroupName].zip |
2. Timeline and Submission Deadlines
| Deadline | Task |
|---|---|
| By October 7 | Form groups of 4–5 members and announce group members on Canvas. |
| By October 14 | Confirm topic selection on Canvas; once confirmed, the topic cannot be changed. |
| By 6:00 PM, November 23 | Submit all materials (slides, report, code, video, etc.). |
| November 25 (Week 13) | Randomly selected groups give a live presentation (15 minutes) plus Q&A (5–10 minutes). |
3. Grading Details (Total 30 points)
| Item | Points | Description |
|---|---|---|
| Presentation | 12 points | Content organization, clarity of expression, time management, slide design, etc. |
| Content | 12 points | Theoretical depth, experimental design, innovation, report structure and logic, etc. |
| Participation | 6 points | Attendance, questions and feedback on other groups’ presentations (anonymous). |
📌 Note: Although groups are randomly selected for live presentation, the selection does not affect the final score (unless a group is absent or does not participate in the Q&A and feedback).
Optional Topic Scope
Each group must choose one of the following two categories as the project direction:
A. Extension of Course Content (Examples)
- Partially Observable MDPs (POMDPs)
- Continuous State/Action/Time MDPs
- Average Cost MDPs
- Stochastic Shortest Path Problems
- Deep Reinforcement Learning (DRL)
- Stochastic Approximation and RL Methods
- Inverse Reinforcement Learning
- Multi-Armed Bandit (Advanced Topics)
B. Research Papers Published after 2010 (must be from top-tier conferences/journals)
- NeurIPS, ICML, COLT, ICLR, AISTATS, JMLR, IEEE TAC, etc.
✅ Presentation and Report Suggestions
- No Script Reading: Do not read from a script or phone during the presentation; use the slides only as a reference.
- Slide Design: Avoid large blocks of text; favor charts, algorithm pseudocode, and visualizations of experimental results.
- Language Expression: Keep a moderate speaking pace, reduce fillers such as "um" and "ah", and practice by recording yourself in advance.
- Technical Preparation: Bring your own adapters and save slides in multiple formats in case of equipment failure.
- Interaction and Feedback: Performance during Q&A counts toward the "Participation" score.
⚠️ Academic Integrity Statement
- The project must be original to this course; reuse of content from previous courses or papers is prohibited.
- Use of GenAI tools (e.g., ChatGPT) must be clearly cited.
- Plagiarism or reuse of others' work will be handled according to university policies.
Course Objectives and Topics
- Objectives:
  - Understand the concepts and principles of DP and RL.
  - Model problems as DP/RL problems and implement solvers in Python (see the sketch after the topic list below).
  - Apply these methodologies to practical scenarios.
- Topics:
  - Dynamic Programming Algorithms
  - Markov Decision Processes
  - Value Iteration and Policy Iteration
  - Model-Free Control
  - Value Function Approximation
  - Policy Gradient
  - Multi-Armed Bandit
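To give a concrete flavor of what "implement solvers in Python" can mean, here is a minimal ε-greedy sketch for the Multi-Armed Bandit topic. The Bernoulli arm means and ε = 0.1 are made-up illustration values, not course-provided data:

```python
import random

# Hypothetical Bernoulli bandit: each arm pays 1 with its own unknown probability.
true_means = [0.3, 0.5, 0.7]          # made-up arm parameters (unknown to the agent)
counts = [0] * len(true_means)        # pulls per arm
values = [0.0] * len(true_means)      # running mean reward per arm
epsilon = 0.1                         # exploration rate

for t in range(10_000):
    if random.random() < epsilon:                     # explore: random arm
        arm = random.randrange(len(true_means))
    else:                                             # exploit: current best estimate
        arm = max(range(len(true_means)), key=lambda a: values[a])
    reward = 1.0 if random.random() < true_means[arm] else 0.0
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]   # incremental mean update

print("estimated means:", [round(v, 3) for v in values])
```

The incremental mean update keeps per-arm estimates without storing reward histories; because ε > 0 the agent never stops exploring, so every arm's estimate converges toward its true mean.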
Reference Books
- Bertsekas, D. P. (2019). Reinforcement Learning and Optimal Control.
- Sutton, R. S., & Barto, A. G. (2018). Reinforcement Learning: An Introduction.
- Puterman, M. L. (2005). Markov Decision Processes: Discrete Stochastic Dynamic Programming.
- Additional resources: the RL lecture series by Silver (2015) and Brunskill (2019).
