SDSC6007 Course Information
Course Overview
Course Code: SDSC6007 & SDSC8006
Course Name: Dynamic Programming and Reinforcement Learning
Semester: First Semester, 2025 Academic Year
Instructor: Clint Chin Pang Ho
Email: clint.ho@cityu.edu.hk
Office: LAU-16-228
Consultation Hours: By appointment
Teaching Assistants (TAs):
- Yanbo He (yanbohe3-c@my.cityu.edu.hk)
- Ellen Yi Wong (ywong692-c@my.cityu.edu.hk)
- Yuqi Zha (charlie.yqzha@my.cityu.edu.hk)
Assessment
| Component | Weight | Details |
|---|---|---|
| Assignments | 20% | Two assignments (10% each). Submit online via Canvas; accepted formats: .pdf, .py, .mp4, .txt. |
| Midterm Exam | 20% | Closed-book exam. |
| Group Project | 30% | One group project. Details below. |
| Final Exam | 30% | Closed-book exam. |
- Late Submission Policy: If submitted d days late, the maximum attainable score is capped at a value that decreases with d.
- GenAI Policy: Allowed for non-exam tasks (assignments and the project), but use must be properly cited. Students are responsible for all submitted content.
Schedule and Teaching
| Topic Area | Core Content | Remarks |
|---|---|---|
| Dynamic Programming Algorithms | Basic concepts and framework. | Includes historical background of Richard Bellman. |
| Deterministic Systems and Shortest Path | Modeling of sequential decision problems. | |
| Markov Decision Processes (MDPs) | Theory and applications in operations research and control. | |
| Value Iteration, Policy Iteration, Linear Programming | Solution methods for MDPs. | |
| Model-Free Prediction and Control | Learning without explicit environment models. | |
| Value Function Approximation | Techniques for large-scale problems. | |
| Policy Gradient | Optimization methods. | |
| Multi-Armed Bandit | Balancing exploration and exploitation. |
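As a concrete taste of the "Value Iteration, Policy Iteration, Linear Programming" row above, here is a minimal value-iteration sketch in Python. The toy MDP (transition tensor P, reward matrix R, discount gamma) is a made-up illustration, not a course-provided example:

```python
import numpy as np

# Hypothetical toy MDP: 2 states, 2 actions.
# P[a, s, s'] = probability of moving to s' from s under action a.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],   # action 0
              [[0.5, 0.5], [0.0, 1.0]]])  # action 1
R = np.array([[1.0, 0.0],                 # R[s, a]: reward in state 0 under actions 0, 1
              [0.0, 2.0]])                # rewards in state 1
gamma = 0.95                              # discount factor

V = np.zeros(2)                           # initial value estimates
for _ in range(1000):
    # Bellman optimality backup: V(s) <- max_a [ R(s,a) + gamma * sum_s' P(s'|s,a) V(s') ]
    Q = R + gamma * np.einsum('ast,t->sa', P, V)
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:  # stop at a near fixed point
        break
    V = V_new

policy = Q.argmax(axis=1)                 # greedy policy w.r.t. the converged values
print("V* ≈", V, "greedy policy:", policy)
```

The loop applies the Bellman optimality backup until it reaches a near fixed point; acting greedily with respect to the converged values then yields an optimal policy for this toy MDP.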
Project Requirements (30% of Total Score)
- Nature: A group project focused on solving practical problems using dynamic programming or reinforcement learning.
- Key Rules:
  - The project must be originally designed for this course; reusing content from other courses or papers is prohibited.
  - Use of GenAI tools must be clearly cited.
  - Submit via Canvas; accepted formats: .pdf, .py, .mp4, .txt.
- Grading Criteria: Based on scientific value and innovation; plagiarism will be penalized.
Project Components and Requirements
1. Project Components
The project includes the following required submissions:
| Component | Description | Submission Format and Naming Convention |
|---|---|---|
| Presentation Slides | Slides for a 15-minute presentation; an unedited Zoom video recording of the presentation is required. | Group_[GroupName]_slides.pdf (e.g., Group_A_slides.pdf) |
| Project Report | Detailed report whose content is consistent with the presentation; may include appendices; at most 15 pages. | Group_[GroupName]_report.pdf |
| Code | Code to reproduce experimental results (Python preferred, C++ etc. acceptable with prior notice). | Group_[GroupName]_c_[filename].txt or .py (e.g., Group_A_c_main.py) |
| Demo Video | Optional, showing performance comparison before, during, and after training. | Group_[GroupName]_demo_[number]_[phase].mp4 (e.g., Group_A_demo_1_before.mp4) |
| Compressed File | All files packaged into a single zip file for submission. | Group_[GroupName].zip |
2. Timeline and Submission Deadlines
| Deadline | Task |
|---|---|
| By October 7 | Form groups of 4–5 members and announce group members on Canvas. |
| By October 14 | Confirm topic selection on Canvas; once confirmed, the topic cannot be changed. |
| By 6:00 PM, November 23 | Submit all materials (slides, report, code, video, etc.). |
| November 25 (Week 13) | Randomly selected groups give a live presentation (15 minutes) plus Q&A (5–10 minutes). |
3. Grading Details (Total 30 points)
| Item | Points | Description |
|---|---|---|
| Presentation | 12 points | Content organization, clarity of expression, time management, slide design, etc. |
| Content | 12 points | Theoretical depth, experimental design, innovation, report structure and logic, etc. |
| Participation | 6 points | Attendance, questions and feedback on other groups’ presentations (anonymous). |
📌 Note: Although groups are randomly selected for live presentation, the selection does not affect the final score (unless a group is absent or does not participate in the Q&A and feedback).
Optional Topic Scope
Each group must choose one of the following two categories as the project direction:
A. Extension of Course Content (Examples)
- Partially Observable MDPs (POMDPs)
- Continuous State/Action/Time MDPs
- Average Cost MDPs
- Stochastic Shortest Path Problems
- Deep Reinforcement Learning (DRL)
- Stochastic Approximation and RL Methods
- Inverse Reinforcement Learning
- Multi-Armed Bandit (Advanced Topics)
B. Research Papers Published after 2010 (must be from top-tier conferences/journals)
- NeurIPS, ICML, COLT, ICLR, AISTATS, JMLR, IEEE TAC, etc.
✅ Presentation and Report Suggestions
- No Script Reading: Do not read from a script or phone during the presentation; use the slides only as a reference.
- Slide Design: Avoid large blocks of text; favor charts, algorithm pseudocode, and visualizations of experimental results.
- Language Expression: Keep a moderate speaking pace, reduce fillers such as "um" and "ah", and practice by recording yourself in advance.
- Technical Preparation: Bring your own adapters and save slides in multiple formats in case of equipment failure.
- Interaction and Feedback: Performance during Q&A counts toward the "Participation" score.
⚠️ Academic Integrity Statement
- The project must be original to this course; reuse of content from previous courses or papers is prohibited.
- Use of GenAI tools (e.g., ChatGPT) must be clearly cited.
- Plagiarism or reuse of others' work will be handled according to university policies.
Course Objectives and Topics
- Objectives:
  - Understand the concepts and principles of DP and RL.
  - Model problems as DP/RL problems and implement solvers in Python (see the sketch after the topic list below).
  - Apply these methodologies to practical scenarios.
- Topics:
  - Dynamic Programming Algorithms
  - Markov Decision Processes
  - Value Iteration and Policy Iteration
  - Model-Free Control
  - Value Function Approximation
  - Policy Gradient
  - Multi-Armed Bandit
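To give a concrete flavor of what "implement solvers in Python" can mean, here is a minimal ε-greedy sketch for the Multi-Armed Bandit topic. The Bernoulli arm means and ε = 0.1 are made-up illustration values, not course-provided data:

```python
import random

# Hypothetical Bernoulli bandit: each arm pays 1 with its own unknown probability.
true_means = [0.3, 0.5, 0.7]          # made-up arm parameters (unknown to the agent)
counts = [0] * len(true_means)        # pulls per arm
values = [0.0] * len(true_means)      # running mean reward per arm
epsilon = 0.1                         # exploration rate

for t in range(10_000):
    if random.random() < epsilon:                     # explore: random arm
        arm = random.randrange(len(true_means))
    else:                                             # exploit: current best estimate
        arm = max(range(len(true_means)), key=lambda a: values[a])
    reward = 1.0 if random.random() < true_means[arm] else 0.0
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]   # incremental mean update

print("estimated means:", [round(v, 3) for v in values])
```

The incremental mean update keeps per-arm estimates without storing reward histories; because ε > 0 the agent never stops exploring, so every arm's estimate converges toward its true mean.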
Reference Books
- Bertsekas, D. P. (2019). Reinforcement Learning and Optimal Control.
- Sutton, R. S., & Barto, A. G. (2018). Reinforcement Learning: An Introduction.
- Puterman, M. L. (2005). Markov Decision Processes: Discrete Stochastic Dynamic Programming.
- Additional resources: the RL lecture series by Silver (2015) and Brunskill (2019).
