#sdsc6007 #course information


Course Overview

Course Code: SDSC6007 & SDSC8006

Course Name: Dynamic Programming and Reinforcement Learning

Semester: First Semester, 2025 Academic Year

Instructor: Clint Chin Pang Ho

Email: clint.ho@cityu.edu.hk

Office: LAU-16-228

Consultation Hours: By appointment

Teaching Assistants (TAs):

  • Yanbo He (yanbohe3-c@my.cityu.edu.hk)

  • Ellen Yi Wong (ywong692-c@my.cityu.edu.hk)

  • Yuqi Zha (charlie.yqzha@my.cityu.edu.hk)

Assessment

| Component | Weight | Details |
| --- | --- | --- |
| Assignments | 20% | Two assignments (10% each). Submit online via Canvas; accepted formats: .pdf, .py, .mp4, .txt. |
| Midterm Exam | 20% | Closed-book exam. |
| Group Project | 30% | One group project. Details below. |
| Final Exam | 30% | Closed-book exam. |
  • Late Submission Policy: If a submission is $t$ days late ($t > 0$), the maximum attainable score is $(0.75)^t \times 100\%$.
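    For example, a submission two days late ($t = 2$) can earn at most $(0.75)^2 \times 100\% = 56.25\%$.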

  • GenAI Policy: Allowed for non-exam tasks (assignments and the project), but its use must be properly cited. Students are responsible for all submitted content.

Schedule and Teaching

| Topic Area | Core Content | Remarks |
| --- | --- | --- |
| Dynamic Programming Algorithms | Basic concepts and framework. | Includes historical background on Richard Bellman. |
| Deterministic Systems and Shortest Path | Modeling of sequential decision problems. | |
| Markov Decision Processes (MDPs) | Theory and applications in operations research and control. | |
| Value Iteration, Policy Iteration, Linear Programming | Solution methods for MDPs. | |
| Model-Free Prediction and Control | Learning without explicit environment models. | |
| Value Function Approximation | Techniques for large-scale problems. | |
| Policy Gradient | Gradient-based policy optimization methods. | |
| Multi-Armed Bandit | Balancing exploration and exploitation. | |

Project Requirements (30% of Total Score)

  • Nature: A group project focused on solving practical problems using dynamic programming or reinforcement learning.

  • Key Rules:

    • The project must be originally designed for this course; reuse of content from other courses or papers is prohibited.
    • Use of GenAI tools must be clearly cited.
    • Submit via Canvas; accepted formats: .pdf, .py, .mp4, .txt.
  • Grading Criteria: Based on scientific value and innovation; plagiarism will be penalized.

Project Components and Requirements (30% of Total Score)

1. Project Components

The project consists of the following submissions (the demo video is optional):

| Component | Description | Submission Format and Naming Convention |
| --- | --- | --- |
| Presentation Slides | Slides for a 15-minute presentation, accompanied by an unedited Zoom video recording of the talk. | Group_[GroupName]_slides.pdf (e.g., Group_A_slides.pdf) |
| Project Report | Detailed report consistent with the presentation; may include appendices; at most 15 pages. | Group_[GroupName]_report.pdf |
| Code | Code to reproduce the experimental results (Python preferred; C++ etc. acceptable with prior notice). | Group_[GroupName]_c_[filename].txt or .py (e.g., Group_A_c_main.py) |
| Demo Video | Optional; shows performance before, during, and after training. | Group_[GroupName]_demo_[number]_[phase].mp4 (e.g., Group_A_demo_1_before.mp4) |
| Compressed File | All files packaged into a single .zip file for submission. | Group_[GroupName].zip |

2. Timeline and Submission Deadlines

| Deadline | Task |
| --- | --- |
| By October 7 | Form groups (4-5 people) and announce group members on Canvas. |
| By October 14 | Confirm topic selection on Canvas; once confirmed, the topic cannot be changed. |
| By 6:00 PM, November 23 | Submit all materials (slides, report, code, video, etc.). |
| November 25 (Week 13) | Randomly selected groups give a live presentation (15 minutes) plus Q&A (5-10 minutes). |

3. Grading Details (Total 30 points)

| Item | Points | Description |
| --- | --- | --- |
| Presentation | 12 | Content organization, clarity of expression, time management, slide design, etc. |
| Content | 12 | Theoretical depth, experimental design, innovation, report structure and logic, etc. |
| Participation | 6 | Attendance, plus questions and feedback on other groups’ presentations (anonymous). |

📌 Note: Presenting groups are chosen at random, and whether a group is selected does not affect its final score (unless the group is absent or does not take part in Q&A and feedback).


Optional Topic Scope

Each group must choose one of the following two categories as the project direction:

A. Extension of Course Content (Examples)

  • Partially Observable MDPs (POMDPs)

  • Continuous State/Action/Time MDPs

  • Average Cost MDPs

  • Stochastic Shortest Path Problems

  • Deep Reinforcement Learning (DRL)

  • Stochastic Approximation and RL Methods

  • Inverse Reinforcement Learning

  • Multi-Armed Bandit (Advanced Topics)

B. Research Papers Published after 2010 (must be from top-tier conferences/journals)

  • NeurIPS, ICML, COLT, ICLR, AISTATS, JMLR, IEEE TAC, etc.


✅ Presentation and Report Suggestions

  • No Script Reading: During the presentation, do not read from a script or from your phone; use the slides only as a reference.

  • Slide Design: Avoid large blocks of text; favor charts, algorithm pseudocode, and visualizations of experimental results.

  • Language Expression: Keep a moderate speaking pace, avoid fillers such as “um” and “ah”, and practice by recording yourself in advance.

  • Technical Preparation: Bring your own adapters and save slides in multiple formats to guard against equipment failure.

  • Interaction and Feedback: Performance during Q&A will count towards the “Participation” score.

⚠️ Academic Integrity Statement

  • The project must be original to this course; reuse of content from previous courses or papers is prohibited.

  • Use of GenAI tools (e.g., ChatGPT) must be clearly cited.

  • Plagiarism or reuse of others’ work will be handled according to university policies.

Course Objectives and Topics

  • Objectives:

    • Understand the concepts and principles of DP and RL.
    • Model problems as DP/RL problems and implement solvers in Python (see the sketch after the topic list below).
    • Apply methodologies to practical scenarios.
  • Topics:

    • Dynamic Programming Algorithms
    • Markov Decision Processes
    • Value Iteration and Policy Iteration
    • Model-Free Control
    • Value Function Approximation
    • Policy Gradient
    • Multi-Armed Bandit
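
To make the “implement solvers in Python” objective concrete, here is a minimal value-iteration sketch for a toy two-state MDP. The MDP, rewards, and discount factor are illustrative assumptions, not course material.

```python
import numpy as np

# Toy 2-state, 2-action MDP (illustrative assumption, not a course example).
# P[s, a, s'] = transition probability; R[s, a] = expected reward.
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],  # transitions from state 0 under actions 0, 1
    [[0.5, 0.5], [0.0, 1.0]],  # transitions from state 1 under actions 0, 1
])
R = np.array([
    [1.0, 0.0],  # rewards in state 0 for actions 0, 1
    [0.0, 2.0],  # rewards in state 1 for actions 0, 1
])
gamma = 0.95  # discount factor

def value_iteration(P, R, gamma, tol=1e-8):
    """Repeatedly apply the Bellman optimality operator until convergence."""
    V = np.zeros(P.shape[0])
    while True:
        Q = R + gamma * (P @ V)  # Q[s, a] = R[s, a] + gamma * E[V(s')]
        V_new = Q.max(axis=1)    # greedy backup over actions
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)  # optimal values and greedy policy
        V = V_new

V_star, pi_star = value_iteration(P, R, gamma)
print("V* =", V_star, "; greedy policy =", pi_star)
```

Policy iteration and the linear-programming formulation listed in the schedule solve the same Bellman optimality equations, so their results can be sanity-checked against this output.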

Reference Books

  • Bertsekas, D.P. (2019). Reinforcement Learning and Optimal Control.

  • Sutton, R.S. & Barto, A.G. (2018). Reinforcement Learning: An Introduction.

  • Puterman, M.L. (2005). Markov Decision Processes: Discrete Stochastic Dynamic Programming.

  • Additional Resources: Silver (2015) and Brunskill (2019) lecture series.