SDSC6007 Course 1

#sdsc6007

Introduction

The Discrete-Time Dynamic System

The system has the form

x_{k+1} = f_k(x_k, u_k, w_k), \quad k = 0, 1, \ldots, N-1,

where

  • k : the index of discrete time

  • N : the horizon, i.e. the number of times control is applied

  • x_k : the state of the system, from the set of states S_k

  • u_k : the control/decision variable/action to be selected from the set U_k(x_k) at time k

  • w_k : a random parameter (also called a disturbance)

  • f_k : a function that describes how the state is updated

Assumption
The w_k's are independent. The probability distribution of w_k may depend on x_k and u_k.

The Cost Function

The (expected) cost has the form

\mathbb{E} \left[ g_N(x_N) + \sum_{k=0}^{N-1} g_k(x_k, u_k, w_k) \right]

where

  • g_k(x_k, u_k, w_k) : the cost incurred at time k.

  • g_N(x_N) : the terminal cost incurred at the end of the process.

Note
Because of w_k, the total cost is a random variable, so we optimize over its expectation.
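To make the setup concrete, here is a minimal Monte Carlo sketch. The dynamics f, stage cost g, terminal cost g_N, and Gaussian disturbances are all hypothetical placeholders (not from the lecture); the sketch simulates trajectories under a fixed control sequence and averages the total cost to estimate the expectation above.

```python
import random

N = 5  # horizon

def f(k, x, u, w):
    # hypothetical dynamics: how x_{k+1} is produced from (x_k, u_k, w_k)
    return x + u + w

def g(k, x, u, w):
    # hypothetical stage cost g_k
    return x**2 + u**2

def g_N(x):
    # hypothetical terminal cost
    return x**2

def rollout(x0, controls):
    """Simulate one trajectory and return its total cost."""
    x, total = x0, 0.0
    for k in range(N):
        w = random.gauss(0, 1)        # disturbance w_k
        total += g(k, x, controls[k], w)
        x = f(k, x, controls[k], w)   # state update x_{k+1}
    return total + g_N(x)

# Estimate the expected cost of a fixed control sequence by Monte Carlo.
controls = [0.0] * N
est = sum(rollout(1.0, controls) for _ in range(10_000)) / 10_000
print(f"estimated expected cost: {est:.2f}")
```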

A Deterministic Scheduling Problem

Example
Prof Li ZENG wants to produce luxury headphones that perform better than the bear-pods 3 that he is currently using. To do so, four operations, denoted A, B, C, D, must be performed on a certain machine. Assume that

  1. operation B can only be performed after operation A

  2. operation D can only be performed after operation C

Denote

  • setup cost C_{mn} for passing from any operation m to operation n

  • initial startup cost S_A or S_C (we can only start with operation A or C)

Solution

  • We need to make three decisions (the last operation is determined by the first three)

  • This problem is deterministic (no w_k)

  • This problem has a finite number of states

  • Deterministic problems with a finite number of states can be represented by a transition graph, as in the sketch below
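Because the horizon is tiny (4! = 24 orders, only 6 of which respect the precedence constraints), the optimal schedule can be found by brute force over the transition graph. The values for S_A, S_C, and C_{mn} below are made up for illustration; for larger problems, dynamic programming on the graph avoids re-enumerating shared tails.

```python
from itertools import permutations

# Hypothetical costs; the lecture leaves S_A, S_C, and C_mn symbolic.
startup = {"A": 5, "C": 3}
C = {("A", "B"): 2, ("A", "C"): 4, ("A", "D"): 6, ("B", "A"): 5,
     ("B", "C"): 3, ("B", "D"): 1, ("C", "A"): 2, ("C", "B"): 4,
     ("C", "D"): 3, ("D", "A"): 1, ("D", "B"): 2, ("D", "C"): 4}

def feasible(order):
    # B only after A, D only after C
    return order.index("A") < order.index("B") and order.index("C") < order.index("D")

def cost(order):
    # startup cost of the first operation plus setup costs along the order
    total = startup[order[0]]
    for m, n in zip(order, order[1:]):
        total += C[(m, n)]
    return total

best = min(filter(feasible, permutations("ABCD")), key=cost)
print("optimal order:", best, "cost:", cost(best))
```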

A Deterministic Scheduling Problem - Transition Graph

Discrete-State and Finite-State Problems

To capture the transition between states, it is often convenient to define the transition probabilities

p_{ij}(u, k) = \mathbb{P}(x_{k+1} = j \mid x_k = i, u_k = u)

In the dynamic system formulation, this means that x_{k+1} = w_k, where w_k follows a probability distribution with probabilities p_{ij}(u, k).
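For instance, a finite-state system can be simulated by drawing the next state directly from p_ij(u). The 3-state transition table below, with "operate"/"repair" controls (foreshadowing the next example), is entirely hypothetical.

```python
import random

# Hypothetical, time-invariant transition probabilities p_ij(u)
# for a 3-state system: P[(u, i)][j] = p_ij(u).
P = {
    ("operate", 0): [0.8, 0.15, 0.05],
    ("operate", 1): [0.0, 0.7, 0.3],
    ("operate", 2): [0.0, 0.0, 1.0],
    ("repair", 0): [1.0, 0.0, 0.0],
    ("repair", 1): [1.0, 0.0, 0.0],
    ("repair", 2): [1.0, 0.0, 0.0],
}

def step(i, u):
    """Draw x_{k+1} = w_k with P(w_k = j) = p_ij(u)."""
    return random.choices(range(3), weights=P[(u, i)])[0]

x = 0
for k in range(5):
    x = step(x, "operate")
print("state after 5 periods:", x)
```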

Example

Consider a problem of N time periods in which a machine can be in any one of n states. Denote by g(i) the operating cost per period when the machine is in state i. Assume that

g(1) \leq g(2) \leq \cdots \leq g(n)

That is, a machine in state i works more efficiently than a machine in state i + 1. During a period, the state of the machine can become worse or stay the same, with probabilities

p_{ij} = \mathbb{P}(\text{next state will be } j \mid \text{current state is } i), \quad \text{and } p_{ij} = 0 \text{ if } j < i.

At the start of each period, we can choose to either

  • let the machine operate for one more period, or

  • repair the machine and bring it to state 1 (where it stays for one period) at a cost R

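A minimal backward-induction sketch for this example follows. The values n = 3, N = 4, R = 6, the costs g, and the upper-triangular transition matrix are all hypothetical; states are 0-indexed, so state 0 plays the role of state 1 in the text.

```python
# Backward induction for the machine operate-or-repair example.
n, N, R = 3, 4, 6.0
g = [1.0, 3.0, 5.0]      # operating cost per period, g(1) <= ... <= g(n)
p = [[0.7, 0.2, 0.1],    # p_ij: the machine never improves on its own
     [0.0, 0.6, 0.4],
     [0.0, 0.0, 1.0]]

J = [0.0] * n            # hypothetical terminal cost J_N(i) = 0
for k in reversed(range(N)):
    # cost-to-go if we operate in state i: g(i) + E[J_{k+1}]
    operate = [g[i] + sum(p[i][j] * J[j] for j in range(n)) for i in range(n)]
    # cost-to-go if we repair: pay R, run one period from state 0
    repair = R + g[0] + sum(p[0][j] * J[j] for j in range(n))
    policy = ["operate" if operate[i] <= repair else "repair" for i in range(n)]
    J = [min(operate[i], repair) for i in range(n)]
    print(f"k={k}: policy={policy}, J={[round(v, 2) for v in J]}")
```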

Inventory Control Problem

Ordering a quantity of a certain item at each stage to meet a stochastic demand

  • x_k : stock available at the beginning of the kth period

  • u_k : stock ordered and delivered at the beginning of the kth period

  • w_k : demand during the kth period, with a given probability distribution

  • r(\cdot) : penalty/cost for either positive or negative stock

  • c : cost per unit ordered

Example: Inventory Control

Suppose (u_0^\star, u_1^\star, \ldots, u_{N-1}^\star) is the optimal solution, fixed in advance at time 0, of

\min \mathbb{E} \left[ R(x_N) + \sum_{k=0}^{N-1} \big( r(x_k + u_k - w_k) + c \cdot u_k \big) \right]

What if w_1 = w_2 = \cdots = w_{N-1} = 0? (Recall that the w_i's are the demands, and that the stock evolves as x_{k+1} = x_k + u_k - w_k.)

We can do better if we can adjust our decisions to different situations!
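To see the gap concretely, here is a simulation sketch comparing a fixed order sequence against a simple order-up-to rule that reacts to the observed stock. The demand distribution, r(x) = 2|x|, c = 1, terminal cost R ≡ 0, and the order-up-to rule itself are all illustration choices, not the optimal policy.

```python
import random

N, c = 10, 1.0
r = lambda x: 2.0 * abs(x)             # hypothetical stock penalty r(x)
demand = lambda: random.randint(0, 4)  # hypothetical demand distribution

def simulate(x0, rule, trials=10_000):
    """Average total cost of an ordering rule u_k = rule(k, x_k)."""
    total = 0.0
    for _ in range(trials):
        x, cost = x0, 0.0
        for k in range(N):
            u = rule(k, x)
            w = demand()
            cost += r(x + u - w) + c * u
            x = x + u - w              # stock dynamics x_{k+1} = x_k + u_k - w_k
        total += cost                  # terminal cost R taken to be 0
    return total / trials

open_loop = lambda k, x: 2             # fixed orders chosen at time 0
closed_loop = lambda k, x: max(0, 2 - x)  # order up to 2 units of stock

print("open-loop  :", round(simulate(0, open_loop), 2))
print("closed-loop:", round(simulate(0, closed_loop), 2))
```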

Open-Loop and Closed-Loop Control

Open-loop Control

At the initial time k = 0, given the initial state x_0, find an optimal control sequence (u_0^\star, u_1^\star, \ldots, u_{N-1}^\star) minimizing the expected total cost:

  • Key feature: Subsequent state information is NOT used to adjust control decisions

Closed-loop Control

At each time k, make decisions based on the current state information x_k (e.g. the ordering decision at time k):

  • Core objective: Find a state feedback strategy \mu_k(\cdot) mapping state x_k to control u_k

  • Decision characteristics:

      • Re-optimization at each decision point k

      • Control rule designed for every possible state value x_k

  • Computational properties:

      • Higher computational cost (requires real-time state mapping)

      • Same performance as open-loop when no uncertainty exists

Closed-loop Control

Core Concepts

  • Control Law Definition
    Let \mu_k(\cdot) be a function mapping state x_k to control u_k:

u_k = \mu_k(x_k)

  • Control Policy
    Define a policy \pi as a sequence of control laws:

\pi = \{\mu_0, \mu_1, \ldots, \mu_{N-1}\}

  • Policy Cost Function
    Given initial state x_0, the expected cost of policy \pi is:

J_{\pi}(x_0) = \mathbb{E} \left[ g_N(x_N) + \sum_{k=0}^{N-1} g_k \big( x_k, \mu_k(x_k), w_k \big) \right]
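A policy's cost J_\pi(x_0) can be estimated by simulation. The sketch below mirrors the formula above; the dynamics, costs, disturbance distribution, and the control law mu are hypothetical placeholders.

```python
import random

N = 5

def f(k, x, u, w): return x + u - w       # hypothetical dynamics
def g(k, x, u, w): return abs(x) + u      # hypothetical stage cost
def g_N(x): return abs(x)                 # hypothetical terminal cost
def mu(k, x): return max(0, 2 - x)        # hypothetical control law u_k = mu_k(x_k)

def J_pi(x0, trials=20_000):
    """Monte Carlo estimate of J_pi(x0) for pi = (mu_0, ..., mu_{N-1})."""
    total = 0.0
    for _ in range(trials):
        x, cost = x0, 0.0
        for k in range(N):
            u = mu(k, x)                  # control chosen from the observed state
            w = random.randint(0, 3)      # disturbance w_k
            cost += g(k, x, u, w)
            x = f(k, x, u, w)
        total += cost + g_N(x)
    return total / trials

print(f"J_pi(0) is approximately {J_pi(0):.2f}")
```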

Admissible Policy

A policy \pi = \{\mu_0, \ldots, \mu_{N-1}\} is called admissible if and only if:

\mu_k(x_k) \in U_k(x_k), \quad \forall x_k \in S_k, \ \forall k = 0, 1, \ldots, N-1

That is, at each time k and for every possible state x_k, the control u_k = \mu_k(x_k) must belong to the allowable control set U_k(x_k).

Summary

Definition

Consider the function J^* defined as:

J^*(x_0) = \min_{\pi \in \Pi} J_{\pi}(x_0), \quad \forall x_0 \in S_0

where:

  • \Pi : the set of all admissible policies

  • J_{\pi}(x_0) : the expected cost of policy \pi starting from initial state x_0

We call J^* the optimal value function.

Key Properties

  1. Global Optimality
    J^* gives the minimum possible expected cost from any initial state x_0

  2. Policy Independence
    Represents theoretical performance limit, independent of specific policies

  3. Benchmarking Role
    Any admissible policy \pi satisfies:

J_{\pi}(x_0) \geq J^*(x_0), \quad \forall x_0 \in S_0

Computational Significance

  • Core Objective of DP: Compute J^* exactly via backward induction, as in the sketch below

  • Control Engineering Application: Measure the gap between actual policies and the theoretical optimum
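As a concrete instance of backward induction, the sketch below computes J^* for a small finite-state problem. The states, controls, stage costs, terminal costs, and transition probabilities are all made-up placeholders.

```python
N = 4
S = range(3)                       # states
U = lambda i: range(2)             # admissible controls U(i)

# Hypothetical stage cost g(i, u) and transition probabilities p[u][i][j].
g = lambda i, u: i + 2 * u
p = [
    [[0.5, 0.4, 0.1], [0.0, 0.7, 0.3], [0.0, 0.0, 1.0]],  # u = 0: state drifts up
    [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]],  # u = 1: reset to state 0
]
g_N = [0.0, 1.0, 4.0]              # hypothetical terminal cost

J = list(g_N)                      # J_N = g_N
policy = []
for k in reversed(range(N)):
    # Q-value of each (state, control) pair: stage cost + expected cost-to-go
    Q = {(i, u): g(i, u) + sum(p[u][i][j] * J[j] for j in S) for i in S for u in U(i)}
    policy.insert(0, [min(U(i), key=lambda u: Q[(i, u)]) for i in S])
    J = [min(Q[(i, u)] for u in U(i)) for i in S]

print("J*(x0) for x0 = 0, 1, 2:", [round(v, 3) for v in J])
print("optimal first-stage controls:", policy[0])
```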

The Dynamic Programming Algorithm

Principle of Optimality

Theorem Statement

Let \pi^* = \{\mu_0^*, \mu_1^*, \ldots, \mu_{N-1}^*\} be an optimal policy for the basic problem. Suppose that, when using \pi^*, the system reaches state x_i at time i with positive probability. Consider the subproblem starting from (x_i, i):

\min \mathbb{E} \left[ g_N(x_N) + \sum_{k=i}^{N-1} g_k \big( x_k, \mu_k(x_k), w_k \big) \right]

Then the truncated policy \{\mu_i^*, \mu_{i+1}^*, \ldots, \mu_{N-1}^*\} is optimal for this subproblem. Intuitively, if a cheaper tail policy existed, substituting it into \pi^* from time i onward would lower the total expected cost, contradicting the optimality of \pi^*.

Core Implications

  1. Heritability of Optimality
    Any tail portion of a globally optimal policy remains optimal for its starting state

  2. Time Consistency
    Optimal decisions account for both immediate cost and optimal future state evolution

  3. Foundation for Backward Induction
    This principle validates the dynamic programming backward solution approach.

Practical Significance

  • Reduces Computational Complexity: Decomposes problem into nested subproblems

  • Guarantees Global Optimality: Solving the nested tail subproblems optimally yields a globally optimal policy

  • Enables Real-time Decision Making: Supports receding horizon methods like MPC
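The receding-horizon idea mentioned above can be sketched as follows: at each period, re-solve a short lookahead problem from the observed state and apply only the first control. The horizon H, dynamics, costs, and the use of a nominal (disturbance-free) model are all hypothetical illustration choices.

```python
import itertools

H = 3                                # lookahead horizon
U = [0, 1, 2]                        # candidate controls

def f(x, u): return x + u - 1        # hypothetical nominal dynamics
def g(x, u): return abs(x) + u       # hypothetical stage cost

def plan_cost(x, seq):
    """Cost of applying a fixed control sequence from state x under the nominal model."""
    cost = 0.0
    for u in seq:
        cost += g(x, u)
        x = f(x, u)
    return cost

def mpc_control(x):
    """Pick the first control of the best H-step open-loop plan from x."""
    best_seq = min(itertools.product(U, repeat=H), key=lambda seq: plan_cost(x, seq))
    return best_seq[0]

x = 4
for k in range(6):
    u = mpc_control(x)
    print(f"k={k}: x={x}, u={u}")
    x = f(x, u)                      # in practice, the real disturbed state is observed here
```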