
Lessons from AlphaZero for Optimal, Model Predictive, and Adaptive Control


Price: ¥79.00

Author: Dimitri P. Bertsekas (USA)
Publisher: Tsinghua University Press

ISBN: 9787302684718 Publication date: 2025-04-01 Binding: Paperback (perfect bound)
Format: 16K

Synopsis

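  Chapter 1 starts from AlphaZero's remarkable performance, examines the hard-won development behind it, and presents its underlying mathematical model. Chapter 2 introduces the mathematical models of decision problems, starting from deterministic and stochastic dynamic programming. Chapter 3 reviews the wide variety of reinforcement learning algorithms from an abstract viewpoint and shows the important role of value function approximation and rollout-based improvement. Chapter 4 uses the classical linear quadratic optimal control problem to analyze the lessons learned from AlphaZero's success. Chapter 5 turns to robust, adaptive, and model predictive control, and analyzes how much value function approximation and rollout can improve algorithm performance. Chapter 6 examines the lessons of AlphaZero's success from the viewpoint of discrete optimization. Chapter 7 summarizes the book. The book is suitable as a research monograph for researchers in the field and as a reference for graduate and undergraduate students.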

About the Author

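  Dimitri P. Bertsekas (USA) is a tenured professor at MIT, a member of the US National Academy of Engineering, and a visiting professor at the Center for Complex and Networked Systems Research at Tsinghua University. An internationally known author in electrical engineering and computer science, he has written more than ten best-selling textbooks and monographs, including Nonlinear Programming, Network Optimization, Dynamic Programming, Convex Optimization, and Reinforcement Learning and Optimal Control.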

Table of Contents

1. AlphaZero, Off-Line Training, and On-Line Play
1.1. Off-Line Training and Policy Iteration
1.2. On-Line Play and Approximation in Value Space - Truncated Rollout
1.3. The Lessons of AlphaZero
1.4. A New Conceptual Framework for Reinforcement Learning
1.5. Notes and Sources
2. Deterministic and Stochastic Dynamic Programming
2.1. Optimal Control Over an Infinite Horizon
2.2. Approximation in Value Space
2.3. Notes and Sources
3. An Abstract View of Reinforcement Learning
3.1. Bellman Operators
3.2. Approximation in Value Space and Newton's Method
3.3. Region of Stability
3.4. Policy Iteration, Rollout, and Newton's Method
3.5. How Sensitive is On-Line Play to the Off-Line Training Process?
3.6. Why Not Just Train a Policy Network and Use it Without On-Line Play?
3.7. Multiagent Problems and Multiagent Rollout
3.8. On-Line Simplified Policy Iteration
3.9. Exceptional Cases
3.10. Notes and Sources
4. The Linear Quadratic Case - Illustrations
4.1. Optimal Solution
4.2. Cost Functions of Stable Linear Policies
4.3. Value Iteration
4.4. One-Step and Multistep Lookahead - Newton Step Interpretations
4.5. Sensitivity Issues
4.6. Rollout and Policy Iteration
4.7. Truncated Rollout - Length of Lookahead Issues
4.8. Exceptional Behavior in Linear Quadratic Problems
4.9. Notes and Sources
5. Adaptive and Model Predictive Control
5.1. Systems with Unknown Parameters - Robust and PID Control
5.2. Approximation in Value Space, Rollout, and Adaptive Control
5.3. Approximation in Value Space, Rollout, and Model Predictive Control
5.4. Terminal Cost Approximation - Stability Issues
5.5. Notes and Sources
6. Finite Horizon Deterministic Problems - Discrete Optimization
6.1. Deterministic Discrete Spaces Finite Horizon Problems
6.2. General Discrete Optimization Problems
6.3. Approximation in Value Space
6.4. Rollout Algorithms for Discrete Optimization
6.5. Rollout and Approximation in Value Space with Multistep Lookahead
6.5.1. Simplified Multistep Rollout - Double Rollout
6.5.2. Incremental Rollout for Multistep Approximation in Value Space
6.6. Constrained Forms of Rollout Algorithms
6.7. Adaptive Control by Rollout with a POMDP Formulation
6.8. Rollout for Minimax Control
6.9. Small Stage Costs and Long Horizon - Continuous-Time Rollout
6.10. Epilogue
Appendix A: Newton's Method and Error Bounds
A.1. Newton's Method for Differentiable Fixed Point Problems
A.2. Newton's Method Without Differentiability of the Bellman Operator
A.3. Local and Global Error Bounds for Approximation in Value Space
A.4. Local and Global Error Bounds for Approximate Policy Iteration
References
 
