Reinforcement Learning Blogs
Contact me
- Blog -> https://cugtyt.github.io/blog/index
- Email -> cugtyt@qq.com
- GitHub -> Cugtyt@GitHub
This blog series collects my notes on reinforcement learning.
The main sources are papers and tutorials: Thomas Simonini's Deep Reinforcement Learning Course with Tensorflow, Arthur Juliani's Simple Reinforcement Learning with Tensorflow series, and OpenAI's Spinning Up in Deep RL.
OpenAI Spinning Up Part 1: Key Concepts in RL
Terms and concepts:
- States and observations
- Action spaces
- Policies
- Trajectories
- Different formulations of return
- The RL optimization problem
- Value functions
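As a compact reminder of the last two concepts (written in standard RL notation, not quoted from the post itself), the discounted return of a trajectory and the on-policy value function are:

```latex
% Infinite-horizon discounted return of a trajectory \tau = (s_0, a_0, r_1, s_1, \dots)
R(\tau) = \sum_{t=0}^{\infty} \gamma^{t}\, r_{t+1}, \qquad 0 < \gamma < 1

% On-policy value function: expected return starting from s and acting under \pi
V^{\pi}(s) = \mathbb{E}_{\tau \sim \pi}\!\left[\, R(\tau) \,\middle|\, s_0 = s \,\right]
```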
Thomas Simonini Part 5: An intro to Advantage Actor Critic methods: let’s play Sonic the Hedgehog
The problem with Policy Gradients, How Actor Critic works, A2C and A3C, A2C in practice
Thomas Simonini Part 4: An introduction to Policy Gradients with Cartpole and Doom
Two types of policy, Advantages, Disadvantages, Policy Search, Monte Carlo Policy Gradients
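The Monte Carlo policy-gradient (REINFORCE) idea listed above can be sketched for a linear softmax policy as follows; the state/action shapes, hyperparameters, and the `reinforce_step` helper are illustrative assumptions, not code from the post:

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over action preferences."""
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def reinforce_step(theta, episode, gamma=0.99, lr=0.01):
    """One REINFORCE update for a linear softmax policy.

    theta:   (n_actions, state_dim) policy parameters
    episode: list of (state_vector, action, reward) tuples
    """
    G = 0.0
    for s, a, r in reversed(episode):
        G = r + gamma * G                              # Monte Carlo return
        pi = softmax(theta @ s)
        # grad of log pi(a|s) for a softmax policy: (onehot(a) - pi) outer s
        grad_log = (np.eye(len(pi))[a] - pi)[:, None] * s[None, :]
        theta += lr * G * grad_log                     # gradient ascent on return
    return theta
```

Rewards here are credited to every action in the episode via the return-to-go `G`, which is exactly what makes the estimator "Monte Carlo": no value function, just sampled returns.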
Thomas Simonini Part 3+: Improvements in Deep Q Learning: Dueling Double DQN, Prioritized Experience Replay, and fixed Q-targets
Fixed Q-targets, Double DQNs, Dueling DQN, Prioritized Experience Replay (PER), Doom Deathmatch agent
Thomas Simonini Part 3: An introduction to Deep Q-Learning: let’s play Doom
Preprocessing part, The problem of temporal limitation, Experience Replay
Thomas Simonini Part 2: Diving deeper into Reinforcement Learning with Q-Learning
Q-learning algorithm: learning the Action Value Function, The Q-learning algorithm Process, Q* Learning with FrozenLake
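The tabular Q-learning update covered in that post can be sketched as follows; the table size matches FrozenLake's 4x4 grid, but the hyperparameter values are illustrative assumptions:

```python
import numpy as np

n_states, n_actions = 16, 4          # e.g. FrozenLake's 4x4 grid, 4 moves
Q = np.zeros((n_states, n_actions))  # the Q-table, initialized to zero
alpha, gamma = 0.1, 0.99             # learning rate, discount factor

def q_update(state, action, reward, next_state, done):
    """One Bellman-style Q-learning update toward the TD target."""
    target = reward if done else reward + gamma * np.max(Q[next_state])
    Q[state, action] += alpha * (target - Q[state, action])
```

Each call nudges `Q[state, action]` toward `reward + gamma * max_a Q[next_state, a]`, which is the sampled form of the Bellman optimality equation.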
Thomas Simonini Part 1: An introduction to Reinforcement Learning
Reinforcement Learning Process, Reward Hypothesis, Episodic or Continuing tasks, Monte Carlo vs TD Learning methods, Exploration/Exploitation trade off, Three approaches to Reinforcement Learning
Arthur Juliani Part 8 - Asynchronous Actor-Critic Agents (A3C)
The 3 As of A3C, Implementing the Algorithm
Arthur Juliani Part 7 - Action-Selection Strategies for Exploration
Greedy Approach, Random Approach, ϵ-Greedy Approach, Boltzmann Approach, Bayesian Approaches (w/ Dropout)
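Two of the strategies listed above can be sketched in a few lines; the function names and default parameters are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy(q_values, epsilon=0.1):
    """With probability epsilon explore uniformly; otherwise exploit argmax."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

def boltzmann(q_values, temperature=1.0):
    """Sample actions in proportion to exp(Q / temperature)."""
    prefs = np.asarray(q_values, dtype=float) / temperature
    prefs -= prefs.max()                      # shift for numerical stability
    probs = np.exp(prefs) / np.exp(prefs).sum()
    return int(rng.choice(len(q_values), p=probs))
```

Boltzmann selection weights exploration by estimated value instead of exploring uniformly, which is the distinction the post draws between the two approaches.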
Arthur Juliani Part 6 - Partial Observability and Deep Recurrent Q-Networks
limited, changing world, Recurrent Neural Networks, Implementing in Tensorflow
Arthur Juliani Part 4 - Deep Q-Networks and Beyond
Double DQN, Dueling DQN
Arthur Juliani Part 3 - Model-Based RL
Model-Based
Arthur Juliani Part 2 - Policy-based Agents
Full reinforcement agent, Markov Decision Process, Cart-Pole Task
Arthur Juliani Part 1.5 - Contextual Bandits
Multi-armed bandit, Contextual bandit, Full RL problem, Contextual bandit code
Arthur Juliani Part 1 - Two-armed Bandit
The RL problem, Learning a Policy, Policy Gradients, Value functions, ϵ-greedy policy, policy loss equation, Multi-armed bandit code
Arthur Juliani Part 0 - Q-Learning Agents
Policy Gradient, Q-Learning, Bellman Equation, Q-Table code, Q-Learning neural network code