
Blogs of Reinforcement Learning



This blog series is mainly a collection of notes on reinforcement learning.

The main sources are papers and tutorials (Thomas Simonini's Deep Reinforcement Learning Course with Tensorflow and Arthur Juliani's Simple Reinforcement Learning with Tensorflow series), as well as OpenAI Spinning Up in Deep RL.


OpenAI Spinning Up Part 1: Key Concepts in RL

Terminology and concepts


Thomas Simonini Part 5: An intro to Advantage Actor Critic methods: let’s play Sonic the Hedgehog

The problem with Policy Gradients, How Actor Critic works, A2C and A3C, A2C in practice


Thomas Simonini Part 4: An introduction to Policy Gradients with Cartpole and Doom

Two types of policy, Advantages, Disadvantages, Policy Search, Monte Carlo Policy Gradients


Thomas Simonini Part 3+: Improvements in Deep Q Learning: Dueling Double DQN, Prioritized Experience Replay, and fixed Q-targets

Fixed Q-targets, Double DQNs, Dueling DQN, Prioritized Experience Replay (PER), Doom Deathmatch agent


Thomas Simonini Part 3: An introduction to Deep Q-Learning: let’s play Doom

Preprocessing part, The problem of temporal limitation, Experience Replay


Thomas Simonini Part 2: Diving deeper into Reinforcement Learning with Q-Learning

Q-learning algorithm: learning the Action Value Function, The Q-learning algorithm Process, Q* Learning with FrozenLake


Thomas Simonini Part 1: An introduction to Reinforcement Learning

Reinforcement Learning Process, Reward Hypothesis, Episodic or Continuing tasks, Monte Carlo vs TD Learning methods, Exploration/Exploitation trade-off, Three approaches to Reinforcement Learning


Arthur Juliani Part 8 - Asynchronous Actor-Critic Agents (A3C)

The 3 As of A3C, Implementing the Algorithm


Arthur Juliani Part 7 - Action-Selection Strategies for Exploration

Greedy Approach, Random Approach, ϵ-Greedy Approach, Boltzmann Approach, Bayesian Approaches (w/ Dropout)


Arthur Juliani Part 6 - Partial Observability and Deep Recurrent Q-Networks

Limited, changing world, Recurrent Neural Networks, Implementing in Tensorflow


Arthur Juliani Part 4 - Deep Q-Networks and Beyond

Double DQN, Dueling DQN


Arthur Juliani Part 3 - Model-Based RL

Model-Based


Arthur Juliani Part 2 - Policy-based Agents

Full reinforcement agent, Markov Decision Process, Cart-Pole Task


Arthur Juliani Part 1.5 - Contextual Bandits

Multi-armed bandit, Contextual bandit, Full RL problem, Contextual bandit code


Arthur Juliani Part 1 - Two-armed Bandit

The RL problem, Learning a Policy, Policy Gradients, Value functions, ϵ-greedy policy, policy loss equation, The Multi-armed bandit code


Arthur Juliani Part 0 - Q-Learning Agents

Policy Gradient, Q-Learning, Bellman Equation, Q-Table code, Q-Learning NN code