Reinforcement Learning Blogs
Contact me
- Blog -> https://cugtyt.github.io/blog/index
- Email -> cugtyt@qq.com
- GitHub -> Cugtyt@GitHub
This blog series collects my notes on reinforcement learning.
The main sources are papers and tutorials: Thomas Simonini's Deep Reinforcement Learning Course with Tensorflow, Arthur Juliani's Simple Reinforcement Learning with Tensorflow series, and OpenAI's Spinning Up in Deep RL.
OpenAI Spinning Up Part 1: Key Concepts in RL
Terms and concepts:
- States and observations
- Action spaces
- Policies
- Trajectories
- Different formulations of return
- The RL optimization problem
- Value functions
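As a compact reminder of the last two concepts (written in standard RL notation, not quoted from the post itself), the discounted return of a trajectory and the on-policy value function are:

```latex
% Infinite-horizon discounted return of a trajectory \tau = (s_0, a_0, r_1, s_1, \dots)
R(\tau) = \sum_{t=0}^{\infty} \gamma^{t}\, r_{t+1}, \qquad 0 < \gamma < 1

% On-policy value function: expected return starting from s and acting under \pi
V^{\pi}(s) = \mathbb{E}_{\tau \sim \pi}\!\left[\, R(\tau) \,\middle|\, s_0 = s \,\right]
```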
Thomas Simonini Part 5: An intro to Advantage Actor Critic methods: let’s play Sonic the Hedgehog
The problem with Policy Gradients, How Actor Critic works, A2C and A3C, A2C in practice
Thomas Simonini Part 4: An introduction to Policy Gradients with Cartpole and Doom
Two types of policy, Advantages, Disadvantages, Policy Search, Monte Carlo Policy Gradients
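The Monte Carlo policy-gradient (REINFORCE) idea listed above can be sketched for a linear softmax policy as follows; the state/action shapes, hyperparameters, and the `reinforce_step` helper are illustrative assumptions, not code from the post:

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over action preferences."""
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def reinforce_step(theta, episode, gamma=0.99, lr=0.01):
    """One REINFORCE update for a linear softmax policy.

    theta:   (n_actions, state_dim) policy parameters
    episode: list of (state_vector, action, reward) tuples
    """
    G = 0.0
    for s, a, r in reversed(episode):
        G = r + gamma * G                              # Monte Carlo return
        pi = softmax(theta @ s)
        # grad of log pi(a|s) for a softmax policy: (onehot(a) - pi) outer s
        grad_log = (np.eye(len(pi))[a] - pi)[:, None] * s[None, :]
        theta += lr * G * grad_log                     # gradient ascent on return
    return theta
```

Rewards here are credited to every action in the episode via the return-to-go `G`, which is exactly what makes the estimator "Monte Carlo": no value function, just sampled returns.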
Thomas Simonini Part 3+: Improvements in Deep Q Learning: Dueling Double DQN, Prioritized Experience Replay, and fixed Q-targets
Fixed Q-targets, Double DQNs, Dueling DQN, Prioritized Experience Replay (PER), Doom Deathmatch agent
Thomas Simonini Part 3: An introduction to Deep Q-Learning: let’s play Doom
Preprocessing part, The problem of temporal limitation, Experience Replay
Thomas Simonini Part 2: Diving deeper into Reinforcement Learning with Q-Learning
Q-learning algorithm: learning the Action Value Function, The Q-learning algorithm Process, Q* Learning with FrozenLake
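The tabular Q-learning update covered in that post can be sketched as follows; the table size matches FrozenLake's 4x4 grid, but the hyperparameter values are illustrative assumptions:

```python
import numpy as np

n_states, n_actions = 16, 4          # e.g. FrozenLake's 4x4 grid, 4 moves
Q = np.zeros((n_states, n_actions))  # the Q-table, initialized to zero
alpha, gamma = 0.1, 0.99             # learning rate, discount factor

def q_update(state, action, reward, next_state, done):
    """One Bellman-style Q-learning update toward the TD target."""
    target = reward if done else reward + gamma * np.max(Q[next_state])
    Q[state, action] += alpha * (target - Q[state, action])
```

Each call nudges `Q[state, action]` toward `reward + gamma * max_a Q[next_state, a]`, which is the sampled form of the Bellman optimality equation.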
Thomas Simonini Part 1: An introduction to Reinforcement Learning
Reinforcement Learning Process, Reward Hypothesis, Episodic or Continuing tasks, Monte Carlo vs TD Learning methods, Exploration/Exploitation trade off, Three approaches to Reinforcement Learning
Arthur Juliani Part 8 - Asynchronous Actor-Critic Agents (A3C)
The 3 As of A3C, Implementing the Algorithm
Arthur Juliani Part 7 - Action-Selection Strategies for Exploration
Greedy Approach, Random Approach, ϵ-Greedy Approach, Boltzmann Approach, Bayesian Approaches (w/ Dropout)
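Two of the strategies listed above can be sketched in a few lines; the function names and default parameters are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy(q_values, epsilon=0.1):
    """With probability epsilon explore uniformly; otherwise exploit argmax."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

def boltzmann(q_values, temperature=1.0):
    """Sample actions in proportion to exp(Q / temperature)."""
    prefs = np.asarray(q_values, dtype=float) / temperature
    prefs -= prefs.max()                      # shift for numerical stability
    probs = np.exp(prefs) / np.exp(prefs).sum()
    return int(rng.choice(len(q_values), p=probs))
```

Boltzmann selection weights exploration by estimated value instead of exploring uniformly, which is the distinction the post draws between the two approaches.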
Arthur Juliani Part 6 - Partial Observability and Deep Recurrent Q-Networks
limited, changing world, Recurrent Neural Networks, Implementing in Tensorflow
Arthur Juliani Part 4 - Deep Q-Networks and Beyond
Double DQN, Dueling DQN
Arthur Juliani Part 3 - Model-Based RL
Model-Based
Arthur Juliani Part 2 - Policy-based Agents
Full reinforcement agent, Markov Decision Process, Cart-Pole Task
Arthur Juliani Part 1.5 - Contextual Bandits
Multi-armed bandit, Contextual bandit, Full RL problem, Contextual bandit code
Arthur Juliani Part 1 - Two-armed Bandit
The RL problem, Learning a Policy, Policy Gradients, Value functions, ϵ-greedy policy, policy loss equation, Multi-armed bandit code
Arthur Juliani Part 0 - Q-Learning Agents
Policy Gradient, Q-Learning, Bellman Equation, Q-Table code, Q-Learning neural network code