
An introduction to Deep Q-Learning: let’s play Doom




The home page of this blog series and related posts can be found here.

From Thomas Simonini's Deep Reinforcement Learning Course, Part 3: An introduction to Deep Q-Learning: let’s play Doom.


How does Deep Q-Learning work?

[Figure: Deep Q-Network architecture]

Preprocessing part

[Figure: frame preprocessing]
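
As a rough sketch of this step (the 84x84 target size, the crop indices, and the use of OpenCV are my assumptions, not something fixed by the course), preprocessing a raw frame might look like:

import numpy as np
import cv2  # OpenCV is just one convenient choice for the image operations

def preprocess_frame(frame):
    """Turn a raw RGB game frame into a small, normalized grayscale image."""
    gray = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)   # drop color information
    cropped = gray[30:-10, :]                        # hypothetical crop of the HUD and ceiling
    resized = cv2.resize(cropped, (84, 84))          # shrink to 84x84 pixels
    return resized.astype(np.float32) / 255.0        # normalize pixel values to [0, 1]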

The problem of temporal limitation

We stack frames together because it helps us handle the problem of temporal limitation.

If we give the network only one frame at a time, it has no idea of motion. And how can it make a correct decision if it can't determine where and how fast objects are moving?
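
A minimal sketch of frame stacking with a fixed-length deque (the stack size of 4 and the helper names are illustrative choices):

from collections import deque
import numpy as np

STACK_SIZE = 4  # number of consecutive frames fed to the network

def new_episode_stack(first_frame):
    """At the start of an episode, repeat the first frame so the stack is full."""
    return deque([first_frame] * STACK_SIZE, maxlen=STACK_SIZE)

def stack_frames(stack, new_frame):
    """Append the newest frame; the oldest one is dropped automatically."""
    stack.append(new_frame)
    # Shape (84, 84, STACK_SIZE): the channel axis now encodes time, so the
    # network can infer where objects are moving and how fast.
    return np.stack(stack, axis=-1)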

Using convolution networks
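
For illustration only (the layer sizes and the use of tf.keras are assumptions, not the course's exact architecture), a convolutional Q-network that maps a stack of frames to one Q-value per action could be built like this:

import tensorflow as tf

def build_q_network(num_actions, input_shape=(84, 84, 4)):
    """Convolution layers extract spatial and motion features from the stacked
    frames; the last dense layer outputs one Q-value per possible action."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=input_shape),
        tf.keras.layers.Conv2D(32, 8, strides=4, activation="elu"),
        tf.keras.layers.Conv2D(64, 4, strides=2, activation="elu"),
        tf.keras.layers.Conv2D(64, 3, strides=1, activation="elu"),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(512, activation="elu"),
        tf.keras.layers.Dense(num_actions),  # raw Q-values, no activation
    ])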

Experience Replay: making more efficient use of observed experience

Experience replay will help us handle two things:

Avoid forgetting previous experiences

We have a big problem: the weights become highly variable, because there is a strong correlation between consecutive actions and states.

[Figure: replay buffer]
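
A minimal replay buffer sketch, assuming a plain deque with uniform random sampling (the capacity and batch size are left to the caller):

import random
from collections import deque

class ReplayBuffer:
    """Keep past <s, a, r, s', done> tuples so they can be reused for learning
    instead of being thrown away after a single update."""

    def __init__(self, capacity):
        self.memory = deque(maxlen=capacity)  # oldest experiences are dropped first

    def add(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random sampling also breaks the correlation between
        # consecutive experiences (see the next subsection).
        return random.sample(self.memory, batch_size)

    def __len__(self):
        return len(self.memory)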

Reducing correlation between experiences

[Figure shoot1: the agent keeps learning to shoot the gun on the right]

[Figure shoot2: it never learns to shoot the gun on the left]

We have two parallel strategies to handle this problem.

Our Deep Q-Learning algorithm

[Figure: Bellman equation]

[Figure: DQN error term]
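
Reading the two figures together with the pseudocode below, the TD target and the error term can be written as:

    Q_{\text{target}} = r(s, a) + \gamma \, \max_{a'} Q(s', a'; w)

    L(w) = \left( Q_{\text{target}} - Q(s, a; w) \right)^2

The first line is the Bellman target built from the observed reward and the best estimated value of the next state; the second is the squared difference between that target and the current prediction, whose gradient drives the weight update in the pseudocode.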

Initialize Doom Environment E
Initialize replay Memory M with capacity N (= finite capacity)
Initialize the DQN weights w
for episode in max_episode:
    s = Environment state
    for steps in max_steps:
         Choose action a from state s using epsilon greedy.
         Take action a, get r (reward) and s' (next state)
         Store experience tuple <s, a, r, s'> in M
         s = s' (state = new_state)

         Get random minibatch of exp tuples from M
         Set Q_target = reward(s, a) + γ * max_a' Q(s', a')
         Update w ← w + α * (Q_target - Q_value) * ∇_w Q_value
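
A sketch of the inner learning step in Python, assuming the tf.keras Q-network and the ReplayBuffer sketched above (batch size, gamma, and the optimizer are placeholder choices):

import numpy as np
import tensorflow as tf

def train_step(q_network, optimizer, buffer, batch_size=64, gamma=0.95):
    """One gradient update on a random minibatch, following the pseudocode above."""
    batch = buffer.sample(batch_size)
    states, actions, rewards, next_states, dones = map(np.array, zip(*batch))

    # Q_target = r + gamma * max_a' Q(s', a'); no bootstrap term at episode end
    next_q = q_network(next_states).numpy().max(axis=1)
    targets = (rewards + gamma * next_q * (1.0 - dones)).astype(np.float32)

    with tf.GradientTape() as tape:
        q_values = q_network(states)                          # Q(s, ·) for every action
        mask = tf.one_hot(actions, q_values.shape[-1])
        q_taken = tf.reduce_sum(q_values * mask, axis=1)      # Q(s, a) for the action taken
        loss = tf.reduce_mean(tf.square(targets - q_taken))   # (Q_target - Q_value)^2

    grads = tape.gradient(loss, q_network.trainable_variables)
    optimizer.apply_gradients(zip(grads, q_network.trainable_variables))
    return float(loss)

A typical loop would call this after every environment step once the buffer holds at least batch_size experiences, while epsilon is decayed over time for the epsilon-greedy action choice.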

There are two processes happening in this algorithm:

- We sample the environment: we perform actions and store the observed experience tuples in replay memory.
- We learn: we take a small random minibatch of tuples and update the network weights with a gradient descent step.

Deep Q Neural Network

Code source