
DQN(Deep Q-Networks)

yuuuun 2020. 12. 13. 15:05

References:
- https://sumniya.tistory.com/18
- dnddnjs.gitbooks.io/rl/content/neural_network.html
- DeepMind, "Playing Atari with Deep Reinforcement Learning"

What is DQN?

In reinforcement learning, the agent understands its environment through an MDP. If we learn by storing the action-value function for every state in a table and updating it entry by entry, learning becomes very slow for large state spaces. Instead, we approximate the action-value function with a nonlinear function approximator. Reinforcement learning that uses a deep neural network (DNN) to approximate the action-value function (the Q-value) is called Deep Reinforcement Learning (DeepRL).

DQN refers to learning this approximation with a convolutional neural network.

The model is a convolutional neural network, trained with a variant of Q-learning, whose input is raw pixels and whose output is a value function estimating future rewards.
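As a concrete reference, here is a minimal sketch of that network, following the architecture reported in the paper: a stack of 4 preprocessed 84×84 frames passes through two convolutional layers (16 8×8 filters with stride 4, then 32 4×4 filters with stride 2) and a 256-unit fully connected layer, with one linear output per action. PyTorch is an assumption for illustration; the paper predates the library.

    import torch
    import torch.nn as nn

    class QNetwork(nn.Module):
        """Maps a stack of 4 preprocessed 84x84 frames to one Q-value per action."""
        def __init__(self, num_actions: int):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(4, 16, kernel_size=8, stride=4),   # 4x84x84 -> 16x20x20
                nn.ReLU(),
                nn.Conv2d(16, 32, kernel_size=4, stride=2),  # 16x20x20 -> 32x9x9
                nn.ReLU(),
            )
            self.head = nn.Sequential(
                nn.Flatten(),
                nn.Linear(32 * 9 * 9, 256),
                nn.ReLU(),
                nn.Linear(256, num_actions),  # one output per action: Q(s, a)
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.head(self.features(x))

Predicting all action values in a single forward pass (rather than feeding the action in as an input) means one pass per state suffices to pick the greedy action.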

Abstract

We present the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning. The model is a convolutional neural network, trained with a variant of Q-learning, whose input is raw pixels and whose output is a value function estimating future rewards. We apply our method to seven Atari 2600 games from the Arcade Learning Environment, with no adjustment of the architecture or learning algorithm. We find that it outperforms all previous approaches on six of the games and surpasses a human expert on three of them.

Contributions

  1. Takes raw pixels directly as the input data.
  2. Uses a CNN as the function approximator.
  3. A single agent is able to learn several different Atari games.
  4. Improves data efficiency with experience replay.

Deep Q Network

  • Introduces a deep learning model (a CNN) to approximate the action-value function
  • A CNN is a neural network well suited to learning from images
    • It can learn from the raw game pixel data itself
    • One agent can learn several games without per-game agent configuration; an action-selection sketch follows this list
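To act, the paper's agent follows an epsilon-greedy policy over the approximated Q-values: with probability epsilon it picks a random action, otherwise the action with the highest predicted Q-value. The helper below is hypothetical (select_action is not a name from the paper):

    import random
    import torch

    def select_action(q_net, state, epsilon, num_actions):
        """Pick an action epsilon-greedily from the approximated Q-values."""
        if random.random() < epsilon:
            return random.randrange(num_actions)       # explore: uniform random action
        with torch.no_grad():
            q_values = q_net(state.unsqueeze(0))       # add batch dim: 1 x 4 x 84 x 84
        return int(q_values.argmax(dim=1).item())      # exploit: greedy action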

Input Data

Working directly with raw Atari frames, which are 210×160 pixel images with a 128 color palette, can be computationally demanding, so we apply a basic preprocessing step aimed at reducing the input dimensionality. The raw frames are preprocessed by first converting their RGB representation to gray-scale and down-sampling it to a 110×84 image. The final input representation is obtained by cropping an 84×84 region of the image that roughly captures the playing area.

  • Preprocessing step
  • Converts the game screen into a form the CNN can learn from (removing unnecessary information), as sketched below
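A minimal preprocessing sketch, assuming OpenCV and NumPy; the exact crop offset is game-dependent, and the row offset of 18 below is an illustrative assumption, not a value from the paper:

    import cv2
    import numpy as np

    def preprocess(frame: np.ndarray) -> np.ndarray:
        """210x160x3 RGB Atari frame -> 84x84 gray-scale network input."""
        gray = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)  # drop color: 210 x 160
        small = cv2.resize(gray, (84, 110))             # down-sample; cv2 takes (width, height)
        return small[18:102, :]                         # crop an 84x84 region around the playing area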

Algorithm

  • Transition data is stored in a replay memory; at every time step a mini-batch is sampled at random from the memory and used for the update (see the sketch below)
  • Q-learning is used as the learning algorithm
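A minimal sketch of the core update, again assuming PyTorch; memory, train_step, and the hyperparameter values are illustrative, not from the paper. Note that this 2013 version of DQN computes the target with the same network (the separate target network came later, in the 2015 Nature paper):

    import random
    from collections import deque

    import torch
    import torch.nn.functional as F

    # Replay memory: a bounded buffer of (state, action, reward, next_state, done) transitions.
    memory = deque(maxlen=100_000)

    def train_step(q_net, optimizer, batch_size=32, gamma=0.99):
        """One update: regress Q(s, a) onto the Q-learning target y = r + gamma * max_a' Q(s', a')."""
        if len(memory) < batch_size:
            return
        batch = random.sample(memory, batch_size)  # random sampling breaks correlation between frames
        states, actions, rewards, next_states, dones = zip(*batch)
        states = torch.stack(states)                              # B x 4 x 84 x 84
        next_states = torch.stack(next_states)
        actions = torch.tensor(actions, dtype=torch.int64)
        rewards = torch.tensor(rewards, dtype=torch.float32)
        dones = torch.tensor(dones, dtype=torch.float32)

        q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)  # Q(s, a) of actions taken
        with torch.no_grad():
            # Target is just r when the episode ended, r + gamma * max_a' Q(s', a') otherwise.
            target = rewards + gamma * (1.0 - dones) * q_net(next_states).max(dim=1).values

        loss = F.mse_loss(q_sa, target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

Sampling mini-batches at random from the replay memory, rather than learning from consecutive frames, is what gives the data-efficiency and stability benefit listed under Contributions.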