Flappy Bird hack using Deep Reinforcement Learning (Deep Q-learning).

Using Deep Q-Network to Learn How To Play Flappy Bird

7 mins version: DQN for flappy bird

Overview

This project follows the description of the Deep Q Learning algorithm described in Playing Atari with Deep Reinforcement Learning [2] and shows that this learning algorithm can be further generalized to the notorious Flappy Bird.

Installation Dependencies:

  • Python 2.7 or 3
  • TensorFlow 0.7
  • pygame
  • OpenCV-Python

How to Run?

git clone https://github.com/yenchenlin1994/DeepLearningFlappyBird.git
cd DeepLearningFlappyBird
python deep_q_network.py

What is Deep Q-Network?

It is a convolutional neural network, trained with a variant of Q-learning, whose input is raw pixels and whose output is a value function estimating future rewards.

For those who are interested in deep reinforcement learning, I highly recommend reading the following post:

Demystifying Deep Reinforcement Learning

Deep Q-Network Algorithm

The pseudo-code for the Deep Q Learning algorithm, as given in [1], can be found below:

Initialize replay memory D to capacity N
Initialize action-value function Q with random weights θ
for episode = 1, M do
    Initialize state s_1
    for t = 1, T do
        With probability ϵ select a random action a_t,
        otherwise select a_t = argmax_a Q(s_t, a; θ)
        Execute action a_t in the emulator and observe reward r_t and state s_(t+1)
        Store transition (s_t, a_t, r_t, s_(t+1)) in D
        Sample a random minibatch of transitions (s_j, a_j, r_j, s_(j+1)) from D
        Set y_j :=
            r_j                                   for terminal s_(j+1)
            r_j + γ · max_(a') Q(s_(j+1), a'; θ)  for non-terminal s_(j+1)
        Perform a gradient step on (y_j − Q(s_j, a_j; θ))^2 with respect to θ
    end for
end for
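
To make the update rule concrete, here is a minimal Python sketch (NumPy only) of how the targets y_j and the squared TD error are computed for one minibatch. The function names, the batched array layout, and the GAMMA value are illustrative assumptions, not the repo's code:

import numpy as np

GAMMA = 0.99  # discount factor; assumed value, check deep_q_network.py

def td_targets(rewards, next_q_values, terminal):
    """Compute y_j for a minibatch of transitions.

    rewards:       shape (batch,)   -- r_j
    next_q_values: shape (batch, A) -- Q(s_(j+1), a'; θ) for every action a'
    terminal:      shape (batch,)   -- True where s_(j+1) ends the episode
    """
    # y_j = r_j for terminal transitions, else r_j + γ · max_(a') Q(s_(j+1), a').
    return rewards + GAMMA * np.max(next_q_values, axis=1) * (~terminal)

def td_loss(targets, q_values, actions):
    """Mean squared error between y_j and Q(s_j, a_j; θ)."""
    q_taken = q_values[np.arange(len(actions)), actions]
    return np.mean((targets - q_taken) ** 2)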

Experiments

Environment

Since the deep Q-network is trained on the raw pixel values observed from the game screen at each time step, [3] finds that removing the background that appears in the original game makes it converge faster. This process can be visualized in the following figure:

Network Architecture

According to [1], I first preprocessed the game screens with the following steps:

  1. Convert the image to grayscale
  2. Resize the image to 80x80
  3. Stack the last 4 frames to produce an 80x80x4 input array for the network
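
As a concrete illustration, these three steps take only a few lines with OpenCV and NumPy. This is a hedged sketch of the pipeline, not the repo's exact code (which may, for instance, also binarize the image):

import cv2
import numpy as np

def preprocess_frame(frame):
    """Steps 1-2: convert a raw game frame to grayscale, resize to 80x80."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    return cv2.resize(gray, (80, 80))

def make_input(last_four_frames):
    """Step 3: stack the last 4 preprocessed frames into an 80x80x4 array."""
    return np.stack([preprocess_frame(f) for f in last_four_frames], axis=2)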

The architecture of the network is shown in the figure below. The first layer convolves the input image with an 8x8x4x32 kernel at a stride size of 4. The output is then put through a 2x2 max pooling layer. The second layer convolves with a 4x4x32x64 kernel at a stride of 2. We then max pool again. The third layer convolves with a 3x3x64x64 kernel at a stride of 1. We then max pool one more time. The last hidden layer consists of 256 fully connected ReLU nodes.
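
For reference, here is an equivalent of the architecture just described, sketched with the modern tf.keras API rather than the TensorFlow 0.7 code the repo actually uses. Layer sizes follow the prose above, and `num_actions` is a placeholder:

import tensorflow as tf

def build_q_network(num_actions):
    # Input: 4 stacked 80x80 grayscale frames.
    inputs = tf.keras.Input(shape=(80, 80, 4))
    # Conv 8x8x4x32, stride 4, then 2x2 max pooling.
    x = tf.keras.layers.Conv2D(32, 8, strides=4, padding="same", activation="relu")(inputs)
    x = tf.keras.layers.MaxPooling2D(2, padding="same")(x)
    # Conv 4x4x32x64, stride 2, then max pooling.
    x = tf.keras.layers.Conv2D(64, 4, strides=2, padding="same", activation="relu")(x)
    x = tf.keras.layers.MaxPooling2D(2, padding="same")(x)
    # Conv 3x3x64x64, stride 1, then max pooling.
    x = tf.keras.layers.Conv2D(64, 3, strides=1, padding="same", activation="relu")(x)
    x = tf.keras.layers.MaxPooling2D(2, padding="same")(x)
    x = tf.keras.layers.Flatten()(x)
    # Last hidden layer: 256 fully connected ReLU nodes.
    x = tf.keras.layers.Dense(256, activation="relu")(x)
    # One Q value per valid action; index 0 is "do nothing".
    return tf.keras.Model(inputs, tf.keras.layers.Dense(num_actions)(x))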

The final output layer has the same dimensionality as the number of valid actions that can be performed in the game, where the 0th index always corresponds to doing nothing. The values at this output layer represent the Q function for each valid action given the input state. At each time step, the agent follows an ϵ-greedy policy: with probability ϵ it takes a random action, and otherwise it performs the action with the highest Q value.
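
In code, that action-selection rule is just a few lines. A minimal sketch (illustrative names, not the repo's code):

import random
import numpy as np

def select_action(q_values, epsilon):
    """Epsilon-greedy: with probability epsilon take a random action,
    otherwise take the action with the highest predicted Q value."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return int(np.argmax(q_values))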

Training

At first, I initialize all weight matrices randomly using a normal distribution with a standard deviation of 0.01, then set the replay memory to a maximum size of 50,000 experiences.

I start training by choosing actions uniformly at random for the first 10,000 time steps, without updating the network weights. This allows the system to populate the replay memory before training begins.

Note that unlike [1], which initializes ϵ to 1, I linearly anneal ϵ from 0.1 to 0.0001 over the next 3,000,000 frames. The reason is that the agent can choose an action every 0.03 s (the game runs at 30 FPS), so a high ϵ makes it flap too often, which keeps it at the top of the screen until it clumsily bumps into a pipe. This makes the Q function converge relatively slowly, since the agent only starts to encounter other situations once ϵ is low. In other games, however, initializing ϵ to 1 is more reasonable.
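
The schedule itself is a simple linear interpolation. A sketch using the INITIAL_EPSILON/FINAL_EPSILON/EXPLORE parameters quoted in the FAQ below (the function name is illustrative):

INITIAL_EPSILON = 0.1
FINAL_EPSILON = 0.0001
EXPLORE = 3000000  # frames over which epsilon is annealed

def annealed_epsilon(t):
    """Linearly anneal epsilon from INITIAL_EPSILON down to FINAL_EPSILON
    over the first EXPLORE frames, then hold it at FINAL_EPSILON."""
    if t >= EXPLORE:
        return FINAL_EPSILON
    return INITIAL_EPSILON - (INITIAL_EPSILON - FINAL_EPSILON) * (t / EXPLORE)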

During training, at each time step the network samples a minibatch of size 32 from the replay memory and performs a gradient step on the loss function described above, using the Adam optimizer with a learning rate of 0.000001. After annealing finishes, the network continues to train indefinitely with ϵ fixed at 0.0001.
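
Put together, one training update looks roughly like the sketch below, written against the modern tf.keras API rather than the repo's TensorFlow 0.7 code. Here `replay_memory` is assumed to be a list of (state, action, reward, next_state, terminal) tuples and `model` a network like the one sketched earlier:

import random
import numpy as np
import tensorflow as tf

BATCH_SIZE = 32
GAMMA = 0.99  # assumed discount factor
optimizer = tf.keras.optimizers.Adam(learning_rate=0.000001)

def train_step(model, replay_memory):
    """One gradient step on a random minibatch from the replay memory."""
    batch = random.sample(replay_memory, BATCH_SIZE)
    states = np.stack([t[0] for t in batch]).astype(np.float32)
    actions = np.array([t[1] for t in batch])
    rewards = np.array([t[2] for t in batch], dtype=np.float32)
    next_states = np.stack([t[3] for t in batch]).astype(np.float32)
    terminal = np.array([t[4] for t in batch])

    # Targets: y_j = r_j, plus the discounted max future Q for non-terminal s_(j+1).
    next_q = model(next_states).numpy()
    targets = rewards + GAMMA * np.max(next_q, axis=1) * (~terminal)

    with tf.GradientTape() as tape:
        q = model(states)
        # Q value of the action actually taken in each transition.
        q_taken = tf.reduce_sum(q * tf.one_hot(actions, q.shape[-1]), axis=1)
        loss = tf.reduce_mean(tf.square(targets - q_taken))
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return float(loss)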

FAQ

Checkpoint not found

Change first line of saved_networks/checkpoint to

model_checkpoint_path: "saved_networks/bird-dqn-2920000"

How to reproduce?

  1. Comment out these lines

  2. Modify deep_q_network.py's parameters as follows:

OBSERVE = 10000
EXPLORE = 3000000
FINAL_EPSILON = 0.0001
INITIAL_EPSILON = 0.1

References

[1] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, and Demis Hassabis. Human-level Control through Deep Reinforcement Learning. Nature, 518(7540):529–533, 2015.

[2] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. Playing Atari with Deep Reinforcement Learning. NIPS Deep Learning Workshop, 2013.

[3] Kevin Chen. Deep Reinforcement Learning for Flappy Bird. Report | YouTube result.

Disclaimer

This work is based largely on the following repos:

  1. [sourabhv/FlapPyBird](https://github.com/sourabhv/FlapPyBird)
  2. [asrivat1/DeepLearningVideoGames](https://github.com/asrivat1/DeepLearningVideoGames)