Flappy Bird hack using Deep Reinforcement Learning (Deep Q-learning).

Using Deep Q-Network to Learn How To Play Flappy Bird

7-minute version: DQN for Flappy Bird

Overview

This project follows the Deep Q-Learning algorithm described in Playing Atari with Deep Reinforcement Learning [2] and shows that this learning algorithm can be further generalized to the notorious Flappy Bird.

Installation Dependencies:

  • Python 2.7 or 3
  • TensorFlow 0.7
  • pygame
  • OpenCV-Python

How to Run?

git clone https://github.com/yenchenlin1994/DeepLearningFlappyBird.git
cd DeepLearningFlappyBird
python deep_q_network.py

What is Deep Q-Network?

It is a convolutional neural network, trained with a variant of Q-learning, whose input is raw pixels and whose output is a value function estimating future rewards.

For those who are interested in deep reinforcement learning, I highly recommend reading the following post:

Demystifying Deep Reinforcement Learning

Deep Q-Network Algorithm

The pseudo-code for the Deep Q Learning algorithm, as given in [1], can be found below:

Initialize replay memory D to capacity N
Initialize action-value function Q with random weights θ
for episode = 1, M do
    Initialize state s_1
    for t = 1, T do
        With probability ϵ select a random action a_t
        otherwise select a_t = argmax_a Q(s_t, a; θ)
        Execute action a_t in the emulator and observe reward r_t and state s_(t+1)
        Store transition (s_t, a_t, r_t, s_(t+1)) in D
        Sample a minibatch of transitions (s_j, a_j, r_j, s_(j+1)) from D
        Set y_j :=
            r_j                                      for terminal s_(j+1)
            r_j + γ * max_(a') Q(s_(j+1), a'; θ)     for non-terminal s_(j+1)
        Perform a gradient step on (y_j − Q(s_j, a_j; θ))^2 with respect to θ
    end for
end for
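
As a concrete illustration of the target y_j in the inner loop, here is a minimal NumPy sketch; the function and variable names (q_targets, q_next, terminal) are my own, not taken from the repo:

import numpy as np

GAMMA = 0.99  # discount factor (assumed here; the repo defines its own constant)

def q_targets(rewards, q_next, terminal):
    """Compute y_j for a minibatch.

    rewards:  (batch,) array of r_j
    q_next:   (batch, n_actions) array of Q(s_(j+1), a'; θ)
    terminal: (batch,) boolean array, True where s_(j+1) is terminal
    """
    # y_j = r_j for terminal transitions, r_j + γ * max_a' Q(s_(j+1), a') otherwise
    return rewards + GAMMA * (1.0 - terminal.astype(np.float32)) * q_next.max(axis=1)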

Experiments

Environment

Since the deep Q-network is trained on the raw pixel values observed from the game screen at each time step, [3] finds that removing the background that appears in the original game makes it converge faster. This process can be visualized in the following figure:

Network Architecture

According to [1], I first preprocessed the game screens with the following steps (a code sketch follows the list):

  1. Convert image to grayscale
  2. Resize image to 80x80
  3. Stack last 4 frames to produce an 80x80x4 input array for network
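
A minimal sketch of these three steps with OpenCV and NumPy (the function names are illustrative; the repo performs the equivalent operations inside deep_q_network.py):

import cv2
import numpy as np

def preprocess_frame(frame):
    """Steps 1-2: grayscale and resize one raw game frame to 80x80."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    small = cv2.resize(gray, (80, 80))
    # Extra step, not in the list above: binarize so the removed background
    # stays uniformly black (see the Environment section).
    _, binary = cv2.threshold(small, 1, 255, cv2.THRESH_BINARY)
    return binary

def stack_frames(last_four):
    """Step 3: stack the last 4 preprocessed frames into an 80x80x4 input array."""
    return np.stack(last_four, axis=2)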

The architecture of the network is shown in the figure below. The first layer convolves the input image with an 8x8x4x32 kernel at a stride size of 4. The output is then put through a 2x2 max pooling layer. The second layer convolves with a 4x4x32x64 kernel at a stride of 2. We then max pool again. The third layer convolves with a 3x3x64x64 kernel at a stride of 1. We then max pool one more time. The last hidden layer consists of 256 fully connected ReLU nodes.
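
For illustration only, here is a rough tf.keras equivalent of the architecture described above (the repo itself is written against the low-level TensorFlow 0.7 API, so this is a sketch rather than the actual implementation; the σ = 0.01 weight initialization is taken from the Training section below):

import tensorflow as tf

def build_q_network(n_actions=2):
    """80x80x4 input -> three conv/pool blocks -> 256-unit FC -> Q-value per action."""
    init = tf.keras.initializers.RandomNormal(stddev=0.01)
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(80, 80, 4)),
        tf.keras.layers.Conv2D(32, 8, strides=4, padding="same",
                               activation="relu", kernel_initializer=init),
        tf.keras.layers.MaxPooling2D(2, padding="same"),
        tf.keras.layers.Conv2D(64, 4, strides=2, padding="same",
                               activation="relu", kernel_initializer=init),
        tf.keras.layers.MaxPooling2D(2, padding="same"),
        tf.keras.layers.Conv2D(64, 3, strides=1, padding="same",
                               activation="relu", kernel_initializer=init),
        tf.keras.layers.MaxPooling2D(2, padding="same"),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(256, activation="relu", kernel_initializer=init),
        tf.keras.layers.Dense(n_actions, kernel_initializer=init),  # Q(s, a) for each action
    ])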

The final output layer has the same dimensionality as the number of valid actions that can be performed in the game, where the 0th index always corresponds to doing nothing. The values at this output layer represent the Q function of the input state for each valid action. At each time step, the network selects whichever action corresponds to the highest Q value, following an ϵ-greedy policy.
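
A small sketch of that ϵ-greedy choice (select_action is a hypothetical helper; in the repo this logic sits inline in the training loop):

import random
import numpy as np

def select_action(q_values, epsilon, n_actions=2):
    """Pick a random action with probability ϵ, otherwise the greedy action.

    q_values: array of Q(s, a) for the current state, one entry per action
              (index 0 = do nothing, index 1 = flap).
    """
    if random.random() <= epsilon:
        return random.randrange(n_actions)
    return int(np.argmax(q_values))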

Training

At first, I initialize all weight matrices randomly using a normal distribution with a standard deviation of 0.01, then set up the replay memory with a maximum size of 50,000 experiences.

I start training by choosing actions uniformly at random for the first 10,000 time steps, without updating the network weights. This allows the system to populate the replay memory before training begins.
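
A minimal sketch of this warm-up phase using a deque as the replay memory; OBSERVE matches the FAQ parameter, while REPLAY_MEMORY, warm_up, and the env interface are illustrative assumptions:

from collections import deque
import random

REPLAY_MEMORY = 50000   # maximum number of stored experiences
OBSERVE = 10000         # random-action steps before any weight update

def warm_up(env, n_actions=2):
    """Fill the replay memory with random-policy experience before training.

    `env` is a hypothetical game wrapper with reset()/step(); the repo instead
    drives its pygame wrapper directly.
    """
    replay = deque(maxlen=REPLAY_MEMORY)
    state = env.reset()
    for _ in range(OBSERVE):
        action = random.randrange(n_actions)              # act uniformly at random
        next_state, reward, terminal = env.step(action)
        replay.append((state, action, reward, next_state, terminal))
        state = env.reset() if terminal else next_state
    return replay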

Note that unlike [1], which initializes ϵ = 1, I linearly anneal ϵ from 0.1 to 0.0001 over the course of the next 3,000,000 frames. I set it this way because the agent can choose an action every 0.03 s (FPS = 30) in our game, and a high ϵ makes it flap too often, which keeps it at the top of the screen until it clumsily bumps into a pipe. This makes the Q function converge relatively slowly, since it only starts to explore other situations once ϵ is low. In other games, however, initializing ϵ to 1 is more reasonable.
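
The linear annealing schedule amounts to interpolating between the two ϵ values over EXPLORE frames; a sketch using the parameter values from the FAQ below:

INITIAL_EPSILON = 0.1     # starting exploration rate (not 1.0 as in [1])
FINAL_EPSILON = 0.0001    # value reached once annealing finishes
EXPLORE = 3000000         # number of frames over which ϵ is annealed

def annealed_epsilon(frame):
    """Linearly interpolate ϵ from INITIAL_EPSILON down to FINAL_EPSILON."""
    if frame >= EXPLORE:
        return FINAL_EPSILON
    return INITIAL_EPSILON - (INITIAL_EPSILON - FINAL_EPSILON) * frame / EXPLORE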

During training, at each time step the network samples minibatches of size 32 from the replay memory and performs a gradient step on the loss function described above, using the Adam optimizer with a learning rate of 0.000001. After annealing finishes, the network continues to train indefinitely with ϵ fixed at 0.0001.
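
Putting the pieces together, here is a rough sketch of one training step in modern tf.keras terms, reusing the build_q_network model and replay deque from the sketches above (the repo builds the same loss with the low-level TensorFlow 0.7 API; GAMMA is an assumed discount value):

import random
import numpy as np
import tensorflow as tf

BATCH_SIZE = 32
GAMMA = 0.99  # discount factor (assumed; the repo defines its own constant)

optimizer = tf.keras.optimizers.Adam(learning_rate=1e-6)

def train_step(model, replay, n_actions=2):
    """Sample a minibatch from the replay memory and take one Adam step."""
    batch = random.sample(replay, BATCH_SIZE)
    states, actions, rewards, next_states, terminals = map(np.array, zip(*batch))

    # y_j = r_j for terminal transitions, r_j + γ * max_a' Q(s', a') otherwise
    q_next = model(next_states.astype(np.float32)).numpy()
    targets = (rewards.astype(np.float32)
               + GAMMA * (1.0 - terminals.astype(np.float32)) * q_next.max(axis=1))

    with tf.GradientTape() as tape:
        q = model(states.astype(np.float32))
        q_a = tf.reduce_sum(q * tf.one_hot(actions, n_actions), axis=1)
        loss = tf.reduce_mean(tf.square(targets - q_a))   # (y_j - Q(s_j, a_j))^2
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return float(loss)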

FAQ

Checkpoint not found

Change the first line of saved_networks/checkpoint to

model_checkpoint_path: "saved_networks/bird-dqn-2920000"

How to reproduce?

  1. Comment out these lines

  2. Modify deep_q_network.py's parameters as follows:

OBSERVE = 10000
EXPLORE = 3000000
FINAL_EPSILON = 0.0001
INITIAL_EPSILON = 0.1

References

[1] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, and Demis Hassabis. Human-level Control through Deep Reinforcement Learning. Nature, 518(7540):529–533, 2015.

[2] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. Playing Atari with Deep Reinforcement Learning. NIPS Deep Learning Workshop, 2013.

[3] Kevin Chen. Deep Reinforcement Learning for Flappy Bird. Report | YouTube result.

Disclaimer

This work is based heavily on the following repos:

  1. sourabhv/FlapPyBird (https://github.com/sourabhv/FlapPyBird)
  2. asrivat1/DeepLearningVideoGames