Assignment 2 – Deep Q-Networks

Checkpoint 1: March 7, Thu, 11:59pm

Checkpoint 2: March 24, Thu, 11:59pm

Due Date: April 1, Thu, 11:59pm

1 Assignment Overview

The goal of this assignment is to work with value function approximation algorithms, explore OpenAI Gym environments, and gain experience with a project-management tool. In the first part, we will explore the OpenAI Gym library. In the second part, we will implement Deep Q-learning (DQN), following DeepMind’s paper that explains how reinforcement learning algorithms can learn to play Atari from raw pixels. In the third part, we will implement an improvement to the DQN algorithm. The purpose of this assignment is to understand the benefits of approximation methods, the role of deep neural networks, and some of the techniques used in practice to stabilize training and achieve better performance. We will train our networks on the grid-world and OpenAI Gym environments.

For this assignment, libraries with built-in RL methods cannot be used. Submissions that use built-in RL methods (e.g., stable-baselines, keras-RL, TF-Agents) will not be evaluated.

Part I [20 points] – Introduction to Deep Reinforcement Learning

The goal of this part is to make you comfortable applying different neural network structures depending on how the reinforcement learning environment is set up.

Working with various neural network setups

Refer to the Jupyter notebook titled “assignment_2_part_1.ipynb”. You will work with an implementation of the Wumpus World environment (environment.py) that has four types of observations and three types of actions. Your task is to set up neural networks, based on the structure provided, that take the observation as input and output the Q-values, the action probabilities, or the action.

Refer to the Jupyter notebook for more details.
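To illustrate the three kinds of output mentioned above, here is a minimal sketch in plain Python of one small network whose final layer is read as Q-values, as action probabilities, or as a single action. It is not the structure provided in the notebook. The observation size (6), hidden size (8), number of actions (4), and all function names are hypothetical choices made for this example and are not taken from environment.py.

```python
import math
import random


def make_linear(n_in, n_out, rng):
    """Create a dense layer as (weights, biases) with small random weights."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def linear(layer, x):
    """Apply a dense layer: one weighted sum plus bias per output unit."""
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    """Element-wise rectified linear activation."""
    return [max(0.0, v) for v in values]


def softmax(values):
    """Turn raw scores into probabilities (max is subtracted for stability)."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def forward(hidden, head, obs, output="q_values"):
    """Run the network and interpret the last layer in one of three ways."""
    scores = linear(head, relu(linear(hidden, obs)))
    if output == "q_values":
        return scores                      # one real-valued estimate per action
    if output == "probabilities":
        return softmax(scores)             # non-negative values that sum to 1
    if output == "action":
        return max(range(len(scores)), key=scores.__getitem__)  # greedy index
    raise ValueError(f"unknown output type: {output}")


# Hypothetical sizes: a 6-dimensional observation and 4 discrete actions.
rng = random.Random(0)
hidden = make_linear(6, 8, rng)
head = make_linear(8, 4, rng)
obs = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]      # e.g. a one-hot encoded position

q_values = forward(hidden, head, obs, "q_values")
probabilities = forward(hidden, head, obs, "probabilities")
action = forward(hidden, head, obs, "action")
```

In this sketch only the interpretation of the final layer changes between the three cases. The input layer would also need to change to match each observation type, which is what the notebook asks you to work out.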

Part I submission includes

Jupyter Notebook (assignment_2_part_1.ipynb) – with saved outputs