Programming Assignment
Submission: a single PDF containing your code (in any programming language), results, and analysis.
Part 1
Learning algorithms for this task include Q-learning, Monte Carlo methods, dynamic programming, double Q-learning, temporal-difference (TD) learning, SARSA, and others. Choose any two and implement them on a grid-world goal-searching problem.
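As a minimal sketch of two of these algorithms, the Python below runs tabular Q-learning and SARSA on a small grid world. The 4 × 5 layout, the reward of -1 per move, the hyperparameters (alpha, gamma, eps) and the helper names step(), train() and greedy_path_length() are illustrative assumptions, not requirements of the assignment.

```python
import random

# Hypothetical 4 x 5 grid world: 'S' start, 'G' goal, '#' wall, '.' free cell.
GRID = [
    "S..#.",
    ".#...",
    "...#.",
    "#...G",
]
ROWS, COLS = len(GRID), len(GRID[0])
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
START, GOAL = (0, 0), (3, 4)

def step(state, a):
    """Deterministic move; walls and borders leave the agent in place. Reward is -1 per move."""
    r, c = state[0] + ACTIONS[a][0], state[1] + ACTIONS[a][1]
    if not (0 <= r < ROWS and 0 <= c < COLS) or GRID[r][c] == "#":
        r, c = state
    return (r, c), -1.0, (r, c) == GOAL

def epsilon_greedy(Q, s, eps):
    """Random action with probability eps, otherwise a greedy action (ties broken at random)."""
    if random.random() < eps:
        return random.randrange(len(ACTIONS))
    best = max(Q[s])
    return random.choice([a for a, v in enumerate(Q[s]) if v == best])

def train(method, episodes=500, alpha=0.5, gamma=0.95, eps=0.1, max_steps=200):
    """Tabular Q-learning or SARSA; returns the Q table and the steps taken in each episode."""
    Q = {(r, c): [0.0] * len(ACTIONS)
         for r in range(ROWS) for c in range(COLS) if GRID[r][c] != "#"}
    steps_per_episode = []
    for _ in range(episodes):
        s = START
        a = epsilon_greedy(Q, s, eps)
        for t in range(1, max_steps + 1):
            s2, reward, done = step(s, a)
            if method == "sarsa":
                a2 = epsilon_greedy(Q, s2, eps)  # SARSA chooses A' before the update
                bootstrap = Q[s2][a2]            # on-policy target
            else:
                bootstrap = max(Q[s2])           # Q-learning: off-policy greedy target
            target = reward if done else reward + gamma * bootstrap
            Q[s][a] += alpha * (target - Q[s][a])
            if done:
                break
            if method != "sarsa":
                a2 = epsilon_greedy(Q, s2, eps)  # Q-learning chooses its next action after the update
            s, a = s2, a2
        steps_per_episode.append(t)
    return Q, steps_per_episode

def greedy_path_length(Q, limit=50):
    """Follow the greedy policy from START; returns the move count (limit if the goal is not reached)."""
    s, n = START, 0
    while s != GOAL and n < limit:
        s, _, _ = step(s, max(range(len(ACTIONS)), key=lambda a: Q[s][a]))
        n += 1
    return n

random.seed(0)  # fixed seed so runs are repeatable
for method in ("q_learning", "sarsa"):
    Q, steps = train(method)
    print(method, "last 5 episodes:", steps[-5:], "greedy path length:", greedy_path_length(Q))
```

The two methods differ only in the bootstrap target: SARSA uses the action it will actually take next, Q-learning uses the greedy value. The per-episode step counts returned by train() are the data for a step-to-go curve.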
1. Choose the two algorithms you will implement, briefly introduce them, and provide their pseudocode.
2. Design your own grid world example (it should be larger than 3 × 2).
3. Show the goal-searching process with a step-to-go curve, the sum of squared errors, and/or the theoretical value table.
4. Follow the project report guidelines and submit the report and code.
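For the theoretical value table and the sum of squared errors, one option is to compute the optimal values with value iteration and compare a learned table against them. This sketch assumes a hypothetical 4 × 5 grid, a reward of -1 per move and a discount gamma; value_iteration(), sum_squared_error() and print_table() are illustrative names, not part of any library.

```python
# Hypothetical 4 x 5 grid: 'S' start, 'G' goal, '#' wall, '.' free cell.
GRID = ["S..#.", ".#...", "...#.", "#...G"]
ROWS, COLS = len(GRID), len(GRID[0])
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
GOAL = (3, 4)
STATES = [(r, c) for r in range(ROWS) for c in range(COLS) if GRID[r][c] != "#"]

def next_state(s, a):
    """Deterministic transition; walls and borders leave the agent where it is."""
    r, c = s[0] + ACTIONS[a][0], s[1] + ACTIONS[a][1]
    if not (0 <= r < ROWS and 0 <= c < COLS) or GRID[r][c] == "#":
        return s
    return (r, c)

def value_iteration(gamma=0.95, theta=1e-10):
    """Theoretical V table from in-place Bellman optimality backups (reward -1 per move)."""
    V = {s: 0.0 for s in STATES}
    while True:
        delta = 0.0
        for s in STATES:
            if s == GOAL:
                continue  # terminal state: value stays 0
            best = max(-1.0 + gamma * V[next_state(s, a)] for a in range(len(ACTIONS)))
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:
            return V

def sum_squared_error(V_learned, V_true):
    """Sum over states of (learned value - theoretical value)^2."""
    return sum((V_learned[s] - V_true[s]) ** 2 for s in V_true)

def print_table(V):
    for r in range(ROWS):
        print(" ".join(f"{V[(r, c)]:7.2f}" if (r, c) in V else "   wall" for c in range(COLS)))

print_table(value_iteration())
```

A learned table can be compared by taking V(s) = max_a Q(s, a) from either Part 1 algorithm and passing it to sum_squared_error(); recording that number every few episodes gives an error curve. With gamma=1.0 the optimal values reduce to minus the shortest-path distance to the goal, which makes the table easy to check by hand.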
Part 2
When the grid-world maze is large, the agent takes a long time to learn a value table. One way to address this is to use a neural network to approximate the value function.
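A minimal sketch of value-function approximation, assuming an open 5 × 5 grid (no walls) with the goal in a corner, a reward of -1 per move and no discounting, so the target V(s) is minus the Manhattan distance to the goal. A network with one hidden layer of tanh units (the class TinyMLP, its size and the learning rate are hypothetical choices) is fitted to that table by per-sample gradient descent, written in plain Python so every gradient is visible; a deep-learning library would normally replace the hand-written backpropagation.

```python
import math
import random

# Hypothetical open 5 x 5 grid with the goal in the bottom-right corner.
ROWS, COLS = 5, 5
GOAL = (4, 4)
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]

def true_value(r, c):
    """Reward -1 per move and no discounting: V*(s) = -(Manhattan distance to the goal)."""
    return -float(abs(GOAL[0] - r) + abs(GOAL[1] - c))

SCALE = float((ROWS - 1) + (COLS - 1))  # largest distance; scales targets into [-1, 0]

def features(r, c):
    return (r / (ROWS - 1), c / (COLS - 1))  # coordinates scaled into [0, 1]

DATA = [(features(r, c), true_value(r, c) / SCALE) for r in range(ROWS) for c in range(COLS)]

class TinyMLP:
    """2 inputs -> `hidden` tanh units -> 1 linear output, trained by per-sample SGD."""

    def __init__(self, hidden=8, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [math.tanh(w[0] * x[0] + w[1] * x[1] + b) for w, b in zip(self.w1, self.b1)]
        return h, sum(wj * hj for wj, hj in zip(self.w2, h)) + self.b2

    def sgd_step(self, x, y, lr):
        h, out = self.forward(x)
        err = out - y  # gradient of 0.5 * (out - y)^2 with respect to out
        for j, hj in enumerate(h):
            g = err * self.w2[j] * (1.0 - hj * hj)  # backprop through tanh, using the old w2[j]
            self.w2[j] -= lr * err * hj
            self.w1[j][0] -= lr * g * x[0]
            self.w1[j][1] -= lr * g * x[1]
            self.b1[j] -= lr * g
        self.b2 -= lr * err

def mse(model):
    return sum((model.forward(x)[1] - y) ** 2 for x, y in DATA) / len(DATA)

def train(model, epochs=2000, lr=0.05):
    history = [mse(model)]
    for _ in range(epochs):
        for x, y in DATA:
            model.sgd_step(x, y, lr)
        history.append(mse(model))
    return history

def walk(model, start, limit=20):
    """Guide the agent: repeatedly move to the neighbouring cell with the highest predicted value."""
    path = [start]
    while path[-1] != GOAL and len(path) <= limit:
        r, c = path[-1]
        neighbours = [(r + dr, c + dc) for dr, dc in MOVES
                      if 0 <= r + dr < ROWS and 0 <= c + dc < COLS]
        path.append(max(neighbours, key=lambda n: model.forward(features(*n))[1]))
    return path

model = TinyMLP()
history = train(model)
print(f"MSE (scaled targets): {history[0]:.4f} -> {history[-1]:.6f}")
print("path from (0, 0):", walk(model, (0, 0)))
```

The walk() helper shows the network, rather than the table, guiding the agent: at each step it moves to the neighbouring cell with the highest predicted value, so the table itself is no longer needed at run time.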
Two options are given below; choose one to implement.
a. Using your results from Part 1, build a neural network (or deep neural network) that approximates the Q or V table you obtained.
The network then generates the Q or V values that guide the agent toward the goal.
b. Implement an actor-critic (ADP) algorithm for grid-world maze navigation.
Build an action network and a critic network that learn the Q table from scratch.
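The following is a simplified sketch of option b, assuming a one-step Q actor-critic rather than any particular ADP variant. The action network maps a one-hot state to softmax action probabilities, and the critic network maps a one-hot state-action pair to a Q value; with one-hot inputs each network is a single linear layer, so the critic's weights are effectively a Q table learned from scratch. The maze, the learning rates and the names ActorCritic, run_episode() and greedy_path_length() are assumptions for illustration.

```python
import math
import random

# Hypothetical maze: 'S' start, 'G' goal, '#' wall, '.' free cell.
GRID = ["S..#.", ".#...", "...#.", "#...G"]
ROWS, COLS = len(GRID), len(GRID[0])
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
START, GOAL = (0, 0), (3, 4)
STATES = [(r, c) for r in range(ROWS) for c in range(COLS) if GRID[r][c] != "#"]

def step(state, a):
    """Deterministic move; walls and borders leave the agent in place. Reward is -1 per move."""
    r, c = state[0] + ACTIONS[a][0], state[1] + ACTIONS[a][1]
    if not (0 <= r < ROWS and 0 <= c < COLS) or GRID[r][c] == "#":
        r, c = state
    return (r, c), -1.0, (r, c) == GOAL

def softmax(prefs):
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

class ActorCritic:
    """Actor: one-hot state -> action preferences. Critic: one-hot (state, action) -> Q value."""

    def __init__(self, actor_lr=0.1, critic_lr=0.5, gamma=0.95, seed=0):
        self.rng = random.Random(seed)
        self.actor_lr, self.critic_lr, self.gamma = actor_lr, critic_lr, gamma
        self.theta = {s: [0.0] * len(ACTIONS) for s in STATES}  # action-network weights
        self.w = {s: [0.0] * len(ACTIONS) for s in STATES}      # critic-network weights (Q table)

    def act(self, s):
        probs = softmax(self.theta[s])
        return self.rng.choices(range(len(ACTIONS)), weights=probs)[0]

    def update(self, s, a, reward, s2, a2, done):
        # Critic: one-step TD update of Q(s, a) toward r + gamma * Q(s', a').
        target = reward if done else reward + self.gamma * self.w[s2][a2]
        self.w[s][a] += self.critic_lr * (target - self.w[s][a])
        # Actor: policy-gradient step using the advantage Q(s, a) - sum_b pi(b|s) Q(s, b).
        probs = softmax(self.theta[s])
        advantage = self.w[s][a] - sum(p * q for p, q in zip(probs, self.w[s]))
        for b in range(len(ACTIONS)):
            grad_log_pi = (1.0 if b == a else 0.0) - probs[b]  # d log pi(a|s) / d theta[s][b]
            self.theta[s][b] += self.actor_lr * advantage * grad_log_pi

def run_episode(agent, max_steps=500):
    """Runs one episode with the stochastic policy; returns the number of steps taken."""
    s = START
    a = agent.act(s)
    for t in range(1, max_steps + 1):
        s2, reward, done = step(s, a)
        a2 = None if done else agent.act(s2)
        agent.update(s, a, reward, s2, a2, done)
        if done:
            return t
        s, a = s2, a2
    return max_steps

def greedy_path_length(agent, limit=50):
    """Follows the actor's most preferred action from START; returns limit if the goal is not reached."""
    s, n = START, 0
    while s != GOAL and n < limit:
        prefs = agent.theta[s]
        s, _, _ = step(s, prefs.index(max(prefs)))
        n += 1
    return n

agent = ActorCritic()
steps = [run_episode(agent) for _ in range(1000)]
print("mean steps, first 50 episodes:", sum(steps[:50]) / 50)
print("mean steps, last 50 episodes:", sum(steps[-50:]) / 50)
print("greedy path length:", greedy_path_length(agent))
```

Replacing the one-hot inputs with coordinates or other features and adding hidden layers turns the two weight tables into neural networks proper; the update rules keep the same form, with the gradients computed by backpropagation.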
Report suggestions for Part 2:
1. Choose the option you will implement and provide its pseudocode.
2. Design your own grid world example.
3. Show the convergence of the mean squared error and the weight trajectories.
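To produce the mean-squared-error curve and the weight trajectories, it is enough to record the loss and a copy of the weights after every training epoch. The sketch below assumes the simplest possible approximator: a linear model with hypothetical features (bias, row, column) fitted by batch gradient descent to V(s) = -(Manhattan distance to the goal) on an open 5 × 5 grid. The same logging applies unchanged to a neural network or to the actor and critic weights.

```python
# Hypothetical open 5 x 5 grid, goal in the bottom-right corner, reward -1 per move, gamma = 1,
# so the target value table is V(s) = -(Manhattan distance to the goal).
ROWS, COLS = 5, 5
GOAL = (4, 4)

# Each sample: features (bias, row, column) and the target value.
SAMPLES = [((1.0, float(r), float(c)), -float(abs(GOAL[0] - r) + abs(GOAL[1] - c)))
           for r in range(ROWS) for c in range(COLS)]

def predict(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def mse(w):
    return sum((predict(w, x) - y) ** 2 for x, y in SAMPLES) / len(SAMPLES)

def train(epochs=2000, lr=0.02):
    """Batch gradient descent on the MSE; logs the loss and a copy of the weights every epoch."""
    w = [0.0, 0.0, 0.0]
    mse_history, weight_history = [mse(w)], [list(w)]
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in SAMPLES:
            err = predict(w, x) - y
            for i, xi in enumerate(x):
                grad[i] += 2.0 * err * xi / len(SAMPLES)
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
        mse_history.append(mse(w))
        weight_history.append(list(w))
    return mse_history, weight_history

mse_history, weight_history = train()
print("epoch,mse,w_bias,w_row,w_col")
for epoch in (0, 10, 100, 1000, 2000):
    print(epoch, f"{mse_history[epoch]:.6g}", *(f"{wi:.4f}" for wi in weight_history[epoch]), sep=",")
```

Each position in the weight_history entries traces one weight over the epochs and can be plotted directly, for example with matplotlib. Here the weights should approach (-8, 1, 1), because the target is exactly linear in the row and column: V(s) = row + column - 8.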