Planning and automated reasoning

168 views 9:00 am 0 Comments August 19, 2023

FIT5222 Planning and automated reasoning
FIT5222 Assignment 2
Pacman: Capture the Flag
In this assignment we will develop an AI controller for a multiplayer variant of Pacman. In this game
two teams, red and blue, compete with each other to capture as many dots as possible from the
opposite side of the map.
Originally developed at
UC Berkeley, this setup presents us with a challenging planning domain
and an opportunity to solve a fun problem. At the end of the semester we will hold a contest to find
out who developed the most effective controller. Glory and a fancy certificate await the top three
students!
Part 1: Installation
Piglet PDDL Solver
In this assignment you will use the piglet library to solve PDDL problems. This requires installing
piglet to the python environment:
1. Go to piglet-public folder (If you use a virtual environment, activate the virtual env first. You
could use your flatland virtual environment as well.)
2. git fetch
3. git checkout pddl_solver //if successful, you should find a pddl_solver.py under
lib_piglet/utils/ folder
4. python setup.py install
Check the
myTeam.py starter implementation to see how to use the newly installed library.
The competition server will have the same piglet installed in the environment as a PDDL solver.
Update Pacman Code
Make sure you have the latest version of the Pacman CTF game environment:
FIT5222 Planning and automated reasoning
1. Go to the pacman-public folder (you cloned the repo in week 1)
2. git fetch
3. git reset –hard //in case any changes you made locally
4. git pull
5. If your code is newest, you should see staffTeam.py and berkeleyTeam.py in the repo.
All codes for the game environment are now found in the
pacman folder.
Part 2: Game Rules
We can characterise Pacman CTF as follows:
Multi-agent: Two agents need to work together against an opposing team.
Discrete: The environment is a grid maze and time is discretised into unit time steps.
Dynamic: The environment changes as food is consumed.
Partially observable: Each agent has a limited sensing range.
Sequential: Past decisions affect future actions.
Deterministic: The current state depends only on the teams and their actions.
Offline: The world stops while an agent deliberates.
● Known: All the rules of the game are available a priori.
Environment
The game map is divided into two halves: red (left) and blue (right). When on the red side, a red
agent is a ghost. When crossing into enemy territory, the agent becomes a Pacman. Red
agents must defend the red food while trying to eat the blue food.
Scoring
When a Pacman agent eats a food dot, that dot is removed from the board and placed in a
virtual backpack. When the agent returns to their own side of the board, they automatically
“deposit” the contents of their backpack, earning one point per dot. The Red team scores
positive points, while the Blue team scores negative points.
Watch out! If an agent is caught by a ghost, before reaching their own side of the board, all
dots in their backpack will be deposited back to their original positions on the board. The agent
also returns (as a ghost) to its original starting position. No points are scored in this case. No
points are awarded to the opposing team either (for catching Pacman).

FIT5222 Planning and automated reasoning
Power Capsules
A small number of power capsules are scattered throughout the maze. These can be eaten by
a Pacman agent. When this happens ghosts become “scared” and are susceptible to being
caught by the powered up Pacman. This ghost effect lasts for the next 40 timesteps, or until
caught, whichever comes sooner.
Observations
Agents can only observe an opponent’s configuration (position and direction) if they or their
teammate are within 5 squares (Manhattan distance). In addition, an agent always gets a noisy
distance reading for each agent on the board, which can be used to reason about unobserved
opponents. The reading has an error of +/- 6 steps from the true distance. A general direction
(toward the opponent) is not specified.
Winning
A game ends when one team returns all but two of the opponents’ dots. Games are also limited
to 300 time steps (i.e., 300 moves for each of the four agents). If this move limit is reached,
whichever team has returned the most food wins. If the score is zero (i.e., tied) this is recorded
as a tie game.
Part 3: Task & Student Workflow
At every time step the simulator will present you with the current state of the game board. Your task
will be to reason over this current state and decide how your agents should react.
This will require you to think about the planning process in several different ways: at the high-level,
where you decide what strategy your team will pursue; at the low-level, where you transform that
strategy into a concrete sequence of low-level actions; and you will need to decide when to replan
your agents, in response to changing circumstances.
We illustrate an example of the
student workflow in the diagram below.
FIT5222 Planning and automated reasoning
Your starting point for modifying this workflow is the function chooseAction in file myTeam.py.
This function is called by the simulator at every time step and for each agent. The simulator
provides the agent with a description of the game board. At the end of the function you must return
a concrete action for the agent to execute in the next timestep.
How to decide which action the agent should take is entirely up to you. The starting code provides
a basic template for the decision-making process and some simple, though not very effective,
strategies. You will need to modify this code to create a winning AI.
The next sections give further details about the decision-making process. For more details
regarding the Pacman CTF code, the
myTeam.py implementation, and other general tips, refer to
the additional document “
Assignment 2 Code Documentation.pdf!
3.1. Choose a High Level Goal
What is the high-level strategy that agents should pursue? For example, they could visit the
opposing side and try to score points, or they could stay home, and defend their score. Each agent
chooses its own high level goal, but that decision may be influenced by the current state of the
board, the current score, and the strategy and position of the teammate agent.
Your starting point here is the
getGoals function in the file myTeam.py. Given a PDDL description
of the current state, this function returns positive and negative PDDL state descriptions that
together describe a high-level goal. You can create multiple different goals and then choose
between them based on the information in the current state.

FIT5222 Planning and automated reasoning
3.2. Generate a High Level Plan
To achieve your high-level goals you will need to compute a corresponding sequence of high-level
actions (i.e., a plan). Given a PDDL description of the initial (i.e., the current) states and the goal
states, the function
getHighLevelPlan computes such a plan. You do not need to modify this
function. Just call it as necessary.
But what if the agent already has a high-level plan, computed in a previous iteration? It’s up to you
to decide how to proceed. The agent could retain the existing plan or it could change, and replan
anew from the current state. For example, if the pre-conditions for the next action are no longer
satisfied, the agent will certainly need to generate a new high level plan. The agent may also
decide to revise its current plan (and possibly current goal!) if its execution produces an
unexpected result (e.g., being caught by a ghost or a powered-up Pacman).
The staff implementation checks if we can reuse existing plan:
Note: Only a limited set of predicates are included in the basic PDDL state description (see the file
myTeam.pddl) and not all of them appear in the given actions. You can create a richer model, and
compute more diverse plans, by (1) refining existing actions or creating new ones that use more of
the available predicates; (2) enhancing the PDDL state description with additional information from
the current game state (i.e., add more predicates). We provide useful programmatic facilities for
this purpose. See the
get_pddl_state function for examples of how to collect information from the
simulator environment (
gameState).
FIT5222 Planning and automated reasoning
3.3. Generate a Low Level Plan
High level plans tell the agents what to do, but in broad strokes. For example, if the goal is to score
points, a necessary action is to move to the opposite side of the board. But how should the agent
get there? Once on the opposing side, the next high level action might be to eat a food dot. But
which one? Low level planning provides answers to these questions.
The function
getLowLevelPlan is your starting point here. Its inputs are a high-level action and a
description of the current game state. The function returns a sequence of move actions that the
agent needs to take to achieve the high-level effects of the input action.
You are free to implement this function using any method you choose. For example:
Heuristic Search Strategy: you can develop your own custom strategies by implementing
a new expander class for your own customised state-space representation.
Learning Based Strategy: we provide a very basic example of this approach. You can
make your own implementation or improve it by introducing more features to the model.
Then you will train the new model to obtain a set of good weights. You should carefully
consider, for each high level action, how you will reward or penalise the agent for its
performance. You may need to keep adjusting your implementation, observe how your
agent behaves, and try something new to overcome the drawbacks you observed.
But what if the agent already has a low-level plan, computed during a previous iteration? Again, it’s
up to you to decide how to proceed. The agent could continue following the existing low-level plan
or it might decide to revisit its decisions on account of new information provided by the game state.
For example, if Pacman is heading towards a food dot, but observes a ghost up ahead, continuing
toward the dot may be a bad idea.
The staff implementation reuse low level plans:
Remember! There are many different ways to achieve a high-level action. You may also wish to
consider whether your two agents coordinate their low-level actions or act independently of one
another.

FIT5222 Planning and automated reasoning
3.4. Execute Moves
Your low-level strategy has generated a sequence of directions for the agent to move in. The first
move in this plan is given to the simulator for execution. Each of your two agents, and the two
opposing agents, move according to their own instructions and the timestep is advanced by one.
All of this happens after the completion of the
chooseAction function.
Based on the new state of the game board, you may need to re-evaluate your goals and plans.
This workflow is cyclic and continues until the game is over.
Remember: You can (and should!) compare your implementation (myTeam.py) against the
existing baselines (
staffTeam.py or berkeleyTeam.py). You can find these implementations in the
pacman folder. The baselines are useful to check that your implementation works and to measure
your progress.
Part 4: Competition Setup
We will have a competition to see who has the strongest pacman AI. You will be able to submit
your implementation to a contest server where they will be run against each other.
Upon submission to the server your agent will be evaluated against a staff baseline implementation
(staffTeam.py). Your agent will be evaluated for 49 games on 7 different maps. Within each game
there will be a time limit and the agent that has succeeded in eating the most food during that time
will win that game. To win the match with staff baseline, your team must win convincingly: 28 out of
49 games.
By winning the staffTeam.py, you will get a pass for Criteria 1 competition score (see more details
in marking rubrics), have a position on the leaderboard, and your agent will be evaluated against
all other participants on the leaderboard to get additional victory points.
With each opponent on the leaderboard (excluding the staff baseline implementation), your agent
will be evaluated on the same set of problems:
● In the event of a convincing win (win at least 28 out of 49 games), you will get 3 victory
points.
● In the event of a tie (win less than 28 games, but lose less than 28 games as well,
considering tie games neither win nor lost.), you will get 0.5 victory points.
● In the event of a loss (lose 28 games or more), you will get 0 victory points.
If you win the staff baseline implementation, your marks for Criteria 1 competition will be decided
by (Your total victory points / All available victory points)%, where the all available victory points is
3 × (number of all participants – 1).
You are expected to
clearly win against the staff baseline implementation, otherwise you will
not proceed to the next stage of competing with entries from other students. In this case your
grade for Criteria 1 of the marking rubric will be a fail.

FIT5222 Planning and automated reasoning
The server will keep a record of your matches against every opponent on the leaderboard. When
someone have a new best implementation recorded, your score against that student will be
updated as well.
More details regarding the competition setup will be provided in a supplementary document
“Assignment 2 Contest Submission Instruction.pdf”.
Part 5: Report
Together with your implementation source code you will be required to submit a report that
describes your approach. You will need to write a description of your strategy and give a
justification for algorithmic choices. Make sure that your report is as detailed and complete as
possible, including appropriate references.
When describing your model you should explain not only what it does but also analyse its strengths
and weaknesses (e.g., complexity, guarantees etc). You should also discuss how your code
implements that model: for example, important data structures, necessary optimisations and
parameter choices. Justify your design and implementation choices with discussion and, where
appropriate, with experiments.
You can ask questions about the assignment and you can discuss the merits of different design
choices (e.g., on Ed).
However all submitted work must be completely your own. Do not copy
implementations you happen to find online. Do not ask your friends or colleagues for their
implementations. You will need to work independently and take reasonable steps to make sure
others’ don’t copy or misuse your work. For more information please see
https://www.monash.edu/students/study-support/academic-integrity
PLAGIARISM IS NOT ACCEPTABLE.
YOU WILL FAIL THE ASSIGNMENT AND POSSIBLY THE ENTIRE UNIT.

FIT5222 Planning and automated reasoning
Part 6: Marking Rubric
The rubric specifies the marking criteria for your implementation and report. To avoid disappointment, pay close attention to the requirements.

Criteria N
0%-49%
P50
%-59%
C60
%-69%
D70
%-79%
HD
80%-100%
Weight
(%)
1. Competition Loses to the staff
baseline
implementation
Outranks staff baseline
implementation.
Get 25% ~ 49% of all
available victory points.
Get 50% ~ 74% of all
available victory points.
Get 75% ~ 100% of all
available victory points.
33.3%
2. Agent strategy
implementation.
Minor updates to
example PDDL
implementation in
the code base (for
example, only
changing parameter
values)
Addresses some
drawbacks in existing
implementation.
Improves the model and
high level planner by:
(1) Introducing new actions
to the pddl model, which
uses more listed predicates
or new predicates you
introduced. Or
(2) Design new goal
functions.
Same as previous criteria
plus:
Implements alternative high
level goal prioritisation
scheme (agents switch
goals based on new
observations).
Improves low level planner;
e.g. more features, better
reward functions, or uses
heuristic search for low
level planning.
Same as previous
criteria plus:
Customised low level
plans for distinct high
level actions; e.g.
different low-level
strategies for eat food
(which?) vs. go to
opposing side (where?).
Same as previous criteria
plus:
High level and low level
decisions take into account
the current action and
strategy of the teammate
agent. (Two agents should
cooperate with each other,
and share information with
each other.)
Or design your own model
and agent workflow.
33.3%
3. Report –
Description of
your approach
Incomplete or
insufficient
description of the
approach.
Insufficient or
High level description of the
approach with sufficient
pseudo-code.
Discusses how the new
implementation improves
Same as previous criteria
plus:
Discussion and algorithmic
analysis (as applicable).
E.g., time, space,
Same as previous
criteria plus:
Reflections:
Discuss and compare
the advantages and
Same as previous criteria
plus:
Well documented numerical
experiments, analysing the
efficiency, advantages,
23.3%

FIT5222 Planning and automated reasoning

missing
justifications.
Little or no
pseudo-code.
upon the staff baseline,
staffTeam.py
completeness, optimality.
Provide motivation for your
implementation choices.
Link the agent strategy to
lectures and tutorials.
Appropriate referencing
(where applicable).
disadvantages of at
least 2-3 implemented
strategies/attempts.
drawbacks of your different
implementations. E.g.
improvements on success
rate between different
strategies/attempts, runtime
statistics on planning time,
and appropriate reference
algorithms.
4. Report –
Communication
skills
Hard to follow with
no clear narrative.
Inadequate or no
separation of
discussion text into
coherent sections.
Writing is not
accurate or
articulate.
Inadequate
supporting materials.
Inadequate or
missing referencing.
The writing has a tenuously
logical narrative. Some
attempt at the expected
structural elements (e.g.
Intro, conclusion).
Writing is not accurate or
articulate most of the time.
The document has few
supporting materials
(tables, images,
pseudo-code) .
The student has
attempted to undertake
citing and referencing
with frequent errors.
The text has a clear logical
narrative and expected
structural elements (e.g.
intro, conclusion).
Writing is not accurate or
articulate most of the time.
There are some supporting
materials (tables, images,
pseudo-code) but not well
integrated with the rest of
the text.
The student follows the
requirements for citing and
referencing, with
some notable errors.
The writing is well
composed and has a
clear and logical
narrative and is well
structured.
Writing is generally
accurate and articulate.
The document has
appropriate supporting
materials that are well
integrated with the rest
of the text.
The student follows the
requirements for citing
and referencing, with
some minor errors.
The writing is very well
composed and has a very
clear and logically
formed a narrative.
Writing is accurate and
articulate.
The document is expertly
structured in the style of a
scientific report, including
appropriate supporting
materials that clearly improve
the quality of associated
discussion.
The student follows the
requirements for citing and
referencing.
10%

FIT5222 Planning and automated reasoning
Part 7: Submission Guide
Submission to Moodle:
The following submission items are required:
1. Your implementation source codes, in a single directory called “src” (you can copy
everything in the piglet folder to “src”). Zip the codes directory with file name
last_name_student_id_pacman.zip. (For example, Chen_123456_pacman.zip)
2. The report describing your approaches as a single pdf file. Name the pdf as
last_name_student_id_report_pacman.pdf. (For example,
Chen_123456_report_pacman.pdf)
Submission to Contest Server:
You must make a successful submission on the contest server as part of your mark.
Submit early and submit often. Don’t wait for the last day, when you might be subject
to circumstances out of your control (like server outages!).
Submission deadline : 11:55 pm Sunday 28 May 2022

Tags: , , , , , , , , , , , , , , , , , , , , , , , , , , , , ,