Submission
Please submit your solution electronically via vUWS. (1) Submit a report as a PDF via Turnitin. (2) Create a zip file with your code (use zip, do not use rar) and any other files you want to submit, and upload it to vUWS (to the same place where you got this assignment text). Please include the signed and completed cover sheet that you can find at the end of this document.
Submission is due on 2 Nov 2022, 11:59pm.
Miniracer
Figure 1: 4 frames from miniracer. There are three possible values for a pixel: +2 for the car (the dark 2 × 2 square), 0 for drivable track segments (6 pixels wide, in white), and +1 for non-drivable terrain (here in grey). When the front of the car bumps into non-drivable terrain, the episode finishes. The rear of the car is allowed to go off road.
In this assignment we work with data and a simulation of a simple racing game. The car is represented by the black square in the screenshots above.
In this game, the car remains at the bottom of the screen, and can either move left, right, or keep the current position. At every step, the track scrolls down by one, simulating the driving car. The size of the screenshot is 16 × 16 pixels.
Preparation Download the minirace.py and sprites.py python files. The class Minirace implements the racing game simulation. Running sprites.py will create datasets of screenshots for your first task.
A new racing game can be created like this:
from minirace import Minirace
therace = Minirace(level=1)
In this, level sets the information an RL agent gets from the environment. The car is 2 × 2 pixels, and cannot leave the field. The track segments are 6 pixels wide, and have positions from 1 (left) to 5 (right), and the car has 7 different positions (from 0 to 6). The front of the car (in the second row from the bottom, row 1) must remain on drivable terrain at all times. The rear of the car (in the first row from the bottom, row 0) is allowed to come off road with no penalty.
At each step during a race, the agent will get a reward of +1. Once the front of the car comes off road, the episode finishes.
Task 1: Train a CNN to predict a clear road ahead (15 points)

The python program sprites.py creates a training and test set of “minirace” scenes, trainingpix.csv (1024 examples) and testingpix.csv (256 examples). Each row represents a 16 × 16 screenshot (flattened in row-major order), plus an extra value of either 0 or 1 that indicates whether the car can safely drive straight without going off-road in the immediate next step (i.e., there are 257 columns).
Steps
- Create the datasets by running the sprites.py code.
- Create a CNN that predicts whether the car can safely remain in the current position (i.e., drive straight) without crashing into non-drivable terrain (see the sketch after this list).
- Describe (no programming): what is a good loss function for this problem?
- Implement and train the CNN on the training set.
- Compute the accuracy of your model on the test data set.
- You are free to choose the architecture of your network, but there should be at least one convolutional layer.
- You can normalise/standardise the data if it helps improve the training.
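For illustration, here is a minimal sketch of one way to set this up, assuming PyTorch and the CSV layout described above (256 comma-separated pixel values per row, followed by one 0/1 label). The file name is from this sheet; the architecture, the binary cross-entropy loss, and the optimiser settings are illustrative choices, not requirements.

import numpy as np
import torch
import torch.nn as nn

# Load the data: 256 pixel values per row, followed by one 0/1 label.
data = np.loadtxt("trainingpix.csv", delimiter=",")
X = torch.tensor(data[:, :256], dtype=torch.float32).reshape(-1, 1, 16, 16)
y = torch.tensor(data[:, 256], dtype=torch.float32)

# A small CNN: one convolutional layer, then one fully connected layer.
model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),  # 1 x 16 x 16 -> 8 x 16 x 16
    nn.ReLU(),
    nn.MaxPool2d(2),                            # -> 8 x 8 x 8
    nn.Flatten(),                               # -> 512 values
    nn.Linear(8 * 8 * 8, 1),                    # -> 1 logit
)

loss_fn = nn.BCEWithLogitsLoss()                # a common loss for 0/1 targets
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(20):                         # full-batch training, for brevity
    optimizer.zero_grad()
    loss = loss_fn(model(X).squeeze(1), y)
    loss.backward()
    optimizer.step()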
What to submit:
- A description of your CNN and the training. Calculate the size of each layer, and include it in the description.
- Include the explanation for the loss function in your description.
- For how long did you train your model (number of epochs, time taken)? What is the performance on the test set?
- Submit the python code for your solution (either as .py or .ipynb).
Task 2: Train a convolutional autoencoder (10 points)
Create a convolutional autoencoder that compresses the racing game screenshots to a small number of bytes (the encoder part), and transforms them back to the original (the decoder part).
Steps
- Create an undercomplete convolutional autoencoder and train it using the training data set from the first task (see the sketch after this list).
- You can choose the architecture of the network and the size of the representation h = f(x). The goal is to learn a representation that is smaller than the original, and still leads to recognisable reconstructions of the original.
- (No programming): Explain the difference between an undercomplete and a denoising autoencoder.
- (No programming): The input images are 16 × 16 = 256 pixels. What is the size of your hidden representation h = f(x) (the size of the middle layer of your autoencoder)? Include your calculation in your report.
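As a starting point, here is a minimal sketch of an undercomplete convolutional autoencoder, assuming PyTorch and the 1 × 16 × 16 input tensors from the first task; all layer sizes, and therefore the size of h, are illustrative choices (this sketch compresses to 2 × 4 × 4 = 32 values, smaller than the 256 input pixels).

import torch.nn as nn

# Undercomplete convolutional autoencoder: the encoder compresses the
# 1 x 16 x 16 input to a small code h = f(x); the decoder maps h back
# to a 1 x 16 x 16 reconstruction. Train it, for example, with
# nn.MSELoss() between the input and the reconstruction.
class Autoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 4, kernel_size=3, stride=2, padding=1),  # -> 4 x 8 x 8
            nn.ReLU(),
            nn.Conv2d(4, 2, kernel_size=3, stride=2, padding=1),  # -> 2 x 4 x 4
            nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(2, 4, kernel_size=2, stride=2),    # -> 4 x 8 x 8
            nn.ReLU(),
            nn.ConvTranspose2d(4, 1, kernel_size=2, stride=2),    # -> 1 x 16 x 16
        )

    def forward(self, x):
        h = self.encoder(x)      # hidden representation h = f(x)
        return self.decoder(h)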
What to submit:
- Submit the python code of your undercomplete autoencoder (either as .py or .ipynb).
- For your report, write a brief description of your steps to create the model and its predictions. Include the description of undercomplete vs. denoising autoencoders, and your calculations. How do you measure the quality of your model?
- Include screenshots of 1-2 output images next to the original inputs (e.g., select a good and a bad example).
Task 3: Create an RL agent for Minirace (level 1) (15 points)

The code in minirace.py provides an environment to create an agent that can be trained with reinforcement learning (a complete description is at the end of this sheet). The following is a description of the environment dynamics:
- The square represents the car; it is 2 pixels wide. The car always appears in the bottom row, and at each step of the simulation the track scrolls by one row below the car.
- The agent can control the steering of the car, by moving it two pixels to the left or right. The agent can also choose to do nothing, in which case the car drives straight. The car cannot be moved outside the boundaries.
- The agent will receive a positive reward at each step where the front part of the car is still on track.
- An episode is finished when the front of the car hits non-drivable terrain.
In a level 1 version of the game, the observed state (the information made available to the agent after each step) consists of one number: dx. It is the relative position of the middle of the track right in front of the car (i.e., the piece of track in the third row from the bottom of the image). When the track turns left in front of the car, this value will be negative, and when the track turns right, dx is positive. As the track is six pixels wide, the car can drive either on the left, middle, or right of a piece of track (it does not need to drive in the middle of the road).
For this task, you should initialise the simulation like this:
therace = Minirace(level=1)
When you run the simulation, step() returns dx (…, −2, −1, 0, 1, 2, …) for the state.
Steps
- Manually create a policy (no RL) that successfully drives the car, just selecting actions based on the state information. The minirace.py code contains a function mypolicy() that you should modify for this task (see the sketch after this list).
- (No programming) How many different values for dx are possible in theory (if you ignore that the car may crash)? If you were to create a tabular reinforcement learning agent, what size is your table for this problem (number of rows and columns)?
- Create a (tabular or deep) TD agent that learns to drive. If you decide to use ϵ-greedy action selection, set ϵ = 1 initially, and reduce it during your training to a minimum of 0.01. Keep your training going until you are either happy with the result or the performance does not improve (that is: do not stop just because ϵ reached 0.01 – you may want to stop earlier, or you may want to keep going; just do not reduce ϵ any further).
  When you run your training, reset the environment after every episode. Store the sum of rewards. After or during the training, plot the total sum of rewards per episode. This plot (the Training Reward plot) indicates the extent to which your agent is learning to improve its cumulative reward. It is your decision when to stop training. It is not required to submit a perfectly performing agent, but show how it learns.
- After you decide the training is complete, run 50 test episodes using your trained policy, but with ϵ = 0.0 for all 50 episodes. Again, reset the environment at the beginning of each episode. Calculate the average over sum-of-rewards-per-episode (call this the Test-Average), and the standard deviation (the Test-Standard-Deviation). These values indicate how your trained agent performs.
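For illustration, here is a minimal sketch of a manual policy and a tabular Q-learning loop, using the Minirace interface listed at the end of this sheet. The action encoding (0 = left, 1 = straight, 2 = right), the assumed dx range, and all hyperparameters are assumptions made for the sketch; check minirace.py for the actual encoding, and size the table according to your answer to the question above.

import numpy as np
from minirace import Minirace

therace = Minirace(level=1)

# Manual policy (no RL), based only on dx.
# ASSUMPTION: actions are encoded 0 = left, 1 = straight, 2 = right.
def mypolicy(dx):
    if dx < 0:
        return 0                                # track is to the left: steer left
    if dx > 0:
        return 2                                # track is to the right: steer right
    return 1                                    # aligned: drive straight

# Tabular Q-learning with epsilon-greedy action selection.
# ASSUMPTION: dx stays within [-6, 6]; clip and shift it to index the table.
OFFSET, NACTIONS = 6, 3
Q = np.zeros((2 * OFFSET + 1, NACTIONS))
alpha, gamma = 0.1, 0.95
eps, eps_min, eps_decay = 1.0, 0.01, 0.995
rewards = []                                    # sum of rewards per episode, for the plot

for episode in range(2000):
    dx = therace.reset()
    done, total = False, 0
    while not done:
        s = int(np.clip(dx, -OFFSET, OFFSET)) + OFFSET
        if np.random.rand() < eps:
            a = therace.sampleaction()          # explore
        else:
            a = int(np.argmax(Q[s]))            # exploit
        dx, r, done = therace.step(a)
        s2 = int(np.clip(dx, -OFFSET, OFFSET)) + OFFSET
        target = r if done else r + gamma * np.max(Q[s2])
        Q[s, a] += alpha * (target - Q[s, a])   # TD(0) update
        total += r
    rewards.append(total)
    eps = max(eps_min, eps * eps_decay)         # decay exploration per episode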
What to submit:
- Submit the python code of your solutions (both the manual strategy, and the code of your RL learner).
- For your report, describe the solution, mention the Test-Average and Test-Standard-Deviation, and include the Training Reward plot described above. After how many episodes did you decide to stop training, and how long did it take?
Task 4: Create an RL agent for Minirace (level 2) (10 points)

In a level 2 version of the game, the observed state (the information made available to the agent after each step) consists of two numbers: dx1, dx2. The first value (dx1) is the same as dx in level 1 – the relative position of the (middle of the) track in front of the car. The second value (dx2) is the position of the subsequent track segment (in row 4), relative to the track in front of the car (in row 3).
A second difference is that the track can be more curved: sometimes the track will only overlap on the left or right edge. This means the agent cannot always drive in the middle of the track, because the car can only move one step to the left or right at a time.
For this task, you can initialise like this:
therace = Minirace(level=2)
In this level, step() returns two unnormalised pixel difference values (i.e., two values from …, −2, −1, 0, 1, 2, …).
Steps
- Create an RL agent (using an RL method of your choice) that finds a policy using (all) level 2 state information. A suggested discount factor is γ = 0.95.
- You can choose the algorithm (a tabular approach, deep TD, or deep policy gradient).
- Try to train an agent that achieves a running reward > 50 (the minirace.py file has an example for how to calculate this).
- If you use a neural network, do not go overboard with the number of hidden layers, as this will significantly increase training time. Try one hidden layer (see the sketch after this list).
- Write a description explaining how your approach works, and how it performs. If some (or all) of your attempts are unsuccessful, also describe some of the things that did not work, and which changes made a difference.
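As an illustration of the one-hidden-layer suggestion, here is a minimal sketch of a Q-network for the two-number level 2 state, assuming PyTorch; the number of actions, the hidden width, and the learning rate are illustrative assumptions, not part of the assignment.

import torch
import torch.nn as nn

# Q-network for the level 2 state (dx1, dx2): one hidden layer,
# one Q-value per action. ASSUMPTION: 3 actions (left / straight / right);
# the hidden width of 64 is an arbitrary choice.
class QNet(nn.Module):
    def __init__(self, n_actions=3, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, hidden),               # input: (dx1, dx2)
            nn.ReLU(),
            nn.Linear(hidden, n_actions),       # output: Q(s, a) for each action
        )

    def forward(self, x):
        return self.net(x)

qnet = QNet()
gamma = 0.95                                    # suggested discount factor
optimizer = torch.optim.Adam(qnet.parameters(), lr=1e-3)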
What to submit:
- Submit the python code of your solutions.
- For your report, describe the solution, mention the Test-Average and Test-Standard-Deviation, and include the Training Reward plot described above.
Tips
- For the RL tasks, it often takes some time until the learning picks up, but they should not take hours. If the agent does not learn, explore different learning rates. For Adam, try values between 5e-3 (faster) and 1e-4 (slower).
- Even if the learning does not work, remember that we would like to see that you understood the ideas behind the code. Describe the ideas that you tried, and still submit your code but say what the problem was.
Minirace python code
If you put the minirace.py file into your working directory, you can import the class like this:
from minirace import Minirace
therace = Minirace(level=1)
The Minirace class has several functions that you will have to use. The file contains an example and an explanation for many of the functions (check it out), but here is a brief list:
therace = Minirace(level=level)
n = therace.observationspace()
state = therace.state()
state = therace.transition(action)
done = therace.terminal()
r = therace.reward()
state, r, done = therace.step(action)
state = therace.reset()
action = therace.sampleaction()
therace.render(text=False, reward=r)
x, z, d = therace.s1
pix = therace.to_pix(x, z)
You can ask or answer questions about how to use the files provided with this assignment on Discord, as long as they are general python / programming questions, for example if the code provided does not work for you as expected. You must not ask or answer questions about the machine learning problems in this assignment anywhere, including Discord. If in doubt, ask your friendly lecturers or tutor first.
Assignment Cover Sheet
School of Computer, Data, and Mathematical Sciences
Student Name | |
Student Number | |
Unit Name and Number | INFO7001: Advanced Machine Learning |
Title of Assignment | Assignment 1 |
Due Date | 2 Nov 2022 |
Date Submitted | |
DECLARATION

I hold a copy of this assignment that I can produce if the original is lost or damaged. I hereby certify that no part of this assignment/product has been copied from any other student’s work or from any other source except where due acknowledgement is made in the assignment. No part of this assignment/product has been written/produced for me by another person except where such collaboration has been authorised by the subject lecturer/tutor concerned.

Signature: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

(Note: An examiner or lecturer/tutor has the right not to mark this assignment if the above declaration has not been signed)
         | Task 1 | Task 2 | Task 3 | Task 4 | Total |
Mark     |        |        |        |        |       |
Possible |   15   |   10   |   15   |   10   |  50   |
The maximum points possible for this assignment is 50.