CSE150B Final (Sample)

Please finish reading everything on this page before you start. Otherwise we'll just reply "Page 1" to questions that are already answered here, and you will still need to read it anyway.

• Submission deadline: …
• Submit photos or pdfs of your written answers on Gradescope.
• Record a video in which you briefly explain the answers. Don't make it too long, just enough so that we can tell that you get the ideas for each question you answer. The grading will still be based on the written part; the video will be used to spot serious problems or to help you get partial credit.
• For the video submission, we recommend the following:
  1. Record your video using Zoom or your phone and submit the mp4 file through the link… Make sure the file name has your first name, last name, and PID. Dropbox will send you a confirmation.
  2. You can update the video before the deadline (do not overdo it, because there might be glitches in Dropbox over multiple versions; uploading just once is best), and make sure you always use the same file name, otherwise we will grade an arbitrary one of the files you sent.
  3. Alternatively, you can record your video using Zoom with its "Record on the Cloud" option, and Zoom will send you a link and password for sharing that video. Send the link and password to my email [email protected] (write "150B Final" and your PID in the subject). This is less preferred than uploading the video directly, and you do not get an immediate confirmation either.
  4. If you recorded using your phone or in some other way, definitely try compressing the video into mp4 format (you can use relatively low resolution), otherwise the file may be too large.
  5. We highly recommend leaving enough time for uploading – ideally start uploading before Friday is over for you, so that there is time to deal with technical issues on Saturday if there are any.
• If you have difficulty with all these options, send a message on Slack in the #question channel before the submission deadline and we'll direct message you to figure out a way.
• When a question has a "why" or "explain" part, make sure to write down a short explanation first; then you can just repeat it in your video. When there is no "why" asked in a question, you only need to write down the answer and then explain the why in the video.
• You can write your answers on blank paper and take photos of them, or directly annotate the pdf, etc. Just make sure things are readable.

Your final grade will depend on the sum of all points you've obtained in the class, so you only need to get enough points in the final to reach the grade you want. E.g., if you just need a B and you already got full points for all the PAs (summing up to 70 or more), then you just need to aim for 10 points from the final – of course, over-budgeting a bit is always good.

1 Classical Search (6 Points)

Consider the state space graph shown in Figure 1. A is the start state and G is the goal state. Each edge can be traversed in both directions.

Figure 1: Graph for the classical search questions.

Question 1 (3 Points). Let the cost of passing through each node be defined as:

c(A) = 0, c(B) = 1, c(C) = 10, c(D) = 3, c(E) = 3, c(F) = 1, c(G) = 10, c(H) = 1.

The table below lists three basic graph search strategies and four possible paths. Check which, if any, of the listed paths each strategy may return: each algorithm may return multiple different paths, because we have not restricted the order in which we push new states into the frontier or how we break ties. Just mark the corresponding box, and make sure to mark all paths that can be returned by each algorithm.

                       | AEBA | ABCG | AEBCG | ADHFG
  Depth-First Search   |      |      |       |
  Breadth-First Search |      |      |       |
  Uniform Cost Search  |      |      |       |

Question 2 (3 Points). Define the following heuristic function h:

h(A) = 9, h(B) = 9, h(C) = 2, h(D) = 7, h(E) = 1, h(F) = 4, h(G) = 0, h(H) = 10.

Is h a consistent heuristic function? What is the path returned by A* search using h?
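In case it helps to recall how these strategies manage the frontier, below is a minimal Python sketch of best-first graph search: with no heuristic the priority is the accumulated node cost g (uniform cost search), and with a heuristic the priority is g + h (A*). The adjacency list in the sketch is only a placeholder and is not the actual graph of Figure 1, which you must read off the figure; use the code to recall the mechanics, not to produce the answer.

```python
import heapq

def best_first_search(graph, node_cost, start, goal, h=None):
    """Generic best-first graph search with per-node costs.

    With h=None the priority is the accumulated cost g (uniform cost search);
    with a heuristic h the priority is g + h(n) (A*).
    """
    h = h or (lambda n: 0)
    # Counting the start node's own cost is a convention; it shifts every
    # candidate equally and does not change which path is returned.
    g0 = node_cost[start]
    frontier = [(g0 + h(start), g0, start, [start])]
    explored = set()
    while frontier:
        _, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, g
        if node in explored:
            continue
        explored.add(node)
        for nxt in graph[node]:
            if nxt not in explored:
                g2 = g + node_cost[nxt]
                heapq.heappush(frontier, (g2 + h(nxt), g2, nxt, path + [nxt]))
    return None, float("inf")

# Placeholder adjacency list -- NOT the actual graph of Figure 1.
graph = {"A": ["B", "D", "E"], "B": ["A", "C", "E"], "C": ["B", "G"],
         "D": ["A", "H"], "E": ["A", "B"], "F": ["G", "H"],
         "G": ["C", "F"], "H": ["D", "F"]}
cost = {"A": 0, "B": 1, "C": 10, "D": 3, "E": 3, "F": 1, "G": 10, "H": 1}
heur = {"A": 9, "B": 9, "C": 2, "D": 7, "E": 1, "F": 4, "G": 0, "H": 10}

print(best_first_search(graph, cost, "A", "G"))                       # UCS
print(best_first_search(graph, cost, "A", "G", h=lambda n: heur[n]))  # A*
```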
2 Adversarial Search

Consider the (upside-down Geisel library) game tree in Figure 2. You can think of it as playing "one round of chess with trembling hands" in the following sense: there are two rational players as usual, Min and Max, and for each player, after they determine what action to take, a chance player comes in (think of it as the environment, modeling the fact that the two players may have trouble putting the pieces down exactly where they want) and gives a random outcome according to the probabilities labeled on the edges in the tree. That is, the "Chance 1" player always gives the left outcome with probability 0.8 and the right one with probability 0.2, while the "Chance 2" player uses a uniform distribution, giving each possible outcome probability 0.5. As always, Max will maximize, Min will minimize, and the chance players will return a sampled outcome according to the probabilities.

Figure 2: Tree for the adversarial search questions.

Question 3 (2 Points). What is the final value for the max player (i.e., what is the number you should annotate the root node with)?

Question 4 (3 Points). I want to write the value of the max player succinctly as

V_max = max_a E[ min_{a′} E[ R(τ) ] ]

so that this value V_max corresponds to what you answered in the previous question. I hope you are convinced that this equation makes sense (I hope you will not ask what E means). Answer the following questions: How many choices does the max_a range over? And how many choices does the min_{a′} range over? What should τ here refer to? What is R(τ)? Also, give an arbitrary instance of τ and its R(τ).

3 Markov Decision Processes

Consider an MDP with states S = {s1, s2, s3}, actions A = {a}, and the following transition probabilities:

P(s1 | s1, a) = 0,  P(s2 | s1, a) = 0.6,  P(s3 | s1, a) = 0.4.

Define rewards R(s1) = 0, R(s2) = 1, R(s3) = 2. For simplicity we do not use a discount factor in this problem (or you can think of it as γ = 1).

Question 5 (2 Points). Enumerate all possible trajectories with s1 as the initial state (in the form of state, reward of state, action, next state, and so on until termination), and their probabilities. Use them to compute the value of s1 in the MDP.

Question 6 (2 Points). Since s2 and s3 are terminal states, we can consider the MDP as a tree with s1 as its root. Show that the expectimax value of s1 in this tree is the same as the value of s1 in the MDP. (Again, we are not using a discount factor here.)
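Questions 3, 4, and 6 all come down to evaluating a tree whose internal nodes are Max, Min, or Chance nodes. Below is a minimal sketch of that recursion; the toy tree at the end is made up for illustration and is not the tree of Figure 2.

```python
# A tree node is ("max" | "min" | "chance", children) or ("leaf", value).
# For a chance node, children is a list of (probability, child) pairs.

def expectimax(node):
    kind, payload = node
    if kind == "leaf":
        return payload
    if kind == "max":
        return max(expectimax(child) for child in payload)
    if kind == "min":
        return min(expectimax(child) for child in payload)
    # Chance node: probability-weighted average of the children's values.
    return sum(p * expectimax(child) for p, child in payload)

# Made-up toy tree (NOT Figure 2): Max picks one of two actions, then a
# chance node returns the left outcome with probability 0.8, right with 0.2.
toy = ("max", [
    ("chance", [(0.8, ("leaf", 10)), (0.2, ("leaf", -2))]),
    ("chance", [(0.8, ("leaf", 0)), (0.2, ("leaf", 3))]),
])
print(expectimax(toy))  # 0.8*10 + 0.2*(-2) = 7.6 beats 0.8*0 + 0.2*3 = 0.6
```

Note that the value of a chance node is the probability-weighted average of its children's values, which is exactly what the E in the formula of Question 4 stands for.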
4 Reinforcement Learning

We will consider RL on a tiny control problem. Shown in Figure 3 is a simplified inverted pendulum (a mass attached to an inelastic, weightless rod). The goal is to balance the pendulum at the upright position by giving a torque input at the end of the rod (the box with ±T indicates two choices of torque in opposite directions). The angular position is measured as the difference from the upright position, and tilting right is considered the positive direction. Note that this is an (of course extremely) simplified model of how robots walk and how SpaceX lands its rockets.

Figure 3: Inverted pendulum for the RL questions.

We use the variable θ to represent the pendulum's angular position and ω to represent its angular speed. Let s = (θ, ω) be the current state and s′ = (θ′, ω′) be the next state after the system runs for a small time step ∆t. The discretized and very aggressively simplified dynamics of the pendulum are as follows:

θ′ = θ + ω ∆t                  (1)
ω′ = ω + (k1 θ + k2 a) ∆t      (2)
a ∈ {0, T, −T}                 (3)

where k1, k2 are constants dependent on the physical parameters that we do not need to care about, so let us just assume k1 = k2 = 1. These equations tell you how the current state (θ, ω) transitions to the next state (θ′, ω′) under an action a. The control action a is simply 0, T, or −T, and we let T = 1. We set the time duration of each step ∆t to the constant ∆t = 1 (which is not small enough to physically justify the simplifications we made in the equations, but let's not worry about that).

Note that there is no probability involved in this setting, but the system dynamics will not be known to the learner, so the agent needs to learn the control policy by simulating the system and observing the effects of the actions. Define the reward function for each state s = (θ, ω) to be R(s) = −(θ² + ω²). By maximizing accumulated rewards, the agent can learn to keep the pendulum at the upright position, in which case the accumulated rewards will be close to zero (and otherwise much more negative).

Question 7 (1 Point). For this problem, is it enough to consider only the angular position θ as the system's state, without the angular velocity ω component? Why? (Recall what "Markov" means, and although we don't have probabilities here, think about why the Markov condition was important.)

Question 8 (2 Points). Let the initial state be s0 = (0, 1). Take two steps of actions, a0 = T and a1 = −T, which will unroll the system from s0 to the next state s1 and then to another state s2, according to equations (1)-(2). We will stop the simulation there and consider s2 as a terminal state. Write down what s1 and s2 are (their numerical values as vectors), and the reward-to-go for each of the three states in this trajectory. If you need one, use a discount factor γ = 0.5.

Question 9 (3 Extra Points). Construct a trajectory of three states s0, s1, s2 such that it ends up at the perfect upright position, i.e., s2 = (0, 0). We require that s0 ≠ s2 and a0 ≠ 0. (Hint: Write down a few linear equations; you just need to find one of the many possible solutions.) Now, for the state s1 and the corresponding a1 that you came up with in this sequence, what should the optimal Q value Q(s1, a1) be? (Hint: Note that s2 is the best possible terminal state for the system, so the action a1 you computed is the best possible one.)

If you are interested (after the quarter ends, apparently, and sorry, no more extra credit), try coding this environment up and see how Q-learning can be used to find a good control policy for it.
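For the curious, here is one way such a coding exercise could look: a minimal sketch of the dynamics in equations (1)-(3) together with tabular Q-learning. Everything beyond the dynamics — the state discretization, the random initial states, the episode horizon, and the learning hyperparameters (α, γ, ε, number of episodes) — is an arbitrary choice made for illustration and is not part of the problem statement.

```python
import random
from collections import defaultdict

ACTIONS = [0.0, 1.0, -1.0]  # a in {0, T, -T} with T = 1

def step(state, a, dt=1.0):
    """One transition of equations (1)-(2) with k1 = k2 = 1."""
    theta, omega = state
    theta2 = theta + omega * dt
    omega2 = omega + (theta + a) * dt
    # Reward convention: here we reward the state we land in; rewarding the
    # current state instead only shifts where the first/last reward falls.
    reward = -(theta2 ** 2 + omega2 ** 2)
    return (theta2, omega2), reward

def discretize(state, grid=0.5, limit=3.0):
    """Bucket the continuous (theta, omega) onto a finite grid (arbitrary sizes)."""
    return tuple(round(max(-limit, min(limit, v)) / grid) for v in state)

def q_learning(episodes=5000, horizon=20, alpha=0.1, gamma=0.5, eps=0.1):
    Q = defaultdict(float)
    for _ in range(episodes):
        state = (random.uniform(-0.5, 0.5), random.uniform(-0.5, 0.5))
        for _ in range(horizon):
            s = discretize(state)
            # Epsilon-greedy action selection.
            if random.random() < eps:
                a = random.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda act: Q[(s, act)])
            next_state, r = step(state, a)
            s2 = discretize(next_state)
            best_next = max(Q[(s2, act)] for act in ACTIONS)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            state = next_state
    return Q

if __name__ == "__main__":
    Q = q_learning()
    s = discretize((0.0, 1.0))  # the initial state of Question 8
    print({a: round(Q[(s, a)], 2) for a in ACTIONS})
```

Because (θ, ω) is continuous, the sketch buckets states onto a coarse grid so that a table of Q values can be maintained; with a finer grid and more episodes, the greedy policy should increasingly pick the torque that drives the state back toward (0, 0).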
5 Monte Carlo Tree Search

Consider the minimax tree in Figure 4. This is the entire game tree, and the leaf nodes are the actual terminal states of the game. In this question, wherever there is a random choice to make, the order is from left to right. For instance, when you need to break a tie between two child nodes, prioritize the left one. When you need to sample a random outcome between left and right, the random sequence is fixed to be left, right, left, right, etc. In the tree, the number 1 means Max wins, and the number 0 means Min wins.

Figure 4: Tree for the MCTS question.

Question 10 (2 Points). Draw the first four trees that the MCTS algorithm constructs. Use c = 10 in the UCB formula (refer to the slides or the pseudocode of PA4). After 6 rounds of the algorithm (6 iterations of the entire MCTS loop), what is the number of wins and the number of visits for Max's "right" action?

6 Constraint Solving

We now look at constraint solving in the context of neural networks. Shown in Figure 5 is a simple neural network. You don't need to know anything about neural networks beforehand (and knowing them will not give any advantage); just follow the descriptions below.

Figure 5: Left: Neural network model for the constraint solving questions. Right: The tanh function.

The x variable represents the input. It first passes through a linear transformation with weight a1 and bias b1 and then gets fed into the "neuron", which we choose to be the tanh function here. Look up this function on Google and observe that it has a "switch"-like behavior, which makes it a continuous version of a logical gate. Also, importantly, it is a monotonic function. The output of the tanh function then passes through another transformation defined by a2 and b2 and becomes the output value y. So overall, this network is simply the diagram of a nonlinear function from the input x to the output y:

y = a2 tanh(a1 x + b1) + b2.

This nested form can be flattened into a constraint system over x, y and two additional variables s1, s2, in the sense that the input and output values of the network always satisfy the following constraint system:

{ s1 = a1 x + b1,  s2 = tanh(s1),  y = a2 s2 + b2 }.    (4)

In this form, we can use constraint solving algorithms to analyze the relation between inputs and outputs of the network. Note that all the really large networks (with millions or billions of neurons like this one) can be encoded and analyzed with the same principle, if the constraint solving algorithms scale well enough.
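To make the encoding concrete, here is a tiny sketch (with arbitrary parameter values and inputs) that evaluates the network directly and through the flattened system (4) and checks that the two agree; the intermediate values s1 and s2 are exactly the extra variables of the constraint system. The monotonicity of tanh noted above is also what lets an interval on y be propagated back to an interval on x one constraint at a time, which is the idea behind the hint in Question 11 below.

```python
import math

def network(x, a1, b1, a2, b2):
    # Direct evaluation: y = a2 * tanh(a1 * x + b1) + b2
    return a2 * math.tanh(a1 * x + b1) + b2

def flattened(x, a1, b1, a2, b2):
    # The same computation written as the constraint system (4),
    # with the intermediate variables s1 and s2 made explicit.
    s1 = a1 * x + b1
    s2 = math.tanh(s1)
    y = a2 * s2 + b2
    return s1, s2, y

# Arbitrary parameters and inputs, just to check that the two forms agree.
a1, b1, a2, b2 = 1.0, 1.0, 1.0, 1.0
for x in [-2.0, 0.0, 0.3, 2.0]:
    s1, s2, y = flattened(x, a1, b1, a2, b2)
    assert abs(y - network(x, a1, b1, a2, b2)) < 1e-12
    print(f"x={x:+.1f}  s1={s1:+.3f}  s2={s2:+.3f}  y={y:+.3f}")
```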
Question 11 (3 Points). Suppose y ∈ [1, 2] is the range for the output to be classified as a certain result (like "cat"). Assume that the network parameters have been trained to be simply a1 = b1 = a2 = b2 = 1. What is the maximum range (lower and upper bounds) of the input x that will produce this output range y ∈ [1, 2]? Round numbers to 2 digits after the decimal point in your result. Feel free to use Google or Python for any of the computation needed. (Hint: use arc consistency to propagate the range of the variables from y to x, using the constraints in (4).)

Question 12 (3 Points). Suppose I have an input x0 that already produces the output y0. I now want a different input x1 that is quite different from x0, in the sense that |x0 − x1| > ε, where ε is a given positive parameter that defines the threshold, such that the output on x1 is exactly the same as x0's output. What is the constraint system (written in a similar way as (4)) that you can write down to uniquely define all such x1? Feel free to use all the variables that have appeared here (x0, y0, ε) in your constraints. (Note: this in general is the adversarial attack problem for neural networks. By solving such constraints, you can find a picture that is nonsense but is still classified as a cat, which is what people worry about for face recognition systems and autonomous cars driven by neural perception systems.)

7 Propositional Reasoning

Question 13 (2 Points). List all the satisfying assignments of the following DNF formula.

(p1 ∧ p2 ∧ p3) ∨ (¬p1 ∧ p2) ∨ (p2 ∧ ¬p3)

(Hint: Recall the special structure of DNF; you don't need a truth table to do this.)

Question 14 (2 Points). List all the satisfying assignments of the following CNF formula.

(¬p1 ∨ ¬p2 ∨ ¬p3) ∧ (p1 ∨ ¬p2) ∧ (¬p2 ∨ p3)

(Hint: Pay attention to the relation between this formula and the previous one.)
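If you want to sanity-check the assignments you derive by hand (the hints describe the intended shortcut, so treat this purely as a verification tool), a brute-force enumeration of the eight assignments takes only a few lines:

```python
from itertools import product

# Brute-force truth table: use it only to verify answers worked out by hand.
def dnf(p1, p2, p3):
    return (p1 and p2 and p3) or (not p1 and p2) or (p2 and not p3)

def cnf(p1, p2, p3):
    return (not p1 or not p2 or not p3) and (p1 or not p2) and (not p2 or p3)

for p1, p2, p3 in product([True, False], repeat=3):
    print(p1, p2, p3, "satisfies DNF:", dnf(p1, p2, p3), "satisfies CNF:", cnf(p1, p2, p3))
```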