-
a. b.
-
No. Suppose k = 1: the search would then keep B as the only successor and force B→C as the path to the goal, even though A→C is better.
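A tiny CoffeeScript sketch of this failure mode; the graph, edge costs, and heuristic values are invented purely to illustrate the k = 1 case (S is a made-up start node, C the goal):

```coffeescript
# Hypothetical toy graph: edge costs and heuristic h are invented for illustration.
graph =
  S: {A: 1, B: 1}
  A: {C: 1}
  B: {C: 10}
  C: {}
h = {S: 4, A: 3, B: 2, C: 0}

beamSearch = (start, goal, k) ->
  beam = [{node: start, path: [start], cost: 0}]
  while beam.length > 0
    successors = []
    for p in beam
      for nbr, c of graph[p.node]
        successors.push {node: nbr, path: p.path.concat([nbr]), cost: p.cost + c}
    for p in successors
      return p if p.node is goal
    # keep only the k successors that look best according to h; everything else is dropped
    successors.sort (a, b) -> h[a.node] - h[b.node]
    beam = successors[0...k]
  null

console.log beamSearch('S', 'C', 1)
# With k = 1 only B survives the first pruning, so the search returns S -> B -> C (cost 11)
# and never revisits the cheaper S -> A -> C (cost 2).
```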
- The value at the top MAX node is 4.
- F, H, and I are pruned.
Predicates are: Smart(t), Robot(t), Owns(owner, owned), IsNiceTo(p, niceTo)
- ∀ x: Robot(x) => Smart(x)
- ∃ x: Robot(x) ∧ Smart(x)
- Robot(Robbie)
- ∀ x, y: Robot(x) ∧ Owns(y,x) => IsNiceTo(x,y)
- Owns(Bob, Robbie)
- ∃ x, y: Robot(x) ∧ Owns(y,x) ∧ Smart(y)
Prove using resolution (or by converting the KB to CNF and then proving by negation)
- And-Introduction 3 and 5: 7. Robot(Robbie) ∧ Owns(Bob, Robbie)
- UE 4, x/Robbie, y/Bob: 8. Robot(Robbie) ∧ Owns(Bob, Robbie) => IsNiceTo(Robbie, Bob)
- MP 7 and 8: 9. IsNiceTo(Robbie, Bob)
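For completeness, the resolution route the question mentions works too: convert the relevant sentences to CNF and negate the goal (clause numbering here is mine).
- C1. ¬Robot(x) ∨ ¬Owns(y,x) ∨ IsNiceTo(x,y) (from the ∀ sentence)
- C2. Robot(Robbie)
- C3. Owns(Bob, Robbie)
- C4. ¬IsNiceTo(Robbie, Bob) (negated goal)
- Resolve C1 with C2 using {x/Robbie}: ¬Owns(y, Robbie) ∨ IsNiceTo(Robbie, y)
- Resolve with C3 using {y/Bob}: IsNiceTo(Robbie, Bob)
- Resolve with C4: empty clause, so IsNiceTo(Robbie, Bob) follows.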
a. STRIPS is a planner that describes a sequence of actions to perform some task (i.e., a planning problem). It works on the assumption that it is planning within a closed world, i.e., the only objects in the world are the ones defined in the problem.
b. Partial Order Planning starts with a partial plan and searches through a set of partial plans (i.e., plan-space search) in order to determine a plan for reaching the goal. POP works on subgoals, producing plans in which independent subgoals are unordered, whereas Total Order Planning produces a strictly ordered plan.
c. A precondition is a conjunction of literals which must all be true in order for an action to be applied and change the world state (see the example operator after this list).
d. A progressive planner begins at the initial state and searches forward, finding applicable operators by matching their preconditions against the current state.
e. Least commitment is a technique used by partial-order planning which helps make choices that are only relevant to the current subgoal and thus may reduce search time. We can apply this to orderings, leaving actions unordered unless necessary, as well as to bindings, leaving variables unbound unless they need to be unified.
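As a concrete illustration of (a) and (c), here is a toy STRIPS-style operator (a standard blocks-world example, not taken from the question):
- Action: Move(b, x, y) — move block b from x onto y
- Precondition: On(b, x) ∧ Clear(b) ∧ Clear(y)
- Effect: On(b, y) ∧ Clear(x) ∧ ¬On(b, x) ∧ ¬Clear(y)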
If I understand correctly, L(θ|D) = ∏_i L(θ_i|D), so we can compute L(θ_Rainy|D) * L(θ_Picnic|D). I'll use the shorthand P = Picnic, R = Rainy.
For the first case:
L(θ_Rainy|D) = P(D|θ_Rainy) = (0.3)^N(Rainy) * (1 - 0.3)^N(~Rainy) = 0.3^2 * 0.7^4 = 0.021609
L(θ_Picnic|D) = P(D|θ_Picnic)
= P(P|R)^N(P|R) * (1 - P(P|R))^N(~P|R) * P(P|~R)^N(P|~R) * (1 - P(P|~R))^N(~P|~R)
= 0.1^0 * 0.9^2 * 0.9^1 * 0.1^3
= 0.000729
L(θ|D) = 0.021609 * 0.000729 ≈ 0.0000158
For the second case:
L(θ_Picnic|D) = P(D|θ_Picnic) = (0.3)^N(Picnic) * (1 - 0.3)^N(~Picnic) = 0.3^1 * 0.7^5 = 0.050421
L(θ_Rainy|D) = P(D|θ_Rainy)
= P(R|P)^N(R|P) * (1 - P(R|P))^N(~R|P) * P(R|~P)^N(R|~P) * (1 - P(R|~P))^N(~R|~P)
= 0.2^0 * 0.8^1 * 0.7^2 * 0.3^3
= 0.010584
L(θ|D) = 0.050421 * 0.010584 ≈ 0.000534
So the second case has a higher likelihood.
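A quick CoffeeScript check of the arithmetic above; the exponents are the counts N(·) used in the working, and the probabilities are the ones assumed there:

```coffeescript
pow = Math.pow

# Case 1: P(R) = 0.3 plus the P(P|R) / P(P|~R) terms, as in the working above.
lRainy1  = pow(0.3, 2) * pow(0.7, 4)
lPicnic1 = pow(0.1, 0) * pow(0.9, 2) * pow(0.9, 1) * pow(0.1, 3)
case1 = lRainy1 * lPicnic1

# Case 2: P(P) = 0.3 plus the P(R|P) / P(R|~P) terms, as in the working above.
lPicnic2 = pow(0.3, 1) * pow(0.7, 5)
lRainy2  = pow(0.2, 0) * pow(0.8, 1) * pow(0.7, 2) * pow(0.3, 3)
case2 = lPicnic2 * lRainy2

console.log case1, case2, case2 > case1   # ≈ 1.58e-5, ≈ 5.34e-4, true
```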
b. Two points: (X_2=3,Y_2=1) and (X_2=3,Y_2=3)
c. The agent could not have been at (X_1=1, Y_1=2): moving right would have brought it to the center tile, which has no walls, so N_2 would have to be 0 (the sensor detects the absence of all 4 walls). It could have been anywhere else, since every other tile has either 1 or 2 walls, in which case the sensor can miss a wall. It also could not have been at (X_1=2, Y_1=2), since then N_1 would have been 0.
d.
f. As this is a discrete DBN, we can convert it to an HMM. There are 4 state variables; 3 of them have 2 parents and the other has none. In the DBN, this amounts to 3·2^2 + 1·2^0 = 13 parameters. In the HMM, there are 2^4 * 2^4 = 256 parameters. Thus it is True.
a.
We will use a number of different lotteries to do this.
The first lottery is the case where Data participates and is not malfunctioning. L = [0.9, 100; 0.1, 0]. E[L] = 0.9 * 100 + 0.1 * 0 = 90.
The second lottery is the case where Data participates and is malfunctioning. L = [0.6, 50; 0.4, -50]. E[L] = 0.6 * 50 + 0.4 * -50 = 10.
Now we combine these two based on the chance of malfunctioning. L = [0.7, 90; 0.3, 10]. E[L] = 0.7 * 90 + 0.3 * 10 = 66. This is the expected value of entering the contest.
b. We can represent this as a lottery L = [0.7, 0; 0.3, -20]. E[L] = 0.7 * 0 + 0.3 * (-20) = -6
c. Data should enter the contest, as 66 > -6.
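A small CoffeeScript check of the lottery arithmetic; a lottery here is just a list of [probability, value] pairs, and the variable names are mine:

```coffeescript
# Expected value of a lottery given as [probability, value] pairs.
expectedValue = (lottery) ->
  total = 0
  for [p, v] in lottery
    total += p * v
  total

fine    = expectedValue [[0.9, 100], [0.1, 0]]        # 90: participates, not malfunctioning
faulty  = expectedValue [[0.6, 50], [0.4, -50]]       # 10: participates, malfunctioning
enter   = expectedValue [[0.7, fine], [0.3, faulty]]  # 66: combined on the chance of malfunction
abstain = expectedValue [[0.7, 0], [0.3, -20]]        # -6: does not enter
console.log enter, abstain, enter > abstain           # 66 -6 true
```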
a. There are 2 states, sL and sR, and 2 actions, Move and Stay. The reward is +1 when in sR and 0 when in sL. The transition model is as follows (also written out as code after the table):
| s | a | s' | P(s, a, s') |
|---|---|----|-------------|
| sL | Stay | sL | 1 |
| sL | Move | sL | 0.25 |
| sL | Move | sR | 0.75 |
| sR | Stay | sR | 1 |
| sR | Move | sR | 0.25 |
| sR | Move | sL | 0.75 |
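The same model written out as a CoffeeScript lookup table, read straight off the table above (names sL/sR, Stay/Move as above):

```coffeescript
# transitions[s][a][s'] = P(s' | s, a)
transitions =
  sL:
    Stay: {sL: 1,    sR: 0}
    Move: {sL: 0.25, sR: 0.75}
  sR:
    Stay: {sL: 0,    sR: 1}
    Move: {sL: 0.75, sR: 0.25}
rewards = {sL: 0, sR: 1}   # reward for being in each state
```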
b. The optimal policy is:
π(sL) = Move
π(sR) = Stay
c.
To do this, we solve the equation V = (I - γT)^-1 * r
Let r = [0, 1]^T (rewards for sL, sR). Let T = [[0.25, 0.75], [0, 1]] be the transition matrix under this policy (rows and columns ordered sL, sR).
γT = [[0.125, 0.375], [0, 0.5]]
I - γT = [[0.875, -0.375], [0, 0.5]]
(I - γT)^-1 = 1/(0.875(0.5) - (-0.375)(0)) * [[0.5, 0.375], [0, 0.875]]
= 1/0.4375 * [[0.5, 0.375], [0, 0.875]]
= [[1.14286, 0.857143], [0, 2]]
(I - γT)^-1 * r = [[1.14286, 0.857143], [0, 2]] * [[0],[1]] = [[0.857143],[2]]
Therefore Vπ(sL) = 0.857143, Vπ(sR) = 2.
Here's some hacky CoffeeScript code that confirms this by repeatedly applying V ← R + γTV for this policy:
T = [[0.25, 0.75], [0, 1]]   # transition matrix under the policy (rows/columns: sL, sR)
R = [0, 1]                   # reward in each state
V = [[0, 0]]                 # V[i] holds the value estimates after i iterations
gamma = 0.5
for i in [1..100]
  V.push [0, 0]
  for s in [0...2]
    tmp = R[s]
    # V_i(s) = R(s) + gamma * sum over s' of T(s, s') * V_{i-1}(s')
    for sp in [0...2]
      tmp += gamma * (T[s][sp] * V[i-1][sp])
    V[i][s] = tmp
console.log V
d. Not sure about this...
The trajectory is: (s=sL, r=0),(s=sL, r=0),(s=sL, r=0),(s=sR, r=1),(s=sR, r=1),(s=sL, r=0). Initially set V(s) = 0 ∀ s.
Now we iterate through every timestep. Until we move from L to R, V(sL) doesn't change.
At the timestep where we move from L to R:
V(sL) = V(sL) + 0.1(1 + γV(sR) - V(sL)) = 0.1
Now the timestep where we stay in R:
V(sR) = V(sR) + 0.1(1 + γV(sR) - V(sR)) = 0.1
Now the timestep where we move from R back to L (using the reward 0 received on entering sL, consistent with the updates above):
V(sR) = V(sR) + 0.1(0 + γV(sL) - V(sR)) = 0.1 + 0.1(0 + 0.5(0.1) - 0.1) = 0.095
This leaves us with V(sL) = 0.1, V(sR) = 0.095.
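A CoffeeScript replay of the trajectory, assuming (as in the updates above) that the reward used at each step is the one attached to the state being entered, with α = 0.1 and γ = 0.5:

```coffeescript
alpha = 0.1
gamma = 0.5
V = {sL: 0, sR: 0}
trajectory = [['sL', 0], ['sL', 0], ['sL', 0], ['sR', 1], ['sR', 1], ['sL', 0]]
for t in [0...trajectory.length - 1]
  [s, _]         = trajectory[t]
  [sNext, rNext] = trajectory[t + 1]
  # TD(0): V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s))
  V[s] += alpha * (rNext + gamma * V[sNext] - V[s])
console.log V   # ≈ { sL: 0.1, sR: 0.095 }
```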
For temporal models:
X_n -> X_n+1, N_n+1
Y_n -> Y_n+1, N_n+1
A_n -> X_n+1, Y_n+1
N_n -> A_n+1
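Read child-to-parents, the same structure is (sketched as a CoffeeScript object, names as above):

```coffeescript
# Same dependencies as the arrows above, written as child: [parents].
parents =
  'X_n+1': ['X_n', 'A_n']
  'Y_n+1': ['Y_n', 'A_n']
  'N_n+1': ['X_n', 'Y_n']
  'A_n+1': ['N_n']
```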