Computation of Ex[U2] (1/2)
This network structure makes the environment Markovian. The two expected utilities to compare are:
Ex[U2 | E0 = 0, A0 = R, S1 = 1, A1 = R]
Ex[U2 | E0 = 0, A0 = R, S1 = 1, A1 = L]
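Using the polytree algorithm:
\[
\begin{aligned}
p(E_2 \mid E_0=0, A_0=R, S_1=1, A_1=R)
&= \sum_{E_1} p(E_2 \mid E_0=0, A_0=R, S_1=1, A_1=R, E_1)\, p(E_1 \mid E_0=0, A_0=R, S_1=1, A_1=R) \\
&= \sum_{E_1} p(E_2 \mid A_1=R, E_1)\, p(E_1 \mid E_0=0, A_0=R, S_1=1)
\end{aligned}
\]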
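The sum over E1 can be sketched in Python as follows. The slides do not give the transition probabilities, so the `transition` model (move as intended with probability 0.5, stay with 0.25, move the opposite way with 0.25) and the numbers in `belief_e1` are purely hypothetical placeholders; only the structure of the sum comes from the equation.

```python
STATES = [-2, -1, 0, 1, 2]

def transition(e_prev, action):
    """Hypothetical p(E_next | action, e_prev); NOT taken from the slides.

    Move as intended with prob. 0.5, stay with 0.25, move the opposite
    way with 0.25. Moves past the ends of the state range are clipped.
    """
    step = 1 if action == "R" else -1
    dist = {e: 0.0 for e in STATES}
    for delta, prob in ((step, 0.5), (0, 0.25), (-step, 0.25)):
        e_next = min(max(e_prev + delta, STATES[0]), STATES[-1])
        dist[e_next] += prob
    return dist

def predict(belief_e1, action):
    """p(E2 | ..., A1=action) = sum over E1 of p(E2 | A1, E1) * p(E1 | E0, A0, S1)."""
    p_e2 = {e: 0.0 for e in STATES}
    for e1, p1 in belief_e1.items():
        for e2, p2 in transition(e1, action).items():
            p_e2[e2] += p2 * p1
    return p_e2

# Hypothetical posterior p(E1 | E0=0, A0=R, S1=1); placeholder numbers.
belief_e1 = {-2: 0.0, -1: 0.1, 0: 0.2, 1: 0.7, 2: 0.0}
p_e2_right = predict(belief_e1, "R")
print(p_e2_right)
```

Calling `predict(belief_e1, "L")` gives the corresponding distribution for A1 = L under the same assumed model.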
20.2 Probabilistic Reasoning and Action
20.2.2 An Extended Example
– E: a state variable with values {-2, -1, 0, 1, 2}
– Each location has a utility U.
– E0 = 0
– Ai: the action at the i-th time step, with values {L, R}
• With this probability, Ex[U2] given A1 = R can be calculated.
• Similarly, Ex[U2] given A1 = L can be calculated.
• Then the action that yields the larger value is selected.
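The selection step in the bullets above can be sketched in Python. The slides say each location has a utility U but give no values here, so the `UTILITY` table and both distributions in `p_e2_given_action` are hypothetical numbers chosen only to make the example run.

```python
STATES = [-2, -1, 0, 1, 2]

# Hypothetical utility of each location (assumption, not from the slides).
UTILITY = {-2: -2.0, -1: -1.0, 0: 0.0, 1: 1.0, 2: 2.0}

def expected_utility(p_e2):
    """Ex[U2] = sum over E2 of p(E2 | evidence, A1) * U(E2)."""
    return sum(p_e2[e] * UTILITY[e] for e in STATES)

# Hypothetical distributions p(E2 | E0=0, A0=R, S1=1, A1=a) for a in {R, L}.
p_e2_given_action = {
    "R": {-2: 0.025, -1: 0.075, 0: 0.275, 1: 0.275, 2: 0.35},
    "L": {-2: 0.025, -1: 0.1, 0: 0.4, 1: 0.3, 2: 0.175},
}

# Compute Ex[U2] for each candidate action and keep the larger one.
expected = {a: expected_utility(p) for a, p in p_e2_given_action.items()}
best_action = max(expected, key=expected.get)
print(expected, best_action)
```

With these placeholder numbers the comparison favors R; different utilities or distributions could of course reverse the choice.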
Computation of Ex[U2] (2/2)
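\[
\begin{aligned}
p(E_1 \mid E_0=0, A_0=R, S_1=1)
&= k\, p(S_1=1 \mid E_0=0, A_0=R, E_1)\, p(E_1 \mid E_0=0, A_0=R) \\
&= k\, p(S_1=1 \mid E_1)\, p(E_1 \mid E_0=0, A_0=R)
\end{aligned}
\]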
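This update is an application of Bayes' rule, with k acting as the normalizing constant that makes the posterior sum to 1. A minimal Python sketch follows; the prior `prior_e1` and the sensor likelihood `likelihood_s1_is_1` are hypothetical values, since the slides do not list the actual sensor or transition probabilities at this point.

```python
STATES = [-2, -1, 0, 1, 2]

def bayes_update(prior_e1, sensor_likelihood):
    """Return p(E1 | E0, A0, S1) = k * p(S1 | E1) * p(E1 | E0, A0).

    Assumes at least one state has nonzero prior and nonzero likelihood,
    otherwise the normalizer k is undefined.
    """
    unnormalized = {e: sensor_likelihood[e] * prior_e1[e] for e in STATES}
    k = 1.0 / sum(unnormalized.values())  # normalizing constant
    return {e: k * v for e, v in unnormalized.items()}

# Hypothetical p(E1 | E0=0, A0=R); placeholder numbers.
prior_e1 = {-2: 0.0, -1: 0.25, 0: 0.25, 1: 0.5, 2: 0.0}
# Hypothetical p(S1=1 | E1) for each value of E1; placeholder numbers.
likelihood_s1_is_1 = {-2: 0.0, -1: 0.0, 0: 0.1, 1: 0.8, 2: 0.1}

posterior_e1 = bayes_update(prior_e1, likelihood_s1_is_1)
print(posterior_e1)
```

Note that the likelihood values are indexed by E1 and need not sum to 1 across states; only the resulting posterior does.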