Solve and explain, please. Check this; it is a COE topics HW ...
Question
Solve and explain, please.
Check this; it is a COE topics HW.
Question 4 (6 points): Consider a 2×3 game world that has six states {A, B, C, D, E, F} and four actions (up, down, left, right), as shown below. Each new episode starts in a randomly chosen state and ends at the terminal state F. When F is reached, the player receives a reward of +10 and the game ends. Every action that does not lead to state F has a reward of -1. Assume that the greedy policy is used after training. Also assume that α = 1 and γ = 0.9. Assume that the Q-learning algorithm was applied and that the following is the initial Q function Q(s, a), where s is a state and a is an action.

State \ action | up | down | left | right

A. Using the initial Q function, perform one action (B, up) and update the Q function. [2 pts]

B. Using the initial Q function, perform one episode and update the Q table starting from state A. Note that an episode is defined as a full game from a given state until the game ends. [4 pts]
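The update that both parts ask for can be sketched in Python as below. This is only an illustration under stated assumptions, not the assignment's solution. The grid figure and the initial Q values are not reproduced in this post, so the layout (A B C on the top row, D E F on the bottom row), the all-zero starting table, the rule that a move into a wall leaves the state unchanged, and the action sequence used for the episode are all hypothetical. It also assumes the two parameters are the learning rate (1) and the discount factor (0.9). The helper names `step`, `q_update`, and `run_episode` are made up for this sketch.

```python
# Hypothetical 2x3 layout (the original figure is not available):
#   A B C
#   D E F
GRID = [["A", "B", "C"], ["D", "E", "F"]]
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
ACTIONS = list(MOVES)
TERMINAL = "F"
ALPHA, GAMMA = 1.0, 0.9  # assumed: learning rate 1, discount factor 0.9

# Map each state name to its (row, column) position in the assumed grid.
POS = {GRID[r][c]: (r, c) for r in range(2) for c in range(3)}


def step(state, action):
    """Return (next_state, reward).

    Assumption: a move that would leave the grid keeps the agent in place.
    """
    r, c = POS[state]
    dr, dc = MOVES[action]
    nr, nc = r + dr, c + dc
    if not (0 <= nr < 2 and 0 <= nc < 3):
        nr, nc = r, c
    nxt = GRID[nr][nc]
    return nxt, (10 if nxt == TERMINAL else -1)


def q_update(Q, state, action):
    """Apply one Q-learning update in place and return the next state."""
    nxt, reward = step(state, action)
    # The terminal state has no future value.
    best_next = 0.0 if nxt == TERMINAL else max(Q[nxt].values())
    Q[state][action] += ALPHA * (reward + GAMMA * best_next - Q[state][action])
    return nxt


def run_episode(Q, start, actions):
    """Follow a given action sequence from start, updating Q after each move."""
    state, path = start, [start]
    for action in actions:
        if state == TERMINAL:
            break
        state = q_update(Q, state, action)
        path.append(state)
    return path


def zero_table():
    """Placeholder table; substitute the values from the assignment."""
    return {s: {a: 0.0 for a in ACTIONS} for s in "ABCDEF"}


# Part A style: a single (B, up) update on the placeholder table.
Q = zero_table()
q_update(Q, "B", "up")
print(Q["B"]["up"])

# Part B style: one episode from A along a hypothetical action sequence.
Q = zero_table()
print(run_episode(Q, "A", ["right", "right", "down"]))
```

With the real initial table, replace `zero_table()` with the assignment's values and choose the episode's actions according to whatever policy the course specifies for training. Because the learning rate is taken to be 1, each update simply overwrites the old entry with the reward plus 0.9 times the best value in the next state.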
Answer & Explanation