In reinforcement learning, I have many discrete distributions corresponding to different states, like the following:
import numpy as np
distributions = np.array([[0.1,0.2,0.7],[0.3,0.3,0.4],[0.2,0.2,0.6]])
# array([[0.1, 0.2, 0.7], # \pi(s0)
# [0.3, 0.3, 0.4], # \pi(s1)
# [0.2, 0.2, 0.6]]) # \pi(s2)
Then, I want to get the probabilities of taking action 0 in state s0, taking action 2 in state s1, and taking action 1 in state s2 respectively.
So I stored the index values in an array like the following:
actions = np.array([[0],[2],[1]])
# array([[0], # taking action 0 in state s0
# [2], # taking action 2 in state s1
# [1]]) # taking action 1 in state s2
I want to index distributions using actions, and expect to get a result like:
# array([0.1,0.4,0.2])
# or
# array([[0.1],
# [0.4],
# [0.2]])
I've tried np.take(distributions, actions), but the returned array([0.1, 0.7, 0.2]) was obviously not what I wanted.
I also tried distributions[:, actions], which gave me another wrong answer, as below:
array([[0.1, 0.7, 0.2],
[0.3, 0.4, 0.3],
[0.2, 0.6, 0.2]])
What can I do to solve this problem?
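For reference, here is a minimal sketch of two ways this per-row selection might be done, assuming the arrays are exactly as defined above. The names rows, picked_2d and picked_1d are only illustrative. np.take_along_axis selects one entry per row along a given axis, while pairing a row-index array with the action indices does the same thing through integer array indexing.

```python
import numpy as np

distributions = np.array([[0.1, 0.2, 0.7],
                          [0.3, 0.3, 0.4],
                          [0.2, 0.2, 0.6]])
actions = np.array([[0], [2], [1]])

# Option 1: pick one entry per row along axis 1.
# The result keeps the (3, 1) shape of `actions`.
picked_2d = np.take_along_axis(distributions, actions, axis=1)

# Option 2: pair each row index with its action index.
# `rows` is a hypothetical helper array [0, 1, 2]; the result has shape (3,).
rows = np.arange(distributions.shape[0])
picked_1d = distributions[rows, actions.ravel()]

print(picked_2d)
print(picked_1d)
```

The first form is convenient when actions already has the column shape shown above. The second gives a flat array directly.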