How to get probability vector for all actions in tf-agents?

Asked by Kushal Jain on 21 September 2021 at 15:37

I'm working on a multi-armed bandit problem using LinearUCBAgent and LinearThompsonSamplingAgent, but both return a single action for an observation. What I need is the probability of every action, which I can use for ranking.

Carlos Loza On 02 June 2022 at 19:53

You need to add the emit_policy_info argument when defining the agent. The specific values (encapsulated in a tuple) will depend on the agent: predicted_rewards_sampled for LinearThompsonSamplingAgent and predicted_rewards_optimistic for LinearUCBAgent.

For example:

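from tf_agents.bandits.agents.linear_thompson_sampling_agent import LinearThompsonSamplingAgent

agent = LinearThompsonSamplingAgent(
    time_step_spec=time_step_spec,
    action_spec=action_spec,
    # The trailing comma makes this a one-element tuple rather than a plain string.
    emit_policy_info=("predicted_rewards_sampled",)
)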

Then, during inference, you'll need to access those fields and normalize them (via softmax):

action_step = agent.collect_policy.action(observation_step)
scores = tf.nn.softmax(action_step.info.predicted_rewards_sampled)

Here tf comes from import tensorflow as tf, and observation_step is your observation array encapsulated in a TimeStep (from tf_agents.trajectories.time_step import TimeStep).

Note of caution: these are NOT probabilities. They are normalized scores, similar to the normalized outputs of a fully-connected layer.
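
To make the normalization and ranking step concrete without needing a trained agent, here is a minimal sketch in plain Python. The reward values are made up for illustration, and the hand-written softmax function stands in for tf.nn.softmax; neither comes from an actual agent.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical predicted rewards for four actions (illustrative values only).
predicted_rewards = [0.2, 1.5, -0.3, 0.9]

normalized = softmax(predicted_rewards)

# Action indices ordered from highest to lowest normalized score.
ranking = sorted(range(len(normalized)), key=lambda i: normalized[i], reverse=True)
print(ranking)  # [1, 3, 0, 2]
```

Because softmax preserves order, sorting the raw predicted rewards gives the same ranking. The normalization only matters if you want scores that sum to 1.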
