VowpalWabbit contextual bandit generating just a single action irrespective of context

189 Views Asked by Abhishek_09 At 04 August 2022 at 14:02

I am using VowpalWabbit in a contextual bandit setting, but I am stuck on a strange issue: VowpalWabbit generates the same PMF irrespective of the context. Ideally it should generate different PMFs for action selection depending on the context. Here is the sample data I am using.

shared |Context t1=a_c t2:5 t3=a_b t4:2 t5:10
|Action arm=a1 
|Action arm=a2 
|Action arm=a3 
|Action arm=a4 
0:-5:0.09 |Action arm=a5
|Action arm=a6 
|Action arm=a7 
|Action arm=a8 
|Action arm=a9 
|Action arm=a10 
|Action arm=a11
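
For reference, a multi-line example like the one above can be assembled programmatically. The following is a minimal sketch in Python; `to_adf_example` is a hypothetical helper written for illustration, not part of VowpalWabbit, and it assumes the label on the chosen arm uses the `0:cost:probability` form shown in the sample.

```python
def to_adf_example(context, arms, chosen=None, cost=None, prob=None):
    """Build a cb_explore_adf multi-line text example (hypothetical helper).

    context: dict of feature name -> value (str = categorical, number = numeric)
    arms:    list of arm names
    chosen:  index into arms of the logged action, or None for a prediction-only example
    """
    def fmt(name, value):
        # Categorical features are written name=value, numeric ones name:value.
        return f"{name}={value}" if isinstance(value, str) else f"{name}:{value}"

    lines = ["shared |Context " + " ".join(fmt(k, v) for k, v in context.items())]
    for i, arm in enumerate(arms):
        # No space between '|' and the namespace name; with a space, VW reads
        # "Action" as a feature in the default namespace instead.
        line = f"|Action arm={arm}"
        if i == chosen:
            line = f"0:{cost}:{prob} {line}"
        lines.append(line)
    return "\n".join(lines)


context = {"t1": "a_c", "t2": 5, "t3": "a_b", "t4": 2, "t5": 10}
arms = [f"a{i}" for i in range(1, 12)]
example = to_adf_example(context, arms, chosen=4, cost=-5, prob=0.09)
print(example)
```

Generating the lines this way keeps the namespace prefix consistent across all arms, which matters once namespace-based options such as `-q` are used.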

I initialized VowpalWabbit with the following settings.

--cb_explore_adf --cb_type mtr --epsilon 0.05

Here is the action distribution, which comes out the same irrespective of the context in the data.

[Figure: action distribution in the data]

[Figure: action distribution of the contextual bandit]

I am wondering what could cause VowpalWabbit to saturate like this. Is it something to do with the hyperparameters I provided?


There is 1 best solution below

Abhishek_09 On 04 August 2022 at 22:21

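Adding -q CA to the settings worked for me:

--cb_explore_adf --cb_type mtr -q CA --epsilon 0.05

-q CA creates quadratic interaction features between the namespaces whose names start with C (Context) and A (Action). Without it, the shared context features add the same amount to every action's score, so the ranking of the actions, and therefore the PMF, cannot depend on the context.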
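
The effect of `-q CA` can be illustrated outside of VowpalWabbit. The following is a minimal sketch in plain Python with made-up weights and hypothetical function names; it is not how VW is implemented, only an illustration of why a purely additive linear score picks the same arm for every context while context-action cross terms do not.

```python
def best_arm_linear(w_ctx, w_arm, ctx):
    """Additive model: score(ctx, arm) = w_ctx . ctx + w_arm[arm]."""
    # The context term is identical for every arm, so it cannot change the ranking.
    ctx_term = sum(w_ctx.get(f, 0.0) * v for f, v in ctx.items())
    scores = {arm: ctx_term + w for arm, w in w_arm.items()}
    return min(scores, key=scores.get)  # lowest predicted cost wins


def best_arm_quadratic(w_cross, ctx):
    """Cross-term model: score(ctx, arm) = sum_f w_cross[(f, arm)] * ctx[f]."""
    arms = {arm for (_, arm) in w_cross}
    scores = {
        arm: sum(w_cross.get((f, arm), 0.0) * v for f, v in ctx.items())
        for arm in arms
    }
    return min(scores, key=scores.get)


# Illustrative weights only (not learned by VW).
w_ctx = {"t1=a_c": 0.3, "t1=a_d": -0.7}
w_arm = {"a1": 0.2, "a2": -0.1}
w_cross = {
    ("t1=a_c", "a1"): -1.0, ("t1=a_c", "a2"): 0.5,
    ("t1=a_d", "a1"): 0.5,  ("t1=a_d", "a2"): -1.0,
}

ctx1 = {"t1=a_c": 1.0}
ctx2 = {"t1=a_d": 1.0}

print(best_arm_linear(w_ctx, w_arm, ctx1), best_arm_linear(w_ctx, w_arm, ctx2))
print(best_arm_quadratic(w_cross, ctx1), best_arm_quadratic(w_cross, ctx2))
```

In the additive case both contexts select `a2`, because only the per-arm weights differ between arms. With cross terms the two contexts select different arms (`a1` and `a2`), which is the behavior the interaction option makes possible.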
