VowpalWabbit contextual bandit generating just a single action irrespective of context


I am using Vowpal Wabbit in a contextual bandit setting, but I am stuck on a strange issue: it generates the same PMF irrespective of context. Ideally it should generate different PMFs for action selection depending on the context. Here is the sample data I am using.

shared |Context t1=a_c t2:5 t3=a_b t4:2 t5:10
|Action arm=a1 
|Action arm=a2 
|Action arm=a3 
|Action arm=a4 
0:-5:0.09 | Action arm=a5 
|Action arm=a6 
|Action arm=a7 
|Action arm=a8 
|Action arm=a9 
|Action arm=a10 
|Action arm=a11
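For reference, here is a minimal sketch of how a multi-line example like the one above could be generated programmatically. The helper name format_adf_example and its parameters are hypothetical (they are not part of Vowpal Wabbit); only the output text format follows the sample. Note that the namespace name has to follow the bar directly: with a space after the bar, as in "| Action arm=a5", VW treats "Action" as a feature in the default namespace rather than as the namespace name, so the sketch always emits "|Action".

```python
def format_adf_example(context, arms, chosen=None, cost=None, prob=None):
    """Build one multi-line cb_explore_adf example as plain text.

    context: dict of shared features (hypothetical input shape);
             string values become name=value, numeric values name:value.
    arms:    list of arm names, one action line per arm.
    chosen:  zero-based index of the logged arm, or None when predicting.
    cost, prob: logged cost and logging probability of the chosen arm.
    """
    def feat(name, value):
        # Categorical features as name=value, numeric ones as name:value.
        if isinstance(value, str):
            return f"{name}={value}"
        return f"{name}:{value}"

    shared = "shared |Context " + " ".join(feat(k, v) for k, v in context.items())
    lines = [shared]
    for i, arm in enumerate(arms):
        # Only the logged arm carries a label of the form action:cost:probability.
        label = f"0:{cost}:{prob} " if i == chosen else ""
        # No space between the bar and the namespace name.
        lines.append(f"{label}|Action arm={arm}")
    return "\n".join(lines)


example = format_adf_example(
    {"t1": "a_c", "t2": 5, "t3": "a_b", "t4": 2, "t5": 10},
    [f"a{i}" for i in range(1, 12)],
    chosen=4, cost=-5, prob=0.09,
)
print(example)
```

Each multi-line example in a data file is terminated by an empty line.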

I initialized Vowpal Wabbit with the following settings.

--cb_explore_adf --cb_type mtr --epsilon 0.05
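To make the expected output concrete, the following is a rough sketch of the PMF an epsilon-greedy explorer produces for a single decision, assuming the usual rule: every action receives epsilon/K and the action with the lowest predicted cost receives the remaining 1 - epsilon. The function name and the scores are illustrative assumptions, and tie handling in VW itself may differ.

```python
def epsilon_greedy_pmf(scores, epsilon):
    """Return an epsilon-greedy PMF over actions.

    scores:  predicted cost per action (lower is better); illustrative values.
    epsilon: exploration rate, e.g. 0.05 as in --epsilon 0.05.
    """
    k = len(scores)
    # Index of the action with the lowest predicted cost (first one on ties).
    best = min(range(k), key=lambda i: scores[i])
    # Uniform exploration mass for every action.
    pmf = [epsilon / k] * k
    # The greedy action gets the remaining probability mass.
    pmf[best] += 1.0 - epsilon
    return pmf


# 11 actions, with the fifth one (index 4) predicted to be cheapest.
scores = [0.0] * 4 + [-1.0] + [0.0] * 6
pmf = epsilon_greedy_pmf(scores, 0.05)
print([round(p, 4) for p in pmf])
```

With 11 actions and epsilon 0.05, the PMF can only differ between contexts through which action receives the large mass, so an identical PMF everywhere means the same action is always ranked best.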

Here is the action distribution, irrespective of context, in the data and as produced by the contextual bandit.

[Figure: Action Dist. in data]

[Figure: Action Dist. of Contextual Bandit]

I am wondering what could be causing Vowpal Wabbit to saturate on a single action. Is it something to do with the hyperparameters provided?

There is 1 best solution below

Abhishek_09

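Adding -q CA worked for me:

--cb_explore_adf --cb_type mtr -q CA --epsilon 0.05

The -q CA flag adds quadratic interaction features between the namespaces whose names start with C (Context) and A (Action). Without interactions, the shared context features contribute the same amount to every action's score, so the ranking of the actions cannot change with the context.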
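A toy illustration of why an interaction flag such as -q CA can matter, using a plain-Python stand-in for the linear scorer rather than VW itself. All weights, context names, and function names below are made up for the example. With purely additive context and arm terms the best arm is the same in every context; adding a weight per (context, arm) pair lets the best arm change.

```python
def score_linear(w_ctx, w_arm, ctx, arm):
    # No interactions: the context term is identical for every arm,
    # so it cannot change which arm has the lowest predicted cost.
    return w_ctx[ctx] + w_arm[arm]


def score_quadratic(w_ctx, w_arm, w_pair, ctx, arm):
    # With Context x Action interactions: one extra weight per (context, arm) pair.
    return w_ctx[ctx] + w_arm[arm] + w_pair[(ctx, arm)]


def best_arm(score, arms):
    # Lowest predicted cost wins.
    return min(arms, key=score)


arms = ["a1", "a2"]
# Hypothetical learned weights (predicted-cost contributions).
w_ctx = {"morning": 0.3, "evening": -0.2}
w_arm = {"a1": -0.5, "a2": -0.1}
w_pair = {
    ("morning", "a1"): 0.0, ("morning", "a2"): -1.0,
    ("evening", "a1"): 0.0, ("evening", "a2"): 0.0,
}

linear_choice = {
    c: best_arm(lambda a, c=c: score_linear(w_ctx, w_arm, c, a), arms)
    for c in w_ctx
}
quadratic_choice = {
    c: best_arm(lambda a, c=c: score_quadratic(w_ctx, w_arm, w_pair, c, a), arms)
    for c in w_ctx
}
print(linear_choice)
print(quadratic_choice)
```

In the additive case both contexts select a1, while the pairwise weights let the morning context select a2 instead.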