How can I check whether my SARSA implementation works?


'''

Q-Table:
State (0, 0, 0): Action (0, 0, 0) -> Q-value: -1377.5596445225826
State (0, 0, 1): Action (0, 0, 0) -> Q-value: -1620.7411224452567
State (0, 0, 2): Action (0, 0, 0) -> Q-value: -1881.045694513057
State (0, 1, 0): Action (0, 0, 0) -> Q-value: -1579.1465355982302
State (0, 1, 1): Action (0, 0, 0) -> Q-value: -1699.6759285258445
State (0, 1, 2): Action (0, 0, 0) -> Q-value: -2104.5898565915318
State (0, 2, 0): Action (0, 0, 0) -> Q-value: -1856.8502992048207
State (0, 2, 1): Action (0, 0, 0) -> Q-value: -2068.830276800621
State (0, 2, 2): Action (0, 0, 0) -> Q-value: -2206.122875324394
State (1, 0, 0): Action (0, 0, 0) -> Q-value: -1718.8354207486213
State (1, 0, 1): Action (0, 0, 0) -> Q-value: -1727.7444050562985
State (1, 0, 2): Action (0, 0, 0) -> Q-value: -2080.197493694121
State (1, 1, 0): Action (0, 0, 0) -> Q-value: -1714.7621382844368
State (1, 1, 1): Action (0, 0, 0) -> Q-value: -1887.2431867464586
State (1, 1, 2): Action (0, 0, 0) -> Q-value: -2179.855795713629
State (1, 2, 0): Action (0, 0, 0) -> Q-value: -2038.1099185621583
State (1, 2, 1): Action (0, 0, 0) -> Q-value: -2132.773798656756
State (1, 2, 2): Action (0, 0, 0) -> Q-value: -2375.298297785684
State (2, 0, 0): Action (0, 0, 0) -> Q-value: -1757.60532412902
State (2, 0, 1): Action (0, 0, 0) -> Q-value: -1869.788976094703
State (2, 0, 2): Action (0, 0, 0) -> Q-value: -1981.0670056182348
State (2, 1, 0): Action (0, 0, 0) -> Q-value: -2037.9880448028086
State (2, 1, 1): Action (0, 0, 0) -> Q-value: -2197.268459819914
State (2, 1, 2): Action (0, 0, 0) -> Q-value: -2430.4885088977794
State (2, 2, 0): Action (0, 0, 0) -> Q-value: -2287.1957351819765
State (2, 2, 1): Action (0, 0, 0) -> Q-value: -2158.3291270694795
State (2, 2, 2): Action (0, 0, 0) -> Q-value: -2339.4016833308183

'''

This is the output of my Q-table. Looking at the action values (every state lists only action (0, 0, 0)), it seems my code isn't working. How can I make sure? Please suggest an approach; I am a beginner in reinforcement learning.
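One way to narrow this down is to test the update rule in isolation against a value worked out by hand, and to check which actions actually have entries in the table. The snippet below is only a minimal sketch, not the code that produced the output above: it assumes the Q-table is a dictionary keyed by `(state, action)` pairs, and the names `sarsa_update` and `actions_per_state`, the toy states `"s0"`/`"s1"`, and the values of `alpha` and `gamma` are all hypothetical.

```python
from collections import defaultdict


def sarsa_update(q, state, action, reward, next_state, next_action,
                 alpha=0.1, gamma=0.9):
    """Apply one tabular SARSA step and return the updated Q(s, a).

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * Q(s', a') - Q(s, a))
    """
    td_target = reward + gamma * q[(next_state, next_action)]
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


def actions_per_state(q):
    """Map each state to the set of actions that have an entry in q."""
    seen = defaultdict(set)
    for state, action in q:
        seen[state].add(action)
    return dict(seen)


# Hand-checkable case (assumed toy values, not taken from the question):
# Q(s0, a0) starts at 0, Q(s1, a1) = 2, reward = 1, alpha = 0.5, gamma = 0.9.
# By hand: 0 + 0.5 * (1 + 0.9 * 2 - 0) = 1.4
q = defaultdict(float)
q[("s1", "a1")] = 2.0
new_value = sarsa_update(q, "s0", "a0", reward=1.0,
                         next_state="s1", next_action="a1",
                         alpha=0.5, gamma=0.9)

# If a state only ever maps to a single action here, the agent either never
# explored the other actions or the table is being printed for one action only.
coverage = actions_per_state(q)
print(new_value, coverage)
```

If the hand-computed update matches but the real table still contains a single action per state, the exploration step (for example the epsilon-greedy choice) or the code that prints the table would be the next places to look. Plotting the total reward per episode is another common check, since it should trend upward if learning is happening.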
