Maximum entropy inverse reinforcement learning theorem


I'm trying to make sense of one step in the proof of Brian Ziebart's MaxEnt theorem (Theorem 6.10), given in Appendix A.3 on page 190.

The equation $P(\pi) = \prod_{(A,S)} P_{\text{MaxEnt}}(A,S)^{\pi(A,S)}$, where the product runs over all trajectories $(A, S)$, is the part I find difficult.
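
To unpack it a little, here is my own rewriting (not a step taken from the thesis). I'm assuming $\pi(A,S)$ is a probability distribution over trajectories, and I write $H$ for entropy and $D_{\mathrm{KL}}$ for KL divergence. Taking logarithms turns the exponents into weights:

$$
\log P(\pi) = \sum_{(A,S)} \pi(A,S)\,\log P_{\text{MaxEnt}}(A,S) = -H(\pi) - D_{\mathrm{KL}}\big(\pi \,\|\, P_{\text{MaxEnt}}\big)
$$

So, if I have this right, the product is a geometric mean of the MaxEnt trajectory probabilities weighted by $\pi$, which is the exponential of the negative cross-entropy between $\pi$ and $P_{\text{MaxEnt}}$.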

Why are the maximum entropy distribution's probabilities raised to the power of the policy probabilities?
