Performance difference between TruncNormal and TanhNormal

Hey @danijar. 

I just noticed that the code uses `TruncNormal` as the actor distribution instead of the `TanhNormal` used in v1. Did you run ablations comparing the two and find that `TruncNormal` gives better results? Or was the change made mainly because the entropy of `TruncNormal` is easier to compute than that of `TanhNormal`, which matters for the entropy regularizer?
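For context on the second possibility, here is a minimal sketch in plain Python. It is not taken from the DreamerV2 code, and all function names are hypothetical. It assumes a 1-D action bounded to [-1, 1]. A normal truncated to that interval has a closed-form entropy. A tanh-squashed normal's entropy, by the change-of-variables formula H[tanh(x)] = H[x] + E[log(1 - tanh(x)^2)], contains an expectation with no closed form, so it is typically estimated from samples.

```python
import math
import random


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def gaussian_entropy(sigma):
    # Differential entropy of N(mu, sigma^2); independent of mu.
    return 0.5 * math.log(2.0 * math.pi * math.e * sigma ** 2)


def trunc_normal_entropy(mu, sigma, low=-1.0, high=1.0):
    """Closed-form entropy of N(mu, sigma^2) truncated to [low, high] (hypothetical helper)."""
    alpha = (low - mu) / sigma
    beta = (high - mu) / sigma
    z = normal_cdf(beta) - normal_cdf(alpha)  # probability mass kept by the truncation
    return (math.log(math.sqrt(2.0 * math.pi * math.e) * sigma * z)
            + (alpha * normal_pdf(alpha) - beta * normal_pdf(beta)) / (2.0 * z))


def log_sech_sq(x):
    # log(1 - tanh(x)^2) = 2 * log(sech(x)), written to stay stable for large |x|.
    a = abs(x)
    return 2.0 * (math.log(2.0) - a - math.log1p(math.exp(-2.0 * a)))


def tanh_normal_entropy_mc(mu, sigma, num_samples=10000, seed=0):
    """Monte Carlo estimate of the entropy of tanh(x), x ~ N(mu, sigma^2) (hypothetical helper)."""
    rng = random.Random(seed)
    # Change of variables: H[tanh(x)] = H[x] + E[log |d tanh(x)/dx|].
    total = sum(log_sech_sq(rng.gauss(mu, sigma)) for _ in range(num_samples))
    return gaussian_entropy(sigma) + total / num_samples


if __name__ == "__main__":
    print("TruncNormal entropy (exact):", trunc_normal_entropy(0.0, 1.0))
    print("TanhNormal entropy (MC estimate):", tanh_normal_entropy_mc(0.0, 1.0))
```

The truncated-normal value is exact and cheap to differentiate. The tanh-normal value is a noisy estimate whose variance depends on `num_samples`, which is one plausible reason to prefer `TruncNormal` when the entropy enters the loss directly.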