One second of actions encodes to >500 tokens
Hi, one second of actions in my setup has shape 60x44 (freq x action_dim). When I use your universal tokenizer, and also when I retrain the tokenizer and use it, I get huge outputs: for example, encoding one second of actions yields 517 tokens.
It doesn't seem reasonable to me to make the vocab larger (currently 1024, the default in the code), as I have only ~1000 samples.
(I've normalized the actions and followed your steps.)
I'd be happy for any advice.
Thanks!
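For context, the raw numbers behind the report (just arithmetic on the shapes above; the 517-token count is from my run):

```python
# One chunk is 60 timesteps x 44 dims = 2640 raw action values.
scalars_per_chunk = 60 * 44
tokens_per_chunk = 517
print(scalars_per_chunk / tokens_per_chunk)  # ~5.1 raw action values per token
```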
You can check whether increasing the vocab helps, or try decreasing "scale", which essentially makes the compression more lossy. You can also try training with half-second chunks to make things more practical.
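For example, the half-second chunking is just a reshape of your (already normalized) data; the vocab and "scale" knobs then go into the retraining step. The exact parameter names depend on the tokenizer's fit/constructor signature, so treat the comments below as a sketch rather than the actual API:

```python
import numpy as np

# Placeholder for your ~1000 normalized one-second chunks: (num_samples, 60, 44).
action_data = np.random.uniform(-1.0, 1.0, size=(1000, 60, 44))

# Half-second chunks: each tokenized chunk now covers 30 timesteps instead of 60,
# which shortens the per-chunk encoding and doubles the number of samples
# available for retraining the tokenizer.
half_second_chunks = action_data.reshape(-1, 30, 44)
print(half_second_chunks.shape)  # (2000, 30, 44)

# When retraining on these chunks, the other two knobs are the vocab size
# (try something above the 1024 default) and "scale" (try a lower value:
# coarser quantization, lossier but shorter encodings).
```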
It's possible that for very high-dim robots like yours it's worth trying neural compression again to see whether it gives a better tradeoff (we only tried up to 16-dim).
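If you want to experiment with that, here is a generic sketch (not code from this repo) of a small vector-quantized autoencoder over half-second chunks; all the sizes (8 codes per chunk, a 1024-entry codebook, 64-dim latents) are illustrative assumptions you'd tune against reconstruction error on your ~1000 samples:

```python
import torch
import torch.nn as nn

FREQ, DIM = 30, 44                        # half-second chunk: 30 timesteps x 44 dims
N_CODES, CODEBOOK, LATENT = 8, 1024, 64   # illustrative sizes, not tuned

class ActionVQAutoencoder(nn.Module):
    """Compress an action chunk into N_CODES discrete tokens via a VQ bottleneck."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(FREQ * DIM, 512), nn.ReLU(),
            nn.Linear(512, N_CODES * LATENT),
        )
        self.codebook = nn.Embedding(CODEBOOK, LATENT)
        self.decoder = nn.Sequential(
            nn.Linear(N_CODES * LATENT, 512), nn.ReLU(),
            nn.Linear(512, FREQ * DIM),
        )

    def forward(self, actions):
        b = actions.shape[0]
        z = self.encoder(actions.reshape(b, -1)).reshape(b, N_CODES, LATENT)
        # Nearest codebook entry per latent slot -> the discrete action tokens.
        dists = (z.unsqueeze(-2) - self.codebook.weight).pow(2).sum(-1)
        tokens = dists.argmin(dim=-1)                       # (b, N_CODES)
        zq = self.codebook(tokens)
        # Standard VQ losses (codebook + commitment) and straight-through estimator.
        vq_loss = ((zq - z.detach()) ** 2).mean() + 0.25 * ((z - zq.detach()) ** 2).mean()
        zq = z + (zq - z).detach()
        recon = self.decoder(zq.reshape(b, -1)).reshape(actions.shape)
        return recon, tokens, vq_loss

model = ActionVQAutoencoder()
chunk = torch.randn(4, FREQ, DIM)                # stand-in for normalized action chunks
recon, tokens, vq_loss = model(chunk)
loss = ((recon - chunk) ** 2).mean() + vq_loss   # train on this
print(tokens.shape)                              # (4, 8): 8 tokens per half-second chunk
```

With something like this, one second of actions would be two chunks, i.e. 16 tokens instead of ~500, at whatever reconstruction error the autoencoder reaches on your data.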