How to deal with multi-token units in embeddings?

#1
by YameiW - opened

Hello there,

I am working on word embeddings and was wondering whether there is a way to obtain a single vector for a multi-token unit in Chinese. For instance, how can I get one vector for the Chinese word "公斤" (kilogram), rather than a separate vector for each of its two characters?

[Attached screenshot: Screen Shot 2022-09-28 at 12.51.30 PM.png]
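One common way to get a single vector for such a unit is to average (mean-pool) the vectors of its tokens. The sketch below only illustrates that idea and is not a method confirmed by the model authors. It assumes the tokenizer splits Chinese text into single characters, and it uses made-up 3-dimensional toy vectors in place of real model output. The helper names `mean_pool`, `find_span`, and `unit_vector` are hypothetical. With a real model, the per-token vectors would typically come from a hidden layer, and the token positions could be taken from the tokenizer's offset mapping instead of string matching.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def mean_pool(vectors: Sequence[Sequence[float]]) -> Vector:
    """Average equally sized vectors element-wise."""
    if not vectors:
        raise ValueError("need at least one vector to pool")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same dimension")
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def find_span(tokens: Sequence[str], unit_tokens: Sequence[str]) -> Tuple[int, int]:
    """Return (start, end) of the first occurrence of unit_tokens, end exclusive."""
    n = len(unit_tokens)
    for start in range(len(tokens) - n + 1):
        if list(tokens[start:start + n]) == list(unit_tokens):
            return start, start + n
    raise ValueError("unit not found in token sequence")


def unit_vector(tokens: Sequence[str],
                vectors: Sequence[Sequence[float]],
                unit: str) -> Vector:
    """Pool the vectors of the tokens that make up `unit` into one vector."""
    # Assumption: one token per character, so the unit splits into its characters.
    start, end = find_span(tokens, list(unit))
    return mean_pool(vectors[start:end])


# Toy data: hand-written vectors, NOT real embeddings.
tokens = ["五", "公", "斤", "米"]
vectors = [
    [0.0, 1.0, 2.0],  # 五
    [1.0, 2.0, 3.0],  # 公
    [3.0, 4.0, 5.0],  # 斤
    [9.0, 9.0, 9.0],  # 米
]

gongjin = unit_vector(tokens, vectors, "公斤")
print(gongjin)  # [2.0, 3.0, 4.0]
```

Mean pooling is only one choice. Taking the first token's vector or summing the token vectors are alternatives, and which works best depends on the downstream task.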
