Text/vision parameter split

by AlexThompson - opened Nov 15, 2024

Nov 15, 2024

What is the parameter split between the vision part and the text part?

Nexa AI org Nov 15, 2024

We use a CLIP encoder with around 400M params for the vision part, and Qwen-2.5-0.5B with 0.5B params for the text part.
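For a rough sense of the proportions, the two figures above can be plugged into a short calculation. This is only a sketch using the rounded numbers from the reply (about 400M and 0.5B). The helper `param_split` is a hypothetical name made up for illustration, and the total ignores any other components, such as a projection layer, which the reply does not mention.

```python
def param_split(vision_params: float, text_params: float) -> dict:
    """Return the combined size and each part's share of it.

    Hypothetical helper for illustration; not part of any library.
    """
    total = vision_params + text_params
    return {
        "total": total,
        "vision_share": vision_params / total,
        "text_share": text_params / total,
    }


# Rounded figures quoted in the reply: ~400M (vision) and 0.5B (text).
split = param_split(vision_params=400e6, text_params=0.5e9)

print(f"total: about {split['total'] / 1e9:.1f}B params")
print(f"vision share: about {split['vision_share']:.1%}")
print(f"text share: about {split['text_share']:.1%}")
```

With these rounded inputs the vision encoder comes out at a bit under half of the combined count, so the split is close to even rather than dominated by the language model.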
