Image and Text features
#3
by
praff1234
- opened
Hello, could you please give a simple example of obtaining text features and image features? Only an image feature example has been added.
Doing it with MLX was kind of the point. I'm digging around in the repo, and I think you can get the tokenizer from transformers:
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained(
    "apple/aimv2-large-patch14-224-lit",
)
Then just find the model in the library code and start looking around for where the input_ids go in. That should be what you're calling in .venv/lib/python3.10/site-packages/aim/v2/mlx/models.py
It's this call:
class AIMv2LiT(nn.Module):
    ...
    def encode_text(
        self,
        input_ids: mx.array,
        mask: Optional[mx.array] = None,
        output_features: bool = False,
    ) -> Union[mx.array, Tuple[mx.array, Tuple[mx.array, ...]]]:
        out = self.text_encoder(input_ids, mask=mask, output_features=output_features)
        out = self.text_projector(out)
        return out
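Untested against the real model, but here is a rough sketch of how the processor output could be wired into encode_text. The helper name `text_features` and its `to_array` parameter are made up for illustration. It also assumes the processor accepts text-only input with `padding=True` and `return_tensors="np"`, the way Hugging Face tokenizers do, which I have not verified for this checkpoint.

```python
import numpy as np


def text_features(model, processor, texts, to_array=np.asarray):
    """Tokenize `texts` and feed the token ids to `model.encode_text`.

    `text_features` and `to_array` are hypothetical names for this sketch.
    For the MLX model, `to_array` would presumably be `mx.array`
    (from `import mlx.core as mx`).
    """
    # Assumption: the processor accepts text-only input and can return
    # NumPy arrays, as Hugging Face tokenizers do with return_tensors="np".
    batch = processor(text=texts, padding=True, return_tensors="np")
    # Convert the ids to whatever array type the model expects.
    input_ids = to_array(batch["input_ids"])
    # `mask` is left at its default of None; whether padded batches need
    # an explicit mask is not something this sketch settles.
    return model.encode_text(input_ids)
```

With the real objects the call would presumably look like `text_features(model, processor, ["a photo of a cat"], to_array=mx.array)`, where `model` is an AIMv2LiT instance already loaded in MLX.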
I'm gonna give up here for now, since this is a sidetrack from something else. Please ping me if you find a way to test it out in MLX!