jfang/mars-vit-base-ctx2m

jichao committed 11 days ago

Commit 4725c4d · 1 Parent(s): 6d85953

README

Files changed (1)

README.md +15 -2

@@ -17,7 +17,7 @@ Model Card for Mars ViT Base Model
 - Dataset: 2 million CTX images
 ## Usage Examples
-### Using timm
 First download checkpoint-1199.pth (backbone only)
@@ -34,6 +34,16 @@ model = timm.create_model(
 )
 model.eval()
 x = torch.randn(1, 1, 224, 224)
 with torch.no_grad():
     features = model.forward_features(x)  # shape [1, tokens, embed_dim]
@@ -54,9 +64,12 @@ image_processor = AutoImageProcessor.from_pretrained("jfang/mars-vit-base-ctx2m"
 from PIL import Image
 image = Image.open("some_image.png").convert("L")  # 1-channel
 inputs = image_processor(image, return_tensors="pt")
 outputs = model(**inputs)
 ```
 ### Limitations
 The model is trained specifically on CTX images and may not generalize well to other types of images without further fine-tuning.

 - Dataset: 2 million CTX images
 ## Usage Examples
+### Using timm (suggested now)
 First download checkpoint-1199.pth (backbone only)
 )
 model.eval()
+# For real images: convert to a single channel, resize to 224x224, and normalize
+# transform example:
+# transform = transforms.Compose([
+#     transforms.ToTensor(),
+#     transforms.Resize((224, 224)),
+#     transforms.Grayscale(num_output_channels=1),
+#     transforms.Normalize(mean=[0.5], std=[0.5])
+# ])
 x = torch.randn(1, 1, 224, 224)
 with torch.no_grad():
     features = model.forward_features(x)  # shape [1, tokens, embed_dim]
 from PIL import Image
 image = Image.open("some_image.png").convert("L")  # 1-channel
 inputs = image_processor(image, return_tensors="pt")
 outputs = model(**inputs)
 ```
+## MAE reconstruction
+Under the ./mae folder, there is a full encoder-decoder MAE model and a notebook for visualization.
 ### Limitations
 The model is trained specifically on CTX images and may not generalize well to other types of images without further fine-tuning.
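
The commented transform in the timm example (single channel, 224x224, normalize with mean 0.5 and std 0.5) can be sketched without torchvision as below. This is a minimal illustration, not the author's pipeline: the helper name `preprocess_ctx_image`, the bilinear resampling choice, and the synthetic stand-in image are all assumptions made for the example.

```python
import numpy as np
from PIL import Image


def preprocess_ctx_image(image, size=224, mean=0.5, std=0.5):
    """Hypothetical helper: return a float32 array of shape [1, 1, size, size]."""
    gray = image.convert("L")                          # force a single channel
    gray = gray.resize((size, size), Image.BILINEAR)   # assumed resampling filter
    arr = np.asarray(gray, dtype=np.float32) / 255.0   # scale to [0, 1], like ToTensor
    arr = (arr - mean) / std                           # normalize to roughly [-1, 1]
    return arr[None, None, :, :]                       # add batch and channel dims


# Synthetic stand-in for a CTX tile (random pixels, not real data).
rng = np.random.default_rng(0)
demo = Image.fromarray(rng.integers(0, 256, size=(300, 400), dtype=np.uint8))

batch = preprocess_ctx_image(demo)
print(batch.shape, batch.dtype)
```

The resulting array has the same layout as the `torch.randn(1, 1, 224, 224)` placeholder in the example, so wrapping it with `torch.from_numpy(batch)` should give a tensor that can be passed to `model.forward_features`.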