jichao commited on
Commit
4725c4d
·
1 Parent(s): 6d85953
Files changed (1) hide show
  1. README.md +15 -2
README.md CHANGED
@@ -17,7 +17,7 @@ Model Card for Mars ViT Base Model
17
  - Dataset: 2 million CTX images
18
 
19
  ## Usage Examples
20
- ### Using timm
21
 
22
  First download checkpoint-1199.pth (backbone only)
23
 
@@ -34,6 +34,16 @@ model = timm.create_model(
34
  )
35
 
36
  model.eval()
 
 
 
 
 
 
 
 
 
 
37
  x = torch.randn(1, 1, 224, 224)
38
  with torch.no_grad():
39
  features = model.forward_features(x) # shape [1, tokens, embed_dim]
@@ -54,9 +64,12 @@ image_processor = AutoImageProcessor.from_pretrained("jfang/mars-vit-base-ctx2m"
54
  from PIL import Image
55
  image = Image.open("some_image.png").convert("L") # 1-channel
56
  inputs = image_processor(image, return_tensors="pt")
 
 
57
  outputs = model(**inputs)
58
  ```
59
-
 
60
 
61
  ### Limitations
62
  The model is trained specifically on CTX images and may not generalize well to other types of images without further fine-tuning.
 
17
  - Dataset: 2 million CTX images
18
 
19
  ## Usage Examples
20
+ ### Using timm (suggested now)
21
 
22
  First download checkpoint-1199.pth (backbone only)
23
 
 
34
  )
35
 
36
  model.eval()
37
+
38
+ # for images, need to convert to single channel, 224, and normalize
39
+
40
+ # transform example:
41
+ # transform = transforms.Compose([
42
+ # transforms.ToTensor(),
43
+ # transforms.Resize((224, 224)),
44
+ # transforms.Grayscale(num_output_channels=1),
45
+ # transforms.Normalize(mean=[0.5], std=[0.5])
46
+ # ])
47
  x = torch.randn(1, 1, 224, 224)
48
  with torch.no_grad():
49
  features = model.forward_features(x) # shape [1, tokens, embed_dim]
 
64
  from PIL import Image
65
  image = Image.open("some_image.png").convert("L") # 1-channel
66
  inputs = image_processor(image, return_tensors="pt")
67
+
68
+
69
  outputs = model(**inputs)
70
  ```
71
+ ## MAE reconstruction
72
+ Under ./mae folder, there is full encoder-decoder MAE model and a notebook for visualization.
73
 
74
  ### Limitations
75
  The model is trained specifically on CTX images and may not generalize well to other types of images without further fine-tuning.