Is A Picture Worth A Thousand Words? Delving Into Spatial Reasoning for Vision Language Models: https://arxiv.org/pdf/2406.14852
Jiayu (Mila) Wang
MilaWang
AI & ML interests
Large Language Model, Multimodal Large Language Model, Reasoning, Efficient Machine Learning System
Organizations
None yet
Collections (2)
Model checkpoints and datasets used in the paper Grammar-Aligned Decoding: https://arxiv.org/abs/2405.21047
MilaWang/Mistral-7B-Instruct-v0.2-gad-cp-merged • Text Generation • Updated • 8
MilaWang/Mistral-7B-Instruct-v0.2-gad-bv4nogram0-merged • Text Generation • Updated • 4
MilaWang/Mistral-7B-Instruct-v0.2-gad-slianogram3-merged • Text Generation • Updated • 4
MilaWang/Mistral-7B-Instruct-v0.2-gad-cp8-merged • Text Generation • Updated • 4
Papers (2)
Models (6)
MilaWang/Mistral-7B-Instruct-v0.2-gad-slianogram0-merged • Text Generation • Updated • 6
MilaWang/Mistral-7B-Instruct-v0.2-gad-bv4nogram0-merged • Text Generation • Updated • 4
MilaWang/Mistral-7B-Instruct-v0.2-gad-slianogram3-merged • Text Generation • Updated • 4
MilaWang/Mistral-7B-Instruct-v0.2-gad-bv4nogram3-merged • Text Generation • Updated • 4
MilaWang/Mistral-7B-Instruct-v0.2-gad-cp-merged • Text Generation • Updated • 8
MilaWang/Mistral-7B-Instruct-v0.2-gad-cp8-merged • Text Generation • Updated • 4
Datasets (7)
MilaWang/SpatialEval • Viewer • Updated • 13.9k • 168 • 2
MilaWang/gad-slia-no-grammar-0shots • Viewer • Updated • 81 • 63
MilaWang/gad-bv4-no-grammar-0shots • Viewer • Updated • 139 • 46
MilaWang/gad-cp • Viewer • Updated • 2.42k • 44
MilaWang/gad-cp-8shots • Viewer • Updated • 2.42k • 50
MilaWang/gad-slia-no-grammar-3shots • Viewer • Updated • 81 • 36
MilaWang/gad-bv4-no-grammar-3shots • Viewer • Updated • 139 • 50