AI & ML interests
VLMs and long context, document processing and understanding, confidence, calibration, alignment, and decision making.
Papers
GutenOCR: A Grounded Vision-Language Front-End for Documents
PubMed-OCR: PMC Open Access OCR Annotations
A collection of RICO screenshot-based datasets for training and evaluation. We've attempted to compile all surrounding metadata for the relevant tasks.
rootsautomation/RICO-ScreenQA • 86k rows • 1.04k downloads • 11 likes
rootsautomation/RICO-WidgetCaptioning • 48.3k rows • 471 downloads • 11 likes
rootsautomation/RICO-SCA • 71.4k rows • 229 downloads • 9 likes
rootsautomation/RICO-ScreenQA-Short • 86k rows • 264 downloads • 4 likes
Data and models for optical character recognition.
PubMed-OCR: PMC Open Access OCR Annotations • Paper 2601.11425 • 12 upvotes
GutenOCR: A Grounded Vision-Language Front-End for Documents • Paper 2601.14490 • 37 upvotes
rootsautomation/TABMEpp • 122k rows • 95 downloads • 5 likes
rootsautomation/pubmed-ocr • 1.55M rows • 1.06k downloads • 70 likes