BigScience Workshop

non-profit

https://bigscience.huggingface.co

bigscience-workshop

AI & ML interests

A year-long research workshop on large language models: the Summer of Language Models 21 🌸

Recent Activity

authored 2 papers 4 days ago

AnyMo: Geometry-Aware Setup-Agnostic Modeling of Human Motion in the Wild

Paper • 2605.22715 • Published 10 days ago • 4

TrajPrism: A Multi-Task Benchmark for Language-Grounded Urban Trajectory Understanding

Paper • 2605.10782 • Published 20 days ago

authored a paper 19 days ago

TrajDLM: Topology-Aware Block Diffusion Language Model for Trajectory Generation

Paper • 2605.10020 • Published 20 days ago • 2

shubhamagarwal92

authored 11 papers about 1 month ago

BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks

Paper • 2412.04626 • Published Dec 5, 2024 • 14

LitLLM: A Toolkit for Scientific Literature Review

Paper • 2402.01788 • Published Mar 21, 2025

History for Visual Dialog: Do we really need it?

Paper • 2005.07493 • Published May 8, 2020

Chitrarth: Bridging Vision and Language for a Billion People

Paper • 2502.15392 • Published Feb 21, 2025

LitLLMs, LLMs for Literature Review: Are we there yet?

Paper • 2412.15249 • Published Dec 15, 2024 • 2

IndicVisionBench: Benchmarking Cultural and Multilingual Understanding in VLMs

Paper • 2511.04727 • Published Nov 6, 2025

VoiceAgentBench: Are Voice Assistants ready for agentic tasks?

Paper • 2510.07978 • Published Oct 9, 2025

Seeing Straight: Document Orientation Detection for Efficient OCR

Paper • 2511.04161 • Published Nov 6, 2025

Designing Production-Scale OCR for India: Multilingual and Domain-Specific Systems

Paper • 2602.16430 • Published Feb 18

Chitranuvad: Adapting Multi-Lingual LLMs for Multimodal Translation

Paper • 2502.20420 • Published Feb 27, 2025

MUTANT: A Recipe for Multilingual Tokenizer Design

Paper • 2511.03237 • Published Mar 22

authored a paper 2 months ago

Gained in Translation: Privileged Pairwise Judges Enhance Multilingual Reasoning

Paper • 2601.18722 • Published Jan 26

authored 3 papers 3 months ago

Agentic Uncertainty Reveals Agentic Overconfidence

Paper • 2602.06948 • Published Feb 6

Complex Query Answering with Neural Link Predictors

Paper • 2011.03459 • Published Nov 6, 2020

Rethinking the Harmonic Loss via Non-Euclidean Distance Layers

Paper • 2603.10225 • Published Mar 10

authored a paper 3 months ago

RbtAct: Rebuttal as Supervision for Actionable Review Feedback Generation

Paper • 2603.09723 • Published Mar 10 • 7