Bridging the Data Provenance Gap Across Text, Speech and Video Paper • 2412.17847 • Published Dec 19, 2024 • 11
Saturation-Driven Dataset Generation for LLM Mathematical Reasoning in the TPTP Ecosystem Paper • 2509.06809 • Published Sep 8, 2025 • 3
Reasoning Core: A Scalable RL Environment for LLM Symbolic Reasoning Paper • 2509.18083 • Published Sep 22, 2025 • 5
MortalMATH: Evaluating the Conflict Between Reasoning Objectives and Emergency Contexts Paper • 2601.18790 • Published Jan 26 • 2
Adaptive Text Anonymization: Learning Privacy-Utility Trade-offs via Prompt Optimization Paper • 2602.20743 • Published 8 days ago • 2
Reasoning Core: A Scalable Procedural Data Generation Suite for Symbolic Pre-training and Post-Training Paper • 2603.02208 • Published 1 day ago • 4
TAROT: Task-Oriented Authorship Obfuscation Using Policy Optimization Methods Paper • 2407.21630 • Published Jul 31, 2024 • 8
Consent in Crisis: The Rapid Decline of the AI Data Commons Paper • 2407.14933 • Published Jul 20, 2024 • 14
Generating multiple-choice questions for medical question answering with distractors and cue-masking Paper • 2303.07069 • Published Mar 13, 2023
Attention Overflow: Language Model Input Blur during Long-Context Missing Items Recommendation Paper • 2407.13481 • Published Jul 18, 2024 • 10
Mining Discourse Markers for Unsupervised Sentence Representation Learning Paper • 1903.11850 • Published Mar 28, 2019
tasksource: Structured Dataset Preprocessing Annotations for Frictionless Extreme Multi-Task Learning and Evaluation Paper • 2301.05948 • Published Jan 14, 2023 • 3
MindGames: Targeting Theory of Mind in Large Language Models with Dynamic Epistemic Modal Logic Paper • 2305.03353 • Published May 5, 2023
Probing neural language models for understanding of words of estimative probability Paper • 2211.03358 • Published Nov 7, 2022 • 1
The Data Provenance Initiative: A Large Scale Audit of Dataset Licensing & Attribution in AI Paper • 2310.16787 • Published Oct 25, 2023 • 5
Scaling Synthetic Logical Reasoning Datasets with Context-Sensitive Declarative Grammars Paper • 2406.11035 • Published Jun 16, 2024 • 1