Self-Training for Sample-Efficient Active Learning for Text Classification with Pre-Trained Language Models Paper • 2406.09206 • Published Jun 13 • 1
OpenCulture Collection A multilingual dataset of public domain books and newspapers. • 27 items • Updated Nov 6 • 121
EU20-Benchmarks Collection Evaluation Benchmarks for 20 European languages. • 5 items • Updated Oct 11 • 7
Multi-Layer Transformers Gradient Can be Approximated in Almost Linear Time Paper • 2408.13233 • Published Aug 23 • 21
Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies Paper • 2407.13623 • Published Jul 18 • 53
RETVec: Resilient and Efficient Text Vectorizer Paper • 2302.09207 • Published Feb 18, 2023 • 3
LLM-jp: A Cross-organizational Project for the Research and Development of Fully Open Japanese LLMs Paper • 2407.03963 • Published Jul 4 • 15
AnchorAL: Computationally Efficient Active Learning for Large and Imbalanced Datasets Paper • 2404.05623 • Published Apr 8 • 3
🎧AI Podcasts and Talks! Collection 🤗Cool stuff to listen to at any time! • 10 items • Updated Oct 6, 2023 • 5
Small-Text: Active Learning for Text Classification in Python Paper • 2107.10314 • Published Jul 21, 2021 • 1