Post
Web Rephrase Augmented Pre-training (WRAP) improves language model training efficiency by rephrasing noisy web documents into cleaner, higher-quality styles.
Key aspects:
* Uses an instruction-tuned model to rephrase web content into styles such as Wikipedia or Q/A, creating a blend of synthetic and real data for training (a minimal sketch follows this list).
* Improves perplexity by more than 10% and zero-shot question-answering accuracy by more than 2%.
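
The recipe itself is easy to sketch: prompt an instruction-tuned model with a style-specific template, then mix the rephrased outputs back in with the raw web text. Below is a minimal Python illustration using the Hugging Face `transformers` pipeline; the model choice, prompt templates, and mixing step are assumptions for illustration, not the paper's exact configuration.

```python
# Minimal sketch of WRAP-style rephrasing. The model name and prompts
# below are illustrative assumptions, not the paper's exact setup.
from transformers import pipeline

# Any capable instruction-tuned chat model works here (assumed choice).
rephraser = pipeline(
    "text-generation",
    model="mistralai/Mistral-7B-Instruct-v0.2",
    device_map="auto",
)

# Style-specific prompt templates (illustrative, not the paper's wording).
STYLE_PROMPTS = {
    "wikipedia": (
        "Rephrase the following text in a high-quality, encyclopedic "
        "style like Wikipedia:\n\n{doc}"
    ),
    "qa": (
        "Convert the following text into a question-and-answer "
        "format:\n\n{doc}"
    ),
}

def rephrase(doc: str, style: str = "wikipedia") -> str:
    """Rephrase one web document into the requested style."""
    messages = [
        {"role": "user", "content": STYLE_PROMPTS[style].format(doc=doc)}
    ]
    out = rephraser(messages, max_new_tokens=512, do_sample=False)
    # With chat-style input, generated_text holds the full conversation;
    # the last message is the model's rephrased output.
    return out[0]["generated_text"][-1]["content"]

# Training data is then a blend of real and rephrased documents.
web_docs = ["..."]  # raw web text (placeholder)
synthetic = [rephrase(d) for d in web_docs]
training_mix = web_docs + synthetic
```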
Congrats to the authors for their work!
Paper: Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling (2401.16380)