Post
Web Rephrase Augmented Pre-training (WRAP) improves language model training efficiency by rephrasing noisy web documents into cleaner, higher-quality styles.
Key aspects:
* Uses an instruction-tuned model to rephrase web content into styles such as Wikipedia or Q/A, creating a blend of synthetic and real data for training (a minimal sketch follows this list).
* Improves perplexity by more than 10% and zero-shot question-answering accuracy by more than 2%.
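
The recipe itself is easy to sketch: prompt an instruction-tuned model with a style-specific template, then mix the rephrased outputs back in with the raw web text. Below is a minimal Python illustration using the Hugging Face `transformers` pipeline; the model choice, prompt templates, and mixing step are assumptions for illustration, not the paper's exact configuration.

```python
# Minimal sketch of WRAP-style rephrasing. The model name and prompts
# below are illustrative assumptions, not the paper's exact setup.
from transformers import pipeline

# Any capable instruction-tuned chat model works here (assumed choice).
rephraser = pipeline(
    "text-generation",
    model="mistralai/Mistral-7B-Instruct-v0.2",
    device_map="auto",
)

# Style-specific prompt templates (illustrative, not the paper's wording).
STYLE_PROMPTS = {
    "wikipedia": (
        "Rephrase the following text in a high-quality, encyclopedic "
        "style like Wikipedia:\n\n{doc}"
    ),
    "qa": (
        "Convert the following text into a question-and-answer "
        "format:\n\n{doc}"
    ),
}

def rephrase(doc: str, style: str = "wikipedia") -> str:
    """Rephrase one web document into the requested style."""
    messages = [
        {"role": "user", "content": STYLE_PROMPTS[style].format(doc=doc)}
    ]
    out = rephraser(messages, max_new_tokens=512, do_sample=False)
    # With chat-style input, generated_text holds the full conversation;
    # the last message is the model's rephrased output.
    return out[0]["generated_text"][-1]["content"]

# Training data is then a blend of real and rephrased documents.
web_docs = ["..."]  # raw web text (placeholder)
synthetic = [rephrase(d) for d in web_docs]
training_mix = web_docs + synthetic
```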
Congrats to the authors for their work!
Paper: Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling (2401.16380)