Training data

#2
by glnmario - opened

What data is this model trained on?

The Pile, iirc (check the original repo for details).

Since it was only trained on 300B tokens from part of the Pile, is it known which specific datasets were used (e.g. Wikipedia, GitHub...) and which ones were left out?
