About the dataset used for training

by JavierCastellD - opened Jan 21

Jan 21

In your description of the dataset used to train the model, you state that it includes web-sourced content and publicly available documents. Does this dataset contain potentially copyrighted material? If not, what steps did you take to prevent it?

jsaizant

Language Technologies Unit @ Barcelona Supercomputing Center org Jan 21

Hi @JavierCastellD! We will soon publish a technical report where you can find more detailed information about how the training data was processed.

jsaizant changed discussion status to closed Jan 21

JavierCastellD

Jan 21

Thanks! Great job and I'm looking forward to it.
