About the dataset used for training
#3, opened by JavierCastellD
Your description of the dataset used to train the model says that it includes web-sourced content and publicly available documents. Does this dataset contain potentially copyrighted material? If not, what steps did you take to keep it out?
Hi @JavierCastellD! We will soon publish a technical report with more detailed information about how the training data was processed.
jsaizant changed discussion status to closed
Thanks! Great job and I'm looking forward to it.