article = """
<img src="https://www.iic.uam.es/wp-content/uploads/2017/12/IIC_logoP.png">
<img src="https://drive.google.com/uc?export=view&id=1S8v94q39QRCfmVTMvjLCACmhMe9lJQdc">
<p style="text-align: justify;"> This app was developed by <a href="https://www.iic.uam.es/">IIC - Instituto de Ingeniería del Conocimiento</a> as part of the <a href="https://www.eventbrite.com/e/registro-hackathon-de-pln-en-espanol-273014111557">Somos PLN Hackathon 2022</a>.
Its objective is to expand the existing tools for long-form question answering in Spanish. Several methods that are novel for Spanish
were introduced to build it.
Audio is accepted as an input and is always produced as an output because we wanted to make the app accessible to people who cannot read or write.
Below you can find all the pieces that form the system.
1. <a href="https://hf.co/IIC/wav2vec2-spanish-multilibrispeech">Speech2Text</a>: For this we fine-tuned a multilingual Wav2Vec2 model, as explained in the attached link. We use this model to transcribe audio questions.
2. <a href="https://hf.co/IIC/dpr-spanish-passage_encoder-allqa-base">Dense Passage Retrieval for Context</a>: Dense Passage Retrieval (DPR) is a methodology <a href="https://arxiv.org/abs/2004.04906">developed by Facebook</a> which is currently the SoTA for passage retrieval,
that is, the task of finding the passages most relevant to answering a given question. You can find details about how it was trained in the link attached to the name.
3. <a href="https://hf.co/IIC/dpr-spanish-question_encoder-allqa-base">Dense Passage Retrieval for Question</a>: This is the question encoder of the same DPR system described above. For more details, go to the attached link.
4. <a href="https://hf.co/sentence-transformers/distiluse-base-multilingual-cased-v1">Sentence Encoder Ranker</a>: Reranks the candidate contexts retrieved by DPR and selects the top 5 passages for the generative model to read. It is the final filter before the generative model.
5. <a href="https://hf.co/IIC/mt5-base-lfqa-es">Generative Long-Form Question Answering Model</a>: For this we used either mT5 (the one attached) or <a href="https://hf.co/IIC/mbart-large-lfqa-es">mBART</a>. This generative model receives the most relevant
passages and uses them to generate an answer to the question. The attached link gives more details about how we trained it.
In addition, we uploaded, and in some cases created, Spanish datasets in order to build this system.
1. <a href="https://hf.co/datasets/IIC/spanish_biomedical_crawled_corpus">Spanish Biomedical Crawled Corpus</a>. Used for finding answers to questions about biomedicine. (More info in the link.)
2. <a href="https://hf.co/datasets/IIC/lfqa_spanish">LFQA_Spanish</a>. Used for training the generative model. (More info in the link.)
3. <a href="https://hf.co/datasets/squad_es">SQuAD-es</a>. Used to train the DPR models. (More info in the link.)
4. <a href="https://hf.co/datasets/IIC/bioasq22_es">BioAsq22-Spanish</a>. Used to train the DPR models. (More info in the link.)
5. <a href="https://hf.co/datasets/PlanTL-GOB-ES/SQAC">SQAC (Spanish Question Answering Corpus)</a>. Used to train the DPR models. (More info in the link.)
</p>
"""
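# The reranking step described in item 4 of `article` (score the retrieved
# passages against the question and keep the top 5) can be sketched as below.
# This is a minimal illustration under stated assumptions, not the app's actual
# code: `cosine` and `rerank_passages` are hypothetical helper names, and the
# toy vectors stand in for the embeddings a sentence encoder would produce.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rerank_passages(question_vec, candidates, top_k=5):
    """Return the texts of the top_k candidates most similar to the question.

    `candidates` is a list of (text, vector) pairs, for example passages
    returned by a dense retriever together with their embeddings.
    """
    ranked = sorted(
        candidates,
        key=lambda pair: cosine(question_vec, pair[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:top_k]]


# Toy usage: three 2-d "embeddings"; keep the two closest to the question.
toy_candidates = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [0.7, 0.7])]
top_two = rerank_passages([1.0, 0.1], toy_candidates, top_k=2)
```

# In a real pipeline the vectors would come from the sentence encoder linked in
# `article`; only the ranking logic is shown here.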
# 1HOzvvgDLFNTK7tYAY1dRzNiLjH41fZks | |
# 1kvHDFUPPnf1kM5EKlv5Ife2KcZZvva_1 | |
description = """ | |
<a href="https://www.iic.uam.es/"> | |
<img src="https://drive.google.com/uc?export=view&id=1HOzvvgDLFNTK7tYAY1dRzNiLjH41fZks" style="max-width: 100%; max-height: 10%; height: 250px; object-fit: fill"> | |
</a> | |
<h1> BioMedIA: Abstractive Question Answering for the Biomedical Domain in Spanish </h1>
This application combines state-of-the-art retrieval systems for Spanish with a generative model trained to compose an answer to a question from a set of retrieved contexts.
""" | |
examples = [
    [
        "¿Cuáles son los efectos secundarios más ampliamente reportados en el tratamiento de la enfermedad de Crohn?",
        "vacio.flac", "vacio.flac", 60, 8, 3, 1.0, 250, "wav2vec2-iic", False,
    ],
    [
        "¿Para qué sirve la tecnología CRISPR?",
        "vacio.flac", "vacio.flac", 60, 8, 3, 1.0, 250, "wav2vec2-iic", False,
    ],
    [
        "¿Por qué sentimos ansiedad?",
        "vacio.flac", "vacio.flac", 50, 8, 3, 1.0, 250, "wav2vec2-iic", False,
    ],
    [
        "¿Qué es la tecnología CRISPR?",
        "vacio.flac", "vacio.flac", 50, 8, 3, 1.0, 250, "wav2vec2-iic", False,
    ],
    [
        "¿Cómo se genera la apendicitis?",
        "vacio.flac", "vacio.flac", 50, 8, 3, 1.0, 250, "wav2vec2-iic", False,
    ],
    [
        "¿Qué es la mesoterapia?",
        "vacio.flac", "vacio.flac", 50, 8, 3, 1.0, 250, "wav2vec2-iic", False,
    ],
    [
        "¿Qué alternativas al Paracetamol existen para el dolor de cabeza?",
        "vacio.flac", "vacio.flac", 80, 8, 3, 1.0, 250, "wav2vec2-iic", False,
    ],
    [
        "¿Cuáles son los principales tipos de disartria del trastorno del habla motor?",
        "vacio.flac", "vacio.flac", 50, 8, 3, 1.0, 250, "wav2vec2-iic", False,
    ],
    [
        "¿Es la esclerosis tuberosa una enfermedad genética?",
        "vacio.flac", "vacio.flac", 50, 8, 3, 1.0, 250, "wav2vec2-iic", False,
    ],
    [
        "¿Cuál es la función de la proteína Mis18?",
        "vacio.flac", "vacio.flac", 50, 8, 3, 1.0, 250, "wav2vec2-iic", False,
    ],
    [
        "¿Qué deficiencia es la causa del síndrome de piernas inquietas?",
        "vacio.flac", "vacio.flac", 50, 8, 3, 1.0, 250, "wav2vec2-iic", False,
    ],
    [
        "¿Cuál es la función del 6SRNA en las bacterias?",
        "vacio.flac", "vacio.flac", 60, 8, 3, 1.0, 250, "wav2vec2-iic", False,
    ],
    [
        "¿Por qué los humanos desarrollamos diabetes?",
        "vacio.flac", "vacio.flac", 50, 10, 3, 1.0, 250, "wav2vec2-iic", False,
    ],
    [
        "¿Qué factores de riesgo aumentan la probabilidad de sufrir un ataque al corazón?",
        "vacio.flac", "vacio.flac", 80, 8, 3, 1.0, 250, "wav2vec2-iic", False,
    ],
    [
        "¿Cómo funcionan las vacunas?",
        "vacio.flac", "vacio.flac", 90, 8, 3, 1.0, 250, "wav2vec2-iic", False,
    ],
    [
        "¿Tienen conciencia los animales?",
        "vacio.flac", "vacio.flac", 70, 8, 3, 1.0, 250, "wav2vec2-iic", False,
    ],
]
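# Each row of `examples` is positional, so a row with a missing or extra value
# would shift every later field. The check below is a small sketch of how that
# could be caught early. It assumes each row must carry exactly ten values (the
# length of the rows above); `check_example_rows` is a hypothetical helper, not
# part of the app or of any library.

```python
def check_example_rows(rows, expected_len=10):
    """Return the indices of rows whose length differs from expected_len."""
    return [i for i, row in enumerate(rows) if len(row) != expected_len]


# Toy usage with one well-formed row and one deliberately truncated row.
sample_rows = [
    ["¿Qué es la mesoterapia?", "vacio.flac", "vacio.flac",
     50, 8, 3, 1.0, 250, "wav2vec2-iic", False],
    ["¿Cómo funcionan las vacunas?", "vacio.flac", "vacio.flac", 90, 8, 3],
]
bad_rows = check_example_rows(sample_rows)
```

# Calling `check_example_rows(examples)` on the list above should return an
# empty list, since every row there has ten values.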