BordIRlines: A Dataset for Evaluating Cross-lingual Retrieval-Augmented Generation
Abstract
Large language models excel at creative generation but continue to struggle with the issues of hallucination and bias. While retrieval-augmented generation (RAG) provides a framework for grounding LLMs' responses in accurate and up-to-date information, it still raises the question of bias: which sources should be selected for inclusion in the context? And how should their importance be weighted? In this paper, we study the challenge of cross-lingual RAG and present a dataset to investigate the robustness of existing systems at answering queries about geopolitical disputes, which exist at the intersection of linguistic, cultural, and political boundaries. Our dataset is sourced from Wikipedia pages containing information relevant to the given queries and we investigate the impact of including additional context, as well as the composition of this context in terms of language and source, on an LLM's response. Our results show that existing RAG systems continue to be challenged by cross-lingual use cases and suffer from a lack of consistency when they are provided with competing information in multiple languages. We present case studies to illustrate these issues and outline steps for future research to address these challenges. We make our dataset and code publicly available at https://github.com/manestay/bordIRlines.
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Does RAG Introduce Unfairness in LLMs? Evaluating Fairness in Retrieval-Augmented Generation Systems (2024)
- IRSC: A Zero-shot Evaluation Benchmark for Information Retrieval through Semantic Comprehension in Retrieval-Augmented Generation Scenarios (2024)
- QAEncoder: Towards Aligned Representation Learning in Question Answering System (2024)
- LLM for Everyone: Representing the Underrepresented in Large Language Models (2024)
- BSharedRAG: Backbone Shared Retrieval-Augmented Generation for the E-commerce Domain (2024)
Interesting idea, but I would imagine most LLMs and embedding models have seen Wikipedia. But I love that there is energy in cross-lingual RAG. Keep up the good work!
Exactly, this is how the dataset is designed! The models have likely seen the retrieved Wikipedia passages already, which is also the case for TyDi QA, MIRACL, and other open-domain QA datasets. However, for our task, territorial disputes, the answers are necessarily linguistically, politically, and culturally dependent. So the explicit retrieval of different relevant passages with different languages and viewpoints, and how this affects LLMs' responses, is what our benchmark studies.
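To make the setup concrete, here is a minimal sketch of how context composition might be varied in an experiment like this: the same query is paired with retrieved passages in different languages and from different sources, and only the context block of the prompt changes. The function name, passage format, and the dispute used are illustrative assumptions, not the actual bordIRlines API.

```python
def build_prompt(query, passages):
    """Assemble a RAG prompt from a query and retrieved passages.

    Each passage is a dict with 'lang', 'source', and 'text' keys;
    the language/source tags make the context composition explicit.
    """
    lines = ["Answer the question using the context below.", ""]
    for i, p in enumerate(passages, 1):
        lines.append(f"[{i}] ({p['lang']}, {p['source']}) {p['text']}")
    lines += ["", f"Question: {query}", "Answer:"]
    return "\n".join(lines)


query = "Which country administers the disputed islands?"
en_only = [{"lang": "en", "source": "en.wikipedia", "text": "..."}]
mixed = en_only + [
    {"lang": "zh", "source": "zh.wikipedia", "text": "..."},
    {"lang": "ja", "source": "ja.wikipedia", "text": "..."},
]

# Two conditions that differ only in context composition; comparing
# the LLM's answers across them probes consistency under competing
# cross-lingual evidence.
prompt_en = build_prompt(query, en_only)
prompt_mixed = build_prompt(query, mixed)
```

The point of the comparison is that any difference between the model's answers to `prompt_en` and `prompt_mixed` is attributable to the language/source mix of the retrieved context, not to the query.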