---
license: apache-2.0
datasets:
  - neulab/PangeaInstruct
language:
  - am
  - ar
  - bg
  - bn
  - cs
  - de
  - el
  - en
  - es
  - fa
  - fr
  - ga
  - hi
  - id
  - ig
  - it
  - iw
  - ja
  - jv
  - ko
  - nl
  - mn
  - ms
  - 'no'
  - pl
  - pt
  - ro
  - ru
  - si
  - su
  - sw
  - ta
  - te
  - th
  - tr
  - uk
  - ur
  - vi
  - zh
base_model:
  - Qwen/Qwen2-7B-Instruct
---

# Pangea-7B Model Card

Homepage | Pangea-7B | PangeaIns | PangeaBench | Github | Arxiv | PDF

## Model details

  - **Model:** Pangea is a fully open-source multilingual, multimodal, and multicultural LLM.
  - **Date:** Pangea-7B was trained in 2024.
  - **Training dataset:** PangeaIns (6M instructions).

## Uses

### Direct Use

```python
from transformers import AutoProcessor, AutoModelForCausalLM

# Download the processor and weights from the Hugging Face Hub.
processor = AutoProcessor.from_pretrained("neulab/Pangea-7B")
model = AutoModelForCausalLM.from_pretrained("neulab/Pangea-7B")
```
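Once loaded, the model can be prompted with a chat-formatted string. The sketch below is an assumption, not part of this model card: the ChatML-style markers follow the Qwen2-7B-Instruct base model, and `build_prompt` / `run_inference` are illustrative helper names.

```python
def build_prompt(question: str) -> str:
    """Format a single user turn in the Qwen2 (ChatML-style) chat format.

    This format is an assumption based on the Qwen2-7B-Instruct base model.
    """
    return (
        "<|im_start|>user\n"
        f"{question}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


def run_inference(question: str, max_new_tokens: int = 128) -> str:
    """Load the checkpoint and generate a reply (downloads the weights)."""
    # Imported here so the prompt helper stays usable without transformers.
    from transformers import AutoProcessor, AutoModelForCausalLM

    processor = AutoProcessor.from_pretrained("neulab/Pangea-7B")
    model = AutoModelForCausalLM.from_pretrained("neulab/Pangea-7B")

    inputs = processor(text=build_prompt(question), return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return processor.decode(output_ids[0], skip_special_tokens=True)


# Example call (commented out to avoid the multi-GB weight download):
# print(run_inference("What languages does Pangea support?"))
```

Keeping the prompt construction separate from generation makes the chat format easy to inspect and adapt if the checkpoint ships its own chat template.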

## Citing the Model

BibTeX Citation:

```bibtex
@article{yue2024pangeafullyopenmultilingual,
  title={Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages},
  author={Xiang Yue and Yueqi Song and Akari Asai and Seungone Kim and Jean de Dieu Nyandwi and Simran Khanuja and Anjali Kantharuban and Lintang Sutawika and Sathyanarayanan Ramamoorthy and Graham Neubig},
  year={2024},
  journal={arXiv preprint arXiv:2410.16153},
  url={https://arxiv.org/abs/2410.16153}
}
```