Recent Activity

IbukunSanni  updated a Space about 2 months ago
ekogenie/README
IbukunSanni  published a Space about 2 months ago
ekogenie/README

Dataset Source:

  • Original Source: The English sentences were sourced from public domain texts such as those on Project Gutenberg (https://www.gutenberg.org/).
  • Translation Tool: Google Translate was used for translating the sentences from English to Yoruba.

Dataset Format:

  • english: The original English sentence.
  • yoruba: The Yoruba translation of the sentence.
  • source: The source of the English sentence (a URL, for example a Project Gutenberg text file).
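The schema above can be sketched as a small Python record type. This is an illustration, not an official loader for the dataset: it assumes each row arrives as a dict keyed by the field names listed above (`english`, `yoruba`, `source`; the example below uses `en`/`yo` column headers, so adjust the keys if the files use those), and the names `TranslationRecord`, `parse_record`, and `REQUIRED_FIELDS` are hypothetical. It also normalizes the Yoruba text to Unicode NFC, because Yoruba tone marks and underdots can be encoded either precomposed or as combining sequences.

```python
import unicodedata
from dataclasses import dataclass

# Hypothetical constant: the field names listed under "Dataset Format".
REQUIRED_FIELDS = ("english", "yoruba", "source")


@dataclass(frozen=True)
class TranslationRecord:
    """One row: an English sentence, its Yoruba translation, and where the English came from."""
    english: str
    yoruba: str
    source: str


def parse_record(row: dict) -> TranslationRecord:
    """Validate a raw row (hypothetical helper) and normalize its text fields."""
    missing = [name for name in REQUIRED_FIELDS if name not in row]
    if missing:
        raise KeyError(f"missing fields: {missing}")
    return TranslationRecord(
        english=row["english"].strip(),
        # Yoruba letters such as "ẹ" may be stored precomposed (U+1EB9) or as
        # "e" + U+0323 (combining dot below); NFC makes both forms compare equal.
        yoruba=unicodedata.normalize("NFC", row["yoruba"].strip()),
        source=row["source"].strip(),
    )
```

Normalizing once at load time keeps deduplication and string comparisons on the Yoruba side from silently treating visually identical sentences as different.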

Example:

Row 1:
  en: The subconscious offensiveness of their attitude has constituted old Jolyon's 'home' the psychological moment of the family history, made it the prelude of their drama.
  yo: Iwa ibinu èrońgbà ti iṣesi wọn ti jẹ “ile” atijọ ti Jolyon ni akoko imọ-jinlẹ ti itan-akọọlẹ ẹbi, jẹ ki o jẹ iṣaaju ti eré wọn.
  source: https://www.gutenberg.org/ebooks/2559.txt.utf-8

Row 2:
  en: The Forsytes were resentful of something, not individually, but as a family; this resentment expressed itself in an added perfection of raiment, an exuberance of family cordiality, an exaggeration of family importance, and--the sniff.
  yo: Awọn Forsytes binu si nkan kan, kii ṣe olukuluku, ṣugbọn gẹgẹbi idile; ibinu yii ṣe afihan ararẹ ni pipe ti aṣọ ti a fi kun, igbadun ti ifarabalẹ idile, iṣaju ti pataki idile, ati --ifun.
  source: https://www.gutenberg.org/ebooks/2559.txt.utf-8

Row 3:
  en: Danger--so indispensable in bringing out the fundamental quality of any society, group, or individual--was what the Forsytes scented; the premonition of danger put a burnish on their armour.
  yo: Ewu - nitorinaa ko ṣe pataki lati mu didara ipilẹ ti awujọ, ẹgbẹ, tabi ẹni kọọkan jade - jẹ ohun ti awọn Forsytes rùn; premonition ti ewu fi kan iná lori wọn ihamọra.
  source: https://www.gutenberg.org/ebooks/2559.txt.utf-8

Dataset Size:

  • Number of Entries: 520,000

Usage:

This dataset can be used for:

  • Training machine translation models for Yoruba.
  • Analyzing translation quality and limitations in automated tools.
  • Supporting linguistic research and NLP projects for low-resource languages.
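For the first use case, a minimal sketch of turning rows into sentence pairs for a machine translation trainer. It assumes rows are dicts with `english` and `yoruba` keys; the helper names `assign_split` and `build_pairs` and the 1% validation / 1% test percentages are hypothetical, since this card does not define official splits.

```python
import hashlib
from collections.abc import Iterable


def assign_split(english: str, val_pct: int = 1, test_pct: int = 1) -> str:
    """Deterministically map a sentence to train/validation/test by hashing it.

    Hashing the English side sends duplicate English sentences to the same
    split, so a test sentence cannot also appear in training.
    """
    bucket = int(hashlib.sha256(english.encode("utf-8")).hexdigest(), 16) % 100
    if bucket < test_pct:
        return "test"
    if bucket < test_pct + val_pct:
        return "validation"
    return "train"


def build_pairs(rows: Iterable[dict], direction: str = "en-yo") -> dict[str, list[tuple[str, str]]]:
    """Group (source, target) sentence pairs by split; direction is "en-yo" or "yo-en"."""
    if direction not in ("en-yo", "yo-en"):
        raise ValueError(f"unsupported direction: {direction!r}")
    splits: dict[str, list[tuple[str, str]]] = {"train": [], "validation": [], "test": []}
    for row in rows:
        en, yo = row["english"], row["yoruba"]
        pair = (en, yo) if direction == "en-yo" else (yo, en)
        # Split on the English text regardless of direction, so both
        # directions share the same held-out sentences.
        splits[assign_split(en)].append(pair)
    return splits
```

A hash-based split is used instead of random shuffling because it is reproducible without storing a seed or index file, which matters when the data is re-downloaded or extended.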

Limitations and Considerations:

  • Quality of Translations: Because the translations were generated with Google Translate, some may be inaccurate, unidiomatic, or leave English words untranslated. Manual validation is recommended for critical applications.
  • Cultural and Contextual Nuances: Machine translations might miss idiomatic expressions or cultural nuances present in the source language.
  • Biases: Any biases inherent in Google Translate's model may propagate into this dataset.
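Since manual validation is recommended, one way to decide which rows to check first is to flag them with cheap heuristics. This is a sketch under stated assumptions, not part of the dataset: `review_flags` is a hypothetical helper, and its length-ratio bounds and copied-word rule are illustrative thresholds that would need tuning on a hand-checked sample.

```python
import re

# Runs of Unicode letters (no digits or underscores).
WORD = re.compile(r"[^\W\d_]+")


def review_flags(english: str, yoruba: str,
                 min_ratio: float = 0.5, max_ratio: float = 2.0,
                 min_copied_len: int = 6) -> list[str]:
    """Return heuristic reasons a machine translation may need manual review."""
    en, yo = english.strip(), yoruba.strip()
    if not yo:
        return ["empty translation"]
    flags = []
    if en.casefold() == yo.casefold():
        flags.append("identical to source")
    # Character-length ratio; assumed bounds, not a property of Yoruba.
    ratio = len(yo) / max(len(en), 1)
    if not min_ratio <= ratio <= max_ratio:
        flags.append(f"length ratio {ratio:.2f} outside [{min_ratio}, {max_ratio}]")
    # Long lowercase English words that reappear in the Yoruba output often mean
    # a word was left untranslated; capitalized words are skipped as likely names.
    en_words = {w for w in WORD.findall(en) if w.islower() and len(w) >= min_copied_len}
    copied = sorted(en_words & {w.lower() for w in WORD.findall(yo)})
    if copied:
        flags.append("possibly untranslated: " + ", ".join(copied))
    return flags
```

On the third example row above, the copied-word check flags `premonition`, which was left in English in the Yoruba output, while the name `Forsytes` is skipped because it is capitalized.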

Licensing:

Source Material License: Public Domain

Tags:

  • machine-translation
  • speech-to-text
  • yoruba-language
  • african-languages

Task Categories:

  • text-classification
  • machine-translation