MBZUAI/AIN

SLMLAH committed 20 days ago

Commit b500195 (verified)

1 Parent(s): 77338b3

Update README.md

Files changed (1)

README.md

@@ -221,7 +221,7 @@ The qualitative evaluation showcases AIN's advanced capabilities in handling div
 A multi-step verification pipeline was implemented to ensure high-quality translations and safe visual data. Translation accuracy was assessed through human evaluation, where native Arabic speakers rated outputs against reference translations, and semantic similarity checks were conducted using **LaBSE**. Additionally, translated samples were reverse-translated and validated using **BLEU, METEOR, and ROUGE scores** to measure correctness, correlation, and overlap. For visual data, toxicity filtering was applied using **LLavaGuard’s safety policies and GPT-4o**, identifying and removing unsafe content related to violence, substance abuse, and harmful imagery, ensuring compliance with ethical AI standards.
 <p align="center">
-   <img src="assets_hf/verify_pipeline.png" width="65%" alt="verify"  style="margin-right: 2px";/>
+   <img src="assets_hf/verify_pipeline.png" width="75%" alt="verify"  style="margin-right: 2px";/>
     <h6>
        <em>  <b>Figure 4.</b> Data verification and filtering pipeline for textual and visual data, ensuring high-quality training data through semantic similarity checks, translation quality evaluations, and toxicity screening for safety compliance. </em>
     </h6>
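
For readers of this diff, the text-side checks named in the README paragraph (a semantic similarity check plus an overlap score on the reverse translation) can be sketched roughly as follows. This is a minimal illustration and not the AIN authors' pipeline: the embeddings are assumed to be precomputed by a sentence encoder such as LaBSE, the unigram F1 is a simplified stand-in for BLEU, METEOR, and ROUGE, and the function names and both thresholds are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors given as sequences of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def unigram_f1(reference, candidate):
    """Unigram overlap F1 between two texts (a simplified, ROUGE-1-style score)."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


def keep_sample(src_vec, tgt_vec, source_text, back_translation,
                sim_threshold=0.8, overlap_threshold=0.5):
    """Keep a translated sample only if both checks pass.

    src_vec / tgt_vec: precomputed embeddings of the source text and its
    translation (assumed here to come from an encoder such as LaBSE).
    Both thresholds are hypothetical and would need tuning on real data.
    """
    similar = cosine_similarity(src_vec, tgt_vec) >= sim_threshold
    faithful = unigram_f1(source_text, back_translation) >= overlap_threshold
    return similar and faithful
```

A call such as `keep_sample(src_vec, tgt_vec, "the cat sat", "the cat sat")` returns `True` when the two embeddings point in nearly the same direction and the reverse translation reproduces the source. Human rating and the visual toxicity filtering described in the paragraph are outside the scope of this sketch.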