Update README.md
README.md
CHANGED
@@ -221,7 +221,7 @@ The qualitative evaluation showcases AIN's advanced capabilities in handling div
 A multi-step verification pipeline was implemented to ensure high-quality translations and safe visual data. Translation accuracy was assessed through human evaluation, where native Arabic speakers rated outputs against reference translations, and semantic similarity checks were conducted using **LaBSE**. Additionally, translated samples were reverse-translated and validated using **BLEU, METEOR, and ROUGE scores** to measure correctness, correlation, and overlap. For visual data, toxicity filtering was applied using **LLavaGuard’s safety policies and GPT-4o**, identifying and removing unsafe content related to violence, substance abuse, and harmful imagery, ensuring compliance with ethical AI standards.
 
 <p align="center">
-<img src="assets_hf/verify_pipeline.png" width="
+<img src="assets_hf/verify_pipeline.png" width="75%" alt="verify" style="margin-right: 2px";/>
 <h6>
 <em> <b>Figure 4.</b> Data verification and filtering pipeline for textual and visual data, ensuring high-quality training data through semantic similarity checks, translation quality evaluations, and toxicity screening for safety compliance. </em>
 </h6>
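
The textual half of the verification pipeline described in the README paragraph above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: embeddings are passed in as plain lists (in practice they would come from LaBSE), a unigram-overlap F1 stands in for the BLEU, METEOR, and ROUGE scores, and the function names and both thresholds are hypothetical.

```python
from collections import Counter
from math import sqrt


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def unigram_f1(reference, candidate):
    """ROUGE-1-style F1 over whitespace tokens.

    A simplified stand-in for the BLEU/METEOR/ROUGE scores named in the
    README, not a replacement for those metrics.
    """
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum((ref & cand).values())  # clipped count of shared tokens
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def keep_sample(src_vec, tgt_vec, source_text, back_translation,
                sim_threshold=0.8, overlap_threshold=0.5):
    """Keep a translated sample only if both checks pass.

    src_vec / tgt_vec: embeddings of the source and its translation.
    back_translation: the translation reverse-translated into the source language.
    Both thresholds are hypothetical values chosen for illustration.
    """
    return (cosine_similarity(src_vec, tgt_vec) >= sim_threshold
            and unigram_f1(source_text, back_translation) >= overlap_threshold)
```

A sample is retained only when the source and translation embeddings are close and the back-translation overlaps sufficiently with the original text; human evaluation and the visual toxicity filtering are outside the scope of this sketch.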