---
base_model: arcee-ai/Meraj-Mini
tags:
  - text-generation-inference
  - transformers
  - unsloth
  - qwen2
  - trl
license: apache-2.0
language:
  - ar
  - en
model-index:
  - name: MawaredT1
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: IFEval (0-Shot)
          type: wis-k/instruction-following-eval
          split: train
          args:
            num_few_shot: 0
        metrics:
          - type: inst_level_strict_acc and prompt_level_strict_acc
            value: 41.99
            name: averaged accuracy
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Daemontatox%2FMawaredT1
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: BBH (3-Shot)
          type: SaylorTwift/bbh
          split: test
          args:
            num_few_shot: 3
        metrics:
          - type: acc_norm
            value: 31.9
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Daemontatox%2FMawaredT1
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MATH Lvl 5 (4-Shot)
          type: lighteval/MATH-Hard
          split: test
          args:
            num_few_shot: 4
        metrics:
          - type: exact_match
            value: 14.58
            name: exact match
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Daemontatox%2FMawaredT1
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: GPQA (0-shot)
          type: Idavidrein/gpqa
          split: train
          args:
            num_few_shot: 0
        metrics:
          - type: acc_norm
            value: 11.3
            name: acc_norm
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Daemontatox%2FMawaredT1
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MuSR (0-shot)
          type: TAUR-Lab/MuSR
          args:
            num_few_shot: 0
        metrics:
          - type: acc_norm
            value: 18.68
            name: acc_norm
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Daemontatox%2FMawaredT1
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MMLU-PRO (5-shot)
          type: TIGER-Lab/MMLU-Pro
          config: main
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 41.31
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Daemontatox%2FMawaredT1
          name: Open LLM Leaderboard
---

# Bilingual Assistant Model Card

## Overview

This bilingual language model is designed to support seamless text generation and understanding in both Arabic (ar) and English (en). Fine-tuned from the arcee-ai/Meraj-Mini base model, it offers robust multilingual capabilities optimized for various applications such as conversational agents, content creation, and multilingual text analysis.
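
As a minimal usage sketch, the model can be loaded with the standard `transformers` text-generation API. The repo id below is the one used by this card's leaderboard entry (`Daemontatox/MawaredT1`); adjust it if your copy lives elsewhere.

```python
# Minimal inference sketch using the standard transformers API.
# The repo id is taken from this card's leaderboard entry.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Daemontatox/MawaredT1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # use torch.float16 on GPUs without bf16 support
    device_map="auto",
)

# The model is bilingual, so prompts can be written in Arabic or English.
prompt = "ما هي عاصمة المملكة العربية السعودية؟"  # "What is the capital of Saudi Arabia?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```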

## Key Highlights

- **Multilingual Proficiency:** Handles complex linguistic nuances in both Arabic and English, producing high-quality output in either language.
- **Performance Optimization:** Trained roughly 2x faster using the Unsloth framework together with the Hugging Face TRL library.
- **Transformer-Based Architecture:** Builds on the Qwen2 transformer architecture for strong text-generation and inference performance.

## Development Details

- **Developer:** Daemontatox
- **License:** Apache-2.0, allowing open access and flexible reuse.
- **Base Model:** Fine-tuned from arcee-ai/Meraj-Mini.
- **Frameworks Used:**
  - **Unsloth:** Enabled faster, more memory-efficient training.
  - **Hugging Face TRL:** Provided the reinforcement-learning fine-tuning tooling, improving the model's responsiveness and accuracy.

## Training Process

The fine-tuning process was conducted with a focus on:

- **Data Diversity:** Leveraged a bilingual corpus for comprehensive coverage of both supported languages.
- **Optimized Hardware Utilization:** Used Unsloth's accelerated training methods, significantly reducing resource consumption and training time.
- **Reinforcement Learning:** Used Hugging Face's TRL library to fine-tune the model's decision-making and response generation, particularly for conversational and contextual understanding (an illustrative setup sketch follows this list).
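
The card does not publish the actual training script, so the following is only an illustrative sketch of a typical Unsloth + TRL supervised fine-tuning setup. The dataset file and all hyperparameters below are placeholders, not the values used to train this model.

```python
# Illustrative Unsloth + TRL fine-tuning sketch; the dataset file and
# hyperparameters are placeholders, not this model's actual configuration.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer
from unsloth import FastLanguageModel

# Load the base model with Unsloth's memory-efficient 4-bit loading.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="arcee-ai/Meraj-Mini",  # base model named in this card
    max_seq_length=2048,               # placeholder context length
    load_in_4bit=True,
)

# Attach LoRA adapters for parameter-efficient fine-tuning.
model = FastLanguageModel.get_peft_model(model, r=16, lora_alpha=16)

# Placeholder bilingual corpus; the actual training data is not published.
dataset = load_dataset("json", data_files="bilingual_corpus.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    args=SFTConfig(
        dataset_text_field="text",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        learning_rate=2e-4,
        max_steps=1000,
        output_dir="outputs",
    ),
)
trainer.train()
```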

## Applications

This model is suited for a variety of real-world applications, including:

1. **Conversational Agents:** Powering bilingual chatbots and virtual assistants for customer support and personal use (see the chat example after this list).
2. **Content Generation:** Assisting in drafting multilingual articles, social media posts, and creative writing.
3. **Translation Support:** Providing context-aware translations and summaries across Arabic and English.
4. **Education:** Enhancing learning platforms with bilingual educational content and interactive learning experiences.
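
For the conversational use case, Qwen2-based tokenizers ship a chat template, so a chat turn can be formatted as sketched below. The system prompt is an arbitrary example, not one prescribed by this card.

```python
# Chat-style usage sketch via the tokenizer's built-in chat template
# (available because the model is Qwen2-based).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Daemontatox/MawaredT1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# The system prompt here is an arbitrary example.
messages = [
    {"role": "system", "content": "You are a helpful bilingual (Arabic/English) assistant."},
    {"role": "user", "content": "لخّص الجملة التالية بالإنجليزية: الطقس اليوم مشمس ودافئ."},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```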

## Future Directions

Plans for extending the model's capabilities include:

- **Additional Language Support:** Exploring fine-tuning for further languages.
- **Domain-Specific Training:** Specializing the model for industries such as healthcare, legal, and technical writing.
- **Optimization for Edge Devices:** Investigating quantization techniques to deploy the model on resource-constrained hardware such as mobile devices and IoT platforms (a quantized-loading sketch follows this list).
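
As one concrete way to prototype the edge-deployment direction, the model can already be loaded in 4-bit via `bitsandbytes` through `transformers`. This is a sketch of such an experiment, not an official or shipped configuration.

```python
# 4-bit quantized loading sketch with bitsandbytes via transformers;
# an experiment in the direction described above, not an official config.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",            # normalized-float 4-bit weights
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,       # also quantize the quantization constants
)

model_id = "Daemontatox/MawaredT1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
print(f"Approx. memory footprint: {model.get_memory_footprint() / 1e9:.2f} GB")
```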

## Open LLM Leaderboard Evaluation Results

Detailed and summarized results are available on the [Open LLM Leaderboard](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Daemontatox%2FMawaredT1).

| Metric              | Value (%) |
|---------------------|-----------|
| Average             | 26.63     |
| IFEval (0-Shot)     | 41.99     |
| BBH (3-Shot)        | 31.90     |
| MATH Lvl 5 (4-Shot) | 14.58     |
| GPQA (0-shot)       | 11.30     |
| MuSR (0-shot)       | 18.68     |
| MMLU-PRO (5-shot)   | 41.31     |