---
license: apache-2.0
language:
- en
base_model:
- mistralai/Mistral-7B-Instruct-v0.3
---
# Mia LLM - Personal AI Ecosystem
Mia LLM is an advanced AI-powered personal assistant built on the Mistral-7B-Instruct-v0.3 model. It is designed to enhance daily life through translation, personal assistance, and interactive services.
## Features
### Text and Voice Translation
- Text-to-Text Translation: Translate messages into 96 languages.
- Voice-to-Voice Translation: Enable seamless communication through voice translation.
- Real-Time Translation: Provide live translations during audio and video calls.
### Personal Assistant Services
- Appointment reminders.
- Shopping list creation.
- Health and fitness recommendations.
- Weather updates and navigation.
- Bill payment reminders.
### Document and Multimedia Translation
- PDF and Text Translation: Translate documents into multiple languages.
- Video Dubbing: Add voiceovers to videos in different languages.
- Audio File Translation: Convert audio recordings into other languages.
### Advanced Analysis Capabilities
- Sentiment Analysis: Analyze emotional tones in messages.
- Body Language and Facial Expression Analysis: Evaluate video call interactions.
- Language Processing and Accuracy Optimization: Ensure clarity and correctness in communication.
### Interactive Services
- Food ordering.
- Taxi booking.
- Ticket and hotel reservations.
- Flight tracking.
### Adaptive Learning Capabilities
- Learn user habits for personalized services.
- Continuously update with new languages and content.
## Architecture
Mia LLM is based on Mistral-7B-Instruct-v0.3, a Transformer-based model designed for efficiency and accuracy. It is further instruction-tuned for responsive, prompt-following behavior.
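As a quick illustration of that structure, the base model's configuration can be inspected without downloading the weights; a minimal sketch using the `transformers` `AutoConfig` API:

```python
from transformers import AutoConfig

# Fetch only the configuration file of the base model (no weights)
config = AutoConfig.from_pretrained("mistralai/Mistral-7B-Instruct-v0.3")

print(config.hidden_size)          # width of the token embeddings
print(config.num_hidden_layers)    # number of Transformer decoder blocks
print(config.num_attention_heads)  # attention heads per layer
print(config.num_key_value_heads)  # fewer KV heads indicate grouped-query attention
```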
## 1. Introduction to Models
- Llama 2 70B, GPT-3.5, and Mixtral 8x7B: These are large language models of varying capacity. The number in a name (e.g., 70B) is the model's parameter count; a higher parameter count generally indicates greater learning and reasoning capacity.
## 2. Benchmarks and Test Datasets
Test datasets are used to evaluate the performance of models across different domains.
### MMLU (Massive Multitask Language Understanding)
- Description: This test evaluates models' knowledge and reasoning abilities across 57 diverse topics using multiple-choice questions.
- Results:
  - Llama 2: 69.9%
  - GPT-3.5: 70.0%
  - Mixtral: 70.6%
- Mixtral achieved the highest score in this benchmark, though the margin was small.
### HellaSwag
- Description: A test of commonsense reasoning and coherence. Models are evaluated 10-shot (given 10 in-context examples) and must predict the most plausible continuation of a text.
- Results:
  - Llama 2: 87.1% (Best performance)
  - GPT-3.5: 85.5%
  - Mixtral: 86.7%
### ARC Challenge
- Description: Consists of challenging multiple-choice questions, often requiring scientific and academic knowledge. Models are evaluated 25-shot.
- Results:
  - Llama 2: 85.1%
  - GPT-3.5: 85.2%
  - Mixtral: 85.8% (Best performance)
### WinoGrande
- Description: Evaluates natural language understanding by assessing a model's ability to resolve pronoun ambiguities and determine correct references in sentences.
- Results:
  - Llama 2: 83.2% (Best performance)
  - GPT-3.5: 81.6%
  - Mixtral: 81.2%
### MBPP (Mostly Basic Python Problems)
- Description: Measures programming capabilities by testing the accuracy of Python code generation against each task's test cases.
- Results:
  - Llama 2: 49.8%
  - GPT-3.5: 52.2%
  - Mixtral: 60.7% (Significantly superior)
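For illustration, code benchmarks like MBPP score a generated solution by running it against the task's assert-based test cases. A toy sketch of that scoring loop, with a hypothetical candidate and tests (real harnesses sandbox this execution):

```python
# Toy sketch of MBPP-style scoring: the candidate solution and tests below
# are hypothetical, not taken from the benchmark.
candidate = """
def add(a, b):
    return a + b
"""
tests = ["assert add(2, 3) == 5", "assert add(-1, 1) == 0"]

namespace = {}
exec(candidate, namespace)  # define the candidate function
try:
    for test in tests:
        exec(test, namespace)  # each test raises AssertionError on failure
    passed = True
except AssertionError:
    passed = False
print("pass" if passed else "fail")
```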
### GSM-8K
- Description: A math benchmark that tests models on grade-school word problems requiring multi-step reasoning.
- Results:
  - Llama 2: 53.6%
  - GPT-3.5: 57.1%
  - Mixtral: 58.4% (Best performance)
### MT Bench
- Description: A specialized benchmark for instruction-following models, testing their ability to understand and respond to prompts accurately. Scores are on a 1-10 scale rather than a percentage.
- Results:
  - Llama 2: 6.86
  - GPT-3.5: 8.32 (Best performance)
  - Mixtral: 8.30
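Most of the multiple-choice benchmarks above (MMLU, ARC Challenge, HellaSwag, WinoGrande) are commonly scored by comparing the likelihood a model assigns to each candidate answer and picking the highest. A minimal sketch of that technique, using an illustrative question rather than real benchmark data:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("mektup-mia/Mia-LLM")
model = AutoModelForCausalLM.from_pretrained("mektup-mia/Mia-LLM")
model.eval()

def choice_logprob(prompt: str, choice: str) -> float:
    """Sum the log-probabilities the model assigns to the answer tokens."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    full_ids = tokenizer(prompt + " " + choice, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # log-probabilities over the next token at every position
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = full_ids[0, 1:]
    start = prompt_ids.shape[1] - 1  # first position that predicts an answer token
    return log_probs[start:].gather(1, targets[start:, None]).sum().item()

# Illustrative question, not from an actual benchmark
prompt = "Question: What gas do plants absorb during photosynthesis?\nAnswer:"
choices = ["carbon dioxide", "oxygen", "nitrogen", "helium"]
scores = [choice_logprob(prompt, c) for c in choices]
print(choices[scores.index(max(scores))])  # highest-likelihood choice wins
```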
## 3. General Analysis
- Mixtral demonstrated standout performance in several tests, particularly the MBPP, ARC Challenge, and GSM-8K benchmarks, where it outperformed the other models.
- GPT-3.5 showed consistent results and excelled in MT Bench and other instruction-based evaluations.
- Llama 2, while not the leader in most benchmarks, maintained competitive and stable performance across the board.
## 4. Conclusion
The benchmarks highlight the strengths and weaknesses of these models, offering insights into their suitability for specific applications:
- Programming: Mixtral is a strong candidate due to its high MBPP score.
- Instruction-based tasks: GPT-3.5 is well suited to these use cases, as demonstrated by its MT Bench results.
- General-purpose usage: Llama 2 provides a balanced and versatile option.
Resource: https://www.e2enetworks.com/blog/mistral-7b-vs-llama2-which-performs-better-and-why#:~:text=Mistral%207B%20significantly%20outperforms%20Llama2,7B%20comes%20out%20on%20top
## Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("mektup-mia/Mia-LLM")
model = AutoModelForCausalLM.from_pretrained("mektup-mia/Mia-LLM")

# Tokenize a prompt and generate a response
input_text = "Translate this text into Spanish."
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
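Because the base model is instruction-tuned with Mistral's chat format, wrapping the prompt with the tokenizer's chat template generally produces better responses. Continuing from the snippet above, and assuming the Mia-LLM tokenizer inherits the base model's chat template:

```python
# Assumes the tokenizer ships a chat template inherited from the base model
messages = [{"role": "user", "content": "Translate this text into Spanish: Good morning!"}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```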