---
license: mit
base_model:
- Goekdeniz-Guelmez/JosiexHelium-v6-2B-mlx-Base
pipeline_tag: text-generation
library_name: mlx
---
# Goekdeniz-Guelmez/Josie-v6-2b-mlx-concept
## Overview
This is a crude proof of concept (PoC) demonstrating the feasibility of fine-tuning a large language model (LLM) on Apple Silicon using the MLX-LM framework. The goal is to explore the capabilities of Apple’s hardware for local LLM training and fine-tuning workflows.
## Model and Training Details
- **Base Model:** `mlx-community/helium-1-preview-2b`
- **Fine-Tuned Model:** `J.O.S.I.E.v6-2b`
- **Context length:** 4098
- **Training tokens:** approx. 1T
- **Created by:** Gökdeniz Gülmez
- **Fine-Tune Dataset:** Offline private dataset
- **DPO/ORPO Dataset:** Offline private dataset
- **Prompt Template** (a usage sketch follows after this list):
```text
<|im_start|>system
You are Josie my private, super-intelligent assistant.<|im_end|>
<|im_start|>Gökdeniz Gülmez
{{ .PROMPT }}<|im_end|>
<|im_start|>Josie
{{ .RESPONSE }}<|im_end|>
```
- **Training Process** (a hedged training sketch also follows below):
  - First **10K steps** trained with **LoRA** (Low-Rank Adaptation) on **22 selected layers**.
  - Next **1K steps** trained with **full-weight training**.
  - Final **4K steps** of **ORPO** training with **DoRA** on **22 selected layers**.
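For reference, here is a minimal inference sketch with mlx-lm that applies the prompt template above to this repository's model. The user message is only an illustrative placeholder, and `generate`'s keyword arguments may vary slightly across mlx-lm versions.

```python
from mlx_lm import load, generate

# Load the fine-tuned model from this repository.
model, tokenizer = load("Goekdeniz-Guelmez/Josie-v6-2b-mlx-concept")

# Fill in the prompt template shown above.
system = "You are Josie my private, super-intelligent assistant."
user_message = "Explain what MLX is in one sentence."  # illustrative placeholder
prompt = (
    f"<|im_start|>system\n{system}<|im_end|>\n"
    f"<|im_start|>Gökdeniz Gülmez\n{user_message}<|im_end|>\n"
    f"<|im_start|>Josie\n"
)

response = generate(model, tokenizer, prompt=prompt, max_tokens=256)
print(response)
```

The training stages can be sketched as invocations of the `mlx_lm.lora` entry point. This is a hedged outline only: the flag names follow recent mlx-lm releases (`--fine-tune-type`, `--num-layers`) and may differ in the exact version or fork used for this run, and the dataset path is a placeholder because the data is private. The ORPO stage requires the fork linked in the "ORPO Training" section, so its fork-specific flags are not shown.

```python
import subprocess
import sys

def run_stage(extra_args):
    """Run one fine-tuning stage through the mlx_lm.lora CLI."""
    base = [
        sys.executable, "-m", "mlx_lm.lora",
        "--model", "mlx-community/helium-1-preview-2b",
        "--data", "path/to/private_dataset",  # placeholder: the real data is private
        "--train",
    ]
    subprocess.run(base + extra_args, check=True)

# Stage 1: 10K steps of LoRA on 22 layers.
run_stage(["--fine-tune-type", "lora", "--num-layers", "22", "--iters", "10000"])

# Stage 2: 1K steps of full-weight training.
run_stage(["--fine-tune-type", "full", "--iters", "1000"])

# Stage 3: 4K steps of ORPO with DoRA on 22 layers. ORPO is only available in
# the fork linked under "ORPO Training" below; its fork-specific flags are omitted here.
```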
## Hardware Used
- **Device:** Apple Mac Mini M4 (32GB RAM)
- **Framework:** Apple MLX-LM
## Quantisations
- [MLX 4 Bit](https://huggingface.co/Goekdeniz-Guelmez/Josie-v6-2b-mlx-concept-4bit)
- [MLX 6 Bit](https://huggingface.co/Goekdeniz-Guelmez/Josie-v6-2b-mlx-concept-6bit)
- [MLX 8 Bit](https://huggingface.co/Goekdeniz-Guelmez/Josie-v6-2b-mlx-concept-8bit)
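The quantised variants linked above can be reproduced with mlx-lm's `convert` helper. This is a sketch only; the output folder names and the group size are assumptions, not necessarily the exact settings used for these uploads.

```python
from mlx_lm import convert

# Produce 4-, 6-, and 8-bit quantised copies of the full-precision model.
for bits in (4, 6, 8):
    convert(
        hf_path="Goekdeniz-Guelmez/Josie-v6-2b-mlx-concept",
        mlx_path=f"Josie-v6-2b-mlx-concept-{bits}bit",  # assumed local output folder
        quantize=True,
        q_bits=bits,
        q_group_size=64,  # assumption: mlx-lm's default group size
    )
```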
## Notes & Limitations
- This is an experimental setup; performance and efficiency optimizations are ongoing.
- Dataset details remain private and are not included in this repository.
- The training process may require significant memory and computational resources despite optimizations.
- Further work is needed to explore distributed training and mixed-precision techniques for better performance on Apple Silicon.
## ORPO Training
ORPO training is not yet available in the official `mlx-examples` repository. To use it, you will need to clone and work from my fork:
[https://github.com/Goekdeniz-Guelmez/mlx-examples.git](https://github.com/Goekdeniz-Guelmez/mlx-examples.git)
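A minimal setup sketch, assuming the fork keeps the upstream `mlx-examples` layout in which the installable `mlx_lm` package lives under `llms/`:

```python
import subprocess
import sys

# Clone the fork that carries the ORPO trainer.
subprocess.run(
    ["git", "clone", "https://github.com/Goekdeniz-Guelmez/mlx-examples.git"],
    check=True,
)
# Install its mlx_lm package in editable mode (assumes the upstream llms/ layout).
subprocess.run(
    [sys.executable, "-m", "pip", "install", "-e", "mlx-examples/llms"],
    check=True,
)
```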
## Future Improvements
- Experiment with additional quantization techniques to reduce memory usage.
- Investigate performance scaling across multiple Apple Silicon devices.
- Optimize training pipelines for better convergence and efficiency.
## Community Feedback
I would love to hear from the MLX community! Should I publish a tutorial on how to fine-tune LLMs on Apple Silicon? If so, would you prefer it in text or video format? Let me know!
## Disclaimer
This project is strictly for research and experimental purposes. The fine-tuned model is not intended for production use at this stage.
Best,
Gökdeniz Gülmez