MermaidLLama: Unleashing the Power of 8 Billion Parameters
Introducing MermaidLLama, a robust language model designed for understanding Python code and crafting captivating story flow maps. With 8.3 billion parameters, this model builds on the success of LLaMA-PRO-Instruct and retains its versatility in programming, mathematical reasoning, and general language processing.
Key Features:
Code Understanding:
- Masters Python intricacies with finesse.
- Generates clear and accurate Mermaid Diagram Flow Charts.
- Ideal for developers seeking visual representations of their code's logic.
Storytelling Capabilities:
- Converts narrative inputs into captivating Mermaid Diagrams.
- Maps character interactions, plot developments, and narrative arcs effortlessly.
Unmatched Performance:
- Surpasses Mistral and even larger models like GPT-4 in generating well-organized and detailed Mermaid Diagrams for story flows.
Training Insights:
- Trained on a diverse dataset, including 478 Python examples.
- Exhibits emergent properties in story-to-flow map translations.
- Adaptable and efficient in resource utilization.
Collaboration:
MermaidLLama is open to collaboration to further enhance its capabilities. The Alpaca-formatted dataset provides a unique foundation for understanding Python intricacies. If you're interested in contributing or collaborating, feel free to reach out to [email protected]. Your expertise could play a pivotal role in refining MermaidLLama.
Example Use Cases:
Code Documentation: Developers can use MermaidLLama to automatically generate visual flow charts from their Python code, aiding in documentation and code understanding.
Storyboarding: Storytellers and writers can input their narrative and receive visually appealing Mermaid Diagrams, offering a structured overview of character interactions and plot progression.
Project Planning: Project managers can leverage MermaidLLama to create visual project flow maps, facilitating effective communication and planning among team members.
Learning Python: Students and beginners can use MermaidLLama to visually understand Python code structures, enhancing their learning experience.
Game Design: Game developers can utilize MermaidLLama for visualizing game storylines, ensuring a coherent narrative structure and character development.
Proof of Concept:
MermaidLLama proves that innovation thrives in compact packages, delivering exceptional performance across diverse applications. Stay tuned for the release of the VSCode Extension that displays the Live Flow Map every time a user stops typing for more than 10 seconds.
For best results, use full precision with one of the three different instruction types:
- "instruction": "Create the mermaid diagram for the following code:"
- "instruction": "Create the mermaid diagram for the following story:"
- "instruction": "Create the mermaid diagram for the following:"
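As a minimal sketch, the three instruction strings above can be wrapped in an Alpaca-style prompt. The template text below is the commonly used Alpaca layout and is an assumption: the model card only states that the dataset is Alpaca-formatted, not the exact wording. The names `ALPACA_TEMPLATE`, `INSTRUCTIONS`, and `build_prompt` are hypothetical helpers for illustration.

```python
# Assumed Alpaca-style template; the exact wording used in training is not
# given in the model card.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)

# The three instruction strings listed in the model card.
INSTRUCTIONS = {
    "code": "Create the mermaid diagram for the following code:",
    "story": "Create the mermaid diagram for the following story:",
    "general": "Create the mermaid diagram for the following:",
}


def build_prompt(kind: str, text: str) -> str:
    """Return an Alpaca-style prompt for one of the three instruction types."""
    if kind not in INSTRUCTIONS:
        raise ValueError(f"unknown instruction type: {kind!r}")
    return ALPACA_TEMPLATE.format(instruction=INSTRUCTIONS[kind], input=text.strip())


if __name__ == "__main__":
    print(build_prompt("code", "def add(a, b):\n    return a + b"))
```

The resulting string can then be passed to whichever inference stack loads the model; the generated text after `### Response:` is expected to be the Mermaid diagram.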
Exciting times ahead as we delve into the MermaidLLama revolution!
LoRA Rank: Also called dimension count. Higher values mean a larger file and more content control; smaller values mean a smaller file and less control. Use 4 or 8 for style, 128 or 256 to teach, and 1024+ for fine detail on big data. More VRAM is needed for higher ranks.
- 2048
LoRA Alpha: This value divided by the rank becomes the scaling of the LoRA. Higher means stronger. A good standard value is twice your rank.
- 4096
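To make the rank/alpha relationship concrete, here is a small sketch of the scaling factor described above (alpha divided by rank), evaluated for the values listed in this card. The function name `lora_scaling` is a hypothetical helper, not part of any library.

```python
def lora_scaling(alpha: float, rank: int) -> float:
    """Scaling factor applied to the LoRA update: alpha / rank."""
    if rank <= 0:
        raise ValueError("rank must be positive")
    return alpha / rank


# Values from this model card: rank 2048, alpha 4096.
scale = lora_scaling(alpha=4096, rank=2048)
print(scale)  # 2.0
```

With alpha set to twice the rank, the scaling works out to 2, which matches the "twice your rank" rule of thumb.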
Batch Size: Global batch size. The two batch sizes together determine gradient accumulation (gradientAccum = batch / microBatch). Higher gradient accumulation values lead to better quality training.
- 1
Micro Batch Size: Per-device batch size (NOTE: multiple devices not yet implemented). Increasing this will increase VRAM usage.
- 1
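The gradient accumulation formula quoted above (gradientAccum = batch / microBatch) can be sketched as follows. Requiring the global batch size to be an exact multiple of the micro batch size is an assumption made here for clarity; the helper name is hypothetical.

```python
def gradient_accumulation_steps(batch_size: int, micro_batch_size: int) -> int:
    """Number of micro-batches accumulated before each optimizer step."""
    if micro_batch_size <= 0:
        raise ValueError("micro_batch_size must be positive")
    if batch_size % micro_batch_size != 0:
        # Assumption: the global batch must divide evenly into micro-batches.
        raise ValueError("batch_size must be a multiple of micro_batch_size")
    return batch_size // micro_batch_size


# Values from this model card: batch size 1, micro batch size 1.
print(gradient_accumulation_steps(1, 1))  # 1
```

With both sizes set to 1, every example triggers its own optimizer step and no gradients are accumulated.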
Cutoff Length: Cutoff length for text input; essentially, how long a line of text to feed in at a time. Higher values require drastically more VRAM.
- 4096
Save every n steps: If above 0, a checkpoint of the LoRA will be saved every time this many steps pass.
- 1000
Epochs: Number of times every entry in the dataset should be fed into training. So 1 means feed each item in once, 5 means feed it in five times, etc.
- 3
Learning Rate: In scientific notation.
- 1e-6
LR Scheduler: Learning rate scheduler, which defines how the learning rate changes over time. "Constant" means never change, "linear" means go in a straight line from the learning rate down to 0, "cosine" follows a curve, etc.
- cosine
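For intuition, a cosine schedule can be sketched as a half-cosine decay from the base learning rate down to 0. This is a simplified illustration, assuming no warmup and a minimum learning rate of 0; the scheduler actually used in training may differ in those details. The function name `cosine_lr` is hypothetical.

```python
import math


def cosine_lr(step: int, total_steps: int, base_lr: float = 1e-6) -> float:
    """Half-cosine decay from base_lr (step 0) to 0 (step == total_steps)."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp progress to [0, 1] so out-of-range steps stay well defined.
    progress = min(max(step / total_steps, 0.0), 1.0)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for step in (0, 250, 500, 750, 1000):
        print(step, cosine_lr(step, total_steps=1000))
```

The learning rate starts at the configured 1e-6, falls slowly at first, drops fastest around the midpoint, and flattens out as it approaches 0.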
Target Modules: Selects which modules to target in training. Targeting more modules is closer to a full fine-tune at the cost of increased VRAM requirements and adapter size. NOTE: Only works for model_id='llama', other types will retain default training behavior and not use these settings.
- Enable q_proj
- Enable v_proj
- Enable k_proj
- Enable o_proj
- Enable gate_proj
- Enable down_proj
- Enable up_proj
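The settings listed above can be gathered into one place as plain Python data. This is only a summary sketch: the key names `r`, `lora_alpha`, and `target_modules` follow the argument names of PEFT's `LoraConfig`, which is an assumption about tooling, since the card does not say which library was used. The remaining key names are hypothetical labels.

```python
# All seven projection modules are enabled in this card.
TARGET_MODULES = [
    "q_proj", "v_proj", "k_proj", "o_proj",
    "gate_proj", "down_proj", "up_proj",
]

# LoRA adapter settings (key names borrowed from PEFT's LoraConfig; assumed).
lora_settings = {
    "r": 2048,            # LoRA Rank
    "lora_alpha": 4096,   # LoRA Alpha
    "target_modules": TARGET_MODULES,
}

# Training-loop settings (hypothetical key names, values from this card).
training_settings = {
    "batch_size": 1,
    "micro_batch_size": 1,
    "cutoff_length": 4096,
    "save_every_n_steps": 1000,
    "epochs": 3,
    "learning_rate": 1e-6,
    "lr_scheduler": "cosine",
}

if __name__ == "__main__":
    print(lora_settings)
    print(training_settings)
```

Targeting both the attention projections and the MLP projections at rank 2048 puts this run close to a full fine-tune, which is consistent with the note above about VRAM and adapter size.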