---
license: other
datasets:
- JeanKaddour/minipile
language:
- en
tags:
- axolotl
- mergekit
- llama
---
|
|
|
Meta's Llama 3 70B pruned to 42B parameters using the methodology described in [The Unreasonable Ineffectiveness of the Deeper Layers](https://arxiv.org/abs/2403.17887). After pruning, the model was trained with QLoRA on ~100M tokens from [JeanKaddour/minipile](https://huggingface.co/datasets/JeanKaddour/minipile).
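
For reference, a minimal sketch of loading the model in 4-bit with `transformers` and `bitsandbytes`, mirroring the QLoRA precision used for the post-pruning training run. The repo ID below is a placeholder, not the actual upload path:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "your-org/llama3-42b-pruned"  # placeholder repo ID

# NF4 4-bit quantization, the same precision family used during QLoRA training.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

inputs = tokenizer("The most interesting thing about layer pruning is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```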
|
|
|
Layers to prune were selected using [PruneMe](https://github.com/arcee-ai/PruneMe).
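
Roughly, the idea is to measure how little each contiguous block of n layers changes the hidden state, then drop the block that changes it the least. A hedged sketch of that criterion (not PruneMe's actual code; the per-layer `hidden_states` would come from a forward pass with `output_hidden_states=True` over a sample of text):

```python
import torch
import torch.nn.functional as F

def angular_distance(h_in: torch.Tensor, h_out: torch.Tensor) -> float:
    """Mean angular distance between hidden states at two depths: arccos(cos sim) / pi."""
    cos = F.cosine_similarity(h_in, h_out, dim=-1).clamp(-1.0, 1.0)
    return (torch.arccos(cos) / torch.pi).mean().item()

def pick_block_to_prune(hidden_states: list[torch.Tensor], n: int) -> int:
    """Return the start index of the n-layer block whose input and output hidden
    states are most similar, i.e. the block that should be cheapest to remove."""
    distances = [
        angular_distance(hidden_states[layer], hidden_states[layer + n])
        for layer in range(len(hidden_states) - n)
    ]
    return min(range(len(distances)), key=distances.__getitem__)
```

The selected block is then cut out (a mergekit passthrough merge is one way to do that), and the QLoRA pass described above is meant to heal the damage.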
|
|
|
Still evaluating, so don't get too excited! It might be incredibly dumb. Check out these zero-shot MMLU numbers, though:
|
|
|
|
|
| Groups           |Version|Filter|n-shot|Metric|Value |   |Stderr|
|------------------|-------|------|-----:|------|-----:|---|-----:|
|mmlu              |N/A    |none  |     0|acc   |0.7319|±  |0.0034|
| - humanities     |N/A    |none  |     0|acc   |0.6582|±  |0.0063|
| - other          |N/A    |none  |     0|acc   |0.7927|±  |0.0069|
| - social_sciences|N/A    |none  |     0|acc   |0.8466|±  |0.0064|
| - stem           |N/A    |none  |     0|acc   |0.6702|±  |0.0079|
|
|
|
5-shot:
|
|
|
| Groups           |Version|Filter|n-shot|Metric|Value |   |Stderr|
|------------------|-------|------|-----:|------|-----:|---|-----:|
|mmlu              |N/A    |none  |     0|acc   |0.7669|±  |0.0034|
| - humanities     |N/A    |none  |     5|acc   |0.7296|±  |0.0062|
| - other          |N/A    |none  |     5|acc   |0.8101|±  |0.0067|
| - social_sciences|N/A    |none  |     5|acc   |0.8668|±  |0.0060|
| - stem           |N/A    |none  |     5|acc   |0.6825|±  |0.0079|
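
The tables above are in lm-evaluation-harness format; a rough sketch of reproducing them with the harness's Python API (placeholder repo ID, and result keys can differ across harness versions):

```python
import lm_eval
from lm_eval.utils import make_table

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=your-org/llama3-42b-pruned,dtype=bfloat16",  # placeholder repo ID
    tasks=["mmlu"],
    num_fewshot=5,  # use 0 to reproduce the zero-shot numbers
    batch_size=8,
)
print(make_table(results))  # prints a Groups/Value/Stderr table like the ones above
```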
|
|
|
[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)