---
license: other
license_name: qwen
license_link: https://huggingface.co/Qwen/Qwen2.5-72B-Instruct/blob/main/LICENSE
datasets:
- rubenroy/GammaCorpus-v2-5m
- rubenroy/GammaCorpus-CoT-Math-170k
- rubenroy/GammaCorpus-Fact-QA-450k
language:
- en
base_model:
- Qwen/Qwen2.5-72B-Instruct
pipeline_tag: text-generation
tags:
- qwen2
- chat
- conversational
- gilgamesh
- gammacorpus
library_name: transformers
---

# 🔥 Gilgamesh 72B 🔥

> [!NOTE]
> Gilgamesh (GGM) 72B is a finetune of Alibaba's **Qwen 2.5 72B Instruct** model.

![Gilgamesh AI Art](https://cdn.ruben-roy.com/AI/Gilgamesh/img/art.png)

## Model Details
- **Developed by:** [Ruben Roy](https://huggingface.co/rubenroy)
- **Funded by:** [The Ovantage Society](https://huggingface.co/Ovantage)
- **License:** Qwen
- **Base Model:** [Qwen/Qwen2.5-72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct)
- **Type:** Causal Language Model
- **Architecture:** Transformer with RoPE, SwiGLU, RMSNorm, and Attention QKV bias
- **Number of Parameters:** 72.7B
- **Number of Parameters (Non-Embedding):** 70.0B
- **Number of Layers:** 80
- **Number of Attention Heads (GQA):** 64 for Q and 8 for KV

> [!IMPORTANT]
> Qwen is licensed under the Qwen LICENSE AGREEMENT, Copyright (c) Alibaba Cloud. All Rights Reserved.

## Datasets used
Gilgamesh 72B was trained on a mixture of specialised datasets designed for factual accuracy, mathematical capability, and reasoning. The datasets used include:

- **[GammaCorpus-v2-5m](https://huggingface.co/datasets/rubenroy/GammaCorpus-v2-5m)**: A large 5-million-line general-purpose dataset covering a wide range of topics, used to broaden knowledge and conversational ability.
- **[GammaCorpus-CoT-Math-170k](https://huggingface.co/datasets/rubenroy/GammaCorpus-CoT-Math-170k)**: A dataset focused on Chain-of-Thought (CoT) mathematical reasoning, built to improve the model's step-by-step problem solving.
- **[GammaCorpus-Fact-QA-450k](https://huggingface.co/datasets/rubenroy/GammaCorpus-Fact-QA-450k)**: A dataset of factual question-answer pairs that reinforces important current knowledge.

These datasets were all built and curated by me; however, I thank my fellow team members at [Ovantage Labs](https://huggingface.co/Ovantage) for assisting in their creation and curation.

## Usage
You can test out Gilgamesh 72B with the example below using the Transformers library:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "rubenroy/Gilgamesh-72B"

# device_map="auto" shards the weights across all available GPUs.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "What are some largely unsolved questions in philosophy that still affect our lives today?"
messages = [
    {"role": "user", "content": prompt}
]

# Format the conversation with the Qwen chat template.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=2048
)
# Strip the prompt tokens so only the newly generated reply is decoded.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
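In bf16/fp16, the 72.7B-parameter weights alone take roughly 145 GB of accelerator memory (about 2 bytes per parameter), so the example above generally requires multiple large GPUs. If you want to experiment on a single high-memory GPU, the sketch below shows one possible approach using 4-bit NF4 quantisation via `bitsandbytes`. This is an untested illustration on my part, not an officially supported configuration; it assumes the `bitsandbytes` and `accelerate` packages are installed, and quantisation may reduce output quality somewhat.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "rubenroy/Gilgamesh-72B"

# Untested sketch: 4-bit NF4 quantisation shrinks the weights to roughly
# 40 GB (~0.5 bytes per parameter plus quantisation constants).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```

Generation then works exactly as in the full-precision example above.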
## License
This model follows the Qwen License Agreement by Alibaba Cloud. See the [LICENSE file](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct/blob/main/LICENSE) for more information.

## Special Thanks
A huge thanks to my fellow team members at [Ovantage Labs](https://huggingface.co/Ovantage) for providing the H100s that made this training possible.