Triangle104
/

Marco-o1-Q4_K_M-GGUF

Transformers

GGUF

llama-cpp

gguf-my-repo

conversational

Model card Files Files and versions Community

Triangle104 commited on 26 days ago

Commit

ebebbe8

•

1 Parent(s): c2dc597

Update README.md

Browse files

Files changed (1) hide show

README.md +245 -0

README.md CHANGED Viewed

@@ -12,6 +12,251 @@ tags:
 This model was converted to GGUF format from [`AIDC-AI/Marco-o1`](https://huggingface.co/AIDC-AI/Marco-o1) using llama.cpp via the ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
 Refer to the [original model card](https://huggingface.co/AIDC-AI/Marco-o1) for more details on the model.
 ## Use with llama.cpp
 Install llama.cpp through brew (works on Mac and Linux)

 This model was converted to GGUF format from [`AIDC-AI/Marco-o1`](https://huggingface.co/AIDC-AI/Marco-o1) using llama.cpp via the ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
 Refer to the [original model card](https://huggingface.co/AIDC-AI/Marco-o1) for more details on the model.
+---
+Model details:
+-
+Marco-o1 not only focuses on disciplines with
+standard answers, such as mathematics, physics, and coding—which are
+well-suited for reinforcement learning (RL)—but also places greater
+emphasis on open-ended resolutions. We aim to address the question: "Can
+ the o1 model effectively generalize to broader domains where clear
+standards are absent and rewards are challenging to quantify?"
+Currently, Marco-o1 Large Language Model (LLM) is powered by Chain-of-Thought (CoT) fine-tuning, Monte Carlo Tree Search (MCTS), reflection mechanisms, and _innovative reasoning strategies_—optimized for complex real-world problem-solving tasks.
+⚠️ Limitations: We would like to emphasize that
+ this research work is inspired by OpenAI's o1 (from which the name is
+also derived). This work aims to explore potential approaches to shed
+light on the currently unclear technical roadmap for large reasoning
+models. Besides, our focus is on open-ended questions, and we have
+observed interesting phenomena in multilingual applications. However, we
+ must acknowledge that the current model primarily exhibits o1-like
+reasoning characteristics and its performance still fall short of a
+fully realized "o1" model. This is not a one-time effort, and we remain
+committed to continuous optimization and ongoing improvement.
+		🚀 Highlights
+Currently, our work is distinguished by the following highlights:
+🍀 Fine-Tuning with CoT Data: We develop Marco-o1-CoT by performing
+full-parameter fine-tuning on the base model using open-source CoT
+dataset combined with our self-developed synthetic data.
+🍀 Solution Space Expansion via MCTS: We integrate LLMs with MCTS
+(Marco-o1-MCTS), using the model's output confidence to guide the search
+ and expand the solution space.
+🍀 Reasoning Action Strategy: We implement novel reasoning action
+strategies and a reflection mechanism (Marco-o1-MCTS Mini-Step),
+including exploring different action granularities within the MCTS
+framework and prompting the model to self-reflect, thereby significantly
+ enhancing the model's ability to solve complex problems.
+🍀 Application in Translation Tasks: We are the first to apply Large
+ Reasoning Models (LRM) to Machine Translation task, exploring inference
+ time scaling laws in the multilingual and translation domain.
+OpenAI recently introduced the groundbreaking o1 model, renowned for
+its exceptional reasoning capabilities. This model has demonstrated
+outstanding performance on platforms such as AIME, CodeForces,
+surpassing other leading models. Inspired by this success, we aimed to
+push the boundaries of LLMs even further, enhancing their reasoning
+abilities to tackle complex, real-world challenges.
+🌍 Marco-o1 leverages advanced techniques like CoT fine-tuning, MCTS,
+ and Reasoning Action Strategies to enhance its reasoning power. As
+shown in Figure 2, by fine-tuning Qwen2-7B-Instruct with a combination
+of the filtered Open-O1 CoT dataset, Marco-o1 CoT dataset, and Marco-o1
+Instruction dataset, Marco-o1 improved its handling of complex tasks.
+MCTS allows exploration of multiple reasoning paths using confidence
+scores derived from softmax-applied log probabilities of the top-k
+alternative tokens, guiding the model to optimal solutions. Moreover,
+our reasoning action strategy involves varying the granularity of
+actions within steps and mini-steps to optimize search efficiency and
+accuracy.
+Figure 2: The overview of Marco-o1.
+🌏 As shown in Figure 3, Marco-o1 achieved accuracy improvements of
++6.17% on the MGSM (English) dataset and +5.60% on the MGSM (Chinese)
+dataset, showcasing enhanced reasoning capabilities.
+Figure 3: The main results of Marco-o1.
+🌎 Additionally, in translation tasks, we demonstrate that Marco-o1
+excels in translating slang expressions, such as translating "这个鞋拥有踩屎感"
+(literal translation: "This shoe offers a stepping-on-poop sensation.")
+to "This shoe has a comfortable sole," demonstrating its superior grasp
+of colloquial nuances.
+Figure 4: The demostration of translation task using Marco-o1.
+For more information,please visit our Github.
+		Usage
+Load Marco-o1-CoT model:
+# Load model directly
+from transformers import AutoTokenizer, AutoModelForCausalLM
+tokenizer = AutoTokenizer.from_pretrained("AIDC-AI/Marco-o1")
+model = AutoModelForCausalLM.from_pretrained("AIDC-AI/Marco-o1")
+Inference:
+ Execute the inference script (you can give any customized inputs inside):
+./src/talk_with_model.py
+# Use vLLM
+./src/talk_with_model_vllm.py
+		👨🏻‍💻 Acknowledgement
+		Main Contributors
+From MarcoPolo Team, AI Business, Alibaba International Digital Commerce:
+Yu Zhao
+Huifeng Yin
+Hao Wang
+Longyue Wang
+		Citation
+If you find Marco-o1 useful for your research and applications, please cite:
+@misc{zhao2024marcoo1openreasoningmodels,
+      title={Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions},
+      author={Yu Zhao and Huifeng Yin and Bo Zeng and Hao Wang and Tianqi Shi and Chenyang Lyu and Longyue Wang and Weihua Luo and Kaifu Zhang},
+      year={2024},
+      eprint={2411.14405},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL},
+      url={https://arxiv.org/abs/2411.14405},
+}
+		LICENSE
+This project is licensed under Apache License Version 2 (SPDX-License-identifier: Apache-2.0).
+		DISCLAIMER
+We used compliance checking algorithms during the training process,
+to ensure the compliance of the trained model and dataset to the best of
+ our ability. Due to complex data and the diversity of language model
+usage scenarios, we cannot guarantee that the model is completely free
+of copyright issues or improper content. If you believe anything
+infringes on your rights or generates improper content, please contact
+us, and we will promptly address the matter.
+---
 ## Use with llama.cpp
 Install llama.cpp through brew (works on Mac and Linux)