Fancy-MLLM committed (verified)
Commit a9f0c1e · 1 Parent(s): b9e8de1

Update README.md


![Training loss curve](https://cdn-uploads.huggingface.co/production/uploads/65af78bb3e82498d4c65ed2a/8BNyo-v68aFvab2kXxtt1.png)

Files changed (1)
  1. README.md +5 -5
README.md CHANGED
@@ -9,7 +9,7 @@ pipeline_tag: image-text-to-text
 
 ## Model Overview
 
- This is a multimodal large language model fine-tuned from Qwen2.5-VL on the **R1-Onevision** dataset. The model enhances vision-language understanding and reasoning capabilities, making it suitable for various tasks such as visual reasoning, image understanding. With its robust ability to perform multimodal reasoning, R1-OneVision emerges as a powerful AI assistant capable of addressing a wide range of problem-solving challenges across different domains.
+ This is a multimodal large language model fine-tuned from Qwen2.5-VL on the **R1-Onevision** dataset. The model enhances vision-language understanding and reasoning capabilities, making it suitable for various tasks such as visual reasoning, image understanding. With its robust ability to perform multimodal reasoning, R1-Onevision emerges as a powerful AI assistant capable of addressing a wide range of problem-solving challenges across different domains.
 
 ## Training Configuration and Curve
 - Framework: The training process uses the open-source **LLama-Factory** library, with **Qwen2.5-VL-Instruct** as the base model. This model comes in three variants: 3B, 7B, and 32B.
@@ -29,6 +29,9 @@ bf16: true
 flash_attn: fa2
 ```
 
+ Training loss curve:
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/65af78bb3e82498d4c65ed2a/8BNyo-v68aFvab2kXxtt1.png"/>
+
 ## Usage
 
 You can load the model using the Hugging Face `transformers` library:
@@ -37,7 +40,7 @@ You can load the model using the Hugging Face `transformers` library:
 from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
 import torch
 
- MODEL_ID = "Fancy-MLLM/R1-OneVision-7B"
+ MODEL_ID = "Fancy-MLLM/R1-Onevision-7B"
 processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
 model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
 MODEL_ID,
@@ -79,9 +82,6 @@ output_text = processor.batch_decode(
 print(output_text)
 ```
 
- Training loss curve:
- <img src=""/>
-
 ## Ongoing Work
 1. **Rule-Based Reinforcement Learning (RL)**
 
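
The second hunk only exposes the tail of the README's LLama-Factory configuration block (`bf16: true` and `flash_attn: fa2`). For orientation, a minimal full-parameter SFT config in LLaMA-Factory's YAML style might look like the sketch below; apart from those two quoted lines, the dataset name, paths, and every hyperparameter are placeholders, not the values actually used to train R1-Onevision.

```yaml
### model (base checkpoint named in the README; the 3B/32B variants swap this path)
model_name_or_path: Qwen/Qwen2.5-VL-7B-Instruct

### method
stage: sft
do_train: true
finetuning_type: full

### dataset (placeholder name; the README fine-tunes on the R1-Onevision dataset)
dataset: r1_onevision
template: qwen2_vl
cutoff_len: 4096

### training (placeholder hyperparameters, except the two lines visible in the diff)
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-5
num_train_epochs: 1.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
flash_attn: fa2

### output (placeholder paths and intervals)
output_dir: saves/r1-onevision-7b
logging_steps: 10
save_steps: 500
```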
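The usage hunks show only fragments of the README's inference snippet (the imports, the updated `MODEL_ID`, a truncated `from_pretrained` call, and the final `batch_decode`/`print`). A self-contained sketch of the same flow is given below; the local image path, prompt, and generation settings are illustrative assumptions rather than values from the model card.

```python
from PIL import Image
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
import torch

# Repository ID as renamed in this commit.
MODEL_ID = "Fancy-MLLM/R1-Onevision-7B"

processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Single-turn chat message with one image placeholder and one question.
image = Image.open("example.jpg")  # hypothetical local image path
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe the image and reason step by step."},
        ],
    }
]

# Render the chat template, then tokenize the text and image together.
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[image], padding=True, return_tensors="pt").to(model.device)

# Generate, strip the prompt tokens, and decode only the new tokens.
generated_ids = model.generate(**inputs, max_new_tokens=512)
trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, generated_ids)]
output_text = processor.batch_decode(
    trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)
```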