Ostixe360 commited on
Commit
b1f4ff3
1 Parent(s): b830fa4

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +93 -165
README.md CHANGED
@@ -1,199 +1,127 @@
1
  ---
2
- library_name: transformers
3
- tags: []
 
 
 
 
 
4
  ---
5
 
6
- # Model Card for Model ID
7
 
8
- <!-- Provide a quick summary of what the model is/does. -->
9
 
 
10
 
 
11
 
12
- ## Model Details
13
 
14
- ### Model Description
 
 
 
15
 
16
- <!-- Provide a longer summary of what this model is. -->
 
 
 
 
 
17
 
18
- This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
19
 
20
- - **Developed by:** [More Information Needed]
21
- - **Funded by [optional]:** [More Information Needed]
22
- - **Shared by [optional]:** [More Information Needed]
23
- - **Model type:** [More Information Needed]
24
- - **Language(s) (NLP):** [More Information Needed]
25
- - **License:** [More Information Needed]
26
- - **Finetuned from model [optional]:** [More Information Needed]
27
 
28
- ### Model Sources [optional]
29
 
30
- <!-- Provide the basic links for the model. -->
 
 
 
31
 
32
- - **Repository:** [More Information Needed]
33
- - **Paper [optional]:** [More Information Needed]
34
- - **Demo [optional]:** [More Information Needed]
35
 
36
- ## Uses
 
37
 
38
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
39
 
40
- ### Direct Use
41
 
42
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
 
 
 
 
 
43
 
44
- [More Information Needed]
 
45
 
46
- ### Downstream Use [optional]
47
 
48
- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
 
 
 
49
 
50
- [More Information Needed]
51
 
52
- ### Out-of-Scope Use
53
 
54
- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
 
 
 
 
55
 
56
- [More Information Needed]
57
 
58
- ## Bias, Risks, and Limitations
 
 
 
 
 
 
 
59
 
60
- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
 
 
 
 
 
 
 
 
 
 
 
 
61
 
62
- [More Information Needed]
63
 
64
- ### Recommendations
 
 
65
 
66
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
 
67
 
68
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
 
 
 
 
 
 
 
 
69
 
70
- ## How to Get Started with the Model
71
 
72
- Use the code below to get started with the model.
73
 
74
- [More Information Needed]
75
-
76
- ## Training Details
77
-
78
- ### Training Data
79
-
80
- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
81
-
82
- [More Information Needed]
83
-
84
- ### Training Procedure
85
-
86
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
87
-
88
- #### Preprocessing [optional]
89
-
90
- [More Information Needed]
91
-
92
-
93
- #### Training Hyperparameters
94
-
95
- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
96
-
97
- #### Speeds, Sizes, Times [optional]
98
-
99
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
100
-
101
- [More Information Needed]
102
-
103
- ## Evaluation
104
-
105
- <!-- This section describes the evaluation protocols and provides the results. -->
106
-
107
- ### Testing Data, Factors & Metrics
108
-
109
- #### Testing Data
110
-
111
- <!-- This should link to a Dataset Card if possible. -->
112
-
113
- [More Information Needed]
114
-
115
- #### Factors
116
-
117
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
118
-
119
- [More Information Needed]
120
-
121
- #### Metrics
122
-
123
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
124
-
125
- [More Information Needed]
126
-
127
- ### Results
128
-
129
- [More Information Needed]
130
-
131
- #### Summary
132
-
133
-
134
-
135
- ## Model Examination [optional]
136
-
137
- <!-- Relevant interpretability work for the model goes here -->
138
-
139
- [More Information Needed]
140
-
141
- ## Environmental Impact
142
-
143
- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
144
-
145
- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
146
-
147
- - **Hardware Type:** [More Information Needed]
148
- - **Hours used:** [More Information Needed]
149
- - **Cloud Provider:** [More Information Needed]
150
- - **Compute Region:** [More Information Needed]
151
- - **Carbon Emitted:** [More Information Needed]
152
-
153
- ## Technical Specifications [optional]
154
-
155
- ### Model Architecture and Objective
156
-
157
- [More Information Needed]
158
-
159
- ### Compute Infrastructure
160
-
161
- [More Information Needed]
162
-
163
- #### Hardware
164
-
165
- [More Information Needed]
166
-
167
- #### Software
168
-
169
- [More Information Needed]
170
-
171
- ## Citation [optional]
172
-
173
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
174
-
175
- **BibTeX:**
176
-
177
- [More Information Needed]
178
-
179
- **APA:**
180
-
181
- [More Information Needed]
182
-
183
- ## Glossary [optional]
184
-
185
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
186
-
187
- [More Information Needed]
188
-
189
- ## More Information [optional]
190
-
191
- [More Information Needed]
192
-
193
- ## Model Card Authors [optional]
194
-
195
- [More Information Needed]
196
-
197
- ## Model Card Contact
198
-
199
- [More Information Needed]
 
1
  ---
2
+ language:
3
+ - zh
4
+ - en
5
+ tags:
6
+ - qwen
7
+ pipeline_tag: text-generation
8
+ inference: false
9
  ---
10
 
11
+ # Qwen-Audio-nf4
12
 
13
+ This is the quantized version of [Qwen-Audio](https://huggingface.co/Qwen/Qwen-Audio)
14
 
15
+ # Original Model Card:
16
 
17
+ # Qwen-Audio
18
 
19
+ <br>
20
 
21
+ <p align="center">
22
+ <img src="https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Audio/audio_logo.jpg" width="400"/>
23
+ <p>
24
+ <br>
25
 
26
+ <p align="center">
27
+ Qwen-Audio <a href="https://www.modelscope.cn/models/qwen/QWen-Audio/summary">🤖 <a> | <a href="https://huggingface.co/Qwen/Qwen-Audio">🤗</a>&nbsp | Qwen-Audio-Chat <a href="https://www.modelscope.cn/models/qwen/QWen-Audio-Chat/summary">🤖 <a>| <a href="https://huggingface.co/Qwen/Qwen-Audio-Chat">🤗</a>&nbsp | &nbsp&nbsp Demo<a href="https://modelscope.cn/studios/qwen/Qwen-Audio-Chat-Demo/summary"> 🤖</a> | <a href="https://huggingface.co/spaces/Qwen/Qwen-Audio">🤗</a>&nbsp
28
+ <br>
29
+ &nbsp&nbsp<a href="https://qwen-audio.github.io/Qwen-Audio/">Homepage</a>&nbsp | &nbsp<a href="http://arxiv.org/abs/2311.07919">Paper</a> | &nbsp<a href="https://huggingface.co/papers/2311.07919">🤗</a>
30
+ </p>
31
+ <br><br>
32
 
33
+ **Qwen-Audio** (Qwen Large Audio Language Model) is the multimodal version of the large model series, Qwen (abbr. Tongyi Qianwen), proposed by Alibaba Cloud. Qwen-Audio accepts diverse audio (human speech, natural sound, music and song) and text as inputs, outputs text. The contribution of Qwen-Audio include:
34
 
35
+ - **Fundamental audio models**: Qwen-Audio is a fundamental multi-task audio-language model that supports various tasks, languages, and audio types, serving as a universal audio understanding model. Building upon Qwen-Audio, we develop Qwen-Audio-Chat through instruction fine-tuning, enabling multi-turn dialogues and supporting diverse audio-oriented scenarios.
36
+ - **Multi-task learning framework for all types of audios**: To scale up audio-language pre-training, we address the challenge of variation in textual labels associated with different datasets by proposing a multi-task training framework, enabling knowledge sharing and avoiding one-to-many interference. Our model incorporates more than 30 tasks and extensive experiments show the model achieves strong performance.
37
+ - **Strong Performance**: Experimental results show that Qwen-Audio achieves impressive performance across diverse benchmark tasks without requiring any task-specific fine-tuning, surpassing its counterparts. Specifically, Qwen-Audio achieves state-of-the-art results on the test set of Aishell1, cochlscene, ClothoAQA, and VocalSound.
38
+ - **Flexible multi-run chat from audio and text input**: Qwen-Audio supports multiple-audio analysis, sound understading and reasoning, music appreciation, and tool usage for speech editing.
 
 
 
39
 
40
+ **Qwen-Audio** 是阿里云研发的大规模音频语言模型(Large Audio Language Model)。Qwen-Audio 可以以多种音频 (包括说话人语音、自然音、音乐、歌声)和文本作为输入,并以文本作为输出。Qwen-Audio 系列模型的特点包括:
41
 
42
+ - **音频基石模型**:Qwen-Audio是一个性能卓越的通用的音频理解模型,支持各种任务、语言和音频类型。在Qwen-Audio的基础上,我们通过指令微调开发了Qwen-Audio-Chat,支持多轮、多语言、多语言对话。Qwen-Audio和Qwen-Audio-Chat模型均已开源。
43
+ - **兼容多种复杂音频的多任务学习框架**:为了避免由于数据收集来源不同以及任务类型不同,带来的音频到文本的一对多的干扰问题,我们提出了一种多任务训练框架,实现相似任务的知识共享,并尽可能减少不同任务之间的干扰。通过提出的框架,Qwen-Audio可以容纳训练超过30多种不同的音频任务;
44
+ - **出色的性能**:Qwen-Audio在不需要任何任务特定的微调的情况下,在各种基准任务上取得了领先的结果。具体得,Qwen-Audio在Aishell1、cochlscene、ClothoAQA和VocalSound的测试集上都达到了SOTA;
45
+ - **支持多轮音频和文本对话,支持各种语音场景**:Qwen-Audio-Chat支持声音理解和推理、音乐欣赏、多音频分析、多轮音频-文本交错对话以及外部语音工具的使用(如语音编辑)。
46
 
 
 
 
47
 
48
+ We release Qwen-Audio and Qwen-Audio-Chat, which are pretrained model and Chat model respectively. For more details about Qwen-Audio, please refer to our [Github Repo](https://github.com/QwenLM/Qwen-Audio/tree/main). This repo is the one for Qwen-Audio.
49
+ <br>
50
 
51
+ 目前,我们提供了Qwen-Audio和Qwen-Audio-Chat两个模型,分别为预训练模型和Chat模型。如果想了解更多关于信息,请点击[链接](https://github.com/QwenLM/Qwen-Audio/tree/main)查看Github仓库。本仓库为Qwen-Audio仓库。
52
 
 
53
 
54
+ ## Requirements
55
+ * python 3.8 and above
56
+ * pytorch 1.12 and above, 2.0 and above are recommended
57
+ * CUDA 11.4 and above are recommended (this is for GPU users)
58
+ * FFmpeg
59
+ <br>
60
 
61
+ ## Quickstart
62
+ Below, we provide simple examples to show how to use Qwen-Audio with 🤗 Transformers.
63
 
64
+ Before running the code, make sure you have setup the environment and installed the required packages. Make sure you meet the above requirements, and then install the dependent libraries.
65
 
66
+ ```bash
67
+ pip install -r requirements.txt
68
+ ```
69
+ For more details, please refer to [tutorial](https://github.com/QwenLM/Qwen-Audio).
70
 
71
+ #### 🤗 Transformers
72
 
73
+ To use Qwen-Audio for the inference, all you need to do is to input a few lines of codes as demonstrated below. However, **please make sure that you are using the latest code.**
74
 
75
+ ```python
76
+ from transformers import AutoModelForCausalLM, AutoTokenizer
77
+ from transformers.generation import GenerationConfig
78
+ import torch
79
+ torch.manual_seed(1234)
80
 
81
+ tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-Audio", trust_remote_code=True)
82
 
83
+ # use bf16
84
+ # model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-Audio", device_map="auto", trust_remote_code=True, bf16=True).eval()
85
+ # use fp16
86
+ # model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-Audio", device_map="auto", trust_remote_code=True, fp16=True).eval()
87
+ # use cpu only
88
+ # model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-Audio", device_map="cpu", trust_remote_code=True).eval()
89
+ # use cuda device
90
+ model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-Audio", device_map="cuda", trust_remote_code=True).eval()
91
 
92
+ # Specify hyperparameters for generation (No need to do this if you are using transformers>4.32.0)
93
+ # model.generation_config = GenerationConfig.from_pretrained("Qwen/Qwen-Audio", trust_remote_code=True)
94
+ audio_url = "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Audio/1272-128104-0000.flac"
95
+ sp_prompt = "<|startoftranscript|><|en|><|transcribe|><|en|><|notimestamps|><|wo_itn|>"
96
+ query = f"<audio>{audio_url}</audio>{sp_prompt}"
97
+ audio_info = tokenizer.process_audio(query)
98
+ inputs = tokenizer(query, return_tensors='pt', audio_info=audio_info)
99
+ inputs = inputs.to(model.device)
100
+ pred = model.generate(**inputs, audio_info=audio_info)
101
+ response = tokenizer.decode(pred.cpu()[0], skip_special_tokens=False,audio_info=audio_info)
102
+ print(response)
103
+ # <audio>https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Audio/1272-128104-0000.flac</audio><|startoftranscription|><|en|><|transcribe|><|en|><|notimestamps|><|wo_itn|>mister quilting is the apostle of the middle classes and we are glad to welcome his gospel<|endoftext|>
104
+ ```
105
 
 
106
 
107
+ ## License Agreement
108
+ Researchers and developers are free to use the codes and model weights of Qwen-Audio. We also allow its commercial use. Check our license at [LICENSE](https://github.com/QwenLM/Qwen-Audio/blob/main/LICENSE.txt) for more details.
109
+ <br>
110
 
111
+ ## Citation
112
+ If you find our paper and code useful in your research, please consider giving a star and citation
113
 
114
+ ```BibTeX
115
+ @article{Qwen-Audio,
116
+ title={Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models},
117
+ author={Chu, Yunfei and Xu, Jin and Zhou, Xiaohuan and Yang, Qian and Zhang, Shiliang and Yan, Zhijie and Zhou, Chang and Zhou, Jingren},
118
+ journal={arXiv preprint arXiv:2311.07919},
119
+ year={2023}
120
+ }
121
+ ```
122
+ <br>
123
 
124
+ ## Contact Us
125
 
126
+ If you are interested to leave a message to either our research team or product team, feel free to send an email to qianwen_opensource@alibabacloud.com.
127