FalconLLM committed
Commit f82eaf8
1 Parent(s): 6972514

Update for Falcon-180B release

Files changed (1):
  1. README.md +9 -7
README.md CHANGED
@@ -9,29 +9,31 @@ pinned: false
 
**Do you believe in a better tomorrow? We do. Our team of expert researchers live the dream and work to build it every day.**
 
+ **🔥 [Falcon-180B](https://huggingface.co/tiiuae/falcon-180b) is now available in open-access! [Try it now in our chat demo!](https://huggingface.co/spaces/tiiuae/falcon-180b-demo)**
 
# News
 
- * 💥 **TII has open-sourced Falcon LLM for research and commercial utilization!** Access the [7B](https://huggingface.co/tiiuae/falcon-7b)/[40B](https://huggingface.co/tiiuae/falcon-40b) models, and explore our high-quality web dataset, [RefinedWeb](https://huggingface.co/datasets/tiiuae/falcon-refinedweb).
+ * 💥 **TII has open-sourced Falcon-180B for research and commercial utilization!** Access the [180B](https://huggingface.co/tiiuae/falcon-180b), as well as the [7B](https://huggingface.co/tiiuae/falcon-7b)/[40B](https://huggingface.co/tiiuae/falcon-40b) models, and explore our high-quality web dataset, [RefinedWeb](https://huggingface.co/datasets/tiiuae/falcon-refinedweb).
* ✨ **Falcon-[40B](https://huggingface.co/tiiuae/falcon-40b)/[7B](https://huggingface.co/tiiuae/falcon-7b) are now available under the Apache 2.0 license**; TII has [waived all royalties and commercial usage restrictions](https://www.tii.ae/news/uaes-falcon-40b-worlds-top-ranked-ai-model-technology-innovation-institute-now-royalty-free).
- * 🤗 TII is calling for proposals from the global research community and SME entrepreneurs to submit use cases for Falcon LLM; learn more about it on the [Falcon LLM website](https://falconllm.tii.ae).
-
 
# Falcon LLM
 
Falcon LLM is TII's flagship series of large language models, built from scratch using a custom data pipeline and distributed training library. Papers coming soon 😊.
 
To promote collaboration and drive innovation, we have open-sourced a number of artefacts:
- * The **Falcon-7/40B** pretrained and instruct models, under the Apache 2.0 software license. Falcon-7B/40B models are state-of-the-art for their size, outperforming most other models on NLP benchmarks.
- * The **RefinedWeb** dataset, a massive web dataset with stringent filtering and large-scale deduplication, enabling models trained on web data alone to match or outperform models trained on curated corpora. See 📓 [the paper](https://arxiv.org/abs/2306.01116) for more information. RefinedWeb is licensed under Apache 2.0.
+ * The **Falcon-180B** pretrained and chat models, under the [Falcon-180B TII license](https://huggingface.co/spaces/tiiuae/falcon-180b-license/blob/main/LICENSE.txt). Falcon-180B is the largest and most powerful open-access model available.
+ * The **Falcon-7/40B** pretrained and instruct models, under the Apache 2.0 software license. Falcon-7B/40B models are state-of-the-art for their size, outperforming other open-source models on NLP benchmarks.
+ * The **RefinedWeb** dataset, a massive web dataset with stringent filtering and large-scale deduplication, enabling models trained on web data alone to match or outperform models trained on curated corpora. See 📓 [the paper](https://arxiv.org/abs/2306.01116) for more information. RefinedWeb is licensed under ODC-By 1.0.
 
See below for a detailed list of artefacts in the Falcon LLM family:
 
| **Artefact** | **Link** | **Type** | **Details** |
|---------------------|------------------------------------------------------------------|--------------------------|-------------------------------------------------------------------|
- | 🥇 **Falcon-40B** | [Here](https://huggingface.co/tiiuae/falcon-40b) | *pretrained model* | 40B parameters trained on 1,000 billion tokens. |
+ | 🥇 **Falcon-180B** | [Here](https://huggingface.co/tiiuae/falcon-180b) | *pretrained model* | 180B parameters trained on 3,500 billion tokens. |
+ | Falcon-180B-Chat | [Here](https://huggingface.co/tiiuae/falcon-180b-chat) | *chat model* | Falcon-180B finetuned on a mixture of [Ultrachat](https://huggingface.co/datasets/stingning/ultrachat), [Platypus](https://huggingface.co/datasets/garage-bAInd/Open-Platypus), and [Airoboros](https://huggingface.co/datasets/jondurbin/airoboros-2.1). |
+ | 🥈 **Falcon-40B** | [Here](https://huggingface.co/tiiuae/falcon-40b) | *pretrained model* | 40B parameters trained on 1,000 billion tokens. |
| Falcon-40B-Instruct | [Here](https://huggingface.co/tiiuae/falcon-40b-instruct) | *instruction/chat model* | Falcon-40B finetuned on the [Baize](https://github.com/project-baize/baize-chatbot) dataset. |
- | 🥈 **Falcon-7B** | [Here](https://huggingface.co/tiiuae/falcon-7b) | *pretrained model* | 6.7B parameters trained on 1,500 billion tokens. |
+ | 🥉 **Falcon-7B** | [Here](https://huggingface.co/tiiuae/falcon-7b) | *pretrained model* | 6.7B parameters trained on 1,500 billion tokens. |
| Falcon-7B-Instruct | [Here](https://huggingface.co/tiiuae/falcon-7b-instruct) | *instruction/chat model* | Falcon-7B finetuned on the [Baize](https://github.com/project-baize/baize-chatbot), [GPT4All](https://github.com/nomic-ai/gpt4all), and [GPTeacher](https://github.com/teknium1/GPTeacher) datasets. |
| 📤 **RefinedWeb** | [Here](https://huggingface.co/datasets/tiiuae/falcon-refinedweb) | *pretraining web dataset* | ~600 billion "high-quality" tokens. |
| Falcon-RW-1B | [Here](https://huggingface.co/tiiuae/falcon-rw-1b) | *pretrained model* | 1.3B parameters trained on 350 billion tokens. |