---
license: mit
license_name: deepnight-responsible-ai
license_link: LICENSE
---

# SaiLy 100B (deepnight-research/saily_100B)

<img src="https://i.ibb.co/TvZQjZM/Leonardo-Diffusion-XL-Furious-and-strong-Elephant-and-anchor-l-1.jpg" alt="Saily: Experimental AI Models by DEEPNIGHT">

---

### SaiLy is a series of AI models by DEEPNIGHT-RESEARCH that are highly experimental and uncensored. Please use them responsibly.

---

<br>

*Evaluation results are pending: the model has been submitted to the Hugging Face Open LLM Leaderboard and is currently in the pending list.*

Prompt Template: Alpaca

```
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{prompt}

### Response:
```
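
As a minimal sketch of how the template above might be filled in, the helper below substitutes an instruction for the `{prompt}` placeholder. The names `ALPACA_TEMPLATE` and `build_prompt` are illustrative and not part of the model repository, and the blank-line layout follows the common Alpaca convention, which is an assumption here.

```python
# Illustrative helper; these names are hypothetical and not shipped with the model.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n"
    "{prompt}\n\n"
    "### Response:\n"
)


def build_prompt(instruction: str) -> str:
    """Insert the user's instruction into the Alpaca-style template."""
    # str.replace (rather than str.format) avoids errors when the
    # instruction itself contains curly braces.
    return ALPACA_TEMPLATE.replace("{prompt}", instruction.strip())


if __name__ == "__main__":
    print(build_prompt("Summarize the plot of Hamlet in two sentences."))
```

The resulting string is what would be tokenized and passed to the model for generation.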

### Description:

This is the first *stable* model of the series. The model is based on Llama2-chat.

---

### Did someone say CODE?

Here you go!

```python
import transformers

# Download (if needed) and load the model with default settings
model = transformers.AutoModelForCausalLM.from_pretrained(
    'deepnight-research/saily_100B'
)
```
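
Decoder-only models usually return the prompt followed by the continuation when the generated tokens are decoded, so the answer has to be separated from the template. Here is a minimal sketch, assuming the decoded text still contains the `### Response:` marker from the Alpaca template. `extract_response` is a hypothetical helper, not part of `transformers` or this repository.

```python
RESPONSE_MARKER = "### Response:"
INSTRUCTION_MARKER = "### Instruction:"


def extract_response(decoded_text: str) -> str:
    """Return the model's answer from decoded Alpaca-style output."""
    # Keep everything after the first response marker; if the marker is
    # missing, fall back to the whole text.
    _, marker, tail = decoded_text.partition(RESPONSE_MARKER)
    if not marker:
        return decoded_text.strip()
    # Drop any further instruction block the model may have continued with.
    answer, _, _ = tail.partition(INSTRUCTION_MARKER)
    return answer.strip()


if __name__ == "__main__":
    sample = (
        "Below is an instruction that describes a task.\n\n"
        "### Instruction:\nSay hello.\n\n"
        "### Response:\nHello there!"
    )
    print(extract_response(sample))
```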

To use the optimized Triton implementation of FlashAttention, load the model on the GPU (`cuda:0`) with `attn_impl='triton'` and `bfloat16` precision:

```python
import torch
import transformers

name = 'deepnight-research/saily_100B'

# attn_config and init_device are custom config fields; they are assumed to be
# defined by the model's remote code, hence trust_remote_code=True here as well.
config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.attn_config['attn_impl'] = 'triton'  # Select the Triton FlashAttention implementation
config.init_device = 'cuda:0'  # For fast initialization directly on GPU!

model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    torch_dtype=torch.bfloat16,  # Load model weights in bfloat16
    trust_remote_code=True
)
```
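
Loading in `bfloat16` stores each weight in 2 bytes, which gives a rough lower bound on the memory the weights alone will need. The sketch below works through that arithmetic, assuming roughly 100 billion parameters as the model name suggests (the exact count is not stated here). `estimate_weight_memory_gib` is an illustrative helper, and activations, the KV cache, and framework overhead are not included.

```python
def estimate_weight_memory_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory needed for the weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30


if __name__ == "__main__":
    # Assumption: ~100e9 parameters, inferred from the "100B" in the model name.
    for label, size in (("bfloat16", 2), ("float32", 4)):
        gib = estimate_weight_memory_gib(100e9, size)
        print(f"{label}: ~{gib:.0f} GiB")
```

Under that assumption the weights come to about 186 GiB in `bfloat16` and twice that in `float32`.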

---

If you would like to support us, please consider donating to [#aiforcause](https://github.com/deepnight-ai/aiforcause).

Cheers✌️

- Team [DEEPNIGHT](https://deepnight.tech)