---
license: apache-2.0
pipeline_tag: text-generation
datasets:
- JulesBelveze/tldr_news
language:
- en
library_name: elm
tags:
- elm
---
# SliceX AI™ ELM (Efficient Language Models)
**ELM** (which stands for **E**fficient **L**anguage **M**odels) is the first version in the series of cutting-edge language models from [SliceX AI](https://slicex.ai), designed to achieve best-in-class performance in terms of _quality_, _throughput_ & _memory_.

<div align="center">
  <img src="elm-rambutan.png" width="256"/>
</div>

ELM is designed to be a modular and customizable family of neural networks that are highly efficient and performant. Today we are sharing the first version in this series: **ELM-v0.1** models (named _Rambutan_). 

_Model:_ ELM introduces a new type of _(de)-composable LLM architecture_ along with the algorithmic optimizations required to train these models and run inference on them. At a high level, we train a single ELM model in a self-supervised manner during the pre-training phase; once trained, the model can be sliced in many ways to fit different user and task needs. The optimizations can be applied during pre-training, fine-tuning, or both.

_Fast Inference with Customization:_ Once trained, the ELM model architecture permits flexible inference strategies at runtime depending on deployment needs. For instance, the ELM model can be _decomposed_ into slices, i.e., smaller (or larger) models can be extracted from the original model to create multiple inference endpoints. Alternatively, the original (single) ELM model can be loaded _as is_ and different slices within it can be queried directly for faster inference. This gives users additional flexibility to make compute/memory tradeoffs depending on their application and runtime needs.

- **Blog:** [Medium](https://medium.com/sujith-ravi/introducing-elm-efficient-customizable-privacy-preserving-llms-cea56e4f727d)

- **Github:** https://github.com/slicex-ai/elm

- **Demo** (try it out): https://huggingface.co/spaces/slicexai/elm-demo-v1

- **HuggingFace** (access ELM Model cards, code & app from HF): https://huggingface.co/slicexai

## ELM-v0.1 Model Release
This repository contains code to run our ELM models. The current ELM model `elm-v0.1` (named _Rambutan_) was pre-trained (an intermediate checkpoint was used) and then instruction fine-tuned for downstream tasks.

ELM models (in the `models` folder) in this repository come in three sizes (`elm-1.0`, `elm-0.75` and `elm-0.25`). **All of these slices are extracted from the same fine-tuned ELM checkpoint for inference** and support the following use case:
- news_content_generation (tldr_news dataset)

**NOTE: ELM-v0.1 release is an early version finetuned from an intermediate pretrained checkpoint & without any KV caching, decoding optimizations, or quantization applied.**


## Setup ELM
### Download ELM repo
```bash
sudo apt-get install git-lfs 
git lfs install
git clone https://huggingface.co/slicexai/elm-v0.1_news_content_generation
```
On macOS, replace `sudo apt-get install git-lfs` with `brew install git-lfs`.
### Installation
```bash
cd elm-v0.1_news_content_generation
pip install -r requirements.txt
```

(Optional) To install git-lfs without sudo:
```bash
wget https://github.com/git-lfs/git-lfs/releases/download/v3.2.0/git-lfs-linux-amd64-v3.2.0.tar.gz
tar -xzf git-lfs-linux-amd64-v3.2.0.tar.gz
PATH=$PATH:/<absolute-path>/git-lfs-3.2.0/
git lfs install
```



## How to use: Run ELM on a sample task
```bash
# General form:
python run.py <elm-model-directory>

# One command per model size:
python run.py elm-1.0_news_content_generation
python run.py elm-0.75_news_content_generation
python run.py elm-0.25_news_content_generation
```
Prompts for the specific tasks can be found in the corresponding checkpoint directory. See an example below from `models/elm-0.75_news_content_generation/example_prompts.json`.
```json
{
    "inputs": ["Scientists Invent 'Invisible' Metamaterial With Bonus Reflect Mode"],
    "template": "[INST]The following headline is the headline of a news report. Please write the content of the news passage based on only this headline.\n\nHeadline: {input} \n\nContent:[/INST]"
}
```

Running the above command returns the following response:

```json
{
    "prompt": "[INST]The following headline is the headline of a news report. Please write the content of the news passage based on only this headline.\n\nHeadline: Scientists Invent 'Invisible' Metamaterial With Bonus Reflect Mode \n\nContent:[/INST]",
    "response": "A team of scientists have created an invisible material that can make objects disappear. It is made of a special material that creates a layer of nanoscale dots that allow light to enter from the material, directing it to a layer of gas that allows light to enter from the material. The material is able to levitate and roll off its surface without leaving the material. This technology could have many future applications in battery technology, microelectronics, and more. A video demonstrating the material is available in the article."
}
```
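
The `prompt` field in the response is the `template` from `example_prompts.json` with `{input}` replaced by the headline. The sketch below reproduces that substitution in plain Python. It is only an illustration: `build_prompts` is a hypothetical helper, not a function from this repository, and `run.py` may build its prompts differently. The JSON is inlined so the snippet runs without the repository checked out.

```python
import json

# Contents of example_prompts.json, inlined so this sketch is self-contained.
# A raw string keeps the JSON "\n" escapes intact until json.loads decodes them.
EXAMPLE_PROMPTS = r"""
{
    "inputs": ["Scientists Invent 'Invisible' Metamaterial With Bonus Reflect Mode"],
    "template": "[INST]The following headline is the headline of a news report. Please write the content of the news passage based on only this headline.\n\nHeadline: {input} \n\nContent:[/INST]"
}
"""


def build_prompts(spec: dict) -> list[str]:
    """Hypothetical helper: fill the {input} placeholder once per input.

    str.replace is used instead of str.format so that any literal braces
    in a headline cannot be misread as format fields.
    """
    return [spec["template"].replace("{input}", text) for text in spec["inputs"]]


if __name__ == "__main__":
    for prompt in build_prompts(json.loads(EXAMPLE_PROMPTS)):
        print(prompt)
```

To try other headlines, add more strings to the `inputs` list; each one yields its own prompt.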