---
language:
- en
library_name: transformers
license: apache-2.0
pipeline_tag: text-generation
tags:
- PolyCom
- PolyNorm
- PolyReLU
---
# Introduction
This repository contains the checkpoints of the ICLR 2025 paper **["Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models"](https://arxiv.org/pdf/2411.03884)**.
In this work, we introduce a novel activation function, **Polynomial Composition (PolyCom)**, which enhances the expressiveness of large language models (LLMs) through dynamic polynomial compositions. Our method significantly improves the performance of both dense and mixture-of-experts (MoE) models across a variety of downstream tasks, without adding significant computational overhead.
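Concretely, PolyReLU (the variant used by this checkpoint) composes a polynomial with ReLU: `PolyReLU(x) = a_0 + a_1*ReLU(x) + ... + a_r*ReLU(x)^r` with learnable coefficients `a_i`. Below is a minimal PyTorch sketch, assuming a third-order polynomial; the module name and the coefficient initialization are illustrative, not the released implementation (see the GitHub repository linked below for the official code):

```python
import torch
import torch.nn as nn

class PolyReLU(nn.Module):
    """Order-r polynomial composition of ReLU: sum_i a_i * ReLU(x)**i."""

    def __init__(self, order: int = 3):
        super().__init__()
        # Learnable polynomial coefficients a_0..a_r (uniform init is illustrative).
        self.coeffs = nn.Parameter(torch.full((order + 1,), 1.0 / (order + 1)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        relu_x = torch.relu(x)
        # Accumulate a_0 + a_1*ReLU(x) + ... + a_r*ReLU(x)**r term by term.
        out = torch.zeros_like(x) + self.coeffs[0]
        power = torch.ones_like(x)
        for a in self.coeffs[1:]:
            power = power * relu_x
            out = out + a * power
        return out
```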
# Datasets and Training
We use the [RedPajama-Data-1T](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T) dataset and pretrain the PolyCom model on 250B tokens. For more training details, please refer to [the source code](https://github.com/BryceZhuo/PolyCom).
# Inference
Here is an example of how to use the PolyCom model for inference:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
model_path = "path/to/PolyReLU_1B"  # local checkpoint directory or Hugging Face hub id

# trust_remote_code is required because the model defines custom activation modules.
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="cuda", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, padding_side="right", trust_remote_code=True)

prompt = "Hello, my name is"
input_ids = tokenizer.encode(prompt, return_tensors="pt").to("cuda")

# generate() uses greedy decoding by default; max_new_tokens bounds the output length.
greedy_output = model.generate(input_ids, max_new_tokens=50)
print(tokenizer.decode(greedy_output[0], skip_special_tokens=True))
```
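For more varied outputs, the standard `transformers` sampling arguments can be passed instead of greedy decoding, e.g. `model.generate(input_ids, max_new_tokens=50, do_sample=True, temperature=0.8, top_p=0.95)`.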
# Citing this work
If you find this work helpful or use it in your research, please consider citing our paper:
```bibtex
@inproceedings{zhuo2025polycom,
  title={Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models},
  author={Zhijian Zhuo and Ya Wang and Yutao Zeng and Xiaoqing Li and Xun Zhou and Jinwen Ma},
  booktitle={The Thirteenth International Conference on Learning Representations (ICLR)},
  year={2025}
}
```