CarbonLover (CarbonLover)

kaki-paper

updated 2 models 3 months ago

CarbonLover/krx_qwen2.5_it_test

Text Generation • Updated Oct 30, 2024 • 6

CarbonLover/krx_gemma2_enhanced_v2

Text Generation • Updated Oct 24, 2024 • 3

beomi

posted an update 3 months ago

Post

5341

# PyTorch == 2.5.0 Breaks Transformers' SDPAttention!

If you encounter "RuntimeError: cuDNN Frontend error: [cudnn_frontend] Error: No execution plans support the graph.", you can work around it like this:

torch.backends.cuda.enable_cudnn_sdp(False)

but this reduces the performance gain from PyTorch 2.5.

A change has been merged at https://github.com/pytorch/pytorch/pull/138587 (not really a "fix": it turns cuDNN SDPA off by default), but it has not been released yet, so for now you need to install directly from source to get it.

Fastest fix for now: pip install "torch<2.5"

Ref: https://github.com/huggingface/diffusers/issues/9704#issuecomment-2422585273
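If you would rather stay on 2.5.0, the workaround above can be applied only when it is needed. This is a minimal sketch, assuming (as the title says) that only 2.5.0 builds are affected; the helper names `needs_cudnn_sdp_workaround` and `apply_workaround` are made up for illustration, and only `torch.backends.cuda.enable_cudnn_sdp` comes from the post.

```python
import re


def needs_cudnn_sdp_workaround(version: str) -> bool:
    """Return True only for PyTorch 2.5.0 builds such as '2.5.0' or '2.5.0+cu124'.

    Assumption: 2.5.0 is the only affected release, as stated in the post title.
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        return False
    return tuple(int(part) for part in match.groups()) == (2, 5, 0)


def apply_workaround() -> bool:
    """Disable cuDNN SDPA if the installed PyTorch is 2.5.0.

    Returns True when the workaround was applied, False otherwise
    (PyTorch missing or a different version installed).
    """
    try:
        import torch
    except ImportError:
        return False
    if not needs_cudnn_sdp_workaround(torch.__version__):
        return False
    # The workaround from the post: turn off the cuDNN SDPA backend.
    torch.backends.cuda.enable_cudnn_sdp(False)
    return True
```

Call `apply_workaround()` once at startup, before running any model. On other versions it leaves the backend settings untouched.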

kaki-paper

updated 2 models 4 months ago

CarbonLover/krx_gemma2_enhanced

Text Generation • Updated Oct 22, 2024 • 2

CarbonLover/krx_test_model

Text Generation • Updated Oct 20, 2024 • 8

kaki-paper

updated a dataset 4 months ago

CarbonLover/econ_law_mrc_with_cot_answer

Viewer • Updated Oct 19, 2024 • 32.1k • 41

beomi

posted an update 10 months ago

Post

14017

#TPU #PyTorch #Jax

When you're trying to use PyTorch or JAX on TPU, pick the base image by TPU generation:

for v2/v3/v4:
use tpu-ubuntu2204-base

for v5p:
use v2-alpha-tpuv5

for v5e:
use v2-alpha-tpuv5-lite

You must use these base images for the system to 'boot'.

Previously used tpu-vm-v4-pt-1.13 images might seem to start the VM, but SSH connections do not work.

I thought it was a firewall issue and spent a lot of time on it before realizing it was a problem with the boot image 🥲

https://cloud.google.com/tpu/docs/runtimes#pytorch_and_jax
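To avoid picking the wrong image by hand, the mapping above can be kept in a small lookup. This is only a sketch: the image names are the ones listed in this post, while the function names, the VM name, the zone, and the accelerator type in the usage note are placeholders, and the `gcloud compute tpus tpu-vm create` flags should be checked against the current docs.

```python
# TPU generation -> base image (runtime version), exactly as listed in the post.
RUNTIME_BY_GENERATION = {
    "v2": "tpu-ubuntu2204-base",
    "v3": "tpu-ubuntu2204-base",
    "v4": "tpu-ubuntu2204-base",
    "v5p": "v2-alpha-tpuv5",
    "v5e": "v2-alpha-tpuv5-lite",
}


def runtime_version(generation: str) -> str:
    """Return the base image for a TPU generation such as 'v4' or 'v5e'."""
    key = generation.strip().lower()
    if key not in RUNTIME_BY_GENERATION:
        raise ValueError(f"No base image known for TPU generation {generation!r}")
    return RUNTIME_BY_GENERATION[key]


def create_command(name: str, zone: str, accelerator_type: str, generation: str) -> list:
    """Build (but do not run) a gcloud command that creates a TPU VM."""
    return [
        "gcloud", "compute", "tpus", "tpu-vm", "create", name,
        f"--zone={zone}",
        f"--accelerator-type={accelerator_type}",
        f"--version={runtime_version(generation)}",
    ]
```

For example, `" ".join(create_command("my-tpu", "us-central2-b", "v4-8", "v4"))` gives a command line you can paste into a shell; the name, zone, and accelerator type there are placeholders.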

beomi

posted an update 10 months ago

Post

12267

🚀 **InfiniTransformer, Gemma/Llama3 based Implementation!** 🌌

> Update @ 2024.04.19: It now supports Llama-3!

> Note: this implementation is unofficial

This implementation is designed to handle virtually infinite context lengths.

Here's the GitHub repo: https://github.com/Beomi/InfiniTransformer

📄 **Read the original Paper:** https://arxiv.org/abs/2404.07143

## **Focus on Infini-Attention**

- **Two types of implementation available:** attention-layer-only implementation / model- and training-wise implementation
- **Fixed (segment-dependent) memory usage:** Enables training larger models on longer sequences without the memory overhead typical of standard Transformer implementations.
- **Infinite context capability:** Train on very long sequences, up to 1M tokens, on standard hardware.
- For example, you could train Gemma-2B at a 1M sequence length with a 2K segment size on a single H100 GPU.
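To see why memory depends on the segment size rather than the full sequence length, here is a rough single-head sketch of the compressive memory described in the paper: the memory is updated as M += σ(K)ᵀV with a normalizer z += Σ σ(K), and read back as σ(Q)M / (σ(Q)z), where σ = ELU + 1. This is not the code from the repo. It is plain Python for readability, all function names are made up, and it leaves out the local dot-product attention, the learned gate that mixes the two, and the delta-rule update variant.

```python
import math


def elu_plus_one(x: float) -> float:
    """sigma(x) = ELU(x) + 1, which is always positive."""
    return x + 1.0 if x > 0 else math.exp(x)


def new_memory(d_key: int, d_value: int):
    """Empty memory M (d_key x d_value) and normalizer z (d_key)."""
    return [[0.0] * d_value for _ in range(d_key)], [0.0] * d_key


def update_memory(memory, norm, keys, values):
    """Fold one segment into the memory in place: M += sigma(K)^T V, z += sum sigma(K)."""
    for key, value in zip(keys, values):
        s_key = [elu_plus_one(k) for k in key]
        for i, s in enumerate(s_key):
            norm[i] += s
            for j, v in enumerate(value):
                memory[i][j] += s * v


def retrieve(memory, norm, query):
    """Read from memory for one query: sigma(q) M / (sigma(q) . z)."""
    s_query = [elu_plus_one(q) for q in query]
    denom = sum(s * z for s, z in zip(s_query, norm))
    d_value = len(memory[0])
    if denom == 0.0:  # nothing stored yet (first segment)
        return [0.0] * d_value
    return [
        sum(s_query[i] * memory[i][j] for i in range(len(s_query))) / denom
        for j in range(d_value)
    ]


def split_segments(rows, segment_len: int):
    """Cut a long sequence into consecutive segments of at most segment_len rows."""
    return [rows[i:i + segment_len] for i in range(0, len(rows), segment_len)]
```

However many segments are folded in with `update_memory`, `memory` stays at d_key × d_value and `norm` at d_key, so only the current segment's activations grow with the segment size.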

## **Try InfiniTransformer**

1. **Clone the repository:**

   ```bash
   git clone https://github.com/Beomi/InfiniTransformer
   ```

2. **Install necessary tools:**

   ```bash
   pip install -r requirements.txt
   pip install -e git+https://github.com/huggingface/transformers.git@b109257f4f#egg=transformers
   ```

3. **Dive Deep into Custom Training:**
- Train with extensive sequence lengths using scripts such as ./train.gemma.infini.noclm.1Mseq.sh.

For more details, please visit the repo: https://github.com/Beomi/InfiniTransformer

Looking forward to your feedback! 😊

ps. Training loss plot is here 😉


Team members 2
