59.7 TFLOPS
Tom-Neverwinter
2 followers · 15 following
AI & ML interests
Making improvements to help the world.
Recent Activity
reacted to csabakecskemeti's post with 🔥 25 days ago
I've built a small utility to split safetensors files, file by file. The need came up when I tried to convert the new DeepSeek V3 model from FP8 to BF16. The only Ada-architecture GPU I have is an RTX 4080, and its 16 GB of VRAM just wasn't enough for the conversion.

BTW: I'll upload the BF16 version here: https://huggingface.co/DevQuasar/deepseek-ai.DeepSeek-V3-Base-bf16 (it will take a while, days with my upload speed). If anyone has access to the resources to test it, I'd appreciate feedback on whether it works.

The tool is available here: https://github.com/csabakecskemeti/ai_utils/blob/main/safetensor_splitter.py

It splits every file into n pieces, by layer where possible, and creates a new "model.safetensors.index.json" file. I've tested it with Llama 3.1 8B and multiple split sizes, and validated the result with an inference pipeline. Use `--help` for usage.

Please note that the current version expects the model to already be split across multiple files and to have a "model.safetensors.index.json" layer-to-safetensors mapping file.
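The index-rewriting part of the post above (split each shard into n pieces by layer and emit a new "model.safetensors.index.json") might look roughly like the sketch below. This is not the code from safetensor_splitter.py: the function names, the `-partXofN` naming scheme, and the `.layers.<number>.` pattern used to detect layers are all assumptions made for illustration, and the sketch only recomputes the name-to-file mapping without touching any tensor data.

```python
import re
from collections import defaultdict

# Assumption: tensor names contain ".layers.<number>." as in Llama-style checkpoints.
LAYER_RE = re.compile(r"\.layers\.(\d+)\.")


def group_key(tensor_name):
    """Group tensors by layer so that a single layer is never cut in half."""
    match = LAYER_RE.search(tensor_name)
    return f"layer_{int(match.group(1)):05d}" if match else tensor_name


def split_weight_map(weight_map, pieces):
    """Map every tensor to a new, smaller shard (hypothetical naming scheme)."""
    by_file = defaultdict(lambda: defaultdict(list))
    for name, filename in weight_map.items():
        by_file[filename][group_key(name)].append(name)

    new_map = {}
    for filename, groups in by_file.items():
        keys = sorted(groups)
        n = min(pieces, len(keys))  # cannot make more pieces than groups
        stem = filename.removesuffix(".safetensors")  # Python 3.9+
        for i, key in enumerate(keys):
            part = i * n // len(keys)  # contiguous, roughly equal chunks
            for name in groups[key]:
                new_map[name] = f"{stem}-part{part + 1}of{n}.safetensors"
    return new_map


def split_index(index, pieces):
    """Return a new index dict; the metadata is copied through unchanged."""
    return {
        "metadata": dict(index.get("metadata", {})),
        "weight_map": split_weight_map(index["weight_map"], pieces),
    }
```

A real splitter would additionally read each tensor from the source shard and write it to the file named in the new mapping, for example with the `safetensors` library; that step is omitted here.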
new activity about 1 month ago
Apollo-LMMs/README: model pulled
reacted to tomaarsen's post with ❤️ 3 months ago
📣 Sentence Transformers v3.2.0 is out, marking the biggest release for inference in 2 years! 2 new backends for embedding models: ONNX (+ optimization & quantization) and OpenVINO, allowing for speedups up to 2x-3x, AND Static Embeddings for 500x speedups at a 10-20% accuracy cost.

1️⃣ ONNX Backend: This backend uses the ONNX Runtime to accelerate model inference on both CPU and GPU, reaching up to 1.4x-3x speedup depending on the precision. We also introduce 2 helper methods for optimizing and quantizing models for (much) faster inference.
2️⃣ OpenVINO Backend: This backend uses Intel's OpenVINO instead, outperforming ONNX in some situations on CPU.

Usage is as simple as `SentenceTransformer("all-MiniLM-L6-v2", backend="onnx")`. Does your model not have an ONNX or OpenVINO file yet? No worries - it'll be autoexported for you. Thank me later.

Another major new feature is Static Embeddings: think word embeddings like GloVe and word2vec, but modernized. Static Embeddings are bags of token embeddings that are summed together to create text embeddings, allowing for lightning-fast embeddings that don't require any neural networks. They're initialized in one of 2 ways:

1️⃣ via Model2Vec, a new technique for distilling any Sentence Transformer model into static embeddings. Either via a pre-distilled model with `from_model2vec` or with `from_distillation`, where you do the distillation yourself. It'll only take 5 seconds on GPU & 2 minutes on CPU, no dataset needed.
2️⃣ Random initialization. This requires finetuning, but finetuning is extremely quick (e.g. I trained with 3 million pairs in 7 minutes). My final model was 6.6% worse than bge-base-en-v1.5, but 500x faster on CPU.

Full release notes: https://github.com/UKPLab/sentence-transformers/releases/tag/v3.2.0
Documentation on Speeding up Inference: https://sbert.net/docs/sentence_transformer/usage/efficiency.html
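The Static Embeddings idea described in the post above (a text embedding built by summing per-token embeddings, with no neural network at inference time) can be illustrated with a toy sketch. This is not the Sentence Transformers implementation: the vocabulary, the vector values, the whitespace tokenizer, and the `embed` function are all made up for illustration.

```python
# Toy token-embedding table. Real tables are distilled or finetuned;
# these three-dimensional values are invented for the example.
TOKEN_VECTORS = {
    "fast": [1.0, 0.0, 0.5],
    "static": [0.0, 1.0, 0.5],
    "embeddings": [0.5, 0.5, 0.0],
}
DIMENSIONS = 3
UNKNOWN = [0.0] * DIMENSIONS  # assumption: unknown tokens contribute nothing


def embed(text):
    """Embed a text as the sum of its token vectors (a bag of tokens)."""
    total = [0.0] * DIMENSIONS
    for token in text.lower().split():  # hypothetical whitespace tokenizer
        vector = TOKEN_VECTORS.get(token, UNKNOWN)
        total = [t + v for t, v in zip(total, vector)]
    return total
```

Because the result is a plain sum of table lookups, word order does not affect it: `embed("fast static")` and `embed("static fast")` return the same vector. That is the trade-off the post points to, with much lower cost per text in exchange for some accuracy.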
Organizations
None yet
models
4
Tom-Neverwinter/ew-lora • Updated Aug 16, 2024 • 8
Tom-Neverwinter/ts-lora • Updated Aug 16, 2024 • 2
Tom-Neverwinter/cr-lora • Updated Aug 16, 2024 • 4
Tom-Neverwinter/sw-lora • Updated Aug 16, 2024 • 3
datasets
None public yet