
Syahmi Azhar

prsyahmi

AI & ML interests

None yet

Recent Activity

liked a model 11 days ago
deepseek-ai/DeepSeek-V3

Organizations

None yet

prsyahmi's activity

reacted to ginipick's post with 👍 23 days ago
🌟 Digital Odyssey: AI Image & Video Generation Platform 🎨
Welcome to our all-in-one AI platform for image and video generation! 🚀
✨ Key Features

🎨 High-quality image generation from text
🎥 Video creation from still images
🌐 Multi-language support with automatic translation
🛠️ Advanced customization options

💫 Unique Advantages

⚡ Fast and accurate results using FLUX.1-dev and Hyper-SD models
🔒 Robust content safety filtering system
🎯 Intuitive user interface
🛠️ Extended toolkit including image upscaling and logo generation

🎮 How to Use

Enter your image or video description
Adjust settings as needed
Click generate
Save and share your results automatically

🔧 Tech Stack

FluxPipeline
Gradio
PyTorch
OpenCV

link: ginigen/Dokdo

Turn your imagination into reality with AI! ✨
#AI #ImageGeneration #VideoGeneration #MachineLearning #CreativeTech
reacted to MoritzLaurer's post with 👍 23 days ago
Quite excited by the ModernBERT release! Small 0.15B/0.4B sizes, 2T tokens of modern pre-training data, a tokenizer that covers code, and an 8k context window: a great, efficient model for embeddings & classification!

This will probably be the basis for many future SOTA encoders! And I can finally stop using DeBERTav3 from 2021 :D

Congrats @answerdotai, @LightOnIO, and collaborators like @tomaarsen!

Paper and models here 👇 https://huggingface.co/collections/answerdotai/modernbert-67627ad707a4acbf33c41deb
reacted to singhsidhukuldeep's post with 🤗 8 months ago
You are all happy 😊 that @meta-llama released Llama 3.

Then you are sad 😔 that it only has a context length of 8k.

Then you are happy 😄 that you can just scale Llama-3 PoSE to 96k without training, needing only to modify max_position_embeddings and rope_theta.
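
For readers wondering what those two fields do: in Hugging Face Llama configs, max_position_embeddings is the sequence length the model is configured for, and rope_theta is the base of the rotary position embedding (RoPE) frequencies. Here is a minimal pure-Python sketch under that assumption. The helper names are hypothetical, the Llama-3-8B values used (head dimension 128, rope_theta 500000, 8192 positions) come from its public config, and the new rope_theta in the usage lines is a made-up illustrative value, not the one used for the 96k PoSE model.

```python
import math


def rope_inv_freqs(head_dim, rope_theta):
    """Standard RoPE inverse frequencies: rope_theta ** (-2i / head_dim), one per channel pair."""
    return [rope_theta ** (-(2 * i) / head_dim) for i in range(head_dim // 2)]


def longest_wavelength(head_dim, rope_theta):
    """Wavelength, in token positions, of the slowest-rotating channel pair."""
    return 2 * math.pi / min(rope_inv_freqs(head_dim, rope_theta))


def extend_context(config, new_max_positions, new_rope_theta):
    """Hypothetical helper: copy an HF-style config dict and change the two fields."""
    updated = dict(config)  # leave the original config untouched
    updated["max_position_embeddings"] = new_max_positions
    updated["rope_theta"] = new_rope_theta
    return updated


if __name__ == "__main__":
    llama3_8b = {"max_position_embeddings": 8192, "rope_theta": 500000.0}
    # "96k" taken as 96 * 1024 positions; 4x rope_theta is an illustrative guess only.
    extended = extend_context(llama3_8b, 96 * 1024, 4 * 500000.0)
    print(extended)
    # A larger base stretches the slowest wavelength, so far-away positions rotate less.
    print(longest_wavelength(128, 500000.0) < longest_wavelength(128, extended["rope_theta"]))
```

The intuition behind raising rope_theta is that positions past the original 8k then map to rotation angles closer to the range the model saw in training; whether that actually preserves quality is the question the next line of the post raises.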

But then you are sad 😢 that it only improves the model's long-context retrieval performance (i.e., finding needles) while hardly improving its long-context utilization capability (doing QA and summarization).
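
The "finding needles" retrieval test mentioned here is usually a needle-in-a-haystack probe: hide one fact at some depth inside long filler text and ask the model to repeat it. A minimal sketch of building such a probe, assuming a simple substring check for scoring; the function names are hypothetical, and a real evaluation would pad to a target token length and send the prompt to the model.

```python
def build_needle_prompt(needle, filler, depth, question):
    """Place `needle` at a relative depth (0.0 = start, 1.0 = end) inside filler sentences."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    pos = round(depth * len(filler))
    haystack = filler[:pos] + [needle] + filler[pos:]
    return " ".join(haystack) + "\n\nQuestion: " + question


def retrieved(answer, expected):
    """Crude retrieval check: does the model's answer contain the expected fact?"""
    return expected.lower() in answer.lower()


if __name__ == "__main__":
    filler = ["The sky was grey that morning."] * 1000
    prompt = build_needle_prompt(
        "The secret code is 7421.", filler, depth=0.5, question="What is the secret code?"
    )
    print(len(prompt.split()), "words")
    # Stand-in for a model reply; a real run would score the model's actual output.
    print(retrieved("The code is 7421.", "7421"))
```

Passing a probe like this only shows that the model can locate a single fact; the utilization tasks the post mentions (QA, summarization) require reasoning over much more of the context.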

But then you are happy 😁 that the @GradientsTechnologies community has released the long-context Llama-3-8B-Instruct-262K (262k-1M+).

Now we have another paper, "Extending Llama-3's Context Ten-Fold Overnight" 📜.

The context length of Llama-3-8B-Instruct is extended from 8K to 80K using QLoRA fine-tuning ⚙️.
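
QLoRA combines two ideas: the base weights stay frozen and are stored in low precision, and training only updates a small low-rank adapter B·A whose output is scaled by alpha/r. A toy pure-Python sketch of that forward pass follows. The simple absmax 4-bit rounding is a stand-in chosen for brevity, not QLoRA's NF4 data type or its double quantization; the function names are hypothetical, and no training loop is shown.

```python
def matvec(m, v):
    """Dense matrix-vector product for small list-of-lists matrices."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def quantize_absmax(w, bits=4):
    """Symmetric absmax quantization to signed integers (stand-in for NF4, not NF4)."""
    levels = 2 ** (bits - 1) - 1  # 7 integer levels on each side for 4 bits
    scale = max(abs(x) for row in w for x in row) / levels
    if scale == 0.0:
        scale = 1.0  # all-zero matrix: any scale reproduces it exactly
    q = [[round(x / scale) for x in row] for row in w]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the integer codes."""
    return [[x * scale for x in row] for row in q]


def lora_forward(x, w_frozen, a, b, alpha, r):
    """y = W x + (alpha / r) * B (A x); only A and B would receive gradients."""
    base = matvec(w_frozen, x)
    delta = matvec(b, matvec(a, x))
    s = alpha / r
    return [y + s * d for y, d in zip(base, delta)]


if __name__ == "__main__":
    w = [[1.0, 2.0], [3.0, 4.0]]   # frozen base weight, d_out x d_in
    q, scale = quantize_absmax(w)   # stored as small integers plus one scale
    a = [[0.5, -0.5]]               # r x d_in with rank r = 1
    b = [[0.0], [0.0]]              # d_out x r, zero-initialized so training starts from W
    x = [1.0, 1.0]
    print(lora_forward(x, dequantize(q, scale), a, b, alpha=2.0, r=1))
```

Zero-initializing B means the adapted model starts out identical to the (dequantized) base model, and the number of trainable values grows with r rather than with the full weight size, which is what keeps runs like this one cheap.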

The training cycle is highly efficient, taking "only" 😂 8 hours on a single 8xA800 (80GB) GPU machine.

The model also preserves its original capability over short contexts. ✍️

The dramatic context extension is attributed mainly to just 3.5K synthetic training samples generated by GPT-4 📊.

The paper suggests that the context length could be extended far beyond 80K with more computation resources (😅 GPU-poor).

The team plans to publicly release all resources, including data, model, data generation pipeline, and training code, to facilitate future research from the ❤️ community.

Paper: https://arxiv.org/abs/2404.19553

This is where we are... until next time... 🌟
