Kadir Erturk

KadirErturk

AI & ML interests

sensors and machine learning

Recent Activity

updated a dataset 20 days ago
KadirErturk/jenny-tts-tags-6h-v1
reacted to singhsidhukuldeep's post with 🔥 about 1 month ago

Organizations

None yet

KadirErturk's activity

reacted to singhsidhukuldeep's post with 🔥 about 1 month ago
Good folks at @nvidia and @Tsinghua_Uni have released LLaMA-Mesh - A Revolutionary Approach to 3D Content Generation!

This innovative framework enables the direct generation of 3D meshes from natural language prompts while maintaining strong language capabilities.

Here is the Architecture & Implementation!

>> Core Components

Model Foundation
- If you haven't guessed it yet, it's built on the LLaMA-3.1-8B-Instruct base model
- Maintains original language capabilities while adding 3D generation
- Context length is set to 8,000 tokens

3D Representation Strategy
- Uses the OBJ file format for mesh representation
- Quantizes vertex coordinates into 64 discrete bins per axis
- Sorts vertices by z-y-x coordinates, from lowest to highest
- Sorts faces by the lowest vertex indices for consistency
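
The representation steps above can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function names `quantize` and `mesh_to_obj_text` are hypothetical, one shared coordinate range is assumed for all three axes, and "sorts faces by the lowest vertex indices" is read here as "rotate each face so its lowest index comes first, then sort the faces".

```python
def quantize(value, lo, hi, bins=64):
    """Map a float in [lo, hi] to an integer bin index in [0, bins - 1]."""
    if hi == lo:
        return 0
    idx = int((value - lo) / (hi - lo) * bins)
    return min(max(idx, 0), bins - 1)


def mesh_to_obj_text(vertices, faces, bins=64):
    """Serialize a mesh as OBJ-style text with quantized, sorted vertices.

    vertices: list of (x, y, z) floats
    faces:    list of tuples of 0-based vertex indices
    """
    # Assumption: one shared range for all axes, so proportions are preserved.
    lo = min(c for v in vertices for c in v)
    hi = max(c for v in vertices for c in v)
    quantized = [tuple(quantize(c, lo, hi, bins) for c in v) for v in vertices]

    # Sort vertices by z, then y, then x, from lowest to highest.
    order = sorted(
        range(len(quantized)),
        key=lambda i: (quantized[i][2], quantized[i][1], quantized[i][0]),
    )
    remap = {old: new for new, old in enumerate(order)}

    new_faces = []
    for face in faces:
        idx = [remap[i] for i in face]
        k = idx.index(min(idx))  # rotate so the lowest index leads (keeps winding)
        new_faces.append(tuple(idx[k:] + idx[:k]))
    new_faces.sort()

    lines = ["v {} {} {}".format(*quantized[i]) for i in order]
    # OBJ face indices are 1-based.
    lines += ["f " + " ".join(str(i + 1) for i in face) for face in new_faces]
    return "\n".join(lines)
```

Vertices that collapse into the same bin are not merged in this sketch; a real pipeline would likely deduplicate them.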

Data Processing Pipeline
- Filters meshes to a maximum of 500 faces for computational efficiency
- Applies random rotations (0°, 90°, 180°, 270°) for data augmentation
- Generates ~125k mesh variations from 31k base meshes
- Uses Cap3D-generated captions for text descriptions
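
A rough sketch of the filtering and rotation steps, under stated assumptions: the post does not name the rotation axis, so the z-axis is assumed, and emitting all four rotations per mesh is a guess that happens to match the numbers (31k × 4 = 124k, close to the ~125k quoted). The names `rotate_about_z` and `augment` are hypothetical.

```python
import math

MAX_FACES = 500                   # face limit quoted in the post
ROTATIONS_DEG = (0, 90, 180, 270)


def rotate_about_z(vertices, degrees):
    """Rotate (x, y, z) vertices about the z-axis (axis choice is an assumption)."""
    rad = math.radians(degrees)
    # Rounding makes cos/sin exact for multiples of 90 degrees.
    c, s = round(math.cos(rad)), round(math.sin(rad))
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in vertices]


def augment(meshes):
    """Drop meshes above the face limit and emit one copy per rotation.

    meshes: iterable of (vertices, faces) pairs
    """
    out = []
    for vertices, faces in meshes:
        if len(faces) > MAX_FACES:
            continue
        for deg in ROTATIONS_DEG:
            out.append((rotate_about_z(vertices, deg), faces))
    return out
```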

>> Training Framework

Dataset Composition
- 40% Mesh Generation tasks
- 20% Mesh Understanding tasks
- 40% General Conversation (UltraChat dataset)
- 8x training turns for generation, 4x for understanding
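
One way to picture the 40/20/40 mix is as weighted sampling over task types. The snippet below is a minimal sketch, not the actual data loader: the task names and `sample_tasks` are made up for illustration, and the turn multipliers in the last bullet are left out because the post does not explain how they are applied.

```python
import random

# Percentages as quoted in the post; the keys are illustrative names.
TASK_MIX = {
    "mesh_generation": 40,
    "mesh_understanding": 20,
    "general_conversation": 40,
}


def sample_tasks(n, seed=0):
    """Draw n task types with probabilities proportional to TASK_MIX."""
    rng = random.Random(seed)
    names = list(TASK_MIX)
    weights = [TASK_MIX[name] for name in names]
    return rng.choices(names, weights=weights, k=n)
```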

Training Configuration
- Trained on 32 A100 GPUs (for Nvidia, this is literally in-house)
- 21,000 training iterations
- Global batch size: 128
- AdamW optimizer with a 1e-5 learning rate
- 30-step warmup with cosine scheduling
- Total training time: approximately 3 days (based on the paper)
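
To make the schedule concrete, here is a plain-Python sketch of a 30-step warmup followed by cosine decay. The post gives only the step counts and the peak rate, so the linear warmup shape and the decay to zero are both assumptions, and `learning_rate` is a hypothetical helper.

```python
import math

PEAK_LR = 1e-5        # AdamW learning rate from the post
WARMUP_STEPS = 30
TOTAL_STEPS = 21_000
GLOBAL_BATCH = 128


def learning_rate(step):
    """Learning rate at a 0-based step: linear warmup, then cosine decay to 0."""
    if step < WARMUP_STEPS:
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * min(progress, 1.0)))


# Total samples processed over the run: 21,000 steps x 128 per step.
SAMPLES_SEEN = TOTAL_STEPS * GLOBAL_BATCH
```

With these numbers the run processes 2,688,000 samples in total, which is many passes over ~125k mesh variations once the conversation data is mixed in.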

This research opens exciting possibilities for intuitive 3D content creation through natural language interaction. The future of digital design is conversational!