Good folks at @nvidia and @Tsinghua_Uni have released LLaMA-Mesh - A Revolutionary Approach to 3D Content Generation!
This innovative framework enables direct generation of 3D meshes from natural language prompts while retaining the base model's strong language capabilities.
Here is the Architecture & Implementation!
>> Core Components
Model Foundation
- If you haven't guessed it yet, it's built on the LLaMA-3.1-8B-Instruct base model
- Maintains the original language capabilities while adding 3D generation
- Context length is set to 8,000 tokens
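For context, standing up that base model is one line with Hugging Face transformers - a minimal sketch of the generic model, not the authors' fine-tuned LLaMA-Mesh checkpoint or training code:

```python
# Minimal sketch: load the LLaMA-3.1-8B-Instruct base model via
# Hugging Face transformers. This is the generic base model, NOT the
# fine-tuned LLaMA-Mesh weights.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "meta-llama/Llama-3.1-8B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)

MAX_CONTEXT = 8_000  # the paper caps sequences at an 8k-token context
```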
3D Representation Strategy
- Uses the plain-text OBJ file format for mesh representation
- Quantizes vertex coordinates into 64 discrete bins per axis
- Sorts vertices by z-y-x coordinates, from lowest to highest
- Sorts faces by their lowest vertex index for consistency
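To make the representation concrete, here's a minimal sketch of that tokenization scheme - quantize, sort, serialize to OBJ text. The function name and the exact normalization details are my assumptions, not the paper's released code:

```python
import numpy as np

def mesh_to_obj_text(vertices: np.ndarray, faces: np.ndarray, bins: int = 64) -> str:
    """Sketch of the mesh serialization described above: quantize each
    vertex coordinate into `bins` discrete values, sort vertices by
    (z, y, x), sort faces by lowest vertex index, emit OBJ text.
    Assumes triangular faces as an (N, 3) integer array."""
    # Normalize coordinates to [0, 1], then quantize to integer bins
    # (assumed normalization; the post doesn't spell this step out).
    lo, hi = vertices.min(axis=0), vertices.max(axis=0)
    quant = np.floor((vertices - lo) / (hi - lo + 1e-9) * (bins - 1)).astype(int)

    # Sort vertices by z, then y, then x (lowest first). np.lexsort
    # treats the LAST key as primary, hence the (x, y, z) key order.
    order = np.lexsort((quant[:, 0], quant[:, 1], quant[:, 2]))
    quant = quant[order]

    # Remap face indices to the new vertex ordering.
    remap = np.empty(len(order), dtype=int)
    remap[order] = np.arange(len(order))
    faces = remap[faces]

    # Sort faces by their lowest vertex index for a consistent ordering.
    faces = faces[np.argsort(faces.min(axis=1))]

    lines = [f"v {x} {y} {z}" for x, y, z in quant]
    # OBJ face indices are 1-based.
    lines += [f"f {a + 1} {b + 1} {c + 1}" for a, b, c in faces]
    return "\n".join(lines)
```

The payoff of this design: the quantized OBJ text is just ordinary tokens, so the LLM can read and write meshes without any new vocabulary or modality-specific head.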
Data Processing Pipeline
- Filters meshes to a maximum of 500 faces for computational efficiency
- Applies random rotations (0°, 90°, 180°, 270°) for data augmentation
- Generates ~125k mesh variations from 31k base meshes
- Uses Cap3D-generated captions for text descriptions
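A hedged sketch of what that filter-and-augment step could look like (the rotation axis isn't specified in the post, so z-up is my assumption):

```python
import numpy as np

MAX_FACES = 500            # drop overly complex meshes
ANGLES = [0, 90, 180, 270]  # augmentation rotations in degrees

def augment(vertices: np.ndarray, faces: np.ndarray):
    """Keep meshes with <= 500 faces and emit rotated copies.
    Rotation about the z (up) axis is an assumption on my part."""
    if len(faces) > MAX_FACES:
        return []
    variants = []
    for deg in ANGLES:
        t = np.deg2rad(deg)
        rot = np.array([[np.cos(t), -np.sin(t), 0.0],
                        [np.sin(t),  np.cos(t), 0.0],
                        [0.0,        0.0,       1.0]])
        variants.append((vertices @ rot.T, faces))
    return variants
```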
>> Training Framework
Dataset Composition
- 40% mesh generation tasks
- 20% mesh understanding tasks
- 40% general conversation (UltraChat dataset)
- Dialogues are built with 8 turns for generation tasks and 4 turns for understanding tasks
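A toy sampler reproducing the stated 40/20/40 mix might look like this (the dataset names are illustrative placeholders, not the authors' pipeline):

```python
import random

def sample_example(mesh_gen: list, mesh_und: list, ultrachat: list):
    """Draw one training example with the 40/20/40 task mixture."""
    source = random.choices(
        population=[mesh_gen, mesh_und, ultrachat],
        weights=[0.4, 0.2, 0.4],
    )[0]
    return random.choice(source)
```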
Training Configuration
- Trained on 32 A100 GPUs (for NVIDIA, this is literally in-house)
- 21,000 training iterations with a global batch size of 128
- AdamW optimizer with a 1e-5 learning rate
- 30-step warmup followed by a cosine schedule
- Total training time: approximately 3 days, per the paper
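That optimization recipe maps almost directly onto standard PyTorch + transformers utilities - a sketch under the stated hyperparameters, with a stand-in model so the snippet runs on its own:

```python
import torch
from transformers import get_cosine_schedule_with_warmup

# Sketch of the stated recipe: AdamW at lr 1e-5, 30 warmup steps,
# cosine decay over 21,000 iterations. `model` here is a stand-in;
# in practice it would be the LLaMA backbone, and the global batch
# size of 128 would come from the data loader / gradient accumulation.
model = torch.nn.Linear(8, 8)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
scheduler = get_cosine_schedule_with_warmup(
    optimizer, num_warmup_steps=30, num_training_steps=21_000
)
# Inside the training loop, call optimizer.step() then scheduler.step()
# after each of the 21,000 iterations.
```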
This research opens exciting possibilities for intuitive 3D content creation through natural language interaction. The future of digital design is conversational!