---
title: tt-creators
app_file: creators.py
sdk: gradio
sdk_version: 5.20.0
---
# TikTok Creator Analyzer
A Gradio-based tool for analyzing TikTok creator profiles from CSV files.
## Features
- Efficiently loads and processes millions of TikTok creator profiles
- Caches data in Parquet format for faster subsequent loads
- Tracks processed files to avoid reprocessing the same data
- Incrementally updates the database when new files are added
- Advanced search with multiple filters:
  - Follower count range (min/max)
  - Video count range (min/max)
  - Keywords in signature
  - Region filter
  - "Has Email" filter to find profiles with contact information
- Download search results as CSV
- Network-accessible interface (binds to `0.0.0.0`)
- Shareable via temporary public URL
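
The search filters listed above can be illustrated with a minimal sketch in plain Python. This is not the code in `creators.py`: the `matches` and `search` helpers, their parameter names, the case-insensitive matching, and the sample rows are all assumptions made for illustration. Only the column names come from the Data Format section.

```python
from typing import Optional


def matches(profile: dict,
            min_followers: Optional[int] = None,
            max_followers: Optional[int] = None,
            min_videos: Optional[int] = None,
            max_videos: Optional[int] = None,
            keyword: Optional[str] = None,
            region: Optional[str] = None,
            has_email: bool = False) -> bool:
    """Return True if one profile row passes every active filter."""
    # CSV values arrive as strings, so convert counts before comparing.
    followers = int(profile.get("follower_count") or 0)
    videos = int(profile.get("video_count") or 0)

    if min_followers is not None and followers < min_followers:
        return False
    if max_followers is not None and followers > max_followers:
        return False
    if min_videos is not None and videos < min_videos:
        return False
    if max_videos is not None and videos > max_videos:
        return False
    # Assumption: keyword search is a case-insensitive substring match.
    if keyword and keyword.lower() not in (profile.get("signature") or "").lower():
        return False
    # Assumption: region is compared as a case-insensitive exact match.
    if region and (profile.get("region") or "").upper() != region.upper():
        return False
    # "Has Email" keeps only rows with a non-blank email field.
    if has_email and not (profile.get("email") or "").strip():
        return False
    return True


def search(profiles, **filters):
    """Apply the filters to an iterable of profile dicts."""
    return [p for p in profiles if matches(p, **filters)]


# Made-up sample rows, for illustration only.
profiles = [
    {"unique_id": "alice", "follower_count": "12000", "video_count": "85",
     "signature": "Fitness coach", "email": "alice@example.com", "region": "US"},
    {"unique_id": "bob", "follower_count": "300", "video_count": "4",
     "signature": "cat videos", "email": "", "region": "GB"},
]

results = search(profiles, min_followers=1000, has_email=True)
print([p["unique_id"] for p in results])
```

Filters left at their defaults are skipped, so any combination can be applied in one pass.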
## Installation
1. Install the required dependencies:
```bash
pip install -r requirements.txt
```
2. Place your CSV files in `../data/tiktok_profiles/`.
## Usage
Run the script:
```bash
python creators.py
```
The first run will:
1. Load all CSV files from the data directory
2. Combine them into a single dataset
3. Save the combined data as a Parquet file for faster loading in the future
4. Track which files have been processed to avoid duplicates
5. Launch a Gradio web interface for searching and analyzing the data
Subsequent runs will:
1. Load the existing data from the Parquet file
2. Check for new CSV files that haven't been processed yet
3. If new files exist, process only those files and update the database
4. Launch the Gradio interface with the updated data
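
The processed-file tracking behind these steps could look roughly like the sketch below. It uses only the standard library and assumes a JSON manifest of file names; the manifest format and the function names (`load_manifest`, `find_new_files`, `mark_processed`) are hypothetical and may differ from what `creators.py` actually does.

```python
import json
from pathlib import Path


def load_manifest(manifest_path: Path) -> set:
    """Return the set of CSV file names already processed (empty on first run)."""
    if manifest_path.exists():
        return set(json.loads(manifest_path.read_text()))
    return set()


def find_new_files(data_dir: Path, processed: set) -> list:
    """List CSV files in data_dir whose names are not in the manifest yet."""
    return sorted(p for p in data_dir.glob("*.csv") if p.name not in processed)


def mark_processed(manifest_path: Path, processed: set, new_files: list) -> set:
    """Record newly processed files and persist the manifest as JSON."""
    updated = processed | {p.name for p in new_files}
    manifest_path.write_text(json.dumps(sorted(updated)))
    return updated
```

On each run, only the files returned by `find_new_files` would need to be parsed and appended to the cached dataset. Deleting the manifest would then act as a forced full reload.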
The interface will be accessible from:
- Other machines on your network at: `http://your-ip-address:7860`
- A temporary public URL that will be displayed in the console (thanks to `share=True`)
## Maintenance
The application includes a Maintenance tab that shows:
- How many files have been processed
- When the database was last updated
- An option to force reload all files (useful if you suspect data corruption)
## Data Format
The CSV files should have the following columns:
- id
- unique_id
- follower_count
- nickname
- video_count
- following_count
- signature
- email
- bio_link
- updated_at
- tt_seller
- region
- language
- url
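
Before adding a new file to the data directory, it can help to check its header against the column list above. The following is a minimal sketch using the standard `csv` module; `EXPECTED_COLUMNS` and `missing_columns` are illustrative names and not part of the application.

```python
import csv
import io

# Column names copied from the list above.
EXPECTED_COLUMNS = [
    "id", "unique_id", "follower_count", "nickname", "video_count",
    "following_count", "signature", "email", "bio_link", "updated_at",
    "tt_seller", "region", "language", "url",
]


def missing_columns(csv_file) -> list:
    """Return the expected columns that are absent from the CSV header row."""
    # An empty file has no header row, so every column counts as missing.
    header = next(csv.reader(csv_file), [])
    present = {name.strip() for name in header}
    return [col for col in EXPECTED_COLUMNS if col not in present]


# Example with an in-memory file that only has two of the columns.
sample = io.StringIO("id,unique_id\n1,alice\n")
print(missing_columns(sample))
```

For a file on disk, open it with `open(path, newline="", encoding="utf-8")` and pass the file object to `missing_columns`.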