title: tt-creators
app_file: creators.py
sdk: gradio
sdk_version: 5.20.0

TikTok Creator Analyzer

A Gradio-based tool for analyzing TikTok creator profiles from CSV files.

Features

  • Efficiently loads and processes millions of TikTok creator profiles
  • Caches data in Parquet format for faster subsequent loads
  • Tracks processed files to avoid reprocessing the same data
  • Incrementally updates the database when new files are added
  • Advanced search with multiple filters:
    • Follower count range (min/max)
    • Video count range (min/max)
    • Keywords in signature
    • Region filter
    • "Has Email" filter to find profiles with contact information
  • Download search results as CSV
  • Network-accessible interface (binds to 0.0.0.0)
  • Shareable via temporary public URL
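
As a rough illustration of how the search filters could combine, here is a minimal sketch over plain dictionaries keyed by the CSV column names. The `matches` and `search` helpers are hypothetical and are not taken from creators.py, which may well filter a DataFrame instead.

```python
def matches(profile, min_followers=None, max_followers=None,
            min_videos=None, max_videos=None,
            keyword=None, region=None, has_email=False):
    """Return True if one profile (a dict keyed by CSV column) passes every active filter."""
    followers = int(profile.get("follower_count") or 0)
    videos = int(profile.get("video_count") or 0)

    # Range filters apply only when a bound is given.
    if min_followers is not None and followers < min_followers:
        return False
    if max_followers is not None and followers > max_followers:
        return False
    if min_videos is not None and videos < min_videos:
        return False
    if max_videos is not None and videos > max_videos:
        return False

    # Case-insensitive keyword match against the signature (bio) text.
    if keyword and keyword.lower() not in (profile.get("signature") or "").lower():
        return False
    # Case-insensitive region match.
    if region and (profile.get("region") or "").upper() != region.upper():
        return False
    # "Has Email" keeps only profiles with a non-empty email field.
    if has_email and not (profile.get("email") or "").strip():
        return False
    return True


def search(profiles, **filters):
    """Return the profiles that pass all of the given filters."""
    return [p for p in profiles if matches(p, **filters)]
```

For example, `search(rows, min_followers=1000, keyword="fitness", has_email=True)` would keep only profiles with at least 1,000 followers, "fitness" in the signature, and a non-empty email.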

Installation

  1. Install the required dependencies:
pip install -r requirements.txt
  2. Make sure your CSV files are in the expected location (../data/tiktok_profiles/)

Usage

Run the script:

python creators.py

The first run will:

  1. Load all CSV files from the data directory
  2. Combine them into a single dataset
  3. Save the combined data as a Parquet file for faster loading in the future
  4. Track which files have been processed to avoid duplicates
  5. Launch a Gradio web interface for searching and analyzing the data

Subsequent runs will:

  1. Load the existing data from the Parquet file
  2. Check for new CSV files that haven't been processed yet
  3. If new files exist, process only those files and update the database
  4. Launch the Gradio interface with the updated data
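
The "process only new files" step can be sketched with a small manifest of file names that have already been handled. This is an assumption about the approach, not the actual code: the JSON manifest and the `find_new_files` and `mark_processed` helpers are hypothetical, and creators.py may track files differently.

```python
import json
from pathlib import Path


def _load_manifest(manifest_path):
    """Read the set of already-processed file names, or an empty set on first run."""
    manifest_path = Path(manifest_path)
    if manifest_path.exists():
        return set(json.loads(manifest_path.read_text()))
    return set()


def find_new_files(data_dir, manifest_path):
    """Return CSV files in data_dir that are not yet recorded in the manifest."""
    processed = _load_manifest(manifest_path)
    return sorted(p for p in Path(data_dir).glob("*.csv") if p.name not in processed)


def mark_processed(files, manifest_path):
    """Add the given files to the manifest so later runs skip them."""
    processed = _load_manifest(manifest_path)
    processed.update(Path(f).name for f in files)
    Path(manifest_path).write_text(json.dumps(sorted(processed), indent=2))
```

On a first run the manifest does not exist, so every CSV counts as new. On later runs only files missing from the manifest are returned, and deleting the manifest would have the same effect as a forced full reload.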

The interface will be accessible from:

  • Other machines on your network at: http://your-ip-address:7860
  • A temporary public URL that will be displayed in the console (thanks to share=True)
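
For reference, these two access paths correspond to standard arguments of Gradio's `launch()`. The sketch below uses a placeholder UI; `LAUNCH_KWARGS` and `build_and_launch` are illustrative names, and the real interface is defined in creators.py.

```python
# Launch settings matching what this README describes:
# bind to all interfaces, serve on port 7860, and request a temporary public URL.
LAUNCH_KWARGS = {
    "server_name": "0.0.0.0",
    "server_port": 7860,
    "share": True,
}


def build_and_launch():
    """Build a placeholder interface and start the Gradio server."""
    # Imported lazily so the settings above can be inspected without Gradio installed.
    import gradio as gr

    with gr.Blocks(title="TikTok Creator Analyzer") as demo:
        gr.Markdown("Placeholder UI; the real search controls live in creators.py.")

    # Blocks until the server is stopped.
    demo.launch(**LAUNCH_KWARGS)
```

Setting `share` to `False` keeps the app reachable only on the local network.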

Maintenance

The application includes a Maintenance tab that shows:

  • How many files have been processed
  • When the database was last updated
  • An option to force reload all files (useful if you suspect data corruption)

Data Format

The CSV files should have the following columns:

  • id
  • unique_id
  • follower_count
  • nickname
  • video_count
  • following_count
  • signature
  • email
  • bio_link
  • updated_at
  • tt_seller
  • region
  • language
  • url
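
Before adding a new file to the data directory, it can help to confirm that its header contains these columns. The check below is a minimal sketch; `missing_columns` is a hypothetical helper and not part of creators.py.

```python
import csv
import io

# Column names as listed in this README.
EXPECTED_COLUMNS = [
    "id", "unique_id", "follower_count", "nickname", "video_count",
    "following_count", "signature", "email", "bio_link", "updated_at",
    "tt_seller", "region", "language", "url",
]


def missing_columns(csv_file):
    """Return the expected columns that are absent from the header row of an open CSV file."""
    header = next(csv.reader(csv_file), [])
    present = {name.strip() for name in header}
    return [column for column in EXPECTED_COLUMNS if column not in present]


# Example with an in-memory file that only has three of the columns.
sample = io.StringIO("id,unique_id,follower_count\n1,creator_one,1200\n")
missing = missing_columns(sample)
```

To check a real file, open it with `open(path, newline="", encoding="utf-8")` and pass the file object to `missing_columns`. An empty result means every expected column is present.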