	---
	title: tt-creators
	app_file: creators.py
	sdk: gradio
	sdk_version: 5.20.0
	---
	# TikTok Creator Analyzer

	A Gradio-based tool for analyzing TikTok creator profiles from CSV files.

	## Features

	- Efficiently loads and processes millions of TikTok creator profiles
	- Caches data in Parquet format for faster subsequent loads
	- Tracks processed files to avoid reprocessing the same data
	- Incrementally updates the database when new files are added
	- Advanced search with multiple filters:
	  - Follower count range (min/max)
	  - Video count range (min/max)
	  - Keywords in signature
	  - Region filter
	  - "Has Email" filter to find profiles with contact information
	- Download search results as CSV
	- Network-accessible interface (binds to 0.0.0.0)
	- Shareable via temporary public URL
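
The search filters listed above can be pictured as a single predicate applied to each profile row. The following is a minimal sketch, not the code in `creators.py`: `SearchFilters`, `_in_range`, and `matches` are hypothetical names, and rows are assumed to be plain dicts keyed by the column names from the Data Format section.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SearchFilters:
    """Hypothetical container for the search options listed above."""
    min_followers: Optional[int] = None
    max_followers: Optional[int] = None
    min_videos: Optional[int] = None
    max_videos: Optional[int] = None
    keyword: Optional[str] = None   # matched case-insensitively in `signature`
    region: Optional[str] = None
    has_email: bool = False


def _in_range(value, low, high):
    """Return True when value lies within the optional inclusive bounds."""
    if low is not None and value < low:
        return False
    if high is not None and value > high:
        return False
    return True


def matches(profile: dict, f: SearchFilters) -> bool:
    """Check one profile row (a dict keyed by column name) against the filters."""
    if not _in_range(int(profile["follower_count"]), f.min_followers, f.max_followers):
        return False
    if not _in_range(int(profile["video_count"]), f.min_videos, f.max_videos):
        return False
    if f.keyword and f.keyword.lower() not in (profile.get("signature") or "").lower():
        return False
    if f.region and (profile.get("region") or "").upper() != f.region.upper():
        return False
    if f.has_email and not (profile.get("email") or "").strip():
        return False
    return True


# Made-up sample rows, for illustration only.
profiles = [
    {"unique_id": "a", "follower_count": "1200", "video_count": "40",
     "signature": "Fitness coach", "region": "US", "email": "a@example.com"},
    {"unique_id": "b", "follower_count": "90", "video_count": "5",
     "signature": "cats", "region": "DE", "email": ""},
]
hits = [p for p in profiles if matches(p, SearchFilters(min_followers=1000, has_email=True))]
```

Leaving a filter at its default skips that check, so an empty `SearchFilters()` matches every row.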

	## Installation

	1. Install the required dependencies:

	```bash
	pip install -r requirements.txt
	```

	2. Place your CSV files in the expected data directory (`../data/tiktok_profiles/`)

	## Usage

	Run the script:

	```bash
	python creators.py
	```

	The first run will:
	1. Load all CSV files from the data directory
	2. Combine them into a single dataset
	3. Save the combined data as a Parquet file for faster loading in the future
	4. Track which files have been processed to avoid duplicates
	5. Launch a Gradio web interface for searching and analyzing the data

	Subsequent runs will:
	1. Load the existing data from the Parquet file
	2. Check for new CSV files that haven't been processed yet
	3. If new files exist, process only those files and update the database
	4. Launch the Gradio interface with the updated data
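
One way to track processed files, as the steps above describe, is a small manifest stored next to the cached data. The sketch below assumes a JSON manifest holding file names. The helper names and the manifest format are illustrative assumptions and may differ from what `creators.py` actually does.

```python
import json
from pathlib import Path


def load_manifest(manifest_path: Path) -> set:
    """Return the set of CSV file names already processed (empty on first run)."""
    if manifest_path.exists():
        return set(json.loads(manifest_path.read_text()))
    return set()


def find_new_files(data_dir: Path, processed: set) -> list:
    """List CSV files in data_dir whose names are not yet in the manifest."""
    return sorted(p for p in data_dir.glob("*.csv") if p.name not in processed)


def mark_processed(manifest_path: Path, processed: set, new_files: list) -> set:
    """Add the new file names to the manifest and write it back to disk."""
    updated = processed | {p.name for p in new_files}
    manifest_path.write_text(json.dumps(sorted(updated)))
    return updated
```

On the first run the manifest does not exist, so every CSV counts as new. On later runs only files missing from the manifest are returned. A force reload would amount to deleting the manifest and the cached Parquet file before starting.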

	The interface will be accessible from:
	- Other machines on your network at: `http://your-ip-address:7860`
	- A temporary public URL that will be displayed in the console (thanks to `share=True`)
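
The network behaviour described above corresponds to three Gradio `launch()` arguments. The sketch below shows those settings with a placeholder interface. `build_demo` and `main` are hypothetical names, and the real layout in `creators.py` is not reproduced here.

```python
# Launch options matching the behaviour described above.
LAUNCH_KWARGS = {
    "server_name": "0.0.0.0",  # listen on all network interfaces
    "server_port": 7860,       # port used in the URL above
    "share": True,             # request a temporary public URL
}


def build_demo():
    """Build a placeholder interface (the real app's layout is an unknown here)."""
    import gradio as gr  # imported lazily so the settings can be read without Gradio

    with gr.Blocks() as demo:
        gr.Markdown("# TikTok Creator Analyzer")
    return demo


def main():
    """Start the server with the options above; this call blocks until stopped."""
    build_demo().launch(**LAUNCH_KWARGS)
```

Calling `main()` starts the server. Setting `share` to `False` keeps the app reachable only on the local network.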

	## Maintenance

	The application includes a Maintenance tab that shows:
	- How many files have been processed
	- When the database was last updated
	- An option to force reload all files (useful if you suspect data corruption)

	## Data Format

	The CSV files should have the following columns:
	- id
	- unique_id
	- follower_count
	- nickname
	- video_count
	- following_count
	- signature
	- email
	- bio_link
	- updated_at
	- tt_seller
	- region
	- language
	- url
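
Before adding a new file to the data directory, it can help to confirm that its header contains the columns listed above. This is a minimal sketch using only the standard library. `missing_columns` is a hypothetical helper, and UTF-8 encoding is an assumption about the input files.

```python
import csv
from pathlib import Path

# Column names copied from the list above.
EXPECTED_COLUMNS = [
    "id", "unique_id", "follower_count", "nickname", "video_count",
    "following_count", "signature", "email", "bio_link", "updated_at",
    "tt_seller", "region", "language", "url",
]


def missing_columns(csv_path: Path) -> list:
    """Return the expected columns absent from the CSV header, in listed order."""
    # Assumes UTF-8 input; adjust the encoding if your exports differ.
    with csv_path.open(newline="", encoding="utf-8") as handle:
        header = next(csv.reader(handle), [])
    present = {name.strip() for name in header}
    return [name for name in EXPECTED_COLUMNS if name not in present]
```

An empty result means every expected column is present. Extra columns in the file are ignored by this check.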