AI-Data-Cleaner / llm_prompts.py
reab5555's picture
Upload 5 files
1853d90 verified
raw
history blame contribute delete
No virus
4.09 kB
CHECK_HEADERS_PROMPT = """
Analyze the following DataFrame columns and identify any columns without names or with invalid names.
Return only a JSON list of column indices (0-based) that need attention, without any explanation.
Columns: {columns}
"""
NORMALIZE_HEADERS_PROMPT = """
Analyze the following DataFrame column names and normalize them according to these rules:
1. Convert to lowercase
2. Replace empty strings or spaces with underscores
3. Remove any invalid characters (keep only letters, numbers, and underscores)
Return only a JSON object where keys are the original column names and values are the normalized names, without any explanation.
Column names: {columns}
"""
CHECK_COLUMN_CONTENT_PROMPT = """
Analyze the following sample of values from the column '{column_name}' and determine:
1. The most appropriate data type (float, integer, string, or date)
2. Indices of empty or blank values
3. Indices of values that don't conform to the determined data type
Sample values:
{sample_values}
Return only a JSON object with the following structure, without any explanation:
{{
"data_type": "detected_type",
"empty_indices": [list of indices of empty or blank values],
"invalid_indices": [list of indices of values that don't conform to the detected type]
}}
"""
CHECK_TYPOS_PROMPT = """
Analyze the following sample of values from the column '{column_name}' and identify any potential typos or misspellings.
For each identified typo, suggest a correction.
Sample values:
{sample_values}
Return only a JSON object with the following structure, without any explanation:
{{
"typos": {{
"original_value1": "corrected_value1",
"original_value2": "corrected_value2",
...
}}
}}
If no typos are found, return an empty object for "typos".
"""
ENCODE_STRING_PROMPT = """
Analyze the following unique values from the column '{column_name}' and create an encoding scheme.
Assign a unique integer to each unique string value, starting from 0.
Unique values:
{unique_values}
Return only a JSON object with the following structure, without any explanation:
{{
"string_value1": 0,
"string_value2": 1,
"string_value3": 2,
...
}}
Ensure that each unique string value is assigned a unique integer.
"""
DETERMINE_DTYPE_PROMPT = """
Analyze the following sample values from a column and determine the most appropriate data type.
Possible types are: float, integer, string, or date.
If more than 80% of the values conform to a specific type, choose that type.
Otherwise, default to string.
Sample values:
{sample_values}
Return only a JSON object with the following structure, without any explanation:
{{
"column_type": "detected_type",
"invalid_indices": [list of indices that do not conform to the detected type]
}}
"""
TRANSFORM_STRING_PROMPT = """
Transform the following unique string values from the column '{column_name}' to lowercase.
If a value is a variation of "nan" (case-insensitive), map it to "nan".
Unique values:
{unique_values}
Return only a JSON object with the following structure, without any explanation:
{{
"original_value1": "transformed_value1",
"original_value2": "transformed_value2",
...
}}
"""
CHECK_LOW_COUNT_VALUES_PROMPT = """
Analyze the following value counts from the column '{column_name}' and identify values with a count lower than 2.
Value counts:
{value_counts}
Return only a JSON list of values that have a count lower than 2, without any explanation.
"""
CHECK_SCHEMA_CONFORMITY_PROMPT = """
Analyze the following sample of values from the column '{column_name}' and check if they conform to the determined data type '{data_type}'.
Sample values:
{sample_values}
Return only a JSON object with the following structure, without any explanation:
{{
"conforming_indices": [list of indices of values that conform to the data type],
"nonconforming_indices": [list of indices of values that do not conform to the data type]
}}
"""