# Content Classification LoRA Adapter for Gemma-2B A LoRA adapter for unsloth/gemma-2b that determines content indexing suitability using chain-of-thought reasoning. Note: This is used in a pipeline to determine if a context should be indexed or not. ## Technical Specifications ### Base Model - Model: unsloth/gemma-2b - LoRA Rank: 64 - Target Modules: q_proj, up_proj, down_proj, gate_proj, o_proj, k_proj, v_proj - Task: CAUSAL_LM - Dropout: 0 - Alpha: 32 ### Input/Output Format Input XML structure: ```xml Determine true or false if the following content is suitable and should be indexed. {input_text} ``` Output XML structure: ```xml {reasoning_process} {content_type} {true|false} ``` The model then expects an indefinite list of ` ... ` that you may not want. But you can use this to do fewshots with incontext learning to correct a mistake or enhance the results. Your stop token should be ``. ## Deployment ### VLLM Server Setup ```bash export VLLM_ALLOW_RUNTIME_LORA_UPDATING=1 export VLLM_ALLOW_LONG_MAX_MODEL_LEN=1 vllm serve unsloth/gemma-2-2b \ --gpu-memory-utilization=1 \ --port 6002 \ --served-model-name="gemma" \ --trust-remote-code \ --max-model-len 8192 \ --disable-log-requests \ --enable-lora \ --lora-modules lora=./dataset/output/unsloth/lora_model \ --max-lora-rank 64 ``` ### Processing Pipeline 1. Install Dependencies: ```bash pip install requests tqdm concurrent.futures ``` 2. Run Content Processor: ```bash python process.py --input corpus.jsonl --output results.jsonl --threads 24 ``` ### Client Implementation ```python import requests def classify_content(text: str, vllm_url: str = "http://localhost:6002/v1/completions") -> dict: xml_content = ( 'Determine true or false if the following content is ' 'suitable and should be indexed.\n' '\n' f' {text}' ) response = requests.post( vllm_url, json={ "prompt": xml_content, "max_tokens": 6000, "temperature": 1, "model": "lora", "stop": [""] }, timeout=30000 ) completion = response.json()["choices"][0]["text"] # Parse XML tags import re def extract_tag(tag: str) -> str: match = re.search(f'<{tag}>(.*?)', completion, re.DOTALL) return match.group(1).strip() if match else "" return { "thinking": extract_tag("thinking"), "category": extract_tag("category"), "should_index": extract_tag("should_index") } ``` ### Example Usage ```python text = """Multiservice Tactics, Techniques, and Procedures for Nuclear, Biological, and Chemical Aspects of Consequence Management TABLE OF CONTENTS...""" result = classify_content(text) print(result) ``` Example output: ```json { "thinking": "This is a table of contents for a document, not the actual content.", "category": "table of contents", "should_index": "false" } ``` ## Batch Processing The included processor supports parallel processing of JSONL files: ```python from request_processor import RequestProcessor processor = RequestProcessor( input_file="corpus.jsonl", output_file="results.jsonl", num_threads=24 ) processor.process_file() ``` Input JSONL format: ```json { "pid": "document_id", "docid": "path/to/source", "content": "document text", "metadata": { "key": "value" } } ``` Output JSONL format: ```json { "pid": "document_id", "docid": "path/to/source", "content": "document text", "metadata": { "key": "value" }, "thinking": "reasoning process", "category": "content type", "should_index": "true/false", "processed_at": "2024-10-22 02:52:33" } ``` ## Performance Considerations - Uses thread pooling for parallel processing - Implements atomic writes with file locking - Progress tracking with tqdm - Automatic error handling and logging - Configurable thread count for optimization ## Error Handling Errors are captured in the output JSONL: ```json { "error": "error message", "processed_at": "timestamp" } ``` Monitor errors in real-time: ```bash tail -f results.jsonl | grep error ```