File size: 4,431 Bytes

# Content Classification LoRA Adapter for Gemma-2B

A LoRA adapter for unsloth/gemma-2b that determines content indexing suitability using chain-of-thought reasoning.

Used in a pipeline.


## Technical Specifications

### Base Model
- Model: unsloth/gemma-2b
- LoRA Rank: 64
- Target Modules: q_proj, up_proj, down_proj, gate_proj, o_proj, k_proj, v_proj
- Task: CAUSAL_LM
- Dropout: 0
- Alpha: 32

### Input/Output Format

Input XML structure:
```xml
<instruction>Determine true or false if the following content is suitable and should be indexed.</instruction>
<suitable>
  <content>{input_text}</content>
```

Output XML structure:
```xml
  <thinking>{reasoning_process}</thinking>
  <category>{content_type}</category>
  <should_index>{true|false}</should_index>
</suitable>

```

The model then expects an indefinite list of `<suitable> ... </suitable>` that you may not want. But you can use this to do fewshots with incontext learning to correct a mistake or enhance the results.

Your stop token should be `</suitable>`.

## Deployment

### VLLM Server Setup
```bash
export VLLM_ALLOW_RUNTIME_LORA_UPDATING=1
export VLLM_ALLOW_LONG_MAX_MODEL_LEN=1

vllm serve unsloth/gemma-2-2b \
  --gpu-memory-utilization=1 \
  --port 6002 \
  --served-model-name="gemma" \
  --trust-remote-code \
  --max-model-len 8192 \
  --disable-log-requests \
  --enable-lora \
  --lora-modules lora=./dataset/output/unsloth/lora_model \
  --max-lora-rank 64
```

### Processing Pipeline

1. Install Dependencies:
```bash
pip install requests tqdm concurrent.futures
```

2. Run Content Processor:
```bash
python process.py --input corpus.jsonl --output results.jsonl --threads 24
```

### Client Implementation

```python
import requests

def classify_content(text: str, vllm_url: str = "http://localhost:6002/v1/completions") -> dict:
    xml_content = (
        '<instruction>Determine true or false if the following content is '
        'suitable and should be indexed.</instruction>\n'
        '<suitable>\n'
        f'  <content>{text}</content>'
    )
    
    response = requests.post(
        vllm_url,
        json={
            "prompt": xml_content,
            "max_tokens": 6000,
            "temperature": 1,
            "model": "lora",
            "stop": ["</suitable>"]
        },
        timeout=30000
    )
    
    completion = response.json()["choices"][0]["text"]
    
    # Parse XML tags
    import re
    def extract_tag(tag: str) -> str:
        match = re.search(f'<{tag}>(.*?)</{tag}>', completion, re.DOTALL)
        return match.group(1).strip() if match else ""
        
    return {
        "thinking": extract_tag("thinking"),
        "category": extract_tag("category"),
        "should_index": extract_tag("should_index")
    }
```

### Example Usage

```python
text = """Multiservice Tactics, Techniques, and Procedures
for
Nuclear, Biological, and Chemical Aspects of Consequence
Management

TABLE OF CONTENTS..."""

result = classify_content(text)
print(result)
```

Example output:
```json
{
    "thinking": "This is a table of contents for a document, not the actual content.",
    "category": "table of contents",
    "should_index": "false"
}
```

## Batch Processing

The included processor supports parallel processing of JSONL files:

```python
from request_processor import RequestProcessor

processor = RequestProcessor(
    input_file="corpus.jsonl",
    output_file="results.jsonl",
    num_threads=24
)
processor.process_file()
```

Input JSONL format:
```json
{
    "pid": "document_id",
    "docid": "path/to/source",
    "content": "document text",
    "metadata": {
        "key": "value"
    }
}
```

Output JSONL format:
```json
{
    "pid": "document_id",
    "docid": "path/to/source",
    "content": "document text",
    "metadata": {
        "key": "value"
    },
    "thinking": "reasoning process",
    "category": "content type",
    "should_index": "true/false",
    "processed_at": "2024-10-22 02:52:33"
}
```

## Implementation and Performance Considerations

- Use thread pooling for parallel processing
- Implement atomic writes with file locking
- Progress tracking with tqdm
- Automatic error handling and logging
- Configurable thread count for optimization

## Error Handling

Errors are captured in the output JSONL:
```json
{
    "error": "error message",
    "processed_at": "timestamp"
}
```

Monitor errors in real-time:
```bash
tail -f results.jsonl | grep error
```