---
license: apache-2.0
---

In this project, we fine-tuned a pre-existing model to assess **the Big Five personality traits** from a given text or sentence. Fine-tuning on a dataset curated specifically for personality traits taught the model to associate textual inputs with distinct personality characteristics. This targeted approach improves the model's precision in identifying the Big Five personality traits from text, compared with models that were developed or fine-tuned on more general datasets.

The **accuracy** reaches 80%, and the **F1 score** is 79%. Both are much higher than those of similar personality-detection models hosted on Hugging Face.
Because the output values are continuous, mean squared error (MSE) and mean absolute error (MAE) are better suited to evaluating the model's performance.
The smaller both metrics are, the better the model performs. Our model's performance: **MSE: 0.07**, **MAE: 0.14**.
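
As a reminder of what these two metrics measure, here is a minimal sketch in plain Python. The reference and predicted scores are made-up illustrative values, not results from this model or its evaluation set, and the function names are chosen only for this example.

```python
# Illustrative values only; these are NOT outputs of the model or data from the paper.
y_true = [0.30, 0.55, 0.20, 0.75, 0.40]  # hypothetical reference trait scores
y_pred = [0.25, 0.60, 0.35, 0.70, 0.40]  # hypothetical predicted trait scores


def mean_squared_error(y_true, y_pred):
    """Average of squared differences; large errors are penalized more heavily."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average of absolute differences; every error counts in proportion to its size."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


mse = mean_squared_error(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)
print(f"MSE: {mse:.4f}")
print(f"MAE: {mae:.4f}")
```

Both values are zero for a perfect prediction and grow as predictions drift away from the reference scores.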

Please **cite**: 

```
@article{wang2024personality,
  title={Continuous Output Personality Detection Models via Mixed Strategy Training},
  author={Wang, Rong and Sun, Kun},
  year={2024},
  journal={ArXiv},
  url={https://arxiv.org/abs/2406.16223}
}
```

The broader project on predicting human cognition and emotion, together with the training details, is available at: https://github.com/fivehills/detecting_personality

You can obtain the personality scores for an input text in the app **[KevSun/Personality_Test](https://huggingface.co/spaces/KevSun/Personality_Test)**.

The code below detects personality traits from an input text. It covers two cases: the text is given directly in the script, or it is read from a file.


```python
# Install these packages before importing them (transformers, PyTorch)
import warnings

from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

warnings.filterwarnings('ignore')
model = AutoModelForSequenceClassification.from_pretrained("KevSun/Personality_LM", ignore_mismatched_sizes=True)
tokenizer = AutoTokenizer.from_pretrained("KevSun/Personality_LM")

# Choose between direct text input or file input
use_file = False  # Set to True if you want to read from a file

if use_file:
    file_path = 'path/to/your/textfile.txt'  # Replace with your file path
    with open(file_path, 'r', encoding='utf-8') as file:
        new_text = file.read()
else:
    new_text = "President Joe Biden said on Wednesday he pulled out of the race against Republican Donald Trump over concerns about the future of U.S. democracy, explaining he was stepping aside to allow a new generation to take over in his first public remarks since ending his re-election bid. In an Oval Office address, Biden invoked previous presidents Thomas Jefferson, George Washington, and Abraham Lincoln as he described his love for the office that he will leave in six months, capping a half century in public office."

# Encode the text using the same tokenizer used during training
encoded_input = tokenizer(new_text, return_tensors='pt', padding=True, truncation=True, max_length=64)

model.eval()  # Set the model to evaluation mode

# Perform the prediction
with torch.no_grad():
    outputs = model(**encoded_input)


predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
predicted_scores = predictions[0].tolist()


trait_names = ["agreeableness", "openness", "conscientiousness", "extraversion", "neuroticism"]


for trait, score in zip(trait_names, predicted_scores):
    print(f"{trait}: {score:.4f}")

# Output:
# agreeableness: 0.2138
# openness: 0.2890
# conscientiousness: 0.1921
# extraversion: 0.1307
# neuroticism: 0.1744
```
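
The snippet above passes the five logits through softmax, so the printed scores are all positive and sum to 1 (the sample output adds up to 1.0000). They are therefore best read as relative weights across the five traits. The sketch below shows the same normalization in plain Python; the logits are hypothetical values chosen for illustration, not real model outputs.

```python
import math


def softmax(logits):
    """Normalize a list of logits into positive scores that sum to 1."""
    # Subtract the maximum first for numerical stability; the result is unchanged.
    highest = max(logits)
    exps = [math.exp(x - highest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for the five traits (illustration only).
example_logits = [0.2, 0.5, 0.1, -0.3, 0.0]
scores = softmax(example_logits)

for score in scores:
    print(f"{score:.4f}")
print(f"sum: {sum(scores):.4f}")
```

Because of this normalization, a higher score for one trait necessarily lowers the scores of the others.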

**Alternatively**, you can save the following script and run inference from the **bash** terminal.
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch
import argparse
import warnings

warnings.filterwarnings('ignore')

def load_model_and_tokenizer(model_name):
    model = AutoModelForSequenceClassification.from_pretrained(model_name, ignore_mismatched_sizes=True)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    return model, tokenizer

def process_input(input_text, tokenizer, max_length=64):
    return tokenizer(input_text, return_tensors='pt', padding=True, truncation=True, max_length=max_length)

def predict_personality(model, encoded_input):
    model.eval()  # Set the model to evaluation mode
    with torch.no_grad():
        outputs = model(**encoded_input)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    return predictions[0].tolist()

def print_predictions(predictions, trait_names):
    for trait, score in zip(trait_names, predictions):
        print(f"{trait}: {score:.4f}")

def main():
    parser = argparse.ArgumentParser(description="Predict personality traits from text.")
    parser.add_argument("--input", type=str, required=True, help="Input text or path to text file")
    parser.add_argument("--model", type=str, default="KevSun/Personality_LM", help="Model name or path")
    args = parser.parse_args()

    model, tokenizer = load_model_and_tokenizer(args.model)

    # Check if input is a file path or direct text
    if args.input.endswith('.txt'):
        with open(args.input, 'r', encoding='utf-8') as file:
            input_text = file.read()
    else:
        input_text = args.input

    encoded_input = process_input(input_text, tokenizer)
    predictions = predict_personality(model, encoded_input)

    trait_names = ["Agreeableness", "Openness", "Conscientiousness", "Extraversion", "Neuroticism"]
    print_predictions(predictions, trait_names)

if __name__ == "__main__":
    main()

```
```bash
python script_name.py --input "Your text here"
```
or 
```bash
python script_name.py --input path/to/your/textfile.txt
```
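
The script decides between the two input modes only by checking whether `--input` ends with `.txt`, so a path to a file that does not exist raises an error when the script tries to open it. A slightly more defensive variant is sketched below. The helper name `resolve_input` is hypothetical and not part of the original script, and falling back to treating the value as plain text is an assumption about the desired behavior.

```python
from pathlib import Path


def resolve_input(value: str) -> str:
    """Return the file contents if `value` names an existing .txt file, else `value` itself."""
    if value.endswith(".txt"):
        try:
            path = Path(value)
            if path.is_file():
                return path.read_text(encoding="utf-8")
        except (OSError, ValueError):
            # Not a usable path (e.g. too long or invalid); treat it as plain text.
            pass
    return value


# Direct text is returned unchanged.
print(resolve_input("Your text here"))
```

In `main()`, such a helper could stand in for the `if args.input.endswith('.txt')` branch, under the assumption stated above.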