|
--- |
|
license: apache-2.0 |
|
datasets: |
|
- nicholasKluge/reward-aira-dataset |
|
language: |
|
- en |
|
metrics: |
|
- accuracy |
|
library_name: transformers |
|
pipeline_tag: text-classification |
|
tags: |
|
- reward model |
|
- reward
|
- alignment |
|
--- |
|
# RewardModel
|
|
|
The `RewardModel` is a modified BERT model that can be used to score the quality of a completion for a given prompt. It is based on a [BERT model](https://huggingface.co/bert-base-cased), modified to act as a regression model.
|
|
|
The `RewardModel` accepts an `alpha` parameter, a multiplier applied to the reward score. This multiplier is set to 1 during training (since our reward values are bounded between -1 and 1) but can be changed at inference to allow for rewards with wider bounds.
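To make the scaling concrete, here is a minimal sketch of how such a multiplier can sit on top of a regression head. The layer name and the `tanh` bounding are assumptions chosen to match the [-1, 1] training range, not the model's actual internals:

```python
import torch
import torch.nn as nn

class ToyRewardHead(nn.Module):
    """Illustrative regression head mapping a pooled BERT embedding to a scalar reward."""

    def __init__(self, hidden_size: int = 768):
        super().__init__()
        self.regression = nn.Linear(hidden_size, 1)

    def forward(self, pooled_output: torch.Tensor, alpha: float = 1.0) -> torch.Tensor:
        # tanh bounds the raw reward to [-1, 1], matching the training setup;
        # alpha rescales the score at inference time (alpha=1 during training).
        return alpha * torch.tanh(self.regression(pooled_output))
```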
|
|
|
The model was trained on a dataset composed of prompts, completions, and annotated rewards.
|
|
|
> Note: These prompts and completions are samples from instruction datasets created via the [Self-Instruct](https://github.com/yizhongw/self-instruct) framework.
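If you want to inspect the training data directly, the dataset can be loaded from the Hub. The snippet below prints the available splits and one record rather than assuming the column names:

```python
from datasets import load_dataset

# Load the reward dataset and inspect its structure.
dataset = load_dataset("nicholasKluge/reward-aira-dataset")
print(dataset)  # available splits and column names

first_split = next(iter(dataset))
print(dataset[first_split][0])  # one prompt/completion/reward record
```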
|
|
|
## Usage |
|
|
|
Here's an example of how to use the `RewardModel` to score the quality of a response to a given prompt: |
|
|
|
```python |
|
from transformers import AutoTokenizer, AutoConfig, AutoModel
|
import torch |
|
|
|
device = torch.device("cuda" if torch.cuda.is_available() else "cpu") |
|
|
|
config = AutoConfig.from_pretrained('nicholasKluge/RewardModel', trust_remote_code=True, revision='main') |
|
tokenizer = AutoTokenizer.from_pretrained('nicholasKluge/RewardModel', trust_remote_code=True, config=config, revision='main') |
|
rewardModel = AutoModel.from_pretrained('nicholasKluge/RewardModel', trust_remote_code=True, config=config, revision='main') |
|
|
|
rewardModel.to(device) |
|
rewardModel.eval() |
|
|
|
|
|
# Define the question and response |
|
question = "What is the capital of France?" |
|
response1 = "Paris, France's capital, is a major European city and a global center for art, fashion, gastronomy and culture." |
|
response2 = "Google it pal." |
|
|
|
# Tokenize the question and response |
|
tokens = tokenizer(question, response1, |
|
return_token_type_ids=False, |
|
return_tensors="pt", |
|
return_attention_mask=True) |
|
|
|
tokens = tokens.to(device)
|
|
|
# Score the response |
|
with torch.no_grad():
    score = rewardModel(**tokens, alpha=10).item()
|
|
|
print(f"Question: {question} \n") |
|
print(f"Response 1: {response1} Score: {score:.3f}") |
|
|
|
tokens = tokenizer(question, response2, |
|
return_token_type_ids=False, |
|
return_tensors="pt", |
|
return_attention_mask=True) |
|
|
|
tokens = tokens.to(device)
|
|
|
with torch.no_grad():
    score = rewardModel(**tokens, alpha=10).item()
|
|
|
print(f"Response 2: {response2} Score: {score:.3f}") |
|
``` |
|
|
|
This will output the following: |
|
|
|
```markdown |
|
>>> Question: What is the capital of France? |
|
|
|
>>> Response 1: Paris, France's capital, is a major European city and a global center for art, fashion, gastronomy and culture. Score: 3.183

>>> Response 2: Google it pal. Score: -5.781
|
``` |
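Building on the example above, you can also rank several candidate completions for the same prompt. This sketch reuses `tokenizer`, `rewardModel`, `question`, and `device` from the previous block; the candidate strings are made up for illustration:

```python
# Score and rank several candidate completions for the same prompt
# (reuses `tokenizer`, `rewardModel`, `question`, and `device` from above).
candidates = [
    "Paris is the capital of France.",
    "I believe it might be Lyon.",
    "Google it pal.",
]

scores = []
for candidate in candidates:
    tokens = tokenizer(question, candidate,
                       return_token_type_ids=False,
                       return_tensors="pt",
                       return_attention_mask=True).to(device)
    with torch.no_grad():
        scores.append(rewardModel(**tokens, alpha=10).item())

# Print candidates from highest to lowest reward.
for score, candidate in sorted(zip(scores, candidates), reverse=True):
    print(f"{score:+.3f}  {candidate}")
```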
|
|
|
## Performance |
|
|
|
| Model (accuracy) | [WebGPT](https://huggingface.co/datasets/openai/webgpt_comparisons) | [SyntheticGPT](https://huggingface.co/datasets/openai/summarize_from_feedback) |
|
|---|---|---| |
|
| [Aira-RewardModel](https://huggingface.co/nicholasKluge/RewardModel) | 53.54% | 99.80% | |
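For context, accuracy on comparison datasets like these is typically computed pairwise: the model is correct when it scores the preferred completion above the rejected one. Here is a generic sketch of that computation, where `score_fn` and the triple layout are placeholder assumptions rather than these datasets' actual schema:

```python
def pairwise_accuracy(score_fn, pairs):
    """Fraction of (prompt, chosen, rejected) triples where the preferred
    completion receives the higher reward from `score_fn`."""
    correct = sum(
        score_fn(prompt, chosen) > score_fn(prompt, rejected)
        for prompt, chosen, rejected in pairs
    )
    return correct / len(pairs)
```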
|
|
|
## License |
|
|
|
The `RewardModel` is licensed under the Apache License, Version 2.0. See the [LICENSE](LICENSE) file for more details. |