cosmosage_v2 / README.md

Update README.md

75691ae verified 9 months ago

6.55 kB

	---
	tags:
	- physics
	- cosmology
	model-index:
	- name: cosmosage_qa
	results: []
	license: mit
	language:
	- en
	pipeline_tag: text-generation
	base_model: mistralai/Mistral-7B-v0.1
	---

	# cosmosage

	Cosmosage is a natural-language cosmology assistant that can answer questions about cosmology.

	cosmosage_v2 first underwent continued pretraining based on thousands of papers and textbooks,
	and was subsequently fine-tuned on synthetically-generated question-answer pairs. It is a full
	chat model, though it excels in Q&A mode, where the model gives a single answer in response to
	a single question.

	The code used to generate cosmosage_v2 is available at https://github.com/tijmen/cosmosage

	## Usage

	After downloading cosmosage_v2, the following example code can be used to ask questions:

	```python
	path_to_model = 'cosmosage_v2/'

	from transformers import AutoModelForCausalLM, AutoTokenizer
	import torch
	device = "cuda"
	model = AutoModelForCausalLM.from_pretrained(path_to_model).to(device)
	tokenizer = AutoTokenizer.from_pretrained(path_to_model)
	def ask_cosmosage(question):
	input_ids = torch.cat([
	tokenizer.encode("You are cosmosage, an AI programmed to be a cosmology expert. You answer the USER's question clearly in long form, always providing context. When appropriate, provide a reference.", return_tensors="pt"),
	torch.tensor([[28705]]),
	tokenizer.encode("USER:", add_special_tokens=False, return_tensors="pt"),
	tokenizer.encode(question, add_special_tokens=False, return_tensors="pt"),
	torch.tensor([[28705]]),
	tokenizer.encode("ASSISTANT:", add_special_tokens=False, return_tensors="pt")
	], dim=-1).to(device)
	generated_ids = model.generate(input_ids, max_length=input_ids.shape[1] + 1000, do_sample=True)
	return tokenizer.decode(generated_ids[0], skip_special_tokens=True)```

	## Comparison to cosmosage_v1

	cosmosage_v2 is a more knowledgeable model than cosmosage_v1 due to being pretrained on the papers and
	textbooks, rather than just on synthetically generated QA pairs. However, it continues to struggle with
	_reliability_. While many of its answers are factually accurate, some are not. The outputs of cosmosage
	(or any LLM) should not be trusted to be factual.

	[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
	<details><summary>See axolotl config</summary>

	axolotl version: `0.4.0`
	```yaml
	base_model: /workspace/output/cosmosage_base/
	model_type: MistralForCausalLM
	tokenizer_type: LlamaTokenizer
	is_mistral_derived_model: true

	load_in_8bit: false
	load_in_4bit: false
	strict: false

	datasets:
	- path: /workspace/input/datasets/qa_tune/arxiv_metadata_qa3.jsonl
	type: sharegpt
	- path: /workspace/input/datasets/qa_tune/arxiv_refined_qa.jsonl
	type: sharegpt
	- path: /workspace/input/datasets/qa_tune/arxiv_summary3.jsonl
	type: sharegpt
	- path: /workspace/input/datasets/qa_tune/cosmology_qa.jsonl
	type: alpaca_chat.load_qa
	- path: /workspace/input/datasets/qa_tune/openhermes2_5.jsonl
	type: sharegpt
	- path: /workspace/input/datasets/qa_tune/cosmology_textbooks_qa.jsonl
	type: alpaca_chat.load_qa
	- path: /workspace/input/datasets/qa_tune/physics_astro_qa.jsonl
	type: alpaca_chat.load_qa

	dataset_prepared_path: /workspace/output/qa_tune_prepared
	val_set_size: 0.001
	output_dir: /workspace/output/cosmosage_qa

	chat_template: inst

	adapter:
	lora_model_dir:

	sequence_len: 4096
	sample_packing: true
	pad_to_sequence_len: true

	lora_r:
	lora_alpha:
	lora_dropout:
	lora_target_modules:
	lora_target_linear:
	lora_fan_in_fan_out:

	seed: 702

	wandb_project:
	wandb_entity:
	wandb_watch:
	wandb_name:
	wandb_log_model:

	gradient_accumulation_steps: 1
	micro_batch_size: 4
	num_epochs: 2.0
	optimizer: adamw_torch
	lr_scheduler: linear
	learning_rate: 0.000002
	max_grad_norm: 3.0

	train_on_inputs: false
	group_by_length: false
	bf16: true
	fp16: false
	tf32: false

	gradient_checkpointing: true
	early_stopping_patience:
	resume_from_checkpoint:
	local_rank:
	logging_steps: 1
	xformers_attention:
	flash_attention: true

	warmup_steps: 100
	eval_steps: 0.05
	eval_table_size:
	eval_table_max_new_tokens: 128
	saves_per_epoch: 1
	save_total_limit: 2
	debug:
	deepspeed: /workspace/axolotl/deepspeed_configs/zero1.json
	weight_decay:
	fsdp:
	fsdp_config:
	special_tokens:
	bos_token: "<s>"
	eos_token: "</s>"
	unk_token: "<unk>"

	ddp_timeout: 7200000

	```

	</details><br>

	# workspace/output/cosmosage_qa

	This model was trained from scratch on the None dataset.
	It achieves the following results on the evaluation set:
	- Loss: 0.5673

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 2e-06
	- train_batch_size: 4
	- eval_batch_size: 4
	- seed: 702
	- distributed_type: multi-GPU
	- num_devices: 4
	- total_train_batch_size: 16
	- total_eval_batch_size: 16
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- lr_scheduler_warmup_steps: 100
	- num_epochs: 2.0

	### Training results

	\| Training Loss \| Epoch \| Step \| Validation Loss \|
	\|:-------------:\|:-----:\|:-----:\|:---------------:\|
	\| 1.1004 \| 0.0 \| 1 \| 1.1450 \|
	\| 0.7343 \| 0.1 \| 909 \| 0.7093 \|
	\| 0.697 \| 0.2 \| 1818 \| 0.6630 \|
	\| 0.6386 \| 0.3 \| 2727 \| 0.6380 \|
	\| 0.5687 \| 0.4 \| 3636 \| 0.6212 \|
	\| 0.5857 \| 0.5 \| 4545 \| 0.6083 \|
	\| 0.6161 \| 0.6 \| 5454 \| 0.5986 \|
	\| 0.522 \| 0.7 \| 6363 \| 0.5894 \|
	\| 0.5563 \| 0.8 \| 7272 \| 0.5825 \|
	\| 0.6176 \| 0.9 \| 8181 \| 0.5766 \|
	\| 0.5948 \| 1.0 \| 9090 \| 0.5719 \|
	\| 0.4269 \| 1.08 \| 9999 \| 0.5817 \|
	\| 0.4858 \| 1.18 \| 10908 \| 0.5796 \|
	\| 0.4909 \| 1.28 \| 11817 \| 0.5765 \|
	\| 0.4325 \| 1.38 \| 12726 \| 0.5746 \|
	\| 0.4037 \| 1.48 \| 13635 \| 0.5720 \|
	\| 0.507 \| 1.58 \| 14544 \| 0.5706 \|
	\| 0.4778 \| 1.68 \| 15453 \| 0.5697 \|
	\| 0.4599 \| 1.78 \| 16362 \| 0.5683 \|
	\| 0.4515 \| 1.88 \| 17271 \| 0.5673 \|


	### Framework versions

	- Transformers 4.38.0.dev0
	- Pytorch 2.0.1+cu118
	- Datasets 2.17.0
	- Tokenizers 0.15.0