fionazhang
committed on
Update README.md
README.md CHANGED

# mistral-environment-all

## Model Description

<!-- Provide a longer summary of what this model is. -->

This model is a quantized Mistral-7B fine-tuned on a self-organised dataset of environmental knowledge. It is still under development.

- **Developed by:** Fiona Zhang
- **Funded by:** CSIRO, Pawsey Supercomputing Research Centre
- **Finetuned from model:** [Mistral-7B](https://huggingface.co/mistralai/Mistral-7B-v0.1)

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

This repository includes the weights learned during the training process. It should be loaded with the pre-trained Mistral 7B model and tokenizer.

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer of the base model; adjust the configuration if needed
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

# Load the fine-tuned model with its trained weights
fine_tuned_model = AutoModelForCausalLM.from_pretrained(
    "fionazhang/mistral_7b_environment",
)

# Now you can use `fine_tuned_model` for inference or further training
input_text = "The impact of climate change on"
output_ids = fine_tuned_model.generate(tokenizer.encode(input_text, return_tensors="pt"))

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```
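
The card describes the model as quantized and says the weights should be loaded with the pre-trained Mistral 7B. If the repository stores PEFT/LoRA adapter weights rather than a full set of model weights (an assumption this card does not confirm), a 4-bit base model plus the fine-tuned adapters could be loaded roughly like this:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Load the base model in 4-bit; assumes a CUDA GPU and the bitsandbytes package
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
base_model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

# Attach the fine-tuned weights on top of the base model
# (only valid if the repository actually contains PEFT adapters)
model = PeftModel.from_pretrained(base_model, "fionazhang/mistral_7b_environment")
```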

## Training Data

<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

The fine-tuning data are parsed from these public Wikipedia pages:

- [Environmental Issues](https://en.wikipedia.org/wiki/Environmental_issues)
- [Natural Environment](https://en.wikipedia.org/wiki/Natural_environment)
- [Biophysical Environment](https://en.wikipedia.org/wiki/Biophysical_environment)
- [Ecology](https://en.wikipedia.org/wiki/Ecology)
- [Environment (Systems)](https://en.wikipedia.org/wiki/Environment_(systems))
- [Built Environment](https://en.wikipedia.org/wiki/Built_environment)
- [Climate Change](https://en.wikipedia.org/wiki/Climate_change)
- [Human Impact on the Environment](https://en.wikipedia.org/wiki/Human_impact_on_the_environment)
- [Environment of Australia](https://en.wikipedia.org/wiki/Environment_of_Australia)
- [Environmental Protection](https://en.wikipedia.org/wiki/Environmental_protection)
- [Environmental Issues in Australia](https://en.wikipedia.org/wiki/Environmental_issues_in_Australia)

The text corpus is preprocessed into a cleaner format before training; the sketch below illustrates one way this could be done.
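
The card does not document the actual parsing pipeline, so the following only illustrates fetching one of the listed pages and normalising its text, assuming the `requests` and `beautifulsoup4` packages:

```python
import re

import requests
from bs4 import BeautifulSoup

# Fetch one of the listed pages (illustrative; not the card's documented pipeline)
url = "https://en.wikipedia.org/wiki/Environmental_issues"
soup = BeautifulSoup(requests.get(url, timeout=30).text, "html.parser")

# Keep article paragraphs, drop citation markers like [1], collapse whitespace
paragraphs = [p.get_text() for p in soup.select("div.mw-parser-output p")]
text = re.sub(r"\[\d+\]", "", "\n".join(paragraphs))
text = re.sub(r"[ \t]+", " ", text).strip()
```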

## Training procedure

### Framework versions

- Pytorch 2.1.0a0+git7bcf7da
- Datasets 2.16.1
- Tokenizers 0.15.0

## Environmental Impact

<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

- **Hardware Type:** T4 GPU
- **Hours used:** <1
- **Cloud Provider:** Google Cloud
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed] (one way to measure this is sketched below)
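
The missing figures could be filled in by measuring the run directly, for example with the `codecarbon` package (an assumption about tooling, not how this card's numbers were produced):

```python
from codecarbon import EmissionsTracker

# Track energy use and estimated emissions around the fine-tuning run
tracker = EmissionsTracker()  # cloud provider and region can be set explicitly
tracker.start()
# ... run fine-tuning here ...
emissions_kg = tracker.stop()  # estimated emissions in kg CO2eq

print(f"Estimated emissions: {emissions_kg:.4f} kg CO2eq")
```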