|
--- |
|
license: apache-2.0 |
|
language: |
|
- en |
|
metrics: |
|
- accuracy |
|
- precision |
|
base_model: |
|
- nvidia/Llama-3.1-Nemotron-70B-Instruct-HF |
|
new_version: nvidia/Llama-3.1-Nemotron-70B-Instruct-HF |
|
pipeline_tag: text-generation |
|
library_name: transformers |
|
tags: |
|
- llm |
|
- oil-and-gas |
|
- engineering |
|
- custom-llm |
|
- ogai-3.1-engineer |
|
- nvidia |
|
- llama |
|
- Nemotron |
|
- drilling-engineering |
|
--- |
|
|
|
# OGAI 3.1 Engineer |
|
|
|
**Model Author:** Gain.Energy |
|
**Lead Developers:** Dr. Vlad Karén Payrazyan, CEO and Founder at Gain.Energy; Tommy Xaypanya, Lead AI Scientist and Developer at Gain.Energy |
|
**Date Created:** November 12, 2024 |
|
|
|
## Overview |
|
|
|
**OGAI 3.1 Engineer** is a large language model built on NVIDIA’s **Llama-3.1-Nemotron-70B-Instruct-HF** and customized for the oil and gas industry, with a focus on drilling engineering. It has been fine-tuned to work through technical calculations, interpret engineering documents, and generate domain-specific insights for engineers and analysts. |
|
|
|
**Applications:** |
|
- Complex engineering calculations |
|
- Document interpretation and summarization |
|
- Drilling optimization and safety compliance |
|
- Collaborative, real-time engineering workspaces |
|
|
|
--- |
|
|
|
## Model Details |
|
|
|
- **Base Model:** nvidia/Llama-3.1-Nemotron-70B-Instruct-HF |
|
- **Parameter Count:** 70 billion |
|
- **Architecture:** Transformer-based |
|
- **Input Format:** Text prompts up to 128k tokens |
|
- **Output Format:** Text responses up to 4k tokens |
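The input and output limits above can be enforced before calling the model. The following is a minimal sketch, not part of the released code: the helper name `plan_generation` is hypothetical, and the exact token counts assume "128k" and "4k" mean 128,000 and 4,000 tokens, which the model card does not specify.

```python
# Assumed limits, read from the "Model Details" list; exact values are an assumption.
MAX_PROMPT_TOKENS = 128_000
MAX_RESPONSE_TOKENS = 4_000


def plan_generation(prompt_tokens: int, requested_new_tokens: int) -> int:
    """Return how many new tokens to request, clamped to the response limit.

    Raises ValueError if the prompt itself exceeds the input limit.
    """
    if prompt_tokens > MAX_PROMPT_TOKENS:
        raise ValueError(
            f"prompt has {prompt_tokens} tokens; limit is {MAX_PROMPT_TOKENS}"
        )
    if requested_new_tokens <= 0:
        raise ValueError("requested_new_tokens must be positive")
    return min(requested_new_tokens, MAX_RESPONSE_TOKENS)


if __name__ == "__main__":
    # A 2,000-token prompt asking for 10,000 new tokens is clamped to the limit.
    print(plan_generation(2_000, 10_000))
```

The returned value could be passed as `max_new_tokens` to `model.generate`, with `prompt_tokens` taken from the length of the tokenized input.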
|
|
|
## Revision History |
|
|
|
### Revision 1.0 - Initial Release (November 12, 2024) |
|
- **Base Model:** nvidia/Llama-3.1-Nemotron-70B-Instruct-HF |
|
- **Custom Training:** Focused on oil and gas drilling engineering documents, industry standards, technical calculations, and safety protocols. |
|
- **Training Data:** |
|
  - Industry-specific manuals, textbooks, and historical operational data. |
|
  - Preprocessed datasets to ensure consistency and confidentiality. |
|
- **Fine-Tuning Techniques:** |
|
  - **Low-Rank Adaptation (LoRA):** Applied LoRA for efficient parameter fine-tuning. |
|
  - **Retrieval-Augmented Generation (RAG):** Integrated for real-time knowledge base retrieval. |
|
  - **Prompt Engineering:** Crafted domain-specific prompts for enhanced accuracy. |
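To illustrate why LoRA counts as efficient parameter fine-tuning, the sketch below compares the size of a full weight matrix with the two low-rank factors LoRA trains in its place. The matrix dimensions and the rank are illustrative assumptions; the model card does not state the LoRA rank or which layers were adapted.

```python
def full_params(d_out: int, d_in: int) -> int:
    """Number of weights in a dense d_out x d_in projection."""
    return d_out * d_in


def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Weights in the low-rank factors B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


if __name__ == "__main__":
    # Illustrative values only: a square 8192 x 8192 projection and rank 16.
    d, rank = 8192, 16
    full = full_params(d, d)
    lora = lora_trainable_params(d, d, rank)
    print(f"full: {full:,}  lora: {lora:,}  trainable share: {lora / full:.4%}")
```

Under these assumed values the adapter trains 1/256 of the weights in that projection, while the frozen base weights stay unchanged.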
|
|
|
--- |
|
|
|
## Installation |
|
|
|
To install and run **OGAI 3.1 Engineer**, you’ll need: |
|
- Python 3.9 or higher |
|
- PyTorch 1.12 or higher |
|
- CUDA 11.8 for GPU support |
|
|
|
### Clone the Repository |
|
|
|
```bash |
|
git clone https://huggingface.co/gain-energy/OGAI-3.1-Engineer |
|
cd OGAI-3.1-Engineer |
|
pip install -r requirements.txt |
|
``` |
|
|
|
--- |
|
|
|
### Usage Example |
|
|
|
The following example loads OGAI 3.1 Engineer and generates a response to a drilling-engineering prompt: |
|
|
|
```python |
|
import torch |
|
from transformers import AutoModelForCausalLM, AutoTokenizer |
|
|
|
model_name = "gain-energy/OGAI-3.1-Engineer" |
|
# Load the weights in bfloat16 and spread them across the available devices. |
|
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map="auto") |
|
tokenizer = AutoTokenizer.from_pretrained(model_name) |
|
|
|
prompt = "Calculate the mud weight required for a well with a true vertical depth of 15,000 feet and formation pressure of 10,000 psi." |
|
# Move the inputs to the device that holds the model's first layers. |
|
inputs = tokenizer(prompt, return_tensors="pt").to(model.device) |
|
# max_new_tokens bounds the response length independently of the prompt length. |
|
outputs = model.generate(**inputs, max_new_tokens=200) |
|
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True) |
|
print(generated_text) |
|
``` |
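The prompt in the example above has a closed-form answer that can be used to sanity-check the model's response. The sketch below applies the standard oilfield hydrostatic relation, pressure (psi) = 0.052 × mud weight (ppg) × true vertical depth (ft), solved for mud weight. It gives the balance point only; any overbalance or trip margin is an operational choice and is not included. The function name is hypothetical.

```python
# Conversion constant for the oilfield hydrostatic equation: psi per (ppg * ft).
PSI_PER_PPG_FT = 0.052


def balance_mud_weight_ppg(formation_pressure_psi: float, tvd_ft: float) -> float:
    """Mud weight (ppg) whose hydrostatic pressure equals the formation pressure."""
    if tvd_ft <= 0:
        raise ValueError("true vertical depth must be positive")
    return formation_pressure_psi / (PSI_PER_PPG_FT * tvd_ft)


if __name__ == "__main__":
    # Values from the example prompt: 10,000 psi at 15,000 ft TVD.
    mud_weight = balance_mud_weight_ppg(10_000, 15_000)
    print(f"{mud_weight:.2f} ppg")  # prints "12.82 ppg"
```

A check like this is one way to apply external computational tools alongside the model for numerical tasks.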
|
--- |
|
|
|
## Model Performance and Evaluation |
|
|
|
The model was benchmarked on several evaluation metrics relevant to oil and gas applications: |
|
- Domain-Specific Accuracy: 88% accuracy in answering technical questions. |
|
- Calculation Precision: Improved calculation accuracy by 90% over baseline. |
|
- Benchmark Scores: |
|
  - Arena Hard: 86.5% |
|
  - AlpacaEval 2.0 LC: 60% |
|
  - GPT-4-Turbo MT-Bench: Score of 9.1 |
|
|
|
--- |
|
## Training and Fine-Tuning |
|
|
|
- Training Hardware: NVIDIA DGX systems with A100 GPUs (80 GB VRAM per GPU). |
|
- Training Parameters: Batch size of 8 per GPU, learning rate of 1e-4 with a cosine decay, 3 epochs. |
|
- Optimization Algorithm: AdamW with weight decay. |
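The learning-rate schedule described above can be sketched as a plain cosine decay from the peak rate of 1e-4. This is an illustration under stated assumptions rather than the training code: it assumes no warmup and a decay to zero, and the total step count in the usage lines is a made-up value, since the model card gives none of these details.

```python
import math


def cosine_lr(step: int, total_steps: int, peak_lr: float = 1e-4, min_lr: float = 0.0) -> float:
    """Cosine-decayed learning rate at a given optimizer step.

    Starts at peak_lr at step 0 and reaches min_lr at total_steps.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp so steps outside [0, total_steps] stay on the schedule's endpoints.
    progress = min(max(step, 0), total_steps) / total_steps
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    total = 3_000  # hypothetical number of optimizer steps across the 3 epochs
    for step in (0, total // 2, total):
        print(step, f"{cosine_lr(step, total):.2e}")
```

In a PyTorch training loop, the equivalent built-in would typically be `torch.optim.lr_scheduler.CosineAnnealingLR` paired with `torch.optim.AdamW`.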
|
|
|
--- |
|
## Intended Use and Limitations |
|
|
|
### Intended Use |
|
|
|
OGAI 3.1 Engineer is intended for professionals in the oil and gas industry, particularly those focused on drilling operations, safety compliance, and technical calculations. Its specialized training enables it to handle domain-specific terminology, calculations, and documentation with a high degree of accuracy. |
|
|
|
### Limitations |
|
|
|
- Numerical Computation: While enhanced for complex calculations, the model may require external computational tools for highly intricate numerical tasks. |
|
- Generalization: The model may not perform optimally on general knowledge topics outside its fine-tuned oil and gas domain. |
|
|
|
--- |
|
|
|
## License |
|
|
|
This model is released under the Apache License 2.0. Please see the LICENSE file for more details. |
|
|
|
--- |
|
|
|
## Acknowledgments |
|
|
|
Special thanks to NVIDIA AI Research for the development of the base model and to the Gain.Energy team for domain expertise and support in model fine-tuning and evaluation. |
|
|
|
--- |
|
### Contact Information |
|
|
|
For support, inquiries, or collaboration opportunities, please contact: |
|
|
|
- Tommy Xaypanya |
|
Lead AI Scientist and Developer at Gain.Energy |
|
Email: [email protected] |
|
|
|
- Dr. Vlad Karén Payrazyan |
|
CEO and Founder at Gain.Energy |
|
Email: [email protected] |
|
|
|
--- |
|
model-index: |
|
- name: OGAI 3.1 Engineer |
|
results: |
|
- task: |
|
type: text-generation |
|
dataset: |
|
name: oil_gas_docs |
|
type: GainEnergy-OilGasDocs |
|
metrics: |
|
- name: Domain-Specific Accuracy |
|
type: accuracy |
|
value: 88.0 |
|
source: |
|
name: Gain Energy Internal Evaluation |
|
url: https://gain.energy/evaluations/ogai-3-1-engineer |
|
|
|
- task: |
|
type: text-generation |
|
dataset: |
|
name: technical_calculations |
|
type: TechnicalCalculations-OilGas |
|
metrics: |
|
- name: Calculation Precision |
|
type: precision |
|
value: 90.0 |
|
source: |
|
name: Gain Energy Internal Evaluation |
|
url: https://gain.energy/evaluations/ogai-3-1-engineer |
|
|
|
- task: |
|
type: text-generation |
|
dataset: |
|
name: arena_hard |
|
type: arena_hard |
|
metrics: |
|
- name: Arena Hard |
|
type: helpfulness and alignment |
|
value: 86.5 |
|
source: |
|
name: Gain Energy Internal Evaluation |
|
url: https://gain.energy/evaluations/ogai-3-1-engineer |
|
|
|
- task: |
|
type: text-generation |
|
dataset: |
|
name: alpaca_eval_2_lc |
|
type: AlpacaEval 2.0 Length Controlled |
|
metrics: |
|
- name: AlpacaEval 2.0 Length Controlled (LC) |
|
type: length-controlled |
|
value: 60.0 |
|
source: |
|
name: Gain Energy Internal Evaluation |
|
url: https://gain.energy/evaluations/ogai-3-1-engineer |
|
|
|
- task: |
|
type: text-generation |
|
dataset: |
|
name: gpt_4_turbo_mt_bench |
|
type: gpt_4_turbo_mt_bench |
|
metrics: |
|
- name: GPT-4-Turbo MT-Bench |
|
type: reasoning and problem-solving |
|
value: 9.1 |
|
source: |
|
name: Gain Energy Internal Evaluation |
|
url: https://gain.energy/evaluations/ogai-3-1-engineer |