---

tags:
- BharatGPT
- CoRover
language:
- hi
- pa
- gu
- kn
- mr
- te
- ml
- or
- ta
- ur
- bn
- en
license: other

---

[![QuantFactory Banner](https://lh7-rt.googleusercontent.com/docsz/AD_4nXeiuCm7c8lEwEJuRey9kiVZsRn2W-b4pWlu3-X534V3YmVuVc2ZL-NXg2RkzSOOS2JXGHutDuyyNAUtdJI65jGTo8jT9Y99tMi4H4MqL44Uc5QKG77B0d6-JfIkZHFaUA71-RtjyYZWVIhqsNZcx8-OMaA?key=xt3VSDoCbmTY7o-cwwOFwQ)](https://hf.co/QuantFactory)


# QuantFactory/BharatGPT-3B-Indic-GGUF
This is a quantized version of [CoRover/BharatGPT-3B-Indic](https://huggingface.co/CoRover/BharatGPT-3B-Indic) created with llama.cpp.
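
Because this repository ships GGUF quantizations, the model can also be run without `transformers`, for example through the `llama-cpp-python` bindings for llama.cpp. The snippet below is a minimal sketch; the quantization filename pattern is an assumption, so pick an actual `.gguf` file from this repository's file list.

```python
# Minimal sketch: running a GGUF quant with llama-cpp-python
# (pip install llama-cpp-python huggingface_hub)
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="QuantFactory/BharatGPT-3B-Indic-GGUF",
    filename="*Q4_K_M.gguf",  # assumed quant variant; adjust to a file present in this repo
    n_ctx=4096,               # context window
    n_gpu_layers=-1,          # offload all layers to GPU if a GPU build is installed
)

out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant who responds in Hindi"},
        {"role": "user", "content": "भारत की राजधानी क्या है"},
    ],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```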

# Original Model Card



### Model Description

This model is fine-tuned to generate multilingual output across multiple Indic languages. It was trained on a diverse, curated dataset comprising Hindi, Punjabi, Marathi, Malayalam, Oriya, Kannada, Gujarati, Bengali, Urdu, Tamil, and Telugu, and is optimized for natural language tasks such as translation, summarization, and conversational generation in these languages. The model was trained on authentic Indian conversational data in 12 languages. However, it is not designed for direct use as a standalone chatbot, since it lacks the latest data updates; it is best suited for S-RAG (Secure Retrieval-Augmented Generation) or for fine-tuning with your own data (see the grounding sketch after the usage examples below). For enhanced performance, integration with the **[Conversational Gen AI platform](https://builder.corover.ai)** is recommended (though not mandatory). The platform enables the creation of multi-modal and multilingual AI Agents, Co-Pilots, and Virtual Assistants (such as ChatBots, VoiceBots, and VideoBots) using a sovereign AI and composite AI approach, leveraging classic NLP, grounded generative AI, and generally available LLMs to deliver powerful, versatile solutions.

- **Developed by:** CoRover.ai
- **Model type:** Fine-tuned language model for multilingual text generation and text completion
- **Language(s) (NLP):** Hindi, Punjabi, Marathi, Malayalam, Oriya, Kannada, Gujarati, Bengali, Urdu, Tamil, Telugu
- **Learn (Become C-CAP: CoRover Certified AI Professional):** [Get Certified in 1 Hour](https://www.udemy.com/course/corover-certified-ai-associate/?referralCode=0EFDC465CE65DF66C021)

## How to Get Started with the Model

Make sure your `transformers` and `bitsandbytes` installations are up to date: `pip install -U transformers bitsandbytes`

Use the code below to get started with the model.

## English
```python
import torch
from transformers import pipeline

model_id = "CoRover/BharatGPT-3B-Indic"

# Load the model in bfloat16 and let accelerate place it on the available device(s)
pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Chat-style input: a system instruction plus a user turn
messages = [
    {"role": "system", "content": "You are a helpful assistant who responds in English"},
    {"role": "user", "content": "who created you?"},
]
outputs = pipe(
    messages,
    max_new_tokens=256,
)

# generated_text holds the full conversation; the last element is the assistant's reply
print(outputs[0]["generated_text"][-1])
```

## Hindi
```python
import torch
from transformers import pipeline

model_id = "CoRover/BharatGPT-3B-Indic"
pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
messages = [
    {"role": "system", "content": "You are a helpful assistant who responds in Hindi"},
    {"role": "user", "content": "भारत की राजधानी क्या है"},
]
outputs = pipe(
    messages,
    max_new_tokens=256,
)
print(outputs[0]["generated_text"][-1])

```

## Gujarati
```python
import torch
from transformers import pipeline

model_id = "CoRover/BharatGPT-3B-Indic"
pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
messages = [
    {"role": "system", "content": "You are a helpful assistant who responds in Gujarati"},
    {"role": "user", "content": "શું છે ભારતની રાજધાની"},
]
outputs = pipe(
    messages,
    max_new_tokens=256,
)
print(outputs[0]["generated_text"][-1])

```

## Marathi
```python
import torch
from transformers import pipeline

model_id = "CoRover/BharatGPT-3B-Indic"
pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
messages = [
    {"role": "system", "content": "You are a helpful assistant who responds in Marathi"},
    {"role": "user", "content": "भारताची राजधानी कोणती?"},
]
outputs = pipe(
    messages,
    max_new_tokens=256,
)
print(outputs[0]["generated_text"][-1])

```

## Malayalam
```python
import torch
from transformers import pipeline

model_id = "CoRover/BharatGPT-3B-Indic"
pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
messages = [
    {"role": "system", "content": "You are a helpful assistant who responds in Malayalam"},
    {"role": "user", "content": "ഭരത് കി രാജധാനി ഉണ്ട്"},
]
outputs = pipe(
    messages,
    max_new_tokens=256,
)
print(outputs[0]["generated_text"][-1])

```
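
## Grounded generation (S-RAG-style)

Since the model card recommends S-RAG rather than standalone chat, the sketch below grounds the reply in a retrieved passage by placing it in the system prompt. It reuses the same pipeline as the examples above; the `retrieved_context` string is a hypothetical stand-in for whatever your retriever returns.

```python
import torch
from transformers import pipeline

model_id = "CoRover/BharatGPT-3B-Indic"
pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Hypothetical retrieved passage; in a real S-RAG setup this comes from your document store.
retrieved_context = (
    "BharatGPT-3B-Indic is developed by CoRover.ai and is trained on conversational "
    "data in 12 languages, including Hindi, Tamil, Bengali and English."
)

messages = [
    {
        "role": "system",
        "content": (
            "You are a helpful assistant. Answer only from the provided context; "
            "if the answer is not in the context, say you do not know.\n\n"
            "Context: " + retrieved_context
        ),
    },
    {"role": "user", "content": "Which languages does BharatGPT support?"},
]
outputs = pipe(messages, max_new_tokens=256)
print(outputs[0]["generated_text"][-1])
```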

## Training Details

### Training Data

- **Language Spectrum**: A balanced representation of Hindi, Punjabi, Marathi, Malayalam, Oriya, Kannada, Gujarati, Bengali, Urdu, Tamil, and Telugu, capturing the unique syntactic structures of each language.

## Usage and Limitations

- **License:** Non-Commercial. For academic and research purposes only. For commercial use, please visit [Conversational Gen AI platform](https://builder.corover.ai) or [Contact Us](https://corover.ai/contact/).

- **Terms of Use:**  [Terms and Conditions](https://corover.ai/terms-conditions/)

- **Responsible AI Framework**: [CoRover's Responsible AI Framework](https://corover.ai/responsible-generative-ai-key-factors-for-ai-safety-and-trust/)

## Hardware & Software

To ensure top-tier performance and scalability, the model was fine-tuned using state-of-the-art hardware and software configurations:

  - The model was fine-tuned on NVIDIA A100 GPUs, whose tensor cores allow large-scale models to be trained with reduced training time and high numerical precision. High-bandwidth GPU interconnects enabled parallel processing over the large multilingual dataset.