Commit ec370d0 (verified) by sourabhd
1 Parent(s): c48a9d8

Add model card for initial version

  1. README.md +147 -0
README.md ADDED
@@ -0,0 +1,147 @@
+ ---
+ license: cc-by-nc-sa-4.0
+ language:
+ - en
+ library_name: transformers
+ pipeline_tag: text-generation
+ tags:
+ - finance
+ - legal
+ ---
+ # Model Card for RegLLM
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+ RegLLM is a large language model for regulatory compliance. It has been domain-adapted through unsupervised pretraining and then instruction-finetuned for regulatory compliance.
+ This release focuses on Indian banking rules and regulations.
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+ - **Developed by:** [dataeaze systems pvt ltd](https://www.dataeaze.io/)
+ - **Funded by:** [dataeaze systems pvt ltd](https://www.dataeaze.io/)
+ - **Shared by:** [dataeaze systems pvt ltd](https://www.dataeaze.io/)
+ - **Model type:** MistralForCausalLM
+ - **Language(s) (NLP):** English
+ - **License:** [cc-by-nc-sa-4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/deed.en). The model is made available for non-commercial research use only. For commercial use, please contact [email protected]
+ - **Finetuned from model:** [zephyr-7b-beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta)
+
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ The model is designed to provide precise answers to a wide range of queries related to Indian banking regulations.
+
+ ### Downstream Use
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ This model can be used as a core component of a RegTech application.
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ The model has been fine-tuned for the specific task of answering questions related to Indian regulatory compliance.
+ Accuracy is not guaranteed for any use beyond this.
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ - **Bias:** Trained for the English language only (as of now).
+ - **Risk:** Guardrails rely on those of the base models (Mistral/Zephyr). Finetuning could affect this behaviour.
+ - **Limitations:** Intended to be a small model optimised for Indian regulations (as of now).
+
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ * This model is intended to be used as an assistive AI technology. Verify its answers against the source documents before making decisions.
+ * This model should be used with grounding on a set of regulatory documents.
+
+ ## How to Get Started with the Model
+
+ ```
+ import torch
+ from transformers import pipeline
+
+ # Load the model as a text-generation pipeline in bfloat16,
+ # letting device_map="auto" place the weights on the available device(s).
+ pipe = pipeline(
+     "text-generation",
+     model="dataeaze/dataeaze-RegLLM-zephyr_7b_beta-dzcompli",
+     torch_dtype=torch.bfloat16,
+     device_map="auto",
+ )
+
+ # Chat-style input: a system instruction followed by the user's question.
+ messages = [
+     {
+         "role": "system",
+         "content": "You are a compliance assistant who answers in a formal manner",
+     },
+     {"role": "user", "content": "How often should IRRBB policies be reviewed?"},
+ ]
+
+ # Render the messages with the tokenizer's chat template, then generate.
+ prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+ outputs = pipe(prompt, max_new_tokens=120, do_sample=True, temperature=0.1, top_k=50, top_p=0.95)
+ print(outputs[0]["generated_text"])
+ ```
+
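To make the prompt format concrete, here is a minimal sketch of the string that `apply_chat_template` appears to produce, inferred from the sample output in this card. The function `render_zephyr_prompt` is a hypothetical helper written for illustration, not part of `transformers`. The real template ships with the tokenizer and should be preferred in practice.

```python
# Hand-written approximation of the Zephyr-style chat format seen in the
# sample output of this card. Illustration only: the authoritative template
# is the one stored with the tokenizer.

def render_zephyr_prompt(messages, add_generation_prompt=True, eos_token="</s>"):
    """Render chat messages as '<|role|>' + newline + content + eos blocks."""
    parts = []
    for message in messages:
        parts.append(f"<|{message['role']}|>\n{message['content']}{eos_token}\n")
    if add_generation_prompt:
        # Open the assistant turn so the model continues from here.
        parts.append("<|assistant|>\n")
    return "".join(parts)


messages = [
    {
        "role": "system",
        "content": "You are a compliance assistant who answers in a formal manner",
    },
    {"role": "user", "content": "How often should IRRBB policies be reviewed?"},
]
prompt = render_zephyr_prompt(messages)
print(prompt)
```

Because the prompt is plain text, the generated output contains it verbatim before the model's answer.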
+ ## Sample Output
+
+ ### Question
+ How often should IRRBB policies be reviewed?
+
+ ### RegLLM response
+
+ ```
+ <|system|>
+ You are a compliance assistant who answers in a formal manner</s>
+ <|user|>
+ How often should IRRBB policies be reviewed?</s>
+ <|assistant|>
+ At least annually.
+ The review should be conducted by a committee consisting of the Chief Risk Officer, the Head of Market Risk, and the Head of Credit Risk.
+ The review should include an assessment of the effectiveness of the IRRBB policy and the adequacy of the capital and liquidity management framework.
+ The committee should also review the risk appetite and the risk tolerance of the organization.
+ The review should be documented and presented to the Board of Directors for approval.
+ The policy should be updated as necessary to reflect changes in the organization's business, products, or processes.
+ ```
+
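The RegLLM response above includes the rendered prompt, since the pipeline returns the full text by default. A minimal sketch for keeping only the assistant's answer follows. The helper `extract_assistant_reply` is hypothetical and assumes the `<|assistant|>` marker shown in the sample.

```python
def extract_assistant_reply(generated_text, marker="<|assistant|>"):
    """Return the text after the last assistant marker, stripped of whitespace.

    If the marker is absent, the whole input is returned (stripped).
    """
    _, found, tail = generated_text.rpartition(marker)
    reply = tail if found else generated_text
    return reply.strip()


# Shortened copy of the sample output, used only to exercise the helper.
sample = (
    "<|system|>\n"
    "You are a compliance assistant who answers in a formal manner</s>\n"
    "<|user|>\n"
    "How often should IRRBB policies be reviewed?</s>\n"
    "<|assistant|>\n"
    "At least annually.\n"
)
print(extract_assistant_reply(sample))
```

Alternatively, the `transformers` text-generation pipeline accepts `return_full_text=False`, which should return only the newly generated text and make this post-processing unnecessary.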
+ ### GPT-4 response
+ ![gpt4 response](./gpt4_reponse.png)
+
+
+ ### Reference
+
+ To evaluate the truthfulness of this response (that is, to check for hallucination), refer to RBI notification
+ [RBI/2022-23/180 DOR.MRG.REC.102/00-00-009/2022-23](https://rbidocs.rbi.org.in/rdocs/notification/PDFs/NOTI180CF30A8446A704C11BD8267A8D0BB2AC2.PDF) (page 8).
+
+ A screenshot of the relevant passage is shown below.
+
+ <img src="rbi_reference.png" alt="drawing" width="500"/>
+
+
+ As you can see, RegLLM has identified the review frequency for IRRBB policies, while GPT-4 provides a more general response.
+ Note that this RegLLM response is not backed by any external knowledge.
+ When coupled with a retriever model, RegLLM can provide fairly precise responses to user queries related to regulatory compliance.
+
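The idea of coupling RegLLM with a retriever can be sketched as follows. This is only an illustration under stated assumptions: the word-overlap scorer stands in for a real retriever model, the function names are hypothetical, the extra system instruction is an assumption rather than the authors' prompt, and the passages are placeholders, not quotations from any RBI document.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, passages, top_k=1):
    """Toy retriever: rank passages by the number of words shared with the question."""
    question_words = tokenize(question)
    ranked = sorted(
        passages,
        key=lambda passage: len(question_words & tokenize(passage)),
        reverse=True,
    )
    return ranked[:top_k]


def build_grounded_messages(question, passages, top_k=1):
    """Build chat messages that place the retrieved passages ahead of the question."""
    context = "\n\n".join(retrieve(question, passages, top_k))
    return [
        {
            "role": "system",
            # Assumed instruction; the original card uses only the first sentence.
            "content": "You are a compliance assistant who answers in a formal manner. "
                       "Answer using only the provided context.",
        },
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ]


# Placeholder passages, NOT real regulatory text.
passages = [
    "Placeholder passage about how often IRRBB policies are reviewed.",
    "Placeholder passage about customer onboarding checks.",
]
question = "How often should IRRBB policies be reviewed?"
messages = build_grounded_messages(question, passages)
print(messages[1]["content"])
```

The resulting `messages` list has the same shape as the one in the getting-started snippet, so it could be passed through the chat template and pipeline in the same way.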
+ Keep watching this space for more updates on the model and evaluations.
+
+ ## Model Card Authors
+
+ * Atharva Inamdar
+ * Niranjan Kakade
+ * Tony Tom
+ * Nayan Chheda
+ * Sourabh Daptardar
+
+ ## Model Card Contact
+
+ "dataeaze systems" <[email protected]>