QagentS committed on
Commit
47ca69e
1 Parent(s): 830946f

Update README.md

Files changed (1)
  1. README.md +52 -0
README.md CHANGED
@@ -1,3 +1,55 @@
1
  ---
2
  license: mit
3
  ---
1
  ---
2
  license: mit
3
+ language:
4
+ - en
5
+ metrics:
6
+ - accuracy
7
+ pipeline_tag: text-generation
8
+ tags:
9
+ - code
10
+ - sql
11
+ - text2sql
12
+ - instruction_tuned
13
+ - basemodel
14
+ - jax
15
+ - pytorch
16
+ datasets:
17
+ - PipableAI/spider-bird
18
  ---
19
+ # Pipable’s pipSQL
20
+
21
+ Pipable’s pipSQL is a model distilled from Llama 1B to generate SQL queries given a prompt and a schema.
22
+ We used a unique pipeline in which the model alternates between two objectives:
23
+ 1. Maximizing the log probability of all tokens in the sequence (including the prompt tokens).
24
+ 2. Minimizing the difference between the true value and the predicted maximum value of the output tokens, i.e. the generated tokens for the SQL query slice of the entire sequence.
25
+
26
+
27
+
28
+
29
+
30
+ ## License
31
+
32
+ The model's new weights, along with all other assets associated with it, are open-sourced under the MIT license.
33
+
34
+ ## How to Use
35
+
36
+ ```python
37
+ text = """<schema>{schema}</schema>
38
+ <question>{question}</question>
39
+ <sql>"""
40
+ ```
41
+
42
+ ```python
43
+ from transformers import AutoModelForCausalLM, AutoTokenizer
44
+ device = "cuda"
45
+ model = AutoModelForCausalLM.from_pretrained("PipableAI/pipSQL").to(device)
46
+ tokenizer = AutoTokenizer.from_pretrained("PipableAI/pipSQL")
47
+
48
+ inputs = tokenizer(text, return_tensors="pt").to(device)
49
+ outputs = model.generate(**inputs, max_new_tokens=200)
50
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True).split('<sql>')[1].split('</sql>')[0])
51
+ ```
52
+
53
+ ## The PipableAI team
54
+
55
+ Avi Kothari, Pratham Gupta, Ritvik Aryan Kalra, Rohan Bhatial, Soham Acharya, Gyan Ranjan
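
Note on the "How to Use" section added in this commit: the prompt template contains `{schema}` and `{question}` placeholders, but the snippet does not show how they are filled, and the final `split` raises an `IndexError` if the decoded text has no `<sql>` tag. The following is a minimal sketch of one way to handle both steps, not the authors' implementation. The helper names `build_prompt` and `extract_sql`, and the example schema and question in the usage note, are hypothetical and do not appear in the README.

```python
# Template copied from the README's "How to Use" section.
PROMPT_TEMPLATE = """<schema>{schema}</schema>
<question>{question}</question>
<sql>"""


def build_prompt(schema: str, question: str) -> str:
    """Fill the README's prompt template (hypothetical helper)."""
    return PROMPT_TEMPLATE.format(schema=schema, question=question)


def extract_sql(decoded: str) -> str:
    """Return the text between <sql> and </sql> in a decoded generation.

    Hypothetical helper. It mirrors the README's split-based extraction,
    but returns an empty string instead of raising when <sql> is absent,
    and tolerates a missing closing </sql> tag.
    """
    parts = decoded.split("<sql>", 1)
    if len(parts) < 2:
        return ""
    return parts[1].split("</sql>", 1)[0].strip()
```

Under these assumptions, `text = build_prompt("CREATE TABLE singer (id INT, name TEXT, age INT);", "How many singers are older than 30?")` produces the string passed to the tokenizer, and `extract_sql(tokenizer.decode(outputs[0], skip_special_tokens=True))` replaces the chained `split` calls in the README's final `print`.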