iterateai committed
Commit 9230022
1 Parent(s): b5b06d0

Update README.md

Files changed (1)
  1. README.md +56 -27
README.md CHANGED
@@ -34,60 +34,89 @@ The result is Interplay-AppCoder LLM, a brand new high performing code generation
- - **Demo [optional]:** [https://appcoder.interplay.iterate.ai/]

  ## Bias, Risks, and Limitations

  <!-- This section is meant to convey both technical and sociotechnical limitations. -->

- [More Information Needed]
-
- ### Recommendations
-
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

  ## How to Get Started with the Model

  Use the code below to get started with the model.

- [More Information Needed]
-
- ## Training Details
-
- ### Training Data
-
- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-
- [More Information Needed]
-
- ### Training Procedure
-
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-
- ## Evaluation
-
- <!-- This section describes the evaluation protocols and provides the results. -->
-
- ### Testing Data, Factors & Metrics
-
- #### Testing Data
-
- <!-- This should link to a Dataset Card if possible. -->
-
- [More Information Needed]
-
- #### Metrics
-
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
-
- [More Information Needed]
 
+ - **Demo:** [https://appcoder.interplay.iterate.ai/]

  ## Bias, Risks, and Limitations

  <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+ The model is optimized for code generation and cannot be used as a chat model.
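+ In practice this means a request should be wrapped in the instruction template shown in the example below, not sent as a chat message list. A minimal sketch (this `build_prompt` helper is illustrative, not part of the repository):
+
+ # Illustrative helper: wrap a user request in the instruction template the model expects.
+ def build_prompt(instruction: str) -> str:
+     return (
+         "Below is an instruction that describes a task, paired with an input that "
+         "provides further context. Write a response that appropriately completes "
+         f"the request. ### Instruction: {instruction} ### Response:"
+     )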

  ## How to Get Started with the Model

  Use the code below to get started with the model.
+ # Import the model from the Hugging Face repository
+ import torch
+ from transformers import (
+     AutoModelForCausalLM,
+     AutoTokenizer,
+     BitsAndBytesConfig,
+     HfArgumentParser,
+     pipeline,
+     logging
+ )
+ model_repo_id = "iterateai/Interplay-AppCoder"
+
+ # Load the model in FP16
+ iterate_model = AutoModelForCausalLM.from_pretrained(
+     model_repo_id,
+     low_cpu_mem_usage=True,
+     return_dict=True,
+     torch_dtype=torch.float16,
+     device_map={"": 0},
+     trust_remote_code=True
+ )
+ # Note: you can quantize the model with a BitsAndBytesConfig so it loads on a T4 GPU.
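+ # A minimal sketch of that 4-bit option (illustrative, not from the original README;
+ # uses the standard bitsandbytes integration in transformers; USE_4BIT is an
+ # illustrative flag, not part of the repository):
+ USE_4BIT = False  # set True to load the quantized model on a T4 GPU
+ if USE_4BIT:
+     bnb_config = BitsAndBytesConfig(
+         load_in_4bit=True,
+         bnb_4bit_quant_type="nf4",
+         bnb_4bit_compute_dtype=torch.float16,
+     )
+     iterate_model = AutoModelForCausalLM.from_pretrained(
+         model_repo_id,
+         quantization_config=bnb_config,
+         device_map={"": 0},
+         trust_remote_code=True,
+     )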
+
+ # Load the tokenizer
+ tokenizer = AutoTokenizer.from_pretrained(model_repo_id, trust_remote_code=True)
+ tokenizer.pad_token = tokenizer.eos_token
+ tokenizer.padding_side = "right"
+
+ # Inference
+ logging.set_verbosity(logging.CRITICAL)
+
+ # Sample prompt
+ prompt = "Can you provide a python script that uses the YOLOv8 model from the Ultralytics library to detect people in an image, draw green bounding boxes around them, and then save the image?"
+
+ pipe = pipeline(task="text-generation", model=iterate_model, tokenizer=tokenizer, max_length=1024)
+ result = pipe(f"Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request. ### Instruction: {prompt} ### Response:", temperature=0.1, do_sample=True)
+ print(result[0]['generated_text'])
+
+ ## Sample demo notebook
+ [https://colab.research.google.com/drive/1USuNLFxLex-C5tLHYET_nQfpM4ALCbc5?usp=sharing#scrollTo=lNCZTBj1nBsJ]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ Dataset used for evaluation: [https://drive.google.com/file/d/1R6DDyBhcR6TSUYFTgUosJxrvibkR1BHC/view]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+ Our code generation LLM was created and fine-tuned with a new and unique knowledge base. As such, we utilized the newly published ICE score benchmark methodology for evaluating the code generated by the Interplay-AppCoder LLM.
+
+ The ICE methodology provides metrics for Usefulness and Functional Correctness as a baseline for scoring code generation.
+
+ * Usefulness: whether the code the model outputs is clear, presented in logical order, and human-readable, and whether it covers all functionalities of the problem statement when compared with the reference code.
+ * Functional Correctness: an LLM with strong reasoning capabilities is used to conduct unit tests, taking into account the given question and the reference code.
+
+ We utilized GPT-4 to measure the above metrics and provide a score from 0-4. This is the test dataset [https://drive.google.com/file/d/1R6DDyBhcR6TSUYFTgUosJxrvibkR1BHC/view] and Jupyter notebook [https://colab.research.google.com/drive/1USuNLFxLex-C5tLHYET_nQfpM4ALCbc5?usp=sharing#scrollTo=lNCZTBj1nBsJ] we used to perform the benchmark.
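+ As an illustration of this kind of LLM-based scoring, here is a minimal sketch (not the notebook's exact code; the rubric wording and the `openai` v1 client usage are assumptions):
+
+ # Minimal ICE-style scoring sketch: ask GPT-4 to rate generated code on a 0-4 scale.
+ # Assumes `pip install openai` (v1+) and OPENAI_API_KEY set in the environment.
+ from openai import OpenAI
+
+ client = OpenAI()
+
+ def ice_score(question: str, reference_code: str, generated_code: str, metric: str = "usefulness") -> int:
+     rubric = (
+         f"Rate the candidate code for {metric} on a 0-4 scale, considering the "
+         "problem statement and the reference solution. Reply with a single integer."
+     )
+     response = client.chat.completions.create(
+         model="gpt-4",
+         messages=[{
+             "role": "user",
+             "content": f"{rubric}\n\nProblem: {question}\n\n"
+                        f"Reference code:\n{reference_code}\n\nCandidate code:\n{generated_code}",
+         }],
+         temperature=0,
+     )
+     return int(response.choices[0].message.content.strip())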
+
+ You can read more about the ICE methodology in this paper:
+ [https://openreview.net/pdf?id=RoGZaCsGUW]