Update README.md
README.md
CHANGED
@@ -51,15 +51,16 @@ without the need for a lot of complex instruction verbiage - provide a text pass
 
 ### Benchmark Tests
 
-Evaluated against the benchmark test: [RAG-Instruct-Benchmark-Tester]
+Evaluated against the benchmark test: [RAG-Instruct-Benchmark-Tester](https://www.huggingface.co/llmware/rag_instruct_benchmark_tester)
 Average of 2 Test Runs with 1 point for correct answer, 0.5 point for partial correct or blank / NF, 0.0 points for incorrect, and -1 points for hallucinations.
 
-
+--**Accuracy Score**: **73.25** correct out of 100
 --Not Found Classification: 17.5%
 --Boolean: 29%
 --Math/Logic: 0%
 --Complex Questions (1-5): 1 (Low)
 --Summarization Quality (1-5): 1 (Coherent, extractive)
+--Hallucinations: No hallucinations observed.
 
 For test run results, please see the files ("core_rag_test" and "answer_sheet" in the repo).
 
@@ -70,7 +71,10 @@ For test run results, please see the files ("core_rag_test" and "answer_sheet" i
 
 Any model can provide inaccurate or incomplete information, and should be used in conjunction with appropriate safeguards and fact-checking mechanisms.
 
-This model can be used effective for quick testing and will be generally accurate in relatively simple extractive Q&A and basic summarization.
+This model can be used effective for quick "on laptop" testing and will be generally accurate in relatively simple extractive Q&A and basic summarization.
+For higher performing models, please see the larger models in the BLING series, starting at 1.3B-1.4B up to 3B.
+
+Note: this was the smallest model that we were able to train to consistently recognize Q&A and RAG instructions.
 
 
 ## How to Get Started with the Model
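The scoring rule quoted under "Benchmark Tests" (1 point for a correct answer, 0.5 for a partially correct or blank / NF answer, 0 for an incorrect answer, -1 for a hallucination, averaged over 2 test runs) can be sketched as below. This is only an illustration of the arithmetic, not the benchmark's actual scoring script: the label names, the helper functions `run_score` and `benchmark_score`, and the four-question sample data are all assumptions made up for the example.

```python
from statistics import mean

# Point values taken from the scoring rule quoted in the README.
# The label names themselves are hypothetical.
POINTS = {
    "correct": 1.0,
    "partial": 0.5,         # partially correct, or blank / "Not Found"
    "incorrect": 0.0,
    "hallucination": -1.0,
}


def run_score(labels):
    """Sum the points for one test run, given one label per question."""
    return sum(POINTS[label] for label in labels)


def benchmark_score(runs):
    """Average the per-run totals across all test runs."""
    return mean(run_score(labels) for labels in runs)


# Made-up labels for a 4-question benchmark, scored over two runs.
run_a = ["correct", "partial", "incorrect", "correct"]        # 2.5 points
run_b = ["correct", "correct", "hallucination", "partial"]    # 1.5 points
example = benchmark_score([run_a, run_b])                     # 2.0
```

Because each answer is worth a multiple of 0.5 and the total is averaged over two runs, the final score can land on a quarter point, which is consistent with a reported value such as 73.25.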