doberst committed on
Commit e028a91 · 1 Parent(s): 492f901

Update README.md

Files changed (1)
  1. README.md +7 -3
README.md CHANGED
@@ -51,15 +51,16 @@ without the need for a lot of complex instruction verbiage - provide a text pass

  ### Benchmark Tests

- Evaluated against the benchmark test: [RAG-Instruct-Benchmark-Tester][https://www.huggingface.co/llmware/rag_instruct_benchmark_tester]
+ Evaluated against the benchmark test: [RAG-Instruct-Benchmark-Tester](https://www.huggingface.co/llmware/rag_instruct_benchmark_tester)
  Average of 2 Test Runs with 1 point for correct answer, 0.5 point for partial correct or blank / NF, 0.0 points for incorrect, and -1 points for hallucinations.

- --Score: 73.25 correct out of 100
+ --**Accuracy Score**: **73.25** correct out of 100
  --Not Found Classification: 17.5%
  --Boolean: 29%
  --Math/Logic: 0%
  --Complex Questions (1-5): 1 (Low)
  --Summarization Quality (1-5): 1 (Coherent, extractive)
+ --Hallucinations: No hallucinations observed.

  For test run results, please see the files ("core_rag_test" and "answer_sheet" in the repo).

@@ -70,7 +71,10 @@ For test run results, please see the files ("core_rag_test" and "answer_sheet" i

  Any model can provide inaccurate or incomplete information, and should be used in conjunction with appropriate safeguards and fact-checking mechanisms.

- This model can be used effective for quick testing and will be generally accurate in relatively simple extractive Q&A and basic summarization.
+ This model can be used effective for quick "on laptop" testing and will be generally accurate in relatively simple extractive Q&A and basic summarization.
+ For higher performing models, please see the larger models in the BLING series, starting at 1.3B-1.4B up to 3B.
+
+ Note: this was the smallest model that we were able to train to consistently recognize Q&A and RAG instructions.


  ## How to Get Started with the Model
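The README describes its benchmark scoring rule as follows:

- 1 point for a correct answer.
- 0.5 for a partial or blank / NF answer.
- 0.0 for an incorrect answer.
- -1 for a hallucination.
- The result is averaged over 2 test runs.

Below is a minimal Python sketch of that arithmetic. It is not the benchmark's actual grading code. The outcome labels (`"correct"`, `"partial"`, `"blank"`, `"incorrect"`, `"hallucination"`) and the helpers `run_score` and `benchmark_score` are hypothetical names chosen for illustration. The toy runs are invented and are not benchmark data.

```python
from statistics import mean

# Points per outcome, following the scoring rule stated in the README.
# The label strings are hypothetical; the real answer sheet may use other labels.
POINTS = {
    "correct": 1.0,
    "partial": 0.5,
    "blank": 0.5,          # blank / "Not Found" (NF) answers
    "incorrect": 0.0,
    "hallucination": -1.0,
}


def run_score(outcomes: list[str]) -> float:
    """Total points for one test run (one outcome label per question)."""
    return sum(POINTS[o] for o in outcomes)


def benchmark_score(runs: list[list[str]]) -> float:
    """Average the per-run totals across all test runs."""
    return mean(run_score(r) for r in runs)


if __name__ == "__main__":
    # Two toy runs over a 4-question set (illustrative only, not real results).
    run_a = ["correct", "partial", "incorrect", "correct"]    # 2.5 points
    run_b = ["correct", "blank", "hallucination", "correct"]  # 1.5 points
    print(benchmark_score([run_a, run_b]))                    # prints 2.0
```

Each run's total is a multiple of 0.5, so the average of two runs always lands on a multiple of 0.25. That is consistent with the reported score of 73.25.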