loubnabnl HF staff committed on
Commit 835e9c6 · 1 Parent(s): 337d672

Update evaluation/demo_humaneval.md

  1. evaluation/demo_humaneval.md +1 -15
evaluation/demo_humaneval.md CHANGED
@@ -52,18 +52,4 @@ Results: {'pass@1': 0.1, 'pass@10': 0.7631, 'pass@20': 1.0}
  ````

  If we take a closer look at the unit test results for each candidate solution, we find that 2 passed the unit test. This means that we have 2 correct solutions among 20, which corresponds to our pass@1 value `2/20 = 0.1`. The scores pass@10 and pass@20 are higher, because the more samples we select from the candidate completions, the more likely we are to include the correct implementation. As
- for pass@20, it is `1`, since if we select all 20 candidates the problem gets solved which gives 100% success rate. If you are curious about the candidate solutions that passed the tests, they both implemented this function:
-
- ```python
-
- def truncate_number(number: float) -> float:
-     """ Given a positive floating point number, it can be decomposed into
-     and integer part (largest integer smaller than given number) and decimals
-     (leftover part always smaller than 1).
-
-     Return the decimal part of the number.
-     >>> truncate_number(3.5)
-     0.5
-     """
-     return number % 1
- ```
+ for pass@20, it is `1`, since if we select all 20 candidates the problem gets solved which gives 100% success rate.
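
Note on the numbers in the hunk header: the three scores are consistent with the commonly used unbiased pass@k estimator, `1 - C(n-c, k) / C(n, k)`, with `n = 20` candidates and `c = 2` correct ones. The diff itself does not state which estimator produced them, so the sketch below is an assumption, and the function name `pass_at_k` is hypothetical rather than taken from the repository.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n sampled candidates, c of which pass the tests.

    Assumed formula: 1 - C(n - c, k) / C(n, k), i.e. one minus the
    probability that a random subset of k candidates contains no
    correct solution.
    """
    if n - c < k:
        # Fewer than k incorrect candidates exist, so every subset of
        # size k must contain at least one correct solution.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Values from the diff: 20 candidates, 2 of them correct.
for k in (1, 10, 20):
    print(f"pass@{k} = {pass_at_k(20, 2, k):.4f}")
```

For `k = 10` the estimator gives `1 - 9/38 = 29/38 ≈ 0.76316`, so the `0.7631` shown in the hunk header looks truncated rather than rounded. For `k = 1` it reduces to `c / n = 2/20`, matching the prose in the diff.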