Spaces: guydav/restrictedpython_code_eval


guydav committed on Jun 13, 2023

Commit 7293ac9 (1 parent: 032ea0d)

Added some additional parameters to control `RestrictedPython` behavior.

Files changed (2)

README.md +10 -2
restrictedpython_code_eval.py +207 -7

README.md CHANGED

@@ -46,7 +46,7 @@ The following arguments are inherited from the basic `code_eval`:
 **`timeout`** (`float`): The maximum time taken to produce a prediction before it is considered a "timeout". The default value is `3.0` (i.e. 3 seconds).
-In addition, this metric supports three additional arguments, specifying which default imports should be made available:
 **`use_safe_builtins`** (`bool`): Whether or not to allow the usage of [`RestrictedPython.safe_builtins`](https://github.com/zopefoundation/RestrictedPython/blob/c31c133844ac2308f5cc930e934a7227a2a6a77b/src/RestrictedPython/Guards.py#L23), defaults to True
@@ -54,7 +54,15 @@ In addition, this metric supports three additional arguments, specifying which d
 **`use_utility_builtins`** (`bool`): Whether or not to allow the usage of [`RestrictedPython.utility_builtins`](https://github.com/zopefoundation/RestrictedPython/blob/c31c133844ac2308f5cc930e934a7227a2a6a77b/src/RestrictedPython/Utilities.py#L19), which includes the `string`, `math`, `random`, and `set` packages, among others. Defaults to True.
-As the additional arguments are optional, this could be used as a drop-in replacement for `code_eval`.
 ### Output Values

 **`timeout`** (`float`): The maximum time taken to produce a prediction before it is considered a "timeout". The default value is `3.0` (i.e. 3 seconds).
+In addition, this metric supports several additional arguments, specifying which imports should be made available and controlling other aspects of `RestrictedPython` behavior:
 **`use_safe_builtins`** (`bool`): Whether or not to allow the usage of [`RestrictedPython.safe_builtins`](https://github.com/zopefoundation/RestrictedPython/blob/c31c133844ac2308f5cc930e934a7227a2a6a77b/src/RestrictedPython/Guards.py#L23), defaults to True
 **`use_utility_builtins`** (`bool`): Whether or not to allow the usage of [`RestrictedPython.utility_builtins`](https://github.com/zopefoundation/RestrictedPython/blob/c31c133844ac2308f5cc930e934a7227a2a6a77b/src/RestrictedPython/Utilities.py#L19), which includes the `string`, `math`, `random`, and `set` packages, among others. Defaults to True.
+**`additional_globals`** (`Dict[str, Any] | None`): Any additional `globals` to make available to the code. Defaults to None.
+**`additional_locals`** (`Dict[str, Any] | None`): Any additional `locals` to make available to the code. Defaults to None.
+**`allowed_imports`** (`List[str] | None`): A list of top-level module names that the evaluated code is allowed to import. Defaults to None.
+As the new arguments are optional, this metric can be used as a drop-in replacement for `code_eval`.
+Additionally, this metric sets several different `globals` if they are not provided as additional globals. The full list of globals set is: `__metaclass__, __name__, _getiter_, _iter_unpack_sequence_, _getitem_, getattr, _write_, _inplacevar_`. See the code for additional details.
 ### Output Values
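
To illustrate the idea behind `allowed_imports`, here is a minimal sketch of an allow-list `__import__` hook. It is an illustration under stated assumptions, not the metric's implementation: it uses plain `exec` instead of `RestrictedPython`, the helper `run_with_imports` is hypothetical, and this sketch chooses to raise `ImportError` for any module whose top-level package is not on the list.

```python
import importlib
from typing import List


class AllowListImporter:
    """Illustrative import hook: permits only allow-listed top-level packages."""

    def __init__(self, allowed_imports: List[str]):
        self.allowed_imports = set(allowed_imports)

    def __call__(self, name, globals=None, locals=None, fromlist=(), level=0):
        # Relative imports are signalled by level > 0, not by a leading dot.
        if level > 0:
            raise ImportError("Relative imports are not allowed.")
        # Only the top-level package name is compared against the allow-list.
        top_level = name.split(".", 1)[0]
        if top_level not in self.allowed_imports:
            raise ImportError(f"Importing '{name}' is not allowed.")
        return importlib.__import__(name, globals, locals, fromlist, level)


def run_with_imports(source: str, allowed_imports: List[str]) -> dict:
    """Hypothetical helper: run `source` with plain exec (NOT RestrictedPython)."""
    # The import statement looks up __import__ in the frame's builtins,
    # so installing the hook there is enough to intercept every import.
    exec_globals = {"__builtins__": {"__import__": AllowListImporter(allowed_imports)}}
    exec(source, exec_globals)
    return exec_globals


# An allow-listed module can be imported and used.
allowed = run_with_imports("import math\nresult = math.sqrt(16)", ["math"])

# A module outside the allow-list is rejected.
try:
    run_with_imports("import os", ["math"])
    blocked = False
except ImportError:
    blocked = True
```

Matching on the top-level package means that allowing `os` would also allow `os.path`; a stricter policy would need to compare the full dotted name.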

restrictedpython_code_eval.py CHANGED

@@ -19,12 +19,15 @@ Lightly adapted and mostly copied verbatim from the implementation in `evaluate`
 import contextlib
 import faulthandler
 import itertools
 import io
 import multiprocessing
 import os
 import platform
 import signal
 import tempfile
 from collections import Counter, defaultdict
 from concurrent.futures import ThreadPoolExecutor, as_completed
@@ -34,6 +37,8 @@ import evaluate
 import datasets
 import numpy as np
 from RestrictedPython import compile_restricted, safe_builtins, limited_builtins, utility_builtins
 # TODO: Add BibTeX citation
@@ -65,6 +70,10 @@ Args:
     use_safe_builtins: a bool indicating whether to use the `RestrictedPython.safe_builtins`
     use_limited_builtins: a bool indicating whether to use the `RestrictedPython.limited_builtins`
     use_utility_builtins: a bool indicating whether to use the `RestrictedPython.utility_builtins`
 Returns:
     pass_at_k: dict with pass rates for each k
     results: dict with granular results of each unittest
@@ -148,7 +157,9 @@ class RestrictedPythonCodeEval(evaluate.Metric):
         )
     def _compute(self, predictions, references, k=[1, 10, 100], num_workers=4, timeout=3.0,
-                 use_safe_builtins: bool = True, use_limited_builtins: bool = True, use_utility_builtins: bool = True):
         """Returns the scores"""
         if os.getenv("HF_ALLOW_CODE_EVAL", 0) != "1":
@@ -166,7 +177,11 @@ class RestrictedPythonCodeEval(evaluate.Metric):
             for task_id, (candidates, test_case) in enumerate(zip(predictions, references)):
                 for candidate in candidates:
                     test_program = candidate + "\n" + test_case
-                    args = (test_program, timeout, task_id, completion_id[task_id], use_safe_builtins, use_limited_builtins, use_utility_builtins)
                     future = executor.submit(_check_correctness, *args)
                     futures.append(future)
                     completion_id[task_id] += 1
@@ -211,7 +226,9 @@ def estimate_pass_at_k(num_samples, num_correct, k):
 def _check_correctness(check_program, timeout, task_id, completion_id,
-                       use_safe_builtins: bool = True, use_limited_builtins: bool = True, use_utility_builtins: bool = True):
     """
     Evaluates the functional correctness of a completion by running the test
     suite provided in the problem.
@@ -222,7 +239,12 @@ def _check_correctness(check_program, timeout, task_id, completion_id,
     manager = multiprocessing.Manager()
     result = manager.list()
-    p = multiprocessing.Process(target=_unsafe_execute, args=(check_program, result, timeout, use_safe_builtins, use_limited_builtins, use_utility_builtins))
     p.start()
     p.join(timeout=timeout + 1)
     if p.is_alive():
@@ -238,8 +260,36 @@ def _check_correctness(check_program, timeout, task_id, completion_id,
         completion_id=completion_id,
     )
 def _unsafe_execute(check_program, result, timeout,
-                    use_safe_builtins: bool = True, use_limited_builtins: bool = True, use_utility_builtins: bool = True):
     with create_tempdir():
@@ -265,10 +315,42 @@ def _unsafe_execute(check_program, result, timeout,
                 builtins.update(utility_builtins)
             exec_globals = {'__builtins__': builtins}
             with swallow_io():
                 with time_limit(timeout):
                     byte_code = compile_restricted(check_program, filename="<model output>", mode="exec")
-                    exec(byte_code, exec_globals, None)
             result.append("passed")
         except TimeoutException:
             result.append("timed out")
@@ -428,4 +510,122 @@ def reliability_guard(maximum_memory_bytes=None):
     sys.modules["joblib"] = None  # type: ignore
     sys.modules["resource"] = None  # type: ignore
     sys.modules["psutil"] = None  # type: ignore
-    sys.modules["tkinter"] = None  # type: ignore

 import contextlib
 import faulthandler
 import itertools
+import importlib
 import io
 import multiprocessing
 import os
 import platform
 import signal
 import tempfile
+import types
+from typing import Optional, Dict, List, Any
 from collections import Counter, defaultdict
 from concurrent.futures import ThreadPoolExecutor, as_completed
 import datasets
 import numpy as np
 from RestrictedPython import compile_restricted, safe_builtins, limited_builtins, utility_builtins
+from RestrictedPython.Eval import default_guarded_getiter, default_guarded_getitem
+from RestrictedPython.Guards import guarded_iter_unpack_sequence, safer_getattr
 # TODO: Add BibTeX citation
     use_safe_builtins: a bool indicating whether to use the `RestrictedPython.safe_builtins`
     use_limited_builtins: a bool indicating whether to use the `RestrictedPython.limited_builtins`
     use_utility_builtins: a bool indicating whether to use the `RestrictedPython.utility_builtins`
+    additional_globals: an optional dict of additional globals to pass to the RestrictedPython interpreter
+    additional_locals: an optional dict of additional locals to pass to the RestrictedPython interpreter
+    allowed_imports: an optional list of strings naming the modules the tested code is allowed to import
 Returns:
     pass_at_k: dict with pass rates for each k
     results: dict with granular results of each unittest
         )
     def _compute(self, predictions, references, k=[1, 10, 100], num_workers=4, timeout=3.0,
+                 use_safe_builtins: bool = True, use_limited_builtins: bool = True, use_utility_builtins: bool = True,
+                 additional_globals: Optional[Dict[str, Any]] = None, additional_locals: Optional[Dict[str, Any]] = None,
+                 allowed_imports: Optional[List[str]] = None):
         """Returns the scores"""
         if os.getenv("HF_ALLOW_CODE_EVAL", 0) != "1":
             for task_id, (candidates, test_case) in enumerate(zip(predictions, references)):
                 for candidate in candidates:
                     test_program = candidate + "\n" + test_case
+                    args = (
+                        test_program, timeout, task_id, completion_id[task_id],
+                        use_safe_builtins, use_limited_builtins, use_utility_builtins,
+                        additional_globals, additional_locals, allowed_imports
+                    )
                     future = executor.submit(_check_correctness, *args)
                     futures.append(future)
                     completion_id[task_id] += 1
 def _check_correctness(check_program, timeout, task_id, completion_id,
+                       use_safe_builtins: bool = True, use_limited_builtins: bool = True, use_utility_builtins: bool = True,
+                       additional_globals: Optional[Dict[str, Any]] = None, additional_locals: Optional[Dict[str, Any]] = None,
+                       allowed_imports: Optional[List[str]] = None):
     """
     Evaluates the functional correctness of a completion by running the test
     suite provided in the problem.
     manager = multiprocessing.Manager()
     result = manager.list()
+    args = (
+        check_program, result, timeout,
+        use_safe_builtins, use_limited_builtins, use_utility_builtins,
+        additional_globals, additional_locals, allowed_imports
+    )
+    p = multiprocessing.Process(target=_unsafe_execute, args=args)
     p.start()
     p.join(timeout=timeout + 1)
     if p.is_alive():
         completion_id=completion_id,
     )
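+class AllowListImporter:
+    """Import hook that only permits modules whose top-level package is allow-listed."""
+    def __init__(self, allowed_imports: List[str]):
+        self.allowed_imports = allowed_imports
+    def __call__(self, name, globals=None, locals=None, fromlist=(), level=0):
+        # Relative imports arrive with level > 0; the name itself carries no leading dot.
+        if level > 0 or name.startswith('.'):
+            raise ImportError("Relative imports are not allowed.")
+        # Only the top-level package is checked against the allow-list.
+        package_name = name.split('.', 1)[0]
+        if package_name in self.allowed_imports:
+            return importlib.__import__(name, globals, locals, fromlist, level)
+        # Fail loudly: returning None would silently bind the imported name to None.
+        raise ImportError(f"Importing '{name}' is not allowed.")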
+def _default_write_(obj):
+    if isinstance(obj, types.ModuleType):
+        raise ValueError("Modules are not allowed to be written to.")
+    return obj
 def _unsafe_execute(check_program, result, timeout,
+                    use_safe_builtins: bool = True, use_limited_builtins: bool = True, use_utility_builtins: bool = True,
+                    additional_globals: Optional[Dict[str, Any]] = None, additional_locals: Optional[Dict[str, Any]] = None,
+                    allowed_imports: Optional[List[str]] = None):
     with create_tempdir():
                 builtins.update(utility_builtins)
             exec_globals = {'__builtins__': builtins}
+            exec_globals.update(additional_globals or {})
+            if allowed_imports is not None:
+                if '__import__' in exec_globals['__builtins__']:
+                    raise ValueError("Cannot specify allowed_imports when __import__ is in additional_globals.")
+                exec_globals['__builtins__']['__import__'] = AllowListImporter(allowed_imports)
+            if '__metaclass__' not in exec_globals:
+                exec_globals['__metaclass__'] = type  # type: ignore
+            if '__name__' not in exec_globals:
+                exec_globals['__name__'] = '__main__'  # type: ignore
+            if '_getiter_' not in exec_globals:
+                exec_globals['_getiter_'] = default_guarded_getiter  # type: ignore
+            if '_iter_unpack_sequence_' not in exec_globals:
+                exec_globals['_iter_unpack_sequence_'] = guarded_iter_unpack_sequence  # type: ignore
+            if '_getitem_' not in exec_globals:
+                exec_globals['_getitem_'] = default_guarded_getitem  # type: ignore
+            if 'getattr' not in exec_globals:
+                exec_globals['getattr'] = safer_getattr  # type: ignore
+            if '_write_' not in exec_globals:
+                exec_globals['_write_'] = _default_write_  # type: ignore
+            if '_inplacevar_' not in exec_globals:
+                exec_globals['_inplacevar_'] = protected_inplacevar  # type: ignore
             with swallow_io():
                 with time_limit(timeout):
                     byte_code = compile_restricted(check_program, filename="<model output>", mode="exec")
+                    exec(byte_code, exec_globals, additional_locals)
             result.append("passed")
         except TimeoutException:
             result.append("timed out")
     sys.modules["joblib"] = None  # type: ignore
     sys.modules["resource"] = None  # type: ignore
     sys.modules["psutil"] = None  # type: ignore
+    sys.modules["tkinter"] = None  # type: ignore
+"""
+Borrowed implementation of _inplacevar_ from the Zope Foundation's AccessControl module
+https://github.com/zopefoundation/AccessControl/blob/f9ae58816f0712eb6ea97459b4ccafbf4662d9db/src/AccessControl/ZopeGuards.py#L530
+"""
+valid_inplace_types = (list, set)
+inplace_slots = {
+    '+=': '__iadd__',
+    '-=': '__isub__',
+    '*=': '__imul__',
+    '/=': (1 / 2 == 0) and '__idiv__' or '__itruediv__',
+    '//=': '__ifloordiv__',
+    '%=': '__imod__',
+    '**=': '__ipow__',
+    '<<=': '__ilshift__',
+    '>>=': '__irshift__',
+    '&=': '__iand__',
+    '^=': '__ixor__',
+    '|=': '__ior__',
+}
+def __iadd__(x, y):
+    x += y
+    return x
+def __isub__(x, y):
+    x -= y
+    return x
+def __imul__(x, y):
+    x *= y
+    return x
+def __idiv__(x, y):
+    x /= y
+    return x
+def __ifloordiv__(x, y):
+    x //= y
+    return x
+def __imod__(x, y):
+    x %= y
+    return x
+def __ipow__(x, y):
+    x **= y
+    return x
+def __ilshift__(x, y):
+    x <<= y
+    return x
+def __irshift__(x, y):
+    x >>= y
+    return x
+def __iand__(x, y):
+    x &= y
+    return x
+def __ixor__(x, y):
+    x ^= y
+    return x
+def __ior__(x, y):
+    x |= y
+    return x
+inplace_ops = {
+    '+=': __iadd__,
+    '-=': __isub__,
+    '*=': __imul__,
+    '/=': __idiv__,
+    '//=': __ifloordiv__,
+    '%=': __imod__,
+    '**=': __ipow__,
+    '<<=': __ilshift__,
+    '>>=': __irshift__,
+    '&=': __iand__,
+    '^=': __ixor__,
+    '|=': __ior__,
+}
+def protected_inplacevar(op, var, expr):
+    """Do an inplace operation
+    If the var has an inplace slot, then disallow the operation
+    unless the var is an instance of ``valid_inplace_types``.
+    """
+    if hasattr(var, inplace_slots[op]) and \
+       not isinstance(var, valid_inplace_types):
+        try:
+            cls = var.__class__
+        except AttributeError:
+            cls = type(var)
+        raise TypeError(
+            "Augmented assignment to %s objects is not allowed"
+            " in untrusted code" % cls.__name__)
+    return inplace_ops[op](var, expr)
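
`RestrictedPython` rewrites augmented assignments such as `x += y` into calls of the form `x = _inplacevar_('+=', x, y)`, which is why a guard like the one above has to be supplied. The following is a condensed sketch of the same policy using the standard library's `operator` module. The names `guarded_inplacevar`, `INPLACE`, and `Accumulator` are hypothetical and only a subset of operators is covered.

```python
import operator

# Types whose in-place mutation is considered acceptable.
VALID_INPLACE_TYPES = (list, set)

# Operator symbol -> (in-place slot name, function applying the operator).
INPLACE = {
    "+=": ("__iadd__", operator.iadd),
    "-=": ("__isub__", operator.isub),
    "*=": ("__imul__", operator.imul),
    "|=": ("__ior__", operator.ior),
}


def guarded_inplacevar(op, var, expr):
    """Apply `var op expr`, rejecting custom in-place slots on untrusted types."""
    slot, func = INPLACE[op]
    if hasattr(var, slot) and not isinstance(var, VALID_INPLACE_TYPES):
        raise TypeError(
            "Augmented assignment to %s objects is not allowed"
            " in untrusted code" % type(var).__name__
        )
    return func(var, expr)


class Accumulator:
    """Hypothetical user type that defines its own in-place addition."""

    def __init__(self):
        self.total = 0

    def __iadd__(self, other):
        self.total += other
        return self


# Lists are explicitly allowed, so they are extended in place.
numbers = guarded_inplacevar("+=", [1, 2], [3])

# Integers define no __iadd__ slot, so the operation falls back to ordinary addition.
count = guarded_inplacevar("+=", 1, 2)

# A custom type with its own __iadd__ is rejected.
try:
    guarded_inplacevar("+=", Accumulator(), 5)
    rejected = False
except TypeError:
    rejected = True
```

Immutable built-ins such as `int` and `str` pass through because they lack in-place slots, while objects that define one are blocked unless they are a `list` or `set`.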