promptsource / API_DOCUMENTATION.md
VictorSanh's picture
initial push
3adea03

A newer version of the Streamlit SDK is available: 1.41.1

Upgrade

Manipulating prompts

PromptSource implements 4 classes to store, manipulate and use prompts and their metadata: Template, Metadata, DatasetTemplates and TemplateCollection. All of them are implemented in templates.py

Class Template and Metadata

Template is a class that wraps a prompt, its associated metadata, and implements the helper functions to use the prompt.

Instances of Template have the following main methods that will come handy:

  • apply(example, truncate=True, highlight_variables=False): Create a prompted example by applying the template to the given example
    • example (Dict): the dataset example to create a prompt for
    • truncate (Bool, default to True): if True, example fields will be truncated to TEXT_VAR_LENGTH chars
    • highlight_variables(Bool, default to False): highlight the added variables (internal use for the app rendering)
  • get_id(): Get the uuid of the prompt
  • get_name(): Get the name of the prompt
  • get_reference(): Get any additional information about the prompt (such as bibliographic reference)
  • get_answer_choices_list(example): If applicable, returns a list of answer choices for a given example.

Each Template also has a metadata attribute, an instance of the class Metadata that encapsulates the following 3 attributes:

  • original_task: If True, this prompt asks a model to perform the original task designed for this dataset.
  • choices_in_prompt: If True, the answer choices are included in the templates such that models see those choices in the input. Only applicable to classification tasks.
  • metrics: List of strings denoting metrics to use for evaluation

Class DatasetTemplates

DatasetTemplates is a class that wraps all the prompts (each of them are instances of Template) for a specific dataset/subset and implements all the helper functions necessary to read/write to the YAML file in which the prompts are saved.

You will likely mainly be interested in getting the existing prompts and their names for a given dataset. You can do that with the following instantiation:

>>> template_key = f"{dataset_name}/{subset_name}" if subset_name is not None else dataset_name
>>> prompts = DatasetTemplates(template_key)
>>> len(prompts) # Returns the number of prompts for the given dataset
>>> prompts.all_template_names # Returns a sorted list of all templates names for this dataset

Class TemplateCollection

TemplateCollection is a class that encapsulates all the prompts available under PromptSource by wrapping the DatasetTemplates class. It initializes the DatasetTemplates for all existing template folders, gives access to each DatasetTemplates, and provides aggregated counts overall DatasetTemplates.

The main methods are:

  • get_dataset(dataset_name, subset_name): Return the DatasetTemplates object corresponding to the dataset name
    • dataset_name (Str): name of the dataset to get
    • subset_name (Str, default to None): name of the subset
  • get_templates_count(): Return the overall number count over all datasets. NB: we don't breakdown datasets into subsets for the count, i.e subsets count are included into the dataset count