prompt_author: Will Weaver, Kendall Fitzgerald
prompt_author_institution: University of Michigan, Field Museum of Natural History
prompt_name: FMNH_mammals_test6
prompt_version: v-6
prompt_description: Prompt developed by the University of Michigan. Adapted from SLTPvM.
SLTPvB prompts all have standardized column headers (fields) that were chosen due
to their reliability and prevalence in herbarium records. All field descriptions
are based on the official Darwin Core guidelines. SLTPvB_long - The most verbose
prompt option. Descriptions closely follow DwC guides. Detailed rules for the LLM
to follow. Works best with double or triple OCR to increase attention back to the
OCR (select 'use both OCR models' or 'handwritten + printed' along with trOCR).
SLTPvB_medium - Shorter version of _long. SLTPvB_short - The least verbose possible
prompt while still providing rules and DwC descriptions.
LLM: General Purpose
instructions: 1. Refactor the unstructured OCR text into a dictionary based on the
JSON structure outlined below. 2. Map the unstructured OCR text to the appropriate
JSON key and populate the field given the user-defined rules. 3. JSON key values
are permitted to remain empty strings if the corresponding information is not found
in the unstructured OCR text. 4. Duplicate dictionary fields are not allowed. 5.
Ensure all JSON keys are in camel case. 6. Ensure new JSON field values follow sentence
case capitalization. 7. Ensure all key-value pairs in the JSON dictionary strictly
adhere to the format and data types specified in the template. 8. Ensure output
JSON string is valid JSON format. It should not have trailing commas or unquoted
keys. 9. Only return a JSON dictionary represented as a string. You should not explain
your answer.
json_formatting_instructions: This section provides rules for formatting each JSON | |
value organized by the JSON key. | |
rules: | |
catalogNumber: Barcode identifier, typically a number with at least 6 digits, but | |
fewer than 30 digits. | |
scientificName: The scientific name of the taxon including genus, specific epithet, | |
and any lower classifications. Occasionally, the genus or specific epithet will | |
be crossed out with pen or pencil and the correct genus or specific epithet name will | |
be written above it. In this case, use the text written above the crossed-out | |
text. | |
genus: Taxonomic determination to genus. Genus must be capitalized. If genus is | |
not present use the taxonomic family name followed by the word 'indet'. Occasionally, | |
the genus name will be crossed out with pen or pencil and the correct genus name | |
will be written above it. In this case, use the name written above the crossed-out
name.
specificEpithet: The name of the species epithet of the scientificName. Only include | |
the species epithet. Occasionally, the specific epithet name will be crossed out | |
with pen or pencil and the correct specific epithet name will be written above | |
it. In this case, use the name written above the crossed-out name.
speciesNameAuthorship: The authorship information for the scientificName formatted | |
according to the conventions of the applicable Darwin Core nomenclatural code. | |
collectedBy: A comma separated list of names of people, groups, or organizations | |
responsible for observing, recording, collecting, or presenting the original specimen. | |
The primary collector or observer should be listed first. | |
collectorNumber: An identifier given to the occurrence at the time it was recorded, | |
the specimen collector's number. It is often written vertically on the edge of
the paper tag, with a line separating it from other information. It is often written | |
in the y-axis orientation while the rest of the numbers, data and text are written | |
in the x-axis orientation. It is sometimes written next to the sex symbol or next | |
to the collector name or initials. | |
identifiedBy: A comma separated list of names of people, groups, or organizations | |
who assigned the taxon to the subject organism. This is not the specimen collector. | |
verbatimCollectionDate: The verbatim original representation of the date and time | |
information for when the specimen was collected. Date of collection exactly as | |
it appears on the label. Do not change the format or correct typos. | |
collectionDate: Date the specimen was collected formatted as year-month-day, YYYY-MM-DD. | |
If specific components of the date are unknown, they should be replaced with zeros. | |
Use 0000-00-00 if the entire date is unknown, YYYY-00-00 if only the year is known, | |
and YYYY-MM-00 if year and month are known but day is not. | |
collectionDateEnd: If a range of collection dates is provided, this is the later | |
end date while collectionDate is the beginning date. Use the same formatting as | |
for collectionDate. | |
occurrenceRemarks: Verbatim text describing the specimen's geographic location. Text
describing the appearance of the specimen. A statement about the presence or absence
of a taxon at the collection location. Text describing the significance of the
specimen, such as a specific expedition or notable collection. Description of | |
mammal features such as size, color, wellbeing, molting pattern, smell and any | |
other distinguishing morphological or physiological characteristics. | |
habitat: Verbatim category or description of the habitat in which the specimen collection | |
event occurred. | |
country: The name of the country or major administrative unit in which the specimen | |
was originally collected. | |
stateProvince: The name of the next smaller administrative region than country (state, | |
province, canton, department, region, etc.) in which the specimen was originally | |
collected. | |
county: The full, unabbreviated name of the next smaller administrative region than | |
stateProvince (county, shire, department, parish, etc.) in which the specimen was
originally collected. | |
locality: Description of geographic location, landscape, landmarks, regional features, | |
nearby places, municipality, city, or any contextual information aiding in pinpointing | |
the exact origin or location of the specimen. | |
verbatimCoordinates: Verbatim location coordinates as they appear on the label. | |
Do not convert formats. Possible coordinate types include [Lat, Long, UTM, TRS]. | |
decimalLatitude: Latitude decimal coordinate. Correct and convert the verbatim location | |
coordinates to conform with the decimal degrees GPS coordinate format. | |
decimalLongitude: Longitude decimal coordinate. Correct and convert the verbatim | |
location coordinates to conform with the decimal degrees GPS coordinate format. | |
elevationUnits: Use m if the final elevation is reported in meters. Use ft if the | |
final elevation is in feet. Units should match elevation. | |
measurementsTL: The total length of the animal from snout to tip of the tail. This | |
is usually a 3 digit number. It is the first number in a string of 3 or 4 measurement | |
numbers that are usually separated by dashes, commas or spaces or are sometimes | |
written vertically in the same order. This total length measurement will be the | |
largest number in the series of 3 or 4 measurement numbers.
measurementsTV: The length of the tail vertebrae of the animal from the first tail
vertebra to the last tail vertebra. This is usually a 1 to 3 digit number. It is
the second number in a string of 3 or 4 measurement
numbers that are usually separated by dashes, commas or spaces or are sometimes | |
written vertically in the same order. | |
measurementsHF: The length of the hindfoot of the animal with claw (H.F. cu) from | |
the ankle to the tip of the longest claw. This is usually a 2 to 3 digit number.
It is the third number in a string of 3 or 4
measurement numbers that are usually separated by dashes, commas or spaces or | |
are sometimes written vertically in the same order. | |
measurementsEAR: The length of the ear of the animal. This is usually a 1 to 3 digit | |
number. It is usually the fourth number in a string of 3 or 4 measurement numbers | |
that are usually separated by dashes, commas or spaces or are sometimes written | |
vertically in the same order. | |
measurementsWEIGHT: The weight of the animal. This is usually a 1 to 3 digit number. | |
It is sometimes preceded by an equal sign and/or followed by the letter g, which
stands for the unit of grams. It is sometimes followed or preceded by the letters | |
lbs for the unit of pounds. | |
catalogNumberFMNH: Barcode identifier, typically a number with at least 3 digits, | |
but fewer than 8 digits. It is typically preceded by or near the words Field Museum, | |
FM, FMNH, or CNMH. | |
collectionMethod: Mammals are sometimes intentionally caught by collectors, brought | |
to collectors as roadkill, or brought to collectors after being killed as pests.
Text description may include description of how the animal was killed, for example | |
as roadkill or in a trap or by a hunter. Record that information verbatim here. | |
measurementsTLunits: Use mm if the Total Length is recorded in millimeters. Use | |
in if the Total Length is recorded in inches. Units should match measurementsTVunits | |
and measurementsHFunits and measurementsEARunits. | |
measurementsTVunits: Use mm if the Tail Length is recorded in millimeters. Use in | |
if the Tail Length is recorded in inches. Units should match measurementsTLunits | |
and measurementsHFunits and measurementsEARunits. | |
measurementsHFunits: Use mm if the hindfoot length is recorded in millimeters. Use | |
in if the hindfoot length is recorded in inches. Units should match measurementsTVunits | |
and measurementsTLunits and measurementsEARunits. | |
measurementsEARunits: Use mm if the ear length is recorded in millimeters. Use in | |
if the ear length is recorded in inches. Units should match measurementsTVunits | |
and measurementsTLunits and measurementsHFunits. | |
measurementsWEIGHTunits: Use g if the weight is recorded in grams. Use lbs
if the weight is recorded in pounds.
elevation: Elevation or altitude in meters or feet. | |
mapping: | |
TAXONOMY: | |
- catalogNumber | |
- scientificName | |
- genus | |
- specificEpithet | |
- speciesNameAuthorship | |
- collectedBy | |
- collectorNumber | |
- identifiedBy | |
- catalogNumberFMNH | |
GEOGRAPHY: | |
- country | |
- stateProvince | |
- county | |
- locality | |
- verbatimCoordinates | |
- decimalLatitude | |
- decimalLongitude | |
- elevationUnits | |
- elevation | |
COLLECTING: | |
- verbatimCollectionDate | |
- collectionDate | |
- collectionDateEnd | |
- habitat | |
- occurrenceRemarks | |
- collectionMethod | |
LOCALITY: [] | |
MISC: | |
- measurementsTL | |
- measurementsTV | |
- measurementsEAR | |
- measurementsHF | |
- measurementsWEIGHT | |
- measurementsTLunits | |
- measurementsTVunits | |
- measurementsHFunits | |
- measurementsEARunits | |
- measurementsWEIGHTunits | |