Addaci committed on
Commit
bd4ae42
1 Parent(s): 63ac82a

Update README.md

  1. README.md +9 -2
README.md CHANGED
@@ -11,10 +11,12 @@ MarineLives is a volunteer-led collaboration for the transcription and enrichmen
11
  records from the C16th and C17th. The records provide a rich and underutilised source of social, material
12
  and economic history.
13
 
14
- We have two datasets available to researchers working on Early Modern English in the late C16th and
15
  early to mid-C17th
16
- 1. Ground Truth [420,000 tokens]
17
  2. Machine transcribed and hand corrected corpus [4.5 mill tokens]
 
 
18
 
19
  Dataset 1 is a full diplomatic transcription, preserving abbreviations, contractions, capitalisation, punctuation,
20
  spelling variation, and syntax. It comprises roughly thirty different notarial hands drawn from sixteen different
@@ -36,3 +38,8 @@ HCA 13/57; HCA 13/58; HCA 13/61; HCA 13/63; HCA 13/68; HCA 13/71; HCA 13/73; HCA
36
  We are working on a significantly larger version of Dataset 2, which (when complete) will have circa 30 mill tokens
37
  and will comprise fifty-nine complete volumes of Admiralty Court depositions made between 1570 and 1685. We are targeting
38
  completion end 2025.
 
11
  records from the C16th and C17th. The records provide a rich and underutilised source of social, material
12
  and economic history.
13
 
14
+ We have three datasets available to researchers working on Early Modern English in the late C16th and
15
  early to mid-C17th
16
+ 1. Hand transcribed Ground Truth [420,000 tokens]
17
  2. Machine transcribed and hand corrected corpus [4.5 mill tokens]
18
+ 3. Hand transcribed Early Modern non-elite letters [100,000 tokens]
19
+
20
 
21
  Dataset 1 is a full diplomatic transcription, preserving abbreviations, contractions, capitalisation, punctuation,
22
  spelling variation, and syntax. It comprises roughly thirty different notarial hands drawn from sixteen different
 
38
  We are working on a significantly larger version of Dataset 2, which (when complete) will have circa 30 mill tokens
39
  and will comprise fifty-nine complete volumes of Admiralty Court depositions made between 1570 and 1685. We are targeting
40
  completion end 2025.
41
+
42
+ Dataset 3 is a full diplomatic transcription of 400 Early Modern letters, preserving abbreviations, contractions,
43
+ capitalisation, punctuation, spelling variation, and syntax. It comprises over 250 hands of non-elite writers, largely men
44
+ but some women, from a range of marine-related occupations - mariners, shore tradesmen, dockyard employees
45
+ - written between 1600 and 1685