Update README.md
README.md
CHANGED
@@ -11,10 +11,12 @@ MarineLives is a volunteer-led collaboration for the transcription and enrichmen
 records from the C16th and C17th. The records provide a rich and underutilised source of social, material
 and economic history.
 
-We have
+We have three datasets available to researchers working on Early Modern English in the late C16th and
 early to mid-C17th
-1. Ground Truth [420,000 tokens]
+1. Hand transcribed Ground Truth [420,000 tokens]
 2. Machine transcribed and hand corrected corpus [4.5 mill tokens]
+3. Hand transcribed Early Modern non-elite letters [100,000 tokens]
+
 
 Dataset 1 is a full diplomatic transcription, preserving abbreviations, contractions, capitalisation, punctuation,
 spelling variation, and syntax. It comprises roughly thirty different notarial hands drawn from sixteen different
@@ -36,3 +38,8 @@ HCA 13/57; HCA 13/58; HCA 13/61; HCA 13/63; HCA 13/68; HCA 13/71; HCA 13/73; HCA
 We are working on a significantly larger version of Dataset 2, which (when complete) will have circa 30 mill tokens
 and will comprise fifty-nine complete volumes of Admiralty Court depositions made between 1570 and 1685. We are targeting
 completion end 2025.
+
+Dataset 3 is a full diplomatic transcription of 400 Early Modern letters, preserving abbreviations, contractions,
+capitalisation, punctuation, spelling variation, and syntax. It comprises over 250 hands of non-elite writers, largely men
+but some women, from a range of marine related occupations - mariners, shore tradesmen, dockyard employees
+- written between 1600 and 1685