Addaci committed on
Commit
4aac2f5
1 Parent(s): 8f00679

Update README.md

Files changed (1)
  1. README.md +6 -7
README.md CHANGED
@@ -11,9 +11,9 @@ MarineLives is a volunteer-led collaboration for the transcription and enrichmen
  records from the C16th and C17th. The records provide a rich and underutilised source of social, material
  and economic history.
 
- RESEARCH FOCUS
 
- **Broad objective**
 
  Explore the potential for small LLMs to support the process of cleaning Raw HTR output after
  the machine transcription of English High Court of Admiralty depositions. We have both Raw HTR output and human corrected
@@ -76,7 +76,8 @@ HTR for the same tokens with page to page congruence, and broadly line by line c
 
  #Small RAG Systems
 
- #Components:
  A small retriever (e.g., BM25, Sentence-BERT).
  A relatively lightweight LLM like mT5-small.
  A smaller corpus of documents or a curated thesaurus, perhaps stored in a simple format like JSON or SQLite.
@@ -88,17 +89,15 @@ Cloud Hosting: Easily deployable on platforms like Hugging Face Spaces or a clou
 
  #Hugging Face Spaces:
 
- We are looking at Hugging Fac options:
-
  Suitable for Prototypes: Spaces allow you to deploy small to medium models for free or at a low cost with CPU instances. You can also use GPU instances (such as T4 or A100) to host mT5 and experiment with RAG.
  Environment: Hugging Face Spaces uses Gradio or Streamlit interfaces, making it simple to build and share RAG applications.
  Scaling: This platform is ideal for prototyping and small-scale applications, but if you plan on scaling up (e.g., with large corpora or high-traffic queries), you may need a more robust infrastructure like AWS or GCP.
 
- Hugging Face Inference API:
 
  Using the Hugging Face Inference API to host models like mT5-small. This is a straightforward way to make API calls to the model for generation tasks. If you want to integrate a retriever with this API-based system, you would need to build that part separately (e.g., using an external document store or retriever).
 
- DATASETS
 
  We have three datasets available to researchers working on Early Modern English in the late C16th and
  early to mid-C17th:
 
  records from the C16th and C17th. The records provide a rich and underutilised source of social, material
  and economic history.
 
+ **RESEARCH FOCUS**
 
+ *Broad objective*
 
  Explore the potential for small LLMs to support the process of cleaning Raw HTR output after
  the machine transcription of English High Court of Admiralty depositions. We have both Raw HTR output and human corrected
 
 
  #Small RAG Systems
 
+ Components:
+
  A small retriever (e.g., BM25, Sentence-BERT).
  A relatively lightweight LLM like mT5-small.
  A smaller corpus of documents or a curated thesaurus, perhaps stored in a simple format like JSON or SQLite.
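
As a rough illustration of the components listed above, the following is a minimal sketch of a BM25 retriever over a small thesaurus held as JSON, written in plain Python with no external dependencies. The thesaurus entries, the `BM25` class, the `build_prompt` helper and the prompt wording are all hypothetical and are not taken from the MarineLives datasets. Note that mT5-small is released as a pretrained-only checkpoint, so a prompt like this would only be useful after fine-tuning on a correction task.

```python
import json
import math
from collections import Counter

# Hypothetical thesaurus of spelling variants, stored as JSON.
# The entries are illustrative placeholders, not project data.
THESAURUS_JSON = """
[
  {"id": "ship",   "text": "ship shipp shippe vessel"},
  {"id": "said",   "text": "said sayd saide aforesaid"},
  {"id": "master", "text": "master maister mr"}
]
"""


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


class BM25:
    """Okapi BM25 scoring over a small in-memory list of documents."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.docs = docs
        self.k1 = k1
        self.b = b
        self.tokens = [tokenize(d["text"]) for d in docs]
        self.term_freqs = [Counter(t) for t in self.tokens]
        self.avg_len = sum(len(t) for t in self.tokens) / len(self.tokens)
        # Document frequency: number of documents containing each term.
        doc_freq = Counter(term for t in self.tokens for term in set(t))
        n = len(docs)
        self.idf = {
            term: math.log(1 + (n - f + 0.5) / (f + 0.5))
            for term, f in doc_freq.items()
        }

    def score(self, query_tokens, index):
        tf = self.term_freqs[index]
        length = len(self.tokens[index])
        total = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            freq = tf[term]
            norm = freq + self.k1 * (1 - self.b + self.b * length / self.avg_len)
            total += self.idf[term] * freq * (self.k1 + 1) / norm
        return total

    def search(self, query, top_k=2):
        """Return up to top_k documents with a non-zero score, best first."""
        q = tokenize(query)
        scored = [(self.score(q, i), d) for i, d in enumerate(self.docs)]
        scored = [(s, d) for s, d in scored if s > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [d for _, d in scored[:top_k]]


def build_prompt(raw_htr_line, hits):
    """Assemble a prompt for a seq2seq model (the wording is an assumption)."""
    context = "; ".join(h["text"] for h in hits)
    return f"correct HTR: {raw_htr_line} | variants: {context}"


if __name__ == "__main__":
    retriever = BM25(json.loads(THESAURUS_JSON))
    line = "the sayd shipp laden"
    print(build_prompt(line, retriever.search(line)))
```

The retrieved entries would then be passed, together with the Raw HTR line, to the lightweight LLM. Swapping the JSON string for a SQLite table would only change how `docs` is loaded.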
 
 
  #Hugging Face Spaces:
 
  Suitable for Prototypes: Spaces allow you to deploy small to medium models for free or at a low cost with CPU instances. You can also use GPU instances (such as T4 or A100) to host mT5 and experiment with RAG.
  Environment: Hugging Face Spaces uses Gradio or Streamlit interfaces, making it simple to build and share RAG applications.
  Scaling: This platform is ideal for prototyping and small-scale applications, but if you plan on scaling up (e.g., with large corpora or high-traffic queries), you may need a more robust infrastructure like AWS or GCP.
 
+ #Hugging Face Inference API:
 
  Using the Hugging Face Inference API to host models like mT5-small. This is a straightforward way to make API calls to the model for generation tasks. If you want to integrate a retriever with this API-based system, you would need to build that part separately (e.g., using an external document store or retriever).
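
A call to the Inference API for a generation task might look like the sketch below, which uses only the Python standard library. It assumes the classic serverless endpoint pattern `https://api-inference.huggingface.co/models/<model_id>`, a token in a hypothetical `HF_TOKEN` environment variable, and that the chosen model (here `google/mt5-small`) is actually served by the API. All three should be checked against the current Hugging Face documentation. The helper names `build_request` and `generate` are illustrative.

```python
import json
import os
import urllib.request

# Assumed endpoint pattern for the serverless Inference API.
API_URL = "https://api-inference.huggingface.co/models/{model_id}"


def build_request(model_id, prompt, token):
    """Build (but do not send) a POST request carrying the prompt as JSON."""
    payload = json.dumps({"inputs": prompt}).encode("utf-8")
    return urllib.request.Request(
        API_URL.format(model_id=model_id),
        data=payload,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def generate(model_id, prompt, token, timeout=30):
    """Send the request and return the decoded JSON response as-is."""
    request = build_request(model_id, prompt, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    token = os.environ.get("HF_TOKEN")  # hypothetical variable name
    if token:
        # The prompt would normally include retrieved context built separately.
        print(generate("google/mt5-small", "correct HTR: the sayd shipp", token))
```

Because the retriever is not part of this API, the prompt passed to `generate` is where any separately retrieved context would be added.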
 
+ **DATASETS**
 
  We have three datasets available to researchers working on Early Modern English in the late C16th and
  early to mid-C17th: