Addaci committed on Sep 15

Commit 4aac2f5 • 1 Parent(s): 8f00679

Update README.md

README.md +6 -7


@@ -11,9 +11,9 @@ MarineLives is a volunteer-led collaboration for the transcription and enrichmen
 records from the C16th and C17th. The records provide a rich and underutilised source of social, material
 and economic history.
-RESEARCH FOCUS
-**Broad objective**
 Explore the potential for small LLMs to support the process of cleaning Raw HTR output after
 the machine transcription of English High Court of Admiralty depositions. We have both Raw HTR output and human corrected
@@ -76,7 +76,8 @@ HTR for the same tokens with page to page congruence, and broadly line by line c
 #Small RAG Systems
-#Components:
 A small retriever (e.g., BM25, Sentence-BERT).
 A relatively lightweight LLM like mT5-small.
 A smaller corpus of documents or a curated thesaurus, perhaps stored in a simple format like JSON or SQLite.
@@ -88,17 +89,15 @@ Cloud Hosting: Easily deployable on platforms like Hugging Face Spaces or a clou
 #Hugging Face Spaces:
-We are looking at Hugging Fac options:
 Suitable for Prototypes: Spaces allow you to deploy small to medium models for free or at a low cost with CPU instances. You can also use GPU instances (such as T4 or A100) to host mT5 and experiment with RAG.
 Environment: Hugging Face Spaces uses Gradio or Streamlit interfaces, making it simple to build and share RAG applications.
 Scaling: This platform is ideal for prototyping and small-scale applications, but if you plan on scaling up (e.g., with large corpora or high-traffic queries), you may need a more robust infrastructure like AWS or GCP.
-Hugging Face Inference API:
 Using the Hugging Face Inference API to host models like mT5-small. This is a straightforward way to make API calls to the model for generation tasks. If you want to integrate a retriever with this API-based system, you would need to build that part separately (e.g., using an external document store or retriever).
-DATASETS
 We have three datasets available to researchers working on Early Modern English in the late C16th and
 early to mid-C17th:

 records from the C16th and C17th. The records provide a rich and underutilised source of social, material
 and economic history.
+**RESEARCH FOCUS**
+*Broad objective*
 Explore the potential for small LLMs to support the process of cleaning Raw HTR output after
 the machine transcription of English High Court of Admiralty depositions. We have both Raw HTR output and human corrected
 #Small RAG Systems
+Components:
 A small retriever (e.g., BM25, Sentence-BERT).
 A relatively lightweight LLM like mT5-small.
 A smaller corpus of documents or a curated thesaurus, perhaps stored in a simple format like JSON or SQLite.
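
As an illustration of the retriever component listed above, here is a minimal BM25 sketch in plain Python over a tiny JSON corpus. It is not the project's implementation: the corpus contents, the `BM25` class, the `tokenize` helper and the parameter values `k1=1.5` and `b=0.75` are all assumptions chosen for the example, and a real setup might use an existing BM25 library or Sentence-BERT embeddings instead.

```python
import json
import math
import re
from collections import Counter

# Hypothetical corpus for illustration; in practice this might be read from a JSON file.
CORPUS_JSON = """
[
  {"id": "d1", "text": "the ship was laden with sugar and tobacco at Barbados"},
  {"id": "d2", "text": "the deponent saith that the master of the ship was drowned"},
  {"id": "d3", "text": "a bill of lading for forty hogsheads of sugar"}
]
"""


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25:
    """Okapi BM25 scoring over a small in-memory corpus."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1 = k1
        self.b = b
        self.ids = [d["id"] for d in docs]
        self.tokens = [tokenize(d["text"]) for d in docs]
        self.tf = [Counter(t) for t in self.tokens]
        self.avgdl = sum(len(t) for t in self.tokens) / len(self.tokens)
        # Document frequency: number of documents containing each term.
        df = Counter()
        for t in self.tokens:
            df.update(set(t))
        n = len(docs)
        self.idf = {
            term: math.log(1 + (n - f + 0.5) / (f + 0.5)) for term, f in df.items()
        }

    def score(self, query, i):
        """BM25 score of document i for the query."""
        dl = len(self.tokens[i])
        total = 0.0
        for term in tokenize(query):
            f = self.tf[i].get(term, 0)
            if f == 0:
                continue
            norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[term] * f * (self.k1 + 1) / norm
        return total

    def search(self, query, top_k=2):
        """Return up to top_k (doc_id, score) pairs with a non-zero score, best first."""
        scored = [(self.score(query, i), self.ids[i]) for i in range(len(self.ids))]
        scored = [(s, d) for s, d in scored if s > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [(d, s) for s, d in scored[:top_k]]


retriever = BM25(json.loads(CORPUS_JSON))
results = retriever.search("hogsheads of sugar")
```

The top-ranked passages in `results` would then be passed to the LLM as context. Swapping the inline JSON string for a file or an SQLite table only changes how `docs` is loaded.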
 #Hugging Face Spaces:
 Suitable for Prototypes: Spaces allow you to deploy small to medium models for free or at a low cost with CPU instances. You can also use GPU instances (such as T4 or A100) to host mT5 and experiment with RAG.
 Environment: Hugging Face Spaces uses Gradio or Streamlit interfaces, making it simple to build and share RAG applications.
 Scaling: This platform is ideal for prototyping and small-scale applications, but if you plan on scaling up (e.g., with large corpora or high-traffic queries), you may need a more robust infrastructure like AWS or GCP.
+#Hugging Face Inference API:
 Using the Hugging Face Inference API to host models like mT5-small. This is a straightforward way to make API calls to the model for generation tasks. If you want to integrate a retriever with this API-based system, you would need to build that part separately (e.g., using an external document store or retriever).
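
The point about building the retriever separately from an API-hosted model can be sketched as follows. This is only an outline under stated assumptions: `retrieve`, `build_prompt` and `correct_line` are hypothetical helpers, the word-overlap ranking is a stand-in for BM25 or Sentence-BERT, the prompt wording is invented for the example, and `generate` is a placeholder for whatever function actually calls the hosted model.

```python
from typing import Callable, List


def retrieve(query: str, passages: List[str], top_k: int = 2) -> List[str]:
    """Rank passages by how many distinct query words they share (stand-in retriever)."""
    query_words = set(query.lower().split())
    scored = [(len(query_words & set(p.lower().split())), p) for p in passages]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored[:top_k]]


def build_prompt(raw_htr: str, context: List[str]) -> str:
    """Combine retrieved passages and the raw HTR line into one prompt string."""
    ctx = "\n".join(f"- {c}" for c in context)
    return f"Reference passages:\n{ctx}\n\nCorrect this raw HTR line:\n{raw_htr}"


def correct_line(raw_htr: str, passages: List[str], generate: Callable[[str], str]) -> str:
    """Retrieve context locally, then hand the assembled prompt to the model call."""
    return generate(build_prompt(raw_htr, retrieve(raw_htr, passages)))


# Hypothetical data and a stub generator that simply echoes the last prompt line.
passages = ["the said ship the Hopewell of London", "a bill of lading for sugar"]
raw = "the sayd shipp the Hopewell"
echo = lambda prompt: prompt.splitlines()[-1]
output = correct_line(raw, passages, echo)
```

Keeping `generate` as a plain callable means the retrieval and prompt-building code can be tested offline and the model endpoint swapped later without touching the rest.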
+**DATASETS**
 We have three datasets available to researchers working on Early Modern English in the late C16th and
 early to mid-C17th: