docs: base README.md introduction to the space
README.md
CHANGED
@@ -1,10 +1,28 @@
 ---
 title: README
-emoji:
+emoji: π
 colorFrom: yellow
 colorTo: indigo
-sdk:
+sdk: streamlit
 pinned: false
 ---
 
-
+Welcome to our space! π
+
+The [Unstructured.io](www.unstructured.io) Team provides libraries with open-source components for pre-processing text documents
+such as **PDFs**, **HTML** and **Word** Documents. These components are packaged as *bricks* 🧱, which provide
+users the building blocks they need to build pipelines targeted at the documents they care
+about. Bricks in the library fall into three categories:
+
+- 🧩 ***Partitioning bricks*** that break raw documents down into standard, structured
+elements.
+- 🧹 ***Cleaning bricks*** that remove unwanted text from documents, such as boilerplate and
+sentence
+fragments.
+- 🎭 ***Staging bricks*** that format data for downstream tasks, such as ML inference
+and data labeling.
+
+In this space we explore different settings of deep-learning models fine-tuned with several datasets containing a
+specific document type and corresponding annotations.
+
+Main GitHub repository link: [here](https://github.com/Unstructured-IO/unstructured)
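
The three brick categories described in the README (partitioning, cleaning, staging) can be pictured as stages of a small pipeline. The sketch below is a toy illustration using only the Python standard library. The names `Element`, `partition_text`, `clean_elements`, and `stage_as_json`, and the heuristics inside them, are hypothetical and are not the `unstructured` library's API.

```python
import json
import re
from dataclasses import dataclass, asdict


@dataclass
class Element:
    """Hypothetical structured element; not the library's own class."""
    category: str  # "Title" or "NarrativeText"
    text: str


def partition_text(raw: str) -> list:
    """Partitioning: split raw text on blank lines into typed elements."""
    elements = []
    for block in re.split(r"\n\s*\n", raw.strip()):
        text = " ".join(block.split())  # collapse internal whitespace
        if not text:
            continue
        # Toy heuristic (an assumption): short blocks without a closing
        # period are treated as titles.
        is_title = len(text.split()) <= 6 and not text.endswith(".")
        elements.append(Element("Title" if is_title else "NarrativeText", text))
    return elements


def clean_elements(elements, boilerplate=("Page", "Copyright")):
    """Cleaning: drop elements that start with a boilerplate marker."""
    return [e for e in elements if not e.text.startswith(boilerplate)]


def stage_as_json(elements) -> str:
    """Staging: serialize elements for a downstream consumer."""
    return json.dumps([asdict(e) for e in elements], indent=2)


raw_document = """Quarterly Report

Revenue grew in the third quarter.

Page 1 of 3
"""

elements = clean_elements(partition_text(raw_document))
staged = stage_as_json(elements)
print(staged)
```

Each stage takes and returns plain data, so a stage can be swapped or skipped without touching the others, which is the sense in which the README calls these components building blocks.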