Commit d4167d9 · Raymond Weitekamp committed

Initial commit without binary files
Files changed:
- .gitattributes +1 -0
- .gitignore +28 -0
- README.md +34 -0
- app.py +118 -0
- requirements.txt +5 -0
- run_local.sh +35 -0
- test_app.py +56 -0
- test_e2e.py +73 -0
- test_local.sh +69 -0
.gitattributes
ADDED
@@ -0,0 +1 @@
+*.png filter=lfs diff=lfs merge=lfs -text
.gitignore
ADDED
@@ -0,0 +1,28 @@
+# Virtual Environment
+venv/
+env/
+.env/
+
+# Python
+__pycache__/
+*.py[cod]
+*$py.class
+.Python
+*.so
+.pytest_cache/
+
+# IDE
+.idea/
+.vscode/
+*.swp
+*.swo
+
+# Gradio
+flagged/
+gradio_cached_examples/
+
+# OS
+.DS_Store
+Thumbs.db
+
+test-image.png
README.md
ADDED
@@ -0,0 +1,34 @@
+---
+title: Handwriting OCR Dataset Collection
+emoji: ✍️
+colorFrom: blue
+colorTo: indigo
+sdk: gradio
+sdk_version: 5.15.0
+app_file: app.py
+pinned: false
+short_description: Collect handwritten text samples for OCR training
+tags:
+- ocr
+- handwriting
+- dataset
+- computer-vision
+---
+
+# Handwriting OCR Dataset Collection
+
+This Space provides an interface for collecting handwritten samples of text to create a dataset for OCR (Optical Character Recognition) training. Users are presented with text snippets which they can handwrite and upload as images.
+
+## How it Works
+
+1. You will be shown 1-5 consecutive sentences about OCR and handwriting recognition
+2. Write these sentences by hand on paper
+3. Take a photo or scan of your handwriting
+4. Upload the image through the interface
+5. Submit or skip to get a new text block
+
+The collected data pairs (text and corresponding handwritten images) will be used to train and improve handwriting recognition models.
+
+## Usage
+
+Simply visit the Space and follow the on-screen instructions to contribute your handwriting samples to the dataset.
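For reference, the 1-5 consecutive-sentence sampling the README describes reduces to a few lines of Python. This sketch mirrors the get_random_text_block method committed in app.py below; the name sample_block is chosen here purely for illustration.

import random

def sample_block(sentences):
    # Choose a block of 1 to 5 consecutive sentences, as described in the README.
    n = random.randint(1, 5)
    start = random.randint(0, len(sentences) - n)
    return " ".join(sentences[start:start + n])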
app.py
ADDED
@@ -0,0 +1,118 @@
+import gradio as gr
+import random
+from datetime import datetime
+
+# The list of sentences from our previous conversation.
+sentences = [
+    "Optical character recognition (OCR) is the process of converting images of text into machine-readable data.",
+    "When applied to handwriting, OCR faces additional challenges because of the natural variability in individual penmanship.",
+    "Over the last century, advances in computer vision and machine learning have transformed handwriting OCR from bulky, specialized hardware into highly accurate, software-driven systems.",
+    "The origins of OCR date back to the early 20th century.",
+    "Early pioneers explored how machines might read text.",
+    "In the 1920s, inventors such as Emanuel Goldberg developed early devices that could capture printed characters by converting them into telegraph codes.",
+    "Around the same time, Gustav Tauschek created the Reading Machine using template-matching methods to detect letters in images.",
+    "These devices were designed for printed text and depended on fixed, machine-friendly fonts rather than natural handwriting.",
+    "In the 1950s, systems like David Shepard's GISMO emerged to begin automating the conversion of paper records into digital form.",
+    "Although these early OCR systems were limited in scope and accuracy, they laid the groundwork for later innovations.",
+    "The 1960s saw OCR technology being applied to real-world tasks.",
+    "In 1965, American inventor Jacob Rabinow developed an OCR machine specifically aimed at sorting mail by reading addresses.",
+    "This was a critical step for the U.S. Postal Service.",
+    "Soon after, research groups, including those at IBM, began developing machines such as the IBM 1287, which was capable of reading handprinted numbers on envelopes to facilitate automated mail processing.",
+    "These systems marked the first attempts to apply computer vision to handwritten data on a large scale.",
+    "By the late 1980s and early 1990s, researchers such as Yann LeCun and his colleagues developed neural network architectures to recognize handwritten digits.",
+    "Their work, initially applied to reading ZIP codes on mail, demonstrated that carefully designed, constrained neural networks could achieve error rates as low as about 1% on USPS data.",
+    "Sargur Srihari and his team at the Center of Excellence for Document Analysis and Recognition extended these ideas to develop complete handwritten address interpretation systems.",
+    "These systems, deployed by the USPS and postal agencies worldwide, helped automate the routing of mail and revolutionized the sorting process.",
+    "The development and evaluation of handwriting OCR have been driven in part by standard benchmark datasets.",
+    "The MNIST dataset, introduced in the 1990s, consists of 70,000 images of handwritten digits and became the de facto benchmark for handwritten digit recognition.",
+    "Complementing MNIST is the USPS dataset, which provides images of handwritten digits derived from actual envelopes and captures real-world variability.",
+    "Handwriting OCR entered a new era with the introduction of neural network models.",
+    "In 1989, LeCun et al. applied backpropagation to a convolutional neural network tailored for handwritten digit recognition, an innovation that evolved into the LeNet series.",
+    "By automatically learning features rather than relying on hand-designed templates, these networks drastically improved recognition performance.",
+    "As computational power increased and large labeled datasets became available, deep learning models, particularly convolutional neural networks and recurrent neural networks, pushed the accuracy of handwriting OCR to near-human levels.",
+    "Modern systems can handle both printed and cursive text, automatically segmenting and recognizing characters in complex handwritten documents.",
+    "Cursive handwriting presents a classic challenge known as Sayre's paradox, where word recognition requires letter segmentation and letter segmentation requires word recognition.",
+    "Contemporary approaches use implicit segmentation methods, often combined with hidden Markov models or end-to-end neural networks, to circumvent this paradox.",
+    "Today's handwriting OCR systems are highly accurate and widely deployed.",
+    "Modern systems combine OCR with artificial intelligence to not only recognize text but also extract meaning, verify data, and integrate into larger enterprise workflows.",
+    "Projects such as In Codice Ratio use deep convolutional networks to transcribe historical handwritten documents, further expanding OCR applications.",
+    "Despite impressive advances, handwriting OCR continues to face challenges with highly variable or degraded handwriting.",
+    "Ongoing research aims to improve recognition accuracy, particularly for cursive and unconstrained handwriting, and to extend support across languages and historical scripts.",
+    "With improvements in deep learning architectures, increased computing power, and large annotated datasets, future OCR systems are expected to become even more robust, handling real-world handwriting in diverse applications from postal services to archival digitization.",
+    "Today's research in handwriting OCR benefits from a wide array of well-established datasets and ongoing evaluation challenges.",
+    "These resources help drive the development of increasingly robust systems for both digit and full-text recognition.",
+    "For handwritten digit recognition, the MNIST dataset remains the most widely used benchmark thanks to its simplicity and broad adoption.",
+    "Complementing MNIST is the USPS dataset, which is derived from actual mail envelopes and provides additional challenges with real-world variability.",
+    "The IAM Handwriting Database is one of the most popular datasets for unconstrained offline handwriting recognition and includes scanned pages of handwritten English text with corresponding transcriptions.",
+    "It is frequently used to train and evaluate models that work on full-line or full-page recognition tasks.",
+    "For systems designed to capture the dynamic aspects of handwriting, such as pen stroke trajectories, the IAM On-Line Handwriting Database offers valuable data.",
+    "The CVL dataset provides multi-writer handwritten texts with a range of writing styles, making it useful for assessing the generalization capabilities of OCR systems across diverse handwriting samples.",
+    "The RIMES dataset, developed for French handwriting recognition, contains scanned documents and is a key resource for evaluating systems in multilingual settings.",
+    "Various ICDAR competitions, such as ICDAR 2013 and ICDAR 2017, have released datasets that reflect the complexities of real-world handwriting, including historical documents and unconstrained writing.",
+    "For Arabic handwriting recognition, the KHATT dataset offers a collection of handwritten texts that capture the unique challenges of cursive and context-dependent scripts.",
+    "These datasets, along with continual evaluation efforts through competitions hosted at ICDAR and ICFHR, ensure that the field keeps pushing toward higher accuracy, better robustness, and broader language coverage.",
+    "Emerging benchmarks, often tailored to specific scripts, historical documents, or noisy real-world data, will further refine the state-of-the-art in handwriting OCR.",
+    "This array of resources continues to shape the development of handwriting OCR systems today.",
+    "This additional section outlines today's most influential datasets and benchmarks, highlighting how they continue to shape the development of handwriting OCR systems."
+]
+
+class OCRDataCollector:
+    def __init__(self):
+        self.collected_pairs = []
+        self.current_text_block = self.get_random_text_block()
+
+    def get_random_text_block(self):
+        block_length = random.randint(1, 5)
+        start_index = random.randint(0, len(sentences) - block_length)
+        block = " ".join(sentences[start_index:start_index + block_length])
+        return block
+
+    def submit_image(self, image, text_block):
+        if image is None:
+            message = "No image uploaded. Please try again or use 'Skip' to move on."
+        else:
+            timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
+            self.collected_pairs.append({"text": text_block, "image": image, "timestamp": timestamp})
+            message = "Thank you! Your submission has been saved."
+        new_text = self.get_random_text_block()
+        return new_text, message
+
+    def skip_text(self, text_block):
+        new_text = self.get_random_text_block()
+        message = "Skipped. Here is the next text."
+        return new_text, message
+
+def create_gradio_interface():
+    collector = OCRDataCollector()
+
+    with gr.Blocks() as demo:
+        gr.Markdown("## Crowdsourcing Handwriting OCR Dataset")
+        gr.Markdown("You will be shown between 1 and 5 consecutive sentences. Please handwrite them on paper and upload an image of your handwriting. If you wish to skip the current text, click 'Skip'.")
+
+        text_box = gr.Textbox(value=collector.current_text_block, label="Text to Handwrite", interactive=False)
+        image_input = gr.Image(type="pil", label="Upload Handwritten Image", sources=["upload"])
+        # Both callbacks return (new_text, message), so a status box is wired in as the second output.
+        status_box = gr.Textbox(label="Status", interactive=False)
+
+        with gr.Row():
+            submit_btn = gr.Button("Submit")
+            skip_btn = gr.Button("Skip")
+
+        submit_btn.click(
+            fn=collector.submit_image,
+            inputs=[image_input, text_box],
+            outputs=[text_box, status_box]
+        )
+
+        skip_btn.click(
+            fn=collector.skip_text,
+            inputs=text_box,
+            outputs=[text_box, status_box]
+        )
+
+    return demo
+
+if __name__ == "__main__":
+    demo = create_gradio_interface()
+    # run_local.sh and the e2e tests expect the app on port 7862.
+    demo.launch(server_port=7862)
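The collector can also be exercised outside the Gradio UI, which is what the unit tests below do. A minimal sketch, assuming app.py is importable from the working directory; the blank white image is a stand-in for a real handwriting photo:

from PIL import Image
from app import OCRDataCollector

collector = OCRDataCollector()
text = collector.get_random_text_block()       # 1-5 consecutive sentences
image = Image.new("RGB", (400, 200), "white")  # placeholder for a scanned sample
new_text, message = collector.submit_image(image, text)
print(message)                                 # "Thank you! Your submission has been saved."
len(collector.collected_pairs)                 # 1 -- pairs are held in memory only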
requirements.txt
ADDED
@@ -0,0 +1,5 @@
+gradio>=3.50.2
+Pillow>=10.0.0
+pytest>=7.0.0
+pytest-playwright>=0.4.0
+playwright>=1.40.0
run_local.sh
ADDED
@@ -0,0 +1,35 @@
+#!/bin/bash
+
+# Exit on error
+set -e
+
+# Kill any existing processes using port 7862
+echo "Cleaning up port 7862..."
+lsof -ti:7862 | xargs kill -9 2>/dev/null || true
+
+# Check if uv is installed, if not install it
+if ! command -v uv &> /dev/null; then
+    echo "Installing uv package installer..."
+    curl -LsSf https://astral.sh/uv/install.sh | sh
+fi
+
+# Create virtual environment if it doesn't exist
+if [ ! -d "venv" ]; then
+    echo "Creating virtual environment..."
+    python -m venv venv
+fi
+
+# Activate virtual environment
+echo "Activating virtual environment..."
+source venv/bin/activate
+
+# Install dependencies using uv
+echo "Installing dependencies with uv..."
+uv pip install -r requirements.txt
+
+# Start the Gradio app
+echo "Starting Gradio app..."
+python app.py
+
+# Deactivate virtual environment when done
+deactivate
test_app.py
ADDED
@@ -0,0 +1,56 @@
+import pytest
+from PIL import Image
+import numpy as np
+from app import OCRDataCollector, sentences
+
+@pytest.fixture
+def collector():
+    # Fresh collector instance for each test
+    return OCRDataCollector()
+
+def test_get_random_text_block(collector):
+    # Test that we get a non-empty string
+    text_block = collector.get_random_text_block()
+    assert isinstance(text_block, str)
+    assert len(text_block) > 0
+
+    # Test that the text block contains content from our sentences
+    assert any(sentence in text_block for sentence in sentences)
+
+    # Test that we get different blocks (probabilistic, but very likely)
+    blocks = [collector.get_random_text_block() for _ in range(5)]
+    assert len(set(blocks)) > 1, "Random blocks should be different"
+
+def test_skip_text(collector):
+    # Skipping should return a different text block (probabilistic, but very likely)
+    current_text = collector.get_random_text_block()
+    new_text, message = collector.skip_text(current_text)
+
+    assert isinstance(new_text, str)
+    assert len(new_text) > 0
+    assert new_text != current_text
+
+def test_submit_image(collector):
+    # Create a dummy test image using numpy array
+    img_array = np.zeros((100, 100, 3), dtype=np.uint8)
+    img_array.fill(255)  # White image
+
+    # Convert numpy array to PIL Image
+    test_image = Image.fromarray(img_array)
+
+    # Get the current text block
+    current_text = collector.get_random_text_block()
+
+    # Test submission with a valid image; submit_image returns (new_text, message)
+    new_text, message = collector.submit_image(test_image, current_text)
+    assert isinstance(new_text, str)
+    assert len(new_text) > 0
+    assert len(collector.collected_pairs) == 1
+    assert collector.collected_pairs[0]["text"] == current_text
+
+    # Test submission with no image
+    new_text, message = collector.submit_image(None, current_text)
+    assert isinstance(new_text, str)
+    assert len(new_text) > 0
+    # Should not have added to collected_pairs
+    assert len(collector.collected_pairs) == 1
test_e2e.py
ADDED
@@ -0,0 +1,73 @@
+import pytest
+import os
+from playwright.sync_api import expect
+from PIL import Image
+import numpy as np
+import tempfile
+
+# Constants
+GRADIO_PORT = 7862
+GRADIO_URL = f"http://localhost:{GRADIO_PORT}"
+
+@pytest.fixture(scope="module")
+def test_image():
+    # Create a temporary test image
+    test_img = Image.fromarray(np.zeros((100, 100, 3), dtype=np.uint8))
+    temp_dir = tempfile.mkdtemp()
+    test_img_path = os.path.join(temp_dir, "test_image.png")
+    test_img.save(test_img_path)
+
+    yield test_img_path
+
+    # Cleanup
+    os.remove(test_img_path)
+    os.rmdir(temp_dir)
+
+def test_page_loads(page):
+    page.goto(GRADIO_URL)
+    page.wait_for_load_state("networkidle")
+
+    # Check if title is present with exact text
+    expect(page.locator("h2", has_text="Crowdsourcing Handwriting OCR Dataset")).to_be_visible()
+
+    # Check if main interface elements are present
+    expect(page.get_by_label("Text to Handwrite")).to_be_visible()
+    expect(page.locator('input[type="file"]')).to_be_attached()
+    expect(page.get_by_role("button", name="Submit")).to_be_visible()
+    expect(page.get_by_role("button", name="Skip")).to_be_visible()
+
+def test_skip_functionality(page):
+    page.goto(GRADIO_URL)
+    page.wait_for_load_state("networkidle")
+
+    # Get initial text
+    text_box = page.get_by_label("Text to Handwrite")
+    initial_text = text_box.input_value()
+
+    # Click skip button
+    page.get_by_role("button", name="Skip").click()
+    page.wait_for_timeout(2000)  # Wait for response
+
+    # Get new text and verify it changed
+    new_text = text_box.input_value()
+    assert initial_text != new_text
+
+def test_upload_image(page, test_image):
+    page.goto(GRADIO_URL)
+    page.wait_for_load_state("networkidle")
+
+    # Get initial text
+    text_box = page.get_by_label("Text to Handwrite")
+    initial_text = text_box.input_value()
+
+    # Upload image - file input is hidden, but we can still set its value
+    page.locator('input[type="file"]').set_input_files(test_image)
+    page.wait_for_timeout(2000)  # Wait for upload
+
+    # Click submit to complete the upload
+    page.get_by_role("button", name="Submit").click()
+    page.wait_for_timeout(2000)  # Wait for response
+
+    # Verify text changed after submission
+    new_text = text_box.input_value()
+    assert initial_text != new_text
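Note that these tests accept a bare page argument without defining it anywhere: the fixture is injected by the pytest-playwright plugin listed in requirements.txt. Roughly, the plugin supplies something like the following (an approximation for orientation, not the plugin's actual implementation):

import pytest
from playwright.sync_api import sync_playwright

@pytest.fixture
def page():
    # Launch a headless Chromium, hand one page to the test, then clean up.
    with sync_playwright() as p:
        browser = p.chromium.launch()
        yield browser.new_page()
        browser.close()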
test_local.sh
ADDED
@@ -0,0 +1,69 @@
+#!/bin/bash
+
+# Exit on error
+set -e
+
+# Kill any existing processes using port 7862
+echo "Cleaning up port 7862..."
+lsof -ti:7862 | xargs kill -9 2>/dev/null || true
+
+# Check if uv is installed, if not install it
+if ! command -v uv &> /dev/null; then
+    echo "Installing uv package installer..."
+    curl -LsSf https://astral.sh/uv/install.sh | sh
+fi
+
+# Create virtual environment if it doesn't exist
+if [ ! -d "venv" ]; then
+    echo "Creating virtual environment..."
+    python -m venv venv
+fi
+
+# Activate virtual environment
+echo "Activating virtual environment..."
+source venv/bin/activate
+
+# Install dependencies using uv
+echo "Installing dependencies with uv..."
+uv pip install -r requirements.txt
+
+# Install Playwright browsers
+echo "Installing Playwright browsers..."
+playwright install chromium
+
+# Run unit tests
+echo "Running unit tests..."
+# Run pytest inside `if`: with `set -e`, a bare failing command would exit before $? could be checked
+if python -m pytest test_app.py -v; then
+    echo "Unit tests passed! Starting Gradio app..."
+    # Start Gradio app in background
+    python app.py &
+    GRADIO_PID=$!
+
+    # Wait for server to start
+    echo "Waiting for Gradio server to start..."
+    sleep 3
+
+    # Run e2e tests
+    echo "Running e2e tests..."
+    E2E_STATUS=0
+    # `|| E2E_STATUS=$?` keeps `set -e` from exiting the script when the e2e tests fail
+    python -m pytest test_e2e.py -v || E2E_STATUS=$?
+
+    # Kill Gradio server
+    kill $GRADIO_PID
+
+    if [ $E2E_STATUS -eq 0 ]; then
+        echo "All tests passed! Starting Gradio app for development..."
+        python app.py
+    else
+        echo "E2E tests failed! Please fix the issues before running the app."
+        exit 1
+    fi
+else
+    echo "Unit tests failed! Please fix the issues before running e2e tests."
+    exit 1
+fi
+
+# Deactivate virtual environment
+deactivate