Commit History

Added tab to be able to compare pages across multiple documents and redact duplicates
a265560

seanpedrickcase committed on

Ensured the text OCR outputs have no line breaks at the end. Multi-line custom text searches are now possible. Files for review are sent from the redact button. Fixed image redaction (not review yet). Can get user pool details from headers. Gradio update.
cb349ad

seanpedrickcase committed on

Corrected large image reduction code
3518b67

seanpedrickcase committed on

Dropdown choices for redactions are now listed correctly
3187788

seanpedrickcase committed on

Moved review components to give more space for page. Extended zoom limits. Existing redaction labels should now appear in new redaction box dropdown.
a9dcd2e

seanpedrickcase committed on

Corrected image resizing method for instances where the image is very large.
0c2987b

seanpedrickcase committed on

App should now resize images that are too large before sending to Textract. Textract is now more robust to failure. Improved reliability of JSON conversion to review dataframe.
143e2cc

seanpedrickcase committed on
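
A minimal sketch of the resize-before-Textract idea from the commit above, not the app's actual code. The function name `fit_within` and the default `max_side` of 10,000 pixels are assumptions for illustration; check the current Textract quotas for the real limit.

```python
def fit_within(width: int, height: int, max_side: int = 10_000) -> tuple:
    """Return (width, height) scaled down so neither side exceeds max_side.

    Hypothetical helper: max_side is an assumed limit, not taken from the commit.
    """
    longest = max(width, height)
    if longest <= max_side:
        # Already small enough, leave the image dimensions unchanged.
        return width, height
    # Integer arithmetic keeps the aspect ratio and avoids float rounding.
    new_width = max(1, width * max_side // longest)
    new_height = max(1, height * max_side // longest)
    return new_width, new_height
```

The returned size could then be passed to an image library's resize call (for example Pillow's `Image.resize`) before the bytes are sent for analysis.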

You can now have output redaction boxes in grey according to an environment variable. Review files are now saved every time page is changed.
c3a8cd7

seanpedrickcase committed on

Greatly improved regex for direct matching with custom entities
6ac4be4

seanpedrickcase committed on
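
One way direct matching of custom terms with a regex can be approached is sketched below. This is an illustration under stated assumptions, not the regex used in the commit: `build_term_pattern` is a hypothetical name, and case-insensitive whole-term matching is an assumed requirement.

```python
import re


def build_term_pattern(terms):
    """Compile one pattern that matches any of the given terms as whole terms.

    Hypothetical helper for illustration only.
    """
    cleaned = [t.strip() for t in terms if t.strip()]
    if not cleaned:
        raise ValueError("at least one non-empty term is required")
    # Longest terms first so "John Smith" is preferred over "John".
    cleaned.sort(key=len, reverse=True)
    # re.escape makes characters such as "." match literally.
    alternatives = "|".join(re.escape(t) for t in cleaned)
    # Lookarounds act as boundaries even when a term starts or ends with punctuation.
    return re.compile(r"(?<!\w)(?:" + alternatives + r")(?!\w)", re.IGNORECASE)
```

Usage: `build_term_pattern(["John", "John Smith"]).findall(text)` returns the matched spans of text, which could then be mapped back to word positions for redaction.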

Uploaded PDFs with review files will now include all pages that don't have redactions. Slightly improved deny list matching.
613b1b4

seanpedrickcase committed on

Fixed bug where pages suggested for whole redaction are one lower than requested
e8681e8

seanpedrickcase committed on

Fix bug to identify all handwriting labels. Now only concatenates entity_type boxes if they have different labels.
0d3554e

seanpedrickcase committed on

Allowed for Cognito login in situations where client secret is not provided
eafaaed

seanpedrickcase committed on
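
A sketch of how a Cognito login can work with or without a client secret, assuming a username and password flow. The function name `cognito_auth_parameters` is hypothetical and this is not the app's code. The `SECRET_HASH` formula (Base64 of HMAC-SHA256 over username plus client ID, keyed with the client secret) follows the AWS Cognito documentation.

```python
import base64
import hashlib
import hmac


def cognito_auth_parameters(username, password, client_id, client_secret=None):
    """Build auth parameters, adding SECRET_HASH only when a secret is given.

    Hypothetical helper for illustration only.
    """
    params = {"USERNAME": username, "PASSWORD": password}
    if client_secret:
        # The hash is keyed with the client secret over username + client ID.
        digest = hmac.new(
            client_secret.encode("utf-8"),
            (username + client_id).encode("utf-8"),
            hashlib.sha256,
        ).digest()
        params["SECRET_HASH"] = base64.b64encode(digest).decode("utf-8")
    return params
```

The resulting dictionary could be passed as `AuthParameters` to the `initiate_auth` call of a boto3 `cognito-idp` client. App clients created without a secret simply omit `SECRET_HASH`.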

Merge pull request #1 from seanpedrick-case/main
9013fdc
unverified

Sean Pedrick-Case committed on

Update README.md with corrected link
9de60e6
unverified

Sean Pedrick-Case committed on

Update README.md
addcf36
unverified

Sean Pedrick-Case committed on

Updated user guide to update an image reference
99724b5

seanpedrickcase committed on

Update app layout, user guide, and Gradio upgrade.
1b13393

seanpedrickcase committed on

Removed some placeholder values
b1b0e04
unverified

Sean Pedrick-Case committed on

Adapted text join options to review file to be more resilient to changes in image size. Added possibility of using client secret with AWS login
c9e23cb

seanpedrickcase committed on

Side review bar is mostly there. A couple of bugs fixed. Can now return identified text in initial review files. Still working on retaining found text throughout review process
a03496e

seanpedrickcase committed on

Hopefully finally fixed the duplicate image_annotation_object issue
59ff822

seanpedrickcase committed on

Now should correctly remove duplicate items from all_image_annotator
8183bc4

seanpedrickcase committed on

Refactor redaction functionality and enhance UI components: Added support for custom recognizers and whole page redaction options. Updated file handling to include new dropdowns for entity selection and improved dataframes for entity management. Enhanced the annotator with better state management and UI responsiveness. Cleaned up redundant code and improved overall performance in the redaction process.
1d772de

seanpedrickcase committed on

Enhance file handling and UI features: improved Gradio app layout with fill width option, and integrated new settings for deny lists and fully redacted lists (placeholders so far). Updated file conversion functions to handle CSV inputs and added CSV review file generation for redactions. Now retains all original and merged redaction boxes.
a770956

seanpedrickcase committed on

Can now toggle colour change for boxes. Large boxes now remove text correctly
928b1e9

seanpedrickcase committed on

Fixed issue where redactions were sometimes not removing text underneath boxes. You can now redact in different colours from review page
23f8ca3

seanpedrickcase committed on

Updated packages. Reinstituted multithreading with page load, now with order protected. Smaller spaCy model used for speed. Textract calls should now be faster.
f0c28d7

seanpedrickcase committed on
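
The "order protected" multithreading mentioned above can be illustrated with the standard library, where `Executor.map` returns results in input order regardless of which thread finishes first. This is a sketch only: `prepare_page` is a hypothetical stand-in for the real per-page work, and the commit may use a different mechanism.

```python
from concurrent.futures import ThreadPoolExecutor


def prepare_page(page_number):
    """Hypothetical stand-in for per-page work such as rendering an image."""
    return f"page-{page_number}"


def prepare_pages_in_order(page_numbers, max_workers=4):
    """Process pages on worker threads while keeping results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the order of page_numbers, not completion order.
        return list(pool.map(prepare_page, page_numbers))
```

Collecting results with `as_completed` instead would return pages in completion order, which is one way pages can end up mixed.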

Started adding in support for custom deny list. Fixed Textract call issue. Removed multithreading for now as it mixes up pages.
e3365ed

seanpedrickcase committed on

Multithreaded file preparation. Can call Textract without signature detection
9504619

seanpedrickcase committed on

Can now specify the root path that the app will run on with an environment variable
b8e245f

seanpedrickcase committed on

Can now define queue size, max file size, and server port in environment variables
dc17f6e

seanpedrickcase committed on
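
A sketch of reading queue size, maximum file size, and server port from environment variables with fallbacks. The variable names `MAX_QUEUE_SIZE` and `MAX_FILE_SIZE` and all default values are assumptions for illustration. `GRADIO_SERVER_PORT` is a variable Gradio itself recognises, but whether this app uses it is not stated in the commit.

```python
import os


def load_server_settings(env=None):
    """Read server settings from an environment mapping with fallbacks.

    Variable names and defaults here are hypothetical, not the app's own.
    """
    env = os.environ if env is None else env
    return {
        "max_queue_size": int(env.get("MAX_QUEUE_SIZE", "5")),
        "max_file_size": env.get("MAX_FILE_SIZE", "250mb"),
        "server_port": int(env.get("GRADIO_SERVER_PORT", "7860")),
    }
```

In a Gradio app these values would typically feed `queue(max_size=...)` and `launch(max_file_size=..., server_port=...)`.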

Updated Dockerfile and entrypoint file to hopefully deal correctly with APP_MODE environment variable
7c7fd7c

seanpedrickcase committed on

Removed default custom header values so as not to cause errors
7f5a542

seanpedrickcase committed on

Moved chmod command to before user switch in Dockerfile
05c20d6

seanpedrickcase committed on

Ensure entrypoint.sh is copied
3dc1171

seanpedrickcase committed on

Modified Dockerfile, hopefully to no longer need Lambda overrides. Looking into custom headers from CloudFront to try to get them to work.
bf7bb79

seanpedrickcase committed on

Allowed for overwriting of default output folder in choose_and_run_redactor function.
68a91f4

seanpedrickcase committed on

Updated output file creation variables for Lambda direct redaction runs
e85b74e

seanpedrickcase committed on

Removed need to write result.stdout in lambda entrypoint
5d649ba

seanpedrickcase committed on

Added a little more debugging code to lambda_entrypoint
653bd2d

seanpedrickcase committed on

Created custom csvlogger to try to overcome AWS Lambda's incompatibility with multithread locks
34bd97b

seanpedrickcase committed on

Changed app_mode arg position in Dockerfile, changed default to gradio
d0b63c6

seanpedrickcase committed on

Moved entrypoint.sh creation to before user switch to avoid permission errors
7e8c1c9

seanpedrickcase committed on

Updated Dockerfile and requirements to include relevant Lambda packages
3f9e976

seanpedrickcase committed on

Moved gradio run code to outside of lambda_handler function in lambda_entrypoint.py
1cfa6e8

seanpedrickcase committed on

Switched start py file through Dockerfile to lambda_entrypoint. Added gradio links from this .py
6622361

seanpedrickcase committed on