Commit History
Added more config options. Fixed some bugs with removing elements from the review page and with Adobe export. Some UI rearrangements.
6319afc
Added features to the review dataframe to filter and exclude features based on text. Text should now appear consistently in review_df (for boxes not modified). Returned to using the larger spaCy model. Gradio upgrade.
66e145d
Redaction now runs on the whole PDF mediabox size (sometimes larger than the viewable size), and is then converted back to cropbox size for print and Adobe review. Improved some error raising and app flow.
08a3ec3
Fixed minor bug with bottom previous page button
c6b043a
Integrated AWS Comprehend and fuzzy matching functions with tabular data redaction.
ff290e1
Allowed for output files to be saved into user-specific folders. Added deny list capability to xlsx/csv file redaction
dacc782
Added python-dotenv to requirements file
f13e98b
Allowed for Textract and Comprehend API calls through AWS keys. File preparation function incorporated into main redaction function to avoid needing user to 'check in' during redaction process
391712c
Changed log files output box visibility back to False
35a1591
Fixed issues with log file list picking up logs from other file runs. Updated packages.
42180e4
Added concurrency limit to run options. Trying again to load in zoom/rotate options from gradio_image_annotator fork.
dea568f
Laid groundwork for passing in AWS API keys. Duplicate pages option should now work for pages with no text.
7907ad4
App now correctly updates custom fuzzy recognisers
82b9d9d
Reverted gradio_image_annotation version to 0.2.5 until issues with installation resolved
7917a26
Added git to the correct area in Dockerfile (build as opposed to run area)
520f2c4
Added git to Dockerfile to be able to install git-based custom gradio components
4790eb4
Fixed issues with gradio version 5.16. Fixed fuzzy search error with pages with no data.
3cecbfa
Corrected image coordinate translation when the PDF mediabox is not the same size as the PDF page rectangle
760ef5c
Zoom and rotate features from forked gradio_annotation package. Fixed csv/xlsx redaction. Updated guide on creating exe.
20d940b
Rename INDEX.md to index.md
713ca11
unverified
Sean Pedrick-Case
committed on
Merge pull request #7 from seanpedrick-case/dev
b60a0cb
Updated documentation with an advanced user guide detailing new features.
fcbaca7
Merge pull request #6 from seanpedrick-case/dev
d84e0a9
Added scikit-learn to requirements
b397d1d
Fuzzy match implementation for deny list. Added option to merge multiple review files. Review files from redaction step should now include text.
bde6e5b
Added capabilities to export to and import from Adobe .xfdf files
6b28cfa
Added tab to be able to compare pages across multiple documents and redact duplicates
a265560
Merge pull request #5 from seanpedrick-case/dev
8b4217f
Ensured the text OCR outputs have no line breaks at the end. Multi-line custom text searches are now possible. Files for review are sent from the redact button. Fixed image redaction (not review yet). Can get user pool details from headers. Gradio update.
cb349ad
Corrected large image reduction code
3518b67
Merge pull request #3 from seanpedrick-case/dev
e54931f
Dropdown choices for redactions are now listed correctly
3187788
Removed amplify folder as not used
ad2d759
Moved review components to give more space for page. Extended zoom limits. Existing redaction labels should now appear in new redaction box dropdown.
a9dcd2e
Corrected image resizing method for instances where the image is very large.
0c2987b
App should now resize images that are too large before sending them to Textract. Textract is now more robust to failure. Improved reliability of JSON conversion to review dataframe.
143e2cc
Merge pull request #2 from seanpedrick-case/dev
45c751d
Output redaction boxes can now be drawn in grey according to an environment variable. Review files are now saved every time the page is changed.
c3a8cd7
Greatly improved regex for direct matching with custom entities
6ac4be4
Uploaded PDFs with review files will now include all pages that don't have redactions. Slightly improved deny list matching.
613b1b4
Fixed bug where pages suggested for whole redaction are one lower than requested
e8681e8
Fixed bug in identifying all handwriting labels. Now only concatenates entity_type boxes if they have different labels.
0d3554e
Fixed redaction of image files
11770c9
Allowed for Cognito login in situations where client secret is not provided
eafaaed
Merge pull request #1 from seanpedrick-case/main
9013fdc
Update README.md with corrected link
9de60e6
Update README.md
addcf36