MattStammers committed on
Commit 067c9fa · 1 Parent(s): 4b17270

readme improved

Files changed (1)
  1. app.py +79 -9
app.py CHANGED
@@ -1,3 +1,5 @@
  import re
  from pathlib import Path
 
@@ -8,7 +10,17 @@ import seaborn as sns
  import yaml

  import pteredactyl as pt
- from pteredactyl.defaults import change_model # Ensure this import is correct

  sample_text = """
  1. Dr. Huntington (Patient No: 1234567890) diagnosed Ms. Alzheimer with Alzheimer's disease during her last visit to the Huntington Medical Center on 12/12/2023. The prognosis was grim, but Dr. Huntington assured Ms. Alzheimer that the facility was well-equipped to handle her condition despite the lack of a cure for Alzheimer's.
@@ -68,6 +80,9 @@ def redact(text: str, model_name: str):

  model_path = model_paths.get(model_name, "StanfordAIMI/stanford-deidentifier-base")

  if model_path:
  change_model(model_path)
  else:
@@ -256,6 +271,12 @@ def redact_and_visualize(text: str, model_name: str):
  reference_text, redacted_text
  )

  # Count entities and compute metrics
  tp_count, fn_count, fp_count, tn_count = count_entities_and_compute_metrics(
  reference_text_with_fn, redacted_text_with_fp
@@ -282,21 +303,70 @@ def redact_and_visualize(text: str, model_name: str):


  hint = """
- # Guide/Instructions

- ## How the tool works:

- When the input text is entered, the tool redacts the entered text with labelled masking tokens and then assesses the models results. You can test the text against different models by selecting from the dropdown.

  ### Strengths
- - The Stanford De-Identifier Base Model is 99% accurate on our test set of radiology reports. The others are really to illustrate its superiority.

- - This test set here was derived after lots of experimentation to make the challenge as hard as possible. It is the toughest PII benchmark we have seen so far.

  ### Limitations
- - The tool was not designed initially to redact clinic letters as it was developed primarily on radiology reports in the US. We have made some augmentations to cover postcodes but these might not always work.

- - It may overly aggressively redact text because it was built as a research tool where precision is prized > recall but the recall is also high.
  """

  description = """
@@ -304,7 +374,7 @@ description = """

  *Version:* **1.0** - Working Proof of Concept Demo with API option and webapp demonstration.

- *Authors:* **Cai Davis, Michael George, Matt Stammers**
  """

  iface = gr.Interface(
 
+ import logging
+ import logging.config
  import re
  from pathlib import Path
 
 
  import yaml

  import pteredactyl as pt
+ from pteredactyl.defaults import change_model
+
+ # Logging configuration. This is only done at root level
+ logging_config = yaml.safe_load(Path("logging.yaml").read_text())
+ logging.config.dictConfig(logging_config)
+
+ # Get the logger
+ log = logging.getLogger(__name__)
+
+ # Log app start-up
+ log.info("Starting App")

  sample_text = """
  1. Dr. Huntington (Patient No: 1234567890) diagnosed Ms. Alzheimer with Alzheimer's disease during her last visit to the Huntington Medical Center on 12/12/2023. The prognosis was grim, but Dr. Huntington assured Ms. Alzheimer that the facility was well-equipped to handle her condition despite the lack of a cure for Alzheimer's.
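The added lines read `logging.yaml` with `yaml.safe_load` and pass the resulting dict to `logging.config.dictConfig`. That file is not part of this diff, so the sketch below only assumes its shape. `LOGGING_CONFIG` is a hypothetical inline dict that sends INFO and above to the console, and it is written inline so the sketch runs without the YAML file. With the root level at INFO, `log.info("Starting App")` is emitted and the new `log.debug(...)` calls are filtered out.

```python
import logging
import logging.config

# Hypothetical stand-in for logging.yaml (not shown in this commit).
# yaml.safe_load() on an equivalent YAML file would return a dict like this.
LOGGING_CONFIG = {
    "version": 1,  # mandatory schema version for dictConfig
    "disable_existing_loggers": False,  # keep loggers created before this call
    "formatters": {
        "simple": {"format": "%(asctime)s %(name)s %(levelname)s: %(message)s"},
    },
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
            "formatter": "simple",
            "stream": "ext://sys.stderr",
        },
    },
    "root": {"level": "INFO", "handlers": ["console"]},
}

logging.config.dictConfig(LOGGING_CONFIG)
log = logging.getLogger(__name__)  # no level of its own: inherits INFO from root

log.info("Starting App")  # emitted: INFO meets the root threshold
log.debug("Final Reference Text with False Negatives:")  # filtered out at INFO
```

Changing the root `level` to `DEBUG` would surface the reference and redacted texts that the commit logs for debugging.

By default, `dictConfig` disables any logger that already exists when it runs. `app.py` configures logging after `import pteredactyl as pt`, so any module loggers that library creates at import time would be silenced unless the YAML sets `disable_existing_loggers: false`. The sketch sets that key to `False`.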
 
 
  model_path = model_paths.get(model_name, "StanfordAIMI/stanford-deidentifier-base")

+ # Log the model being changed to
+ log.info(f"Changing to model: {model_path}")
+
  if model_path:
  change_model(model_path)
  else:
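The new `log.info` call records which model `redact` switches to. Because `model_paths.get(...)` falls back to the non-empty Stanford path, the `else:` branch is reached only if a value inside `model_paths` is itself empty.

The sketch below isolates that lookup. `MODEL_PATHS`, `resolve_model_path` and the `"Example Model"` entry are hypothetical, since the real mapping and `change_model` are not shown in this diff. The sketch also passes the path as a logging argument rather than through an f-string. With this form, the message is only formatted if the record is actually emitted.

```python
import logging

log = logging.getLogger(__name__)

# Default used by redact() in app.py when the dropdown label is not found.
DEFAULT_MODEL_PATH = "StanfordAIMI/stanford-deidentifier-base"

# Hypothetical mapping from dropdown labels to model paths; the real
# model_paths dict in app.py is not part of this diff.
MODEL_PATHS = {
    "Stanford Base De-Identifier": DEFAULT_MODEL_PATH,
    "Example Model": "example-org/example-model",  # placeholder, not a real model
}


def resolve_model_path(model_name: str) -> str:
    """Return the model path for a dropdown label, falling back to the default."""
    model_path = MODEL_PATHS.get(model_name, DEFAULT_MODEL_PATH)
    # %-style arguments: formatting is deferred until the record is emitted.
    log.info("Changing to model: %s", model_path)
    return model_path
```

For example, `resolve_model_path("Unknown label")` returns `"StanfordAIMI/stanford-deidentifier-base"`.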
 
  reference_text, redacted_text
  )

+ # Log the final texts with flags for debugging
+ log.debug("Final Reference Text with False Negatives:")
+ log.debug(reference_text_with_fn)
+ log.debug("\nFinal Redacted Text with False Positives:")
+ log.debug(redacted_text_with_fp)
+
  # Count entities and compute metrics
  tp_count, fn_count, fp_count, tn_count = count_entities_and_compute_metrics(
  reference_text_with_fn, redacted_text_with_fp



  hint = """
+ ## Pteredactyl Gradio Webapp and API
+
+ Clinical patient identifiable information (cPII) presents a significant challenge in natural language processing (NLP) that has yet to be fully resolved, although significant progress is being made [1,2].
+
+ This is why we created [Pteredactyl](https://pypi.org/project/pteredactyl/), a Python module to help redact cPII from clinical free text.
+
+ ## Tool Usage Instructions
+
+ When text is entered, the tool uses NLP to replace the cPII it detects with labelled masking tokens (e.g. [PERSON]) and then assesses the model's results. You can test the text against different models by selecting one from the dropdown.
+
+ ## Deployment Options
+
+ This webapp is available online as a Gradio app on Hugging Face: [Hugging Face Gradio App](https://huggingface.co/spaces/MattStammers/pteredactyl_PII). It is also available as [source code](https://github.com/SETT-Centre-Data-and-AI/PteRedactyl) or as a [Docker image](https://registry.hub.docker.com/r/mattstammers/pteredactyl). All are MIT licensed.
+
+ Please note that the Docker image serves the app on port 7860. The image can also be built from source and run using the following commands:

+ ```bat
+ docker build -t pteredactyl:latest .
+ docker run -d -p 7860:7860 --name pteredactyl-app pteredactyl:latest
+ ```
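`docker run -d` returns as soon as the container starts, which may be before Gradio is listening. For scripted deployments, a minimal standard-library sketch can poll the mapped port until the app answers. The `wait_for_http` name and its timeouts are assumptions, not part of Pteredactyl. The `http://localhost:7860/` URL follows from the `-p 7860:7860` binding above.

```python
import http.client
import time
import urllib.error
import urllib.request


def wait_for_http(url: str, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll url until an HTTP server answers or timeout seconds have passed."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True  # the app answered successfully
        except urllib.error.HTTPError:
            return True  # the server answered, albeit with an error status
        except (OSError, http.client.HTTPException):
            # Connection refused, reset or timed out: not ready yet, so retry.
            time.sleep(interval)
    return False
```

For example, call `wait_for_http("http://localhost:7860/")` after the `docker run` command and treat `False` as a failed deployment.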
 
+ ## Information
+
+ A lot of work and experimentation has gone into the development of this tool. Because we believe in full transparency, further details are given below.
+
+
+ ### Methods
+
+ We evaluated three open-source models from Hugging Face (Stanford Base De-Identifier, Deberta PII, and Nikhilrk De-Identify) using our Clinical_PII_Redaction_Test dataset. The text was tokenised, and all entities such as [PERSON], [ID], and [LOCATION] were tagged in the gold standard. Each model redacted cPII from the clinical texts, and its output was compared with the gold-standard template to calculate the confusion matrix, accuracy, precision, recall, and F1 score.
+
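In the app, `count_entities_and_compute_metrics` returns true-positive, false-negative, false-positive and true-negative counts, but its body is not in this diff. The sketch below shows the standard confusion-matrix formulas those counts feed. `compute_metrics` and `Metrics` are hypothetical names, and the counts in the usage line are illustrative, not benchmark results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def _ratio(num: int, den: int) -> float:
    # Guard against empty denominators (e.g. a model that masks nothing).
    return num / den if den else 0.0


def compute_metrics(tp: int, fn: int, fp: int, tn: int) -> Metrics:
    """Standard confusion-matrix metrics over masked/unmasked tokens."""
    return Metrics(
        accuracy=_ratio(tp + tn, tp + tn + fp + fn),
        precision=_ratio(tp, tp + fp),  # of the tokens masked, share that were cPII
        recall=_ratio(tp, tp + fn),  # of the cPII tokens, share that were masked
        f1=_ratio(2 * tp, 2 * tp + fp + fn),  # equals 2PR / (P + R)
    )


# Illustrative counts only, in the app's (tp, fn, fp, tn) order.
m = compute_metrics(tp=47, fn=3, fp=5, tn=445)
print(m)
```

Over-redaction adds false positives and lowers precision. Missed cPII adds false negatives and lowers recall. F1 balances the two.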
+ ### Results
+
+ The full results for each model are given in <i>Table 1</i> below.
+
+ | Metric | Stanford Base De-Identifier | Deberta PII | Nikhilrk De-Identify |
+ |------------|-----------------------------|-------------|----------------------|
+ | Accuracy | 0.98 | 0.85 | 0.68 |
+ | Precision | 0.91 | 0.93 | 0.28 |
+ | Recall | 0.94 | 0.16 | 0.49 |
+ | F1 Score | 0.93 | 0.28 | 0.36 |
+ <small><i>Table 1: Summary of Model Performance Metrics</i></small>
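F1 is the harmonic mean of precision and recall, so it stays low unless both are high. This is why Deberta PII's 0.93 precision still yields an F1 of only 0.28 at 0.16 recall. The short check below recomputes F1 from the two-decimal values in Table 1. The results agree with the reported F1 to within 0.01, which is consistent with rounding of the underlying precision and recall.

```python
# Precision, recall and reported F1 from Table 1 (rounded to two decimals).
TABLE_1 = {
    "Stanford Base De-Identifier": (0.91, 0.94, 0.93),
    "Deberta PII": (0.93, 0.16, 0.28),
    "Nikhilrk De-Identify": (0.28, 0.49, 0.36),
}


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


for model, (precision, recall, reported_f1) in TABLE_1.items():
    recomputed = f1_score(precision, recall)
    print(f"{model}: recomputed F1 {recomputed:.3f}, reported {reported_f1:.2f}")
```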
 
  ### Strengths
+ - The test benchmark [Clinical_PII_Redaction_Test](https://huggingface.co/datasets/MattStammers/Clinical_PII_Redaction_Test) intentionally exploits weaknesses commonly observed in NLP cPII token-masking systems, such as similar clinician, patient and diagnosis names, and ID/username and location/postcode issues.
+
+ - [The Stanford De-Identifier Base Model](https://huggingface.co/StanfordAIMI/stanford-deidentifier-base) [1] is 99% accurate on our test set of radiology reports and achieves an F1 score of 0.93 on our challenging open-source benchmark. The other models are included mainly to demonstrate Pteredactyl's potential to deploy any transformer model.

+ - We have submitted this work to [OHDSI](https://www.ohdsi.org/) as an abstract and aim to incorporate it into a wider open-source effort to solve intractable clinical informatics problems.

  ### Limitations
+ - The tool was not initially designed to redact clinic letters, as it was developed primarily on US radiology reports. We have made some augmentations to cover elements like postcodes using checksums, but these might not always work. The same is true of NHS numbers, as illustrated above.
+
+ - It may redact text over-aggressively because it was built as a research tool where recall is prized over precision. However, in our experience this is uncommon enough that the tool is still very useful.
+
+ - This is very much a research tool and should not be relied upon as a catch-all in any production setting. The app makes its limitations transparent via the attached confusion matrix.
+
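NHS numbers carry a modulus 11 check digit, so a checksum can help separate them from other ten-digit strings. The sketch below implements the standard NHS Number check, not Pteredactyl's own code, and `is_valid_nhs_number` is a hypothetical name. Under this rule the sample text's `Patient No: 1234567890` is not a valid NHS number, because its check value works out to 10. A purely checksum-based rule would therefore not flag it.

```python
def is_valid_nhs_number(candidate: str) -> bool:
    """Return True if candidate passes the NHS Number modulus 11 check."""
    digits = candidate.replace(" ", "").replace("-", "")
    if len(digits) != 10 or not (digits.isascii() and digits.isdigit()):
        return False
    # Multiply the first nine digits by weights 10, 9, ..., 2 and sum them.
    total = sum(int(d) * w for d, w in zip(digits[:9], range(10, 1, -1)))
    check = 11 - total % 11
    if check == 11:
        check = 0  # a remainder of 0 maps to check digit 0
    if check == 10:
        return False  # a check value of 10 means the number is invalid
    return check == int(digits[9])
```

For example, `is_valid_nhs_number("943 476 5919")` returns `True`, while `is_valid_nhs_number("1234567890")` returns `False`.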
+ ### Conclusion
+ The validation cohort introduced in this study proves highly effective at discriminating between the performance of open-source cPII redaction models. Intentionally exploiting common weaknesses in clinical NLP (cNLP) token-masking systems offers a more rigorous cPII benchmark than many larger datasets provide.
+
+ We invite the open-source community to collaborate to improve on these results and enhance the robustness of cPII redaction methods by building on the work we have begun [here](https://github.com/SETT-Centre-Data-and-AI/PteRedactyl).

+ ### References
+ 1. Chambon PJ, Wu C, Steinkamp JM, Adleberg J, Cook TS, Langlotz CP. Automated deidentification of radiology reports combining transformer and “hide in plain sight” rule-based methods. J Am Med Inform Assoc. 2023 Feb 1;30(2):318–28.
+ 2. Kotevski DP, Smee RI, Field M, Nemes YN, Broadley K, Vajdic CM. Evaluation of an automated Presidio anonymisation model for unstructured radiation oncology electronic medical records in an Australian setting. Int J Med Inf. 2022 Dec 1;168:104880.
  """

  description = """
 
 
  *Version:* **1.0** - Working Proof of Concept Demo with API option and webapp demonstration.

+ *Authors:* **Matt Stammers🧪, Cai Davis🥼 and Michael George🩺**
  """

  iface = gr.Interface(