SciBERT text classification model for positive and negative results prediction in clinical psychology.

## Data
We annotated over 1,900 clinical psychology abstracts into two categories, 'positive results only' and 'mixed or negative results', and trained models using SciBERT.
The SciBERT model was validated on one in-domain data set (clinical psychology) and two out-of-domain data sets (psychotherapy). We compared model performance against Random Forest and three further benchmarks: natural language indicators of result types, *p*-values, and abstract length.
SciBERT outperformed Random Forest and all benchmarks on both the in-domain data (accuracy: 0.86) and the out-of-domain data (accuracy: 0.85-0.88).
Further information on documentation, code, and data for the preprint "Classifying Positive Results in Clinical Psychology Using Natural Language Processing" can be found in this [GitHub repository](https://github.com/schiekiera/NegativeResultDetector).

## Using the model on Hugging Face
The model can be used on Hugging Face via the "Hosted inference API" widget on the right.
Click 'Compute' to predict the class label for the example abstract or for an abstract you insert yourself.
The class label 'positive' corresponds to 'positive results only', while 'negative' represents 'mixed or negative results'.

## Using the model for larger data
```python
from datasets import load_dataset
from transformers import AutoTokenizer, Trainer, AutoModelForSequenceClassification

# 1. Load the SciBERT tokenizer
tokenizer = AutoTokenizer.from_pretrained('allenai/scibert_scivocab_uncased')

# 2. Load your data (example file name; expects a 'text' column) and tokenize it
dataset = load_dataset('csv', data_files={'inference': 'abstracts.csv'})

def preprocess_function(examples):
    return tokenizer(examples["text"],
                     truncation=True,
                     max_length=512,
                     padding='max_length')

tokenized_data = dataset.map(preprocess_function, batched=True)

# 3. Load the model
NegativeResultDetector = AutoModelForSequenceClassification.from_pretrained("ClinicalMetaScience/NegativeResultDetector")

# 4. Initialize the trainer with the model and tokenizer
trainer = Trainer(
    model=NegativeResultDetector,
    tokenizer=tokenizer,
)

# 5. Apply NegativeResultDetector for prediction on inference data
predict_test = trainer.predict(tokenized_data["inference"])
```

Further information on analyzing your own or our example data can be found in this [script](https://github.com/PsyCapsLock/PubBiasDetect/blob/main/Scripts/example_folder/Predict_Example_Abstracts_using_NegativeResultDetector.ipynb) from our [GitHub repository](https://github.com/PsyCapsLock/PubBiasDetect).
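The `predict_test` object returned by `trainer.predict` holds raw logits rather than class labels. As a minimal post-processing sketch (note that the id-to-label mapping below is an assumption for illustration; check `id2label` in the model's config before relying on it), the logits can be converted to labels like this:

```python
import numpy as np

# Assumed mapping for illustration only -- verify against the model config's id2label.
ID2LABEL = {0: "positive", 1: "negative"}

def logits_to_labels(logits):
    """Softmax the logits row-wise and map each row's argmax to a class label."""
    logits = np.asarray(logits, dtype=float)
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))  # stable softmax
    probs = exp / exp.sum(axis=1, keepdims=True)
    labels = [ID2LABEL[i] for i in probs.argmax(axis=1)]
    return labels, probs

# Example with dummy logits for two abstracts:
labels, probs = logits_to_labels([[2.0, -1.0], [-0.5, 1.5]])
```

Here the first row's larger logit is at index 0 and the second row's at index 1, so under the assumed mapping the two abstracts are labeled 'positive' and 'negative', respectively.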
72
 
73
  ## Results
74
  **Table 1** <br>
<br>

## Disclaimer
This tool was developed to analyze and predict the prevalence of positive and negative results in scientific abstracts based on the SciBERT model. While publication bias is a plausible explanation for certain patterns of results observed in the scientific literature, the analyses conducted by this tool do not conclusively establish the presence of publication bias or any other underlying factors. It is essential to understand that this tool evaluates data but does not delve into the underlying reasons for the observed trends.

The validation of this tool was conducted on primary studies from the fields of clinical psychology and psychotherapy. While it might yield insights when applied to abstracts from other fields or other types of studies (such as meta-analyses), its applicability and accuracy in such contexts have not yet been thoroughly tested. The developers of this tool are not responsible for any misinterpretation or misuse of its results, and encourage users to have a comprehensive understanding of the limitations inherent in statistical analysis and prediction models.
 