DaJulster committed on
Commit 80d5985 · 1 Parent(s): d8a9300

Add essential files for deployment

.gitattributes ADDED
@@ -0,0 +1,9 @@
+ *.faiss filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ docs/faiss/*.faiss filter=lfs diff=lfs merge=lfs -text
+ docs/faiss/*.pkl filter=lfs diff=lfs merge=lfs -text
+ *.jpg filter=lfs diff=lfs merge=lfs -text
+ *.jpeg filter=lfs diff=lfs merge=lfs -text
+ *.png filter=lfs diff=lfs merge=lfs -text
+ *.pdf filter=lfs diff=lfs merge=lfs -text
+ *.m4a filter=lfs diff=lfs merge=lfs -text
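The rules above route binary artifacts (FAISS indexes, pickles, images, PDFs, audio) through Git LFS. As a rough illustration, the sketch below checks which repository paths those rules would cover. It is a simplification, not Git's own matcher: `lfs_patterns` and `is_lfs_tracked` are hypothetical helper names, and `fnmatchcase` only approximates gitattributes pattern semantics (for example, its `*` also matches `/`).

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# The rules from the .gitattributes file added in this commit.
GITATTRIBUTES = """\
*.faiss filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
docs/faiss/*.faiss filter=lfs diff=lfs merge=lfs -text
docs/faiss/*.pkl filter=lfs diff=lfs merge=lfs -text
*.jpg filter=lfs diff=lfs merge=lfs -text
*.jpeg filter=lfs diff=lfs merge=lfs -text
*.png filter=lfs diff=lfs merge=lfs -text
*.pdf filter=lfs diff=lfs merge=lfs -text
*.m4a filter=lfs diff=lfs merge=lfs -text
"""


def lfs_patterns(text):
    """Return the path patterns whose attributes include filter=lfs."""
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        pattern, *attrs = line.split()
        if "filter=lfs" in attrs:
            patterns.append(pattern)
    return patterns


def is_lfs_tracked(path, patterns):
    """Approximate check: slash-free patterns match the file name at any
    depth; patterns containing a slash are matched against the full path."""
    name = PurePosixPath(path).name
    for pattern in patterns:
        target = path if "/" in pattern else name
        if fnmatchcase(target, pattern):
            return True
    return False


patterns = lfs_patterns(GITATTRIBUTES)
for path in ("docs/faiss/index.faiss", "docs/faiss/document_lookup.txt"):
    print(path, is_lfs_tracked(path, patterns))
```

Under this approximation the two `docs/faiss/` rules are already covered by the slash-free `*.faiss` and `*.pkl` rules, which match at any depth.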
docs/faiss/document_lookup.txt ADDED
@@ -0,0 +1,463 @@
+ Document 0:
+ Source: https://julien-ser.github.io/JulienSerbanescu/
+ Type: Unknown
+ Content Preview: Julien Serbanescu
+
+
+
+
+ Julien Serbanescu...
+ --------------------------------------------------------------------------------
+ Document 1:
+ Source: docs/pdfs\paper.pdf
+ Type: Unknown
+ Content Preview: UnAnswGen: A Systematic Approach for Generating
+ Unanswerable Questions in Machine Reading Comprehens...
+ --------------------------------------------------------------------------------
+ Document 2:
+ Source: docs/pdfs\paper.pdf
+ Type: Unknown
+ Content Preview: Unlike existing datasets like SQuAD2.0, which do not account for
+ the reasons behind question unanswe...
+ --------------------------------------------------------------------------------
+ Document 3:
+ Source: docs/pdfs\paper.pdf
+ Type: Unknown
+ Content Preview: query reformulation. The resulting UnAnswGen dataset and asso-
+ ciated software workflow are made pub...
+ --------------------------------------------------------------------------------
+ Document 4:
+ Source: docs/pdfs\paper.pdf
+ Type: Unknown
+ Content Preview: on the first page. Copyrights for components of this work owned by others than the
+ author(s) must be...
+ --------------------------------------------------------------------------------
+ Document 5:
+ Source: docs/pdfs\paper.pdf
+ Type: Unknown
+ Content Preview: Development in Information Retrieval in the Asia Pacific Region (SIGIR-AP
+ ’24), December 9–12, 2024,...
+ --------------------------------------------------------------------------------
+ Document 6:
+ Source: docs/pdfs\paper.pdf
+ Type: Unknown
+ Content Preview: should avoid responding rather than making uncertain guesses,
+ demonstrating their language comprehen...
+ --------------------------------------------------------------------------------
+ Document 7:
+ Source: docs/pdfs\paper.pdf
+ Type: Unknown
+ Content Preview: systems advance to meet the complexity of real-world information
+ needs, there is an increasing deman...
+ --------------------------------------------------------------------------------
+ Document 8:
+ Source: docs/pdfs\paper.pdf
+ Type: Unknown
+ Content Preview: SIGIR-AP ’24, December 9–12, 2024, Tokyo, Japan Hadiseh Moradisani, Fattane Zarrinkalam, Julien Serb...
+ --------------------------------------------------------------------------------
+ Document 9:
+ Source: docs/pdfs\paper.pdf
+ Type: Unknown
+ Content Preview: SQuAD2-CR
+ [17] Wikip
+ edia Cr
+ owdsourcing (86,821
+ - 43,498) 6 [19] 2020
+ Dur
+ eader [11] Chinese
+ search...
+ --------------------------------------------------------------------------------
+ Document 10:
+ Source: docs/pdfs\paper.pdf
+ Type: Unknown
+ Content Preview: instance, it contains only 3,350 unanswerable questions labeled
+ with No Information. Moreover, as th...
+ --------------------------------------------------------------------------------
+ Document 11:
+ Source: docs/pdfs\paper.pdf
+ Type: Unknown
+ Content Preview: ing unanswerable questions and enables the exploration of various
+ causes of unanswerability. The onl...
+ --------------------------------------------------------------------------------
+ Document 12:
+ Source: docs/pdfs\paper.pdf
+ Type: Unknown
+ Content Preview: unanswerable questions into answerable ones.
+ To develop a multi-label MRC dataset with unanswerable ...
+ --------------------------------------------------------------------------------
+ Document 13:
+ Source: docs/pdfs\paper.pdf
+ Type: Unknown
+ Content Preview: mation for each input question. Second, the generated candidate
+ unanswerable questions are evaluated...
+ --------------------------------------------------------------------------------
+ Document 14:
+ Source: docs/pdfs\paper.pdf
+ Type: Unknown
+ Content Preview: The advantages of our work are twofold: (1) Our implementation
+ of the proposed software workflow all...
+ --------------------------------------------------------------------------------
+ Document 15:
+ Source: docs/pdfs\paper.pdf
+ Type: Unknown
+ Content Preview: of SQuAD2.0 that includes multi-labeled unanswerable questions.
+ Figure 1 presents the overview of ou...
+ --------------------------------------------------------------------------------
+ Document 16:
+ Source: docs/pdfs\paper.pdf
+ Type: Unknown
+ Content Preview: UnAnswGen: A Systematic Approach for Generating Unanswerable Questions in Machine Reading Comprehens...
+ --------------------------------------------------------------------------------
+ Document 17:
+ Source: docs/pdfs\paper.pdf
+ Type: Unknown
+ Content Preview: linguistic dimensions such as entity swap, number swap, negation,
+ antonym, mutual exclusion, and no ...
+ --------------------------------------------------------------------------------
+ Document 18:
+ Source: docs/pdfs\paper.pdf
+ Type: Unknown
+ Content Preview: passage, and its candidate unanswerable questions. We have imple-
+ mented and integrated a comprehens...
+ --------------------------------------------------------------------------------
+ Document 19:
+ Source: docs/pdfs\paper.pdf
+ Type: Unknown
+ Content Preview: placement: For each entity in the input answerable question, we
+ replace it with another entity of th...
+ --------------------------------------------------------------------------------
+ Document 20:
+ Source: docs/pdfs\paper.pdf
+ Type: Unknown
+ Content Preview: corresponding context.
+ Number Swap. Number Swap involves modifying a question to
+ potentially render ...
+ --------------------------------------------------------------------------------
+ Document 21:
+ Source: docs/pdfs\paper.pdf
+ Type: Unknown
+ Content Preview: Time magazine named her one of the most 100 influential people of the
+ century? could be Time magazin...
+ --------------------------------------------------------------------------------
+ Document 22:
+ Source: docs/pdfs\paper.pdf
+ Type: Unknown
+ Content Preview: (2) Replacement: Replace each identified word with its antonym,
+ ensuring the modified question remai...
+ --------------------------------------------------------------------------------
+ Document 23:
+ Source: docs/pdfs\paper.pdf
+ Type: Unknown
+ Content Preview: SIGIR-AP ’24, December 9–12, 2024, Tokyo, Japan Hadiseh Moradisani, Fattane Zarrinkalam, Julien Serb...
+ --------------------------------------------------------------------------------
+ Document 24:
+ Source: docs/pdfs\paper.pdf
+ Type: Unknown
+ Content Preview: of her music?, utilizing the Detection and Removal approach might
+ lead to a question such as Beyoncé...
+ --------------------------------------------------------------------------------
+ Document 25:
+ Source: docs/pdfs\paper.pdf
+ Type: Unknown
+ Content Preview: formation available in the given context, the question becomes
+ inherently unanswerable. For instance...
+ --------------------------------------------------------------------------------
+ Document 26:
+ Source: docs/pdfs\paper.pdf
+ Type: Unknown
+ Content Preview: No Information. Similar to [33], to modify the original answer-
+ able questions by considering this c...
+ --------------------------------------------------------------------------------
+ Document 27:
+ Source: docs/pdfs\paper.pdf
+ Type: Unknown
+ Content Preview: California is also home to a large homegrown surf and skateboard cul-
+ ture.... This method ensures t...
+ --------------------------------------------------------------------------------
+ Document 28:
+ Source: docs/pdfs\paper.pdf
+ Type: Unknown
+ Content Preview: 𝑎𝑖 , and for each candidate unanswerable question (𝑐𝑗,𝑙𝑗 ) ∈𝐶𝑞𝑖 , we
+ conduct the following evaluatio...
+ --------------------------------------------------------------------------------
+ Document 29:
+ Source: docs/pdfs\paper.pdf
+ Type: Unknown
+ Content Preview: 𝑞𝑖 , denoted as 𝑞′
+ 𝑖 , to 𝑈𝑞𝑖 , and attribute 𝑙𝑗 as the reason for the
+ unanswerability of 𝑞′
+ 𝑖 .
+ The...
+ --------------------------------------------------------------------------------
+ Document 30:
+ Source: docs/pdfs\paper.pdf
+ Type: Unknown
+ Content Preview: are answerable. This dataset, developed through crowdsourcing,
+ consists of a training set with 130,3...
+ --------------------------------------------------------------------------------
+ Document 31:
+ Source: docs/pdfs\paper.pdf
+ Type: Unknown
+ Content Preview: swerable candidate questions from a single modification process.
+ Consequently, from the 86,821 answe...
+ --------------------------------------------------------------------------------
+ Document 32:
+ Source: docs/pdfs\paper.pdf
+ Type: Unknown
+ Content Preview: UnAnswGen: A Systematic Approach for Generating Unanswerable Questions in Machine Reading Comprehens...
+ --------------------------------------------------------------------------------
+ Document 33:
+ Source: docs/pdfs\paper.pdf
+ Type: Unknown
+ Content Preview: ele
+ ctra-base-squad2 74.8 84.7 84.7 67.9 87.8 93.5 72.2 81.6 89.9
+ r
+ oberta-large-squad 78.7 90 90 69...
+ --------------------------------------------------------------------------------
+ Document 34:
+ Source: docs/pdfs\paper.pdf
+ Type: Unknown
+ Content Preview: questions by returning a null or empty string when no appropri-
+ ate answer is found within the conte...
+ --------------------------------------------------------------------------------
+ Document 35:
+ Source: docs/pdfs\paper.pdf
+ Type: Unknown
+ Content Preview: score indicating the model’s certainty in its provided answer. For
+ unanswerable questions, these mod...
+ --------------------------------------------------------------------------------
+ Document 36:
+ Source: docs/pdfs\paper.pdf
+ Type: Unknown
+ Content Preview: Table 4: Statistics on UnAnswGen dataset.
+ Unansw
+ erability Classes #
+ of Questions Per
+ centage A
+ vera...
+ --------------------------------------------------------------------------------
+ Document 37:
+ Source: docs/pdfs\paper.pdf
+ Type: Unknown
+ Content Preview: of the final unanswerable question set. Specifically, questions from
+ the Negation category account f...
+ --------------------------------------------------------------------------------
+ Document 38:
+ Source: docs/pdfs\paper.pdf
+ Type: Unknown
+ Content Preview: under the Negation label, 31.95 unanswerable questions under the
+ Antonym label, and only 2.6 unanswe...
+ --------------------------------------------------------------------------------
+ Document 39:
+ Source: docs/pdfs\paper.pdf
+ Type: Unknown
+ Content Preview: indicates that the question is completely unrelated to the context,
+ whereas a score of 1 indicates s...
+ --------------------------------------------------------------------------------
+ Document 40:
+ Source: docs/pdfs\paper.pdf
+ Type: Unknown
+ Content Preview: SIGIR-AP ’24, December 9–12, 2024, Tokyo, Japan Hadiseh Moradisani, Fattane Zarrinkalam, Julien Serb...
+ --------------------------------------------------------------------------------
+ Document 41:
+ Source: docs/pdfs\paper.pdf
+ Type: Unknown
+ Content Preview: SQuAD2-CR+UnAnsw
+ Gen 71.93 71.93 92.48 96.09 51.43 58.27 46.83 63.78 69.17 81.77 64.86 78.69 86.8 92...
+ --------------------------------------------------------------------------------
+ Document 42:
+ Source: docs/pdfs\paper.pdf
+ Type: Unknown
+ Content Preview: to evaluate the UnAnswerGen dataset against these criteria. Table
+ 6 presents the results of Krippend...
+ --------------------------------------------------------------------------------
+ Document 43:
+ Source: docs/pdfs\paper.pdf
+ Type: Unknown
+ Content Preview: CR, which already includes multi-class labeling of unanswerable
+ questions, and (2) the SQuAD2-CR tra...
+ --------------------------------------------------------------------------------
+ Document 44:
+ Source: docs/pdfs\paper.pdf
+ Type: Unknown
+ Content Preview: underwent training on both the enhanced SQuAD2.0+ UnAnswGen
+ and the original SQuAD2-CR datasets, wit...
+ --------------------------------------------------------------------------------
+ Document 45:
+ Source: docs/pdfs\paper.pdf
+ Type: Unknown
+ Content Preview: BERTa, and 1% for Electra were observed. The balanced dataset
+ successfully mitigates issues related ...
+ --------------------------------------------------------------------------------
+ Document 46:
+ Source: docs/pdfs\paper.pdf
+ Type: Unknown
+ Content Preview: hanced MRC datasets, with a focus on including multi-label unan-
+ swerable questions. We have develop...
+ --------------------------------------------------------------------------------
+ Document 47:
+ Source: docs/pdfs\paper.pdf
+ Type: Unknown
+ Content Preview: flow to enrich other datasets, such as HotPotQA [35] and Natural
+ Questions [16], with multi-label un...
+ --------------------------------------------------------------------------------
+ Document 48:
+ Source: docs/pdfs\paper.pdf
+ Type: Unknown
+ Content Preview: UnAnswGen: A Systematic Approach for Generating Unanswerable Questions in Machine Reading Comprehens...
+ --------------------------------------------------------------------------------
+ Document 49:
+ Source: docs/pdfs\paper.pdf
+ Type: Unknown
+ Content Preview: 3115–3119.
+ [4] Christopher Clark and Matt Gardner. 2017. Simple and effective multi-paragraph
+ readin...
+ --------------------------------------------------------------------------------
+ Document 50:
+ Source: docs/pdfs\paper.pdf
+ Type: Unknown
+ Content Preview: ciation for Computational Linguistics: EMNLP 2023. 7349–7360.
+ [9] Kilem L Gwet. 2011. On the Krippen...
+ --------------------------------------------------------------------------------
+ Document 51:
+ Source: docs/pdfs\paper.pdf
+ Type: Unknown
+ Content Preview: questions. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33.
+ 6529–6537.
+ [13...
+ --------------------------------------------------------------------------------
+ Document 52:
+ Source: docs/pdfs\paper.pdf
+ Type: Unknown
+ Content Preview: [16] Tom Kwiatkowski, Jennimaria Palomaki, Olivia Redfield, Michael Collins, Ankur
+ Parikh, Chris Alb...
+ --------------------------------------------------------------------------------
+ Document 53:
+ Source: docs/pdfs\paper.pdf
+ Type: Unknown
+ Content Preview: 2022. Ptau: Prompt tuning for attributing unanswerable questions. In Proceedings
+ of the 45th Interna...
+ --------------------------------------------------------------------------------
+ Document 54:
+ Source: docs/pdfs\paper.pdf
+ Type: Unknown
+ Content Preview: Answer Networks for Machine Reading Comprehension. In Proceedings of the
+ 56th Annual Meeting of the ...
+ --------------------------------------------------------------------------------
+ Document 55:
+ Source: docs/pdfs\paper.pdf
+ Type: Unknown
+ Content Preview: Majumder, and Li Deng. 2016. Ms marco: A human-generated machine reading
+ comprehension dataset. (201...
+ --------------------------------------------------------------------------------
+ Document 56:
+ Source: docs/pdfs\paper.pdf
+ Type: Unknown
+ Content Preview: Unanswerable questions for SQuAD. arXiv preprint arXiv:1806.03822 (2018).
+ [30] Pranav Rajpurkar, Jia...
+ --------------------------------------------------------------------------------
+ Document 57:
+ Source: docs/pdfs\paper.pdf
+ Type: Unknown
+ Content Preview: 7th CCF International Conference, NLPCC 2018, Hohhot, China, August 26–30, 2018,
+ Proceedings, Part I...
+ --------------------------------------------------------------------------------
+ Document 58:
+ Source: docs/pdfs\paper.pdf
+ Type: Unknown
+ Content Preview: [36] Changchang Zeng, Shaobo Li, Qin Li, Jie Hu, and Jianjun Hu. 2020. A survey
+ on machine reading c...
+ --------------------------------------------------------------------------------
+ Document 59:
+ Source: docs/pdfs\resume.pdf
+ Type: Unknown
+ Content Preview: Julien Serbanescu
+ 437-260-3435 | [email protected] | linkedin.com/in/julien-serbanescu-6ba52a241 ...
+ --------------------------------------------------------------------------------
+ Document 60:
+ Source: docs/pdfs\resume.pdf
+ Type: Unknown
+ Content Preview: Innovation/Creativity, Technical Communication, Mentoring
+ Education
+ Computer Engineering Co-op Major...
+ --------------------------------------------------------------------------------
+ Document 61:
+ Source: docs/pdfs\resume.pdf
+ Type: Unknown
+ Content Preview: on publications. /external-link-altUtilized various NLP methods such as NLTK and SpaCy in Python to ...
+ --------------------------------------------------------------------------------
+ Document 62:
+ Source: docs/pdfs\resume.pdf
+ Type: Unknown
+ Content Preview: application (Windows EXE) for cybersecurity threat detection and testing
+ Organizations
+ Guelph AI Clu...
+ --------------------------------------------------------------------------------
+ Document 63:
+ Source: docs/pdfs\resume.pdf
+ Type: Unknown
+ Content Preview: • /external-link-altLed a team to develop an AI assistant inspired by Jarvis from Iron Man, ensuring...
+ --------------------------------------------------------------------------------
+ Document 64:
+ Source: docs/pdfs\resume.pdf
+ Type: Unknown
+ Content Preview: • Developing robotics software using Docker and Linux, implementing Python-based control for Webots
+ ...
+ --------------------------------------------------------------------------------
+ Document 65:
+ Source: docs/pdfs\resume.pdf
+ Type: Unknown
+ Content Preview: codes and provide medicine information, working on backend, delegating frontend and bridging
+ • /exte...
+ --------------------------------------------------------------------------------
+ Document 66:
+ Source: docs/pdfs\resume.pdf
+ Type: Unknown
+ Content Preview: with a ReactJS frontend dashboard for configuring and managing academic research projects
+ GAN to Gen...
+ --------------------------------------------------------------------------------
+ Document 67:
+ Source: BinThere.ai.m4a
+ Type: audio_transcription
+ Content Preview: All right, so today we're going to be quickly demoing binthere.ai. Now, what this does, it will det...
+ --------------------------------------------------------------------------------
+ Document 68:
+ Source: BinThere.ai.m4a
+ Type: audio_transcription
+ Content Preview: list it as biodegradable piece. So if I lower it down just because it's getting the white backgroun...
+ --------------------------------------------------------------------------------
+ Document 69:
+ Source: BinThere.ai.m4a
+ Type: audio_transcription
+ Content Preview: This is our new max score. And then as I listed there, and then it says it typically goes in a comp...
+ --------------------------------------------------------------------------------
+ Document 70:
+ Source: Synthia by Nuvela-AI.m4a
+ Type: audio_transcription
+ Content Preview: This is the user interface review of Synthia, which is a service that makes research papers smarter...
+ --------------------------------------------------------------------------------
+ Document 71:
+ Source: Synthia by Nuvela-AI.m4a
+ Type: audio_transcription
+ Content Preview: load in certain fragments of other papers that have relevant pieces of information to what exactly...
+ --------------------------------------------------------------------------------
+ Document 72:
+ Source: Synthia by Nuvela-AI.m4a
+ Type: audio_transcription
+ Content Preview: machine reading comprehension in order to find user sentiment. Something like that. And then we...
+ --------------------------------------------------------------------------------
+ Document 73:
+ Source: Synthia by Nuvela-AI.m4a
+ Type: audio_transcription
+ Content Preview: be another tool that the model context protocol system using Anthropic would actually be able to e...
+ --------------------------------------------------------------------------------
+ Document 74:
+ Source: Synthia by Nuvela-AI.m4a
+ Type: audio_transcription
+ Content Preview: to use. And it helps just make research smarter, more efficient, and better for users overall. No...
+ --------------------------------------------------------------------------------
+ Document 75:
+ Source: Synthia by Nuvela-AI.m4a
+ Type: audio_transcription
+ Content Preview: on our existing formatted proxy LaTeX code. So that's my overview for our Synthia front-end or...
+ --------------------------------------------------------------------------------
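The lookup file follows a regular layout: a `Document N:` header, `Source:` and `Type:` lines, a `Content Preview:` that may span several lines, and a dashed separator. A minimal sketch of reading that layout back into records is shown below. The `LookupEntry` class and `parse_lookup` function are hypothetical names chosen for illustration and are not part of the committed code.

```python
import re
from dataclasses import dataclass


@dataclass
class LookupEntry:
    index: int
    source: str
    doc_type: str
    preview: str


# A separator is a line made only of dashes; 40+ is an assumed lower bound
# so that hyphens inside a preview are not mistaken for a separator.
_SEPARATOR = re.compile(r"^-{40,}\s*$", re.MULTILINE)
_ENTRY = re.compile(
    r"Document (\d+):\n"
    r"Source: ([^\n]*)\n"
    r"Type: ([^\n]*)\n"
    r"Content Preview: (.*)",
    re.DOTALL,  # the preview may run over several lines
)


def parse_lookup(text):
    """Split the lookup text on separator lines and parse each block."""
    entries = []
    for block in _SEPARATOR.split(text):
        match = _ENTRY.match(block.strip())
        if match is None:
            continue  # skip empty or malformed blocks
        index, source, doc_type, preview = match.groups()
        entries.append(
            LookupEntry(int(index), source.strip(), doc_type.strip(), preview.strip())
        )
    return entries
```

Grouping the parsed entries by `source` gives a quick count of chunks per input file, which is one way to sanity-check an index after rebuilding it.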
docs/faiss/index.faiss ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ba19108639e5fbf109bd22d4b856c4b50c6b1a91302f07bab21215029ed95b83
+ size 311341
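Files matched by the LFS rules appear in the diff as three-line pointer files (`version`, `oid`, `size`) rather than their binary contents. The sketch below parses such a pointer and checks whether a downloaded blob matches it. `parse_lfs_pointer` and `matches_pointer` are hypothetical helpers, and only the three keys shown in this commit are handled.

```python
import hashlib

POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:ba19108639e5fbf109bd22d4b856c4b50c6b1a91302f07bab21215029ed95b83
size 311341
"""


def parse_lfs_pointer(text):
    """Parse 'key value' lines of an LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    hash_algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "oid": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data, pointer):
    """Return True if the bytes have the size and SHA-256 digest the pointer records."""
    if pointer["hash_algo"] != "sha256":
        raise ValueError("unsupported hash algorithm: " + pointer["hash_algo"])
    return (
        len(data) == pointer["size"]
        and hashlib.sha256(data).hexdigest() == pointer["oid"]
    )


pointer = parse_lfs_pointer(POINTER)
print(pointer["size"])
```

A check like this can be useful at deploy time: if the server received the pointer text instead of the real `index.faiss`, the size comparison fails immediately.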
docs/faiss/index.pkl ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9ac8f74a32c54fd338376934a9cd7b6691f26569029ff505181dbaa7d401e675
+ size 79972
docs/faiss/metadata.pkl ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:30f47cde2aa55557b2170a4497e54ba46aeb4e65b04834da3397dad759807006
+ size 1134
docs/pdfs/paper.pdf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ad1c9232f3adff51a2b33725926753212e4ba90ffc0f0e6d0ba44224dae07ce3
+ size 1000812
docs/pdfs/resume.pdf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3ff52c5befd12bbe850e9ab81e9844c71bd4557a72d27fea36c11252f24dbba9
+ size 135227
docs/youtube/BinThere.ai.m4a ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5c8f178deba8ddb2eb8cbdf2a60e8b4d4a4124891188ec0084d9f55d4675f252
+ size 2210006
docs/youtube/BinThere.ai_transcript.txt ADDED
@@ -0,0 +1 @@
+ All right, so today we're going to be quickly demoing binthere.ai. Now, what this does, it will detect your garbage and it will categorize it and also give you suggestions to reuse it in a fun, interactive way. So let's just get right into it and see what it does. So first, we have a streamlined interface. We change the colors and make it seamlessly fit in with the background for starters. But let's get right into the detection. So this will actually detect all types of waste live. And we're running a model that we got from YOLO. So if I go down on a white background, this works better as it contrasts perfectly. But for example, if I put a bottle in, it will detect it as plastic as that kind of waste. And we'll have that section there. If I replace it with say this apple, it will list it as biodegradable piece. So if I lower it down just because it's getting the white background, it will detect that. And then when I press stop detection, it will stop detection and save the last known item that was available, which was biodegradable. Now from here, we can press analyze and it will give us some insights towards this. So it takes a second because we're using OpenAI's API to do this. But once we get the results, you will see the text below. So it tells us the energy saved. And it also, you could get a source where we kind of did the calculation a bit. And that's where it is. So if we go back, I have to press analyze again. It'll take a second, but yeah, I should open on new tab. But should give us the results quickly. Yeah, exactly. Joules of energy saved. This is our new max score. And then as I listed there, and then it says it typically goes in a compost bin. We check local guidelines. It gives us options of what to do. Avoid contaminants in it as well. It also gives us local laws in the city as well. So it's a bit more applicable to the user than say other existing technologies that are similar. We also have ways to reuse it. For example, compost at home, making apple cider vinegar, feeding it to animals or livestock. They help reduce waste. And it gives you more constructive ways than just simply throwing it out in the right bin. So we actually want to add some extra steps to you. So you could actually reach your energy saving amount. And yeah, that was a quick demo of binthere.ai. I hope you enjoyed.
docs/youtube/Synthia by Nuvela-AI.m4a ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b36fac7c00c1d10b03f573acb806315037ad8a8b666b630697d96d8c86195200
+ size 4736121
docs/youtube/Synthia by Nuvela-AI_transcript.txt ADDED
@@ -0,0 +1 @@
+ This is the user interface review of Synthia, which is a service that makes research papers smarter through AI, specifically model context protocols. So first, let's create a project. We're going to create our paper. So I have this previously loaded example where it's unanswered building and question answering frameworks. That'll be my name. And this will be the summary of what the paper entails. This is similar to a task I worked over the summer. So this is a bit relating to me, and I understand this task a bit more. Let's create this new project. So as we create a new project, we'll see it's under list of papers. We have our details. And then under fragments, this is where the model context protocol really sets in. As based on a Pinecone database, it will be able to load in certain fragments of other papers that have relevant pieces of information to what exactly we're looking for. For example, this fragment, this segment of this introduction to unanswered building paper is going to be very important in our papers. So it suggests we can use that. Then we have causes of hallucination in LLMs, hallucination of large language models, et cetera. And they all just have their own unique features and in fragments. So we start with 10 by default. You could add your own fragments. You could also delete existing fragments, like I deleted those two. Let's say I want to do MRC sentiment analysis. And then we're going to enter it in. Author, we're going to do Mark Noble. Let's just say Mark Noble. And then year 2015, summary using machine reading comprehension in order to find user sentiment. Something like that. And then we'll just enter a certain, like a dev post link or we'll put in a GitHub link for now. If we add the fragment, you'll see it's here. And if we go to source, it will be accessed. Now, these other sources, they are just proxy papers for now. But as we implement it and we load our Pinecone database, it will have actual papers within it. Now from here, there are many things we could do. Let's start with generating citations. So this will take our existing paper fragments. And it will show us, oh, based on, they have predefined authors and years. And based on that, we loaded them in a citation form. And now, an LLM would help us use different methods as well. And this would also be another tool that the model context protocol system using Anthropic would actually be able to execute within it. So as you see, these are all citations we have. We also have another feature called source analysis where, again, we do one more semantic match with our project summary and title. And we semantically match with each of these fragments, user added and normal. And then we would get sort of a percent match. Like, oh, for example, using the SQuAD 2.0 database is 94% accurate to what we're actually detailing with our paper, detecting unanswerable questions, similar story there. As you can see, it's a very robust system that gives more insights into the paper author as to, oh, here's some sources that you can use. Here's how accurate they are to what you want to use. And it helps just make research smarter, more efficient, and better for users overall. Now, this last feature, it gives you a generated paper. It will generate some LaTeX code. And it'll also show you a preview of what this may look like otherwise. Again, this is sort of proxy data based off of our existing fragments. And another tool set from a model context protocol will be able to do this. Again, we also implement a Cohere embedding as well, just to make sure we have a vector database so every semantic match would work perfectly. Even this generation would work well as well. But I think a more general LLM would be more implemented here. And finally, for a PDF preview, it also gives us our actual, or an actual PDF-looking document of this sort based on our existing formatted proxy LaTeX code. So that's my overview for our Synthia front-end or user experience throughout the process. Obviously, the backend would make this a bit more robust and customizable towards the user, using features such as model context protocol, Cohere embedding, Pinecone database storage. And obviously, running it on Anthropic would be the thing with model context protocol. Thank you very much for listening.
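The transcript describes a "source analysis" feature that semantically matches the project summary against each fragment and reports a percent match. The transcript does not say how that score is computed. One common approach, sketched here purely as an assumption, is cosine similarity between embedding vectors scaled to a percentage. `cosine_similarity` and `percent_match` are hypothetical names, and the vectors are made-up stand-ins for real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


def percent_match(project_vec, fragment_vec):
    """Scale similarity to 0-100, clamping negative similarity to zero."""
    return round(max(0.0, cosine_similarity(project_vec, fragment_vec)) * 100, 1)


# Toy vectors standing in for embeddings of a project summary and a fragment.
print(percent_match([0.2, 0.7, 0.1], [0.25, 0.6, 0.2]))
```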
gunicorn_config.py ADDED
@@ -0,0 +1,32 @@
+ # Gunicorn configuration file
+ import multiprocessing
+
+ # Worker processes
+ workers = multiprocessing.cpu_count() * 2 + 1
+ worker_class = 'sync'
+ worker_connections = 1000  # ignored by the 'sync' worker class
+
+ # Timeouts
+ timeout = 120  # 2 minutes
+ graceful_timeout = 120
+ keepalive = 5
+
+ # Logging
+ accesslog = '-'  # '-' writes the access log to stdout
+ errorlog = '-'  # '-' writes the error log to stderr
+ loglevel = 'info'
+
+ # Process naming
+ proc_name = 'julien-serbanescu-app'
+
+ # Server mechanics
+ daemon = False
+ pidfile = None
+ umask = 0
+ user = None
+ group = None
+ tmp_upload_dir = None
+
+ # SSL
+ keyfile = None
+ certfile = None
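The `workers` line uses the common `(2 x cores) + 1` heuristic. Each sync worker is a separate process, so an application that loads a FAISS index at startup would hold one copy per worker. The sketch below shows one way the count could be capped on memory-constrained hosts. It is an illustration only: `worker_count` and the `MAX_WORKERS` environment variable are hypothetical and do not appear in the committed file.

```python
import multiprocessing
import os


def worker_count(cpu_count=None, cap=None):
    """Return (2 * cores) + 1, optionally limited to `cap` (never below 1)."""
    cores = cpu_count if cpu_count is not None else multiprocessing.cpu_count()
    suggested = cores * 2 + 1
    if cap is None:
        return suggested
    return max(1, min(suggested, cap))


# MAX_WORKERS is an assumed, optional override read from the environment.
_cap = os.environ.get("MAX_WORKERS")
workers = worker_count(cap=int(_cap) if _cap else None)
```

With no override set, the result is identical to the expression in the committed configuration.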