<!DOCTYPE html>
<html>

<head>
  <meta charset="utf-8" />
  <meta name="viewport" content="width=device-width" />
  <title>EdgeTA</title>
  <link rel="stylesheet" href="style.css" />
</head>

<body>
  <div class="card">
    <h1>EdgeTA: Retraining Multiple Foundation Models<br>for Evolving Data at Edge</h1>

    <h2>Table of Contents</h2>
    <ul>
      <li><a href="#abstract">1. Introduction</a></li>
      <li><a href="#installation">2. Code and Installation</a></li>
      <li><a href="#vit">3. Running Example 1: Supporting a Hugging Face FM Vision Transformer</a></li>
      <li><a href="#clip">4. Running Example 2: Supporting a Hugging Face FM CLIP</a></li>
      <li><a href="#sam">5. Running Example 3: Supporting a user-specified FM SAM</a></li>
      <li><a href="#glip">6. Running Example 4: Supporting a user-specified FM GLIP</a></li>
      <li><a href="#gpt-neo">7. Running Example 5: Supporting a Hugging Face FM GPT-Neo</a></li>
      <li><a href="#roberta">8. Running Example 6: Supporting a Hugging Face FM Roberta</a></li>
      <li>
        <a href="#implementation">9. Implementation (Development API Documentation)</a>
        <ul>
          <li><a href="#hugging-face-model">9.1 Supporting a Hugging Face FM</a></li>
          <li><a href="#user-specified-model">9.2 Supporting a user-specified FM</a></li>
        </ul>
      </li>
      <li>
        <a href="#evaluation">10. Experimental evaluation in TMC 2024 submission</a>
        <ul>
          <li><a href="#101">10.1 Basic settings</a></li>
          <li><a href="#102">10.2 Additional details</a></li>
          <li><a href="#103">10.3 Additional experiment results <span style="font-weight: bold;">(applying EdgeTA in SOTA FMs: CLIP, SAM, GLIP, GPT-Neo and Roberta)</span></a></li>
        </ul>
      </li>
      
    </ul>

    <h2 id="abstract">1. Introduction</h2>
    <p>Foundation models (FMs) such as large language models are the driving force of next-generation artificial
      intelligence systems. The trend of deploying FMs at the edge challenges their scaling potential when encountering
      massive new input data under compressed model sizes and constrained device resources. The prior art sheds light on
      learning new tasks and domains (data feature shifts) based on deployed networks. However, such learning approaches
      exacerbate the existing limitations: (i) predetermined network architectures lower model accuracy, and (ii) fixed
      model sizes hinder resource allocation optimization at a finer granularity.</p>
    <p>In this paper, we propose EdgeTA, a lightweight, neuron-grained scaling solution to unlock FMs' scaling potency
      in edge intelligence systems. EdgeTA achieves high accuracy and low overheads in model retraining by adaptively
      transforming an FM into a compact model that retains the neurons most important to the current input data. At
      run-time, EdgeTA determines optimal model sizes and assigned resources for multiple applications to maximize their
      overall accuracy. We implement EdgeTA on prevalent FMs for natural language processing, computer vision and
      multimodal applications and compare it against state-of-the-art techniques. Evaluation results show that our
      approach improves accuracy by 21.88% while reducing memory footprint and energy consumption by 27.14% and 65.65%,
      respectively, and further achieves a 15.96% overall accuracy improvement via neuron-grained resource scheduling.</p>

    <h2 id="installation">2. Code and Installation</h2>
    <p>The code is released in <a href="https://huggingface.co/spaces/LINC-BIT/EdgeTA/tree/main" target="_blank">https://huggingface.co/spaces/LINC-BIT/EdgeTA/tree/main</a>. You can use the "git clone" command to clone this repository:</p>
    <code>git clone https://huggingface.co/spaces/LINC-BIT/EdgeTA</code>
    <p>The directory structure is organized as below:</p>
    <ul>
      <li>data: it contains the dataset implementations</li>
      <li>dnns: it contains the model implementations</li>
      <li>experiments: it contains the files to launch the experiments in the submitted paper</li>
      <li>methods: it contains the EdgeTA implementation</li>
      <li>new_impl: it applies EdgeTA to several SOTA FMs: CLIP, SAM, GLIP, GPT-Neo and Roberta</li>
      <li>utils: it contains several utility packages implemented by us</li>
    </ul>

    <h3>2.1 Requirements</h3>
    <ul>
      <li>Linux and Windows</li>
      <li>Python 3.8+</li>
      <li>CUDA 10.2+</li>
    </ul>

    <h3>2.2 Preparing Environment</h3>
    <p>First, create a conda virtual environment and activate it:</p>
    <code>
        conda create -n EdgeTA python=3.8<br>
        conda activate EdgeTA
      </code>
    <p>Second, install torch and torchvision according to the <a href="https://pytorch.org/get-started/locally/">official
        site</a>.</p>
    <img src="https://user-images.githubusercontent.com/73862727/146364503-5664de5b-24b1-4a85-b342-3d061cd7563f.png" />
    <p>Get the installation command according to your selections on the official site, and copy it into the terminal.</p>
    <p>Finally, install the required dependencies via pip:</p>
    <code>
        pip install -r requirements.txt
      </code>
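    <p>Optionally, you can run a quick sanity check (an illustrative snippet, not part of the repository) to confirm that
      PyTorch is installed and that CUDA is visible:</p>
    <pre><code># Optional sanity check: confirm torch/torchvision are installed and CUDA is visible.
import torch
import torchvision

print('torch', torch.__version__, '| torchvision', torchvision.__version__)
print('CUDA available:', torch.cuda.is_available())
</code></pre>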

    <h2 id="vit">3. Running Example 1: Supporting a Hugging Face FM Vision Transformer</h2>

    <h3>3.1 Settings</h3>
    <p><b>Models.</b> We use a semantic segmentation model based on Vision Transformer from Hugging Face as an example
      to explain how to connect a Hugging Face FM to EdgeTA.</p>
    <p><b>Datasets.</b> We use datasets <a href="https://link.springer.com/chapter/10.1007/978-3-319-46475-6_7">GTA5</a>
      and <a href="https://supervise.ly">SuperviselyPerson</a> as the source domain, and datasets <a
        href="https://openaccess.thecvf.com/content_cvpr_2016/html/Cordts_The_Cityscapes_Dataset_CVPR_2016_paper.html">Cityscapes</a>
      and <a href="https://ieeexplore.ieee.org/abstract/document/6976983">BaiduPerson</a> as the target domain.</p>

    <h3>3.2 Offline Elastic Proxy Construction</h3>
    <p>Run the following commands sequentially to pre-train the knowledge base and index:</p>
    <code>
        python experiments/elasticdnn/vit_b_16/offline/fm_lora/cls/cls.py<br>
        python experiments/elasticdnn/vit_b_16/offline/fm_to_md/cls_md_wo_fbs.py<br>
        python experiments/elasticdnn/vit_b_16/offline/fm_to_md/cls_md_index.py<br>
      </code>
    <p>Note that the file path of the model checkpoint in the last two scripts should be modified manually.</p>
    <p>Run the following command to open TensorBoard and watch the metrics (e.g. losses and accuracy) during the
      training process:</p>
    <code>
        tensorboard --logdir &lt;the file path of the TensorBoard logs output in the terminal&gt;
      </code>
    <p>Here are three TensorBoard screenshots taken while the three commands above are running:</p>
    <img src="1.png">
    <img src="2.png">
    <img src="3.png">

    <h3>3.3 Online Evolving Input Data Adaptation</h3>
    <p>Run the following command to evaluate EdgeTA over evolving data:</p>
    <code>
        python experiments/elasticdnn/vit_b_16/online_new/cls/cls.py
      </code>
    <p>You can also launch TensorBoard to watch the retraining accuracy and time during the retraining process. Here is
      a screenshot:</p>
    <img src="4.png">

    <h3>(Optional) 3.4 Tuning the hyperparameters</h3>
    <p>Most hyperparameters are common and easy to understand (e.g. batch size, learning rate, and optimizer
      arguments). We introduce the hyperparameters unique to EdgeTA below; an illustrative configuration sketch follows the lists.</p>
    <p>For python experiments/elasticdnn/vit_b_16/offline/fm_lora/cls/cls.py:</p>
    <ul>
      <li><b>ab_r</b>: the value of r in LoRA.</li>
    </ul>
    <p>For python experiments/elasticdnn/vit_b_16/offline/fm_to_md/cls_md_wo_fbs.py:</p>
    <ul>
      <li><b>sample_size</b>: the size of an input sample. For typical image workloads, the size is (1, 3, 224, 224).
        For language workloads, you can directly pass in a tokenized sample (a dictionary) instead of a size.</li>
      <li><b>generate_md_width_ratio</b>: the ratio of the original FM's width to the knowledge base's width. We recommend
        4 or 8, which means that the knowledge base has 1/4 or 1/8 of the model size of the original FM.</li>
      <li><b>distill_loss_weight</b>: it controls the strength of distilling the original FM's features into the knowledge
        base's features using feature-based knowledge distillation. This helps improve the accuracy of the knowledge
        base.</li>
    </ul>
    <p>For python experiments/elasticdnn/vit_b_16/offline/fm_to_md/cls_md_index.py:</p>
    <ul>
      <li><b>FBS_r</b>: the value of r in the FBS module. We recommend 16.</li>
      <li><b>indexes_optimizer_args</b>: the arguments of the optimizer used in training the neuron index between the
        knowledge base and the FM.</li>
      <li><b>min_sparisty and max_sparsity</b>: in each training iteration, the knowledge base is set to a random
        sparsity and then trained (refer to dynamic neural networks). min_sparisty and max_sparsity determine the
        maximal and minimal model sizes of the generated proxy model. For example, if min_sparisty = 0 and
        max_sparsity = 0.9, the maximal model size of the proxy model is the same as the knowledge base, and the minimal
        model size of the proxy model is 10% of the knowledge base.</li>
      <li><b>bn_cal_num_iters</b>: BN statistics are unstable during the training of dynamic neural networks (refer to
        S-Net (ICLR'19)). Therefore, before testing the accuracy of the knowledge base, its BN statistics should be
        calibrated using several iterations of inference on the test dataset (if the model has any BN layers).</li>
      <li><b>index_init</b>: how the value of the neuron index is initialized. We recommend 'zero'.</li>
    </ul>
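    <p>For illustration only, the hyperparameters above can be pictured as the hypothetical configuration dictionary
      below. The grouping and the concrete values are assumptions for readability; the authoritative values are those
      set inside the three offline scripts.</p>
    <pre><code># Illustrative (hypothetical) grouping of the hyperparameters described above.
# The authoritative values are those set inside the three offline scripts.
offline_hparams = {
    'fm_lora': {
        'ab_r': 8,                          # the r of LoRA (example value)
    },
    'fm_to_md_wo_fbs': {
        'sample_size': (1, 3, 224, 224),    # a typical image input size
        'generate_md_width_ratio': 4,       # knowledge base is 1/4 of the FM width
        'distill_loss_weight': 1.0,         # strength of feature distillation
    },
    'fm_to_md_index': {
        'FBS_r': 16,
        'indexes_optimizer_args': {'lr': 3e-3, 'momentum': 0.9},
        'min_sparisty': 0.0,
        'max_sparsity': 0.9,                # proxy can shrink to 10% of the knowledge base
        'bn_cal_num_iters': 10,             # only meaningful if the model has BN layers
        'index_init': 'zero',
    },
}
</code></pre>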

    <!-- <h3>3.5 Comparison results</h3>
    <p>The comparsion between EdgeTA and nine baselines in this workload is demonstrated below. This figure is already
      in the submitted paper.</p>
    <img style="width: 70%; margin: 0 auto;" src="5.png"> -->

    
    <h2 id="clip">4. Running Example 2: Supporting a Hugging Face FM CLIP</h2>

    <h3>4.1 Settings</h3>
    <p><b>Models.</b> We use an image classification model based on CLIP from Hugging Face as an example
      to explain how to connect a Hugging Face FM to EdgeTA.</p>
    <p><b>Datasets.</b> We use datasets <a href="https://link.springer.com/chapter/10.1007/978-3-319-46475-6_7">GTA5</a>
      and <a href="https://supervise.ly">SuperviselyPerson</a> as the source domain, and datasets <a
        href="https://openaccess.thecvf.com/content_cvpr_2016/html/Cordts_The_Cityscapes_Dataset_CVPR_2016_paper.html">Cityscapes</a>
      and <a href="https://ieeexplore.ieee.org/abstract/document/6976983">BaiduPerson</a> as the target domain. We convert these semantic segmentation datasets into image classification datasets by cropping and saving the images in the segmentation bounding boxes.</p>
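    <p>As a rough illustration of this conversion step, the sketch below crops labeled regions out of an image, assuming
      the bounding box of each labeled region has already been extracted from the segmentation mask; the paths, box
      format and helper name are placeholders.</p>
    <pre><code># Schematic sketch of converting a segmentation sample into classification crops.
# It assumes the bounding box (x0, y0, x1, y1) of each labeled region is already
# known; paths and the box format are illustrative placeholders.
from PIL import Image

def crop_objects(image_path, boxes_with_labels, out_dir):
    img = Image.open(image_path).convert('RGB')
    for i, (box, label) in enumerate(boxes_with_labels):
        crop = img.crop(box)                     # box = (x0, y0, x1, y1)
        crop.save(f'{out_dir}/{label}_{i}.png')  # one image classification sample
</code></pre>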

    <h3>4.2 Offline Elastic Proxy Construction</h3>
    <p>Run the following commands sequentially to pre-train the knowledge base and index:</p>
    <code>
      python new_impl/cv/clip/cls.py<br>
      python new_impl/cv/clip/cls_md_wo_fbs.py<br>
      python new_impl/cv/clip/cls_md_index.py<br>
      </code>
    <p>Note that the file path of the model checkpoint in the last two scripts should be modified manually.</p>
    <p>Run the following command to open TensorBoard and watch the metrics (e.g. losses and accuracy) during the
      training process:</p>
    <code>
        tensorboard --logdir &lt;the file path of the TensorBoard logs output in the terminal&gt;
      </code>
    <p>Here is a TensorBoard screenshot taken while the commands above are running:</p>
    <img src="clip-index.png">

    <h3>4.3 Online Evolving Input Data Adaptation</h3>
    <p>Run the following command to evaluate EdgeTA over evolving data:</p>
    <code>
      python new_impl/cv/clip/cls_online.py
    </code>
    <p>You can also launch TensorBoard to watch the retraining accuracy and time during the retraining process. Here is
      a screenshot:</p>
    <img src="clip-online.png">
      
    <!-- <h3>Compared with baseline</h3>
    <p>Compared to the baseline adaptation method CUA runs alone (colored by blue), EdgeTA (colored by red) improves its accuracy by 15%. When facing drastic shifted domains, we can see that CUA is hard to improve the accuracy by retraining but EdgeTA notably recovers the accuracy because of the distribution-adaptive proxy model.
    </p>
    <img  style="width: 50%; margin: 0 auto;"  src="clip-baseline.png" /> -->


    <h2 id="sam">5. Running Example 3: Supporting a user-specified FM SAM (Segment Anything)</h2>
    
    <h3>5.1 Settings</h3>
    <p><b>Models.</b> We use the SOTA segmentation foundation model SAM. In this example, we support SAM using our designed standard FM API to explain how to connect a user-specified FM to EdgeTA.</p>
    <p><b>Datasets.</b> We use datasets <a href="https://link.springer.com/chapter/10.1007/978-3-319-46475-6_7">GTA5</a>
      and <a href="https://supervise.ly">SuperviselyPerson</a> as the source domain, and datasets <a
        href="https://openaccess.thecvf.com/content_cvpr_2016/html/Cordts_The_Cityscapes_Dataset_CVPR_2016_paper.html">Cityscapes</a>
      and <a href="https://ieeexplore.ieee.org/abstract/document/6976983">BaiduPerson</a> as the target domain.</p>

    <h3>5.2 Offline Elastic Proxy Construction</h3>
    <p>Run the following commands sequentially to pre-train the knowledge base and index:</p>
    <code>
      python new_impl/cv/sam/seg.py<br>
      python new_impl/cv/sam/seg_md_wo_fbs.py<br>
      python new_impl/cv/sam/seg_md_index.py<br>
      </code>
    <p>Note that the file path of the model checkpoint in the last two scripts should be modified manually.</p>
    <p>Run the following command to open TensorBoard and watch the metrics (e.g. losses and accuracy) during the
      training process:</p>
    <code>
        tensorboard --logdir &lt;the file path of the TensorBoard logs output in the terminal&gt;
      </code>
    <p>Here is a TensorBoard screenshot taken while the commands above are running:</p>
    <img src="sam-index.png">

    <h3>5.3 Online Evolving Input Data Adaptation</h3>
    <p>Run the following command to evaluate EdgeTA over evolving data:</p>
    <code>
      python new_impl/cv/sam/seg_online.py
    </code>
    <p>You can also launch TensorBoard to watch the retraining accuracy and time during the retraining process. Here is
      a screenshot:</p>
    <img src="sam-online.png">
      
    <!-- <h3>5.4 Compared with baseline</h3>
    <p>Compared to the baseline adaptation method CUA runs alone (colored by blue), EdgeTA (colored by red) improves its accuracy by 8%. When facing drastic shifted domains, we can see that CUA is hard to improve the accuracy by retraining but EdgeTA notably recovers the accuracy because of the distribution-adaptive proxy model.
    </p>
    <img style="width: 50%; margin: 0 auto;"   src="sam-baseline.png" /> -->


    <h2 id="glip">6. Running Example 4: Supporting a user-specified FM GLIP</h2>
      <h3>6.1 Settings</h3>
      
      <p><b>Models</b>. GLIP is a language-image pre-trained model that learns object-level, language-aware, and semantically rich visual representations. GLIP combines object detection and phrase grounding for pre-training, enabling object detection on images based on prompts. In this example, we support GLIP using our designed standard FM API to explain how to connect a user-specified FM to EdgeTA.</p>
      <p>Because there is no GLIP model code in the transformers library, you need to download the code, the weights and the config of the GLIP model from <a href="https://huggingface.co/harold/GLIP/tree/main">Hugging Face</a>. Then you should place them under the path "new_impl/cv/glip/object_detection/pretrained_model" and set up the code. In addition, you should also modify the GLIP code so that the GLIP model (GeneralizedVLRCNN) outputs the token_logits and the dot_product_logits when it is in eval mode.</p>
      <p><b>Datasets</b>. In this example, we use the dataset <a href="https://cocodataset.org/">COCO2017</a> as the source domain dataset, and <a href="https://openaccess.thecvf.com/content_cvpr_2016/html/Cordts_The_Cityscapes_Dataset_CVPR_2016_paper.html">Cityscapes</a> and <a href="https://link.springer.com/chapter/10.1007/978-3-319-46475-6_7">GTA5</a> as the target domain datasets.</p>
      <h3>6.2 Offline Elastic Proxy Construction</h3>
      <p>Run the following commands sequentially to pre-train the knowledge base and index:</p>
      <code>
        python new_impl/cv/glip/object_detection/det_lora.py<br>
        python new_impl/cv/glip/object_detection/det_md_wo_fbs.py<br>
        python new_impl/cv/glip/object_detection/det_md_w_fbs_index.py<br>
      </code>
      <p>Note that the file path of the model checkpoint in the last two scripts should be modified manually.</p>
      <p>Run the following command to open TensorBoard and watch the metrics (e.g. losses and accuracy) during the training process:</p>
      <code>
        tensorboard --logdir &lt;the file path of the TensorBoard logs output in the terminal&gt;
      </code>
      <p>Here are three TensorBoard screenshots taken while the three commands above are running:</p>
      <img  style="width: 80%; margin: 0 auto;"   src="det_lora_map50_0.5432.png">
      <img  style="width: 80%; margin: 0 auto;"   src="det_md_wo_fbs_map50_0.3926.png">
      <img  style="width: 100%; margin: 0 auto; margin-bottom: 10px;"   src="det_md_w_fbs_index_map50_0.4167.png">
      <h3>6.3 Online Evolving Input Data Adaptation</h3>
      <p>Run the following command to evaluate EdgeTA over evolving data:</p>
      <code>
        python new_impl/cv/glip/object_detection/det_online.py
      </code>
      <p>You can launch TensorBoard to watch the retraining mAP@50 score and time during the retraining process. Here is a screenshot:</p>
      <img src="det_online.png">
    
    <h2 id="gpt-neo">7. Running Example 5: Supporting GPT-Neo</h2>
      <h3>7.1 Settings</h3>
      <p><b>Models</b></p>
      <p>GPT-Neo is an open-source language model released by EleutherAI in late March 2021 as an open-source alternative to GPT-3. In this example, we support GPT-Neo using our designed standard FM API to explain how to connect a user-specified FM to EdgeTA.</p>
      <p><b>Datasets</b></p>
      <p>In this example, we use the dataset <a href="https://huggingface.co/datasets/HuggingFaceH4/no_robots?row=0">No_robots</a> as the source domain dataset, and <a href="https://huggingface.co/datasets/AdaptLLM/medicine-tasks?row=0">Medicine-tasks</a> and <a href="https://huggingface.co/datasets/AdaptLLM/law-tasks">law-tasks</a> as the target domain datasets. They are all conversational datasets.</p>
      <h3>7.2 Offline Elastic Proxy Construction</h3>
      <p>Run the following commands sequentially to pre-train the knowledge base and index:</p>
      <code>
        python new_impl/nlp/gpt-neo/text_generation/gen_lora.py<br>
        python new_impl/nlp/gpt-neo/text_generation/gen_md_wo_fbs.py<br>
        python new_impl/nlp/gpt-neo/text_generation/gen_md_w_fbs_index.py<br>
      </code>
      <p>Note that the file path of the model checkpoint in the last two scripts should be modified manually.</p>
      <p>Run the following command to open TensorBoard and watch the metrics (e.g. losses and accuracy) during the training process:</p>
      <code>
        tensorboard --logdir &lt;the file path of the TensorBoard logs output in the terminal&gt;
      </code>
      <p>Here are three TensorBoard screenshots taken while the three commands above are running:</p>
      <img src="gen_lora.png">
      <img src="gen_md_wo_fbs.png">
      <img src="gen_md_w_fbs_index.png">
      <h3>7.3 Online Evolving Input Data Adaptation</h3>
      <p>Run the following command to evaluate EdgeTA over evolving data:</p>
      <code>
        python new_impl/nlp/gpt-neo/text_generation/gen_online.py
      </code>
      <p>You can launch TensorBoard to watch the retraining metrics and time during the retraining process. Here is a screenshot:</p>
      <img src="gen_online.png">
    <h2 id="roberta">8. Running Example 6: Supporting Roberta</h2>
      <h3>8.1 Settings</h3>
      <p><b>Models</b></p>
      <p>We use the base version of the Roberta model (an improved version of BERT) to demonstrate how to connect a Hugging Face FM to EdgeTA.</p>
      <p><b>Datasets</b></p>
      <p>We use the dataset collection named HL5Domains, which includes five datasets: ApexAD2600Progressive, CanonG3, CreativeLabsNomadJukeboxZenXtra40GB, NikonCoolpix4300 and Nokia6610. Among them, ApexAD2600Progressive, CanonG3 and CreativeLabsNomadJukeboxZenXtra40GB are used as the source domains, and NikonCoolpix4300 and Nokia6610 are used as the target domains. All of them are collected from amazon.com.</p>
      <h3>8.2 Offline Elastic Proxy Construction</h3>
      <p>Run the following commands sequentially to pre-train the knowledge base and index:</p>
      <code>
        python new_impl/nlp/roberta/sentiment-classification/cls_lora.py<br>
        python new_impl/nlp/roberta/sentiment-classification/cls_md_wo_fbs.py<br>
        python new_impl/nlp/roberta/sentiment-classification/cls_md_w_fbs_index.py<br>
      </code>
      <p>Note that the file path of the model checkpoint in the last two scripts should be modified manually.</p>
      <p>Run the following command to open TensorBoard and watch the metrics (e.g. losses and accuracy) during the training process:</p>
      <code>
        tensorboard --logdir &lt;the file path of the TensorBoard logs output in the terminal&gt;
      </code>
      <p>Here are three TensorBoard screenshots taken while the three commands above are running:</p>
      <img src="cls_lora.png">
      <img src="cls_md_wo_fbs.png">
      <img src="cls_md_w_fbs_index.png">
      <h3>8.3 Online Evolving Input Data Adaptation</h3>
      <p>Run the following command to evaluate EdgeTA over evolving data:</p>
      <code>
        python new_impl/nlp/roberta/sentiment-classification/cls_online.py
      </code>
      <p>You can launch TensorBoard to watch the retraining accuracy and time during the retraining process. Here is a screenshot:</p>
      <img src="cls_online.png">

    <h2 id="implementation">9. Implementation (Development API Documentation)</h2>
    <p>EdgeTA is implemented in Python with 8k LOCs and is currently targeted at transformers running on commodity
      edge devices in Linux environments. Its scaling and retraining of transformers are implemented based on timm 0.9.1
      and transformers 4.30.2. Its scheduler is built upon the optimization problem solver in scikit-opt 0.6.6 and the
      resource management systems Docker 19.03.6 and K3s 1.18.12.</p>
    <p>The figure below illustrates the three steps of running an FM using EdgeTA. To facilitate the integration of a model,
      EdgeTA decouples the integration of a model (step 1) from its offline construction of the knowledge base and neuron
      index (step 2) and the online scaling and retraining of the FM (step 3). With this system design, users only need to
      implement the FM API at step 1 to integrate a model. Specifically, EdgeTA supports two types of models.</p>
    <img src="Implementation.png">
    <p><b>Hugging Face FMs.</b> We implement EdgeTA to support FM APIs in the Hugging Face AI community. Using
      AutoModel as an example, EdgeTA calls the function AutoModel.from_pretrained() to initialize an FM and calls the
      function AutoModel.forward() to perform a forward operation. EdgeTA allows users to run a Hugging Face FM using about 30
      LOCs.</p>
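    <p>For reference, a minimal sketch of this call pattern is shown below; the model name and the input sentence are
      placeholders for illustration only.</p>
    <pre><code># Minimal sketch of the Hugging Face call pattern described above.
# The model name and the input are placeholders for illustration only.
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained('bert-base-uncased')      # initialize an FM
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')

inputs = tokenizer('an example input sentence', return_tensors='pt')
outputs = model(**inputs)                                   # forward operation
</code></pre>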
    <p><b>User-specified FMs.</b> EdgeTA designs a standard FM API (colored green in the figure) to unify user-specified
      FM implementations. This API mainly defines: (i) how the FM performs an inference using a given
      sample; (ii) how the accuracy of the FM is measured using a given test dataset; (iii) how to manipulate (e.g.
      compress/update/remove) a specific layer in the FM. For each FM, this API can be implemented using about 200 LOCs.
    </p>

    <h3 id="hugging-face-model">9.1 Supporting a Hugging Face FM</h3>

    <p>Supporting a Hugging Face model is a simplification of supporting a user-specified model, because Hugging Face
      FMs follow a consistent implementation style, so repetitive implementation work can be saved. The user only needs to
      implement the following simple functions (a minimal sketch follows the list):</p>

    <ul>

      <li>
        <code class="inline-code">def get_feature_hook(self)</code>
        <ul>
          <li>Get the PyTorch hook attached before the layer that extracts the key features.</li>
          <li>
            <b>Output:</b> A PyTorch hook.
          </li>
        </ul>
      </li>

      <li>
        <code class="inline-code">def get_task_head_params(self)</code>
        <ul>
          <li>Get the model parameters of the task head of the FM.</li>
          <li>
            <b>Output:</b> The model parameters of the task head of the FM.
          </li>
        </ul>
      </li>

      <li>
        <code class="inline-code">def get_qkv_proj_ff1_ff2_layer_names(self)</code>
        <ul>
          <li>Get a list of name groups, where each element is a list containing the names of the Q/K/V layers, the QKV
            projection layer, Feed Forward Layer 1, and Feed Forward Layer 2 of one transformer block. For example, for Hugging Face's BERT, this
            function should return [['bert.encoder.layer.0.attention.self.query',
            'bert.encoder.layer.0.attention.self.key', 'bert.encoder.layer.0.attention.self.value',
            'bert.encoder.layer.0.attention.output.dense', 'bert.encoder.layer.0.intermediate.dense',
            'bert.encoder.layer.0.output.dense'], ['bert.encoder.layer.1.attention.self.query',
            'bert.encoder.layer.1.attention.self.key', 'bert.encoder.layer.1.attention.self.value',
            'bert.encoder.layer.1.attention.output.dense', 'bert.encoder.layer.1.intermediate.dense',
            'bert.encoder.layer.1.output.dense'], ...]</li>
          <li>
            <b>Output:</b> A list of name lists.
          </li>
        </ul>
      </li>

      <li>
        <code class="inline-code">def get_accuracy(self, test_loader)</code>
        <ul>
          <li>Measure the accuracy of the FM using the given test data loader.</li>
          <li>
            <b>Inputs:</b>
            <ul>
              <li>test_loader: A given test dataloader.</li>
            </ul>
          </li>
          <li>
            <b>Output:</b> The measured accuracy.
          </li>
        </ul>
      </li>

    </ul>
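    <p>A minimal, hypothetical sketch of these functions for a BERT-style Hugging Face FM is shown below. The class name,
      the hook helper and the accuracy loop are assumptions for illustration; adapt them to the actual interfaces in the
      methods/ directory.</p>
    <pre><code># Hypothetical sketch of the Hugging Face FM functions listed above, using
# BERT-style layer names. The class and its details are illustrative only.
import torch

class ExampleHuggingFaceBertAPI:
    def __init__(self, model):
        self.model = model  # e.g. a BertForSequenceClassification instance

    def get_feature_hook(self):
        # register a forward hook at the layer that yields the key features
        features = {}
        def hook(module, inputs, output):
            features['feature'] = output
        return self.model.bert.pooler.register_forward_hook(hook)

    def get_task_head_params(self):
        # parameters of the task head (here: the classification head)
        return list(self.model.classifier.parameters())

    def get_qkv_proj_ff1_ff2_layer_names(self):
        names = []
        for i in range(self.model.config.num_hidden_layers):
            p = f'bert.encoder.layer.{i}'
            names.append([
                f'{p}.attention.self.query',
                f'{p}.attention.self.key',
                f'{p}.attention.self.value',
                f'{p}.attention.output.dense',
                f'{p}.intermediate.dense',
                f'{p}.output.dense',
            ])
        return names

    def get_accuracy(self, test_loader):
        # simple top-1 accuracy over the given test data loader
        correct, total = 0, 0
        self.model.eval()
        with torch.no_grad():
            for x, y in test_loader:
                logits = self.model(**x).logits
                correct += (logits.argmax(-1) == y).sum().item()
                total += y.numel()
        return correct / max(total, 1)
</code></pre>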

    <h3 id="user-specified-model">9.2 Supporting a user-specified FM</h3>
    <p>The user should implement the following functions in the standard FM API; a partial skeleton is sketched after the list.</p>
    <ul>

      <li>
        <code class="inline-code">def forward(self, x, *args, **kwargs)</code>
        <ul>
          <li>Let the FM perform a forward inference operation using the given sample x.</li>
          <li>
            <b>Inputs:</b>
            <ul>
              <li>x: A given sample.</li>
              <li>*args and **kwargs: Possible additional arguments used in the inference.</li>
            </ul>
          </li>
          <li>
            <b>Output:</b> The inference results.
          </li>
        </ul>
      </li>

      <li>
        <code class="inline-code">def get_accuracy(self, test_loader)</code>
        <ul>
          <li>Measure the accuracy of the FM using the given test data loader.</li>
          <li>
            <b>Inputs:</b>
            <ul>
              <li>test_loader: A given test dataloader.</li>
            </ul>
          </li>
          <li>
            <b>Output:</b> The measured accuracy.
          </li>
        </ul>
      </li>

      <li>
        <code class="inline-code">def forward_to_get_task_loss(self, x, y *args, **kwargs)</code>
        <ul>
          <li>Let the FM perform a forward operation using the given sample x, and calculate and return the task loss.
          </li>
          <li>
            <b>Inputs:</b>
            <ul>
              <li>x: A given sample.</li>
              <li>y: The corresponding label of x.</li>
              <li>*args and **kwargs: Possible additional arguments used in the inference.</li>
            </ul>
          </li>
          <li>
            <b>Output:</b> The calculated task loss.
          </li>
        </ul>
      </li>

      <li>
        <code class="inline-code">def get_feature_hook(self)</code>
        <ul>
          <li>Get the PyTorch hook attached before the layer that extracts the key features.</li>
          <li>
            <b>Output:</b> A PyTorch hook.
          </li>
        </ul>
      </li>

      <li>
        <code class="inline-code">def get_task_head_params(self)</code>
        <ul>
          <li>Get the model parameters of the task head of the FM.</li>
          <li>
            <b>Output:</b> The model parameters of the task head of the FM.
          </li>
        </ul>
      </li>

      <li>
        <code class="inline-code">def add_lora_ab_to_fm(self, ab_r: int, samples: torch.Tensor)</code>
        <ul>
          <li>Add a LoRA matrix to each attention layer in the FM. The user should check if the FM's output is changed
            before and after the LoRA is added into the FM.</li>
          <li>
            <b>Inputs:</b>
            <ul>
              <li>ab_r: the factor r in LoRA.</li>
              <li>samples: A given sample for sanity check.</li>
            </ul>
          </li>
          <li>
            <b>Output:</b> A PyTorch hook.
          </li>
        </ul>
      </li>

      <li>
        <code class="inline-code">def fuse_lora_and_recover_net_structure(self, samples: torch.Tensor)</code>
        <ul>
          <li>Fuse the added LoRA matrix into the corresponding attention layer in the FM, and recover the network
            structure to the original. This is invoked after the LoRA fine tuning.</li>
          <li>
            <b>Inputs:</b>
            <ul>
              <li>samples: A given sample for sanity check.</li>
            </ul>
          </li>
          <li>
            <b>Output:</b> A PyTorch hook.
          </li>
        </ul>
      </li>

      <li>
        <code class="inline-code">def is_q_or_k_v_linear(self, layer_name: nn.Module)</code>
        <ul>
          <li>Check if the given layer is a Q/K/V Linear in the FM.</li>
          <li>
            <b>Inputs:</b>
            <ul>
              <li>layer_name: The name of a layer in the FM.</li>
            </ul>
          </li>
          <li>
            <b>Output:</b> Return True if the given layer is a Q/K/V Linear in the FM.
          </li>
        </ul>
      </li>

      <li>
        <code class="inline-code">def is_feed_forward(self, layer_name: nn.Module)</code>
        <ul>
          <li>Check if the given layer is a feed forward layer in the FM.</li>
          <li>
            <b>Inputs:</b>
            <ul>
              <li>layer_name: The name of a layer in the FM.</li>
            </ul>
          </li>
          <li>
            <b>Output:</b> Return True if the given layer is a feed forward layer in the FM.
          </li>
        </ul>
      </li>

      <li>
        <code
          class="inline-code">def prune_an_attention_layer(self, attention_layer_name, sparsity: float, samples: torch.Tensor)</code>
        <ul>
          <li>Prune an attention layer.</li>
          <li>
            <b>Inputs:</b>
            <ul>
              <li>attention_layer_name: The name of target attention layer.</li>
              <li>sparsity: The pruning strength.</li>
              <li>samples: A given sample.</li>
            </ul>
          </li>
        </ul>
      </li>

      <li>
        <code
          class="inline-code">def prune_an_feed_forward_layer(self, feed_forward_layer_name, sparsity: float, samples: torch.Tensor)</code>
        <ul>
          <li>Prune a feed forward layer.</li>
          <li>
            <b>Inputs:</b>
            <ul>
              <li>feed_forward_layer_name: The name of target feed forward layer.</li>
              <li>sparsity: The pruning strength.</li>
              <li>samples: A given sample.</li>
            </ul>
          </li>
        </ul>
      </li>

    </ul>
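    <p>As an illustration, a partial, hypothetical skeleton of this standard FM API is sketched below. The method bodies
      are placeholders (the LoRA and pruning logic is model-specific); the complete reference implementations are in this
      repository, e.g. under new_impl/.</p>
    <pre><code># Hypothetical skeleton of the standard FM API described above.
# Method bodies are illustrative placeholders only.
import torch
from torch import nn
import torch.nn.functional as F

class ExampleUserSpecifiedFMAPI:
    def __init__(self, fm: nn.Module):
        self.fm = fm

    def forward(self, x, *args, **kwargs):
        # run one inference on the given sample
        return self.fm(x, *args, **kwargs)

    def get_accuracy(self, test_loader):
        correct, total = 0, 0
        self.fm.eval()
        with torch.no_grad():
            for x, y in test_loader:
                pred = self.fm(x).argmax(-1)
                correct += (pred == y).sum().item()
                total += y.numel()
        return correct / max(total, 1)

    def forward_to_get_task_loss(self, x, y, *args, **kwargs):
        # forward pass plus the task loss (cross entropy as an example)
        return F.cross_entropy(self.fm(x, *args, **kwargs), y)

    def is_q_or_k_v_linear(self, layer_name):
        # naming conventions are model-specific; this is only an example
        return any(k in layer_name for k in ('query', 'key', 'value', 'qkv'))

    def is_feed_forward(self, layer_name):
        return ('intermediate' in layer_name) or ('mlp' in layer_name)

    def add_lora_ab_to_fm(self, ab_r: int, samples: torch.Tensor):
        # insert low-rank A/B matrices beside each attention Linear, then
        # check the FM's output on `samples` as a sanity check
        raise NotImplementedError

    def fuse_lora_and_recover_net_structure(self, samples: torch.Tensor):
        # fold the trained A/B matrices back into the attention weights
        raise NotImplementedError

    def prune_an_attention_layer(self, attention_layer_name, sparsity, samples):
        # remove the least important neurons of the named attention layer
        raise NotImplementedError

    def prune_an_feed_forward_layer(self, feed_forward_layer_name, sparsity, samples):
        # remove the least important neurons of the named feed forward layer
        raise NotImplementedError
</code></pre>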

    <h2 id="evaluation">10. Experimental evaluation in ICDE 2024 submission</h2>
    <h3 id="101">10.1 Basic settings</h3>
    <p><b>Testbeds</b>. We evaluate EdgeTA on four heterogeneous edge devices: NVIDIA Jetson TX2 (8GB memory), NVIDIA Xavier NX (16GB memory), NVIDIA AGX Xavier (32GB memory), and NVIDIA AGX Orin (32GB memory).</p>
    <p><b>Baselines</b>. We compare EdgeTA with 13 adaptation methods, including 5 supervised continual learning methods and 8 unsupervised domain adaptation methods.</p>
    <p><b>Workloads</b>. We evaluate EdgeTA on three representative FMs: ViT-B/16 (CV), BERT_base (NLP), and ViLT (multimodal). ViT-B/16 is equipped with three different application heads to perform image classification, object detection, and semantic segmentation, respectively. BERT_base is equipped with two different application heads to perform sentence classification and part-of-speech (POS) tagging, respectively. ViLT performs visual question answering. In addition, GPT-Neo is evaluated in the discussion of PEFT techniques. We evaluate EdgeTA on 11 different datasets: GTA5, SuperviselyPerson, MSCOCO2017, Cityscapes, BaiduPerson, HL5Domains, Liu3Domains, Ding9Domains, SemEval14, 20Newsgroups, and VQAv2. More details are given in the table below.</p>
    <img style="width: 80%; margin: 0 auto; margin-bottom: 10px;" src="workloads.png">
    

    <h3 id="102">10.2 Additional details</h3>
    <p><b>Online adaptation</b>. For evolving domain shifts, EdgeTA uses naive feature alignment (the most classical method for unsupervised domain adaptation) to retrain the proxy model. For evolving new tasks, EdgeTA uses standard supervised learning to retrain the proxy model.</p>
    <p><b>Applicability of baseline adaptation methods.</b> Some baseline adaptation methods are inapplicable to some applications, so Figure 6, Table II, and Table III do not report their metrics. Specifically:</p>
    <ul>
      <li>SHOT and ConDA rely on pseudo labels, and their label generation algorithm can only generate one-dimensional pseudo labels. However, the label in the semantic segmentation application has three dimensions, the label in the object detection application contains bounding box information, and the label in the POS tagging application has two dimensions, so SHOT and ConDA are inapplicable to these applications.</li>
      <li>BUFR calculates the KL divergence between the model's outputs in the source domain and the target domain, and the KL divergence only applies to classification outputs, so BUFR is also inapplicable to the semantic segmentation, object detection, and POS tagging applications.</li>
      <li>ACE uses an image generator, so it is inapplicable to the text classification, POS tagging and visual question answering applications.</li>
    </ul>

    <h3 id="103">10.3 Additional experiment results (applying EdgeTA in SOTA FMs: CLIP, SAM, GLIP, GPT-Neo and Roberta)</h3>
    <p>Besides the FMs evaluated in the submitted paper, there are other SOTA FMs. Some are SOTA CV FMs, such as CLIP for image classification, SAM for semantic segmentation, and GLIP for object detection, and the others are SOTA NLP FMs, such as GPT-Neo for text generation and Roberta for sentiment classification. These SOTA FMs and the currently tested FMs have similar network architectures and sizes: CLIP, SAM, and GLIP comprise a ViT and a GPT-2 (similar to BERT), while GPT-Neo and Roberta are built on the Transformer structure (the main structure used by BERT). We therefore expect EdgeTA to still work well for these FMs.</p>
    <p>To verify this, we ran a new set of experiments on CLIP, SAM, GLIP, GPT-Neo and Roberta to compare EdgeTA and CUA on an NVIDIA Xavier NX. The results are demonstrated below. EdgeTA improves the accuracy by 12.71%, 7.36%, 6.41%, 10.81% and 10.38% for the five FMs, respectively, which proves the applicability of EdgeTA to various FMs.</p>
    <img style="width: 50%; margin: 0 auto; margin-bottom: 10px;" src="new_results.png">
  </div>
</body>

</html>