Does not work at all, I tried to calculate CoLA

#2
by zokica - opened

I tried to calculate CoLA as described here (https://huggingface.co/sileod/deberta-v3-base-tasksource-adapters), but it does not work at all.

I get this error:

reshaped_logits = logits.view(-1, num_choices)
RuntimeError: shape '[-1, 6]' is invalid for input of size 4

#############################################################
##############################################################
import time

import torch
from tasknet import Adapter
from transformers import AutoModelForMultipleChoice, AutoTokenizer

model_name = "sileod/deberta-v3-base-tasksource-nli"
tokenizer3 = AutoTokenizer.from_pretrained("microsoft/deberta-v3-base")
model3 = AutoModelForMultipleChoice.from_pretrained(model_name, ignore_mismatched_sizes=True, cache_dir="/root/Desktop/models/", low_cpu_mem_usage=True)
adapter = Adapter.from_pretrained(model_name.replace('nli', 'adapters'))
model_for_rlhf = adapter.adapt_model_to_task(model3, 'glue/cola')  # glue/cola #hh-rlhf


def cola(sentences1):
    # Tokenize the batch and run a forward pass without tracking gradients
    inputs = tokenizer3(sentences1, return_tensors="pt", padding=True, truncation=True, max_length=40).to("cpu")
    with torch.no_grad():
        outputs = model_for_rlhf(**inputs)

    the_cola_scores = []
    print("outputs.logits", outputs.logits)
    for aout in outputs.logits:
        # Softmax probability of class index 1 for this sentence
        cola_prediction = torch.nn.functional.softmax(aout, dim=-1)[1].item()
        the_cola_scores.append(round(cola_prediction, 2))

    return the_cola_scores


timea = time.time()
sentences1 = ["I likes apples", "I love apples."]
cola_scores = cola(sentences1)
# Print the scores and the elapsed time in seconds
print(cola_scores, time.time() - timea)
####################################################
####################################################

Hi!
For CoLA, you have to use AutoModelForSequenceClassification.

Hi, thanks, it works when I use model3 = AutoModelForSequenceClassification.from_pretrained(

However, the results are pretty bad, and I am not sure why:
sentences1 = ["I was there because I like apples.","I are there because i loves apples."]
[0.59, 0.65]

So it comes out as if the second sentence is better than the first one. I checked other models, and there the probability for the second sentence is low, around 0.5, while the first should be around 0.99.
The logits for the first and second sentence are:
outputs.logits tensor([[ 0.0077, 0.3910],
        [-0.1509, 0.4893]])
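
As a sanity check, the reported scores can be reproduced from these logits by hand. This is a minimal sketch using only the standard library; it takes index 1 as the score, as the posted code does, and the helper name softmax_prob is hypothetical, not part of tasknet or transformers:

```python
import math


def softmax_prob(logits, index):
    """Return the softmax probability of `index` for one row of logits."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    return exps[index] / sum(exps)


# Logits copied from the post above (first and second sentence).
posted_logits = [[0.0077, 0.3910], [-0.1509, 0.4893]]
scores = [round(softmax_prob(row, 1), 2) for row in posted_logits]
print(scores)  # [0.59, 0.65]
```

The values match the reported [0.59, 0.65], which suggests the scoring step itself is fine and the weak separation comes from the logits the model produces.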

It is possible that CoLA was undertrained. There are 500 tasks, and CoLA is relatively small (8k samples).
I am currently running a much longer tasksource pretraining. But it could also be due to a failure in the tasknet code to load the adapter. I'll look into it on Monday and get back to you.

PS: I advise you to use a pipeline for what you are doing.

# !pip install tasknet
from tasknet import Adapter
from transformers import AutoModelForSequenceClassification, TextClassificationPipeline, AutoTokenizer

model_name="sileod/deberta-v3-base-tasksource-nli"
model = AutoModelForSequenceClassification.from_pretrained(model_name,ignore_mismatched_sizes=True)
adapter = Adapter.from_pretrained(model_name.replace('nli','adapters'))
model = adapter.adapt_model_to_task(model, 'glue/cola')

pipe = TextClassificationPipeline(
    model=model,
    tokenizer=AutoTokenizer.from_pretrained(model_name))
pipe(["We yelled ourselves.","We yelled ourselves hoarse."])

OK, thanks very much for the fast answer. I think this model should be state of the art for CoLA. All the other models are trained on just this dataset, so the data should be sufficient.

What is strange is that I get the same result with the adapted model and with the model without adapt_model_to_task. However, when I replaced 'glue/cola' with another task, it gave a different output.

I tried the pipeline; it yields the same result.
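
One way to make the "same result" observation concrete is to compare the raw logits of the two models on identical inputs. This is a rough sketch in plain Python; max_abs_diff is a hypothetical helper, and the two lists are placeholders that reuse the logits posted earlier in this thread, not new measurements:

```python
def max_abs_diff(logits_a, logits_b):
    """Largest element-wise absolute difference between two logit tables."""
    if len(logits_a) != len(logits_b):
        raise ValueError("row counts differ")
    return max(
        abs(a - b)
        for row_a, row_b in zip(logits_a, logits_b)
        for a, b in zip(row_a, row_b)
    )


# Placeholder values: in practice, pass outputs.logits.tolist() from each model.
base_logits = [[0.0077, 0.3910], [-0.1509, 0.4893]]
adapted_logits = [[0.0077, 0.3910], [-0.1509, 0.4893]]

diff = max_abs_diff(base_logits, adapted_logits)
print("adapter changed the output" if diff > 1e-6 else "outputs are identical")
```

A difference of exactly zero would hint that the task adapter had no effect on the forward pass, while a small nonzero difference would point elsewhere.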

I think that FINE-TUNING this model on CoLA will lead to state of the art (for base-size models).
You can see that it is ranked first on CoLA: https://ibm.github.io/model-recycling/microsoft_deberta-v3-base_table

In its current state, it should give good results, but not better than a fine-tuned model. deberta-v3-base-tasksource is currently trained for less than one epoch, as multi-task training takes much longer.

Yes, DeBERTa is even better than RoBERTa.

Hi.
There was a problem with the adapters.
The poolers were not shared during the multi-task training. They should have been included in the adapter. I am re-running the training from scratch (with more tasks as well) with a single pooler.
There will be a new version in a week, and pipeline loading will be much cleaner.

from tasknet import load_pipeline
pipeline = load_pipeline(model_name, task_name)
print(pipeline.model, pipeline('example to test'))

Awesome, thanks.

I'm looking forward to your update as well :)

Did you do the update?

(It is done)

sileod changed discussion status to closed
