Unable to reproduce the same Eval and Test Results

#2 opened by clive777

Dataset used: https://huggingface.co/datasets/conll2003

Evaluation metric: `load_metric("seqeval")`
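
For reference, the dataset, metric, and label names used further down are loaded roughly like this (a sketch following the course chapter linked below; `label_names` is the tag list referenced in `compute_metrics`):

```python
from datasets import load_dataset, load_metric

raw_datasets = load_dataset("conll2003")
metric = load_metric("seqeval")

# Tag names come from the dataset's ner_tags ClassLabel feature:
# ['O', 'B-PER', 'I-PER', 'B-ORG', 'I-ORG', 'B-LOC', 'I-LOC', 'B-MISC', 'I-MISC']
label_names = raw_datasets["train"].features["ner_tags"].feature.names
```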

**Results obtained:**

```
{'eval_loss': 2.3160810470581055,
 'eval_precision': 0.6153949670300094,
 'eval_recall': 0.7696061932009425,
 'eval_f1': 0.6839153518283106,
 'eval_accuracy': 0.9621769588508859,
 'eval_runtime': 556.8392,
 'eval_samples_per_second': 5.838,
 'eval_steps_per_second': 0.731}
```

NER label alignment code, taken from https://huggingface.co/course/chapter7/2:

```python
def align_labels_with_tokens(labels, word_ids):
    new_labels = []
    current_word = None
    for word_id in word_ids:
        if word_id != current_word:
            # Start of a new word!
            current_word = word_id
            label = -100 if word_id is None else labels[word_id]
            new_labels.append(label)
        elif word_id is None:
            # Special token
            new_labels.append(-100)
        else:
            # Same word as previous token
            label = labels[word_id]
            # If the label is B-XXX we change it to I-XXX
            if label % 2 == 1:
                label += 1
            new_labels.append(label)

    return new_labels
```
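
For context, this helper is applied per example through the tokenizer's `word_ids()`, as in the same course chapter; below is a sketch of that wiring (the checkpoint name is only a placeholder, not the model from this repo):

```python
from datasets import load_dataset
from transformers import AutoTokenizer

raw_datasets = load_dataset("conll2003")
# Placeholder checkpoint; a fast tokenizer is required for word_ids().
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")


def tokenize_and_align_labels(examples):
    # Tokenize pre-split words and realign the ner_tags to the sub-tokens.
    tokenized_inputs = tokenizer(
        examples["tokens"], truncation=True, is_split_into_words=True
    )
    all_labels = examples["ner_tags"]
    new_labels = []
    for i, labels in enumerate(all_labels):
        word_ids = tokenized_inputs.word_ids(i)
        new_labels.append(align_labels_with_tokens(labels, word_ids))
    tokenized_inputs["labels"] = new_labels
    return tokenized_inputs


tokenized_datasets = raw_datasets.map(
    tokenize_and_align_labels,
    batched=True,
    remove_columns=raw_datasets["train"].column_names,
)
```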

Compute metrics function:

```python
import numpy as np


def compute_metrics(eval_preds):
    logits, labels = eval_preds
    predictions = np.argmax(logits, axis=-1)

    # Gold labels as tag strings, dropping the -100 positions
    # (special tokens and continuation sub-tokens).
    true_labels = [
        [label_names[l] for l in label if l != -100] for label in labels
    ]
    # Predicted labels as tag strings via the model's id2labels mapping
    # (string keys), again skipping positions where the gold label is -100.
    true_predictions = [
        [id2labels[str(p)] for (p, l) in zip(prediction, label) if l != -100]
        for prediction, label in zip(predictions, labels)
    ]
    all_metrics = metric.compute(predictions=true_predictions, references=true_labels)
    return {
        "precision": all_metrics["overall_precision"],
        "recall": all_metrics["overall_recall"],
        "f1": all_metrics["overall_f1"],
        "accuracy": all_metrics["overall_accuracy"],
    }
```

Note: I am using the id2labels mapping from your model for the predictions. Please comment on this.
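
In case it helps, a quick way to check whether that mapping lines up with the dataset's own tag order would be something like the sketch below (my own check, with a placeholder repo id; the mapping is read straight from config.json, which is why the keys are strings):

```python
import json

from datasets import load_dataset
from huggingface_hub import hf_hub_download

# Placeholder repo id; substitute the model repo actually being evaluated.
config_path = hf_hub_download(repo_id="your-org/your-ner-model", filename="config.json")
with open(config_path) as f:
    id2labels = json.load(f)["id2label"]  # JSON keys are strings: "0", "1", ...

label_names = load_dataset("conll2003")["train"].features["ner_tags"].feature.names
for i, name in enumerate(label_names):
    if id2labels.get(str(i)) != name:
        print(f"id {i}: model config says {id2labels.get(str(i))!r}, dataset says {name!r}")
```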

Was there any update on this? Did you manage to reproduce the results?
