Model Card for Model ID

This repository contains the embedding model used to embed software artifacts for traceability link prediction.

Model Details

This encoder is used in the siamese models.

Model Description

This embedding model is the encoder portion of the siamese model used in the cited paper. That siamese model used a relational classifier to produce similarity scores between text pairs, much like a cross-encoder, and consistently ranked almost as high as the top performer.

  • Developed by: Jinfeng Lin (translated by Alberto Rodriguez)
  • Model type: RoBERTa encoder trained on automatic traceability link prediction.
  • Language(s) (NLP): en
  • License: mit
  • Finetuned from model [optional]: See cited paper.

Model Sources [optional]

Uses

Used to embed software artifacts intended to be compared via cosine similarity.
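For readers unfamiliar with the metric, the following is a minimal sketch of cosine similarity in plain Python. The toy 3-dimensional vectors are stand-ins for real embeddings, and the function name and the zero-vector convention are illustrative choices, not part of this model's API.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot / (norm_a * norm_b)


# Toy vectors standing in for artifact embeddings (hypothetical values).
requirement = [1.0, 2.0, 0.0]
design_doc = [2.0, 4.0, 0.0]  # same direction as `requirement`
unrelated = [0.0, 0.0, 3.0]   # orthogonal to `requirement`

same_direction_score = cosine_similarity(requirement, design_doc)
orthogonal_score = cosine_similarity(requirement, unrelated)
```

Vectors pointing in the same direction score close to 1 regardless of their length, while orthogonal vectors score 0, which is why the metric suits comparing embeddings of artifacts with very different text lengths.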

Direct Use

Software traceability link prediction, Retrieval Augmented Generation, Artifact Clustering.
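As an illustration of the artifact clustering use, the sketch below groups vectors with a simple greedy threshold rule. This is not the method from the cited paper; the function names, the threshold of 0.8, and the toy 2-dimensional vectors are all assumptions made for the example. In practice the vectors would come from this model's embeddings.

```python
import math


def _cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def greedy_cluster(vectors, threshold=0.8):
    """Assign each vector to the first cluster whose representative is similar enough.

    Each cluster is a list of indices into `vectors`; the first index in a
    cluster acts as its representative. A new cluster is opened when no
    existing representative reaches `threshold`.
    """
    clusters = []
    for i, vec in enumerate(vectors):
        for cluster in clusters:
            if _cosine(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Hypothetical embeddings: two artifacts near the x-axis, two near the y-axis.
toy_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
clusters = greedy_cluster(toy_vectors)
```

The greedy rule depends on input order and on the chosen threshold, so it is best read as a starting point; a standard clustering algorithm over the same embeddings may be a better fit for real projects.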

Downstream Use [optional]

This model is intended to be used within a traceability link prediction pipeline, to retrieve software artifacts for an LLM prompt, and for clustering.
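The retrieval step for an LLM prompt could be sketched as follows: rank artifacts by cosine similarity to a query vector and paste the top matches into the prompt. The helper names (`top_k_artifacts`, `build_prompt`), the prompt layout, and the toy vectors are hypothetical; real vectors would be produced by encoding the query and the artifacts with this model.

```python
import math


def _cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_artifacts(query_vec, artifact_vecs, artifact_texts, k=2):
    """Return the k (score, text) pairs most similar to the query, best first."""
    scored = [(_cosine(query_vec, v), t) for v, t in zip(artifact_vecs, artifact_texts)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def build_prompt(question, retrieved):
    """Format retrieved artifacts as context lines ahead of the question."""
    context = "\n".join(f"- {text}" for _, text in retrieved)
    return f"Context artifacts:\n{context}\n\nQuestion: {question}"


# Hypothetical 2-dimensional embeddings for three artifacts and one query.
artifact_texts = [
    "A table view should be provided to display all project artifacts.",
    "The system should export audit logs.",
    "The system should be able to generate documentation for a set of artifacts.",
]
artifact_vecs = [[0.9, 0.1], [0.0, 1.0], [0.5, 0.5]]
query_vec = [1.0, 0.0]

retrieved = top_k_artifacts(query_vec, artifact_vecs, artifact_texts, k=2)
prompt = build_prompt("Which requirements cover displaying artifacts?", retrieved)
```

Keeping `k` small limits prompt length; the right value depends on the LLM's context window and on how many artifacts are typically relevant to one query.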

Out-of-Scope Use

This model could be used as a good set of starting weights for requirements classification.

Bias, Risks, and Limitations

This model was trained on open source git data, which can be inaccurate and lead to unexpected results.

Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

How to Get Started with the Model

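# Assumption: the model loads with the sentence-transformers library and
# cosine_similarity comes from scikit-learn.
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

# Placeholder: replace with this repository's model ID or a local path.
model = SentenceTransformer("path/or/model-id")

texts = [
    "Display Artifacts",  # parent artifact
    "A table view should be provided to display all project artifacts.",  # child 1
    "The system should be able to generate documentation for a set of artifacts.",  # child 2
]
embeddings = model.encode(texts, convert_to_tensor=False)

# Keep the parent as a 2D slice so both inputs are matrices.
parent_embedding = embeddings[0:1]
children_embeddings = embeddings[1:]

# Compute cosine similarity between the parent and each child (shape: 1 x 2)
sim_matrix = cosine_similarity(parent_embedding, children_embeddings)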

Training, Evaluation, and Results Details

Please see the cited paper for more information on the training method, evaluation, and results.

Model size: 125M params (Safetensors), tensor type F32