|
--- |
|
base_model: Snowflake/snowflake-arctic-embed-m |
|
datasets: [] |
|
language: |
|
- en |
|
library_name: sentence-transformers |
|
license: apache-2.0 |
|
metrics: |
|
- cosine_accuracy@1 |
|
- cosine_accuracy@3 |
|
- cosine_accuracy@5 |
|
- cosine_accuracy@10 |
|
- cosine_precision@1 |
|
- cosine_precision@3 |
|
- cosine_precision@5 |
|
- cosine_precision@10 |
|
- cosine_recall@1 |
|
- cosine_recall@3 |
|
- cosine_recall@5 |
|
- cosine_recall@10 |
|
- cosine_ndcg@10 |
|
- cosine_mrr@10 |
|
- cosine_map@100 |
|
pipeline_tag: sentence-similarity |
|
tags: |
|
- sentence-transformers |
|
- sentence-similarity |
|
- feature-extraction |
|
- generated_from_trainer |
|
- dataset_size:1490 |
|
- loss:MatryoshkaLoss |
|
- loss:MultipleNegativesRankingLoss |
|
widget: |
|
- source_sentence: What is the error message related to the blob-container for the |
|
azure-generic subscription in ZenML? |
|
sentences: |
|
- '─────────────────────────────────────────────────┨┃ 🇦 azure-generic │ ZenML |
|
Subscription ┃ |
|
|
|
|
|
┠───────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┨ |
|
|
|
|
|
┃ 📦 blob-container │ 💥 error: connector authorization failure: the ''access-token'' |
|
authentication method is not supported for blob storage resources ┃ |
|
|
|
|
|
┠───────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┨ |
|
|
|
|
|
┃ 🌀 kubernetes-cluster │ demo-zenml-demos/demo-zenml-terraform-cluster ┃ |
|
|
|
|
|
┠───────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┨ |
|
|
|
|
|
┃ 🐳 docker-registry │ demozenmlcontainerregistry.azurecr.io ┃ |
|
|
|
|
|
┗━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛ |
|
|
|
|
|
zenml service-connector describe azure-session-token |
|
|
|
|
|
Example Command Output |
|
|
|
|
|
Service connector ''azure-session-token'' of type ''azure'' with id ''94d64103-9902-4aa5-8ce4-877061af89af'' |
|
is owned by user ''default'' and is ''private''. |
|
|
|
|
|
''azure-session-token'' azure Service Connector Details |
|
|
|
|
|
┏━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ |
|
|
|
|
|
┃ PROPERTY │ VALUE ┃ |
|
|
|
|
|
┠──────────────────┼────────────────────────────────────────────────────────────────────────────────┨ |
|
|
|
|
|
┃ ID │ 94d64103-9902-4aa5-8ce4-877061af89af ┃' |
|
- '🪆Use the Model Control Plane |
|
|
|
|
|
A Model is simply an entity that groups pipelines, artifacts, metadata, and other |
|
crucial business data into a unified entity. A ZenML Model is a concept that more |
|
broadly encapsulates your ML product''s business logic. You may even think of a |
|
ZenML Model as a "project" or a "workspace" |
|
|
|
|
|
Please note that one of the most common artifacts that is associated with a Model |
|
in ZenML is the so-called technical model, which is the actual model file/files |
|
that hold the weights and parameters of a machine learning training result. However, |
|
this is not the only artifact that is relevant; artifacts such as the training |
|
data and the predictions this model produces in production are also linked inside |
|
a ZenML Model. |
|
|
|
|
|
Models are first-class citizens in ZenML and as such viewing and using them is |
|
unified and centralized in the ZenML API, client as well as on the ZenML Pro dashboard. |
|
|
|
|
|
A Model captures lineage information and more. Within a Model, different Model |
|
versions can be staged. For example, you can rely on your predictions at a specific |
|
stage, like Production, and decide whether the Model version should be promoted |
|
based on your business rules during training. Plus, accessing data from other |
|
Models and their versions is just as simple. |
|
|
|
|
|
The Model Control Plane is how you manage your models through this unified interface. |
|
It allows you to combine the logic of your pipelines, artifacts and crucial business |
|
data along with the actual ''technical model''. |
|
|
|
|
|
To see an end-to-end example, please refer to the starter guide.' |
|
- 'turns: |
|
|
|
|
|
The Docker image repo digest or name. |
|
|
|
|
|
"""This is a slimmed-down version of the base implementation which aims to highlight |
|
the abstraction layer. In order to see the full implementation and get the complete |
|
docstrings, please check the source code on GitHub . |
|
|
|
|
|
Build your own custom image builder |
|
|
|
|
|
If you want to create your own custom flavor for an image builder, you can follow |
|
the following steps: |
|
|
|
|
|
Create a class that inherits from the BaseImageBuilder class and implement the |
|
abstract build method. This method should use the given build context and build |
|
a Docker image with it. If additionally a container registry is passed to the |
|
build method, the image builder is also responsible for pushing the image there. |
|
|
|
|
|
If you need to provide any configuration, create a class that inherits from the |
|
BaseImageBuilderConfig class and adds your configuration parameters. |
|
|
|
|
|
Bring both the implementation and the configuration together by inheriting from |
|
the BaseImageBuilderFlavor class. Make sure that you give a name to the flavor |
|
through its abstract property. |
|
|
|
|
|
Once you are done with the implementation, you can register it through the CLI. |
|
Please ensure you point to the flavor class via dot notation: |
|
|
|
|
|
zenml image-builder flavor register <path.to.MyImageBuilderFlavor> |
|
|
|
|
|
For example, if your flavor class MyImageBuilderFlavor is defined in flavors/my_flavor.py, |
|
you''d register it by doing: |
|
|
|
|
|
zenml image-builder flavor register flavors.my_flavor.MyImageBuilderFlavor |
|
|
|
|
|
ZenML resolves the flavor class by taking the path where you initialized zenml |
|
(via zenml init) as the starting point of resolution. Therefore, please ensure |
|
you follow the best practice of initializing zenml at the root of your repository. |
|
|
|
|
|
If ZenML does not find an initialized ZenML repository in any parent directory, |
|
it will default to the current working directory, but usually it''s better to |
|
not have to rely on this mechanism, and initialize zenml at the root. |
|
|
|
|
|
Afterward, you should see the new flavor in the list of available flavors:' |
|
- source_sentence: Where can I find more information on configuring the Spark step |
|
operator in ZenML? |
|
sentences: |
|
- 'upplied a custom value while creating the cluster. Run the following command. |
|
|
|
aws eks update-kubeconfig --name <NAME> --region <REGION> |
|
|
|
|
|
Get the name of the deployed cluster. |
|
|
|
|
|
zenml stack recipe output gke-cluster-name\ |
|
|
|
|
|
Figure out the region that the cluster is deployed to. By default, the region |
|
is set to europe-west1, which you should use in the next step if you haven''t |
|
supplied a custom value while creating the cluster.\ |
|
|
|
|
|
Figure out the project that the cluster is deployed to. You must have passed in |
|
a project ID while creating a GCP resource for the first time.\ |
|
|
|
|
|
Run the following command. |
|
|
|
gcloud container clusters get-credentials <NAME> --region <REGION> --project <PROJECT_ID> |
|
|
|
|
|
You may already have your kubectl client configured with your cluster. Check by |
|
running kubectl get nodes before proceeding. |
|
|
|
|
|
Get the name of the deployed cluster. |
|
|
|
|
|
zenml stack recipe output k3d-cluster-name\ |
|
|
|
|
|
Set the KUBECONFIG env variable to the kubeconfig file from the cluster. |
|
|
|
|
|
export KUBECONFIG=$(k3d kubeconfig get <NAME>)\ |
|
|
|
|
|
You can now use the kubectl client to talk to the cluster. |
|
|
|
|
|
Stack Recipe Deploy |
|
|
|
|
|
The steps for the stack recipe case should be the same as the ones listed above. |
|
The only difference that you need to take into account is the name of the outputs |
|
that contain your cluster name and the default regions. |
|
|
|
|
|
Each recipe might have its own values and here''s how you can ascertain those |
|
values. |
|
|
|
|
|
For the cluster name, go into the outputs.tf file in the root directory and search |
|
for the output that exposes the cluster name. |
|
|
|
|
|
For the region, check out the variables.tf or the locals.tf file for the default |
|
value assigned to it.' |
|
- 'ettings to specify AzureML step operator settings. Difference between stack component |
|
settings at registration-time vs real-time |
|
|
|
|
|
For stack-component-specific settings, you might be wondering what the difference |
|
is between these and the configuration passed in while doing zenml stack-component |
|
register <NAME> --config1=configvalue --config2=configvalue, etc. The answer is |
|
that the configuration passed in at registration time is static and fixed throughout |
|
all pipeline runs, while the settings can change. |
|
|
|
|
|
A good example of this is the MLflow Experiment Tracker, where configuration which |
|
remains static such as the tracking_url is sent through at registration time, |
|
while runtime configuration such as the experiment_name (which might change every |
|
pipeline run) is sent through as runtime settings. |
|
|
|
|
|
Even though settings can be overridden at runtime, you can also specify default |
|
values for settings while configuring a stack component. For example, you could |
|
set a default value for the nested setting of your MLflow experiment tracker: |
|
zenml experiment-tracker register <NAME> --flavor=mlflow --nested=True |
|
|
|
|
|
This means that all pipelines that run using this experiment tracker use nested |
|
MLflow runs unless overridden by specifying settings for the pipeline at runtime. |
|
|
|
|
|
Using the right key for Stack-component-specific settings |
|
|
|
|
|
When specifying stack-component-specific settings, a key needs to be passed. This |
|
key should always correspond to the pattern: <COMPONENT_CATEGORY>.<COMPONENT_FLAVOR> |
|
|
|
|
|
For example, the SagemakerStepOperator supports passing in estimator_args. The |
|
way to specify this would be to use the key step_operator.sagemaker |
|
|
|
|
|
@step(step_operator="nameofstepoperator", settings= {"step_operator.sagemaker": |
|
{"estimator_args": {"instance_type": "m7g.medium"}}}) |
|
|
|
|
|
def my_step(): |
|
|
|
|
|
... |
|
|
|
|
|
# Using the class |
|
|
|
|
|
@step(step_operator="nameofstepoperator", settings= {"step_operator.sagemaker": |
|
SagemakerStepOperatorSettings(instance_type="m7g.medium")}) |
|
|
|
|
|
def my_step(): |
|
|
|
|
|
... |
|
|
|
|
|
or in YAML: |
|
|
|
|
|
steps: |
|
|
|
|
|
my_step:' |
|
- '_operator |
|
|
|
|
|
@step(step_operator=step_operator.name) |
|
|
|
|
|
def step_on_spark(...) -> ...: |
|
|
|
|
|
... |
|
|
|
|
|
Additional configuration |
|
|
|
|
|
For additional configuration of the Spark step operator, you can pass SparkStepOperatorSettings |
|
when defining or running your pipeline. Check out the SDK docs for a full list |
|
of available attributes and this docs page for more information on how to specify |
|
settings.' |
|
- source_sentence: How can I register an Azure Service Connector for an ACR registry |
|
in ZenML using the CLI? |
|
sentences: |
|
- 'ure Container Registry to the remote ACR registry. To set up the Azure Container |
|
Registry to authenticate to Azure and access an ACR registry, it is recommended |
|
to leverage the many features provided by the Azure Service Connector such as |
|
auto-configuration, local login, best security practices regarding long-lived |
|
credentials and reusing the same credentials across multiple stack components. |
|
|
|
|
|
If you don''t already have an Azure Service Connector configured in your ZenML |
|
deployment, you can register one using the interactive CLI command. You have the |
|
option to configure an Azure Service Connector that can be used to access an ACR |
|
registry or even more than one type of Azure resource: |
|
|
|
|
|
zenml service-connector register --type azure -i |
|
|
|
|
|
A non-interactive CLI example that uses Azure Service Principal credentials to |
|
configure an Azure Service Connector targeting a single ACR registry is: |
|
|
|
|
|
zenml service-connector register <CONNECTOR_NAME> --type azure --auth-method service-principal |
|
--tenant_id=<AZURE_TENANT_ID> --client_id=<AZURE_CLIENT_ID> --client_secret=<AZURE_CLIENT_SECRET> |
|
--resource-type docker-registry --resource-id <REGISTRY_URI> |
|
|
|
|
|
Example Command Output |
|
|
|
|
|
$ zenml service-connector register azure-demo --type azure --auth-method service-principal |
|
--tenant_id=a79f3633-8f45-4a74-a42e-68871c17b7fb --client_id=8926254a-8c3f-430a-a2fd-bdab234d491e |
|
--client_secret=AzureSuperSecret --resource-type docker-registry --resource-id |
|
demozenmlcontainerregistry.azurecr.io |
|
|
|
|
|
⠸ Registering service connector ''azure-demo''... |
|
|
|
|
|
Successfully registered service connector `azure-demo` with access to the following |
|
resources: |
|
|
|
|
|
┏━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ |
|
|
|
|
|
┃ RESOURCE TYPE │ RESOURCE NAMES ┃ |
|
|
|
|
|
┠────────────────────┼───────────────────────────────────────┨ |
|
|
|
|
|
┃ 🐳 docker-registry │ demozenmlcontainerregistry.azurecr.io ┃ |
|
|
|
|
|
┗━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛' |
|
- 'Default Container Registry |
|
|
|
|
|
Storing container images locally. |
|
|
|
|
|
The Default container registry is a container registry flavor that comes built-in |
|
with ZenML and allows container registry URIs of any format. |
|
|
|
|
|
When to use it |
|
|
|
|
|
You should use the Default container registry if you want to use a local container |
|
registry or when using a remote container registry that is not covered by other |
|
container registry flavors. |
|
|
|
|
|
Local registry URI format |
|
|
|
|
|
To specify a URI for a local container registry, use the following format: |
|
|
|
|
|
localhost:<PORT> |
|
|
|
|
|
# Examples: |
|
|
|
|
|
localhost:5000 |
|
|
|
|
|
localhost:8000 |
|
|
|
|
|
localhost:9999 |
|
|
|
|
|
How to use it |
|
|
|
|
|
To use the Default container registry, we need: |
|
|
|
|
|
Docker installed and running. |
|
|
|
|
|
The registry URI. If you''re using a local container registry, check out |
|
|
|
|
|
the previous section on the URI format. |
|
|
|
|
|
We can then register the container registry and use it in our active stack: |
|
|
|
|
|
zenml container-registry register <NAME> \ |
|
|
|
|
|
--flavor=default \ |
|
|
|
|
|
--uri=<REGISTRY_URI> |
|
|
|
|
|
# Add the container registry to the active stack |
|
|
|
|
|
zenml stack update -c <NAME> |
|
|
|
|
|
You may also need to set up authentication required to log in to the container |
|
registry. |
|
|
|
|
|
Authentication Methods |
|
|
|
|
|
If you are using a private container registry, you will need to configure some |
|
form of authentication to login to the registry. If you''re looking for a quick |
|
way to get started locally, you can use the Local Authentication method. However, |
|
the recommended way to authenticate to a remote private container registry is |
|
through a Docker Service Connector. |
|
|
|
|
|
If your target private container registry comes from a cloud provider like AWS, |
|
GCP or Azure, you should use the container registry flavor targeted at that cloud |
|
provider. For example, if you''re using AWS, you should use the AWS Container |
|
Registry flavor. These cloud provider flavors also use specialized cloud provider |
|
Service Connectors to authenticate to the container registry.' |
|
- 'egister gcp-demo-multi --type gcp --auto-configure |
|
|
|
|
|
Example Command Output |
|
|
|
|
|
```text |
|
|
|
|
|
Successfully registered service connector `gcp-demo-multi` with access to the |
|
following resources: |
|
|
|
|
|
┏━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ |
|
|
|
|
|
┃ RESOURCE TYPE │ RESOURCE NAMES ┃ |
|
|
|
|
|
┠───────────────────────┼─────────────────────────────────────────────────┨ |
|
|
|
|
|
┃ 🔵 gcp-generic │ zenml-core ┃ |
|
|
|
|
|
┠───────────────────────┼─────────────────────────────────────────────────┨ |
|
|
|
|
|
┃ 📦 gcs-bucket │ gs://zenml-bucket-sl ┃ |
|
|
|
|
|
┃ │ gs://zenml-core.appspot.com ┃ |
|
|
|
|
|
┃ │ gs://zenml-core_cloudbuild ┃ |
|
|
|
|
|
┃ │ gs://zenml-datasets ┃ |
|
|
|
|
|
┠───────────────────────┼─────────────────────────────────────────────────┨ |
|
|
|
|
|
┃ 🌀 kubernetes-cluster │ zenml-test-cluster ┃ |
|
|
|
|
|
┠───────────────────────┼─────────────────────────────────────────────────┨ |
|
|
|
|
|
┃ 🐳 docker-registry │ gcr.io/zenml-core ┃ |
|
|
|
|
|
┗━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛ |
|
|
|
|
|
``` |
|
|
|
|
|
**NOTE**: from this point forward, we don''t need the local GCP CLI credentials |
|
or the local GCP CLI at all. The steps that follow can be run on any machine regardless |
|
of whether it has been configured and authorized to access the GCP project. |
|
|
|
|
|
4. Find out which GCS buckets, GCR registries, and GKE Kubernetes clusters we |
|
can gain access to. We''ll use this information to configure the Stack Components |
|
in our minimal GCP stack: a GCS Artifact Store, a Kubernetes Orchestrator, and |
|
a GCP Container Registry. |
|
|
|
|
|
```sh |
|
|
|
|
|
zenml service-connector list-resources --resource-type gcs-bucket |
|
|
|
|
|
``` |
|
|
|
|
|
Example Command Output |
|
|
|
|
|
```text |
|
|
|
|
|
The following ''gcs-bucket'' resources can be accessed by service connectors configured |
|
in your workspace:' |
|
- source_sentence: What resources does the `gcp-demo-multi` service connector have |
|
access to after registration? |
|
sentences: |
|
- 'Find out which configuration was used for a run |
|
|
|
|
|
Sometimes you might want to extract the used configuration from a pipeline that |
|
has already run. You can do this simply by loading the pipeline run and accessing |
|
its config attribute. |
|
|
|
|
|
from zenml.client import Client |
|
|
|
|
|
pipeline_run = Client().get_pipeline_run("<PIPELINE_RUN_NAME>") |
|
|
|
|
|
configuration = pipeline_run.config' |
|
- 'onfig class and add your configuration parameters. Bring both the implementation |
|
and the configuration together by inheriting from the BaseModelDeployerFlavor |
|
class. Make sure that you give a name to the flavor through its abstract property. |
|
|
|
|
|
Create a service class that inherits from the BaseService class and implements |
|
the abstract methods. This class will be used to represent the deployed model |
|
server in ZenML. |
|
|
|
|
|
Once you are done with the implementation, you can register it through the CLI. |
|
Please ensure you point to the flavor class via dot notation: |
|
|
|
|
|
zenml model-deployer flavor register <path.to.MyModelDeployerFlavor> |
|
|
|
|
|
For example, if your flavor class MyModelDeployerFlavor is defined in flavors/my_flavor.py, |
|
you''d register it by doing: |
|
|
|
|
|
zenml model-deployer flavor register flavors.my_flavor.MyModelDeployerFlavor |
|
|
|
|
|
ZenML resolves the flavor class by taking the path where you initialized zenml |
|
(via zenml init) as the starting point of resolution. Therefore, please ensure |
|
you follow the best practice of initializing zenml at the root of your repository. |
|
|
|
|
|
If ZenML does not find an initialized ZenML repository in any parent directory, |
|
it will default to the current working directory, but usually, it''s better to |
|
not have to rely on this mechanism and initialize zenml at the root. |
|
|
|
|
|
Afterward, you should see the new flavor in the list of available flavors: |
|
|
|
|
|
zenml model-deployer flavor list |
|
|
|
|
|
It is important to draw attention to when and how these base abstractions are |
|
coming into play in a ZenML workflow. |
|
|
|
|
|
The CustomModelDeployerFlavor class is imported and utilized upon the creation |
|
of the custom flavor through the CLI. |
|
|
|
|
|
The CustomModelDeployerConfig class is imported when someone tries to register/update |
|
a stack component with this custom flavor. Especially, during the registration |
|
process of the stack component, the config will be used to validate the values |
|
given by the user. As Config objects are inherently pydantic objects, you can |
|
also add your own custom validators here.' |
|
- 'egister gcp-demo-multi --type gcp --auto-configure |
|
|
|
|
|
Example Command Output |
|
|
|
|
|
```text |
|
|
|
|
|
Successfully registered service connector `gcp-demo-multi` with access to the |
|
following resources: |
|
|
|
|
|
┏━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ |
|
|
|
|
|
┃ RESOURCE TYPE │ RESOURCE NAMES ┃ |
|
|
|
|
|
┠───────────────────────┼─────────────────────────────────────────────────┨ |
|
|
|
|
|
┃ 🔵 gcp-generic │ zenml-core ┃ |
|
|
|
|
|
┠───────────────────────┼─────────────────────────────────────────────────┨ |
|
|
|
|
|
┃ 📦 gcs-bucket │ gs://zenml-bucket-sl ┃ |
|
|
|
|
|
┃ │ gs://zenml-core.appspot.com ┃ |
|
|
|
|
|
┃ │ gs://zenml-core_cloudbuild ┃ |
|
|
|
|
|
┃ │ gs://zenml-datasets ┃ |
|
|
|
|
|
┠───────────────────────┼─────────────────────────────────────────────────┨ |
|
|
|
|
|
┃ 🌀 kubernetes-cluster │ zenml-test-cluster ┃ |
|
|
|
|
|
┠───────────────────────┼─────────────────────────────────────────────────┨ |
|
|
|
|
|
┃ 🐳 docker-registry │ gcr.io/zenml-core ┃ |
|
|
|
|
|
┗━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛ |
|
|
|
|
|
``` |
|
|
|
|
|
**NOTE**: from this point forward, we don''t need the local GCP CLI credentials |
|
or the local GCP CLI at all. The steps that follow can be run on any machine regardless |
|
of whether it has been configured and authorized to access the GCP project. |
|
|
|
|
|
4. Find out which GCS buckets, GCR registries, and GKE Kubernetes clusters we |
|
can gain access to. We''ll use this information to configure the Stack Components |
|
in our minimal GCP stack: a GCS Artifact Store, a Kubernetes Orchestrator, and |
|
a GCP Container Registry. |
|
|
|
|
|
```sh |
|
|
|
|
|
zenml service-connector list-resources --resource-type gcs-bucket |
|
|
|
|
|
``` |
|
|
|
|
|
Example Command Output |
|
|
|
|
|
```text |
|
|
|
|
|
The following ''gcs-bucket'' resources can be accessed by service connectors configured |
|
in your workspace:' |
|
- source_sentence: What is the result of executing a Deepchecks test suite in ZenML? |
|
sentences: |
|
- 'urns: |
|
|
|
|
|
Deepchecks test suite execution result |
|
|
|
|
|
"""# validation pre-processing (e.g. dataset preparation) can take place here |
|
|
|
|
|
data_validator = DeepchecksDataValidator.get_active_data_validator() |
|
|
|
|
|
suite = data_validator.data_validation( |
|
|
|
|
|
dataset=dataset, |
|
|
|
|
|
check_list=[ |
|
|
|
|
|
DeepchecksDataIntegrityCheck.TABULAR_OUTLIER_SAMPLE_DETECTION, |
|
|
|
|
|
DeepchecksDataIntegrityCheck.TABULAR_STRING_LENGTH_OUT_OF_BOUNDS, |
|
|
|
|
|
], |
|
|
|
|
|
# validation post-processing (e.g. interpret results, take actions) can happen |
|
here |
|
|
|
|
|
return suite |
|
|
|
|
|
The arguments that the Deepchecks Data Validator methods can take in are the same |
|
as those used for the Deepchecks standard steps. |
|
|
|
|
|
Have a look at the complete list of methods and parameters available in the DeepchecksDataValidator |
|
API in the SDK docs. |
|
|
|
|
|
Call Deepchecks directly |
|
|
|
|
|
You can use the Deepchecks library directly in your custom pipeline steps, and |
|
only leverage ZenML''s capability of serializing, versioning and storing the SuiteResult |
|
objects in its Artifact Store, e.g.: |
|
|
|
|
|
import pandas as pd |
|
|
|
|
|
import deepchecks.tabular.checks as tabular_checks |
|
|
|
|
|
from deepchecks.core.suite import SuiteResult |
|
|
|
|
|
from deepchecks.tabular import Suite |
|
|
|
|
|
from deepchecks.tabular import Dataset |
|
|
|
|
|
from zenml import step |
|
|
|
|
|
@step |
|
|
|
|
|
def data_integrity_check( |
|
|
|
|
|
dataset: pd.DataFrame, |
|
|
|
|
|
) -> SuiteResult: |
|
|
|
|
|
"""Custom data integrity check step with Deepchecks |
|
|
|
|
|
Args: |
|
|
|
|
|
dataset: a Pandas DataFrame |
|
|
|
|
|
Returns: |
|
|
|
|
|
Deepchecks test suite execution result |
|
|
|
|
|
""" |
|
|
|
|
|
# validation pre-processing (e.g. dataset preparation) can take place here |
|
|
|
|
|
train_dataset = Dataset( |
|
|
|
|
|
dataset, |
|
|
|
|
|
label=''class'', |
|
|
|
|
|
cat_features=[''country'', ''state''] |
|
|
|
|
|
suite = Suite(name="custom") |
|
|
|
|
|
check = tabular_checks.OutlierSampleDetection( |
|
|
|
|
|
nearest_neighbors_percent=0.01, |
|
|
|
|
|
extent_parameter=3, |
|
|
|
|
|
check.add_condition_outlier_ratio_less_or_equal( |
|
|
|
|
|
max_outliers_ratio=0.007, |
|
|
|
|
|
outlier_score_threshold=0.5, |
|
|
|
|
|
suite.add(check) |
|
|
|
|
|
check = tabular_checks.StringLengthOutOfBounds( |
|
|
|
|
|
num_percentiles=1000, |
|
|
|
|
|
min_unique_values=3, |
|
|
|
|
|
check.add_condition_number_of_outliers_less_or_equal( |
|
|
|
|
|
max_outliers=3,' |
|
- 'ervice-principal |
|
|
|
|
|
``` |
|
|
|
|
|
Example Command Output |
|
|
|
|
|
``` |
|
|
|
|
|
Successfully connected orchestrator `aks-demo-cluster` to the following resources: |
|
|
|
|
|
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ |
|
|
|
|
|
┃ CONNECTOR ID │ CONNECTOR NAME │ CONNECTOR TYPE |
|
│ RESOURCE TYPE │ RESOURCE NAMES ┃ |
|
|
|
|
|
┠──────────────────────────────────────┼─────────────────────────┼────────────────┼───────────────────────┼───────────────────────────────────────────────┨ |
|
|
|
|
|
┃ f2316191-d20b-4348-a68b-f5e347862196 │ azure-service-principal │ 🇦 azure │ |
|
🌀 kubernetes-cluster │ demo-zenml-demos/demo-zenml-terraform-cluster ┃ |
|
|
|
|
|
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛ |
|
|
|
|
|
``` |
|
|
|
|
|
Register and connect an Azure Container Registry Stack Component to an ACR container |
|
registry: zenml container-registry register acr-demo-registry --flavor azure |
|
--uri=demozenmlcontainerregistry.azurecr.io |
|
|
|
|
|
Example Command Output |
|
|
|
|
|
``` |
|
|
|
|
|
Successfully registered container_registry `acr-demo-registry`. |
|
|
|
|
|
``` |
|
|
|
|
|
```sh |
|
|
|
|
|
zenml container-registry connect acr-demo-registry --connector azure-service-principal |
|
|
|
|
|
``` |
|
|
|
|
|
Example Command Output |
|
|
|
|
|
``` |
|
|
|
|
|
Successfully connected container registry `acr-demo-registry` to the following |
|
resources: |
|
|
|
|
|
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ |
|
|
|
|
|
┃ CONNECTOR ID │ CONNECTOR NAME │ CONNECTOR TYPE |
|
│ RESOURCE TYPE │ RESOURCE NAMES ┃ |
|
|
|
|
|
┠──────────────────────────────────────┼─────────────────────────┼────────────────┼────────────────────┼───────────────────────────────────────┨ |
|
|
|
|
|
┃ f2316191-d20b-4348-a68b-f5e347862196 │ azure-service-principal │ 🇦 azure │ |
|
🐳 docker-registry │ demozenmlcontainerregistry.azurecr.io ┃' |
|
- 'r │ zenhacks-cluster ┃┠───────────────────────┼──────────────────────────────────────────────┨ |
|
|
|
|
|
┃ 🐳 docker-registry │ 715803424590.dkr.ecr.us-east-1.amazonaws.com ┃ |
|
|
|
|
|
┗━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛ |
|
|
|
|
|
The Service Connector configuration shows long-lived credentials were lifted from |
|
the local environment and the AWS Session Token authentication method was configured: |
|
|
|
|
|
zenml service-connector describe aws-session-token |
|
|
|
|
|
Example Command Output |
|
|
|
|
|
Service connector ''aws-session-token'' of type ''aws'' with id ''3ae3e595-5cbc-446e-be64-e54e854e0e3f'' |
|
is owned by user ''default'' and is ''private''. |
|
|
|
|
|
''aws-session-token'' aws Service Connector Details |
|
|
|
|
|
┏━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ |
|
|
|
|
|
┃ PROPERTY │ VALUE ┃ |
|
|
|
|
|
┠──────────────────┼─────────────────────────────────────────────────────────────────────────┨ |
|
|
|
|
|
┃ ID │ c0f8e857-47f9-418b-a60f-c3b03023da54 ┃ |
|
|
|
|
|
┠──────────────────┼─────────────────────────────────────────────────────────────────────────┨ |
|
|
|
|
|
┃ NAME │ aws-session-token ┃ |
|
|
|
|
|
┠──────────────────┼─────────────────────────────────────────────────────────────────────────┨ |
|
|
|
|
|
┃ TYPE │ 🔶 aws ┃ |
|
|
|
|
|
┠──────────────────┼─────────────────────────────────────────────────────────────────────────┨ |
|
|
|
|
|
┃ AUTH METHOD │ session-token ┃ |
|
|
|
|
|
┠──────────────────┼─────────────────────────────────────────────────────────────────────────┨ |
|
|
|
|
|
┃ RESOURCE TYPES │ 🔶 aws-generic, 📦 s3-bucket, 🌀 kubernetes-cluster, 🐳 docker-registry |
|
┃ |
|
|
|
|
|
┠──────────────────┼─────────────────────────────────────────────────────────────────────────┨' |
|
model-index: |
|
- name: zenml/finetuned-snowflake-arctic-embed-m |
|
results: |
|
- task: |
|
type: information-retrieval |
|
name: Information Retrieval |
|
dataset: |
|
name: dim 384 |
|
type: dim_384 |
|
metrics: |
|
- type: cosine_accuracy@1 |
|
value: 0.3614457831325301 |
|
name: Cosine Accuracy@1 |
|
- type: cosine_accuracy@3 |
|
value: 0.6987951807228916 |
|
name: Cosine Accuracy@3 |
|
- type: cosine_accuracy@5 |
|
value: 0.7530120481927711 |
|
name: Cosine Accuracy@5 |
|
- type: cosine_accuracy@10 |
|
value: 0.8554216867469879 |
|
name: Cosine Accuracy@10 |
|
- type: cosine_precision@1 |
|
value: 0.3614457831325301 |
|
name: Cosine Precision@1 |
|
- type: cosine_precision@3 |
|
value: 0.23293172690763048 |
|
name: Cosine Precision@3 |
|
- type: cosine_precision@5 |
|
value: 0.15060240963855417 |
|
name: Cosine Precision@5 |
|
- type: cosine_precision@10 |
|
value: 0.08554216867469877 |
|
name: Cosine Precision@10 |
|
- type: cosine_recall@1 |
|
value: 0.3614457831325301 |
|
name: Cosine Recall@1 |
|
- type: cosine_recall@3 |
|
value: 0.6987951807228916 |
|
name: Cosine Recall@3 |
|
- type: cosine_recall@5 |
|
value: 0.7530120481927711 |
|
name: Cosine Recall@5 |
|
- type: cosine_recall@10 |
|
value: 0.8554216867469879 |
|
name: Cosine Recall@10 |
|
- type: cosine_ndcg@10 |
|
value: 0.6194049451779184 |
|
name: Cosine Ndcg@10 |
|
- type: cosine_mrr@10 |
|
value: 0.5427878179384205 |
|
name: Cosine Mrr@10 |
|
- type: cosine_map@100 |
|
value: 0.5472907234693755 |
|
name: Cosine Map@100 |
|
- task: |
|
type: information-retrieval |
|
name: Information Retrieval |
|
dataset: |
|
name: dim 256 |
|
type: dim_256 |
|
metrics: |
|
- type: cosine_accuracy@1 |
|
value: 0.3433734939759036 |
|
name: Cosine Accuracy@1 |
|
- type: cosine_accuracy@3 |
|
value: 0.6807228915662651 |
|
name: Cosine Accuracy@3 |
|
- type: cosine_accuracy@5 |
|
value: 0.7650602409638554 |
|
name: Cosine Accuracy@5 |
|
- type: cosine_accuracy@10 |
|
value: 0.8373493975903614 |
|
name: Cosine Accuracy@10 |
|
- type: cosine_precision@1 |
|
value: 0.3433734939759036 |
|
name: Cosine Precision@1 |
|
- type: cosine_precision@3 |
|
value: 0.2269076305220883 |
|
name: Cosine Precision@3 |
|
- type: cosine_precision@5 |
|
value: 0.15301204819277103 |
|
name: Cosine Precision@5 |
|
- type: cosine_precision@10 |
|
value: 0.08373493975903612 |
|
name: Cosine Precision@10 |
|
- type: cosine_recall@1 |
|
value: 0.3433734939759036 |
|
name: Cosine Recall@1 |
|
- type: cosine_recall@3 |
|
value: 0.6807228915662651 |
|
name: Cosine Recall@3 |
|
- type: cosine_recall@5 |
|
value: 0.7650602409638554 |
|
name: Cosine Recall@5 |
|
- type: cosine_recall@10 |
|
value: 0.8373493975903614 |
|
name: Cosine Recall@10 |
|
- type: cosine_ndcg@10 |
|
value: 0.602546157610675 |
|
name: Cosine Ndcg@10 |
|
- type: cosine_mrr@10 |
|
value: 0.525891661885638 |
|
name: Cosine Mrr@10 |
|
- type: cosine_map@100 |
|
value: 0.5310273317942533 |
|
name: Cosine Map@100 |
|
- task: |
|
type: information-retrieval |
|
name: Information Retrieval |
|
dataset: |
|
name: dim 128 |
|
type: dim_128 |
|
metrics: |
|
- type: cosine_accuracy@1 |
|
value: 0.3132530120481928 |
|
name: Cosine Accuracy@1 |
|
- type: cosine_accuracy@3 |
|
value: 0.6265060240963856 |
|
name: Cosine Accuracy@3 |
|
- type: cosine_accuracy@5 |
|
value: 0.7168674698795181 |
|
name: Cosine Accuracy@5 |
|
- type: cosine_accuracy@10 |
|
value: 0.7891566265060241 |
|
name: Cosine Accuracy@10 |
|
- type: cosine_precision@1 |
|
value: 0.3132530120481928 |
|
name: Cosine Precision@1 |
|
- type: cosine_precision@3 |
|
value: 0.20883534136546178 |
|
name: Cosine Precision@3 |
|
- type: cosine_precision@5 |
|
value: 0.1433734939759036 |
|
name: Cosine Precision@5 |
|
- type: cosine_precision@10 |
|
value: 0.0789156626506024 |
|
name: Cosine Precision@10 |
|
- type: cosine_recall@1 |
|
value: 0.3132530120481928 |
|
name: Cosine Recall@1 |
|
- type: cosine_recall@3 |
|
value: 0.6265060240963856 |
|
name: Cosine Recall@3 |
|
- type: cosine_recall@5 |
|
value: 0.7168674698795181 |
|
name: Cosine Recall@5 |
|
- type: cosine_recall@10 |
|
value: 0.7891566265060241 |
|
name: Cosine Recall@10 |
|
- type: cosine_ndcg@10 |
|
value: 0.5630057581169484 |
|
name: Cosine Ndcg@10 |
|
- type: cosine_mrr@10 |
|
value: 0.4893144004589788 |
|
name: Cosine Mrr@10 |
|
- type: cosine_map@100 |
|
value: 0.4960510164414996 |
|
name: Cosine Map@100 |
|
- task: |
|
type: information-retrieval |
|
name: Information Retrieval |
|
dataset: |
|
name: dim 64 |
|
type: dim_64 |
|
metrics: |
|
- type: cosine_accuracy@1 |
|
value: 0.25903614457831325 |
|
name: Cosine Accuracy@1 |
|
- type: cosine_accuracy@3 |
|
value: 0.5120481927710844 |
|
name: Cosine Accuracy@3 |
|
- type: cosine_accuracy@5 |
|
value: 0.6325301204819277 |
|
name: Cosine Accuracy@5 |
|
- type: cosine_accuracy@10 |
|
value: 0.7168674698795181 |
|
name: Cosine Accuracy@10 |
|
- type: cosine_precision@1 |
|
value: 0.25903614457831325 |
|
name: Cosine Precision@1 |
|
- type: cosine_precision@3 |
|
value: 0.17068273092369476 |
|
name: Cosine Precision@3 |
|
- type: cosine_precision@5 |
|
value: 0.12650602409638553 |
|
name: Cosine Precision@5 |
|
- type: cosine_precision@10 |
|
value: 0.07168674698795179 |
|
name: Cosine Precision@10 |
|
- type: cosine_recall@1 |
|
value: 0.25903614457831325 |
|
name: Cosine Recall@1 |
|
- type: cosine_recall@3 |
|
value: 0.5120481927710844 |
|
name: Cosine Recall@3 |
|
- type: cosine_recall@5 |
|
value: 0.6325301204819277 |
|
name: Cosine Recall@5 |
|
- type: cosine_recall@10 |
|
value: 0.7168674698795181 |
|
name: Cosine Recall@10 |
|
- type: cosine_ndcg@10 |
|
value: 0.48618223058871674 |
|
name: Cosine Ndcg@10 |
|
- type: cosine_mrr@10 |
|
value: 0.41233027347485207 |
|
name: Cosine Mrr@10 |
|
- type: cosine_map@100 |
|
value: 0.42094598177412385 |
|
name: Cosine Map@100 |
|
--- |
|
|
|
# zenml/finetuned-snowflake-arctic-embed-m |
|
|
|
This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [Snowflake/snowflake-arctic-embed-m](https://huggingface.co/Snowflake/snowflake-arctic-embed-m). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more. |
|
|
|
## Model Details |
|
|
|
### Model Description |
|
- **Model Type:** Sentence Transformer |
|
- **Base model:** [Snowflake/snowflake-arctic-embed-m](https://huggingface.co/Snowflake/snowflake-arctic-embed-m) <!-- at revision 71bc94c8f9ea1e54fba11167004205a65e5da2cc --> |
|
- **Maximum Sequence Length:** 512 tokens |
|
- **Output Dimensionality:** 768 dimensions
|
- **Similarity Function:** Cosine Similarity |
|
<!-- - **Training Dataset:** Unknown --> |
|
- **Language:** en |
|
- **License:** apache-2.0 |
|
|
|
### Model Sources |
|
|
|
- **Documentation:** [Sentence Transformers Documentation](https://sbert.net) |
|
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers) |
|
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers) |
|
|
|
### Full Model Architecture |
|
|
|
``` |
|
SentenceTransformer( |
|
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel |
|
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True}) |
|
(2): Normalize() |
|
) |
|
``` |
|
|
|
## Usage |
|
|
|
### Direct Usage (Sentence Transformers) |
|
|
|
First install the Sentence Transformers library: |
|
|
|
```bash |
|
pip install -U sentence-transformers |
|
``` |
|
|
|
Then you can load this model and run inference. |
|
```python |
|
from sentence_transformers import SentenceTransformer |
|
|
|
# Download from the 🤗 Hub |
|
model = SentenceTransformer("zenml/finetuned-snowflake-arctic-embed-m") |
|
# Run inference |
|
sentences = [ |
|
'What is the result of executing a Deepchecks test suite in ZenML?', |
|
'urns:\n\nDeepchecks test suite execution result\n\n"""# validation pre-processing (e.g. dataset preparation) can take place here\n\ndata_validator = DeepchecksDataValidator.get_active_data_validator()\n\nsuite = data_validator.data_validation(\n\ndataset=dataset,\n\ncheck_list=[\n\nDeepchecksDataIntegrityCheck.TABULAR_OUTLIER_SAMPLE_DETECTION,\n\nDeepchecksDataIntegrityCheck.TABULAR_STRING_LENGTH_OUT_OF_BOUNDS,\n\n],\n\n# validation post-processing (e.g. interpret results, take actions) can happen here\n\nreturn suite\n\nThe arguments that the Deepchecks Data Validator methods can take in are the same as those used for the Deepchecks standard steps.\n\nHave a look at the complete list of methods and parameters available in the DeepchecksDataValidator API in the SDK docs.\n\nCall Deepchecks directly\n\nYou can use the Deepchecks library directly in your custom pipeline steps, and only leverage ZenML\'s capability of serializing, versioning and storing the SuiteResult objects in its Artifact Store, e.g.:\n\nimport pandas as pd\n\nimport deepchecks.tabular.checks as tabular_checks\n\nfrom deepchecks.core.suite import SuiteResult\n\nfrom deepchecks.tabular import Suite\n\nfrom deepchecks.tabular import Dataset\n\nfrom zenml import step\n\n@step\n\ndef data_integrity_check(\n\ndataset: pd.DataFrame,\n\n) -> SuiteResult:\n\n"""Custom data integrity check step with Deepchecks\n\nArgs:\n\ndataset: a Pandas DataFrame\n\nReturns:\n\nDeepchecks test suite execution result\n\n"""\n\n# validation pre-processing (e.g. dataset preparation) can take place here\n\ntrain_dataset = Dataset(\n\ndataset,\n\nlabel=\'class\',\n\ncat_features=[\'country\', \'state\']\n\nsuite = Suite(name="custom")\n\ncheck = tabular_checks.OutlierSampleDetection(\n\nnearest_neighbors_percent=0.01,\n\nextent_parameter=3,\n\ncheck.add_condition_outlier_ratio_less_or_equal(\n\nmax_outliers_ratio=0.007,\n\noutlier_score_threshold=0.5,\n\nsuite.add(check)\n\ncheck = tabular_checks.StringLengthOutOfBounds(\n\nnum_percentiles=1000,\n\nmin_unique_values=3,\n\ncheck.add_condition_number_of_outliers_less_or_equal(\n\nmax_outliers=3,',
|
"r │ zenhacks-cluster ┃┠───────────────────────┼──────────────────────────────────────────────┨\n\n┃ 🐳 docker-registry │ 715803424590.dkr.ecr.us-east-1.amazonaws.com ┃\n\n┗━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛\n\nThe Service Connector configuration shows long-lived credentials were lifted from the local environment and the AWS Session Token authentication method was configured:\n\nzenml service-connector describe aws-session-token\n\nExample Command Output\n\nService connector 'aws-session-token' of type 'aws' with id '3ae3e595-5cbc-446e-be64-e54e854e0e3f' is owned by user 'default' and is 'private'.\n\n'aws-session-token' aws Service Connector Details\n\n┏━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓\n\n┃ PROPERTY │ VALUE ┃\n\n┠──────────────────┼─────────────────────────────────────────────────────────────────────────┨\n\n┃ ID │ c0f8e857-47f9-418b-a60f-c3b03023da54 ┃\n\n┠──────────────────┼─────────────────────────────────────────────────────────────────────────┨\n\n┃ NAME │ aws-session-token ┃\n\n┠──────────────────┼─────────────────────────────────────────────────────────────────────────┨\n\n┃ TYPE │ 🔶 aws ┃\n\n┠──────────────────┼─────────────────────────────────────────────────────────────────────────┨\n\n┃ AUTH METHOD │ session-token ┃\n\n┠──────────────────┼─────────────────────────────────────────────────────────────────────────┨\n\n┃ RESOURCE TYPES │ 🔶 aws-generic, 📦 s3-bucket, 🌀 kubernetes-cluster, 🐳 docker-registry ┃\n\n┠──────────────────┼─────────────────────────────────────────────────────────────────────────┨", |
|
] |
|
embeddings = model.encode(sentences) |
|
print(embeddings.shape) |
|
# [3, 768] |
|
|
|
# Get the similarity scores for the embeddings |
|
similarities = model.similarity(embeddings, embeddings) |
|
print(similarities.shape) |
|
# [3, 3] |
|
``` |
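
Because the model was trained with MatryoshkaLoss at 384, 256, 128 and 64 dimensions, the leading components of each embedding are intended to remain useful on their own. The following is a minimal sketch of that idea in plain Python, assuming you already have full embeddings as lists of floats: keep the first `dim` values and re-normalize before comparing. The helper names (`truncate_and_normalize`, `cosine`) and the toy vectors are illustrative and not part of the library.

```python
import math


def truncate_and_normalize(vector, dim):
    """Keep the first `dim` components and rescale to unit length."""
    head = list(vector[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # avoid dividing by zero for an all-zero prefix
    return [x / norm for x in head]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 8-dimensional "embeddings" standing in for real 768-dimensional ones.
query = [0.9, 0.1, 0.3, 0.0, 0.2, 0.1, 0.0, 0.1]
doc = [0.8, 0.2, 0.4, 0.1, 0.0, 0.3, 0.1, 0.0]

for dim in (8, 4, 2):
    q = truncate_and_normalize(query, dim)
    d = truncate_and_normalize(doc, dim)
    print(dim, round(cosine(q, d), 4))
```

Recent Sentence Transformers releases accept a `truncate_dim` argument when loading a model, which should apply the same truncation for you; check the documentation for your installed version before relying on it.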
|
|
|
<!-- |
|
### Direct Usage (Transformers) |
|
|
|
<details><summary>Click to see the direct usage in Transformers</summary> |
|
|
|
</details> |
|
--> |
|
|
|
<!-- |
|
### Downstream Usage (Sentence Transformers) |
|
|
|
You can finetune this model on your own dataset. |
|
|
|
<details><summary>Click to expand</summary> |
|
|
|
</details> |
|
--> |
|
|
|
<!-- |
|
### Out-of-Scope Use |
|
|
|
*List how the model may foreseeably be misused and address what users ought not to do with the model.* |
|
--> |
|
|
|
## Evaluation |
|
|
|
### Metrics |
|
|
|
#### Information Retrieval |
|
* Dataset: `dim_384` |
|
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) |
|
|
|
| Metric | Value | |
|
|:--------------------|:-----------| |
|
| cosine_accuracy@1 | 0.3614 | |
|
| cosine_accuracy@3 | 0.6988 | |
|
| cosine_accuracy@5 | 0.753 | |
|
| cosine_accuracy@10 | 0.8554 | |
|
| cosine_precision@1 | 0.3614 | |
|
| cosine_precision@3 | 0.2329 | |
|
| cosine_precision@5 | 0.1506 | |
|
| cosine_precision@10 | 0.0855 | |
|
| cosine_recall@1 | 0.3614 | |
|
| cosine_recall@3 | 0.6988 | |
|
| cosine_recall@5 | 0.753 | |
|
| cosine_recall@10 | 0.8554 | |
|
| cosine_ndcg@10 | 0.6194 | |
|
| cosine_mrr@10 | 0.5428 | |
|
| **cosine_map@100** | **0.5473** | |
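
The relationships visible in the table (recall@k equal to accuracy@k, and precision@k equal to accuracy@k divided by k) are what you would expect if each query has exactly one relevant document. That is an assumption here, not something the card states. A small sketch under that assumption, using made-up rankings and hypothetical helper names, of how such metrics are computed:

```python
def accuracy_at_k(ranked_ids, relevant_id, k):
    """1.0 if the relevant document appears in the top k, else 0.0."""
    return 1.0 if relevant_id in ranked_ids[:k] else 0.0


def precision_at_k(ranked_ids, relevant_id, k):
    """Fraction of the top k results that are relevant (at most 1 of k here)."""
    return accuracy_at_k(ranked_ids, relevant_id, k) / k


def reciprocal_rank_at_k(ranked_ids, relevant_id, k):
    """1 / rank of the relevant document, or 0.0 if it is outside the top k."""
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id == relevant_id:
            return 1.0 / rank
    return 0.0


def mean(values):
    return sum(values) / len(values)


# Hypothetical results: (ranked document ids, id of the single relevant document).
runs = [
    (["d1", "d7", "d3"], "d1"),  # hit at rank 1
    (["d4", "d2", "d9"], "d2"),  # hit at rank 2
    (["d5", "d6", "d8"], "d0"),  # miss
]

for k in (1, 3):
    acc = mean([accuracy_at_k(r, rel, k) for r, rel in runs])
    prec = mean([precision_at_k(r, rel, k) for r, rel in runs])
    mrr = mean([reciprocal_rank_at_k(r, rel, k) for r, rel in runs])
    print(f"k={k}: accuracy={acc:.4f} precision={prec:.4f} mrr={mrr:.4f}")
```

In the `dim_384` table, 0.6988 / 3 ≈ 0.2329 and 0.753 / 5 ≈ 0.1506, which matches the reported precision values.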
|
|
|
#### Information Retrieval |
|
* Dataset: `dim_256` |
|
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) |
|
|
|
| Metric | Value | |
|
|:--------------------|:----------| |
|
| cosine_accuracy@1 | 0.3434 | |
|
| cosine_accuracy@3 | 0.6807 | |
|
| cosine_accuracy@5 | 0.7651 | |
|
| cosine_accuracy@10 | 0.8373 | |
|
| cosine_precision@1 | 0.3434 | |
|
| cosine_precision@3 | 0.2269 | |
|
| cosine_precision@5 | 0.153 | |
|
| cosine_precision@10 | 0.0837 | |
|
| cosine_recall@1 | 0.3434 | |
|
| cosine_recall@3 | 0.6807 | |
|
| cosine_recall@5 | 0.7651 | |
|
| cosine_recall@10 | 0.8373 | |
|
| cosine_ndcg@10 | 0.6025 | |
|
| cosine_mrr@10 | 0.5259 | |
|
| **cosine_map@100** | **0.531** | |
|
|
|
#### Information Retrieval |
|
* Dataset: `dim_128` |
|
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) |
|
|
|
| Metric | Value | |
|
|:--------------------|:-----------| |
|
| cosine_accuracy@1 | 0.3133 | |
|
| cosine_accuracy@3 | 0.6265 | |
|
| cosine_accuracy@5 | 0.7169 | |
|
| cosine_accuracy@10 | 0.7892 | |
|
| cosine_precision@1 | 0.3133 | |
|
| cosine_precision@3 | 0.2088 | |
|
| cosine_precision@5 | 0.1434 | |
|
| cosine_precision@10 | 0.0789 | |
|
| cosine_recall@1 | 0.3133 | |
|
| cosine_recall@3 | 0.6265 | |
|
| cosine_recall@5 | 0.7169 | |
|
| cosine_recall@10 | 0.7892 | |
|
| cosine_ndcg@10 | 0.563 | |
|
| cosine_mrr@10 | 0.4893 | |
|
| **cosine_map@100** | **0.4961** | |
|
|
|
#### Information Retrieval |
|
* Dataset: `dim_64` |
|
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) |
|
|
|
| Metric | Value | |
|
|:--------------------|:-----------| |
|
| cosine_accuracy@1 | 0.259 | |
|
| cosine_accuracy@3 | 0.512 | |
|
| cosine_accuracy@5 | 0.6325 | |
|
| cosine_accuracy@10 | 0.7169 | |
|
| cosine_precision@1 | 0.259 | |
|
| cosine_precision@3 | 0.1707 | |
|
| cosine_precision@5 | 0.1265 | |
|
| cosine_precision@10 | 0.0717 | |
|
| cosine_recall@1 | 0.259 | |
|
| cosine_recall@3 | 0.512 | |
|
| cosine_recall@5 | 0.6325 | |
|
| cosine_recall@10 | 0.7169 | |
|
| cosine_ndcg@10 | 0.4862 | |
|
| cosine_mrr@10 | 0.4123 | |
|
| **cosine_map@100** | **0.4209** | |
|
|
|
<!-- |
|
## Bias, Risks and Limitations |
|
|
|
*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.* |
|
--> |
|
|
|
<!-- |
|
### Recommendations |
|
|
|
*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.* |
|
--> |
|
|
|
## Training Details |
|
|
|
### Training Dataset |
|
|
|
#### Unnamed Dataset |
|
|
|
|
|
* Size: 1,490 training samples |
|
* Columns: <code>positive</code> and <code>anchor</code> |
|
* Approximate statistics based on the first 1000 samples: |
|
| | positive | anchor | |
|
|:--------|:----------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------| |
|
| type | string | string | |
|
| details | <ul><li>min: 9 tokens</li><li>mean: 21.15 tokens</li><li>max: 48 tokens</li></ul> | <ul><li>min: 21 tokens</li><li>mean: 373.39 tokens</li><li>max: 512 tokens</li></ul> | |
|
* Samples: |
|
| positive | anchor | |
|
|:---------|:-------|
|
| <code>How do I configure the IAM role for the ZenML AWS CLI profile?</code> | <code>ication method with a configured IAM role instead.The connector needs to be configured with the IAM role to be assumed accompanied by an AWS secret key associated with an IAM user or an STS token associated with another IAM role. The IAM user or IAM role must have permission to assume the target IAM role. The connector will generate temporary STS tokens upon request by calling the AssumeRole STS API.<br><br>The best practice implemented with this authentication scheme is to keep the set of permissions associated with the primary IAM user or IAM role down to the bare minimum and grant permissions to the privilege-bearing IAM role instead.<br><br>An AWS region is required and the connector may only be used to access AWS resources in the specified region.<br><br>One or more optional IAM session policies may also be configured to further restrict the permissions of the generated STS tokens. If not specified, IAM session policies are automatically configured for the generated STS tokens to restrict them to the minimum set of permissions required to access the target resource. Refer to the documentation for each supported Resource Type for the complete list of AWS permissions automatically granted to the generated STS tokens.<br><br>The default expiration period for generated STS tokens is 1 hour with a minimum of 15 minutes up to the maximum session duration setting configured for the IAM role (default is 1 hour). If you need longer-lived tokens, you can configure the IAM role to use a higher maximum expiration value (up to 12 hours) or use the AWS Federation Token or AWS Session Token authentication methods.<br><br>For more information on IAM roles and the AssumeRole AWS API, see the official AWS documentation on the subject.<br><br>For more information about the difference between this method and the AWS Federation Token authentication method, consult this AWS documentation page.<br><br>The following assumes the local AWS CLI has a zenml AWS CLI profile already configured with an AWS Secret Key and an IAM role to be assumed:</code> |
|
| <code>What command should be used to list all available alerter flavors after initializing ZenML at the root of the repository?</code> | <code>initializing zenml at the root of your repository.If ZenML does not find an initialized ZenML repository in any parent directory, it will default to the current working directory, but usually, it's better to not have to rely on this mechanism and initialize zenml at the root.<br><br>Afterward, you should see the new custom alerter flavor in the list of available alerter flavors:<br><br>zenml alerter flavor list<br><br>It is important to draw attention to when and how these abstractions are coming into play in a ZenML workflow.<br><br>The MyAlerterFlavor class is imported and utilized upon the creation of the custom flavor through the CLI.<br><br>The MyAlerterConfig class is imported when someone tries to register/update a stack component with the my_alerter flavor. Especially, during the registration process of the stack component, the config will be used to validate the values given by the user. As Config objects are inherently pydantic objects, you can also add your own custom validators here.<br><br>The MyAlerter only comes into play when the component is ultimately in use.<br><br>The design behind this interaction lets us separate the configuration of the flavor from its implementation. This way we can register flavors and components even when the major dependencies behind their implementation are not installed in our local setting (assuming the MyAlerterFlavor and the MyAlerterConfig are implemented in a different module/path than the actual MyAlerter).<br><br>PreviousSlack Alerter<br><br>NextImage Builders<br><br>Last updated 15 days ago</code> | |
|
| <code>Where can I find the URL to the UI of a remote orchestrator for a pipeline run in ZenML?</code> | <code>g, and cached.<br><br>status = run.status<br><br>ConfigurationThe pipeline_configuration is an object that contains all configurations of the pipeline and pipeline run, including the pipeline-level settings, which we will learn more about later:<br><br>pipeline_config = run.config<br><br>pipeline_settings = run.config.settings<br><br>Component-Specific metadata<br><br>Depending on the stack components you use, you might have additional component-specific metadata associated with your run, such as the URL to the UI of a remote orchestrator. You can access this component-specific metadata via the run_metadata attribute:<br><br>run_metadata = run.run_metadata<br><br># The following only works for runs on certain remote orchestrators<br><br>orchestrator_url = run_metadata["orchestrator_url"].value<br><br>## Steps<br><br>Within a given pipeline run you can now further zoom in on individual steps using the `steps` attribute:<br><br>```python<br><br># get all steps of a pipeline for a given run<br><br>steps = run.steps<br><br># get a specific step by its invocation ID<br><br>step = run.steps["first_step"]<br><br>If you're only calling each step once inside your pipeline, the invocation ID will be the same as the name of your step. For more complex pipelines, check out this page to learn more about the invocation ID.<br><br>Inspect pipeline runs with our VS Code extension<br><br>If you are using our VS Code extension, you can easily view your pipeline runs by opening the sidebar (click on the ZenML icon). You can then click on any particular pipeline run to see its status and some other metadata. If you want to delete a run, you can also do so from the same sidebar view.<br><br>Step information<br><br>Similar to the run, you can use the step object to access a variety of useful information:<br><br>The parameters used to run the step via step.config.parameters,<br><br>The step-level settings via step.config.settings,<br><br>Component-specific step metadata, such as the URL of an experiment tracker or model deployer, via step.run_metadata<br><br>See the StepRunResponse definition for a comprehensive list of available information.<br><br>Artifacts</code> |
|
* Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters: |
|
```json |
|
{ |
|
"loss": "MultipleNegativesRankingLoss", |
|
"matryoshka_dims": [ |
|
384, |
|
256, |
|
128, |
|
64 |
|
], |
|
"matryoshka_weights": [ |
|
1, |
|
1, |
|
1, |
|
1 |
|
], |
|
"n_dims_per_step": -1 |
|
} |
|
``` |
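
The configuration above wraps MultipleNegativesRankingLoss in MatryoshkaLoss. As a rough illustration of what that combination computes, and not the library's actual implementation, the sketch below applies an in-batch-negatives ranking loss to embeddings truncated to each listed dimension and sums the results with the given weights. The scale factor of 20, the function names, and the toy 4-dimensional vectors are assumptions made for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def ranking_loss(anchors, positives, scale=20.0):
    """In-batch negatives: positives[i] is the target for anchors[i];
    every other positive in the batch acts as a negative."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        scores = [scale * cosine(anchor, p) for p in positives]
        top = max(scores)
        # Numerically stable log-sum-exp over all candidates.
        log_sum_exp = top + math.log(sum(math.exp(s - top) for s in scores))
        total += log_sum_exp - scores[i]  # cross-entropy with label i
    return total / len(anchors)


def matryoshka_loss(anchors, positives, dims, weights):
    """Weighted sum of the ranking loss over truncated embeddings."""
    loss = 0.0
    for dim, weight in zip(dims, weights):
        truncated_anchors = [v[:dim] for v in anchors]
        truncated_positives = [v[:dim] for v in positives]
        loss += weight * ranking_loss(truncated_anchors, truncated_positives)
    return loss


# Toy batch of two pairs; real embeddings would have 768 dimensions.
anchors = [[1.0, 0.0, 0.2, 0.1], [0.0, 1.0, 0.1, 0.3]]
positives = [[0.9, 0.1, 0.3, 0.0], [0.1, 0.8, 0.0, 0.4]]

print(matryoshka_loss(anchors, positives, dims=[4, 2], weights=[1, 1]))
```

With the card's settings, the same sum would run over dimensions 384, 256, 128 and 64, each with weight 1.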
|
|
|
### Training Hyperparameters |
|
#### Non-Default Hyperparameters |
|
|
|
- `eval_strategy`: epoch |
|
- `per_device_train_batch_size`: 32 |
|
- `per_device_eval_batch_size`: 16 |
|
- `gradient_accumulation_steps`: 16 |
|
- `learning_rate`: 2e-05 |
|
- `num_train_epochs`: 4 |
|
- `lr_scheduler_type`: cosine |
|
- `warmup_ratio`: 0.1 |
|
- `bf16`: True |
|
- `tf32`: True |
|
- `load_best_model_at_end`: True |
|
- `optim`: adamw_torch_fused |
|
- `batch_sampler`: no_duplicates |
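
To make the `lr_scheduler_type: cosine` and `warmup_ratio: 0.1` settings concrete, here is a sketch of a linear-warmup, cosine-decay schedule of the kind these options commonly select. The rounding of the warmup step count and the total of 100 steps are assumptions for illustration; the training logs in this card show only a handful of optimizer steps.

```python
import math


def lr_at_step(step, total_steps, base_lr=2e-05, warmup_ratio=0.1):
    """Learning rate with linear warmup followed by cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


total_steps = 100  # hypothetical; chosen only to make the curve visible
for step in (0, 5, 10, 55, 100):
    print(step, f"{lr_at_step(step, total_steps):.2e}")
```

Separately, with `per_device_train_batch_size` 32 and `gradient_accumulation_steps` 16, each optimizer step on a single device covers 32 × 16 = 512 samples.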
|
|
|
#### All Hyperparameters |
|
<details><summary>Click to expand</summary> |
|
|
|
- `overwrite_output_dir`: False |
|
- `do_predict`: False |
|
- `eval_strategy`: epoch |
|
- `prediction_loss_only`: True |
|
- `per_device_train_batch_size`: 32 |
|
- `per_device_eval_batch_size`: 16 |
|
- `per_gpu_train_batch_size`: None |
|
- `per_gpu_eval_batch_size`: None |
|
- `gradient_accumulation_steps`: 16 |
|
- `eval_accumulation_steps`: None |
|
- `learning_rate`: 2e-05 |
|
- `weight_decay`: 0.0 |
|
- `adam_beta1`: 0.9 |
|
- `adam_beta2`: 0.999 |
|
- `adam_epsilon`: 1e-08 |
|
- `max_grad_norm`: 1.0 |
|
- `num_train_epochs`: 4 |
|
- `max_steps`: -1 |
|
- `lr_scheduler_type`: cosine |
|
- `lr_scheduler_kwargs`: {} |
|
- `warmup_ratio`: 0.1 |
|
- `warmup_steps`: 0 |
|
- `log_level`: passive |
|
- `log_level_replica`: warning |
|
- `log_on_each_node`: True |
|
- `logging_nan_inf_filter`: True |
|
- `save_safetensors`: True |
|
- `save_on_each_node`: False |
|
- `save_only_model`: False |
|
- `restore_callback_states_from_checkpoint`: False |
|
- `no_cuda`: False |
|
- `use_cpu`: False |
|
- `use_mps_device`: False |
|
- `seed`: 42 |
|
- `data_seed`: None |
|
- `jit_mode_eval`: False |
|
- `use_ipex`: False |
|
- `bf16`: True |
|
- `fp16`: False |
|
- `fp16_opt_level`: O1 |
|
- `half_precision_backend`: auto |
|
- `bf16_full_eval`: False |
|
- `fp16_full_eval`: False |
|
- `tf32`: True |
|
- `local_rank`: 0 |
|
- `ddp_backend`: None |
|
- `tpu_num_cores`: None |
|
- `tpu_metrics_debug`: False |
|
- `debug`: [] |
|
- `dataloader_drop_last`: False |
|
- `dataloader_num_workers`: 0 |
|
- `dataloader_prefetch_factor`: None |
|
- `past_index`: -1 |
|
- `disable_tqdm`: True |
|
- `remove_unused_columns`: True |
|
- `label_names`: None |
|
- `load_best_model_at_end`: True |
|
- `ignore_data_skip`: False |
|
- `fsdp`: [] |
|
- `fsdp_min_num_params`: 0 |
|
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False} |
|
- `fsdp_transformer_layer_cls_to_wrap`: None |
|
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None} |
|
- `deepspeed`: None |
|
- `label_smoothing_factor`: 0.0 |
|
- `optim`: adamw_torch_fused |
|
- `optim_args`: None |
|
- `adafactor`: False |
|
- `group_by_length`: False |
|
- `length_column_name`: length |
|
- `ddp_find_unused_parameters`: None |
|
- `ddp_bucket_cap_mb`: None |
|
- `ddp_broadcast_buffers`: False |
|
- `dataloader_pin_memory`: True |
|
- `dataloader_persistent_workers`: False |
|
- `skip_memory_metrics`: True |
|
- `use_legacy_prediction_loop`: False |
|
- `push_to_hub`: False |
|
- `resume_from_checkpoint`: None |
|
- `hub_model_id`: None |
|
- `hub_strategy`: every_save |
|
- `hub_private_repo`: False |
|
- `hub_always_push`: False |
|
- `gradient_checkpointing`: False |
|
- `gradient_checkpointing_kwargs`: None |
|
- `include_inputs_for_metrics`: False |
|
- `eval_do_concat_batches`: True |
|
- `fp16_backend`: auto |
|
- `push_to_hub_model_id`: None |
|
- `push_to_hub_organization`: None |
|
- `mp_parameters`: |
|
- `auto_find_batch_size`: False |
|
- `full_determinism`: False |
|
- `torchdynamo`: None |
|
- `ray_scope`: last |
|
- `ddp_timeout`: 1800 |
|
- `torch_compile`: False |
|
- `torch_compile_backend`: None |
|
- `torch_compile_mode`: None |
|
- `dispatch_batches`: None |
|
- `split_batches`: None |
|
- `include_tokens_per_second`: False |
|
- `include_num_input_tokens_seen`: False |
|
- `neftune_noise_alpha`: None |
|
- `optim_target_modules`: None |
|
- `batch_eval_metrics`: False |
|
- `batch_sampler`: no_duplicates |
|
- `multi_dataset_batch_sampler`: proportional |
|
|
|
</details> |
|
|
|
### Training Logs |
|
| Epoch | Step | dim_128_cosine_map@100 | dim_256_cosine_map@100 | dim_384_cosine_map@100 | dim_64_cosine_map@100 | |
|
|:----------:|:-----:|:----------------------:|:----------------------:|:----------------------:|:---------------------:| |
|
| 0.6667 | 1 | 0.4215 | 0.4509 | 0.4878 | 0.3203 | |
|
| 2.0 | 3 | 0.4835 | 0.5278 | 0.5582 | 0.4141 | |
|
| **2.6667** | **4** | **0.4961** | **0.531** | **0.5473** | **0.4209** | |
|
|
|
* The bold row denotes the saved checkpoint. |
|
|
|
### Framework Versions |
|
- Python: 3.10.14 |
|
- Sentence Transformers: 3.0.1 |
|
- Transformers: 4.41.2 |
|
- PyTorch: 2.3.1+cu121 |
|
- Accelerate: 0.31.0 |
|
- Datasets: 2.19.1 |
|
- Tokenizers: 0.19.1 |
|
|
|
## Citation |
|
|
|
### BibTeX |
|
|
|
#### Sentence Transformers |
|
```bibtex |
|
@inproceedings{reimers-2019-sentence-bert, |
|
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks", |
|
author = "Reimers, Nils and Gurevych, Iryna", |
|
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing", |
|
month = "11", |
|
year = "2019", |
|
publisher = "Association for Computational Linguistics", |
|
url = "https://arxiv.org/abs/1908.10084", |
|
} |
|
``` |
|
|
|
#### MatryoshkaLoss |
|
```bibtex |
|
@misc{kusupati2024matryoshka, |
|
title={Matryoshka Representation Learning}, |
|
author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi}, |
|
year={2024}, |
|
eprint={2205.13147}, |
|
archivePrefix={arXiv}, |
|
primaryClass={cs.LG} |
|
} |
|
``` |
|
|
|
#### MultipleNegativesRankingLoss |
|
```bibtex |
|
@misc{henderson2017efficient, |
|
title={Efficient Natural Language Response Suggestion for Smart Reply}, |
|
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil}, |
|
year={2017}, |
|
eprint={1705.00652}, |
|
archivePrefix={arXiv}, |
|
primaryClass={cs.CL} |
|
} |
|
``` |
|
|
|
<!-- |
|
## Glossary |
|
|
|
*Clearly define terms in order to be accessible across audiences.* |
|
--> |
|
|
|
<!-- |
|
## Model Card Authors |
|
|
|
*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.* |
|
--> |
|
|
|
<!-- |
|
## Model Card Contact |
|
|
|
*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.* |
|
--> |