diff --git "a/README.md" "b/README.md" --- "a/README.md" +++ "b/README.md" @@ -31,939 +31,908 @@ tags: - loss:MatryoshkaLoss - loss:MultipleNegativesRankingLoss widget: -- source_sentence: How can I list the 'kubernetes-cluster' resources that are accessible - by service connectors in my ZenML workspace? +- source_sentence: How can I deploy the ZenML server in different environments and + manage pipelines with the new commands? sentences: - - 'lly registered orchestrator ``.$ zenml service-connector list-resources - --resource-type kubernetes-cluster -e + - 'ed to update the way they are registered in ZenML.the updated ZenML server provides + a new and improved collaborative experience. When connected to a ZenML server, + you can now share your ZenML Stacks and Stack Components with other users. If + you were previously using the ZenML Profiles or the ZenML server to share your + ZenML Stacks, you should switch to the new ZenML server and Dashboard and update + your existing workflows to reflect the new features. - The following ''kubernetes-cluster'' resources can be accessed by service connectors - configured in your workspace: - ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━┓ + ZenML takes over the Metadata Store role - ┃ CONNECTOR ID │ CONNECTOR NAME │ CONNECTOR TYPE - │ RESOURCE TYPE │ RESOURCE NAMES ┃ - ┠──────────────────────────────────────┼───────────────────────┼────────────────┼───────────────────────┼─────────────────────┨ + ZenML can now run as a server that can be accessed via a REST API and also comes + with a visual user interface (called the ZenML Dashboard). This server can be + deployed in arbitrary environments (local, on-prem, via Docker, on AWS, GCP, Azure + etc.) and supports user management, workspace scoping, and more. - ┃ e33c9fac-5daa-48b2-87bb-0187d3782cde │ aws-iam-multi-eu │ 🔶 aws │ - 🌀 kubernetes-cluster │ kubeflowmultitenant ┃ - ┃ │ │ │ │ - zenbox ┃ + The release introduces a series of commands to facilitate managing the lifecycle + of the ZenML server and to access the pipeline and pipeline run information: - ┠──────────────────────────────────────┼───────────────────────┼────────────────┼───────────────────────┼─────────────────────┨ - ┃ ed528d5a-d6cb-4fc4-bc52-c3d2d01643e5 │ aws-iam-multi-us │ 🔶 aws │ - 🌀 kubernetes-cluster │ zenhacks-cluster ┃ + zenml connect / disconnect / down / up / logs / status can be used to configure + your client to connect to a ZenML server, to start a local ZenML Dashboard or + to deploy a ZenML server to a cloud environment. For more information on how to + use these commands, see the ZenML deployment documentation. - ┠──────────────────────────────────────┼───────────────────────┼────────────────┼───────────────────────┼─────────────────────┨ - ┃ 1c54b32a-4889-4417-abbd-42d3ace3d03a │ gcp-sa-multi │ 🔵 gcp │ - 🌀 kubernetes-cluster │ zenml-test-cluster ┃ + zenml pipeline list / runs / delete can be used to display information and about + and manage your pipelines and pipeline runs. - ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━┛' - - 'Name your pipeline runs + In ZenML 0.13.2 and earlier versions, information about pipelines and pipeline + runs used to be stored in a separate stack component called the Metadata Store. + Starting with 0.20.0, the role of the Metadata Store is now taken over by ZenML + itself. 
This means that the Metadata Store is no longer a separate component in + the ZenML architecture, but rather a part of the ZenML core, located wherever + ZenML is deployed: locally on your machine or running remotely as a server.' + - 'ntainer │ service-principal │ │ ┃┃ │ │ + 🌀 kubernetes-cluster │ access-token │ │ ┃ - In the output logs of a pipeline run you will see the name of the run: + ┃ │ │ 🐳 docker-registry │ │ │ ┃ - Pipeline run training_pipeline-2023_05_24-12_41_04_576473 has finished in 3.742s. + ┠──────────────────────────────┼───────────────┼───────────────────────┼───────────────────┼───────┼────────┨ - This name is automatically generated based on the current date and time. To change - the name for a run, pass run_name as a parameter to the with_options() method: + ┃ AWS Service Connector │ 🔶 aws │ 🔶 aws-generic │ implicit │ + ✅ │ ✅ ┃ - training_pipeline = training_pipeline.with_options( + ┃ │ │ 📦 s3-bucket │ secret-key │ │ ┃ - run_name="custom_pipeline_run_name" + ┃ │ │ 🌀 kubernetes-cluster │ sts-token │ │ ┃ - training_pipeline() + ┃ │ │ 🐳 docker-registry │ iam-role │ │ ┃ - Pipeline run names must be unique, so if you plan to run your pipelines multiple - times or run them on a schedule, make sure to either compute the run name dynamically - or include one of the following placeholders that ZenML will replace: + ┃ │ │ │ session-token │ │ ┃ - {{date}} will resolve to the current date, e.g. 2023_02_19 + ┃ │ │ │ federation-token │ │ ┃ - {{time}} will resolve to the current time, e.g. 11_07_09_326492 + ┠──────────────────────────────┼───────────────┼───────────────────────┼───────────────────┼───────┼────────┨ - training_pipeline = training_pipeline.with_options( + ┃ GCP Service Connector │ 🔵 gcp │ 🔵 gcp-generic │ implicit │ + ✅ │ ✅ ┃ - run_name=f"custom_pipeline_run_name_{{date}}_{{time}}" + ┃ │ │ 📦 gcs-bucket │ user-account │ │ ┃ - training_pipeline() + ┃ │ │ 🌀 kubernetes-cluster │ service-account │ │ ┃ - Be sure to include the f string prefix to allow for the placeholders to be replaced, - as shown in the example above. Without the f prefix, the placeholders will not - be replaced. + ┃ │ │ 🐳 docker-registry │ oauth2-token │ │ ┃ - PreviousUsing a custom step invocation ID + ┃ │ │ │ impersonation │ │ ┃ - NextUse failure/success hooks - - Last updated 19 days ago' - - ' a_new_local_stack -o default -a my_artifact_storestack : This is the CLI group - that enables interactions with the stacks - - - register: Here we want to register a new stack. Explore other operations withzenml - stack --help. - - - a_new_local_stack : This is the unique name that the stack will have. - - - --orchestrator or -o are used to specify which orchestrator to use for the stack - - - --artifact-store or -a are used to specify which artifact store to use for the - stack - - - The output for the command should look something like this: + ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━┷━━━━━━━┷━━━━━━━━┛' + - 'tional) Which Metadata to Extract for the ArtifactOptionally, you can override + the extract_metadata() method to track custom metadata for all artifacts saved + by your materializer. Anything you extract here will be displayed in the dashboard + next to your artifacts. - Using the default local database. + src.zenml.metadata.metadata_types that are displayed in a dedicated way in the + dashboard. See - Running with active workspace: ''default'' (repository) + src.zenml.metadata.metadata_types.MetadataType for more details. 
- Stack ''a_new_local_stack'' successfully registered! + By default, this method will only extract the storage size of an artifact, but + you can overwrite it to track anything you wish. E.g., the zenml.materializers.NumpyMaterializer + overwrites this method to track the shape, dtype, and some statistical properties + of each np.ndarray that it saves. - You can inspect the stack with the following command: + If you would like to disable artifact metadata extraction altogether, you can + set enable_artifact_metadata at either pipeline or step level via @pipeline(enable_artifact_metadata=False) + or @step(enable_artifact_metadata=False). - zenml stack describe a_new_local_stack + Skipping materialization - Which will give you an output like this: + Skipping materialization might have unintended consequences for downstream tasks + that rely on materialized artifacts. Only skip materialization if there is no + other way to do what you want to do. - Stack Configuration + While materializers should in most cases be used to control how artifacts are + returned and consumed from pipeline steps, you might sometimes need to have a + completely unmaterialized artifact in a step, e.g., if you need to know the exact + path to where your artifact is stored. - ┏━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━┓ + An unmaterialized artifact is a zenml.materializers.UnmaterializedArtifact. Among + others, it has a property uri that points to the unique path in the artifact store + where the artifact is persisted. One can use an unmaterialized artifact by specifying + UnmaterializedArtifact as the type in the step: - ┃ COMPONENT_TYPE │ COMPONENT_NAME ┃ + from zenml.artifacts.unmaterialized_artifact import UnmaterializedArtifact - ┠────────────────┼───────────────────┨ + from zenml import step - ┃ ORCHESTRATOR │ default ┃ + @step - ┠────────────────┼───────────────────┨ + def my_step(my_artifact: UnmaterializedArtifact): # rather than pd.DataFrame - ┃ ARTIFACT_STORE │ my_artifact_store ┃ + pass - ┗━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━┛ + Example' +- source_sentence: How is the verification process different for multi-instance and + single-instance Service Connectors? + sentences: + - 'Develop a Custom Annotator - ''a_new_local_stack'' stack + Learning how to develop a custom annotator. - Stack ''a_new_local_stack'' with id ''...'' is owned by user default and is ''private''. + Before diving into the specifics of this component type, it is beneficial to familiarize + yourself with our general guide to writing custom component flavors in ZenML. + This guide provides an essential understanding of ZenML''s component flavor concepts. - Switch stacks with our VS Code extension + Annotators are a stack component that enables the use of data annotation as part + of your ZenML stack and pipelines. You can use the associated CLI command to launch + annotation, configure your datasets and get stats on how many labeled tasks you + have ready for use. - If you are using our VS Code extension, you can easily view and switch your stacks - by opening the sidebar (click on the ZenML icon). You can then click on the stack - you want to switch to as well as view the stack components it''s made up of. + Base abstraction in progress! - Run a pipeline on the new local stack + We are actively working on the base abstraction for the annotators, which will + be available soon. As a result, their extension is not possible at the moment. 
+ If you would like to use an annotator in your stack, please check the list of + already available feature stores down below. - Let''s use the pipeline in our starter project from the previous guide to see - it in action. + PreviousProdigy - If you have not already, clone the starter template: + NextModel Registries - pip install "zenml[templates,server]" notebook + Last updated 15 days ago' + - 'ld be accessible to larger audiences. - zenml integration install sklearn -y + TerminologyAs with any high-level abstraction, some terminology is needed to express + the concepts and operations involved. In spite of the fact that Service Connectors + cover such a large area of application as authentication and authorization for + a variety of resources from a range of different vendors, we managed to keep this + abstraction clean and simple. In the following expandable sections, you''ll learn + more about Service Connector Types, Resource Types, Resource Names, and Service + Connectors. - mkdir zenml_starter + This term is used to represent and identify a particular Service Connector implementation + and answer questions about its capabilities such as "what types of resources does + this Service Connector give me access to", "what authentication methods does it + support" and "what credentials and other information do I need to configure for + it". This is analogous to the role Flavors play for Stack Components in that the + Service Connector Type acts as the template from which one or more Service Connectors + are created. - cd zenml_starter + For example, the built-in AWS Service Connector Type shipped with ZenML supports + a rich variety of authentication methods and provides access to AWS resources + such as S3 buckets, EKS clusters and ECR registries. - zenml init --template starter --template-with-defaults + The zenml service-connector list-types and zenml service-connector describe-type + CLI commands can be used to explore the Service Connector Types available with + your ZenML deployment. Extensive documentation is included covering supported + authentication methods and Resource Types. The following are just some examples: - # Just in case, we install the requirements again + zenml service-connector list-types - pip install -r requirements.txt + Example Command Output - The starter template is the same as the ZenML quickstart. You can clone it like - so:' -- source_sentence: How can I explicitly name my model version in ZenML? - sentences: - - 'strator supports specifying resources in what way.If you''re using an orchestrator - which does not support this feature or its underlying infrastructure does not - cover your requirements, you can also take a look at step operators which allow - you to execute individual steps of ../...your pipeline in environments independent - of your orchestrator. + ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━┯━━━━━━━┯━━━━━━━━┓ - Ensure your container is CUDA-enabled + ┃ NAME │ TYPE │ RESOURCE TYPES │ AUTH + METHODS │ LOCAL │ REMOTE ┃ - To run steps or pipelines on GPUs, it''s crucial to have the necessary CUDA tools - installed in the environment. This section will guide you on how to configure - your environment to utilize GPU capabilities effectively. + ┠──────────────────────────────┼───────────────┼───────────────────────┼───────────────────┼───────┼────────┨' + - 'ing resources: - Note that these configuration changes are required for the GPU hardware to be - properly utilized. 
If you don''t update the settings, your steps might run, but - they will not see any boost in performance from the custom hardware. + ┏━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━┓┃ RESOURCE TYPE │ RESOURCE NAMES ┃ - All steps running on GPU-backed hardware will be executed within a containerized - environment, whether you''re using the local Docker orchestrator or a cloud instance - of Kubeflow. Therefore, you need to make two amendments to your Docker settings - for the relevant steps: + ┠───────────────┼────────────────┨ - 1. Specify a CUDA-enabled parent image in your DockerSettings + ┃ 📦 s3-bucket │ s3://zenfiles ┃ - For complete details, refer to the containerization page that explains how to - do this. As an example, if you want to use the latest CUDA-enabled official PyTorch - image for your entire pipeline run, you can include the following code: + ┗━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━┛ - from zenml import pipeline + The following might help understand the difference between scopes: - from zenml.config import DockerSettings + the difference between a multi-instance and a multi-type Service Connector is + that the Resource Type scope is locked to a particular value during configuration + for the multi-instance Service Connector - docker_settings = DockerSettings(parent_image="pytorch/pytorch:1.12.1-cuda11.3-cudnn8-runtime") + similarly, the difference between a multi-instance and a multi-type Service Connector + is that the Resource Name (Resource ID) scope is locked to a particular value + during configuration for the single-instance Service Connector - @pipeline(settings={"docker": docker_settings}) + Service Connector Verification - def my_pipeline(...): + When registering Service Connectors, the authentication configuration and credentials + are automatically verified to ensure that they can indeed be used to gain access + to the target resources: - ... + for multi-type Service Connectors, this verification means checking that the configured + credentials can be used to authenticate successfully to the remote service, as + well as listing all resources that the credentials have permission to access for + each Resource Type supported by the Service Connector Type. - For TensorFlow, you might use the tensorflow/tensorflow:latest-gpu image, as detailed - in the official TensorFlow documentation or their DockerHub overview. + for multi-instance Service Connectors, this verification step means listing all + resources that the credentials have permission to access in addition to validating + that the credentials can be used to authenticate to the target service or platform. - 2. Add ZenML as an explicit pip requirement' - - 'Reuse Docker builds to speed up Docker build times + for single-instance Service Connectors, the verification step simply checks that + the configured credentials have permission to access the target resource. - Avoid building Docker images each time a pipeline runs + The verification can also be performed later on an already registered Service + Connector. Furthermore, for multi-type and multi-instance Service Connectors, + the verification operation can be scoped to a Resource Type and a Resource Name. - When using containerized components in your stack, ZenML needs to build Docker - images to remotely execute your code. Building Docker images without connecting - a git repository includes your step code in the built Docker image. This, however, - means that new Docker images will be built and pushed whenever you make changes - to any of your source files. 
+ The following shows how a multi-type, a multi-instance and a single-instance Service + Connector can be verified with multiple scopes after registration.' +- source_sentence: How long did it take to generate 1800+ questions from documentation + chunks using the local model on a GPU-enabled machine? + sentences: + - 'ns, especially using the basic setup we have here.To give you an indication of + how long this process takes, generating 1800+ questions from an equivalent number + of documentation chunks took a little over 45 minutes using the local model on + a GPU-enabled machine with Ollama. - One way of skipping Docker builds each time is to pass in the ID of a build as - you run the pipeline: + You can view the generated dataset on the Hugging Face Hub here. This dataset + contains the original document chunks, the generated questions, and the URL reference + for the original document. - my_pipeline = my_pipeline.with_options(build=) + Once we have the generated questions, we can then pass them to the retrieval component + and check the results. For convenience we load the data from the Hugging Face + Hub and then pass it to the retrieval component for evaluation. We shuffle the + data and select a subset of it to speed up the evaluation process, but for a more + thorough evaluation you could use the entire dataset. (The best practice of keeping + a separate set of data for evaluation purposes is also recommended here, though + we''re not doing that in this example.) - or when running a pipeline from the CLI: + @step - zenml pipeline run --build= + def retrieval_evaluation_full( - Please note, that this means specifying a custom build when running a pipeline - will not run the code on your client machine but will use the code included in - the Docker images of the build. As a consequence, even if you make local code - changes, reusing a build will always execute the code bundled in the Docker image, - rather than the local code. Therefore, if you would like to reuse a Docker build - AND make sure your local code changes are also downloaded into the image, you - need to connect a git repository. + sample_size: int = 50, - PreviousWhich files are built into the image + ) -> Annotated[float, "full_failure_rate_retrieval"]: - NextUse code repositories to automate Docker build reuse + dataset = load_dataset("zenml/rag_qa_embedding_questions", split="train") - Last updated 19 days ago' - - 'Controlling Model versions + sampled_dataset = dataset.shuffle(seed=42).select(range(sample_size)) - Each model can have many versions. Model versions are a way for you to track different - iterations of your training process, complete with some extra dashboard and API - functionality to support the full ML lifecycle. + total_tests = len(sampled_dataset) - E.g. Based on your business rules during training, you can associate model version - with stages and promote them to production. You have an interface that allows - you to link these versions with non-technical artifacts and data, e.g. business - data, datasets, or even stages in your process and workflow. + failures = 0 - Model versions are created implicitly as you are running your machine learning - training, so you don''t have to immediately think about this. If you want more - control over versions, our API has you covered, with an option to explicitly name - your versions. 
+ for item in sampled_dataset: - Explicitly name your model version + generated_questions = item["generated_questions"] - If you want to explicitly name your model version, you can do so by passing in - the version argument to the Model object. If you don''t do this, ZenML will automatically - generate a version number for you. + question = generated_questions[ - from zenml import Model, step, pipeline + ] # Assuming only one question per item - model= Model( + url_ending = item["filename"].split("/")[ - name="my_model", + 1 - version="1.0.5" + ] # Extract the URL ending from the filename - # The step configuration will take precedence over the pipeline + _, _, urls = query_similar_docs(question, url_ending) - @step(model=model) + if all(url_ending not in url for url in urls): - def svc_trainer(...) -> ...: + logging.error( - ... + f"Failed for question: {question}. Expected URL ending: {url_ending}. Got: {urls}" - # This configures it for all steps within the pipeline + failures += 1 - @pipeline(model=model) + logging.info(f"Total tests: {total_tests}. Failures: {failures}") - def training_pipeline( ... ): + failure_rate = (failures / total_tests) * 100 - # training happens here + return round(failure_rate, 2)' + - '😸Set up a project repository - Here we are specifically setting the model configuration for a particular step - or for the pipeline as a whole. + Setting your team up for success with a project repository. - Please note in the above example if the model version exists, it is automatically - associated with the pipeline and becomes active in the pipeline context. Therefore, - a user should be careful and intentional as to whether you want to create a new - pipeline, or fetch an existing one. See below for an example of fetching a model - from an existing version/stage. + ZenML code typically lives in a git repository. Setting this repository up correctly + can make a huge impact on collaboration and getting the maximum out of your ZenML + deployment. This section walks users through some of the options available to + create a project repository with ZenML. - Fetching model versions by stage' -- source_sentence: What are the different roles available for users in an organization - within ZenML Pro? - sentences: - - 'User Management + PreviousFinetuning LLMs with ZenML - In ZenML Pro, there is a slightly different entity hierarchy as compared to the - open-source ZenML framework. This document walks you through the key differences - and new concepts that are pro-only. + NextConnect your git repository - Organizations, Tenants, and Roles + Last updated 15 days ago' + - 'GCP Service Connector - ZenML Pro arranges various aspects of your work experience around the concept - of an Organization. This is the top-most level structure within the ZenML Cloud - environment. Generally, an organization contains a group of users and one or more - tenants. Tenants are individual, isolated deployments of the ZenML server. + Configuring GCP Service Connectors to connect ZenML to GCP resources such as GCS + buckets, GKE Kubernetes clusters, and GCR container registries. - Every user in an organization has a distinct role. Each role configures what they - can view, modify, and their level of involvement in collaborative tasks. A role - thus helps determine the level of access that a user has within an organization. + The ZenML GCP Service Connector facilitates the authentication and access to managed + GCP services and resources. 
These encompass a range of resources, including GCS + buckets, GCR container repositories, and GKE clusters. The connector provides + support for various authentication methods, including GCP user accounts, service + accounts, short-lived OAuth 2.0 tokens, and implicit authentication. - The admin has all permissions on an organization. They are allowed to add members, - adjust the billing information and assign roles. The editor can still fully manage - tenants and members but is not allowed to access the subscription information - or delete the organization. The viewer Role allows you to allow users to access - the tenants within the organization with only view permissions. + To ensure heightened security measures, this connector always issues short-lived + OAuth 2.0 tokens to clients instead of long-lived credentials unless explicitly + configured to do otherwise. Furthermore, it includes automatic configuration and + detection of credentials locally configured through the GCP CLI. - Inviting Team Members + This connector serves as a general means of accessing any GCP service by issuing + OAuth 2.0 credential objects to clients. Additionally, the connector can handle + specialized authentication for GCS, Docker, and Kubernetes Python clients. It + also allows for the configuration of local Docker and Kubernetes CLIs. - Inviting users to your organization to work on the organization''s tenants is - easy. Simply click Add Member in the Organization settings, and give them an initial - Role. The User will be sent an invitation email. If a user is part of an organization, - they can utilize their login on all tenants they have authority to access. + $ zenml service-connector list-types --type gcp - PreviousZenML SaaS + ┏━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━┯━━━━━━━┯━━━━━━━━┓ - NextStarter guide + ┃ NAME │ TYPE │ RESOURCE TYPES │ AUTH METHODS │ + LOCAL │ REMOTE ┃ - Last updated 12 days ago' - - ' more information. + ┠───────────────────────┼────────┼───────────────────────┼──────────────────┼───────┼────────┨ - Get the last run of a pipelineTo access the most recent run of a pipeline, you - can either use the last_run property or access it through the runs list: + ┃ GCP Service Connector │ 🔵 gcp │ 🔵 gcp-generic │ implicit │ ✅ │ + ✅ ┃ - last_run = pipeline_model.last_run # OR: pipeline_model.runs[0] + ┃ │ │ 📦 gcs-bucket │ user-account │ │ ┃ - If your most recent runs have failed, and you want to find the last run that has - succeeded, you can use the last_successful_run property instead. + ┃ │ │ 🌀 kubernetes-cluster │ service-account │ │ ┃ - Get the latest run from a pipeline + ┃ │ │ 🐳 docker-registry │ external-account │ │ ┃ - Calling a pipeline executes it and then returns the response of the freshly executed - run. + ┃ │ │ │ oauth2-token │ │ ┃' +- source_sentence: How can I load and render reports in a Jupyter notebook using ZenML? + sentences: + - '❗Alerters - run = training_pipeline() + Sending automated alerts to chat services. - The run that you get back is the model stored in the ZenML database at the point - of the method call. This means the pipeline run is still initializing and no steps - have been run. To get the latest state can get a refreshed version from the client: + Alerters allow you to send messages to chat services (like Slack, Discord, Mattermost, + etc.) from within your pipelines. This is useful to immediately get notified when + failures happen, for general monitoring/reporting, and also for building human-in-the-loop + ML. 
- from zenml.client import Client + Alerter Flavors - Client().get_pipeline_run(run.id) to get a refreshed version + Currently, the SlackAlerter and DiscordAlerter are the available alerter integrations. + However, it is straightforward to extend ZenML and build an alerter for other + chat services. - Get a run via the client + Alerter Flavor Integration Notes Slack slack slack Interacts with a Slack channel + Discord discord discord Interacts with a Discord channel Custom Implementation + custom Extend the alerter abstraction and provide your own implementation - If you already know the exact run that you want to fetch (e.g., from looking at - the dashboard), you can use the Client.get_pipeline_run() method to fetch the - run directly without having to query the pipeline first: + If you would like to see the available flavors of alerters in your terminal, you + can use the following command: - from zenml.client import Client + zenml alerter flavor list - pipeline_run = Client().get_pipeline_run("first_pipeline-2023_06_20-16_20_13_274466") + How to use Alerters with ZenML - Similar to pipelines, you can query runs by either ID, name, or name prefix, and - you can also discover runs through the Client or CLI via the Client.list_pipeline_runs() - or zenml pipeline runs list commands. + Each alerter integration comes with specific standard steps that you can use out + of the box. - Run information + However, you first need to register an alerter component in your terminal: - Each run has a collection of useful information which can help you reproduce your - runs. In the following, you can find a list of some of the most useful pipeline - run information, but there is much more available. See the PipelineRunResponse - definition for a comprehensive list. + zenml alerter register ... - Status + Then you can add it to your stack using - The status of a pipeline run. There are five possible states: initialized, failed, - completed, running, and cached. + zenml stack register ... -al - status = run.status + Afterward, you can import the alerter standard steps provided by the respective + integration and directly use them in your pipelines. - Configuration' - - 'ner(gamma=gamma, X_train=X_train, y_train=y_train)if __name__ == "__main__": + PreviousDevelop a Custom Step Operator - first_pipeline() + NextDiscord Alerter - python run.py + Last updated 15 days ago' + - 'ry_similar_docs( - ... + question: str, - Registered pipeline first_pipeline (version 2). + url_ending: str,use_reranking: bool = False, - ... + returned_sample_size: int = 5, - This will now create a single run for version 2 of the pipeline called first_pipeline. + ) -> Tuple[str, str, List[str]]: - PreviousHyperparameter tuning + """Query similar documents for a given question and URL ending.""" - NextAccess secrets in a step + embedded_question = get_embeddings(question) - Last updated 15 days ago' -- source_sentence: How can I check which GCP Service Connectors can access a GCS bucket - in my ZenML deployment? - sentences: - - 'dashboard. + db_conn = get_db_conn() - Warning! Usage in remote orchestratorsThe current ZenML version has a limitation - in its base Docker image that requires a workaround for all pipelines using Deepchecks - with a remote orchestrator (e.g. Kubeflow , Vertex). The limitation being that - the base Docker image needs to be extended to include binaries that are required - by opencv2, which is a package that Deepchecks requires. 
+ num_docs = 20 if use_reranking else returned_sample_size - While these binaries might be available on most operating systems out of the box - (and therefore not a problem with the default local orchestrator), we need to - tell ZenML to add them to the containerization step when running in remote settings. - Here is how: + # get (content, url) tuples for the top n similar documents - First, create a file called deepchecks-zenml.Dockerfile and place it on the same - level as your runner script (commonly called run.py). The contents of the Dockerfile - are as follows: + top_similar_docs = get_topn_similar_docs( - ARG ZENML_VERSION=0.20.0 + embedded_question, db_conn, n=num_docs, include_metadata=True - FROM zenmldocker/zenml:${ZENML_VERSION} AS base + if use_reranking: - RUN apt-get update + reranked_docs_and_urls = rerank_documents(question, top_similar_docs)[ - RUN apt-get install ffmpeg libsm6 libxext6 -y + :returned_sample_size - Then, place the following snippet above your pipeline definition. Note that the - path of the dockerfile are relative to where the pipeline definition file is. - Read the containerization guide for more details: + urls = [doc[1] for doc in reranked_docs_and_urls] - import zenml + else: - from zenml import pipeline + urls = [doc[1] for doc in top_similar_docs] # Unpacking URLs - from zenml.config import DockerSettings + return (question, url_ending, urls) - from pathlib import Path + We get the embeddings for the question being passed into the function and connect + to our PostgreSQL database. If we''re using reranking, we get the top 20 documents + similar to our query and rerank them using the rerank_documents helper function. + We then extract the URLs from the reranked documents and return them. Note that + we only return 5 URLs, but in the case of reranking we get a larger number of + documents and URLs back from the database to pass to our reranker, but in the + end we always choose the top five reranked documents to return. - import sys + Now that we''ve added reranking to our pipeline, we can evaluate the performance + of our reranker and see how it affects the quality of the retrieved documents. - docker_settings = DockerSettings( + Code Example - dockerfile="deepchecks-zenml.Dockerfile", + To explore the full code, visit the Complete Guide repository and for this section, + particularly the eval_retrieval.py file. - build_options={ + PreviousUnderstanding reranking - "buildargs": { + NextEvaluating reranking performance - "ZENML_VERSION": f"{zenml.__version__}" + Last updated 1 month ago' + - 'n the respective artifact in the pipeline run DAG.Alternatively, if you are running + inside a Jupyter notebook, you can load and render the reports using the artifact.visualize() + method, e.g.: - }, + from zenml.client import Client - }, + def visualize_results(pipeline_name: str, step_name: str) -> None: - @pipeline(settings={"docker": docker_settings}) + pipeline = Client().get_pipeline(pipeline=pipeline_name) - def my_pipeline(...): + evidently_step = pipeline.last_run.steps[step_name] - # same code as always + evidently_step.visualize() - ... + if __name__ == "__main__": - From here on, you can continue to use the deepchecks integration as is explained - below. 
+ visualize_results("text_data_report_pipeline", "text_report") - The Deepchecks standard steps + visualize_results("text_data_test_pipeline", "text_test") - ZenML wraps the Deepchecks functionality for tabular data in the form of four - standard steps: + PreviousDeepchecks - DeepchecksDataIntegrityCheckStep: use it in your pipelines to run data integrity - tests on a single dataset + NextWhylogs - DeepchecksDataDriftCheckStep: use it in your pipelines to run data drift tests - on two datasets as input: target and reference.' - - ' gs://zenml-core_cloudbuild ┃┃ │ gs://zenml-datasets ┃ + Last updated 19 days ago' +- source_sentence: How do you deploy the Comet Experiment Tracker flavor provided + by ZenML integration? + sentences: + - 'Comet - ┃ │ gs://zenml-internal-artifact-store ┃ + Logging and visualizing experiments with Comet. - ┃ │ gs://zenml-kubeflow-artifact-store ┃ + The Comet Experiment Tracker is an Experiment Tracker flavor provided with the + Comet ZenML integration that uses the Comet experiment tracking platform to log + and visualize information from your pipeline steps (e.g., models, parameters, + metrics). - ┃ │ gs://zenml-project-time-series-bucket ┃ + When would you want to use it? - ┠───────────────────────┼─────────────────────────────────────────────────┨ + Comet is a popular platform that you would normally use in the iterative ML experimentation + phase to track and visualize experiment results. That doesn''t mean that it cannot + be repurposed to track and visualize the results produced by your automated pipeline + runs, as you make the transition towards a more production-oriented workflow. - ┃ 🌀 kubernetes-cluster │ zenml-test-cluster ┃ + You should use the Comet Experiment Tracker: - ┠───────────────────────┼─────────────────────────────────────────────────┨ + if you have already been using Comet to track experiment results for your project + and would like to continue doing so as you are incorporating MLOps workflows and + best practices in your project through ZenML. - ┃ 🐳 docker-registry │ gcr.io/zenml-core ┃ + if you are looking for a more visually interactive way of navigating the results + produced from your ZenML pipeline runs (e.g., models, metrics, datasets) - ┗━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛ + if you would like to connect ZenML to Comet to share the artifacts and metrics + logged by your pipelines with your team, organization, or external stakeholders - Long-lived credentials (API keys, account keys) + You should consider one of the other Experiment Tracker flavors if you have never + worked with Comet before and would rather use another experiment tracking tool + that you are more familiar with. - This is the magic formula of authentication methods. When paired with another - ability, such as automatically generating short-lived API tokens, or impersonating - accounts or assuming roles, this is the ideal authentication mechanism to use, - particularly when using ZenML in production and when sharing results with other - members of your ZenML team. + How do you deploy it? - As a general best practice, but implemented particularly well for cloud platforms, - account passwords are never directly used as a credential when authenticating - to the cloud platform APIs. There is always a process in place that exchanges - the account/password credential for another type of long-lived credential: + The Comet Experiment Tracker flavor is provided by the Comet ZenML integration. 
+ You need to install it on your local machine to be able to register a Comet Experiment + Tracker and add it to your stack: - AWS uses the aws configure CLI command + zenml integration install comet -y - GCP offers the gcloud auth application-default login CLI commands + The Comet Experiment Tracker needs to be configured with the credentials required + to connect to the Comet platform using one of the available authentication methods. - Azure provides the az login CLI command + Authentication Methods - None of your original login information is stored on your local machine or used - to access workloads. Instead, an API key, account key or some other form of intermediate - credential is generated and stored on the local host and used to authenticate - to remote cloud service APIs.' - - ' should pick the one that best fits your use case.If you already have one or - more GCP Service Connectors configured in your ZenML deployment, you can check - which of them can be used to access the GCS bucket you want to use for your GCS - Artifact Store by running e.g.: + You need to configure the following credentials for authentication to the Comet + platform:' + - 'guration set up by the GCP CLI on your local host.The following is an example + of lifting GCP user credentials granting access to the same set of GCP resources + and services that the local GCP CLI is allowed to access. The GCP CLI should already + be configured with valid credentials (i.e. by running gcloud auth application-default + login). In this case, the GCP user account authentication method is automatically + detected: - zenml service-connector list-resources --resource-type gcs-bucket + zenml service-connector register gcp-auto --type gcp --auto-configure Example Command Output - ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ - - - ┃ CONNECTOR ID │ CONNECTOR NAME │ CONNECTOR TYPE - │ RESOURCE TYPE │ RESOURCE NAMES ┃ - - - ┠──────────────────────────────────────┼─────────────────────┼────────────────┼───────────────┼─────────────────────────────────────────────────┨ - - - ┃ 7f0c69ba-9424-40ae-8ea6-04f35c2eba9d │ gcp-user-account │ 🔵 gcp │ - 📦 gcs-bucket │ gs://zenml-bucket-sl ┃ - - - ┃ │ │ │ │ - gs://zenml-core.appspot.com ┃ - + Successfully registered service connector `gcp-auto` with access to the following + resources: - ┃ │ │ │ │ - gs://zenml-core_cloudbuild ┃ + ┏━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ - ┃ │ │ │ │ - gs://zenml-datasets ┃ + ┃ RESOURCE TYPE │ RESOURCE NAMES ┃ - ┃ │ │ │ │ - gs://zenml-internal-artifact-store ┃ - - ┃ │ │ │ │ - gs://zenml-kubeflow-artifact-store ┃ + ┠───────────────────────┼─────────────────────────────────────────────────┨ - ┠──────────────────────────────────────┼─────────────────────┼────────────────┼───────────────┼─────────────────────────────────────────────────┨ + ┃ 🔵 gcp-generic │ zenml-core ┃ - ┃ 2a0bec1b-9787-4bd7-8d4a-9a47b6f61643 │ gcs-zenml-bucket-sl │ 🔵 gcp │ - 📦 gcs-bucket │ gs://zenml-bucket-sl ┃' -- source_sentence: Is it possible to update the local AWS CLI configuration with credentials - extracted from the AWS Service Connector? 
- sentences: - - '36a885: Pull complete + ┠───────────────────────┼─────────────────────────────────────────────────┨ - c9c0554c8e6a: Pull completebacdcd847a66: Pull complete + ┃ 📦 gcs-bucket │ gs://zenml-bucket-sl ┃ - 482033770844: Pull complete + ┃ │ gs://zenml-core.appspot.com ┃ - Digest: sha256:bf2cc3895e70dfa1ee1cd90bbfa599fa4cd8df837e27184bac1ce1cc239ecd3f + ┃ │ gs://zenml-core_cloudbuild ┃ - Status: Downloaded newer image for 715803424590.dkr.ecr.us-east-1.amazonaws.com/zenml-server:latest + ┃ │ gs://zenml-datasets ┃ - 715803424590.dkr.ecr.us-east-1.amazonaws.com/zenml-server:latest + ┃ │ gs://zenml-internal-artifact-store ┃ - It is also possible to update the local AWS CLI configuration with credentials - extracted from the AWS Service Connector: + ┃ │ gs://zenml-kubeflow-artifact-store ┃ - zenml service-connector login aws-session-token --resource-type aws-generic + ┃ │ gs://zenml-project-time-series-bucket ┃ - Example Command Output + ┠───────────────────────┼─────────────────────────────────────────────────┨ - Configured local AWS SDK profile ''zenml-c0f8e857''. + ┃ 🌀 kubernetes-cluster │ zenml-test-cluster ┃ - The ''aws-session-token'' AWS Service Connector connector was used to successfully - configure the local Generic AWS resource client/SDK. + ┠───────────────────────┼─────────────────────────────────────────────────┨ - A new profile is created in the local AWS CLI configuration holding the credentials. - It can be used to access AWS resources and services, e.g.: + ┃ 🐳 docker-registry │ gcr.io/zenml-core ┃ - aws --profile zenml-c0f8e857 s3 ls + ┗━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛ - Stack Components use + zenml service-connector describe gcp-auto - The S3 Artifact Store Stack Component can be connected to a remote AWS S3 bucket - through an AWS Service Connector. + Example Command Output' + - 'er Image Builder stack component, or the Vertex AIOrchestrator and Step Operator. + It should be accompanied by a matching set of - The AWS Service Connector can also be used with any Orchestrator or Model Deployer - stack component flavor that relies on Kubernetes clusters to manage workloads. - This allows EKS Kubernetes container workloads to be managed without the need - to configure and maintain explicit AWS or Kubernetes kubectl configuration contexts - and credentials in the target environment and in the Stack Component. + GCP permissions that allow access to the set of remote resources required by the - Similarly, Container Registry Stack Components can be connected to an ECR Container - Registry through an AWS Service Connector. This allows container images to be - built and published to ECR container registries without the need to configure - explicit AWS credentials in the target environment or the Stack Component. + client and Stack Component. - End-to-end examples' - - ' ┃┠──────────────────┼─────────────────────────────────────────────────────────────────────┨ + The resource name represents the GCP project that the connector is authorized + to - ┃ RESOURCE TYPES │ 🌀 kubernetes-cluster ┃ + access. 
- ┠──────────────────┼─────────────────────────────────────────────────────────────────────┨ + 📦 GCP GCS bucket (resource type: gcs-bucket) - ┃ RESOURCE NAME │ arn:aws:eks:us-east-1:715803424590:cluster/zenhacks-cluster ┃ + Authentication methods: implicit, user-account, service-account, oauth2-token, - ┠──────────────────┼─────────────────────────────────────────────────────────────────────┨ + impersonation - ┃ SECRET ID │ ┃ + Supports resource instances: True - ┠──────────────────┼─────────────────────────────────────────────────────────────────────┨ + Authentication methods: - ┃ SESSION DURATION │ N/A ┃ + 🔒 implicit - ┠──────────────────┼─────────────────────────────────────────────────────────────────────┨ + 🔒 user-account - ┃ EXPIRES IN │ 11h59m57s ┃ + 🔒 service-account - ┠──────────────────┼─────────────────────────────────────────────────────────────────────┨ + 🔒 oauth2-token - ┃ OWNER │ default ┃ + 🔒 impersonation - ┠──────────────────┼─────────────────────────────────────────────────────────────────────┨ + Allows Stack Components to connect to GCS buckets. When used by Stack - ┃ WORKSPACE │ default ┃ + Components, they are provided a pre-configured GCS Python client instance. - ┠──────────────────┼─────────────────────────────────────────────────────────────────────┨ + The configured credentials must have at least the following GCP permissions - ┃ SHARED │ ➖ ┃ + associated with the GCS buckets that it can access: - ┠──────────────────┼─────────────────────────────────────────────────────────────────────┨ + storage.buckets.list - ┃ CREATED_AT │ 2023-06-16 10:17:46.931091 ┃ + storage.buckets.get - ┠──────────────────┼─────────────────────────────────────────────────────────────────────┨ + storage.objects.create - ┃ UPDATED_AT │ 2023-06-16 10:17:46.931094 ┃ + storage.objects.delete - ┗━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛ + storage.objects.get - Configuration' - - 'e --authentication_secret. For example, you''d run:zenml secret create argilla_secrets - --api_key="" + storage.objects.list - (Visit the Argilla documentation and interface to obtain your API key.) + storage.objects.update - Then register your annotator with ZenML: + For example, the GCP Storage Admin role includes all of the required - zenml annotator register argilla --flavor argilla --authentication_secret=argilla_secrets + permissions, but it also includes additional permissions that are not required - When using a deployed instance of Argilla, the instance URL must be specified - without any trailing / at the end. If you are using a Hugging Face Spaces instance - and its visibility is set to private, you must also set the extra_headers parameter - which would include a Hugging Face token. For example: + by the connector. - zenml annotator register argilla --flavor argilla --authentication_secret=argilla_secrets - --instance_url="https://[your-owner-name]-[your_space_name].hf.space" --extra_headers="{"Authorization": - f"Bearer {}"}" + If set, the resource name must identify a GCS bucket using one of the following - Finally, add all these components to a stack and set it as your active stack. - For example: + formats: - zenml stack copy default annotation + GCS bucket URI: gs://{bucket-name} - # this must be done separately so that the other required stack components are - first registered + GCS bucket name: {bucket-name} - zenml stack update annotation -an + [...] 
- zenml stack set annotation + ──────────────────────────────────────────────────────────────────────────────── - # optionally also + Please select a resource type or leave it empty to create a connector that can + be used to access any of the supported resource types (gcp-generic, gcs-bucket, + kubernetes-cluster, docker-registry). []: gcs-bucket - zenml stack describe + Would you like to attempt auto-configuration to extract the authentication configuration + from your local environment ? [y/N]: y - Now if you run a simple CLI command like zenml annotator dataset list this should - work without any errors. You''re ready to use your annotator in your ML workflow! + Service connector auto-configured successfully with the following configuration: - How do you use it? + Service connector ''gcp-interactive'' of type ''gcp'' is ''private''. - ZenML supports access to your data and annotations via the zenml annotator ... - CLI command. We have also implemented an interface to some of the common Argilla - functionality via the ZenML SDK. + ''gcp-interactive'' gcp Service - You can access information about the datasets you''re using with the zenml annotator - dataset list. To work on annotation for a particular dataset, you can run zenml - annotator dataset annotate . What follows is an overview of some - key components to the Argilla integration and how it can be used. + Connector Details - Argilla Annotator Stack Component' + ┏━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━┓' model-index: - name: zenml/finetuned-snowflake-arctic-embed-m results: @@ -975,49 +944,49 @@ model-index: type: dim_384 metrics: - type: cosine_accuracy@1 - value: 0.29518072289156627 + value: 0.28313253012048195 name: Cosine Accuracy@1 - type: cosine_accuracy@3 - value: 0.6204819277108434 + value: 0.572289156626506 name: Cosine Accuracy@3 - type: cosine_accuracy@5 - value: 0.6927710843373494 + value: 0.6807228915662651 name: Cosine Accuracy@5 - type: cosine_accuracy@10 - value: 0.7891566265060241 + value: 0.8012048192771084 name: Cosine Accuracy@10 - type: cosine_precision@1 - value: 0.29518072289156627 + value: 0.28313253012048195 name: Cosine Precision@1 - type: cosine_precision@3 - value: 0.20682730923694775 + value: 0.19076305220883527 name: Cosine Precision@3 - type: cosine_precision@5 - value: 0.13855421686746985 + value: 0.13614457831325297 name: Cosine Precision@5 - type: cosine_precision@10 - value: 0.0789156626506024 + value: 0.08012048192771083 name: Cosine Precision@10 - type: cosine_recall@1 - value: 0.29518072289156627 + value: 0.28313253012048195 name: Cosine Recall@1 - type: cosine_recall@3 - value: 0.6204819277108434 + value: 0.572289156626506 name: Cosine Recall@3 - type: cosine_recall@5 - value: 0.6927710843373494 + value: 0.6807228915662651 name: Cosine Recall@5 - type: cosine_recall@10 - value: 0.7891566265060241 + value: 0.8012048192771084 name: Cosine Recall@10 - type: cosine_ndcg@10 - value: 0.5524302146116403 + value: 0.5407472416922913 name: Cosine Ndcg@10 - type: cosine_mrr@10 - value: 0.4758486326257412 + value: 0.45774765729585015 name: Cosine Mrr@10 - type: cosine_map@100 - value: 0.4836255621339311 + value: 0.46523155503040436 name: Cosine Map@100 - task: type: information-retrieval @@ -1027,49 +996,49 @@ model-index: type: dim_256 metrics: - type: cosine_accuracy@1 - value: 0.28313253012048195 + value: 0.29518072289156627 name: Cosine Accuracy@1 - type: cosine_accuracy@3 - value: 0.5963855421686747 + value: 0.6024096385542169 name: Cosine Accuracy@3 - type: cosine_accuracy@5 value: 0.6807228915662651 
name: Cosine Accuracy@5 - type: cosine_accuracy@10 - value: 0.7771084337349398 + value: 0.7951807228915663 name: Cosine Accuracy@10 - type: cosine_precision@1 - value: 0.28313253012048195 + value: 0.29518072289156627 name: Cosine Precision@1 - type: cosine_precision@3 - value: 0.19879518072289157 + value: 0.2008032128514056 name: Cosine Precision@3 - type: cosine_precision@5 - value: 0.136144578313253 + value: 0.13614457831325297 name: Cosine Precision@5 - type: cosine_precision@10 - value: 0.07771084337349396 + value: 0.0795180722891566 name: Cosine Precision@10 - type: cosine_recall@1 - value: 0.28313253012048195 + value: 0.29518072289156627 name: Cosine Recall@1 - type: cosine_recall@3 - value: 0.5963855421686747 + value: 0.6024096385542169 name: Cosine Recall@3 - type: cosine_recall@5 value: 0.6807228915662651 name: Cosine Recall@5 - type: cosine_recall@10 - value: 0.7771084337349398 + value: 0.7951807228915663 name: Cosine Recall@10 - type: cosine_ndcg@10 - value: 0.5376005319054157 + value: 0.5458001676537428 name: Cosine Ndcg@10 - type: cosine_mrr@10 - value: 0.46014534327787354 + value: 0.46605230445591894 name: Cosine Mrr@10 - type: cosine_map@100 - value: 0.4690725321460052 + value: 0.4728738562350596 name: Cosine Map@100 - task: type: information-retrieval @@ -1079,49 +1048,49 @@ model-index: type: dim_128 metrics: - type: cosine_accuracy@1 - value: 0.2710843373493976 + value: 0.2469879518072289 name: Cosine Accuracy@1 - type: cosine_accuracy@3 - value: 0.5301204819277109 + value: 0.5843373493975904 name: Cosine Accuracy@3 - type: cosine_accuracy@5 - value: 0.5963855421686747 + value: 0.6265060240963856 name: Cosine Accuracy@5 - type: cosine_accuracy@10 value: 0.7409638554216867 name: Cosine Accuracy@10 - type: cosine_precision@1 - value: 0.2710843373493976 + value: 0.2469879518072289 name: Cosine Precision@1 - type: cosine_precision@3 - value: 0.17670682730923692 + value: 0.19477911646586343 name: Cosine Precision@3 - type: cosine_precision@5 - value: 0.11927710843373492 + value: 0.12530120481927706 name: Cosine Precision@5 - type: cosine_precision@10 value: 0.07409638554216866 name: Cosine Precision@10 - type: cosine_recall@1 - value: 0.2710843373493976 + value: 0.2469879518072289 name: Cosine Recall@1 - type: cosine_recall@3 - value: 0.5301204819277109 + value: 0.5843373493975904 name: Cosine Recall@3 - type: cosine_recall@5 - value: 0.5963855421686747 + value: 0.6265060240963856 name: Cosine Recall@5 - type: cosine_recall@10 value: 0.7409638554216867 name: Cosine Recall@10 - type: cosine_ndcg@10 - value: 0.49831034220322973 + value: 0.4994853551416632 name: Cosine Ndcg@10 - type: cosine_mrr@10 - value: 0.4218158347676423 + value: 0.421791929623255 name: Cosine Mrr@10 - type: cosine_map@100 - value: 0.43128822737879013 + value: 0.4323899020969096 name: Cosine Map@100 - task: type: information-retrieval @@ -1131,49 +1100,49 @@ model-index: type: dim_64 metrics: - type: cosine_accuracy@1 - value: 0.26506024096385544 + value: 0.23493975903614459 name: Cosine Accuracy@1 - type: cosine_accuracy@3 - value: 0.4819277108433735 + value: 0.5 name: Cosine Accuracy@3 - type: cosine_accuracy@5 - value: 0.5662650602409639 + value: 0.5783132530120482 name: Cosine Accuracy@5 - type: cosine_accuracy@10 - value: 0.6566265060240963 + value: 0.6927710843373494 name: Cosine Accuracy@10 - type: cosine_precision@1 - value: 0.26506024096385544 + value: 0.23493975903614459 name: Cosine Precision@1 - type: cosine_precision@3 - value: 0.1606425702811245 + value: 0.16666666666666666 name: Cosine 
Precision@3 - type: cosine_precision@5 - value: 0.11325301204819276 + value: 0.11566265060240961 name: Cosine Precision@5 - type: cosine_precision@10 - value: 0.06566265060240963 + value: 0.06927710843373491 name: Cosine Precision@10 - type: cosine_recall@1 - value: 0.26506024096385544 + value: 0.23493975903614459 name: Cosine Recall@1 - type: cosine_recall@3 - value: 0.4819277108433735 + value: 0.5 name: Cosine Recall@3 - type: cosine_recall@5 - value: 0.5662650602409639 + value: 0.5783132530120482 name: Cosine Recall@5 - type: cosine_recall@10 - value: 0.6566265060240963 + value: 0.6927710843373494 name: Cosine Recall@10 - type: cosine_ndcg@10 - value: 0.45413104746517285 + value: 0.4607453075643617 name: Cosine Ndcg@10 - type: cosine_mrr@10 - value: 0.38985465672212666 + value: 0.38742589405240024 name: Cosine Mrr@10 - type: cosine_map@100 - value: 0.4019553541721889 + value: 0.3969546791348258 name: Cosine Map@100 --- @@ -1227,9 +1196,9 @@ from sentence_transformers import SentenceTransformer model = SentenceTransformer("zenml/finetuned-snowflake-arctic-embed-m") # Run inference sentences = [ - 'Is it possible to update the local AWS CLI configuration with credentials extracted from the AWS Service Connector?', - "36a885: Pull complete\n\nc9c0554c8e6a: Pull completebacdcd847a66: Pull complete\n\n482033770844: Pull complete\n\nDigest: sha256:bf2cc3895e70dfa1ee1cd90bbfa599fa4cd8df837e27184bac1ce1cc239ecd3f\n\nStatus: Downloaded newer image for 715803424590.dkr.ecr.us-east-1.amazonaws.com/zenml-server:latest\n\n715803424590.dkr.ecr.us-east-1.amazonaws.com/zenml-server:latest\n\nIt is also possible to update the local AWS CLI configuration with credentials extracted from the AWS Service Connector:\n\nzenml service-connector login aws-session-token --resource-type aws-generic\n\nExample Command Output\n\nConfigured local AWS SDK profile 'zenml-c0f8e857'.\n\nThe 'aws-session-token' AWS Service Connector connector was used to successfully configure the local Generic AWS resource client/SDK.\n\nA new profile is created in the local AWS CLI configuration holding the credentials. It can be used to access AWS resources and services, e.g.:\n\naws --profile zenml-c0f8e857 s3 ls\n\nStack Components use\n\nThe S3 Artifact Store Stack Component can be connected to a remote AWS S3 bucket through an AWS Service Connector.\n\nThe AWS Service Connector can also be used with any Orchestrator or Model Deployer stack component flavor that relies on Kubernetes clusters to manage workloads. This allows EKS Kubernetes container workloads to be managed without the need to configure and maintain explicit AWS or Kubernetes kubectl configuration contexts and credentials in the target environment and in the Stack Component.\n\nSimilarly, Container Registry Stack Components can be connected to an ECR Container Registry through an AWS Service Connector. This allows container images to be built and published to ECR container registries without the need to configure explicit AWS credentials in the target environment or the Stack Component.\n\nEnd-to-end examples", - 'e --authentication_secret. For example, you\'d run:zenml secret create argilla_secrets --api_key=""\n\n(Visit the Argilla documentation and interface to obtain your API key.)\n\nThen register your annotator with ZenML:\n\nzenml annotator register argilla --flavor argilla --authentication_secret=argilla_secrets\n\nWhen using a deployed instance of Argilla, the instance URL must be specified without any trailing / at the end. 
If you are using a Hugging Face Spaces instance and its visibility is set to private, you must also set the extra_headers parameter which would include a Hugging Face token. For example:\n\nzenml annotator register argilla --flavor argilla --authentication_secret=argilla_secrets --instance_url="https://[your-owner-name]-[your_space_name].hf.space" --extra_headers="{"Authorization": f"Bearer {}"}"\n\nFinally, add all these components to a stack and set it as your active stack. For example:\n\nzenml stack copy default annotation\n\n# this must be done separately so that the other required stack components are first registered\n\nzenml stack update annotation -an \n\nzenml stack set annotation\n\n# optionally also\n\nzenml stack describe\n\nNow if you run a simple CLI command like zenml annotator dataset list this should work without any errors. You\'re ready to use your annotator in your ML workflow!\n\nHow do you use it?\n\nZenML supports access to your data and annotations via the zenml annotator ... CLI command. We have also implemented an interface to some of the common Argilla functionality via the ZenML SDK.\n\nYou can access information about the datasets you\'re using with the zenml annotator dataset list. To work on annotation for a particular dataset, you can run zenml annotator dataset annotate . What follows is an overview of some key components to the Argilla integration and how it can be used.\n\nArgilla Annotator Stack Component', + 'How do you deploy the Comet Experiment Tracker flavor provided by ZenML integration?', + "Comet\n\nLogging and visualizing experiments with Comet.\n\nThe Comet Experiment Tracker is an Experiment Tracker flavor provided with the Comet ZenML integration that uses the Comet experiment tracking platform to log and visualize information from your pipeline steps (e.g., models, parameters, metrics).\n\nWhen would you want to use it?\n\nComet is a popular platform that you would normally use in the iterative ML experimentation phase to track and visualize experiment results. That doesn't mean that it cannot be repurposed to track and visualize the results produced by your automated pipeline runs, as you make the transition towards a more production-oriented workflow.\n\nYou should use the Comet Experiment Tracker:\n\nif you have already been using Comet to track experiment results for your project and would like to continue doing so as you are incorporating MLOps workflows and best practices in your project through ZenML.\n\nif you are looking for a more visually interactive way of navigating the results produced from your ZenML pipeline runs (e.g., models, metrics, datasets)\n\nif you would like to connect ZenML to Comet to share the artifacts and metrics logged by your pipelines with your team, organization, or external stakeholders\n\nYou should consider one of the other Experiment Tracker flavors if you have never worked with Comet before and would rather use another experiment tracking tool that you are more familiar with.\n\nHow do you deploy it?\n\nThe Comet Experiment Tracker flavor is provided by the Comet ZenML integration. 
You need to install it on your local machine to be able to register a Comet Experiment Tracker and add it to your stack:\n\nzenml integration install comet -y\n\nThe Comet Experiment Tracker needs to be configured with the credentials required to connect to the Comet platform using one of the available authentication methods.\n\nAuthentication Methods\n\nYou need to configure the following credentials for authentication to the Comet platform:", + "er Image Builder stack component, or the Vertex AIOrchestrator and Step Operator. It should be accompanied by a matching set of\n\nGCP permissions that allow access to the set of remote resources required by the\n\nclient and Stack Component.\n\nThe resource name represents the GCP project that the connector is authorized to\n\naccess.\n\n📦 GCP GCS bucket (resource type: gcs-bucket)\n\nAuthentication methods: implicit, user-account, service-account, oauth2-token,\n\nimpersonation\n\nSupports resource instances: True\n\nAuthentication methods:\n\n🔒 implicit\n\n🔒 user-account\n\n🔒 service-account\n\n🔒 oauth2-token\n\n🔒 impersonation\n\nAllows Stack Components to connect to GCS buckets. When used by Stack\n\nComponents, they are provided a pre-configured GCS Python client instance.\n\nThe configured credentials must have at least the following GCP permissions\n\nassociated with the GCS buckets that it can access:\n\nstorage.buckets.list\n\nstorage.buckets.get\n\nstorage.objects.create\n\nstorage.objects.delete\n\nstorage.objects.get\n\nstorage.objects.list\n\nstorage.objects.update\n\nFor example, the GCP Storage Admin role includes all of the required\n\npermissions, but it also includes additional permissions that are not required\n\nby the connector.\n\nIf set, the resource name must identify a GCS bucket using one of the following\n\nformats:\n\nGCS bucket URI: gs://{bucket-name}\n\nGCS bucket name: {bucket-name}\n\n[...]\n\n────────────────────────────────────────────────────────────────────────────────\n\nPlease select a resource type or leave it empty to create a connector that can be used to access any of the supported resource types (gcp-generic, gcs-bucket, kubernetes-cluster, docker-registry). []: gcs-bucket\n\nWould you like to attempt auto-configuration to extract the authentication configuration from your local environment ? [y/N]: y\n\nService connector auto-configured successfully with the following configuration:\n\nService connector 'gcp-interactive' of type 'gcp' is 'private'.\n\n'gcp-interactive' gcp Service\n\nConnector Details\n\n┏━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━┓", ] embeddings = model.encode(sentences) print(embeddings.shape) @@ -1275,21 +1244,21 @@ You can finetune this model on your own dataset. 
| Metric | Value | |:--------------------|:-----------| -| cosine_accuracy@1 | 0.2952 | -| cosine_accuracy@3 | 0.6205 | -| cosine_accuracy@5 | 0.6928 | -| cosine_accuracy@10 | 0.7892 | -| cosine_precision@1 | 0.2952 | -| cosine_precision@3 | 0.2068 | -| cosine_precision@5 | 0.1386 | -| cosine_precision@10 | 0.0789 | -| cosine_recall@1 | 0.2952 | -| cosine_recall@3 | 0.6205 | -| cosine_recall@5 | 0.6928 | -| cosine_recall@10 | 0.7892 | -| cosine_ndcg@10 | 0.5524 | -| cosine_mrr@10 | 0.4758 | -| **cosine_map@100** | **0.4836** | +| cosine_accuracy@1 | 0.2831 | +| cosine_accuracy@3 | 0.5723 | +| cosine_accuracy@5 | 0.6807 | +| cosine_accuracy@10 | 0.8012 | +| cosine_precision@1 | 0.2831 | +| cosine_precision@3 | 0.1908 | +| cosine_precision@5 | 0.1361 | +| cosine_precision@10 | 0.0801 | +| cosine_recall@1 | 0.2831 | +| cosine_recall@3 | 0.5723 | +| cosine_recall@5 | 0.6807 | +| cosine_recall@10 | 0.8012 | +| cosine_ndcg@10 | 0.5407 | +| cosine_mrr@10 | 0.4577 | +| **cosine_map@100** | **0.4652** | #### Information Retrieval * Dataset: `dim_256` @@ -1297,21 +1266,21 @@ You can finetune this model on your own dataset. | Metric | Value | |:--------------------|:-----------| -| cosine_accuracy@1 | 0.2831 | -| cosine_accuracy@3 | 0.5964 | +| cosine_accuracy@1 | 0.2952 | +| cosine_accuracy@3 | 0.6024 | | cosine_accuracy@5 | 0.6807 | -| cosine_accuracy@10 | 0.7771 | -| cosine_precision@1 | 0.2831 | -| cosine_precision@3 | 0.1988 | +| cosine_accuracy@10 | 0.7952 | +| cosine_precision@1 | 0.2952 | +| cosine_precision@3 | 0.2008 | | cosine_precision@5 | 0.1361 | -| cosine_precision@10 | 0.0777 | -| cosine_recall@1 | 0.2831 | -| cosine_recall@3 | 0.5964 | +| cosine_precision@10 | 0.0795 | +| cosine_recall@1 | 0.2952 | +| cosine_recall@3 | 0.6024 | | cosine_recall@5 | 0.6807 | -| cosine_recall@10 | 0.7771 | -| cosine_ndcg@10 | 0.5376 | -| cosine_mrr@10 | 0.4601 | -| **cosine_map@100** | **0.4691** | +| cosine_recall@10 | 0.7952 | +| cosine_ndcg@10 | 0.5458 | +| cosine_mrr@10 | 0.4661 | +| **cosine_map@100** | **0.4729** | #### Information Retrieval * Dataset: `dim_128` @@ -1319,21 +1288,21 @@ You can finetune this model on your own dataset. | Metric | Value | |:--------------------|:-----------| -| cosine_accuracy@1 | 0.2711 | -| cosine_accuracy@3 | 0.5301 | -| cosine_accuracy@5 | 0.5964 | +| cosine_accuracy@1 | 0.247 | +| cosine_accuracy@3 | 0.5843 | +| cosine_accuracy@5 | 0.6265 | | cosine_accuracy@10 | 0.741 | -| cosine_precision@1 | 0.2711 | -| cosine_precision@3 | 0.1767 | -| cosine_precision@5 | 0.1193 | +| cosine_precision@1 | 0.247 | +| cosine_precision@3 | 0.1948 | +| cosine_precision@5 | 0.1253 | | cosine_precision@10 | 0.0741 | -| cosine_recall@1 | 0.2711 | -| cosine_recall@3 | 0.5301 | -| cosine_recall@5 | 0.5964 | +| cosine_recall@1 | 0.247 | +| cosine_recall@3 | 0.5843 | +| cosine_recall@5 | 0.6265 | | cosine_recall@10 | 0.741 | -| cosine_ndcg@10 | 0.4983 | +| cosine_ndcg@10 | 0.4995 | | cosine_mrr@10 | 0.4218 | -| **cosine_map@100** | **0.4313** | +| **cosine_map@100** | **0.4324** | #### Information Retrieval * Dataset: `dim_64` @@ -1341,21 +1310,21 @@ You can finetune this model on your own dataset. 
| Metric | Value | |:--------------------|:----------| -| cosine_accuracy@1 | 0.2651 | -| cosine_accuracy@3 | 0.4819 | -| cosine_accuracy@5 | 0.5663 | -| cosine_accuracy@10 | 0.6566 | -| cosine_precision@1 | 0.2651 | -| cosine_precision@3 | 0.1606 | -| cosine_precision@5 | 0.1133 | -| cosine_precision@10 | 0.0657 | -| cosine_recall@1 | 0.2651 | -| cosine_recall@3 | 0.4819 | -| cosine_recall@5 | 0.5663 | -| cosine_recall@10 | 0.6566 | -| cosine_ndcg@10 | 0.4541 | -| cosine_mrr@10 | 0.3899 | -| **cosine_map@100** | **0.402** | +| cosine_accuracy@1 | 0.2349 | +| cosine_accuracy@3 | 0.5 | +| cosine_accuracy@5 | 0.5783 | +| cosine_accuracy@10 | 0.6928 | +| cosine_precision@1 | 0.2349 | +| cosine_precision@3 | 0.1667 | +| cosine_precision@5 | 0.1157 | +| cosine_precision@10 | 0.0693 | +| cosine_recall@1 | 0.2349 | +| cosine_recall@3 | 0.5 | +| cosine_recall@5 | 0.5783 | +| cosine_recall@10 | 0.6928 | +| cosine_ndcg@10 | 0.4607 | +| cosine_mrr@10 | 0.3874 | +| **cosine_map@100** | **0.397** |
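
The tables above report retrieval quality at progressively smaller Matryoshka truncation sizes (256, 128, and 64 dimensions). As a minimal sketch of how those truncated sizes could be used at query time — assuming sentence-transformers >= 3.0, which provides the `truncate_dim` argument and the `model.similarity()` helper, and using made-up query/document strings rather than anything from the evaluation set:

```python
# Editor's illustrative sketch (not part of the generated model card):
# compare retrieval scores at the Matryoshka truncation sizes reported above.
# Assumes sentence-transformers >= 3.0; the query and documents below are
# made-up examples, not drawn from the training or evaluation data.
from sentence_transformers import SentenceTransformer

query = "How do I register a ZenML stack with a custom orchestrator?"
documents = [
    "Use `zenml stack register` to create a stack from existing components.",
    "Experiment trackers log models, parameters, and metrics from pipeline steps.",
]

for dim in (256, 128, 64):
    # Reload the model with embeddings truncated to `dim` dimensions
    # (the Matryoshka property means the leading dimensions remain usable).
    model = SentenceTransformer(
        "zenml/finetuned-snowflake-arctic-embed-m", truncate_dim=dim
    )
    query_emb = model.encode([query])
    doc_embs = model.encode(documents)
    scores = model.similarity(query_emb, doc_embs)  # cosine similarity by default
    print(f"dim={dim}: {scores.tolist()}")
```

As the cosine_ndcg@10 rows above suggest, smaller truncation sizes trade some retrieval quality for lower storage cost and faster similarity search.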