How to manage endpoints for model serving on Databricks using API and UI


To create, modify, and delete endpoints for model serving on Databricks, you can follow the instructions provided below.

Creating Model Serving Endpoints

You have two options to create model serving endpoints: using the Databricks Machine Learning API or the Databricks Machine Learning UI.

  1. API Workflow

To create an endpoint using the API, you can use the following Python code with the requests library:

import requests

# Replace <databricks-instance> with your workspace hostname
url = "https://<databricks-instance>/api/2.0/serving-endpoints"

# Endpoint name, plus the model version to serve and its compute settings
payload = {
    "name": "feed-ads",
    "config": {
        "served_models": [{
            "model_name": "ads1",
            "model_version": "1",
            "workload_size": "Small",
            "scale_to_zero_enabled": True
        }]
    }
}

headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer <access-token>"
}

response = requests.post(url, json=payload, headers=headers)

Notice that:

  • The POST request method is used for creating endpoints.
  • This create request only works the first time, when the endpoint does not exist yet. If the underlying model version changes or the configuration needs updating, use the modify endpoint method instead.
  • The <access-token> placeholder is your Databricks access token, which you can generate by following this blog.
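The create request above can be wrapped in a small helper that checks the response and keeps the token out of the source file. This is a minimal sketch, not an official client: `auth_headers` and `create_endpoint` are hypothetical names, `DATABRICKS_TOKEN` is an assumed environment variable name, and `http` is assumed to be any object with a requests-style `post()` method.

```python
import os


def auth_headers(token=None):
    """Build request headers, reading the token from the environment if not given."""
    # DATABRICKS_TOKEN is an assumed variable name for this sketch.
    token = token or os.environ["DATABRICKS_TOKEN"]
    return {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {token}",
    }


def create_endpoint(http, host, name, model_name, model_version,
                    workload_size="Small", scale_to_zero=True, token=None):
    """POST a new serving endpoint and return the parsed JSON response.

    `http` is any object with a requests-style post() method, for example
    the `requests` module itself or a requests.Session().
    """
    payload = {
        "name": name,
        "config": {
            "served_models": [{
                "model_name": model_name,
                "model_version": str(model_version),
                "workload_size": workload_size,
                "scale_to_zero_enabled": scale_to_zero,
            }]
        },
    }
    url = f"https://{host}/api/2.0/serving-endpoints"
    response = http.post(url, json=payload, headers=auth_headers(token))
    # Raise on HTTP errors instead of silently continuing.
    response.raise_for_status()
    return response.json()
```

With the requests library installed, a call might look like `create_endpoint(requests, "<databricks-instance>", "feed-ads", "ads1", 1)`.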

  2. UI Workflow

To create an endpoint using the UI, follow these steps:

  1. Go to the Databricks sidebar and click on “Serving”.
  2. Click on “Create serving endpoint”.
  3. Provide a name for your endpoint.
  4. In the “Edit configuration” section, select the model from either the Workspace Model Registry or Unity Catalog, along with its version.
  5. Click “Confirm”.
  6. Select the compute size for your endpoint and specify if it should scale to zero when not in use.
  7. Configure the traffic percentage to route to the served model.
  8. Click “Create serving endpoint”.

Modifying the Compute Configuration of an Endpoint

After enabling an endpoint, you can modify its compute configuration using either the API or the UI.

  1. API Workflow

To modify the compute configuration of an endpoint using the API, you can use the following Python code:

import requests

endpoint_name = "feed-ads"
url = f"https://<databricks-instance>/api/2.0/serving-endpoints/{endpoint_name}/config"

# New configuration: here the endpoint moves to model version 2
payload = {
    "served_models": [{
        "model_name": "ads1",
        "model_version": "2",
        "workload_size": "Small",
        "scale_to_zero_enabled": True
    }]
}

headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer <access-token>"
}

response = requests.put(url, json=payload, headers=headers)

Notice that:

  • The PUT request method is used for modifying the compute configuration of an endpoint.
  • Use this method when you want to update the compute configuration or change the served models of an existing endpoint, for example to serve a newer model version.
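Because POST is for a new endpoint and PUT is for an existing one, a deployment script can look the endpoint up first and pick the right call. The sketch below illustrates that idea under stated assumptions: `create_or_update_endpoint` is a hypothetical helper, `http` is any object with requests-style `get()`, `post()`, and `put()` methods, and a GET on `/api/2.0/serving-endpoints/{name}` is assumed to return HTTP 404 when the endpoint does not exist.

```python
def create_or_update_endpoint(http, host, headers, name, served_models):
    """Create the endpoint if it is missing, otherwise update its config.

    Returns "created" or "updated" so callers can log what happened.
    """
    base = f"https://{host}/api/2.0/serving-endpoints"

    # Assumption: a missing endpoint answers this lookup with HTTP 404.
    lookup = http.get(f"{base}/{name}", headers=headers)

    if lookup.status_code == 404:
        # First deployment: POST the name together with the config.
        body = {"name": name, "config": {"served_models": served_models}}
        response = http.post(base, json=body, headers=headers)
        action = "created"
    else:
        # Any other error on the lookup is unexpected, so surface it.
        lookup.raise_for_status()
        # Endpoint exists: PUT only the new config.
        body = {"served_models": served_models}
        response = http.put(f"{base}/{name}/config", json=body, headers=headers)
        action = "updated"

    response.raise_for_status()
    return action
```

Passing the `requests` module as `http` reuses the same `headers` dictionary shown in the examples above.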

  2. UI Workflow

To modify the compute configuration of an endpoint using the UI, follow these steps:

  1. Go to the Databricks sidebar and click on “Serving”.
  2. Select the endpoint you want to modify.
  3. Click on “Edit configuration”.
  4. Choose a workload size and specify if the endpoint should scale down to zero when not in use.
  5. Modify the traffic percentage to route to the served model.
  6. Click “Save”.
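The traffic percentage set in the UI can, in principle, also be expressed in the configuration payload sent to the API. The sketch below only builds such a payload and sends nothing. The field names `traffic_config`, `routes`, `served_model_name`, and `traffic_percentage`, and the `name` key on each served model, are assumptions that should be checked against the Serving Endpoints API reference; `build_config_with_traffic` is a hypothetical helper.

```python
def build_config_with_traffic(served_models, percentages):
    """Return a config payload that splits traffic across served models.

    served_models: list of dicts, each with a unique "name" key.
    percentages:   dict mapping each served model name to an integer share.
    """
    names = [model["name"] for model in served_models]

    # Every served model needs exactly one route, and nothing extra.
    if set(names) != set(percentages):
        raise ValueError("percentages must cover exactly the served models")
    # Traffic shares are expected to add up to 100.
    if sum(percentages.values()) != 100:
        raise ValueError("traffic percentages must sum to 100")

    return {
        "served_models": served_models,
        "traffic_config": {
            "routes": [
                {"served_model_name": n, "traffic_percentage": percentages[n]}
                for n in names
            ]
        },
    }
```

The returned dictionary would be passed as the `json` argument of the PUT request to the `/config` URL.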

Deleting a Model Serving Endpoint

To disable serving for a model, you can delete the endpoint it’s served on.

  1. API Workflow

To delete an endpoint using the API, you can use the following Python code:

import requests

endpoint_name = "feed-ads"
url = f"https://<databricks-instance>/api/2.0/serving-endpoints/{endpoint_name}"

headers = {
    "Authorization": "Bearer <access-token>"
}

response = requests.delete(url, headers=headers)
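The delete call returns a response object, so it is worth checking the status before assuming the endpoint is gone. A minimal sketch follows, where `delete_endpoint` is a hypothetical helper, `http` is any object with a requests-style `delete()` method, and HTTP 404 is assumed to mean the endpoint did not exist in the first place.

```python
def delete_endpoint(http, host, headers, name):
    """Delete a serving endpoint.

    Returns True if the endpoint was deleted, False if it was not found.
    """
    url = f"https://{host}/api/2.0/serving-endpoints/{name}"
    response = http.delete(url, headers=headers)

    # Assumption: 404 means there was nothing to delete.
    if response.status_code == 404:
        return False

    # Any other error status is raised to the caller.
    response.raise_for_status()
    return True
```

Treating "not found" as a non-error makes the helper safe to run repeatedly in a cleanup script.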

  2. UI Workflow

To delete an endpoint using the UI, follow these steps:

  1. Go to the Databricks sidebar and click on “Serving”.
  2. Select the endpoint you want to delete.
  3. Click on the kebab menu at the top and choose “Delete”.

These instructions cover how to create, modify, and delete model serving endpoints on Databricks, using both the API and the UI.


Author: robot learner
Reprint policy: Unless otherwise stated, all articles in this blog are licensed under CC BY 4.0. If you reproduce this article, please credit the source: robot learner.