Version: Next

Serve a Model Downloaded in Lab

Download a Hugging Face model manually from Jupyter Lab, register the files in Model Artifacts, and deploy them as a Serving. Use this path when you need to run download scripts yourself or modify files before registration.

Prerequisites

  • A Volume for storing model files. If you need a new one, see Volumes.
  • A Jupyter-based Lab. See Lab.

1. Mount a Volume in Lab

When creating the Lab, add the model storage Volume to Data Volumes and set the mount path to /data.

Field         Example
Name          tutorial-lab
Server Type   Jupyter
Data Volumes  tutorial-volume -> /data

Figure: Lab creation form
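Once the Lab is running and a terminal is open (step 2), it can help to confirm that the Volume is actually mounted and writable before starting a large download. The following is a minimal sketch; the function name check_mount is hypothetical and not part of the platform.

```shell
# Return 0 if DIR exists, is a directory, and is writable by the current user.
# check_mount is an illustrative helper, not a platform command.
check_mount() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "missing: $dir is not a directory" >&2
        return 1
    fi
    if [ ! -w "$dir" ]; then
        echo "read-only: cannot write to $dir" >&2
        return 1
    fi
    # Show free space so a large download does not fail halfway.
    df -h "$dir"
}
```

With the mount path from the table above, the call would be check_mount /data.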

2. Download the Model

When the Lab is Running, click Connect to open Jupyter. Open File > New > Terminal and download the model.

pip install -U huggingface_hub

hf download Qwen/Qwen2.5-0.5B-Instruct \
--local-dir /data/Qwen2.5-0.5B-Instruct
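Before moving on to registration, a quick sanity check of the download directory can catch an interrupted transfer. The sketch below assumes that a usable SafeTensors model directory contains config.json and at least one .safetensors file; what the platform's own validation inspects is not stated here, and check_model_dir is a hypothetical helper name.

```shell
# Return 0 if DIR contains config.json and at least one *.safetensors file.
# This is an assumed minimum, not the platform's validation rule.
check_model_dir() {
    dir="$1"
    if [ ! -f "$dir/config.json" ]; then
        echo "config.json not found in $dir" >&2
        return 1
    fi
    # An unmatched glob stays literal, so test each candidate with -f.
    for f in "$dir"/*.safetensors; do
        if [ -f "$f" ]; then
            return 0
        fi
    done
    echo "no .safetensors files found in $dir" >&2
    return 1
}
```

For the example above, the call would be check_model_dir /data/Qwen2.5-0.5B-Instruct.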

For private repositories, log in with a token first.

hf auth login --token "$HF_TOKEN"
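The login command expands the HF_TOKEN environment variable, so an unset variable passes an empty token. A small guard, sketched here with the hypothetical name require_hf_token, makes that failure explicit before the login command runs.

```shell
# Return 0 only if HF_TOKEN is set to a non-empty value.
# require_hf_token is an illustrative helper, not part of the hf CLI.
require_hf_token() {
    if [ -z "${HF_TOKEN:-}" ]; then
        echo "HF_TOKEN is not set; export it before logging in" >&2
        return 1
    fi
}
```

Run require_hf_token first and proceed to the login command only if it succeeds.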

3. Register the Model

In the left sidebar, click Model Artifacts and run Register Model.

Field       Example
Model Name  qwen-instruct-tutorial
Version     v1
Volume      tutorial-volume
Path        Qwen2.5-0.5B-Instruct
Format      SafeTensors

Figure: Add artifact

When validation succeeds, register the model version.
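The Path value in the example appears to be relative to the Volume root rather than to the Lab's filesystem: the model was downloaded to /data/Qwen2.5-0.5B-Instruct, and the Volume is mounted at /data. Assuming that reading is correct, the following sketch derives the Path value from the absolute download directory. The function name relative_to_mount is hypothetical.

```shell
# Print MODEL_DIR relative to MOUNT, or fail if it is not under MOUNT.
# Assumes the Path field expects a Volume-relative path.
relative_to_mount() {
    mount="${1%/}"       # drop a trailing slash, if any
    model_dir="${2%/}"
    case "$model_dir" in
        "$mount"/*)
            printf '%s\n' "${model_dir#"$mount"/}"
            ;;
        *)
            echo "$model_dir is not under $mount" >&2
            return 1
            ;;
    esac
}

relative_to_mount /data /data/Qwen2.5-0.5B-Instruct   # prints Qwen2.5-0.5B-Instruct
```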

4. Choose a Serving Path

For GPU serving, run Quick Deploy from the model detail page.

For NPU serving, compile the source artifact first in Model Compilations. When compilation reaches Succeeded, run Quick Deploy with the generated NPU artifact.

5. Create the Serving

In the Quick Deploy dialog, confirm the model, version, and artifact, then enter a Serving name.

Field         Example
Service Name  lab-downloaded-model-serving
Version       v1
Artifact      Source artifact for GPU serving, compiled artifact for NPU serving

The deployment is complete when the Serving status becomes Running.

Next Steps

To check the served model's responses, continue to Test Responses in Playground.

To check device and node metrics, continue to Check Metrics in Monitoring.