Version: Next

NPU Compile

Compile a registered model into a binary executable for RNGD NPUs. NuFi supports three compile methods; choose the one that fits your situation.

Method | Best for
A. NuFi UI | Simple usage, real-time log viewing
B. furiosa-llm CLI | Fine-grained option tuning
C. Python SDK | Script automation

Method A: Compile from the NuFi UI

1. Go to the model version page

In the left sidebar, click Resources > Models and select a model. In the Artifacts tab, click the Compile button on the artifact you want to compile.

Compilations tab

Artifacts tab

2. Enter compile settings

For compile settings, see the Compile Settings Guide.

Compile creation form

3. Monitor progress

The new compile job appears in the Compilations tab of the model version detail page. Track its status in the Phase column. To view progress logs, click the row to open the job detail page, then click the Logs button for each step. To review the settings the job was created with, click View in the Config column.

On the right side of each row, the Re-compile icon opens the creation dialog pre-filled with the same options, and the Delete icon removes that compilation history entry.

Compile in progress

4. Check completion

When the Phase changes to Succeeded, an RNGD artifact is automatically added to the model's artifact list.

Compile complete — Registered

On failure:

  • OOM (out-of-memory) error: lower Max Context Length or the Tensor Parallel size, then retry
  • Platform error: ask the administrator to check NPU device status

Method B: Compile in Jupyter Lab with the furiosa-llm CLI

Connect to the NPU Lab you created in "02. Create per-device Lab", open a terminal, and run the following commands.

Create the output directory

mkdir -p /data/Qwen2.5-0.5B-Instruct-rngd-cli

Run compile

# Compile the model stored on the Volume into an RNGD artifact
furiosa-llm build \
  /data/Qwen2.5-0.5B-Instruct \
  /data/Qwen2.5-0.5B-Instruct-rngd-cli \
  --tensor-parallel-size 1 \
  --max-seq-len-to-capture 4096
Argument / option | Description
First argument | Input model path (under the Volume mount path)
Second argument | Output path for the compile result (must be under /data/)
--tensor-parallel-size | Tensor Parallel size
--max-seq-len-to-capture | Maximum context length
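If you drive the compile from a Python script or notebook cell instead of typing it in the terminal, it can help to assemble the command as an argument list. The following is a minimal sketch; `build_compile_command` and `run_compile` are hypothetical helpers (not part of furiosa-llm or NuFi), and they use only the arguments and options listed above.

```python
import shlex
import subprocess


def build_compile_command(model_path: str, output_path: str,
                          tensor_parallel_size: int = 1,
                          max_seq_len_to_capture: int = 4096) -> list[str]:
    """Assemble the `furiosa-llm build` invocation as an argument list (hypothetical helper)."""
    return [
        "furiosa-llm", "build",
        model_path,   # first argument: input model path on the Volume
        output_path,  # second argument: compile output path (keep it under /data/)
        "--tensor-parallel-size", str(tensor_parallel_size),
        "--max-seq-len-to-capture", str(max_seq_len_to_capture),
    ]


def run_compile(cmd: list[str]) -> None:
    """Run the compile; check=True raises CalledProcessError on a non-zero exit code."""
    subprocess.run(cmd, check=True)


cmd = build_compile_command("/data/Qwen2.5-0.5B-Instruct",
                            "/data/Qwen2.5-0.5B-Instruct-rngd-cli")
print(shlex.join(cmd))  # inspect the command before calling run_compile(cmd)
```

Passing a list rather than a single shell string avoids quoting problems if a path ever contains spaces.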
Output path

Always save under the Volume mount path (/data/). Anything saved outside the Volume (e.g., /tmp) is lost when the Pod terminates.
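Because a wrong output path only shows up as lost data after the Pod terminates, a quick check before compiling can be useful. A minimal sketch, assuming the Volume is mounted at /data as in this guide; `is_under_volume` is a hypothetical helper, not part of NuFi or furiosa-llm. It compares paths lexically and does not resolve symlinks.

```python
from pathlib import PurePosixPath

# Assumption: the Volume mount path used throughout this guide
VOLUME_ROOT = PurePosixPath("/data")


def is_under_volume(path: str, root: PurePosixPath = VOLUME_ROOT) -> bool:
    """Return True if `path` is an absolute path strictly inside the Volume root."""
    p = PurePosixPath(path)
    # Reject relative paths and ".." segments, which could escape the Volume
    if not p.is_absolute() or ".." in p.parts:
        return False
    # The output must be a subdirectory of the root, not the root itself
    return p != root and root in p.parents


print(is_under_volume("/data/Qwen2.5-0.5B-Instruct-rngd-cli"))
print(is_under_volume("/tmp/Qwen2.5-0.5B-Instruct-rngd-cli"))
```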

After compile, register the model

When the compile completes, follow "04. Register Model": first register the original GPU model with Add Version, then open the Artifacts tab and click Add Artifact to register the compiled RNGD model.

Add Artifact — RNGD

How to determine Path

Path is relative to the Volume mount path (/data/). The example above saved the result to /data/Qwen2.5-0.5B-Instruct-rngd-cli, so enter Qwen2.5-0.5B-Instruct-rngd-cli in Path. If you used a different path, run ls /data/ in the terminal to confirm the actual directory name.
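Deriving the Path value is a plain path subtraction, sketched below under the assumption that the Volume is mounted at /data. `artifact_path_field` is a hypothetical helper for illustration; it raises an error instead of silently producing a wrong value when the output directory is outside the Volume.

```python
from pathlib import PurePosixPath


def artifact_path_field(output_dir: str, volume_root: str = "/data") -> str:
    """Convert an absolute compile output directory into the value for the Path field."""
    # relative_to raises ValueError if output_dir is not inside volume_root
    rel = PurePosixPath(output_dir).relative_to(volume_root)
    if str(rel) == ".":
        raise ValueError("output_dir must be a subdirectory of the Volume, not its root")
    return str(rel)


print(artifact_path_field("/data/Qwen2.5-0.5B-Instruct-rngd-cli"))
# Qwen2.5-0.5B-Instruct-rngd-cli
```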


Method C: Compile with the Python SDK

Connect to the NPU Lab you created in "02. Create per-device Lab", then run the following from a Jupyter notebook or terminal.

Write and run the compile script

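# Build an RNGD artifact from the model stored on the Volume
from furiosa_llm.artifact import ArtifactBuilder

builder = ArtifactBuilder(
    model_id_or_path="/data/Qwen2.5-0.5B-Instruct",  # input model path on the Volume
    tensor_parallel_size=1,        # Tensor Parallel size
    max_seq_len_to_capture=4096,   # maximum context length
)

# Save the compiled artifact under /data/ so it survives Pod termination
builder.build(save_dir="/data/Qwen2.5-0.5B-Instruct-rngd-sdk")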
Output path

Always set save_dir to a location under the Volume mount path (/data/).
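In this guide, the SDK keyword arguments and the CLI options from Method B differ only in their separators (tensor_parallel_size vs. --tensor-parallel-size, max_seq_len_to_capture vs. --max-seq-len-to-capture). If you want to keep one set of settings and reuse it with either method, here is a minimal sketch. `to_cli_flags` is a hypothetical helper, and the naming correspondence it relies on is confirmed here only for these two options; check the furiosa-llm documentation before applying it to others.

```python
# Compile settings shared by Method B (CLI) and Method C (SDK) in this guide
settings = {"tensor_parallel_size": 1, "max_seq_len_to_capture": 4096}


def to_cli_flags(kwargs: dict) -> list[str]:
    """Turn SDK-style keyword arguments into CLI flags (snake_case -> --kebab-case)."""
    flags: list[str] = []
    for key, value in kwargs.items():
        flags += ["--" + key.replace("_", "-"), str(value)]
    return flags


print(to_cli_flags(settings))
# ['--tensor-parallel-size', '1', '--max-seq-len-to-capture', '4096']
```

The same dictionary can be unpacked into the SDK call as ArtifactBuilder(model_id_or_path=..., **settings), so both methods compile with identical settings.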

After compile, register the model

Register the result in NuFi Models the same way as Method B.


Next Step

  • 06. Evaluate model quality: evaluate accuracy with the compiled RNGD artifact
  • 07. Deploy Model Serving: jump straight to serving deployment