NVIDIA CUDA-Q
| Field | Details |
|---|---|
| Category | Quantum computing framework and hybrid quantum-classical programming platform |
| Vendor | NVIDIA |
| Primary use | Building, simulating, optimizing, and running hybrid quantum-classical applications across CPUs, GPUs, simulators, and QPUs |
| Languages | Python and C++ |
| Execution targets | Local CPU simulation, GPU-accelerated simulation, multi-GPU workflows, cloud/provider QPUs, and custom backends |
| Core abstractions | Quantum kernels, qubits/qvectors, gates, measurements, targets/backends, sampling, running, observing expectation values, optimizers, and quantum operators |
| Best fit | Researchers and engineering teams that need one programming model for quantum algorithms, high-performance simulation, and QPU/hybrid HPC experimentation |
English
Overview
NVIDIA CUDA-Q is an open-source platform and programming model for accelerated quantum supercomputing. It lets developers express hybrid quantum-classical programs in Python or C++ and execute them on heterogeneous resources: CPUs, NVIDIA GPUs, quantum simulators, and physical quantum processing units.
The platform is designed to be QPU-agnostic. A CUDA-Q program can be developed against local simulators, scaled to GPU-accelerated simulation, and then retargeted to supported quantum hardware providers when appropriate. NVIDIA positions CUDA-Q as a bridge between near-term quantum algorithm development and longer-term fault-tolerant, quantum-centric supercomputing.
Why it matters
CUDA-Q matters because useful quantum workloads are rarely only a quantum circuit. They usually combine classical preprocessing, parameter optimization, quantum kernel execution, measurement aggregation, simulation, and postprocessing. CUDA-Q puts those pieces in one programming model instead of forcing developers to stitch together unrelated circuit tools, HPC code, and hardware interfaces.
Important practical benefits include:
- Hybrid execution across CPU, GPU, and QPU resources from one application.
- GPU-accelerated state-vector, tensor-network, and noisy simulation for algorithm development before hardware is available or affordable.
- Python and C++ APIs, allowing notebooks and research prototypes to share concepts with performance-oriented production code.
- A kernel-based model that lets developers write quantum routines once and target different simulators or QPUs.
- Integration with compiler infrastructure such as MLIR, LLVM, and QIR for lowering, optimization, and backend execution.
- Support for algorithmic workflows such as VQE, QAOA, Hamiltonian simulation, quantum machine learning experiments, dynamics simulation, and quantum error-correction research.
For teams already using NVIDIA GPUs or HPC systems, CUDA-Q is especially relevant because it treats quantum acceleration as part of a broader heterogeneous computing stack rather than as a separate silo.
Architecture/Concepts
CUDA-Q centers on a few core concepts:
- Quantum kernels: A kernel is the unit of quantum code. In C++, kernels are annotated with `__qpu__`; in Python, they are commonly declared with `@cudaq.kernel`. Kernels allocate qubits, apply gates, use supported classical control flow, and perform measurements.
- Host and device roles: Classical host code calls quantum kernels, manages parameters, runs optimizers, selects targets, and processes results. Kernel bodies describe the quantum work and a supported subset of classical logic.
- Quantum data types: Programs use qubits and registers such as `cudaq.qubit`, `cudaq.qvector`, `cudaq::qubit`, and `cudaq::qvector` to represent quantum state within kernels.
- Quantum operations: Gates, controlled operations, adjoint operations, custom operations, and measurements are expressed directly in the kernel language. Measurement can be used for sampling and, where supported, mid-circuit control.
- Targets and backends: CUDA-Q programs can target CPU simulators, NVIDIA GPU-accelerated simulators, multi-GPU simulation modes, cloud backends, and supported hardware providers.
- Execution primitives: `sample` is used for shot-based measurement counts, `run` supports workflows that return classical data from kernels, `observe` computes expectation values for operator/Hamiltonian workloads, and state APIs support simulator introspection.
- Operators and optimizers: CUDA-Q includes tools for constructing spin operators and running classical optimizers/gradients around parameterized quantum kernels.
- Compiler/toolchain path: CUDA-Q lowers high-level Python or C++ kernel code through a compiler stack that can optimize and transform quantum programs for the selected simulator or QPU target.
- CUDA-QX libraries: NVIDIA also exposes domain libraries such as CUDA-Q Solvers and CUDA-Q QEC for higher-level algorithm and error-correction workflows.
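To make the difference between `sample` and `observe` concrete, the sketch below works through the arithmetic that turns shot counts into an expectation value. It is plain Python, not CUDA-Q API: the function name `zz_expectation` and both count dictionaries are hypothetical, made up for illustration, and the bitstring-to-count mapping only mimics the kind of result a shot-based sampler reports.

```python
def zz_expectation(counts):
    """Estimate <Z0 Z1> from two-qubit measurement counts.

    Each bitstring contributes +1 if it has an even number of 1s
    and -1 if it has an odd number, weighted by how often it occurred.
    """
    shots = sum(counts.values())
    total = 0
    for bits, n in counts.items():
        parity = bits.count("1") % 2
        total += (-1) ** parity * n
    return total / shots


# Hypothetical counts for an ideal Bell state: only correlated outcomes.
ideal = {"00": 510, "11": 490}
print(zz_expectation(ideal))   # 1.0

# Hypothetical counts with a few uncorrelated outcomes, as noise might produce.
noisy = {"00": 480, "01": 20, "10": 10, "11": 490}
print(zz_expectation(noisy))   # 0.94
```

A sampling primitive hands back the histogram; an expectation-value primitive hands back the single number, which is what an optimizer in a variational loop consumes.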
Practical usage
For Python exploration, the current quick start uses `pip install cudaq` and a normal Python workflow. A minimal program typically imports `cudaq`, defines a decorated kernel, allocates a `qvector`, applies gates, measures, and calls `cudaq.sample`.
For C++ development, CUDA-Q uses the `nvq++` compiler. A typical C++ kernel includes `<cudaq.h>`, marks quantum code with `__qpu__`, allocates a `cudaq::qvector`, applies operations, measures, and compiles with `nvq++`.
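The minimal Python program described above might look like the following sketch. The Bell-state circuit, the helper name `sample_bell`, and the shot count are illustrative choices. The import guard is an assumption added so the snippet still loads where CUDA-Q is not installed; a real script would normally just `import cudaq`.

```python
try:
    import cudaq  # installed with `pip install cudaq`
except ImportError:
    cudaq = None  # lets the sketch load where CUDA-Q is absent


def sample_bell(shots=1000):
    """Sample a two-qubit Bell state; returns {bitstring: count} or None."""
    if cudaq is None:
        return None

    @cudaq.kernel
    def bell():
        qubits = cudaq.qvector(2)       # allocate two qubits
        h(qubits[0])                    # put qubit 0 in superposition
        x.ctrl(qubits[0], qubits[1])    # controlled-X entangles the pair
        mz(qubits)                      # measure both qubits in the Z basis

    result = cudaq.sample(bell, shots_count=shots)
    return {bits: count for bits, count in result.items()}


if __name__ == "__main__":
    print(sample_bell())
```

On an ideal simulator the counts should split between `00` and `11` only; the exact split varies from run to run.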
Common usage patterns:
- Start locally with a small circuit and `sample` to validate measurement distributions.
- Increase qubit count or circuit depth on GPU-accelerated targets when simulation becomes CPU-bound.
- Use `observe` for Hamiltonian expectation values in chemistry, materials, VQE, and QAOA-style workflows.
- Use parameterized kernels plus CUDA-Q or third-party optimizers for hybrid variational algorithms.
- Add noisy simulation to test robustness against device-like error models.
- Move from local simulation to supported QPU providers once credentials, topology, queueing, shot budgets, and backend constraints are understood.
- Use multi-GPU or multi-QPU modes when workloads can be batched or distributed across processors.
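The `observe` and parameterized-kernel patterns above can be combined into a small variational loop. The sketch below is a toy stand-in for VQE, not a chemistry workflow: the one-qubit ansatz, the single-term Hamiltonian `Z`, the starting angle, and the helper name `minimize_z_expectation` are all illustrative assumptions. The import guard is likewise only there so the snippet loads without CUDA-Q installed.

```python
try:
    import cudaq
except ImportError:
    cudaq = None  # lets the sketch load where CUDA-Q is absent


def minimize_z_expectation():
    """Minimize <Z> over one rotation angle; returns (energy, params) or None."""
    if cudaq is None:
        return None

    @cudaq.kernel
    def ansatz(theta: float):
        q = cudaq.qubit()
        ry(theta, q)                    # <Z> after this rotation is cos(theta)

    hamiltonian = cudaq.spin.z(0)       # toy Hamiltonian: a single Pauli Z

    def cost(params):
        # Host-side cost function: one expectation value per optimizer step.
        return cudaq.observe(ansatz, hamiltonian, params[0]).expectation()

    optimizer = cudaq.optimizers.COBYLA()   # gradient-free classical optimizer
    optimizer.initial_parameters = [0.5]    # start away from the maximum at 0
    energy, best_params = optimizer.optimize(dimensions=1, function=cost)
    return energy, best_params


if __name__ == "__main__":
    print(minimize_z_expectation())
```

The classical optimizer runs on the host and only ever sees floating-point expectation values, so the same loop shape applies when the kernel is retargeted to a different simulator.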
Operational caveats: CUDA-Q does not require a GPU for basic use, but GPU acceleration is Linux-focused and depends on compatible NVIDIA drivers, CUDA, and target support. Multi-GPU workflows add MPI and environment requirements. Hardware-provider execution also depends on provider accounts, backend availability, credentials, and device-specific limitations.
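Because GPU targets depend on the local environment, a script can check for one before selecting it. The following is a minimal sketch, assuming the `nvidia` and `qpp-cpu` target names are the GPU and CPU simulators of interest; the helper name `choose_simulator` and the import guard are illustrative additions.

```python
try:
    import cudaq
except ImportError:
    cudaq = None  # lets the sketch load where CUDA-Q is absent


def choose_simulator():
    """Select a GPU simulator if usable, else a CPU one; returns the target name."""
    if cudaq is None:
        return None
    # Require both a visible GPU and an installed GPU target before selecting it.
    if cudaq.num_available_gpus() > 0 and cudaq.has_target("nvidia"):
        cudaq.set_target("nvidia")
    else:
        cudaq.set_target("qpp-cpu")
    return cudaq.get_target().name


if __name__ == "__main__":
    print("Selected target:", choose_simulator())
```

Kernels defined elsewhere in the program do not change; only the target selection does.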
Learning checklist
- Explain CUDA-Q's role as a hybrid quantum-classical programming model rather than only a circuit SDK.
- Write a basic Python kernel with `@cudaq.kernel`, `cudaq.qvector`, gates, `mz`, and `cudaq.sample`.
- Write the same idea in C++ with `__qpu__`, `cudaq::qvector`, and `nvq++`.
- Distinguish `sample`, `run`, `observe`, and state retrieval.
- Understand how targets/backends change execution without rewriting the algorithm.
- Know when CPU simulation is enough and when GPU or multi-GPU simulation is justified.
- Build a parameterized kernel and connect it to an optimizer for VQE or QAOA.
- Construct simple spin operators or Hamiltonians and compute expectation values.
- Test noisy simulations before assuming ideal-circuit results transfer to hardware.
- Check installation, CUDA, MPI, provider credentials, and backend constraints before planning production runs.
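For the checklist item on choosing between CPU, GPU, and multi-GPU simulation, a back-of-the-envelope memory estimate is a useful first check. The sketch below assumes a dense state-vector simulator, which stores 2^n complex amplitudes for n qubits; tensor-network and other methods scale differently. The function name `statevector_gib` is hypothetical, and the 8-byte default corresponds to single-precision complex amplitudes (16 bytes for double precision).

```python
def statevector_gib(n_qubits, bytes_per_amplitude=8):
    """Memory in GiB needed to hold a dense n-qubit state vector."""
    amplitudes = 2 ** n_qubits
    return amplitudes * bytes_per_amplitude / 2 ** 30


if __name__ == "__main__":
    for n in (20, 30, 34, 40):
        single = statevector_gib(n)
        double = statevector_gib(n, bytes_per_amplitude=16)
        print(f"{n} qubits: {single:g} GiB single precision, "
              f"{double:g} GiB double precision")
```

Under these assumptions, 30 qubits need 8 GiB in single precision and every additional qubit doubles that, so 40 qubits need 8192 GiB. This is the point at which distributing the state across several GPUs, or switching simulation method, becomes the practical question.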
繁體中文
概覽
NVIDIA CUDA-Q 是開源的量子開發平台與程式設計模型,目標是加速量子超級運算。開發者可以用 Python 或 C++ 描述混合量子-classical 程式,並在 CPU、NVIDIA GPU、量子模擬器與實體 QPU 上執行。
CUDA-Q 的設計重點是 QPU-agnostic。開發者可以先用本機模擬器開發,再擴展到 GPU 加速模擬,最後在條件成熟時切換到支援的量子硬體供應商。NVIDIA 將它定位為連接近期量子演算法開發與長期 fault-tolerant、quantum-centric supercomputing 的工具鏈。
為什麼重要
CUDA-Q 重要的原因在於,有用的量子工作負載通常不只是單一量子電路。實務上會同時包含 classical 前處理、參數最佳化、量子 kernel 執行、量測統計、模擬與後處理。CUDA-Q 將這些流程放進同一個程式模型中,降低把電路工具、HPC 程式與硬體介面拼接在一起的成本。
主要實務價值包括:
- 可在同一應用中協調 CPU、GPU 與 QPU 資源。
- 提供 GPU 加速的 state-vector、tensor-network 與 noisy simulation,方便在硬體可用或成本可接受前先開發演算法。
- 同時支援 Python 與 C++,讓 notebook 研究原型與高效能程式碼使用相近概念。
- 以 kernel 為核心的模型,讓量子 routine 可以寫一次再切換不同模擬器或 QPU。
- 與 MLIR、LLVM、QIR 等 compiler infrastructure 整合,用於 lowering、最佳化與 backend 執行。
- 支援 VQE、QAOA、Hamiltonian simulation、量子機器學習實驗、dynamics simulation,以及量子錯誤校正研究等工作流。
對已經使用 NVIDIA GPU 或 HPC 系統的團隊來說,CUDA-Q 特別有價值,因為它把量子加速視為 heterogeneous computing stack 的一部分,而不是獨立孤島。
架構/概念
CUDA-Q 的核心概念如下:
- Quantum kernels: kernel 是量子程式碼的基本單位。C++ 使用 `__qpu__` 標註;Python 通常使用 `@cudaq.kernel`。kernel 會配置 qubit、套用 gate、使用支援的 classical control flow,並執行量測。
- Host 與 device 角色: classical host code 呼叫 quantum kernel、管理參數、執行 optimizer、選擇 target,並處理結果。kernel body 則描述量子工作與受支援的 classical 邏輯子集。
- 量子資料型別: 程式使用 `cudaq.qubit`、`cudaq.qvector`、`cudaq::qubit`、`cudaq::qvector` 等 qubit 與 register 型別,在 kernel 中表示量子狀態。
- 量子操作: gate、controlled operation、adjoint operation、自訂 operation 與 measurement 都可以直接在 kernel language 中描述。量測可用於 sampling,也可在支援情境中用於 mid-circuit control。
- Targets 與 backends: CUDA-Q 程式可切換到 CPU 模擬器、NVIDIA GPU 加速模擬器、多 GPU 模擬模式、cloud backend,以及支援的硬體供應商。
- 執行 primitive: `sample` 用於 shot-based 量測統計,`run` 支援從 kernel 回傳 classical data 的流程,`observe` 用於 operator/Hamiltonian 的 expectation value,state API 則支援模擬器狀態檢查。
- Operators 與 optimizers: CUDA-Q 提供 spin operator 建構工具,以及可搭配 parameterized quantum kernel 的 classical optimizer/gradient 工具。
- Compiler/toolchain 路徑: CUDA-Q 會將高階 Python 或 C++ kernel code 經由 compiler stack lowering,並依選定的模擬器或 QPU target 做最佳化與轉換。
- CUDA-QX libraries: NVIDIA 也提供 CUDA-Q Solvers、CUDA-Q QEC 等 domain libraries,用於更高階的演算法與錯誤校正工作流。
實務使用
Python 探索流程目前可依 quick start 使用 `pip install cudaq`,再用一般 Python 方式執行。最小程式通常會 import `cudaq`、定義 decorated kernel、配置 `qvector`、套用 gate、量測,最後呼叫 `cudaq.sample`。
C++ 開發則使用 CUDA-Q 的 `nvq++` compiler。典型 C++ kernel 會 include `<cudaq.h>`,以 `__qpu__` 標記量子程式碼,配置 `cudaq::qvector`,套用操作、量測,並用 `nvq++` 編譯。
常見使用模式:
- 先在本機用小型電路與 `sample` 驗證量測分布。
- 當 qubit 數或 circuit depth 讓 CPU 模擬變慢時,切換到 GPU 加速 target。
- 在 chemistry、materials、VQE、QAOA 類工作流中,用 `observe` 計算 Hamiltonian expectation value。
- 將 parameterized kernel 搭配 CUDA-Q 或第三方 optimizer,建立 hybrid variational algorithm。
- 加入 noisy simulation,測試演算法對 device-like error model 的穩健性。
- 在理解 credentials、topology、queueing、shot budget 與 backend 限制後,再從本機模擬移到支援的 QPU provider。
- 當工作負載可 batch 或分散到多個 processor 時,使用 multi-GPU 或 multi-QPU 模式。
實務注意事項:基本使用不需要 GPU,但 GPU 加速主要面向 Linux,且取決於相容的 NVIDIA driver、CUDA 與 target 支援。Multi-GPU 工作流會增加 MPI 與環境變數需求。硬體 provider 執行也取決於帳號、backend availability、credentials 與 device-specific 限制。
學習檢核表
- 說明 CUDA-Q 是 hybrid quantum-classical programming model,而不只是 circuit SDK。
- 使用 `@cudaq.kernel`、`cudaq.qvector`、gate、`mz`、`cudaq.sample` 寫出基本 Python kernel。
- 用 C++ 的 `__qpu__`、`cudaq::qvector` 與 `nvq++` 寫出相同概念。
- 分辨 `sample`、`run`、`observe` 與 state retrieval。
- 理解 target/backend 如何在不重寫演算法的情況下改變執行位置。
- 判斷何時 CPU 模擬足夠,何時需要 GPU 或 multi-GPU 模擬。
- 建立 parameterized kernel,並串接 optimizer 實作 VQE 或 QAOA。
- 建構簡單 spin operator 或 Hamiltonian,並計算 expectation value。
- 在假設 ideal-circuit 結果可轉移到硬體前,先測試 noisy simulation。
- 在規劃正式執行前,檢查安裝、CUDA、MPI、provider credentials 與 backend 限制。