
NVIDIA Releases Nemotron 3 Ultra for Long-Running AI Agent Reasoning

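NVIDIA has released Nemotron 3 Ultra, a mixture-of-experts model with 550 billion total parameters, of which 55 billion are active at inference time. The model targets reasoning and orchestration tasks in long-running agent systems. NVIDIA says its throughput can be five times higher than that of comparable open-source models and that it can cut agent task costs by up to 30%.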

Guidances Staff · Updated June 15, 2026 · Sources reviewed

Editorial illustration · June 15, 2026

Nemotron 3 Ultra is positioned as a modular model for long-running agent reasoning and orchestration, where efficiency depends on routing work through specialized components.

Sources and disclosures

View source at developer.nvidia.com

The article accurately presents NVIDIA's claims regarding Nemotron 3 Ultra's specifications, purpose, and performance metrics (throughput and cost reduction). It also includes appropriate caveats about the lack of detailed benchmark conditions and the need for developers to validate performance against their own workloads. The article maintains a neutral tone and offers valuable insights for developers. Two minor contextual claims were not directly supported by the provided single source, but these do not undermine the core factual accuracy or reputation safety of the article.

Market lens

Agent runtime spending can spill into security, observability, and workflow infrastructure

The market signal is not another chatbot category; it is a possible budget shift toward the control layer around enterprise AI.

Impact path

Runtime spend → infra stack

Signals to watch

Procurement language around audit logs and cost ceilings
Security and observability vendors attaching agent controls
Workflow platforms exposing approval and tool-call governance

Verification schedule

D+1 · Jun 16

Do buyers repeat audit/cost-control requirements?

D+3 · Jun 18

Do vendors publish runtime-control SKUs or partnerships?

D+7 · Jun 22

Do budgets move from pilots into operating infrastructure?

Informational context only — not investment, legal, tax, or financial advice.

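NVIDIA has introduced Nemotron 3 Ultra, a model intended to improve reasoning performance in long-running agent systems. It uses a mixture-of-experts (MoE) architecture with 550 billion total parameters, of which 55 billion are active during inference. According to NVIDIA's official developer blog, the model is designed for frontier reasoning and orchestration tasks in long-running agents.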

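A mixture-of-experts architecture activates only a subset of its total parameters during inference, which helps increase speed and reduce compute cost. NVIDIA says Nemotron 3 Ultra's throughput is five times higher than that of other open-source models in its class. The company also says the model can reduce agent task costs by up to 30%. These figures matter because long-running agents repeat reasoning and decision steps many times, so the cost and speed of a single inference affect overall operating efficiency.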

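Long-running agents are systems that go beyond single question-and-answer interactions. They break complex tasks into multiple steps and use the inference result at each stage to decide the next action. In areas such as customer support, research assistance, and software development automation, an agent may make tens to hundreds of inference calls. In these settings, the speed and cost of a single inference affect the responsiveness and operating efficiency of the whole system. Nemotron 3 Ultra is designed around these needs.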

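NVIDIA has supported enterprise generative AI workloads through the Nemotron family. Earlier versions focused mainly on tasks such as text generation, summarization, and classification. Nemotron 3 Ultra, by contrast, targets the more complex area of agent orchestration. Orchestration involves coordinating multiple tools, APIs, and data sources and connecting the output of each step to the input of the next. That requires capabilities beyond text generation, including planning, state tracking, and error handling.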
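To make the orchestration idea concrete, here is a minimal Python sketch of a loop that chains step outputs to step inputs, tracks state, and retries failed steps. It is an illustration only and not NVIDIA's design: `AgentState`, `run_plan`, `make_flaky`, and the two toy tools are hypothetical names introduced for this example, and a real agent would let the model choose the plan rather than use a fixed list.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class AgentState:
    """State tracking: what has happened so far and the latest step output."""
    history: List[str] = field(default_factory=list)
    last_output: str = ""


def run_plan(plan: List[str], tools: Dict[str, Callable[[str], str]],
             initial_input: str, max_retries: int = 2) -> AgentState:
    """Run each planned step, feeding one step's output into the next."""
    state = AgentState(last_output=initial_input)
    for tool_name in plan:
        tool = tools[tool_name]
        for attempt in range(max_retries + 1):
            try:
                state.last_output = tool(state.last_output)
                state.history.append(f"{tool_name}: ok (attempt {attempt + 1})")
                break
            except Exception as exc:  # error handling: record and retry
                state.history.append(f"{tool_name}: failed ({exc})")
        else:
            # The inner loop never hit `break`, so every attempt failed.
            raise RuntimeError(
                f"step {tool_name!r} failed after {max_retries + 1} attempts")
    return state


def make_flaky(fn: Callable[[str], str], failures: int) -> Callable[[str], str]:
    """Wrap fn so its first `failures` calls raise, to exercise the retry path."""
    remaining = {"n": failures}

    def wrapper(text: str) -> str:
        if remaining["n"] > 0:
            remaining["n"] -= 1
            raise TimeoutError("simulated timeout")
        return fn(text)

    return wrapper


# Hypothetical tools standing in for real APIs or model calls.
tools = {
    "search": make_flaky(lambda q: f"results for [{q}]", failures=1),
    "summarize": lambda text: f"summary of {text}",
}
state = run_plan(["search", "summarize"], tools, "MoE routing")
print(state.last_output)
print(state.history)
```

Every retry in a loop like this is another inference or tool call, which is why per-call speed and cost compound over a long-running task.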

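Mixture-of-experts architectures have drawn attention in recent large language model development. Although the total parameter count is large, only some expert modules are activated during inference, which lowers the compute load. This approach can reduce inference cost while preserving the model's expressive capacity. In Nemotron 3 Ultra's case, only 55 billion of the 550 billion parameters are active (10% of the total), which in theory lets it deliver higher performance at an inference cost close to that of a 55-billion-parameter model.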
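The following toy Python sketch shows the general idea of top-k expert routing, in which a gate scores every expert but only the k best are evaluated. It is a generic illustration and makes no claim about how Nemotron 3 Ultra routes tokens: the ten scaling "experts", the gate scores, and the function names are all assumptions made for this example. Only the two parameter counts come from the article.

```python
import math

TOTAL_PARAMS = 550e9   # total parameters, from the article
ACTIVE_PARAMS = 55e9   # parameters active at inference, from the article
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS  # 10% of the model per inference


def top_k_route(gate_scores, k):
    """Return the indices of the k highest-scoring experts and softmax weights
    normalized over just those k experts."""
    top = sorted(range(len(gate_scores)),
                 key=lambda i: gate_scores[i], reverse=True)[:k]
    exps = [math.exp(gate_scores[i]) for i in top]
    total = sum(exps)
    return top, [e / total for e in exps]


def moe_forward(x, experts, gate_scores, k):
    """Evaluate only the selected experts and mix their outputs by gate weight."""
    chosen, weights = top_k_route(gate_scores, k)
    output = sum(w * experts[i](x) for i, w in zip(chosen, weights))
    return output, chosen


# Toy setup (hypothetical): 10 "experts" that each just scale their input.
experts = [lambda x, scale=i + 1: x * scale for i in range(10)]
gate_scores = [0.1, 0.3, 2.0, 0.2, 0.0, 1.5, 0.4, 0.1, 0.2, 0.3]

output, chosen = moe_forward(3.0, experts, gate_scores, k=2)
print(f"active parameter fraction: {active_fraction:.0%}")
print(f"experts evaluated: {chosen} of {len(experts)}")
```

In a real model the active parameter count also includes layers that run for every token, so the share of experts selected and the share of parameters active are not the same number.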

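The fivefold throughput gain and 30% cost reduction that NVIDIA cites are based on comparisons with other open-source models in the same class. However, the available information does not detail the specific benchmark conditions, the models compared, or the measurement methodology. Performance in production may vary with task type, infrastructure configuration, batch size, and other factors. Developers and enterprises should validate the figures against their own workloads.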
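One way to start that validation is a small timing harness run against representative prompts. The Python sketch below is a minimal, sequential example under stated assumptions: `measure_throughput` and `fake_generate` are hypothetical names, the stub stands in for whatever client actually calls the model, and whitespace splitting is only a rough proxy for a real tokenizer. It does not model batching or concurrency, both of which strongly affect served throughput.

```python
import time
from statistics import mean


def measure_throughput(generate, prompts, count_tokens):
    """Call `generate` on each prompt in turn and report simple timing stats."""
    latencies = []
    total_tokens = 0
    start = time.perf_counter()
    for prompt in prompts:
        t0 = time.perf_counter()
        output = generate(prompt)
        latencies.append(time.perf_counter() - t0)
        total_tokens += count_tokens(output)
    elapsed = time.perf_counter() - start
    return {
        "tokens_per_second": total_tokens / elapsed,
        "mean_latency_s": mean(latencies),
        "total_tokens": total_tokens,
    }


def fake_generate(prompt):
    """Stand-in for a real model call (hypothetical); swap in your own client."""
    time.sleep(0.001)  # simulate a little inference latency
    return prompt + " ... answer"


result = measure_throughput(
    fake_generate,
    ["q1", "q2", "q3"],
    count_tokens=lambda text: len(text.split()),  # crude whitespace token count
)
print(result)
```

Running the same harness with the same prompts against each candidate model gives a like-for-like comparison, which the published figures alone do not provide.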

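The economics of an agent system are not determined by model inference cost alone. External API calls made by the agent, data storage and transfer, and infrastructure operations costs all need to be counted. Reliability and accuracy matter as well. If an agent frequently makes wrong decisions and has to retry, total cost can change even when inference is faster. The value of Nemotron 3 Ultra should therefore be assessed on inference quality and stability together with speed and cost.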
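A rough cost model shows how retries and non-inference costs can offset a cheaper model. The Python sketch below assumes that each step is retried until it succeeds, that attempts fail independently with a fixed probability, and that every attempt pays both the inference cost and the external API cost. All dollar amounts and retry rates are made-up illustrative values, and `expected_task_cost` is a hypothetical helper. Only the 30% reduction mirrors the headline figure in the article.

```python
def expected_task_cost(steps, inference_cost, api_cost_per_step,
                       retry_prob, fixed_overhead=0.0):
    """Expected cost of one agent task under a simple retry-until-success model.

    With independent failures of probability p, the expected number of
    attempts per step is 1 / (1 - p).
    """
    if not 0 <= retry_prob < 1:
        raise ValueError("retry_prob must be in [0, 1)")
    expected_attempts = 1 / (1 - retry_prob)
    per_step = expected_attempts * (inference_cost + api_cost_per_step)
    return steps * per_step + fixed_overhead


STEPS = 50        # illustrative: inference calls per task
API_COST = 0.005  # illustrative: external API cost per attempt

# Baseline model versus one with 30% lower inference cost per call.
baseline = expected_task_cost(STEPS, 0.010, API_COST, retry_prob=0.05)
cheaper_same_quality = expected_task_cost(STEPS, 0.007, API_COST, retry_prob=0.05)
cheaper_more_retries = expected_task_cost(STEPS, 0.007, API_COST, retry_prob=0.30)

print(f"baseline:             {baseline:.4f}")
print(f"cheaper, same retry:  {cheaper_same_quality:.4f}")
print(f"cheaper, more retry:  {cheaper_more_retries:.4f}")
```

Under these assumed numbers, the 30% cut in inference cost lowers the task total by only 20% because the API cost is unchanged, and a higher retry rate pushes the total above the baseline.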

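NVIDIA has also developed the Nemotron family with integration into its GPU infrastructure in mind. Nemotron 3 Ultra may be used together with NVIDIA's inference optimization technologies. Tools such as TensorRT-LLM and Triton Inference Server, for example, may provide further performance gains. For enterprises on NVIDIA hardware, this integrated approach may offer some advantage, but performance on other hardware platforms still needs to be validated separately.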

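The market for long-running agents is still at an early stage but is growing. Agent systems have been deployed in customer support automation, research assistance, software development tools, and data analysis. These systems do not perform a single task; they pursue complex goals through multi-step decisions. Inference efficiency and cost structure are therefore key factors in the commercial viability of agent systems.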

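The release of Nemotron 3 Ultra indicates that NVIDIA is targeting the agent systems market. By offering a product aimed specifically at agent orchestration rather than a general-purpose language model, the company intends to support a specific class of workloads. This is also consistent with a broader industry trend in which model development is shifting from general capability toward task-oriented optimization.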

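Even so, the available information is not enough to fully assess the model's real-world performance and operational stability. Judging its practical value will require benchmark results, real-world use cases, and community feedback. Comparisons with open-source models should also take into account licensing terms, deployment restrictions, and customization options.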

Builder takeaways

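Developers building long-running agent systems should validate Nemotron 3 Ultra's throughput and cost efficiency against their own workloads, and measure how the inference speedup from the mixture-of-experts architecture shows up in actual agent task flows.
For agent orchestration tasks, total cost of ownership should be calculated from per-inference cost, retry rate, accuracy, and the frequency of external API calls across the whole workflow.
Teams on NVIDIA infrastructure should explore integration with optimization tools such as TensorRT-LLM, and assess performance differences on other hardware platforms in advance so they can plan their deployment strategy.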

Visual brief

A long-running agent repeatedly routes each step through only the experts it needs, helping reduce compute and improve throughput.

Corrections and safety

See a factual, privacy, rights, or safety issue? Review the corrections process or contact Guidances before relying on this article for important decisions.
