Nova Roadmap: Q3 2026

What's shipping next quarter: deeper auto-remediation, multi-region failover orchestration, and the agent SDK opening up to platform teams who want to build their own.

How we picked these

Three signals drove the Q3 roadmap. The first was customer-call data: the issues that came up in the most renewal conversations and the most prospect demos. The second was support-ticket clustering: the most common "we wish Nova could…" patterns in the inbox. The third was internal dogfooding: the things we keep wanting Nova to do for our own infrastructure that it can't quite do yet.

The three themes that fell out aren't surprising and aren't novel. They are, however, the three places on which the difference between "Nova helps you respond to incidents" and "Nova prevents most of your incidents from becoming customer-visible" hinges. We're committing the quarter to those three.

Deeper auto-remediation

Today, Nova can run a remediation if you've authored a runbook and granted the agent the relevant credentials. The agent runs the runbook, validates the outcome, writes the ledger entry. That's the full loop. It works for the cases you've written runbooks for; it doesn't help with the cases you haven't.

The Q3 work expands the loop in two ways. First, common-pattern recognition: Nova learns the remediation patterns specific to your infrastructure (restart this pool when its memory crosses 90%, scale this autoscaler when queue depth crosses 500, drain this node when CPU pressure stays above 80% for 10 minutes) and proposes them as runbook drafts. You review and accept; the runbook ships; future occurrences run automatically.
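To make the "stays above 80% for 10 minutes" kind of pattern concrete, here is a minimal sketch of a sustained-threshold check. The Sample and SustainedRule shapes and the isSustainedBreach helper are hypothetical, invented for illustration; the format Nova uses for runbook drafts isn't described here.

```typescript
// Hypothetical shapes for illustration only; not Nova's runbook-draft format.
interface Sample {
  timestampMs: number; // when the metric was sampled
  value: number;       // e.g. CPU pressure as a percentage
}

interface SustainedRule {
  threshold: number;   // breach when the value is strictly above this...
  holdMs: number;      // ...continuously, for at least this long
}

// Returns true if the most recent unbroken run of samples above the threshold
// spans at least rule.holdMs. Samples must be sorted by timestamp, ascending.
function isSustainedBreach(samples: Sample[], rule: SustainedRule): boolean {
  if (samples.length === 0) return false;
  const latest = samples[samples.length - 1];
  let runStartMs: number | null = null;
  // Walk backwards from the newest sample until the value drops to or below the threshold.
  for (let i = samples.length - 1; i >= 0; i--) {
    if (samples[i].value <= rule.threshold) break;
    runStartMs = samples[i].timestampMs;
  }
  if (runStartMs === null) return false;
  return latest.timestampMs - runStartMs >= rule.holdMs;
}

// The "drain this node" example: above 80% for 10 minutes.
const drainNodeRule: SustainedRule = { threshold: 80, holdMs: 10 * 60 * 1000 };
```

A single dip below the threshold resets the run, which is what separates "stays above" from "was above at some point".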

Second, multi-step remediation. The current engine runs a single runbook end-to-end. The new engine can chain runbooks based on outcome: "if the restart didn't recover the pool, drain the node and reschedule." The chaining is bounded (max depth, max time, explicit termination) so it doesn't run away during a misdiagnosis. We're building the safety rails before the chaining; without the rails, multi-step is dangerous.
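One way to picture the bounds is a loop that checks depth and elapsed time before every step and stops when a step names no successor. This is a sketch under assumptions, not the engine's interface: ChainStep, ChainBounds, ChainResult and runChain are all hypothetical names.

```typescript
// Hypothetical types; the real chaining engine's interface is not described in this post.
type StepOutcome = "recovered" | "failed";

interface ChainStep {
  name: string;
  run: () => StepOutcome; // executes one runbook and reports the outcome
  onFailure?: string;     // next runbook to try; absent means terminate explicitly
}

interface ChainBounds {
  maxDepth: number;       // maximum number of runbooks executed in one chain
  maxElapsedMs: number;   // time budget for the whole chain
}

interface ChainResult {
  status: "recovered" | "exhausted" | "aborted";
  reason?: "max-depth" | "max-time" | "unknown-step";
  executed: string[];
}

function runChain(
  steps: Record<string, ChainStep>,
  first: string,
  bounds: ChainBounds,
  now: () => number = Date.now
): ChainResult {
  const startedAt = now();
  const executed: string[] = [];
  let next: string | undefined = first;
  while (next !== undefined) {
    // Safety rails are checked before every step, not after.
    if (executed.length >= bounds.maxDepth) return { status: "aborted", reason: "max-depth", executed };
    if (now() - startedAt > bounds.maxElapsedMs) return { status: "aborted", reason: "max-time", executed };
    const step: ChainStep | undefined = steps[next];
    if (step === undefined) return { status: "aborted", reason: "unknown-step", executed };
    executed.push(step.name);
    if (step.run() === "recovered") return { status: "recovered", executed };
    next = step.onFailure; // undefined ends the loop
  }
  return { status: "exhausted", executed };
}

// The example from the text: restart the pool, and if that fails, drain the node.
const poolSteps: Record<string, ChainStep> = {
  "restart-pool": { name: "restart-pool", run: () => "failed", onFailure: "drain-node" },
  "drain-node": { name: "drain-node", run: () => "recovered" },
};
```

Because the depth check counts executed steps, even a runbook that names itself as its own fallback stops after maxDepth executions.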

Multi-region failover

Multi-region is the feature most large-tenant prospects ask about and the one we don't have a great answer for today. The current product treats each region as a separate Nova instance with shared identity but no shared incident state. Run a global service across us-east-1, us-west-2, and eu-west-1 and you get three views of the same incident, each with its own alerts, its own runbook executions, and its own correlation graph.

Q3 ships unified multi-region incidents. A single incident ID spans regions; the timeline interleaves events from every affected region; correlation runs across regional signals; remediation can target one region or all of them. The nasty bit is the data plane: we can't ship every region's signals to a global control plane in real time without burning budget, so we're using a hub-and-spoke model with regional caches and lazy aggregation. Most reads are local; cross-region reads pay a small latency cost.
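As a rough illustration of the read path, the sketch below keeps events in per-region caches and only merges them when a global timeline is requested. IncidentEvent, RegionalCache, readLocal and readGlobal are hypothetical names for illustration, and the sample events are made up; the actual data plane will differ.

```typescript
// Hypothetical shapes: the post names the model (hub-and-spoke, regional caches,
// lazy aggregation) but not its data structures.
interface IncidentEvent {
  incidentId: string;
  region: string;
  timestampMs: number;
  message: string;
}

interface RegionalCache {
  region: string;
  events: IncidentEvent[];
}

// Local read: served entirely from one region's cache.
function readLocal(cache: RegionalCache, incidentId: string): IncidentEvent[] {
  return cache.events.filter((e) => e.incidentId === incidentId);
}

// Cross-region read: each spoke is consulted only when a global view is
// requested (lazy aggregation), then the results are interleaved by timestamp.
function readGlobal(caches: RegionalCache[], incidentId: string): IncidentEvent[] {
  let merged: IncidentEvent[] = [];
  for (const cache of caches) {
    merged = merged.concat(readLocal(cache, incidentId));
  }
  // Tie-break on region name so the ordering is deterministic.
  return merged.sort(
    (a, b) => a.timestampMs - b.timestampMs || a.region.localeCompare(b.region)
  );
}

// Made-up sample data for the three regions mentioned in the text.
const regionalCaches: RegionalCache[] = [
  { region: "us-east-1", events: [
    { incidentId: "inc-1", region: "us-east-1", timestampMs: 1000, message: "latency alert" },
    { incidentId: "inc-1", region: "us-east-1", timestampMs: 4000, message: "pool restarted" },
  ] },
  { region: "us-west-2", events: [
    { incidentId: "inc-1", region: "us-west-2", timestampMs: 2000, message: "error rate alert" },
  ] },
  { region: "eu-west-1", events: [
    { incidentId: "inc-1", region: "eu-west-1", timestampMs: 3000, message: "queue depth alert" },
    { incidentId: "inc-2", region: "eu-west-1", timestampMs: 3500, message: "unrelated incident" },
  ] },
];

const globalTimeline = readGlobal(regionalCaches, "inc-1");
```

In this toy version the cost of a global read grows with the number of regions, while readLocal never leaves its own cache, which mirrors the "most reads are local" trade-off.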

The other piece is failover orchestration. When a region goes hard down, Nova detects it and runs your failover runbook with the region marked as the failed party. Today this is a manual playbook; in Q3 it's a default auto-remediation with a 5-minute hold window for a human ack. If the human declines or doesn't respond, the failover proceeds. The default conservatism is configurable per tenant.

Agent SDK

Platform teams have been asking for a way to build their own agents on top of Nova's plumbing. Today they can author runbooks, but they can't extend the agent loop itself: they can't add a new agent type, introspect the correlation graph from custom code, or subscribe to internal events. The Q3 SDK opens those surfaces.

The SDK is TypeScript-first, with Python bindings coming later. It has three primitives: defineAgent (declare a custom agent that participates in the work loop), onSignal (subscribe to internal events with filters), and callTool (invoke any of the platform's built-in tools from your code: read trace, write ledger, schedule comms). Auth and isolation are scoped per tenant; an SDK agent can only touch its tenant's data, regardless of how it's deployed.
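To give a feel for how the three primitives might fit together, here is a self-contained toy. Only the names defineAgent, onSignal and callTool come from the roadmap. The signatures, the NovaSignal shape, the "writeLedger" tool name and the dispatch helper are all assumptions, written as local stand-ins so the example runs on its own. The shipped SDK may look quite different.

```typescript
// Local stand-ins only. The primitive names come from the roadmap; every
// signature, type and tool name below is an assumption made for illustration.
interface NovaSignal {
  tenantId: string;
  kind: string;
  detail: string;
}

type SignalHandler = (signal: NovaSignal) => void;

interface SignalSubscription {
  tenantId: string;
  kind: string;
  handler: SignalHandler;
}

interface AgentContext {
  onSignal: (kind: string, handler: SignalHandler) => void;
  callTool: (tool: string, input: string) => void;
}

const signalSubscriptions: SignalSubscription[] = [];
const ledgerEntries: string[] = [];

// Stand-in tool call: only a hypothetical "writeLedger" tool is modelled.
function callTool(tenantId: string, tool: string, input: string): void {
  if (tool !== "writeLedger") throw new Error("unknown tool: " + tool);
  ledgerEntries.push(tenantId + ": " + input);
}

// Stand-in subscription: the filter here is just an exact match on signal kind.
function onSignal(tenantId: string, kind: string, handler: SignalHandler): void {
  signalSubscriptions.push({ tenantId, kind, handler });
}

// Stand-in agent declaration: the agent only ever sees tenant-bound primitives.
function defineAgent(tenantId: string, setup: (ctx: AgentContext) => void): void {
  setup({
    onSignal: (kind, handler) => onSignal(tenantId, kind, handler),
    callTool: (tool, input) => callTool(tenantId, tool, input),
  });
}

// Stand-in for the platform's event delivery; enforces per-tenant isolation.
function dispatch(signal: NovaSignal): void {
  for (const sub of signalSubscriptions) {
    if (sub.tenantId === signal.tenantId && sub.kind === signal.kind) sub.handler(signal);
  }
}

// A custom agent that records every high-memory signal for its own tenant.
defineAgent("tenant-a", (ctx) => {
  ctx.onSignal("pool.memory.high", (signal) => {
    ctx.callTool("writeLedger", "saw " + signal.kind + " on " + signal.detail);
  });
});
```

In this toy, tenant scoping is baked into the context handed to the agent, so agent code never passes a tenant ID itself. That is one plausible way to get the "regardless of how it's deployed" property, not a statement of how the SDK does it.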

The SDK ships with a beta access list. If you're a platform team running on Nova at scale and you want early access, the form is at /sdk-early-access on the docs site. We're capping the beta at 30 tenants to keep feedback quality high; expect a wider rollout late Q4.

How to influence it

The roadmap is a working document, not a contract. If you're a current customer with a use case that doesn't fit the three themes, tell your account contact; we re-prioritise mid-quarter when something obvious surfaces. If you're a prospect, the demo team can route specific feature asks to the product team and tag them against the roadmap.

The two things that don't move the roadmap: vague "we want better observability" requests, and feature lists copy-pasted from a competitor. The things that do: a specific incident or workflow you can't currently handle, with enough detail that we can prototype a fix and validate it against your environment. Specificity earns priority.

One last note. The roadmap is what we're committing to ship in Q3. The actual quarter will produce other work too: bug fixes, performance work, security upgrades, and the small features that don't make a roadmap but matter to specific customers. The themed work above is the headline; the rest happens because it has to.