TL;DR
Standardize on the official hierarchy: VCF private cloud -> VCF fleet -> VCF instance -> VCF domain -> vSphere clusters. A VCF fleet is managed by one set of fleet-level management components (notably VCF Operations and VCF Automation), while each VCF instance keeps its own management domain and domain-level control planes.
Your fastest path to org alignment is separating two things people constantly mix up:
Fleet-level services: centralized operations, lifecycle for management components, automation, and SSO integration.
Instance management planes: SDDC Manager, management vCenter, management NSX, plus the vCenter and NSX that belong to each workload domain.
Scope and code levels referenced (VCF 9.0 GA core):
VCF Installer: 9.0.1.0 build 24962180 (required to deploy VCF 9.0.0.0 components)
SDDC Manager: 9.0.0.0 build 24703748
vCenter: 9.0.0.0 build 24755230
ESXi: 9.0.0.0 build 24755229
NSX: 9.0.0.0 build 24733065
VCF Operations: 9.0.0.0 build 24695812
VCF Operations fleet management: 9.0.0.0 build 24695816
VCF Automation: 9.0.0.0 build 24701403
VCF Identity Broker: 9.0.0.0 build 24695128
Architecture Diagram
Legend:
Fleet-level management components give you centralized governance, inventory, and services.
Instance management planes are not shared. Each instance still owns its own SDDC Manager, vCenter, and NSX boundaries.
Table of Contents
Scenario
Assumptions
Core vocabulary recap
Core concept: separate fleet services from instance management planes
What runs where in VCF 9.0 GA
Who owns what
Day-0, day-1, day-2 map
Identity and SSO boundaries that actually matter
Topology patterns for single site, two sites, and multi-region
Failure domain analysis
Operational runbook snapshot
Anti-patterns
Summary and takeaways
Conclusion
Scenario
You need architects, operators, and leadership to agree on:
What VCF 9.0 actually manages.
What is centralized at fleet level vs isolated per instance or domain.
Who owns which parts of lifecycle, identity, and day-2 operations.
Assumptions
You are deploying greenfield VCF 9.0 GA (core components at 9.0.0.0, deployed via the documented installer level).
You deploy both VCF Operations and VCF Automation from day-1.
You want patterns for:
Single site
Two sites in one region
Multi-region
You need guidance for both:
Shared identity
Separate identity and SSO boundaries for regulated isolation
Core vocabulary recap
Use these terms consistently in meetings, designs, and runbooks:
VCF private cloud: the highest-level management and consumption boundary; can contain one or more fleets.
VCF fleet: managed by one set of fleet-level management components (notably VCF Operations and VCF Automation); contains one or more instances.
VCF instance: a discrete VCF deployment containing a management domain and optionally workload domains.
VCF domain: a lifecycle and isolation boundary inside an instance (management domain and VI workload domains).
vSphere cluster: where ESXi capacity lives; clusters exist inside domains.
Core concept: separate fleet services from instance management planes
You get clean operations when you stop trying to force everything into a single “management plane” blob.
Instead, run this mental separation:
Fleet services
These are the things you deploy once per fleet to provide centralized capabilities:
VCF Operations: inventory, observability, and the console where centralized lifecycle and identity workflows surface.
VCF Operations fleet management appliance: lifecycle management operations for the fleet management components.
VCF Automation: self-service consumption, organization constructs, and automation.
VCF Identity Broker + VCF Single Sign-On: centralized authentication configuration across components (with important exclusions).
Practical implication: if fleet services are impaired, governance and workflows degrade, but the instance-level control planes do not magically disappear.
Instance management planes
Every instance retains its own control plane boundaries:
SDDC Manager
Management domain vCenter
Management domain NSX
This is where most “core infrastructure lifecycle” actually executes.
Domain-level control planes
Each workload domain is its own lifecycle and isolation boundary, typically with:
Its own vCenter
Its own NSX Manager (dedicated per domain, or shared depending on design)
What runs where in VCF 9.0 GA
A clean greenfield deployment is intentionally opinionated:
The management domain of the first instance hosts the fleet-level management components (VCF Operations and VCF Automation).
Additional instances still have their own instance-level management components (SDDC Manager, vCenter, NSX), and may deploy collectors as needed.
Two other details matter for design reviews:
VCF Operations fleet management is treated as a first-class appliance and should be protected with vSphere HA in the default management cluster.
VCF Single Sign-On can provide one-login access for many components, but not SDDC Manager and not ESXi.
Who owns what
This table is meant to stop “that’s not my job” loops during incidents and upgrades.
| Component or capability | Platform team (VCF) | VI admin (domains and clusters) | App and platform teams |
| --- | --- | --- | --- |
| Fleet bring-up (VCF Installer, fleet creation) | Own | Consult | Inform |
| Fleet-level management components (VCF Operations, fleet management appliance, VCF Automation) | Own | Consult | Inform |
| VCF Identity Broker and VCF Single Sign-On configuration | Own | Consult | Inform |
| SDDC Manager (per instance) | Own (platform governance) | Own day-2 execution | Inform |
| Management domain vCenter and NSX | Shared | Own | Inform |
| Workload domain lifecycle (create domain, add clusters, remediate hosts) | Shared | Own | Inform |
| Workload consumption (Org structure, projects, templates, quotas, policies) | Shared (guardrails) | Consult | Own |
| Backup and restore for fleet management components | Own | Consult | Inform |
| Backup and restore for instance components (SDDC Manager, vCenter, NSX) | Shared (standards) | Own | Inform |
| Day-2 password lifecycle (rotation, remediation) | Own (policy + tooling) | Shared | Inform |
| Certificates and trust (CA integration, renewal cadence) | Own | Shared | Inform |
| DR plans for management components and identity | Own | Consult | Inform |
| DR plans for workload domains and applications | Shared (platform) | Shared (infra) | Own |
Ownership rule of thumb:
Platform team owns the fleet services and guardrails.
VI admins own domain lifecycle execution and capacity.
App teams own how they consume resources and what SLAs they require.
Day-0, day-1, day-2 map
This matters because VCF 9.0 pushes more workflows into a centralized console, but it does not eliminate domain-level responsibilities.
Day-0
Design-time decisions that are expensive to change later:
How many fleets you need (governance and isolation boundary).
How many instances you need (location and operational boundary).
Identity design:
VCF Identity Broker deployment mode (embedded vs appliance).
SSO scope (single instance vs cross-instance vs fleet-wide).
Shared vs separate IdPs and SSO boundaries.
Network and IP plan:
Subnet sizing for growth matters because changing subnet masks for infrastructure networks is not supported.
Decide whether fleet-level components share the management VM network or get a dedicated network or NSX-backed segment.
Management domain sizing:
Management domains must be sized to host the management components plus future workload domain growth.
Lifecycle blast radius strategy:
How you segment domains, instances, and fleets to control upgrade and incident scope.
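Because changing subnet masks for infrastructure networks is not supported, the subnet-sizing point above is worth sanity-checking during design. A minimal Python sketch (the CIDR and host counts are hypothetical):

```python
import ipaddress

def check_mgmt_subnet(cidr: str, current_hosts: int, planned_growth: int) -> bool:
    """Return True if the subnet leaves headroom for planned growth.

    Changing the mask of a VCF infrastructure network later is not
    supported, so size for the full planned footprint up front.
    """
    net = ipaddress.ip_network(cidr, strict=True)
    usable = net.num_addresses - 2  # exclude network and broadcast addresses
    return usable >= current_hosts + planned_growth

# Hypothetical example: /24 management network, 40 addresses in use today,
# 30 more planned for future workload domain management components.
print(check_mgmt_subnet("10.0.10.0/24", current_hosts=40, planned_growth=30))  # True
```

Run the check per infrastructure network in your IP plan, not just for the management VM network.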
Day-1
Bring-up and initial enablement:
Deploy the VCF Installer appliance, download binaries, and start a new VCF fleet deployment.
Bring up the first instance and its management domain.
Deploy the fleet-level management components (VCF Operations, fleet management appliance, VCF Automation).
Deploy VCF Identity Broker (often appliance mode for multi-instance SSO scenarios) and configure VCF Single Sign-On.
Create initial workload domains, and connect them into VCF Automation as needed.
Day-2
Ongoing operations:
Lifecycle management:
Management component lifecycle through VCF Operations fleet management.
Cluster lifecycle through vSphere lifecycle tooling, with VCF coordinating.
Identity operations:
Adding components and instances into SSO scope.
Re-assigning roles and permissions inside vCenter and NSX after SSO configuration changes.
Security hygiene:
Password rotation and remediation flows.
Certificate replacement with CA-signed certs across both management components and instance components.
Platform resilience:
Backup scheduling to an SFTP target for management components and instance components.
Shutdown and startup runbooks that preserve authentication and cluster integrity.
Identity and SSO boundaries that actually matter
What VCF Single Sign-On does (and does not)
VCF Single Sign-On is designed to streamline access across multiple VCF components with one authentication source configured from the VCF Operations console.
Key operational detail:
It supports SSO across components like vCenter, NSX, VCF Operations, VCF Automation, and other VCF management components.
It explicitly excludes SDDC Manager and ESXi, which means you still need local access patterns and break-glass workflows for those systems.
Identity pillars in VCF
Your identity design is built on three pillars:
External IdP (SAML/OIDC or directory)
VCF Identity Broker (brokers authentication and maintains SSO tokens)
VCF Single Sign-On (centralized authentication configuration and user management)
Important constraint:
Each VCF Identity Broker is configured with a single identity provider.
VCF Identity Broker deployment modes
Here’s the practical decision point.
| Decision point | Embedded (vCenter service) | Appliance (3-node cluster) |
| --- | --- | --- |
| Where it runs | Inside management domain vCenter | Stand-alone appliances deployed via VCF Operations fleet management |
| Multi-instance recommendation | One per instance | Up to five instances per Identity Broker appliance |
| Availability characteristics | Risk of being tied to mgmt vCenter availability | Designed for higher availability; handles node failure |
| Typical fit | Single instance, simpler environments | Multi-instance, larger environments, stronger availability targets |
Change management warning: moving from appliance to embedded mode requires resetting the VCF Single Sign-On configuration and re-adding users and groups. Treat the deployment mode decision as day-0.
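The decision point above can be encoded as a small helper for design reviews. A sketch, assuming only the rules stated in the table (one embedded broker per instance; up to five instances per appliance-mode broker):

```python
def recommend_broker_mode(instance_count: int, needs_high_availability: bool) -> str:
    """Encode the Identity Broker decision table as code.

    Rules from the comparison table above:
    - embedded mode means one broker per instance, tied to mgmt vCenter availability
    - appliance mode serves up to five instances and tolerates node failure
    """
    if instance_count > 1 or needs_high_availability:
        return "appliance"
    return "embedded"

def appliance_brokers_needed(instance_count: int) -> int:
    """Minimum appliance-mode brokers for a fleet, at five instances per broker."""
    return -(-instance_count // 5)  # ceiling division

print(recommend_broker_mode(3, needs_high_availability=False))  # appliance
print(appliance_brokers_needed(6))  # 2
```

Since moving from appliance to embedded mode means resetting the SSO configuration, bias toward appliance mode when in doubt.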
Challenge: You need shared identity for convenience, but regulated isolation for some tenants
Solutions:
A) Shared enterprise IdP with fleet-wide SSO
Best when you want one login experience across instances in the same fleet.
Biggest tradeoff is blast radius: the SSO scope is large.
B) Cross-instance SSO with multiple Identity Brokers in one fleet
Each Identity Broker serves a defined set of instances.
Reduces SSO blast radius compared to a single broker for the whole fleet.
Constraint: VCF management components (for example, VCF Operations and VCF Automation) can connect to only one Identity Broker instance for SSO, so you must design carefully if you are aiming for multiple identity boundaries under one fleet.
C) Separate fleets for regulated isolation
Strongest governance and identity boundary.
Higher operational overhead: multiple sets of fleet-level management components.
Topology patterns for single site, two sites, and multi-region
Use the design blueprints as your baseline mental model. Then tune.
Challenge: Your topology is not one-size-fits-all
Solutions:
A) Single site with minimal footprint
Best when you need to start small and accept tighter fault domains.
Typical posture:
Single fleet, single instance.
Management components and workloads can be co-located in one cluster for footprint reduction.
Operational reality:
You are trading physical failure-domain isolation for speed and cost.
Plan early if you intend to adopt organization models in VCF Automation that require additional clusters.
B) Two sites in one region
A region in VCF terms means multiple sites operating within synchronous replication latencies of each other.
Typical posture:
Single fleet, single instance.
Stretched clusters across the two sites for higher availability.
A dedicated workload domain for workloads, with management components protected in the management domain cluster.
Day-2 consequences:
You are now dependent on stretched network and storage behaviors for management plane availability.
You must design first-hop gateway resilience across availability zones for stretched segments.
C) Multi-region
Typical posture:
Single fleet, multiple instances (at least one per region or per major site).
Fleet-level management components run in the management domain of the first instance.
Additional instances bring their own management domain control planes.
Practical design statement:
Recovery between regions is a disaster recovery process. Do not confuse “multi-region” with “active-active without DR work”.
Quick comparison:
| Topology | Fleet count | Instance count | Typical SSO scope | Primary operational risk |
| --- | --- | --- | --- | --- |
| Single site | 1 | 1 | Single instance or fleet-wide | Small fault domain, tight coupling |
| Two sites, one region | 1 | 1 | Fleet-wide (common) | Stretched dependencies for management availability |
| Multi-region | 1+ | 2+ | Cross-instance or fleet-wide | Governance dependency on where fleet services run |
Failure domain analysis
This is the conversation leadership actually needs.
Fleet services failure
If VCF Operations, fleet management, or VCF Automation are impaired:
You lose or degrade centralized lifecycle workflows, automation workflows, and centralized observability.
Instance control planes still exist, but day-2 operations may become more manual.
If VCF Identity Broker is down:
Users from external identity providers cannot authenticate.
You must fall back to local accounts until the Identity Broker is restored.
Instance management domain failure
If an instance management domain is down:
That instance’s domain lifecycle operations are impacted.
Workloads in workload domains may remain running, but you have reduced ability to manage and remediate.
Workload domain failure
If a workload domain’s vCenter or NSX is degraded:
Workloads in that domain take the blast radius.
Other workload domains in other instances are unaffected.
Example RTO/RPO targets you can start with
These are practical starting points to drive a discussion. Adjust to your business requirements.
Fleet services (VCF Operations, fleet management, VCF Automation):
RTO: 4 hours
RPO: 24 hours (aligned to daily backups)
Identity Broker:
RTO: 1 to 2 hours
RPO: 24 hours (align to backup cadence, plus local break-glass accounts)
Instance management domain:
RTO: 2 to 4 hours
RPO: 24 hours
Workload domain:
Driven by application SLAs and data replication strategy
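The 24-hour RPO starting points above only hold if someone notices when backups stop landing. A minimal monitoring sketch, assuming backup timestamps come from your own tooling (the dates here are illustrative):

```python
from datetime import datetime, timedelta, timezone

def rpo_breached(last_backup, rpo_hours, now=None):
    """Return True if the most recent backup is older than the RPO target."""
    now = now or datetime.now(timezone.utc)
    return now - last_backup > timedelta(hours=rpo_hours)

# Hypothetical check against the 24-hour RPO starting point above.
last = datetime(2025, 1, 10, 2, 0, tzinfo=timezone.utc)
now = datetime(2025, 1, 11, 8, 0, tzinfo=timezone.utc)
print(rpo_breached(last, rpo_hours=24, now=now))  # True: 30 hours since last backup
```

Wire a check like this per component class (fleet services, Identity Broker, instance management domain) so a silent backup failure shows up before you need the restore.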
Operational runbook snapshot
Shutdown order matters
In multi-instance environments:
Shut down instances that do not run VCF Operations and VCF Automation first.
The instance running the fleet-level management components should be last.
Within the management domain that hosts fleet services, a typical shutdown sequence starts with:
VCF Automation
VCF Operations
VCF Identity Broker
Instance management components (NSX, vCenter, SDDC Manager)
ESXi hosts
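The multi-instance ordering rule above is simple enough to encode in a runbook script. A sketch with hypothetical instance names:

```python
def shutdown_order(instances):
    """Order instances for shutdown.

    `instances` maps instance name -> True if that instance hosts the
    fleet-level management components (VCF Operations / VCF Automation).
    Instances without fleet services go first; the fleet-services
    instance goes last. sorted() is stable, so ties keep input order.
    """
    return sorted(instances, key=lambda name: instances[name])

# Hypothetical three-instance fleet; 'inst-a' hosts the fleet services.
order = shutdown_order({"inst-a": True, "inst-b": False, "inst-c": False})
print(order)  # ['inst-b', 'inst-c', 'inst-a']
```

Reverse the list for startup: bring the fleet-services instance up first so centralized workflows are available while the other instances come online.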
Operational gotcha:
Taking the VCF Operations cluster offline can take significant time. Plan your maintenance windows accordingly.
Backups: get the SFTP target right early
You should treat SFTP backup targets as day-1 prerequisites, not an afterthought.
Configure SFTP settings for VCF management components.
Configure backup schedules for VCF Operations and VCF Automation.
Configure backup schedules for SDDC Manager, NSX Manager, and vCenter at the instance level.
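A pre-flight check for that backup posture can be sketched as code, assuming a hypothetical inventory structure (the host name and component settings are illustrative, not a real VCF API):

```python
def validate_backup_targets(components):
    """Flag components that are missing an SFTP target or a schedule.

    `components` is a hypothetical inventory: name -> settings dict with
    'sftp_host' and 'schedule' keys when configured.
    """
    problems = []
    for name, cfg in components.items():
        if not cfg.get("sftp_host"):
            problems.append(f"{name}: no SFTP target configured")
        if not cfg.get("schedule"):
            problems.append(f"{name}: no backup schedule configured")
    return problems

# Hypothetical inventory covering both fleet and instance components.
inventory = {
    "VCF Operations": {"sftp_host": "backup01.corp.local", "schedule": "daily"},
    "SDDC Manager": {"sftp_host": "backup01.corp.local", "schedule": None},
}
print(validate_backup_targets(inventory))  # flags the missing SDDC Manager schedule
```

The point is coverage: every component class in the two bullets above should appear in the inventory, so a missing schedule is caught on day-1 rather than during a restore.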
Password lifecycle: know which system is authoritative
You can change passwords for many local users through VCF Operations.
Some password expiration and status information is updated on a schedule; real-time status often requires checking at the instance source (SDDC Manager and related APIs).
You can retrieve default passwords from SDDC Manager with the lookup_passwords command. Run it directly on the appliance:
# On the SDDC Manager appliance
sudo lookup_passwords
Fast validation: confirm build levels in your environment
Before you start production onboarding, validate you are actually running the expected code level.
Use this PowerShell example with PowerCLI to validate vCenter and ESXi versions:
# Connect to vCenter
Connect-VIServer -Server <vcenter_fqdn>
# vCenter build and version
$about = (Get-View ServiceInstance).Content.About
[PSCustomObject]@{
Product = $about.FullName
Version = $about.Version
Build = $about.Build
}
# ESXi hosts build and version
Get-VMHost | Sort-Object Name | Select-Object Name, Version, Build
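If you track expected builds as data, that validation can be automated beyond an interactive PowerCLI session. A Python sketch using the build numbers from the scope list above (the observed values are illustrative and would come from your own inventory tooling):

```python
# Expected VCF 9.0.0.0 GA core builds from the scope list above.
EXPECTED_BUILDS = {
    "SDDC Manager": "24703748",
    "vCenter": "24755230",
    "ESXi": "24755229",
    "NSX": "24733065",
}

def drift(observed):
    """Return components whose observed build differs from the expected GA build.

    `observed` maps component name -> build string; unknown components
    are ignored. Returns name -> (expected, observed) for mismatches.
    """
    return {
        name: (EXPECTED_BUILDS[name], build)
        for name, build in observed.items()
        if name in EXPECTED_BUILDS and build != EXPECTED_BUILDS[name]
    }

# Illustrative input: vCenter matches GA, ESXi does not.
print(drift({"vCenter": "24755230", "ESXi": "24700000"}))
```

An empty result means the sampled components match the documented GA levels; anything else is drift to investigate before production onboarding.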
Anti-patterns
Avoid these early, or you will pay in incident response time later:
Treating fleet and instance as synonyms
Fleet is centralized governance and services.
Instance is a discrete VI footprint with its own management domain.
Designing SSO as if SDDC Manager participates
It does not. Plan break-glass access and operational runbooks accordingly.
Choosing embedded Identity Broker for multi-instance and then being surprised by availability coupling
If multi-instance SSO matters, appliance mode is commonly the safer default.
Using one fleet for regulated tenants without validating identity and governance blast radius
Separate fleets remain the cleanest isolation boundary when governance separation is required.
Under-sizing management domains
Fleet services and management components are not free. You will scale them and patch them like any other production system.
Summary and takeaways
Use the official construct hierarchy to keep conversations consistent: private cloud -> fleet -> instance -> domains -> clusters.
Fleet-level management components centralize governance, but they do not collapse instance control planes into a single shared management plane.
Identity design is a day-0 decision. Choose Identity Broker deployment mode and SSO scope intentionally.
Align topology to operations:
Single site is about speed and footprint.
Two-site in one region is about availability with stretched dependencies.
Multi-region is about DR posture and multiple instance management planes.
Conclusion
VCF 9.0 becomes dramatically easier to operate when everyone can point to the same boundaries:
Fleet boundaries for centralized services and governance.
Instance boundaries for discrete infrastructure footprints.
Domain boundaries for lifecycle and workload isolation.
That shared mental model is what lets you scale without scaling confusion.
Sources
VMware Cloud Foundation 9.0 Documentation (VCF 9.0 and later): https://techdocs.broadcom.com/us/en/vmware-cis/vcf/vcf-9-0-and-later/9-0.html
The post VCF 9.0 GA Mental Model Part 2: Fleet Services vs Instance Management Planes (and Who Owns What) appeared first on Digital Thought Disruption.
