TL;DR
This post targets VCF 9.0 GA only: VCF 9.0 (17 JUN 2025) build 24755599, with GA BOM examples including VCF Installer 9.0.1.0 build 24962180, ESX 9.0.0.0 build 24755229, vCenter 9.0.0.0 build 24755230, NSX 9.0.0.0 build 24733065, SDDC Manager 9.0.0.0 build 24703748, VCF Operations 9.0.0.0 build 24695812, VCF Automation 9.0.0.0 build 24701403, and VCF Identity Broker 9.0.0.0 build 24695128.
Your topology choice is the first big day-0 decision:
Single site usually starts with one fleet + one instance.
Two sites in one region is typically one fleet + one instance with stretched clusters for higher availability.
Multi-region typically becomes one fleet + multiple instances (often one instance per region for latency, sovereignty, and isolation).
Your identity choice is the second big day-0 decision:
Embedded VCF Identity Broker is the simplest and aligns to one broker per instance.
Appliance VCF Identity Broker is a 3-node cluster, recommended for multi-instance SSO due to availability and scale (rule of thumb: up to 5 instances per broker).
SSO model controls blast radius: fleet-wide SSO has the largest login blast radius; per-instance SSO has the smallest.
Fleets are best treated as shared governance and lifecycle scope for management components, not a shared instance management plane. Instances keep their own SDDC Manager, vCenter, and NSX control planes.
Architecture Diagram
Table of Contents
Scope and Code Levels
Assumptions
Scenario
Core Concepts Refresher
Decision Criteria
Challenge: Pick Your Topology
Challenge: Pick Your SSO and Identity Boundaries
Architecture Tradeoff Matrix
Failure Domain Analysis
Who Owns What
Operational Runbook Snapshot
Change Management Considerations
Anti-patterns
Validation
Summary and Takeaways
Conclusion
Scope and Code Levels
This article is written against VCF 9.0 GA terminology and design guidance.
Version Compatibility Matrix
Use this as your “shared truth” when people ask “what exactly are we talking about?”
| Component | Version | Build |
|---|---|---|
| VMware Cloud Foundation | 9.0 | 24755599 |
| VCF Installer | 9.0.1.0 | 24962180 |
| ESX | 9.0.0.0 | 24755229 |
| vCenter | 9.0.0.0 | 24755230 |
| NSX | 9.0.0.0 | 24733065 |
| SDDC Manager | 9.0.0.0 | 24703748 |
| VCF Operations | 9.0.0.0 | 24695812 |
| VCF Automation | 9.0.0.0 | 24701403 |
| VCF Identity Broker | 9.0.0.0 | 24695128 |
Assumptions
You are greenfield and building your first VCF 9.0 platform.
You deploy both VCF Operations and VCF Automation from day-1.
You need patterns for:
Single site
Two sites in one region
Multi-region
You may need either:
Shared identity (one IdP and unified SSO experience)
Regulated isolation (separate IdPs and separate SSO boundaries)
Scenario
You are trying to align:
Architects who draw the boxes
Operators who keep the lights on
Leaders who approve budgets and risk decisions
Your success metric is simple: everyone uses the same vocabulary and can answer:
“How many independent failure domains do we actually have?”
“What is the blast radius when identity breaks?”
“Do we have one private cloud or multiple?”
“Who is on the hook when something fails at fleet vs instance scope?”
Core Concepts Refresher
Keep these boundaries straight, or your operating model will drift.
Fleet: where you centralize governance and fleet-scoped management services (not your per-instance management planes).
Instance: a discrete VCF deployment footprint with its own management domain and workload domains.
Domain: where lifecycle and isolation are designed to be independently managed (management domain + workload domains).
Clusters: where you scale capacity and availability inside a domain.
A practical rule:
If the question is “How do we run workloads here?” think instance -> domains -> clusters.
If the question is “How do we standardize and govern across footprints?” think fleet.
Decision Criteria
Use these criteria to avoid arguing in circles.
Design-time decisions you should treat as “hard to change”
Fleet count and why each fleet exists
Instance per site/region strategy
SSO scope and identity broker deployment mode
Which components must survive a site failure vs a region failure
Certificate and backup architecture for management components
Day-2 decisions you can iterate on
Adding workload domains
Adding clusters to domains
Adding instances to an existing fleet
Moving from per-instance SSO to cross-instance SSO (note: requires reset and reconfiguration, so treat as a major change event)
Challenge: Pick Your Topology
You are choosing how you scale and isolate your platform. This is about latency, failure domains, and operational boundaries.
Solutions
Option A: Single site (one fleet, one instance)
What it is
One VCF fleet.
One VCF instance in one physical site.
One management domain plus one or more workload domains.
When it fits
You need the fastest path to a production private cloud.
You can tolerate site-level outages as an availability event, not a DR event.
Your first milestone is standardization, not geographic resiliency.
Day-0 decisions
Decide if you will isolate workloads by workload domains early (recommended) or defer.
Decide your SSO deployment mode (embedded vs appliance) based on uptime requirements.
Day-1 posture
Deploy the first instance and its management domain.
Bring up fleet-level services (VCF Operations and VCF Automation) in the management domain of the first instance.
Create workload domains and onboard consumption.
Day-2 posture
Grow by adding workload domains and clusters.
If you later add a second location, you may expand to a multi-site blueprint or add a new instance.
Failure domain notes
A site outage is a full platform outage unless you have a separate recovery site strategy.
Option B: Two sites in one region (one fleet, one instance, stretched clusters)
What it is
One VCF fleet.
One VCF instance.
Two sites in the same metro region, using stretched clusters to increase availability across sites.
When it fits
You want site-level availability without going full multi-region.
Your sites are within synchronous replication latency expectations.
You can support stretched L2/L3 networking requirements.
Day-0 decisions
Decide which clusters must be stretched (management domain, and which workload domains).
Confirm network stretch requirements between sites (this is usually the gating factor).
Confirm your storage and witness design.
Day-1 posture
Deploy the first instance with a topology designed for two sites.
Establish site-affinity and failover capacity expectations.
Day-2 posture
Operate as one instance, but with more complex networking and storage fault domains.
Your change windows become more sensitive to inter-site network events.
Failure domain notes
You trade “simple” for “site resilience.”
Misconfiguring the network stretch becomes your most common real-world failure mode.
Option C: Multi-region (one fleet, multiple instances)
What it is
One VCF fleet.
Multiple VCF instances, typically aligned to regions (or sovereignty boundaries).
Each instance has its own management domain and workload domains.
When it fits
You need regional isolation for latency, sovereignty, or failure domains.
You want regional operations teams to own day-n execution while central IT owns standards.
You can tolerate additional management footprint and change coordination.
Day-0 decisions
Decide which region hosts the initial fleet-level management components and how you protect them.
Decide if you need one fleet or multiple fleets (governance isolation).
Decide your SSO model:
Fleet-wide SSO (largest login blast radius, lowest footprint)
Cross-instance SSO (balanced)
Per-instance SSO (smallest login blast radius, highest footprint)
Day-1 posture
Deploy the first instance and establish fleet services.
Deploy additional instances per region as needed.
Keep workload domains separated from management domains.
Day-2 posture
Central IT drives lifecycle orchestration, global policy, and certificate standards.
Regional IT runs the local instance operations and workload support.
Failure domain notes
Region failure becomes a DR problem, not just an HA problem.
Your fleet-level services location creates a dependency. Plan for it explicitly.
Challenge: Pick Your SSO and Identity Boundaries
In VCF 9.0, identity is not a minor checkbox. It drives operator experience, automation integration, and incident response.
There are two layers you must reason about:
Platform SSO layer: VCF Identity Broker + VCF Single Sign-On (how admins and platform users authenticate to VCF components).
Tenant identity layer (VCF Automation): whether tenants authenticate to the same IdP as the provider, or to different IdPs.
Solutions
Option A: Single VCF Instance Single Sign-On (small blast radius, higher overhead)
What it is
Each VCF instance uses its own dedicated VCF Identity Broker.
SSO scope is limited to that instance.
Users re-authenticate when moving across instances.
Why you choose it
You want smallest login blast radius.
You have regulated boundaries per region/instance.
You can handle higher footprint and operational overhead.
Operational reality
You manage more identity broker deployments and backups.
This can be paired with separate IdPs for strict isolation.
Option B: Cross VCF Instance Single Sign-On (balanced)
What it is
Multiple identity brokers exist.
Each identity broker serves a set of instances in the same fleet.
VCF management components (VCF Operations and VCF Automation) connect to only one identity broker for SSO, so choose that mapping deliberately.
Why you choose it
You want a balance between footprint and blast radius.
You want a more unified operator experience across a subset of instances.
Operational reality
Identity outages affect only the subset of instances tied to that broker.
You still have multiple backup/restore paths.
Option C: VCF Fleet-Wide Single Sign-On (lowest footprint, largest blast radius)
What it is
One identity broker services all instances in a fleet.
Users log in once and move across instances without re-authentication.
Why you choose it
You value simplicity and a unified admin experience.
You accept that identity is a shared dependency for the fleet.
Operational reality
This has the largest login blast radius.
You should strongly consider the appliance deployment mode for availability.
Embedded vs appliance identity broker
This is about availability and maintenance coupling.
Embedded mode
Runs as a service inside the management domain vCenter.
vCenter maintenance impacts your ability to authenticate to VCF components.
Simplest footprint.
Appliance mode
A standalone 3-node identity broker cluster deployed via VCF Operations fleet management.
High availability comes from nodes running on separate hosts.
Operational tasks on vCenter do not impact the authentication stack in the same way.
Recommended for multi-instance SSO due to availability and scale (rule of thumb: up to five instances per broker).
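The sizing rule of thumb above can be turned into a quick planning calculation. This is a sketch only: the five-instances-per-broker figure is this post's guidance, not a hard product limit.

```python
import math

# Rule of thumb from this post: up to five VCF instances per appliance-mode
# identity broker cluster. Planning guidance, not a product limit.
INSTANCES_PER_BROKER = 5

def brokers_needed(instance_count):
    """Minimum number of appliance-mode broker clusters for a fleet."""
    if instance_count < 1:
        raise ValueError("instance_count must be >= 1")
    return math.ceil(instance_count / INSTANCES_PER_BROKER)

print(brokers_needed(3))   # -> 1
print(brokers_needed(12))  # -> 3
```

Use this in capacity planning discussions to make the broker footprint of a per-instance or cross-instance SSO design explicit before anyone commits to it.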
Tenant multi-tenancy identity patterns in VCF Automation
This is where “separate IdP” usually matters first.
Enterprise model
Provider and tenants use the same identity provider.
Simplest for internal enterprise IT.
Service provider model
Provider and tenants use different identity providers.
Better fit for regulated tenants, partner access, or MSP-style separation.
Architecture Tradeoff Matrix
Use this matrix in design reviews to avoid subjective debates.
| Decision point | Option | Strengths | Tradeoffs |
|---|---|---|---|
| Physical topology | Single site | Fastest to deploy, lowest complexity | No site-level resilience by default |
| Physical topology | Two sites in one region | Site resilience with one instance | Requires stretched network/storage design discipline |
| Physical topology | Multi-region | Region isolation, scalable org model | Higher footprint, more coordination, DR becomes explicit |
| Fleet count | One fleet | Centralized governance and consistency | Shared governance dependencies, shared change windows |
| Fleet count | Multiple fleets | Stronger governance isolation, separate identity boundaries possible | Duplicate fleet services, more ops overhead |
| Platform SSO model | Fleet-wide | Lowest footprint, best UX | Largest login blast radius |
| Platform SSO model | Cross-instance | Balanced footprint and blast radius | More moving parts than fleet-wide |
| Platform SSO model | Per-instance | Smallest login blast radius | Highest footprint and operational overhead |
| Identity broker mode | Embedded | Lowest footprint | Coupled to vCenter maintenance, weaker availability story |
| Identity broker mode | Appliance | HA and scale, decoupled from vCenter maintenance | More resources and lifecycle tasks |
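As a design-review aid, the physical-topology decision point can be encoded as a toy helper. The inputs and thresholds are this sketch's assumptions; real designs also weigh sovereignty, network stretch feasibility, and team structure.

```python
# Toy encoding of the "physical topology" decision point from the matrix.
# Inputs and option labels are this sketch's assumptions, not product terms.

def pick_topology(regions, sites_in_region, stretch_latency_ok):
    if regions > 1:
        return "Option C: multi-region (one fleet, multiple instances)"
    if sites_in_region == 2 and stretch_latency_ok:
        return "Option B: two sites in one region (stretched clusters)"
    # Two sites without stretch-capable latency should trigger a blueprint
    # review rather than a forced stretch; treated as single site here.
    return "Option A: single site (one fleet, one instance)"

print(pick_topology(regions=1, sites_in_region=2, stretch_latency_ok=True))
# -> Option B: two sites in one region (stretched clusters)
```

The value of writing it down this way is not the code itself, but forcing the room to state the inputs (region count, site count, latency posture) before debating options.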
Failure Domain Analysis
Treat these as distinct failure domains with different operational responses.
Fleet service failure domains
VCF Operations / VCF Automation unavailable
Provisioning workflows, centralized operations views, and governance functions degrade.
The vCenter and NSX Manager deployments inside each instance keep running, but you lose the consolidated interface and some automation paths.
VCF Identity Broker outage
Impacts logins based on your SSO model:
Fleet-wide: impacts logins for the fleet.
Cross-instance: impacts the subset of instances attached.
Per-instance: impacts only one instance.
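A minimal sketch of that blast-radius logic, useful for tabletop exercises. The broker and instance names are hypothetical, and the broker-to-instance mapping is an input you supply from your own design.

```python
# Illustrative blast-radius sketch for an identity broker outage. Broker and
# instance names are hypothetical; you supply the broker-to-instance mapping.

def impacted_instances(sso_model, failed_broker, broker_map, all_instances):
    """Return instances whose logins break when `failed_broker` goes down."""
    if sso_model == "fleet-wide":
        return sorted(all_instances)  # one broker serves the whole fleet
    if sso_model in ("cross-instance", "per-instance"):
        # Per-instance is the degenerate case: one instance per broker.
        return sorted(broker_map.get(failed_broker, []))
    raise ValueError(f"unknown SSO model: {sso_model}")

brokers = {"idb-emea": ["vcf-eu-01", "vcf-eu-02"], "idb-amer": ["vcf-us-01"]}
fleet = sorted(i for served in brokers.values() for i in served)

print(impacted_instances("cross-instance", "idb-emea", brokers, fleet))
# -> ['vcf-eu-01', 'vcf-eu-02']
```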
Instance failure domains
Management domain outage (inside an instance)
Impacts that instance’s lifecycle and management capabilities.
May also impact authentication if using embedded identity broker in that instance.
Workload domain outage
Impacts workloads isolated to that domain.
Does not necessarily take down the instance management domain.
Who Owns What
Assign each of these capabilities explicitly across the platform team, VI admins, and app/platform teams so escalations stop bouncing between teams:
Choose fleet count and topology blueprint
Define instance-per-site/region strategy
Deploy first instance and management domain
Deploy fleet services (VCF Operations + VCF Automation)
Create workload domains
Define SSO model and identity broker mode
Configure VCF Single Sign-On and component registration
Provider identity in VCF Automation
Tenant identity in VCF Automation (enterprise vs service provider model)
Day-n operations in a region (multi-region, workload level)
Certificate lifecycle standard and tooling
Backup and restore strategy for management components
Workload onboarding, catalogs, templates, guardrails
Operational Runbook Snapshot
Keep your runbook short and repeatable. This is a starting point.
Daily
Check platform health in VCF Operations (fleet services and connected instances).
Validate identity broker health and login paths.
Verify capacity alarms and failed automation runs.
Weekly
Confirm backups for:
VCF Operations and fleet management services
VCF Automation
Identity broker (appliance mode) or vCenter backups (embedded mode)
Instance core components (SDDC Manager, vCenter, NSX)
Monthly
Review certificate expirations and renewal pipeline.
Review drift and out-of-band changes.
Review tenancy boundaries and entitlement creep.
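For the monthly certificate review, a small helper can flag anything close to expiry, given the `notAfter` strings in the format Python's `ssl` module reports (for example, from `getpeercert()`). Hostnames, dates, and the 60-day threshold below are illustrative assumptions, not part of the product.

```python
import ssl
from datetime import datetime, timezone

# Helper for the monthly certificate review: flag certificates close to
# expiry. `not_after` uses the format Python's ssl module reports, e.g.
# "Jun 17 12:00:00 2026 GMT". Hostnames, dates, and the 60-day threshold
# are illustrative assumptions.

def days_until_expiry(not_after, now=None):
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after),
                                     tz=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return (expires - now).days

certs = {
    "vcsa.example.com": "Jun 17 12:00:00 2026 GMT",
    "nsx.example.com": "Aug 01 00:00:00 2025 GMT",
}
now = datetime(2025, 7, 1, tzinfo=timezone.utc)  # fixed date for the example
for host, not_after in certs.items():
    days = days_until_expiry(not_after, now)
    if days < 60:
        print(f"RENEW SOON: {host} expires in {days} days")
# prints: RENEW SOON: nsx.example.com expires in 31 days
```

Feeding this from an inventory of management endpoints turns "review certificate expirations" from a manual click-through into a repeatable check.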
Incident workflow
Identify scope first:
Fleet services issue vs instance issue vs workload domain issue
For identity incidents:
Identify which identity broker and which SSO model is in use for impacted components.
Decide if this is a login outage only, or also an authorization/role mapping problem.
Change Management Considerations
These are the changes that most often turn into “why is this so hard?”
Identity resets are a major change event
Changing any of the following typically requires a reset and reconfiguration:
Identity broker deployment mode (embedded <-> appliance)
Identity provider changes
Treat this as:
A planned outage window
A rollback-resistant change (because users/groups and component registrations are impacted)
A runbook that includes role and permission re-assignment
Lifecycle sequencing matters
VCF 9.x formalizes separation between:
Management components managed at fleet level
Core components managed at instance level
Even if you are on 9.0 GA today, your day-2 operating model should assume this separation so upgrades do not surprise you later.
Anti-patterns
These are common failure modes that look fine in diagrams but hurt in production.
Designing multi-region without explicitly deciding where fleet-level services live and how you recover them.
Choosing fleet-wide SSO for operator convenience without acknowledging the login blast radius.
Using embedded identity broker in environments where vCenter maintenance windows are frequent and strict uptime is required.
Treating “separate IdP” as enough isolation while keeping everything in one governance boundary.
Letting tenants share identity or entitlements by accident in VCF Automation due to weak onboarding guardrails.
Skipping workload domains and placing consumer workloads in the management domain.
Validation
Validate code levels quickly (vCenter example)
Use this PowerShell snippet to verify vCenter version and build, and spot drift from your declared GA baseline.
# PowerCLI example: validate vCenter version/build
$vc = "vcsa.example.com"  # replace with your vCenter FQDN
Connect-VIServer -Server $vc | Out-Null
# Read the service instance's About info (product name, version, build)
$about = (Get-View ServiceInstance).Content.About
$about | Select-Object Name, Version, Build, FullName
Disconnect-VIServer -Server $vc -Confirm:$false
Validate SSO configuration paths (operator check)
In VCF Operations:
Fleet Management -> Identity & Access -> SSO Overview
Verify:
Selected VCF instance
Identity provider configuration
Component configuration state for vCenter, NSX, VCF Operations, VCF Automation
Summary and Takeaways
Use topology blueprints to align teams quickly:
Single site for speed
Two sites in one region for site resilience
Multi-region for sovereignty, latency, and isolation
Treat instances as your discrete infrastructure footprints.
Treat domains as lifecycle and workload isolation units.
Treat fleets as your centralized governance and fleet-scoped lifecycle boundary.
Choose your SSO model based on blast radius tolerance, not just convenience.
Decide early if tenants need separate identity providers, and use VCF Automation provider/tenant identity models intentionally.
Conclusion
You get operational clarity in VCF 9.0 when you design topology and identity as first-class boundaries:
Topology sets your failure domains and scaling ceiling.
Identity sets your operator experience and incident blast radius.
Fleets centralize governance, while instances keep their own management stacks.
Sources
VMware Cloud Foundation 9.0 and later documentation (includes VCF 9.0 Release Notes, Bill of Materials, Design Blueprints, and VCF Single Sign-On models): https://techdocs.broadcom.com/us/en/vmware-cis/vcf/vcf-9-0-and-later/9-0.html
The post VCF 9.0 GA Mental Model Part 4: Fleet Topologies and SSO Boundaries (Single Site, Dual Site, Multi-Region) appeared first on Digital Thought Disruption.
