vCLS Retreat Mode: When to Use It, What It Breaks, and How to Exit Cleanly

Disable vCLS on a Cluster via Retreat Mode KB 316514

vSphere Cluster Services usually stay in the background until they get in the way of something operational.

Most teams first notice vCLS when a cluster task is blocked, a warning appears after maintenance, or a handful of small system VMs show up and someone asks whether they can be deleted. In a VMware Cloud Foundation or vSphere estate, that is exactly the wrong moment to treat vCLS as just another VM cleanup task.

vCLS is not workload inventory. It is part of the cluster services control model.

Retreat Mode exists for specific operational situations where vCLS needs to be temporarily removed from a cluster. It can be useful, but it is also easy to turn a short-lived exception into quiet platform drift. The goal is not simply to disable vCLS. The goal is to understand what you are suspending, why you are suspending it, what cluster behavior changes while it is disabled, and how you will exit cleanly.

Broadcom KB 316514 documents the supported Retreat Mode procedure for disabling vCLS on a cluster. It also makes the version split clear: on vSphere versions prior to 9.0, disabling vCLS impacts cluster services; starting with vSphere 9.0, Broadcom states that vCLS can be disabled without impacting vSphere DRS, vSphere HA, or cluster services.

TL;DR

vCLS, or vSphere Cluster Services, provides a cluster-service framework used to help maintain cluster service availability, including services such as vSphere DRS and vSphere HA. Broadcom’s vCLS guidance states that vCLS was introduced in vSphere 7.0 Update 1 and that DRS depends on vCLS health starting with that release.

Use Retreat Mode when a supported VMware/Broadcom procedure requires it, such as a cluster operation blocked by active vCLS VMs, a vCLS cleanup workflow, or a support-directed remediation. Do not use it as a casual way to remove system VMs.

For vSphere 7.x and 8.x, Retreat Mode is disruptive to cluster service behavior. DRS will not function on the cluster, HA may power on VMs on less optimal hosts during failure recovery because it depends on DRS placement recommendations, and vSAN clusters may show Cluster Service health as degraded.

For vSphere 9.0 and later, the operating model changes. Broadcom states that vCenter 9.0 deprecates vCLS, recommends switching vCLS to Retreat Mode to avoid redundant resource usage, and says deactivation does not impact DRS or HA services for vCenter 9.0 and above.

The operational rule is simple: Retreat Mode must have an owner, a reason, a validation plan, and an exit condition.

The Mental Model: vCLS Is Cluster Plumbing, Not Workload Inventory

The easiest mistake is to look at vCLS VMs and think, “These are small system VMs; we can just remove them.”

That view misses the control-plane relationship.

Broadcom describes vCLS as a feature that ensures cluster services such as vSphere DRS and vSphere HA are available to maintain workload resource health in clusters. It also notes that users are not expected to maintain the lifecycle or state of the vCLS agent VMs and should not perform operations on them unless guided by VMware support or supported documentation.

Think of vCLS as part of the cluster-service scaffolding.

It is not a business application.
It is not a template.
It is not a forgotten VM.
It is a managed system component.

The following diagram is the mental model I use when explaining Retreat Mode during operations planning.

What matters in this diagram is the dependency boundary. Retreat Mode does not just remove a VM object. It changes the cluster’s service state. In vSphere 7.x and 8.x, that has direct operational consequences. In vSphere 9.x, Broadcom has changed the dependency model, so the risk profile is different.

Scenario: The Cluster Operation Is Blocked

A common scenario looks like this:

You are working on a vSAN-backed cluster and need to enable or modify Enhanced vMotion Compatibility. The task is blocked because active vCLS VMs cannot be manually powered off. Broadcom lists this as a symptom in KB 316514.

At that point, the right response is not:

“Delete the vCLS VMs.”

The better response is:

“Do we have a supported reason to place this cluster into Retreat Mode, do we understand what cluster services will be affected, and do we have a documented exit plan?”

That distinction matters because vCLS is recreated and managed by the platform. Manual cleanup may appear to solve the immediate visual problem while leaving you with a worse service-state problem.

Scope and Assumptions

This article assumes:

| Area | Assumption |
|---|---|
| Platform | vSphere or VCF-managed vSphere clusters |
| Versions | Primarily vCenter/vSphere 7.x, 8.x, and version-aware notes for 9.x |
| Audience | VCF administrators, vSphere engineers, platform architects, and operations teams |
| Goal | Use Retreat Mode intentionally, validate the impact, and exit cleanly |
| Out of scope | Manually deleting vCLS VMs outside of a documented KB or support-directed procedure |
| Change control | Retreat Mode should be treated as a cluster-level operational exception |

This is not a replacement for Broadcom support guidance. It is a practical operating model for using KB 316514 safely.

What vCLS Does

vCLS exists to support cluster services.

In vSphere 7.0 Update 1 and newer versions, vCLS provides a framework for maintaining the availability of services such as DRS and HA. Broadcom’s vCLS guidance also notes that in vSphere 8.0 U3, VMware introduced Embedded vCLS, which is used when both vCenter and ESXi are updated to 8.0 U3. The earlier model is referred to as External vCLS.

That creates three practical operating models:

| Version / Model | What Operators Commonly See | Operational Meaning |
|---|---|---|
| vSphere 7.x / early 8.x external vCLS | Small vCLS VMs in inventory | vCLS agent VMs support cluster services |
| vSphere 8.0 U3 embedded vCLS | Embedded vCLS behavior tied to ESXi/vCenter 8.0 U3 | Different runtime model, but still part of cluster service health |
| vSphere 9.x | vCLS deprecated | Broadcom recommends Retreat Mode to avoid redundant resource usage, with no DRS/HA impact per current KB guidance |

The important point is that vCLS behavior is version-sensitive. A runbook written for vSphere 7.x should not be blindly reused as a vSphere 9.x operating-policy statement, and a vSphere 9.x recommendation should not be backported to vSphere 7.x or 8.x without accounting for the older DRS/HA dependency.

What Retreat Mode Actually Does

Retreat Mode tells vCenter to disable vCLS for a specific cluster.

In KB 316514, Broadcom states that enabling Retreat Mode deletes the vCLS VMs and impacts some cluster services while the cluster is in Retreat Mode.

That is why Retreat Mode should be treated as a controlled state change, not a cleanup preference.

In older versions, this is especially important because Broadcom states that, prior to vSphere 9.0, there is no way to disable vCLS on a vSphere cluster while keeping DRS functional on that cluster.

For vSphere 9.0 and later, the guidance changes. Broadcom states that vCenter 9.0 deprecates vCLS and that disabling it does not impact DRS or HA for vCenter 9.0 and above.

That means your runbook should branch by version.

When to Use Retreat Mode

Use Retreat Mode when one of these conditions is true:

| Use Case | Why Retreat Mode May Be Appropriate |
|---|---|
| A documented Broadcom KB requires it | Keeps the change inside a supported procedure |
| EVC or cluster maintenance is blocked by active vCLS VMs | KB 316514 specifically calls out EVC modification blocked by active vCLS VMs on a vSAN cluster |
| vCLS VMs are stale, orphaned, duplicated, or failing to redeploy | Some Broadcom remediation workflows use Retreat Mode as part of forcing cleanup and redeployment |
| Broadcom support directs it | Support may use Retreat Mode to reset cluster-service state |
| vSphere 9.x estate standardization | Broadcom says vCenter 9.0 deprecates vCLS and recommends Retreat Mode to avoid redundant resource usage |

One related Broadcom KB describes a stale vCLS condition where deployment fails, DRS becomes non-functional, and the workaround includes disabling vCLS using Retreat Mode, removing stale vCLS VMs, and then re-enabling vCLS so the VMs are recreated.

Another KB for vSphere 8.0 U3 embedded vCLS deployment issues also includes entering Retreat Mode, correcting host configuration, and then enabling system-managed vCLS again.

The pattern is consistent: Retreat Mode is a tool for a specific operational condition, not a substitute for understanding why vCLS is unhealthy.

When Not to Use Retreat Mode

Do not use Retreat Mode just because vCLS VMs are visible in inventory.

Do not use it as a permanent workaround for permissions, monitoring, or reporting annoyances unless the platform version and support guidance explicitly make that acceptable. One Broadcom KB about performance charts notes that setting a cluster to Retreat Mode removed vCLS VMs and made a chart display, but the resolution says leaving the cluster in vCLS Retreat Mode is not ideal and recommends using the vCLSAdmin group instead.

Avoid Retreat Mode when:

| Anti-Pattern | Why It Is Risky |
|---|---|
| “We do not like seeing vCLS VMs” | vCLS is managed system inventory |
| “DRS is not currently used, so it does not matter” | The cluster may still rely on HA behavior, future DRS policy, or operational assumptions |
| “We will remember to turn it back on later” | That is how temporary exceptions become drift |
| “The vCLS VMs look orphaned, so delete them” | Stale vCLS cleanup should follow a documented KB or support procedure |
| “A chart/report works better without vCLS” | Fix permissions or tooling visibility rather than disabling cluster services |

What Retreat Mode Breaks or Changes

The impact depends on the vSphere version.

vSphere 7.x and 8.x

For vSphere versions prior to 9.0, the impact is significant:

| Service Area | Expected Impact |
|---|---|
| vSphere DRS | DRS does not function on that cluster while Retreat Mode is active |
| Load balancing | Workloads are not automatically load-balanced by DRS |
| Host maintenance | VMs will not be automatically migrated to other hosts by DRS for maintenance workflows |
| vSphere HA placement | HA can still power on VMs after a host failure, but placement may be less optimal because HA depends on DRS placement recommendations |
| vSAN health | vSAN clusters may show Cluster Service health as degraded |

These are not theoretical caveats. Broadcom lists these impacts directly in KB 316514.

vSphere 9.x

For vCenter 9.0 and later, Broadcom states that vCLS can be disabled without impacting vSphere DRS, vSphere HA, or cluster services.

That does not mean the change should be undocumented. It means the operational reason and expected behavior are different.

For vSphere 7.x and 8.x, Retreat Mode is usually a temporary exception.

For vSphere 9.x, Retreat Mode may become an intentional steady-state configuration, but it still needs to be recorded as such.
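The version split above can be written down as a small lookup so that a runbook or pre-change script states the expected impact explicitly. This is a minimal sketch, not VMware tooling: the function name `retreat_mode_impact` and the return shape are hypothetical, and it looks only at the major version, which is a simplification (it does not distinguish 7.0 GA from 7.0 Update 1, for example).

```python
# Hypothetical helper: maps a vSphere major version to the Retreat Mode
# posture and impact summarized in this section. Names are illustrative only.

PRE_9_IMPACT = [
    "DRS does not function on the cluster",
    "No automatic DRS load balancing",
    "No automatic DRS migration for host maintenance",
    "HA placement may be less optimal",
    "vSAN Cluster Service health may show degraded",
]


def retreat_mode_impact(major_version: int) -> dict:
    """Return the expected Retreat Mode posture for a vSphere major version."""
    if major_version < 7:
        # vCLS was introduced in vSphere 7.0 Update 1, so older releases are out of scope.
        raise ValueError("vCLS does not exist before vSphere 7.0 Update 1")
    if major_version >= 9:
        # Per Broadcom's 9.0 guidance: no DRS/HA impact, may be kept as steady state.
        return {"posture": "possible steady state", "impact": []}
    return {"posture": "temporary exception", "impact": list(PRE_9_IMPACT)}


if __name__ == "__main__":
    for version in (7, 8, 9):
        plan = retreat_mode_impact(version)
        print(version, plan["posture"], len(plan["impact"]))
```

Even when the impact list is empty, the change still belongs in the record; the lookup only tells you which kind of record it is.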

Runbook: Before You Enable Retreat Mode

Before changing the cluster, capture the following:

| Check | Why It Matters |
|---|---|
| vCenter version | Determines whether the vSphere 9.x behavior applies |
| ESXi version mix | Especially important for vSphere 8.0 U3 embedded vCLS behavior |
| Cluster name and MoRef/domain ID | Required for older Advanced Settings workflow |
| DRS state | Determines the operational impact in vSphere 7.x/8.x |
| HA state | Helps assess failure behavior while DRS is unavailable |
| vSAN state | Retreat Mode may show degraded cluster service health on vSAN |
| Active maintenance tasks | Avoid combining Retreat Mode with unrelated disruptive changes |
| Change ticket | Prevents silent drift |
| Exit owner | Someone must be accountable for returning the cluster to normal state |
| Expiration time | Temporary exceptions should expire by design |

For older versions that require the Advanced Settings method, be very careful with the cluster domain ID. KB 316514 says to copy only the domain-c<number> value and warns that using other values, such as the cluster UUID or a combination of cluster ID and UUID, can cause vpxd to fail to start after service restart.

That warning is worth repeating: do not improvise the setting name.
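Because a wrong identifier can stop vpxd from starting, it is worth checking the value mechanically before it goes anywhere near Advanced Settings. The sketch below is one way to do that, assuming the only acceptable form is the bare domain-c<number> string described in KB 316514. The function name `is_valid_cluster_domain_id` is hypothetical, and the UUID in the examples is a made-up placeholder.

```python
import re

# Accept only the bare Managed Object Reference form, for example "domain-c1006".
_DOMAIN_ID = re.compile(r"domain-c[0-9]+")


def is_valid_cluster_domain_id(value: str) -> bool:
    """True only when the whole string is exactly domain-c<number>."""
    return _DOMAIN_ID.fullmatch(value) is not None


if __name__ == "__main__":
    candidates = [
        "domain-c1006",                                        # correct form
        "domain-c1006:eef257af-fa50-455a-af7a-6899324fabe6",   # ID plus UUID: reject
        "eef257af-fa50-455a-af7a-6899324fabe6",                # UUID only: reject
        " domain-c1006",                                       # stray whitespace: reject
    ]
    for candidate in candidates:
        print(repr(candidate), is_valid_cluster_domain_id(candidate))
```

Using `fullmatch` rather than `search` is the point of the example: a value that merely contains domain-c<number> is exactly the mistake the KB warns about.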

Runbook: Enable Retreat Mode in vSphere 7.0 U3o / 8.0 U2 and Later

Starting in vSphere 7.0 U3o and 8.0 U2, Broadcom documents Retreat Mode as a cluster setting in the vCenter Server UI.

Use the UI path when available:

Go to Hosts and Clusters.

Select the target cluster.

Open the Configure tab.

Under vSphere Cluster Services, select General.

Click Edit vCLS Mode.

Select Retreat Mode.

Confirm the change.

Watch recent tasks for vCLS cleanup.

Confirm that the expected cluster warning appears and that the change ticket reflects the active exception.

This is the preferred path for current 7.x/8.x environments where the UI option exists because it reduces the chance of miskeying the Advanced Setting.
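If the runbook needs to pick between the UI path and the Advanced Settings path automatically, the version floor above can be expressed as a short check. This is a sketch under assumptions: the function name and the way the version is split into major, update, and patch letter are hypothetical, and treating releases after 8.x as having the UI setting is my assumption, since the KB statement quoted here only gives the 7.0 U3o and 8.0 U2 starting points.

```python
def ui_vcls_mode_available(major: int, update: int, patch: str = "") -> bool:
    """Return True when the UI-based vCLS mode setting should be available.

    major  -- vSphere major version, e.g. 7 or 8
    update -- update level, e.g. 3 for "U3"
    patch  -- single patch letter, e.g. "o" for "7.0 U3o" (empty if none)
    """
    if major == 7:
        # 7.0 U3o or later: update 3 needs patch letter "o" or beyond.
        return update > 3 or (update == 3 and patch.lower() >= "o")
    if major == 8:
        # 8.0 U2 or later.
        return update >= 2
    # Assumption: anything newer than 8.x keeps the UI setting; older majors do not have it.
    return major > 8


if __name__ == "__main__":
    print(ui_vcls_mode_available(7, 3, "n"))  # False: use the Advanced Settings method
    print(ui_vcls_mode_available(7, 3, "o"))  # True
    print(ui_vcls_mode_available(8, 1))       # False
    print(ui_vcls_mode_available(8, 2))       # True
```

When the check returns False, the older Advanced Settings workflow applies, along with its stricter identifier handling.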

Runbook: Enable Retreat Mode on Older vSphere Versions

For versions before vSphere 7.0 U3o and 8.0 U2, KB 316514 documents the vCenter Advanced Settings method.

The setting format is:

config.vcls.clusters.domain-c<number>.enabled = False

Example:

config.vcls.clusters.domain-c1006.enabled = False

The domain-c<number> value must come from the cluster Managed Object Reference, not the cluster UUID.

The workflow is:

Navigate to the cluster.

Copy only the domain-c<number> part from the cluster URL.

Navigate to the vCenter Server object.

Open Configure.

Go to Advanced Settings.

Edit settings.

Add the setting using this format:

config.vcls.clusters.domain-c<number>.enabled

Set the value to:

False

Save the change.

Watch vCenter tasks for vCLS cleanup.

Confirm the expected cluster warning if DRS is enabled.

Broadcom states that the vCLS monitoring service initiates cleanup of vCLS VMs and that, if DRS is enabled, DRS will not be functional until vCLS is re-enabled.
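The "copy only the domain-c<number> part" step is where mistakes tend to happen, so here is a minimal sketch of doing it with a pattern instead of by eye. It assumes the vSphere Client cluster URL embeds a segment of the form urn:vmomi:ClusterComputeResource:domain-c<number>:<UUID>; verify that against your own client before relying on it. The function name `retreat_mode_setting` and the sample URL (hostname and UUID) are hypothetical.

```python
import re

# Assumed URL shape: ".../urn:vmomi:ClusterComputeResource:domain-c1006:<uuid>/summary".
# The lookahead makes sure the number ends at ":" or "/" (or end of string).
_CLUSTER_URN = re.compile(r"ClusterComputeResource:(domain-c[0-9]+)(?=[:/]|$)")


def retreat_mode_setting(cluster_url: str) -> tuple:
    """Return the (key, value) pair for the Advanced Settings method."""
    match = _CLUSTER_URN.search(cluster_url)
    if match is None:
        raise ValueError("no domain-c<number> segment found in URL")
    # Only the domain ID is used; the trailing UUID is deliberately discarded.
    key = f"config.vcls.clusters.{match.group(1)}.enabled"
    return key, "False"


if __name__ == "__main__":
    url = (
        "https://vcsa01.example.com/ui/app/cluster;nav=h/"
        "urn:vmomi:ClusterComputeResource:domain-c1006:"
        "eef257af-fa50-455a-af7a-6899324fabe6/summary"
    )
    print(retreat_mode_setting(url))
    # ('config.vcls.clusters.domain-c1006.enabled', 'False')
```

The helper only builds the key and value as text for the change record. Applying the setting is still a deliberate manual step in vCenter.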

Optional PowerCLI Helper: Capture the Cluster MoRef Safely

If you are preparing the change record and want to avoid copying the wrong identifier from the URL, you can use PowerCLI to display the cluster MoRef value.

This helper does not enable Retreat Mode. It only retrieves the domain-c<number> value you need to document or use in the Advanced Settings workflow.

# Connect to vCenter first
Connect-VIServer -Server "vcsa01.example.com"

# Replace with the exact cluster name
$clusterName = "Prod-Compute-Cluster"

# Retrieve the cluster object and display the MoRef value
$cluster = Get-Cluster -Name $clusterName
$cluster.ExtensionData.MoRef.Value

Expected output looks like this:

domain-c1006

Use that value in the change ticket and, if required by your vSphere version, in the Advanced Settings key:

config.vcls.clusters.domain-c1006.enabled

This is intentionally a read-only helper. For most environments, especially where the UI supports vCLS Mode, the UI method is cleaner and less error-prone.

Runbook: Validate the Cluster While in Retreat Mode

Once Retreat Mode is active, validate the state instead of assuming the task succeeded.

Check the following:

| Validation Step | Expected Result |
|---|---|
| vCLS mode | Cluster shows Retreat Mode |
| vCLS VMs | vCLS VMs are removed or absent for the target cluster |
| DRS status on vSphere 7.x/8.x | DRS warning or non-functional state is expected |
| HA status | HA remains configured, but placement caveat is understood |
| vSAN health | Cluster Service health may show degraded |
| Change ticket | Active exception is documented |
| Monitoring | Alerts are either expected, annotated, or temporarily suppressed with expiration |

KB 316514 also describes how vCLS VMs appear in inventory, including the vCLS (<number>) naming pattern and the vCLS folder under VMs and Templates.

Use that visibility to confirm you are looking at vCLS-managed objects, not workload VMs.
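When reviewing an exported list of VM names, the naming pattern can be checked mechanically. The sketch below matches only the vCLS (<number>) pattern mentioned above. Other releases may name these VMs differently, and a name match alone is not proof of anything, so treat it as a first filter alongside the vCLS folder check. The function name `split_inventory` and the sample VM names are hypothetical.

```python
import re

# Matches the "vCLS (<number>)" naming pattern described in KB 316514.
_VCLS_NAME = re.compile(r"vCLS \([0-9]+\)")


def split_inventory(vm_names):
    """Separate vCLS-named VMs from everything else in a list of VM names."""
    vcls = [name for name in vm_names if _VCLS_NAME.fullmatch(name)]
    other = [name for name in vm_names if not _VCLS_NAME.fullmatch(name)]
    return vcls, other


if __name__ == "__main__":
    names = ["vCLS (1)", "vCLS (12)", "app-db-01", "vCLS-backup"]
    vcls, other = split_inventory(names)
    print(vcls)   # ['vCLS (1)', 'vCLS (12)']
    print(other)  # ['app-db-01', 'vCLS-backup']
```

While Retreat Mode is active, the first list should be empty for the target cluster. A workload VM that happens to have "vCLS" in its name lands in the second list, which is the safe side to err on.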

Runbook: Exit Retreat Mode Cleanly

Exiting Retreat Mode is as important as entering it.

For the UI method, return to the same cluster setting and move vCLS back to the normal system-managed mode.

For the older Advanced Settings method, change the value back to:

True

KB 316514 states that to remove Retreat Mode using the Advanced Settings method, you change the value back to True. It also notes that the entry stays in vCenter Advanced Settings and cannot be deleted from the vSphere Client, but there is no issue with keeping the entry.

After re-enabling vCLS, validate:

| Validation Step | Expected Result |
|---|---|
| vCLS mode | System-managed / enabled |
| vCLS VMs or embedded vCLS state | Recreated or healthy, depending on version |
| DRS | Functional again on vSphere 7.x/8.x |
| HA | No unexpected cluster warnings |
| vSAN health | Cluster Service health returns to expected state |
| Tasks and events | vCLS cleanup/recreate tasks complete |
| Monitoring | Temporary alert suppressions removed |
| Change ticket | Closed with evidence |

In stale-vCLS remediation scenarios, Broadcom documents that after re-enabling vCLS, vCLS VMs should be recreated automatically within a few minutes.

Document the Exception So It Does Not Become Drift

The most common Retreat Mode failure is not technical. It is procedural.

Someone enables it for a valid reason.
The immediate task succeeds.
The cluster warning becomes background noise.
Weeks later, another team discovers DRS was not behaving as expected.

That is avoidable.

Use a small exception record every time Retreat Mode is enabled.

vCLS Retreat Mode Exception Record
----------------------------------

vCenter:
Cluster:
Cluster MoRef / domain ID:
VCF domain:
vSphere / vCenter version:
ESXi version mix:
Requested by:
Approved by:
Change ticket:
Reason for Retreat Mode:
Source KB / support case:
Date and time enabled:
Expected operational impact:
Affected services:
Validation performed after enablement:
Exit owner:
Planned exit date/time:
Exit criteria:
Date and time disabled:
Validation performed after exit:
Notes:

This does not need to be complicated. It just needs to be visible.
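If the record lives in a script or a small internal tool rather than a ticket template, the "exit condition" part can be made checkable. The following is a minimal sketch that models a subset of the fields above and flags a record whose planned exit has passed while Retreat Mode is still active. The class name, field names, and sample values (cluster, ticket number, owner) are all hypothetical placeholders.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class RetreatModeException:
    cluster: str
    domain_id: str
    change_ticket: str
    exit_owner: str
    enabled_at: datetime
    planned_exit: datetime
    disabled_at: Optional[datetime] = None  # None while Retreat Mode is still active

    def is_overdue(self, now: datetime) -> bool:
        """An open exception is overdue once the planned exit time has passed."""
        return self.disabled_at is None and now > self.planned_exit


if __name__ == "__main__":
    record = RetreatModeException(
        cluster="Prod-Compute-Cluster",
        domain_id="domain-c1006",
        change_ticket="CHG-0001",       # placeholder ticket number
        exit_owner="platform-team",     # placeholder owner
        enabled_at=datetime(2025, 1, 10, 8, 0, tzinfo=timezone.utc),
        planned_exit=datetime(2025, 1, 10, 12, 0, tzinfo=timezone.utc),
    )
    print(record.is_overdue(datetime(2025, 1, 11, 0, 0, tzinfo=timezone.utc)))  # True
```

A scheduled job that lists overdue records is one way to keep the cluster warning from becoming background noise. For an intentional vSphere 9.x steady state, a separate record type without a planned exit would be a cleaner fit than an ever-extending exit date.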

For VCF environments, I would also include the domain context:

| Field | Example |
|---|---|
| VCF domain | Management Domain or VI Workload Domain |
| Cluster role | Management, edge, compute, vSAN, stretched, remote site |
| Operational dependency | NSX Edge placement, management workloads, workload cluster, lifecycle task |
| SDDC Manager awareness | Note whether this is adjacent to, but not replacing, SDDC Manager workflow |

The interoperability section of Broadcom’s vCLS guidance lists VMware Cloud Foundation components such as Cloud Builder and SDDC Manager as not impacted by vCLS, but that does not remove the need to document cluster-local service impact.

In VCF, the practical risk is often not SDDC Manager itself. The risk is that cluster behavior changes beneath the operating model while everyone assumes the domain is in its normal state.

Operational Implications for VCF Teams

In a standalone vSphere environment, Retreat Mode is usually owned by the vSphere administrator.

In VCF, the ownership model is broader. The cluster might support management components, NSX Edge nodes, vSAN storage, workload consolidation, or lifecycle activity. That means Retreat Mode should be coordinated as a platform exception, not just a vCenter toggle.

The questions I would ask before enabling it are:

| Question | Why It Matters |
|---|---|
| Is this a management domain or VI workload domain cluster? | Management clusters have different blast-radius concerns |
| Is DRS relied on for host maintenance or workload balancing? | DRS impact matters before vSphere 9.x |
| Are NSX Edge nodes pinned or placed by policy? | Cluster placement assumptions may matter during maintenance |
| Is this a vSAN cluster? | vSAN Cluster Service health may show degraded |
| Is this tied to a lifecycle event? | Avoid mixing Retreat Mode with unrelated upgrades or patching unless directed |
| Who owns exit validation? | Prevents the setting from becoming forgotten drift |

The governance point is simple: Retreat Mode is a cluster-level exception with architecture consequences.

Troubleshooting Notes and Edge Cases

The Advanced Setting Entry Cannot Be Deleted

That is expected. KB 316514 says that once Retreat Mode is configured, the cluster entry remains in vCenter Advanced Settings, cannot be deleted from the vSphere Client, and there is no issue with keeping it.

Set it to True when exiting Retreat Mode.

vpxd Fails After an Incorrect Setting Was Added

This is one of the more serious mistakes.

KB 316514 warns that using the wrong value, such as the cluster UUID or a combination of cluster ID and UUID, can cause vpxd to fail to start after restart. The KB provides a recovery approach that removes the vCLS Retreat Mode settings from vpxd.cfg.

This is why I prefer retrieving the cluster MoRef carefully and using the UI method when it exists.

vCLS Does Not Redeploy After Exiting Retreat Mode

Do not keep toggling blindly.

Check whether you are dealing with a stale vCLS condition, an ESX Agent Manager issue, or an embedded vCLS issue on vSphere 8.0 U3. Broadcom has documented cases where malformed ESXi hostnames can prevent embedded vCLS deployment, and the resolution includes correcting the hostname and then enabling system-managed vCLS again.

DRS Is Still Not Functional

Confirm that:

Retreat Mode has actually been exited.

vCLS has returned to a healthy state.

There are no stale or orphaned vCLS VM records.

ESX Agent Manager is healthy.

The cluster is not affected by a known embedded vCLS issue.

One Broadcom stale-vCLS KB explicitly describes DRS becoming non-functional when vCLS deployment fails due to stale VM records.

Decision Matrix: Should You Enable Retreat Mode?

| Condition | Recommendation |
|---|---|
| vSphere 7.x/8.x and DRS is critical right now | Avoid unless required; schedule carefully |
| vSphere 7.x/8.x and Broadcom KB/support requires it | Proceed with change control and exit plan |
| vSphere 8.0 U3 embedded vCLS deployment issue | Follow the specific Broadcom KB or support guidance |
| vSphere 9.x and standardizing on vCLS deactivation | Treat as intentional steady-state, not an emergency exception |
| Charting/reporting issue caused by vCLS visibility | Fix permissions/tooling first; Retreat Mode is not ideal |
| Someone wants to delete vCLS VMs manually | Stop and validate against supported documentation |

Clean Exit Checklist

Before closing the change, confirm:

[ ] Retreat Mode was disabled or intentionally retained for vSphere 9.x policy
[ ] vCLS mode is System Managed where required
[ ] vCLS VMs or embedded vCLS state are healthy
[ ] DRS is functional where expected
[ ] HA shows no unexpected warnings
[ ] vSAN health is back to expected state
[ ] No stale vCLS VMs remain in inventory
[ ] Monitoring suppressions were removed
[ ] Change ticket includes before/after evidence
[ ] Operational owner has signed off

If any of these checks fail, the change is not complete.
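The "all or nothing" rule for the checklist is easy to encode if the results are collected by a script. This is a small sketch, assuming each item is recorded as a boolean keyed by a short name. The key names and both function names are hypothetical, and an empty result set is deliberately treated as incomplete.

```python
def failed_checks(results: dict) -> list:
    """Return the names of checklist items that are not confirmed True."""
    return [name for name, passed in results.items() if passed is not True]


def change_is_complete(results: dict) -> bool:
    """The change is complete only if checks were recorded and every one passed."""
    return bool(results) and not failed_checks(results)


if __name__ == "__main__":
    results = {
        "retreat_mode_exited_or_retained_by_policy": True,
        "drs_functional": True,
        "vsan_health_expected": False,
        "monitoring_suppressions_removed": None,  # not yet checked
    }
    print(failed_checks(results))
    # ['vsan_health_expected', 'monitoring_suppressions_removed']
    print(change_is_complete(results))  # False
```

Treating an unchecked item (`None`) the same as a failed one keeps "we did not look" from being mistaken for "it passed".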

Conclusion

vCLS Retreat Mode is useful because it gives operators a supported way to remove vCLS from a cluster when a specific procedure requires it. It is risky when it is treated as a casual VM cleanup mechanism.

The operational distinction matters.

For vSphere 7.x and 8.x, Retreat Mode can break DRS behavior and affect HA placement quality. For vSphere 9.x and later, Broadcom’s guidance changes because vCLS is deprecated and disabling it no longer impacts DRS or HA. Both realities can be true at the same time, which is why version-aware runbooks matter.

In a VCF environment, the clean operating model is:

Use Retreat Mode only for a clear reason.
Record the exception.
Understand the service impact.
Validate the cluster while it is active.
Exit cleanly, or document why the disabled state is now intentional.

The setting itself is easy. The discipline around it is what keeps a temporary exception from becoming permanent platform drift.

External References

Broadcom KB 316514 – Disable vCLS on a Cluster via Retreat Mode: https://knowledge.broadcom.com/external/article/316514/disable-vcls-on-a-cluster-via-retreat-mo.html

Broadcom KB 312147 – vSphere Cluster Services in vSphere 7.0 Update 1 and newer versions: https://knowledge.broadcom.com/external/article?legacyId=80472

Broadcom KB 326366 – Deployment of vCLS VM fails due to duplicate stale vCLS VM: https://knowledge.broadcom.com/external/article/326366/deployment-of-vcls-vm-fails-due-to-dupli.html

Broadcom KB 378133 – Embedded vCLS machines will not deploy if ESXi hostname is malformed: https://knowledge.broadcom.com/external/article/378133/embedded-vcls-machines-will-not-deploy-i.html

Broadcom KB 401771 – Trying to view vCenter performance charts fails due to vCLS permissions issue: https://knowledge.broadcom.com/external/article/401771/trying-to-view-vcenter-performance-chart.html

The post vCLS Retreat Mode: When to Use It, What It Breaks, and How to Exit Cleanly appeared first on Digital Thought Disruption.