DVS Upgrade Guardrails: What Can Break When Old Distributed Switches Move Forward

A vSphere Distributed Switch upgrade can look deceptively simple in the vCenter UI. Select the switch, choose the target version, confirm the warning, and move on. That is not how it should be treated in a brownfield environment.

The risk is not that a DVS upgrade is always dangerous. The risk is that old distributed switches often sit underneath the exact services you need during a failure: vCenter connectivity, ESXi management, vMotion, vSAN or storage traffic, NSX transport dependencies, backup networks, and production VM port groups. When the switch is old enough to cross the DVS 6.5-or-earlier to DVS 6.6-or-later boundary, Broadcom calls out a specific set of known issues and recommends treating the change with maintenance-window discipline.

Broadcom KB 324546 states that a DVS upgrade affects all hosts attached to that DVS at once, recommends changing DRS to manual before the upgrade, and notes that a safer option may be to create a new DVS at the target version and migrate hosts and objects to it.

This article is a planning guide for that decision point. It is not a click-by-click upgrade walkthrough. The goal is to help you decide what must be documented, validated, staged, and recoverable before anyone upgrades a production distributed switch.

Naming note: Broadcom documentation uses vDS and DVS interchangeably. This article uses DVS throughout for readability.

Scope and Assumptions

This guidance is aimed at vSphere and VMware Cloud Foundation environments where one or more distributed switches are still at DVS 6.5 or earlier, and the environment needs to move forward to DVS 6.6 or later as part of a vCenter, ESXi, VCF, or broader lifecycle path.

The assumed environment looks something like this:

| Area | Assumption |
| --- | --- |
| Platform | vSphere or VCF-managed estate |
| Source state | One or more DVS instances at 6.5 or earlier |
| Target state | DVS 6.6 or later, depending on the lifecycle target |
| Primary risk | Management, vMotion, VM, LAG, PVLAN, or switch synchronization issues |
| Operating model | Production change window with rollback and recovery plan |
| Audience | Architects, virtualization engineers, platform owners, and VCF operators |

There are two important boundaries to keep in mind. First, the DVS upgrade is not the same thing as a host upgrade. You may have upgraded vCenter and ESXi and still be carrying older distributed switch versions. Second, the DVS version can block future vCenter upgrades. Broadcom documents version floors for later vCenter upgrades: vCenter 7.0 requires DVS 6.5 or later, vCenter 8.0 requires DVS 6.6 or later, and vCenter 9.0 requires DVS 7.0 or later. When the upgrade pre-check fails, Broadcom notes that the list of unsupported switches can be found in the vCenter upgrade error output under /var/log/vmware/upgrade/vcdb_req.err.

That makes this more than a networking cleanup task. It is lifecycle debt.
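
The version floors above can be expressed as a small preflight check. The following is a minimal sketch, not a Broadcom-provided tool: the $DvsFloor table and the Test-DvsVersionFloor helper are hypothetical names, and the floor values are simply the ones quoted in this article, so confirm them against the KB before relying on them.

```powershell
# Version floors as quoted in this article (assumption: confirm against the
# Broadcom KB for your exact target release before using this in a change).
$DvsFloor = @{
    '7.0' = [version]'6.5.0'
    '8.0' = [version]'6.6.0'
    '9.0' = [version]'7.0.0'
}

# Hypothetical helper: returns $true when a DVS version string meets the
# floor recorded for the target vCenter release.
function Test-DvsVersionFloor {
    param(
        [Parameter(Mandatory)][string]$DvsVersion,
        [Parameter(Mandatory)][string]$TargetVCenter
    )
    if (-not $DvsFloor.ContainsKey($TargetVCenter)) {
        throw "No floor recorded for vCenter $TargetVCenter"
    }
    [version]$DvsVersion -ge $DvsFloor[$TargetVCenter]
}

Test-DvsVersionFloor -DvsVersion '6.5.0' -TargetVCenter '8.0'   # False
Test-DvsVersionFloor -DvsVersion '6.6.0' -TargetVCenter '8.0'   # True
```

In a connected PowerCLI session, the same helper could be fed from Get-VDSwitch by passing each switch's Version property, which would list the switches that block a given vCenter target.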

Current State Versus Target State

Before planning the change window, separate the technical upgrade target from the operational migration path. A team may say, “We need the DVS upgraded for vCenter 8.” That is the target. It does not answer whether the safest path is an in-place DVS upgrade, a new parallel DVS, a workload-domain sequence, a management-domain-first plan, or a staged evacuation of high-risk workloads.

| Planning Area | Current State Question | Target State Requirement |
| --- | --- | --- |
| DVS version | Which switches are 6.5 or earlier? | Each switch meets the version floor for the target vCenter or VCF path |
| Host membership | Which hosts attach to each DVS? | Blast radius is understood per switch |
| Management placement | Is vCenter or SDDC Manager on a DVS-backed port group? | Recovery path exists if vCenter is unavailable |
| VMkernel traffic | Which DVS carries management, vMotion, vSAN, NFS, iSCSI, or replication? | Traffic classes are validated after upgrade |
| Port group design | Static, ephemeral, VLAN, PVLAN, LAG, NIOC, teaming? | Configuration is documented and recoverable |
| Migration method | In-place upgrade or new DVS? | Change path matches risk tolerance |
| Rollback | Snapshot, DVS export, host recovery access? | Rollback is tested enough to be trusted |

The key planning question is not, “Can I click upgrade?” The better question is, “What production dependency disappears if this switch behaves differently for 10 minutes?”

DVS Upgrade Flow at a Glance

The important thing to notice in the diagram below is that the DVS sits under multiple traffic classes. The upgrade decision is not isolated to VM networking. It can affect VMkernel adapters, host membership, vMotion behavior, management access, and workload availability depending on how the environment is built.

This is why the planning conversation has to include more than the virtualization team. Network operations, backup owners, application owners, VCF platform owners, and anyone responsible for out-of-band recovery may need to be part of the change plan.

Why the 6.5-to-6.6 Boundary Deserves Its Own Plan

Broadcom KB 324546 is explicit about the known issue boundary: upgrading from DVS 6.5 or earlier to DVS 6.6 or later introduces several failure modes that should be planned around, not discovered during the change window.

| What Can Break | Why It Matters | Planning Guardrail |
| --- | --- | --- |
| DVS upgrade fails while vMotion tasks are running | Parallel mobility can collide with switch upgrade behavior | Freeze vMotion activity during the DVS upgrade |
| Multiple DVS upgrades run too closely together | Pending tasks or timeouts can leave switches in a bad operational state | Upgrade one DVS at a time and validate before continuing |
| DVS appears out of sync | Host and vCenter switch state may not align cleanly | Resolve alerts and synchronization issues before upgrade |
| vMotion times out during upgrade | Mobility workflows can fail mid-change | Do not rely on active vMotion during the switch upgrade window |
| LAG-backed traffic loses connectivity | LAG port behavior can affect VMs or vmknics after upgrade | Inventory LAG-backed port groups and validate them specifically |
| PVLAN configuration causes upgrade issues | PVLAN-backed workloads may need evacuation or power-off handling | Identify PVLAN use before the window and plan workload treatment |
| All hosts on the DVS are affected | The switch is a shared failure domain | Treat each DVS as a production blast-radius boundary |

The operational takeaway is straightforward: do not combine DVS upgrades with uncontrolled DRS, active vMotion waves, unrelated host remediation, or multiple distributed switch upgrades in parallel. That may sound conservative, but it is exactly the kind of conservative that prevents a network-layer lifecycle task from turning into an outage bridge.
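
The LAG and PVLAN guardrails depend on knowing which port groups are affected. The sketch below tags port group records accordingly. It is illustrative only: Get-PortGroupRiskFlag is a hypothetical helper, and the ActiveUplinks field (a semicolon-joined list of active uplink names) is an assumption that would have to be added to whatever inventory report you produce.

```powershell
# Hypothetical helper: tags port group records that need specific attention.
# Assumes each record has PortGroup, VlanSpecType, and ActiveUplinks.
# ActiveUplinks is an assumed field, not something this article's report emits.
function Get-PortGroupRiskFlag {
    param(
        [Parameter(Mandatory, ValueFromPipeline)]$Row,
        [string[]]$LagName = @()
    )
    process {
        $flags = @()
        # PVLAN-backed port groups carry a VLAN spec whose type name ends in PvlanSpec.
        if ($Row.VlanSpecType -like '*PvlanSpec') { $flags += 'PVLAN' }
        # A port group is LAG-backed when one of its active uplinks is a LAG name.
        $uplinks = @("$($Row.ActiveUplinks)" -split ';' | Where-Object { $_ })
        if ($uplinks | Where-Object { $LagName -contains $_ }) { $flags += 'LAG' }
        [pscustomobject]@{
            PortGroup = $Row.PortGroup
            Flags     = ($flags -join ',')
        }
    }
}

# Sample records standing in for a real inventory report.
$sampleRows = @(
    [pscustomobject]@{ PortGroup = 'pg-app'; VlanSpecType = 'VmwareDistributedVirtualSwitchVlanIdSpec'; ActiveUplinks = 'Uplink 1;Uplink 2' }
    [pscustomobject]@{ PortGroup = 'pg-db';  VlanSpecType = 'VmwareDistributedVirtualSwitchPvlanSpec';  ActiveUplinks = 'lag1' }
)
$sampleRows | Get-PortGroupRiskFlag -LagName 'lag1'
```

In a live environment, the LAG names and active uplinks would likely come from the switch and port group ExtensionData configuration. Verify those property paths in your PowerCLI version before trusting the flags.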

Compatibility Guardrails Before the Change Window

Compatibility planning starts with the target lifecycle path. If the DVS upgrade is being done because of a vCenter or VCF upgrade, document the required version floor before deciding the target DVS version. Broadcom documents that vCenter upgrades can fail when unsupported DVS versions are present, and specifically lists DVS version requirements for vCenter 7.0, 8.0, and 9.0.

| Lifecycle Target | Minimum DVS Version Documented by Broadcom |
| --- | --- |
| vCenter 7.0 | DVS 6.5 or later |
| vCenter 8.0 | DVS 6.6 or later |
| vCenter 9.0 | DVS 7.0 or later |

That does not mean every environment should jump directly to the newest possible DVS version in one step. Broadcom’s DVS upgrade process guidance recommends handling the 6.5-to-6.6 transition first, validating, and then continuing to later versions such as 7.0.3 where appropriate. It also describes taking a cold snapshot of vCenter from the ESXi host where vCenter resides, setting DRS to manual, upgrading the DVS, validating, and then removing the snapshot after successful validation.

There is also a workload mobility consideration. Broadcom documents that vMotion between DVS instances of different versions is not supported because vCenter checks the DVS versions during vMotion pre-checks. The supported remediation is to upgrade the lower-version DVS to match the higher version or use a cold migration path where needed.

That matters when building a new target DVS. A parallel switch can reduce the risk of changing the existing DVS in place, but it does not eliminate compatibility planning. You still need to understand whether workloads can move live, whether the source and destination DVS versions match, and whether any workaround introduces network-loss risk.
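
As a rough illustration of that check, the sketch below compares source and destination DVS versions and returns a planning label. Get-DvsMigrationPath and its two labels are hypothetical. A matching version only makes a workload a candidate for live migration; it does not prove that vMotion will succeed.

```powershell
# Hypothetical helper: classifies a source/destination DVS pair for planning.
# Version strings are assumed to be three-part values such as '6.6.0'.
function Get-DvsMigrationPath {
    param(
        [Parameter(Mandatory)][string]$SourceVersion,
        [Parameter(Mandatory)][string]$DestinationVersion
    )
    if ([version]$SourceVersion -eq [version]$DestinationVersion) {
        'LiveCandidate'
    }
    else {
        # Different versions: align the DVS versions first or plan a cold migration.
        'AlignVersionsOrColdMigrate'
    }
}

Get-DvsMigrationPath -SourceVersion '6.5.0' -DestinationVersion '7.0.3'
```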

In-Place Upgrade Versus New Target DVS

Broadcom KB 324546 explicitly states that to avoid possible issues, you can create a new DVS at the appropriate version and move hosts and network objects to it. It also warns that an in-place DVS upgrade affects all hosts attached to that DVS at once.

That creates a real architectural decision.

| Option | Best Fit | Strengths | Risks |
| --- | --- | --- | --- |
| In-place DVS upgrade | Smaller environments, well-documented switches, lower complexity | Faster, fewer duplicate constructs, less remapping | Affects all attached hosts at once; rollback must be strong |
| New target DVS | Higher-risk environments, poor documentation, management-network concern, major redesign | Enables controlled mapping and staged migration | Requires more planning, port group parity, uplink mapping, and migration testing |
| Hybrid approach | Multiple switches or workload domains with different risk profiles | Allows low-risk switches in place and high-risk switches by migration | Requires disciplined documentation to avoid inconsistent patterns |

The decision should not be made only on the number of hosts. A four-host management cluster with vCenter, SDDC Manager, NSX Managers, DNS, identity, and backup services may be more sensitive than a larger workload cluster with simple VM port groups. In VCF environments, the management domain deserves special handling because it carries the control-plane dependencies used to recover the rest of the platform.

Rollback Is a Design, Not a Button

Rollback planning for a DVS upgrade has three layers. The first layer is vCenter rollback. Broadcom’s DVS upgrade process guidance describes powering off vCenter from the ESXi host where it resides and taking a cold snapshot before upgrading older DVS versions. That gives you a clean vCenter state checkpoint before the switch upgrade begins.

The second layer is DVS configuration backup. Broadcom documents that DVS configurations and port groups can be exported, imported, and restored. Exporting preserves valid network settings in a file that can be used to replicate or restore switch configuration, while restore operations overwrite current switch settings with the file contents. Broadcom also notes that importing a backup from a higher vCenter version into a lower vCenter version is not supported.

The third layer is management recovery. If vCenter is connected to the same DVS it manages, the recovery path can become circular. Broadcom documents that static and dynamic port bindings require vCenter for certain operations, while ephemeral port groups allow the host to create and assign ports directly when vCenter is unavailable. Broadcom also frames ephemeral port groups as a recovery design for management components such as the vCenter Server VM and SDDC Manager VM, with the tradeoff that port-level permissions and historical port context are reduced.

That means rollback should be documented as a sequence, not a vague statement.

| Rollback Question | Required Answer |
| --- | --- |
| Where is vCenter running? | Exact ESXi host, datastore, folder, and port group |
| Is vCenter on a DVS-backed port group? | Yes/no, binding type, VLAN, uplink path |
| Is there an ephemeral recovery port group? | Name, VLAN, teaming, host visibility, validation status |
| Is the DVS exported? | Export path, timestamp, included port groups |
| Is there a cold vCenter snapshot? | Snapshot name, timestamp, owner, removal condition |
| Is out-of-band host access available? | iLO/iDRAC/console access and credentials process |
| Who can modify physical switch ports? | Network contact, LACP/LAG owner, escalation path |
| What triggers rollback? | Specific failure criteria, not “if things look bad” |

A rollback plan that depends entirely on vCenter being healthy is not a rollback plan for a DVS change.

Management-Network Risk in VCF and vSphere Environments

The management network deserves its own section because it is where many teams accidentally build a dependency loop. Broadcom documents the problem clearly: when the vCenter Server VM is connected to the DVS it manages, a vCenter outage can make it difficult to reconnect or reconfigure the VM if the port group uses static or dynamic binding. Ephemeral binding allows the ESXi host to assign the port without vCenter, which creates a recovery path for the vCenter VM.

VCF design guidance also supports ephemeral port binding for management port groups as a recovery option for the vCenter managing the distributed switch. The tradeoff is that some port-level permissions and historical port state controls are reduced across power cycles.

This is not a recommendation to make every port group ephemeral. It is a recommendation to be deliberate.

| Management Component | Planning Question |
| --- | --- |
| vCenter Server Appliance | Is it on a port group that can be recovered without vCenter? |
| SDDC Manager | Is its management network documented and recoverable? |
| NSX Managers | Are their management networks dependent on the DVS being changed? |
| DNS, NTP, identity, backup | Are any required for rollback or validation? |
| ESXi management vmk0 | Is it on the DVS, and is host console access available? |
| Physical uplinks | Are uplinks, VLANs, trunks, and LAGs documented by host? |

One nuance matters: Broadcom notes that ESXi management VMkernel ports do not require ephemeral binding in the same way the vCenter Server VM does for recovery scenarios. The design concern is primarily the VM connectivity of the vCenter appliance when vCenter is unavailable.

For a VCF management domain, validate this before the change window, not during it.
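
One way to make that validation concrete is to check a port group report for an ephemeral port group on the management VLAN. The sketch below assumes records with VDSwitch, Binding, and VlanId fields. Test-EphemeralRecoveryPortGroup and the sample names are hypothetical. A match only shows that such a port group exists in the report, not that the recovery procedure has been exercised.

```powershell
# Hypothetical helper: returns $true when at least one ephemeral port group
# exists on the named switch for the given management VLAN.
function Test-EphemeralRecoveryPortGroup {
    param(
        [Parameter(Mandatory)][object[]]$PortGroup,
        [Parameter(Mandatory)][string]$VDSwitch,
        [Parameter(Mandatory)][int]$ManagementVlanId
    )
    $match = @($PortGroup | Where-Object {
        $_.VDSwitch -eq $VDSwitch -and
        $_.Binding -eq 'ephemeral' -and
        "$($_.VlanId)" -eq "$ManagementVlanId"
    })
    $match.Count -gt 0
}

# Sample records standing in for a real port group report.
$samplePortGroups = @(
    [pscustomobject]@{ VDSwitch = 'dvs-mgmt'; PortGroup = 'pg-mgmt';          Binding = 'earlyBinding'; VlanId = 10 }
    [pscustomobject]@{ VDSwitch = 'dvs-mgmt'; PortGroup = 'pg-mgmt-recovery'; Binding = 'ephemeral';    VlanId = 10 }
)
Test-EphemeralRecoveryPortGroup -PortGroup $samplePortGroups -VDSwitch 'dvs-mgmt' -ManagementVlanId 10
```

If the check comes back false, the PowerCLI New-VDPortgroup cmdlet accepts a PortBinding value of Ephemeral, which is one way to create the recovery port group ahead of the window.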

Host Sequencing: Treat the DVS as the Failure Domain

A DVS is not upgraded host by host in the same way a remediation baseline or image lifecycle task might be. The DVS object spans its member hosts, and Broadcom warns that the upgrade affects all hosts connected to that DVS at once.

That changes how sequencing should be planned.

| Sequencing Area | Guardrail |
| --- | --- |
| DRS | Set DRS to manual before the upgrade and avoid accepting DRS recommendations during the upgrade window |
| vMotion | Freeze planned vMotion activity during the DVS upgrade |
| Multiple DVS instances | Upgrade and validate one DVS at a time |
| Cluster order | Start with the lowest-risk switch or a non-management workload domain when possible |
| Management domain | Treat as a special case with stronger rollback and recovery validation |
| PVLAN usage | Identify affected workloads and plan evacuation or power-off handling if needed |
| LAG usage | Validate LAG-backed VM and VMkernel connectivity before and after the upgrade |
| Validation | Confirm host, VMkernel, VM, and physical network paths before continuing |

This is also where operations discipline matters. Do not let the DVS upgrade window become a general maintenance window where host patching, storage changes, firewall changes, vCenter upgrades, and application migrations all happen at the same time. When multiple lifecycle tasks overlap, troubleshooting becomes guesswork.

The cleaner pattern is to stabilize the switch and host state, freeze mobility and automation that can change placement, upgrade one DVS, validate host and workload traffic, remove rollback checkpoints only after the validation period, and then move to the next switch. Broadcom’s DVS process guidance also starts with resolving DVS alerts and out-of-sync conditions before upgrade work begins.
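
A readiness gate for that first step might look like the sketch below. It works on plain records with Host, ConnectionState, and DvsStatus fields. Get-DvsHostNotReady is a hypothetical helper, and the DvsStatus values are an assumption: they are modeled on the per-host member status that the vSphere API appears to expose under the switch runtime information, which you should verify for your version.

```powershell
# Hypothetical helper: returns hosts that should block the DVS upgrade.
# A host is considered ready only when it is Connected and its DVS member
# status is 'up' (assumed status vocabulary; verify against your vCenter).
function Get-DvsHostNotReady {
    param([Parameter(Mandatory)][object[]]$HostState)
    $HostState | Where-Object {
        $_.ConnectionState -ne 'Connected' -or $_.DvsStatus -ne 'up'
    }
}

# Sample records standing in for a real host state report.
$sampleHosts = @(
    [pscustomobject]@{ Host = 'esx01'; ConnectionState = 'Connected';     DvsStatus = 'up' }
    [pscustomobject]@{ Host = 'esx02'; ConnectionState = 'Connected';     DvsStatus = 'outOfSync' }
    [pscustomobject]@{ Host = 'esx03'; ConnectionState = 'NotResponding'; DvsStatus = 'up' }
)
Get-DvsHostNotReady -HostState $sampleHosts | Select-Object Host, ConnectionState, DvsStatus
```

This gate is deliberately strict: a host in maintenance mode is also reported, so someone has to decide explicitly whether it is acceptable to proceed.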

Documentation to Capture Before Touching Production

The most valuable artifact before a DVS upgrade is not the change ticket. It is the switch dependency map.

At minimum, capture the following before the maintenance window:

| Documentation Area | Capture |
| --- | --- |
| DVS inventory | Name, version, datacenter, clusters, hosts, uplinks, MTU |
| Host membership | Every ESXi host attached to each DVS |
| Port groups | Name, VLAN, PVLAN, binding type, port count, teaming, failover order |
| VMkernel adapters | Management, vMotion, vSAN, NFS, iSCSI, replication, backup |
| Workload dependencies | Critical VMs, appliances, backup proxies, monitoring collectors |
| Management appliances | vCenter, SDDC Manager, NSX Managers, DNS, NTP, identity |
| LAG/LACP | LAG names, uplinks, physical switch ports, network owner |
| PVLAN | Primary and secondary VLAN mappings, affected port groups |
| NIOC and traffic shaping | Shares, reservations, limits, policies |
| Security settings | Forged transmits, MAC changes, promiscuous mode |
| Recovery paths | Ephemeral port group, temporary vSS plan, console access |
| Backups | DVS export, vCenter snapshot, configuration evidence |

This documentation should be usable by someone who did not design the original environment. That is the standard. A DVS upgrade plan that requires tribal knowledge is not ready for production.
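
For the VMkernel adapter part of the map, a small formatter can turn adapter objects into readable rows. This is a sketch under assumptions: Get-VmkRole is a hypothetical helper, and it expects the traffic-flag properties that PowerCLI VMkernel adapter objects normally carry. Adapters with no service flag are labeled for manual review because NFS, iSCSI, replication, and backup traffic cannot be inferred from those flags.

```powershell
# Hypothetical helper: summarizes the enabled services on a VMkernel adapter.
function Get-VmkRole {
    param([Parameter(Mandatory, ValueFromPipeline)]$Adapter)
    process {
        $roles = @()
        if ($Adapter.ManagementTrafficEnabled)     { $roles += 'Management' }
        if ($Adapter.VMotionEnabled)               { $roles += 'vMotion' }
        if ($Adapter.VsanTrafficEnabled)           { $roles += 'vSAN' }
        if ($Adapter.FaultToleranceLoggingEnabled) { $roles += 'FT' }
        # No service flag set: the purpose has to be confirmed by a person.
        if (-not $roles) { $roles += 'ReviewManually' }
        [pscustomobject]@{
            Host      = "$($Adapter.VMHost)"
            Vmk       = $Adapter.Name
            PortGroup = $Adapter.PortGroupName
            IP        = $Adapter.IP
            Mtu       = $Adapter.Mtu
            Roles     = ($roles -join ',')
        }
    }
}

# Sample adapter records standing in for PowerCLI output.
$sampleAdapters = @(
    [pscustomobject]@{ VMHost = 'esx01'; Name = 'vmk0'; PortGroupName = 'pg-mgmt'; IP = '10.0.10.11'; Mtu = 1500; ManagementTrafficEnabled = $true;  VMotionEnabled = $false; VsanTrafficEnabled = $false; FaultToleranceLoggingEnabled = $false }
    [pscustomobject]@{ VMHost = 'esx01'; Name = 'vmk2'; PortGroupName = 'pg-nfs';  IP = '10.0.30.11'; Mtu = 9000; ManagementTrafficEnabled = $false; VMotionEnabled = $false; VsanTrafficEnabled = $false; FaultToleranceLoggingEnabled = $false }
)
$sampleAdapters | Get-VmkRole
```

In a connected session, the input would typically come from Get-VMHost piped to Get-VMHostNetworkAdapter with the VMKernel switch, and the result could be exported to CSV alongside the other preflight reports.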

Tooling and Automation Considerations

Automation should support this change, not hide the risk. Use PowerCLI for inventory capture, comparison, export evidence, and repeatable validation. Avoid using automation to mass-upgrade switches until the manual runbook and rollback plan are proven.

The following example is intentionally report-first. It captures distributed switch versions, host membership, port group binding types, VLAN/PVLAN information, and exports DVS configuration files for review. Broadcom documents DVS export/restore behavior for switch and port group configuration, and PowerCLI cmdlet documentation exposes distributed port group properties such as port binding, including static and ephemeral binding types.

# DVS upgrade preflight capture (report only)
# Update $vCenter and $OutputRoot for your environment.

$vCenter    = 'vcsa01.example.local'
$OutputRoot = 'C:\Temp\DVS-Upgrade-Preflight-{0}' -f (Get-Date -Format 'yyyyMMdd-HHmm')

New-Item -ItemType Directory -Path $OutputRoot -Force | Out-Null

Connect-VIServer -Server $vCenter

# Capture DVS version, MTU, and host membership.
Get-VDSwitch | ForEach-Object {
    $vds   = $_
    $hosts = @(Get-VMHost -DistributedSwitch $vds | Sort-Object Name)

    [pscustomobject]@{
        VDSwitch = $vds.Name
        Version  = $vds.Version
        MTU      = $vds.Mtu
        NumHosts = $hosts.Count
        Hosts    = ($hosts.Name -join ';')
    }
} | Export-Csv -Path (Join-Path $OutputRoot 'vds-switch-hosts.csv') -NoTypeInformation

# Capture port group binding and VLAN/PVLAN details.
Get-VDSwitch | ForEach-Object {
    $vds = $_

    Get-VDPortgroup -VDSwitch $vds | ForEach-Object {
        $pg   = $_
        $cfg  = $pg.ExtensionData.Config
        $vlan = $cfg.DefaultPortConfig.Vlan   # may be a VLAN ID, trunk, or PVLAN spec

        [pscustomobject]@{
            VDSwitch     = $vds.Name
            PortGroup    = $pg.Name
            Binding      = $cfg.Type
            NumPorts     = $cfg.NumPorts
            # Guard against a missing VLAN spec so one port group cannot abort the report.
            VlanSpecType = if ($vlan) { $vlan.GetType().Name } else { $null }
            VlanId       = if ($vlan -and $vlan.PSObject.Properties.Match('VlanId').Count) { $vlan.VlanId } else { $null }
            PvlanId      = if ($vlan -and $vlan.PSObject.Properties.Match('PvlanId').Count) { $vlan.PvlanId } else { $null }
        }
    }
} | Export-Csv -Path (Join-Path $OutputRoot 'vds-portgroups.csv') -NoTypeInformation

# Export each DVS configuration with port groups.
# Validate the export files and store them with the change record.
Get-VDSwitch | ForEach-Object {
    # Replace characters that are not valid in Windows file names.
    $safeName    = $_.Name -replace '[\\/:*?"<>|]', '_'
    $destination = Join-Path $OutputRoot "$safeName-with-portgroups.zip"

    Export-VDSwitch `
        -VDSwitch $_ `
        -Description "Pre-upgrade DVS export $(Get-Date -Format s)" `
        -Destination $destination `
        -Force
}

Disconnect-VIServer -Server $vCenter -Confirm:$false

Successful execution should produce CSV files and DVS export archives under the output folder. The CSVs become the human-readable planning artifacts. The export files become part of the rollback evidence.

What can go wrong? The script can only report what vCenter can see. If a host is disconnected, a switch is out of sync, permissions are incomplete, or the PowerCLI session is pointed at the wrong vCenter, the evidence can be misleading. Treat the output as a starting point for review, not as a substitute for operational validation.
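
One cheap review step is to record the size and hash of each export archive so the change record can show the files were not empty and have not changed since capture. The sketch below uses only built-in PowerShell cmdlets. Get-ExportEvidence is a hypothetical helper name, and a non-empty archive is not proof that the export can be restored.

```powershell
# Hypothetical helper: lists DVS export archives in a folder with size and
# SHA-256 hash so they can be attached to the change record.
function Get-ExportEvidence {
    param([Parameter(Mandatory)][string]$Path)
    Get-ChildItem -Path $Path -Filter '*.zip' -File | ForEach-Object {
        [pscustomobject]@{
            File   = $_.Name
            Bytes  = $_.Length
            SHA256 = (Get-FileHash -Path $_.FullName -Algorithm SHA256).Hash
            Empty  = ($_.Length -eq 0)
        }
    }
}
```

Pointing the helper at the preflight output folder and exporting its result to CSV keeps the evidence next to the inventory reports.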

Phased DVS Upgrade Planning Sequence

A strong DVS upgrade plan should move through phases. The phases below align with Broadcom’s recommendation to resolve switch health issues first, take a vCenter rollback checkpoint, move DRS to manual, upgrade the DVS, validate, and only then remove rollback protection.

Phase 0: Confirm the Lifecycle Driver

Identify why the DVS needs to move forward. Is this required for a vCenter upgrade? A VCF lifecycle path? A hardware refresh? A new workload domain standard? A security baseline? Document the target DVS version and the minimum required version separately.

Phase 1: Inventory the Switch Dependency Map

Capture every DVS, version, host, port group, VMkernel adapter, uplink, VLAN, PVLAN, LAG, and management appliance dependency. Pay special attention to switches carrying management traffic or VCF control-plane components.

Phase 2: Decide In-Place Versus New DVS

Use the risk profile to decide the method. In-place is simpler but has a broader immediate blast radius. A new DVS allows cleaner mapping and staged movement, but requires more design discipline and can introduce mobility constraints when DVS versions differ. Broadcom documents that vMotion between different DVS versions is not supported, so this decision must include workload migration behavior.

Phase 3: Build the Rollback and Recovery Path

Take the vCenter checkpoint according to the approved process. Export DVS configurations. Validate host console access. Confirm the management recovery design. If vCenter is on the DVS it manages, validate the ephemeral recovery port group or an equivalent temporary standard-switch recovery procedure before the window. Broadcom documents both the ephemeral recovery pattern and a temporary standard switch recovery approach for situations where vCenter networking is unavailable.

Phase 4: Freeze Placement Automation

Set DRS to manual. Stop planned vMotion activity. Pause automation that could move workloads or reconfigure networks during the upgrade. This includes lifecycle workflows, backup proxy relocation, load-balancing automation, and any operational process that may trigger vMotion during the window.
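
Before changing DRS, it helps to record each cluster's current automation level so it can be restored after validation. The sketch below builds that record from cluster-like objects. New-DrsFreezePlan is a hypothetical helper, and the property names mirror what PowerCLI cluster objects normally expose.

```powershell
# Hypothetical helper: records the current DRS level per cluster and marks
# which clusters actually need to be switched to Manual for the window.
function New-DrsFreezePlan {
    param([Parameter(Mandatory, ValueFromPipeline)]$Cluster)
    process {
        $level = "$($Cluster.DrsAutomationLevel)"
        [pscustomobject]@{
            Cluster       = $Cluster.Name
            OriginalLevel = $level
            NeedsChange   = [bool]($Cluster.DrsEnabled -and $level -ne 'Manual')
        }
    }
}

# Sample cluster records standing in for Get-Cluster output.
$sampleClusters = @(
    [pscustomobject]@{ Name = 'mgmt-cl01'; DrsEnabled = $true;  DrsAutomationLevel = 'FullyAutomated' }
    [pscustomobject]@{ Name = 'wld-cl01';  DrsEnabled = $true;  DrsAutomationLevel = 'Manual' }
    [pscustomobject]@{ Name = 'edge-cl01'; DrsEnabled = $false; DrsAutomationLevel = 'FullyAutomated' }
)
$sampleClusters | New-DrsFreezePlan
```

With the plan saved, Set-Cluster with its DrsAutomationLevel parameter can apply Manual to the clusters marked NeedsChange, and the OriginalLevel column tells the operator what to restore once the DVS has been validated.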

Phase 5: Upgrade One DVS and Validate

Upgrade a single DVS, then validate before continuing.

| Validation Area | Test |
| --- | --- |
| vCenter access | UI, API, appliance management, DNS/NTP reachability |
| ESXi management | Host reconnects, management vmk ping, hostd/vpxa health |
| vMotion | Controlled post-upgrade test only after the DVS upgrade completes |
| vSAN/storage | vmk reachability, cluster health, storage paths |
| VM networks | Critical workload reachability and monitoring |
| LAG | Uplink state, load distribution, physical switch health |
| PVLAN | Affected VM connectivity and isolation behavior |
| DVS state | No out-of-sync or host switch warnings |
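
Validation results are easier to act on when the continue-or-roll-back rule is written down before the window. The sketch below is one possible shape for that rule, not a prescribed policy: Get-UpgradeDecision, the Passed and RollbackTrigger fields, and the three outcome labels are all hypothetical. Which checks count as rollback triggers is a decision for the change owner.

```powershell
# Hypothetical helper: turns validation check results into a single decision.
# Each check record is assumed to have Area, Passed, and RollbackTrigger fields.
function Get-UpgradeDecision {
    param([Parameter(Mandatory)][object[]]$Check)
    $failed = @($Check | Where-Object { -not $_.Passed })
    if ($failed.Count -eq 0) { return 'Continue' }
    # Any failed check that was pre-agreed as a rollback trigger forces rollback.
    if ($failed | Where-Object { $_.RollbackTrigger }) { return 'Rollback' }
    # Failures that are not rollback triggers stop progress until reviewed.
    'Hold'
}

# Sample results standing in for a real validation run.
$sampleChecks = @(
    [pscustomobject]@{ Area = 'vCenter access';  Passed = $true;  RollbackTrigger = $true }
    [pscustomobject]@{ Area = 'ESXi management'; Passed = $true;  RollbackTrigger = $true }
    [pscustomobject]@{ Area = 'LAG';             Passed = $false; RollbackTrigger = $false }
)
Get-UpgradeDecision -Check $sampleChecks
```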

Phase 6: Document the New State

After validation, update the source-of-truth record. Do not leave the environment with upgraded switch versions but stale diagrams, stale port group documentation, or old rollback assumptions.

Phase 7: Continue to the Next DVS or Lifecycle Step

Only move forward after the first DVS is stable. If the DVS upgrade is a prerequisite for vCenter or VCF lifecycle work, treat this as a completed dependency before starting the next lifecycle phase.

Risk Register for the Change Plan

A DVS upgrade risk register should be short enough to use and specific enough to matter.

| Risk | Trigger | Mitigation |
| --- | --- | --- |
| vCenter loses network access | vCenter VM is on affected DVS-backed static port group | Ephemeral recovery PG or temporary vSS recovery plan |
| vMotion fails or times out | vMotion occurs during DVS upgrade | Freeze mobility during upgrade, test after completion |
| DVS upgrade fails | Switch is out of sync or parallel task conflicts exist | Resolve alerts first, upgrade one DVS at a time |
| VM network loss on LAG-backed PG | LAG-related known issue after upgrade | Inventory LAG use and validate affected PGs specifically |
| PVLAN-backed workloads block upgrade | PVLAN feature edge case | Identify PVLAN usage and plan evacuation or power-off if needed |
| vCenter upgrade blocked later | DVS version below required floor | Validate DVS versions against target vCenter requirements |
| Rollback cannot be executed | Recovery depends on the failed vCenter network | Prove console, snapshot, export, and recovery PG access before change |
| Documentation drift | Switch config changes without source-of-truth update | Attach exports, reports, and validation evidence to the change record |

This is the level of specificity that makes a maintenance window executable.

Practical Guardrails Before Production

Before approving the production change, the plan should satisfy these guardrails:

| Guardrail | Required Evidence |
| --- | --- |
| Compatibility confirmed | Target vCenter or VCF path mapped to minimum DVS version |
| DVS inventory complete | Switch, host, port group, uplink, VLAN, PVLAN, and LAG report |
| Management recovery validated | vCenter and SDDC Manager network recovery path documented |
| DRS and vMotion frozen | Change steps explicitly disable or pause placement activity |
| Rollback ready | Cold vCenter snapshot, DVS exports, and restore criteria |
| One-switch sequencing | No parallel DVS upgrades unless explicitly justified |
| Physical network aligned | LAG, trunk, VLAN, and uplink ownership confirmed |
| Validation scripted or documented | Post-upgrade checks defined before the window |
| Documentation update required | New DVS version and configuration evidence captured after success |

The upgrade itself may be a short action. The planning should not be.

Conclusion

A distributed switch upgrade is not just a version bump. It is a change to the network abstraction shared by hosts, VMkernel adapters, management appliances, and workloads. The DVS 6.5-or-earlier to DVS 6.6-or-later boundary deserves special handling because Broadcom documents known issues around DRS, vMotion, multiple DVS upgrades, LAG behavior, PVLAN handling, and switch synchronization. The safest teams treat that boundary as a lifecycle event with compatibility review, rollback design, host sequencing, management-network recovery, and documentation before production changes begin.

The practical path is simple: inventory first, validate compatibility, design rollback, protect management access, upgrade one DVS at a time, and document the new state. That is how an old distributed switch moves forward without turning into the outage no one wanted to own.

External Links

| Source | Purpose |
| --- | --- |
| Broadcom KB 324546: Known issues when upgrading a DVS from 6.5 or earlier to DVS 6.6 or later | Primary source for the known DVS 6.5-to-6.6+ upgrade issues |
| Broadcom KB 413452: Upgrade vSphere Distributed Switch prerequisites and process | DVS upgrade process, DRS guidance, vCenter snapshot guidance, and validation sequence |
| Broadcom KB 318256: Source vCenter Server has instances of Distributed Virtual Switch at unsupported versions | DVS version requirements for vCenter upgrade paths |
| Broadcom KB 318582: Migrating a virtual machine between two DVS of different versions is not supported | vMotion compatibility limitation between DVS versions |
| Broadcom KB 2034602: Export, import, and restore vSphere Distributed Switch configurations | DVS configuration backup and restore behavior |
| Broadcom KB 324492: Static, non-ephemeral, or ephemeral port binding | Port binding behavior and ephemeral recovery context |
| Broadcom KB 426300: Moving vCenter Server to an ephemeral port group on VDS | vCenter recovery pattern when vCenter is connected to the DVS it manages |
| Broadcom KB 318083: VMware Cloud Foundation design decision for ephemeral management port binding | VCF management-domain design context for ephemeral port binding |
| Broadcom KB 318719: vCenter network connectivity lost recovery | Temporary standard switch recovery path for vCenter network loss |
| Broadcom PowerCLI: New-VDPortgroup cmdlet reference | PowerCLI reference for distributed port group configuration and binding behavior |

The post DVS Upgrade Guardrails: What Can Break When Old Distributed Switches Move Forward appeared first on Digital Thought Disruption.