When Fibre Channel Paths Lie: A Safe Fabric Login Reset Runbook for ESXi

There are storage incidents where the host looks half-recovered.

The fabric switch is back online. The link light is good. The array port is healthy. Some paths may even show up again. But inside ESXi, the storage view still does not match reality. A datastore has fewer paths than expected. An RDM-backed workload is not behaving correctly. A rescan removes a stale entry but does not restore the missing target. The environment is no longer in a clean outage, but it is not healthy either.

That is the operational tension behind a Fabric Login reset.

Broadcom KB 324547 documents the ESXi-side commands to force a Fabric Login reset for native Fibre Channel and FCoE adapters on ESXi 7.x and 8.x: esxcli storage san fc reset -A vmhbaX for FC and esxcli storage san fcoe reset -A vmhbaX for FCoE. The ESXCLI command reference describes these reset operations as LIP reset commands against a specific FC or FCoE adapter.

That command can be useful.

It can also be dangerous when used as a reflex.

This runbook is not about memorizing the reset syntax. It is about deciding when a FLOGI reset is the right recovery step, when it is the wrong step, and how to guard the blast radius in a vSphere or VMware Cloud Foundation environment where storage visibility is shared, multipathed, and business-critical.

Scenario

Use this runbook when an ESXi host has stale, missing, dead, or misleading Fibre Channel or FCoE path visibility after a fabric, switch, array-port, host-adapter, or reboot-related event.

Typical examples include:

A storage switch rebooted, crashed, failed over, or had a port-level event.

ESXi paths remain Dead, Offline, or missing after the physical fabric appears healthy.

A standard storage rescan updates the UI but does not restore usable paths.

One adapter or fabric side is not discovering targets that should be visible.

A VM using an RDM stopped or failed after a reboot, and the underlying device pathing is suspect.

Broadcom also documents a related dead-path scenario where, after a storage switch event, ESXi may show paths as Dead; a rescan may remove dead entries but fail to bring targets back to Active/On. In that condition, the physical link can be up while the HBA driver fails to initiate the needed FLOGI/PLOGI sequence, leaving the host effectively “awake but blind” to the fabric.

That distinction matters. A link-up state is not the same as healthy end-to-end storage access.

Why This Matters Operationally

In a VMware Cloud Foundation (VCF) or enterprise vSphere environment, FC pathing is not just a host-local concern. A single ESXi host may carry management workloads, workload-domain VMs, database VMs with RDMs, shared VMFS datastores, backup proxies, or platform services that other teams assume are stable.

A FLOGI reset acts at the adapter level. It is not a datastore refresh. It is not a zoning correction. It is not an array masking fix. It is a targeted attempt to make the adapter reinitialize its relationship with the fabric.

That means the reset should only happen after the team has answered four questions:

Is the fabric actually healthy now?

Does the host have enough surviving paths to tolerate the reset?

Is the affected adapter correctly identified?

Has a less disruptive rescan failed to restore path state?

The safest reset is the one that happens after you prove it is needed.

Symptoms and Risk Signals

The symptoms usually show up as a mismatch between expected and observed path state.

| Area | What You May See | Why It Matters |
| --- | --- | --- |
| vSphere Client | Fewer paths than expected, dead paths, missing devices, datastore warnings | UI confirms the host view is degraded but may not explain why |
| ESXCLI path view | Paths missing, dead, or not claimed as expected | Helps separate UI refresh issues from PSA/NMP path state |
| SAN fabric | Host WWPN not logged in, stale login, wrong switch port state, zoning mismatch | Confirms whether the issue is outside ESXi |
| Array | Initiator offline, host object not logged in, LUN masking mismatch | Confirms whether targets should be visible |
| Workload | VM stun, datastore latency, RDM device unavailable, guest cluster errors | Defines business impact and maintenance urgency |

The major risk is using the reset while the adapter still carries the only viable path to a device. In that case, the reset can turn a degraded-but-running state into an application outage.

Decision Workflow: Do Not Start With the Reset

The intended decision path places the reset late in the workflow, after coordination, evidence collection, fabric validation, and a targeted rescan.

The reset only appears after the host, fabric, and array views have been reconciled. That is what prevents a valid command from becoming an avoidable outage.

Prerequisites and Safety Checks

Before running a Fabric Login reset, collect enough information to know what you are touching.

Confirm the Affected Host and Domain

Document the host, cluster, workload domain, and management impact.

For VCF environments, be especially careful if the affected host runs management components such as vCenter, NSX appliances, SDDC Manager, backup infrastructure, monitoring collectors, or shared services. A storage operation against one host can still have platform-level consequences if the wrong workloads are sitting on it.

Minimum information to capture:

vCenter:
Cluster:
VCF domain:
ESXi host:
Host management IP:
Maintenance mode state:
Critical VMs currently running:
Affected datastores:
Affected RDMs:
Affected adapter:
Affected fabric:

Confirm Redundancy Before Touching the Adapter

Do not reset an adapter until you know whether the affected storage devices still have healthy alternate paths.

Use ESXCLI to list adapters, paths, NMP path ownership, and filesystems. The command reference supports adapter listing, adapter rescans, path listing, NMP path listing, and filesystem listing, which makes these good first-pass inventory commands before any reset.

# List storage adapters
esxcli storage core adapter list

# List FC adapter attributes
esxcli storage san fc list

# List FCoE adapter attributes, if applicable
esxcli storage san fcoe list

# List storage paths visible to the host
esxcli storage core path list

# Show NMP path details, including SATP and PSP information
esxcli storage nmp path list

# Show mounted filesystems visible to the host
esxcli storage filesystem list

For a focused check, filter by the suspected adapter:

# Replace vmhbaX with the suspected adapter
esxcli storage core adapter device list -A vmhbaX
esxcli storage core path list | grep -i vmhbaX
esxcli storage nmp path list | grep -i -A 12 -B 2 vmhbaX

You are looking for a clear answer:

Expected paths per device:
Observed paths per device:
Healthy alternate paths:
Dead or missing paths:
Datastores impacted:
RDM devices impacted:

If you cannot prove alternate path health, stop and treat the reset as a disruptive change.
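The alternate-path question can be partly automated. The following is a minimal sketch, not a vetted tool: devices_without_alternate is a hypothetical helper name, and the script assumes that each record in the esxcli storage core path list output carries Device:, Adapter:, and State: lines in that order. Confirm that against real output on the host before relying on it.

```shell
#!/bin/sh
# Sketch only. Reads `esxcli storage core path list` output on stdin.
# Assumption: every path record has "Device:", "Adapter:" and "State:" lines,
# in that order. devices_without_alternate is a hypothetical helper name.
# Prints each device that has a path on the suspect adapter ($1) but no
# active path through any other adapter.
devices_without_alternate() {
    awk -v hba="$1" '
        /^ *Device: /  { dev = $2 }
        /^ *Adapter: / { adp = $2 }
        /^ *State: /   {
            if (adp == hba) on_hba[dev] = 1        # device uses the suspect adapter
            else if ($2 == "active") alt[dev] = 1  # active path via another adapter
        }
        END {
            for (d in on_hba) if (!(d in alt)) print d
        }
    ' | sort
}
```

On a host, the path list output could be piped straight into the function, for example esxcli storage core path list | devices_without_alternate vmhba2. Any device it prints depends on the suspect adapter alone, which makes the reset a disruptive change for that device. The sketch only counts paths reported as active, so arrays that present standby paths need a manual check as well.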

Coordinate With the SAN and Storage Teams

A FLOGI reset should not be done from the ESXi side in isolation.

Before executing the reset, the SAN team should confirm:

The switch port connected to the host adapter is identified.

The host WWPN maps to the expected switch interface.

The WWPN is present or absent in the fabric login database as expected.

Zoning still includes the correct host and target WWPNs.

The array target ports are online.

LUN masking or host group membership has not changed.

No switch maintenance, array failover, firmware update, or zoning activation is still in progress.

If the fabric has a stale or ghost login, the SAN-side fix may need to happen first. Broadcom’s related dead-path KB specifically calls out bouncing the switch port before the HBA-level reset when a ghost FLOGI session may persist on the switch.
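The SAN team needs the host-side WWPN to check the switch login database and zoning. A small sketch for pulling it out of the adapter listing follows. adapter_wwpn is a hypothetical helper name, and the Adapter: and Port Name: field labels are assumptions about the esxcli storage san fc list output that should be checked on the host.

```shell
#!/bin/sh
# Sketch only. Reads `esxcli storage san fc list` output on stdin.
# Assumption: each adapter block starts with "Adapter: vmhbaN" and contains
# a "Port Name: <WWPN>" line. adapter_wwpn is a hypothetical helper name.
# Prints the WWPN reported for the adapter named in $1.
adapter_wwpn() {
    awk -v hba="$1" '
        /^ *Adapter: /   { adp = $2 }
        /^ *Port Name: / { if (adp == hba) print $3 }
    '
}
```

A possible use is esxcli storage san fc list | adapter_wwpn vmhba2, with the result pasted into the request to the SAN team so that both sides are looking at the same initiator.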

Treat RDMs and Guest Clusters as Higher Risk

RDM-backed workloads deserve extra care. The KB 324547 issue context includes an RDM-backed VM stopping after reboot, which is a reminder that raw-device visibility problems can surface above the hypervisor layer.

Before resetting an adapter that may affect an RDM:

Identify the VM using the RDM.

Confirm whether the guest OS or application cluster expects persistent device visibility.

Coordinate with the application owner.

Avoid guest-level troubleshooting until the ESXi device and path layer is known good.

Do not assume a VM power cycle will fix missing raw-device presentation.

When a FLOGI Reset Is Appropriate

A Fabric Login reset is reasonable when all of the following are true:

Condition
Required Answer

Fabric event has ended
The SAN team confirms the fabric is stable

Array visibility is expected
The array team confirms masking and target ports are healthy

ESXi still has stale path state
Paths are dead, missing, or not rediscovered after rescan

Redundancy exists
Affected devices have healthy alternate paths or workloads are evacuated

Adapter is known
vmhbaX maps to the intended physical HBA/CNA and fabric

Blast radius is limited
One adapter, one host, one controlled change at a time

Rollback path exists
Host reboot, switch-port bounce, vendor support, or workload evacuation is planned

The reset is most defensible when ESXi is the remaining stale component after the fabric and array have recovered.

When a FLOGI Reset Is Dangerous

Do not treat the reset as safe in these cases:

Scenario
Why It Is Dangerous

The affected adapter carries the only live path to a datastore
Resetting it can trigger storage loss

Multiple fabrics are degraded
You may remove the last usable path

Zoning or LUN masking is still being changed
ESXi cannot rediscover targets that are not correctly presented

The host is boot-from-SAN on the affected path
The operational blast radius may include the ESXi boot device

The workload uses RDMs or guest clustering
Guest-visible device loss can have application-level consequences

You are unsure whether the adapter is FC or FCoE
The wrong reset command or wrong adapter can waste time or cause disruption

The issue affects many hosts simultaneously
The root cause is likely fabric, array, or systemic, not one adapter

You are tempted to reset both fabrics at once
That defeats the purpose of multipathing

Also avoid using manual path enable/disable operations as a substitute for understanding the issue. The ESXCLI command reference notes that storage core path set only toggles paths between active and off, and that VMkernel will not change a path state if doing so would cause an all-paths-down condition or affect a device currently in use. That is a guardrail, not a troubleshooting strategy.

Runbook Stage 1: Capture the Baseline

Start with a timestamped baseline. This gives you something to compare after the reset and something useful for the incident record.

hostname
date
vmware -v

esxcli storage core adapter list
esxcli storage san fc list
esxcli storage san fcoe list
esxcli storage core path list
esxcli storage nmp path list
esxcli storage filesystem list

Capture the affected adapter details separately:

# Replace vmhbaX with the adapter under investigation
esxcli storage core adapter device list -A vmhbaX
esxcli storage san fc list -A vmhbaX
esxcli storage san fcoe list -A vmhbaX
esxcli storage core path list | grep -i vmhbaX

The goal is not just to prove something is broken. The goal is to identify exactly which adapter, fabric, devices, and workloads are in scope.
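To make the baseline easy to compare later, the inventory commands above can be written to one file per command under a timestamped directory. This is a sketch under assumptions: capture_baseline is a hypothetical helper name, the directory and file naming are illustrative, and the caller chooses an output location with enough free space that is not affected by the incident.

```shell
#!/bin/sh
# Sketch only. capture_baseline is a hypothetical helper name; the
# directory layout and file names are assumptions, not a documented format.
# $1 is a parent directory chosen by the operator.
capture_baseline() {
    out_dir="$1/baseline-$(date +%Y%m%dT%H%M%S)"
    mkdir -p "$out_dir" || return 1
    # One file per command so before and after captures can be diffed.
    while IFS= read -r cmd; do
        [ -n "$cmd" ] || continue
        file="$out_dir/$(printf '%s' "$cmd" | tr ' /' '__').txt"
        # Record a failure (for example, no FCoE adapters) instead of aborting.
        $cmd > "$file" 2>&1 < /dev/null || echo "exit status: $?" >> "$file"
    done <<EOF
esxcli storage core adapter list
esxcli storage san fc list
esxcli storage san fcoe list
esxcli storage core path list
esxcli storage nmp path list
esxcli storage filesystem list
EOF
    echo "$out_dir"
}
```

The function prints the directory it created, so the path can go directly into the incident record. Running it again after the change gives a second directory that can be compared file by file with diff.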

Runbook Stage 2: Try the Lower-Impact Rescan First

Before forcing an adapter reset, try a targeted rescan.

The ESXCLI reference describes storage core adapter rescan as a rescan operation that can search for new devices, remove dead paths, and update path state. It also supports targeting a specific adapter and specifying rescan type, including update, add, delete, and all.

# Update path state for a specific adapter
esxcli storage core adapter rescan --adapter vmhbaX --type update

# Recheck paths
esxcli storage core path list | grep -i vmhbaX
esxcli storage nmp path list | grep -i -A 12 -B 2 vmhbaX

If datastore visibility is part of the symptom, follow with a filesystem scan:

esxcli storage filesystem rescan
esxcli storage filesystem list

If the rescan restores the expected path count and the SAN team confirms the login state is clean, do not proceed to the reset. Document the recovery and monitor.
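Whether the rescan changed anything is easier to judge from numbers than from scrolling output. The sketch below tallies path states for one adapter. adapter_state_counts is a hypothetical helper name, and it assumes the Adapter: and State: field labels in the esxcli storage core path list output.

```shell
#!/bin/sh
# Sketch only. Reads `esxcli storage core path list` output on stdin.
# Assumption: each path record has an "Adapter:" line followed by a
# "State:" line. adapter_state_counts is a hypothetical helper name.
# Prints one "<state> <count>" line per path state seen on adapter $1.
adapter_state_counts() {
    awk -v hba="$1" '
        /^ *Adapter: / { adp = $2 }
        /^ *State: /   { if (adp == hba) n[$2]++ }
        END { for (s in n) print s, n[s] }
    ' | sort
}
```

Run once before the rescan and once after, for example esxcli storage core path list | adapter_state_counts vmhba2, and record both results. An unchanged dead count after a targeted rescan is part of the evidence that justifies moving to the reset.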

Runbook Stage 3: Prepare the Host for the Reset

Before executing the reset, reduce workload exposure.

Preferred actions:

vMotion non-essential workloads away from the host if storage pathing allows it.

Avoid starting new provisioning, backup, snapshot, or migration jobs.

Pause host lifecycle operations.

Confirm no array or fabric maintenance is still active.

Notify the application owner for any RDM-backed or latency-sensitive VM.

Reset only one adapter on one host at a time.

Do not reset both fabrics in the same maintenance step.

For a VCF management-domain host, also confirm the placement of management appliances before proceeding. A technically correct adapter reset can still be a poor operational decision if it hits the host currently carrying the management plane during an unstable storage event.

Runbook Stage 4: Execute the Fabric Login Reset

Use the command that matches the adapter type.

For a native Fibre Channel adapter:

# Replace vmhbaX with the affected FC adapter
esxcli storage san fc reset -A vmhbaX

For an FCoE adapter:

# Replace vmhbaX with the affected FCoE adapter
esxcli storage san fcoe reset -A vmhbaX

Broadcom KB 324547 documents those FC and FCoE commands for forcing the Fabric Login reset from an ESXi console or SSH session. The ESXCLI command reference separately confirms that storage san fc reset and storage san fcoe reset operate against a required adapter parameter.
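Choosing between the FC and FCoE form can be made explicit by checking which listing the adapter actually appears in. The sketch below only prints the matching command and never executes it. print_reset_command is a hypothetical helper name, the two input files are assumed to hold saved output of esxcli storage san fc list and esxcli storage san fcoe list, and the Adapter: field label is an assumption to verify on the host.

```shell
#!/bin/sh
# Sketch only. print_reset_command is a hypothetical helper name.
# $1 = adapter name
# $2 = file with saved `esxcli storage san fc list` output
# $3 = file with saved `esxcli storage san fcoe list` output
# Assumption: both listings identify adapters with an "Adapter: vmhbaN" line.
# Prints the matching reset command; it does not run it.
print_reset_command() {
    hba="$1"; fc_file="$2"; fcoe_file="$3"
    in_fc=no; in_fcoe=no
    grep -Eq "^ *Adapter: $hba *\$" "$fc_file" && in_fc=yes
    grep -Eq "^ *Adapter: $hba *\$" "$fcoe_file" && in_fcoe=yes
    case "$in_fc$in_fcoe" in
        yesno) echo "esxcli storage san fc reset -A $hba" ;;
        noyes) echo "esxcli storage san fcoe reset -A $hba" ;;
        *)     echo "adapter type for $hba is ambiguous or unknown; do not reset" >&2
               return 1 ;;
    esac
}
```

Printing rather than executing is deliberate. The operator still reviews the command, confirms the adapter against the incident record, and runs it by hand as a single controlled step.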

While the reset runs, monitor VMkernel logs from another session:

# Adjust driver names based on the actual adapter in use
tail -f /var/log/vmkernel.log | egrep -i 'vmhbaX|NMP|FLOGI|PLOGI|lpfc|qlnativefc|fnic|bnx2fc|fcoe'

Do not loop the reset repeatedly. If one controlled reset does not restore the expected login and path state, you are likely dealing with a deeper fabric, driver, firmware, or array-side problem.

Runbook Stage 5: Rescan and Validate

After the reset, run a targeted rescan and recheck path visibility.

esxcli storage core adapter rescan --adapter vmhbaX --type all
esxcli storage filesystem rescan

esxcli storage core adapter device list -A vmhbaX
esxcli storage core path list | grep -i vmhbaX
esxcli storage nmp path list | grep -i -A 12 -B 2 vmhbaX
esxcli storage filesystem list

Validation should include all three views:

| View | What to Confirm |
| --- | --- |
| ESXi | Expected devices, expected path count, paths active or appropriate for the array policy |
| SAN switch | Host WWPN logged into the correct fabric port, no stale login, expected zones |
| Storage array | Host initiator online, target ports healthy, LUN masking unchanged |

Do not close the incident just because the vSphere Client looks cleaner. Confirm that the path count, adapter mapping, datastore visibility, and array-side initiator state all agree.
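The ESXi side of that check can be reduced to a list of devices that are still short of their expected path count. This is a sketch under assumptions: devices_below_expected is a hypothetical helper name, the expected count is supplied by the operator from the storage design, and the Device: and State: field labels are assumed from the esxcli storage core path list output.

```shell
#!/bin/sh
# Sketch only. Reads `esxcli storage core path list` output on stdin.
# Assumption: each path record has a "Device:" line followed by a "State:"
# line. devices_below_expected is a hypothetical helper name.
# $1 = expected number of active paths per device.
# Prints "<device> <active>/<expected>" for every device below that number.
devices_below_expected() {
    awk -v want="$1" '
        BEGIN { want += 0 }
        /^ *Device: / { dev = $2; if (!(dev in active)) active[dev] = 0 }
        /^ *State: /  { if ($2 == "active") active[dev]++ }
        END {
            for (d in active)
                if (active[d] < want) print d, active[d] "/" want
        }
    ' | sort
}
```

An empty result from esxcli storage core path list | devices_below_expected 4 means every device reports at least four active paths. It counts only paths in the active state, so for arrays where some paths are expected to be standby the threshold has to reflect that, and the switch and array views still need their own confirmation.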

Command Reference Table

| Task | Command |
| --- | --- |
| List storage adapters | esxcli storage core adapter list |
| List FC adapter details | esxcli storage san fc list |
| List FCoE adapter details | esxcli storage san fcoe list |
| List devices behind an adapter | esxcli storage core adapter device list -A vmhbaX |
| List all storage paths | esxcli storage core path list |
| List NMP path details | esxcli storage nmp path list |
| Rescan one adapter | esxcli storage core adapter rescan --adapter vmhbaX --type update |
| Force FC adapter reset | esxcli storage san fc reset -A vmhbaX |
| Force FCoE adapter reset | esxcli storage san fcoe reset -A vmhbaX |
| Rescan filesystems | esxcli storage filesystem rescan |
| List visible filesystems | esxcli storage filesystem list |

Rollback and Fallback Guidance

A Fabric Login reset does not have a true undo button. The fallback is operational containment.

If paths recover:

Keep the host under observation.

Confirm path stability from ESXi, switch, and array views.

Watch for repeated dead-path events.

Document the pre/post path count.

Open a follow-up problem record if this was triggered by a switch event, driver issue, or firmware condition.

If paths do not recover:

Do not keep repeating the reset.

Stop after one controlled attempt unless vendor support directs otherwise.

Ask the SAN team to validate or bounce the specific switch port if a stale login remains.

Consider placing the host in maintenance mode if workloads can be evacuated.

Plan a host reboot if the adapter or driver remains stuck.

Engage Broadcom and the HBA, switch, or array vendor with logs and timestamps.

If the reset makes pathing worse:

Treat it as a storage-impacting incident.

Protect remaining hosts and fabrics from repeated resets.

Do not reset the peer fabric.

Stabilize workloads first.

Escalate with the captured command output, VMkernel logs, switch login state, and array initiator state.

Incident Record Template

Use a simple template so the next engineer can understand what happened without reverse-engineering the command history.

Incident:
Date/time:
vCenter:
VCF domain:
Cluster:
ESXi host:
Adapter:
Adapter type: FC / FCoE
Fabric:
Switch port:
Host WWPN:
Target WWPNs:
Affected datastores:
Affected RDMs:
Expected path count:
Observed path count before:
Observed path count after:
SAN team validation:
Array team validation:
Command executed:
Result:
Fallback required:
Follow-up problem record:

This is especially useful when the reset works. A successful reset can hide a root cause that still needs attention.

Common Mistakes

The most common failure pattern is resetting the adapter before proving the fabric is healthy.

Other common mistakes include:

Resetting vmhbaX based only on a UI symptom without mapping it to the physical HBA, WWPN, and fabric.

Resetting both fabrics close together.

Treating FCoE as native FC without confirming the adapter type.

Ignoring RDM-backed workloads because the datastore list looks normal.

Assuming a rescan and a Fabric Login reset are equivalent.

Closing the incident after path visibility returns without checking switch and array state.

Running the same reset across multiple hosts when the problem is actually fabric-wide.

The command is simple. The coordination is the hard part.

Conclusion

A Fabric Login reset can be the right tool when ESXi is stuck with stale FC or FCoE path visibility after the underlying fabric has recovered. It can force the adapter to reinitialize its relationship with the fabric and rediscover targets that a normal rescan did not recover.

But it should not be the first move.

The safe sequence is: define the blast radius, verify fabric and array health, prove alternate path coverage, capture ESXi path evidence, try a targeted rescan, reset one adapter only when justified, then validate from ESXi, switch, and array views.

When Fibre Channel paths lie, the goal is not to make the UI look clean. The goal is to restore trustworthy end-to-end storage visibility without turning a degraded pathing problem into a platform outage.

External Sources

Broadcom KB 324547 – Forcing a Fabric Login reset on Fibre Channel and FCoE Adapters: https://knowledge.broadcom.com/external/article/324547/forcing-a-fabric-login-reset-on-fibre-ch.html

Broadcom ESXCLI Command Reference – Storage namespace: https://developer.broadcom.com/xapis/esxcli-command-reference/latest/namespace/esxcli_storage.html

Broadcom KB 430527 – ESXi paths remain dead following storage switch recovery: https://knowledge.broadcom.com/external/article/430527/esxi-paths-remain-dead-following-storage.html

The post When Fibre Channel Paths Lie: A Safe Fabric Login Reset Runbook for ESXi appeared first on Digital Thought Disruption.