The vCenter Log Partition Runbook: Find Growth, Preserve Evidence, Restore Headroom

A full /storage/log partition on a vCenter Server Appliance is not just a housekeeping problem. It is a management-plane risk.

In a standalone vSphere environment, it can interrupt administration, log collection, patching, and service stability. In VMware Cloud Foundation, the blast radius is larger because vCenter is tied into SDDC Manager workflows, workload domain lifecycle operations, and operational visibility.

Broadcom KB 313077 calls out /storage/log exhaustion directly. It warns that deleting critical files can prevent the vCenter Server Appliance from working, and that resizing appliance disks carries a risk of data corruption if handled poorly.

The wrong response is to SSH into the appliance and start deleting large files until the alarm clears.

The right response is a runbook: confirm the partition, preserve enough evidence to understand the cause, collect support data without making the partition worse, restore safe headroom, and then prevent the same pattern from returning.

This article walks through that workflow.

What This Runbook Helps You Accomplish

By the end of this runbook, you should be able to:

Confirm whether the issue is actually /storage/log.

Identify the largest files and highest-growth directories.

Preserve triage evidence before cleanup.

Collect support bundles safely when /storage/log is already constrained.

Remove low-risk temporary files such as downloaded support bundles after validation.

Avoid deleting active logs, database files, or appliance-critical content blindly.

Decide when cleanup is enough and when the log partition needs resizing.

Add monitoring so the next incident is caught before vCenter services are at risk.

The article assumes vCenter Server Appliance, not the retired Windows-based vCenter deployment model. Broadcom’s current log-location reference notes that vCenter Server Appliance logs are organized under /var/log/vmware/<Service Name>. On the appliance, that log tree is normally stored on the log partition mounted at /storage/log, which is the focus of this runbook.
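To check on a specific appliance whether the service log path and /storage/log land on the same filesystem, comparing df output is enough. This is a minimal sketch, assuming GNU coreutils df; mount_of and same_filesystem are hypothetical helper names, not appliance tools.

```shell
# Sketch: check whether two paths resolve to the same mounted filesystem.
# Assumes GNU coreutils df; -P keeps each filesystem on a single output line.
# mount_of and same_filesystem are hypothetical helpers.
mount_of() {
  # Print the mount point (last column) of the filesystem holding the path.
  # df follows symlinks, so a symlinked log path reports its real filesystem.
  df -P "$1" | awk 'NR==2 {print $NF}'
}

same_filesystem() {
  [ "$(mount_of "$1")" = "$(mount_of "$2")" ]
}

# On the appliance (illustrative):
# readlink -f /var/log/vmware
# same_filesystem /var/log/vmware /storage/log && echo "service logs consume /storage/log"
```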

Scenario: The Alarm Is Already Firing

The common alert path looks familiar:

VAMI shows /storage/log at high utilization.

vSphere Client reports Log Disk Exhaustion.

Log bundle export fails or produces an incomplete bundle.

vCenter services are unstable or unavailable.

In worse cases, the vSphere Client returns 503 Service Unavailable.

Broadcom KB 313077 lists the Log disk exhaustion alarm, VAMI /storage/log warnings, possible 503 Service Unavailable errors, and issues viewing some vSAN-related options when the log partition is under pressure. It also documents VAMI warning behavior at 75% continuous usage and red critical alarms at 85%.

There is also a broader vCenter disk-space behavior to keep in mind: Broadcom’s VCSA disk-space KB says vCenter disk warnings commonly begin around 80%, and when vCenter file systems reach 95%, vmware-vpxd may be turned off automatically to help protect the database from corruption.

That is why the runbook should not start with cleanup.

It starts with containment and evidence.
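The thresholds above can be expressed as a small classification helper for triage notes or scripts. This is a sketch, assuming the 75% and 85% VAMI levels from KB 313077 and the 95% vmware-vpxd protection level from the disk-space KB; classify_usage is a hypothetical name, not a VMware tool.

```shell
# Sketch: map a usage percentage to the thresholds described above.
# Assumed levels: 75% VAMI warning, 85% critical (KB 313077),
# 95% = level at which vmware-vpxd may be stopped automatically (disk-space KB).
# classify_usage is a hypothetical helper.
classify_usage() {
  local pct="${1%\%}"   # accept "82" or "82%"
  if [ "$pct" -ge 95 ]; then
    echo "VPXD-RISK"
  elif [ "$pct" -ge 85 ]; then
    echo "CRITICAL"
  elif [ "$pct" -ge 75 ]; then
    echo "WARNING"
  else
    echo "OK"
  fi
}

# On the appliance:
# classify_usage "$(df -P /storage/log | awk 'NR==2 {print $5}')"
```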

Why /storage/log Matters in VCF Operations

In VMware Cloud Foundation, vCenter is not an isolated management appliance. It participates in lifecycle workflows, upgrade prechecks, health checks, and domain operations.

Broadcom documents a VCF case where a vCenter upgrade triggered from SDDC Manager failed because /storage/log was full before or during the upgrade. The remediation path included freeing space in /storage/log and retrying the SDDC Manager workflow.

That makes /storage/log capacity a lifecycle readiness item, not just a reactive alarm.

A practical operating model should treat log partition headroom as part of:

vCenter health.

VCF upgrade readiness.

Supportability.

Evidence preservation.

Monitoring and alert response.

Change-window preparation.

If the first time the team checks /storage/log is during a failed upgrade or an active support case, the operational process is already behind.

The Runbook at a Glance

The diagram below shows the flow. The key point is the separation between discovery, evidence preservation, cleanup, and prevention. Those steps are often collapsed during an outage, which is how useful evidence gets deleted before anyone understands the root cause.

Prerequisites and Safety Checks

Before touching files, confirm the operational state.

You need:

Root or approved administrative access to the vCenter Server Appliance.

Console access or SSH access.

A recent, restorable vCenter backup.

Awareness of Enhanced Linked Mode or VCF domain dependencies.

A support case reference if this is already tied to a Broadcom SR.

Enough free space on another appliance partition or external destination if you need to preserve evidence or generate a support bundle.

Broadcom KB 313077 explicitly warns to ensure a good VAMI file backup, VADP backup, or both before deleting files or resizing VCSA disks. If resizing becomes necessary, Broadcom’s disk-space guidance also warns that a good backup is critical before resizing appliance disks.

A few practical guardrails:

Do not delete files from /storage/db, /storage/seat, /storage/core, or / just because they appear in a generic disk report. This article is scoped to /storage/log.

Do not delete active log files without preserving evidence. Removing an active file can also fail to reclaim space until the process releases the file handle.

Do not treat /storage/archive the same way as /storage/log. Broadcom’s VCSA disk-space KB notes that /storage/archive being high or full can be expected by design in vCenter 6.7 and later.

Do not resize first unless you have already determined the partition is genuinely undersized or you need emergency headroom. Resizing can hide a runaway logging condition without fixing the cause.

Stage 1: Confirm the Partition and Current Risk

Start with read-only checks.

# If you land in the appliance shell, enable and enter Bash.
shell.set --enabled True
shell

# Confirm filesystem usage.
df -h

# Show only filesystems at or above the investigation threshold.
df -h | awk '0+$5 >= 78 {print}'

# Confirm inode usage as well.
df -ih

# Focus directly on the log partition.
df -hT /storage/log
df -ih /storage/log

Broadcom’s VCSA disk-space KB uses df -h and an awk filter around 78% to catch partitions that may temporarily cross warning thresholds and then fall back below the visible alarm threshold.
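The same filter can be wrapped so it reads df output from standard input, which also makes it reusable for inode checks. A minimal sketch, relying on awk's numeric conversion (0+"82%" evaluates to 82); high_usage is a hypothetical helper.

```shell
# Sketch: print mount point and usage for filesystems at or above a threshold.
# Reads "df -P" style output on stdin; skips the header line.
# high_usage is a hypothetical helper; the default threshold of 78 mirrors the KB filter.
high_usage() {
  local threshold="${1:-78}"
  awk -v t="$threshold" 'NR > 1 && 0+$5 >= t {print $6, $5}'
}

# On the appliance:
# df -P  | high_usage 78   # block usage (Capacity column)
# df -Pi | high_usage 78   # inode usage (IUse% column)
```

With df -Pi the fifth column is IUse%, so the same helper flags inode exhaustion as well as block exhaustion.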

What you are looking for:

| Check | Good Sign | Risk Sign |
|---|---|---|
| /storage/log space | Below warning threshold with available GB | 75–85% and rising |
| /storage/log inodes | Plenty of free inodes | High inode use with many small files |
| Other partitions | Normal usage | /storage/db, /storage/seat, /, or /storage/core also high |
| vCenter services | Running | vmware-vpxd stopped or client unavailable |
| Timing | Stable usage | Rapid growth over minutes |

If multiple partitions are high, do not use a /storage/log runbook as a universal cleanup procedure. Identify each partition separately.

Stage 2: Preserve Evidence Before Cleanup

A full log partition is often a symptom, not the root cause. It may be caused by support bundles left behind, unusually high event volume, a service repeatedly writing errors, failed cleanup behavior, or a version-specific known issue. Broadcom KB 313077 lists potential causes including log bundles not being cleared, high-frequency events, services failing to clean up files, and an undersized /storage/log partition.

Before deleting anything, capture a small triage evidence package.

Use a destination that has enough space. The example below uses /storage/core, but only use it after confirming it has headroom.

# Confirm there is space before using /storage/core.
df -h /storage/core

RUNID="$(date +%Y%m%d-%H%M%S)"
EVDIR="/storage/core/vc-log-triage-${RUNID}"

mkdir -p "$EVDIR"

hostname -f > "$EVDIR/hostname.txt"
date -Is > "$EVDIR/date.txt"

df -hT > "$EVDIR/df-hT.txt"
df -ih > "$EVDIR/df-ih.txt"

du -xhd1 /storage/log 2>/dev/null | sort -hr > "$EVDIR/storage-log-topdirs.txt"

# Top 100 files by size, with modification time and full path.
find /storage/log -xdev -type f \
  -printf '%s %TY-%Tm-%Td %TH:%TM %p\n' 2>/dev/null \
  | sort -nr \
  | head -100 > "$EVDIR/storage-log-topfiles.txt"

find /storage/log -xdev -type f -size +100M \
  -exec ls -lh {} \; 2>/dev/null \
  > "$EVDIR/storage-log-files-over-100M.txt"

tar -czf "${EVDIR}.tgz" -C "$(dirname "$EVDIR")" "$(basename "$EVDIR")"

ls -lh "${EVDIR}.tgz"

This does not preserve every log file. It preserves enough state to explain what was consuming space before remediation started.

That distinction matters. In a production incident, you often need to restore headroom quickly, but you still want a record of what was large, where it lived, when it was modified, and which partition was actually full.
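The headroom check that should come before writing evidence can be made explicit, so a script stops instead of filling a second partition. A sketch, assuming GNU df; has_headroom and the 2048 MB figure in the example are illustrative assumptions, not Broadcom guidance.

```shell
# Sketch: succeed only if the filesystem holding DIR has at least MIN_MB free.
# df -Pk reports available space in 1024-byte blocks (column 4).
# has_headroom is a hypothetical helper.
has_headroom() {
  local dir="$1" min_mb="$2" avail_kb
  avail_kb="$(df -Pk "$dir" | awk 'NR==2 {print $4}')"
  [ "$avail_kb" -ge $(( min_mb * 1024 )) ]
}

# Example guard before creating the evidence directory:
# has_headroom /storage/core 2048 || { echo "Choose another destination" >&2; exit 1; }
```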

Also remember that support bundles can include sensitive operational data. Broadcom notes that support bundle collection can include logs, configuration files, product-specific data, and core dump material depending on the product and collection method, so these files should be handled according to corporate policy.

Stage 3: Find What Is Consuming /storage/log

Now move from partition-level confirmation to file-level discovery.

Broadcom KB 313077 recommends running a largest-file search from /storage/log using find, du, sort, and head. The broader VCSA disk-space KB also recommends du, find for files larger than 100 MB, and a directory file-count check when many small files are the issue.

Use these commands as the working set:

cd /storage/log

# Top large files.
find . -type f -print0 | xargs -0 du -h | sort -rh | head -n 20

# Top directories on the same filesystem.
du -xhd1 /storage/log 2>/dev/null | sort -hr | head -n 20

# Files larger than 100 MB.
find /storage/log -xdev -type f -size +100M -exec ls -lh {} \; 2>/dev/null

# Directories with high file counts.
find /storage/log -xdev -type d -exec sh -c '
  count=$(find "$1" -maxdepth 1 -type f 2>/dev/null | wc -l)
  if [ "$count" -gt 100 ]; then
    echo "$count $1"
  fi
' sh {} \; | sort -nr | head -n 30

You are trying to classify the pattern, not just identify the biggest number.

Common patterns look like this:

| Pattern | What It Usually Means | First Response |
|---|---|---|
| Large *.tgz files | Support bundles or exported diagnostics left behind | Confirm downloaded, then remove selected bundle files |
| One huge *.log, *.stdout, or *.stderr | Active service logging repeatedly | Preserve evidence, identify service, check known KBs |
| Thousands of small files | Rotation, cleanup, or service behavior issue | Count files, identify directory, check version-specific KBs |
| Growth during log export | Support bundle generation is consuming /storage/log | Generate bundle elsewhere using vc-support -w |
| Normal logs but tiny partition | Partition undersized for current version or workload | Consider supported resize after backup |

Do not stop at “file is large.” Ask why it is large.
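A first-pass sort of the largest files into the table's categories can speed up that question. This is a sketch under stated assumptions: the file-name globs are illustrative and will not match every vCenter naming scheme, many-small-file accumulation still needs the directory count from Stage 3, and classify_path is a hypothetical helper.

```shell
# Sketch: coarse category for a path, mirroring the pattern table.
# Order matters: *.tgz is tested before the generic *.gz rotated-log case.
# classify_path is a hypothetical helper; the globs are assumptions.
classify_path() {
  case "$1" in
    *.tgz|*.tar.gz)          echo "support-bundle" ;;
    *.log|*.stdout|*.stderr) echo "active-log" ;;
    *.gz|*.zip|*.old)        echo "rotated-log" ;;
    *)                       echo "unknown" ;;
  esac
}

# Example: label the 20 largest files on the log partition.
# find /storage/log -xdev -type f -printf '%s %p\n' | sort -nr | head -n 20 |
#   while read -r size path; do echo "$(classify_path "$path") $size $path"; done
```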

Stage 4: Map the Culprit to a Known Issue or Behavior

Broadcom KB 313077 includes a known-issue matrix for /storage/log growth across vCenter versions and file patterns. The KB points to examples involving SSO logs, excessive pod-startup logs, analytics logs, EAM web logs, SPS runtime logs, support bundles, Java heap dumps, and WCP stream logs.

Your triage output should give you three useful data points:

The exact file path.

The vCenter version and build.

Whether the file is active growth, stale artifact, or many-file accumulation.
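To separate active growth from a stale artifact, sample the file size twice. A minimal sketch, assuming GNU stat (-c %s); growth_bytes is a hypothetical helper, and a negative result usually means the file rotated during the interval.

```shell
# Sketch: print how many bytes FILE grew over INTERVAL seconds (default 60).
# Zero suggests a stale artifact; steady positive growth suggests an active writer.
# growth_bytes is a hypothetical helper.
growth_bytes() {
  local file="$1" interval="${2:-60}" before after
  before="$(stat -c %s "$file")"
  sleep "$interval"
  after="$(stat -c %s "$file")"
  echo $(( after - before ))
}

# Example:
# growth_bytes /storage/log/path/to/large-active.log 60
```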

Capture version information:

vpxd -v 2>/dev/null || true
cat /etc/applmgmt/appliance/software_update_state.conf 2>/dev/null || true

Then classify the culprit:

| Culprit Type | Example Investigation Question | Cleanup Posture |
|---|---|---|
| Stale support bundle | Was this generated through vSphere Client or VAMI and already downloaded? | Usually safe to remove after confirmation |
| Active service log | Is a service repeatedly writing errors? | Preserve sample, address service/root cause |
| Known version defect | Does the file match a Broadcom KB for this build? | Follow the specific KB remediation |
| Undersized partition | Is normal log behavior exceeding an old disk layout? | Resize after backup and snapshot checks |
| Event storm | Are tasks, events, or integrations generating excessive log volume? | Fix upstream behavior, not just the file |

This is where the runbook becomes more than Linux cleanup.

You are connecting disk consumption to vCenter behavior.

Stage 5: Collect Logs Without Making /storage/log Worse

Support bundle collection can make the problem worse if it writes into an already-constrained /storage/log.

Broadcom’s diagnostic collection KB documents vCenter support bundle collection through the vSphere Client, while another Broadcom KB explains that support bundles generated from the vSphere Client are created in /storage/log and can fail when that partition has high usage or low free space.

When /storage/log does not have enough space, use an alternate destination. Broadcom documents that vc-support generates bundles in /storage/log by default, but the -w option can place the bundle on another filesystem with more available space.

# Check available space before choosing a destination.
df -h

# Default behavior when /storage/log has enough headroom.
vc-support -l

# Alternate destination when /storage/log is constrained.
vc-support -w /storage/core

Use /storage/core only if it has enough free space and your operational policy allows it. If /storage/core contains large core files, review policy before deleting or moving anything.
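When several locations are allowed, choosing the one with the most free space can be scripted. This sketch assumes GNU df; best_destination is a hypothetical helper and /path/to/other-approved-fs is a placeholder, so keep the candidate list to locations your policy permits.

```shell
# Sketch: print the existing candidate directory with the most available space.
# Returns non-zero and prints nothing if no candidate exists.
# best_destination is a hypothetical helper.
best_destination() {
  local dir avail best="" best_avail=-1
  for dir in "$@"; do
    [ -d "$dir" ] || continue
    avail="$(df -Pk "$dir" | awk 'NR==2 {print $4}')"
    if [ "$avail" -gt "$best_avail" ]; then
      best="$dir"
      best_avail="$avail"
    fi
  done
  [ -n "$best" ] && echo "$best"
}

# Example (review the choice before running vc-support):
# dest="$(best_destination /storage/core /path/to/other-approved-fs)" && vc-support -w "$dest"
```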

If vSAN log collection is causing collection time or size issues, Broadcom also documents manifest-based vc-support options and examples for including or excluding specific manifests, including use of -w /storage/core with manifest exclusions.

Stage 6: Restore Headroom Safely

Once you know what is consuming space, use the least destructive remediation that restores enough headroom.

Option A: Remove Confirmed Support Bundles

Support bundles are one of the cleaner cleanup targets, but only after you confirm they are no longer needed or have been downloaded.

Broadcom documents a vCenter 8.0 issue where support bundle files exported through the vSphere Client might not be removed from /storage/log after download. The documented workaround is to manually remove the file from /storage/log after export or use VAMI cleanup after download.

Start with inspection:

find /storage/log -maxdepth 1 -type f -name "*.tgz" -exec ls -lh {} \;

Then remove only the selected file or files:

# Use exact filenames where possible.
rm -i /storage/log/<confirmed-support-bundle>.tgz

Avoid broad deletion until you understand naming and retention in your environment.
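Before removing anything, a dry run that lists top-level bundles older than a cutoff, with a byte total, shows how much headroom the cleanup would return. A sketch, assuming GNU find (-printf, -mmin); stale_bundles and its 60-minute default are assumptions, and the function deletes nothing.

```shell
# Sketch: list top-level *.tgz files older than MINUTES (size in bytes, path),
# then a TOTAL line. Read-only: review the output, then use rm -i on exact names.
# stale_bundles is a hypothetical helper.
stale_bundles() {
  local dir="${1:-/storage/log}" minutes="${2:-60}"
  find "$dir" -maxdepth 1 -type f -name '*.tgz' -mmin +"$minutes" -printf '%s %p\n' |
    awk '{ total += $1; print } END { printf "TOTAL %.0f bytes\n", total }'
}

# Example:
# stale_bundles /storage/log 120
```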

Option B: Remove Old Rotated Logs Only After Policy Review

If the partition is filled with old compressed rotated logs, confirm whether they are still needed for troubleshooting, audit, or support.

Inspect first:

find /storage/log -xdev -type f \
  \( -name "*.gz" -o -name "*.zip" -o -name "*.old" \) \
  -mtime +30 -exec ls -lh {} \;

Then remove selected files only after approval:

rm -i /path/to/selected/old-rotated-log.gz

This is not the first cleanup option during an unresolved incident. Old rotated logs may still be useful if the issue has been building over time.
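To see where old rotated logs accumulate, and roughly how much an approved cleanup would reclaim, their sizes can be totaled per directory. A sketch, assuming GNU find; rotated_by_dir is a hypothetical helper, and it excludes *.tgz because the *.gz pattern would otherwise also match support bundles.

```shell
# Sketch: total bytes of rotated logs older than DAYS, grouped by directory,
# largest first. Read-only.
# rotated_by_dir is a hypothetical helper.
rotated_by_dir() {
  local dir="${1:-/storage/log}" days="${2:-30}"
  find "$dir" -xdev -type f \( -name '*.gz' -o -name '*.zip' -o -name '*.old' \) \
    ! -name '*.tgz' -mtime +"$days" -printf '%s %h\n' 2>/dev/null |
    awk '{ size = $1; sub(/^[0-9]+ /, ""); sum[$0] += size }
         END { for (d in sum) printf "%.0f %s\n", sum[d], d }' |
    sort -nr
}

# Example:
# rotated_by_dir /storage/log 30 | head -n 10
```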

Option C: Handle a Huge Active Log Carefully

If one active log is consuming the partition, do not blindly delete it.

Preserve metadata and a sample first:

FILE="/storage/log/path/to/large-active.log"
RUNID="$(date +%Y%m%d-%H%M%S)"
EVDIR="/storage/core/vc-log-active-file-${RUNID}"

mkdir -p "$EVDIR"

stat "$FILE" > "$EVDIR/file.stat.txt"
head -n 500 "$FILE" > "$EVDIR/file.head.txt"
tail -n 5000 "$FILE" > "$EVDIR/file.tail.txt"

Then identify which service owns the log, check whether the exact filename and vCenter build map to a Broadcom known issue, and remediate the cause. If you must truncate a log in an emergency, treat it as an exception and document the approval:

# Emergency-only after evidence preservation and approval.
: > "$FILE"

Truncation preserves the file path and inode, which is usually less disruptive than deleting an active file, but it still destroys log history. Use it only when the operational risk of a full partition is higher than the loss of local log content.
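If an active log was already deleted instead of truncated, its space stays allocated while a process still holds the file open. The sketch below lists such files by reading /proc directly, assuming a Linux /proc and root privileges to see other processes' descriptors; deleted_open_files is a hypothetical helper.

```shell
# Sketch: list open file descriptors whose target under PREFIX was deleted.
# The kernel shows these targets as "<path> (deleted)" in /proc/<pid>/fd.
# Descriptors that cannot be read (permissions, exited PIDs) are skipped.
# deleted_open_files is a hypothetical helper.
deleted_open_files() {
  local prefix="${1:-/storage/log}" fd target pid
  for fd in /proc/[0-9]*/fd/*; do
    target="$(readlink "$fd" 2>/dev/null)" || continue
    case "$target" in
      "$prefix"/*" (deleted)")
        pid="${fd#/proc/}"
        echo "PID ${pid%%/*}: $target"
        ;;
    esac
  done
}

# Example:
# deleted_open_files /storage/log
```

The space is normally released only when the owning process closes the file, which on the appliance usually means restarting that service through service-control after evidence is preserved.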

Option D: Resize /storage/log When It Is Truly Undersized

Sometimes cleanup is not enough. Older appliances upgraded across multiple major versions can have a log partition that is too small for current support bundle and logging behavior.

Broadcom’s KB for vCenter 8.0 states that from 8.0 U3h onward, /storage/log is 50 GB by default for fresh deployments, major upgrades from 7.x to 8.x, and patching using reduced downtime upgrade. It also recommends increasing /storage/log to 50 GB after in-place updates to 8.0 U3h and later.

Before resizing:

Confirm you have a current backup.

Confirm there are no VM snapshots blocking disk expansion.

Confirm the exact virtual disk mapped to /storage/log.

Follow the Broadcom procedure for your version.

Do not resize the wrong VMDK.

Broadcom’s 50 GB KB calls out a full file-level backup before extending disks and removing snapshots if disk extension is unavailable.
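Before following the resize procedure, df can show which block device and filesystem size currently sit behind /storage/log. A sketch, assuming GNU df; fs_info is a hypothetical helper. On the appliance the device is usually an LVM logical volume, so mapping it to a specific VMDK still needs the version-specific Broadcom steps.

```shell
# Sketch: print "<device> <size in GiB, rounded down> <mount point>" for a path.
# df -Pk reports total size in 1024-byte blocks (column 2); 1048576 KiB = 1 GiB.
# fs_info is a hypothetical helper.
fs_info() {
  df -Pk "${1:-/storage/log}" | awk 'NR==2 {printf "%s %d %s\n", $1, $2 / 1048576, $NF}'
}

# Example:
# fs_info /storage/log
```

Expect the reported size to be a little below the configured disk size because of filesystem overhead.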

Stage 7: Validate Recovery

After cleanup or resizing, validate from the appliance and the management interfaces.

df -hT /storage/log
df -ih /storage/log

# Check service state.
service-control --status --all

If vmware-vpxd or related services stopped because disk pressure crossed the protection threshold, do not restart services until /storage/log has meaningful headroom and the growth source is controlled.
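A simple pre-restart gate can support that rule: check usage against a ceiling and report growth over a short window. A sketch under assumptions: the 70% ceiling and 60-second window are illustrative defaults, not Broadcom values, and restart_gate, used_kb, and used_pct are hypothetical helpers.

```shell
# Sketch: succeed only if usage is below MAX_PCT; always print the growth in KB
# observed over INTERVAL seconds so the operator can judge the growth source.
# Hypothetical helpers; assumes GNU df.
used_kb()  { df -Pk "$1" | awk 'NR==2 {print $3}'; }
used_pct() { df -Pk "$1" | awk 'NR==2 {print 0+$5}'; }

restart_gate() {
  local path="${1:-/storage/log}" max_pct="${2:-70}" interval="${3:-60}" a b pct
  a="$(used_kb "$path")"
  sleep "$interval"
  b="$(used_kb "$path")"
  pct="$(used_pct "$path")"
  echo "usage=${pct}% growth_kb=$(( b - a )) over ${interval}s"
  [ "$pct" -lt "$max_pct" ]
}

# Example: restart only if the gate passes and the reported growth looks controlled.
# restart_gate /storage/log 70 300 && service-control --start vmware-vpxd
```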

When ready:

service-control --start vmware-vpxd

Then validate:

VAMI health status.

vSphere Client login.

Recent tasks and events.

SDDC Manager inventory and health, if this is a VCF environment.

Any failed lifecycle workflow that may need retry.

Whether /storage/log continues growing after cleanup.

A good recovery is not “the alarm is gone.”

A good recovery is “the alarm is gone, services are stable, support data is preserved, and the growth pattern is understood.”

Stage 8: Prevent Recurrence With Monitoring

Broadcom’s monitoring guidance for vCenter appliance disks includes monitoring disk use, configuring alarms, setting email alerts, managing statistics level, retention policy, and burst events. It also notes that alarms specific to /storage/seat and /storage/log filling were introduced in vCenter Appliance 6.7.

At minimum, monitor:

| Signal | Warning | Critical | Why It Matters |
|---|---|---|---|
| /storage/log capacity | 75% | 85% | Aligns with VAMI behavior for log disk exhaustion |
| /storage/log inodes | 75% | 85% | Catches many-small-file failures |
| Growth rate | >5% per day | >10% per day | Detects runaway logging early |
| Support bundle count | Any stale bundle | Multiple stale bundles | Prevents artifacts from consuming logs |
| vCenter service health | Any stopped service | vmware-vpxd stopped | Protects management-plane availability |
| Upgrade readiness | Precheck warning | Lifecycle failure | Prevents VCF upgrade disruption |

Here is a simple external SSH-based monitoring example. This is intentionally read-only and should run from an approved monitoring host. In production, prefer your standard monitoring platform, VMware Aria Operations / VCF Operations, SNMP, syslog, or enterprise observability tooling where available.

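#!/usr/bin/env bash
set -euo pipefail

VCENTER="vcsa01.lab.local"
WARN=75
CRIT=85

# Run df remotely but parse locally, so awk's $5 is never expanded by a shell.
# Assumes the remote account's login shell accepts non-interactive commands.
usage="$(
  ssh -o BatchMode=yes root@"$VCENTER" 'df -P /storage/log' \
    | awk 'NR==2 {gsub(/%/, "", $5); print $5}'
)" || usage=""

# An SSH failure or unexpected output is UNKNOWN (exit 3), not OK.
if ! [[ "$usage" =~ ^[0-9]+$ ]]; then
  echo "UNKNOWN: could not read /storage/log usage on $VCENTER"
  exit 3
fi

if [ "$usage" -ge "$CRIT" ]; then
  echo "CRITICAL: $VCENTER /storage/log is ${usage}% used"
  exit 2
elif [ "$usage" -ge "$WARN" ]; then
  echo "WARNING: $VCENTER /storage/log is ${usage}% used"
  exit 1
else
  echo "OK: $VCENTER /storage/log is ${usage}% used"
  exit 0
fi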

What to modify:

Replace vcsa01.lab.local with the appliance FQDN.

Use your approved authentication model.

Set thresholds to match your operational policy.

Send results into your monitoring system rather than relying on terminal output.

This script should not become the only control. It is a lightweight example of the signal you want your monitoring platform to track continuously.
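The growth-rate row in the monitoring table needs two samples rather than one. A sketch, assuming your monitoring system stores the previous usage percentage and its timestamp, and reading the table's percentages as percentage points of capacity per day; growth_per_day and growth_state are hypothetical helpers using integer math.

```shell
# Sketch: scale the change between two usage samples to a 24-hour rate.
# Integer division truncates, so small changes over short windows round down.
# growth_per_day is a hypothetical helper.
growth_per_day() {
  local prev_pct="$1" cur_pct="$2" elapsed_s="$3"
  if [ "$elapsed_s" -le 0 ]; then
    echo "elapsed seconds must be greater than 0" >&2
    return 1
  fi
  echo $(( (cur_pct - prev_pct) * 86400 / elapsed_s ))
}

# Thresholds from the table: >5 points/day = WARNING, >10 = CRITICAL.
# growth_state is a hypothetical helper.
growth_state() {
  if [ "$1" -gt 10 ]; then
    echo "CRITICAL"
  elif [ "$1" -gt 5 ]; then
    echo "WARNING"
  else
    echo "OK"
  fi
}

# Example: 60% at the previous sample, 67% now, 24 hours apart.
# growth_state "$(growth_per_day 60 67 86400)"   # 7 points/day -> WARNING
```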

Command Reference

Enter Bash from appliance shell: shell.set --enabled True, then shell

Show filesystem usage: df -h

Show high-usage filesystems: df -h | awk '0+$5 >= 78 {print}'

Show inode usage: df -ih

Check /storage/log directly: df -hT /storage/log

Find largest files (run from /storage/log): find . -type f -print0 | xargs -0 du -h | sort -rh | head -n 20

Find top directories: du -xhd1 /storage/log | sort -hr | head -n 20

Find files over 100 MB: find /storage/log -xdev -type f -size +100M -exec ls -lh {} \;

Generate default support bundle: vc-support -l

Generate support bundle elsewhere: vc-support -w /storage/core

Check all service state: service-control --status --all

Start vCenter service: service-control --start vmware-vpxd

Troubleshooting Notes and Gotchas

The Largest File Is Not Always Safe to Delete

A large support bundle is different from a large active service log. A support bundle may be removable after download. An active service log may be the only local evidence of a repeated service failure.

A Successful Cleanup Does Not Prove the Issue Is Fixed

If a service is writing several GB per hour, deleting old files only buys time. Watch the growth rate after cleanup.

Support Bundle Collection Can Fail Because /storage/log Is Already Full

Broadcom documents failed or incomplete log bundle collection when /storage/log has high usage or low available space and recommends using an alternate destination with vc-support -w when needed.

VCF Lifecycle Failures May Need Workflow Retry

If /storage/log filled during a VCF-triggered vCenter upgrade, clearing space is not the whole recovery. You may need to retry the SDDC Manager workflow or follow the specific Broadcom guidance for the failed state.

Some Fixes Are Version-Specific

Do not apply a workaround for vCenter 7.0 to vCenter 8.0 just because the filename looks similar. Match the exact vCenter version, build, file path, and symptom pattern.

Final Operational Checklist

Before declaring the incident resolved, confirm:

/storage/log has safe headroom.

Inode usage is healthy.

vCenter services are running.

VAMI storage health is green or understood.

vSphere Client login works.

SDDC Manager health is clean, if applicable.

Support evidence was preserved.

Any generated support bundle was downloaded and removed if no longer needed.

The culprit file or directory was mapped to a cause.

Monitoring exists for capacity and growth rate.

A follow-up patch, resize, or configuration action is tracked.

Conclusion

The temptation with /storage/log exhaustion is to treat it as a disk cleanup task. That is too narrow.

A full vCenter log partition can affect supportability, lifecycle operations, and management-plane availability. In VCF, it can also interfere with SDDC Manager-driven workflows. The better pattern is disciplined and repeatable: confirm the partition, preserve evidence, identify the growth source, collect logs safely, restore headroom, validate services, and monitor for recurrence.

The goal is not just to make the alarm disappear.

The goal is to leave vCenter healthier than it was before the alert fired.

External Links and Sources

Broadcom KB 313077: vCenter log disk exhaustion or /storage/log full

Broadcom KB 312194: Location of vCenter Server log files

Broadcom KB 318953: vCenter Server Appliance disk space is full

Broadcom KB 376654: vCenter upgrade fails from SDDC Manager due to /storage/log full

Broadcom KB 330178: Collecting diagnostic information for VMware vCenter Server

Broadcom KB 418919: Unable to collect vCenter Server log bundles due to /storage/log full

Broadcom KB 312587: Collecting limited and specific information using vc-support manifests

Broadcom KB 323151: Support bundle files are not removed from /storage/log after export

Broadcom KB 409413: Increase /storage/log to 50 GB for vCenter Server 8.0 guidance

Broadcom KB 318930: Monitor and prevent vCenter Appliance disks from filling


The post The vCenter Log Partition Runbook: Find Growth, Preserve Evidence, Restore Headroom appeared first on Digital Thought Disruption.