Step-by-Step Guide to Switching from STCS to LCS in Cassandra v4.x/v5.x
Transitioning a production table from Size-Tiered Compaction Strategy (STCS) to Leveled Compaction Strategy (LCS) fundamentally rewrites the SSTable lifecycle, shifting I/O patterns from write-optimized size-tiered batching to read-optimized leveled merging. This migration is not a metadata toggle; it triggers a full background compaction that reorganizes every existing SSTable into a strict, non-overlapping level hierarchy. Operators must execute this transition with deterministic validation, controlled throughput, and automated rollback capabilities. A rigorous grasp of Cassandra Architecture & Compaction Fundamentals and the operational trade-offs documented in Understanding STCS vs LCS vs TWCS is required before touching production schemas. The following workflow provides an idempotent, automation-ready procedure for Cassandra v4.x/v5.x clusters, integrating explicit safety gates, real-time monitoring, and post-migration repair synchronization.
The flow below summarizes the migration stages and the rollback branch that returns the table to STCS if guardrails breach.
1. Pre-Migration Validation & Safety Gates
LCS compaction requires substantial temporary disk space and sustained I/O to stage overlapping SSTables during the rewrite. Execute the following validation sequence on every target node before initiating schema changes.
1.1 Disk Space & I/O Headroom Verification
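During the rewrite, plan for a peak disk footprint of roughly 2.0–2.5× the current data size, because source SSTables remain on disk until the compaction that replaces them finishes. Validate available space using a deterministic Python check.
Safety Check: Abort if free space falls below current_data_size × (multiplier − 1.0). With the default multiplier of 2.2 used in the check below, that is 1.2× the current data size.
Expected Output: True with logged headroom metrics, or RuntimeError reporting the available and required free space.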
Rollback Path: Expand storage volumes or archive cold partitions before proceeding.
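A filesystem-level check counts everything stored on the volume, including snapshots and unrelated files. As a complement, the sketch below estimates one table's live footprint by walking its directory. The helper names (live_data_bytes, required_free_bytes) are hypothetical, and the sketch assumes that the snapshots and backups subdirectories should be skipped because they hold hard links to SSTables rather than additional live data.

```python
import os

# Assumption: these subdirectories hold hard-linked copies, not extra live data.
EXCLUDED_DIRS = {"snapshots", "backups"}


def live_data_bytes(table_dir: str) -> int:
    """Sum the sizes of regular files under table_dir, skipping excluded dirs."""
    total = 0
    for root, dirs, files in os.walk(table_dir):
        # Prune in place so os.walk does not descend into excluded directories.
        dirs[:] = [d for d in dirs if d not in EXCLUDED_DIRS]
        for name in files:
            path = os.path.join(root, name)
            if os.path.isfile(path) and not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def required_free_bytes(live_bytes: int, multiplier: float = 2.2) -> int:
    """Extra space needed on top of existing data for a given peak multiplier."""
    return int(live_bytes * (multiplier - 1.0))
```

Comparing required_free_bytes(live_data_bytes(table_dir)) against the free space reported for the volume gives a per-table view of the same headroom rule.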
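
import logging
import shutil


def validate_disk_headroom(data_dir: str, multiplier: float = 2.2) -> bool:
    """Raise RuntimeError unless the volume holding data_dir has enough free space."""
    # disk_usage reports on the whole filesystem that contains data_dir,
    # not on the directory alone.
    usage = shutil.disk_usage(data_dir)
    current_size = usage.used
    # Peak footprint is current_size * multiplier, so the extra space
    # needed on top of existing data is current_size * (multiplier - 1).
    required_free = current_size * (multiplier - 1.0)
    if usage.free < required_free:
        raise RuntimeError(
            f"Insufficient free space: {usage.free / 1e9:.1f}GB available, "
            f"{required_free / 1e9:.1f}GB required for LCS transition."
        )
    logging.info(f"Disk headroom validated: {usage.free / 1e9:.1f}GB free, {required_free / 1e9:.1f}GB required.")
    return True

1.2 Compaction Backlog & Pending Repairs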
Never initiate an LCS migration while major compactions or full repairs are active. Concurrent compaction threads will saturate disk I/O and trigger read timeouts.
Safety Check: Assert zero pending compaction tasks and empty repair sessions.
Expected Output: 0 pending tasks, and no repair entries in netstats other than the Read Repair Statistics header, which is always printed.
Rollback Path: Defer migration until nodetool compactionstats and nodetool netstats report idle states.

# Verify compaction queue
nodetool compactionstats | grep -i "pending tasks" | awk '{print $NF}'
# Expected: 0

# Verify repair state (filter out the always-present "Read Repair Statistics" header)
nodetool netstats | grep -i "repair" | grep -vi "read repair"
# Expected: empty output

1.3 Schema Consistency & Snapshot Creation
Ensure all nodes agree on the current schema, then capture a point-in-time snapshot. Once SSTables have been re-leveled they cannot be returned to their previous layout in place; a snapshot is the only way to restore the original SSTable files.
Safety Check: Single UUID across all nodes in describecluster. Snapshot must complete with zero errors.
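Expected Output: A single schema version UUID followed by the addresses of all nodes. Snapshot created under data/KEYSPACE/TABLE-<table_id>/snapshots/pre_lcs_migration/.
Rollback Path: If schema versions disagree, run nodetool resetlocalschema on the out-of-sync node (or restart it) and re-check before proceeding. Repair does not resolve schema disagreement.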
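For automation, the schema-agreement gate can be expressed as a parser over the describecluster text. The sketch below is illustrative only: the function names are hypothetical, and it assumes the output lists each version as an indented "UUID: [addr, addr]" line under a "Schema versions:" header, with unreachable nodes grouped under an "UNREACHABLE" key.

```python
import re

# Assumed line shapes under the "Schema versions:" header:
#   86afa796-d883-3932-aa73-6b017cef0d19: [10.0.0.1, 10.0.0.2]
#   UNREACHABLE: [10.0.0.3]
_VERSION_LINE = re.compile(r"^\s*([0-9a-fA-F-]{36}|UNREACHABLE):\s*\[(.*)\]\s*$")


def schema_versions(describecluster_output: str) -> dict:
    """Map each schema version (or 'UNREACHABLE') to its list of node addresses."""
    versions = {}
    in_section = False
    for line in describecluster_output.splitlines():
        if line.strip().startswith("Schema versions"):
            in_section = True
            continue
        if not in_section:
            continue
        m = _VERSION_LINE.match(line)
        if m:
            versions[m.group(1)] = [n.strip() for n in m.group(2).split(",") if n.strip()]
        elif line.strip():
            break  # first non-blank, non-matching line ends the section
    return versions


def schema_in_agreement(describecluster_output: str) -> bool:
    """True only if exactly one version is reported and no node is unreachable."""
    versions = schema_versions(describecluster_output)
    return len(versions) == 1 and "UNREACHABLE" not in versions
```

A pipeline could pass the stdout of nodetool describecluster to schema_in_agreement and stop before the snapshot step when it returns False.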

# Schema validation (the UUIDs appear on the lines after the "Schema versions" header)
nodetool describecluster | grep -A 3 "Schema versions"
# Expected: Single UUID mapping to all nodes

# Snapshot creation (replace KEYSPACE/TABLE)
nodetool snapshot -t pre_lcs_migration --table TABLE KEYSPACE
# Expected: Snapshot directory created successfully

2. Idempotent Schema Transition & Throughput Control
The schema change must be idempotent to prevent duplicate execution in automated pipelines. Wrap the ALTER TABLE statement in a Python routine that verifies the current strategy, applies compaction throttling, and executes the transition.
Safety Check: Verify current compaction class is SizeTieredCompactionStrategy before altering. Cap compaction throughput to 50 MB/s to prevent I/O starvation.
Expected Output: Compaction strategy updated to LeveledCompactionStrategy. Background compaction begins immediately.
Rollback Path: Revert to STCS via ALTER TABLE ... WITH compaction = {'class': 'SizeTieredCompactionStrategy'} if latency exceeds SLOs. Note: Reverting triggers another full compaction.

import subprocess


def apply_lcs_migration(keyspace: str, table: str, host: str = "127.0.0.1", port: int = 9042) -> None:
    """Switch keyspace.table to LCS, skipping the change if it is already applied."""
    # Idempotency check: read the table's current compaction settings
    check_cmd = ["cqlsh", host, str(port), "-e",
                 f"SELECT compaction FROM system_schema.tables WHERE keyspace_name='{keyspace}' AND table_name='{table}';"]
    result = subprocess.run(check_cmd, capture_output=True, text=True, check=True)
    if "LeveledCompactionStrategy" in result.stdout:
        print("Table already using LCS. Skipping migration.")
        return
    # Safety check: only migrate tables that are currently on STCS
    if "SizeTieredCompactionStrategy" not in result.stdout:
        raise RuntimeError("Table is not using STCS; refusing to alter compaction.")

    # Throttle compaction (MB/s) to protect production I/O. This nodetool
    # setting applies to the local node only, so repeat it on every node.
    throttle_cmd = ["nodetool", "setcompactionthroughput", "50"]
    subprocess.run(throttle_cmd, check=True)

    # Execute schema change
    alter_cmd = ["cqlsh", host, str(port), "-e",
                 f"ALTER TABLE {keyspace}.{table} WITH compaction = {{'class': 'LeveledCompactionStrategy'}};"]
    subprocess.run(alter_cmd, check=True)
    print("LCS schema transition initiated. Monitor compaction progress.")

3. Real-Time Compaction Monitoring & I/O Guardrails
Background compaction must be polled continuously. Automate monitoring to detect stalls, excessive pending tasks, or disk pressure. Use the official Apache Cassandra compaction documentation as a reference for metric interpretation.
Safety Check: Poll nodetool compactionstats every 30 seconds. Abort if pending tasks exceed 500 or disk usage crosses 85%.
Expected Output: Continuous progress logs showing Progress: X/Y bytes merged together with the pending task count. Completion message when pending tasks hit 0 and no compactions are active.
Rollback Path: If the pending-task guardrail breaches, run nodetool setcompactionthroughput 0 (unthrottled) only if disk space and I/O headroom permit, or revert the schema as documented in Section 5.
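The disk-pressure half of the safety check (abort above 85% usage) can be kept as a small, separately testable helper that a polling loop calls on each iteration. This is a minimal sketch with hypothetical function names. It measures the whole filesystem containing data_dir, which is an assumption about how "disk usage" should be interpreted here.

```python
import shutil


def disk_used_pct(used_bytes: int, total_bytes: int) -> float:
    """Percentage of the volume in use."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * used_bytes / total_bytes


def check_disk_guardrail(data_dir: str, max_used_pct: float = 85.0) -> float:
    """Return current usage, or raise RuntimeError once it exceeds the guardrail."""
    usage = shutil.disk_usage(data_dir)  # whole filesystem holding data_dir
    pct = disk_used_pct(usage.used, usage.total)
    if pct > max_used_pct:
        raise RuntimeError(
            f"Disk usage {pct:.1f}% exceeds guardrail of {max_used_pct:.1f}%. "
            "Abort and evaluate rollback."
        )
    return pct
```

Calling check_disk_guardrail once per poll, next to the pending-task check, covers both abort conditions with the same RuntimeError convention.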

import subprocess
import time
import re


def monitor_lcs_compaction(poll_interval: int = 30, max_pending: int = 500):
    # `nodetool compactionstats` emits text (no JSON). The first line reads
    # "pending tasks: N", followed by a table whose rows carry the per-task
    # "completed" and "total" byte columns.
    pending_re = re.compile(r"pending tasks:\s*(\d+)", re.IGNORECASE)
    while True:
        stats = subprocess.run(
            ["nodetool", "compactionstats"],
            capture_output=True, text=True, check=True
        )
        lines = stats.stdout.splitlines()
        pending = 0
        for line in lines:
            m = pending_re.search(line)
            if m:
                pending = int(m.group(1))
                break
        # Sum per-row progress from the active-compaction table. Columns are:
        # id  compaction type  keyspace  table  completed  total  unit  progress
        completed = total = 0
        for line in lines:
            cols = line.split()
            if len(cols) >= 8 and cols[-4].isdigit() and cols[-3].isdigit():
                completed += int(cols[-4])
                total += int(cols[-3])
        if pending == 0 and total == 0:
            print("LCS compaction complete. Proceed to repair synchronization.")
            break
        if pending > max_pending:
            raise RuntimeError(f"Compaction backlog critical: {pending} pending tasks. Abort and evaluate rollback.")
        print(f"Progress: {completed}/{total} bytes merged. Pending tasks: {pending}")
        time.sleep(poll_interval)

4. Post-Migration Repair Synchronization
LCS migration rewrites SSTables but does not reconcile partition-level inconsistencies introduced during the transition. Run incremental repairs to synchronize replicas. Automate repair queuing to prevent concurrent repair storms across the cluster.
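Safety Check: Verify nodetool netstats shows no active streams before starting. Use -pr (primary range) to limit scope; because -pr repairs only the ranges a node owns as primary, the repair must be run on every node to cover the full ring.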
Expected Output: Repair session [UUID] completed successfully. nodetool tablestats shows SSTable count aligned with LCS level distribution.
Rollback Path: If repair fails due to tombstone overload, increase gc_grace_seconds temporarily or run nodetool repair -full on the affected range.
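One way to avoid concurrent repair storms is to run the primary-range repair node by node and stop at the first failure. The sketch below illustrates that queue. The function names and the injectable runner parameter are hypothetical, and it assumes each node is reachable over JMX through nodetool's -h option from the machine running the script.

```python
import subprocess
from typing import Callable, Iterable, List


def build_repair_cmd(host: str, keyspace: str, table: str) -> List[str]:
    # -h selects the node to contact; -pr limits repair to its primary ranges.
    return ["nodetool", "-h", host, "repair", "-pr", keyspace, table]


def run_repair_queue(
    hosts: Iterable[str],
    keyspace: str,
    table: str,
    runner: Callable[..., object] = subprocess.run,
) -> List[str]:
    """Repair one node at a time; return the hosts that completed."""
    completed: List[str] = []
    for host in hosts:
        # With subprocess.run, check=True raises CalledProcessError on a
        # non-zero exit, which stops the queue before the next node starts.
        runner(build_repair_cmd(host, keyspace, table), check=True)
        completed.append(host)
    return completed
```

Passing a stub as runner allows the ordering logic to be exercised in a pipeline test without touching a live cluster.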
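
import subprocess
import logging


def execute_post_migration_repair(keyspace: str, table: str):
    # Safety: verify no repair streams are active. Lines are matched on a
    # leading "Repair " (an assumption about how netstats labels repair streams)
    # so the always-present "Read Repair Statistics" header is ignored.
    netstats = subprocess.run(["nodetool", "netstats"], capture_output=True, text=True, check=True)
    if any(line.strip().startswith("Repair ") for line in netstats.stdout.splitlines()):
        logging.warning("Active repair detected. Deferring until idle.")
        return
    repair_cmd = ["nodetool", "repair", "-pr", keyspace, table]
    try:
        # Capture output so stderr is available for logging on failure
        subprocess.run(repair_cmd, capture_output=True, text=True, check=True)
        logging.info("Incremental repair completed successfully.")
    except subprocess.CalledProcessError as e:
        logging.error(f"Repair failed: {e.stderr}")
        raise

5. Failure Recovery & Rollback Execution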
If migration causes unacceptable read latency, disk exhaustion, or compaction stalls, execute a deterministic rollback. Reverting to STCS is safe but triggers a secondary full compaction. Use the pre-migration snapshot only if data corruption or SSTable loss occurs.
Safety Check: Verify cluster health via nodetool status. Ensure no client write failures are occurring.
Expected Output: Schema reverts to STCS. Background compaction begins merging leveled SSTables into size-tiered groups.
Rollback Path: If snapshot restore is required, stop the node, move the snapshot aside, clear the live SSTables, copy the snapshot back, restart the node, and run nodetool refresh KEYSPACE TABLE to load the restored SSTables.

# Schema rollback (idempotent; replace KEYSPACE/TABLE)
cqlsh -e "ALTER TABLE KEYSPACE.TABLE WITH compaction = {'class': 'SizeTieredCompactionStrategy'};"

# Verify rollback initiation (active rows show "Compaction" in the type column)
nodetool compactionstats | grep -w "Compaction"
# Expected: Active compaction tasks merging L0-Ln SSTables

# Snapshot restore (only if data integrity is compromised)
nodetool stopdaemon
# The on-disk table directory name carries a table ID suffix; substitute the real path.
TABLE_DIR=/var/lib/cassandra/data/KEYSPACE/TABLE-TABLE_ID
# Move the snapshot aside FIRST so clearing live SSTables cannot destroy it.
mv "$TABLE_DIR/snapshots/pre_lcs_migration" /var/lib/cassandra/restore_pre_lcs_migration
# Remove live SSTables but preserve the snapshots directory.
find "$TABLE_DIR" -maxdepth 1 -type f -delete
# Copy the snapshot data back into the live table directory.
cp -r /var/lib/cassandra/restore_pre_lcs_migration/* "$TABLE_DIR/"
sudo systemctl start cassandra
# Load the restored SSTables without restreaming from peers.
nodetool refresh KEYSPACE TABLE

Operational Notes for Automation Builders
- Python Integration: Use the subprocess module with explicit check=True and timeout parameters to prevent hanging automation pipelines. Refer to the official Python subprocess documentation for advanced stream handling.
- Node Management: Distribute schema changes via a rolling coordinator. Never execute ALTER TABLE concurrently on multiple nodes for the same table.
- Repair Topology: LCS reduces read amplification but increases write amplification during migration. Schedule the transition during low-traffic windows and monitor ReadLatency and WriteLatency via JMX or Prometheus exporters.
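
The first note above can be illustrated with a thin wrapper around subprocess.run. This is a sketch, not a prescribed interface: the name run_with_timeout and the 120-second default are assumptions, and returning None on timeout is one possible convention (raising would be equally reasonable).

```python
import subprocess
from typing import List, Optional


def run_with_timeout(cmd: List[str], timeout_s: float = 120.0) -> Optional[str]:
    """Run cmd and return its stdout, or None if it exceeds timeout_s.

    A non-zero exit status still raises CalledProcessError (check=True).
    """
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, check=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising TimeoutExpired.
        return None
    return result.stdout
```

For example, run_with_timeout(["nodetool", "compactionstats"], timeout_s=30) lets a polling loop treat an unresponsive node as a distinct condition instead of blocking indefinitely.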