Configuring TWCS for IoT Sensor Data Streams in Cassandra 4.x/5.x

IoT telemetry pipelines generate high-velocity, append-only time-series data that demand deterministic write paths and bounded storage growth. This runbook configures TimeWindowCompactionStrategy (TWCS) for a sensor-ingest table, sizing its windows against retention TTL so expired data leaves the disk by whole-file deletion rather than expensive scans. Use it when your table is append-mostly, timestamp-ordered, and TTL-bounded, and you need a repeatable deployment that will not destabilize a live cluster. It assumes Cassandra 4.0, 4.1, or 5.0, cassandra-driver >= 3.25.0 on Python 3.10+, and a nodetool/JMX path to every target node. It sits under Strategy Selection for Time-Series Workloads; read that first, because the TWCS window-sizing guidance there — roughly 20–30 active windows across the data lifetime — is the input every step below assumes you have already computed. Picking TWCS at all is a decision upstream of this page, worked through in the STCS, LCS, and TWCS trade-offs; this runbook takes that choice as given and hardens the deployment.
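As a rough illustration of that sizing input, the sketch below counts how many windows a given TTL spans and suggests a whole-hour window width for a target count. The function names and the default target of 25 windows are assumptions made for this example; they do not come from Cassandra or from the parent guide.

```python
import math

# Minutes per TWCS compaction_window_unit value.
UNIT_MINUTES = {"MINUTES": 1, "HOURS": 60, "DAYS": 1440}


def window_count(ttl_days: int, window_size: int, window_unit: str) -> int:
    """Number of windows that a TTL of ttl_days spans, rounded up."""
    window_minutes = window_size * UNIT_MINUTES[window_unit]
    return math.ceil(ttl_days * 1440 / window_minutes)


def suggest_window_hours(ttl_days: int, target_windows: int = 25) -> int:
    """Whole-hour window width giving about target_windows (assumed target)."""
    return max(1, round(ttl_days * 24 / target_windows))


if __name__ == "__main__":
    print(window_count(30, 1, "DAYS"))    # windows for a 1-day width
    print(window_count(30, 6, "HOURS"))   # windows for a 6-hour width
    print(suggest_window_hours(30))       # suggested width in hours
```

For a 30-day TTL, a 1-day window spans 30 windows and a 6-hour window spans 120, so the first sits at the edge of the 20 to 30 guidance and the second is well outside it.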

Misconfigured window sizes, uncoordinated repair cycles, or unbounded compaction queues will degrade cluster stability faster on IoT workloads than on any other profile, because ingest never pauses. The failure that matters most is a write whose timestamp lands in an already-closed window — a backfill or a skewed sensor clock — which forces Cassandra to merge cold data back into a sealed bucket and defeats the entire model. The procedure below is idempotent, gates on live cluster state before every mutation, and gives an explicit rollback at each phase.

Figure: Windows sized against the TTL age leftward and drop whole at the boundary; only the newest window merges. A late or skewed write reopens a sealed window, which is the failure mode this runbook guards against.
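The late-write condition can be checked at ingest time. The sketch below assumes windows are aligned to the Unix epoch (each window starts at a multiple of its width in seconds) and uses hypothetical helper names; treat it as an illustration of the idea rather than Cassandra's exact bucketing code.

```python
def window_start(ts_seconds: int, window_seconds: int) -> int:
    """Lower bound of the epoch-aligned window that contains ts_seconds."""
    return (ts_seconds // window_seconds) * window_seconds


def lands_in_closed_window(write_ts: int, now_ts: int, window_seconds: int) -> bool:
    """True when the write's timestamp belongs to a window older than the active one."""
    return window_start(write_ts, window_seconds) < window_start(now_ts, window_seconds)


if __name__ == "__main__":
    six_hours = 6 * 3600
    now = 100_000
    # One second before the active window opened: flagged as late.
    print(lands_in_closed_window(86_399, now, six_hours))
    # Exactly at the start of the active window: accepted.
    print(lands_in_closed_window(86_400, now, six_hours))
```

An ingest service could use a check like this to divert flagged writes instead of letting them reach the TWCS table.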

Pre-conditions & safety gates

Before altering the compaction strategy or scheduling repair, enforce explicit cluster-health checks. TWCS assumes predictable write rates; introducing it during an active compaction storm or a repair backlog causes SSTable overlap and read-latency spikes. These checks are read-only — run the full sequence on a representative node and stop if any gate fails.

# 1. Node state and disk headroom. TWCS merges the active window aggressively;
#    thin headroom triggers OutOfSpaceException when a window closes.
nodetool status | grep -E "^(UN|DN)"
nodetool compactionstats --human-readable

Safety Check: Reject deployment if pending tasks > 20, if active compactions exceed 4, or if any node's data volume is more than 75% full. nodetool status reports Load rather than free space, so confirm headroom with df -h on the data directory of each node. -H is the short form of --human-readable; the two are interchangeable. Expected Output:

UN  10.0.1.12  2.14 GiB   256  100.0%  abcdef-1234  rack1
UN  10.0.1.13  2.11 GiB   256  100.0%  abcdef-5678  rack1
pending tasks: 3

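Rollback Path: If thresholds are breached, halt deployment. Reclaim space by archiving cold data or removing stale snapshots; nodetool cleanup <keyspace> <table> helps only after a token-range change, because it removes nothing but data the node no longer owns. Alternatively, temporarily raise max_threshold on existing tables to reduce merge frequency until utilization drops below 65%.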
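The first gate can also be scripted. The sketch below parses the pending-task count from compactionstats text and measures volume usage with the standard library. The function names are hypothetical, the thresholds repeat the ones stated above, and the active-compaction count is left out for brevity.

```python
import re
import shutil

MAX_PENDING = 20       # pending-task threshold from the safety check
MAX_DISK_USED = 0.75   # maximum used fraction of the data volume


def parse_pending_tasks(compactionstats_output: str) -> int:
    """Extract N from the 'pending tasks: N' line of nodetool compactionstats."""
    match = re.search(r"pending tasks:\s*(\d+)", compactionstats_output)
    if match is None:
        raise ValueError("no 'pending tasks' line found")
    return int(match.group(1))


def disk_used_fraction(path: str) -> float:
    """Used fraction (0.0 to 1.0) of the filesystem that holds path."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def gate_passes(pending: int, used_fraction: float) -> bool:
    """True only when both the queue and the disk are under their thresholds."""
    return pending <= MAX_PENDING and used_fraction <= MAX_DISK_USED


if __name__ == "__main__":
    sample = "pending tasks: 3\n"
    # "." stands in for the Cassandra data directory in this example.
    print(gate_passes(parse_pending_tasks(sample), disk_used_fraction(".")))
```

In practice the sample string would be replaced by the captured output of nodetool compactionstats, and the path by the node's data directory.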

# 2. Repair baseline — no anti-entropy may be in flight during a strategy change.
nodetool netstats | grep -i "repair" | grep -vi "read repair"

Safety Check: Ensure zero active repair sessions. nodetool netstats always prints a Read Repair Statistics header, so the second grep filters it out; anything left is a repair stream. Running a repair concurrently with a compaction-strategy change causes tombstone accumulation, streaming failures, and inconsistent anti-entropy repair state. Expected Output: Empty output when no session is active, or one or more Repair lines when one is. Rollback Path: If active repairs are detected, defer the migration and poll nodetool netstats until they complete. If a repair hangs, terminate in-flight sessions via the JMX StorageService.forceTerminateAllRepairSessions operation (or restart the affected node), then run nodetool repair -pr on a single node to re-establish a clean streaming state before proceeding.
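The polling step in the rollback path could look like the sketch below. It treats any netstats line that mentions repair as an active session, except the Read Repair Statistics header that netstats prints on every run. The function names, the 30-second interval, and the 60-poll limit are assumptions for illustration.

```python
import subprocess
import time
from typing import Callable


def repair_in_flight(netstats_output: str) -> bool:
    """True if any line mentions repair, ignoring the read-repair statistics header."""
    for line in netstats_output.splitlines():
        lowered = line.lower()
        if "repair" in lowered and "read repair" not in lowered:
            return True
    return False


def run_netstats() -> str:
    """Run nodetool netstats on the local node (nodetool must be on PATH)."""
    result = subprocess.run(["nodetool", "netstats"],
                            capture_output=True, text=True, check=True)
    return result.stdout


def wait_for_repairs(fetch: Callable[[], str] = run_netstats,
                     interval_s: float = 30.0, max_polls: int = 60) -> bool:
    """Poll until no repair is reported; return False if max_polls is exhausted."""
    for _ in range(max_polls):
        if not repair_in_flight(fetch()):
            return True
        time.sleep(interval_s)
    return False
```

Passing a custom fetch callable keeps the loop testable without a running node; the default shells out to nodetool.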

Implementation: idempotent TWCS schema application

The automation below applies TWCS only when the table is not already on it, preventing a redundant ALTER TABLE that would trigger unnecessary SSTable rewrites. It aligns compaction_window_size with the retention TTL and enforces strict thresholds. The same driver session pattern underpins the Python monitoring for Cassandra compaction tooling you should wire in afterward.

#!/usr/bin/env python3
# Requires: Python 3.10+, cassandra-driver >= 3.25.0, node in UN state.
"""Idempotent TWCS configuration for IoT sensor tables (Cassandra 4.x/5.x)."""
import logging
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider
from cassandra.query import SimpleStatement

logging.basicConfig(level=logging.INFO, format="%(asctime)s [%(levelname)s] %(message)s")


def get_current_compaction(session, keyspace: str, table: str) -> dict | None:
    """Fetch the table's current compaction map, or None if the table is absent."""
    stmt = SimpleStatement(
        "SELECT compaction FROM system_schema.tables "
        "WHERE keyspace_name=%s AND table_name=%s"
    )
    row = session.execute(stmt, (keyspace, table)).one()
    return row.compaction if row else None


def apply_twcs(keyspace: str, table: str, window_size: int,
               window_unit: str, ttl_days: int) -> None:
    # Placeholder credentials and contact point; load real values from your secrets store.
    auth = PlainTextAuthProvider(username="cassandra", password="cassandra")
    cluster = Cluster(contact_points=["127.0.0.1"], auth_provider=auth)
    session = cluster.connect(keyspace)
    try:
        current = get_current_compaction(session, keyspace, table)
        if current and current.get("class", "").endswith("TimeWindowCompactionStrategy"):
            logging.info("TWCS already active. Skipping idempotent application.")
            return

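        # Guard: a window longer than the TTL means data expires before its
        # window ever closes, so the whole-file drop never fires cheaply.
        # Normalize to minutes so the check covers every supported unit.
        unit_minutes = {"MINUTES": 1, "HOURS": 60, "DAYS": 1440}
        if window_unit not in unit_minutes:
            raise ValueError(f"Unsupported compaction_window_unit: {window_unit}")
        if window_size * unit_minutes[window_unit] > ttl_days * 1440:
            logging.warning("Window larger than TTL; expired data will accumulate uncompacted.")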

        # max_threshold/min_threshold apply to STCS-style merges WITHIN the
        # active window only; tombstone_compaction_interval is a safety net —
        # correctly sized windows drop whole SSTables and rarely need it.
        cql = f"""
            ALTER TABLE {keyspace}.{table} WITH compaction = {{
                'class': 'TimeWindowCompactionStrategy',
                'compaction_window_unit': '{window_unit}',
                'compaction_window_size': '{window_size}',
                'tombstone_compaction_interval': '86400',
                'max_threshold': '32'
            }}
        """
        logging.info("Applying TWCS: window=%s %s", window_size, window_unit)
        session.execute(cql)
        logging.info("Schema mutation applied successfully.")
    finally:
        cluster.shutdown()


def rollback_to_stcs(keyspace: str, table: str) -> None:
    """Explicit rollback to SizeTieredCompactionStrategy."""
    auth = PlainTextAuthProvider(username="cassandra", password="cassandra")
    cluster = Cluster(contact_points=["127.0.0.1"], auth_provider=auth)
    session = cluster.connect(keyspace)
    try:
        session.execute(
            f"ALTER TABLE {keyspace}.{table} "
            "WITH compaction = {'class': 'SizeTieredCompactionStrategy'}"
        )
        logging.info("Rollback to STCS executed. Monitor the compaction queue for 30 minutes.")
    finally:
        cluster.shutdown()


if __name__ == "__main__":
    # 6-hour windows over a 30-day TTL yield 120 windows. That is well above the
    # 20-30 window guidance, so use it only for a high-rate stream; 1 DAYS gives 30.
    apply_twcs("iot_telemetry", "sensor_readings",
               window_size=6, window_unit="HOURS", ttl_days=30)

Safety Check: The script reads system_schema.tables before executing ALTER TABLE, parameterizes that lookup through a SimpleStatement (values are never spliced into the WHERE clause by hand), validates window-to-TTL alignment, and shuts the cluster connection down in a finally block. The ALTER TABLE itself still interpolates the keyspace and table names, because CQL cannot bind identifiers, so pass only trusted names. Expected Output: TWCS already active. Skipping idempotent application. on a repeat run, or Schema mutation applied successfully. on first application. Rollback Path: Call rollback_to_stcs() if the compaction queue spikes above 50 pending tasks after migration, then run nodetool compact <keyspace> <table> to force a single merge pass and stabilize the SSTable distribution.
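Because the ALTER TABLE statement builds its text from the keyspace and table names, it may be worth rejecting anything that is not a plain unquoted identifier before it reaches the f-string. The helper below is a hypothetical addition rather than part of the script above; it assumes unquoted names only and the 48-character limit Cassandra applies to keyspace and table names.

```python
import re

# Unquoted CQL identifier: a letter followed by letters, digits, or underscores.
# Keyspace and table names are limited to 48 characters.
_IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_]{0,47}")


def validate_identifier(name: str) -> str:
    """Return name unchanged if it is a safe unquoted identifier, else raise."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"unsafe CQL identifier: {name!r}")
    return name


if __name__ == "__main__":
    print(validate_identifier("iot_telemetry"))
    try:
        validate_identifier("sensor_readings; DROP TABLE x")
    except ValueError as exc:
        print(exc)
```

Calling it on both arguments at the top of apply_twcs() and rollback_to_stcs() would keep malformed names out of the generated CQL.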

With the strategy in place, coordinate repair so anti-entropy never collides with the window-closing compactions. The following gate runs repair only when the compaction queue is idle.

#!/bin/bash
# repair_orchestrator.sh — gate repair on an idle compaction queue.
KEYSPACE="iot_telemetry"
TABLE="sensor_readings"
MAX_PENDING=5

PENDING=$(nodetool compactionstats | awk -F'pending tasks:' '/pending tasks:/ {print $2}' | awk '{print $1}')
PENDING=${PENDING:-0}

if [ "$PENDING" -gt "$MAX_PENDING" ]; then
    echo "ABORT: compaction queue saturated ($PENDING pending). Deferring repair."
    exit 1
fi

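echo "Initiating repair for $KEYSPACE.$TABLE..."
# nodetool repair blocks until the session finishes; a non-zero exit means it failed.
if ! nodetool repair -pr "$KEYSPACE" "$TABLE"; then
    echo "ERROR: repair exited with a failure status. Check system.log."
    exit 1
fi

# netstats always prints a "Read Repair Statistics" header, so exclude it from the count.
if [ "$(nodetool netstats | grep -i 'repair' | grep -vic 'read repair')" -eq 0 ]; then
    echo "SUCCESS: repair completed. No active sessions."
else
    echo "WARNING: repair session still active. Monitor with nodetool netstats."
fi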

Safety Check: The script aborts if pending compactions exceed MAX_PENDING. It uses -pr (primary range); on 4.x/5.x repair is incremental by default, so no separate flag is needed to enable it. Expected Output: SUCCESS: repair completed. No active sessions. or ABORT: compaction queue saturated.... Rollback Path: If repair stalls or saturates disk I/O, terminate in-flight sessions via StorageService.forceTerminateAllRepairSessions (or restart the node) and reschedule for an off-peak window. Note that nodetool cleanup only removes data no longer owned after token-range changes; it does not clear repair state.

Verification steps

Confirm the strategy took effect and the nodes stayed healthy through the window transitions.

# 1. Confirm the compaction map on the live schema.
cqlsh -e "SELECT compaction FROM system_schema.tables \
  WHERE keyspace_name='iot_telemetry' AND table_name='sensor_readings';"

Safety Check: The class must read TimeWindowCompactionStrategy and the compaction_window_unit/compaction_window_size must match the values you computed from the TTL. Expected Output:

 compaction
------------------------------------------------------------------------------
 {'class': 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy',
  'compaction_window_size': '6', 'compaction_window_unit': 'HOURS',
  'max_threshold': '32', 'min_threshold': '4',
  'tombstone_compaction_interval': '86400'}

Rollback Path: If the map still shows the previous strategy, the ALTER TABLE did not commit — re-run apply_twcs() and check for schema-agreement warnings in system.log.
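The same comparison can be automated once the compaction map has been fetched, for example with the driver query used earlier. The function below is a hypothetical helper that checks only the class and the two window options, and it relies on system_schema.tables returning option values as strings.

```python
TWCS_SUFFIX = "TimeWindowCompactionStrategy"


def compaction_mismatches(actual: dict, window_size: int, window_unit: str) -> list[str]:
    """Compare a compaction map with the intended TWCS settings.

    Returns human-readable mismatches; an empty list means the map is as expected.
    """
    problems = []
    if not actual.get("class", "").endswith(TWCS_SUFFIX):
        problems.append(f"class is {actual.get('class')!r}, not TWCS")
    # Option values come back as strings, so compare against str(window_size).
    if actual.get("compaction_window_size") != str(window_size):
        problems.append("compaction_window_size differs")
    if actual.get("compaction_window_unit") != window_unit:
        problems.append("compaction_window_unit differs")
    return problems


if __name__ == "__main__":
    live = {
        "class": "org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy",
        "compaction_window_size": "6",
        "compaction_window_unit": "HOURS",
    }
    print(compaction_mismatches(live, 6, "HOURS"))
```

An empty list means the live schema matches the intended window settings; any entries name the option that needs attention.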

# 2. Watch the active window merge and old windows stay sealed.
nodetool compactionstats --human-readable

Safety Check: Only the current window should show active compactions; sealed windows must not be rewritten. The column-by-column reading of this output is covered in interpreting nodetool compactionstats output. Expected Output: pending tasks low and any active Compaction row scoped to the newest SSTables. Rollback Path: If sealed windows keep recompacting, you have late-arriving writes landing in closed windows — see Troubleshooting below.

Once verified, hold the table to these deterministic thresholds and wire them into your monitoring stack rather than checking by hand.

Metric	Threshold	Action
`PendingCompactions`	> 20	Pause non-critical writes, run `nodetool compactionstats -H`, evaluate a `max_threshold` reduction
`TombstoneScannedHistogram` (per read)	rising	Force `nodetool repair`, validate TTL alignment, review tombstone management and garbage collection
Disk utilization	> 80%	Throttle with `nodetool setcompactionthroughput 1` or halt with `nodetool stop COMPACTION`, then expand storage or archive cold windows
Read latency (p99)	> 50 ms	Check SSTable overlap in `nodetool compactionstats`; verify window boundaries match query granularity

Safety Check: Poll these counters from the Prometheus JMX exporter and alert only when a threshold breaches for more than 5 consecutive intervals, suppressing false positives from transient I/O spikes. Rollback Path: If throttling drops write throughput below SLA, restore the prior value. The stock default is 64 on 4.1 and later but 16 on 4.0, so record the current setting with nodetool getcompactionthroughput before throttling; 0 means unlimited, not “baseline”. The reasoning behind that ceiling is in how to tune compaction_throughput_mb_per_sec safely.
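The consecutive-interval rule is simple to express in code. The class below is a minimal sketch with hypothetical names; it fires only once the metric has exceeded its threshold for more than the required number of back-to-back samples, and it resets on the first healthy sample.

```python
class ConsecutiveBreachAlert:
    """Fire only after more than `required` consecutive breaching samples."""

    def __init__(self, threshold: float, required: int = 5) -> None:
        self.threshold = threshold
        self.required = required
        self.streak = 0

    def observe(self, value: float) -> bool:
        """Record one sample and return True if the alert should fire now."""
        if value > self.threshold:
            self.streak += 1
        else:
            self.streak = 0  # any healthy sample resets the run
        return self.streak > self.required


if __name__ == "__main__":
    alert = ConsecutiveBreachAlert(threshold=20)  # PendingCompactions > 20
    samples = [25, 30, 22, 21, 40, 35, 10, 50]
    print([alert.observe(s) for s in samples])
```

With the default of 5, the alert stays silent through the first five breaching samples and fires on the sixth.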

The window lifecycle those thresholds protect is summarized below.

Figure: TWCS window lifecycle. A sealed window ages until TTL plus gc_grace elapses, then the entire SSTable is dropped without a tombstone compaction pass.

Troubleshooting

OutOfSpaceException when a window closes. The final merge that runs when a window closes needs transient headroom roughly equal to the largest window’s size, and thin disks cannot stage it. Root cause: too few windows (each SSTable grew too large) or utilization already above 75% at deployment. Fix: shrink compaction_window_size so each bucket is smaller, archive cold data to reclaim space (nodetool cleanup helps only after a token-range change), and never enable TWCS above 75% disk use.
TombstoneOverwhelmingException on reads of old data. Reads scan tombstones instead of skipping dropped files, which means expired windows are not being deleted whole. Root cause: late-arriving or clock-skewed writes are landing in sealed windows and pinning them open, so the cheap file-drop path never fires. Fix: enforce sane client timestamps at ingest, keep default_time_to_live uniform, and if a backfill is unavoidable, route it to a separate STCS table rather than mixing it into the TWCS stream.
WriteTimeoutException during peak ingest. The active window’s STCS-style merges are competing with the sensor write path for disk bandwidth. Root cause: compaction throughput or concurrent_compactors set above what the array sustains under full ingest load. Fix: cap compaction throughput at roughly 50% of sustained write bandwidth, hold concurrent_compactors near min(cores, disks), and never let compaction plus repair streaming exceed the device ceiling.
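The two rules of thumb in the last fix reduce to a short calculation. The sketch below uses a hypothetical function name and assumes you have already measured sustained write bandwidth in MB/s; it returns a starting point for the throughput cap and for concurrent_compactors, not a tuned value.

```python
def compaction_limits(write_mb_per_s: float, cores: int, disks: int) -> tuple[int, int]:
    """Return (throughput_cap_mb_per_s, concurrent_compactors) starting points.

    The cap is about 50% of sustained write bandwidth and the compactor count
    is min(cores, disks), both floored at 1.
    """
    throughput_cap = max(1, int(write_mb_per_s * 0.5))
    compactors = max(1, min(cores, disks))
    return throughput_cap, compactors


if __name__ == "__main__":
    cap, compactors = compaction_limits(write_mb_per_s=200, cores=16, disks=4)
    print(f"nodetool setcompactionthroughput {cap}")
    print(f"concurrent_compactors: {compactors}")
```

The floor of 1 matters for the throughput cap in particular, since a value of 0 would disable throttling entirely.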

Related

Strategy Selection for Time-Series Workloads — the parent guide that maps workload profile to strategy and derives the window-to-TTL sizing this runbook consumes.
Understanding STCS vs LCS vs TWCS — why TWCS is the right choice for TTL-bounded telemetry before you configure it.
Tombstone management and garbage collection — the gc_grace and purge mechanics behind whole-window expiry.
Advanced Compaction Strategy Tuning & Monitoring — the overview guide covering SSTable overlap analysis, throughput calibration, and JMX observability.
