Python Script for Real-Time Compaction Latency Tracking in Cassandra 4.x/5.x

Compaction latency directly dictates read amplification, repair-window viability, and node stability under sustained write pressure. This page delivers a complete, copy-paste Python script that polls nodetool compactionstats at a fixed cadence, correlates active compaction IDs across cycles, and emits per-task runtime and throughput as structured JSON: the raw signal you feed into alerting or repair-gating automation. It sits under Python Monitoring for Cassandra Compaction; read that first for the JMX-versus-nodetool telemetry model, then use this script when you specifically need elapsed time per compaction rather than aggregate pending-task counts.

Prerequisites: Cassandra 4.0, 4.1, or 5.0 with a reachable nodetool/JMX path, Python 3.9+ (the script's built-in generic annotations such as list[dict] raise a TypeError at definition time on 3.8; 3.10+ is recommended), and a node in the UN state.

The script is read-only and holds no disk state, so it is safe to run against a live cluster. The telemetry it produces informs live-impacting decisions (throughput tuning, repair scheduling, decommission gating), so treat its output as an operational input, not a dashboard toy.

Pre-conditions & safety gates

Every check below is read-only. Run the full sequence on one representative node before rolling the tracker out cluster-wide, and stop if any gate fails.

1. Python runtime

python3 -c "import sys, subprocess, json, logging, time, re, signal; assert sys.version_info >= (3,9), 'Python 3.9+ required'; print('PASS: runtime validated')"

Safety Check: Fails fast if the interpreter is older than 3.9, the first release that accepts the list[dict] and dict[str, dict] annotations the script uses. No third-party packages are imported, so there is nothing to pip install.
Expected Output: PASS: runtime validated
Rollback Path: If validation fails, install a newer interpreter (apt install python3.10 / yum install python3.11) and select it explicitly for the service unit; do not symlink over the system python3.

2. nodetool reachability

timeout 10 nodetool version && timeout 10 nodetool compactionstats -H > /dev/null 2>&1 \
  && echo "PASS: nodetool responsive" || echo "FAIL: JMX or nodetool unreachable"

Safety Check: Each nodetool call is wrapped in its own 10-second timeout, so the gate itself cannot hang on a saturated JMX port. Only read-only subcommands are exercised.
Expected Output: PASS: nodetool responsive
Rollback Path: On FAIL, verify cassandra-env.sh JMX settings (LOCAL_JMX=yes or exported JMX_USERNAME/JMX_PASSWORD) and confirm the node is UN via nodetool status. Do not start the tracker against a DN/UJ node.

3. Execution context

[ "$(id -u)" -eq 0 ] && echo "WARN: running as root — create a dedicated service account" \
  || echo "PASS: non-root context verified"

Safety Check: A monitor never needs root. Running unprivileged prevents an accidental state-mutating nodetool call from a fat-fingered config.
Expected Output: PASS: non-root context verified
Rollback Path: Create an unprivileged account (useradd -r -s /sbin/nologin cassandra-monitor), grant it a read-only sudoers rule scoped to nodetool compactionstats, and switch context before proceeding.

Only start the tracker once all three gates pass. Because a stalled compaction queue lets deleted rows survive past expiry, the latency signal produced here couples directly to tombstone management and garbage collection — a task whose elapsed_sec keeps climbing is also a task that is not yet reclaiming space.

Implementation

The tracker runs one poll loop: shell out to nodetool compactionstats, positionally parse each active-compaction row, correlate IDs against the previous cycle to recover first-seen timestamps, derive elapsed latency and byte-delta throughput, and print a JSON snapshot. It keeps state only in memory, so a restart cannot corrupt metrics or leave residual artifacts. A circuit breaker skips any cycle where nodetool exceeds its timeout, which prevents the monitor from compounding JMX-thread exhaustion during exactly the storms you most want to observe.
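To make the derivation step concrete, the following minimal sketch reproduces the elapsed-time and byte-delta arithmetic in isolation. The function name derive_metrics and the sample numbers are illustrative assumptions, not part of the tracker; the 0.1-second floor on the divisor matches the guard used in the tracker script.

```python
MIB = 1024 ** 2


def derive_metrics(first_seen: float, initial_completed: float,
                   now: float, completed: float) -> tuple[float, float]:
    """Return (elapsed seconds, average bytes/sec) since the task was first observed."""
    elapsed = now - first_seen
    # Floor the divisor so the first observation (elapsed == 0) cannot divide by zero.
    throughput = (completed - initial_completed) / max(elapsed, 0.1)
    return elapsed, throughput


# Hypothetical task: first seen at t=1000 s with 100 MiB done,
# polled again 10 s later with 196 MiB done.
elapsed, throughput = derive_metrics(1000.0, 100 * MIB, 1010.0, 196 * MIB)
print(f"elapsed={elapsed:.1f}s throughput={throughput / MIB:.1f} MiB/s")
# prints: elapsed=10.0s throughput=9.6 MiB/s
```

Both figures are relative to the first poll that saw the task, so when the tracker starts mid-compaction the elapsed value is a lower bound on the compaction's true runtime.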

A note on units and version drift: without the -H flag, nodetool compactionstats reports completed/total as plain byte counts, with the unit column reading bytes, on both 4.x and 5.x. The parser therefore does not depend on -H human-readable formatting, and the throughput figure is a true bytes/sec derivative rather than a parsed human string. Save the following as compaction_latency_tracker.py.

#!/usr/bin/env python3
# Requires: Python 3.9+ (3.10+ recommended), nodetool on PATH, target node in UN state. No third-party deps.
"""
Real-time compaction latency tracker for Cassandra 4.x/5.x.
Idempotent, read-only, in-memory only. Polls `nodetool compactionstats`
and emits per-task elapsed latency and byte-delta throughput as JSON.
"""
import subprocess
import time
import json
import logging
import sys
import signal
from typing import Optional

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s [%(levelname)s] %(message)s",
    handlers=[logging.StreamHandler(sys.stdout)],
)
logger = logging.getLogger("compaction_latency_tracker")

# Raw compactionstats reports completed/total in plain bytes (unit column == "bytes").
# The multipliers also cover -H human-readable output and rare non-byte units gracefully.
UNIT_MULTIPLIERS = {"BYTES": 1, "B": 1, "KIB": 1024, "MIB": 1024**2, "GIB": 1024**3,
                    "TIB": 1024**4, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}


def parse_bytes(value: str, unit: str) -> float:
    """Convert a compactionstats value+unit pair to bytes; never raises."""
    try:
        return float(value.replace(",", "")) * UNIT_MULTIPLIERS.get(unit.upper(), 1)
    except ValueError:
        return 0.0


class CompactionLatencyTracker:
    def __init__(self, poll_interval: float = 5.0, timeout: int = 10,
                 nodetool_path: str = "nodetool", max_latency_threshold: float = 300.0) -> None:
        self.poll_interval = poll_interval
        self.timeout = timeout
        self.nodetool_path = nodetool_path
        self.max_latency_threshold = max_latency_threshold
        self.active: dict[str, dict] = {}   # in-memory correlation state, keyed by compaction id
        self.running = True
        signal.signal(signal.SIGINT, self._shutdown)
        signal.signal(signal.SIGTERM, self._shutdown)

    def _shutdown(self, signum: int, frame) -> None:
        logger.info("Termination signal received; exiting cleanly.")
        self.running = False

    def _run_nodetool(self) -> Optional[str]:
        """Read-only call with a hard timeout. Returns None so the loop can skip a cycle."""
        try:
            result = subprocess.run(
                [self.nodetool_path, "compactionstats"],
                capture_output=True, text=True, timeout=self.timeout, check=True,
            )
            return result.stdout
        except subprocess.TimeoutExpired:
            logger.warning("nodetool timed out; circuit breaker skipped this cycle.")
            return None
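        except subprocess.CalledProcessError as e:
            logger.error(f"nodetool failed: {(e.stderr or '').strip()}")
            return None
        except OSError as e:
            # Missing or non-executable nodetool binary: log and skip instead of crashing.
            logger.error(f"nodetool could not be executed: {e}")
            return None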

    def _parse(self, raw: str) -> list[dict]:
        """Positionally parse the compactionstats table.

        Header row:  id  compaction type  keyspace  table  completed  total  unit  progress
        Each active compaction is one whitespace-separated row; summary lines are ignored.
        """
        tasks: list[dict] = []
        in_table = False
        for line in raw.splitlines():
            s = line.strip()
            if not s:
                continue
            if s.startswith("id") and "compaction type" in s and "progress" in s:
                in_table = True
                continue
            if not in_table:
                continue
            if s.lower().startswith(("pending tasks", "active compaction")):
                break
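            parts = s.split()
            if len(parts) < 8:   # id, type, keyspace, table, completed, total, unit, progress
                continue
            # Index the six trailing columns from the right: the compaction type can span
            # several words (e.g. "Secondary index build"), which shifts left-to-right offsets.
            tasks.append({
                "id": parts[0], "type": " ".join(parts[1:-6]),
                "keyspace": parts[-6], "table": parts[-5], "completed": parts[-4],
                "total": parts[-3], "unit": parts[-2], "progress": parts[-1],
            })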
        return tasks

    def _correlate(self, tasks: list[dict]) -> list[dict]:
        now = time.time()
        current_ids = {t["id"] for t in tasks}
        # Drop finished compactions so state cannot grow unbounded (idempotent across cycles).
        for cid in list(self.active):
            if cid not in current_ids:
                del self.active[cid]

        snapshot: list[dict] = []
        for t in tasks:
            tid, unit = t["id"], t["unit"]
            completed = parse_bytes(t["completed"], unit)
            if tid not in self.active:
                self.active[tid] = {"start": now, "initial_completed": completed}
            state = self.active[tid]
            elapsed = now - state["start"]
            throughput = (completed - state["initial_completed"]) / max(elapsed, 0.1)
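            try:
                progress_pct = float(t["progress"].rstrip("%"))
            except ValueError:   # non-numeric progress (e.g. "n/a") must not crash the loop
                progress_pct = 0.0
            entry = {
                "id": tid, "keyspace": t["keyspace"], "table": t["table"], "type": t["type"],
                "progress_pct": progress_pct,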
                "elapsed_sec": round(elapsed, 2),
                "throughput_bytes_sec": round(throughput, 2),
                "completed_bytes": round(completed, 2),
                "total_bytes": round(parse_bytes(t["total"], unit), 2),
                "timestamp_utc": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
            }
            if elapsed > self.max_latency_threshold:
                logger.warning(f"Compaction {tid} exceeded {self.max_latency_threshold}s "
                               f"(elapsed {elapsed:.0f}s); flagging for review.")
                entry["latency_alert"] = True
            snapshot.append(entry)
        return snapshot

    def run(self) -> None:
        logger.info(f"Tracker start: interval={self.poll_interval}s timeout={self.timeout}s")
        while self.running:
            raw = self._run_nodetool()
            if raw is not None:
                snapshot = self._correlate(self._parse(raw))
                # flush=True: stdout is block-buffered when piped (e.g. to journald).
                print(json.dumps({"compactions": snapshot, "active_count": len(snapshot)}),
                      flush=True)
            time.sleep(self.poll_interval)


if __name__ == "__main__":
    CompactionLatencyTracker(poll_interval=5.0, timeout=10).run()

Safety Check: subprocess.run enforces a strict timeout; the circuit breaker logs and skips on failure instead of crashing; SIGINT/SIGTERM handlers exit cleanly with no orphaned processes; finished compactions are pruned from self.active every cycle so memory stays bounded.
Expected Output: One JSON object per poll, e.g. {"compactions": [{"id": "c1d2e3f0-...", "keyspace": "app", "table": "events", "progress_pct": 40.0, "elapsed_sec": 12.51, "throughput_bytes_sec": 9830400.0, ...}], "active_count": 1}.
Rollback Path: If the JSON floods your log pipeline, raise poll_interval to 15.0; to lower alert sensitivity, raise max_latency_threshold. To stop entirely, run systemctl stop cassandra-compaction-tracker; there is no state to unwind.
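Because each poll prints one JSON object, a downstream consumer can compare consecutive snapshots line by line. The sketch below is one possible shape for such a consumer, not part of the tracker: the function names stalled_ids, alerted_ids, and process are hypothetical, and the sample lines are made up. It skips non-JSON lines because the tracker's logging handler writes to the same stdout as the snapshots.

```python
import json
from typing import Iterable, Iterator


def stalled_ids(previous: dict, current: dict) -> list[str]:
    """IDs present in both snapshots whose completed_bytes did not advance."""
    before = {c["id"]: c["completed_bytes"] for c in previous.get("compactions", [])}
    return [c["id"] for c in current.get("compactions", [])
            if c["id"] in before and c["completed_bytes"] <= before[c["id"]]]


def alerted_ids(snapshot: dict) -> list[str]:
    """IDs the tracker flagged with latency_alert in this snapshot."""
    return [c["id"] for c in snapshot.get("compactions", []) if c.get("latency_alert")]


def process(lines: Iterable[str]) -> Iterator[tuple[list[str], list[str]]]:
    """Yield (stalled, alerted) for each JSON snapshot line; skip log lines."""
    previous = None
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # log records share stdout with the snapshots
        try:
            snapshot = json.loads(line)
        except json.JSONDecodeError:
            continue
        stalled = stalled_ids(previous, snapshot) if previous is not None else []
        yield stalled, alerted_ids(snapshot)
        previous = snapshot


# Hypothetical captured output: one log line, then two snapshots of task "a".
sample = [
    "2024-01-01 00:00:00,000 [INFO] Tracker start: interval=5.0s timeout=10s",
    '{"compactions": [{"id": "a", "completed_bytes": 100.0}], "active_count": 1}',
    '{"compactions": [{"id": "a", "completed_bytes": 100.0, "latency_alert": true}], "active_count": 1}',
]
for stalled, alerted in process(sample):
    print(stalled, alerted)
```

In a pipeline, the same process generator could be fed sys.stdin instead of the sample list. Comparing completed_bytes between cycles catches a stall directly, whereas throughput_bytes_sec is an average since first sight and decays only gradually.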

Deploy under systemd

# /etc/systemd/system/cassandra-compaction-tracker.service
[Unit]
Description=Cassandra real-time compaction latency tracker
After=network.target cassandra.service

[Service]
Type=simple
User=cassandra-monitor
ExecStart=/usr/bin/python3 /opt/scripts/compaction_latency_tracker.py
Restart=on-failure
RestartSec=10
StandardOutput=journal
StandardError=journal

[Install]
WantedBy=multi-user.target

Safety Check: Restart=on-failure recovers silent crashes; the unit runs as the unprivileged cassandra-monitor account from gate 3.
Expected Output: systemctl status cassandra-compaction-tracker shows active (running); snapshots stream to journalctl -u cassandra-compaction-tracker -f.
Rollback Path: systemctl disable --now cassandra-compaction-tracker, remove the unit file, then systemctl daemon-reload.

Verification steps

Confirm the tracker’s numbers agree with Cassandra’s own view before you trust them in automation.

# 1. Cross-check active-task count against nodetool's own report.
nodetool compactionstats -H

Safety Check: The active_count in the JSON snapshot should equal the number of table rows here (excluding the pending tasks: and summary lines).
Expected Output: A row count that matches active_count; the per-row progress percentage should track progress_pct within one poll interval.
Reading these columns in depth is covered in interpreting nodetool compactionstats output.

# 2. Confirm derived throughput is plausible against the configured ceiling.
nodetool getcompactionthroughput   # 4.x/5.x: "Current compaction throughput: 64 MB/s"

Safety Check: Summed throughput_bytes_sec across active tasks must not exceed the configured ceiling (convert MB/s → bytes/s). A sum that pins the ceiling means compaction is throughput-bound, not stalled. A reported ceiling of 0 means throttling is disabled, so there is nothing to compare against.
Expected Output: Aggregate throughput at or below the ceiling.
On Cassandra 4.1+/5.0 the YAML key is compaction_throughput; on 4.0 and earlier it is compaction_throughput_mb_per_sec, but the nodetool getcompactionthroughput command goes by the same name on all of them.
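The ceiling comparison can be sketched as a small helper. The name ceiling_report and the sample snapshot are hypothetical, and the conversion assumes the reported MB/s figure means 1024**2 bytes per second, the same factor the tracker's UNIT_MULTIPLIERS table assigns to MB.

```python
MIB = 1024 ** 2  # assumption: 1 "MB" == 1024**2 bytes, as in UNIT_MULTIPLIERS


def ceiling_report(snapshot: dict, ceiling_mb_per_sec: float) -> dict:
    """Compare summed per-task throughput with the configured ceiling."""
    total = sum(c["throughput_bytes_sec"] for c in snapshot["compactions"])
    if ceiling_mb_per_sec <= 0:
        # A ceiling of 0 disables throttling, so there is nothing to compare against.
        return {"total_bytes_sec": total, "ceiling_bytes_sec": None,
                "utilisation": None, "within_ceiling": True}
    ceiling = ceiling_mb_per_sec * MIB
    return {"total_bytes_sec": total, "ceiling_bytes_sec": ceiling,
            "utilisation": round(total / ceiling, 3),
            "within_ceiling": total <= ceiling}


# Hypothetical snapshot: two tasks averaging 16 MiB/s each against a 64 MB/s ceiling.
snapshot = {"compactions": [{"throughput_bytes_sec": 16.0 * MIB},
                            {"throughput_bytes_sec": 16.0 * MIB}]}
report = ceiling_report(snapshot, 64)
print(report)
```

A utilisation close to 1.0 is the throughput-bound case described above; a low figure alongside a climbing elapsed_sec points toward a stall instead.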

# 3. Watch a single long task cross the alert threshold.
journalctl -u cassandra-compaction-tracker -f | grep -E "exceeded|latency_alert"

Safety Check: A task whose elapsed_sec passes max_latency_threshold must emit a WARNING line and gain "latency_alert": true in the JSON. The filter matches the log message (exceeded) and the JSON flag (latency_alert). Correlate flagged IDs with nodetool compactionhistory.
Expected Output: ... [WARNING] Compaction <id> exceeded 300.0s (elapsed 314s); flagging for review.

Troubleshooting

nodetool: Failed to connect to '127.0.0.1:7199' — ConnectException. The circuit breaker logs nodetool failed and the loop skips the cycle, so the tracker keeps running but emits no snapshots. Root cause: JMX is unreachable — the node is restarting, JMX auth changed, or the port is firewalled. Fix: verify nodetool status returns UN and re-check the gate-2 command; the tracker resumes emitting automatically once JMX answers.
Every snapshot shows throughput_bytes_sec near 0 while elapsed_sec climbs. The task is genuinely stalled, not slow: completed_bytes is not advancing between cycles even though the compaction is still listed. Root cause: CompactionExecutor is starved or the I/O device is saturated by repair streaming. Fix: check nodetool tpstats for CompactionExecutor Pending, and if a repair is competing, coordinate it — the drain procedure in resolving high compaction backlog without downtime arbitrates that shared I/O budget. Whether a stall is even expected depends on the table’s strategy: the trade-offs between STCS, LCS, and TWCS mean a size-tiered table legitimately runs long, high-byte merges.
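Rows silently missing from the snapshot. A compactionstats row that splits into fewer than eight whitespace-separated fields is skipped by the len(parts) < 8 guard, so no IndexError is raised; the row simply disappears. Keyspace and table names cannot contain spaces, so the usual causes are a truncated line or a layout change in a future release. The opposite case, a row with more than eight fields, comes from multi-word compaction types such as Secondary index build, and is only parsed correctly when the trailing columns are indexed from the right end of the row. Root cause: positional parsing assumes the documented eight-column layout. Fix: log the raw row, and if the layout has changed, switch to fixed-width column slicing based on the header offsets rather than str.split().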
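The fixed-width slicing mentioned in the last item can be sketched as follows. This is a minimal illustration under one stated assumption: that nodetool pads each column so that every value starts at the same offset as its header label. The helper names column_offsets and slice_row and the sample header and row are hypothetical; verify the alignment against real output from your version before relying on it.

```python
COLUMNS = ["id", "compaction type", "keyspace", "table",
           "completed", "total", "unit", "progress"]


def column_offsets(header: str) -> list[int]:
    """Start offset of each known column label, searched left to right."""
    offsets, pos = [], 0
    for name in COLUMNS:
        pos = header.index(name, pos)  # raises ValueError if the layout changed
        offsets.append(pos)
        pos += len(name)
    return offsets


def slice_row(row: str, offsets: list[int]) -> dict[str, str]:
    """Cut one row at the header offsets; the last column runs to end of line."""
    ends = offsets[1:] + [None]
    return {name: row[start:end].strip()
            for name, start, end in zip(COLUMNS, offsets, ends)}


# Hypothetical, hand-aligned sample (not captured nodetool output).
widths = [38, 23, 10, 8, 11, 11, 7, 8]
values = ["c1d2e3f0-0000-1111-2222-333344445555", "Secondary index build",
          "app", "events", "104857600", "262144000", "bytes", "40.00%"]
header = "".join(n.ljust(w) for n, w in zip(COLUMNS, widths))
row = "".join(v.ljust(w) for v, w in zip(values, widths))

parsed = slice_row(row, column_offsets(header))
print(parsed["compaction type"], parsed["keyspace"], parsed["progress"])
```

Slicing must be applied to the header and rows with identical leading whitespace; stripping one and not the other shifts every offset.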

Python Monitoring for Cassandra Compaction — the parent guide covering the JMX/Prometheus telemetry model this script feeds into.
Interpreting nodetool compactionstats output — column-by-column reading of the raw output this tracker parses.
Resolving high compaction backlog without downtime — what to do when the tracker shows tasks stalling instead of draining.
Advanced Compaction Strategy Tuning & Monitoring — the top-level guide that ties latency signals back to strategy and throughput decisions.
