Changed Block Tracking for Volume Snapshots
As of Kubernetes v1.34, the Changed Block Tracking (CBT) API is available in alpha status. This feature enhances the storage ecosystem by providing an efficient mechanism for CSI storage drivers to identify changed blocks in PersistentVolume snapshots, enabling faster incremental backups and disaster recovery operations.
Issue: Traditional full-volume snapshots are inefficient for large volumes: they consume excessive storage space and transfer time, making frequent backups impractical.
Fix: The Changed Block Tracking API lets CSI drivers report only the blocks that have changed since the last snapshot, dramatically reducing backup time and storage requirements.
1. Understanding Changed Block Tracking
Changed Block Tracking is a storage optimization technique that records which blocks of a volume have been modified since a previous snapshot. This information enables:
- Incremental Backups - Copy only changed data instead of entire volumes
- Faster Disaster Recovery - Restore only modified blocks
- Reduced Storage Costs - Store smaller differential snapshots
- Improved RPO/RTO - Recovery point and recovery time objectives improve because backups can run more frequently with less impact
Security Benefits
- Ransomware Detection - Unusual block change patterns can indicate encryption attacks
- Compliance - Efficient backup windows enable more frequent snapshots for audit trails
- Data Integrity - Faster verification of backup consistency
- Reduced Attack Surface - Shorter backup windows mean less exposure time
2. How Changed Block Tracking Works
The CBT API introduces new resources and operations:
- VolumeSnapshotDelta - Represents the differences between two snapshots
- Changed Block Query - CSI driver API to retrieve changed block ranges
- Block Range Metadata - Information about which blocks have changed
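A quick way to check whether the alpha delta resource is being served in a given cluster is to list the resources in the snapshot API group (the plural resource name volumesnapshotdeltas is assumed here, matching the RBAC examples later in this section):

# List snapshot-related API resources; with CBT enabled, the delta resource should appear
kubectl api-resources --api-group=snapshot.storage.k8s.io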
Architecture
┌──────────────────────────────────────────────────────────────┐
│                    Kubernetes API Server                      │
└────────────────────────────┬─────────────────────────────────┘
                             │
                             │ CBT API Requests
                             │
┌────────────────────────────▼─────────────────────────────────┐
│                    CSI Driver (with CBT)                      │
│  • Tracks block changes                                       │
│  • Maintains change bitmaps                                   │
│  • Provides delta information                                 │
└────────────────────────────┬─────────────────────────────────┘
                             │
                             │ Storage Backend Communication
                             │
┌────────────────────────────▼─────────────────────────────────┐
│              Storage Backend (e.g., Ceph, NFS)                │
│  • Stores actual volume data                                  │
│  • Maintains snapshots                                        │
└──────────────────────────────────────────────────────────────┘
3. Enabling Changed Block Tracking
Prerequisites
- Kubernetes v1.34 or later
- CSI driver with CBT support (driver-specific)
- Feature gate enabled: VolumeSnapshotDelta=true
Example: Enable CBT Feature Gate
For clusters using kubeadm, add to the API server configuration:
apiVersion: kubeadm.k8s.io/v1beta4
kind: ClusterConfiguration
apiServer:
  extraArgs:
    - name: feature-gates
      value: "VolumeSnapshotDelta=true"
controllerManager:
  extraArgs:
    - name: feature-gates
      value: "VolumeSnapshotDelta=true"
For managed Kubernetes (EKS, GKE, AKS), consult provider documentation for alpha feature enablement.
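On kubeadm clusters, you can confirm the flag actually reached the control plane by inspecting the API server static pod (a minimal check, using the label kubeadm applies to control-plane pods):

# Verify the feature gate is present on the kube-apiserver static pod
kubectl -n kube-system get pod -l component=kube-apiserver -o yaml | grep feature-gates
# Expected: --feature-gates=VolumeSnapshotDelta=true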
4. Creating Snapshots with CBT
Example: PersistentVolumeClaim with CBT-Compatible Storage
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: database-pvc
  namespace: production
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
  storageClassName: cbt-enabled-storage
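The PVC above references a cbt-enabled-storage class, which must be provisioned by a CBT-capable CSI driver. A minimal sketch of such a StorageClass, assuming the same hypothetical csi.example.com driver used in the snapshot class below:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cbt-enabled-storage
provisioner: csi.example.com            # hypothetical CBT-capable CSI driver
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true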
Example: VolumeSnapshotClass with CBT Enabled
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: cbt-snapshot-class
driver: csi.example.com
deletionPolicy: Retain
parameters:
  enableChangedBlockTracking: "true"
  snapshotType: "incremental"
Example: Create Initial Snapshot
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: database-snapshot-baseline
  namespace: production
spec:
  volumeSnapshotClassName: cbt-snapshot-class
  source:
    persistentVolumeClaimName: database-pvc
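Before cutting incremental snapshots against this baseline, confirm it is ready to use. One way to block until the standard VolumeSnapshot status reports readiness:

# Wait until status.readyToUse is true (up to 5 minutes)
kubectl wait --for=jsonpath='{.status.readyToUse}'=true \
  volumesnapshot/database-snapshot-baseline \
  -n production --timeout=300s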
Example: Create Incremental Snapshot
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: database-snapshot-incremental-1
  namespace: production
spec:
  volumeSnapshotClassName: cbt-snapshot-class
  source:
    persistentVolumeClaimName: database-pvc
  # Reference to previous snapshot for delta tracking
  sourceSnapshotName: database-snapshot-baseline
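To confirm that both snapshots exist and are usable before querying deltas:

# Both snapshots should report READYTOUSE=true before running delta queries
kubectl get volumesnapshots -n production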
5. Querying Changed Blocks
Issue: Backup tools need efficient access to changed block information without reading entire volumes.
Fix: Use the VolumeSnapshotDelta API to query only changed blocks.
Example: Query Changed Blocks Between Snapshots
apiVersion: snapshot.storage.k8s.io/v1alpha1
kind: VolumeSnapshotDelta
metadata:
  name: database-delta-query
  namespace: production
spec:
  sourceSnapshot: database-snapshot-baseline
  targetSnapshot: database-snapshot-incremental-1
Check Delta Query Status
kubectl get volumesnapshotdelta database-delta-query -n production -o yaml
Example Output:
apiVersion: snapshot.storage.k8s.io/v1alpha1
kind: VolumeSnapshotDelta
metadata:
  name: database-delta-query
  namespace: production
status:
  changedBlocks:
    - startOffset: 1048576      # 1 MB
      length: 524288            # 512 KB
    - startOffset: 104857600    # 100 MB
      length: 1048576           # 1 MB
  totalChangedBytes: 1572864
  snapshotSizeBytes: 107374182400   # 100 GB
  percentageChanged: 0.00146
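For scripting, the same status can be consumed as JSON. Re-deriving totalChangedBytes from the individual ranges is a quick sanity check that the delta list is complete (field names as in the example output above):

# Sum the length of every changed block range
kubectl get volumesnapshotdelta database-delta-query -n production -o json | \
  jq '.status.changedBlocks | map(.length) | add'
# Expected output: 1572864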
6. Security Considerations
Issue: Changed block information could reveal sensitive patterns about data modifications and application behavior.
Fix: Implement RBAC controls and encryption for CBT metadata and operations.
Example: RBAC for CBT Operations
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: snapshot-operator
  namespace: production
rules:
  - apiGroups: ["snapshot.storage.k8s.io"]
    resources: ["volumesnapshots", "volumesnapshotdeltas"]
    verbs: ["get", "list", "create", "delete"]
  - apiGroups: ["snapshot.storage.k8s.io"]
    resources: ["volumesnapshotclasses"]
    verbs: ["get", "list"]
  - apiGroups: [""]
    resources: ["persistentvolumeclaims"]
    verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: backup-operator-binding
  namespace: production
subjects:
  - kind: ServiceAccount
    name: backup-operator
    namespace: production
roleRef:
  kind: Role
  name: snapshot-operator
  apiGroup: rbac.authorization.k8s.io
Restrict CBT Access to Backup Tools Only
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cbt-admin
rules:
  - apiGroups: ["snapshot.storage.k8s.io"]
    resources: ["volumesnapshotdeltas"]
    verbs: ["*"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: velero-cbt-access
subjects:
  - kind: ServiceAccount
    name: velero
    namespace: velero
roleRef:
  kind: ClusterRole
  name: cbt-admin
  apiGroup: rbac.authorization.k8s.io
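After applying the bindings, a quick way to verify that only the intended identities can read delta information is impersonation with kubectl auth can-i (service account names as in the examples above):

# Velero's service account should be allowed
kubectl auth can-i list volumesnapshotdeltas.snapshot.storage.k8s.io \
  --as=system:serviceaccount:velero:velero -n production
# An arbitrary application service account should be denied
kubectl auth can-i list volumesnapshotdeltas.snapshot.storage.k8s.io \
  --as=system:serviceaccount:production:default -n production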
7. Integration with Backup Tools
Velero Integration Example
Issue: Traditional Velero backups copy entire volumes even when minimal data has changed.
Fix: Use CBT-aware Velero plugins for incremental volume backups.
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: production-incremental
  namespace: velero
spec:
  includedNamespaces:
    - production
  snapshotVolumes: true
  volumeSnapshotLocations:
    - cbt-enabled-location
  csiSnapshotTimeout: 10m
  # Enable CBT for incremental snapshots
  hooks:
    resources:
      - name: use-changed-block-tracking
        includedNamespaces:
          - production
        labelSelector:
          matchLabels:
            backup-method: incremental
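If the Velero CLI is installed, an equivalent backup can be triggered and inspected imperatively; whether the volume snapshots are actually taken incrementally depends on the CBT support in the installed plugin:

velero backup create production-incremental \
  --include-namespaces production \
  --snapshot-volumes \
  --volume-snapshot-locations cbt-enabled-location
velero backup describe production-incremental --details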
Custom Backup Script with CBT
#!/bin/bash
# Incremental backup script using the CBT API
set -euo pipefail

NAMESPACE="production"
PVC_NAME="database-pvc"
BASELINE_SNAPSHOT="database-snapshot-baseline"
NEW_SNAPSHOT="database-snapshot-$(date +%Y%m%d-%H%M%S)"

# Create a new incremental snapshot against the baseline
kubectl create -f - <<EOF
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: ${NEW_SNAPSHOT}
  namespace: ${NAMESPACE}
spec:
  volumeSnapshotClassName: cbt-snapshot-class
  source:
    persistentVolumeClaimName: ${PVC_NAME}
  sourceSnapshotName: ${BASELINE_SNAPSHOT}
EOF

# Wait for the snapshot to be ready (VolumeSnapshots expose status.readyToUse rather than a condition)
kubectl wait --for=jsonpath='{.status.readyToUse}'=true \
  volumesnapshot/${NEW_SNAPSHOT} \
  -n ${NAMESPACE} \
  --timeout=300s

# Query changed blocks between the baseline and the new snapshot
kubectl create -f - <<EOF
apiVersion: snapshot.storage.k8s.io/v1alpha1
kind: VolumeSnapshotDelta
metadata:
  name: ${NEW_SNAPSHOT}-delta
  namespace: ${NAMESPACE}
spec:
  sourceSnapshot: ${BASELINE_SNAPSHOT}
  targetSnapshot: ${NEW_SNAPSHOT}
EOF

# Give the CSI driver a moment to populate the delta status
sleep 10

# Extract and back up only the changed blocks
kubectl get volumesnapshotdelta ${NEW_SNAPSHOT}-delta \
  -n ${NAMESPACE} -o json | \
  jq -r '.status.changedBlocks[] | "\(.startOffset) \(.length)"' | \
  while read -r offset length; do
    echo "Backing up changed block: offset=$offset length=$length"
    # Custom logic to extract and back up the changed block goes here
  done
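To run this script on a schedule, one option is a CronJob that executes it under the backup-operator service account from the RBAC example above. This is a sketch: the ConfigMap name, image, and schedule are placeholders, and the image must contain bash, kubectl, and jq:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: incremental-backup
  namespace: production
spec:
  schedule: "0 */6 * * *"               # every 6 hours
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: backup-operator
          restartPolicy: OnFailure
          containers:
            - name: backup
              image: bitnami/kubectl:latest    # placeholder; needs bash, kubectl, and jq
              command: ["/bin/bash", "/scripts/incremental-backup.sh"]
              volumeMounts:
                - name: backup-script
                  mountPath: /scripts
          volumes:
            - name: backup-script
              configMap:
                name: incremental-backup-script   # ConfigMap holding the script above
                defaultMode: 0755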
8. Monitoring and Alerting
Issue: CBT operations can fail silently, leading to incomplete backups and data loss risks.
Fix: Implement monitoring and alerting for CBT operations and snapshot health.
Prometheus Metrics to Monitor
# Failed snapshot operations
rate(volume_snapshot_failures_total[5m]) > 0
# CBT query duration
histogram_quantile(0.99, rate(cbt_query_duration_seconds_bucket[5m])) > 30
# Changed block percentage (detect ransomware)
cbt_changed_blocks_percentage > 50
# Snapshot age (detect stale backups)
time() - volume_snapshot_creation_timestamp_seconds > 86400
Example PrometheusRule for CBT Alerts
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cbt-alerts
  namespace: monitoring
spec:
  groups:
    - name: changed-block-tracking
      interval: 30s
      rules:
        - alert: CBTQueryFailed
          expr: cbt_query_failures_total > 0
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Changed Block Tracking query failed"
            description: "CBT query for {{ $labels.namespace }}/{{ $labels.snapshot }} has failed"
        - alert: UnusualBlockChangeRate
          expr: cbt_changed_blocks_percentage > 70
          for: 10m
          labels:
            severity: critical
          annotations:
            summary: "Unusual percentage of blocks changed (possible ransomware)"
            description: "Volume {{ $labels.pvc }} shows {{ $value }}% blocks changed, investigate immediately"
        - alert: SnapshotStale
          expr: (time() - volume_snapshot_creation_timestamp_seconds) / 3600 > 24
          for: 1h
          labels:
            severity: warning
          annotations:
            summary: "Volume snapshot is over 24 hours old"
            description: "Snapshot {{ $labels.snapshot }} in {{ $labels.namespace }} has not been updated"
9. Best Practices for CBT
- ✅ Enable only on supported CSI drivers - Verify driver CBT capabilities before enabling
- ✅ Use incremental snapshots for large volumes - CBT is most beneficial for volumes >100GB
- ✅ Implement RBAC restrictions - Limit CBT query access to backup tools only
- ✅ Monitor change patterns - Alert on unusual block change rates (ransomware detection)
- ✅ Test restore procedures - Regularly validate incremental backup restores
- ✅ Retain baseline snapshots - Keep full snapshots for recovery validation
- ✅ Implement snapshot retention policies - Automate cleanup of old snapshots (see the retention sketch after this list)
- ✅ Encrypt snapshot data - Use storage backend encryption for snapshot security
- ✅ Document snapshot dependencies - Track which incremental snapshots depend on baselines
- ✅ Monitor CBT performance - Track query latency and optimize as needed
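A minimal retention sketch for the policy item above, assuming incremental snapshots carry a snapshot-type=incremental label so that baseline snapshots are never swept up (the label is not set in the earlier examples and would need to be added):

#!/bin/bash
# Delete incremental snapshots in the production namespace older than 7 days
CUTOFF=$(date -u -d '7 days ago' +%s)
kubectl get volumesnapshots -n production -l snapshot-type=incremental -o json | \
  jq -r --argjson cutoff "$CUTOFF" \
    '.items[] | select((.metadata.creationTimestamp | fromdateiso8601) < $cutoff) | .metadata.name' | \
  xargs -r -n1 kubectl delete volumesnapshot -n production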
10. Security Checklist
- ✅ CBT feature gate enabled only in controlled environments (alpha feature)
- ✅ RBAC policies restrict VolumeSnapshotDelta access to backup operators
- ✅ Snapshot encryption enabled at storage backend level
- ✅ Monitoring and alerting configured for CBT operations
- ✅ Unusual block change rate alerts configured for ransomware detection
- ✅ Baseline snapshots protected from accidental deletion
- ✅ Backup and restore procedures tested regularly
- ✅ Snapshot retention policies automated and audited
- ✅ Access logs enabled for snapshot and CBT operations
- ✅ Documentation maintained for CBT-enabled volumes and dependencies
11. Limitations and Considerations
As an alpha feature in Kubernetes v1.34, Changed Block Tracking has important limitations:
- CSI Driver Support - Limited to CSI drivers that implement CBT (check driver documentation)
- Performance Overhead - Block tracking adds small overhead to I/O operations
- Storage Backend Requirements - Underlying storage must support efficient delta tracking
- API Stability - Alpha API may change in future Kubernetes versions
- Limited Tooling - Backup tools are still adding CBT support
- Documentation - Vendor-specific implementation details may vary
Recommendation: Test thoroughly in non-production environments before enabling in production clusters.