
Perform a Hot Backup on a Cluster#

In this tutorial, we're going to guide you through performing a hot backup of TheHive on a cluster using the provided scripts.

By the end, you'll have created complete backups of your database and search index across all three nodes, plus your file storage.

Hot backups let you protect your data while keeping TheHive running, which means zero downtime for your security operations team.

These backups are essential to protect your data and ensure you can recover quickly in case of a system failure or data loss.

Understand the implications

Hot backups allow TheHive to keep running during the process, but they don’t guarantee perfect data consistency. Review the Cold vs. Hot Backups and Restores topic to ensure this method fits your organization's risk tolerance and operational needs.

Best practices for safe backup and restore

  • Coordinate your Apache Cassandra, Elasticsearch, and file storage backups to run at the same time. Using automation like a cron job helps minimize the chance of inconsistencies between components.
  • Before relying on these backups in a real incident, test the full backup and restore flow in a staging environment. It’s the only way to make sure everything works as expected.
  • Ensure you have an up-to-date backup before starting the restore operation, as errors during the restoration could lead to data loss.

Script restrictions

These scripts work only for native installations following the Setting up a Cluster with TheHive configuration. Docker and Kubernetes deployments aren't supported.

Step 1: Install required tools#

Before we begin, let's make sure your system has all the necessary tools installed.

You'll need the following:

  • Cassandra nodetool: Command-line tool for managing Cassandra clusters, used for creating database snapshots
  • tar: Utility for archiving backup files
  • cqlsh: Command-line interface for executing CQL queries against the Cassandra database
  • curl: Tool for transferring data with URLs, useful for interacting with the Elasticsearch API
  • jq: Lightweight command-line JSON processor for parsing and manipulating JSON data in scripts

Python compatibility for cqlsh

cqlsh requires Python 3.9. If your Linux distribution provides a newer Python version by default, you must install Python 3.9 alongside it and explicitly tell cqlsh which interpreter to use. You can do this by setting the CQLSH_PYTHON environment variable when running cqlsh: sudo -u cassandra CQLSH_PYTHON=/path/to/python3.9 cqlsh.
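
For example, assuming Python 3.9 is installed at /usr/bin/python3.9 (adjust the interpreter path to your system), you could check the interpreter and run a quick cqlsh query like this:

/usr/bin/python3.9 --version
sudo -u cassandra CQLSH_PYTHON=/usr/bin/python3.9 cqlsh 127.0.0.1 -e "DESCRIBE KEYSPACES"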

If any tools are missing, install them using your package manager. For example:

  • sudo apt install jq for DEB-based operating systems
  • sudo yum install jq for RPM-based operating systems
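
To quickly confirm the required tools are available on the node, you can run a check like this sketch:

for tool in nodetool tar cqlsh curl jq; do
  command -v "$tool" > /dev/null || echo "Missing: $tool"
done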

Step 2: Configure NFS-shared storage for Elasticsearch snapshots#

Elasticsearch requires a snapshot repository that's accessible from all cluster nodes. To meet this requirement, we will set up an NFS share so every node can reach the backup location. If you don't have a dedicated NFS server, you can export an NFS share directly from one of the Elasticsearch nodes.

On the NFS server#

  1. Create the directory and set the correct permissions for Elasticsearch.

    sudo mkdir -p /mnt/backup/elasticsearch
    sudo chown elasticsearch:elasticsearch /mnt/backup/elasticsearch
    sudo chmod 770 /mnt/backup/elasticsearch
    
  2. Export the directory by adding this line to /etc/exports.

    /mnt/backup/elasticsearch <cluster_network>(rw,sync,no_subtree_check,no_root_squash)
    

    Replace <cluster_network> with your network range.

  3. Apply the export configuration.

    sudo exportfs -ra
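
If the showmount utility is installed, you can confirm the directory is exported as expected by running it on the NFS server:

showmount -e localhost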
    

On all cluster nodes#

  1. Create the mount point and mount the NFS share.

    sudo mkdir -p /mnt/backup/elasticsearch
    sudo mount <nfs_server_ip>:/mnt/backup/elasticsearch /mnt/backup/elasticsearch
    

    Replace <nfs_server_ip> with the IP address of your NFS server.

  2. Set the correct permissions on the mounted directory.

    sudo chown elasticsearch:elasticsearch /mnt/backup/elasticsearch
    sudo chmod 770 /mnt/backup/elasticsearch
    
  3. Add an entry to /etc/fstab to ensure the mount persists after reboot.

    <nfs_server_ip>:/mnt/backup/elasticsearch /mnt/backup/elasticsearch nfs defaults,_netdev 0 0
    

    Replace <nfs_server_ip> with the IP address of your NFS server.

  4. Verify the mount is working.

    df -h | grep /mnt/backup/elasticsearch
    

    You should see the NFS mount listed in the output.
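
As an extra check, you can confirm the elasticsearch user has write access to the share (the test file name below is just an example):

sudo -u elasticsearch touch /mnt/backup/elasticsearch/.write_test \
  && sudo -u elasticsearch rm /mnt/backup/elasticsearch/.write_test \
  && echo "Write access OK"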

Step 3: Set up the Elasticsearch snapshot repository#

We're going to register an Elasticsearch snapshot repository on the NFS share you just configured. This repository will be used to store timestamped backups of your search index.

  1. In the elasticsearch.yml file on each node, define the directory where snapshots will be stored.

    path.repo: /mnt/backup/elasticsearch
    
  2. After saving your changes, restart Elasticsearch on each node.

    sudo systemctl restart elasticsearch
    
  3. Register the repository.

    curl -X PUT "http://127.0.0.1:9200/_snapshot/thehive_repository" \
      -H "Content-Type: application/json" \
      -d '{
        "type": "fs",
        "settings": {
          "location": "/mnt/backup/elasticsearch"
        }
      }'
    

    You should see a response like this:

    {
      "acknowledged": true
    }
    

For step-by-step details, see the official Elasticsearch documentation.
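
Optionally, you can also ask Elasticsearch to verify that every node can write to the repository:

curl -X POST "http://127.0.0.1:9200/_snapshot/thehive_repository/_verify?pretty"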

Step 4: Perform health checks#

Before creating any backups, we're going to verify that all TheHive components are healthy. This helps us catch any issues that could affect backup integrity.

Check service status#

Let's confirm that all TheHive components are running.

sudo systemctl status thehive
sudo systemctl status cassandra
sudo systemctl status elasticsearch

All services should show as active and running.

Check Cassandra status#

Run the following command:

nodetool status

You should see nodes marked as UN (Up/Normal). This indicates your Cassandra cluster is healthy.

Check Elasticsearch cluster health#

curl -X GET "http://127.0.0.1:9200/_cluster/health?pretty"

The status should be green, which means your cluster is healthy and fully functional.

Other possible statuses include:

  • yellow: Some replicas are missing but data is still available.
  • red: Some data is unavailable—you should investigate before proceeding.
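
For a script-friendly check, you can extract just the status field with jq (installed in Step 1):

curl -s "http://127.0.0.1:9200/_cluster/health" | jq -r .status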

Review system logs#

Check for any recent errors or warnings.

sudo journalctl -u thehive
sudo journalctl -u cassandra
sudo journalctl -u elasticsearch
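
If the full logs are too noisy, you can narrow the output to recent errors, for example:

sudo journalctl -u thehive -p err --since "1 hour ago"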

If you find any critical errors, resolve them before continuing with the backup process.

Step 5: Replicate Cassandra and Elasticsearch data across all three nodes#

Data replication requirement

It's your responsibility to ensure data replication across all nodes before proceeding. If this requirement isn't met, cluster restoration may fail, and integrity issues could arise.

Before we proceed with the backup, we need to ensure your Cassandra cluster has a replication factor that provides full data redundancy across all nodes. This way, we can take snapshots from a single node while maintaining data consistency.

  1. Verify replication factor.

    Check the replication factor for your keyspace. It should be set to 3 for a three-node cluster.

    Use the following command in cqlsh:

    DESCRIBE KEYSPACE thehive;
    

    If needed, adjust the replication factor:

    ALTER KEYSPACE thehive WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', '<datacenter_name>' : 3 };
    

    Replace <datacenter_name> with your actual data center name, which appears in the nodetool status output (see the snippet after this list).

  2. Check cluster status.

    Ensure all nodes are up and running:

    nodetool status
    

    All nodes should show the UN (Up/Normal) status.

  3. Run nodetool repair.

    Run a repair to ensure data consistency across all nodes:

    nodetool repair
    

    This process may take some time depending on the size of your data. Wait for it to complete before proceeding.

  4. Verify data replication.

    Check for any replication issues:

    nodetool netstats
    

    Look for any pending operations or errors in the output.
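
If you're unsure which data center name to use in the ALTER KEYSPACE statement above, it appears in the header of the nodetool status output:

nodetool status | grep Datacenter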

Step 6: Create Cassandra and Elasticsearch snapshots#

Now we're going to create snapshots of both your database and search index simultaneously. The script takes both snapshots from a single node, since data is fully replicated: the Cassandra snapshot is packaged into a .tar archive, while the Elasticsearch snapshot is stored in the shared snapshot repository.

1. Prepare the backup script#

Before running the script, you'll need to update several values to match your environment:

For Cassandra#

  • Update CASSANDRA_KEYSPACE to match your configuration. You can find this in the /etc/thehive/application.conf file under the db.janusgraph.storage.cql.keyspace attribute. The script uses thehive by default.
  • Update CASSANDRA_CONNECTION with any Cassandra node IP address in the cluster.
  • If you configured authentication in /etc/thehive/application.conf, replace the value of the CASSANDRA_CONNECTION variable with: "<ip_node_cassandra> -u admin -p <authentication_admin_password>".

For Elasticsearch#

  • Update ELASTICSEARCH_SNAPSHOT_REPOSITORY to match the repository name you registered in a previous step. The script uses thehive_repository by default.
  • If you configured authentication in /etc/thehive/application.conf, add -u thehive:<thehive_user_password> to all curl commands, using your actual password.

2. Run the backup script#

How to run this script

Run this script with sudo privileges on a node that has both Elasticsearch and Cassandra installed and running.

#!/bin/bash

set -e

# Configuration
# Cassandra variables
CASSANDRA_KEYSPACE=thehive
CASSANDRA_CONNECTION="<ip_node_cassandra>"
CASSANDRA_GENERAL_ARCHIVE_PATH=/mnt/backup/cassandra
CASSANDRA_DATA_FOLDER=/var/lib/cassandra
CASSANDRA_SNAPSHOT_NAME="cassandra_$(date +%Y%m%d_%Hh%Mm%Ss)"
CASSANDRA_ARCHIVE_PATH="${CASSANDRA_GENERAL_ARCHIVE_PATH}/${CASSANDRA_SNAPSHOT_NAME}/${CASSANDRA_KEYSPACE}"

# Elasticsearch variables
ELASTICSEARCH_API_URL='http://127.0.0.1:9200'
ELASTICSEARCH_SNAPSHOT_REPOSITORY=thehive_repository
ELASTICSEARCH_GENERAL_ARCHIVE_PATH=/mnt/backup/elasticsearch
ELASTICSEARCH_SNAPSHOT_NAME="elasticsearch_$(date +%Y%m%d_%Hh%Mm%Ss)"


# Check if the snapshot repository is correctly registered
repository_config=$(curl -s -L "${ELASTICSEARCH_API_URL}/_snapshot")
repository_ok=$(jq 'has("'${ELASTICSEARCH_SNAPSHOT_REPOSITORY}'")' <<< ${repository_config})
if ! ${repository_ok}; then
  echo "Abort, no snapshot repository registered in Elasticsearch"
  echo "Set the repository folder 'path.repo'"
  echo "in an environment variable"
  echo "or in elasticsearch.yml"
  exit 1
fi

# Make sure the snapshot folder exists and its subcontent permissions are correct
mkdir -p ${CASSANDRA_ARCHIVE_PATH}
chown -R cassandra:cassandra ${CASSANDRA_ARCHIVE_PATH}
echo "Snapshot of all ${CASSANDRA_KEYSPACE} tables will be stored inside ${CASSANDRA_ARCHIVE_PATH}"

# Run both backups in parallel
{
    set -e

    # Creating snapshot name information file
    touch ${ELASTICSEARCH_GENERAL_ARCHIVE_PATH}/${ELASTICSEARCH_SNAPSHOT_NAME}.info

    echo "[ES] Starting the Elasticsearch snapshot..."
    RESPONSE=$(curl -s -L -X PUT "${ELASTICSEARCH_API_URL}/_snapshot/${ELASTICSEARCH_SNAPSHOT_REPOSITORY}/${ELASTICSEARCH_SNAPSHOT_NAME}" \
        -H 'Content-Type: application/json' \
        -d '{"indices":"thehive_global", "ignore_unavailable":true, "include_global_state":false}')
    if echo "$RESPONSE" | grep -q '"accepted":true'; then
        echo "[ES] ✓ Elasticsearch snapshot started successfully"
    else
        echo "[ES] ✗ Elasticsearch ERROR: $RESPONSE"
        exit 1
    fi

    # Wait until the snapshot is finished
    state="NONE"
    while [ "${state}" != "\"SUCCESS\"" ]; do
        echo "[ES] Snapshot in progress, waiting 5 seconds before checking status again..."
        sleep 5
        snapshot_list=$(curl -s -L "${ELASTICSEARCH_API_URL}/_snapshot/${ELASTICSEARCH_SNAPSHOT_REPOSITORY}/*?verbose=false")
        state=$(jq '.snapshots[] | select(.snapshot == "'${ELASTICSEARCH_SNAPSHOT_NAME}'").state' <<< "${snapshot_list}")
    done
    echo "[ES] ✓ Elasticsearch snapshot ${ELASTICSEARCH_SNAPSHOT_NAME} finished"
    exit 0

} &
PID_ES=$!

{
    set -e

    echo "[CASS] Starting snapshot ${CASSANDRA_SNAPSHOT_NAME} for keyspace ${CASSANDRA_KEYSPACE}"
    if nodetool snapshot -t "${CASSANDRA_SNAPSHOT_NAME}" "${CASSANDRA_KEYSPACE}"; then
        echo "[CASS] ✓ Snapshot Cassandra created successfully"

        # Save the cql schema of the keyspace
        cqlsh ${CASSANDRA_CONNECTION}  -e "DESCRIBE KEYSPACE ${CASSANDRA_KEYSPACE}" | grep -v "^WARNING" > "${CASSANDRA_GENERAL_ARCHIVE_PATH}/${CASSANDRA_SNAPSHOT_NAME}/create_keyspace_${CASSANDRA_KEYSPACE}.cql"
        echo "The keyspace cql definition for ${CASSANDRA_KEYSPACE} is stored in this file: ${CASSANDRA_GENERAL_ARCHIVE_PATH}/${CASSANDRA_SNAPSHOT_NAME}/create_keyspace_${CASSANDRA_KEYSPACE}.cql"

        # For each table folder in the keyspace folder of the snapshot
        for TABLE in $(ls ${CASSANDRA_DATA_FOLDER}/data/${CASSANDRA_KEYSPACE}); do
            # Folder where the snapshot files are stored
            TABLE_SNAPSHOT_FOLDER=${CASSANDRA_DATA_FOLDER}/data/${CASSANDRA_KEYSPACE}/${TABLE}/snapshots/${CASSANDRA_SNAPSHOT_NAME}
            if [ -d ${TABLE_SNAPSHOT_FOLDER} ]; then 
                # Create a folder for each table
                mkdir "${CASSANDRA_ARCHIVE_PATH}/${TABLE}"
                chown -R cassandra:cassandra ${CASSANDRA_ARCHIVE_PATH}/${TABLE}

                # Copy the snapshot files to the proper table folder
                # Snapshots files are hardlinks,
                # so we use --remove-destination to make sure the files are actually copied and not just linked
                cp -p --remove-destination ${TABLE_SNAPSHOT_FOLDER}/* ${CASSANDRA_ARCHIVE_PATH}/${TABLE}
            fi
        done

        # Delete Cassandra snapshot once it's backed up
        nodetool clearsnapshot -t ${CASSANDRA_SNAPSHOT_NAME} > /dev/null

        # Create a .tar archive with the folder containing the backed up Cassandra data
        tar cf ${CASSANDRA_GENERAL_ARCHIVE_PATH}/${CASSANDRA_SNAPSHOT_NAME}.tar -C "${CASSANDRA_GENERAL_ARCHIVE_PATH}" ${CASSANDRA_SNAPSHOT_NAME}
        # Remove the folder once the archive is created
        rm -rf ${CASSANDRA_GENERAL_ARCHIVE_PATH}/${CASSANDRA_SNAPSHOT_NAME}

        exit 0
    else
        echo "[CASS] ✗ Cassandra ERROR"
        exit 1
    fi
} &
PID_CASS=$!

ES_EXIT=0
CASS_EXIT=0

# Wait for the two snapshots to finish
wait $PID_ES || ES_EXIT=$?
wait $PID_CASS || CASS_EXIT=$?

# Final check
if [ $ES_EXIT -eq 0 ] && [ $CASS_EXIT -eq 0 ]; then
    echo "=== ✓ Full backup successful ==="

    # Display the location of the Elasticsearch archive
    echo "Elasticsearch backup done!" 

    # Display the location of the Cassandra archive
    echo "Cassandra backup done! Keep the following backup archive safe:"
    echo "${CASSANDRA_GENERAL_ARCHIVE_PATH}/${CASSANDRA_SNAPSHOT_NAME}.tar"

    exit 0
else
    echo "=== ✗ ERROR - ES: exit $ES_EXIT, Cassandra: exit $CASS_EXIT ==="
    exit 1
fi

After running the script, the backup archives are available at /mnt/backup/cassandra and /mnt/backup/elasticsearch. Be sure to copy these archives to a separate server or storage location to safeguard against data loss if the TheHive server fails.
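
To confirm the Cassandra archive was written and the Elasticsearch snapshot completed, you can run quick checks like the following sketch (the archive name pattern comes from the script above):

# List the contents of the most recent Cassandra backup archive
tar tf "$(ls -t /mnt/backup/cassandra/cassandra_*.tar | head -n 1)" | head

# List the snapshots stored in the Elasticsearch repository and their state
curl -s "http://127.0.0.1:9200/_snapshot/thehive_repository/_all?pretty" | jq '.snapshots[] | {snapshot, state}'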

For more details about snapshot management, refer to the official Cassandra documentation and Elasticsearch documentation.

Step 7: Back up file storage#

Finally, we're going to back up TheHive file storage, which contains all the attachments and files.

The backup procedure depends on your storage backend. Follow the first procedure below if TheHive stores attachments on a local or NFS filesystem, or the second if you use an S3-compatible object storage service. The S3 procedure uses MinIO as an example, but you can adapt the same approach to any S3-compatible implementation.

NFS or local file storage#

1. Prepare the backup script#

Before running the script, update ATTACHMENT_FOLDER to match your environment. You can find this path in /etc/thehive/application.conf under the storage.localfs.location attribute. The script uses /opt/thp/thehive/files by default.
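
If you're not sure which path is configured, a quick way to check is to search the configuration file (assuming the default location shown above):

grep -A 2 "localfs" /etc/thehive/application.conf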

2. Run the backup script#

#!/bin/bash

# Stop at the first error so a failed copy can't produce an incomplete archive
set -e

# TheHive attachment variables
ATTACHMENT_FOLDER=/opt/thp/thehive/files

# Backup variables
GENERAL_ARCHIVE_PATH=/mnt/backup/storage
SNAPSHOT_NAME="files_$(date +%Y%m%d_%Hh%Mm%Ss)"
ATTACHMENT_ARCHIVE_PATH="${GENERAL_ARCHIVE_PATH}/${SNAPSHOT_NAME}"

# Creating the backup folder if needed
mkdir -p ${ATTACHMENT_ARCHIVE_PATH}

# Copy all TheHive attachment files
cp -r ${ATTACHMENT_FOLDER}/. ${ATTACHMENT_ARCHIVE_PATH}/

# Create a .tar archive with the folder containing the backed up attachment files
cd ${GENERAL_ARCHIVE_PATH}
tar cf ${SNAPSHOT_NAME}.tar ${SNAPSHOT_NAME}

# Remove the folder once the archive is created
rm -rf ${GENERAL_ARCHIVE_PATH}/${SNAPSHOT_NAME}

# Display the location of the attachment archive
echo ""
echo "TheHive attachment files backup done! Keep the following backup archive safe:"
echo "${GENERAL_ARCHIVE_PATH}/${SNAPSHOT_NAME}.tar"

After running the script, the backup archive is available at /mnt/backup/storage. Be sure to copy this archive to a separate server or storage location to safeguard against data loss if the TheHive server fails.

S3-compatible object storage (MinIO)#

1. Prepare the backup script#

Before running the script, you'll need to update several values to match your environment:

  • Update MINIO_ENDPOINT with your MinIO server URL.
  • Update MINIO_ACCESS_KEY with your MinIO access key.
  • Update MINIO_SECRET_KEY with your MinIO secret key.
  • Change MINIO_BUCKET if you want to use a different bucket name.
  • Change MINIO_ALIAS if you want to use a different alias name.

2. Configure the MinIO alias#

Run this command once to configure the MinIO alias using the same values you defined in the script:

mcli alias set <minio_alias> <minio_endpoint> <minio_access_key> <minio_secret_key>
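
You can confirm the alias works by listing the bucket, using the alias and bucket names from the script defaults:

mcli ls th_minio/thehive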

3. Run the backup script#

#!/bin/bash

# Backup destination variables
MINIO_ARCHIVE_PATH=/mnt/backup/minio

# MinIO variables
MINIO_ENDPOINT="<minio_server_url>"
MINIO_ACCESS_KEY="<access_key>"
MINIO_SECRET_KEY="<secret_key>"
MINIO_BUCKET="thehive"
MINIO_ALIAS=th_minio
MINIO_SNAPSHOT_NAME="minio_$(date +%Y%m%d_%Hh%Mm%Ss)"

# Check if MinIO is accessible
if ! mcli ls ${MINIO_ALIAS} > /dev/null 2>&1; then
    echo "Error: Cannot connect to MinIO server"
    exit 1
fi

# Make sure the local backup folder exists, then mirror the MinIO bucket content into it
mkdir -p ${MINIO_ARCHIVE_PATH}/${MINIO_SNAPSHOT_NAME}
mcli mirror ${MINIO_ALIAS}/${MINIO_BUCKET} ${MINIO_ARCHIVE_PATH}/${MINIO_SNAPSHOT_NAME}

tar cvf ${MINIO_ARCHIVE_PATH}/${MINIO_SNAPSHOT_NAME}.tar -C "${MINIO_ARCHIVE_PATH}" ${MINIO_SNAPSHOT_NAME} 

# Display the location of the backup
echo ""
echo "TheHive attachment files backup done! Keep the following backup archive safe:"
echo "${MINIO_ARCHIVE_PATH}/${MINIO_SNAPSHOT_NAME}.tar"

After running the script, the backup archive is available at /mnt/backup/minio. Be sure to copy this archive to a separate server or storage location to safeguard against data loss if the TheHive server fails.

You've completed the hot backup process for your TheHive cluster. We recommend verifying your backup archives are complete and accessible before relying on them for recovery.
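
To keep copies off the node, assuming all backup locations live under /mnt/backup and you have SSH access to a separate backup host (the hostname and destination path below are placeholders), you could use rsync:

rsync -av /mnt/backup/ backup-user@backup-host:/srv/thehive-backups/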

Next steps