
Restore a Hot Backup on a Cluster#

In this tutorial, we're going to guide you through restoring a hot backup of TheHive on a cluster using the provided scripts.

By the end, you'll have brought your system back to a working state using the backups you created previously, with all your database, search index, and file storage fully restored across your cluster.

Shutdown required

Restoring from a hot backup requires stopping the TheHive application on all nodes. Plan a maintenance window.

Best practices for safe backup and restore

  • Coordinate your Apache Cassandra, Elasticsearch, and file storage backups to run at the same time. Using automation like a cron job helps minimize the chance of inconsistencies between components; a crontab sketch follows this list.
  • Before relying on these backups in a real incident, test the full backup and restore flow in a staging environment. It’s the only way to make sure everything works as expected.
  • Ensure you have an up-to-date backup before starting the restore operation, as errors during the restoration could lead to data loss.
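
As an illustration of the first point, the backup scripts from the previous tutorial can be scheduled to start at the same moment with cron. The script paths below are hypothetical placeholders; substitute wherever you saved your own backup scripts:

# Hypothetical crontab entries (edit with: sudo crontab -e)
# Launch the database/index backup and the file storage backup together every night at 02:00.
0 2 * * * /opt/backup-scripts/backup_cassandra_elasticsearch.sh >> /var/log/thehive-backup-db.log 2>&1
0 2 * * * /opt/backup-scripts/backup_files.sh >> /var/log/thehive-backup-files.log 2>&1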

Python compatibility for cqlsh

cqlsh requires Python 3.9. If your Linux distribution provides a newer Python version by default, you must install Python 3.9 alongside it and explicitly tell cqlsh which interpreter to use. You can do this by setting the CQLSH_PYTHON environment variable when running cqlsh: sudo -u cassandra CQLSH_PYTHON=/path/to/python3.9 cqlsh.
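
For example, assuming a separate interpreter is available at /usr/bin/python3.9 (the exact path depends on how you installed it), you can point cqlsh at it explicitly:

# Confirm the extra interpreter exists (path is an assumption; adjust to your install)
/usr/bin/python3.9 --version

# Run cqlsh with that interpreter as the cassandra user
sudo -u cassandra CQLSH_PYTHON=/usr/bin/python3.9 cqlsh <ip_node_cassandra>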

This tutorial assumes you’ve completed the Perform a Hot Backup on a Cluster tutorial, and that you're using the same directory paths during restore as during backup.

Step 1: Stop TheHive on all nodes#

Before we begin restoring data, we need to stop TheHive on all cluster nodes to prevent any conflicts during the restoration process.

Run this command on each node:

sudo systemctl stop thehive

Verify that TheHive has stopped on all nodes:

sudo systemctl status thehive

You should see that the service is inactive on each node.
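
If you manage the cluster over SSH, you can stop and verify the service on every node from a single machine. This is a minimal sketch assuming hypothetical hostnames thehive-node1 to thehive-node3 and passwordless sudo on each node:

#!/bin/bash
# Stop TheHive on every node and print its resulting state (hostnames are placeholders)
for node in thehive-node1 thehive-node2 thehive-node3; do
    echo "--- ${node} ---"
    ssh "${node}" "sudo systemctl stop thehive; systemctl is-active thehive" || true
done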

Step 2: Restore Cassandra and Elasticsearch snapshots#

Now we're going to restore both your database and search index from the backup archives. Since data is fully replicated across all nodes, we only need to run the restore script on one node—the script will automatically distribute the data across your entire cluster. The script handles situations where nodes have been added or removed since the backup was created.

1. Prepare the restore script#

Before running the script, you'll need to update several values to match your environment; an example of the adjusted variables follows these lists:

For Cassandra#

  • Update CASSANDRA_KEYSPACE to match your configuration. You can find this in the /etc/thehive/application.conf file under the db.janusgraph.storage.cql.keyspace attribute. The script uses thehive by default.
  • Update CASSANDRA_CONNECTION with any Cassandra node IP address in the cluster.
  • If you configured authentication in /etc/thehive/application.conf:
    • Replace the value of the CASSANDRA_CONNECTION variable with: "<ip_node_cassandra> admin -p <authentication_admin_password>"
    • Uncomment the following line so that the application role is granted access to the restored keyspace: cqlsh ${CASSANDRA_CONNECTION} --request-timeout=120 -e "GRANT ALL PERMISSIONS ON KEYSPACE ${CASSANDRA_KEYSPACE} TO ${CASSANDRA_THEHIVE_USER};"
  • Update CASSANDRA_THEHIVE_USER to match the role that accesses the keyspace. The script uses thehive by default.

For Elasticsearch#

  • Update ELASTICSEARCH_SNAPSHOT_REPOSITORY to match the repository name you registered in a previous step. The script uses thehive_repository by default.
  • If you configured authentication in /etc/thehive/application.conf, add -u thehive:<thehive_user_password> to all curl commands, using your actual password.
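
For instance, with authentication enabled on both components, the top of the script could be adjusted as follows. The IP address, passwords, and repository name are placeholders for your own values:

# Example values only; replace the placeholders with your own environment details
CASSANDRA_KEYSPACE=thehive
CASSANDRA_CONNECTION="<ip_node_cassandra> admin -p <authentication_admin_password>"
CASSANDRA_THEHIVE_USER="thehive"

ELASTICSEARCH_SNAPSHOT_REPOSITORY=thehive_repository
# And each curl call in the script gains credentials, for example:
# curl -s -u thehive:<thehive_user_password> -L -X GET "${ELASTICSEARCH_API_URL}/_cat/indices"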

2. Run the restore script#

How to run this script

Run this script with sudo privileges on a node that has both Elasticsearch and Cassandra installed and running.
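
Before launching it, you can quickly confirm that both services are up on this node. This assumes the default service names cassandra and elasticsearch and the local Elasticsearch endpoint used by the script (add -u thehive:<thehive_user_password> if Elasticsearch authentication is enabled):

# Both should print "active"
sudo systemctl is-active cassandra elasticsearch

# Elasticsearch should answer on the API URL the script uses
curl -s http://127.0.0.1:9200/_cluster/health?pretty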

#!/bin/bash

set -e

# Configuration
# Cassandra variables
CASSANDRA_KEYSPACE=thehive
CASSANDRA_GENERAL_ARCHIVE_PATH=/mnt/backup/cassandra
CASSANDRA_CONNECTION="<ip_node_cassandra>"
CASSANDRA_THEHIVE_USER="thehive"
CASSANDRA_BACKUP_LIST=(${CASSANDRA_GENERAL_ARCHIVE_PATH}/cassandra_????????_??h??m??s.tar)
CASSANDRA_LATEST_BACKUP_NAME=$(basename ${CASSANDRA_BACKUP_LIST[-1]})
CASSANDRA_SNAPSHOT_NAME=$(echo ${CASSANDRA_LATEST_BACKUP_NAME} | cut -d '.' -f 1)
CASSANDRA_SNAPSHOT_FOLDER="${CASSANDRA_GENERAL_ARCHIVE_PATH}/${CASSANDRA_SNAPSHOT_NAME}"

# Elasticsearch variables
ELASTICSEARCH_API_URL='http://127.0.0.1:9200'
ELASTICSEARCH_SNAPSHOT_REPOSITORY=thehive_repository
ELASTICSEARCH_GENERAL_ARCHIVE_PATH=/mnt/backup/elasticsearch
ELASTICSEARCH_BACKUP_LIST=(${ELASTICSEARCH_GENERAL_ARCHIVE_PATH}/elasticsearch_????????_??h??m??s.info)
ELASTICSEARCH_LATEST_BACKUP_NAME=$(basename ${ELASTICSEARCH_BACKUP_LIST[-1]})
ELASTICSEARCH_SNAPSHOT_NAME=$(echo ${ELASTICSEARCH_LATEST_BACKUP_NAME} | cut -d '.' -f 1)
ELASTICSEARCH_SNAPSHOT_FOLDER="${ELASTICSEARCH_GENERAL_ARCHIVE_PATH}/${ELASTICSEARCH_SNAPSHOT_NAME}"

echo "Latest Elasticsearch backup archive found is ${ELASTICSEARCH_GENERAL_ARCHIVE_PATH}/${ELASTICSEARCH_LATEST_BACKUP_NAME}"
echo "Latest Cassandra backup archive found is ${CASSANDRA_GENERAL_ARCHIVE_PATH}/${CASSANDRA_LATEST_BACKUP_NAME}"

{
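    # Restore Elasticsearch in a background subshell so it runs in parallel with the Cassandra restore below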
    set -e

    echo "Latest Elasticsearch backup archive extracted in ${ELASTICSEARCH_SNAPSHOT_FOLDER}"

    # Delete an existing Elasticsearch index
    echo "Trying to delete the existing Elasticsearch index"
    delete_index=$(curl -s -L -X DELETE "${ELASTICSEARCH_API_URL}/thehive_global/")

    ack_delete=$(jq '.acknowledged == true' <<< $delete_index)
    if [ $ack_delete != true ]; then
        echo "Couldn't delete thehive_global index, maybe it was already deleted"
    else
        echo "Existing thehive_global index deleted"
    fi

    # Restoring the extracted snapshot
    echo "Restoring ${ELASTICSEARCH_SNAPSHOT_NAME} snapshot"
    restore_status=$(curl -s -L -X POST "${ELASTICSEARCH_API_URL}/_snapshot/${ELASTICSEARCH_SNAPSHOT_REPOSITORY}/${ELASTICSEARCH_SNAPSHOT_NAME}/_restore?wait_for_completion=true")

    echo "Elasticsearch data restoration done!"
    rm -rf ${ELASTICSEARCH_SNAPSHOT_FOLDER}

} &
PID_ES=$!

{
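    # Restore Cassandra in a second background subshell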

    set -e

    tar xvf "${CASSANDRA_GENERAL_ARCHIVE_PATH}/${CASSANDRA_LATEST_BACKUP_NAME}" -C ${CASSANDRA_GENERAL_ARCHIVE_PATH} > /dev/null
    echo "Latest Cassandra backup archive extracted in ${CASSANDRA_SNAPSHOT_FOLDER}"

    # Check if Cassandra already has an existing keyspace
    cqlsh ${CASSANDRA_CONNECTION} -e "DESCRIBE KEYSPACE ${CASSANDRA_KEYSPACE}" > "${CASSANDRA_SNAPSHOT_FOLDER}/target_keyspace_${CASSANDRA_KEYSPACE}.cql" || true

    if cmp --silent -- "${CASSANDRA_SNAPSHOT_FOLDER}/create_keyspace_${CASSANDRA_KEYSPACE}.cql" "${CASSANDRA_SNAPSHOT_FOLDER}/target_keyspace_${CASSANDRA_KEYSPACE}.cql"; then
        echo "Existing ${CASSANDRA_KEYSPACE} keyspace definition is identical to the one in the backup, no need to drop and recreate it"
    else
        echo "Existing ${CASSANDRA_KEYSPACE} keyspace definition does not match the one in the backup, dropping it"
        cqlsh ${CASSANDRA_CONNECTION} --request-timeout=120 -e "DROP KEYSPACE IF EXISTS ${CASSANDRA_KEYSPACE};"
        sleep 5s
        echo "Creating ${CASSANDRA_KEYSPACE} keyspace using the definition from the backup"
        cqlsh ${CASSANDRA_CONNECTION} --request-timeout=120 -f ${CASSANDRA_SNAPSHOT_FOLDER}/create_keyspace_${CASSANDRA_KEYSPACE}.cql
        #cqlsh ${CASSANDRA_CONNECTION} --request-timeout=120 -e "GRANT ALL PERMISSIONS ON KEYSPACE ${CASSANDRA_KEYSPACE} TO ${CASSANDRA_THEHIVE_USER};"
    fi

    # Create the tables and load related data
    for TABLE in $(ls "${CASSANDRA_SNAPSHOT_FOLDER}/${CASSANDRA_KEYSPACE}"); do
        TABLE_BASENAME=$(basename ${TABLE})
        TABLE_NAME=${TABLE_BASENAME%%-*}
        echo "Importing ${TABLE_NAME} table and related data"
        sstableloader -d ${CASSANDRA_CONNECTION/-p /-pw } ${CASSANDRA_SNAPSHOT_FOLDER}/${CASSANDRA_KEYSPACE}/${TABLE} > sstableloader.log
        echo ""
    done

    echo "Cassandra data restoration done!"
    rm -rf "${CASSANDRA_SNAPSHOT_FOLDER}"
} &
PID_CASS=$!

ES_EXIT=0
CASS_EXIT=0

# Wait for the two snapshots to finish
wait $PID_ES || ES_EXIT=$?
wait $PID_CASS || CASS_EXIT=$?

# Final check
if [ $ES_EXIT -eq 0 ] && [ $CASS_EXIT -eq 0 ]; then
    echo "=== ✓ Successful full restore ==="
    exit 0
else
    echo "=== ✗ ERROR - ES: exit $ES_EXIT, Cassandra: exit $CASS_EXIT ==="
    exit 1
fi

The script will automatically identify your most recent backup archives and restore them. You'll see progress messages as the restoration proceeds. When complete, you should see the message === ✓ Successful full restore ===.
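
Once the script reports success, you can optionally spot-check both data stores before moving on. This sketch assumes the default index name, keyspace, and local Elasticsearch endpoint used above; add your credentials to the commands if authentication is enabled:

# The restored thehive_global index should be listed with a green or yellow health status
curl -s "http://127.0.0.1:9200/_cat/indices/thehive_global?v"

# The keyspace and its tables should exist again in Cassandra
cqlsh <ip_node_cassandra> -e "DESCRIBE KEYSPACE thehive;" | head

# Every node should report state UN (Up/Normal)
nodetool status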

For additional details, refer to the official Cassandra documentation and Elasticsearch documentation.

Step 3: Restore file storage#

Finally, we're going to restore the file attachments that were backed up.

The restore procedure depends on your storage backend—either NFS or an S3-compatible object storage service. The script below uses MinIO as an example, but you can adapt the same approach to any S3-compatible implementation.

For NFS#

1. Prepare the restore script#

Before running the script, update ATTACHMENT_FOLDER to match your environment. You can find this path in /etc/thehive/application.conf under the storage.localfs.location attribute. The script uses /opt/thp/thehive/files by default.

2. Run the restore script#

#!/bin/bash

# TheHive attachment variables
ATTACHMENT_FOLDER=/opt/thp/thehive/files

# Backup variables
GENERAL_ARCHIVE_PATH=/mnt/backup/storage

# Look for the latest archived attachment files snapshot
ATTACHMENT_BACKUP_LIST=(${GENERAL_ARCHIVE_PATH}/files_????????_??h??m??s.tar)
ATTACHMENT_LATEST_BACKUP_NAME=$(basename ${ATTACHMENT_BACKUP_LIST[-1]})

echo "Latest attachment files backup archive found is ${GENERAL_ARCHIVE_PATH}/${ATTACHMENT_LATEST_BACKUP_NAME}"

# Extract the latest archive
ATTACHMENT_SNAPSHOT_NAME=$(echo ${ATTACHMENT_LATEST_BACKUP_NAME} | cut -d '.' -f 1)
ATTACHMENT_SNAPSHOT_FOLDER="${GENERAL_ARCHIVE_PATH}/${ATTACHMENT_SNAPSHOT_NAME}"

tar xvf "${GENERAL_ARCHIVE_PATH}/${ATTACHMENT_LATEST_BACKUP_NAME}" -C ${GENERAL_ARCHIVE_PATH}
echo "Latest attachment files backup archive extracted in ${ATTACHMENT_SNAPSHOT_FOLDER}"

# Clean existing TheHive attachment files
rm -rf ${ATTACHMENT_FOLDER}/*

# Copy the attachment files from the backup
cp -r ${ATTACHMENT_SNAPSHOT_FOLDER}/* ${ATTACHMENT_FOLDER}/

echo "attachment files data restoration done!"
rm -rf ${ATTACHMENT_SNAPSHOT_FOLDER}
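
Because cp doesn't necessarily preserve ownership, it's worth making sure the restored attachments are readable by the account TheHive runs under. This assumes the default thehive user and group and the default attachment path; adjust them if yours differ:

# Hand the restored attachment files back to TheHive's service account
sudo chown -R thehive:thehive /opt/thp/thehive/files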

For MinIO#

1. Prepare the restore script#

Before running the script, you'll need to update several values to match your environment (a sketch for registering the MinIO alias follows this list):

  • Update MINIO_ENDPOINT with your MinIO server URL.
  • Update MINIO_ACCESS_KEY with your MinIO access key.
  • Update MINIO_SECRET_KEY with your MinIO secret key.
  • Change MINIO_BUCKET if you want to use a different bucket name.
  • Change MINIO_ALIAS if you want to use a different alias name.
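
The script checks that the alias can reach the server but doesn't create it. If the th_minio alias wasn't already configured on this node during the backup tutorial, register it first with the values above; a sketch using the same mcli client and placeholder values:

# Register the MinIO alias once before running the restore script (values are placeholders)
mcli alias set th_minio "<minio_server_url>" "<access_key>" "<secret_key>"

# Verify the alias works and the bucket exists
mcli ls th_minio/thehive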

2. Run the restore script#

#!/bin/bash

# TheHive attachment variables
MINIO_ARCHIVE_PATH=/mnt/backup/minio/

# MinIO variables
MINIO_ENDPOINT="<minio_server_url>"
MINIO_ACCESS_KEY="<access_key>"
MINIO_SECRET_KEY="<secret_key>"
MINIO_BUCKET="thehive"
MINIO_ALIAS=th_minio

# Check if MinIO is accessible
if ! mcli ls ${MINIO_ALIAS} > /dev/null 2>&1; then
    echo "Error: Cannot connect to MinIO server"
    exit 1
fi

# Look for the latest backup snapshot in MinIO
MINIO_BACKUP_LIST=(${MINIO_ARCHIVE_PATH}/minio_????????_??h??m??s.tar)
MINIO_LATEST_BACKUP_NAME=$(basename ${MINIO_BACKUP_LIST[-1]})

if [ -z "${LATEST_BACKUP}" ]; then
    echo "Error: No backup snapshots found in ${MINIO_ARCHIVE_PATH}"
    exit 1
fi

echo "Latest attachment files backup snapshot found is ${MINIO_ARCHIVE_PATH}/${MINIO_LATEST_BACKUP_NAME}"

tar xvf "${MINIO_ARCHIVE_PATH}/${MINIO_LATEST_BACKUP_NAME}" -C ${MINIO_ARCHIVE_PATH} > /dev/null
echo "Latest Minio backup archive extracted in ${MINIO_ARCHIVE_PATH}/${MINIO_LATEST_BACKUP_NAME}"

# Restore attachments from MinIO
echo "Restoring attachments from MinIO snapshot ${MINIO_LATEST_BACKUP_NAME}..."
mcli mirror ${MINIO_ARCHIVE_PATH}/${MINIO_LATEST_BACKUP_NAME%.tar} ${MINIO_ALIAS}/${MINIO_BUCKET}/

# Display completion message
echo ""
echo "Attachment files data restoration done!"
echo "Restored from: ${MINIO_ALIAS}/${MINIO_BUCKET}/${LATEST_BACKUP}"

Step 4: Start TheHive on all nodes and verify#

Now that all data components have been restored, we're going to start TheHive on all cluster nodes and verify that everything works as expected.

  1. Start TheHive on each node.

    sudo systemctl start thehive
    
  2. Check the service status on all nodes.

    sudo systemctl status thehive
    

    The service should show as active and running on each node.

  3. Monitor the logs on each node for any errors.

    sudo journalctl -u thehive -f
    

    Watch for any error messages during startup. If everything restored correctly, you should see TheHive initializing normally across all nodes.

  4. Access TheHive through your web browser and verify that everything works as usual; a quick command-line check is sketched below.
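
If you want to script that last check, TheHive's status endpoint can be queried on each node. This assumes the default HTTP port 9000, no path prefix, and that the unauthenticated status endpoint is available in your version:

# Should print 200 once TheHive has finished starting up
curl -s -o /dev/null -w "%{http_code}\n" http://127.0.0.1:9000/api/status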

You've completed the restore process! If you encounter any issues with missing data or functionality, review the logs and verify that all three backup archives were present and successfully restored.

Next steps