Restore a Hot Backup on a Standalone Server#
In this tutorial, we're going to guide you through restoring a hot backup of TheHive on a standalone server using the provided scripts.
By the end, you'll have brought your system back to a working state using the backups you created previously, with all your database, search index, and file storage fully restored.
Shutdown required
Restoring from a hot backup requires stopping the TheHive application. Plan a maintenance window accordingly.
Best practices for safe backup and restore
- Coordinate your Apache Cassandra, Elasticsearch, and file storage backups to run at the same time. Using automation such as a cron job helps minimize the chance of inconsistencies between components; see the sketch after this list.
- Before relying on these backups in a real incident, test the full backup and restore flow in a staging environment. It’s the only way to make sure everything works as expected.
- Ensure you have an up-to-date backup before starting the restore operation, as errors during the restoration could lead to data loss.
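A minimal sketch of coordinated scheduling, assuming you wrapped the backup steps from the previous tutorial into a single script at /opt/thp/scripts/hot_backup.sh (the path, script name, and log file are placeholders):
# /etc/cron.d/thehive-backup — run the combined hot backup daily at 02:00
0 2 * * * root /opt/thp/scripts/hot_backup.sh >> /var/log/thehive-backup.log 2>&1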
Python compatibility for cqlsh
cqlsh requires Python 3.9. If your Linux distribution provides a newer Python version by default, you must install Python 3.9 alongside it and explicitly tell cqlsh which interpreter to use. You can do this by setting the CQLSH_PYTHON environment variable when running cqlsh: sudo -u cassandra CQLSH_PYTHON=/path/to/python3.9 cqlsh.
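For example, on a distribution where Python 3.9 is available as a package (the package name and interpreter path are assumptions; adjust them to your system):
# Install Python 3.9 alongside the system Python
sudo apt-get install -y python3.9
# Point cqlsh at the 3.9 interpreter explicitly
sudo -u cassandra CQLSH_PYTHON=/usr/bin/python3.9 cqlsh 127.0.0.1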
This tutorial assumes you’ve completed the Perform a Hot Backup on a Standalone Server tutorial, and that you're using the same directory paths during restore as during backup.
Step 1: Stop TheHive#
Before we begin restoring data, we need to stop TheHive to prevent any conflicts during the restoration process.
sudo systemctl stop thehive
Verify that TheHive has stopped:
sudo systemctl status thehive
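The status output should report Active: inactive (dead). As an extra check, systemctl is-active exits with a non-zero code once the unit is stopped:
sudo systemctl is-active thehive || echo "TheHive is stopped"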
Step 2: Restore Cassandra and Elasticsearch snapshots#
We're going to restore both your database and search index from the backup archives. The script automatically identifies your most recent backup files, extracts them, and restores both Cassandra and Elasticsearch in parallel while handling keyspace recreation if needed.
1. Prepare the restore script#
Before running the script, you'll need to update several values to match your environment:
For Cassandra#
- Update CASSANDRA_KEYSPACE to match your configuration. You can find this in the /etc/thehive/application.conf file under the db.janusgraph.storage.cql.keyspace attribute. The script uses thehive by default. If you're unsure of the current value, see the quick check after this list.
- Update CASSANDRA_CONNECTION with 127.0.0.1.
- If you configured authentication in /etc/thehive/application.conf:
    - Replace the value of the CASSANDRA_CONNECTION variable with: "127.0.0.1 admin -p <authentication_admin_password>"
    - Uncomment the following line: cqlsh ${CASSANDRA_CONNECTION} --request-timeout=120 -e "GRANT ALL PERMISSIONS ON KEYSPACE ${CASSANDRA_KEYSPACE} TO ${CASSANDRA_THEHIVE_USER};"
- Update CASSANDRA_THEHIVE_USER to match the role that accesses the keyspace. The script uses thehive by default.
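If you're unsure of the current keyspace name, a quick check against the default configuration path:
grep 'keyspace' /etc/thehive/application.conf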
For Elasticsearch#
- Update ELASTICSEARCH_SNAPSHOT_REPOSITORY to match the repository name you registered in a previous step. The script uses thehive_repository by default.
- If you configured authentication in /etc/thehive/application.conf, add -u thehive:<thehive_user_password> to all curl commands, using your actual password, as in the example after this list.
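For example, with authentication enabled, the index deletion call used by the script would become (password placeholder as above):
curl -s -L -u thehive:<thehive_user_password> -X DELETE "http://127.0.0.1:9200/thehive_global/"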
2. Run the restore script#
How to run this script
Run this script with sudo privileges on a node that has both Elasticsearch and Cassandra installed and running.
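Before launching it, you can confirm both services are up; systemctl is-active prints one state per unit (unit names may differ slightly depending on how you installed the packages):
sudo systemctl is-active cassandra elasticsearch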
#!/bin/bash
set -e
# Configuration
# Cassandra variables
CASSANDRA_KEYSPACE=thehive
CASSANDRA_GENERAL_ARCHIVE_PATH=/mnt/backup/cassandra
CASSANDRA_CONNECTION="<ip_node_cassandra>"
CASSANDRA_THEHIVE_USER="thehive"
CASSANDRA_BACKUP_LIST=(${CASSANDRA_GENERAL_ARCHIVE_PATH}/cassandra_????????_??h??m??s.tar)
CASSANDRA_LATEST_BACKUP_NAME=$(basename ${CASSANDRA_BACKUP_LIST[-1]})
CASSANDRA_SNAPSHOT_NAME=$(echo ${CASSANDRA_LATEST_BACKUP_NAME} | cut -d '.' -f 1)
CASSANDRA_SNAPSHOT_FOLDER="${CASSANDRA_GENERAL_ARCHIVE_PATH}/${CASSANDRA_SNAPSHOT_NAME}"
# Elasticsearch variables
ELASTICSEARCH_API_URL='http://127.0.0.1:9200'
ELASTICSEARCH_SNAPSHOT_REPOSITORY=thehive_repository
ELASTICSEARCH_GENERAL_ARCHIVE_PATH=/mnt/backup/elasticsearch
ELASTICSEARCH_BACKUP_LIST=(${ELASTICSEARCH_GENERAL_ARCHIVE_PATH}/elasticsearch_????????_??h??m??s.info)
ELASTICSEARCH_LATEST_BACKUP_NAME=$(basename ${ELASTICSEARCH_BACKUP_LIST[-1]})
ELASTICSEARCH_SNAPSHOT_NAME=$(echo ${ELASTICSEARCH_LATEST_BACKUP_NAME} | cut -d '.' -f 1)
echo "Latest Elasticsearch backup archive found is ${ELASTICSEARCH_GENERAL_ARCHIVE_PATH}/${ELASTICSEARCH_LATEST_BACKUP_NAME}"
echo "Latest Cassandra backup archive found is ${CASSANDRA_GENERAL_ARCHIVE_PATH}/${CASSANDRA_LATEST_BACKUP_NAME}"
{
set -e
echo "Latest Elasticsearch snapshot to be restore: ELASTICSEARCH_SNAPSHOT_NAME"
# Delete an existing Elasticsearch index
echo "Trying to delete the existing Elasticsearch index"
delete_index=$(curl -s -L -X DELETE "${ELASTICSEARCH_API_URL}/thehive_global/")
ack_delete=$(jq '.acknowledged == true' <<< "$delete_index")
if [ "$ack_delete" != "true" ]; then
echo "Couldn't delete thehive_global index, maybe it was already deleted"
else
echo "Existing thehive_global index deleted"
fi
# Restoring the extracted snapshot
echo "Restoring ${ELASTICSEARCH_SNAPSHOT_NAME} snapshot"
restore_status=$(curl -s -L -X POST "${ELASTICSEARCH_API_URL}/_snapshot/${ELASTICSEARCH_SNAPSHOT_REPOSITORY}/${ELASTICSEARCH_SNAPSHOT_NAME}/_restore?wait_for_completion=true")
# Fail this branch if the restore API reports an error or failed shards
if jq -e '.snapshot.shards.failed == 0' <<< "$restore_status" > /dev/null; then
echo "Elasticsearch data restoration done!"
else
echo "Elasticsearch restore failed: ${restore_status}"
exit 1
fi
} &
PID_ES=$!
{
set -e
tar xvf "${CASSANDRA_GENERAL_ARCHIVE_PATH}/${CASSANDRA_LATEST_BACKUP_NAME}" -C ${CASSANDRA_GENERAL_ARCHIVE_PATH} > /dev/null
echo "Latest Cassandra backup archive extracted in ${CASSANDRA_SNAPSHOT_FOLDER}"
# Check if Cassandra already has an existing keyspace
cqlsh ${CASSANDRA_CONNECTION} -e "DESCRIBE KEYSPACE ${CASSANDRA_KEYSPACE}" > "${CASSANDRA_SNAPSHOT_FOLDER}/target_keyspace_${CASSANDRA_KEYSPACE}.cql" || true
if cmp --silent -- "${CASSANDRA_SNAPSHOT_FOLDER}/create_keyspace_${CASSANDRA_KEYSPACE}.cql" "${CASSANDRA_SNAPSHOT_FOLDER}/target_keyspace_${CASSANDRA_KEYSPACE}.cql"; then
echo "Existing ${CASSANDRA_KEYSPACE} keyspace definition is identical to the one in the backup, no need to drop and recreate it"
else
echo "Existing ${CASSANDRA_KEYSPACE} keyspace definition does not match the one in the backup, dropping it"
cqlsh ${CASSANDRA_CONNECTION} --request-timeout=120 -e "DROP KEYSPACE IF EXISTS ${CASSANDRA_KEYSPACE};"
sleep 5s
echo "Creating ${CASSANDRA_KEYSPACE} keyspace using the definition from the backup"
cqlsh ${CASSANDRA_CONNECTION} --request-timeout=120 -f ${CASSANDRA_SNAPSHOT_FOLDER}/create_keyspace_${CASSANDRA_KEYSPACE}.cql
#cqlsh ${CASSANDRA_CONNECTION} --request-timeout=120 -e "GRANT ALL PERMISSIONS ON KEYSPACE ${CASSANDRA_KEYSPACE} TO ${CASSANDRA_THEHIVE_USER};"
fi
# Create the tables and load related data
for TABLE in $(ls "${CASSANDRA_SNAPSHOT_FOLDER}/${CASSANDRA_KEYSPACE}"); do
TABLE_BASENAME=$(basename ${TABLE})
TABLE_NAME=${TABLE_BASENAME%%-*}
echo "Importing ${TABLE_NAME} table and related data"
nodetool import ${CASSANDRA_KEYSPACE} ${TABLE_NAME} ${CASSANDRA_SNAPSHOT_FOLDER}/${CASSANDRA_KEYSPACE}/${TABLE}
echo ""
done
echo "Cassandra data restoration done!"
rm -rf "${CASSANDRA_SNAPSHOT_FOLDER}"
} &
PID_CASS=$!
ES_EXIT=0
CASS_EXIT=0
# Wait for the two snapshots to finish
wait $PID_ES || ES_EXIT=$?
wait $PID_CASS || CASS_EXIT=$?
# Final check
if [ $ES_EXIT -eq 0 ] && [ $CASS_EXIT -eq 0 ]; then
echo "=== ✓ Successful full restore ==="
exit 0
else
echo "=== ✗ ERREUR - ES: exit $ES_EXIT, Cassandra: exit $CASS_EXIT ==="
exit 1
fi
The script will automatically identify your most recent backup archives and restore them. You'll see progress messages as the restoration proceeds. When complete, you should see the message === ✓ Successful full restore ===.
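Before moving on, you can spot-check that both data stores are back. A quick sketch (add -u credentials to curl and the admin login to cqlsh if you enabled authentication):
curl -s "http://127.0.0.1:9200/_cat/indices/thehive_global?v"
cqlsh 127.0.0.1 -e "DESCRIBE TABLES" -k thehive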
For additional details, refer to the official Cassandra documentation and Elasticsearch documentation.
Step 3: Restore file storage#
Finally, we're going to restore the file attachments that were backed up.
1. Prepare the restore script#
Before running the script, update ATTACHMENT_FOLDER to match your environment. You can find this path in /etc/thehive/application.conf under the storage.localfs.location attribute. The script uses /opt/thp/thehive/files by default.
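If you want to double-check the configured path before running the script, a quick grep against the default configuration location:
grep 'location' /etc/thehive/application.conf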
2. Run the restore script#
#!/bin/bash
set -e
# TheHive attachment variables
ATTACHMENT_FOLDER=/opt/thp/thehive/files
# Backup variables
GENERAL_ARCHIVE_PATH=/mnt/backup/storage
# Look for the latest archived attachment files snapshot
ATTACHMENT_BACKUP_LIST=(${GENERAL_ARCHIVE_PATH}/files_????????_??h??m??s.tar)
ATTACHMENT_LATEST_BACKUP_NAME=$(basename ${ATTACHMENT_BACKUP_LIST[-1]})
echo "Latest attachment files backup archive found is ${GENERAL_ARCHIVE_PATH}/${ATTACHMENT_LATEST_BACKUP_NAME}"
# Extract the latest archive
ATTACHMENT_SNAPSHOT_NAME=$(echo ${ATTACHMENT_LATEST_BACKUP_NAME} | cut -d '.' -f 1)
ATTACHMENT_SNAPSHOT_FOLDER="${GENERAL_ARCHIVE_PATH}/${ATTACHMENT_SNAPSHOT_NAME}"
tar xvf "${GENERAL_ARCHIVE_PATH}/${ATTACHMENT_LATEST_BACKUP_NAME}" -C ${GENERAL_ARCHIVE_PATH}
echo "Latest attachment files backup archive extracted in ${ATTACHMENT_SNAPSHOT_FOLDER}"
# Clean existing TheHive attachment files (the :? guard aborts if ATTACHMENT_FOLDER is unset or empty)
rm -rf "${ATTACHMENT_FOLDER:?}"/*
# Copy the attachment files from the backup
cp -r ${ATTACHMENT_SNAPSHOT_FOLDER}/* ${ATTACHMENT_FOLDER}/
echo "attachment files data restoration done!"
rm -rf ${ATTACHMENT_SNAPSHOT_FOLDER}
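Depending on how the backup archive was created, the restored files may end up owned by the account that ran the script rather than by TheHive's service account. A hedged cleanup, assuming TheHive runs as the thehive user and you kept the default attachment path:
sudo chown -R thehive:thehive /opt/thp/thehive/files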
Step 4: Start TheHive and verify#
Now that all data components have been restored, we're going to start TheHive and verify that everything works as expected.
- Start TheHive.
  sudo systemctl start thehive
- Check the service status.
  sudo systemctl status thehive
  The service should show as active and running.
- Monitor the logs for any errors.
  sudo journalctl -u thehive -f
  Watch for any error messages during startup. If everything restored correctly, you should see TheHive initializing normally.
- Access TheHive through your web browser and verify that everything works as usual.
You've completed the restore process! If you encounter any issues with missing data or functionality, review the logs and verify that all three backup archives were present and successfully restored.