Restore a Hot Backup on a Standalone Server#
In this tutorial, we're going to guide you through restoring a hot backup of TheHive on a standalone server using the provided scripts.
By the end, you'll have brought your system back to a working state using the backups you created previously, with all your database, search index, and file storage fully restored.
Shutdown required
Restoring from a hot backup requires stopping the TheHive application. Plan a maintenance window accordingly.
Best practices for safe backup and restore
- Coordinate your Apache Cassandra, Elasticsearch, and file storage backups to run at the same time. Using automation such as a cron job helps minimize the chance of inconsistencies between components; see the sketch after this list.
- Before relying on these backups in a real incident, test the full backup and restore flow in a staging environment. It’s the only way to make sure everything works as expected.
- Ensure you have an up-to-date backup before starting the restore operation, as errors during the restoration could lead to data loss.
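A minimal sketch of coordinated scheduling, assuming you wrapped the backup steps from the previous tutorial into a single script at /opt/thp/scripts/hot_backup.sh (the path, script name, and log file are placeholders):
# /etc/cron.d/thehive-backup — run the combined hot backup daily at 02:00
0 2 * * * root /opt/thp/scripts/hot_backup.sh >> /var/log/thehive-backup.log 2>&1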
Python compatibility for cqlsh
cqlsh requires Python 3.9. If your Linux distribution provides a newer Python version by default, you must install Python 3.9 alongside it and explicitly tell cqlsh which interpreter to use. You can do this by setting the CQLSH_PYTHON environment variable when running cqlsh: sudo -u cassandra CQLSH_PYTHON=/path/to/python3.9 cqlsh.
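For example, on a distribution where Python 3.9 is available as a package (the package name and interpreter path are assumptions; adjust them to your system):
# Install Python 3.9 alongside the system Python
sudo apt-get install -y python3.9
# Point cqlsh at the 3.9 interpreter explicitly
sudo -u cassandra CQLSH_PYTHON=/usr/bin/python3.9 cqlsh 127.0.0.1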
This tutorial assumes you’ve completed the Perform a Hot Backup on a Standalone Server tutorial, and that you're using the same directory paths during restore as during backup.
Step 1: Stop TheHive#
Before we begin restoring data, we need to stop TheHive to prevent any conflicts during the restoration process.
sudo systemctl stop thehive
Verify that TheHive has stopped:
sudo systemctl status thehive
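The status output should report Active: inactive (dead). As an extra check, systemctl is-active exits with a non-zero code once the unit is stopped:
sudo systemctl is-active thehive || echo "TheHive is stopped"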
Step 2: Restore Cassandra and Elasticsearch snapshots#
We're going to restore both your database and search index from the backup archives. The script automatically identifies your most recent backup files, extracts them, and restores both Cassandra and Elasticsearch in parallel while handling keyspace recreation if needed.
1. Prepare the restore script#
Before running the script, you'll need to update several values to match your environment:
For Cassandra#
- Update CASSANDRA_KEYSPACE to match your configuration. You can find this in the /etc/thehive/application.conf file under the db.janusgraph.storage.cql.keyspace attribute. The script uses thehive by default. If you're unsure of the current value, see the quick check after this list.
- Update CASSANDRA_CONNECTION with 127.0.0.1.
- If you configured authentication in /etc/thehive/application.conf:
    - Replace the value of the CASSANDRA_CONNECTION variable with: "127.0.0.1 admin -p <authentication_admin_password>"
    - Uncomment the following line: cqlsh ${CASSANDRA_CONNECTION} --request-timeout=120 -e "GRANT ALL PERMISSIONS ON KEYSPACE ${CASSANDRA_KEYSPACE} TO ${CASSANDRA_THEHIVE_USER};"
- Update CASSANDRA_THEHIVE_USER to match the role that accesses the keyspace. The script uses thehive by default.
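If you're unsure of the current keyspace name, a quick check against the default configuration path:
grep 'keyspace' /etc/thehive/application.conf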
For Elasticsearch#
- Update ELASTICSEARCH_SNAPSHOT_REPOSITORY to match the repository name you registered in a previous step. The script uses thehive_repository by default.
- If you configured authentication in /etc/thehive/application.conf, add -u thehive:<thehive_user_password> to all curl commands, using your actual password, as in the example after this list.
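For example, with authentication enabled, the index deletion call used by the script would become (password placeholder as above):
curl -s -L -u thehive:<thehive_user_password> -X DELETE "http://127.0.0.1:9200/thehive_global/"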
2. Run the restore script#
How to run this script
Run this script with sudo privileges on a node that has both Elasticsearch and Cassandra installed and running.
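Before launching it, you can confirm both services are up; systemctl is-active prints one state per unit (unit names may differ slightly depending on how you installed the packages):
sudo systemctl is-active cassandra elasticsearch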
#!/bin/bash
set -e
# Configuration
# Cassandra variables
CASSANDRA_KEYSPACE=thehive
CASSANDRA_GENERAL_ARCHIVE_PATH=/mnt/backup/cassandra
CASSANDRA_CONNECTION="<ip_node_cassandra>"
CASSANDRA_THEHIVE_USER="thehive"
CASSANDRA_BACKUP_LIST=(${CASSANDRA_GENERAL_ARCHIVE_PATH}/cassandra_????????_??h??m??s.tar)
CASSANDRA_LATEST_BACKUP_NAME=$(basename ${CASSANDRA_BACKUP_LIST[-1]})
CASSANDRA_SNAPSHOT_NAME=$(echo ${CASSANDRA_LATEST_BACKUP_NAME} | cut -d '.' -f 1)
CASSANDRA_SNAPSHOT_FOLDER="${CASSANDRA_GENERAL_ARCHIVE_PATH}/${CASSANDRA_SNAPSHOT_NAME}"
# Elasticsearch variables
ELASTICSEARCH_API_URL='http://127.0.0.1:9200'
ELASTICSEARCH_SNAPSHOT_REPOSITORY=thehive_repository
ELASTICSEARCH_GENERAL_ARCHIVE_PATH=/mnt/backup/elasticsearch
ELASTICSEARCH_BACKUP_LIST=(${ELASTICSEARCH_GENERAL_ARCHIVE_PATH}/elasticsearch_????????_??h??m??s.info)
ELASTICSEARCH_LATEST_BACKUP_NAME=$(basename ${ELASTICSEARCH_BACKUP_LIST[-1]})
ELASTICSEARCH_SNAPSHOT_NAME=$(echo ${ELASTICSEARCH_LATEST_BACKUP_NAME} | cut -d '.' -f 1)
echo "Latest Elasticsearch backup archive found is ${ELASTICSEARCH_GENERAL_ARCHIVE_PATH}/${ELASTICSEARCH_LATEST_BACKUP_NAME}"
echo "Latest Cassandra backup archive found is ${CASSANDRA_GENERAL_ARCHIVE_PATH}/${CASSANDRA_LATEST_BACKUP_NAME}"
{
set -e
echo "Latest Elasticsearch snapshot to be restore: ELASTICSEARCH_SNAPSHOT_NAME"
# Delete an existing Elasticsearch index
echo "Trying to delete the existing Elasticsearch index"
delete_index=$(curl -s -L -X DELETE "${ELASTICSEARCH_API_URL}/thehive_global/")
ack_delete=$(jq '.acknowledged == true' <<< "$delete_index")
if [ "$ack_delete" != "true" ]; then
echo "Couldn't delete thehive_global index, maybe it was already deleted"
else
echo "Existing thehive_global index deleted"
fi
# Restoring the extracted snapshot
echo "Restoring ${ELASTICSEARCH_SNAPSHOT_NAME} snapshot"
restore_status=$(curl -s -L -X POST "${ELASTICSEARCH_API_URL}/_snapshot/${ELASTICSEARCH_SNAPSHOT_REPOSITORY}/${ELASTICSEARCH_SNAPSHOT_NAME}/_restore?wait_for_completion=true")
# Fail this branch if the restore API reports an error or failed shards
if jq -e '.snapshot.shards.failed == 0' <<< "$restore_status" > /dev/null; then
echo "Elasticsearch data restoration done!"
else
echo "Elasticsearch restore failed: ${restore_status}"
exit 1
fi
} &
PID_ES=$!
{
set -e
tar xvf "${CASSANDRA_GENERAL_ARCHIVE_PATH}/${CASSANDRA_LATEST_BACKUP_NAME}" -C ${CASSANDRA_GENERAL_ARCHIVE_PATH} > /dev/null
echo "Latest Cassandra backup archive extracted in ${CASSANDRA_SNAPSHOT_FOLDER}"
# Check if Cassandra already has an existing keyspace
cqlsh ${CASSANDRA_CONNECTION} -e "DESCRIBE KEYSPACE ${CASSANDRA_KEYSPACE}" > "${CASSANDRA_SNAPSHOT_FOLDER}/target_keyspace_${CASSANDRA_KEYSPACE}.cql" || true
if cmp --silent -- "${CASSANDRA_SNAPSHOT_FOLDER}/create_keyspace_${CASSANDRA_KEYSPACE}.cql" "${CASSANDRA_SNAPSHOT_FOLDER}/target_keyspace_${CASSANDRA_KEYSPACE}.cql"; then
echo "Existing ${CASSANDRA_KEYSPACE} keyspace definition is identical to the one in the backup, no need to drop and recreate it"
else
echo "Existing ${CASSANDRA_KEYSPACE} keyspace definition does not match the one in the backup, dropping it"
cqlsh ${CASSANDRA_CONNECTION} --request-timeout=120 -e "DROP KEYSPACE IF EXISTS ${CASSANDRA_KEYSPACE};"
sleep 5s
echo "Creating ${CASSANDRA_KEYSPACE} keyspace using the definition from the backup"
cqlsh ${CASSANDRA_CONNECTION} --request-timeout=120 -f ${CASSANDRA_SNAPSHOT_FOLDER}/create_keyspace_${CASSANDRA_KEYSPACE}.cql
#cqlsh ${CASSANDRA_CONNECTION} --request-timeout=120 -e "GRANT ALL PERMISSIONS ON KEYSPACE ${CASSANDRA_KEYSPACE} TO ${CASSANDRA_THEHIVE_USER};"
fi
# Create the tables and load related data
for TABLE in $(ls "${CASSANDRA_SNAPSHOT_FOLDER}/${CASSANDRA_KEYSPACE}"); do
TABLE_BASENAME=$(basename ${TABLE})
TABLE_NAME=${TABLE_BASENAME%%-*}
echo "Importing ${TABLE_NAME} table and related data"
nodetool import ${CASSANDRA_KEYSPACE} ${TABLE_NAME} ${CASSANDRA_SNAPSHOT_FOLDER}/${CASSANDRA_KEYSPACE}/${TABLE}
echo ""
done
echo "Cassandra data restoration done!"
rm -rf "${CASSANDRA_SNAPSHOT_FOLDER}"
} &
PID_CASS=$!
ES_EXIT=0
CASS_EXIT=0
# Wait for the two snapshots to finish
wait $PID_ES || ES_EXIT=$?
wait $PID_CASS || CASS_EXIT=$?
# Final check
if [ $ES_EXIT -eq 0 ] && [ $CASS_EXIT -eq 0 ]; then
echo "=== ✓ Successful full restore ==="
exit 0
else
echo "=== ✗ ERREUR - ES: exit $ES_EXIT, Cassandra: exit $CASS_EXIT ==="
exit 1
fi
The script will automatically identify your most recent backup archives and restore them. You'll see progress messages as the restoration proceeds. When complete, you should see the message === ✓ Successful full restore ===.
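Before moving on, you can spot-check that both data stores are back. A quick sketch (add -u credentials to curl and the admin login to cqlsh if you enabled authentication):
curl -s "http://127.0.0.1:9200/_cat/indices/thehive_global?v"
cqlsh 127.0.0.1 -e "DESCRIBE TABLES" -k thehive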
For additional details, refer to the official Cassandra documentation and Elasticsearch documentation.
Step 3: Restore file storage#
Finally, we're going to restore the file attachments that were backed up.
1. Prepare the restore script#
Before running the script, update ATTACHMENT_FOLDER to match your environment. You can find this path in /etc/thehive/application.conf under the storage.localfs.location attribute. The script uses /opt/thp/thehive/files by default.
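If you want to double-check the configured path before running the script, a quick grep against the default configuration location:
grep 'location' /etc/thehive/application.conf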
2. Run the restore script#
#!/bin/bash
set -e
# TheHive attachment variables
ATTACHMENT_FOLDER=/opt/thp/thehive/files
# Backup variables
GENERAL_ARCHIVE_PATH=/mnt/backup/storage
# Look for the latest archived attachment files snapshot
ATTACHMENT_BACKUP_LIST=(${GENERAL_ARCHIVE_PATH}/files_????????_??h??m??s.tar)
ATTACHMENT_LATEST_BACKUP_NAME=$(basename ${ATTACHMENT_BACKUP_LIST[-1]})
echo "Latest attachment files backup archive found is ${GENERAL_ARCHIVE_PATH}/${ATTACHMENT_LATEST_BACKUP_NAME}"
# Extract the latest archive
ATTACHMENT_SNAPSHOT_NAME=$(echo ${ATTACHMENT_LATEST_BACKUP_NAME} | cut -d '.' -f 1)
ATTACHMENT_SNAPSHOT_FOLDER="${GENERAL_ARCHIVE_PATH}/${ATTACHMENT_SNAPSHOT_NAME}"
tar xvf "${GENERAL_ARCHIVE_PATH}/${ATTACHMENT_LATEST_BACKUP_NAME}" -C ${GENERAL_ARCHIVE_PATH}
echo "Latest attachment files backup archive extracted in ${ATTACHMENT_SNAPSHOT_FOLDER}"
# Clean existing TheHive attachment files (the :? guard aborts if ATTACHMENT_FOLDER is unset or empty)
rm -rf "${ATTACHMENT_FOLDER:?}"/*
# Copy the attachment files from the backup
cp -r ${ATTACHMENT_SNAPSHOT_FOLDER}/* ${ATTACHMENT_FOLDER}/
echo "attachment files data restoration done!"
rm -rf ${ATTACHMENT_SNAPSHOT_FOLDER}
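Depending on how the backup archive was created, the restored files may end up owned by the account that ran the script rather than by TheHive's service account. A hedged cleanup, assuming TheHive runs as the thehive user and you kept the default attachment path:
sudo chown -R thehive:thehive /opt/thp/thehive/files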
Step 4: Start TheHive and verify#
Now that all data components have been restored, we're going to start TheHive and verify that everything works as expected.
- Start TheHive.
  sudo systemctl start thehive
- Check the service status.
  sudo systemctl status thehive
  The service should show as active and running.
- Monitor the logs for any errors.
  sudo journalctl -u thehive -f
  Watch for any error messages during startup. If everything restored correctly, you should see TheHive initializing normally.
- Access TheHive through your web browser and verify that everything works as usual.
You've completed the restore process! If you encounter any issues with missing data or functionality, review the logs and verify that all three backup archives were present and successfully restored.