Perform a Hot Backup on a Standalone Server#
In this tutorial, we're going to guide you through performing a hot backup of TheHive on a standalone server using the provided scripts.
By the end, you'll have created complete backups of your database, search index, and file storage—all without stopping TheHive.
Hot backups let you protect your data while keeping TheHive running, which means zero downtime for your security operations team. They also ensure you can recover quickly in case of a system failure or data loss.
Understand the implications
Hot backups allow TheHive to keep running during the process, but they don’t guarantee perfect data consistency. Review the Cold vs. Hot Backups and Restores topic to ensure this method fits your organization's risk tolerance and operational needs.
Best practices for safe backup and restore
- Coordinate your Apache Cassandra, Elasticsearch, and file storage backups to run at the same time. Using automation like a cron job helps minimize the chance of inconsistencies between components; see the example crontab after this list.
- Before relying on these backups in a real incident, test the full backup and restore flow in a staging environment. It’s the only way to make sure everything works as expected.
- Ensure you have an up-to-date backup before starting the restore operation, as errors during the restoration could lead to data loss.
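If you choose the cron route, here is a minimal sketch. The script paths are hypothetical: it assumes you have saved the database backup script from Step 5 as /usr/local/bin/thehive-db-backup.sh and the file storage script from Step 6 as /usr/local/bin/thehive-files-backup.sh. Adjust names, times, and log locations to your environment.
# Root crontab entries (edit with: sudo crontab -e)
# Run the Cassandra/Elasticsearch backup every night at 02:00
0 2 * * * /usr/local/bin/thehive-db-backup.sh >> /var/log/thehive-backup.log 2>&1
# Run the file storage backup shortly afterwards at 02:05
5 2 * * * /usr/local/bin/thehive-files-backup.sh >> /var/log/thehive-backup.log 2>&1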
Script restrictions
These scripts work only for native installations following the Install TheHive on Linux Systems configuration. Docker and Kubernetes deployments aren't supported.
Step 1: Install required tools#
Before we begin, let's make sure your system has all the necessary tools installed.
You'll need the following:
- Cassandra nodetool: Command-line tool for managing Cassandra clusters, used for creating database snapshots
- tar: Utility for archiving backup files
- cqlsh: Command-line interface for executing CQL queries against the Cassandra database
- curl: Tool for transferring data with URLs, useful for interacting with the Elasticsearch API
- jq: Lightweight command-line JSON processor for parsing and manipulating JSON data in scripts
Python compatibility for cqlsh
cqlsh requires Python 3.9. If your Linux distribution provides a newer Python version by default, you must install Python 3.9 alongside it and explicitly tell cqlsh which interpreter to use. You can do this by setting the CQLSH_PYTHON environment variable when running cqlsh: sudo -u cassandra CQLSH_PYTHON=/path/to/python3.9 cqlsh.
If any tools are missing, install them using your package manager. For example:
- sudo apt install jq for DEB-based operating systems
- sudo yum install jq for RPM-based operating systems
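To confirm all five tools are present before moving on, you can run a quick check like this sketch, which only tests that each binary resolves on your PATH:
# Report any of the required tools that are not installed
for tool in nodetool tar cqlsh curl jq; do
  command -v "$tool" > /dev/null || echo "Missing: $tool"
done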
Step 2: Set Elasticsearch permissions#
Let's ensure Elasticsearch has the correct permissions to access the snapshot directory.
sudo mkdir -p /mnt/backup/elasticsearch
sudo chown elasticsearch:elasticsearch /mnt/backup/elasticsearch
sudo chmod 770 /mnt/backup/elasticsearch
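You can verify the ownership and permissions you just set:
ls -ld /mnt/backup/elasticsearch
# Expected output similar to:
# drwxrwx--- 2 elasticsearch elasticsearch 4096 ... /mnt/backup/elasticsearch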
Step 3: Set up the Elasticsearch snapshot repository#
We're going to configure Elasticsearch to store snapshots with timestamped names. This repository will be used to create backups of your search index.
1. In your elasticsearch.yml file, define the location where snapshots will be stored.

   path.repo: /mnt/backup/elasticsearch

2. After saving your changes, restart Elasticsearch.

   sudo systemctl restart elasticsearch

3. Register the repository.

   curl -X PUT "http://127.0.0.1:9200/_snapshot/thehive_repository" \
     -H "Content-Type: application/json" \
     -d '{
       "type": "fs",
       "settings": {
         "location": "/mnt/backup/elasticsearch"
       }
     }'

   You should see a response like this:

   { "acknowledged": true }
For step-by-step details, see the official Elasticsearch documentation.
Step 4: Perform health checks#
Before creating any backups, we're going to verify that all TheHive components are healthy. This helps us catch any issues that could affect backup integrity.
Check service status#
Let's confirm that all TheHive components are running.
sudo systemctl status thehive
sudo systemctl status cassandra
sudo systemctl status elasticsearch
All services should show as active and running.
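For a more compact check, systemctl can report just the activation state of all three units in one command:
sudo systemctl is-active thehive cassandra elasticsearch
# Prints one line per unit; each should read "active"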
Check Cassandra status#
Run the following command:
nodetool status
You should see nodes marked as UN (Up/Normal). This indicates your Cassandra cluster is healthy.
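On a healthy standalone node, the output looks something like the following (the load, token count, and host ID will differ in your environment):
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load     Tokens  Owns (effective)  Host ID    Rack
UN  127.0.0.1  1.2 MiB  16      100.0%            <host_id>  rack1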
Check Elasticsearch cluster health#
curl -X GET "http://127.0.0.1:9200/_cluster/health?pretty"
The status should be green, which means your cluster is healthy and fully functional.
Other possible statuses include:
- yellow: Some replicas are missing but data is still available.
- red: Some data is unavailable. Investigate before proceeding.
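If you want to capture just the status value in a monitoring script, a one-liner using the jq tool installed earlier does the job:
curl -s "http://127.0.0.1:9200/_cluster/health" | jq -r '.status'
# Prints green, yellow, or red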
Review system logs#
Check for any recent errors or warnings.
sudo journalctl -u thehive
sudo journalctl -u cassandra
sudo journalctl -u elasticsearch
If you find any critical errors, resolve them before continuing with the backup process.
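To cut through the noise, journalctl's priority and time filters can limit the output to recent errors, for example:
# Show only error-level (and more severe) messages from the last 24 hours
sudo journalctl -u thehive -p err --since "24 hours ago"
sudo journalctl -u cassandra -p err --since "24 hours ago"
sudo journalctl -u elasticsearch -p err --since "24 hours ago"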
Step 5: Create Cassandra and Elasticsearch snapshots#
Now we're going to create snapshots of both your database and search index simultaneously. This parallel approach minimizes the time window between snapshots.
1. Prepare the backup script#
We're going to use a script that creates hot backups of both Cassandra and Elasticsearch in parallel. The script simultaneously captures snapshots of your database and index, then packages both into separate .tar archives for safe storage.
Before running the script, you'll need to update several values to match your environment:
For Cassandra#
- Update CASSANDRA_KEYSPACE to match your configuration. You can find this in the /etc/thehive/application.conf file under the db.janusgraph.storage.cql.keyspace attribute. The script uses thehive by default.
- Update CASSANDRA_CONNECTION with 127.0.0.1.
- If you configured authentication in /etc/thehive/application.conf, replace the value of the CASSANDRA_CONNECTION variable with: "127.0.0.1 -u admin -p <authentication_admin_password>".
For Elasticsearch#
- Update ELASTICSEARCH_SNAPSHOT_REPOSITORY to match the repository name you registered in a previous step. The script uses thehive_repository by default.
- If you configured authentication in /etc/thehive/application.conf, add -u thehive:<thehive_user_password> to all curl commands, using your actual password.
2. Run the backup script#
How to run this script
Run this script with sudo privileges on a node that has both Elasticsearch and Cassandra installed and running.
#!/bin/bash
set -e
# Configuration
# Cassandra variables
CASSANDRA_KEYSPACE=thehive
CASSANDRA_CONNECTION="<ip_node_cassandra>"
CASSANDRA_GENERAL_ARCHIVE_PATH=/mnt/backup/cassandra
CASSANDRA_DATA_FOLDER=/var/lib/cassandra
CASSANDRA_SNAPSHOT_NAME="cassandra_$(date +%Y%m%d_%Hh%Mm%Ss)"
CASSANDRA_ARCHIVE_PATH="${CASSANDRA_GENERAL_ARCHIVE_PATH}/${CASSANDRA_SNAPSHOT_NAME}/${CASSANDRA_KEYSPACE}"
# Elasticsearch variables
ELASTICSEARCH_API_URL='http://127.0.0.1:9200'
ELASTICSEARCH_SNAPSHOT_REPOSITORY=thehive_repository
ELASTICSEARCH_GENERAL_ARCHIVE_PATH=/mnt/backup/elasticsearch
ELASTICSEARCH_SNAPSHOT_NAME="elasticsearch_$(date +%Y%m%d_%Hh%Mm%Ss)"
# Check if the snapshot repository is correctly registered
repository_config=$(curl -s -L "${ELASTICSEARCH_API_URL}/_snapshot")
repository_ok=$(jq 'has("'${ELASTICSEARCH_SNAPSHOT_REPOSITORY}'")' <<< ${repository_config})
if ! ${repository_ok}; then
echo "Abort, no snapshot repository registered in Elasticsearch"
echo "Set the repository folder 'path.repo'"
echo "in an environment variable"
echo "or in elasticsearch.yml"
exit 1
fi
# Make sure the snapshot folder exists and its subcontent permissions are correct
mkdir -p ${CASSANDRA_ARCHIVE_PATH}
chown -R cassandra:cassandra ${CASSANDRA_ARCHIVE_PATH}
echo "Snapshot of all ${CASSANDRA_KEYSPACE} tables will be stored inside ${CASSANDRA_ARCHIVE_PATH}"
# Run both backups in parallel
{
set -e
# Creating snapshot name information file
touch ${ELASTICSEARCH_GENERAL_ARCHIVE_PATH}/${ELASTICSEARCH_SNAPSHOT_NAME}.info
echo "[ES] Starting the Elasticsearch snapshot..."
RESPONSE=$(curl -s -L -X PUT "${ELASTICSEARCH_API_URL}/_snapshot/${ELASTICSEARCH_SNAPSHOT_REPOSITORY}/${ELASTICSEARCH_SNAPSHOT_NAME}" \
-H 'Content-Type: application/json' \
-d '{"indices":"thehive_global", "ignore_unavailable":true, "include_global_state":false}')
if echo "$RESPONSE" | grep -q '"accepted":true'; then
echo "[ES] ✓ Elasticsearch snapshot started successfully"
exit 0
else
echo "[ES] ✗ Elasticsearch ERROR: $RESPONSE"
exit 1
fi
# Verify that the snapshot is finished
state="NONE"
while [ "${state}" != "\"SUCCESS\"" ]; do
echo "Snapshot in progress, waiting 5 seconds before checking status again..."
sleep 5
snapshot_list=$(curl -s -L "${ELASTICSEARCH_API_URL}/_snapshot/${ELASTICSEARCH_SNAPSHOT_REPOSITORY}/*?verbose=false")
state=$(jq '.snapshots[] | select(.snapshot == "'${ELASTICSEARCH_SNAPSHOT_NAME}'").state' <<< ${snapshot_list})
done
echo "Snapshot finished"
} &
PID_ES=$!
{
set -e
echo "[CASS] Starting snapshot ${CASSANDRA_SNAPSHOT_NAME} for keyspace ${CASSANDRA_KEYSPACE}"
if nodetool snapshot -t "${CASSANDRA_SNAPSHOT_NAME}" "${CASSANDRA_KEYSPACE}"; then
echo "[CASS] ✓ Snapshot Cassandra created successfully"
# Save the cql schema of the keyspace
cqlsh ${CASSANDRA_CONNECTION} -e "DESCRIBE KEYSPACE ${CASSANDRA_KEYSPACE}" | grep -v "^WARNING" > "${CASSANDRA_GENERAL_ARCHIVE_PATH}/${CASSANDRA_SNAPSHOT_NAME}/create_keyspace_${CASSANDRA_KEYSPACE}.cql"
echo "The keyspace cql definition for ${CASSANDRA_KEYSPACE} is stored in this file: ${CASSANDRA_GENERAL_ARCHIVE_PATH}/${CASSANDRA_SNAPSHOT_NAME}/create_keyspace_${CASSANDRA_KEYSPACE}.cql"
# For each table folder in the keyspace folder of the snapshot
for TABLE in $(ls ${CASSANDRA_DATA_FOLDER}/data/${CASSANDRA_KEYSPACE}); do
# Folder where the snapshot files are stored
TABLE_SNAPSHOT_FOLDER=${CASSANDRA_DATA_FOLDER}/data/${CASSANDRA_KEYSPACE}/${TABLE}/snapshots/${CASSANDRA_SNAPSHOT_NAME}
if [ -d ${TABLE_SNAPSHOT_FOLDER} ]; then
# Create a folder for each table
mkdir "${CASSANDRA_ARCHIVE_PATH}/${TABLE}"
chown -R cassandra:cassandra ${CASSANDRA_ARCHIVE_PATH}/${TABLE}
# Copy the snapshot files to the proper table folder
# Snapshots files are hardlinks,
# so we use --remove-destination to make sure the files are actually copied and not just linked
cp -p --remove-destination ${TABLE_SNAPSHOT_FOLDER}/* ${CASSANDRA_ARCHIVE_PATH}/${TABLE}
fi
done
# Delete Cassandra snapshot once it's backed up
nodetool clearsnapshot -t ${CASSANDRA_SNAPSHOT_NAME} > /dev/null
# Create a .tar archive with the folder containing the backed up Cassandra data
tar cf ${CASSANDRA_GENERAL_ARCHIVE_PATH}/${CASSANDRA_SNAPSHOT_NAME}.tar -C "${CASSANDRA_GENERAL_ARCHIVE_PATH}" ${CASSANDRA_SNAPSHOT_NAME}
# Remove the folder once the archive is created
rm -rf ${CASSANDRA_GENERAL_ARCHIVE_PATH}/${CASSANDRA_SNAPSHOT_NAME}
exit 0
else
echo "[CASS] ✗ Cassandra ERROR"
exit 1
fi
} &
PID_CASS=$!
ES_EXIT=0
CASS_EXIT=0
# Wait for the two snapshots to finish
wait $PID_ES || ES_EXIT=$?
wait $PID_CASS || CASS_EXIT=$?
# Final check
if [ $ES_EXIT -eq 0 ] && [ $CASS_EXIT -eq 0 ]; then
echo "=== ✓ Full backup successful ==="
# Display the location of the Elasticsearch archive
echo "Elasticsearch backup done!"
# Display the location of the Cassandra archive
echo "Cassandra backup done! Keep the following backup archive safe:"
echo "${CASSANDRA_GENERAL_ARCHIVE_PATH}/${CASSANDRA_SNAPSHOT_NAME}.tar"
exit 0
else
echo "=== ✗ ERROR - ES: exit $ES_EXIT, Cassandra: exit $CASS_EXIT ==="
exit 1
fi
After running the script, the backup archives are available at /mnt/backup/cassandra and /mnt/backup/elasticsearch. Be sure to copy these archives to a separate server or storage location to safeguard against data loss if the TheHive server fails.
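As one way to copy the archives off the host, here is an rsync sketch; the destination host backup.example.com and the path /srv/thehive-backups are hypothetical placeholders for your own storage:
# Copy the Cassandra archives and the Elasticsearch repository to remote storage
rsync -av /mnt/backup/cassandra/ backup.example.com:/srv/thehive-backups/cassandra/
rsync -av /mnt/backup/elasticsearch/ backup.example.com:/srv/thehive-backups/elasticsearch/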
For more details about snapshot management, refer to the official Cassandra documentation and Elasticsearch documentation.
Step 6: Back up file storage#
Finally, we're going to back up TheHive file storage, which contains all the attachments and files.
1. Prepare the backup script#
Before running the script, update ATTACHMENT_FOLDER to match your environment. You can find this path in /etc/thehive/application.conf under the storage.localfs.location attribute. The script uses /opt/thp/thehive/files by default.
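If you're not sure which path is configured, you can look it up directly; this grep simply prints the lines around the localfs storage setting in the configuration file:
grep -A 3 'localfs' /etc/thehive/application.conf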
2. Run the backup script#
#!/bin/bash
# Stop immediately if any command fails so we never archive a partial copy
set -e
# TheHive attachment variables
ATTACHMENT_FOLDER=/opt/thp/thehive/files
# Backup variables
GENERAL_ARCHIVE_PATH=/mnt/backup/storage
SNAPSHOT_NAME="files_$(date +%Y%m%d_%Hh%Mm%Ss)"
ATTACHMENT_ARCHIVE_PATH="${GENERAL_ARCHIVE_PATH}/${SNAPSHOT_NAME}"
# Creating the backup folder if needed
mkdir -p ${ATTACHMENT_ARCHIVE_PATH}
# Copy all TheHive attachment files
cp -r ${ATTACHMENT_FOLDER}/. ${ATTACHMENT_ARCHIVE_PATH}/
# Create a .tar archive with the folder containing the backed up attachment files
cd ${GENERAL_ARCHIVE_PATH}
tar cf ${SNAPSHOT_NAME}.tar ${SNAPSHOT_NAME}
# Remove the folder once the archive is created
rm -rf ${GENERAL_ARCHIVE_PATH}/${SNAPSHOT_NAME}
# Display the location of the attachment archive
echo ""
echo "TheHive attachment files backup done! Keep the following backup archive safe:"
echo "${GENERAL_ARCHIVE_PATH}/${SNAPSHOT_NAME}.tar"
After running the script, the backup archive is available at /mnt/backup/storage. Be sure to copy this archive to a separate server or storage location to safeguard against data loss if the TheHive server fails.
You've completed the hot backup process for your TheHive standalone server. We recommend verifying your backup archives are complete and accessible before relying on them for recovery.
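A quick way to sanity-check the archives is to list their contents without extracting them; replace the timestamps with those of your own backup files:
# List the first entries of each archive; an unreadable archive fails immediately
tar tf /mnt/backup/cassandra/cassandra_<timestamp>.tar | head
tar tf /mnt/backup/storage/files_<timestamp>.tar | head
# Confirm the Elasticsearch snapshot completed with state SUCCESS
curl -s "http://127.0.0.1:9200/_snapshot/thehive_repository/*" | jq '.snapshots[] | {snapshot, state}'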