Perform a Hot Backup on a Cluster#
In this tutorial, we're going to guide you through performing a hot backup of TheHive on a cluster using the provided scripts.
By the end, you'll have created complete backups of your database and search index across all three nodes, plus your file storage.
Hot backups let you protect your data while keeping TheHive running, which means zero downtime for your security operations team.
These backups are essential to protect your data and ensure you can recover quickly in case of a system failure or data loss.
Understand the implications
Hot backups allow TheHive to keep running during the process, but they don’t guarantee perfect data consistency. Review the Cold vs. Hot Backups and Restores topic to ensure this method fits your organization's risk tolerance and operational needs.
Best practices for safe backup and restore
- Coordinate your Apache Cassandra, Elasticsearch, and file storage backups to run at the same time. Using automation like a cron job helps minimize the chance of inconsistencies between components; a minimal scheduling sketch follows this list.
- Before relying on these backups in a real incident, test the full backup and restore flow in a staging environment. It’s the only way to make sure everything works as expected.
- Ensure you have an up-to-date backup before starting the restore operation, as errors during the restoration could lead to data loss.
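If you schedule the three backups together with cron, an /etc/cron.d entry could look like the following sketch. The wrapper script path and log file are hypothetical placeholders standing in for a script that calls the backup scripts from this tutorial:
# Hypothetical schedule: run the combined hot backup every night at 01:00
0 1 * * * root /opt/backup/thehive_hot_backup.sh >> /var/log/thehive_backup.log 2>&1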
Script restrictions
These scripts work only for native installations following the Setting up a Cluster with TheHive configuration. Docker and Kubernetes deployments aren't supported.
Step 1: Install required tools#
Before we begin, let's make sure your system has all the necessary tools installed.
You'll need the following:
- Cassandra nodetool: Command-line tool for managing Cassandra clusters, used for creating database snapshots
- tar: Utility for archiving backup files
- cqlsh: Command-line interface for executing CQL queries against the Cassandra database
- curl: Tool for transferring data with URLs, useful for interacting with the Elasticsearch API
- jq: Lightweight command-line JSON processor for parsing and manipulating JSON data in scripts
Python compatibility for cqlsh
cqlsh requires Python 3.9. If your Linux distribution provides a newer Python version by default, you must install Python 3.9 alongside it and explicitly tell cqlsh which interpreter to use. You can do this by setting the CQLSH_PYTHON environment variable when running cqlsh: sudo -u cassandra CQLSH_PYTHON=/path/to/python3.9 cqlsh.
If any tools are missing, install them using your package manager. For example:
- sudo apt install jq for DEB-based operating systems
- sudo yum install jq for RPM-based operating systems
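Before moving on, you can quickly confirm that all of these tools are available on the PATH with a small shell check like this sketch:
for tool in nodetool cqlsh curl jq tar; do
  command -v "$tool" >/dev/null || echo "Missing: $tool"
done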
Step 2: Configure NFS-shared storage for Elasticsearch snapshots#
Elasticsearch requires a snapshot repository that's accessible from all cluster nodes. To meet this requirement, we will set up an NFS share so every node can reach the backup location. If you don't have a dedicated NFS server, you can export an NFS share directly from one of the Elasticsearch nodes.
On the NFS server#
- Create the directory and set the correct permissions for Elasticsearch.
  sudo mkdir -p /mnt/backup/elasticsearch
  sudo chown elasticsearch:elasticsearch /mnt/backup/elasticsearch
  sudo chmod 770 /mnt/backup/elasticsearch
- Export the directory by adding this line to /etc/exports.
  /mnt/backup/elasticsearch <cluster_network>(rw,sync,no_subtree_check,no_root_squash)
  Replace <cluster_network> with your network range.
- Apply the export configuration.
  sudo exportfs -ra
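If you want to confirm the export is active before configuring the clients, you can optionally list it with showmount, which ships with the NFS client utilities (nfs-common on DEB-based systems, nfs-utils on RPM-based systems). Replace <nfs_server_ip> with the IP address of your NFS server:
showmount -e <nfs_server_ip>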
On all cluster nodes#
- Create the mount point and mount the NFS share.
  sudo mkdir -p /mnt/backup/elasticsearch
  sudo mount <nfs_server_ip>:/mnt/backup/elasticsearch /mnt/backup/elasticsearch
  Replace <nfs_server_ip> with the IP address of your NFS server.
- Set the correct permissions on the mounted directory.
  sudo chown elasticsearch:elasticsearch /mnt/backup/elasticsearch
  sudo chmod 770 /mnt/backup/elasticsearch
- Add an entry to /etc/fstab to ensure the mount persists after reboot.
  <nfs_server_ip>:/mnt/backup/elasticsearch /mnt/backup/elasticsearch nfs defaults,_netdev 0 0
  Replace <nfs_server_ip> with the IP address of your NFS server.
- Verify the mount is working.
  df -h | grep /mnt/backup/elasticsearch
  You should see the NFS mount listed in the output.
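As an optional extra check on each node, confirm that the elasticsearch user can write to the share. This sketch simply creates and removes a test file:
sudo -u elasticsearch touch /mnt/backup/elasticsearch/.write_test
sudo -u elasticsearch rm /mnt/backup/elasticsearch/.write_test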
Step 3: Set up the Elasticsearch snapshot repository#
We're going to configure Elasticsearch to store snapshots with timestamped names. This repository will be used to create backups of your search index.
- In the elasticsearch.yml file on each node, define the directory where snapshots will be stored.
  path.repo: /mnt/backup/elasticsearch
- After saving your changes, restart Elasticsearch on each node.
  sudo systemctl restart elasticsearch
- Register the repository.
  curl -X PUT "http://127.0.0.1:9200/_snapshot/thehive_repository" \
  -H "Content-Type: application/json" \
  -d '{ "type": "fs", "settings": { "location": "/mnt/backup/elasticsearch" } }'
  You should see a response like this:
  { "acknowledged": true }
For step-by-step details, see the official Elasticsearch documentation.
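You can also ask Elasticsearch to verify that the repository is readable and writable from every node by calling the snapshot repository verify API:
curl -X POST "http://127.0.0.1:9200/_snapshot/thehive_repository/_verify?pretty"
The response lists the nodes that successfully accessed the repository.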
Step 4: Perform health checks#
Before creating any backups, we're going to verify that all TheHive components are healthy. This helps us catch any issues that could affect backup integrity.
Check service status#
Let's confirm that all TheHive components are running.
sudo systemctl status thehive
sudo systemctl status cassandra
sudo systemctl status elasticsearch
All services should show as active and running.
Check Cassandra status#
Run the following command:
nodetool status
You should see nodes marked as UN (Up/Normal). This indicates your Cassandra cluster is healthy.
Check Elasticsearch cluster health#
curl -X GET "http://127.0.0.1:9200/_cluster/health?pretty"
The status should be green, which means your cluster is healthy and fully functional.
Other possible statuses include:
- yellow: Some replicas are missing but data is still available.
- red: Some data is unavailable. You should investigate before proceeding.
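If you script this health check as part of your backup automation, a small guard like the following sketch, using the jq tool installed earlier, can abort the run when the cluster isn't green:
es_status=$(curl -s "http://127.0.0.1:9200/_cluster/health" | jq -r '.status')
[ "$es_status" = "green" ] || { echo "Cluster health is $es_status, aborting"; exit 1; }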
Review system logs#
Check for any recent errors or warnings.
sudo journalctl -u thehive
sudo journalctl -u cassandra
sudo journalctl -u elasticsearch
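To focus on recent problems only, you can filter the journal by time window and priority, for example:
sudo journalctl -u thehive -p warning --since "24 hours ago"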
If you find any critical errors, resolve them before continuing with the backup process.
Step 5: Replicate Cassandra and Elasticsearch data across all three nodes#
Data replication requirement
It's your responsibility to ensure data replication across all nodes before proceeding. If this requirement isn't met, cluster restoration may fail, and integrity issues could arise.
Before we proceed with the backup, we need to ensure your Cassandra cluster has a replication factor that provides full data redundancy across all nodes. This way, we can take snapshots from a single node while maintaining data consistency.
- Verify replication factor.
  Check the replication factor for your keyspace. It should be set to 3 for a three-node cluster.
  Use the following command in cqlsh:
  DESCRIBE KEYSPACE thehive;
  If needed, adjust the replication factor:
  ALTER KEYSPACE thehive WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', '<datacenter_name>' : 3 };
  Replace <datacenter_name> with your actual data center name.
- Check cluster status.
  Ensure all nodes are up and running:
  nodetool status
  All nodes should show the UN (Up/Normal) status.
- Run nodetool repair.
  Run a repair to ensure data consistency across all nodes:
  nodetool repair
  This process may take some time depending on the size of your data. Wait for it to complete before proceeding.
- Verify data replication.
  Check for any replication issues:
  nodetool netstats
  Look for any pending operations or errors in the output.
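If you prefer to confirm the replication settings with a direct query rather than reading the DESCRIBE output, you can run a statement like the following in cqlsh; adjust the keyspace name if yours differs:
cqlsh <ip_node_cassandra> -e "SELECT keyspace_name, replication FROM system_schema.keyspaces WHERE keyspace_name = 'thehive';"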
Step 6: Create Cassandra and Elasticsearch snapshots#
Now we're going to create snapshots of both your database and search index simultaneously. The script captures snapshots from one node, since data is fully replicated: the Elasticsearch snapshot is written to the shared repository, and the Cassandra snapshot is packaged into a .tar archive for safe storage.
1. Prepare the backup script#
Before running the script, you'll need to update several values to match your environment:
For Cassandra#
- Update CASSANDRA_KEYSPACE to match your configuration. You can find this in the /etc/thehive/application.conf file under the db.janusgraph.storage.cql.keyspace attribute. The script uses thehive by default.
- Update CASSANDRA_CONNECTION with any Cassandra node IP address in the cluster.
- If you configured authentication in /etc/thehive/application.conf, replace the value of the CASSANDRA_CONNECTION variable with: "<ip_node_cassandra> admin -p <authentication_admin_password>".
For Elasticsearch#
- Update ELASTICSEARCH_SNAPSHOT_REPOSITORY to match the repository name you registered in a previous step. The script uses thehive_repository by default.
- If you configured authentication in /etc/thehive/application.conf, add -u thehive:<thehive_user_password> to all curl commands, using your actual password.
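For example, with authentication enabled, the repository check at the top of the script would become something like this, using the same password placeholder:
curl -u thehive:<thehive_user_password> -s -L "http://127.0.0.1:9200/_snapshot"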
2. Run the backup script#
How to run this script
Run this script with sudo privileges on a node that has both Elasticsearch and Cassandra installed and running.
#!/bin/bash
set -e
# Configuration
# Cassandra variables
CASSANDRA_KEYSPACE=thehive
CASSANDRA_CONNECTION="<ip_node_cassandra>"
CASSANDRA_GENERAL_ARCHIVE_PATH=/mnt/backup/cassandra
CASSANDRA_DATA_FOLDER=/var/lib/cassandra
CASSANDRA_SNAPSHOT_NAME="cassandra_$(date +%Y%m%d_%Hh%Mm%Ss)"
CASSANDRA_ARCHIVE_PATH="${CASSANDRA_GENERAL_ARCHIVE_PATH}/${CASSANDRA_SNAPSHOT_NAME}/${CASSANDRA_KEYSPACE}"
# Elasticsearch variables
ELASTICSEARCH_API_URL='http://127.0.0.1:9200'
ELASTICSEARCH_SNAPSHOT_REPOSITORY=thehive_repository
ELASTICSEARCH_GENERAL_ARCHIVE_PATH=/mnt/backup/elasticsearch
ELASTICSEARCH_SNAPSHOT_NAME="elasticsearch_$(date +%Y%m%d_%Hh%Mm%Ss)"
# Check if the snapshot repository is correctly registered
repository_config=$(curl -s -L "${ELASTICSEARCH_API_URL}/_snapshot")
repository_ok=$(jq 'has("'${ELASTICSEARCH_SNAPSHOT_REPOSITORY}'")' <<< ${repository_config})
if ! ${repository_ok}; then
echo "Abort, no snapshot repository registered in Elasticsearch"
echo "Set the repository folder 'path.repo'"
echo "in an environment variable"
echo "or in elasticsearch.yml"
exit 1
fi
# Make sure the snapshot folder exists and its subcontent permissions are correct
mkdir -p ${CASSANDRA_ARCHIVE_PATH}
chown -R cassandra:cassandra ${CASSANDRA_ARCHIVE_PATH}
echo "Snapshot of all ${CASSANDRA_KEYSPACE} tables will be stored inside ${CASSANDRA_ARCHIVE_PATH}"
# Run both backups in parallel
{
set -e
# Creating snapshot name information file
touch ${ELASTICSEARCH_GENERAL_ARCHIVE_PATH}/${ELASTICSEARCH_SNAPSHOT_NAME}.info
echo "[ES] Starting the Elasticsearch snapshot..."
RESPONSE=$(curl -s -L -X PUT "${ELASTICSEARCH_API_URL}/_snapshot/${ELASTICSEARCH_SNAPSHOT_REPOSITORY}/${ELASTICSEARCH_SNAPSHOT_NAME}" \
-H 'Content-Type: application/json' \
-d '{"indices":"thehive_global", "ignore_unavailable":true, "include_global_state":false}')
if echo "$RESPONSE" | grep -q '"accepted":true'; then
echo "[ES] ✓ Elasticsearch snapshot started successfully"
exit 0
else
echo "[ES] ✗ Elasticsearch ERROR: $RESPONSE"
exit 1
fi
# Verify that the snapshot is finished
state="NONE"
while [ "${state}" != "\"SUCCESS\"" ]; do
echo "Snapshot in progress, waiting 5 seconds before checking status again..."
sleep 5
snapshot_list=$(curl -s -L "${ELASTICSEARCH_API_URL}/_snapshot/${ELASTICSEARCH_SNAPSHOT_REPOSITORY}/*?verbose=false")
state=$(jq '.snapshots[] | select(.snapshot == "'${ELASTICSEARCH_SNAPSHOT_NAME}'").state' <<< ${snapshot_list})
done
echo "Snapshot finished"
} &
PID_ES=$!
{
set -e
echo "[CASS] Starting snapshot ${CASSANDRA_SNAPSHOT_NAME} for keyspace ${CASSANDRA_KEYSPACE}"
if nodetool snapshot -t "${CASSANDRA_SNAPSHOT_NAME}" "${CASSANDRA_KEYSPACE}"; then
echo "[CASS] ✓ Snapshot Cassandra created successfully"
# Save the cql schema of the keyspace
cqlsh ${CASSANDRA_CONNECTION} -e "DESCRIBE KEYSPACE ${CASSANDRA_KEYSPACE}" | grep -v "^WARNING" > "${CASSANDRA_GENERAL_ARCHIVE_PATH}/${CASSANDRA_SNAPSHOT_NAME}/create_keyspace_${CASSANDRA_KEYSPACE}.cql"
echo "The keyspace cql definition for ${CASSANDRA_KEYSPACE} is stored in this file: ${CASSANDRA_GENERAL_ARCHIVE_PATH}/${CASSANDRA_SNAPSHOT_NAME}/create_keyspace_${CASSANDRA_KEYSPACE}.cql"
# For each table folder in the keyspace folder of the snapshot
for TABLE in $(ls ${CASSANDRA_DATA_FOLDER}/data/${CASSANDRA_KEYSPACE}); do
# Folder where the snapshot files are stored
TABLE_SNAPSHOT_FOLDER=${CASSANDRA_DATA_FOLDER}/data/${CASSANDRA_KEYSPACE}/${TABLE}/snapshots/${CASSANDRA_SNAPSHOT_NAME}
if [ -d ${TABLE_SNAPSHOT_FOLDER} ]; then
# Create a folder for each table
mkdir "${CASSANDRA_ARCHIVE_PATH}/${TABLE}"
chown -R cassandra:cassandra ${CASSANDRA_ARCHIVE_PATH}/${TABLE}
# Copy the snapshot files to the proper table folder
# Snapshots files are hardlinks,
# so we use --remove-destination to make sure the files are actually copied and not just linked
cp -p --remove-destination ${TABLE_SNAPSHOT_FOLDER}/* ${CASSANDRA_ARCHIVE_PATH}/${TABLE}
fi
done
# Delete Cassandra snapshot once it's backed up
nodetool clearsnapshot -t ${CASSANDRA_SNAPSHOT_NAME} > /dev/null
# Create a .tar archive with the folder containing the backed up Cassandra data
tar cf ${CASSANDRA_GENERAL_ARCHIVE_PATH}/${CASSANDRA_SNAPSHOT_NAME}.tar -C "${CASSANDRA_GENERAL_ARCHIVE_PATH}" ${CASSANDRA_SNAPSHOT_NAME}
# Remove the folder once the archive is created
rm -rf ${CASSANDRA_GENERAL_ARCHIVE_PATH}/${CASSANDRA_SNAPSHOT_NAME}
exit 0
else
echo "[CASS] ✗ Cassandra ERROR"
exit 1
fi
} &
PID_CASS=$!
ES_EXIT=0
CASS_EXIT=0
# Wait for the two snapshots to finish
wait $PID_ES || ES_EXIT=$?
wait $PID_CASS || CASS_EXIT=$?
# Final check
if [ $ES_EXIT -eq 0 ] && [ $CASS_EXIT -eq 0 ]; then
echo "=== ✓ Full backup successful ==="
# Display the location of the Elasticsearch archive
echo "Elasticsearch backup done!"
# Display the location of the Cassandra archive
echo "Cassandra backup done! Keep the following backup archive safe:"
echo "${CASSANDRA_GENERAL_ARCHIVE_PATH}/${CASSANDRA_SNAPSHOT_NAME}.tar"
exit 0
else
echo "=== ✗ ERROR - ES: exit $ES_EXIT, Cassandra: exit $CASS_EXIT ==="
exit 1
fi
After running the script, the Cassandra backup archive is available in /mnt/backup/cassandra and the Elasticsearch snapshot data in /mnt/backup/elasticsearch. Be sure to copy them to a separate server or storage location to safeguard against data loss if the TheHive server fails.
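A simple way to do that is to sync the backup directories to another host. In the sketch below, the destination host, user, and path are placeholders; make sure no snapshot is running when you copy the Elasticsearch repository:
rsync -av /mnt/backup/cassandra/ backup@backup-host:/srv/thehive-backups/cassandra/
rsync -av /mnt/backup/elasticsearch/ backup@backup-host:/srv/thehive-backups/elasticsearch/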
For more details about snapshot management, refer to the official Cassandra documentation and Elasticsearch documentation.
Step 7: Back up file storage#
Finally, we're going to back up TheHive file storage, which contains all the attachments and files.
The backup procedure depends on your storage backend: local or NFS file storage, or an S3-compatible object storage service. The first script below covers file system storage; the second uses MinIO as an example of an S3-compatible service, and you can adapt the same approach to any S3-compatible implementation.
1. Prepare the backup script#
Before running the script, update ATTACHMENT_FOLDER to match your environment. You can find this path in /etc/thehive/application.conf under the storage.localfs.location attribute. The script uses /opt/thp/thehive/files by default.
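If you're not sure of the current value, a quick way to check is to search the configuration file for the local storage settings; the grep invocation below is just one way to do it:
grep -B 2 -A 2 "localfs" /etc/thehive/application.conf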
2. Run the backup script#
#!/bin/bash
# TheHive attachment variables
ATTACHMENT_FOLDER=/opt/thp/thehive/files
# Backup variables
GENERAL_ARCHIVE_PATH=/mnt/backup/storage
SNAPSHOT_NAME="files_$(date +%Y%m%d_%Hh%Mm%Ss)"
ATTACHMENT_ARCHIVE_PATH="${GENERAL_ARCHIVE_PATH}/${SNAPSHOT_NAME}"
# Creating the backup folder if needed
mkdir -p ${ATTACHMENT_ARCHIVE_PATH}
# Copy all TheHive attachment files
cp -r ${ATTACHMENT_FOLDER}/. ${ATTACHMENT_ARCHIVE_PATH}/
# Create a .tar archive with the folder containing the backed up attachment files
cd ${GENERAL_ARCHIVE_PATH}
tar cf ${SNAPSHOT_NAME}.tar ${SNAPSHOT_NAME}
# Remove the folder once the archive is created
rm -rf ${GENERAL_ARCHIVE_PATH}/${SNAPSHOT_NAME}
# Display the location of the attachment archive
echo ""
echo "TheHive attachment files backup done! Keep the following backup archive safe:"
echo "${GENERAL_ARCHIVE_PATH}/${SNAPSHOT_NAME}.tar"
After running the script, the backup archive is available at /mnt/backup/storage. Be sure to copy this archive to a separate server or storage location to safeguard against data loss if the TheHive server fails.
1. Prepare the backup script#
Before running the script, you'll need to update several values to match your environment:
- Update MINIO_ENDPOINT with your MinIO server URL.
- Update MINIO_ACCESS_KEY with your MinIO access key.
- Update MINIO_SECRET_KEY with your MinIO secret key.
- Change MINIO_BUCKET if you want to use a different bucket name.
- Change MINIO_ALIAS if you want to use a different alias name.
2. Configure the MinIO alias#
Run this command once to configure the MinIO alias using the same values you defined in the script:
mcli alias set <minio_alias> <minio_endpoint> <minio_access_key> <minio_secret_key>
3. Run the backup script#
#!/bin/bash
# Backup variables
MINIO_ARCHIVE_PATH=/mnt/backup/minio/
# MinIO variables
MINIO_ENDPOINT="<minio_server_url>"
MINIO_ACCESS_KEY="<access_key>"
MINIO_SECRET_KEY="<secret_key>"
MINIO_BUCKET="thehive"
MINIO_ALIAS=th_minio
MINIO_SNAPSHOT_NAME="minio_$(date +%Y%m%d_%Hh%Mm%Ss)"
# Check if MinIO is accessible
if ! mcli ls ${MINIO_ALIAS} > /dev/null 2>&1; then
echo "Error: Cannot connect to MinIO server"
exit 1
fi
# Mirror MinIO bucket content to local backup folder
mcli mirror ${MINIO_ALIAS}/${MINIO_BUCKET} ${MINIO_ARCHIVE_PATH}/${MINIO_SNAPSHOT_NAME}
tar cvf ${MINIO_ARCHIVE_PATH}/${MINIO_SNAPSHOT_NAME}.tar -C "${MINIO_ARCHIVE_PATH}" ${MINIO_SNAPSHOT_NAME}
# Display the location of the backup
echo ""
echo "TheHive attachment files backup done! Keep the following backup archive safe:"
echo "${MINIO_ARCHIVE_PATH}/${MINIO_SNAPSHOT_NAME}.tar"
After running the script, the backup archive is available at /mnt/backup/minio. Be sure to copy this archive to a separate server or storage location to safeguard against data loss if the TheHive server fails.
You've completed the hot backup process for your TheHive cluster. We recommend verifying your backup archives are complete and accessible before relying on them for recovery.
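One quick, non-destructive way to check the archives is to list their contents and confirm the expected folders and files are present; the timestamps below are placeholders for your actual archive names:
tar -tf /mnt/backup/cassandra/cassandra_<timestamp>.tar | head
tar -tf /mnt/backup/storage/files_<timestamp>.tar | head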