Perform a Hot Backup on a Standalone Server#
In this tutorial, we're going to guide you through performing a hot backup of TheHive on a standalone server using the provided scripts.
By the end, you'll have created complete backups of your database, search index, and file storage—all without stopping TheHive.
Hot backups let you protect your data while keeping TheHive running, which means zero downtime for your security operations team. They also ensure you can recover quickly in case of a system failure or data loss.
Understand the implications
Hot backups allow TheHive to keep running during the process, but they don’t guarantee perfect data consistency. Review the Cold vs. Hot Backups and Restores topic to ensure this method fits your organization's risk tolerance and operational needs.
Best practices for safe backup and restore
- Coordinate your Apache Cassandra, Elasticsearch, and file storage backups to run at the same time. Using automation like a cron job helps minimize the chance of inconsistencies between components; see the example crontab after this list.
- Before relying on these backups in a real incident, test the full backup and restore flow in a staging environment. It’s the only way to make sure everything works as expected.
- Ensure you have an up-to-date backup before starting the restore operation, as errors during the restoration could lead to data loss.
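If you choose the cron route, here is a minimal sketch. The script paths are hypothetical: it assumes you have saved the database backup script from Step 5 as /usr/local/bin/thehive-db-backup.sh and the file storage script from Step 6 as /usr/local/bin/thehive-files-backup.sh. Adjust names, times, and log locations to your environment.
# Root crontab entries (edit with: sudo crontab -e)
# Run the Cassandra/Elasticsearch backup every night at 02:00
0 2 * * * /usr/local/bin/thehive-db-backup.sh >> /var/log/thehive-backup.log 2>&1
# Run the file storage backup shortly afterwards at 02:05
5 2 * * * /usr/local/bin/thehive-files-backup.sh >> /var/log/thehive-backup.log 2>&1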
Script restrictions
These scripts work only for native installations following the Install TheHive on Linux Systems configuration. Docker and Kubernetes deployments aren't supported.
Step 1: Install required tools#
Before we begin, let's make sure your system has all the necessary tools installed.
You'll need the following:
- Cassandra nodetool: Command-line tool for managing Cassandra clusters, used for creating database snapshots
- tar: Utility for archiving backup files
- cqlsh: Command-line interface for executing CQL queries against the Cassandra database
- curl: Tool for transferring data with URLs, useful for interacting with the Elasticsearch API
- jq: Lightweight command-line JSON processor for parsing and manipulating JSON data in scripts
Python compatibility for cqlsh
cqlsh requires Python 3.9. If your Linux distribution provides a newer Python version by default, you must install Python 3.9 alongside it and explicitly tell cqlsh which interpreter to use. You can do this by setting the CQLSH_PYTHON environment variable when running cqlsh: sudo -u cassandra CQLSH_PYTHON=/path/to/python3.9 cqlsh.
If any tools are missing, install them using your package manager. For example:
- sudo apt install jq for DEB-based operating systems
- sudo yum install jq for RPM-based operating systems
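To confirm all five tools are present before moving on, you can run a quick check like this sketch, which only tests that each binary resolves on your PATH:
# Report any of the required tools that are not installed
for tool in nodetool tar cqlsh curl jq; do
  command -v "$tool" > /dev/null || echo "Missing: $tool"
done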
Step 2: Set Elasticsearch permissions#
Let's ensure Elasticsearch has the correct permissions to access the snapshot directory.
sudo mkdir -p /mnt/backup/elasticsearch
sudo chown elasticsearch:elasticsearch /mnt/backup/elasticsearch
sudo chmod 770 /mnt/backup/elasticsearch
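You can verify the ownership and permissions you just set:
ls -ld /mnt/backup/elasticsearch
# Expected output similar to:
# drwxrwx--- 2 elasticsearch elasticsearch 4096 ... /mnt/backup/elasticsearch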
Step 3: Set up the Elasticsearch snapshot repository#
We're going to configure Elasticsearch to store snapshots with timestamped names. This repository will be used to create backups of your search index.
1. In your elasticsearch.yml file, define the location where snapshots will be stored.

   path.repo: /mnt/backup/elasticsearch

2. After saving your changes, restart Elasticsearch.

   sudo systemctl restart elasticsearch

3. Register the repository.

   curl -X PUT "http://127.0.0.1:9200/_snapshot/thehive_repository" \
     -H "Content-Type: application/json" \
     -d '{
       "type": "fs",
       "settings": {
         "location": "/mnt/backup/elasticsearch"
       }
     }'

   You should see a response like this:

   { "acknowledged": true }
For step-by-step details, see the official Elasticsearch documentation.
Step 4: Perform health checks#
Before creating any backups, we're going to verify that all TheHive components are healthy. This helps us catch any issues that could affect backup integrity.
Check service status#
Let's confirm that all TheHive components are running.
sudo systemctl status thehive
sudo systemctl status cassandra
sudo systemctl status elasticsearch
All services should show as active and running.
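For a more compact check, systemctl can report just the activation state of all three units in one command:
sudo systemctl is-active thehive cassandra elasticsearch
# Prints one line per unit; each should read "active"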
Check Cassandra status#
Run the following command:
nodetool status
You should see nodes marked as UN (Up/Normal). This indicates your Cassandra cluster is healthy.
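On a healthy standalone node, the output looks something like the following (the load, token count, and host ID will differ in your environment):
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load     Tokens  Owns (effective)  Host ID    Rack
UN  127.0.0.1  1.2 MiB  16      100.0%            <host_id>  rack1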
Check Elasticsearch cluster health#
curl -X GET "http://127.0.0.1:9200/_cluster/health?pretty"
The status should be green, which means your cluster is healthy and fully functional.
Other possible statuses include:
- yellow: Some replicas are missing but data is still available.
- red: Some data is unavailable. Investigate before proceeding.
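If you want to capture just the status value in a monitoring script, a one-liner using the jq tool installed earlier does the job:
curl -s "http://127.0.0.1:9200/_cluster/health" | jq -r '.status'
# Prints green, yellow, or red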
Review system logs#
Check for any recent errors or warnings.
sudo journalctl -u thehive
sudo journalctl -u cassandra
sudo journalctl -u elasticsearch
If you find any critical errors, resolve them before continuing with the backup process.
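To cut through the noise, journalctl's priority and time filters can limit the output to recent errors, for example:
# Show only error-level (and more severe) messages from the last 24 hours
sudo journalctl -u thehive -p err --since "24 hours ago"
sudo journalctl -u cassandra -p err --since "24 hours ago"
sudo journalctl -u elasticsearch -p err --since "24 hours ago"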
Step 5: Create Cassandra and Elasticsearch snapshots#
Now we're going to create snapshots of both your database and search index simultaneously. This parallel approach minimizes the time window between snapshots.
1. Prepare the backup script#
We're going to use a script that creates hot backups of both Cassandra and Elasticsearch in parallel. The script simultaneously captures snapshots of your database and index, then packages both into separate .tar archives for safe storage.
Before running the script, you'll need to update several values to match your environment:
For Cassandra#
- Update CASSANDRA_KEYSPACE to match your configuration. You can find this in the /etc/thehive/application.conf file under the db.janusgraph.storage.cql.keyspace attribute. The script uses thehive by default.
- Update CASSANDRA_CONNECTION with 127.0.0.1.
- If you configured authentication in /etc/thehive/application.conf, replace the value of the CASSANDRA_CONNECTION variable with: "127.0.0.1 -u admin -p <authentication_admin_password>".
For Elasticsearch#
- Update ELASTICSEARCH_SNAPSHOT_REPOSITORY to match the repository name you registered in a previous step. The script uses thehive_repository by default.
- If you configured authentication in /etc/thehive/application.conf, add -u thehive:<thehive_user_password> to all curl commands, using your actual password.
2. Run the backup script#
How to run this script
Run this script with sudo privileges on a node that has both Elasticsearch and Cassandra installed and running.
#!/bin/bash
set -e
# Configuration
# Cassandra variables
CASSANDRA_KEYSPACE=thehive
CASSANDRA_CONNECTION="<ip_node_cassandra>"
CASSANDRA_GENERAL_ARCHIVE_PATH=/mnt/backup/cassandra
CASSANDRA_DATA_FOLDER=/var/lib/cassandra
CASSANDRA_SNAPSHOT_NAME="cassandra_$(date +%Y%m%d_%Hh%Mm%Ss)"
CASSANDRA_ARCHIVE_PATH="${CASSANDRA_GENERAL_ARCHIVE_PATH}/${CASSANDRA_SNAPSHOT_NAME}/${CASSANDRA_KEYSPACE}"
# Elasticsearch variables
ELASTICSEARCH_API_URL='http://127.0.0.1:9200'
ELASTICSEARCH_SNAPSHOT_REPOSITORY=thehive_repository
ELASTICSEARCH_GENERAL_ARCHIVE_PATH=/mnt/backup/elasticsearch
ELASTICSEARCH_SNAPSHOT_NAME="elasticsearch_$(date +%Y%m%d_%Hh%Mm%Ss)"
# Check if the snapshot repository is correctly registered
repository_config=$(curl -s -L "${ELASTICSEARCH_API_URL}/_snapshot")
repository_ok=$(jq 'has("'${ELASTICSEARCH_SNAPSHOT_REPOSITORY}'")' <<< ${repository_config})
if ! ${repository_ok}; then
echo "Abort, no snapshot repository registered in Elasticsearch"
echo "Set the repository folder 'path.repo'"
echo "in an environment variable"
echo "or in elasticsearch.yml"
exit 1
fi
# Make sure the snapshot folder exists and its subcontent permissions are correct
mkdir -p ${CASSANDRA_ARCHIVE_PATH}
chown -R cassandra:cassandra ${CASSANDRA_ARCHIVE_PATH}
echo "Snapshot of all ${CASSANDRA_KEYSPACE} tables will be stored inside ${CASSANDRA_ARCHIVE_PATH}"
# Run both backups in parallel
{
set -e
# Creating snapshot name information file
touch ${ELASTICSEARCH_GENERAL_ARCHIVE_PATH}/${ELASTICSEARCH_SNAPSHOT_NAME}.info
echo "[ES] Starting the Elasticsearch snapshot..."
RESPONSE=$(curl -s -L -X PUT "${ELASTICSEARCH_API_URL}/_snapshot/${ELASTICSEARCH_SNAPSHOT_REPOSITORY}/${ELASTICSEARCH_SNAPSHOT_NAME}" \
-H 'Content-Type: application/json' \
-d '{"indices":"thehive_global", "ignore_unavailable":true, "include_global_state":false}')
if echo "$RESPONSE" | grep -q '"accepted":true'; then
echo "[ES] ✓ Elasticsearch snapshot started successfully"
exit 0
else
echo "[ES] ✗ Elasticsearch ERROR: $RESPONSE"
exit 1
fi
# Verify that the snapshot is finished
state="NONE"
while [ "${state}" != "\"SUCCESS\"" ]; do
echo "Snapshot in progress, waiting 5 seconds before checking status again..."
sleep 5
snapshot_list=$(curl -s -L "${ELASTICSEARCH_API_URL}/_snapshot/${ELASTICSEARCH_SNAPSHOT_REPOSITORY}/*?verbose=false")
state=$(jq '.snapshots[] | select(.snapshot == "'${ELASTICSEARCH_SNAPSHOT_NAME}'").state' <<< ${snapshot_list})
done
echo "Snapshot finished"
} &
PID_ES=$!
{
set -e
echo "[CASS] Starting snapshot ${CASSANDRA_SNAPSHOT_NAME} for keyspace ${CASSANDRA_KEYSPACE}"
if nodetool snapshot -t "${CASSANDRA_SNAPSHOT_NAME}" "${CASSANDRA_KEYSPACE}"; then
echo "[CASS] ✓ Snapshot Cassandra created successfully"
# Save the cql schema of the keyspace
cqlsh ${CASSANDRA_CONNECTION} -e "DESCRIBE KEYSPACE ${CASSANDRA_KEYSPACE}" | grep -v "^WARNING" > "${CASSANDRA_GENERAL_ARCHIVE_PATH}/${CASSANDRA_SNAPSHOT_NAME}/create_keyspace_${CASSANDRA_KEYSPACE}.cql"
echo "The keyspace cql definition for ${CASSANDRA_KEYSPACE} is stored in this file: ${CASSANDRA_GENERAL_ARCHIVE_PATH}/${CASSANDRA_SNAPSHOT_NAME}/create_keyspace_${CASSANDRA_KEYSPACE}.cql"
# For each table folder in the keyspace folder of the snapshot
for TABLE in $(ls ${CASSANDRA_DATA_FOLDER}/data/${CASSANDRA_KEYSPACE}); do
# Folder where the snapshot files are stored
TABLE_SNAPSHOT_FOLDER=${CASSANDRA_DATA_FOLDER}/data/${CASSANDRA_KEYSPACE}/${TABLE}/snapshots/${CASSANDRA_SNAPSHOT_NAME}
if [ -d ${TABLE_SNAPSHOT_FOLDER} ]; then
# Create a folder for each table
mkdir "${CASSANDRA_ARCHIVE_PATH}/${TABLE}"
chown -R cassandra:cassandra ${CASSANDRA_ARCHIVE_PATH}/${TABLE}
# Copy the snapshot files to the proper table folder
# Snapshots files are hardlinks,
# so we use --remove-destination to make sure the files are actually copied and not just linked
cp -p --remove-destination ${TABLE_SNAPSHOT_FOLDER}/* ${CASSANDRA_ARCHIVE_PATH}/${TABLE}
fi
done
# Delete Cassandra snapshot once it's backed up
nodetool clearsnapshot -t ${CASSANDRA_SNAPSHOT_NAME} > /dev/null
# Create a .tar archive with the folder containing the backed up Cassandra data
tar cf ${CASSANDRA_GENERAL_ARCHIVE_PATH}/${CASSANDRA_SNAPSHOT_NAME}.tar -C "${CASSANDRA_GENERAL_ARCHIVE_PATH}" ${CASSANDRA_SNAPSHOT_NAME}
# Remove the folder once the archive is created
rm -rf ${CASSANDRA_GENERAL_ARCHIVE_PATH}/${CASSANDRA_SNAPSHOT_NAME}
exit 0
else
echo "[CASS] ✗ Cassandra ERROR"
exit 1
fi
} &
PID_CASS=$!
ES_EXIT=0
CASS_EXIT=0
# Wait for the two snapshots to finish
wait $PID_ES || ES_EXIT=$?
wait $PID_CASS || CASS_EXIT=$?
# Final check
if [ $ES_EXIT -eq 0 ] && [ $CASS_EXIT -eq 0 ]; then
echo "=== ✓ Full backup successful ==="
# Display the location of the Elasticsearch archive
echo "Elasticsearch backup done!"
# Display the location of the Cassandra archive
echo "Cassandra backup done! Keep the following backup archive safe:"
echo "${CASSANDRA_GENERAL_ARCHIVE_PATH}/${CASSANDRA_SNAPSHOT_NAME}.tar"
exit 0
else
echo "=== ✗ ERROR - ES: exit $ES_EXIT, Cassandra: exit $CASS_EXIT ==="
exit 1
fi
After running the script, the backup archives are available at /mnt/backup/cassandra and /mnt/backup/elasticsearch. Be sure to copy these archives to a separate server or storage location to safeguard against data loss if the TheHive server fails.
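As one way to copy the archives off the host, here is an rsync sketch; the destination host backup.example.com and the path /srv/thehive-backups are hypothetical placeholders for your own storage:
# Copy the Cassandra archives and the Elasticsearch repository to remote storage
rsync -av /mnt/backup/cassandra/ backup.example.com:/srv/thehive-backups/cassandra/
rsync -av /mnt/backup/elasticsearch/ backup.example.com:/srv/thehive-backups/elasticsearch/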
For more details about snapshot management, refer to the official Cassandra documentation and Elasticsearch documentation.
Step 6: Back up file storage#
Finally, we're going to back up TheHive file storage, which contains all the attachments and files.
1. Prepare the backup script#
Before running the script, update ATTACHMENT_FOLDER to match your environment. You can find this path in /etc/thehive/application.conf under the storage.localfs.location attribute. The script uses /opt/thp/thehive/files by default.
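If you're not sure which path is configured, you can look it up directly; this grep simply prints the lines around the localfs storage setting in the configuration file:
grep -A 3 'localfs' /etc/thehive/application.conf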
2. Run the backup script#
#!/bin/bash
# Stop immediately if any command fails so we never archive a partial copy
set -e
# TheHive attachment variables
ATTACHMENT_FOLDER=/opt/thp/thehive/files
# Backup variables
GENERAL_ARCHIVE_PATH=/mnt/backup/storage
SNAPSHOT_NAME="files_$(date +%Y%m%d_%Hh%Mm%Ss)"
ATTACHMENT_ARCHIVE_PATH="${GENERAL_ARCHIVE_PATH}/${SNAPSHOT_NAME}"
# Creating the backup folder if needed
mkdir -p ${ATTACHMENT_ARCHIVE_PATH}
# Copy all TheHive attachment files
cp -r ${ATTACHMENT_FOLDER}/. ${ATTACHMENT_ARCHIVE_PATH}/
# Create a .tar archive with the folder containing the backed up attachment files
cd ${GENERAL_ARCHIVE_PATH}
tar cf ${SNAPSHOT_NAME}.tar ${SNAPSHOT_NAME}
# Remove the folder once the archive is created
rm -rf ${GENERAL_ARCHIVE_PATH}/${SNAPSHOT_NAME}
# Display the location of the attachment archive
echo ""
echo "TheHive attachment files backup done! Keep the following backup archive safe:"
echo "${GENERAL_ARCHIVE_PATH}/${SNAPSHOT_NAME}.tar"
After running the script, the backup archive is available at /mnt/backup/storage. Be sure to copy this archive to a separate server or storage location to safeguard against data loss if the TheHive server fails.
You've completed the hot backup process for your TheHive standalone server. We recommend verifying your backup archives are complete and accessible before relying on them for recovery.
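A quick way to sanity-check the archives is to list their contents without extracting them; replace the timestamps with those of your own backup files:
# List the first entries of each archive; an unreadable archive fails immediately
tar tf /mnt/backup/cassandra/cassandra_<timestamp>.tar | head
tar tf /mnt/backup/storage/files_<timestamp>.tar | head
# Confirm the Elasticsearch snapshot completed with state SUCCESS
curl -s "http://127.0.0.1:9200/_snapshot/thehive_repository/*" | jq '.snapshots[] | {snapshot, state}'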