Restore hot backups
Restore data#
Cassandra#
Prerequisites#
The following data and conditions are required to restore TheHive database successfully:
- A backup of the database (${SNAPSHOT}_${SNAPSHOT_DATE}.tbz)
- The keyspace to restore does not exist in the database (otherwise it will be overwritten)
- All nodes in the cluster are up before starting the restore procedure
- TheHive application is NOT running
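You can verify these conditions before proceeding; a minimal sanity check, assuming nodetool is available on the node and TheHive runs as a systemd service named thehive:
## all nodes should report UN (Up/Normal)
nodetool status
## TheHive service should be inactive
systemctl is-active thehive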
Restore your data#
Start by dropping TheHive database#
## SOURCE_KEYSPACE contains the name of TheHive database
CASSANDRA_PASSWORD=<your_admin_password>
CASSANDRA_ADDRESS=<cassandra_node_ip>
SOURCE_KEYSPACE=thehive
cqlsh -u admin -p ${CASSANDRA_PASSWORD} ${CASSANDRA_ADDRESS} -e "DROP KEYSPACE IF EXISTS ${SOURCE_KEYSPACE};"
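You can confirm the keyspace was dropped by listing the remaining keyspaces:
cqlsh -u admin -p ${CASSANDRA_PASSWORD} ${CASSANDRA_ADDRESS} -e "DESCRIBE KEYSPACES;"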
Create the keyspace#
## TARGET_KEYSPACE contains the new name of TheHive database
## NOTE that you can keep the same database name since the old one has been deleted
TARGET_KEYSPACE=thehive
cqlsh -u admin -p ${CASSANDRA_PASSWORD} ${CASSANDRA_ADDRESS} -e "
CREATE KEYSPACE ${TARGET_KEYSPACE}
WITH replication = {'class': 'NetworkTopologyStrategy', 'replication_factor': '3'}
AND durable_writes = true;"
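To verify the keyspace was created with the expected replication settings:
cqlsh -u admin -p ${CASSANDRA_PASSWORD} ${CASSANDRA_ADDRESS} -e "DESCRIBE KEYSPACE ${TARGET_KEYSPACE};"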
Unarchive backup files#
Note
The following steps should be executed on every Cassandra cluster node.
mkdir -p /var/lib/cassandra/restore
RESTORE_PATH="/var/lib/cassandra/restore"
SOURCE_KEYSPACE="thehive"
SNAPSHOT_DATE=<date_of_backup>
## TABLES should contain the list of all tables that will be restored
## Each table name includes the UUID generated by Cassandra, for example: table1-f901e0c05d8811ef87c71fc3a94044f4
TABLES="ls -1 /var/lib/cassandra/data/${SOURCE_KEYSPACE}/"
## copy each snapshot table to restore from the remote backup server (or the local Cassandra node), then extract it
## repeat the following steps on each Cassandra node, for each table in the TABLES list
cd /var/lib/cassandra/restore
for table in $TABLES; do
scp remoteuser@remotehost:/remote/node_name_directory/${SNAPSHOT_DATE}/${SOURCE_KEYSPACE}/${table}.tbz .
mkdir -p ${RESTORE_PATH}/${SOURCE_KEYSPACE}/${table}
echo "Unarchive backup files for table: $table"
tar jxf ${table}.tbz -C ${RESTORE_PATH}/${SOURCE_KEYSPACE}/${table}
done
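After extraction, each table directory should contain the SSTable files and a schema.cql file; a quick check:
find ${RESTORE_PATH}/${SOURCE_KEYSPACE} -name schema.cql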
Create tables from archive#
The archive contains the table schemas, which must be applied to the new keyspace. The schema files are located in ${RESTORE_PATH}/${SOURCE_KEYSPACE}/${table}
for CQL in $(find ${RESTORE_PATH} -name schema.cql)
do
cqlsh -u admin -p ${CASSANDRA_PASSWORD} ${CASSANDRA_ADDRESS} -f $CQL
done
If you want to change the name of the keyspace (${SOURCE_KEYSPACE} => ${TARGET_KEYSPACE}), you need to rewrite the CQL command:
for CQL in $(find ${RESTORE_PATH} -name schema.cql)
do
cqlsh -u admin -p ${CASSANDRA_PASSWORD} ${CASSANDRA_ADDRESS} -e "$(sed -e "/CREATE TABLE/s/${SOURCE_KEYSPACE}/${TARGET_KEYSPACE}/" $CQL)"
done
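After the schemas have been applied, the tables should appear in the target keyspace:
cqlsh -u admin -p ${CASSANDRA_PASSWORD} -k ${TARGET_KEYSPACE} ${CASSANDRA_ADDRESS} -e "DESCRIBE TABLES;"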
Load table data#
Note
The following command should be executed on each Cassandra node in the cluster.
for TABLE in ${RESTORE_PATH}/${SOURCE_KEYSPACE}/*
do
TABLE_BASENAME=$(basename ${TABLE})
TABLE_NAME=${TABLE_BASENAME%%-*}
nodetool import ${TARGET_KEYSPACE} ${TABLE_NAME} ${RESTORE_PATH}/${SOURCE_KEYSPACE}/${TABLE_BASENAME}
done
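You can spot-check the imported data by querying one of the restored tables (<table_name> is a placeholder for an actual table name, without the UUID suffix):
cqlsh -u admin -p ${CASSANDRA_PASSWORD} ${CASSANDRA_ADDRESS} -e "SELECT * FROM ${TARGET_KEYSPACE}.<table_name> LIMIT 5;"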
If the cluster topology has changed (nodes added or removed from the cluster since the last data backup), run the following command instead to perform the restore:
for TABLE in ${RESTORE_PATH}/${TARGET_KEYSPACE}/*
do
TABLE_BASENAME=$(basename ${TABLE})
sstableloader -d ${CASSANDRA_ADDRESS} ${RESTORE_PATH}/${TARGET_KEYSPACE}/${TABLE_BASENAME}
done
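Unlike nodetool import, sstableloader streams the data to all replicas according to the current topology and replication settings. You can monitor streaming progress from another terminal with:
nodetool netstats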
Cleanup#
rm -rf ${RESTORE_PATH}
Rebuild an existing node#
If, for a particular reason (such as corrupted system data), you need to reintegrate the node into the cluster and restore all data (including system data), here is the procedure:
Make sure that the Cassandra service is still down, then delete the contents of the data volume:#
cd /var/lib/cassandra/data
rm -rf *
Copy and unarchive backup files:#
DATA_PATH="/var/lib/cassandra/data"
SNAPSHOT_DATE=<date_of_snapshot>
## KEYSPACES list should include all keyspaces
KEYSPACES="system system_distributed system_traces system_virtual_schema system_auth system_schema system_views thehive"
cd ${DATA_PATH}
for ks in $KEYSPACES; do
scp -r remoteuser@remotehost:/remote/node_name_directory/${SNAPSHOT_DATE}/${ks}/ .
for file in /var/lib/cassandra/data/${ks}/*; do
echo "Processing $file"
filename=$(basename "$file")
table_name="${filename%%.*}"
sudo mkdir -p ${ks}/${table_name}
sudo tar jxf $file -C ${ks}/${table_name}
rm -f $file
done
done
chown -R cassandra:cassandra /var/lib/cassandra/data
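Before restarting the service, verify that the keyspace directories were recreated and are owned by the cassandra user:
ls -l /var/lib/cassandra/data
ls -l /var/lib/cassandra/data/thehive | head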
Start cassandra service#
service cassandra start
## Check if Cassandra has started successfully by reviewing its logs
tail -n 100 /var/log/cassandra/system.log | grep -iE "listening for|startup complete|error|warning"
INFO [main] ********,773 PipelineConfigurator.java:125 - Starting listening for CQL clients on localhost/127.0.0.1:9042 (unencrypted)...
INFO [main] ********,790 CassandraDaemon.java:776 - Startup complete
Ensure no commit log files exist in /var/lib/cassandra/commitlog before restarting the Cassandra service.
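To check for leftover commit log files while Cassandra is stopped:
ls -A /var/lib/cassandra/commitlog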
Example of a script to restore TheHive keyspace in Cassandra#
#!/bin/bash
## Restore a KEYSPACE and its data from a CQL file with the schema of the
## KEYSPACE and a tbz archive containing the snapshot
## Complete variables before running:
## CASSANDRA_ADDRESS: IP of cassandra server
## RESTORE_PATH: choose a temporary folder; warning: this folder will be removed if it exists
## SOURCE_KEYSPACE: KEYSPACE used in the backup
## TARGET_KEYSPACE: new KEYSPACE name; use the same name as SOURCE_KEYSPACE if unchanged
## TABLES: list of all tables to restore; each table name includes the UUID generated by Cassandra, for example: table1-f901e0c05d8811ef87c71fc3a94044f4
## SNAPSHOT: choose a name for the backup
## SNAPSHOT_DATE: date of the snapshot to restore
## IMPORTANT: Note that the following steps should be executed on each Cassandra cluster node.
CASSANDRA_ADDRESS="10.1.1.1"
RESTORE_PATH="/var/lib/cassandra/restore"
SOURCE_KEYSPACE="thehive"
TARGET_KEYSPACE="thehive_restore"
TABLES="
table1-f901e0c05d8811ef87c71fc3a94044f4
table2-d502a0c05d8811ef87c71fc3a94044f5
table3-a703c0c05d8811ef87c71fc3a94044f6
"
SNAPSHOT_DATE="2024-09-23"
## Copy from the backup folder and uncompress data in the restore folder
mkdir -p ${RESTORE_PATH}
cd ${RESTORE_PATH}
for table in $TABLES; do
cp -r PATH_TO_BACKUP_DIRECTORY/${SNAPSHOT_DATE}/${SOURCE_KEYSPACE}/${table}.tbz .
mkdir -p ${RESTORE_PATH}/${SOURCE_KEYSPACE}/${table}
echo "Unarchive backup files for table: $table"
tar jxf ${table}.tbz -C ${RESTORE_PATH}/${SOURCE_KEYSPACE}/${table}
done
## Read Cassandra password
echo -n "Cassandra admin password: "
read -s CASSANDRA_PASSWORD
echo
# Drop the keyspace
cqlsh -u admin -p ${CASSANDRA_PASSWORD} ${CASSANDRA_ADDRESS} -e "DROP KEYSPACE IF EXISTS ${SOURCE_KEYSPACE};"
# Create the keyspace
## TARGET_KEYSPACE contains the new name of TheHive database
## NOTE that you can keep the same database name since the old one has been deleted
cqlsh -u admin -p ${CASSANDRA_PASSWORD} ${CASSANDRA_ADDRESS} -e "
CREATE KEYSPACE ${TARGET_KEYSPACE}
WITH replication = {'class': 'NetworkTopologyStrategy', 'replication_factor': '3'}
AND durable_writes = true;"
# Create table in keyspace
for CQL in $(find ${RESTORE_PATH} -name schema.cql)
do
cqlsh -u admin -p ${CASSANDRA_PASSWORD} ${CASSANDRA_ADDRESS} -e "$(sed -e "/CREATE TABLE/s/${SOURCE_KEYSPACE}/${TARGET_KEYSPACE}/" $CQL)"
done
## Load data
for TABLE in ${RESTORE_PATH}/${SOURCE_KEYSPACE}/*
do
TABLE_BASENAME=$(basename ${TABLE})
TABLE_NAME=${TABLE_BASENAME%%-*}
nodetool import ${TARGET_KEYSPACE} ${TABLE_NAME} ${RESTORE_PATH}/${SOURCE_KEYSPACE}/${TABLE_BASENAME}
done
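A possible way to run this script on each node (the file name restore_thehive.sh is just an example; run it as a user allowed to write under /var/lib/cassandra):
chmod +x restore_thehive.sh
sudo ./restore_thehive.sh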
Restore Elasticsearch index#
Several solutions exist regarding the index:
- Restore a saved Elasticsearch index; follow the Elasticsearch guides to perform this action
- Rebuild the index on the new server when TheHive starts for the first time
Restoration steps#
Restoring from a snapshot involves creating a new cluster or restoring the snapshot into an existing one. Here is an example that restores all indices from a snapshot:
curl -X POST "http://localhost:9200/_snapshot/my_backup/snapshot_1/_restore" -H 'Content-Type: application/json' -d'
{
"indices": "*",
"include_global_state": false
}'
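Before restoring, you can list the snapshots available in the repository (using the repository name my_backup from the example above):
curl -X GET "http://localhost:9200/_snapshot/my_backup/_all?pretty"
Note that an index cannot be restored while an open index with the same name exists in the cluster; close or delete it first.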
Rebuild the index#
Once the Cassandra database is restored, update TheHive configuration to rebuild the index.
The following line should be added to the configuration file only for the first start of TheHive application, and removed afterwards.
db.janusgraph.forceDropAndRebuildIndex = true
Once TheHive application has started, this line should be removed or commented out in the application.conf configuration file
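For example, to comment the line out after the first successful start (assuming the default configuration path /etc/thehive/application.conf):
sed -i 's/^db.janusgraph.forceDropAndRebuildIndex/# &/' /etc/thehive/application.conf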
Restore Files#
Restore the saved files into the destination folder/bucket that will be used by TheHive. Ensure the account running TheHive application has permissions to create files and folders into the destination folder.
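For a local filesystem store, a minimal sketch, assuming TheHive runs as the thehive user and the destination matches your storage.localfs.location setting (here /opt/thp/thehive/files):
mkdir -p /opt/thp/thehive/files
chown -R thehive:thehive /opt/thp/thehive/files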
References#
- Backing up and restoring Cassandra data: https://cassandra.apache.org/doc/stable/cassandra/operating/backups.html
- Backing up and restoring Elasticsearch data: https://www.elastic.co/guide/en/elasticsearch/reference/current/snapshot-restore.html