Use TheHive as a cluster#
This guide provides configuration examples for TheHive, Cassandra, Elasticsearch and MinIO to build a fault-tolerant cluster of 3 active nodes, each one including:
- Cassandra as database
- Elasticsearch as indexing engine
- MinIO as S3 data storage
- TheHive
- HAProxy (to illustrate a load balancer)
- Keepalived (to illustrate the setup of a virtual IP)
Info
All of these applications can be installed on their own server or shared on the same one. For the purpose of this documentation, we decided to use only 3 servers.
Target Architecture#
Cassandra#
We are setting up a cluster of 3 active Cassandra nodes with a replication factor of 3. That means that all nodes are active and the data is present on each node. This setup tolerates the failure of one node.
For the rest of this section, we consider that all nodes sit on the same network.
Installation#
Install Cassandra on each node. Follow the steps described here.
Configuration#
For each node, update the configuration file /etc/cassandra/cassandra.yaml with the following parameters:
cluster_name: 'thp'
num_tokens: 256
authenticator: PasswordAuthenticator
authorizer: CassandraAuthorizer
role_manager: CassandraRoleManager
data_file_directories:
    - /var/lib/cassandra/data
commitlog_directory: /var/lib/cassandra/commitlog
saved_caches_directory: /var/lib/cassandra/saved_caches
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "<ip node 1>, <ip node 2>, <ip node 3>" # (1)
listen_interface: eth0 # (2)
rpc_interface: eth0 # (3)
endpoint_snitch: SimpleSnitch
- Make sure to list the IP addresses of all the nodes included in the cluster
- Make sure to set the right interface name
- Make sure to set the right interface name
Then, delete the file /etc/cassandra/cassandra-topology.properties:
rm /etc/cassandra/cassandra-topology.properties
Start nodes#
On each node, start the service:
service cassandra start
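Optionally, enable the service at boot so each node rejoins the cluster after a reboot; a minimal sketch, assuming a systemd-based system:
# enable Cassandra at boot and verify it is running
systemctl enable cassandra
systemctl status cassandra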
Ensure that all nodes are up and running:
root@cassandra:/# nodetool status
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN <ip node 1> 776.53 KiB 256 100.0% a79c9a8c-c99b-4d74-8e78-6b0c252abd86 rack1
UN <ip node 2> 671.72 KiB 256 100.0% 8fda2906-2097-4d62-91f8-005e33d3e839 rack1
UN <ip node 3> 611.54 KiB 256 100.0% 201ab99c-8e16-49b1-9b66-5444044fb1cd rack1
Initialise the database#
On one node, run (the default password for the cassandra account is cassandra):
cqlsh <ip node X> -u cassandra
- Start by changing the password of the superadmin account named cassandra:
ALTER USER cassandra WITH PASSWORD 'NEWPASSWORD';
Exit and reconnect.
- Ensure user accounts are replicated on all nodes:
ALTER KEYSPACE system_auth WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3 };
- Create a keyspace named thehive:
CREATE KEYSPACE thehive WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '3' } AND durable_writes = 'true';
- Create a role named thehive and grant it permissions on the thehive keyspace (choose a password):
CREATE ROLE thehive WITH LOGIN = true AND PASSWORD = 'PASSWORD';
GRANT ALL PERMISSIONS ON KEYSPACE thehive TO 'thehive';
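To check the result, you can reconnect with the new role; a minimal sketch, assuming the PASSWORD value chosen above and using one of your node IP addresses:
cqlsh <ip node X> -u thehive -p 'PASSWORD'
thehive@cqlsh> DESCRIBE KEYSPACE thehive;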
Elasticsearch#
Installation#
We are setting up a cluster of 3 active Elasticsearch nodes. Follow this page to install Elasticsearch on each node.
Configuration#
For each node, update the configuration file /etc/elasticsearch/elasticsearch.yml with the following parameters, and update network.host accordingly.
http.host: 0.0.0.0
network.bind_host: 0.0.0.0
script.allowed_types: inline,stored
cluster.name: thehive
node.name: 'es1'
path.data: /usr/share/elasticsearch/data
path.logs: /usr/share/elasticsearch/logs
network.host: 'es1' # (1)
http.port: 9200
cluster.initial_master_nodes:
- es1
node.master: true
discovery.seed_hosts: # (2)
- 'es1'
- 'es2'
- 'es3'
thread_pool.search.queue_size: 100000
thread_pool.write.queue_size: 100000
xpack.security.enabled: true
xpack.security.http.ssl.enabled: true
xpack.security.transport.ssl.enabled: true
xpack.security.http.ssl.key: /usr/share/elasticsearch/config/certs/es1/es1.key
xpack.security.http.ssl.certificate: /usr/share/elasticsearch/config/certs/es1/es1.crt
xpack.security.http.ssl.certificate_authorities: /usr/share/elasticsearch/config/certs/ca/ca.crt
xpack.security.transport.ssl.key: /usr/share/elasticsearch/config/certs/es1/es1.key
xpack.security.transport.ssl.certificate: /usr/share/elasticsearch/config/certs/es1/es1.crt
xpack.security.transport.ssl.certificate_authorities: /usr/share/elasticsearch/config/certs/ca/ca.crt
- Update this parameter with the IP or hostname of the node
- Keep this parameter with the same value for all nodes
Warning
To configure Xpack and SSL with Elasticsearch, please review the documentation of your version of Elasticsearch.
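As an illustration only, here is one possible way to generate a certificate authority and per-node certificates with the elasticsearch-certutil tool shipped with Elasticsearch. The paths, names and options below are assumptions matching the configuration above and must be checked against the documentation of your Elasticsearch version:
# Generate a CA in PEM format (the zip contains ca/ca.crt and ca/ca.key)
/usr/share/elasticsearch/bin/elasticsearch-certutil ca --pem --out /tmp/ca.zip
unzip /tmp/ca.zip -d /usr/share/elasticsearch/config/certs
# Generate a certificate for node es1, signed by that CA (repeat for es2 and es3)
/usr/share/elasticsearch/bin/elasticsearch-certutil cert --pem \
  --ca-cert /usr/share/elasticsearch/config/certs/ca/ca.crt \
  --ca-key /usr/share/elasticsearch/config/certs/ca/ca.key \
  --name es1 --dns es1 --ip <ip node 1> --out /tmp/es1.zip
unzip /tmp/es1.zip -d /usr/share/elasticsearch/config/certs
chown -R elasticsearch:elasticsearch /usr/share/elasticsearch/config/certs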
Custom JVM options#
Add the file /etc/elasticsearch/jvm.options.d/jvm.options with the following lines:
-Dlog4j2.formatMsgNoLookups=true
-Xms4g
-Xmx4g
This can be updated according to the amount of memory available on the node.
Start nodes#
On each node, start the service:
service elasticsearch start
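Once the three nodes are started, you can check that they formed a single cluster; a minimal sketch, assuming X-Pack security is enabled as above and a password has already been set for the elastic built-in user:
# Expect "number_of_nodes" : 3 and a "green" status
curl -k -u elastic "https://es1:9200/_cluster/health?pretty"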
MinIO#
MinIO distributed mode requires fresh directories. Here is an example of a MinIO setup for TheHive.
The following procedure should be applied to all servers belonging to the cluster. We consider a setup where the cluster is composed of 3 servers named minio1, minio2 and minio3.
Info
MinIO does not cluster the same way as Cassandra and Elasticsearch: a load balancer should be installed in front of the nodes to distribute connections.
Create a dedicated system account#
Create a dedicated user and group for MinIO.
adduser minio-user
addgroup minio-user
Create at least 2 data volumes on each server#
Create 2 folders on each server:
mkdir -p /srv/minio/{1,2}
chown -R minio-user:minio-user /srv/minio
Setup hosts files#
Edit /etc/hosts on all servers:
ip-minio-1 minio1
ip-minio-2 minio2
ip-minio-3 minio3
Installation#
Example for DEB packages
wget https://dl.min.io/server/minio/release/linux-amd64/minio_20220607003341.0.0_amd64.deb
wget https://dl.min.io/client/mc/release/linux-amd64/mcli_20220509040826.0.0_amd64.deb
dpkg -i minio_20220607003341.0.0_amd64.deb
dpkg -i mcli_20220509040826.0.0_amd64.deb
Visit https://dl.min.io/ to find the latest version of the required packages.
Configuration#
Create or edit the file /etc/default/minio:
MINIO_OPTS="--address :9100 --console-address :9001"
MINIO_VOLUMES="http://minio{1...3}:9100/srv/minio/{1...2}"
MINIO_ROOT_USER=thehive
MINIO_ROOT_PASSWORD=password
MINIO_SITE_REGION="us-east-1"
Enable and start the service#
systemctl daemon-reload
systemctl enable minio
systemctl start minio.service
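To verify that the distributed setup is healthy, you can use the MinIO client installed above (the DEB package provides it as mcli); a minimal sketch, assuming the root credentials defined in /etc/default/minio and a hypothetical alias name minio-cluster:
# Register the cluster under a local alias, then display its status
mcli alias set minio-cluster http://minio1:9100 thehive password
mcli admin info minio-cluster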
Prepare the service for TheHive#
The following operations should be performed once all servers are up and running. A new server CANNOT be added afterward.
Connect with your browser to one of the servers on port 9100 (for example http://minio1:9100), using the access key and secret key defined in /etc/default/minio.
Create a bucket named thehive
The bucket should be created and available on all your servers.
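Alternatively, the bucket can be created from the command line with the MinIO client; a minimal sketch reusing the hypothetical minio-cluster alias registered above:
# Create the thehive bucket and list buckets to confirm it is visible
mcli mb minio-cluster/thehive
mcli ls minio-cluster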
TheHive#
TheHive uses the Akka toolkit to manage the cluster in this configuration. Akka also handles threading and multi-processing, and allows TheHive to scale.
Quote
Akka is a toolkit for building highly concurrent, distributed, and resilient message-driven applications for Java and Scala.
Source: https://akka.io
Configuration#
Cluster#
Unlike the single-node configuration, the first thing to configure is Akka, to ensure the cluster is properly managed by the application.
In this guide, we consider node 1 to be the master node. Start by configuring the akka component in the /etc/thehive/application.conf file of each node, as follows:
akka {
  cluster.enable = on
  actor {
    provider = cluster
  }
  remote.artery {
    canonical {
      hostname = "<My IP address>" # (1)
      port = 2551
    }
  }
  # seed node list contains at least one active node
  cluster.seed-nodes = [
    "akka://application@<Node 1 IP address>:2551", # (2)
    "akka://application@<Node 2 IP address>:2551",
    "akka://application@<Node 3 IP address>:2551"
  ]
  cluster.min-nr-of-members = 2 # (3)
}
- Set the IP address of the node
- The value of this parameter should be identical on all nodes
- Choose a value corresponding to half the number of nodes plus one (for 3 nodes: 2)
Database and index engine#
Update the configuration of TheHive accordingly in /etc/thehive/application.conf:
## Database configuration
db.janusgraph {
  storage {
    ## Cassandra configuration
    # More information at https://docs.janusgraph.org/basics/configuration-reference/#storagecql
    backend = cql
    hostname = ["<ip node 1>", "<ip node 2>", "<ip node 3>"] #(1)
    # Cassandra authentication (if configured)
    username = "thehive"
    password = "PASSWORD"
    cql {
      cluster-name = thp
      keyspace = thehive
    }
  }
}
- Set IP addresses of Cassandra nodes
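The index engine configuration for the Elasticsearch cluster goes in the same file. The snippet below is a sketch based on the usual TheHive/JanusGraph index settings; the hostnames are placeholders for your Elasticsearch nodes, and the exact option names (in particular any credentials or TLS options required when X-Pack security is enabled) should be checked against the documentation of your TheHive version:
## Index configuration (sketch)
db.janusgraph {
  index.search {
    backend = elasticsearch
    hostname = ["<ip es node 1>", "<ip es node 2>", "<ip es node 3>"]
    index-name = thehive
  }
}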
MinIO S3 file storage#
For each TheHive node of the cluster, add the relevant storage configuration. Example for the first node:
storage {
provider: s3
s3 {
bucket = "thehive"
readTimeout = 1 minute
writeTimeout = 1 minute
chunkSize = 1 MB
endpoint = "http://<IP_MINIO_1>:9100"
accessKey = "thehive"
aws.credentials.provider = "static"
aws.credentials.secret-access-key = "password"
access-style = path
aws.region.provider = "static"
aws.region.default-region = "us-east-1"
}
}
- The configuration is backward compatible
- Either each TheHive server connects to one MinIO server, or a load balancer distributes connections to all the nodes of the MinIO cluster (similar to the HAProxy example used below for TheHive).
Start the service#
systemctl start thehive
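To confirm that each node has started and joined the cluster, you can follow the application log and query the HTTP port; a minimal sketch, assuming the default TheHive port (9000) and log location, and the /api/status endpoint exposed by TheHive (adjust to your version if needed):
# Look for Akka cluster membership messages in the log
tail -f /var/log/thehive/application.log
# Check that the node answers over HTTP
curl http://<THEHIVE-NODE1-IP>:9000/api/status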
Load balancers with HAProxy#
In front of the TheHive cluster, you can add a load balancer that distributes HTTP requests to the cluster nodes. A client does not need to always reach the same node, as session affinity is not required.
Below is a non-optimized example of what should be added to the HAProxy configuration file, /etc/haproxy/haproxy.cfg. The same configuration applies to all HAProxy instances.
In this example, the service is bound to TCP port 80. Bind the service to the virtual IP address that will be set up by the keepalived service (see next part):
# Listen on all interfaces, on port 80/tcp
frontend thehive-in
bind <VIRTUAL_IP>:80 # (1)
default_backend thehive
# Configure all cluster nodes
backend thehive
balance roundrobin
server node1 THEHIVE-NODE1-IP:9000 check # (2)
server node2 THEHIVE-NODE2-IP:9000 check
server node3 THEHIVE-NODE3-IP:9000 check
- Configure the virtual IP address dedicated to the cluster
- Configure the IP addresses of all nodes and the TheHive port (9000)
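After editing the file, you can validate the configuration and reload HAProxy; a minimal sketch, assuming a systemd-based installation:
# Check the configuration syntax, then reload the service
haproxy -c -f /etc/haproxy/haproxy.cfg
systemctl reload haproxy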
Virtual IP with Keepalived#
If you decide to use keepalived to set up a virtual IP address for the load balancers, this part contains a basic configuration example.
This service checks whether the load balancer (HAProxy in this example), installed on the same system, is running. In our case, LB1 is the master, so the virtual IP address is held by the LB1 server. If the haproxy service stops running, keepalived on server LB2 takes over the virtual IP address until the haproxy service on LB1 is running again.
vrrp_script chk_haproxy { # (1)
script "/usr/bin/killall -0 haproxy" # cheaper than pidof
interval 2 # check every 2 seconds
weight 2 # add 2 points of priority if OK
}
vrrp_instance VI_1 {
interface eth0
state MASTER
virtual_router_id 51
priority 101 # 101 on primary, 100 on secondary # (2)
virtual_ipaddress {
10.10.1.50/24 brd 10.10.1.255 dev eth0 scope global # (3)
}
track_script {
chk_haproxy
}
}
- Requires keepalived version > 1.1.13
- Use priority 100 for a secondary node
- This is an example. Update it with your IP address and broadcast address
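After restarting keepalived on both load balancers, you can check which server currently holds the virtual IP address; a minimal sketch using the example address above:
systemctl restart keepalived
# The VIP should appear on the master node's interface
ip addr show dev eth0 | grep 10.10.1.50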
Troubleshooting#
Example of an error message found in the /var/log/cassandra/ log files:
InvalidRequest: code=2200 [Invalid query] message="org.apache.cassandra.auth.CassandraRoleManager doesn't support PASSWORD"
Fix it by setting authenticator: PasswordAuthenticator in cassandra.yaml.
Caused by: java.util.concurrent.ExecutionException: com.datastax.driver.core.exceptions.UnauthorizedException: Unable to perform authorization of permissions: Unable to perform authorization of super-user permission: Cannot achieve consistency level LOCAL_ONE
Fix it by running the following CQL command:
ALTER KEYSPACE system_auth WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3 };
and then by running the following command:
nodetool repair -full