Build an M3 Anvil! Cluster


Revision as of 00:58, 23 June 2022


First off, what is an Anvil!? In short, it is a "self-maintaining 'cluster of clusters'". Its goal is to keep servers (virtual machines) running, without human intervention, even under multiple failure conditions.

Think about ship-board computer systems, remote research facilities, factories without dedicated IT staff, unstaffed branch offices, and so forth.

In these cases, the Anvil! system will predict component failures and mitigate them. It will adapt to changing threat conditions, like cooling or power loss, including automatic recovery from a full power loss. It is designed around the understanding that a fault condition may not be repaired for weeks or months, and it can perform automated risk analysis and mitigation.

That's an Anvil! cluster!

Components

The minimum configuration needed to host servers on an Anvil! is this:

Simplest Anvil! system:
  • Management Layer
    • Striker Dashboard 1, Striker Dashboard 2
  • Sub-Clusters
    • Node-Block 1
      • Sub-node 1, Sub-node 2
    • Foundation Pack 1
      • Ethernet Switch 1, Ethernet Switch 2
      • Switched PDU 1, Switched PDU 2
      • UPS 1, UPS 2

With this configuration, you can host as many servers as you would like, limited only by the resources of Node-Block 1 (itself made of a pair of physical sub-nodes with your choice of processing, RAM and storage resources).

Scaling

This is a significant investment to get started, but you will soon find that scaling is easy!

Management Layer; Striker Dashboards

The management layer, the Striker dashboards, has no hard limit on how many node blocks it can manage. All node blocks record their data to the Strikers (to offload processing and storage loads). There is a practical limit to how many node blocks can share a pair of Strikers, but this can be accounted for in the hardware selected for the dashboards.

Node Blocks

An Anvil! cluster uses one or more node blocks, pairs of matched physical nodes configured as a single logical unit. The power of a given node block is set by you and based on the loads you expect to place on it.

There is no hard limit on how many node blocks can exist in an Anvil! cluster. Your servers will be deployed across the node blocks and, when you want to add more servers than you currently have resources for, you simply add another node block.

Foundation Packs

A foundation pack is the power and ethernet layer that feeds into one or more node blocks. At its most basic, it consists of three pairs of equipment:

  • Two stacked (or VLT-domain'ed) ethernet switches.
  • Two switched PDUs (network-switched power bars).
  • Two UPSes.

Each UPS feeds one PDU, forming two separate "power rails". The ethernet switches and all sub-nodes are equipped with redundant PSUs, with each PSU fed from a different power rail.

In this way, any component in the foundation pack can fault, and all equipment will continue to have power and ethernet resources available. How many Anvil! node-pairs can be run on a given foundation pack is limited only by the sizing of the selected foundation pack equipment.
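The single-fault tolerance described above can be sanity-checked with a small model. This is an illustrative sketch only (the component names and functions are made up for this example, not part of the Anvil! software), assuming the two-rail wiring described above:

```python
# Hypothetical model of a foundation pack's two "power rails": each rail is
# a chain of UPS -> switched PDU, and every dual-PSU device draws from both
# rails. A device loses power only if BOTH rails fail.

RAILS = {
    "rail_a": ["ups_1", "pdu_1"],  # UPS 1 feeds Switched PDU 1
    "rail_b": ["ups_2", "pdu_2"],  # UPS 2 feeds Switched PDU 2
}

def rail_is_up(rail, failed):
    """A rail delivers power only if every component in its chain works."""
    return all(part not in failed for part in RAILS[rail])

def device_has_power(failed):
    """A dual-PSU device (switch or sub-node) stays up while at least one
    rail is still live."""
    return rail_is_up("rail_a", failed) or rail_is_up("rail_b", failed)

# Any SINGLE foundation-pack fault leaves every dual-PSU device powered.
all_parts = [p for chain in RAILS.values() for p in chain]
assert all(device_has_power({part}) for part in all_parts)
```

Note that a fault on each rail at once (say, UPS 1 plus PDU 2) does take devices down, which is exactly why each pair of components is split across the two rails.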

Configuration

Note: This is SUPER basic and minimal at this stage.

Striker Dashboards

Striker dashboards are often described as "cheap and cheerful", generally being a fairly small and inexpensive device, like a Dell Optiplex 3090, Intel NUC, or similar.

You can choose any vendor you wish, but when selecting hardware, be mindful that all Scancore data is stored in PostgreSQL databases running on each dashboard. As such, we recommend an Intel Core i5 or AMD Ryzen 5 class CPU, 8 GiB or more of RAM, a mixed-use SSD of roughly 250 GiB with decent IOPS, and two ethernet ports.

Striker Dashboards host the Striker web interface, and act as a bridge between your IFN network and the Anvil! cluster's BCN management network. As such, they must have a minimum of two ethernet ports.

Node Pairs

An Anvil! Node Pair is made up of two identical physical machines. These two machines act as a single logical unit, providing fault tolerance and automated live migrations of hosted servers to mitigate against predicted hardware faults.

Each sub-node (a single hardware node) must have:

  • Redundant PSUs
  • Six ethernet ports (eight recommended). If six, use 3x dual-port. If eight, 2x quad port will do.
  • Redundant storage: RAID level 1 (mirroring), or level 5 or 6 (striping with parity). Sufficient capacity and IOPS to host the servers that will run on each pair.
  • IPMI (out-of-band) management ports. Strongly recommended on a dedicated network interface.
  • Sufficient CPU core count and core speed for expected hosted servers.
  • Sufficient RAM for the expected servers (note that the Anvil! reserves 8 GiB).
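The RAM and RAID sizing rules above can be expressed as simple arithmetic. These helpers are illustrative only (the function names are invented for this example; only the 8 GiB reservation and the standard RAID capacity formulas come from the text above):

```python
# Illustrative sizing helpers for a sub-node, not part of the Anvil! software.

ANVIL_RESERVED_GIB = 8  # RAM the Anvil! reserves for itself on each sub-node

def usable_ram_gib(installed_gib):
    """RAM left over for hosted servers on one sub-node."""
    return installed_gib - ANVIL_RESERVED_GIB

def raid_usable_gib(level, disk_count, disk_gib):
    """Usable array capacity for the supported RAID levels: a RAID 1 mirror
    stores one disk's worth, RAID 5 loses one disk to parity, RAID 6 two."""
    if level == 1:
        return disk_gib
    if level == 5:
        return (disk_count - 1) * disk_gib
    if level == 6:
        return (disk_count - 2) * disk_gib
    raise ValueError("unsupported RAID level")

print(usable_ram_gib(128))         # 120 GiB left for servers
print(raid_usable_gib(5, 4, 960))  # 2880 GiB usable from 4x 960 GiB in RAID 5
```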

Disaster Recovery (DR) Host

Optionally, a "third node" of a sort can be added to a node pair. This is called a DR Host, and it should be (but doesn't have to be) identical to the node-pair hardware it is extending.

A DR (disaster recovery) Host acts as a remotely hosted "third node" that can be manually pressed into service in a situation where both nodes in a node pair are destroyed. A common example would be a DR Host being in another building on a campus installation, or on the far side of the building / vessel.

A DR host can, in theory, be in another city, but storage replication speeds and latency need to be considered. Storage replication between the sub-nodes of a node pair is synchronous, while replication to a DR host can be asynchronous. However, storage loads must be considered to ensure that replication can keep up with the rate of data change.
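The "keep up with the rate of data change" check above is back-of-envelope arithmetic. The sketch below is illustrative only (the function, its parameters, and the 80% efficiency factor are assumptions for this example, not Anvil! tooling): asynchronous DR replication only stays caught up if the link can move data at least as fast as the hosted servers change it, on average.

```python
# Rough feasibility check for asynchronous DR replication (illustrative).

def dr_link_keeps_up(change_rate_mib_s, link_mbit_s, efficiency=0.8):
    """True if the sustained write rate fits within the usable link
    bandwidth. `efficiency` hedges for protocol overhead (an assumed
    value; tune it for your network)."""
    usable_mib_s = (link_mbit_s * efficiency) / 8  # Mbit/s -> MiB/s, roughly
    return change_rate_mib_s <= usable_mib_s

print(dr_link_keeps_up(40, 1000))   # 40 MiB/s of change over 1 Gbit: True
print(dr_link_keeps_up(150, 1000))  # 150 MiB/s over 1 Gbit: False
```

Short bursts above the link rate are fine (the DR copy just falls behind and catches up later); a sustained change rate above it means the DR host will drift ever further out of date.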

Foundation Pack Equipment

The Anvil! is, fundamentally, hardware agnostic. That said, the hardware you select must be configured to meet the Anvil! requirements.

As we are hardware agnostic, we've created three linked pages. As we validate hardware ourselves, we will expand the hardware-specific configuration guides. If you've configured foundation pack equipment not covered in the pages below, and you are willing, we would love to add your configuration instructions to our list.

  • Ethernet Switch Configuration
  • Switched PDU Configuration
  • UPS Configuration

Base OS Install

For all three machine types (Striker dashboards, node-pair sub-nodes, and DR hosts), begin with a minimal RHEL 8 or CentOS Stream 8 install.
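For unattended installs, a minimal install like the one above can be described in a kickstart file. The fragment below is an illustrative sketch only, not an official Alteeve configuration: the locale, timezone, root password, network, and automatic partitioning lines are placeholder assumptions you must adapt to your site.

```text
# minimal-anvil.ks -- illustrative kickstart sketch for a minimal
# RHEL 8 / CentOS Stream 8 install (all values below are placeholders)
text
lang en_US.UTF-8
keyboard us
timezone Etc/UTC
network --bootproto=dhcp
rootpw --plaintext change-me-immediately
clearpart --all --initlabel
autopart
reboot

%packages
@^minimal-environment
%end
```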



 

Any questions, feedback, advice, complaints or meanderings are welcome.
Us: Alteeve's Niche! Support: Mailing List IRC: #clusterlabs on Libera Chat   © Alteeve's Niche! Inc. 1997-2022
legal stuff: All info is provided "As-Is". Do not use anything here unless you are willing and able to take responsibility for your own actions.