Difference between revisions of "Build an M3 Anvil! Cluster"

From Alteeve Wiki
Revision as of 21:41, 28 July 2023

 Alteeve Wiki :: How To :: Build an M3 Anvil! Cluster

First off, what is an Anvil!?

In short, it's a system designed to keep servers running through an array of failures, without need for an internet connection.

Think about ship-board computer systems, remote research facilities, factories without dedicated IT staff, un-staffed branch offices and so forth. Where most hosted solutions expect technical staff to be available on short notice, an Anvil! is designed to continue functioning properly for weeks or months with faulty components.

In these cases, the Anvil! system will predict component failure and mitigate automatically. It will adapt to changing threat conditions, like cooling or power loss, including automatic recovery from full power loss. It is designed around the understanding that a fault condition may not be repaired for weeks or months, and can do automated risk analysis and mitigation.

That's an Anvil! cluster!

An Anvil! cluster is designed so that any component in the cluster can fail, be removed and a replacement installed without needing a maintenance window. This includes power, network, compute and management systems.

Components

The minimum configuration needed to host servers on an Anvil! is this;

Simplest Anvil! system

  • Management Layer: Striker Dashboard 1, Striker Dashboard 2
  • Anvil! Node (Node 1): Subnode 1, Subnode 2
  • Foundation Pack 1:
      • Ethernet Switch 1, Ethernet Switch 2
      • Switched PDU 1, Switched PDU 2
      • UPS 1, UPS 2

With this configuration, you can host as many servers as you would like, limited only by the resources of Node 1 (itself made of a pair of physical nodes with your choice of processing, RAM and storage resources).

Scaling

To add capacity for hosted servers, individual nodes can be upgraded (online!), and/or additional nodes can be added. There is no hard limit on how many nodes can be in a given cluster.

Each 'Foundation Pack' can handle as many nodes as you'd like, though for reasons we'll explain in more detail later, it is recommended to run two to four nodes per foundation pack.

Management Layer; Striker Dashboards

The management layer, made up of the Striker dashboards, has no hard limit on how many node blocks it can manage. All node blocks record their data to the Strikers (to offload processing and storage loads). There is a practical limit to how many node blocks can use the Strikers, but this can be accounted for in the hardware selected for the dashboards.

Nodes

An Anvil! cluster uses one or more nodes, with each node being a pair of matched physical subnodes configured as a single logical unit. The power of a given node block is set by you and based on the loads you expect to place on it.

There is no hard limit on how many node blocks exist in an Anvil! cluster. Your servers will be deployed across the node blocks and, when you want to add more servers than you currently have resources for, you simply add another node block.

Foundation Packs

A foundation pack is the power and ethernet layer that feeds into one or more node blocks. At its most basic, it consists of three pairs of equipment;

  • Two stacked (or VLT-domain'ed) ethernet switches.
  • Two switched PDUs (network-switched power bars).
  • Two UPSes.

Each UPS feeds one PDU, forming two separate "power rails". Ethernet switches and all sub-nodes are equipped with redundant PSUs, with one PSU fed by each power rail.

In this way, any component in the foundation pack can fault, and all equipment will continue to have power and ethernet resources available. How many Anvil! node-pairs can be run on a given foundation pack is limited only by the sizing of the selected foundation pack equipment.

Configuration

Note: This is SUPER basic and minimal at this stage.

Striker Dashboards

Striker dashboards are often described as "cheap and cheerful", generally being a fairly small and inexpensive device, like a Dell Optiplex 3090, Intel NUC, or similar.

You can choose any vendor you wish, but when selecting hardware, be mindful that all Scancore data is stored in PostgreSQL databases running on each dashboard. As such, we recommend an Intel Core i5 or AMD Ryzen 5 class CPU, 8 GiB or more of RAM, ~250 GiB of SSD storage (mixed use, decent IOPS) and two ethernet ports.

Striker Dashboards host the Striker web interface, and act as a bridge between your IFN network and the Anvil! cluster's BCN management network. As such, they must have a minimum of two ethernet ports.

Node Pairs

An Anvil! Node Pair is made up of two identical physical machines. These two machines act as a single logical unit, providing fault tolerance and automated live migrations of hosted servers to mitigate against predicted hardware faults.

Each sub-node (a single hardware node) must have;

  • Redundant PSUs
  • Six ethernet ports (eight recommended). If six, use 3x dual-port. If eight, 2x quad port will do.
  • Redundant storage (RAID level 1 (mirroring) or level 5 or 6 (striping with parity)), with sufficient capacity and IOPS to host the servers that will run on each pair.
  • IPMI (out-of-band) management ports. A dedicated network interface is strongly recommended.
  • Sufficient CPU core count and core speed for expected hosted servers.
  • Sufficient RAM for the expected servers (note that the Anvil! reserves 8 GiB).
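
As a rough pre-flight check of the ethernet port requirement above, the following sketch counts hardware-backed network interfaces on a running Linux host by looking for a device entry under /sys/class/net. The function names and the overridable sysfs path are illustrative assumptions, not part of the Anvil! tooling.

```shell
# Hypothetical helpers; not part of the Anvil! tooling.

# Count interfaces backed by hardware. Virtual interfaces such as lo,
# bridges and bonds have no "device" entry in sysfs.
count_physical_nics() {
    local sysfs_net="${1:-/sys/class/net}"   # overridable for testing
    local count=0 iface
    for iface in "$sysfs_net"/*; do
        if [ -e "$iface/device" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

# Map a port count onto the guidance above: six is the minimum,
# eight is recommended.
classify_nic_count() {
    local ports="$1"
    if [ "$ports" -ge 8 ]; then
        echo "recommended"
    elif [ "$ports" -ge 6 ]; then
        echo "minimum"
    else
        echo "insufficient"
    fi
}
```

On a candidate subnode, classify_nic_count "$(count_physical_nics)" would then report where the machine stands against the six or eight port guidance.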

Disaster Recovery (DR) Host

Optionally, a "third node" of a sort can be added to a node-pair. This is called a DR Host, and should be (but doesn't have to be) identical to the node pair hardware it is extending.

A DR (disaster recovery) Host acts as a remotely hosted "third node" that can be manually pressed into service in a situation where both nodes in a node pair are destroyed. A common example would be a DR Host being in another building on a campus installation, or on the far side of the building / vessel.

A DR host can in theory be in another city, but storage replication speeds and latency need to be considered. Storage replication within a node pair is synchronous, whereas replication to DR can be asynchronous. However, storage loads must still be considered to ensure that replication can keep up with the rate of data change.

Foundation Pack Equipment

The Anvil! is, fundamentally, hardware agnostic. That said, the hardware you select must be configured to meet the Anvil! requirements.

As we are hardware agnostic, we've created three linked pages. As we validate hardware ourselves, we will expand hardware-specific configuration guides. If you've configured foundation pack equipment not in the pages below, and you are willing, we would love to add your configuration instructions to our list.

Striker, Node and DR Host Configuration

In UEFI (BIOS), configure;

  • Striker Dashboards to power on after power loss in all cases.
  • Subnodes to stay powered off after power loss in all cases.
  • Any machines with redundant PSUs to balance the load across PSUs (don't use "hot spare" mode, where only one PSU is active and carries the full load).

If using RAID

  • If you have two drives, configure RAID level 1 (mirroring)
  • If using 3 to 8 drives, configure RAID level 5 (striping with N-1 parity)
  • If using 9+ drives, configure RAID level 6 (striping with N-2 parity)

Note that a server on a given node-pair will have its data mirrored, effectively creating a sort of RAID level 11 (mirror of mirrors), 15 (mirror of N-1 stripes) or 16 (mirror of N-2 stripes). This is why we're comfortable pushing RAID level 5 to 8 disks.
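
The drive-count guidance above can be written down as a small lookup. This is only a sketch of the rule as stated here; raid_level_for_drives is a hypothetical helper name, not an Anvil! command.

```shell
# Hypothetical helper: suggest a RAID level from the number of drives,
# following the guidance above (2 -> RAID 1, 3 to 8 -> RAID 5, 9+ -> RAID 6).
raid_level_for_drives() {
    local drives="$1"
    # Reject anything that is not a whole number.
    case "$drives" in
        ''|*[!0-9]*)
            echo "error: drive count must be a whole number" >&2
            return 2
            ;;
    esac
    if [ "$drives" -lt 2 ]; then
        echo "error: redundancy needs at least two drives" >&2
        return 1
    elif [ "$drives" -eq 2 ]; then
        echo 1
    elif [ "$drives" -le 8 ]; then
        echo 5
    else
        echo 6
    fi
}
```

For example, raid_level_for_drives 6 prints 5, matching the "3 to 8 drives" rule.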

Installation of Base OS

For all three machine types (Striker dashboards, node-pair subnodes and DR hosts), begin with a minimal RHEL 8 or CentOS Stream 8 install.

Note: This tutorial assumes an existing understanding of installing RHEL 8. If you are new to RHEL, you can set up a free Red Hat account, and then follow their installation guide.

Base OS Install

Note: Every effort has been made in the development of the Anvil! to ensure it will work with localisations. However, parsing of command output has been tested with Canadian and American English. As such, it is recommended that you install using one of these localisations. If you use a different localisation, and run into any problems, please let us know and we will try to add support.

Localisation

Choose your localisation;

Localisation selection.

KDUMP

Disable kdump. This means kernel dumps won't be collected if the OS crashes, but the host will recover faster. If you want to leave kdump enabled, that is fine, but be aware of the slower recovery times. Note that a subnode getting fenced will be forced off, and so kernel dumps won't be collected regardless of this configuration.

Disable kdump.

Network & Host Name

Set the host name for the machine. It's useful to do this before configuring the network, so that the volume group name includes the host's short host name. This doesn't affect the operation of the Anvil! system, but it can assist with debugging down the road.

Note: Don't worry about configuring the network, this will be handled by the Anvil! later. Setting the IFN IP at this stage can be useful, but is not required.
Set the host name.

Time & Date

Setting the timezone is very much specific to you and your install. The most important part is that the time zone is set consistently across all machines in the Anvil! cluster.

Setting the timezone consistently on all Anvil! cluster systems.
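
Once the machines are installed, the consistency requirement can be checked from any host with SSH access to the others. The sketch below is an assumption-laden illustration: timezones_consistent is a hypothetical helper that reads "host timezone" pairs on standard input, and the host names in the usage comment are placeholders.

```shell
# Hypothetical helper: reads "host timezone" pairs on stdin and succeeds
# only if every host reports the same time zone.
timezones_consistent() {
    local zones count
    # Keep the second field (the zone) of each line and de-duplicate.
    zones="$(awk 'NF >= 2 { print $2 }' | sort -u)"
    if [ -z "$zones" ]; then
        echo "no timezone data" >&2
        return 2
    fi
    count="$(printf '%s\n' "$zones" | grep -c .)"
    if [ "$count" -eq 1 ]; then
        echo "consistent: $zones"
    else
        # Join the distinct zones with commas for a one-line report.
        echo "mismatch: $(printf '%s\n' "$zones" | paste -sd, -)"
        return 1
    fi
}

# Example usage (host names are placeholders; assumes key-based SSH and a
# systemd recent enough to provide "timedatectl show", as on RHEL 8):
#
#   for host in striker01 striker02 subnode01 subnode02; do
#       printf '%s %s\n' "$host" "$(ssh "$host" timedatectl show -p Timezone --value)"
#   done | timezones_consistent
```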

Software Selection

All machines can start with a Minimal Install. On Strikers, if you'd prefer to use Server With GUI, that is fine, but it is not needed at this step. The anvil-striker RPM will pull in the graphical interface.

Note: If you select a graphical install on a Striker Dashboard, create a user called admin and set a password for that user.
Selecting the Minimal Install.

Installation Destination

Note: It is strongly suggested to set the host name before configuring storage.
Note: This is where the installation of a Striker dashboard will differ from an Anvil! Node's sub-node or DR host.

In this example, there is a single hard drive that will be configured. It's entirely valid to have a dedicated OS drive and use a second drive for hosting servers. If you're planning to use a different storage plan, then you can ignore this stage. The key requirement is that there is unused space sufficiently large to host the servers you plan to run on a given node or DR host.

Striker Dashboards Anvil! Subnodes and DR Hosts
Striker Dashboard Drive Selection.
Subnode or DR Host Drive Selection.

Click on "Click here to create them automatically". This will create the base storage configuration, which we will adapt.

Striker Dashboards Anvil! Subnodes and DR Hosts
Striker Dashboard auto-configured disk layout.
Subnode or DR Host auto-configured disk layout.

In all cases, the auto-created /home logical volume will be deleted.

  • For Striker dashboards, after deleting /home, assign the freed space to the / partition. To do this, select the / partition, and set the Desired Capacity to some much larger size than is available (like 1TiB), and click on Update Setting. The size will change to the largest valid value.
  • For Anvil! subnodes and DR hosts, simply delete the /home partition, and do not give the free space to /. The space freed up by deleting /home will be used later for hosting servers.
Striker Dashboards Anvil! Subnodes and DR Hosts
Striker Dashboard /home LV deleted.
Subnode or DR Host /home LV deleted.
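
After the OS is installed, one way to confirm that a subnode or DR host really has unallocated space left for servers is to look at the free space in its volume groups. The sketch below parses the output of the LVM vgs command; vg_with_free_gib is a hypothetical helper name, and the threshold you pass is whatever capacity you plan to need.

```shell
# Hypothetical helper: reads "vg_name vg_free" lines (free space in GiB)
# on stdin and prints each volume group with at least <needed_gib> free.
# Exits non-zero if no volume group qualifies.
vg_with_free_gib() {
    local needed="$1"
    awk -v need="$needed" '
        NF >= 2 && ($2 + 0) >= (need + 0) { print $1; ok = 1 }
        END { exit (ok ? 0 : 1) }
    '
}

# Example usage on a subnode (run as root; "g" reports GiB in LVM):
#
#   LC_ALL=C vgs --noheadings --nosuffix --units g -o vg_name,vg_free \
#       | vg_with_free_gib 100
```

On a Striker dashboard this check is not needed, since the space freed from /home is given to /.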

From this point forward, the rest of the OS install is the same for all systems.

Optional; Connect to Red Hat

If you are installing RHEL 8, as opposed to CentOS Stream 8, you can register the server during installation. If you don't do this, the Anvil! will give you a chance to register the server during the installation process also.

Registering the system with Red Hat.

Root Password

Set the root user password.

Setting the root user password.

Begin Installation

With everything selected, click on Begin Installation. When the install has completed, reboot into the minimal install.

Ready to install!

Post OS Install Configuration

Setting up the Alteeve repos is the same on all machines, but after that, the steps start to diverge depending on which machine type we're setting up in the Anvil! cluster.

Installing the Alteeve Repo

Note: Our repo pulls in a bunch of other packages that will be needed shortly.

There are two Alteeve repositories that you can install; Community and Enterprise. Which one is used is selected after the repository RPM is installed. Let's install the repo RPM, and then we will discuss the differences before we select one.

dnf install https://www.alteeve.com/an-repo/m3/anvil-release-latest.noarch.rpm
Updating Subscription Management repositories.
Last metadata expiration check: 0:39:42 ago on Fri 28 Jul 2023 04:21:39 PM EDT.
anvil-release-latest.noarch.rpm                                                                    59 kB/s |  12 kB     00:00    
Dependencies resolved.
==================================================================================================================================
 Package                         Arch      Version                                      Repository                           Size
==================================================================================================================================
Installing:
 alteeve-release                 noarch    0.1-2                                        @commandline                         12 k
Installing dependencies:
 dwz                             x86_64    0.12-10.el8                                  rhel-8-for-x86_64-appstream-rpms    109 k
 efi-srpm-macros                 noarch    3-3.el8                                      rhel-8-for-x86_64-appstream-rpms     22 k
 ghc-srpm-macros                 noarch    1.4.2-7.el8                                  rhel-8-for-x86_64-appstream-rpms    9.4 k
 go-srpm-macros                  noarch    2-17.el8                                     rhel-8-for-x86_64-appstream-rpms     13 k
 make                            x86_64    1:4.2.1-11.el8                               rhel-8-for-x86_64-baseos-rpms       498 k
 ocaml-srpm-macros               noarch    5-4.el8                                      rhel-8-for-x86_64-appstream-rpms    9.5 k
 openblas-srpm-macros            noarch    2-2.el8                                      rhel-8-for-x86_64-appstream-rpms    8.0 k
 perl                            x86_64    4:5.26.3-422.el8                             rhel-8-for-x86_64-appstream-rpms     73 k
<...snip...> 
 perl-version                    x86_64    6:0.99.24-1.el8                              rhel-8-for-x86_64-appstream-rpms     67 k
 python-rpm-macros               noarch    3-45.el8                                     rhel-8-for-x86_64-appstream-rpms     16 k
 python-srpm-macros              noarch    3-45.el8                                     rhel-8-for-x86_64-appstream-rpms     16 k
 python3-pyparsing               noarch    2.1.10-7.el8                                 rhel-8-for-x86_64-baseos-rpms       142 k
 python3-rpm-macros              noarch    3-45.el8                                     rhel-8-for-x86_64-appstream-rpms     15 k
 qt5-srpm-macros                 noarch    5.15.3-1.el8                                 rhel-8-for-x86_64-appstream-rpms     11 k
 redhat-rpm-config               noarch    131-1.el8                                    rhel-8-for-x86_64-appstream-rpms     91 k
 rust-srpm-macros                noarch    5-2.el8                                      rhel-8-for-x86_64-appstream-rpms    9.3 k
 systemtap-sdt-devel             x86_64    4.8-2.el8                                    rhel-8-for-x86_64-appstream-rpms     88 k
 unzip                           x86_64    6.0-46.el8                                   rhel-8-for-x86_64-baseos-rpms       196 k
 zip                             x86_64    3.0-23.el8                                   rhel-8-for-x86_64-baseos-rpms       270 k
Installing weak dependencies:
 perl-Encode-Locale              noarch    1.05-10.module+el8.3.0+6498+9eecfe51         rhel-8-for-x86_64-appstream-rpms     22 k
 perl-IO-Socket-SSL              noarch    2.066-4.module+el8.3.0+6446+594cad75         rhel-8-for-x86_64-appstream-rpms    298 k
 perl-Mozilla-CA                 noarch    20160104-7.module+el8.3.0+6498+9eecfe51      rhel-8-for-x86_64-appstream-rpms     15 k
 perl-TermReadKey                x86_64    2.37-7.el8                                   rhel-8-for-x86_64-appstream-rpms     40 k
Enabling module streams:
 perl                                      5.26                                                                                  
 perl-IO-Socket-SSL                        2.066                                                                                 
 perl-libwww-perl                          6.34                                                                                  

Transaction Summary
==================================================================================================================================
Install  159 Packages

Total size: 23 M
Total download size: 23 M
Installed size: 64 M
Is this ok [y/N]:
Downloading Packages:
(1/158): perl-Scalar-List-Utils-1.49-2.el8.x86_64.rpm                                             173 kB/s |  68 kB     00:00    
(2/158): perl-Data-Dumper-2.167-399.el8.x86_64.rpm                                                147 kB/s |  58 kB     00:00    
(3/158): perl-PathTools-3.74-1.el8.x86_64.rpm                                                     155 kB/s |  90 kB     00:00    
(4/158): perl-threads-shared-1.58-2.el8.x86_64.rpm                                                127 kB/s |  48 kB     00:00    
(5/158): perl-Unicode-Normalize-1.25-396.el8.x86_64.rpm                                           431 kB/s |  82 kB     00:00    
(6/158): zip-3.0-23.el8.x86_64.rpm                                                                896 kB/s | 270 kB     00:00    
(7/158): perl-MIME-Base64-3.15-396.el8.x86_64.rpm                                                 101 kB/s |  31 kB     00:00    
(8/158): perl-Pod-Simple-3.35-395.el8.noarch.rpm                                                  879 kB/s | 213 kB     00:00    
<...snip...> 
(156/158): python-srpm-macros-3-45.el8.noarch.rpm                                                  65 kB/s |  16 kB     00:00    
(157/158): perl-open-1.11-422.el8.noarch.rpm                                                      818 kB/s |  78 kB     00:00    
(158/158): perl-utils-5.26.3-422.el8.noarch.rpm                                                   269 kB/s | 129 kB     00:00    
----------------------------------------------------------------------------------------------------------------------------------
Total                                                                                             1.6 MB/s |  23 MB     00:14     
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
  Preparing        :                                                                                                          1/1 
  Installing       : perl-Digest-1.17-395.el8.noarch                                                                        1/159 
  Installing       : perl-Digest-MD5-2.55-396.el8.x86_64                                                                    2/159 
  Installing       : perl-Data-Dumper-2.167-399.el8.x86_64                                                                  3/159 
  Installing       : perl-libnet-3.11-3.el8.noarch                                                                          4/159 
  Installing       : perl-Net-SSLeay-1.88-2.module+el8.6.0+13392+f0897f98.x86_64                                            5/159 
<...snip...> 
  Installing       : perl-CPAN-2.18-397.el8.noarch                                                                        156/159 
  Installing       : perl-Encode-devel-4:2.97-3.el8.x86_64                                                                157/159 
  Installing       : perl-4:5.26.3-422.el8.x86_64                                                                         158/159 
  Installing       : alteeve-release-0.1-2.noarch                                                                         159/159 
  Running scriptlet: alteeve-release-0.1-2.noarch                                                                         159/159 
  Verifying        : perl-Scalar-List-Utils-3:1.49-2.el8.x86_64                                                             1/159 
  Verifying        : perl-PathTools-3.74-1.el8.x86_64                                                                       2/159 
  Verifying        : perl-Data-Dumper-2.167-399.el8.x86_64                                                                  3/159 
  Verifying        : perl-threads-shared-1.58-2.el8.x86_64                                                                  4/159 
  Verifying        : perl-Encode-4:2.97-3.el8.x86_64                                                                        5/159 
<...snip...> 
  Verifying        : python-srpm-macros-3-45.el8.noarch                                                                   155/159 
  Verifying        : perl-SelfLoader-1.23-422.el8.noarch                                                                  156/159 
  Verifying        : perl-open-1.11-422.el8.noarch                                                                        157/159 
  Verifying        : perl-utils-5.26.3-422.el8.noarch                                                                     158/159 
  Verifying        : alteeve-release-0.1-2.noarch                                                                         159/159 
Installed products updated.

Installed:
  alteeve-release-0.1-2.noarch                                    dwz-0.12-10.el8.x86_64                                         
  efi-srpm-macros-3-3.el8.noarch                                  ghc-srpm-macros-1.4.2-7.el8.noarch                             
<...snip...> 
  redhat-rpm-config-131-1.el8.noarch                              rust-srpm-macros-5-2.el8.noarch                                
  systemtap-sdt-devel-4.8-2.el8.x86_64                            unzip-6.0-46.el8.x86_64                                        
  zip-3.0-23.el8.x86_64                                          

Complete!

Selecting a Repository

There are two released versions of the Anvil! cluster. There are pros and cons to both options;

Community Repo

The Community repository is the free repo that anyone can use. As new builds pass our CI/CD test infrastructure, the versions in this repository are automatically built.

This repository always has the latest and greatest from Alteeve. We use Jenkins and a proprietary test suite to ensure that the quality of the releases is excellent. Of course, Alteeve is a company of humans, and there's always a small chance that a bug could get through. Our free community repository is community supported, and it's our wonderful users who help us improve and refine our Anvil! platform.

Enterprise Repo

The Enterprise repository is the paid-access repository. The releases in the enterprise repo are "cherry picked" by Alteeve, and subjected to more extensive testing and QA. This repo is designed for businesses who want the most stable releases.

Using this repo also opens up the option of active monitoring of your Anvil! cluster by Alteeve!

If you choose to get the Enterprise repo, please contact us and we will provide you with a custom repository key.

Configuring the Alteeve Repo

To configure the repo, we will use the alteeve-repo-setup program that was just installed.

You can see a full list of options, including the use of --key <uuid> to enable the Enterprise Repo. For this tutorial, we will configure the community repo.

alteeve-repo-setup
You have not specified an Enterprise repo key. This will enable the community
repository. We work quite hard to make it as stable as we possibly can, but it
does lead Enterprise.
Proceed? [y/N]:
Writing: [/etc/yum.repos.d/alteeve-anvil.repo]...
Repo: [rhel-8] created successfuly.

This is RHEL 8. Once subscribed, please enable this repo;
# subscription-manager repos --enable codeready-builder-for-rhel-8-x86_64-rpms

NOTE: On *nodes*, also add the High-Availability Addon repo as well;
# subscription-manager repos --enable rhel-8-for-x86_64-highavailability-rpms

If you are installing using CentOS Stream 8, you are done now and can move on.

If you are using RHEL 8 proper and the system is now subscribed, we need to enable additional repositories.
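
Going by the alteeve-repo-setup output quoted above, the repositories to enable depend on the machine type. The sketch below only prints the subscription-manager commands so they can be reviewed first. The function names and the role keywords are illustrative assumptions, and DR hosts are deliberately left out because the quoted output does not say which repositories they need.

```shell
# Hypothetical helper: list the extra RHEL 8 repositories for a role,
# based only on the alteeve-repo-setup output quoted above.
rhel_repos_for_role() {
    case "$1" in
        striker)
            echo "codeready-builder-for-rhel-8-x86_64-rpms"
            ;;
        node)
            echo "codeready-builder-for-rhel-8-x86_64-rpms"
            echo "rhel-8-for-x86_64-highavailability-rpms"
            ;;
        *)
            echo "error: unknown or uncovered role '$1'" >&2
            return 1
            ;;
    esac
}

# Print (not run) one subscription-manager command per repository.
print_enable_commands() {
    local repos repo
    repos="$(rhel_repos_for_role "$1")" || return 1
    for repo in $repos; do
        echo "subscription-manager repos --enable $repo"
    done
}
```

Running print_enable_commands node prints the two commands shown in the tool's output; once they look right, they can be run as root on a subscribed system.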


Enabling the Alteeve Repo

On all machines, post OS install, add the Anvil! repo;

dnf -y install https://www.alteeve.com/an-repo/m3/anvil-release-latest.noarch.rpm

Once installed, you need to enable the repository.

Repository Options
  • Community: This is free, and it gets the latest updates generated by our CI/CD testing infrastructure.
  • Commercial: This is paid and includes a support contract. Updates are curated for maximum reliability.

If you purchase a commercial support agreement, you will be provided a key to access the commercial repository.

To enable the Anvil! repository, run either:

Repository Options
  • Community: alteeve-repo-setup
  • Commercial: alteeve-repo-setup -k <key>

Now update the OS;

dnf update

If the kernel was updated, it is recommended (but not required) to reboot now.
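
To tell whether the update brought in a newer kernel than the one currently running, the running release can be compared with the most recently installed kernel package. This is a minimal sketch: reboot_recommended is a hypothetical helper, and it assumes the package is named kernel and that the most recently installed one is the newest.

```shell
# Hypothetical helper.
# Usage: reboot_recommended <running-release> <newest-kernel-package>
# Succeeds (exit 0) when the newest installed kernel is not the one running.
reboot_recommended() {
    local running="$1"
    # Strip the package name prefix: "kernel-<release>" -> "<release>".
    local newest="${2#kernel-}"
    [ "$running" != "$newest" ]
}

# Example usage:
#
#   newest="$(rpm -q --last kernel | head -n 1 | awk '{ print $1 }')"
#   if reboot_recommended "$(uname -r)" "$newest"; then
#       echo "A newer kernel is installed; a reboot is recommended."
#   fi
```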

In all cases, the Anvil! will rename network interfaces. For this to work, the biosdevname package needs to be removed.

dnf remove biosdevname

Now, we can install the anvil RPM. This next step is what defines a given machine as a Striker, Node or DR Host. As such, be careful to install the right one for the machine you're configuring.

Install the Anvil! RPM
  • Striker Dashboard: dnf install anvil-striker
  • Node: dnf install anvil-node
  • DR Host: dnf install anvil-dr
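
Because installing the wrong RPM defines the machine as the wrong type, it can help to generate the command list from an explicit machine type and read it before running anything. The sketch below only prints the commands from this section in order; the function names and the type keywords (striker, node, dr) are illustrative assumptions.

```shell
# Hypothetical helper: map a machine type to its Anvil! RPM.
anvil_package_for_type() {
    case "$1" in
        striker) echo "anvil-striker" ;;
        node)    echo "anvil-node" ;;
        dr)      echo "anvil-dr" ;;
        *)
            echo "error: unknown machine type '$1'" >&2
            return 1
            ;;
    esac
}

# Print (not run) the post-repo steps from this section, in order.
post_install_plan() {
    local pkg
    pkg="$(anvil_package_for_type "$1")" || return 1
    printf '%s\n' \
        "dnf update" \
        "dnf remove biosdevname" \
        "dnf install $pkg"
}
```

For example, post_install_plan node prints the update, the biosdevname removal and dnf install anvil-node, which can then be run by hand as root.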



 

Any questions, feedback, advice, complaints or meanderings are welcome.
Us: Alteeve's Niche! Support: Mailing List IRC: #clusterlabs on Libera Chat
© Alteeve's Niche! Inc. 1997-2023   Anvil! "Intelligent Availability™" Platform
legal stuff: All info is provided "As-Is". Do not use anything here unless you are willing and able to take responsibility for your own actions.