Recovery in the Event of a Disaster

Disaster recovery for Resonate RFID Reader Management differs depending on whether it is installed in a single-node or multi-node configuration. In the event of a disaster, such as a crashed drive or failed server node, the first step is to get a new node up and running; the second is to ensure the readers are re-attached to the system at the new node.

Replacing a Failed Resonate Node in a Multi-Node Configuration

When preparing to install the Resonate server on multiple nodes, a common way to decide how many nodes are needed is to double the minimum number of nodes required to run the system, then add one as a spare for a higher level of redundancy and fault tolerance. This 2n+1 configuration is considered a good level of redundancy in much of the commercial IT industry and can withstand multiple component failures with near-zero downtime. In general, Resonate RFID Reader Management can manage up to 2,000 readers using the requirements described in the Resonate RFID Reader Management Software Installation Guide; this is the maximum that the standard Resonate license SKUs currently support. Therefore, a very reliable system can be run with 2n+1 nodes, where n is the number of nodes that can be down simultaneously. For example, to allow 1 node to be down at a given time, use 3 nodes. Increase n for additional redundancy with the current Resonate SKUs. In the future, n will also increase as additional nodes are added to support higher reader counts with future high-scalability Resonate license SKUs.
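The 2n+1 sizing rule above can be sketched as a one-line calculation. The helper below is purely illustrative and not part of the Resonate software:

```python
def nodes_required(n: int) -> int:
    """Total nodes for a 2n+1 deployment, where n is the number of
    nodes that may be down simultaneously with no loss of service."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return 2 * n + 1

# Tolerating 1 simultaneous node failure requires 3 nodes;
# tolerating 2 requires 5, and so on.
```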
In the case of Resonate multi-node operations, if a disaster removes one node completely, the existing high-availability support will simply continue running with no loss of performance or data, but without the additional redundancy and increased reliability. After a new node is in place and assigned to the Kubernetes high-availability cluster, the node is automatically recovered and becomes part of the cluster, and data is replicated to it. At that point, the redundancy and increased reliability are back in place.
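One way to confirm that the replacement node has rejoined the cluster is to check node readiness. The sketch below parses the plain-text output of `kubectl get nodes`; the node names and versions in the sample are invented for illustration:

```python
def ready_nodes(kubectl_output: str) -> list:
    """Return the names of nodes reporting Ready status, given the
    plain-text output of `kubectl get nodes` (header row skipped)."""
    names = []
    for line in kubectl_output.strip().splitlines()[1:]:
        fields = line.split()
        if len(fields) >= 2 and fields[1] == "Ready":
            names.append(fields[0])
    return names

# Hypothetical output captured after the new node was assigned:
sample = """NAME      STATUS     ROLES    AGE   VERSION
node-a    Ready      <none>   12d   v1.27.3
node-b    Ready      <none>   12d   v1.27.3
node-c    NotReady   <none>   2m    v1.27.3
"""
```

Once the replacement node appears with a Ready status, replication to it proceeds automatically as described above.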

Replacing a Failed Resonate Node in a Single-Node Configuration

When Resonate is configured to run as a single-node system, there is no additional redundancy or reliability beyond that of the base software and the computer it runs on. In this case, a catastrophic failure of the node can still be recovered by restoring the full backup onto a new node computer and restarting it. This naturally entails a longer downtime than the HA multi-node configuration, but it is a very straightforward task. One key issue is to ensure that when readers re-attach, they can access the correct Resonate control and management data queues and service endpoint addresses. To facilitate this, it is recommended to use the same hostname and fully qualified domain name (FQDN) on the new node as was used on the node that experienced the disaster. This can be handled via a DNS alias, via common IT tools, or via manual reassignment. When complete, there will be a new node, just like the old one, ready to go.
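Before letting readers re-attach, it can be worth a quick sanity check that the DNS alias is in place and the old FQDN now resolves to the replacement node. A minimal sketch using only the standard library (the function name is illustrative):

```python
import socket

def same_endpoint(old_fqdn: str, new_fqdn: str) -> bool:
    """True if both names resolve to the same address, meaning readers
    that reconnect by the old FQDN will reach the replacement node."""
    return socket.gethostbyname(old_fqdn) == socket.gethostbyname(new_fqdn)
```

If this returns False, the DNS alias or hostname reassignment has not yet propagated, and readers may fail to reach the Resonate service endpoints.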

Ensuring Readers are Re-attached to the New Resonate Node

After ensuring your new node is up and running as described above, you must send a Restart command from Resonate RFID Reader Management to all the readers. The restart causes the on-reader Resonate Agent app to restart, which in turn resends all the required information about the state and configuration of the reader to Resonate. This repopulates any live queues lost during the disaster, which is the last of the Resonate data that needs to be recovered. The Restart command is a menu item in the Device Settings menu at the far right of each reader's row on the Infrastructure > Devices page. It is also available as a multi-select action from the Device Settings button in the column header, letting you restart Resonate Agent on up to 100 readers at a time.
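If you are scripting the recovery around this step (for example, driving an automation tool, which is not shown here), the 100-reader limit on the multi-select action implies batching the reader list. A minimal sketch:

```python
def restart_batches(reader_ids, batch_size=100):
    """Split reader IDs into batches no larger than the 100-reader
    limit of the multi-select Restart action."""
    return [reader_ids[i:i + batch_size]
            for i in range(0, len(reader_ids), batch_size)]

# A site with 250 readers would need three Restart actions:
# two batches of 100 and one batch of 50.
```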