Introduction

Many applications need high availability (HA): the ability to continue functioning even in the case of a hardware or software fault. The usual way of providing high availability is to duplicate system components, so that if one becomes unavailable, another can be used in its place.

HAS model

HAS provides a very simple HA model with one active and one standby node located on different hosts. The scenario is the following: the application opens the storage and then runs a request-processing loop. Before returning from the open method, the active node establishes a connection with the standby node. The standby node does not return from the open method until the active node terminates, either normally or abnormally. If the active node terminates normally, open returns false at the standby node. If the active node fails, open returns true at the standby node.

Since the application is organized as an event-serving loop, there is no need to save and transfer the program's execution context: the standby node simply continues the loop iterations. So the same program can be used at both the active and the standby nodes. Moreover, the nodes can switch roles. If active node A crashes, standby node B continues execution and becomes the active node. If node A is restarted, it connects to node B and becomes the standby node for B. If B then crashes, A continues execution once again.

High availability can be provided with minimal impact on the application code. HAS provides special new and delete operators, which allocate objects in distributed shared memory. This shared memory segment is mapped to the same virtual address on the active and standby nodes, so the application can use ordinary C++ pointers. HAS provides two modes:

  1. Explicit specification of the storage in which an object should be allocated. In this case the application can create both private and shared objects. Private objects are not replicated by HAS and can be used locally by the application.
  2. Redefinition of the default operator new. In this case all objects are shared and allocated from the HAS pool. The application can still use the malloc/free functions, or derive a class from the haTransient class, to create private objects.

If you uncomment the definition of REDEFINE_GLOBAL_NEW, the default operator new will be redefined. If you do not want all objects to be shared implicitly, just comment this line out and rebuild HAS.

To preserve the consistency of shared data, HAS requires the programmer to invoke the commit method periodically. This method transfers all modified pages of the shared pool to the standby node. These pages are copied into the standby node's pool only after all of them have been received from the active node, so a transaction is either applied atomically or ignored.

HAS uses the virtual memory protection mechanism to detect modified pages. Initially it write-protects the pages of the shared pool. When the application tries to modify data in the shared pool, a page fault occurs. HAS provides a handler for this fault, which marks the page in a bitmap and enables writes to that page. When the transaction is committed, all modified pages are sent to the standby node.

HAS can map the shared pool onto a file, so that data is saved between sessions, or it can work with the swap file. In the latter case, all data is lost after the application terminates.

HAS throws haException in case of a critical storage error. The programmer can catch and handle this exception. It contains a description of the context in which the exception was thrown and the system error code.

haManager API

All interaction between the application and HAS is performed through the haManager class. The application should create an instance of this class and use it to open and close the storage. The haManager methods are described below:


bool open(char const* fileName, 
          char const* host, 
          int         localPort, 
          int         remotePort, 
          void*       baseAddr = NULL);
Opens the storage.
Parameters
fileName - path to the storage file; if NULL, the shared pool is not mapped to any OS file and data is lost after the application terminates.
host - name of the host where the partner (standby/active) node is running.
localPort - port on the local host at which HAS accepts standby node connections.
remotePort - port on the remote computer to which the standby node connects.
baseAddr - base address in virtual memory at which the shared pool should be mapped; if NULL, the OS chooses this address automatically (the address must be the same on both nodes).
Returns
true - this application is the active node
false - this is the standby node and the active node has closed the storage normally

void commit();
Transfers all changes made by the application asynchronously to the standby node. The modified pages are placed into the standby storage atomically: either all of them are placed or none of them is (if the standby node also crashes while the pages are being copied into the shared pool, the integrity of the file on disk can be violated).

void close();
Closes the storage.

void setRootObject(void* obj);
Sets the storage root. This root can later be retrieved using the getRootObject method. The application should first check whether the root is already set before invoking this method.
Parameters
obj - pointer to a root object previously allocated in the shared pool.

void* getRootObject() const;
Retrieves a reference to the previously set root object.
Returns
reference to the previously assigned root object, or NULL if the root has not been set yet.

haManager(size_t maxSize);
High availability storage constructor.
Parameters
maxSize - maximal size of the storage. The programmer has to specify the maximal size of the storage before opening it. Exceeding this limit will cause application failure.

Example: game "Guess an animal"

"Guess an animal" is a very simple program which uses a database to store the game tree. Despite its very simple algorithm, this program shows some elements of "artificial intelligence": the more information you provide to the game, the smarter its behavior becomes. The program is structured as a dialogue loop: it asks questions and then places new information in the database. This structure exactly fits the HAS application model of a request-serving loop.

To run this application, copy the guess executable to two different nodes of the network. It is possible to run both applications on the same computer, but in that case you should run them in different directories; otherwise both will access the same data file, which causes application failure. Then choose a port on each node to be used by HAS. On the first node A, run the following command:

        guess B portA portB
And on the other node B, run the command:
        guess A portB portA
Node A will become the active node and B the standby node. Now you can enter some data and emulate crashes using Ctrl-C.

Distribution terms

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the Software), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHOR OF THIS SOFTWARE BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

I will provide e-mail support and help you with development of HAS applications.

