From: kbhat at kaxy.com
Subject: Software design for redundant systems
Date: 14 Jan 2005 12:53:33 -0800

Hi,
Is there any good literature out there on the software design of redundant systems? I have some ideas, but I am not sure if they are adequate, or even correct. I am thinking along the lines of separating the redundancy logic from the business logic. In today's world no software application is an island unto itself. Applications communicate with each other via some sort of IPC, or by accessing shared data (e.g. shared memory and files). Applications also start timers, and a lot of processing is triggered when those timers expire.
I am thinking of maintaining the notion of redundancy state in the supporting software, outside the business logic of the application. What I mean specifically is this: I will create wrapper functions around IPC system calls, IO calls and timer calls. Inside those wrapper functions, I will maintain the notion of redundancy state. For example, if the redundancy state is standby, the wrapper functions for IPC will not send out any messages, the wrapper functions for shared memory access will not access the shared memory, the wrapper functions for file access will not access shared files and the wrapper functions for timers will not set any timers. The advantage that I see with this approach is that the business logic of the application is completely oblivious to the redundancy state. When the redundancy state switches to active, lo and behold all these wrapper functions are turned ON, and they begin to work normally.
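A minimal Python sketch of the wrapper idea, under stated assumptions: `RedundancyGate`, `guarded` and `send_message` are hypothetical names, and a plain list stands in for a real IPC channel. It only illustrates that the business logic never inspects the redundancy state.

```python
import enum
import threading


class RedundancyState(enum.Enum):
    STANDBY = "standby"
    ACTIVE = "active"


class RedundancyGate:
    """Holds the redundancy state outside the business logic (hypothetical)."""

    def __init__(self, state=RedundancyState.STANDBY):
        self._state = state
        self._lock = threading.Lock()

    def set_state(self, state):
        with self._lock:
            self._state = state

    def is_active(self):
        with self._lock:
            return self._state is RedundancyState.ACTIVE

    def guarded(self, func):
        """Wrap func so that it runs only while this side is active."""
        def wrapper(*args, **kwargs):
            if not self.is_active():
                return None  # standby: skip the side effect
            return func(*args, **kwargs)
        return wrapper


gate = RedundancyGate()
sent = []  # stands in for a real IPC channel


@gate.guarded
def send_message(payload):
    sent.append(payload)
    return len(payload)


def business_logic():
    # No reference to the redundancy state here.
    return send_message("heartbeat")


standby_result = business_logic()       # suppressed while standby
gate.set_state(RedundancyState.ACTIVE)
active_result = business_logic()        # performed once active
```

One thing the sketch does not address: a timer that was suppressed on standby is not pending at switchover, so a real timer wrapper would probably need to remember the request and arm it on activation.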
An alternate approach is to make a call to a function which returns immediately on the active side, but blocks on the standby side, up until the redundancy state changes to active. While easier to implement, the disadvantage of this approach is that upon switchover, control will resume only from this point onwards.
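The blocking variant could look roughly like this in Python. It is a sketch only: `wait_until_active` and the `reached_gate` helper event are hypothetical names, and `threading.Event` stands in for whatever mechanism signals the switchover.

```python
import threading

active = threading.Event()        # set when this side becomes active
reached_gate = threading.Event()  # only so the demo can observe the worker
log = []


def wait_until_active(timeout=None):
    """Return at once on the active side; block on the standby side."""
    return active.wait(timeout)


def worker():
    log.append("initialised")
    reached_gate.set()
    wait_until_active()           # the standby side parks here
    log.append("processing")      # control resumes from this point only


t = threading.Thread(target=worker)
t.start()
reached_gate.wait()
before_switchover = list(log)     # the worker has not passed the gate yet

active.set()                      # switchover: standby becomes active
t.join()
```

Anything the worker would have done before the gate on the active side has to be placed after it, which is the disadvantage described above.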
No discussion of redundancy is complete without a discussion of data synchronization and the need for checkpointing. Data synchronization of persistent data seems to be a lot easier than data synchronization of memory-resident data. In the former case we could potentially rely on external utilities and operating system capabilities (e.g. timestamps on files) to maintain this synchronization, using some criterion (e.g. time-based or number of updates).
For synchronization of memory-resident data, I have the following in mind. I "register" a certain region of process memory with a "memory duplication service". This service runs on the active and standby side in its own thread. Any data that is written anywhere in this region of memory on the active side gets copied to the standby side. Of course the physical memory address values inside the two instances of the application (primary and secondary) will be different, but within these address spaces relative offsets will be the same (after all it is the same software that runs in both active and standby mode). To duplicate some data from active to standby, you merely need to provide its offset from the beginning of the region and its size. If more than one region of memory is "registered", the memory region identifier also needs to be provided.
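As a rough illustration of the offset-based scheme, here is a Python sketch. The class `MemoryDuplicationService` and its methods are hypothetical, a `bytearray` stands in for the registered memory region, and the transport between the two sides (and the service's own thread) is left out: the update record is simply handed over directly.

```python
class MemoryDuplicationService:
    """Hypothetical sketch: replicates writes by (region id, offset, bytes)."""

    def __init__(self):
        self._regions = {}  # region id -> local buffer

    def register(self, region_id, size):
        buf = bytearray(size)
        self._regions[region_id] = buf
        return buf

    def _check(self, buf, offset, data):
        if offset < 0 or offset + len(data) > len(buf):
            raise ValueError("access outside the registered region")

    def write(self, region_id, offset, data):
        """Active side: write locally, return the update record to ship."""
        buf = self._regions[region_id]
        self._check(buf, offset, data)
        buf[offset:offset + len(data)] = data
        return (region_id, offset, bytes(data))

    def apply(self, update):
        """Standby side: apply the record at the same relative offset."""
        region_id, offset, data = update
        buf = self._regions[region_id]
        self._check(buf, offset, data)
        buf[offset:offset + len(data)] = data


active_side = MemoryDuplicationService()
standby_side = MemoryDuplicationService()
active_buf = active_side.register("counters", 16)
standby_buf = standby_side.register("counters", 16)

update = active_side.write("counters", 4, b"\x01\x02\x03")
standby_side.apply(update)  # in practice this would cross the network
```

Because only relative offsets are shared, raw pointers stored inside a registered region would not be meaningful on the other side; the region would have to hold offsets or plain values.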
I have looked far and wide for standards on redundancy management. The only standard that I have found so far is X.751 from ITU-T. However, this standard deals only with the management aspect of redundancy. Unfortunately the document reads like scripture: it is extremely cryptic and takes at least a few readings before you get it. For example, it took me a long time to realize that PRIMARY and SECONDARY are roles in the fallback relationship, while BACKEDUP and BACKUP are roles in the backup relationship. I had initially assumed them to be synonymous.
To wrap up, I would appreciate it if someone could suggest some software strategies for building redundant systems.
Regards, Bhat

From: Patrik Servin
Subject: Re: Software design for redundant systems
Date: Sat, 15 Jan 2005 05:11:52 GMT

> To wrap up, I would appreciate if someone could provide some software
> strategies for building redundant systems.
There are a number of design patterns that describe different strategies for redundancy. I would start by checking them out.

From: kbhat at kaxy.com
Subject: Re: Software design for redundant systems
Date: 17 Jan 2005 14:53:47 -0800

> I have tried to look far and wide to see if there are any standards for
> redundancy management. The only standard that I have found so far is
> X.751 from ITU-T. However, this standard only deals with the
> management aspect of redundancy management. Unfortunately this
> document reads like scripture ---- extremely cryptic that takes at
> least a few readings before you get it. For example it took me a long
> time to realize that PRIMARY and SECONDARY are roles in the fallback
> relationship, while BACKEDUP and BACKUP are roles in the backup
> relationship. I had initially assumed them to be synonymous.
>
> To wrap up, I would appreciate if someone could provide some software
> strategies for building redundant systems.
>
> Regards,
> Bhat
According to ITU-T X.732 (Attributes for Representing Relationships), "A fallback relationship is an asymmetric relationship denoting that the second of a pair of managed objects (the secondary object) has been designated as a fallback or "next preferred choice" to the first managed object (the primary object). The existence of a fall back relationship implies that the secondary resource is capable of providing Back-up service to the primary resource if the latter is unable to fulfil its function. It does not necessarily imply that the secondary resource is currently active and performing its Back-up function in place of the primary resource. Primary and secondary are two roles in a fallback relationship.
"A back-up relationship is an asymmetric relationship denoting that the second of a pair of managed objects (the backup object) is currently active and performing a back-up function in place of the first managed object (the backed-up object). Back-up object and backed-up object are two roles in a back-up relationship.
"A back-up relationship is created as a result of a pre-existing fallback relationship between two managed objects. The back-up relationship comes into existence when the backed-up resource is not fulfilling its function, and the back-up resource is activated to provide the same service. The back-up relationship ceases to exist when the backed-up resource resumes fulfilling its function, and the back-up resource ceases to provide that service. Creation and deletion of the back-up relationship has no effect on the existence of the fallback relationship between the two managed objects".
According to ITU-T X.751 (Change over function), "The change over relationship is a composition of the fallback and back-up relationships described in CCITT Rec. X.732."
"the existence of a fallback relationship is the precondition for establishing a back-up relationship, the change over relationship is defined as the composition of the fallback and back-up relationships."
"The potential to provide back-up capability is represented by the fallback relationship. The primary object represents the resource that is to be backed up; the secondary object represents the resource that can provide back-up capability."
I am interpreting this to mean that the distinction between primary/backed-up and secondary/back-up, which X.732 maintains, is blurred in X.751. Is this a correct interpretation?
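One possible reading of the X.732 text quoted above, as a minimal Python sketch: the fallback relationship is permanent for the pair, while the back-up relationship exists only while the primary is not fulfilling its function. The class `FallbackPair`, its methods, and the unit names are hypothetical and come from neither standard.

```python
from dataclasses import dataclass


@dataclass
class FallbackPair:
    """Hypothetical model of one primary/secondary pair."""

    primary: str
    secondary: str
    backup_active: bool = False  # True while a back-up relationship exists

    def primary_failed(self):
        # The back-up relationship comes into existence.
        self.backup_active = True

    def primary_restored(self):
        # The back-up relationship ceases; the fallback relationship stays.
        self.backup_active = False

    def roles(self):
        roles = {self.primary: ["primary"], self.secondary: ["secondary"]}
        if self.backup_active:
            roles[self.primary].append("backed-up")
            roles[self.secondary].append("back-up")
        return roles


pair = FallbackPair("unit-A", "unit-B")
idle = pair.roles()
pair.primary_failed()
failed = pair.roles()
pair.primary_restored()
restored = pair.roles()
```

In this reading the two pairs of roles coincide only while the back-up relationship exists, which is the distinction the question above is about.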
Regards, Bhat