
Current group: comp.sources.d

Software design for redundant systems

From:kbhat at kaxy.com
Subject:Software design for redundant systems
Date:14 Jan 2005 12:53:33 -0800
Hi,

Is there any good literature out there pertaining to software design of
redundant systems? I have some ideas, but I am not sure if they are
adequate, or even correct. I am thinking along the lines of separating
the redundancy logic from the business logic. In today's world no
software application is an island unto itself. Software applications
communicate with each other via some sort of IPC, or by accessing some
shared data (e.g. shared memory and files). Applications also start
timers, and a lot of processing is triggered when those timers
expire.

I am thinking of maintaining the notion of redundancy state in the
supporting software, outside the business logic of the application.
What I mean specifically is this: I will create wrapper functions
around IPC system calls, IO calls and timer calls. Inside those
wrapper functions, I will maintain the notion of redundancy state. For
example, if the redundancy state is standby, the wrapper functions for
IPC will not send out any messages, the wrapper functions for shared
memory access will not access the shared memory, the wrapper functions
for file access will not access shared files and the wrapper functions
for timers will not set any timers. The advantage that I see with this
approach is that the business logic of the application is completely
oblivious to the redundancy state. When the redundancy state switches
to active, lo and behold all these wrapper functions are turned ON, and
they begin to work normally.
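
A minimal sketch of the wrapper idea, under stated assumptions: the names
`Role`, `RedundancyState` and `gated` are hypothetical, and a plain list
append stands in for a real IPC send. The business logic calls
`send_message` the same way in both states; only the wrapper consults the
redundancy state.

```python
import enum
import threading


class Role(enum.Enum):
    STANDBY = "standby"
    ACTIVE = "active"


class RedundancyState:
    """Holds the current role; shared by all wrappers (hypothetical helper)."""

    def __init__(self, role=Role.STANDBY):
        self._role = role
        self._lock = threading.Lock()

    def is_active(self):
        with self._lock:
            return self._role is Role.ACTIVE

    def set_role(self, role):
        with self._lock:
            self._role = role


def gated(state, func):
    """Wrap func so that it only runs while the node is active.

    On standby the call is skipped and None is returned.
    """
    def wrapper(*args, **kwargs):
        if not state.is_active():
            return None
        return func(*args, **kwargs)
    return wrapper


# Stand-in for a real IPC send: it just records the message.
sent = []
state = RedundancyState()
send_message = gated(state, sent.append)

send_message("skipped on standby")
state.set_role(Role.ACTIVE)
send_message("delivered when active")
```

In this sketch a call skipped on standby is simply dropped; it is not
replayed after switchover, so a timer that was never set on standby would
have to be set again once the state becomes active.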

An alternative approach is to make a call to a function which returns
immediately on the active side, but blocks on the standby side, up
until the redundancy state changes to active. While easier to
implement, the disadvantage of this approach is that upon switchover,
control will resume only from this point onwards.
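
The blocking variant could look roughly like this, assuming a threaded
process and using `threading.Event` as the gate. `ActiveGate` and `worker`
are hypothetical names for illustration only.

```python
import threading


class ActiveGate:
    """Returns immediately when active, blocks while standby."""

    def __init__(self):
        self._active = threading.Event()

    def activate(self):
        # Called by the redundancy logic on switchover.
        self._active.set()

    def wait_until_active(self, timeout=None):
        # Returns True once the node is active, False on timeout.
        return self._active.wait(timeout)


gate = ActiveGate()
log = []


def worker():
    gate.wait_until_active()          # standby side parks here
    log.append("business logic started")


t = threading.Thread(target=worker)
t.start()
log.append("switchover")
gate.activate()                       # worker resumes from the gate
t.join()
```

The worker resumes exactly at the gate call, which illustrates the
drawback mentioned above: nothing before that point runs again after
switchover.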

No discussion on redundancy is complete without a discussion on data
synchronization and the need for checkpointing. Data synchronization
of persistent data seems to be a lot easier than data synchronization
of memory-resident data. In the former case we could potentially rely
on external utilities and operating system capabilities (e.g.
timestamps on files) to maintain this synchronization, triggered by
some criterion (e.g. elapsed time or number of updates).
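
As an illustration of such a criterion, here is a small sketch of a policy
that reports a synchronization as due after either a number of updates or
an elapsed time. `CheckpointPolicy` and its parameters are hypothetical;
the actual copying of files is left out.

```python
import time


class CheckpointPolicy:
    """Decides when pending updates should be pushed to the standby."""

    def __init__(self, max_updates, max_age_seconds, clock=time.monotonic):
        self.max_updates = max_updates
        self.max_age_seconds = max_age_seconds
        self._clock = clock
        self._pending = 0
        self._last_sync = clock()

    def record_update(self):
        self._pending += 1

    def due(self):
        if self._pending == 0:
            return False  # nothing to synchronize
        too_many = self._pending >= self.max_updates
        too_old = self._clock() - self._last_sync >= self.max_age_seconds
        return too_many or too_old

    def mark_synced(self):
        self._pending = 0
        self._last_sync = self._clock()


# Demonstration with a fake clock so the behaviour is deterministic.
now = [0.0]
policy = CheckpointPolicy(max_updates=3, max_age_seconds=10.0,
                          clock=lambda: now[0])

policy.record_update()
due_after_one_update = policy.due()      # one update, no time elapsed
now[0] = 12.0
due_after_timeout = policy.due()         # same update, 12 s later
policy.mark_synced()
for _ in range(3):
    policy.record_update()
due_after_three_updates = policy.due()   # update count reached
```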

For synchronization of memory-resident data, I have the following in
mind. I "register" a certain region of process memory with a
"memory duplication service". This service runs on the active and
standby side in its own thread. Any data that is written anywhere in
this region of memory on the active side gets copied to the standby
side. Of course the absolute memory addresses inside the two
instances of the application (primary and secondary) will be
different, but within a registered region the relative offsets will be
the same (after all, it is the same software that runs in both active
and standby mode). To duplicate some data from active to standby, you
merely need to provide its offset from the beginning of the region and
its size. If more than one region of memory is "registered", the memory
region identifier also needs to be provided.
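
A rough sketch of the offset-based scheme, with several simplifications
that are assumptions and not part of the proposal above: both sides live
in one process, regions are `bytearray` objects, writes go through an
explicit `write` call, and the "transport" is a direct method call. The
class names are hypothetical.

```python
class Replica:
    """Standby side: applies updates addressed by region id and offset."""

    def __init__(self):
        self.regions = {}

    def register(self, region_id, size):
        self.regions[region_id] = bytearray(size)

    def apply(self, region_id, offset, payload):
        data = self.regions[region_id]
        if offset < 0 or offset + len(payload) > len(data):
            raise ValueError("update falls outside the registered region")
        data[offset:offset + len(payload)] = payload


class Duplicator:
    """Active side: writes locally, then forwards (id, offset, bytes)."""

    def __init__(self, replica):
        self.regions = {}
        self.replica = replica

    def register(self, region_id, size):
        self.regions[region_id] = bytearray(size)
        self.replica.register(region_id, size)

    def write(self, region_id, offset, payload):
        data = self.regions[region_id]
        if offset < 0 or offset + len(payload) > len(data):
            raise ValueError("write falls outside the registered region")
        data[offset:offset + len(payload)] = payload
        # Only the region id, the relative offset and the bytes are sent;
        # no absolute address ever crosses to the standby.
        self.replica.apply(region_id, offset, bytes(payload))


standby = Replica()
active = Duplicator(standby)
active.register("counters", 16)
active.write("counters", 4, b"\x01\x02\x03")
```

One consequence of copying raw bytes is that any absolute pointer stored
inside a registered region would not be valid on the standby side, so the
region would need to hold offsets or plain values only.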

I have tried to look far and wide to see if there are any standards for
redundancy management. The only standard that I have found so far is
X.751 from ITU-T. However, this standard only deals with the
management aspect of redundancy management. Unfortunately this
document reads like scripture: it is extremely cryptic and takes at
least a few readings before you get it. For example it took me a long
time to realize that PRIMARY and SECONDARY are roles in the fallback
relationship, while BACKEDUP and BACKUP are roles in the backup
relationship. I had initially assumed them to be synonymous.

To wrap up, I would appreciate it if someone could suggest some
software strategies for building redundant systems.

Regards,
Bhat
From:Patrik Servin
Subject:Re: Software design for redundant systems
Date:Sat, 15 Jan 2005 05:11:52 GMT

> To wrap up, I would appreciate if someone could provide some software
> strategies for building redundant systems.
>

There are a number of design patterns that describe different strategies
for redundancy. I would start by checking them out.
From:kbhat at kaxy.com
Subject:Re: Software design for redundant systems
Date:17 Jan 2005 14:53:47 -0800
>
> I have tried to look far and wide to see if there are any standards for
> redundancy management. The only standard that I have found so far is
> X.751 from ITU-T. However, this standard only deals with the
> management aspect of redundancy management. Unfortunately this
> document reads like scripture ---- extremely cryptic that takes at
> least a few readings before you get it. For example it took me a long
> time to realize that PRIMARY and SECONDARY are roles in the fallback
> relationship, while BACKEDUP and BACKUP are roles in the backup
> relationship. I had initially assumed them to be synonymous.
>
> To wrap up, I would appreciate if someone could provide some software
> strategies for building redundant systems.
>
> Regards,
> Bhat


According to ITU-T X.732 (Attributes for Representing Relationships),
"A fallback relationship is an asymmetric relationship denoting that
the second of a pair of managed objects (the secondary object) has been
designated as a fallback or "next preferred choice" to the first
managed object (the primary object). The existence of a fall back
relationship implies that the secondary resource is capable of
providing Back-up service to the primary resource if the latter is
unable to fulfil its function. It does not necessarily imply that the
secondary resource is currently active and performing its Back-up
function in place of the primary resource. Primary and secondary are
two roles in a fallback relationship.

"A back-up relationship is an asymmetric relationship denoting that the
second of a pair of managed objects (the backup object) is currently
active and performing a back-up function in place of the first managed
object (the backed-up object). Back-up object and backed-up object are
two roles in a back-up relationship.

"A back-up relationship is created as a result of a pre-existing
fallback relationship between two managed objects. The back-up
relationship comes into existence when the backed-up resource is not
fulfilling its function, and the back-up resource is activated to
provide the same service. The back-up relationship ceases to exist when
the backed-up resource resumes fulfilling its function, and the back-up
resource ceases to provide that service. Creation and deletion of the
back-up relationship has no effect on the existence of the fallback
relationship between the two managed objects".
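
To make the quoted X.732 definitions concrete, here is a toy model of the
two relationships. It is only my reading of the text above, not anything
defined by the standard; `FallbackPair` and its method names are
hypothetical.

```python
class FallbackPair:
    """Models a fallback relationship plus an optional back-up relationship."""

    def __init__(self, primary, secondary):
        # The fallback relationship: exists for the lifetime of the pair.
        self.primary = primary
        self.secondary = secondary
        # The back-up relationship: exists only while the secondary
        # is actually standing in for the primary.
        self.backup_exists = False

    def primary_failed(self):
        self.backup_exists = True

    def primary_restored(self):
        self.backup_exists = False

    def serving(self):
        return self.secondary if self.backup_exists else self.primary


pair = FallbackPair(primary="node-A", secondary="node-B")
before_failure = pair.serving()
pair.primary_failed()
during_failure = pair.serving()
pair.primary_restored()
after_recovery = pair.serving()
```

Creating and deleting the back-up relationship only toggles
`backup_exists`; the primary and secondary roles stay as they were, which
matches the last sentence of the quotation.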

According to ITU-T X.751 (Change over function), "The change over
relationship is a composition of the fallback and back-up relationships
described in CCITT Rec. X.732."

"the existence of a fallback relationship is the precondition for
establishing a back-up relationship, the change over relationship is
defined as the composition of the fallback and back-up relationships."

"The potential to provide back-up capability is represented by the
fallback relationship. The primary object represents the resource that
is to be backed up; the secondary object represents the resource that
can provide back-up capability."

I am interpreting this to mean that the distinction X.732 maintains
between primary/backed-up and secondary/backup is blurred in X.751.
Is this a correct interpretation?

Regards,
Bhat
   
