[LUNI] Failover script?

Jason Rexilius jason at hostedlabs.com
Mon Jan 7 10:47:47 CST 2008


Not Invented Here (NIH)

The value of open source isn't just that it's free, but that things often 
get implemented and tested in a wide variety of environments under 
widely different loads (at least for popular projects like heartbeat).

It's called code maturity, and it is critical to running stable systems.

As an example, if someone were going to throw you into a jungle to survive 
by yourself in harsh, unknown conditions, you would probably pick 
machinery and equipment that had stood the test of time and been tested 
in those conditions rather than something you built in your garage.

Sorry for not addressing your original problem yet, but to summarize, 
I think what you are looking at is:

1) you have an existing code base deployed across multiple machines that 
has an IP-based connection to a database (the master, I believe?).

2) you want to have fail-over to a secondary.

3) the cost of changing code vs. the cost of implementing a network 
level solution is what you are evaluating.
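If the code-change route is on the table, the simplest thing a failover 
script can check is whether the master's database port accepts a TCP 
connection from the machine running the script. Below is a minimal 
Python sketch; the function name `port_reachable` and the default 
timeout are illustrative assumptions. A successful connect only shows 
the port is open from this one machine, not that the database is 
healthy.

```python
import socket


def port_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper: this only proves the port accepts connections
    from *this* machine. It says nothing about whether the database is
    healthy or what the other app servers can see.
    """
    try:
        # create_connection does the DNS lookup and tries each address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, network unreachable, DNS failure, etc.
        return False
```

For example, `port_reachable("192.0.2.10", 3306)` would probe a 
hypothetical master on MySQL's default port. Each app server would have 
to run its own probe, and they will not necessarily agree.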


I would add a few thoughts that might help sort this out:

1) What failure scenarios matter in this instance?  For example, does 
the network topology allow some machines to connect to the DB but not 
others (i.e. lose one switch and the half of the servers on it lose 
connectivity)?  Is it just hard down?  Are there cron jobs that run on 
the DB server that need to be stopped (e.g. ones doing data updates that 
could cause inconsistency if the other server became master)?  Etc., etc.

2) What is the fail-over demand?  Real-time, automated, manual 
intervention, transaction-lossless?  It's worth just describing what 
your minimum threshold is there.

3) What is the process for going back to the other DB?  What happens to 
the data?  Do you switch replication direction and keep the secondary as 
master until the next failure?

4) Might it be worth updating the code at this point to put in a 
DB-connection abstraction layer that provides future flexibility for 
things like a two-phase commit model, app-level log syncing, or data 
partitioning?
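For point 4, a DB-connection abstraction layer can start out very small: 
one object that every part of the code asks for a connection, holding an 
ordered list of servers. The sketch below is hypothetical (the names 
`FailoverConnector`, `get_connection`, `promote`, and the injected 
`connect` callable are not from any particular library) and assumes your 
driver exposes some function that takes a host and returns a connection 
or raises. It does nothing about point 1: if a switch failure leaves 
half the app servers unable to reach the master, those servers will fall 
over to the secondary while the rest keep writing to the master.

```python
from typing import Callable, Generic, List, Sequence, TypeVar

Conn = TypeVar("Conn")


class AllEndpointsFailed(Exception):
    """Raised when no configured endpoint accepted a connection."""


class FailoverConnector(Generic[Conn]):
    """Hypothetical single place where the application gets DB connections.

    Endpoints are tried in order, so list the current master first.
    Because callers only ever ask this object for a connection, changing
    the policy later does not touch the rest of the code.
    """

    def __init__(self, endpoints: Sequence[str], connect: Callable[[str], Conn]):
        if not endpoints:
            raise ValueError("need at least one endpoint")
        self.endpoints: List[str] = list(endpoints)
        self._connect = connect  # assumed driver-level connect(host)

    def get_connection(self) -> Conn:
        errors = []
        for host in self.endpoints:
            try:
                return self._connect(host)
            except Exception as exc:  # driver-specific error types vary
                errors.append(f"{host}: {exc}")
        raise AllEndpointsFailed("; ".join(errors))

    def promote(self, host: str) -> None:
        """Move `host` to the front (raises ValueError if unknown), so a
        promoted secondary stays master until the next failure instead
        of failing back automatically."""
        self.endpoints.remove(host)
        self.endpoints.insert(0, host)
```

Keeping `promote` explicit matches the "keep the secondary as master 
until the next failure" option from point 3; automatic fail-back would 
be a separate policy decision.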


Well, those are just some things worth thinking about.  After a painful 
experience this weekend from not thinking things out before I 
implemented them, I can passionately recommend that you sketch it all 
out before making a decision ;-)


By the way, the heartbeat project is worth looking at.  It's really stable 
and will be a good tool to have in the chest down the road regardless.

-jason



Richard Reina wrote:
>  In fact, this puts more burden on you as you do
> not have enough eyes to make all bugs shallow.
> 
> Beware NIH syndrome!
> 
> Keith, 
> 
> Thank you very much for your reply.  May I ask what "make all bugs shallow" and "NIH syndrome" mean?
> 
> Thanks,
> 
> Richard

