Provisions for DB failures?

Provisions for DB failures?

Gabriel Ambuehl
Hi,
I just stumbled over PowerDNS and I must say, I'm deeply impressed by
what I see (sure as hell beats BIND hands down).

However, there's one thing that kinda scares me: suppose I use the
MySQL backend (which is what I'd like the most) and the DB goes down
so now I'm with a DNS daemon that's still running but not sending out
any data as it can't get to it. So is there some sort of filesystem
based cache module that can take over in such a situation?

Sure, I have 4 DNS servers in two different places, but I still don't like the
idea of having one box go down because of a DB failure, and it gets even
nastier if it's the primary in a one-primary, three-superslave [1] scenario...

I'd appreciate any comments on this issue.

Regards,
Gabriel

[1] I LOVE that feature ;-).

Re: Provisions for DB failures?

bert hubert
On Sun, Mar 16, 2003 at 07:32:38PM +0100, Gabriel Ambuehl wrote:
> Hi,
> I just stumbled over PowerDNS and I must say, I'm deeply impressed by
> what I see (sure as hell beats BIND hands down).

Thanks!

> However, there's one thing that kinda scares me: suppose I use the
> MySQL backend (which is what I'd like the most) and the DB goes down
> so now I'm with a DNS daemon that's still running but not sending out
> any data as it can't get to it. So is there some sort of filesystem
> based cache module that can take over in such a situation?

We spent a lot of time thinking about this problem.

> Sure I have 4 DNS in two different places but I still don't like the
> idea of having one box down to a DB failure and it gets even nastier
> if it's the primary in a one primary, 3 superslave [1] scenario...

Our ideas are as follows:

        1) in case of database failure, PowerDNS starts returning SERVFAIL
           packets, which quickly indicate 'ask another nameserver' and make
           sure nothing like 'no such domain' is answered.

        2) It is hard to determine effectively if a database is 'down'.
           Having a timeout is not enough to be sure about that. In case of
           authorization failure, it is best to treat this as a permanent
           failure and not try to paper it over.

        3) Simplicity is good
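
Points 1 and 2 can be illustrated with a minimal Python sketch. The lookup signature, `answer()` and `BackendError` are hypothetical names chosen for illustration, not the PowerDNS backend API, and the sketch ignores the NXDOMAIN/NODATA distinction. What it shows is the rule itself: any backend exception, whether a timeout or an authorization failure, becomes SERVFAIL, and only a lookup that actually completed may say "no such domain".

```python
from typing import Callable, List, Tuple

# DNS response codes from RFC 1035, section 4.1.1
NOERROR, SERVFAIL, NXDOMAIN = 0, 2, 3


class BackendError(Exception):
    """Hypothetical: what a database backend might raise on any failure."""


# Hypothetical lookup signature: records for (qname, qtype), [] if the name does not exist.
Lookup = Callable[[str, str], List[str]]


def answer(lookup: Lookup, qname: str, qtype: str) -> Tuple[int, List[str]]:
    try:
        records = lookup(qname, qtype)
    except Exception:
        # Timeouts, lost connections and authorization failures all end up here.
        # We cannot tell whether qname exists, so we must not claim it doesn't:
        # SERVFAIL tells the resolver to ask another nameserver instead.
        return SERVFAIL, []
    if not records:
        # Only a lookup that actually completed may produce "no such domain".
        return NXDOMAIN, []
    return NOERROR, records


def db_with_bad_password(qname: str, qtype: str) -> List[str]:
    raise BackendError("access denied for user 'pdns'")


print(answer(db_with_bad_password, "www.example.com", "A"))  # (2, [])
```

Note that the sketch deliberately does not try to classify the error or retry: treating every failure the same way is what keeps it simple.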

In this case, if any of the slaves has a database problem, it goes down. The
other slaves continue to work and the internet does not notice that there is
a problem.

The code to support 'alternative backend selection' would be complex and by
necessity based on a heuristic. It is doubtful if it would in fact increase
uptime. In the right circles, for example, it is well known that many
failover solutions in fact lower uptime because of increased complexity.

But our thoughts on this are not entirely set in stone. If:

        * somebody can convince us that there are real problems if a single
          slave is down

and

        * somebody creates a *simple* algorithm for determining if a
          database is not working right

and

        * there is a reasonable way of reconfiguring so that we do give out
          the right answers,

then we'd be willing to implement it. Let me know.

Regards,

bert

--
http://www.PowerDNS.com      Open source, database driven DNS Software
http://lartc.org           Linux Advanced Routing & Traffic Control HOWTO
http://netherlabs.nl                         Consulting

Re[2]: Provisions for DB failures?

Gabriel Ambuehl
Hello bert,

Sunday, March 16, 2003, 8:44:54 PM, you wrote:

>         3) Simplicity is good

> In this case, if any of the slaves has a database problem, it goes
> down. The other slaves continue to work and the internet does not
> notice that there is a problem.

More or less. Except for the people that need to make two queries, of
course.

>         * somebody creates a *simple* algorithm for determining if a
> database is not working right

I suggest this: no algorithm at all. From what I've gathered,
everything needed is already in place, as you can have multiple
backends. So what's needed is a mechanism by which you can supply
additional DBs that are queried whenever the first query would result
in a SERVFAIL reply (I believe that's already happening if you have
different backends, right?).

> Then we'd be willing to implement it. Let me know.


Now personally, what I'd love to see more than anything else (and more
than the above, of course) is a dumb flat-file based module that can be
used as a second backend in case the DB fails. Say the flat file gets
produced every X hours from the DB. This is relatively cheap to
generate (it could be done with a one-line crontab entry, actually)
for all but very big installations, and those usually have other
provisions in place to keep the RDBMS from going down.

Thoughts?



Best regards,
 Gabriel

Re: Provisions for DB failures?

bert hubert
On Sun, Mar 16, 2003 at 09:11:01PM +0100, Gabriel Ambuehl wrote:

> > In this case, if any of the slaves has a database problem, it goes down.
> > The other slaves continue to work and the internet does not notice that
> > there is a problem.
>
> More or less. Except for the people that need to make two queries, of
> course.

What do you mean? Making a second query 500ms later shouldn't hurt that
much, and it happens only once per TTL per recursor.
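
The "once per TTL per recursor" point can be checked with a toy model. `ToyRecursor` and the server functions below are hypothetical; a real recursor chooses among nameservers by measured latency rather than in a fixed order, and also deals with timeouts. Under those simplifying assumptions, a broken slave costs one extra upstream query per TTL, not one per client query.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

SERVFAIL = 2  # RFC 1035 response code

# Hypothetical nameserver model: name -> (rcode, address or None, ttl in seconds).
Server = Callable[[str], Tuple[int, Optional[str], int]]


@dataclass
class ToyRecursor:
    servers: List[Server]
    cache: Dict[str, Tuple[str, float]] = field(default_factory=dict)  # name -> (answer, expiry)
    upstream_queries: int = 0

    def resolve(self, name: str, now: float) -> Optional[str]:
        cached = self.cache.get(name)
        if cached is not None and cached[1] > now:
            return cached[0]  # answered from cache, no upstream traffic
        for server in self.servers:  # fixed order here; real recursors pick by latency
            self.upstream_queries += 1
            rcode, address, ttl = server(name)
            if rcode == SERVFAIL:
                continue  # "ask another nameserver"
            if address is not None:
                self.cache[name] = (address, now + ttl)
            return address
        return None  # every nameserver failed


def broken_slave(name: str) -> Tuple[int, Optional[str], int]:
    return SERVFAIL, None, 0


def healthy_slave(name: str) -> Tuple[int, Optional[str], int]:
    return 0, "192.0.2.1", 3600


recursor = ToyRecursor([broken_slave, healthy_slave])
for t in range(0, 3600, 60):  # one client query per minute for an hour
    recursor.resolve("www.example.com", now=t)
print(recursor.upstream_queries)  # 2: one SERVFAIL, one good answer, then cache hits
```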

> I suggest this: no algorithm at all. From what I've gathered,
> everything needed is in place already as you can have multiple
> backends. So what's needed is a mechanism that you can supply
> additional DBs which are queried in case the first query would result
> in a servfailed reply (I believe that's already happening if you have
> different backends right?).

Wrong. If you have multiple database backends, they are considered to
complement each other, as described in
http://doc.powerdns.com/pipebackend-dynamic-resolution.html#PIPE-AND-BIND

If any of them errors before a definite answer ('does exist', 'does not
exist') is in, the whole query gets a SERVFAIL. Partial data is an error!
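
A simplified model of this "complementing backends" rule might look like the following. `query_complementing()` and the backend callables are hypothetical, and the real server's backend selection is more involved than polling each backend in turn. The comment in the `except` branch is the reason a "try the next database on SERVFAIL" mechanism is not a safe drop-in: the failed backend may have been the only one holding the name.

```python
from typing import Callable, List, Tuple

NOERROR, SERVFAIL, NXDOMAIN = 0, 2, 3  # RFC 1035 response codes

# Hypothetical backend: records for qname if it serves it, [] if not, raises on error.
Backend = Callable[[str], List[str]]


def query_complementing(backends: List[Backend], qname: str) -> Tuple[int, List[str]]:
    for backend in backends:
        try:
            records = backend(qname)
        except Exception:
            # This backend might have been the one holding qname. Skipping it and
            # asking the next one could turn "unknown" into a false "no such
            # domain", so the whole query fails instead.
            return SERVFAIL, []
        if records:
            return NOERROR, records  # definite answer: "does exist"
    # Every backend answered and none has the name: definite "does not exist".
    return NXDOMAIN, []


def mysql_down(qname: str) -> List[str]:
    raise ConnectionError("lost connection to MySQL server")


def bind_zones(qname: str) -> List[str]:
    return ["192.0.2.1"] if qname == "www.example.com" else []


print(query_complementing([mysql_down, bind_zones], "shop.example.com"))  # (2, [])
```

If the failing MySQL backend were simply skipped, `bind_zones` would report that `shop.example.com` does not exist, even though it may well live in the unreachable database.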

> Now personally, what I'd love to see more than anything else (and more
> than the above, of course) is a dumb flat file based module that can be
> used as second backend in case the DB fails. Say the flatfile gets
> produced every X hours from the DB. This is relatively cheap
> to generate (it could be done with a one line crontab entry,
> actually) with any but very big installations where there are
> usually other provisions taken that RDBMS don't go down.

And then you personally verify that your 'dump database to disk' script does
not make things worse by performing a partial dump because it hits 'error
3434: index key duplicate' halfway through? You've only moved the
problem!
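
The specific failure described here, a dump that dies halfway, can at least be kept from being published. A minimal sketch, assuming a hypothetical `dump_zone_file()` fed rows of (name, type, ttl, content) from some database query and a zone-file-like output format chosen only for illustration: it writes to a temporary file and renames it over the old dump only if every row was written. The refusal to publish an empty dump is likewise an assumption. This narrows the problem rather than solving it; a dump that completes with wrong data would still be published.

```python
import os
import tempfile
from typing import Iterable, Tuple

# Hypothetical record shape: (name, type, ttl, content).
Record = Tuple[str, str, int, str]


def dump_zone_file(rows: Iterable[Record], path: str) -> int:
    """Write rows to path atomically; on any error the previous file is left untouched."""
    directory = os.path.dirname(os.path.abspath(path))
    # Temp file in the same directory, so os.replace() is a rename on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".dump-")
    count = 0
    try:
        with os.fdopen(fd, "w") as out:
            for name, rtype, ttl, content in rows:
                out.write(f"{name}\t{ttl}\tIN\t{rtype}\t{content}\n")
                count += 1
            out.flush()
            os.fsync(out.fileno())
        if count == 0:
            # Assumed safety rule: an empty result is treated as a failed dump.
            raise RuntimeError("empty dump; refusing to replace existing file")
        os.replace(tmp_path, path)  # atomic on POSIX: readers see old or new, never half
    except BaseException:
        os.unlink(tmp_path)  # discard the partial dump and re-raise the original error
        raise
    return count
```

Run from cron, an error such as the one quoted above leaves the previous dump in place and surfaces as a failed cron job instead of a truncated file.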

Remember, failures are unexpected and mostly have a real cause. They are
rarely, if ever, neatly delineated from normal operation. PowerDNS tries
very hard to make sure it does not report a definite result in case of *any*
indication of failure so as not to report 'no such host'.

The best thing to do is to have a tool like Nagios check whether all your
slaves are doing the right thing, and investigate if one of them isn't. I'd
hesitate to rely on a broken system to do the right thing in case of
failure.
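
A minimal sketch of such a check, using only the Python standard library: it sends a hand-built RFC 1035 query for a name that must exist to each slave and reports CRITICAL unless every slave answers NOERROR with at least one record. `check_slaves()`, the query name and the slave addresses are hypothetical; only the exit-code convention (0 OK, 2 CRITICAL) comes from Nagios plugins.

```python
import socket
import struct
from typing import Callable, List, Tuple

OK, CRITICAL = 0, 2  # Nagios plugin exit codes
RCODE_NAMES = {0: "NOERROR", 2: "SERVFAIL", 3: "NXDOMAIN"}


def build_query(qid: int, name: str, qtype: int = 1) -> bytes:
    """Minimal RFC 1035 query: one question, class IN, recursion not requested."""
    header = struct.pack("!HHHHHH", qid, 0, 1, 0, 0, 0)
    qname = b"".join(bytes([len(label)]) + label.encode("ascii")
                     for label in name.rstrip(".").split(".")) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def parse_header(qid: int, response: bytes) -> Tuple[int, int]:
    """Return (rcode, answer_count); raise if this is not a reply to our query."""
    rid, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", response[:12])
    if rid != qid or not flags & 0x8000:  # ID must match and the QR bit must be set
        raise ValueError("unexpected packet")
    return flags & 0x000F, ancount


def udp_query(server: str, packet: bytes, timeout: float = 2.0) -> bytes:
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        s.sendto(packet, (server, 53))
        return s.recv(512)


def check_slaves(slaves: List[str], name: str,
                 send: Callable[[str, bytes], bytes] = udp_query) -> Tuple[int, str]:
    problems = []
    for i, slave in enumerate(slaves):
        qid = 0x1234 + i
        try:
            rcode, ancount = parse_header(qid, send(slave, build_query(qid, name)))
        except (OSError, ValueError, struct.error) as exc:  # timeouts are OSError too
            problems.append(f"{slave}: {exc}")
            continue
        if rcode != 0 or ancount == 0:
            problems.append(f"{slave}: {RCODE_NAMES.get(rcode, rcode)}, {ancount} answers")
    if problems:
        return CRITICAL, "DNS CRITICAL - " + "; ".join(problems)
    return OK, f"DNS OK - {len(slaves)} slaves answer {name}"
```

To use it as a plugin, print the returned message and exit with the returned code. The `send` parameter exists so the check can be exercised without a network.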

Rely on your slaves, and make sure each has its own independent database.

Alternatively, if anybody comes up with a very simple idea, I'm willing to
implement it. However, I'm not going to throw "good code after a bad
problem", raising complexity while probably not helping anyhow.

Regards,

bert
