DoS attack? (qsize-q increased from 0 to thousands)

Miguel Miranda-2
Hello all, I kindly ask for your advice regarding an issue I am having with a pair of PowerDNS servers/resolvers. Both are running CentOS 6.4 with pdns 3.3.3 and recursor 3.7, in native mode with MySQL replication.
The issue is that we suddenly started receiving a lot of customer complaints about slow DNS responses, and indeed, in our tests we see delays of 3 to 4 seconds on DNS queries, or no response at all. We run both the server and the recursor on the same machine, with the recursor listening on localhost; I know this is not best practice, but I can't change that topology right now. Our network team tells me that overall traffic increased by about 10 mb on each server. We see the same delay on both servers.
Looking at the webserver page on each server, qps increased from 4,000 to 22,000 and qsize-q increased from 0 to around 2,500. The logs contain lots of entries like this:

Feb 21 18:35:54 dns1 pdns[27173]: Recursive query for remote 190.150.38.225:63567 with internal id 1433 was not answered by backend within timeout, reusing id
Feb 21 18:35:54 dns1 pdns[27173]: Recursive query for remote 190.150.218.7:33221 with internal id 1435 was not answered by backend within timeout, reusing id
Feb 21 18:35:54 dns1 rsyslogd-2177: imuxsock begins to drop messages from pid 27173 due to rate-limiting
Feb 21 18:35:56 dns1 pdns_recursor-balancer1[24608]: Sending SERVFAIL to 127.0.0.1 during resolve of 'fbapi.sd.duapps.com.' because: Too much time waiting for fbapi.dxsvr.com.|A, timeouts: 5, throttles: 0, queries: 7, 7914msec
Feb 21 18:35:47 dns1 pdns[27173]: Received a malformed qdomain from 186.32.122.240, 's1!0#037#006#003U#004': sending servfail
Feb 21 18:35:47 dns1 pdns[27173]: Received a malformed qdomain from 190.99.48.212, 'b._dns-sd._udp.xSò#001h¹ó#001': sending servfail
Feb 21 18:35:47 dns1 pdns[27173]: Received a malformed qdomain from 190.99.48.212, 'dr._dns-sd._udp.xSò#001h¹ó#001': sending servfail
Feb 21 18:35:47 dns1 pdns[27173]: Received a malformed qdomain from 190.99.48.212, 'r._dns-sd._udp.xSò#001h¹ó#001': sending servfail
Feb 21 18:35:47 dns1 pdns[27173]: Received a malformed qdomain from 190.99.48.212, 'db._dns-sd._udp.xSò#001h¹ó#001': sending servfail
Feb 21 18:35:47 dns1 pdns[27173]: Received a malformed qdomain from 190.99.48.212, 'lb._dns-sd._udp.xSò#001h¹ó#001': sending servfail
Feb 21 18:35:47 dns1 pdns_recursor-balancer2[24616]: Timeout from remote TCP client 127.0.0.1
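
To see whether a handful of remote addresses account for most of these entries, the log lines can be tallied per source IP. The following is only a minimal sketch: the two regular expressions are written against the exact message formats shown above, and the function name tally_remotes and the SAMPLE list are illustrative names, not part of PowerDNS.

```python
import re
from collections import Counter

# Written against the "malformed qdomain" lines above; captures the remote IPv4 address.
MALFORMED_RE = re.compile(
    r"Received a malformed qdomain from (\d{1,3}(?:\.\d{1,3}){3}),"
)
# Written against the recursive-timeout lines above; captures the address, drops the port.
TIMEOUT_RE = re.compile(
    r"Recursive query for remote (\d{1,3}(?:\.\d{1,3}){3}):\d+ .* was not answered"
)


def tally_remotes(lines):
    """Count malformed-qdomain and recursive-timeout entries per remote address."""
    malformed = Counter()
    timeouts = Counter()
    for line in lines:
        match = MALFORMED_RE.search(line)
        if match:
            malformed[match.group(1)] += 1
            continue
        match = TIMEOUT_RE.search(line)
        if match:
            timeouts[match.group(1)] += 1
    return malformed, timeouts


# A few lines copied from the log excerpt above, used here as sample input.
SAMPLE = [
    "Feb 21 18:35:54 dns1 pdns[27173]: Recursive query for remote 190.150.38.225:63567 with internal id 1433 was not answered by backend within timeout, reusing id",
    "Feb 21 18:35:54 dns1 pdns[27173]: Recursive query for remote 190.150.218.7:33221 with internal id 1435 was not answered by backend within timeout, reusing id",
    "Feb 21 18:35:47 dns1 pdns[27173]: Received a malformed qdomain from 186.32.122.240, 's1!0#037#006#003U#004': sending servfail",
    "Feb 21 18:35:47 dns1 pdns[27173]: Received a malformed qdomain from 190.99.48.212, 'b._dns-sd._udp.xSò#001h¹ó#001': sending servfail",
    "Feb 21 18:35:47 dns1 pdns[27173]: Received a malformed qdomain from 190.99.48.212, 'dr._dns-sd._udp.xSò#001h¹ó#001': sending servfail",
]

malformed, timeouts = tally_remotes(SAMPLE)
for address, count in malformed.most_common():
    print(address, count)
```

The function accepts any iterable of lines, so an open log file can be passed in directly. Note that rsyslog was already dropping messages due to rate-limiting, so counts taken from the log would undercount the real volume.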

The delay happens only when querying the external IP; if I resolve a host directly against the pdns recursor running on localhost, there is no delay.
There are no other signs of what could be causing the high qsize-q values.
The only way to clear the slow responses is to restart the pdns service. pdns-recursor is not restarted, so I think the problem is in pdns when it tries to forward recursive queries to dnsdist.
I'm at a loss as to what to check to track down the cause.
Regards.

_______________________________________________
Pdns-users mailing list
[hidden email]
http://mailman.powerdns.com/mailman/listinfo/pdns-users

Re: DoS attack? (qsize-q increased from 0 to thousands)

Miguel Miranda-2
Following up on this issue, the only other thing I could find in the logs is lots of entries like this:

messages-20160228:Feb 24 21:01:51 dns1 pdns[1587]: Respawning
messages-20160228:Feb 24 21:01:54 dns1 pdns[13845]: 5017 questions waiting for database attention. Limit is 5000, respawning
messages-20160228:Feb 24 21:01:54 dns1 pdns[1587]: Respawning
messages-20160228:Feb 24 21:01:57 dns1 pdns[13926]: 5018 questions waiting for database attention. Limit is 5000, respawning
messages-20160228:Feb 24 21:01:57 dns1 pdns[1587]: Respawning
messages-20160228:Feb 24 21:02:00 dns1 pdns[14029]: 5010 questions waiting for database attention. Limit is 5000, respawning
messages-20160228:Feb 24 21:02:00 dns1 pdns[1587]: Respawning
messages-20160228:Feb 25 21:05:25 dns1 pdns[5498]: Respawning
messages-20160228:Feb 25 21:05:27 dns1 pdns[5498]: Respawning
messages-20160228:Feb 25 21:05:29 dns1 pdns[5498]: Respawning
messages-20160228:Feb 25 21:05:31 dns1 pdns[5498]: Respawning
messages-20160228:Feb 25 21:05:33 dns1 pdns[5498]: Respawning
messages-20160228:Feb 25 21:05:35 dns1 pdns[5498]: Respawning
messages-20160228:Feb 25 21:05:37 dns1 pdns[5498]: Respawning
messages-20160228:Feb 25 21:05:39 dns1 pdns[5498]: Respawning
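
To put numbers on entries like these, the queue-overflow messages and the bare "Respawning" messages can be summarized separately. This is a rough sketch under the assumption that all lines follow the grep output format shown above (file name prefix, syslog timestamp, host, pdns[pid]); the names summarize and SAMPLE are illustrative only.

```python
import re
from collections import Counter

# Matches "<N> questions waiting for database attention. Limit is <M>, respawning"
# and captures the syslog timestamp, the queue depth, and the configured limit.
OVERFLOW_RE = re.compile(
    r"(?P<month>[A-Z][a-z]{2}) +(?P<day>\d{1,2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"\S+ pdns\[\d+\]: (?P<waiting>\d+) questions waiting for database attention\. "
    r"Limit is (?P<limit>\d+), respawning"
)
# Matches the bare "Respawning" messages and captures the date.
RESPAWN_RE = re.compile(
    r"(?P<month>[A-Z][a-z]{2}) +(?P<day>\d{1,2}) \d{2}:\d{2}:\d{2} "
    r"\S+ pdns\[\d+\]: Respawning$"
)


def summarize(lines):
    """Return (overflow events, respawn count per day) from syslog lines."""
    overflows = []               # (timestamp, questions waiting, limit)
    respawns_per_day = Counter()
    for line in lines:
        line = line.rstrip("\n")
        match = OVERFLOW_RE.search(line)
        if match:
            stamp = f"{match['month']} {match['day']} {match['time']}"
            overflows.append((stamp, int(match["waiting"]), int(match["limit"])))
            continue
        match = RESPAWN_RE.search(line)
        if match:
            respawns_per_day[f"{match['month']} {match['day']}"] += 1
    return overflows, respawns_per_day


# Lines copied from the excerpt above, used here as sample input.
SAMPLE = [
    "messages-20160228:Feb 24 21:01:51 dns1 pdns[1587]: Respawning",
    "messages-20160228:Feb 24 21:01:54 dns1 pdns[13845]: 5017 questions waiting for database attention. Limit is 5000, respawning",
    "messages-20160228:Feb 24 21:01:54 dns1 pdns[1587]: Respawning",
    "messages-20160228:Feb 24 21:01:57 dns1 pdns[13926]: 5018 questions waiting for database attention. Limit is 5000, respawning",
    "messages-20160228:Feb 25 21:05:25 dns1 pdns[5498]: Respawning",
    "messages-20160228:Feb 25 21:05:27 dns1 pdns[5498]: Respawning",
]

overflows, respawns_per_day = summarize(SAMPLE)
print(overflows)
print(dict(respawns_per_day))
```

Separating the two message types makes it visible that the Feb 25 respawns in the excerpt are not accompanied by a queue-overflow message, while the Feb 24 ones are.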

Do you think it is something related to the MySQL backend?

On Tue, Feb 23, 2016 at 11:38 AM, Miguel Miranda <[hidden email]> wrote:
[quoted original message trimmed]
