Nagios load spike every 7 hours

I have a Nagios XI server monitoring 631 services on 63 hosts. Every seven hours the load on the server spikes up to 20ish and then gradually falls back to near 0.

There are no cron jobs running every 7 hours.

The server has 8 cores and 2GB RAM. RAM is not an issue: about 1GB is still free during the spikes, and upping it to 4GB makes no difference. The server was also migrated to a new host a week or so ago with no changes.

We also have scheduled downtime on 17 of the monitored hosts, so they are only monitored 6am-6pm Mon-Fri; this seems to make no difference to the load spikes.

Most checks are done on Windows servers, using check_wmi_plus.

During load spikes, I tend to see 5-8 instances of check_wmi_plus.pl using 2-3% CPU, and a handful of httpd processes using the same, but nothing stands out as using a lot of CPU. Those processes also turn over quite fast, so they are not hung or taking an unusually long time. The Service Check Execution Time in the Nagios XI Performance Monitor tends to peak at ~5.5s, with averages around 1s.

Can anyone suggest a possible cause, or how I can further troubleshoot this?

Tags: nagios

asked Dec 3 '12 at 22:22 by daryl_graham

edited Mar 23 '15 at 1:37 by masegaloeh

Since you say it isn't a cron job, perhaps it is Nagios itself. I'd look at the Nagios log to see whether it is restarting every 7 hours. If you are retaining state and have horribly slow disk I/O, the load would spike. During the high-load period, run iotop -oP to see whether a process is doing excessive I/O.

– Mark Wagner
Dec 4 '12 at 1:14
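
One way to check the "is Nagios restarting every 7 hours?" idea is to pull the restart timestamps out of the log and look at the gaps between them. The following is only a rough sketch: it assumes the usual "[epoch] message" line format of nagios.log, and the marker strings and sample lines are assumptions for illustration, not taken from the poster's system.

```python
import re

# Nagios log lines are assumed to look like "[<epoch seconds>] <message>".
LINE_RE = re.compile(r"^\[(\d+)\]\s+(.*)$")

# Assumed substrings that indicate a start or restart; adjust them to
# whatever your own nagios.log actually contains.
RESTART_MARKERS = ("starting...", "SIGHUP")


def restart_times(lines):
    """Return epoch timestamps of lines that look like a Nagios (re)start."""
    times = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and any(marker in match.group(2) for marker in RESTART_MARKERS):
            times.append(int(match.group(1)))
    return times


def restart_gaps_hours(times):
    """Return the gaps between consecutive restarts, in hours."""
    return [(later - earlier) / 3600 for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    # Hypothetical sample lines; in practice read them from your nagios.log.
    sample = [
        "[1354500000] Nagios 3.4.1 starting... (PID=100)",
        "[1354500060] SERVICE ALERT: host1;CPU;OK;HARD;1;fine",
        "[1354525200] Caught SIGHUP, restarting...",
    ]
    print(restart_gaps_hours(restart_times(sample)))
```

If the printed gaps cluster around 7, the restart theory is worth pursuing; if the list is empty or irregular, it probably is not the cause.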

You might want to see whether you can spread the scheduling for those Windows servers, i.e. running server1 at 1/7 hours, the second at 2/7, and so on, basically running each check in a different hour.

– Danie
Dec 4 '12 at 7:53


3 Answers
A high load does NOT necessarily mean high CPU usage. The load average only counts the processes that are ready to run at each sampling instant (on Linux, plus those in uninterruptible sleep, typically waiting on I/O); it says nothing about how much CPU time they actually receive.

Nagios spins off a lot of processes rapidly, depending on how you have set up its monitoring schedules. At times this causes a spike as it starts many processes as fast as possible, but they might not need much CPU, or they might go straight into a sleep/wait state.
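
To see the difference between "load" and "CPU used", it can help to look at the raw numbers the kernel exposes. This is a minimal sketch, assuming a Linux host where /proc/loadavg has its usual five fields; the sample string is made up and the function names are only illustrative.

```python
import os

# Made-up sample in the /proc/loadavg layout:
# 1-, 5-, 15-minute load, runnable/total tasks, most recent PID.
SAMPLE = "20.13 8.50 3.02 9/412 31337\n"


def parse_loadavg(text):
    """Parse the contents of /proc/loadavg into a dict."""
    fields = text.split()
    runnable, total = fields[3].split("/")
    return {
        "load1": float(fields[0]),
        "load5": float(fields[1]),
        "load15": float(fields[2]),
        "runnable_now": int(runnable),
        "total_tasks": int(total),
        "last_pid": int(fields[4]),
    }


def load_per_core(load, cores):
    """Scale a load figure by the core count to make it comparable across hosts."""
    return load / cores


if __name__ == "__main__":
    path = "/proc/loadavg"
    if os.path.exists(path):
        with open(path) as handle:
            text = handle.read()
    else:
        text = SAMPLE  # fall back to the made-up sample on non-Linux systems
    info = parse_loadavg(text)
    cores = os.cpu_count() or 1
    print(info)
    print("1-minute load per core:", load_per_core(info["load1"], cores))
```

A 1-minute load of 20 on 8 cores is 2.5 per core, yet top can still show the CPUs mostly idle if those tasks were only briefly runnable when the kernel took its sample.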

BTW, disabling NOTIFICATIONS in Nagios does not stop it from monitoring a given host or service. Scheduled downtime works the same way: it suppresses notifications, but the checks still run.

answered Dec 3 '12 at 22:30 by mdpc, edited Dec 4 '12 at 17:33


Lower the RHEL/CentOS default prefork settings in the stock /etc/httpd/conf/httpd.conf to something more realistic.

Use tools like apachebuddy.pl and apachetuner.sh to do the math on memory per forked process. Leave more memory for the other processes on the system (MySQL/PostgreSQL/PHP) and reduce MaxClients and MaxRequestsPerChild.
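
The "math on memory per process" is roughly: memory you can spare for Apache divided by the average size of one httpd child. Here is a back-of-the-envelope sketch of that calculation. It is not what apachebuddy.pl or apachetuner.sh actually do internally, and apart from the 2GB total from the question, the figures are hypothetical placeholders to replace with measurements from your own server.

```python
def max_clients(total_mb, reserved_mb, per_process_mb):
    """Estimate a MaxClients value that fits in the memory left for Apache.

    total_mb       -- physical RAM on the box
    reserved_mb    -- RAM set aside for everything else (MySQL, Nagios, OS, ...)
    per_process_mb -- average resident size of one httpd child
    """
    if per_process_mb <= 0:
        raise ValueError("per_process_mb must be positive")
    usable_mb = total_mb - reserved_mb
    # Never return less than 1, or Apache could not serve anything at all.
    return max(1, int(usable_mb // per_process_mb))


if __name__ == "__main__":
    # 2048 MB is the RAM mentioned in the question; the other two figures
    # are hypothetical and should come from ps/top on the real system.
    print(max_clients(total_mb=2048, reserved_mb=1024, per_process_mb=32))
```

With these example numbers the estimate comes out at 32 children, well below the stock prefork limit that RHEL/CentOS ships with.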

I experienced this after upgrading from 2012R2.9 to 2014R1.1. I'm not sure whether the latest version of XI 2014 requires more resources for the web frontend.

This morning, after lowering my settings, I noticed that my load spikes are smaller, and navigating through the interface with the browser's forward and back buttons no longer gives me the grey unhappy-face screen. Does this weirdness in the interface seem similar to what you see?

One last item I'm looking at now is which modules in the default RHEL httpd.conf are actually required. I see no sense in loading default modules that are not needed. This is a PROD enterprise server at my place of business with thousands of checks, so it needs to be solid.

UPDATE:

Run

# service mysqld stop
# sh /usr/local/nagiosxi/scripts/repair_databases.sh
# service mysqld start

or optimize the tables while online:

# mysql -u root -p
mysql> use nagios;

List your tables:

mysql> show tables;

Then, for each table listed:

mysql> optimize table $TABLENAME;

Repeat for the nagiosql database:

mysql> use nagiosql;
mysql> show tables;
mysql> optimize table $TABLENAME;

Do this for all tables.
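
Typing one OPTIMIZE TABLE per table gets tedious, so the statements can be generated from the output of show tables. A small sketch, assuming you paste in the table names yourself; the names below are placeholders, and the helper only builds SQL text without connecting to MySQL.

```python
def optimize_statements(database, tables):
    """Build one OPTIMIZE TABLE statement per table, with backtick-quoted names."""

    def quote(name):
        # Double any embedded backtick so the identifier stays valid SQL.
        return "`" + name.replace("`", "``") + "`"

    return [
        f"OPTIMIZE TABLE {quote(database)}.{quote(table)};"
        for table in tables
    ]


if __name__ == "__main__":
    # Placeholder table names; use the real list from "show tables;".
    for statement in optimize_statements("nagios", ["table_one", "table_two"]):
        print(statement)
```

The printed statements can then be pasted into the mysql prompt or saved to a file and fed to the client.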

If you can stop the service for a couple of minutes, do it via the Nagios XI script. If you can't until later, do it online, but expect the interface to be a bit slow until the queries have been re-run. It may also be beneficial to flush your query cache:

mysql> FLUSH QUERY CACHE;

http://assets.nagios.com/downloads/nagiosxi/docs/Repairing_The_Nagios_XI_Database.pdf

answered Jul 10 '14 at 13:51 by user3258557, edited Mar 23 '15 at 1:36 by masegaloeh


This is due to how the kernel calculates load. See the source:
https://github.com/torvalds/linux/blob/master/include/linux/sched/loadavg.h
where you will find something like this: #define LOAD_FREQ (5*HZ+1)

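LOAD_FREQ is the interval, in ticks, at which the kernel samples the load. Note the extra tick: assuming HZ=1000, a sample is taken every 5.001 s rather than every 5 s, so the sampling point shifts by 0.001 s each time. It therefore takes 5000 samples, i.e. 5 * 1000 * 5.001 s = 25005 s, for the sampling point to drift back to a multiple of 5 seconds. 25005 / 3600 is around 7 hours.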
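
The arithmetic can be checked with a few lines of code. This sketch takes the LOAD_FREQ definition quoted above at face value and treats HZ as a parameter, since the 7-hour figure only comes out for HZ=1000; the function name is just illustrative.

```python
def beat_period_seconds(hz=1000, nominal_seconds=5, extra_ticks=1):
    """Seconds until the load sampler drifts one full nominal period.

    Follows LOAD_FREQ = nominal_seconds * HZ + extra_ticks, measured in ticks.
    """
    load_freq_ticks = nominal_seconds * hz + extra_ticks
    # Each sample lands extra_ticks later relative to the nominal grid, so it
    # takes this many samples to slip by one whole nominal period.
    samples_to_wrap = nominal_seconds * hz // extra_ticks
    return samples_to_wrap * load_freq_ticks / hz


if __name__ == "__main__":
    for hz in (1000, 250, 100):
        seconds = beat_period_seconds(hz)
        print(f"HZ={hz}: {seconds:.0f} s = {seconds / 3600:.2f} h")
```

For HZ=1000 this gives 25005 s, about 6.95 hours. A kernel built with a different HZ would show a different period, which is a cheap way to test the theory.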

So I bet the system forks short tasks periodically, and they just get "caught" by the kernel's sampling every 7 hours.
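
The "getting caught" effect can be illustrated with a toy simulation. It is a sketch under made-up assumptions: a hypothetical task that is runnable for 50 ms at the start of every exact 5-second period, and a sampler firing every 5*HZ+1 ticks. Times are kept in whole milliseconds to avoid floating-point drift.

```python
def sampled_hits(num_samples, hz=1000, task_period_ms=5000, burst_ms=50):
    """Return the indices of load samples that land inside a task burst.

    The task is assumed runnable during the first burst_ms of every
    task_period_ms; the sampler fires every (5*HZ+1) ticks.
    """
    sample_interval_ms = (5 * hz + 1) * 1000 // hz  # 5001 ms when hz == 1000
    hits = []
    for n in range(num_samples):
        t_ms = n * sample_interval_ms
        if t_ms % task_period_ms < burst_ms:
            hits.append(n)
    return hits


if __name__ == "__main__":
    hits = sampled_hits(10000)
    print("samples that saw the task:", len(hits))
    print("first cluster:", hits[0], "to", hits[49])
    print("second cluster starts at sample", hits[50])
```

In this toy model the sampler sees the task on 50 consecutive samples (roughly four minutes), then misses it completely until sample 5000, about 25005 s later. That matches the pattern of a spike every 7 hours that then decays, even though the task's real CPU usage never changes.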

answered 2 mins ago by dennis.s (new contributor)
