As per my previous post here, I’d like to share some conclusions and how they came about.
A scheduled reboot was performed and the nagios agent and scripts were corrected. However, this wasn’t the issue, which I had more or less determined prior to the reboot, as I had already stopped them for about 2 cycles of sar reports (20 mins).
After the reboot, the problem of course happened again. The sar -v output showed file-sz increasing over time, somewhat like 3 steps forward and 2 steps back. So you get a slow overall increase, although, as seen in the previous post, 500 or so handles are consumed at a time.
I suspected something with Oracle, so I pointed out some processes that seemed to be respawning every 7 mins, but it turned out these were just database scripts accessing the database. However, one of the DBAs investigated further and believed the emagent process was acting a bit odd. Upon further investigation, after a stop of the agent, the output of cat /proc/sys/fs/file-nr showed the file-sz value drop by over 10000. Thus it looks like we found the issue, so of course they had to refer it to Oracle support. At least we now have a way to bring the value down: a restart of emagent (which is Oracle Grid Control related, I am told).
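For anyone wanting to repeat that before/after check, here is a minimal Python sketch that reads the same counter. It assumes the usual three-field layout of /proc/sys/fs/file-nr (allocated handles, allocated-but-unused handles, system maximum); the function names are my own and purely illustrative.

```python
def parse_file_nr(text):
    """Split the contents of /proc/sys/fs/file-nr into three integers.

    Assumed field order: allocated handles, allocated-but-unused
    handles, system-wide maximum.
    """
    allocated, unused, maximum = (int(field) for field in text.split())
    return allocated, unused, maximum


def handles_in_use(text):
    """Handles actually in use: allocated minus unused."""
    allocated, unused, _ = parse_file_nr(text)
    return allocated - unused


def read_handles_in_use(path="/proc/sys/fs/file-nr"):
    """Read the live value from the running system."""
    with open(path) as f:
        return handles_in_use(f.read())
```

Calling read_handles_in_use() once before stopping a suspect process and once after gives the size of the drop, which is the check that pointed at emagent here.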
To aid others, if anyone is seeing something similar, I’d recommend making sure that the system-wide file-max value (fs.file-max, also visible in /proc/sys/fs/file-max) is greater than the limits.conf hard limit for the user you suspect might be causing it. In the case of our system, file-max was set to 65536, however so was the oracle hard limit for nofile. Thus when the oracle user used up all the handles, the whole system also had issues, such as being unable to ssh in. So increase the system-wide setting, which can be done without a reboot. Then for any users you suspect might be causing the issue, set them up with limits via the limits.conf file and then monitor. If a process running as one of those users consumes all of its file handles, then only the software run by this user will suffer, and the greater system-wide setting will still allow you to ssh in and do various other things.
Hope this helps anyone else out on the world wide web, as it was certainly a good problem to investigate.
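The comparison described above can be sketched in Python roughly as follows. This is only an illustration: it assumes the standard /etc/security/limits.conf line format (domain, type, item, value), matches exact user names only (not @group or * entries), and the function names are hypothetical.

```python
from pathlib import Path


def hard_nofile_limit(limits_text, user):
    """Return the hard nofile limit set for `user`, or None if absent.

    Only lines whose domain is exactly `user` are considered; a type
    of "-" sets both soft and hard limits, so it counts as well.
    """
    limit = None
    for line in limits_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments
        if len(fields) != 4:
            continue
        domain, kind, item, value = fields
        if (domain == user and kind in ("hard", "-")
                and item == "nofile" and value.isdigit()):
            limit = int(value)  # the last matching line wins
    return limit


def has_headroom(file_max, user_hard_limit):
    """True when the system-wide maximum exceeds the user's hard limit."""
    return user_hard_limit is None or file_max > user_hard_limit


def check_user(user, limits_path="/etc/security/limits.conf"):
    """Compare the live fs.file-max value with the user's hard limit."""
    file_max = int(Path("/proc/sys/fs/file-max").read_text())
    limit = hard_nofile_limit(Path(limits_path).read_text(), user)
    return file_max, limit, has_headroom(file_max, limit)
```

On our system check_user("oracle") would have reported no headroom, since both values were 65536.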
Has anyone out on the world wide web seen this before?
16:50:01  dentunusd  file-sz  inode-sz  super-sz  %super-sz  dquot-sz  %dquot-sz  rtsig-sz  %rtsig-sz
17:00:01      22144     5100     18581         0       0.00         0       0.00         0       0.00
17:10:01      22274     5610     18591         0       0.00         0       0.00         0       0.00
17:20:01      22631     5610     18832         0       0.00         0       0.00         0       0.00
17:30:01      22744     5610     18822         0       0.00         0       0.00         0       0.00
17:40:01      23233     6120     19172         0       0.00         0       0.00         0       0.00
17:50:01      23563     6120     19381         0       0.00         0       0.00         0       0.00
18:00:01      23702     5610     19395         0       0.00         0       0.00         0       0.00
18:10:01      24023     6120     19583         0       0.00         0       0.00         0       0.00
18:20:01      24093     6630     19522         0       0.00         0       0.00         0       0.00
18:30:01      24441     6630     19738         0       0.00         0       0.00         0       0.00
What I am referring to is the file-sz value increasing. In fact, on the system in question it continues to increase until it hits the system limit, and then processes start to fail, which is not what I want.
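To put a rough number on how fast this climbs, here is a small Python sketch that takes the first and last sar samples and extrapolates linearly. It is a back-of-envelope estimate only: the growth is stepped rather than smooth, the samples are assumed to fall on the same day, and the function names are my own.

```python
def growth_per_minute(samples):
    """Average growth between the first and last sample.

    `samples` is a time-ordered list of ("HH:MM:SS", value) pairs
    taken on the same day.
    """
    def minutes(timestamp):
        hours, mins, secs = (int(part) for part in timestamp.split(":"))
        return hours * 60 + mins + secs / 60

    (t_first, v_first), (t_last, v_last) = samples[0], samples[-1]
    return (v_last - v_first) / (minutes(t_last) - minutes(t_first))


def minutes_until_limit(current, limit, rate):
    """Minutes until `limit` is reached at `rate`, or None if not growing."""
    if rate <= 0:
        return None
    return (limit - current) / rate


# file-sz at the first and last sample of the output above
rate = growth_per_minute([("17:00:01", 5100), ("18:30:01", 6630)])
```

For the output above this works out to about 17 handles per minute (1530 handles over 90 minutes). Passing that rate, the current value and the system limit to minutes_until_limit() gives a rough idea of how long the host has left.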
Any tips on how to pinpoint the application that might be causing it, and any associated commands, would be appreciated. The platform is RHEL 5.3 x64, and the system is used to run three Oracle Database instances.
I have my suspicions about one application, and intend to have it shut down at some point and then monitor the sar -v output for several samples to see whether the same pattern as above persists.
EDIT: I think I have found the cause, but won’t know until I can get approval to make the change and reboot the host. I had a feeling it might be something to do with nagios and some scripts that check various things. I still believe this to be the case, as I have found an issue with nrpe itself on the host. I will get approval to make the changes and reboot, then post back the outcome.
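One way to narrow it down is to count the entries under /proc/<pid>/fd for every process and sort by the count. Below is a minimal Python sketch of that idea. It needs to run as root to see other users' processes, it takes the process name from the first argument in /proc/<pid>/cmdline, and the function name is made up for illustration.

```python
import os


def fd_counts(proc_root="/proc"):
    """Return (open_fd_count, pid, argv0) tuples, largest count first.

    Processes that exit mid-scan, or whose fd directory cannot be
    read, are skipped.
    """
    counts = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():  # only numeric entries are processes
            continue
        pid_dir = os.path.join(proc_root, entry)
        try:
            open_fds = len(os.listdir(os.path.join(pid_dir, "fd")))
            with open(os.path.join(pid_dir, "cmdline"), "rb") as f:
                argv0 = f.read().split(b"\0")[0].decode(errors="replace")
        except OSError:
            continue
        counts.append((open_fds, int(entry), argv0))
    return sorted(counts, reverse=True)
```

Printing the first ten entries of fd_counts() every few minutes alongside the sar -v samples should show which process is growing.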
After what seems to be about 7 years, our Canon Ixus camera suddenly died. It started to produce lines in the pictures, which is a common symptom of CCD failure.
We replaced it with a Pentax E80, as found here. It’s a sweet little camera, and the 720p HD recording feature is nice.
I have another few video clips to share from the other day. Unfortunately I couldn’t post them sooner, as I didn’t have the internet bandwidth available to upload them, so it had to wait until I got home.
Hope people enjoy them; I know I certainly enjoy using the car. I did manage to break a drive shaft in the last few days though, so the Rustler VXL is out of action. I’m going to order a set of steel drive shafts, as the standard plastic ones just don’t handle the power produced from the 11.1V 3S LiPo.