Tag Archives: Linux

Proposed Technical Training

Unfortunately, for some time now it has been very difficult to get any sort of technical training. I ask and ask like anyone else, and the technical training seems to be put on the back burner.

I guess it’s lucky that I took the initiative to brush up on some Microsoft skills on my own. The journey so far has been interesting, although I must admit I am not at all looking forward to the 70-640 exam. I’ll read the material and then do the usual revision. Hopefully I can pass it on the first attempt, but we will cross that bridge when we come to it.

Meanwhile, as I was saying, technical training is hard to come by. So I have sat down and proposed the following training, which I am going to complete off my own bat.

Sept – 70-640 (attempt to complete the MCITP Server 2008 Administrator certification).

Oct – ITILv3 Foundations certification

Nov – 70-622 (turn my 70-620 into an MCITP Vista certification)

Dec – 70-686 (turn my 70-680 into an MCITP Windows 7 certification)

Jan – Jun 2011 (work through an RHCE book and then attempt the exam day to obtain an RHCE, followed by some more Microsoft. Possibly 70-652 and then more Hyper-V stuff).

Wish me luck. Of course, some items may get pushed back if I need more prep time for exams, but this is a rough guide to what I am hoping to achieve.

Possible limitation in sudo

Had an interesting issue at work today to do with sudo (version 1.6.9p17).

It might be a bug for all I know, so at the moment I will just say possible limitation. I will post in more detail as soon as I get time to use the information I have to reproduce the issue, and then I will comment further.

Dreamhost $97 off for your first year

I’ve recently moved our hosting to Dreamhost. So far I am very happy with it, so much so we are going to stick around for 12 months.

If you’d like to try out Dreamhost for 2 weeks and then get $97 off your first 12 months of hosting as a new customer, sign up using the link here.

Your first 12 months of hosting will end up being about $22 USD if you sign up using the referral above.

RHEL 5.3 excessive file handles

As a follow-up to my previous post here, I’d like to share some conclusions and how they came about.

A scheduled reboot was performed and the nagios agent and scripts were corrected. However, this wasn’t the issue, which I had more or less determined prior to the reboot, as I had them stopped for about 2 cycles of sar reports (20 mins).

After the reboot, the problem of course happened again. The sar -v output would show file-sz increase over time, somewhat like 3 steps forward and 2 steps back. So you get a slow increase overall, although, as seen in the previous post, 500 or so handles are consumed at a time.

I suspected something with Oracle, so I pointed out some processes that seemed to be respawning every 7 mins, but it turned out these were just database scripts accessing the database. However, one of the DBAs investigated further and believed the emagent process was acting a bit odd. Upon further investigation, stopping the agent and then checking the cat /proc/sys/fs/file-nr output showed the file-sz value drop by over 10000. Thus it looks like we found the issue, so of course they had to defer it to Oracle support. At least we now have a way to bring the value down by restarting emagent (which is Oracle Grid Control related, I am told).

To aid anyone who is seeing something similar, I’d recommend making sure that the system-wide file-max value is greater than the limits.conf hard limit for the user you suspect might be causing it. On our system, file-max was set to 65536, but so was the oracle user’s hard limit for nofile. Thus when the oracle user used up all the handles, the whole system had issues, such as being unable to ssh in. So increase the system-wide setting, which can be done without a reboot. Then, for any users you suspect might be causing the issue, set limits via the limits.conf file and monitor. If a process running as one of those users consumes all of its file handles, only the software run by that user will suffer, and the larger system-wide value will still let you ssh in and do various other things.
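The check described above can be sketched in a few lines of Python. This is only a rough illustration, not what we actually ran: the function names are my own, and it assumes the usual three-field layout of /proc/sys/fs/file-nr (allocated handles, unused handles, system-wide maximum). The sample numbers are the 65536 values from this post plus one sar reading.

```python
from typing import NamedTuple


class FileNr(NamedTuple):
    allocated: int  # file handles currently allocated
    unused: int     # allocated but unused handles
    maximum: int    # system-wide ceiling (the fs.file-max value)


def parse_file_nr(text: str) -> FileNr:
    """Parse the three whitespace-separated fields of /proc/sys/fs/file-nr."""
    allocated, unused, maximum = (int(field) for field in text.split())
    return FileNr(allocated, unused, maximum)


def headroom_ok(file_nr: FileNr, user_hard_nofile: int) -> bool:
    """True when the system-wide ceiling sits above the user's hard limit."""
    return file_nr.maximum > user_hard_nofile


if __name__ == "__main__":
    # Hypothetical sample text; on a real host read /proc/sys/fs/file-nr instead.
    sample = "5610\t0\t65536\n"
    oracle_hard_nofile = 65536  # hard nofile limit for the oracle user, as in the post
    current = parse_file_nr(sample)
    print(current, "headroom ok:", headroom_ok(current, oracle_hard_nofile))
```

With both values at 65536 the check reports no headroom, which matches the situation we were in before raising the system-wide setting.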

Hope this helps anyone else out on the world wide web, as it was certainly a good problem to investigate.

RHEL 5.3 sar -v output continues to show a file-sz increase

Has anyone out on the world wide web seen this before?

sar -v

16:50:01    dentunusd   file-sz  inode-sz  super-sz %super-sz  dquot-sz %dquot-sz  rtsig-sz %rtsig-sz
17:00:01        22144      5100     18581         0      0.00         0      0.00         0      0.00
17:10:01        22274      5610     18591         0      0.00         0      0.00         0      0.00
17:20:01        22631      5610     18832         0      0.00         0      0.00         0      0.00
17:30:01        22744      5610     18822         0      0.00         0      0.00         0      0.00
17:40:01        23233      6120     19172         0      0.00         0      0.00         0      0.00
17:50:01        23563      6120     19381         0      0.00         0      0.00         0      0.00
18:00:01        23702      5610     19395         0      0.00         0      0.00         0      0.00
18:10:01        24023      6120     19583         0      0.00         0      0.00         0      0.00
18:20:01        24093      6630     19522         0      0.00         0      0.00         0      0.00
18:30:01        24441      6630     19738         0      0.00         0      0.00         0      0.00

What I am referring to is the file-sz value increasing. In fact, on the system in question it continues to increase until it hits the system limit, and then processes start to fail, which is not what I want.
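To put a number on the trend rather than eyeballing it, here is a small Python sketch that pulls the file-sz column out of sar -v text and reports the net change. The function names are made up for illustration, and it assumes the column layout shown in the output above (it finds the column from the header row, so it should cope if the layout shifts).

```python
from typing import List, Tuple

# The sar -v sample from this post, pasted verbatim.
SAMPLE = """\
16:50:01    dentunusd   file-sz  inode-sz  super-sz %super-sz  dquot-sz %dquot-sz  rtsig-sz %rtsig-sz
17:00:01        22144      5100     18581         0      0.00         0      0.00         0      0.00
17:10:01        22274      5610     18591         0      0.00         0      0.00         0      0.00
17:20:01        22631      5610     18832         0      0.00         0      0.00         0      0.00
17:30:01        22744      5610     18822         0      0.00         0      0.00         0      0.00
17:40:01        23233      6120     19172         0      0.00         0      0.00         0      0.00
17:50:01        23563      6120     19381         0      0.00         0      0.00         0      0.00
18:00:01        23702      5610     19395         0      0.00         0      0.00         0      0.00
18:10:01        24023      6120     19583         0      0.00         0      0.00         0      0.00
18:20:01        24093      6630     19522         0      0.00         0      0.00         0      0.00
18:30:01        24441      6630     19738         0      0.00         0      0.00         0      0.00
"""


def file_sz_series(sar_output: str) -> List[Tuple[str, int]]:
    """Return (timestamp, file-sz) pairs from sar -v text output."""
    series = []
    column = None
    for line in sar_output.splitlines():
        fields = line.split()
        if "file-sz" in fields:
            # Header row: remember which column holds file-sz.
            column = fields.index("file-sz")
            continue
        if column is None or len(fields) <= column:
            continue
        if fields[0].startswith("Average") or not fields[column].isdigit():
            continue
        series.append((fields[0], int(fields[column])))
    return series


def net_change(series: List[Tuple[str, int]]) -> int:
    """Difference between the last and first file-sz samples."""
    return series[-1][1] - series[0][1] if series else 0


if __name__ == "__main__":
    samples = file_sz_series(SAMPLE)
    print(len(samples), "samples, net file-sz change:", net_change(samples))
```

Feeding it a longer capture would show whether the climb carries on between reboots or levels off.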

Any tips on pinpointing the application that might be causing it, and any associated commands, would be welcome. The platform is RHEL 5.3 x64, and the system is used to run three Oracle Database instances.
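One way to narrow it down, sketched in Python, is to count the entries under /proc/<pid>/fd for every process and sort by the count. The function names here are hypothetical, it needs root to see other users' processes, and per-process descriptor counts are only a rough proxy for the kernel-wide file-sz figure, so treat the result as a hint rather than proof.

```python
import os
from typing import Dict, List, Tuple


def open_fd_counts(proc_root: str = "/proc") -> Dict[int, int]:
    """Map each PID to the number of entries in its fd directory."""
    counts = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "sys"
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            counts[int(entry)] = len(os.listdir(fd_dir))
        except OSError:
            # The process exited, or we lack permission to read its fd directory.
            continue
    return counts


def top_consumers(counts: Dict[int, int], limit: int = 10) -> List[Tuple[int, int]]:
    """Return the (pid, fd count) pairs with the most open descriptors."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)[:limit]


if __name__ == "__main__":
    for pid, count in top_consumers(open_fd_counts()):
        print(pid, count)
```

Running it a few times between sar samples and comparing the top entries should show which process keeps growing.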

I have a suspicion about one application, and intend to have it shut down at some point and then monitor the output from sar -v for several samples to see whether the same pattern appears as in the output above.

EDIT: I believe I have found the cause, but won’t know until I can get approval to make the change and reboot the host. I had a feeling it might be something to do with nagios and some scripts that check various things. I still believe this to be the case, as I have found an issue with nrpe itself on the host. I will get approval to make the changes and reboot, and then post back the outcome.

Anthony Rumble: Rest In Peace

I’ve just learnt that Anthony Rumble passed away after an accident at home some days ago. My condolences to his family. I’d never met Anthony Rumble personally, however I had dealt with him a number of times in the early days of Everything Linux. Furthermore, I had exchanged emails with him in the past via SLUG.

All I can say is this is a great loss to his family and the community. Anthony Rumble was a great guy and someone whom I certainly had respect for. Anthony will be missed by everyone who had dealt with him.

Sun xVM

I’d heard a bit about Sun xVM but hadn’t actually had the chance to look at the product.

In the last day or two I have had a chance to install the product. All I can say is I am very impressed. It works extremely well. Not bad for a product that is free.

The GUI is easy enough to use, and the performance of the guests seems good too. It’s also good to see they have packaged tools for various supported operating systems, so the installed guest gets some extra handy functions.

The question is what will happen to this product and others once Oracle takes over. I certainly hope they don’t disappear.

linode VPS upgrade from debian 4.0 to 5.0

Finally took the plunge. Upgraded our linode VPS from debian 4.0 to 5.0.

Apart from an initial hiccup, it seems fine. Upgrade complete. Unfortunately I had to shut down the VPS instance to add a bit of disk space to the root, so that the dist-upgrade had enough room.

In other news, had a bit of a funny issue today at work. It was related to domain name resolution, and oddly enough another solution to the issue popped into my head just a while ago. It’s funny how this happens to us IT folk.

Basically it looked like a host lookup was failing, and we thought maybe the host record was missing from the internal DCs while it existed on the management host we have. Everything works for the customer when the host is using DNS from our management servers, but as soon as we use the customer’s primary DNS it seems to fail. My first thought was a possible missing record, however it then dawned on me that it could be related to the resolv.conf on the host.

Going from memory I think the resolv.conf is the issue. I remember it containing the following options;

domain blah.blah.com
nameserver w.x.y.z
#nameserver a.b.c.d
search blah.blah.com
search blah2.blah.com

And if the above holds true, I think I know why. The host record exists both in blah.blah.com and in blah2.blah.com. However, nameserver w.x.y.z only serves blah.blah.com, and nameserver a.b.c.d only serves blah2.blah.com. From my understanding of the man page for resolv.conf, you shouldn’t use search and domain together, and if there is more than one instance of either then only the last one is ever used. In this case that means the only suffix tried is blah2.blah.com, so every time we look up the host we end up asking for host.blah2.blah.com with the primary nameserver set as w.x.y.z, and it will always fail. If we hashed out the domain line and put both suffixes on a single search line, it should work, as the resolver would then try both domain suffixes against the host being looked up.
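To make that reading of the man page concrete, here is a small Python sketch that works out which search list would be in effect for a given resolv.conf. It is only a model of the "last instance of domain/search wins" rule as I understand it from resolv.conf(5), not the resolver's real code, and the function name is my own.

```python
from typing import List

# The file as I remember it, using the placeholder names from this post.
REMEMBERED = """\
domain blah.blah.com
nameserver w.x.y.z
#nameserver a.b.c.d
search blah.blah.com
search blah2.blah.com
"""


def effective_search_list(resolv_conf: str) -> List[str]:
    """Return the search list left after applying 'last domain/search wins'."""
    search = []
    for raw in resolv_conf.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue  # blank line or comment
        fields = line.split()
        if fields[0] == "domain" and len(fields) > 1:
            search = fields[1:2]  # domain names a single suffix
        elif fields[0] == "search" and len(fields) > 1:
            search = fields[1:]   # search replaces whatever came before
    return search


if __name__ == "__main__":
    print(effective_search_list(REMEMBERED))
    print(effective_search_list("search blah.blah.com blah2.blah.com\n"))
```

Under that rule the remembered file ends up with only blah2.blah.com in the search list, while a single search line carrying both suffixes keeps both.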

I’ll check the resolv.conf tomorrow when I am back in the office. I am pretty certain this is how it was, but if it’s not then it is back to the drawing board (as it might then end up being a missing host record between the two domains/nameservers).

Just so we are clear, technically the host shouldn’t be using the nameserver a.b.c.d, nor the domain suffix blah2.blah.com, as these are ONLY to be used from the management node and not for the entire environment. I will soon be making sure no one else depends on it and stopping it from answering requests in the future.

Events of the past week

It’s been an interesting week, so I thought I’d mention what happened. The week ended with me having to set up a CentOS 5.2 system and then install an application called Jira, which I continued to set up and tinker with today. The instructions were not the greatest, but I managed to get it configured in the end, although it was a bit of a wrestle to get things running in a fashion that would be suitable for the end user.

When I left work, it was working. Although as of now it is not, so I suspect a team member probably stopped the apache and tomcat instances. Guess I will check when I get back to work on Monday.