NetBackup 2505: Semaphores have run out

Issue:

The NetBackup database has gone offline and all jobs are reporting a 2505 error. The online community recommends setting the semaphore limits to the officially recommended values, but backupserver already has those settings in place.

Let’s dig into that.

View the current limits: `sysctl -a | grep kernel.sem`
kernel.sem = 300 307200 32 1024
View the current usage in a very messy manner: `ipcs`
Count the semaphore arrays currently in use:
ipcs | grep Semaphore\ Arrays -A2000 | grep ^0 | wc -l (yes… there are cleaner ways of doing this; I am not making this pretty though)

This showed 1024 semaphore arrays in use, out of a maximum of 1024 (the last of the four kernel.sem values).
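For reference, here is a small sketch of how the four kernel.sem numbers break down and how the count above could be done a little more cleanly. The function names `describe_sem` and `count_arrays` are made up for this example; the field order (SEMMSL, SEMMNS, SEMOPM, SEMMNI) is the standard Linux one.

```shell
# describe_sem and count_arrays are hypothetical helper names for this sketch.

# Label the four kernel.sem values; Linux orders them SEMMSL SEMMNS SEMOPM SEMMNI.
describe_sem() {
    echo "SEMMSL (max semaphores per array): $1"
    echo "SEMMNS (max semaphores system-wide): $2"
    echo "SEMOPM (max operations per semop call): $3"
    echo "SEMMNI (max semaphore arrays): $4"
}

# Count semaphore arrays from `ipcs -s` output on stdin (data rows start with a 0x key).
count_arrays() {
    grep -c '^0x'
}

# Values taken from the kernel.sem line shown above.
describe_sem 300 307200 32 1024
```

On a live host this would presumably be driven with `describe_sem $(cat /proc/sys/kernel/sem)` and `ipcs -s | count_arrays`. It is the last value, SEMMNI, that the 1024 arrays in use ran into.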

Stopping the DB only dropped that to 1015 in use…

The thought at this point was that either NetBackup was not releasing semaphores, or another process was consuming them. I hear they are tasty.

Let’s map semaphores to PIDs. This is a huge hassle to do manually, as you have to look up each semaphore ID with `ipcs -s -i` (`-s` selects semaphores, `-i` means “print details on resource identified by id”).

Can anyone say “FOR LOOP”?

for semid in $( ipcs -s | awk '/0x/{ print $2 }' ) ; do ipcs -s -i $semid | head -9 | tail -n1 | awk '{print $5}' ; done | uniq -c

… this pulls the semaphore ID (the second column, after the key) out of the results of `ipcs -s`, then tosses each ID to `ipcs -s -i` for a lookup on that specific semaphore array and prints the PID column of its first semaphore. Easy.

We get this fallout (with the highly sophisticated `uniq -c` tool… sarcasm is enhanced after 6 cups of coffee). Since `uniq -c` only counts consecutive repeats, the same PID can show up in more than one row:

3 2137
1 32550
5 2137
683 32550
1 36917
3 32550
281 36917
1 16356
6 36917
23 16356
1 6639
3 16356

We see a pattern here: PID 32550 has a bunch open (687 across its three rows)!
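To make the pattern easier to read, the per-run counts can be added up per PID. This is a minimal sketch; `sum_by_pid` is a made-up helper name, and the input is just the listing above pasted back in.

```shell
# sum_by_pid is a hypothetical helper name for this sketch.
# Reads "count pid" pairs (as printed by uniq -c) on stdin and prints
# one "total pid" line per PID, largest consumer first.
sum_by_pid() {
    awk '{ total[$2] += $1 } END { for (pid in total) print total[pid], pid }' | sort -rn
}

# The rows from the listing above, fed in as-is.
sum_by_pid <<'EOF'
3 2137
1 32550
5 2137
683 32550
1 36917
3 32550
281 36917
1 16356
6 36917
23 16356
1 6639
3 16356
EOF
# prints:
# 687 32550
# 288 36917
# 27 16356
# 8 2137
# 1 6639
```

Piping the loop through `sort | uniq -c` instead of a bare `uniq -c` would give the same totals directly.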

So… what is that process?

Given that our max PID is 40960 on backupserver (see: `cat /proc/sys/kernel/pid_max`), the PID could have wrapped around and been reused… but if we are to believe what /var/log/messages has to say…

This was ‘nfsidmap[32550]’

The second most semaphore-loving PID was also nfsidmap: nfsidmap[36917]

A process that is no longer alive never released these. This was likely an issue with nfsidmap.
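Whether the owning PIDs are still around can be checked quickly. The following is only a sketch, assuming a Linux host with /proc mounted; `pid_status` is a made-up helper name, and what it prints depends entirely on the machine it runs on.

```shell
# pid_status is a hypothetical helper name for this sketch (Linux only).
# Prints "<pid> alive" if /proc/<pid> exists, "<pid> gone" otherwise.
pid_status() {
    if [ -d "/proc/$1" ]; then
        echo "$1 alive"
    else
        echo "$1 gone"
    fi
}

# The PIDs from the listing above; results depend on the host.
for pid in 2137 32550 36917 16356 6639; do
    pid_status "$pid"
done
```

An "alive" result does not prove it is the same process, since PIDs get reused after a wrap-around; something like `ps -p 32550 -o comm=` would show what is actually running under that PID now.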

So I rebooted backupserver, satisfied that the mine was still active in our environment… lurking for another day.

Conclusion:

Running backups again…

Only seeing 13 semaphores actively used:

[root@backupserver ~]# ipcs | grep Semaphore\ Arrays -A2000 | grep ^0 | wc -l

13
