Playing with RHEV 3.1

I took advantage of the snowy weekend to play with the new release of RHEV 3.1. This release comes with an impressive set of new features, including the removal of the Internet Explorer requirement! Also included (and to be reviewed in a future post) is the ability to use Gluster-based storage with RHEV!

My initial plan was to use my old laptop and install RHEV-M on it. That part went well; the installation is now really easy. I also wanted to test the new “All-In-One” plugin on it. This plugin allows one to install a complete, working RHEV environment on a single server. The AIO plugin configures a local data center, cluster, storage and a local host.

This plugin is not supported by Red Hat for production use, but it’s a welcome addition that makes it easier to demo the RHEV platform. Sadly, I didn’t have much success with it. I had multiple crashes and timeouts during the plugin configuration that left my RHEV-M installation broken. (Keep in mind that this plugin is still a proof of concept; the oVirt project is still working hard to make it reliable.)

So, the plan changed: I used a VM on my new laptop to install RHEV-M and I used my old laptop to install the RHEV hypervisor. One hour later, everything was working fine, I had a few VMs running in my “data center”, etc. I also installed the Reports engine on RHEV-M (a big 10-minute task!). The integration of reports into RHEV-M is absolutely awesome! I can right-click on anything and launch a report that gives me valuable information about my resources, workload, usage, etc.

The next step is to add Gluster-based storage to my data center and test the new storage live migration with it. I will post about my experience with it as soon as I can!

Beware of the semaphores!

I ran into a weird problem today while migrating a bunch of servers from an old HP SAN to a shiny new EMC VMAX.

The client chose Powerpath as the multipathing software on RHEL 5. The servers in question run multiple Oracle databases in a grid configuration. We had no problem with the old SAN and we had no problem with the new EMC VMAX using dm-multipath.

The problem started on the first reboot after installing Powerpath. All the devices were there and all the mount points worked fine, but Oracle refused to start, with the following message:

ORA-27154: post/wait create failed
ORA-27300: OS system dependent operation:semget failed with status: 28
ORA-27301: OS failure message: No space left on device
ORA-27302: failure occurred at: sskgpsemsper

Hmm, interesting. After searching on Oracle Metalink, we found that this message is normally related to an insufficient number of available semaphores. But all our other servers worked fine, and we had followed the Oracle recommendation when we initially set the number of semaphores.

Our systems are currently configured with 128 semaphore arrays, as per the Oracle recommendation. Using “ipcs -u”, we found out that all 128 available arrays were already in use, even before trying to start Oracle. With the “ipcs -s” command, we saw that the root user owned a huge number of semaphore arrays, 125 to be exact. Why do these systems have 125 semaphore arrays for the root user when our other systems have around 25-30?
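To make the per-user count less tedious than reading “ipcs -s” by eye, here is a minimal Python sketch that tallies semaphore arrays by owner. It assumes the usual five-column layout (key, semid, owner, perms, nsems), and the SAMPLE text is made up for illustration; it is not output from the servers in this post.

```python
from collections import Counter


def count_arrays_by_owner(ipcs_output: str) -> Counter:
    """Count semaphore arrays per owner from `ipcs -s` style text."""
    counts = Counter()
    for line in ipcs_output.splitlines():
        fields = line.split()
        # Data rows have five columns and a hexadecimal key in the first one;
        # banner and header lines are skipped by this check.
        if len(fields) == 5 and fields[0].startswith("0x"):
            counts[fields[2]] += 1
    return counts


# Hypothetical sample, only to show the expected input shape.
SAMPLE = """\
------ Semaphore Arrays --------
key        semid      owner      perms      nsems
0x00000000 0          root       600        1
0x00000000 32769      root       600        1
0x0052e2c1 65538      oracle     640        154
"""

if __name__ == "__main__":
    for owner, total in count_arrays_by_owner(SAMPLE).most_common():
        print(owner, total)
```

On a real system you would feed it the actual command output instead of SAMPLE, for example via subprocess.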

Here comes the Powerpath semaphore eater! If you use Powerpath in combination with an EMC VMAX SAN, Powerpath uses one semaphore array per LUN, per path to that LUN. So, if you have 4 paths to the EMC VMAX and 25 LUNs presented to the server, 100 semaphore arrays automatically go away at boot, not leaving enough for your other normal tasks.
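The arithmetic is simple, but it is worth writing down when planning a migration. The sketch below uses the numbers from this post (4 paths, 25 LUNs, a limit of 128); the figure of 25 other root-owned arrays is an assumption taken from the low end of what our other systems show, and the function names are my own.

```python
def powerpath_arrays(paths_per_lun: int, luns: int) -> int:
    """One semaphore array per LUN, per path to that LUN."""
    return paths_per_lun * luns


def arrays_left(limit: int, paths_per_lun: int, luns: int, other_arrays: int = 0) -> int:
    """Arrays still available once Powerpath and everything else took theirs."""
    return limit - powerpath_arrays(paths_per_lun, luns) - other_arrays


if __name__ == "__main__":
    used = powerpath_arrays(4, 25)
    print("Arrays used by Powerpath:", used)
    # other_arrays=25 is an assumed baseline, not a measured value.
    print("Arrays left for Oracle:", arrays_left(128, 4, 25, other_arrays=25))
```

With only a handful of arrays left, it is no surprise that Oracle could not create its own.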

This problem is easily fixed by raising the limit on the kernel.sem line in /etc/sysctl.conf. The semaphore array limit is the last of the four values on that line. You can view your current limits with the “ipcs -l” command. I plan to write a follow-up post shortly on using SystemTap to diagnose what consumes semaphore arrays during boot.
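As a rough illustration of which value to touch, here is a small Python sketch that parses a kernel.sem line and rewrites only the last field. The four fields are, in order, SEMMSL, SEMMNS, SEMOPM and SEMMNI (the array limit). The first three numbers in the example and the new limit of 256 are assumptions for illustration only; size the limit for your own number of LUNs and paths.

```python
def parse_kernel_sem(line: str) -> dict:
    """Split a 'kernel.sem = a b c d' line into its four named limits."""
    key, _, value = line.partition("=")
    if key.strip() != "kernel.sem":
        raise ValueError("not a kernel.sem line: %r" % line)
    semmsl, semmns, semopm, semmni = (int(v) for v in value.split())
    return {"semmsl": semmsl, "semmns": semmns, "semopm": semopm, "semmni": semmni}


def with_semmni(line: str, new_semmni: int) -> str:
    """Return the same line with only the array limit (last field) changed."""
    sem = parse_kernel_sem(line)
    return "kernel.sem = %d %d %d %d" % (
        sem["semmsl"], sem["semmns"], sem["semopm"], new_semmni)


if __name__ == "__main__":
    current = "kernel.sem = 250 32000 100 128"  # illustrative values
    print(with_semmni(current, 256))            # 256 is a hypothetical new limit
```

After editing /etc/sysctl.conf, “sysctl -p” reloads the file so the new limit applies without a reboot.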

For reference, see this article if you have a valid Red Hat subscription: https://access.redhat.com/knowledge/solutions/23696

Or this article on Oracle Metalink, with a valid subscription: 949468.1

SystemTap client/server tutorial

I just wrote a small tutorial about the client/server mode of SystemTap. This mode makes SystemTap easier to use in production environments by letting a dedicated server compile all your SystemTap scripts.

The tutorial is not 100% complete; I will add the items in the TODO section as soon as I have some time to do so. If you spot any errors or have any feedback, please feel free to comment here!

https://jfsaucier.wordpress.com/tutorials-and-howto/systemtap-server-and-client/