Welcome to a Danish Virtualization blog! Thoughts, comments and tips and tricks on Virtualization topics are provided to you by Heino Skov and Nicolai Sandager.
The Virtual Troll
A virtualization blog!
ESXi and HA….
For the last year I have been building a VMware enterprise environment for a customer, planned to hold approximately 700 guests.
At the customer's request we went with ESXi. A decision I now, to some extent, regret.
One of the most annoying things has been the implementation of HA on ESXi. For stable performance of the HA agent, it is necessary to set up some swap space for the agent, as described in http://kb.vmware.com/kb/1004177
If you look at the best practices for deploying VMware 3.5, it is not recommended to have your swap space on SAN, due to the potential increase in I/O it can generate. In my world this applies to ESXi too... but would that not ruin the idea of having a diskless server booting from an embedded image?
We have chosen to install local disks just for swap use. This is, in my opinion, not a very elegant solution. It would have been nice to see the servers with ESXi Embedded delivered with some more space on the chips holding ESXi, so the swap could live there.
With that solved, we ran into more problems with HA. We had some issues with hosts suddenly being disconnected in the vCenter interface. All guests kept running, but the management part of the host was inaccessible. This looked very much like errors I had seen before, with logs filling up the root filesystem (/) on regular ESX. And sure enough, it was.
The HA logs are actually located in /var/log/vmware/aam/agent, which is not on its own partition. Hence this classic error.
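A quick way to confirm this kind of problem is to compare how full the filesystem is with how much the log directory holds. This is only a minimal sketch: it assumes the busybox shell on the host provides df, du and cut (I have not checked which applets every ESXi build ships), and the names LOG_DIR and dir_usage_kb are made up for the illustration.

```shell
#!/bin/sh
# Directory to inspect; defaults to the HA agent log path mentioned above.
# LOG_DIR is a hypothetical variable name used only in this sketch.
LOG_DIR="${LOG_DIR:-/var/log/vmware/aam/agent}"

# Print the size of a directory in kilobytes (0 if it does not exist).
dir_usage_kb() {
    if [ -d "$1" ]; then
        # du prints "<size><TAB><path>"; keep only the size field.
        du -sk "$1" | cut -f1
    else
        echo 0
    fi
}

# Show how full the filesystem holding the logs is (skipped if the path is missing).
if [ -d "$LOG_DIR" ]; then
    df -k "$LOG_DIR"
fi
echo "$LOG_DIR uses $(dir_usage_kb "$LOG_DIR") KB"
```

If the directory accounts for most of the used space on /, the logs are the likely culprit.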
The solution is to log into the hidden busybox console (press ALT-F1 on the ESXi console and type unsupported), log in with the root password and delete the files using, for example,
find /var/log/vmware/aam/rule -name '*.log' -exec rm {} \;
and then restart the management agents using "services.sh restart".
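Since this is an unsupported console and rm is not forgiving, it can be worth listing the files before deleting them. The following is just a sketch of how the cleanup could be wrapped. The function name clean_logs and its dry-run argument are my own invention, and it only uses the find options already shown above (-name and -exec) plus -print.

```shell
#!/bin/sh
# Delete *.log files below a directory.
# Usage: clean_logs <directory> [dry-run]
# With "dry-run" as second argument the files are only listed, not removed.
clean_logs() {
    dir="$1"
    mode="${2:-delete}"
    if [ ! -d "$dir" ]; then
        echo "no such directory: $dir" >&2
        return 1
    fi
    if [ "$mode" = "dry-run" ]; then
        find "$dir" -name '*.log' -print
    else
        find "$dir" -name '*.log' -exec rm {} \;
    fi
}

# Hypothetical usage on the host:
#   clean_logs /var/log/vmware/aam/rule dry-run
#   clean_logs /var/log/vmware/aam/rule && services.sh restart
```

Running it with dry-run first shows exactly what would be removed, and chaining the restart with && means the management agents are only restarted if the cleanup step did not fail.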
This is a known issue with VMware and will be addressed in Update 4, for which we do not have a release date at the moment.
All in all, this makes me seriously consider reinstalling the whole environment with regular ESX 3.5.
Feel free to leave a comment. Thanks in advance. Regards Heino.