I was working on Woodcrest’s server last night and noticed that Walnut was offline. When I looked into it, I found a couple of potentially major problems. The connectivity to the SAN was broken, and on reboot the server reported 2 issues:
1- DIMM 5 had experienced a failure, and the server would be operating without it. When I was finally able to reboot the server successfully, though, the hypervisor didn’t seem to see any missing memory, so we should probably place a call to Dell to find out what this message means. Here’s a screenshot of what came up while the server was booting:
2- The other issue was with the virtual drives created from the RAIDs. Apparently the server saw the iSCSI connection even before handing off to the hypervisor, and although the drive failure didn’t cause data loss, I think vCenter didn’t like it. After I figured out how to get past the error during bootup, vCenter reported that Drive 5 had failed. Here’s a screenshot of the drive failure:
When I went back to vCenter afterwards, neither error was being reported anymore, but both can still be found in the event history.
As I checked more of the servers, I found that 306-dc1 was not powering on. It was at this point that I realized that one of the drives was missing from 306-vmhost1. That drive is the backup drive. Unfortunately, it is not in any sort of redundant or RAID configuration, so we lost the backup data on it. We need to get it replaced ASAP in order to resume backups.
I needed a way to boot 306-dc1 without the missing virtual drive. Reconfiguring or removing the drive from the virtual machine’s GUI settings always failed with an “Invalid Device 0” error, so I had to resort to manually creating a VMX file without the scsi0:2 configuration in it. To do that, I logged in to 306-vmhost1 through ssh and went to /vmfs/volumes/306-VMStore1-Raid5/306-dc1. In there, I made a copy of the vmx file as a backup:
cp 306-dc1.vmx 306-dc1.vmx.bak
Then I edited the vmx: vi 306-dc1.vmx
and removed the following lines from it:
scsi0:2.present = "TRUE"
scsi0:2.fileName = "/vmfs/volumes/4a12862d-8b7b3294-df15-002219a47b1e/306-dc1/30
scsi0:2.mode = "independent-persistent"
scsi0:2.deviceType = "scsi-hardDisk"
scsi0:2.redo = ""
Doing this removed the 2nd drive from the 306-dc1 VM and finally allowed it to boot up successfully. We need to call Dell ASAP to get a replacement for that drive so that we can bring the backup drive back. Once the replacement is in place, we can create a new VMDK on it and use that as the backup drive. (Restoring the backed-up vmx will probably not work, as the associated VMDKs won’t exist on the new drive.)
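For next time, the manual edit described in this post (back up the vmx, then drop every scsi0:2 line) could be scripted instead of done by hand in vi. This is only a rough sketch, not what I actually ran: strip_vmx_device is a made-up helper name, and it assumes awk is available in the host's shell and that the VM is powered off while the file is changed.

```shell
#!/bin/sh
# Sketch only: strip_vmx_device is a hypothetical helper, not a VMware tool.
# It backs up a .vmx file and removes every line belonging to one device.
strip_vmx_device() {
    vmx="$1"      # path to the .vmx file, e.g. 306-dc1.vmx
    device="$2"   # device prefix to drop, e.g. scsi0:2

    # Keep a backup copy first, same as the manual cp step.
    cp "$vmx" "$vmx.bak" || return 1

    # Keep only lines that do NOT start with "<device>." (literal match).
    # The trailing dot stops scsi0:2 from also matching scsi0:20.
    awk -v p="$device." 'index($0, p) != 1' "$vmx.bak" > "$vmx"
}

# Example call, run from the VM's directory on the host:
#   strip_vmx_device 306-dc1.vmx scsi0:2
```

If the result looks wrong, copying 306-dc1.vmx.bak back over 306-dc1.vmx restores the original file.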
- This work was performed on 01/24/2010 @ 22:45