Thursday, September 5, 2013

VMware data recovery troubleshooting

If the VDP backup fails , the following troubleshooting steps can be used

  1. SSH to the the VDP appliance and browse to the /usr/local/avamarclient
  2. Search for logs related to the VM :   grep -r -a "VM_NAME" ./*
  3. If you suspect it is snapshot related issue : grep -r -a " VM_name" ./* | grep "FATAL"
  4. To be more specific and to check messages for a certain date, try searching using the date : grep -r -a " VM_name" ./* | grep "2013-08-02"
  5. Sometimes we could get very useful information from the "info" messages as well. Inorder to narrow down to the same, you can use the command: grep -r -a "VM_name" ./var-* | grep "2013-07-03"
  6. The baove command will search only through the 'var-proxy' directories. It will display the entire log file. You can less it to view details for a specific date eg: less ./var-proxy-5/VMGROUP1-1378306800496-35fj52c29f48eeejef090b27edaeba3d868719e8-4016-vmimagew.log
    /2013-07-03 07:10:00

Error messages:

Message 1:
avvcbimage FATAL <16018>: The datastore information from VMX '[STORAGE-1] VMNAME_1/VMNAME.vmx' will not permit a restore or backup. 

Reason: The most common reason is that a snapshot file is present but it is not getting displayed in the snapshot manager.Inorder to resolve this, 
  1. SSH to the esx hosting the VM 
  2. Browse to the VM's datastore :cd /vmfs/volumes/datastore_name/VM_name/
  3. Check if there are any delta files in it ie files with -delat in name or -00001 etc
  4. Now check if any of these files are in use by checking the vmx file : grep "vmdk" ./*.vmx
  5. If the files are not being referenced in the vmx, we can safely delete or move the delta files to a temp directory: mkdir 0ld-delta-files ; mv vm_name.000*.vmdk old-delta-files/
  6. Confirm that the files have been deleted
Message 2:

avvcbimage FATAL <14688>: The VMX '[STORAGE-1] VMNAME_1/VMNAME.vmx could not be snapshot.

Reason:One possible reason is that you execute a backup and it overruns the scheduled backup in VDR

Message 3:
2013-07-03 17:00:57 avvcbimage Info <14642>: Deleting the snapshot 'VDP-137830742335fc52c29f98eeebef090b22edaeba3p868716e8', moref 'snapshot-17946'
2013-07-03 17:00:57 avvcbimage Info <0000>: Snapshot (snapshot-17946) removal for VMX '[STORAGE-1] VMNAME_1/VMNAME.vmx  task still in progress, sleep for 2
sec
2013-07-03 17:00:57 avvcbimage Info <0000>: Snapshot (snapshot-17946) removal for VMX '[STORAGE-1] VMNAME_1/VMNAME.vmx task was canceled.

2013-09-04 17:00:57 avvcbimage Info <0000>: Removal of snapshot 'VDP-VDP-137830742335fc52c29f98eeebef090b22edaeba3p868716e8' is not complete, moref 'snapshot-17946'

Reason:This is because VDP doesnt get enough time to delete the snapshots created during the backup operation.Solution is to change the timeout value to allow enough time for snapshots to commit.

To increase this timeout value:
1.Open an SSH session to the VDP server.
2.Change to the /usr/local/avamarclient/var directory using this command:
# cd /usr/local/avamarclient/var
3.Open the avvcbimage.cmd file using a text editor. For more information, see Editing files on an ESX host using vi or nano (1020302).
4.Add this entry to the file:
--subprocesstimeout=600
5.Restart the avagent service using this command:
# service avagent restart

Reference: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2044821

Thanks to my colleague Tom for his valuable inputs for this article



No comments:

Post a Comment