VMware Datastores Disappear After Rescan

We had a very odd situation occur on one of our VMware ESXi 7.0.3 clusters (VMware ESXi, 7.0.3, 20328353), which is using iSCSI with HPE Nimble Storage.

Some new storage was presented to the hosts via iSCSI. The first host was rescanned and the datastore was prepared as normal. Once that was complete, we started rescanning the rest of the hosts in the cluster one by one.

All seemed fine until we suddenly started to get reports that some VMs had failed. We investigated and found that four datastores (which already existed) had disappeared after the rescan. There was no obvious reason for them to have disappeared: they were presented to the hosts as normal and had been in use for months.

The only thing that set these four datastores apart, and they were actually delivered from two separate storage arrays, is that they are all “encrypted” datastores on the Nimble Storage side.

We reviewed /var/log/vmkernel.log on each host for around the time of the incident and found the following on one of the hosts (I’ve filtered it with grep to make it clearer):

[root@host7:/vmfs/volumes] cat  /var/log/vmkernel.log | grep "detected to be a snapshot"
2023-02-15T09:00:47.847Z cpu31:2097699)LVM: 11764: Device eui.7bb4db6152049ebc6c9ce9001bad5ab8:1 detected to be a snapshot:
2023-02-15T09:00:47.882Z cpu1:2097699)LVM: 11764: Device eui.08dfa58320101b306c9ce9001bad5ab8:1 detected to be a snapshot:
2023-02-15T09:00:50.035Z cpu57:2101141 opID=801b49c6)LVM: 11764: Device eui.cfb315ebff4717396c9ce9002f8a0e18:1 detected to be a snapshot:
2023-02-15T09:00:50.059Z cpu57:2101141 opID=801b49c6)LVM: 11764: Device eui.7bb4db6152049ebc6c9ce9001bad5ab8:1 detected to be a snapshot:
2023-02-15T09:00:50.064Z cpu57:2101141 opID=801b49c6)LVM: 11764: Device eui.08dfa58320101b306c9ce9001bad5ab8:1 detected to be a snapshot:
2023-02-15T09:00:50.084Z cpu57:2101141 opID=801b49c6)LVM: 11764: Device eui.fe7013f45e3cb0216c9ce9002f8a0e18:1 detected to be a snapshot:
2023-02-15T09:00:50.035Z cpu57:2101141 opID=801b49c6)LVM: 11764: Device eui.cfb315ebff4717396c9ce9002f8a0e18:1 detected to be a snapshot:
2023-02-15T12:51:33.969Z cpu27:2097699)LVM: 11764: Device eui.cfb315ebff4717396c9ce9002f8a0e18:1 detected to be a snapshot:
2023-02-15T12:51:33.981Z cpu27:2097699)LVM: 11764: Device eui.fe7013f45e3cb0216c9ce9002f8a0e18:1 detected to be a snapshot:
2023-02-15T12:51:34.081Z cpu27:2097699)LVM: 11764: Device eui.7bb4db6152049ebc6c9ce9001bad5ab8:1 detected to be a snapshot:
2023-02-15T12:51:34.092Z cpu27:2097699)LVM: 11764: Device eui.08dfa58320101b306c9ce9001bad5ab8:1 detected to be a snapshot:
2023-02-15T12:51:37.947Z cpu48:2098732)LVM: 11764: Device eui.fe7013f45e3cb0216c9ce9002f8a0e18:1 detected to be a snapshot:
2023-02-15T12:51:37.958Z cpu48:2098732)LVM: 11764: Device eui.08dfa58320101b306c9ce9001bad5ab8:1 detected to be a snapshot:
2023-02-15T12:51:37.964Z cpu48:2098732)LVM: 11764: Device eui.7bb4db6152049ebc6c9ce9001bad5ab8:1 detected to be a snapshot:
2023-02-15T12:51:37.982Z cpu48:2098732)LVM: 11764: Device eui.cfb315ebff4717396c9ce9002f8a0e18:1 detected to be a snapshot:
2023-02-15T12:51:41.070Z cpu46:2097805)LVM: 11764: Device eui.fe7013f45e3cb0216c9ce9002f8a0e18:1 detected to be a snapshot:
2023-02-15T12:51:41.090Z cpu46:2097805)LVM: 11764: Device eui.7bb4db6152049ebc6c9ce9001bad5ab8:1 detected to be a snapshot:
2023-02-15T12:51:41.093Z cpu46:2097805)LVM: 11764: Device eui.08dfa58320101b306c9ce9001bad5ab8:1 detected to be a snapshot:
2023-02-15T12:51:41.105Z cpu46:2097805)LVM: 11764: Device eui.cfb315ebff4717396c9ce9002f8a0e18:1 detected to be a snapshot:
2023-02-15T12:51:45.774Z cpu17:2098972)LVM: 11764: Device eui.cfb315ebff4717396c9ce9002f8a0e18:1 detected to be a snapshot:
2023-02-15T12:51:45.775Z cpu17:2098972)LVM: 11764: Device eui.fe7013f45e3cb0216c9ce9002f8a0e18:1 detected to be a snapshot:
2023-02-15T12:51:45.781Z cpu17:2098972)LVM: 11764: Device eui.08dfa58320101b306c9ce9001bad5ab8:1 detected to be a snapshot:
2023-02-15T12:51:45.788Z cpu17:2098972)LVM: 11764: Device eui.7bb4db6152049ebc6c9ce9001bad5ab8:1 detected to be a snapshot:
2023-02-15T12:51:46.154Z cpu71:2099094)LVM: 11764: Device eui.cfb315ebff4717396c9ce9002f8a0e18:1 detected to be a snapshot:
2023-02-15T12:51:46.168Z cpu71:2099094)LVM: 11764: Device eui.08dfa58320101b306c9ce9001bad5ab8:1 detected to be a snapshot:
2023-02-15T12:51:46.171Z cpu71:2099094)LVM: 11764: Device eui.7bb4db6152049ebc6c9ce9001bad5ab8:1 detected to be a snapshot:
2023-02-15T12:51:46.193Z cpu71:2099094)LVM: 11764: Device eui.fe7013f45e3cb0216c9ce9002f8a0e18:1 detected to be a snapshot:
2023-02-15T12:51:49.307Z cpu30:2100284)LVM: 11764: Device eui.08dfa58320101b306c9ce9001bad5ab8:1 detected to be a snapshot:
2023-02-15T12:51:49.328Z cpu30:2100284)LVM: 11764: Device eui.7bb4db6152049ebc6c9ce9001bad5ab8:1 detected to be a snapshot:
2023-02-15T12:51:49.396Z cpu5:2100284)LVM: 11764: Device eui.cfb315ebff4717396c9ce9002f8a0e18:1 detected to be a snapshot:
2023-02-15T12:51:49.405Z cpu5:2100284)LVM: 11764: Device eui.fe7013f45e3cb0216c9ce9002f8a0e18:1 detected to be a snapshot:

So, looking at a single rescan pass, the same four devices are flagged:
2023-02-15T12:51:33.969Z cpu27:2097699)LVM: 11764: Device eui.cfb315ebff4717396c9ce9002f8a0e18:1 detected to be a snapshot:
2023-02-15T12:51:33.981Z cpu27:2097699)LVM: 11764: Device eui.fe7013f45e3cb0216c9ce9002f8a0e18:1 detected to be a snapshot:
2023-02-15T12:51:34.081Z cpu27:2097699)LVM: 11764: Device eui.7bb4db6152049ebc6c9ce9001bad5ab8:1 detected to be a snapshot:
2023-02-15T12:51:34.092Z cpu27:2097699)LVM: 11764: Device eui.08dfa58320101b306c9ce9001bad5ab8:1 detected to be a snapshot:
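
With this many repeated lines it is easy to lose track of how many distinct devices are actually involved. As a rough sketch (the function name snapshot_devices is made up for this post, and it assumes the log line format shown above), something like the following should reduce the grep output to one line per device, with a count of how many times each was flagged. The partition suffix (the ":1") is stripped so that only the device identifier is left.

```shell
# Hypothetical helper: list each device that vmkernel.log reports as
# "detected to be a snapshot", with the number of times it was flagged.
# Usage: snapshot_devices [logfile]   (defaults to /var/log/vmkernel.log)
snapshot_devices() {
  grep "detected to be a snapshot" "${1:-/var/log/vmkernel.log}" |
    # Keep only the device identifier, dropping the ":<partition>" suffix.
    sed -n 's/.*Device \([^: ]*\):[0-9]* detected to be a snapshot.*/\1/p' |
    sort |
    uniq -c
}
```

Running snapshot_devices on each host in turn is a quick way to see whether every host flagged the same set of devices or only some of them did.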

It turns out that these EUI numbers correspond to the four datastores that went offline. We are still working to determine exactly what happened, but our best working hypothesis is that one of the hosts (host 7, the one whose log is shown above) incorrectly decided during a rescan that these four devices were snapshots and then proceeded to re-signature them. From the point of view of all the other hosts in the cluster using this storage, the re-signaturing would have made those datastores disappear as well, taking the VMs offline. Thus far we have no confirmation of this from VMware.

Additional Information

https://kb.vmware.com/s/article/1011387

1 thought on “VMware Datastores Disappear After Rescan”

  1. A rescan does not mount a datastore. “Force mounted” means a datastore which is a snapshot of another datastore was mounted without being re-signatured. So it has the same signature as the original datastore.

    You can run this to do a quick check on each host for force mounted volumes:

    esxcfg-info -a | grep "Force Mounted"
