Troubleshooting a missing ZFS dataset
After a server reboot, the ZFS pool rpool no longer showed up:
#zfs list
no datasets available
The zfs kernel modules were loaded, though:
#lsmod | grep zfs
zfs 1230460 3
zunicode 331251 1 zfs
zavl 15010 1 zfs
zcommon 51321 1 zfs
znvpair 93262 2 zfs,zcommon
spl 290129 5 zfs,zavl,zunicode,zcommon,znvpair
The system log showed that the ZFS pool rpool could not be imported:
Jul 18 11:15:53 testtfs-1-1 zpool: cannot import 'rpool': one or more devices are already in use
Jul 18 11:15:53 testtfs-1-1 systemd: zfs-import-cache.service: main process exited, code=exited, status=1/FAILURE
Jul 18 11:15:53 testtfs-1-1 systemd: Failed to start Import ZFS pools by cache file.
Jul 18 11:15:53 testtfs-1-1 systemd: Unit zfs-import-cache.service entered failed state.
Jul 18 11:15:53 testtfs-1-1 systemd: Starting Mount ZFS filesystems...
...
Jul 18 11:15:53 testtfs-1-1 systemd: Started Mount ZFS filesystems.
Jul 18 11:15:53 testtfs-1-1 systemd: Mounted NFSD configuration filesystem.
Jul 18 11:15:54 testtfs-1-1 multipathd: sda: add path (uevent)
Jul 18 11:15:54 testtfs-1-1 multipathd: sda: spurious uevent, path already in pathvec
Jul 18 11:15:54 testtfs-1-1 multipathd: sda: No SAS end device for 'end_device-0:0'
Jul 18 11:15:54 testtfs-1-1 kernel: device-mapper: table: 253:21: multipath: error getting device
Jul 18 11:15:54 testtfs-1-1 kernel: device-mapper: ioctl: error adding target to table
Jul 18 11:15:54 testtfs-1-1 multipathd: HGST_HUS724020ALA640_PN2134P6HKEADP: failed in domap for addition of new path sda
Jul 18 11:15:54 testtfs-1-1 multipathd: uevent trigger error
This was compared against a healthy server.
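To line up the ZFS import and multipath messages from the faulty and the healthy server, a small filter such as the following may be handy. It is only a sketch: the function name `zfs_import_events` is made up for illustration, and `/var/log/messages` is an assumed log location (the syslog-style lines above suggest it, but the source does not say).

```shell
# Hypothetical helper: keep only syslog lines related to pool import,
# ZFS systemd units, multipathd and device-mapper. Reads a log on stdin.
zfs_import_events() {
    grep -E 'zpool: |zfs-(import|mount)|ZFS|multipathd: |device-mapper: '
}

# Usage (the log path is an assumption; adjust to the system at hand):
#   zfs_import_events < /var/log/messages
```

Running the same filter on both servers and diffing the results should make it easier to see which messages only appear on the faulty node.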
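The faulty server was rebooted again. This time the ZFS volumes were visible after boot, but the docker directory turned out to be empty.
Check the system log
Earlier in the log, before this entry, there was
Check the ZFS filesystems
Check the storage pool
Check data consistency: first start an explicit scrub of all data in the pool, then check its status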
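The scrub-then-check step can be sketched roughly as below. This is an illustration rather than the exact commands used here: the function name `scrub_and_check` and the overridable `ZPOOL` variable are assumptions introduced for this example, and the pool name is passed in as an argument.

```shell
# ZPOOL may be overridden for a dry run, e.g. ZPOOL="echo zpool".
# It is left unquoted below on purpose so that such an override word-splits.
ZPOOL="${ZPOOL:-zpool}"

# Hypothetical helper: start a scrub of the given pool, then print its status.
# The scrub runs in the background; `zpool status` reports its progress.
scrub_and_check() {
    pool="$1"
    $ZPOOL scrub "$pool" || return 1
    $ZPOOL status -v "$pool"
}

# Usage (pool name taken from this note):
#   scrub_and_check rpool
```

Because the scrub is asynchronous, the status command usually has to be repeated until the scan line reports completion; any checksum errors found would be listed by `zpool status -v`.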
Compare with a healthy server node
This shows that ZFS was not mounted properly on the faulty server; its mounts looked as follows
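One way to spot datasets that exist but are not mounted is to query the `mounted` property and filter on it. The snippet below is a minimal sketch under that assumption; `list_unmounted` is a hypothetical helper name, and it expects the tab-separated output of `zfs get -H -o name,value mounted` on stdin.

```shell
# Hypothetical helper: print the names of datasets whose `mounted`
# property is "no". Input is the tab-separated output of
# `zfs get -H -o name,value mounted`.
list_unmounted() {
    awk -F'\t' '$2 == "no" { print $1 }'
}

# Usage (pool name taken from this note):
#   zfs get -r -H -o name,value mounted rpool | list_unmounted
```

Running this on both the faulty and the healthy node gives a compact list to compare, instead of reading through the full mount table.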
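Mounting the ZFS volume rpool/docker failed with a message that the directory already contained files, so the /var/lib/docker directory was removed first and the mount was then retried
With the dataset mounted, the data was recovered successfully.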
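The recovery step might look like the sketch below. Note one deliberate difference from what was done here: instead of deleting the stale directory, the sketch renames it so that its contents can still be inspected afterwards. The function name `remount_dataset`, the `RUN` dry-run variable and the `.stale.<timestamp>` suffix are all assumptions made for illustration.

```shell
# RUN may be set to "echo" for a dry run that only prints the commands.
RUN="${RUN:-}"

# Hypothetical helper: move a non-empty mountpoint directory out of the way,
# then mount the dataset onto the now-free path.
remount_dataset() {
    dataset="$1"
    mountpoint="$2"
    backup="${mountpoint}.stale.$(date +%Y%m%d%H%M%S)"
    $RUN mv "$mountpoint" "$backup" || return 1
    $RUN zfs mount "$dataset"
}

# Usage (names taken from this note; stop docker first so that nothing
# recreates files under the mountpoint in the meantime):
#   remount_dataset rpool/docker /var/lib/docker
```

Once the dataset is mounted and its contents are confirmed, the renamed directory can be removed by hand.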
For detailed troubleshooting of disk failures, see "ZFS failed disk replacement".