Troubleshooting a missing ZFS dataset

After a server reboot, the ZFS pool rpool was no longer listed

#zfs list
no datasets available

The zfs kernel modules are loaded

#lsmod | grep zfs
zfs                  1230460  3
zunicode              331251  1 zfs
zavl                   15010  1 zfs
zcommon                51321  1 zfs
znvpair                93262  2 zfs,zcommon
spl                   290129  5 zfs,zavl,zunicode,zcommon,znvpair

The system log shows that the ZFS pool rpool could not be imported:

Jul 18 11:15:53 testtfs-1-1 zpool: cannot import 'rpool': one or more devices are already in use
Jul 18 11:15:53 testtfs-1-1 systemd: zfs-import-cache.service: main process exited, code=exited, status=1/FAILURE
Jul 18 11:15:53 testtfs-1-1 systemd: Failed to start Import ZFS pools by cache file.
Jul 18 11:15:53 testtfs-1-1 systemd: Unit zfs-import-cache.service entered failed state.
Jul 18 11:15:53 testtfs-1-1 systemd: Starting Mount ZFS filesystems...
...
Jul 18 11:15:53 testtfs-1-1 systemd: Started Mount ZFS filesystems.
Jul 18 11:15:53 testtfs-1-1 systemd: Mounted NFSD configuration filesystem.
Jul 18 11:15:54 testtfs-1-1 multipathd: sda: add path (uevent)
Jul 18 11:15:54 testtfs-1-1 multipathd: sda: spurious uevent, path already in pathvec
Jul 18 11:15:54 testtfs-1-1 multipathd: sda: No SAS end device for 'end_device-0:0'
Jul 18 11:15:54 testtfs-1-1 kernel: device-mapper: table: 253:21: multipath: error getting device
Jul 18 11:15:54 testtfs-1-1 kernel: device-mapper: ioctl: error adding target to table
Jul 18 11:15:54 testtfs-1-1 multipathd: HGST_HUS724020ALA640_PN2134P6HKEADP: failed in domap for addition of new path sda
Jul 18 11:15:54 testtfs-1-1 multipathd: uevent trigger error
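To pull the relevant lines out of a busy boot log, a small filter like the following may help. It is only a sketch: the function name `zfs_boot_log_filter` is hypothetical, and the keyword list is taken from the log excerpt above. The multipathd and device-mapper entries are included because they appear right after the failed import and touch the same disks, although this page does not establish them as the cause.

```shell
#!/bin/sh
# Sketch: keep only boot-log lines that mention ZFS, zpool, multipathd
# or device-mapper. Reads syslog-style text on stdin, so it works with
# a log file or with journal output piped in.
zfs_boot_log_filter() {
    grep -E 'zpool|zfs|ZFS|multipathd|device-mapper'
}
```

Assuming the system logs to /var/log/messages (an assumption, this depends on the distribution), it could be used as `zfs_boot_log_filter < /var/log/messages`, or as `journalctl -b | zfs_boot_log_filter` on a systemd journal.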

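This was compared against a healthy server

The faulty server was rebooted once more. This time the ZFS volumes were visible after boot, but the docker directory turned out to be empty

Check the system log

Preceding this log entry:

Check the ZFS file system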

  • Check the storage pool

  • Check data consistency: first start an explicit scrub of all data in the pool, then check the status
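The two checks above can be sketched as a small shell function. The function name `check_pool` is hypothetical, and the pool name defaults to rpool as used on this page. It relies only on the standard `zpool status` and `zpool scrub` subcommands.

```shell
#!/bin/sh
# Sketch: check pool health, start a scrub, then show the scrub status.
# The pool name defaults to rpool, the pool discussed on this page.
check_pool() {
    pool="${1:-rpool}"
    zpool status "$pool" || return 1   # overall pool and vdev state
    zpool scrub "$pool" || return 1    # start an explicit scrub of all data
    zpool status -v "$pool"            # scrub progress plus any files with errors
}
```

A scrub runs in the background, so `zpool status -v rpool` may need to be repeated until the scan line reports completion.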

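Compare with a healthy server node

This shows that the faulty server did not mount its ZFS datasets properly. The mounts on this server look as follows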
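The original output is not reproduced here. As a sketch of how such a comparison might be made, the function below lists the file systems in a pool that ZFS reports as not mounted. The function name `unmounted_datasets` is hypothetical. It uses `zfs list` with the `mounted` and `mountpoint` properties.

```shell
#!/bin/sh
# Sketch: print "dataset<TAB>mountpoint" for every file system in the
# pool whose "mounted" property is "no".
# Note: datasets with canmount=off or mountpoint=none also show "no".
unmounted_datasets() {
    pool="${1:-rpool}"
    # -H: no header, tab-separated; -r: include child datasets
    zfs list -H -r -t filesystem -o name,mounted,mountpoint "$pool" |
        awk -F '\t' '$2 == "no" { print $1 "\t" $3 }'
}
```

Running `unmounted_datasets rpool` on both the healthy and the faulty node and comparing the results should make any dataset that failed to mount stand out.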

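Mounting the ZFS dataset rpool/docker failed with a report that the target directory already contained files, so the fix attempted was to remove the /var/lib/docker directory first and then mount the dataset

With the dataset mounted, the data was recovered successfully.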
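The recovery step above can be sketched as follows. This is a cautious variant rather than the exact commands used on this page: it moves the stray directory aside instead of deleting it. The function name `remount_dataset` and the `.bak.<timestamp>` suffix are hypothetical, and the dataset name rpool/docker is assumed from the mountpoint.

```shell
#!/bin/sh
# Sketch: move a non-empty mountpoint directory aside, then mount the dataset.
remount_dataset() {
    dataset="$1"      # e.g. rpool/docker (assumed dataset name)
    mountpoint="$2"   # e.g. /var/lib/docker
    if [ -d "$mountpoint" ] && [ -n "$(ls -A "$mountpoint")" ]; then
        # Keep the stray files under a new name instead of deleting them.
        mv "$mountpoint" "$mountpoint.bak.$(date +%Y%m%d%H%M%S)" || return 1
    fi
    zfs mount "$dataset" || return 1
    # Prints "yes" once ZFS reports the dataset as mounted.
    zfs get -H -o value mounted "$dataset"
}
```

It would be called as `remount_dataset rpool/docker /var/lib/docker`. Stopping the Docker service beforehand is probably wise so that it does not recreate files under the mountpoint. The backup directory can be deleted once the mounted data has been verified.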

For detailed troubleshooting of disk failures, see "Replacing a failed ZFS disk"
