ProxmoxVE (smartd): Ignoring SMART Alerts for a Single Disk
Background
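The PVE system disk is a mirror built from three NVMe drives. Two of them have reached the end of their rated lifespan but still work normally, so I don't plan to replace them yet. Meanwhile, smartd sends a reminder email every day: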
Device: /dev/nvme3, Critical Warning (0x04): Reliability
Device info: SAMSUNG MZ9LQ128HBHQ-000H1, ...
The goal is to silence the alerts for this one disk only, while all the other disks stay monitored.
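For reference, the Critical Warning value in that alert is a bitmask defined by the NVMe specification, and 0x04 is bit 2, "NVM subsystem reliability degraded". The small bash sketch below is my own addition, not from the original post; the function name decode_critical_warning is hypothetical, and it only decodes bits 0 to 4.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decode an NVMe SMART "Critical Warning" byte into
# the flag names from the NVMe specification. Only bits 0-4 are decoded.
decode_critical_warning() {
  local value=$(( $1 ))   # bash arithmetic accepts hex (0x04) or decimal
  local -a names=(
    "available spare below threshold"       # bit 0
    "temperature outside threshold"         # bit 1
    "NVM subsystem reliability degraded"    # bit 2
    "media placed in read-only mode"        # bit 3
    "volatile memory backup device failed"  # bit 4
  )
  local bit
  for bit in "${!names[@]}"; do
    # Print one line for every bit that is set in the value.
    if (( value & (1 << bit) )); then
      echo "bit $bit: ${names[bit]}"
    fi
  done
}

# Example: decode_critical_warning 0x04
# prints: bit 2: NVM subsystem reliability degraded
```

With 0x04 only the bit 2 line is printed, which matches the "Reliability" label in the smartd mail.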
Configuration
By default, /etc/smartd.conf scans and monitors every device:
DEVICESCAN -d removable -n standby -m root -M exec /usr/share/smartmontools/smartd-runner
Thanks to the smartd.conf page in the Arch manual. To exclude one disk, there are two key points:
- The -d ignore directive makes smartd skip the specified device.
- Entries written before the DEVICESCAN line take precedence over it.
So all it takes is one extra line above DEVICESCAN:
/dev/nvme3 -d ignore
DEVICESCAN -d removable -n standby -m root -M exec /usr/share/smartmontools/smartd-runner
To verify, run smartd -q onecheck 2>&1 | grep -i monitor. If /dev/nvme3 is missing from the monitor list, it has been excluded and the configuration works.
root@serverhub:~# smartd -q onecheck 2>&1 | grep -i monitor
Device: /dev/nvme0, is SMART capable. Adding to "monitor" list.
Device: /dev/nvme1, is SMART capable. Adding to "monitor" list.
Addendum
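As everyone knows, a name like /dev/nvme3 is cursed: when the hardware changes, the numbering is recalculated at boot and the same disk may end up with a different name. The proper way is to use a /dev/disk/by-xxx/xxx path instead.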
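However, nothing under /dev/disk/by-id/ points to /dev/nvme3 itself; the links only point to namespace devices such as /dev/nvme3n1, and an ignore entry that uses a namespaced name like /dev/nvme3n1 does not work.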
root@serverhub:~# ls -l /dev/disk/by-id/
total 0
lrwxrwxrwx 1 root root 13 Jan 8 18:26 nvme-SAMSUNG_MZ9LQ128HBHQ-000H1_S5MJNX0R448741 -> ../../nvme3n1
lrwxrwxrwx 1 root root 13 Jan 8 18:26 nvme-SAMSUNG_MZ9LQ128HBHQ-000H1_S5MJNX0R448741_1 -> ../../nvme3n1
lrwxrwxrwx 1 root root 15 Jan 8 18:26 nvme-SAMSUNG_MZ9LQ128HBHQ-000H1_S5MJNX0R448741_1-part1 -> ../../nvme3n1p1
lrwxrwxrwx 1 root root 15 Jan 8 18:26 nvme-SAMSUNG_MZ9LQ128HBHQ-000H1_S5MJNX0R448741_1-part2 -> ../../nvme3n1p2
lrwxrwxrwx 1 root root 15 Jan 8 18:26 nvme-SAMSUNG_MZ9LQ128HBHQ-000H1_S5MJNX0R448741_1-part3 -> ../../nvme3n1p3
lrwxrwxrwx 1 root root 15 Jan 8 18:26 nvme-SAMSUNG_MZ9LQ128HBHQ-000H1_S5MJNX0R448741-part1 -> ../../nvme3n1p1
lrwxrwxrwx 1 root root 15 Jan 8 18:26 nvme-SAMSUNG_MZ9LQ128HBHQ-000H1_S5MJNX0R448741-part2 -> ../../nvme3n1p2
lrwxrwxrwx 1 root root 15 Jan 8 18:26 nvme-SAMSUNG_MZ9LQ128HBHQ-000H1_S5MJNX0R448741-part3 -> ../../nvme3n1p3
lrwxrwxrwx 1 root root 13 Jan 8 18:26 nvme-SAMSUNG_MZ9LQ128HBHQ-000H1_S5MJNX0R448748 -> ../../nvme0n1
lrwxrwxrwx 1 root root 13 Jan 8 18:26 nvme-SAMSUNG_MZ9LQ128HBHQ-000H1_S5MJNX0R448748_1 -> ../../nvme0n1
lrwxrwxrwx 1 root root 15 Jan 8 18:26 nvme-SAMSUNG_MZ9LQ128HBHQ-000H1_S5MJNX0R448748_1-part1 -> ../../nvme0n1p1
lrwxrwxrwx 1 root root 15 Jan 8 18:26 nvme-SAMSUNG_MZ9LQ128HBHQ-000H1_S5MJNX0R448748_1-part2 -> ../../nvme0n1p2
lrwxrwxrwx 1 root root 15 Jan 8 18:26 nvme-SAMSUNG_MZ9LQ128HBHQ-000H1_S5MJNX0R448748_1-part3 -> ../../nvme0n1p3
lrwxrwxrwx 1 root root 15 Jan 8 18:26 nvme-SAMSUNG_MZ9LQ128HBHQ-000H1_S5MJNX0R448748-part1 -> ../../nvme0n1p1
lrwxrwxrwx 1 root root 15 Jan 8 18:26 nvme-SAMSUNG_MZ9LQ128HBHQ-000H1_S5MJNX0R448748-part2 -> ../../nvme0n1p2
lrwxrwxrwx 1 root root 15 Jan 8 18:26 nvme-SAMSUNG_MZ9LQ128HBHQ-000H1_S5MJNX0R448748-part3 -> ../../nvme0n1p3
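One possible workaround, which is my own assumption and not something tested in this post: start from a by-id link (stable, since it carries the serial number), resolve it to the namespace device, strip the namespace and partition suffix to get the controller name that DEVICESCAN reports (such as /dev/nvme3), and write a "-d ignore" line for that name. The helper names nvme_ctrl_from_byid and add_ignore_line are hypothetical. The sketch prints the patched file to stdout instead of editing /etc/smartd.conf in place.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, a sketch only.

# Resolve a /dev/disk/by-id/ NVMe link and reduce the target to its
# controller device: nvme3n1 -> /dev/nvme3, nvme0n1p2 -> /dev/nvme0.
nvme_ctrl_from_byid() {
  local target name re='^(nvme[0-9]+)n[0-9]+(p[0-9]+)?$'
  target=$(readlink -f -- "$1") || return 1
  name=${target##*/}   # keep only the last path component
  if [[ $name =~ $re ]]; then
    echo "/dev/${BASH_REMATCH[1]}"
  else
    echo "not an NVMe namespace/partition device: $target" >&2
    return 1
  fi
}

# Print the config file with "<ctrl> -d ignore" inserted before the first
# DEVICESCAN line. Nothing is inserted if that exact line already exists
# or if the file has no DEVICESCAN line.
add_ignore_line() {
  local conf=$1 ctrl=$2
  awk -v line="$ctrl -d ignore" '
    $0 == line { seen = 1 }
    { buf[NR] = $0 }
    END {
      inserted = seen
      for (i = 1; i <= NR; i++) {
        if (!inserted && buf[i] ~ /^[ \t]*DEVICESCAN/) { print line; inserted = 1 }
        print buf[i]
      }
    }' "$conf"
}

# Usage (review the output before copying it over /etc/smartd.conf):
#   ctrl=$(nvme_ctrl_from_byid /dev/disk/by-id/nvme-SAMSUNG_MZ9LQ128HBHQ-000H1_S5MJNX0R448741)
#   add_ignore_line /etc/smartd.conf "$ctrl" > /tmp/smartd.conf.new
```

The result is still a /dev/nvmeN name, so this only helps if it is rerun whenever the numbering might have changed (for example before smartd starts at boot); how to hook that in is left open here.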
https://www.homelabproject.cc/posts/proxmox/proxmoxve--smartd--单独忽略某块硬盘的-smart-报警/