[smart] Configuring SMART Monitoring on Linux to Avoid Data Loss from Disk Failure

admin, April 7, 2020 10:46, Linux

# Introduction

Running on a single disk without backups is risky, because the disk can fail at any time. On a Windows desktop you can often tell from day-to-day use that a disk is going bad, but Linux usually runs on servers that nobody logs in to very often. That calls for a tool that checks the disk automatically and raises an alert when something is wrong.

# Installing smartmontools

S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology) is a set of indicators for assessing disk health. The `smartmontools` package provides `smartctl` and `smartd`, which monitor the SMART status of ATA and SCSI disks. With them we can set up reporting for disk degradation and error warnings.

```bash
root@PxeCtrlSys:~# apt-get install smartmontools

# If the package cannot be installed, download the tarball and install from it
root@PxeCtrlSys:~# wget https://downloads.sourceforge.net/smartmontools/smartmontools-7.0.tar.gz
root@PxeCtrlSys:~# ls
smartmontools-7.0.tar.gz
root@PxeCtrlSys:~# tar xzf smartmontools-7.0.tar.gz
root@PxeCtrlSys:~# ls
smartmontools-7.0  smartmontools-7.0.tar.gz
root@PxeCtrlSys:~# cd smartmontools-7.0
root@PxeCtrlSys:~/smartmontools-7.0# ./configure
configure:
-----------------------------------------------------------------------------
smartmontools-7.0 configuration:
host operating system:  x86_64-pc-linux-gnu
C++ compiler:           g++
C compiler:             gcc
preprocessor flags:
C++ compiler flags:     -g -O2 -Wall -W -Wformat=2 -fstack-protector-strong
C compiler flags:       -g -O2
linker flags:
OS specific modules:    os_linux.o cciss.o dev_areca.o
binary install path:    /usr/local/sbin
man page install path:  /usr/local/share/man
doc file install path:  /usr/local/share/doc/smartmontools
examples install path:  /usr/local/share/doc/smartmontools/examplescripts
drive database file:    /usr/local/share/smartmontools/drivedb.h
database update script: /usr/local/sbin/update-smart-drivedb
database update branch: branches/RELEASE_7_0_DRIVEDB
download tools:         curl wget lynx svn
GnuPG for verification: gpg
local drive database:   /usr/local/etc/smart_drivedb.h
smartd config file:     /usr/local/etc/smartd.conf
smartd warning script:  /usr/local/etc/smartd_warning.sh
smartd plugin path:     /usr/local/etc/smartd_warning.d
PATH within scripts:    /usr/local/bin:/usr/bin:/bin
smartd initd script:    [disabled]
smartd save files:      [disabled]
smartd attribute logs:  [disabled]
SELinux support:        no
libcap-ng support:      no
systemd notify support: no
NVMe DEVICESCAN:        yes
-----------------------------------------------------------------------------
configure: WARNING:
The default for the inclusion of NVME devices in smartd.conf
'DEVICESCAN' and 'smartctl --scan' has been changed to 'yes' on
this platform. If '--without-nvme-devicescan' is still needed,
please inform smartmontools-support@listi.jpberlin.de.
Use option '--with-nvme-devicescan' to suppress this warning.
configure: WARNING:
systemd(1) is used on this system but smartd systemd notify support
will not be available because libsystemd-dev[el] package is not installed.
Use option '--without-libsystemd' to suppress this warning.
root@PxeCtrlSys:~/smartmontools-7.0# make install

# List the disks and partitions
root@PxeCtrlSys:~# fdisk -l
Disk /dev/sda: 465.8 GiB, 500107862016 bytes, 976773168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: dos
Disk identifier: 0xeb8c9d78

Device     Boot     Start       End   Sectors   Size Id Type
/dev/sda1  *         2048 960192511 960190464 457.9G 83 Linux
/dev/sda2       960194558 976771071  16576514   7.9G  5 Extended
/dev/sda5       960194560 976771071  16576512   7.9G 82 Linux swap / Solaris
```

# Viewing all disk health information: -a or --all

For an overview of the disk, use `-a` or `--all` to print all SMART information about it. Use `-x` or `--xall` to print all SMART information plus the non-SMART information about the disk.

```bash
root@PxeCtrlSys:~# smartctl -a /dev/sda
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.16.0-4-amd64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

# This section can also be printed on its own with: smartctl --info /dev/sda
# It shows the basic identity information of the disk
=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.14 (AF)
Device Model:     ST500DM002-1BD142
Serial Number:    W2AB2D8M
LU WWN Device Id: 5 000c50 0494f78ba
Firmware Version: KC45
User Capacity:    500,107,862,016 bytes [500 GB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sat Sep 21 16:21:12 2019 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
# End of the information section

# The parts of this section can be printed separately: the health result with
# smartctl -H, the capabilities with smartctl -c, and the attribute table with
# smartctl -A /dev/sda
# The "READ SMART DATA" section reports the overall health of the disk. The
# result is either PASSED or FAILED. FAILED means a hardware failure is
# imminent, so start backing up the important data on this disk right away!
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  600) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (  83) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x303f) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   106   099   006    Pre-fail  Always       -       12286208
  3 Spin_Up_Time            0x0003   100   100   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       908
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   087   060   030    Pre-fail  Always       -       660478633
  9 Power_On_Hours          0x0032   066   066   000    Old_age   Always       -       29879
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       906
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0 0 0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   065   051   045    Old_age   Always       -       35 (Min/Max 33/38)
194 Temperature_Celsius     0x0022   035   049   000    Old_age   Always       -       35 (0 7 0 0 0)
195 Hardware_ECC_Recovered  0x001a   016   015   000    Old_age   Always       -       12286208
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       29880h+46m+19.184s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       161062691
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       3307571432
# End of the SMART data

# The SMART attribute table lists the attributes the manufacturer defined for
# this disk, together with the failure threshold of each one. The drive
# firmware generates and updates the table automatically.
# ID: the attribute ID, usually a decimal or hexadecimal number from 1 to 255.
# ATTRIBUTE_NAME: the attribute name defined by the disk manufacturer.
# FLAG: the attribute handling flags (can be ignored).
# VALUE: one of the most important columns. It is the normalized value of the
#   attribute, from 1 to 253, where 253 is the best case and 1 the worst.
#   Depending on the attribute and the manufacturer, the initial VALUE may be
#   set to 100 or 200.
# WORST: the lowest VALUE ever recorded.
# THRESH: the lowest value WORST may reach before the disk is reported as FAILED.
# TYPE: the attribute type (Pre-fail or Old_age). A Pre-fail attribute is a
#   critical one that takes part in the overall SMART health assessment
#   (PASSED/FAILED). If any Pre-fail attribute fails, the disk should be
#   considered about to fail. An Old_age attribute is non-critical (normal
#   wear, for example) and does not by itself mean the disk is failing.
# UPDATED: how the attribute is updated. Always means it is updated during
#   normal operation as well; Offline means it is updated only when offline
#   tests run on the disk.
# WHEN_FAILED: FAILING_NOW if VALUE is less than or equal to THRESH,
#   In_the_past if WORST is less than or equal to THRESH, and "-" otherwise.
#   With FAILING_NOW, back up important files as soon as possible, especially
#   when the attribute is of type Pre-fail. In_the_past means the attribute
#   failed at some point but was fine when this report was taken. "-" means
#   the attribute has never failed.
# RAW_VALUE: the raw value defined by the manufacturer. The normalized VALUE
#   is derived from it, not the other way around.

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
```

# Enabling SMART

If the `SMART support is:` line in the output above shows `Disabled` instead of `Enabled`, SMART is turned off on the disk. Enable it with the following command.

```bash
root@PxeCtrlSys:~# smartctl -s on /dev/sda
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.16.0-4-amd64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF ENABLE/DISABLE COMMANDS SECTION ===
SMART Enabled.
```

# Viewing the manual or help

```bash
man smartctl   # Read the manual page
smartctl -h    # Print the built-in help
```

# Checking disk health: -H or --health

Once SMART is enabled, the `-H` or `--health` option manually checks the health of a disk, including portable drives.

```bash
root@PxeCtrlSys:~# smartctl -H /dev/sda
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.16.0-4-amd64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
```

If the result is `FAILED`, the values at fault are listed below it.

# Running self-tests manually: -t (background) or -C (foreground)

```bash
smartctl -t short <device>      # Short self-test, runs in the background
smartctl -t long <device>       # Long self-test, runs in the background
smartctl -C -t short <device>   # Short self-test, runs in the foreground (captive mode)
smartctl -C -t long <device>    # Long self-test, runs in the foreground (captive mode)
smartctl -X <device>            # Abort a self-test that is running in the background
```

# Viewing the test and error logs: -l

`smartctl -l logtype <device>` prints one of the disk's logs. There are several log types, such as `selftest` and `error`. For example:

```bash
smartctl -l selftest /dev/sda   # Show the self-test log
smartctl -l error /dev/sda      # Show the error log
```

# Running smartd as a service (unfinished)

Location of the configuration files:

```bash
root@PxeCtrlSys:/usr/local/etc# pwd
/usr/local/etc
root@PxeCtrlSys:/usr/local/etc# ls
smartd.conf  smartd_warning.d  smartd_warning.sh
root@PxeCtrlSys:/etc/default# whereis smartd
smartd: /usr/local/sbin/smartd /usr/local/etc/smartd.conf
root@PxeCtrlSys:/etc/default# whereis smartctl
smartctl: /usr/local/sbin/smartctl
```

Usage of smartd:

```bash
root@PxeCtrlSys:~# /usr/local/sbin/smartd -h
smartd 7.0 2018-12-30 r4883 [x86_64-linux-3.16.0-4-amd64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

Usage: smartd [options]

  -A PREFIX, --attributelog=PREFIX
        Log ATA attribute information to {PREFIX}MODEL-SERIAL.ata.csv

  -B [+]FILE, --drivedb=[+]FILE
        Read and replace [add] drive database from FILE
        [default is +/usr/local/etc/smart_drivedb.h
         and then    /usr/local/share/smartmontools/drivedb.h]

  -c NAME|-, --configfile=NAME|-
        Read configuration file NAME or stdin
        [default is /usr/local/etc/smartd.conf]

  -d, --debug
        Start smartd in debug mode

  -D, --showdirectives
        Print the configuration file Directives and exit

  -h, --help, --usage
        Display this help and exit

  -i N, --interval=N
        Set interval between disk checks to N seconds, where N >= 10

  -l local[0-7], --logfacility=local[0-7]
        Use syslog facility local0 - local7 or daemon [default]

  -n, --no-fork
        Do not fork into background

  -p NAME, --pidfile=NAME
        Write PID file NAME

  -q WHEN, --quit=WHEN
        Quit on one of: nodev, errors, nodevstartup, never, onecheck, showtests

  -r, --report=TYPE
        Report transactions for one of: ioctl[,N], ataioctl[,N], scsiioctl[,N], nvmeioctl[,N]

  -s PREFIX, --savestates=PREFIX
        Save disk states to {PREFIX}MODEL-SERIAL.TYPE.state

  -w NAME, --warnexec=NAME
        Run executable NAME on warnings
        [default is /usr/local/etc/smartd_warning.sh]

  -V, --version, --license, --copyright
        Print License, Copyright, and version information
```
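
Since the smartd section is still unfinished, the manual checks described earlier can be scripted as a stopgap. The following is only a minimal sketch, not part of smartmontools: `smart_flagged_attrs` is a hypothetical helper name, and it assumes the ten-column ATA attribute table that `smartctl -A` printed earlier in this article. It reads that output on stdin and reports every attribute whose WHEN_FAILED column is something other than `-`.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper, not shipped with smartmontools):
# read `smartctl -A <device>` output on stdin and print one line per
# attribute that smartctl has flagged, as "ID NAME TYPE WHEN_FAILED".
# Returns 1 if at least one attribute is flagged, 0 otherwise.
smart_flagged_attrs() {
    awk '
        # Attribute rows start with a numeric ID and have at least 10 columns:
        # ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        $1 ~ /^[0-9]+$/ && NF >= 10 && $9 != "-" {
            print $1, $2, $7, $9
            flagged = 1
        }
        END { exit (flagged ? 1 : 0) }
    '
}

# Example call (assumption: run as root against a real device):
#   smartctl -A /dev/sda | smart_flagged_attrs || echo "SMART attribute flagged on /dev/sda"
```

Fed the attribute table shown earlier, the function prints nothing and returns 0, because every row has `-` in the WHEN_FAILED column. A call like the commented one could be run periodically, for example from cron, until smartd itself is configured.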
