SmartMon Tool


Introduction

Real Time Monitoring (RTM) lets OVH supervise your server in real time and uses S.M.A.R.T. to collect the most important information about your hard disk. A cron task on your server regularly sends this information to our monitoring interface, so that we can react quickly when a problem is detected.

SmartMon Tools is a hard disk analysis tool. It checks the most critical physical characteristics of the disk. It comes in two parts:

  • The smartd daemon, which checks the disk attributes every 30 minutes and writes the results to /var/log/messages.
  • The smartctl command, run as root, which displays all the S.M.A.R.T. information for a disk.


Process


SmartMon Tool activation / installation

Connect to your server as root. It is important to log in directly as root: if you log in as an ordinary user and then use the "sudo" command, the installation will fail and smartctl will not run correctly. Once connected, apply the latest release by patching your server:


[root@delirium root]# wget ftp://ftp.ovh.net/made-in-ovh/release/patch-all.sh -O patch-all.sh; sh patch-all.sh
Connection to ftp.ovh.net:21...Connect!
Session starting under Anonymous...Established session!
==> SYST ... complete. ==> PWD ... complete.
==> TYPE I ... complete. ==> CWD /made-in-ovh/release/1.58-1.59 ... complete.
==> PASV ... complete. ==> LIST ... complete.
0K @ 84.96 KB/s
10:48:32 (84.96 KB/s) - `.listing' backup [87]
`.listing' deleted.
--10:48:32-- ftp://ftp.ovh.net/made-in-ovh/release/1.58-1.59/smartmontools-5.33-1.i386.rpm
=> `smartmontools-5.33-1.i386.rpm'
==> CWD isn't required.
==> PASV ... complete. ==> RETR smartmontools-5.33-1.i386.rpm ... complete.
Length: 342,512
0K .......... .......... .......... .......... .......... 14% @ 5.43 MB/s
50K .......... .......... .......... .......... .......... 29% @ 6.98 MB/s
100K .......... .......... .......... .......... .......... 44% @ 8.14 MB/s
150K .......... .......... .......... .......... .......... 59% @ 8.14 MB/s
200K .......... .......... .......... .......... .......... 74% @ 8.14 MB/s
250K .......... .......... .......... .......... .......... 89% @ 8.14 MB/s
300K .......... .......... .......... .... 100% @ 16.84 MB/s
10:48:32 (7.60 MB/s) - `smartmontools-5.33-1.i386.rpm' backup [342512]
End --10:48:32--
Downloaded: 342,512 bytes in 1 file
Preparing... ########################################### [100%]
1:smartmontools ########################################### [100%]
Shutting down smartd: [ OK ]
Starting smartd: [ OK ]
Restarted smartd services
smartd will continue to start up on system boot
Shutting down smartd: [ OK ]
Starting smartd: [ OK ]

If the whole process ran correctly, you will see the following message: "Use smartctl -h to get a usage summary". S.M.A.R.T. monitoring is now activated and added to the cron task.


The smartd daemon

From now on, smartd regularly checks the hard disk information, transmits it to our RTM server, and writes it to your logs:


[root@delirium /]# cat /var/log/messages | grep smartd
Mar 17 10:48:34 delirium smartd[990]: Configuration file /etc/smartd.conf was parsed, found DEVICESCAN, scanning devices
Mar 17 10:48:34 delirium smartd[990]: Device: /dev/hda, opened
Mar 17 10:48:34 delirium smartd[990]: Device: /dev/hda, found in smartd database.
Mar 17 10:48:35 delirium smartd[990]: Device: /dev/hda, is SMART capable. Adding to "monitor" list.
Mar 17 10:48:35 delirium smartd[990]: Device: /dev/hdb, opened
Mar 17 10:48:35 delirium smartd[990]: Device: /dev/hdb, not ATA, no IDENTIFY DEVICE Structure
Mar 17 10:48:35 delirium smartd[990]: Monitoring 1 ATA and 0 SCSI devices
Mar 17 10:48:35 delirium smartd: smartd startup succeeded
Mar 17 10:48:35 delirium smartd[2421]: smartd has fork()ed into background mode. New PID=2421.
Mar 17 13:48:35 delirium smartd[2421]: Device: /dev/hda, SMART Prefailure Attribute: 8 Seek_Time_Performance changed from 246 to 247
Mar 17 15:48:35 delirium smartd[2421]: Device: /dev/hda, SMART Prefailure Attribute: 8 Seek_Time_Performance changed from 247 to 246
Mar 17 17:18:35 delirium smartd[2421]: Device: /dev/hda, SMART Prefailure Attribute: 8 Seek_Time_Performance changed from 246 to 247

How should these lines be read? The hard disk reports a stable value that fluctuates between 246 and 247. If the value suddenly changed from 247 to 500, that would be unusual behaviour. The next section gives more details about these values.
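
As a rough illustration of this reading, the sketch below filters smartd log lines and keeps only the attribute changes whose jump exceeds a limit. The function name smartd_jumps and the default limit of 10 are assumptions made for this example, not part of smartmontools.

```shell
# Print smartd "changed from X to Y" lines whose jump exceeds a limit.
# Reads syslog-style lines on stdin. The limit (default 10) is an
# arbitrary value chosen for this example.
smartd_jumps() {
    limit="${1:-10}"
    awk -v limit="$limit" '
        /smartd/ && / changed from [0-9]+ to [0-9]+$/ {
            # The line ends with: from <previous> to <current>
            prev = $(NF - 2)
            cur = $NF
            delta = cur - prev
            if (delta < 0) delta = -delta
            if (delta > limit) print
        }'
}
```

It could be used as: grep smartd /var/log/messages | smartd_jumps 10. The one-point oscillation between 246 and 247 shown above produces no output.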


Tips

You can receive an email with the most relevant information by adding or editing a line in /etc/smartd.conf:


[root@delirium /]# pico /etc/smartd.conf
# A very silent check. Only report SMART health status if it fails
# But send an email in this case
# /dev/hdc -H -m admin@example.com

To receive these emails, add a line with your own address: /dev/hda -H -m your@mailaddress.com


Smartctl and the interpretation


As indicated at the beginning of this guide, the smartctl command must be run with root rights. Let's look at the main features of this command. The following are its basic options.


[root@delirium /]# smartctl -h
smartctl version 5.33 [i386-redhat-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
Usage: smartctl [options] device

  -h, --help, --usage
         Display this help and exit
  -i, --info
         Show identity information for device
  -a, --all
         Show all SMART information for device

The command takes the following form: [root@delirium /]# smartctl -i /dev/hda and gives the following result:


=== START OF INFORMATION SECTION ===
Device Model:     Maxtor 6E040L0
Serial Number:    E1KTPXFE
Firmware Version: NAR61590
User Capacity:    41,110,142,976 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 0
Local Time is:    Thu Mar 17 22:21:52 2005 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
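
When this output needs to be checked from a script, the fields can be picked out by their labels. The following is a minimal sketch that reads smartctl -i output on stdin; the helper names smart_field and smart_enabled are hypothetical and chosen for this example.

```shell
# Print the value of the first line that starts with the given label,
# for example "Device Model". Reads smartctl -i output on stdin.
smart_field() {
    awk -v label="$1" '
        !seen && index($0, label ":") == 1 {
            value = substr($0, length(label) + 2)
            sub(/^[ \t]+/, "", value)   # drop the padding after the colon
            print value
            seen = 1
        }'
}

# Succeed if the output reports that SMART support is enabled.
smart_enabled() {
    grep -q '^SMART support is:[[:space:]]*Enabled'
}
```

For example, smartctl -i /dev/hda | smart_field "Device Model" would print the model string, and smartctl -i /dev/hda | smart_enabled returns success when the "Enabled" line is present.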

The -a option displays all the information the tool is able to collect:
[root@delirium /]# smartctl -a /dev/hda gives the following result:


=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (1021) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        No General Purpose Logging support.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  17) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG   VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
  3 Spin_Up_Time            0x0027 252   252   063    Pre-fail Always  -           2463
  4 Start_Stop_Count        0x0032 253   253   000    Old_age  Always  -           18
  5 Reallocated_Sector_Ct   0x0033 253   253   063    Pre-fail Always  -           0
  6 Read_Channel_Margin     0x0001 253   253   100    Pre-fail Offline -           0
  7 Seek_Error_Rate         0x000a 253   252   000    Old_age  Always  -           0
  8 Seek_Time_Performance   0x0027 247   238   187    Pre-fail Always  -           46214
  9 Power_On_Minutes        0x0032 241   241   000    Old_age  Always  -           950h+09m
 10 Spin_Retry_Count        0x002b 252   252   157    Pre-fail Always  -           0
 11 Calibration_Retry_Count 0x002b 253   252   223    Pre-fail Always  -           0
 12 Power_Cycle_Count       0x0032 253   253   000    Old_age  Always  -           22
192 Power-Off_Retract_Count 0x0032 253   253   000    Old_age  Always  -           13
193 Load_Cycle_Count        0x0032 253   253   000    Old_age  Always  -           72
194 Temperature_Celsius     0x0032 253   253   000    Old_age  Always  -           31
195 Hardware_ECC_Recovered  0x000a 253   252   000    Old_age  Always  -           25095
196 Reallocated_Event_Count 0x0008 253   253   000    Old_age  Offline -           0
197 Current_Pending_Sector  0x0008 253   253   000    Old_age  Offline -           0
198 Offline_Uncorrectable   0x0008 253   253   000    Old_age  Offline -           0
199 UDMA_CRC_Error_Count    0x0008 199   199   000    Old_age  Offline -           0
200 Multi_Zone_Error_Rate   0x000a 253   252   000    Old_age  Always  -           0
201 Soft_Read_Error_Rate    0x000a 251   138   000    Old_age  Always  -           1746
202 TA_Increase_Count       0x000a 253   252   000    Old_age  Always  -           0
203 Run_Out_Cancel          0x000b 253   252   180    Pre-fail Always  -           137
204 Shock_Count_Write_Opern 0x000a 253   252   000    Old_age  Always  -           0
205 Shock_Rate_Write_Opern  0x000a 253   252   000    Old_age  Always  -           0
207 Spin_High_Current       0x002a 252   252   000    Old_age  Always  -           0
208 Spin_Buzz               0x002a 252   252   000    Old_age  Always  -           0
209 Offline_Seek_Performnce 0x0024 187   183   000    Old_age  Offline -           0
 99 Unknown_Attribute       0x0004 253   253   000    Old_age  Offline -           0
100 Unknown_Attribute       0x0004 253   253   000    Old_age  Offline -           0
101 Unknown_Attribute       0x0004 253   253   000    Old_age  Offline -           0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

[root@delirium /]#

Now let's interpret this information: the disk's power-on time, its temperature and, most importantly for us, the errors. For this, we mainly look at the last two columns, WHEN_FAILED and RAW_VALUE, and at the error log section ("SMART Error Log Version: 1", "No Errors Logged"). An example:


5 Reallocated_Sector_Ct 0x0033 016 016 063 Pre-fail Always FAILING_NOW 598

Here, the Reallocated_Sector_Ct attribute is failing: its normalized value (016) has dropped below the threshold (063), and 598 sectors have been reallocated. This attribute must be watched closely. If the number grows rapidly, take the necessary measures: back up your data and contact support.
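
The check described above can be sketched as a small shell function that scans the attribute table of smartctl -a and prints the rows that need attention. The function name smart_failing_attrs and its output format are assumptions made for this example; the column positions are those of the table shown earlier.

```shell
# Print attributes from `smartctl -a` output (stdin) that need attention:
# WHEN_FAILED is not "-", or the normalized VALUE is at or below THRESH.
smart_failing_attrs() {
    awk '
        # Attribute rows: numeric ID in field 1, hexadecimal flag in field 3
        NF >= 10 && $1 ~ /^[0-9]+$/ && $3 ~ /^0x[0-9a-fA-F]+$/ {
            name = $2
            value = $4 + 0
            thresh = $6 + 0
            if ($9 != "-" || (thresh > 0 && value <= thresh))
                printf "%s value=%d thresh=%d when_failed=%s raw=%s\n",
                       name, value, thresh, $9, $10
        }'
}
```

Run as smartctl -a /dev/hda | smart_failing_attrs, it prints nothing for a healthy disk. A failing Reallocated_Sector_Ct row such as the one above would be reported with its value, threshold and raw count.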


Smartmontools on Debian


The installation also requires root rights. The package name depends on your Debian version. The example below was run on Debian unstable. For the OVH version, you must use: apt-get install smartsuite. However, the process and the commands are the same.


23:19 root@revolution / # apt-get install smartmontools
Reading package lists... Done
Building dependency tree... Done
The following NEW packages will be installed:
smartmontools
0 upgraded, 1 newly installed, 0 to remove and 60 not upgraded.
Need to get 222kB of archives.
After unpacking, 508kB of additional disk space will be used.
Get: 1 http://ftp.fr.debian.org unstable/main smartmontools 5.32-3 [222kB]
Fetched 222kB in 0s (272kB/s)
Selecting previously deselected package smartmontools.
(Reading database... 67466 files and directories currently installed.)
Unpacking smartmontools (from .../smartmontools_5.32-3_i386.deb) ...
Setting up smartmontools (5.32-3) ...
Not starting S.M.A.R.T. daemon smartd, disabled via /etc/default/smartmontools

As you can see, the daemon was not started immediately; you must first edit the file /etc/default/smartmontools:

23:20 root@revolution /# pico /etc/default/smartmontools


# Defaults for smartmontools initscript (/etc/init.d/smartmontools)
# This is a POSIX shell fragment

# list of devices you want to explicitly enable S.M.A.R.T. for
# not needed if the device is monitored by smartd
enable_smart="/dev/hda /dev/hdb"

# uncomment to start smartd on system startup
start_smartd=yes

# uncomment to pass additional options to smartd on startup
#smartd_opts="--interval=1800"

In enable_smart, list the disks to check, and uncomment start_smartd so that the daemon starts at system startup. Save the changes and start the daemon:

23:21 root@revolution /# /etc/init.d/smartmontools start
Enabling S.M.A.R.T. for: /dev/hda /dev/hdb.
Starting S.M.A.R.T. daemon: smartd.
23:21 root@revolution /# smartctl -a /dev/hda
smartctl version 5.32 Copyright (C) 2002-4 Bruce Allen

It's ready!
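
If the daemon still reports "disabled via /etc/default/smartmontools", the start_smartd line is probably still commented out. The sketch below checks for this; the function name smartd_start_enabled is hypothetical, and the file path defaults to the one edited above.

```shell
# Succeed if the defaults file contains an uncommented start_smartd=yes line.
# The file defaults to /etc/default/smartmontools; pass another path to test.
smartd_start_enabled() {
    grep -Eq '^[[:space:]]*start_smartd=("yes"|yes)[[:space:]]*$' \
        "${1:-/etc/default/smartmontools}"
}
```

For example: smartd_start_enabled && echo "smartd will start at boot".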


Conclusion


Smartmontools is easy to use and very thorough. Note, however, that such a tool does not replace what matters most: regular backups of your data. OVH offers weekly or incremental backups, or the installation of a USB backup disk, for only £10 / month. More details about smartmontools are available at http://smartmontools.sourceforge.net/.