由于距离上一次磁盘排查已经有一段时间了,这段时间比较忙没有时间去立刻记录下当时的情况,于是借助当时找到的一篇博客来记录一下。
这里列举一些比较能用得上的命令。
1 2 3 4
| >lsblk >df -h >tail -f -n100 /var/log/messages >lsof | grep "/data"
|
现象
VM中有一块盘,进入mount点后,ls显示错误:
1 2
| [root@es10 data2]# ls ls: cannot open directory .: Input/output error
|
跟踪分析
1 2 3 4
| [root@es10 ~]# lsblk | grep vdc NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT vdc 253:32 0 2T 0 disk └─vdc1 253:33 0 2T 0 part /data2
|
1 2
| [root@es10 ~]# mount | grep vdc /dev/vdc1 on /data2 type xfs (rw,relatime,attr2,inode64,noquota)
|
1 2 3
| [root@es10 data2]# df -h | grep vdc Filesystem Size Used Avail Use% Mounted on /dev/vdc1 2.0T 281G 1.7T 15% /data2
|
1 2 3 4
| [root@es10 ~]# free -h total used free shared buff/cache available Mem: 31G 322M 18G 16M 12G 30G Swap: 0B 0B 0B
|
1 2 3 4
| [root@es10 ~]# tail /var/log/messages Dec 4 10:02:37 es10 kernel: XFS (vdc1): xfs_log_force: error -5 returned. Dec 4 10:02:39 es10 kernel: XFS (vdc1): xfs_log_force: error -5 returned. Dec 4 10:02:39 es10 kernel: XFS (vdc1): xfs_log_force: error -5 returned.
|
1 2 3 4 5 6 7 8
| [root@es10 ~]# lsof | grep "/data2" lsof: WARNING: can't stat() xfs file system /data2 Output information may be incomplete.
[root@es10 ~]# lsof | grep "/data" lsof: WARNING: can't stat() xfs file system /data2 Output information may be incomplete. bash 32147 root cwd DIR 253,17 26 128 /data1
|
解决
在网上搜索了一下,有人也遇到过,重新mount可以解决:
1 2 3 4 5 6 7 8 9 10 11 12 13
| [root@es10 ~]# umount /data2 [root@es10 ~]# mount | grep vdc [root@es10 ~]# mount /dev/vdc1 /data2
[root@es10 ~]# tail /var/log/messages Dec 4 10:02:37 es10 kernel: XFS (vdc1): xfs_log_force: error -5 returned. Dec 4 10:02:39 es10 kernel: XFS (vdc1): Unmounting Filesystem Dec 4 10:02:57 es10 kernel: XFS (vdc1): Mounting V4 Filesystem Dec 4 10:02:57 es10 kernel: XFS (vdc1): Starting recovery (logdev: internal) Dec 4 10:02:58 es10 kernel: XFS (vdc1): Ending recovery (logdev: internal) Dec 4 10:05:38 es10 systemd: Started Session 592 of user root. Dec 4 10:05:38 es10 systemd-logind: New session 592 of user root. Dec 4 10:05:38 es10 systemd: Starting Session 592 of user root.
|
深度分析
解释
Linux默认会使用可用内存的40%来作为文件系统cache,当这个阈值达到时,文件系统会下刷这些cache到磁盘。而这个flush的过程,会有一个时间限制,操作系统默认给了120s的限制,如果底层磁盘不是很快,部分cache依然占用,没有flush到disk,系统内存会慢慢被消耗完。
再度分析系统Log:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69
| Dec 2 05:27:32 es10 kernel: INFO: task java:10432 blocked for more than 120 seconds. Dec 2 05:27:32 es10 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Dec 2 05:27:32 es10 kernel: java D ffff8808115a4500 0 10432 1 0x00000180 Dec 2 05:27:32 es10 kernel: ffff8803fcf9fe40 0000000000000082 ffff8808115a4500 ffff8803fcf9ffd8 Dec 2 05:27:32 es10 kernel: ffff8803fcf9ffd8 ffff8803fcf9ffd8 ffff8808115a4500 000000720007bf45 Dec 2 05:27:32 es10 kernel: ffff88042944c528 ffff8803fcf9fef4 ffff88080fc2e918 ffff8808115a4500 Dec 2 05:27:32 es10 kernel: Call Trace: Dec 2 05:27:32 es10 kernel: [<ffffffff8163bb59>] schedule+0x29/0x70 Dec 2 05:27:32 es10 kernel: [<ffffffffa01c5c14>] _xfs_log_force_lsn+0x184/0x2f0 [xfs] Dec 2 05:27:32 es10 kernel: [<ffffffff810b8910>] ? wake_up_state+0x20/0x20 Dec 2 05:27:32 es10 kernel: [<ffffffffa01a83e0>] xfs_file_fsync+0x1b0/0x200 [xfs] Dec 2 05:27:32 es10 kernel: [<ffffffff81210065>] do_fsync+0x65/0xa0 Dec 2 05:27:32 es10 kernel: [<ffffffff81210353>] SyS_fdatasync+0x13/0x20 Dec 2 05:27:32 es10 kernel: [<ffffffff81646d52>] tracesys+0xdd/0xe2 Dec 2 05:27:32 es10 kernel: INFO: task java:10433 blocked for more than 120 seconds. Dec 2 05:27:32 es10 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Dec 2 05:27:32 es10 kernel: java D ffff88081176e780 0 10433 1 0x00000180 Dec 2 05:27:32 es10 kernel: ffff88049fb03e40 0000000000000082 ffff88081176e780 ffff88049fb03fd8 Dec 2 05:27:32 es10 kernel: ffff88049fb03fd8 ffff88049fb03fd8 ffff88081176e780 000000720007be05 Dec 2 05:27:32 es10 kernel: ffff88042944c528 ffff88049fb03ef4 ffff88080fc2e818 ffff88081176e780 Dec 2 05:27:32 es10 kernel: Call Trace: Dec 2 05:27:32 es10 kernel: [<ffffffff8163bb59>] schedule+0x29/0x70 Dec 2 05:27:32 es10 kernel: [<ffffffffa01c5c14>] _xfs_log_force_lsn+0x184/0x2f0 [xfs] Dec 2 05:27:32 es10 kernel: [<ffffffff810b8910>] ? wake_up_state+0x20/0x20 Dec 2 05:27:32 es10 kernel: [<ffffffffa01a83e0>] xfs_file_fsync+0x1b0/0x200 [xfs] Dec 2 05:27:32 es10 kernel: [<ffffffff81210065>] do_fsync+0x65/0xa0 Dec 2 05:27:32 es10 kernel: [<ffffffff81210353>] SyS_fdatasync+0x13/0x20 Dec 2 05:27:32 es10 kernel: [<ffffffff81646d52>] tracesys+0xdd/0xe2
Dec 2 05:33:17 es10 kernel: blk_update_request: I/O error, dev vdc, sector 2097660325 Dec 2 05:33:17 es10 kernel: XFS (vdc1): metadata I/O error: block 0x7d07b9a5 ("xlog_iodone") error 5 numblks 64 Dec 2 05:33:17 es10 kernel: XFS (vdc1): xfs_do_force_shutdown(0x2) called from line 1180 of file fs/xfs/xfs_log.c. Return address = 0xffffffffa01c3b9e Dec 2 05:33:17 es10 kernel: XFS (vdc1): Log I/O Error Detected. Shutting down filesystem Dec 2 05:33:17 es10 kernel: XFS (vdc1): xfs_log_force: error -5 returned. Dec 2 05:33:17 es10 kernel: XFS (vdc1): Please umount the filesystem and rectify the problem(s) Dec 2 05:33:47 es10 kernel: XFS (vdc1): xfs_log_force: error -5 returned. Dec 2 05:33:54 es10 kernel: blk_update_request: I/O error, dev vdc, sector 2115248608 Dec 2 05:33:54 es10 kernel: blk_update_request: I/O error, dev vdc, sector 2115249616 Dec 2 05:33:54 es10 kernel: blk_update_request: I/O error, dev vdc, sector 2115250624 Dec 2 05:33:54 es10 kernel: blk_update_request: I/O error, dev vdc, sector 2115251632 Dec 2 05:33:54 es10 kernel: blk_update_request: I/O error, dev vdc, sector 2115252640 Dec 2 05:33:54 es10 kernel: blk_update_request: I/O error, dev vdc, sector 2115253648 Dec 2 05:33:54 es10 kernel: blk_update_request: I/O error, dev vdc, sector 2115254656 Dec 2 05:33:54 es10 kernel: blk_update_request: I/O error, dev vdc, sector 2115255664 Dec 2 05:33:54 es10 kernel: blk_update_request: I/O error, dev vdc, sector 2115256672 Dec 2 05:33:54 es10 kernel: blk_update_request: I/O error, dev vdc, sector 2115256800 Dec 2 05:34:17 es10 kernel: XFS (vdc1): xfs_log_force: error -5 returned. Dec 2 05:38:17 es10 kernel: blk_update_request: 8 callbacks suppressed Dec 2 05:38:17 es10 kernel: blk_update_request: I/O error, dev vdc, sector 2097660389 Dec 2 05:38:17 es10 kernel: blk_update_request: I/O error, dev vdc, sector 2097660453 Dec 2 05:38:17 es10 kernel: blk_update_request: I/O error, dev vdc, sector 2097660517 Dec 2 05:38:17 es10 kernel: blk_update_request: I/O error, dev vdc, sector 2097660581 Dec 2 05:38:17 es10 kernel: blk_update_request: I/O error, dev vdc, sector 2097660645 Dec 2 05:38:17 es10 kernel: blk_update_request: I/O error, dev vdc, sector 2097660709 Dec 2 05:38:17 es10 kernel: blk_update_request: I/O error, dev vdc, sector 2097660773 Dec 2 05:38:17 es10 kernel: XFS (vdc1): metadata I/O error: block 0x7d07b9e5 ("xlog_iodone") error 5 numblks 64 Dec 2 05:38:17 es10 kernel: XFS (vdc1): xfs_do_force_shutdown(0x2) called from line 1180 of file fs/xfs/xfs_log.c. Return address = 0xffffffffa01c3b9e Dec 2 05:38:17 es10 kernel: XFS (vdc1): metadata I/O error: block 0x7d07ba25 ("xlog_iodone") error 5 numblks 64 Dec 2 05:38:17 es10 kernel: XFS (vdc1): xfs_do_force_shutdown(0x2) called from line 1180 of file fs/xfs/xfs_log.c. Return address = 0xffffffffa01c3b9e Dec 2 05:38:17 es10 kernel: XFS (vdc1): metadata I/O error: block 0x7d07ba65 ("xlog_iodone") error 5 numblks 64 Dec 2 05:38:17 es10 kernel: XFS (vdc1): xfs_do_force_shutdown(0x2) called from line 1180 of file fs/xfs/xfs_log.c. Return address = 0xffffffffa01c3b9e Dec 2 05:38:17 es10 kernel: XFS (vdc1): metadata I/O error: block 0x7d07baa5 ("xlog_iodone") error 5 numblks 64 Dec 2 05:38:17 es10 kernel: XFS (vdc1): xfs_do_force_shutdown(0x2) called from line 1180 of file fs/xfs/xfs_log.c. Return address = 0xffffffffa01c3b9e Dec 2 05:38:17 es10 kernel: XFS (vdc1): metadata I/O error: block 0x7d07bae5 ("xlog_iodone") error 5 numblks 64 Dec 2 05:38:17 es10 kernel: XFS (vdc1): xfs_do_force_shutdown(0x2) called from line 1180 of file fs/xfs/xfs_log.c. Return address = 0xffffffffa01c3b9e Dec 2 05:38:17 es10 kernel: XFS (vdc1): metadata I/O error: block 0x7d07bb25 ("xlog_iodone") error 5 numblks 64 Dec 2 05:38:17 es10 kernel: XFS (vdc1): xfs_do_force_shutdown(0x2) called from line 1180 of file fs/xfs/xfs_log.c. Return address = 0xffffffffa01c3b9e Dec 2 05:38:17 es10 kernel: XFS (vdc1): metadata I/O error: block 0x7d07bb65 ("xlog_iodone") error 5 numblks 64 Dec 2 05:38:17 es10 kernel: XFS (vdc1): xfs_do_force_shutdown(0x2) called from line 1180 of file fs/xfs/xfs_log.c. Return address = 0xffffffffa01c3b9e
|
从上面的日志可以看到:
主要问题还是 kernel: INFO: task java:10432 blocked for more than 120 seconds.这种问题,解决方法就是降低vm.dirty_ratio、vm.dirty_background_ratio这两个参数。
1 2 3 4
| echo ``"vm.dirty_ratio = 10"` `>> /etc/sysctl.conf echo ``"vm.dirty_background_ratio = 1"` `>> /etc/sysctl.conf' 不重启生效: sysctl -p
|
链接
http://blog.51cto.com/7938217/1679885
https://www.blackmoreops.com/2014/09/22/linux-kernel-panic-issue-fix-hung_task_timeout_secs-blocked-120-seconds-problem/[](https://bugzilla.redhat.com/show_bug.cgi?id=1211084)
https://bugzilla.redhat.com/show_bug.cgi?id=1211084