Resolving the Ceph "requests are blocked" warning

Recently the Ceph cluster on a customer's kolla platform reported the following warning:

(ceph-mon)[root@control01 /]# ceph -s
cluster b233a0b7-4e21-4375-bca8-e215c056cc25
health HEALTH_WARN
2 requests are blocked > 32 sec
monmap e1: 3 mons at {10.254.253.1=10.254.253.1:6789/0,10.254.253.2=10.254.253.2:6789/0,10.254.253.3=10.254.253.3:6789/0}
election epoch 26, quorum 0,1,2 10.254.253.1,10.254.253.2,10.254.253.3
osdmap e380: 90 osds: 90 up, 90 in
flags sortbitwise,require_jewel_osds
pgmap v1730087: 1008 pgs, 11 pools, 3502 GB data, 886 kobjects
10463 GB used, 235 TB / 245 TB avail
1007 active+clean
1 active+clean+scrubbing+deep
client io 1838 kB/s rd, 100 MB/s wr, 457 op/s rd, 929 op/s wr

Note: a "requests are blocked > 32 sec" warning can appear during data migration: a client is accessing a data block, but before the access completes the data is moved to another OSD, so the request ends up blocked. This is visible to users as stalled I/O.
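Before looking at individual requests, it can help to confirm whether the cluster is actually moving data at that moment. A minimal check, assuming a Jewel-era CLI like the one shown above (not part of the original post):

# Summarise PG states; recovering/backfilling states indicate data is being migrated.
ceph pg stat
# List any PGs stuck in a non-clean state (prints nothing if all PGs are clean).
ceph pg dump_stuck unclean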

Troubleshooting approach

1. Find the blocked requests

(ceph-mon)[root@control01 /]# ceph health detail
HEALTH_WARN 2 requests are blocked > 32 sec; 1 osds have slow requests
2 ops are blocked > 4194.3 sec on osd.5
1 osds have slow requests

From the output we can see that osd.5 is the OSD holding the blocked operations (2 ops blocked for more than 4194.3 seconds).
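To see what those blocked operations actually are, the OSD admin socket can dump them. A hedged sketch, assuming the kolla container naming used later in this post (ceph_osd_5); run it on the host that carries osd.5:

# Inspect the ops currently stuck in osd.5 via its admin socket (inside the OSD container).
docker exec ceph_osd_5 ceph daemon osd.5 dump_ops_in_flight
# Recently completed ops, including ones that were slow, with per-step timestamps.
docker exec ceph_osd_5 ceph daemon osd.5 dump_historic_ops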
2. Find the host the OSD belongs to

(ceph-mon)[root@control01 /]# ceph osd tree
ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 270.00000 root default
-2 27.00000 host 10.254.253.1
0 3.00000 osd.0 up 1.00000 1.00000
5 3.00000 osd.5 up 1.00000 1.00000
10 3.00000 osd.10 up 1.00000 1.00000
15 3.00000 osd.15 up 1.00000 1.00000
21 3.00000 osd.21 up 1.00000 1.00000
27 3.00000 osd.27 up 1.00000 1.00000
30 3.00000 osd.30 up 1.00000 1.00000
36 3.00000 osd.36 up 1.00000 1.00000
42 3.00000 osd.42 up 1.00000 1.00000
-3 27.00000 host 10.254.253.5
2 3.00000 osd.2 up 1.00000 1.00000
6 3.00000 osd.6 up 1.00000 1.00000
11 3.00000 osd.11 up 1.00000 1.00000
17 3.00000 osd.17 up 1.00000 1.00000
20 3.00000 osd.20 up 1.00000 1.00000
25 3.00000 osd.25 up 1.00000 1.00000
31 3.00000 osd.31 up 1.00000 1.00000
35 3.00000 osd.35 up 1.00000 1.00000
41 3.00000 osd.41 up 1.00000 1.00000
-4 27.00000 host 10.254.253.2
1 3.00000 osd.1 up 1.00000 1.00000
7 3.00000 osd.7 up 1.00000 1.00000
12 3.00000 osd.12 up 1.00000 1.00000
16 3.00000 osd.16 up 1.00000 1.00000
22 3.00000 osd.22 up 1.00000 1.00000
26 3.00000 osd.26 up 1.00000 1.00000
32 3.00000 osd.32 up 1.00000 1.00000
37 3.00000 osd.37 up 1.00000 1.00000
40 3.00000 osd.40 up 1.00000 1.00000
-6 27.00000 host 10.254.253.4
3 3.00000 osd.3 up 1.00000 1.00000
8 3.00000 osd.8 up 1.00000 1.00000
13 3.00000 osd.13 up 1.00000 1.00000
18 3.00000 osd.18 up 1.00000 1.00000
24 3.00000 osd.24 up 1.00000 1.00000
29 3.00000 osd.29 up 1.00000 1.00000
33 3.00000 osd.33 up 1.00000 1.00000
39 3.00000 osd.39 up 1.00000 1.00000
43 3.00000 osd.43 up 1.00000 1.00000
-5 27.00000 host 10.254.253.3
4 3.00000 osd.4 up 1.00000 1.00000
9 3.00000 osd.9 up 1.00000 1.00000
14 3.00000 osd.14 up 1.00000 1.00000
19 3.00000 osd.19 up 1.00000 1.00000
23 3.00000 osd.23 up 1.00000 1.00000
28 3.00000 osd.28 up 1.00000 1.00000
34 3.00000 osd.34 up 1.00000 1.00000
38 3.00000 osd.38 up 1.00000 1.00000
44 3.00000 osd.44 up 1.00000 1.00000
-7 27.00000 host 10.254.253.8
47 3.00000 osd.47 up 1.00000 1.00000
51 3.00000 osd.51 up 1.00000 1.00000
56 3.00000 osd.56 up 1.00000 1.00000
61 3.00000 osd.61 up 1.00000 1.00000
66 3.00000 osd.66 up 1.00000 1.00000
71 3.00000 osd.71 up 1.00000 1.00000
76 3.00000 osd.76 up 1.00000 1.00000
81 3.00000 osd.81 up 1.00000 1.00000
86 3.00000 osd.86 up 1.00000 1.00000
-8 27.00000 host 10.254.253.7
46 3.00000 osd.46 up 1.00000 1.00000
52 3.00000 osd.52 up 1.00000 1.00000
57 3.00000 osd.57 up 1.00000 1.00000
62 3.00000 osd.62 up 1.00000 1.00000
67 3.00000 osd.67 up 1.00000 1.00000
72 3.00000 osd.72 up 1.00000 1.00000
77 3.00000 osd.77 up 1.00000 1.00000
82 3.00000 osd.82 up 1.00000 1.00000
87 3.00000 osd.87 up 1.00000 1.00000
-9 27.00000 host 10.254.253.9
48 3.00000 osd.48 up 1.00000 1.00000
53 3.00000 osd.53 up 1.00000 1.00000
58 3.00000 osd.58 up 1.00000 1.00000
63 3.00000 osd.63 up 1.00000 1.00000
68 3.00000 osd.68 up 1.00000 1.00000
73 3.00000 osd.73 up 1.00000 1.00000
78 3.00000 osd.78 up 1.00000 1.00000
83 3.00000 osd.83 up 1.00000 1.00000
88 3.00000 osd.88 up 1.00000 1.00000
-10 27.00000 host 10.254.253.10
49 3.00000 osd.49 up 1.00000 1.00000
54 3.00000 osd.54 up 1.00000 1.00000
59 3.00000 osd.59 up 1.00000 1.00000
64 3.00000 osd.64 up 1.00000 1.00000
69 3.00000 osd.69 up 1.00000 1.00000
74 3.00000 osd.74 up 1.00000 1.00000
79 3.00000 osd.79 up 1.00000 1.00000
84 3.00000 osd.84 up 1.00000 1.00000
89 3.00000 osd.89 up 1.00000 1.00000
-11 27.00000 host 10.254.253.6
50 3.00000 osd.50 up 1.00000 1.00000
55 3.00000 osd.55 up 1.00000 1.00000
60 3.00000 osd.60 up 1.00000 1.00000
65 3.00000 osd.65 up 1.00000 1.00000
70 3.00000 osd.70 up 1.00000 1.00000
75 3.00000 osd.75 up 1.00000 1.00000
80 3.00000 osd.80 up 1.00000 1.00000
85 3.00000 osd.85 up 1.00000 1.00000
90 3.00000 osd.90 up 1.00000 1.00000
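Reading the whole tree works, but for a single OSD it can be quicker to ask the cluster directly. A small alternative sketch, not used in the original workflow:

# Print the address and CRUSH location (i.e. the host) of osd.5 as JSON.
ceph osd find 5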

Solution

[root@control01 ~]# docker restart ceph_osd_5
ceph_osd_5

[root@control01 ~]# docker ps|grep ceph_osd_5
9c4fe5eb0090 10.254.254.1:4000/99cloud/centos-source-ceph-osd:animbus-5.4.0 "kolla_start" 3 weeks ago Up 13 seconds

After the restart, the cluster runs recovery on that OSD. During recovery the blocked requests are dropped; the clients then go back to the mon nodes, fetch a new PG map with the current location of the data, and the requests complete, which clears the warning above.
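If you want to watch this happen, the cluster log can be streamed while the OSD restarts; a hedged suggestion, not part of the original steps:

# Stream cluster status and log updates; you should see osd.5 go down/up and the
# blocked-request warning clear as recovery finishes (Ctrl-C to stop).
ceph -w
# Re-check the health detail afterwards.
ceph health detail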

Check the cluster status

(ceph-mon)[root@control01 /]# ceph -s
cluster b233a0b7-4e21-4375-bca8-e215c056cc25
health HEALTH_OK
monmap e1: 3 mons at {10.254.253.1=10.254.253.1:6789/0,10.254.253.2=10.254.253.2:6789/0,10.254.253.3=10.254.253.3:6789/0}
election epoch 26, quorum 0,1,2 10.254.253.1,10.254.253.2,10.254.253.3
osdmap e387: 90 osds: 90 up, 90 in
flags sortbitwise,require_jewel_osds
pgmap v1730238: 1008 pgs, 11 pools, 3498 GB data, 886 kobjects
10453 GB used, 235 TB / 245 TB avail
1006 active+clean
2 active+clean+scrubbing+deep
client io 1090 kB/s rd, 92507 kB/s wr, 778 op/s rd, 904 op/s wr
