Troubleshooting Ceph - overall HEALTH_WARN 1 mgr modules have recently crashed

2023. 3. 6. 15:48

When you run Ceph storage and force-reboot some of the nodes in the Ceph cluster,

you will often see warning logs like the following in the Dashboard.

...

3/6/23 3:00:00 PM [WRN] overall HEALTH_WARN 1 mgr modules have recently crashed

3/6/23 2:50:00 PM [WRN] overall HEALTH_WARN 1 mgr modules have recently crashed

3/6/23 2:40:00 PM [WRN] overall HEALTH_WARN 1 mgr modules have recently crashed

...

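When these log entries appear, or the Dashboard shows a health warning,

running `ceph crash archive-all` as shown below clears it. The command archives the existing crash reports so they stop counting toward the health warning; it does not change whatever caused the crash.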


$ kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash

##
## Enter the Ceph tools container and list the crash reports
##

bash-4.4$ ceph crash ls

ID                                                                ENTITY  NEW
2023-02-01T07:02:56.432333Z_6ab1d847-9cbc-449b-9167-8b53e96774d8  mgr.a    *
2023-02-22T05:18:10.263896Z_7321ae9d-7dd8-49c9-a9e0-18ff892e3050  mgr.a    *
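
The NEW column in the listing marks reports that have not been archived yet. As a minimal sketch, assuming the plain-text layout shown above (ID first, `*` last on new rows), the following pulls out just those IDs. `new_crash_ids` and `sample_listing` are hypothetical names introduced here, and the sample rows are copied from the output above so the sketch runs without a cluster.

```shell
# Print the IDs of crash reports still flagged as new ("*" in the NEW column).
# Reads `ceph crash ls` style text on stdin; the first line is the header.
new_crash_ids() {
  awk 'NR > 1 && $NF == "*" { print $1 }'
}

# Sample listing copied from the output above (an assumption about the layout).
sample_listing='ID  ENTITY  NEW
2023-02-01T07:02:56.432333Z_6ab1d847-9cbc-449b-9167-8b53e96774d8  mgr.a    *
2023-02-22T05:18:10.263896Z_7321ae9d-7dd8-49c9-a9e0-18ff892e3050  mgr.a    *'

printf '%s\n' "$sample_listing" | new_crash_ids
```

Against a live cluster the same function could be fed from `ceph crash ls`. `ceph crash ls-new` lists only the new reports directly.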

##
## Check the details of a crash report (to see whether anything stands out)
##

bash-4.4$ ceph crash info 2023-02-22T05:18:10.263896Z_7321ae9d-7dd8-49c9-a9e0-18ff892e3050

{
    "backtrace": [
        "  File \"/usr/share/ceph/mgr/nfs/module.py\", line 154, in cluster_ls\n    return available_clusters(self)",
        ... (omitted) ...
        "orchestrator._interface.NoOrchestrator: No orchestrator configured (try `ceph orch set backend`)"
    ],
    "ceph_version": "17.2.5",
    "process_name": "ceph-mgr",
    ... (omitted) ...
    "utsname_version": "#66-Ubuntu SMP Fri Jan 20 14:29:49 UTC 2023"
}

##
## Run the command below to archive the crash reports
##

bash-4.4$ ceph crash archive-all
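
If you would rather keep some reports flagged as new, `ceph crash archive <id>` archives a single report. The loop below is a rough sketch of that alternative, not the method used in this post. `archive_crashes` and the `DRY_RUN` variable are hypothetical names introduced here, and with the default `DRY_RUN=1` the commands are only printed.

```shell
# Archive each crash ID given on stdin, one per line.
# With DRY_RUN=1 (the default in this sketch) the commands are only printed,
# so it can be tried outside the tools container.
archive_crashes() {
  while IFS= read -r crash_id; do
    [ -n "$crash_id" ] || continue   # skip blank lines
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "ceph crash archive $crash_id"
    else
      # Redirect stdin so the command cannot consume the remaining IDs.
      ceph crash archive "$crash_id" < /dev/null
    fi
  done
}

# Example input: the ID inspected with `ceph crash info` above.
printf '%s\n' "2023-02-22T05:18:10.263896Z_7321ae9d-7dd8-49c9-a9e0-18ff892e3050" | archive_crashes
```

Setting `DRY_RUN=0` inside the tools container would run the commands for real.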

To confirm, check the Ceph status once more with the command below.

bash-4.4$ ceph status
  cluster:
    id:     4e855f4b-085d-45d4-b713-19fc82d1a2a5
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,d (age 11d)
    mgr: b(active, since 11d), standbys: a
    osd: 3 osds: 3 up (since 11d), 3 in (since 4w)

  data:
    pools:   2 pools, 33 pgs
    objects: 3.69k objects, 13 GiB
    usage:   40 GiB used, 710 GiB / 750 GiB avail
    pgs:     33 active+clean
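
For a scripted check, the `health:` line is the part that matters. The following is a small sketch that reads it out of `ceph status` style text, assuming the layout shown above. `cluster_health` and `sample_status` are hypothetical names, and the sample lines are copied from the output above.

```shell
# Print the value of the "health:" line from `ceph status` style text on stdin.
cluster_health() {
  awk '$1 == "health:" { print $2; exit }'
}

# Sample lines copied from the status output above (an assumption about the layout).
sample_status='  cluster:
    id:     4e855f4b-085d-45d4-b713-19fc82d1a2a5
    health: HEALTH_OK'

health="$(printf '%s\n' "$sample_status" | cluster_health)"
if [ "$health" = "HEALTH_OK" ]; then
  echo "cluster is healthy"
else
  echo "cluster reports: $health (run 'ceph health detail' for the reasons)"
fi
```

On a live cluster, `ceph health` prints the same status word on its own, and `ceph health detail` lists the reasons behind a warning.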


sejong.jeonjo@gmail.com
