chore: docs for ceph

- add maintenance mode, how to bootstrap an osd, remove an osd
Ben Vincent, 2026-01-17 13:15:22 +11:00 (commit d3c2d78625, parent 1077bdcbc1)

this will overwrite the current capabilities of a given client.user

```shell
    mon 'allow r' \
    mds 'allow rw path=/' \
    osd 'allow rw pool=media_data'
```
## adding a new osd on a new node
Create the Ceph conf on the new node (TODO: automate this?):
```shell
cat <<EOF | sudo tee /etc/ceph/ceph.conf
[global]
auth_client_required = cephx
auth_cluster_required = cephx
auth_service_required = cephx
fsid = de96a98f-3d23-465a-a899-86d3d67edab8
mon_allow_pool_delete = true
mon_initial_members = prodnxsr0009,prodnxsr0010,prodnxsr0011,prodnxsr0012,prodnxsr0013
mon_host = 198.18.23.9,198.18.23.10,198.18.23.11,198.18.23.12,198.18.23.13
ms_bind_ipv4 = true
ms_bind_ipv6 = false
osd_crush_chooseleaf_type = 1
osd_pool_default_min_size = 2
osd_pool_default_size = 3
osd_pool_default_pg_num = 128
public_network = 198.18.23.1/32,198.18.23.2/32,198.18.23.3/32,198.18.23.4/32,198.18.23.5/32,198.18.23.6/32,198.18.23.7/32,198.18.23.8/32,198.18.23.9/32,198.18.23.10/32,198.18.23.11/32,198.18.23.12/32,198.18.23.13/32
EOF
```
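Answering the "automate this?" note above, the conf could be generated from a single host/IP list instead of being pasted by hand. A minimal sketch, using the fsid and monitor lists from this doc (only the `[global]` keys that vary are shown):

```shell
#!/bin/sh
# Sketch: generate the mon-related ceph.conf lines from one source of
# truth. fsid and host lists are the values from the conf above.
FSID="de96a98f-3d23-465a-a899-86d3d67edab8"
MONS="prodnxsr0009 prodnxsr0010 prodnxsr0011 prodnxsr0012 prodnxsr0013"
MON_IPS="198.18.23.9 198.18.23.10 198.18.23.11 198.18.23.12 198.18.23.13"

gen_ceph_conf() {
    # Join the space-separated lists with commas for the conf syntax.
    mon_members=$(echo "$MONS" | tr ' ' ',')
    mon_hosts=$(echo "$MON_IPS" | tr ' ' ',')
    cat <<EOF
[global]
fsid = $FSID
mon_initial_members = $mon_members
mon_host = $mon_hosts
EOF
}

# gen_ceph_conf | sudo tee /etc/ceph/ceph.conf
```

The static auth/pool defaults would still need to be appended; the point is keeping the host list in one place.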
From one of the monitor hosts, transfer the required keyrings to the new node:

```shell
sudo cat /etc/ceph/ceph.client.admin.keyring | ssh prodnxsr0003 'sudo tee /etc/ceph/ceph.client.admin.keyring'
sudo cat /var/lib/ceph/bootstrap-osd/ceph.keyring | ssh prodnxsr0003 'sudo tee /var/lib/ceph/bootstrap-osd/ceph.keyring'
```
Assuming we are adding /dev/sda to the cluster, first zap the disk to remove any existing partitions, LVM volumes, and metadata:

```shell
sudo ceph-volume lvm zap /dev/sda --destroy
```

Then add it to the cluster:

```shell
sudo ceph-volume lvm create --data /dev/sda
```
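The zap with `--destroy` above is irreversible, so a small guard before running it can help. A sketch (the `safe_zap` wrapper is ours, not a ceph command; it only checks whether the exact device path appears as a mounted source):

```shell
#!/bin/sh
# Sketch: refuse to zap a device that is currently mounted. This only
# catches the device itself (not its partitions) being in /proc/mounts,
# and prints the command rather than executing it.
safe_zap() {
    dev="$1"
    if grep -q "^$dev " /proc/mounts 2>/dev/null; then
        echo "refusing to zap $dev: it is mounted" >&2
        return 1
    fi
    echo "would run: sudo ceph-volume lvm zap $dev --destroy"
}

# safe_zap /dev/sda
```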
## removing an osd
Check which OSD IDs were on the host (if you know it):

```shell
sudo ceph osd tree
```

Or look for any DOWN OSDs:

```shell
sudo ceph osd stat
sudo ceph health detail
```
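To pull just the down OSD names out of `ceph osd tree`, an awk filter like this works. The column positions (ID CLASS WEIGHT NAME STATUS ...) are an assumption based on typical recent Ceph output, so verify against your cluster's output first:

```shell
#!/bin/sh
# Sketch: print only the down OSDs from `ceph osd tree` output.
# Assumes osd rows look like: ID CLASS WEIGHT osd.N STATUS REWEIGHT PRI-AFF
down_osds() {
    awk '$4 ~ /^osd\./ && $5 == "down" { print $4 }'
}

# sudo ceph osd tree | down_osds
```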
Once you have identified the old OSD ID, remove it with the following steps, replacing X with the actual OSD ID:

```shell
sudo ceph osd out osd.X
sudo ceph osd down osd.X
sudo ceph osd crush remove osd.X
sudo ceph auth del osd.X
sudo ceph osd rm osd.X
```
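The five removal steps above could be wrapped in one helper to avoid typos in the OSD ID. A sketch (`remove_osd` is our name, not a ceph subcommand; it defaults to a dry run that prints the commands):

```shell
#!/bin/sh
# Sketch: run the documented removal steps for one OSD ID.
# RUN defaults to "echo" (dry run); set RUN="" to actually execute.
RUN="${RUN:-echo}"

remove_osd() {
    id="$1"
    $RUN sudo ceph osd out "osd.$id"
    $RUN sudo ceph osd down "osd.$id"
    $RUN sudo ceph osd crush remove "osd.$id"
    $RUN sudo ceph auth del "osd.$id"
    $RUN sudo ceph osd rm "osd.$id"
}

# remove_osd 7   # dry run: prints the five commands for osd.7
```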
## maintenance mode for the cluster
From any one node in the cluster, set the flags that disable recovery and rebalancing (the final `pause` flag also stops client I/O):

```shell
sudo ceph osd set noout
sudo ceph osd set nobackfill
sudo ceph osd set norecover
sudo ceph osd set norebalance
sudo ceph osd set nodown
sudo ceph osd set pause
```
To undo the change, unset the same flags:

```shell
sudo ceph osd unset noout
sudo ceph osd unset nobackfill
sudo ceph osd unset norecover
sudo ceph osd unset norebalance
sudo ceph osd unset nodown
sudo ceph osd unset pause
```
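Since `set` and `unset` take the same flag list, a small loop avoids typing twelve commands. A sketch (`maintenance` is a local helper of ours, not a ceph subcommand; it dry-runs by default):

```shell
#!/bin/sh
# Sketch: toggle all maintenance flags from this doc in one go.
# RUN defaults to "echo" (dry run); set RUN="" to actually execute.
RUN="${RUN:-echo}"
FLAGS="noout nobackfill norecover norebalance nodown pause"

maintenance() {
    action="$1"   # "set" to enter maintenance, "unset" to leave it
    for f in $FLAGS; do
        $RUN sudo ceph osd "$action" "$f"
    done
}

# maintenance set     # enter maintenance
# maintenance unset   # leave maintenance
```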