# managing ceph

Specifically related to managing CSI volumes for Kubernetes.

Always refer back to the official documentation at https://docs.ceph.com/en/latest
## adding new cephfs

- create an erasure code profile, which allows you to customise the equivalent raid level
  - raid5 with 3 disks? k=2,m=1
  - raid5 with 6 disks? k=5,m=1
  - raid6 with 4 disks? k=2,m=2, etc
- create an osd pool for data using the custom profile
- create an osd pool for metadata using the default replicated profile
- enable ec_overwrites on the data pool
- create the ceph fs volume using the data/metadata pools
- set ceph fs settings
  - set the maximum number of active metadata servers (mds)
  - mark the fs as holding bulk data
  - enable fast mds failover with standby-replay
```
sudo ceph osd erasure-code-profile set ec_4_1 k=4 m=1
sudo ceph osd pool create media_data 128 erasure ec_4_1
sudo ceph osd pool create media_metadata 32 replicated replicated_rule
sudo ceph osd pool set media_data allow_ec_overwrites true
sudo ceph osd pool set media_data bulk true
sudo ceph fs new mediafs media_metadata media_data --force
sudo ceph fs set mediafs allow_standby_replay true
sudo ceph fs set mediafs max_mds 2
```
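As a quick sanity check on the k/m trade-off: the usable fraction of raw capacity in an erasure-coded pool is k/(k+m), and the pool survives the loss of up to m disks. A minimal sketch in plain shell arithmetic (the `k=4; m=1` values are just the `ec_4_1` example above):

```shell
# usable capacity (percent) for an EC profile: 100 * k / (k + m)
k=4; m=1
echo "ec_${k}_${m}: $(( 100 * k / (k + m) ))% usable, tolerates $m failed disk(s)"
# → ec_4_1: 80% usable, tolerates 1 failed disk(s)
```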
## managing cephfs with subvolumes

This will:

Create erasure code profiles. The `k` and `m` values are equivalent to the number of data disks (k) and parity disks (m) in RAID5, RAID6, etc.

```
sudo ceph osd erasure-code-profile set ec_6_2 k=6 m=2
sudo ceph osd erasure-code-profile set ec_4_1 k=4 m=1
```
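To double-check a profile before building pools on it, you can read it back; the output includes the `k` and `m` values plus plugin defaults:

```shell
# show the stored erasure code profile
sudo ceph osd erasure-code-profile get ec_6_2
```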
Create data pools using the erasure code profiles, and set some required options:

```
sudo ceph osd pool create cephfs_data_ssd_ec_6_2 erasure ec_6_2
sudo ceph osd pool set cephfs_data_ssd_ec_6_2 allow_ec_overwrites true
sudo ceph osd pool set cephfs_data_ssd_ec_6_2 bulk true

sudo ceph osd pool create cephfs_data_ssd_ec_4_1 erasure ec_4_1
sudo ceph osd pool set cephfs_data_ssd_ec_4_1 allow_ec_overwrites true
sudo ceph osd pool set cephfs_data_ssd_ec_4_1 bulk true
```
Add the pools to the fs `cephfs`:

```
sudo ceph fs add_data_pool cephfs cephfs_data_ssd_ec_6_2
sudo ceph fs add_data_pool cephfs cephfs_data_ssd_ec_4_1
```
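The attached data pools can be confirmed with `ceph fs ls`, which lists each filesystem with its metadata pool and data pools; the new pools should appear in the data pools list:

```shell
# list filesystems with their metadata and data pools
sudo ceph fs ls
```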
Create a subvolumegroup for each new data pool:

```
sudo ceph fs subvolumegroup create cephfs csi_ssd_ec_6_2 --pool_layout cephfs_data_ssd_ec_6_2
sudo ceph fs subvolumegroup create cephfs csi_ssd_ec_4_1 --pool_layout cephfs_data_ssd_ec_4_1
```
All together:

```
sudo ceph osd erasure-code-profile set ec_6_2 k=6 m=2
sudo ceph osd pool create cephfs_data_ssd_ec_6_2 erasure ec_6_2
sudo ceph osd pool set cephfs_data_ssd_ec_6_2 allow_ec_overwrites true
sudo ceph osd pool set cephfs_data_ssd_ec_6_2 bulk true
sudo ceph fs add_data_pool cephfs cephfs_data_ssd_ec_6_2
sudo ceph fs subvolumegroup create cephfs csi_ssd_ec_6_2 --pool_layout cephfs_data_ssd_ec_6_2

sudo ceph osd erasure-code-profile set ec_4_1 k=4 m=1
sudo ceph osd pool create cephfs_data_ssd_ec_4_1 erasure ec_4_1
sudo ceph osd pool set cephfs_data_ssd_ec_4_1 allow_ec_overwrites true
sudo ceph osd pool set cephfs_data_ssd_ec_4_1 bulk true
sudo ceph fs add_data_pool cephfs cephfs_data_ssd_ec_4_1
sudo ceph fs subvolumegroup create cephfs csi_ssd_ec_4_1 --pool_layout cephfs_data_ssd_ec_4_1
```
Create a key with access to the new subvolume groups. Check whether the user already exists first:

```
sudo ceph auth get client.kubernetes-cephfs
```

If it doesn't:

```
sudo ceph auth get-or-create client.kubernetes-cephfs \
  mgr 'allow rw' \
  osd 'allow rw tag cephfs metadata=cephfs, allow rw tag cephfs data=cephfs' \
  mds 'allow r fsname=cephfs path=/volumes, allow rws fsname=cephfs path=/volumes/csi_ssd_ec_6_2, allow rws fsname=cephfs path=/volumes/csi_ssd_ec_4_1' \
  mon 'allow r fsname=cephfs'
```

If it does, use `sudo ceph auth caps client.kubernetes-cephfs ...` instead to update the existing capabilities.
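For the Kubernetes side you typically need just the secret, not the full keyring; `ceph auth get-key` prints only the key (how it is wired into the CSI secret depends on your ceph-csi deployment):

```shell
# print just the secret key for the client, suitable for a kubernetes secret
sudo ceph auth get-key client.kubernetes-cephfs
```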
## removing a cephfs subvolumegroup from cephfs

This will clean up the subvolumegroup, and its subvolumes if any exist, then remove the pool.

Check for subvolumegroups first, then for subvolumes within the group:

```
sudo ceph fs subvolumegroup ls cephfs
sudo ceph fs subvolume ls cephfs --group_name csi_raid6
```
If subvolumes exist, remove each one-by-one:

```
sudo ceph fs subvolume rm cephfs <subvol_name> --group_name csi_raid6
```

If you have snapshots, remove the snapshots first:

```
sudo ceph fs subvolume snapshot ls cephfs <subvol_name> --group_name csi_raid6
sudo ceph fs subvolume snapshot rm cephfs <subvol_name> <snap_name> --group_name csi_raid6
```
Once the group is empty, remove it:

```
sudo ceph fs subvolumegroup rm cephfs csi_raid6
```

If it complains that it's not empty, go back: there's still a subvolume or snapshot in it.
If you added the pool with `ceph fs add_data_pool`, undo that with `rm_data_pool`:

```
sudo ceph fs rm_data_pool cephfs cephfs_data_csi_raid6
```

After it's detached from CephFS, you can delete the pool:

```
sudo ceph osd pool rm cephfs_data_csi_raid6 cephfs_data_csi_raid6 --yes-i-really-really-mean-it
```
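To confirm the cleanup, listing the pools should no longer show the removed one (note that `ceph osd pool rm` only succeeds when `mon_allow_pool_delete` is true, as set in the conf further down):

```shell
# list remaining pools; the deleted pool should be gone
sudo ceph osd pool ls
```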
## creating authentication tokens

- this will create a client keyring named media
- this client will have the following capabilities:
  - mon: read
  - mds:
    - read /
    - read/write /media
    - read/write /common
  - osd: read/write to the cephfs_data pool

```
sudo ceph auth get-or-create client.media \
  mon 'allow r' \
  mds 'allow r path=/, allow rw path=/media, allow rw path=/common' \
  osd 'allow rw pool=cephfs_data'
```
## list the authentication tokens and permissions

```
ceph auth ls
```
## change the capabilities of a token

This will overwrite the current capabilities of the given client.user:

```
sudo ceph auth caps client.media \
  mon 'allow r' \
  mds 'allow rw path=/' \
  osd 'allow rw pool=media_data'
```
## adding a new osd on a new node

Create the ceph conf (automate this?):

```
cat <<EOF | sudo tee /etc/ceph/ceph.conf
[global]
auth_client_required = cephx
auth_cluster_required = cephx
auth_service_required = cephx
fsid = de96a98f-3d23-465a-a899-86d3d67edab8
mon_allow_pool_delete = true
mon_initial_members = prodnxsr0009,prodnxsr0010,prodnxsr0011,prodnxsr0012,prodnxsr0013
mon_host = 198.18.23.9,198.18.23.10,198.18.23.11,198.18.23.12,198.18.23.13
ms_bind_ipv4 = true
ms_bind_ipv6 = false
osd_crush_chooseleaf_type = 1
osd_pool_default_min_size = 2
osd_pool_default_size = 3
osd_pool_default_pg_num = 128
public_network = 198.18.23.1/32,198.18.23.2/32,198.18.23.3/32,198.18.23.4/32,198.18.23.5/32,198.18.23.6/32,198.18.23.7/32,198.18.23.8/32,198.18.23.9/32,198.18.23.10/32,198.18.23.11/32,198.18.23.12/32,198.18.23.13/32
EOF
```
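As background for `osd_pool_default_pg_num`: a common rule of thumb (an assumption here, tune for your cluster and let the pg autoscaler correct it) is roughly 100 PGs per OSD divided by the replica count, rounded up to a power of two. Sketched in shell arithmetic with hypothetical counts:

```shell
# rough pg_num heuristic: (osds * 100 / size), rounded up to the next power of two
osds=5; size=3
target=$(( osds * 100 / size ))
pg=1
while [ "$pg" -lt "$target" ]; do pg=$(( pg * 2 )); done
echo "$pg"   # → 256 for 5 osds at size 3
```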
ssh to one of the monitor hosts, then transfer the required keys:

```
sudo cat /etc/ceph/ceph.client.admin.keyring | ssh prodnxsr0003 'sudo tee /etc/ceph/ceph.client.admin.keyring'
sudo cat /var/lib/ceph/bootstrap-osd/ceph.keyring | ssh prodnxsr0003 'sudo tee /var/lib/ceph/bootstrap-osd/ceph.keyring'
```

Assuming we are adding /dev/sda to the cluster, first zap the disk to remove any partitions/lvm/metadata:

```
sudo ceph-volume lvm zap /dev/sda --destroy
```

Then add it to the cluster:

```
sudo ceph-volume lvm create --data /dev/sda
```
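Once `ceph-volume` has created the OSD, it should show up in the CRUSH tree under the new host and the cluster should start backfilling onto it:

```shell
# the new osd should appear under its host with status "up"
sudo ceph osd tree
# overall cluster health and recovery progress
sudo ceph -s
```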
## removing an osd

Check which OSD IDs were on the host (if you know it):

```
sudo ceph osd tree
```

Or check for any DOWN osds:

```
sudo ceph osd stat
sudo ceph health detail
```

Once you have identified the old OSD ID, remove it with these steps, replacing X with the actual OSD ID:

```
sudo ceph osd out osd.X
sudo ceph osd down osd.X
sudo ceph osd crush remove osd.X
sudo ceph auth del osd.X
sudo ceph osd rm osd.X
```
## maintenance mode for the cluster

From one node in the cluster, disable recovery:

```
sudo ceph osd set noout
sudo ceph osd set nobackfill
sudo ceph osd set norecover
sudo ceph osd set norebalance
sudo ceph osd set nodown
sudo ceph osd set pause
```

To undo the change, use unset:

```
sudo ceph osd unset noout
sudo ceph osd unset nobackfill
sudo ceph osd unset norecover
sudo ceph osd unset norebalance
sudo ceph osd unset nodown
sudo ceph osd unset pause
```
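To confirm which flags are currently applied (for example, that nothing was left set after maintenance), check the status output or the flags line of the osd map:

```shell
# cluster status; any set flags (noout, pause, etc) appear in the health section
sudo ceph -s
# or show just the flags line from the osd map
sudo ceph osd dump | grep flags
```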