We don't backport PRs merged into doc/releases. Therefore, when one browses to an older Ceph release version on docs.ceph.com (e.g., https://docs.ceph.com/en/pacific/), the information is out of date at best.
The doc/releases page is accurate only when browsing https://docs.ceph.com/en/latest/.
So this post_checkout command makes sure doc/releases is checked out from main before building and publishing.
mgr/volumes: Fix subvolume creation on FIPS-enabled systems
The md5 checksum is used in the construction of the legacy
subvolume config filename. It is not used for any security
purpose, so the 'usedforsecurity' flag is set to false to make
it FIPS compliant.
md5 has always been used here, but commit 373a04cf734 caused it
to be exercised in 'open_subvol', which is a prerequisite for
all subvolume operations, and hence subvolume creation failed.
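A minimal Python sketch of the pattern described above; the helper name and filename format are illustrative, not the actual mgr/volumes code. It relies on the `usedforsecurity` keyword available in Python 3.9+:
```python
import hashlib

def legacy_config_filename(subvol_path: str) -> str:
    # The digest only derives a stable filename for the legacy subvolume
    # config; it is not a security primitive, so usedforsecurity=False
    # keeps the call usable on FIPS-enabled systems (Python 3.9+).
    digest = hashlib.md5(subvol_path.encode("utf-8"), usedforsecurity=False)
    return "{0}.meta".format(digest.hexdigest())
```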
Kotresh HR [Fri, 4 Feb 2022 09:58:39 +0000 (15:28 +0530)]
qa: validate subvolume discover on upgrade
Validate subvolume discovery on upgrade from a legacy
subvolume to v1. The handcrafted `.meta` file in the legacy
subvolume root should not be used for any subvolume APIs
such as getpath or authorize.
Kotresh HR [Fri, 4 Feb 2022 09:25:03 +0000 (14:55 +0530)]
mgr/volumes: Fix subvolume discover during upgrade
Fix subvolume discovery so that the correct metadata file is
used after an upgrade from a legacy subvolume to v1. The fix
makes sure the handcrafted metadata file placed in the root
of a legacy subvolume is not used.
Co-authored-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch>
Co-authored-by: Dan van der Ster <daniel.vanderster@cern.ch>
Co-authored-by: Ramana Raja <rraja@redhat.com>
Signed-off-by: Kotresh HR <khiremat@redhat.com>
(cherry picked from commit 7eba9cab6cfb9a13a84062177d7a0fa228311e13)
ceph-volume: fix fast device allocation size on multiple devices
The size computed by get_physical_fast_allocs() was wrong when the
function had multiple devices to handle.
For instance, with 4 OSDs and 2 fast devices of 10G each while
allocating 2 slots per fast device, each slot used to be 2.5G,
leaving both fast devices half full. Now each slot takes 5G, so
each fast device is fully used.
Fixes: https://tracker.ceph.com/issues/56031
Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch>
(cherry picked from commit d0f9e93914e2b7feac41a634311d74c146c8868b)
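The arithmetic is easy to check with a small Python sketch; the function name is illustrative, not the actual get_physical_fast_allocs() code:
```python
def fast_device_slot_size(device_size_gb: float, slots_per_device: int) -> float:
    # Size each slot against the device it lives on, not against the total
    # number of OSDs being deployed.
    return device_size_gb / slots_per_device

# Example from the commit message: 4 OSDs, 2 fast devices of 10G each,
# 2 slots per fast device. Before the fix each slot was 2.5G (devices half
# full); after the fix each slot is 5G (devices fully used).
assert fast_device_slot_size(10, 2) == 5.0
```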
librbd: bail from schedule_request_lock() if already lock owner
A race condition may be hit if there are multiple pending locks and
pending callbacks for the same image. Abort the exclusive lock
acquisition if this client is already the exclusive lock owner.
Fixes: https://tracker.ceph.com/issues/56549
Signed-off-by: Christopher Hoffman <choffman@redhat.com>
(cherry picked from commit 3527d2c764626c09c5ede80ae844551fd8845756)
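A minimal sketch of the guard, in Python rather than librbd's C++; the class and attribute names are made up for illustration:
```python
import threading

class ImageLockClient:
    def __init__(self):
        self._lock = threading.Lock()
        self.is_lock_owner = False

    def schedule_request_lock(self):
        with self._lock:
            if self.is_lock_owner:
                # Already the exclusive lock owner: a racing pending request
                # or callback must not restart the acquisition path.
                return
            self._acquire_exclusive_lock()

    def _acquire_exclusive_lock(self):
        # Stand-in for the real acquisition path.
        self.is_lock_owner = True
```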
rbd: don't default empty pool name unless namespace is specified
Commit 96f05a7956b3 ("rbd: delay determination of default pool name")
broke "rbd perf image iostat" and "rbd perf image iotop" GLOBAL_POOL_KEY
support (the ability to blend all rbd pools together into a single
view).
This check was added in commit ecd3778a6f9a ("rbd-mirror: ensure that
the last non-primary snapshot cannot be pruned") as an additional
safeguard against pruning an incomplete non-primary snapshot in case
there is no predecessor mirror snapshot. However, it still fires if the
predecessor is there but happens to be a primary demotion snapshot.
A bogus "incomplete local non-primary snapshot" error is reported and
the replayer gets stuck.
Remove completed_non_primary_snapshots_exist tracking as the presence
of the predecessor in the incomplete non-primary snapshot pruning arm
is already ensured by "m_local_snap_id_start > 0" condition.
Incomplete non-primary snapshot handling is bifurcated depending
on whether any data objects have been copied. If no data objects
have been copied, an incomplete non-primary snapshot is assumed to
be malformed and gets pruned; the sync is restarted from scratch.
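A hedged Python sketch of the bifurcation described above, not rbd-mirror's actual C++ state machine:
```python
def handle_incomplete_non_primary_snapshot(objects_copied: int) -> str:
    if objects_copied == 0:
        # Nothing has been synced yet: the incomplete snapshot is assumed
        # malformed, so prune it and restart the sync from scratch.
        return "prune snapshot, restart sync from scratch"
    # Data objects were already copied: keep the snapshot and resume the
    # interrupted delta sync from the preserved predecessor.
    return "resume interrupted sync"
```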
7. Checking the number of PG log entries and dups
```
rzarz@ubulap:~/dev/ceph/build$ for pgid in `cat osd0_pgs.txt`; do echo $pgid; bin/ceph-objectstore-tool --data-path dev/osd0 --op log --pgid $pgid | jq '(.pg_log_t.log|length),(.pg_log_t.dups|length)'; done
2.7
10020
2500
2.6
10100
3000
2.3
10012
2800
2.1
10049
2900
2.2
10057
2700
2.0
10027
2900
2.5
10077
2700
2.4
10072
2900
1.0
97
0
```
Conflicts:
src/tools/ceph_objectstore_tool.cc -- undetected conflict
with d5445b8f113797718a0dbb05e884a6bffbfed76a. Fixed by
adapting the patch to not require `unique_ptr<T>::get()`.
wanwencong [Fri, 24 Jun 2022 15:54:52 +0000 (23:54 +0800)]
rbd-fuse: librados will filter out -r option from command-line
The -r option is filtered out by librados when executing
"rbd-fuse /mountpoint -p pool_name -r rbd_name", so images other
than the requested one can still be seen under the mount point.
Fixes: https://tracker.ceph.com/issues/56387
Signed-off-by: wanwencong <wanwc@chinatelecom.cn>
(cherry picked from commit e99d64bc8a5c3bbb8a3632f211d4f56751cf499e)
Ilya Dryomov [Sun, 26 Jun 2022 11:05:09 +0000 (13:05 +0200)]
librbd: update progress for non-existent objects on deep-copy
As a side effect of commit e5a21e904142 ("librbd: deep-copy image copy
state machine skips clean objects"), handle_object_copy() stopped being
called for non-existent objects. This broke progress_object_no logic,
which expects to "see" all object numbers so that update_progress()
callback invocations can be ordered. Currently update_progress() based
progress reporting gets stuck after encountering a hole in the image.
To fix, arrange for handle_object_copy() to be called for all object
numbers, even if ObjectCopyRequest isn't created. Defer the extra call
to the image work queue to avoid locking issues.
Conflicts:
src/librbd/deep_copy/ImageCopyRequest.cc [ commit aabfb76e51bf
("librbd: swapped ThreadPool/ContextWQ for AsioEngine") not in
octopus ]
src/test/librbd/deep_copy/test_mock_ImageCopyRequest.cc [
commit 235b27a8f08a ("librbd/deep_copy: skip snap list if
object is known to be clean") not in octopus ]
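A small Python sketch of the ordering idea described above (illustrative only; the real fix lives in the C++ image copy state machine): every object number must be observed, even for non-existent objects, so that update_progress() calls can be emitted in order.
```python
class ProgressTracker:
    def __init__(self, update_progress):
        self._update_progress = update_progress
        self._next_object_no = 0
        self._seen = set()

    def handle_object_copy(self, object_no: int) -> None:
        # Called for every object number, including holes in the image for
        # which no copy request was created.
        self._seen.add(object_no)
        while self._next_object_no in self._seen:
            self._seen.remove(self._next_object_no)
            self._update_progress(self._next_object_no)
            self._next_object_no += 1
```
With this, a hole (skipped object) no longer stalls the chain of progress callbacks.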
Ilya Dryomov [Sat, 18 Jun 2022 13:25:49 +0000 (15:25 +0200)]
rbd-mirror: spell out "remote image is not primary" status correctly
There is a difference: non-primary means NON_PRIMARY promotion state,
while "not primary" can refer to any of NON_PRIMARY, ORPHAN or UNKNOWN
promotion states.
Ilya Dryomov [Sat, 18 Jun 2022 11:00:34 +0000 (13:00 +0200)]
rbd-mirror: fix up PrepareReplayDisconnected test case
It was botched in commit 2bca9ee96c65 ("rbd-mirror: consolidate
prepare local/remote image steps to bootstrap") and went unnoticed
because currently no special handling is needed for disconnected
clients -- is_disconnected() check happens to be the last step
and it doesn't generate an error.
Ilya Dryomov [Mon, 20 Jun 2022 12:19:41 +0000 (14:19 +0200)]
rbd-mirror: generally skip replay/resync if remote image is not primary
Replay and resync should generally be skipped if the remote image is
not primary.
If this is not done for replay, snapshot-based mirroring can run into
a livelock if the primary image is demoted while a mirror snapshot is
being synced. On the demote site, rbd-mirror would pick up the just
demoted image, grab the exclusive lock on it and idle waiting for a new
mirror snapshot to be created. On the (still) non-primary site,
rbd-mirror would eventually finish syncing that mirror snapshot and
attempt to unlink from it on the demote site. These attempts would
fail with EROFS due to exclusive lock being held in the "refuse proxied
maintenance operations" mode, blocking forward progress (syncing of the
demotion snapshot so that the non-primary image can be orderly promoted
to primary, etc).
If this is not done for resync, data loss can ensue as the just demoted
image would be immediately trashed, underneath the non-primary site that
is still syncing.
Currently this is done in PrepareReplayRequest only for journal-based
mirroring. Note that it is conditional: if the local image is linked
to the remote image, proceeding is desirable.
Generalize this check, consolidate it with a related check in
PrepareRemoteImageRequest and move the result to BootstrapRequest to
cover both "local image does not exist" and "local image is unlinked"
cases for both modes.
Ilya Dryomov [Sat, 18 Jun 2022 10:35:51 +0000 (12:35 +0200)]
rbd-mirror: strengthen is_local_primary() and is_linked()
Initialize local_promotion_state and remote_promotion_state to UNKNOWN
instead of counterintuitive PRIMARY and NON_PRIMARY -- half the time the
final values are flipped. Then is_local_primary() and is_linked() can
be strengthened as a non-existent image should stay in UNKNOWN.
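A Python sketch of the strengthened checks; the names and the simplified is_linked() logic are illustrative, not the actual rbd-mirror C++ bootstrap code:
```python
from enum import Enum, auto

class PromotionState(Enum):
    UNKNOWN = auto()
    PRIMARY = auto()
    NON_PRIMARY = auto()
    ORPHAN = auto()

class BootstrapProbe:
    def __init__(self):
        # Start from UNKNOWN: a non-existent image never gets probed, so it
        # stays UNKNOWN instead of defaulting to a misleading state.
        self.local_promotion_state = PromotionState.UNKNOWN
        self.remote_promotion_state = PromotionState.UNKNOWN

    def is_local_primary(self) -> bool:
        return self.local_promotion_state is PromotionState.PRIMARY

    def is_linked(self) -> bool:
        # Simplified: the real check also verifies that the local image is
        # actually linked to the remote peer.
        return self.local_promotion_state is PromotionState.NON_PRIMARY
```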
Ilya Dryomov [Sun, 19 Jun 2022 10:12:01 +0000 (12:12 +0200)]
mgr/rbd_support: always rescan image mirror snapshots on refresh
Establishing a watch on rbd_mirroring object and skipping rescanning
image mirror snapshots on periodic refresh unless rbd_mirroring object
gets notified in the interim is flawed. rbd_mirroring object is
notified when mirroring is enabled or disabled on some image (including
when the image is removed), but it is not notified when images are
promoted or demoted. However, load_pool_images() discards images that
are not primary at the time of the scan. If the image is promoted
later, no snapshots are created even if the schedule is in place. This
happens regardless of whether the schedule is added before or after the
promotion.
This effectively reverts commit 69259c8d3722 ("mgr/rbd_support: make
mirror_snapshot_schedule rescan only updated pools"). An alternative
fix could be to stop discarding non-primary images (i.e. drop
    if not info['primary']:
        continue
check added in commit d39eb283c5ce ("mgr/rbd_support: mirror snapshot
schedule should skip non-primary images")), but that would clutter the
queue and therefore "rbd mirror snapshot schedule status" output with
bogus entries. Performing a rescan roughly every 60 seconds should be
manageable: currently it amounts to a single mirror_image_status_list
request, followed by mirror_image_get, get_snapcontext and snapshot_get
requests for each snapshot-based mirroring enabled image and concluded
by a single dir_list request. Among these, per-image get_snapcontext
and snapshot_get requests are necessary for determining primaryness.
Ilya Dryomov [Fri, 17 Jun 2022 12:03:20 +0000 (14:03 +0200)]
mgr/rbd_support: avoid losing a schedule on load vs add race
If load_schedules() (i.e. periodic refresh) races with add_schedule()
invoked by the user for a fresh image, that image's schedule may get
lost until the next rebuild (not refresh!) of the queue:
1. periodic refresh invokes load_schedules()
2. load_schedules() creates a new Schedules instance and loads
schedules from rbd_mirror_snapshot_schedule object
3. add_schedule() is invoked for a new image (an image that isn't
present in self.images) by the user
4. before load_schedules() can grab self.lock, add_schedule() commits
the new schedule to rbd_mirror_snapshot_schedule object and adds it
to self.schedules
5. load_schedules() grabs self.lock and reassigns self.schedules with
Schedules instance that is now stale
6. periodic refresh invokes load_pool_images() which discovers the new
image; eventually it is added to self.images
7. periodic refresh invokes refresh_queue() which attempts to enqueue()
the new image; this fails because a matching schedule isn't present
The next periodic refresh recovers the discarded schedule from
rbd_mirror_snapshot_schedule object but no attempt to enqueue() that
image is made since it is already "known" at that point. Despite the
schedule being in place, no snapshots are created until the queue is
rebuilt from scratch or rbd_support module is reloaded.
To fix that, extend self.lock critical sections so that add_schedule()
and remove_schedule() can't get stepped on by load_schedules().
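A hedged Python sketch of the locking pattern described above; the omap helpers are hypothetical, not the actual rbd_support module API:
```python
import threading

class MirrorSnapshotScheduleHandler:
    def __init__(self):
        self.lock = threading.Lock()
        self.schedules = {}

    def load_schedules(self):
        # Hold self.lock across the load *and* the reassignment so that a
        # concurrent add_schedule()/remove_schedule() cannot commit a change
        # that the freshly loaded schedules would silently discard.
        with self.lock:
            self.schedules = self._load_from_omap()   # hypothetical helper

    def add_schedule(self, image: str, interval: str):
        with self.lock:
            self._store_to_omap(image, interval)      # hypothetical helper
            self.schedules[image] = interval

    def _load_from_omap(self) -> dict:
        return dict(self.schedules)   # stand-in for reading the omap object

    def _store_to_omap(self, image: str, interval: str) -> None:
        pass                          # stand-in for writing the omap object
```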
Ilya Dryomov [Fri, 17 Jun 2022 08:28:55 +0000 (10:28 +0200)]
mgr/rbd_support: refresh schedule queue immediately after delay elapses
The existing logic often leads to refresh_pools() and refresh_images()
being invoked after a 120 second delay instead of after an intended 60
second delay.
Sage Weil [Wed, 17 Feb 2021 16:28:05 +0000 (10:28 -0600)]
mgr/cephadm: make drain adjust crush weight if not replacing
If we are replacing an OSD, we should mark it out and then back in
again when a new device shows up. However, if we are going to
destroy an OSD, we should just weight it to 0 in crush, so that data
doesn't move again once the OSD is purged.
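A hedged Python sketch of the decision; the CLI commands shown are the usual Ceph equivalents, not cephadm's internal calls:
```python
def drain_commands(osd_id: int, replace: bool) -> list:
    if replace:
        # A replacement device will take over, so mark the OSD out and let
        # it be marked back in when the new device shows up.
        return ["ceph osd out {0}".format(osd_id)]
    # The OSD is being destroyed for good: zero its CRUSH weight so data
    # does not move a second time once the OSD is purged.
    return ["ceph osd crush reweight osd.{0} 0".format(osd_id)]
```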
Although the chunking in off-line `dups` trimming (via COT) seems
fine, the `ceph-objectstore-tool` is a client of `trim()` of
`PGLog::IndexedLog`, which means that a partial revert is not
possible without extensive changes.
The backport ticket is: https://tracker.ceph.com/issues/55990
Revert "osd/PGLog.cc: Trim duplicates by number of entries"
This reverts commit 7cc1b29f2b7b7feee127f1dbbef947799e56f38b,
which is the in-OSD part of the fix for the accumulation of `dup`
entries in a PG log. Brainstorming has raised questions about the
OSD's behaviour during an upgrade if there are tons of dups in
the log. Before bringing it back, we must double-check that the
deletions are chunked properly so as not to cause OOMs or stalls
in, for example, RocksDB.
The backport ticket is: https://tracker.ceph.com/issues/55990
We really want the ability to know how many entries
`PGLog::IndexedLog::dups` contains. The current ways are either
invasive (stopping an OSD) or indirect (examining
`dump_mempools`).
Ilya Dryomov [Sun, 29 May 2022 16:20:34 +0000 (18:20 +0200)]
librbd: unlink newest mirror snapshot when at capacity, bump capacity
CreatePrimaryRequest::unlink_peer() invoked via "rbd mirror image
snapshot" command or via rbd_support mgr module when creating a new
scheduled mirror snapshot at rbd_mirroring_max_mirroring_snapshots
capacity on the primary cluster can race with Replayer::unlink_peer()
invoked by rbd-mirror when finishing syncing an older snapshot on the
secondary cluster. Consider the following:
0. rbd-mirror is syncing snap1..snap2 delta
1. rbd_support creates primary-snap4
2. due to rbd_mirroring_max_mirroring_snapshots == 3, rbd_support picks
primary-snap3 for unlinking
3. rbd-mirror finishes syncing snap1..snap2 delta and marks
non-primary-snap2 complete
[ snap1 (the old base) is no longer needed on either cluster ]
4. rbd-mirror unlinks and removes primary-snap1
5. rbd-mirror removes non-primary-snap1
6. rbd-mirror picks snap2 as the new base
7. rbd-mirror creates non-primary-snap3 and starts syncing snap2..snap3
delta
8. rbd_support unlinks and removes primary-snap3 which is in-use by
rbd-mirror
If snap trimming on the primary cluster kicks in soon enough, the
secondary image becomes corrupted: rbd-mirror would eventually finish
"syncing" non-primary-snap3 and mark it complete in spite of bogus data
in the HEAD -- the primary cluster OSDs would start returning ENOENT
for snap trimmed objects. Luckily, rbd-mirror's attempt to pick snap3
as the new base would wedge the replayer with "split-brain detected:
failed to find matching non-primary snapshot in remote image" error.
Before commit a888bff8d00e ("librbd/mirror: tweak which snapshot is
unlinked when at capacity") this could happen pretty much all the time
as it was the second oldest snapshot that was unlinked. This commit
changed it to be the third oldest snapshot, turning this into a more
narrow but still very much possible to hit race.
Unfortunately this race condition appears to be inherent to the way
snapshot-based mirroring is currently implemented:
a. when mirror snapshots are created on the producer side of the
snapshot queue, they are already linked
b. mirror snapshots can be concurrently unlinked/removed on both
sides of the snapshot queue by non-cooperating clients (local
rbd_mirror_image_create_snapshot() vs remote rbd-mirror)
c. with mirror peer links off the list due to (a), there is no
existing way for rbd-mirror to persistently mark a snapshot as
in-use
As a workaround, bump rbd_mirroring_max_mirroring_snapshots to 5 and
always unlink the newest snapshot (i.e. slot 4) instead of the third
oldest snapshot (i.e. slot 2). Hopefully this gives enough leeway,
as rbd-mirror would need to sync two snapshots (i.e. transition from
syncing 0-1 to 1-2 and then to 2-3) before potentially colliding with
rbd_mirror_image_create_snapshot() on slot 4.
Ilya Dryomov [Sun, 29 May 2022 17:55:04 +0000 (19:55 +0200)]
test/librbd: fix set_val() call in SuccessUnlink* test cases
rbd_mirroring_max_mirroring_snapshots isn't actually set to 3 there
due to the stray conf_ prefix. It didn't matter until now because the
default was also 3.
Ilya Dryomov [Sat, 28 May 2022 18:06:22 +0000 (20:06 +0200)]
rbd-mirror: don't prune non-primary snapshot when restarting delta sync
When restarting interrupted sync (signified by the "end" non-primary
snapshot with last_copied_object_number > 0), preserve the "start"
non-primary snapshot until the sync is completed, like it would have
been done had the sync not been interrupted. This ensures that the
same m_local_snap_id_start is passed to scan_remote_mirror_snapshots()
and ultimately ImageCopyRequest state machine on restart as on initial
start.
This ends up being yet another fixup for 281af0de86b1 ("rbd-mirror:
prune unnecessary non-primary mirror snapshots"), following earlier
7ba9214ea5b7 ("rbd-mirror: don't prune older mirror snapshots when
pruning incomplete snapshot") and ecd3778a6f9a ("rbd-mirror: ensure
that the last non-primary snapshot cannot be pruned").
Zac Dover [Mon, 30 May 2022 13:32:06 +0000 (23:32 +1000)]
doc/start: update "memory" in hardware-recs.rst
This PR corrects some usage errors in the "Memory" section
of the hardware-recommendations.rst file. It also closes some
parentheses that were opened but never closed.