Sage Weil [Tue, 5 Oct 2021 16:06:09 +0000 (11:06 -0500)]
qa/tasks/nvme_loop: set up nvme_loop on scratch_devs
Using an nvme loop device makes the LVs look like "real" disks,
which means we can exercise all of the normal code paths for
provisioning, deprovisioning, and zapping.
crimson/osd: cancel IO reservations on PG::stop().
`PG::request_{local,remote}_recovery_reservation()` dynamically allocates
up to 2 instances of `LambdaContext<T>` and transfers their ownership to
the `AsyncReserver<T, F>`. Ownership is expressed with raw pointers (`new`
and `delete`). Further analysis shows that the only place where `delete`
is called for these objects is `AsyncReserver::cancel_reservation()`.
In contrast to the classical OSD, crimson doesn't invoke this method when
stopping a PG during the shutdown sequence. This would explain the
following ASan issue observed at Sepia:
before this change, the note on the "apply" command was embedded in the
note on "_no_schedule", although the two are not related. so let's move
the former out. also, highlight the yaml file sample as YAML.
crimson/osd: write the 'osd_key' meta on OSD::mkfs().
This commit fixes an issue identified during the Rook-crimson effort.
Missing the `write_meta()` on `osd_key` made CephX inoperable
because the keyring could not be loaded. Disabling CephX in turn
caused the auth method negotiation to fail when reaching out to a monitor.
```
ERROR 2021-09-28 21:19:46,598 [shard 0] none - auth: unable to find a keyring on /var/lib/ceph/osd/ceph-0/keyring: (2) No such file or directory
ERROR 2021-09-28 21:19:46,598 [shard 0] none - AuthRegistry(0x7fa38c322b68) no keyring found at /var/lib/ceph/osd/ceph-0/keyring, disabling cephx
...
INFO 2021-09-28 21:19:46,601 [shard 0] monc - get_auth_request(con=[client.?(temp_mon_client) 172.17.0.1:0/2910147961@63138 >> mon.? v2:10.108.187.31:3300/0], auth_method=0)
INFO 2021-09-28 21:19:46,601 [shard 0] monc - get_auth_request no methods is supported
...
WARN 2021-09-28 21:20:06,612 [shard 0] monc - cannot establish the active_con with any mon
```
Samuel Just [Wed, 29 Sep 2021 01:46:25 +0000 (18:46 -0700)]
crimson/os/seastore/transaction_manager: limit callers to reserve_projected_usage
Adds an exclusive stage for obtaining projected usage as well as an
unordered one for submitting ool writes. This should allow for a
straightforward wait-list when io is blocked while still allowing
concurrent submission of ool writes otherwise.
Fixes: https://tracker.ceph.com/issues/52698
Signed-off-by: Samuel Just <sjust@redhat.com>
Samuel Just [Wed, 29 Sep 2021 01:43:02 +0000 (18:43 -0700)]
crimson/os/seastore/segment_cleaner: track projected usage for in progress operations
We're going to want to permit multiple transactions to be writing
concurrently. Replace await_hard_limits() with a mechanism that
remembers bytes that will be used by in-progress operations.
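A minimal sketch of the idea, not the seastore implementation: space that in-progress operations expect to write is remembered as a reservation, so several writers can proceed concurrently without overshooting a limit. The class name, the method names, and the 90% threshold are all assumptions made for illustration.

```python
class ProjectedUsageTracker:
    """Toy model of space accounting with reservations for in-flight writes."""

    def __init__(self, total_bytes: int, hard_limit_percent: int = 90) -> None:
        self.total_bytes = total_bytes
        self.hard_limit_percent = hard_limit_percent  # hypothetical threshold
        self.used_bytes = 0        # bytes already written
        self.projected_bytes = 0   # bytes reserved by in-progress operations

    def _limit(self) -> int:
        return self.total_bytes * self.hard_limit_percent // 100

    def try_reserve(self, nbytes: int) -> bool:
        """Reserve space for an operation that has not written anything yet."""
        if self.used_bytes + self.projected_bytes + nbytes > self._limit():
            return False  # the caller would have to wait for space to be freed
        self.projected_bytes += nbytes
        return True

    def commit(self, reserved: int, written: int) -> None:
        """Drop a reservation and account for what was actually written."""
        self.projected_bytes -= reserved
        self.used_bytes += written


tracker = ProjectedUsageTracker(total_bytes=1000)
first = tracker.try_reserve(500)    # fits under the 900-byte limit
second = tracker.try_reserve(500)   # rejected: 500 projected + 500 > 900
tracker.commit(reserved=500, written=300)
third = tracker.try_reserve(500)    # accepted: 300 used + 500 <= 900
```

The point of the model is that the second writer is refused because of bytes that have only been promised, not yet written.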
Sage Weil [Tue, 28 Sep 2021 14:58:24 +0000 (10:58 -0400)]
Merge PR #43177 into master
* refs/pull/43177/head:
osd/PrimaryLogPG: drop ops when pool has EIO flag
osdc/Objecter: set SUPPORTSPOOLEIO flag on all ops
ceph_test_rados_api_aio: test pool EIO flag
osdc/Objecter: return EIO for new linger ops
osdc/Objecter: return EIO for existing ops and linger ops
osdc/Objecter: return EIO for new ops
osd,mon: add EIO pool flag
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
Reviewed-by: Samuel Just <sjust@redhat.com>
where the numbers of scrubbed objects, clones, dirty and omap are always
less than the corresponding totals, if the PG contains
object(s) whose hash happens to be 0xffffffff.
in this change, if the calculated hash of the upper bound is greater
than the maximum number representable by uint32_t, in addition to
setting the hash of the upper bound hobj to 0xffffffff, we also set the
nspace of the upper bound hobj to "\xff", so that the upper bound
is greater than an hobj whose hash happens to be 0xffffffff. please note,
the nspace of "\xff" is not an ascii string, so it's not likely to be
less than a real-world nspace of an hobj.
with this new *greater* upper bound, we are able to include the previously
missing hobj when listing the objects in a PG. so the scrub won't be
annoyed when the number of objects does not match.
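The effect of the larger upper bound can be sketched with a simplified model in which an hobj is ordered as a (hash, nspace, name) tuple and a listing covers a half-open range. The real hobject_t ordering has more fields, and the empty nspace on the old upper bound is an assumption; the `Hobj` type and `list_range()` helper are hypothetical.

```python
from typing import NamedTuple


class Hobj(NamedTuple):
    """Simplified stand-in for an hobj, compared field by field."""
    hash: int
    nspace: bytes
    name: bytes


MAX_HASH = 0xFFFFFFFF


def list_range(objects, lower, upper):
    """Return the objects in the half-open range [lower, upper)."""
    return sorted(o for o in objects if lower <= o < upper)


objects = [
    Hobj(0x00000010, b"", b"a"),
    Hobj(MAX_HASH, b"", b"b"),
    Hobj(MAX_HASH, b"ns1", b"c"),
]
lower = Hobj(0, b"", b"")

# Assumed old bound: hash clamped to 0xffffffff, nspace left empty.
old_upper = Hobj(MAX_HASH, b"", b"")
# New bound: same hash, but nspace "\xff" sorts after ordinary namespaces.
new_upper = Hobj(MAX_HASH, b"\xff", b"")

old_listing = list_range(objects, lower, old_upper)
new_listing = list_range(objects, lower, new_upper)
```

In this model the old bound drops both objects whose hash is 0xffffffff, while the new bound lists all three.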
crimson/os/seastore: introduce ool related metrics with misc improvements
* The number of ool records written;
* Write overhead from journal/ool records;
* Wasted writes from invalidated ool records;
* Wasted writes from erased inline extents;
* Distinguish ool and inline extents from metrics;
Joseph Sawaya [Thu, 23 Sep 2021 15:07:23 +0000 (11:07 -0400)]
mgr/rook, qa/tasks/rook: change rgw daemon service name
This commit changes the rgw daemon service name format from
rgw.<realm name>.<zone name> to rgw.<resource_name> and changes the daemon
removal in the QA accordingly. This also gets rid of the Rook API when
describing services.
Joseph Sawaya [Wed, 22 Sep 2021 20:46:14 +0000 (16:46 -0400)]
mgr/rook: orch rm no longer uses rook api delete
This commit changes orch rm to no longer use the rook api to delete the daemon
but instead directly delete the corresponding CR using the kubernetes api.
Joseph Sawaya [Tue, 21 Sep 2021 13:41:28 +0000 (09:41 -0400)]
qa/tasks/rook: fix cluster deletion hanging due to CephObjectStore CR
This commit fixes the issue where the cluster deletion hangs in the QA
while a CephObjectStore CR is still up by removing all rgw/nfs/mds/rbd-mirror
daemons before tearing down the rest of the cluster.
Joseph Sawaya [Tue, 7 Sep 2021 13:06:08 +0000 (09:06 -0400)]
mgr/rook: use default replication size in orch apply rgw
This commit changes `orch apply rgw` to use the osd_pool_default_size
when setting the replication size for the data pool and metadata pool
of the rgw daemon. This commit also adds `orch apply rgw` to the Rook
QA.
the current test is wrong because it generates the tcmu-runner part
twice.
since the function `deploy_daemon_units()` in cephadm already writes the
tcmu-runner command once, calling `get_tcmu_runner_container()` a second
time from the test makes `deploy_daemon_units()` write the same command
again.
jerryluo [Mon, 25 Jan 2021 16:10:57 +0000 (00:10 +0800)]
mon/OSDMonitor: Make the pg_num check more accurate
In the check_pg_num function, find the OSDs that correspond to the
current pool's crush rule and calculate whether the average number of
PGs on these OSDs would exceed the value of 'mon_max_pg_per_osd'. Make
the pg_num check more accurate by counting all the PGs on the OSDs used
by the new pool.
Fixes: https://tracker.ceph.com/issues/47062
Signed-off-by: Jerry Luo <luojierui@chinatelecom.cn>
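A rough sketch of the arithmetic described above, under the assumption that each PG of the new pool is placed on `pool_size` of the OSDs selected by the rule. The function and parameter names are hypothetical and do not mirror the OSDMonitor code.

```python
def pg_num_within_limit(new_pg_num, pool_size, osds_in_rule,
                        pgs_on_osd, mon_max_pg_per_osd):
    """Check the projected average PG count on the OSDs the new pool would use."""
    if not osds_in_rule:
        return True  # nothing to compare against in this toy model
    # Count every PG already placed on the OSDs selected by the crush rule.
    existing = sum(pgs_on_osd.get(osd, 0) for osd in osds_in_rule)
    # Each new PG adds one replica on pool_size of those OSDs.
    projected = existing + new_pg_num * pool_size
    return projected / len(osds_in_rule) <= mon_max_pg_per_osd


pgs_on_osd = {0: 200, 1: 220, 2: 210, 3: 10}  # osd.3 is not used by the rule
osds_in_rule = [0, 1, 2]

too_many = pg_num_within_limit(64, 3, osds_in_rule, pgs_on_osd, 250)
acceptable = pg_num_within_limit(32, 3, osds_in_rule, pgs_on_osd, 250)
```

With 64 new PGs the projected average is (630 + 192) / 3 = 274, above the limit of 250; with 32 it is (630 + 96) / 3 = 242, which passes. The lightly loaded osd.3 does not dilute the average because the rule does not select it.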
tcmu-runner logs to `/var/log/tcmu-runner.log`, and there's no option to
make it log to stdout/stderr, so the log is only available from within the
container.
Modifying the bindmount from `-v /var/log/ceph/<fsid>/:/var/log/rbd-target-api:z`
to `-v /var/log/ceph/<fsid>/:/var/log:z` makes it at least available
from the host.
ecb8d2cae2c063acf4e7e1bffed887d52117762f disabled the system_pmdk bcond for all
build targets based on the fact that pmdk 1.10 was not available on any of them.
Now that openSUSE Tumbleweed ships pmdk 1.11, we re-enable the system_pmdk bcond
to fix the master build for openSUSE Tumbleweed.
Since openSUSE Tumbleweed is the *only* SUSE build target master supports, there
is no need for greater granularity in the distro conditional here.
osd/scrub: collecting scrub-related files into a separate directory
Moving the scrub implementation files out of src/osd. Triggered by:
- the matching Crimson scrub structure;
- the proliferation of scrub-related code files (including in upcoming PRs).
scrubber_common.h, which defines the scrubber's interface, remains
in src/osd.
The LBA tree implementation only requires that the start addr of
a logical extent be contained within the leaf range. It's entirely
possible for the end of a logical extent to extend past the end addr
of the containing leaf node.
Fixes: https://tracker.ceph.com/issues/52709
Signed-off-by: Samuel Just <sjust@redhat.com>
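The invariant can be illustrated with a small sketch, assuming half-open leaf ranges. `LeafRange`, `Extent`, and both helper functions are hypothetical names, not seastore types.

```python
from typing import NamedTuple


class LeafRange(NamedTuple):
    begin: int
    end: int  # exclusive


class Extent(NamedTuple):
    start: int
    length: int


def leaf_may_hold(leaf: LeafRange, extent: Extent) -> bool:
    """Only the start address has to fall within the leaf range."""
    return leaf.begin <= extent.start < leaf.end


def extent_fully_inside(leaf: LeafRange, extent: Extent) -> bool:
    """Stricter check that would wrongly reject extents spanning the leaf end."""
    return (leaf.begin <= extent.start
            and extent.start + extent.length <= leaf.end)


leaf = LeafRange(begin=0, end=4096)
extent = Extent(start=3584, length=1024)  # ends at 4608, past the leaf end

holds = leaf_may_hold(leaf, extent)
fully_inside = extent_fully_inside(leaf, extent)
```

Here the extent belongs to the leaf because its start address is in range, even though the stricter containment check fails.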