Samuel Just [Wed, 29 Sep 2021 01:43:02 +0000 (18:43 -0700)]
crimson/os/seastore/segment_cleaner: track projected usage for in progress operations
We're going to want to permit multiple transactions to be writing
concurrently. Replace await_hard_limits() with a mechanism that
remembers bytes that will be used by in-progress operations.
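A minimal sketch of the idea, assuming a simple byte-counting model: each operation reserves its projected bytes up front, and the reservation is released when the operation commits or aborts. The class and method names are hypothetical and are not the segment_cleaner interface.

```python
class ProjectedUsageTracker:
    """Hypothetical tracker; illustrative only, not the SegmentCleaner API."""

    def __init__(self, hard_limit_bytes):
        self.hard_limit_bytes = hard_limit_bytes
        self.used_bytes = 0        # bytes already written
        self.projected_bytes = 0   # bytes reserved by in-progress operations

    def try_reserve(self, nbytes):
        """Reserve nbytes for an operation; refuse if the limit would be exceeded."""
        if self.used_bytes + self.projected_bytes + nbytes > self.hard_limit_bytes:
            return False
        self.projected_bytes += nbytes
        return True

    def commit(self, reserved, written):
        """Release a reservation and account for the bytes actually written."""
        self.projected_bytes -= reserved
        self.used_bytes += written

    def abort(self, reserved):
        """Release a reservation without writing anything."""
        self.projected_bytes -= reserved
```

Because reservations are counted against the limit, several operations can be in flight at once without each one having to wait on a global check.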
Sage Weil [Tue, 28 Sep 2021 14:58:24 +0000 (10:58 -0400)]
Merge PR #43177 into master
* refs/pull/43177/head:
osd/PrimaryLogPG: drop ops when pool has EIO flag
osdc/Objecter: set SUPPORTSPOOLEIO flag on all ops
ceph_test_rados_api_aio: test pool EIO flag
osdc/Objecter: return EIO for new linger ops
osdc/Objecter: return EIO for existing ops and linger ops
osdc/Objecter: return EIO for new ops
osd,mon: add EIO pool flag
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
Reviewed-by: Samuel Just <sjust@redhat.com>
ecb8d2cae2c063acf4e7e1bffed887d52117762f disabled the system_pmdk bcond for all
build targets based on the fact that pmdk 1.10 was not available on any of them.
Now that openSUSE Tumbleweed ships pmdk 1.11, we re-enable the system_pmdk bcond
to fix the master build for openSUSE Tumbleweed.
Since openSUSE Tumbleweed is the *only* SUSE build target master supports, there
is no need for greater granularity in the distro conditional here.
osd/scrub: collecting scrub-related files into a separate directory
Move the scrub implementation files out of src/osd. Triggered by:
- the matching Crimson scrub structure;
- the proliferation of scrub-related code files (including in upcoming PRs).
scrubber_common.h, which defines the scrubber's interface, remains
in src/osd.
The LBA tree implementation only requires that the start addr of
a logical extent be contained within the leaf range. It's entirely
possible for the end of a logical extent to extend past the end addr
of the containing leaf node.
Fixes: https://tracker.ceph.com/issues/52709
Signed-off-by: Samuel Just <sjust@redhat.com>
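The containment rule described above can be sketched as follows. This is an illustration under assumed names (`Leaf`, `Extent`, half-open address ranges), not the LBA tree code itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Leaf:
    begin: int  # first logical address covered by the leaf
    end: int    # one past the last address covered (exclusive)


@dataclass(frozen=True)
class Extent:
    laddr: int   # start address of the logical extent
    length: int


def belongs_to_leaf(leaf, extent):
    """Only the start address has to fall inside the leaf range."""
    return leaf.begin <= extent.laddr < leaf.end


def fully_contained(leaf, extent):
    """Stricter check that the whole extent lies inside the leaf range."""
    return belongs_to_leaf(leaf, extent) and extent.laddr + extent.length <= leaf.end
```

An extent starting at 90 with length 20 belongs to a leaf covering [0, 100) even though it ends at 110, so code that assumes `fully_contained` would wrongly reject it.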
Sage Weil [Fri, 24 Sep 2021 14:36:41 +0000 (10:36 -0400)]
Merge PR #42384 into master
* refs/pull/42384/head:
mgr/cephadm: kick serve loop if new metadata makes all hosts metadata up to date
mgr/cephadm: fix upgrade to version with agent
mgr/cephadm: delay actions on agent daemons if root cert not created
mgr/cephadm: handle making certs when we have hostname but no address
mgr/cephadm: make agent certs compatible with OpenSSL hostname and ip checks
mgr/cephadm: offline host handling improvements for agent
mgr/cephadm: handle use_agent being turned on and off
mgr/cephadm: better handling of offline hosts with agent
mgr/cephadm: convert networks from set to list + don't reset con on down agent hosts
mgr/cephadm: add ceph volume to metadata agent reports
mgr/cephadm: implement 2-way ssl in mgr -> MgrListener comm line
cephadm: allow mgr listener to handle variable length json strings
mgr/cephadm: cephadm agent 2.0
Reviewed-by: Daniel Pivonka <dpivonka@redhat.com>
Reviewed-by: Sebastian Wagner <sewagner@redhat.com>
mgr/dashboard: make modified API endpoints backward compatible
Fixes: https://tracker.ceph.com/issues/52480
Signed-off-by: Avan Thakkar <athakkar@redhat.com>
Introducing APIVersion class to handle versioning for API endpoints and making
them backward compatible.
Adam King [Wed, 16 Jun 2021 12:02:15 +0000 (08:02 -0400)]
mgr/cephadm: cephadm agent 2.0
Creates an http endpoint in mgr/cephadm to receive
http requests and an agent that can be deployed on
each host that will gather metadata on the host and
send it to the mgr/cephadm http endpoint. Should save the
cephadm mgr module a lot of time it would have to spend
repeatedly ssh-ing into each host to gather the metadata
and help performance on larger clusters.
Fixes: https://tracker.ceph.com/issues/51004
Signed-off-by: Adam King <adking@redhat.com>
Sebastian Wagner [Fri, 24 Sep 2021 10:46:54 +0000 (12:46 +0200)]
cephadm: TestCheckHost: also mock `check_time_sync`
Fixes: https://tracker.ceph.com/issues/52722
```
TestCheckHost.test_container_engine fails at cephadm:5834: Error cephadm.Error: No time synchronization is active
```
Signed-off-by: Sebastian Wagner <sewagner@redhat.com>
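A minimal sketch of the mocking pattern, assuming a stand-in namespace instead of the real cephadm module: `hostcheck`, `_check_time_sync` and `_check_host` are hypothetical, and only `unittest.mock.patch.object` is a real library call.

```python
import types
from unittest import mock

# Hypothetical stand-in for the module under test.
hostcheck = types.SimpleNamespace()


def _check_time_sync():
    # Fails on machines without an active time sync service.
    raise RuntimeError("No time synchronization is active")


def _check_host():
    # Looks the helper up at call time, so a patch takes effect.
    hostcheck.check_time_sync()
    return "ok"


hostcheck.check_time_sync = _check_time_sync
hostcheck.check_host = _check_host


def run_check_with_mocked_time_sync():
    """Run the host check with the environment-dependent helper mocked out."""
    with mock.patch.object(hostcheck, "check_time_sync", return_value=True) as m:
        result = hostcheck.check_host()
    return result, m.call_count
```

With the helper mocked, the test no longer depends on whether the build machine runs a time synchronization service.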
Due to the issue described in https://github.com/pypa/pip/issues/9818 I
am seeing lots of downloads and prolonged runtime when installing
dependencies.
rgw/sts: code to check IAM policy and return an
appropriate error in case the Resource specified in the
IAM policy is incorrect and is discarded. The IAM
policy can be a resource policy or an identity policy.
This is for policies that have already been set.
rgw/sts: code for returning an error when an IAM policy
resource belongs to someone else's tenant.
While parsing the policy it discards the resource element,
but then when an operation is evaluated, since the resource element
is empty, it doesn't evaluate the resource at all and the policy
ends up erroneously allowing actions on resources in other tenants.
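The failure mode can be illustrated with a small validation sketch: rather than silently dropping a resource from another tenant and leaving an empty list, the policy is rejected up front. The helper names are hypothetical, and the ARN layout `arn:aws:s3::<tenant>:<resource>` is an assumption made for this example.

```python
class PolicyError(ValueError):
    """Raised when a policy statement's Resource element is unusable."""


def resource_tenant(arn):
    """Extract the tenant field, assuming 'arn:partition:service:region:tenant:resource'."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise PolicyError(f"malformed ARN: {arn!r}")
    return parts[4]


def validate_resources(resources, owner_tenant):
    """Return the resources unchanged, or raise instead of discarding any of them."""
    if not resources:
        raise PolicyError("policy statement has no Resource")
    for arn in resources:
        tenant = resource_tenant(arn)
        if tenant != owner_tenant:
            raise PolicyError(
                f"resource {arn!r} belongs to tenant {tenant!r}, not {owner_tenant!r}"
            )
    return list(resources)
```

Raising at validation time means evaluation never sees a statement whose resource list was emptied by discarding entries.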
Kamoltat [Mon, 5 Oct 2020 09:38:35 +0000 (09:38 +0000)]
mgr/progress: optimize global recovery module
Instead of fetching `pg_stats` from the python
part of manager module, we filter out the pgs
that are in active + clean state in ActivePyModules.cc
then pass these pgs along with `reported_epoch` and
the `total_num_pgs` of the cluster to the global recovery
module.
Laura Flores [Wed, 22 Sep 2021 17:55:40 +0000 (17:55 +0000)]
mgr/dashboard: add unit test for telemetry replacer method
This unit test checks that the "replacer" method in telemetry.component.ts works as it should. The replacer method takes the telemetry report and changes the ranges and values of the 'osd_perf_histograms' field from arrays to strings, thereby making the report more readable in the Dashboard Telemetry Preview.
This unit test needs improvement since it currently uses a test report rather than the real one.
crimson/os/alienstore: fix nullptr deref in OnCommit::finish().
`seastar::engine()` is available only for Seastar's threads;
it shouldn't be called outside of a reactor thread.
Unfortunately, this assumption is violated in `AlienStore`
where `OnCommit::finish()`, executed from a finisher thread
of `BlueStore`, calls `alien()` on `seastar::engine()`.
The net effect is crashes like the following one:
```
INFO 2021-09-22 14:26:33,214 [shard 0] osd - operator() writing superblock cluster_fsid 1d8f7908-2ebf-4a91-ae70-f445668c126b osd_fsid 4da9fe9a-1da5-4ea9-aa79-a1178165ede5
Segmentation fault.
Backtrace:
0# print_backtrace(std::basic_string_view<char, std::char_traits<char> >) at /home/rzarzynski/ceph1/build/../src/crimson/common/fatal_signal.cc:80
1# FatalSignal::signaled(int, siginfo_t const&) at /opt/rh/gcc-toolset-9/root/usr/include/c++/9/ostream:570
2# FatalSignal::install_oneshot_signal_handler<11>()::{lambda(int, siginfo_t*, void*)#1}::_FUN(int, siginfo_t*, void*) at /home/rzarzynski/ceph1/build/../src/crimson/common/fatal_signal.cc:62
3# 0x00007F16BBA13B30 in /lib64/libpthread.so.0
4# (anonymous namespace)::OnCommit::finish(int) at /home/rzarzynski/ceph1/build/../src/crimson/os/alienstore/alien_store.cc:53
5# Context::complete(int) at /home/rzarzynski/ceph1/build/../src/include/Context.h:100
6# Finisher::finisher_thread_entry() at /home/rzarzynski/ceph1/build/../src/common/Finisher.cc:65
7# 0x00007F16BBA0915A in /lib64/libpthread.so.0
8# clone in /lib64/libc.so.6
Dump of siginfo:
...
si_addr: 0x10
```