Zac Dover [Wed, 18 May 2022 10:36:53 +0000 (20:36 +1000)]
doc/start: s/3/three/ in intro.rst
I'm changing "3" to "three" for two reasons:
1. It's correct.
2. This allows me to test backports into Octopus, Pacific, and Quincy.
I am particularly interested to see what happens when I attempt
the backport into Octopus, because backports into Octopus have
failed. This will provide me with another unit of data.
Kefu Chai [Mon, 21 Dec 2020 17:07:37 +0000 (01:07 +0800)]
include/denc: use pair<const K,V> in range-based for loop
map<K,V>::value_type is pair<const K, V>, so when iterating through a map
with a range-based for loop we should use pair<const K,V> instead of
pair<K,V>. The latter also compiles, but it creates a temporary pair<K,V>
object from each pair<const K,V> element. GCC-11 complains on seeing
this:
../src/include/denc.h:1002:21: warning: loop variable ‘e’ of type ‘const T&’ {aka ‘const std::pair<OSDPerfMetricQuery, OSDPerfMetricReport>&’} binds to a temporary constructed from type ‘const std::pair<const OSDPerfMetricQuery, OSDPerfMetricReport>’ [-Wrange-loop-construct]
1002 | for (const T& e : s) {
| ^
this change
* use the value_type of container in `maplike_details<Container>`,
so we can avoid the overhead of creating temporary objects when
encoding a map
* define denc_traits for std::pair<const A, B> as well, so the elements
of a map can be encoded using denc facility
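As a minimal illustration of the point above (not the Ceph denc code itself), a loop variable of type pair<K,V> forces a conversion from the map's pair<const K,V> elements, while using the map's value_type binds directly:
#include <iostream>
#include <map>
#include <string>

int main() {
  std::map<std::string, int> m{{"a", 1}, {"b", 2}};

  // value_type is std::pair<const std::string, int>: binds directly, no copy.
  for (const std::pair<const std::string, int>& e : m) {
    std::cout << e.first << '=' << e.second << '\n';
  }

  // Also compiles, but each element is converted into a temporary
  // std::pair<std::string, int>; GCC 11 flags this with -Wrange-loop-construct.
  for (const std::pair<std::string, int>& e : m) {
    std::cout << e.first << '=' << e.second << '\n';
  }
}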
Kefu Chai [Fri, 8 Jan 2021 05:42:18 +0000 (13:42 +0800)]
test/test_rbd_replay: move operator<<(..rbd_loc& name) to rbd_replay
so gtest can print rbd_loc when printing diagnostic information after a
test fails. after moving operator<<(ostream&, const rbd_loc&) to
the `rbd_replay` namespace, ADL is able to find it. for more details on
the lookup rules, see https://en.cppreference.com/w/cpp/language/adl
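A hedged sketch of the rule (the type and namespace below are stand-ins, not the real rbd_replay definitions): once operator<< lives in the same namespace as its argument type, ADL finds it for any unqualified "os << value", which is how gtest's printers stream values.
#include <iostream>
#include <sstream>
#include <string>

namespace rbd_replay_demo {            // stand-in for the rbd_replay namespace
struct rbd_loc_demo {                  // stand-in for rbd_replay::rbd_loc
  std::string pool, image, snap;
};

// Living in the same namespace as rbd_loc_demo, this overload is found by
// argument-dependent lookup from any unqualified "os << value".
std::ostream& operator<<(std::ostream& os, const rbd_loc_demo& l) {
  return os << l.pool << '/' << l.image << '@' << l.snap;
}
}  // namespace rbd_replay_demo

int main() {
  std::ostringstream oss;
  oss << rbd_replay_demo::rbd_loc_demo{"rbd", "img", "snap"};  // found via ADL
  std::cout << oss.str() << '\n';
}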
Kefu Chai [Thu, 7 Jan 2021 03:56:25 +0000 (11:56 +0800)]
common/ceph_time: add operator<< for signedspan
* templatize operator<<(ostream&, duration<>), so it works for more
duration<> classes with minimal effort -- we just need to explicitly
instantiate these template operators
* explicitly instantiate operator<< for timespan, signedspan, seconds
and milliseconds. they are the ones most likely to be used in Ceph. we can
add more of them when necessary.
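A rough sketch of the pattern, assuming C++17 and a much-simplified formatter (the real Ceph operator produces nicer output):
#include <chrono>
#include <iostream>

// One template covers every std::chrono::duration specialization.
template <typename Rep, typename Period>
std::ostream& operator<<(std::ostream& os,
                         const std::chrono::duration<Rep, Period>& d) {
  using std::chrono::duration;
  using std::chrono::duration_cast;
  return os << duration_cast<duration<double>>(d).count() << 's';
}

// Explicit instantiations emit the definitions callers are expected to link
// against (seconds and milliseconds here; Ceph would also cover timespan and
// signedspan).
template std::ostream& operator<<(std::ostream&, const std::chrono::seconds&);
template std::ostream& operator<<(std::ostream&, const std::chrono::milliseconds&);

int main() {
  // Build as C++17: C++20 ships its own chrono operator<< that would clash here.
  std::cout << std::chrono::milliseconds(1500) << '\n';  // prints "1.5s"
}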
Kefu Chai [Thu, 7 Jan 2021 07:17:45 +0000 (15:17 +0800)]
common/ceph_time: move operator<<(ostream&, timespan&) into std namespace
otherwise the compiler is not able to find it, as the "timespan" here is
actually a class defined in the std namespace, even though it has an alias
defined in the ceph namespace.
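A hedged illustration of the lookup problem (the alias is simplified, not the actual ceph::timespan definition): because the aliased type lives in std::chrono, ADL only searches std, so an operator<< declared in the ceph namespace is never found for unqualified streaming.
#include <chrono>
#include <iostream>

namespace ceph_demo {                  // stand-in for the ceph namespace
// Simplified: the real ceph::timespan is likewise just an alias for a
// std::chrono::duration specialization.
using timespan = std::chrono::nanoseconds;

// Not found by "std::cout << d": ADL considers the namespaces of the
// underlying type (std::chrono), not the namespace the alias is declared in.
std::ostream& operator<<(std::ostream& os, const timespan& d) {
  return os << d.count() << "ns";
}
}  // namespace ceph_demo

int main() {
  ceph_demo::timespan d{42};
  ceph_demo::operator<<(std::cout, d) << '\n';  // works only when qualified
  // std::cout << d;  // pre-C++20: no matching operator<< is found
}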
test/rbd_mirror: grab timer lock before calling add_event_after()
add_event_after() expects an externally provided mutex to be held
for the call. This was missed in commit 8965a0f2a6f7 ("rbd-mirror:
synchronize with in-flight stop in ImageReplayer::stop()").
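A minimal sketch of the fix, with hypothetical types (the real test uses Ceph's SafeTimer, a ceph::mutex and a Context callback):
#include <functional>
#include <mutex>

// Hypothetical stand-in for Ceph's SafeTimer: add_event_after() requires the
// externally provided mutex to already be held by the caller.
struct DemoTimer {
  void add_event_after(double /*seconds*/, std::function<void()> /*cb*/) {}
};

int main() {
  std::mutex timer_lock;
  DemoTimer timer;

  // The fix: take the timer lock before scheduling the event.
  std::lock_guard<std::mutex> locker(timer_lock);
  timer.add_event_after(1.0, [] { /* request a resync */ });
}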
Ilya Dryomov [Sun, 20 Feb 2022 16:33:08 +0000 (17:33 +0100)]
rbd-mirror: synchronize with in-flight stop in ImageReplayer::stop()
Complete on_finish right away only if the replayer is stopped (meaning
that it is eligible to be restarted immediately, possibly from on_finish
itself). This is the behaviour pretty much anyone would assume and
also what ImageReplayer::restart() relies on.
Ilya Dryomov [Sun, 20 Feb 2022 12:11:02 +0000 (13:11 +0100)]
rbd-mirror: manual stop should take precedence over regular stop
Somewhat similar to commit 0a3794e56256 ("rbd-mirror: make stop
properly cancel restart"), make it so that a) if a manual stop is
joined to regular stop, the stop becomes manual and b) if a regular
stop is joined to a manual stop, the stop stays manual.
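In other words, the manual flag is sticky when stop requests are coalesced. A tiny illustrative sketch (not the actual ImageReplayer fields):
#include <cassert>

// Once any of the joined stop requests was manual, the combined stop is
// treated as manual, regardless of arrival order.
struct JoinedStop {
  bool manual = false;
  void join(bool requested_manual) { manual = manual || requested_manual; }
};

int main() {
  JoinedStop a;                    // regular stop, then a manual stop joins
  a.join(false);
  a.join(true);
  JoinedStop b;                    // manual stop, then a regular stop joins
  b.join(true);
  b.join(false);
  assert(a.manual && b.manual);    // manual wins in both directions
}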
Ilya Dryomov [Sat, 19 Feb 2022 15:43:04 +0000 (16:43 +0100)]
rbd-mirror: straighten ImageReplayer::stop() a bit
- don't default on_finish parameter
- m_restart_requested is set in ImageReplayer::restart() which is the
only restart=true call site, so setting m_restart_requested here is
redundant
- is_stopped_() can't be true in is_running_() branch
- on_finish->complete(0) in the end is unreachable
Kefu Chai [Wed, 6 Jan 2021 08:18:17 +0000 (16:18 +0800)]
googletest submodule: pick up change to silence error=maybe-uninitialized warning
to include the fix of https://github.com/google/googletest/pull/3024
otherwise GCC-11 fails to compile the tests with the following warning:
In file included from ../src/googletest/googletest/src/gtest-all.cc:42:
../src/googletest/googletest/src/gtest-death-test.cc: In function ‘bool testing::internal::StackGrowsDown()’:
../src/googletest/googletest/src/gtest-death-test.cc:1301:24: error: ‘dummy’ may be used uninitialized [-Werror=maybe-uninitialized]
1301 | StackLowerThanAddress(&dummy, &result);
| ~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~
../src/googletest/googletest/src/gtest-death-test.cc:1290:13: note: by argument 1 of type ‘const void*’ to ‘void testing::internal::StackLowerThanAddress(const void*, bool*)’ declared here
1290 | static void StackLowerThanAddress(const void* ptr, bool* result) {
| ^~~~~~~~~~~~~~~~~~~~~
../src/googletest/googletest/src/gtest-death-test.cc:1299:7: note: ‘dummy’ declared here
1299 | int dummy;
| ^~~~~
cc1plus: all warnings being treated as errors
Or Friedmann [Tue, 19 Apr 2022 12:00:28 +0000 (12:00 +0000)]
rgw: RGWCoroutine::set_sleeping() checks for null stack
users of the RGWOmapAppend coroutine don't manage the lifetime of its
underlying coroutine stack, so end up making calls on RGWOmapAppend
after its stack goes away. this null check is a band-aid, and there are
still several other calls in RGWCoroutine that don't check for null
stack
Fixes: https://tracker.ceph.com/issues/49302
Signed-off-by: Or Friedmann <ofriedma@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
(cherry picked from commit 3f0f831d66c7d43c9872f5de2aceb68aef4004d8)
Kefu Chai [Mon, 22 Jun 2020 01:34:53 +0000 (09:34 +0800)]
doc/conf.py: s/add_javascript/add_js_file/
to address the following warning:
jenkins-build/build/workspace/ceph-pr-docs/doc/conf.py:102: RemovedInSphinx40Warning: The app.add_javascript() is deprecated. Please use app.add_js_file() instead.
Kefu Chai [Sun, 6 Mar 2022 06:27:50 +0000 (14:27 +0800)]
doc/conf.py: silence warnings from breathe
breathe calls doxygen to extract and generate docs from code, and doxygen
complains when it sees undocumented fields and functions. these warnings
could fail the sphinx-build command if it treats warnings as errors.
Kefu Chai [Sun, 6 Mar 2022 06:23:42 +0000 (14:23 +0800)]
mgr/cephadm: add empty line after param list in docstring
this helps to silence warnings from sphinx, like:
src/pybind/mgr/orchestrator/_interface.py:docstring of orchestrator._interface.Orchestrator.remove_osds:9: WARNING: Field list ends without a blank line; unexpected unindent.
Mark Kogan [Mon, 5 Apr 2021 12:49:42 +0000 (15:49 +0300)]
rgw: return OK on consecutive complete-multipart reqs
Fixes: https://tracker.ceph.com/issues/50141
Signed-off-by: Mark Kogan <mkogan@redhat.com>
fixup! rgw: return OK on consecutive complete-multipart reqs
Cherry-pick notes:
- Conflicts in rgw_op.h due to the execute method adjacent to the change not having an optional_yield arg
- Conflicts in rgw_op.cc due to lack of rgw::sal::Object encapsulation in Octopus
Kefu Chai [Sat, 5 Mar 2022 17:44:30 +0000 (01:44 +0800)]
admin/doc-requirements: bump sphinx to 4.4.0
bump sphinx to the latest stable release to address the following build failure:
ERROR: sphinx-autodoc-typehints 1.17.0 has requirement Sphinx>=4, but you'll have sphinx 3.5.4 which is incompatible.
ERROR: sphinx-substitution-extensions 2022.2.16 has requirement sphinx>=4.0.0, but you'll have sphinx 3.5.4 which is incompatible.
also bump sphinx-rtd-theme, otherwise we'd have the following
build failure:
ERROR: sphinx-rtd-theme 0.5.2 has requirement docutils<0.17, but you'll have docutils 0.17.1 which is incompatible.
Casey Bodley [Thu, 10 Mar 2022 20:32:48 +0000 (15:32 -0500)]
cls/rgw: rgw_dir_suggest_changes detects race with completion
if bucket listing races with a pending index transaction, its suggested
removal may be mistakenly applied if that index transaction completes
before the osd receives this suggestion
in `rgw_dir_suggest_changes()`, the sole condition for applying a
suggested change is that the `cur_disk.pending_map` is empty. this is
true after rgw_bucket_complete_op()
on index completion, `rgw_bucket_dir_entry::index_ver` is updated to match
the new value of `rgw_bucket_dir_header::ver`. because most of `struct
rgw_bucket_dir_entry` makes the round trip through bucket listing ->
dir_suggest, we have access to the index_ver of the suggested entry. by
comparing this against the stored entry, we can ignore any suggestions
that were sent before the most recent completion
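A hedged sketch of the comparison (field names follow the commit message; the real cls_rgw check has more conditions around existence and pending operations):
#include <cstdint>

// Minimal stand-ins for the versions the commit message talks about.
struct SuggestedEntry { uint64_t index_ver; };   // from the bucket listing
struct StoredEntry    { uint64_t index_ver; };   // current on-disk entry

// A suggestion generated from a listing taken before the most recent index
// completion carries an older index_ver than the stored entry and is ignored.
bool should_apply_suggestion(const SuggestedEntry& suggested,
                             const StoredEntry& stored) {
  return suggested.index_ver >= stored.index_ver;
}

int main() {
  SuggestedEntry suggested{3};
  StoredEntry stored{5};           // a completion bumped the stored entry to 5
  return should_apply_suggestion(suggested, stored) ? 1 : 0;  // 0: stale, skip
}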
Yaarit Hatuka [Tue, 9 Nov 2021 18:31:11 +0000 (18:31 +0000)]
mgr/telemetry: fix waiting for mgr to warm up
1. The implementation of config_notify() in the telemetry module sets the
flag of the event that is supposed to wake up the 'serve' thread whenever
a config option is changed. The problem is that we call config_notify()
at the beginning of serve(), before we enter its 'run' loop. This call
sets the event, which cancels the 10-second wait for the mgr to warm up.
To fix this, we extract the logic of updating the config options to a
separate function (config_update_module_option()), and call it on
__init__, instead of calling config_notify() in serve().
2. We should always wait for the mgr to warm up here (10 seconds). In
case of a sporadic event (e.g. a config option change via CLI) the event
will be set, and wait will return immediately. We enforce this wait by
using time.sleep(10) instead of event.wait(10).
Satoru Takeuchi [Thu, 18 Nov 2021 20:48:18 +0000 (20:48 +0000)]
osd: make osd_fast_shutdown_notify_mon option true by default
The osd_fast_shutdown_notify_mon option is false by default, so users suffer
from error log floods, slow ops, and long I/O timeouts on voluntary OS
shutdown before they become aware that this option exists. Let's
make this option true by default.
Nizamudeen A [Thu, 24 Mar 2022 08:01:18 +0000 (13:31 +0530)]
mgr/dashboard: fix "NullInjectorError: No provider for I18n"
Although I am not sure what the root cause of this is, it seems to
fix the test failure. I don't know if this is caused by the difference in
angular versions between master and octopus, and I still don't understand
why it wasn't caught in the recent PR to this file (https://github.com/ceph/ceph/pull/44763)
Fixes: https://tracker.ceph.com/issues/55011
Signed-off-by: Nizamudeen A <nia@redhat.com>
Currently, for CEPH_OSD_OP_OMAPRMKEYRANGE ops, clean_omap gets set to true,
which results in incomplete recovery of objects and in inconsistent PGs
after a scrub.
Ilya Dryomov [Wed, 16 Mar 2022 19:05:56 +0000 (20:05 +0100)]
librados: check latest osdmap on ENOENT in pool_reverse_lookup()
Avoid spurious ENOENT errors from rados_pool_reverse_lookup() and
Rados::pool_reverse_lookup().
This makes lookup by id consistent with lookup by name: the latter
has been checking latest osdmap since commit 7e5669b11b14 ("rados: we
need to get the latest osdmap when pool does not exists").
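A hedged sketch of the retry pattern (the helpers are hypothetical stand-ins, not the librados internals): on ENOENT, refresh to the latest osdmap and look up again before reporting the pool as missing.
#include <cerrno>
#include <cstdint>
#include <string>

// Hypothetical stubs standing in for the client's cached-osdmap machinery.
static bool have_latest_map = false;
static int lookup_pool_name(int64_t /*pool_id*/, std::string* name) {
  if (!have_latest_map) return -ENOENT;   // stale cached map doesn't know the pool yet
  *name = "newpool";
  return 0;
}
static int wait_for_latest_osdmap() { have_latest_map = true; return 0; }

// Mirror what lookup-by-name has done since commit 7e5669b11b14: retry once
// against the latest osdmap instead of returning a spurious ENOENT.
int pool_reverse_lookup(int64_t pool_id, std::string* name) {
  int r = lookup_pool_name(pool_id, name);
  if (r == -ENOENT) {
    r = wait_for_latest_osdmap();
    if (r < 0) return r;
    r = lookup_pool_name(pool_id, name);
  }
  return r;
}

int main() {
  std::string name;
  return pool_reverse_lookup(123, &name);   // 0: found after refreshing the map
}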
iovecs have an unsigned length (size_t), and before this patch the
total length was computed by adding each iovec's length to a signed
length variable (ssize_t). While the code checked whether the resulting
length was negative on overflow, the case where the length is positive
after overflow was not checked. This patch fixes the overflow check
by changing length to an unsigned size_t.
Additionally, this patch fixes the case where some iovecs have been
added to the bufferlist and the aio completion has been blocked, but
adding an additional iovec fails because of overflow. This leads to
the UserBufferDeleter trying to unblock the completion on destruction
of the bufferlist but asserting because the completion was never
armed. We avoid this by first computing the total length, checking
for overflow and validating iovcnt before adding the iovecs to the bufferlist.
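A hedged sketch of the overflow-safe length check described above (simplified; the real patch also builds the bufferlist only after these checks pass):
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <sys/uio.h>

// Accumulate iovec lengths as unsigned size_t and detect wrap-around
// explicitly, instead of summing into a signed ssize_t and testing for a
// negative result.
int total_iov_length(const struct iovec* iov, int iovcnt, size_t* total) {
  size_t len = 0;
  for (int i = 0; i < iovcnt; ++i) {
    if (iov[i].iov_len > SIZE_MAX - len) {
      return -EINVAL;              // overflow: reject before touching the bufferlist
    }
    len += iov[i].iov_len;
  }
  *total = len;
  return 0;
}

int main() {
  char buf[8];
  struct iovec iov[2] = {{buf, sizeof(buf)}, {buf, sizeof(buf)}};
  size_t total = 0;
  return total_iov_length(iov, 2, &total);   // returns 0, total == 16
}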
Ilya Dryomov [Sat, 19 Mar 2022 13:04:52 +0000 (14:04 +0100)]
qa/workunits/rbd/cli_generic.sh: relax trash purge schedule status assert
Commit 08df6e0fd006 ("qa/workunits/rbd: expand LevelSpec parsing
coverage") didn't account for images with a separate data pool. This
was missed because of small-cache-pool.yaml breakage.
Invoke "rbd mirror snapshot schedule ls -R" and "rbd mirror snapshot
schedule status" commands on all levels, consistently. In particular,
make sure that an image level schedule is listed for a recursive query
at the pool level both before and after the schedule kicks in:
$ rbd create --size 1G --mirror-image-mode snapshot -p foo bar
$ rbd mirror snapshot schedule add -p foo --image bar 1m
$ rbd mirror snapshot schedule ls -p foo -R
POOL NAMESPACE IMAGE SCHEDULE
foo bar every 1m
<wait for schedule to become visible in status>
$ rbd mirror snapshot schedule ls -p foo -R
POOL NAMESPACE IMAGE SCHEDULE
foo bar every 1m
Also, make sure that pool and image level status queries work:
$ rbd mirror snapshot schedule status -p foo
SCHEDULE TIME IMAGE
2022-03-04 07:14:00 foo/bar
$ rbd mirror snapshot schedule status -p foo --image bar
SCHEDULE TIME IMAGE
2022-03-04 07:14:00 foo/bar
Both of these issues are fixed by the previous commit.
Casey Bodley [Wed, 12 May 2021 18:13:13 +0000 (14:13 -0400)]
rgw: parse tenant name out of rgwx-bucket-instance
used by multisite bucket full sync to request the listing of a specific
bucket instance. if the bucket lives under a tenant, we need to get that
out of the rgwx-bucket-instance header, because the http request path
only names the bucket
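A hedged sketch of the parsing, assuming the common "tenant/bucket:instance-id" form of a bucket instance key with the tenant part optional; the real code goes through RGW's own bucket-key helpers.
#include <iostream>
#include <string>

// Split "tenant/bucket:instance-id" into its parts; the tenant is optional.
void parse_bucket_instance(const std::string& key, std::string* tenant,
                           std::string* bucket, std::string* instance_id) {
  std::string rest = key;
  if (auto slash = rest.find('/'); slash != std::string::npos) {
    *tenant = rest.substr(0, slash);
    rest = rest.substr(slash + 1);
  } else {
    tenant->clear();               // no tenant in the key
  }
  if (auto colon = rest.find(':'); colon != std::string::npos) {
    *bucket = rest.substr(0, colon);
    *instance_id = rest.substr(colon + 1);
  } else {
    *bucket = rest;
    instance_id->clear();
  }
}

int main() {
  std::string tenant, bucket, instance;
  parse_bucket_instance("acme/photos:abc123.1", &tenant, &bucket, &instance);
  std::cout << tenant << ' ' << bucket << ' ' << instance << '\n';
}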
Xuehan Xu [Sat, 2 Jan 2021 14:50:23 +0000 (22:50 +0800)]
librgw: make rgw file handle versioned
The reason that we need this is that there could be the following scenario:
1. rgw_setattr sets the file attr;
2. rgw_write writes some new data, and encodes its attr to store into rados;
3. before the actual persistence of the file's attr bl, rgw_lookup loads the file's
previous attr and modifies the current file handle's metadata;
4. rgw_write's result persisted to rados;
5. rgw_setattr writes the current file handle's metadata, which is by now stale, to rados
In this case, the attr in rados would be out of date, which means loss of data
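A hedged sketch of the idea (the types are illustrative, not the librgw ones): stamp the in-memory handle with a version, bump it on local changes, and refuse to overwrite newer in-memory metadata with state loaded from an older version, which breaks the race in steps 1-5 above.
#include <cstdint>
#include <string>

// Illustrative stand-in for a versioned file handle.
struct FileHandle {
  uint64_t version = 0;            // bumped on every local metadata change
  std::string attrs;               // stand-in for the encoded attr blob
};

// Apply metadata loaded from RADOS only if it is at least as new as what the
// handle already carries; otherwise a concurrent lookup would clobber newer
// in-memory state.
bool apply_loaded_attrs(FileHandle& fh, uint64_t loaded_version,
                        const std::string& loaded_attrs) {
  if (loaded_version < fh.version) {
    return false;                  // stale read-back, ignore it
  }
  fh.version = loaded_version;
  fh.attrs = loaded_attrs;
  return true;
}

int main() {
  FileHandle fh;
  fh.version = 3;                                     // rgw_write bumped it locally
  return apply_loaded_attrs(fh, 2, "old") ? 1 : 0;    // 0: stale lookup rejected
}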
In RGWBucketCtl::chown we have one RGWObjectCtx for all objects of a bucket.
In RGWObjectCtx there is a cache mechanism (a std::map) for the states of objects that grows
continuously. For buckets with millions of objects this mechanism leads to huge memory usage.
In the chown process we really do not need this caching mechanism, so we can create one RGWObjectCtx
for every 1000 objects to limit memory usage.
Fixes: https://tracker.ceph.com/issues/53599
Signed-off-by: Mohammad Fatemipour <mohammad.fatemipour@sotoon.ir>
(cherry picked from commit cf2d83ef81458524715c23e802977dc0760c847f)
Conflicts:
src/rgw/rgw_bucket.cc
Cherry-pick notes:
- Conflicts due to Octopus implementation differences in RGWBucketCtl::chown
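A hedged sketch of the batching pattern above (the types are placeholders, not the RGW ones): recreate the per-operation context every N objects so its internal cache never grows without bound.
#include <map>
#include <memory>
#include <string>
#include <vector>

// Placeholder for RGWObjectCtx: it caches per-object state in a std::map that
// only grows for the lifetime of the context.
struct ObjectCtx {
  std::map<std::string, int> state_cache;
  void chown(const std::string& obj) { state_cache[obj] = 1; }
};

void chown_bucket(const std::vector<std::string>& objects) {
  constexpr size_t kObjectsPerCtx = 1000;    // bound the cached state per context
  std::unique_ptr<ObjectCtx> ctx;
  size_t handled = 0;
  for (const auto& obj : objects) {
    if (!ctx || handled % kObjectsPerCtx == 0) {
      ctx = std::make_unique<ObjectCtx>();   // drop the old cache, start fresh
    }
    ctx->chown(obj);
    ++handled;
  }
}

int main() {
  chown_bucket({"a", "b", "c"});
}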
Matt Benjamin [Tue, 5 Jan 2021 20:30:23 +0000 (15:30 -0500)]
doc: rgw: document S3 bucket replication support
Support was added in Octopus.
Fixes: https://tracker.ceph.com/issues/48755
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
(cherry picked from commit 774a247b2b854538b679490581e6950372142797)