Adam C. Emerson [Wed, 14 Jul 2021 15:02:21 +0000 (11:02 -0400)]
rgw: Robust notify invalidates on cache timeout
This avoids a potential race condition in which updates are delayed.
Fixes: https://tracker.ceph.com/issues/51674
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
(cherry picked from commit 76247990ff38049ee32dd47d31482b9648353673)
Conflicts:
src/rgw/services/svc_notify.cc
- Skip the renaming, since this is a backport and that's mostly a
matter of futureproofing.
Backport: https://tracker.ceph.com/issues/51679
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
Adam C. Emerson [Wed, 7 Jul 2021 22:47:00 +0000 (18:47 -0400)]
rgw: distribute() takes RGWCacheNotifyInfo
So we don't have to parse the bufferlist back out to find what object
to throw out of the cache.
Fixes: https://tracker.ceph.com/issues/51674
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
(cherry picked from commit 7f952ad80114096322f202ba58279aaa4a002313)
Backport: https://tracker.ceph.com/issues/51679
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
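A minimal C++ sketch of the idea, assuming a simplified, hypothetical `CacheNotifyInfo` (object name plus data), a toy string encoding, and a hypothetical `notify` callback that returns false on timeout; the real RGWCacheNotifyInfo and distribute() signatures differ. Because distribute() receives the structured value, it knows which object to throw out of the cache without decoding the payload it just encoded. Dropping the entry on timeout follows the "invalidates on cache timeout" commit above and is an assumption about how the two changes fit together.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Hypothetical, simplified stand-in for RGWCacheNotifyInfo.
struct CacheNotifyInfo {
  std::string obj;   // name of the cached object this notify is about
  std::string data;  // new contents being distributed
};

// Toy encoding standing in for the encoded bufferlist sent to peers.
std::string encode(const CacheNotifyInfo& info) {
  return info.obj + "|" + info.data;
}

struct LocalCache {
  std::map<std::string, std::string> entries;

  // `notify` is a hypothetical transport that returns false on timeout.
  // Since we get the structured info, we already know which object to
  // drop; there is no need to parse the encoded payload back out.
  bool distribute(const CacheNotifyInfo& info,
                  const std::function<bool(const std::string&)>& notify) {
    const bool acked = notify(encode(info));
    if (!acked) {
      entries.erase(info.obj);  // don't keep serving a possibly stale entry
    }
    return acked;
  }
};
```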
Adam C. Emerson [Tue, 13 Jul 2021 20:05:47 +0000 (16:05 -0400)]
rgw: Don't segfault on datalog trim
Synchronous or yielded trim (basically any trim other than AioCompletion trim)
would try to dereference the past-the-end iterator if we were trimming
to a point in the most recent generation.
https://tracker.ceph.com/issues/51661
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
(cherry picked from commit 97305f03c16db1cfaceef04a74ee510bc1fc1e80)
https://tracker.ceph.com/issues/51675
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
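A hedged illustration of the failure mode, not the actual RGW datalog code: a hypothetical generational log modeled as a `std::map` from generation id to entries. After trimming inside the target generation, the sketch looks at the following generation; when the target is the most recent one, `std::next()` yields `end()`, which has to be checked before it is dereferenced.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>
#include <optional>
#include <string>

// Hypothetical model of a generational log: generation id -> entries,
// each entry keyed by a monotonically increasing marker.
using Gen = std::map<uint64_t, std::string>;
using Log = std::map<uint64_t, Gen>;

// Trim everything up to and including `marker` in generation `gen_id`,
// dropping all older generations. Returns the id of the next generation,
// if one exists.
std::optional<uint64_t> trim(Log& dlog, uint64_t gen_id, uint64_t marker) {
  auto gen = dlog.find(gen_id);
  if (gen == dlog.end()) return std::nullopt;
  // Older generations are removed entirely; erase() returns `gen` again.
  gen = dlog.erase(dlog.begin(), gen);
  // Within the target generation, remove entries <= marker.
  gen->second.erase(gen->second.begin(), gen->second.upper_bound(marker));
  auto next = std::next(gen);
  // When gen_id is the most recent generation, next == end(); dereferencing
  // it unconditionally is undefined behaviour (the crash described above).
  if (next == dlog.end()) return std::nullopt;
  return next->first;
}
```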
pacific: qa: FileNotFoundError: [Errno 2] No such file or directory: '/sys/kernel/debug/ceph/3fab6bea-f243-47a4-a956-8c03a62b61b5.client4721/mds_sessions'
Cherry-pick notes:
- handle differences due to renaming of rgw::sal::RGWObject to rgw::sal::Object
- handle differences due to move of test_ps_s3_metadata_on_master test from tests_ps.py to test_bn.py
Moving the attrs into s->bucket_attrs before setting them results in
setting empty attrs into the bucket. This means that reading them back
later gets empty attrs, which can cause a segfault.
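A small C++ sketch of the ordering problem, using hypothetical `ReqState`, `Bucket`, and `Attrs` types rather than RGW's own. A moved-from `std::map` is left in a valid but unspecified state (in practice usually empty), so the attrs have to be written before they are moved into the request state.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

using Attrs = std::map<std::string, std::string>;

// Hypothetical stand-ins for the request state and the bucket store.
struct ReqState { Attrs bucket_attrs; };
struct Bucket {
  Attrs stored;
  void set_attrs(const Attrs& a) { stored = a; }
};

// Buggy order: after std::move, `attrs` may be empty, so the bucket can
// end up with no attrs at all.
void update_buggy(ReqState& s, Bucket& b, Attrs& attrs) {
  s.bucket_attrs = std::move(attrs);
  b.set_attrs(attrs);
}

// Fixed order: persist first, then hand the attrs over to the request state.
void update_fixed(ReqState& s, Bucket& b, Attrs& attrs) {
  b.set_attrs(attrs);
  s.bucket_attrs = std::move(attrs);
}
```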
mgr/dashboard: remove usage of 'rgw_frontend_ssl_key'
Fixes: https://tracker.ceph.com/issues/51643
Signed-off-by: Avan Thakkar <athakkar@redhat.com>
Removing the usage of rgw_frontend_ssl_key from the rgw service form.
Conflicts:
qa/standalone/scrub/osd-scrub-repair.sh
- Removed non-existent test TEST_auto_repair_bluestore_tag() and its
associated helper functions initiate_and_fetch_state() and
wait_end_of_scrub()
This wouldn't cause issues if the tests are run individually. But when
running all the tests in the files mentioned above, it could introduce
unexpected test failures down the line. For example, multiple tests may
create pools with the same name, and if they are not cleaned up properly, this
could result in unexpected failures in a subsequent test.
max_misplaced was replaced by target_max_misplaced_ratio in edbd592ee44e02a5328e1510879555c2f9dcfc9e, but the documentation was not
synced. Let's update it accordingly.
/home/kchai/ceph/src/include/denc.h: In member function ‘void DencDumper<T>::dump() const’:
/home/kchai/ceph/src/include/denc.h:121:60: error: ‘O_BINARY’ was not declared in this scope
int fd = ::open(fn, O_WRONLY|O_TRUNC|O_CREAT|O_CLOEXEC|O_BINARY, 0644);
^~~~~~~~
/home/kchai/ceph/src/include/denc.h:121:60: note: the macro ‘O_BINARY’ had not yet been defined
In file included from /home/kchai/ceph/src/include/statlite.h:14,
from /home/kchai/ceph/src/include/types.h:41,
from /home/kchai/ceph/src/auth/Crypto.h:19,
from /home/kchai/ceph/src/auth/Crypto.cc:21:
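The note that the macro 'O_BINARY' "had not yet been defined" suggests it is defined by a header included later in this translation unit. The sketch below shows a common portable fallback idiom, not necessarily the fix applied here: O_BINARY only matters on Windows, so elsewhere it can be defined as 0 before use. `open_for_dump()` is a hypothetical helper.

```cpp
#include <cassert>
#include <fcntl.h>
#include <unistd.h>

// O_BINARY only exists on Windows-like platforms; elsewhere there is no
// text/binary distinction, so defining it as 0 makes the flag a no-op.
#ifndef O_BINARY
#define O_BINARY 0
#endif

// Opens (creating/truncating) `fn` for writing, with the same flags as the
// failing line in denc.h.
int open_for_dump(const char* fn) {
  return ::open(fn, O_WRONLY | O_TRUNC | O_CREAT | O_CLOEXEC | O_BINARY, 0644);
}
```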
../src/mon/Monitor.cc: In member function ‘void Monitor::handle_command(MonOpRequestRef)’:
../src/mon/Monitor.cc:3703:55: warning: ‘osd’ may be used uninitialized in this function [-Wmaybe-uninitialized]
3703 | uint64_t seq = mgrstatmon()->get_last_osd_stat_seq(osd);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~
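GCC emits -Wmaybe-uninitialized when a variable is assigned only on paths it cannot prove are taken before the use. One way to rule that out by construction, sketched with a hypothetical `parse_osd_id()` helper (the actual Monitor.cc fix may simply initialize `osd`), is to return the parsed id as a `std::optional`, so the lookup can only run on the path where a value exists.

```cpp
#include <cassert>
#include <charconv>
#include <cstdint>
#include <optional>
#include <string>
#include <system_error>

// Hypothetical helper: parses a non-negative OSD id, or returns std::nullopt.
// Callers can only reach the id through the optional, so no path reads an
// unset id.
std::optional<int64_t> parse_osd_id(const std::string& arg) {
  int64_t v = -1;  // initialized anyway; from_chars leaves it untouched on failure
  const char* first = arg.data();
  const char* last = arg.data() + arg.size();
  auto [ptr, ec] = std::from_chars(first, last, v);
  if (ec != std::errc() || ptr != last || v < 0) return std::nullopt;
  return v;
}
```

A caller would then write `if (auto osd = parse_osd_id(arg)) { ... *osd ... }` and handle the error path separately.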
otherwise it fails to build with gcc-toolset-10, like:
../src/common/Formatter.cc: In member function ‘virtual void ceph::XMLFormatter::close_section()’:
../src/common/Formatter.cc:449:8: error: ‘transform’ is not a member of ‘std’
449 | std::transform(section.begin(), section.end(), section.begin(),
| ^~~~~~~~~
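The error means Formatter.cc relied on <algorithm> arriving through some other header, which newer toolchains no longer guarantee. A minimal sketch of the fix pattern, including <algorithm> directly; `to_lower()` is a hypothetical helper and assumes a character-wise case conversion similar to the one in close_section().

```cpp
#include <algorithm>  // std::transform; don't rely on a transitive include
#include <cassert>
#include <cctype>
#include <string>

// Lower-cases a section name, converting through unsigned char so that
// std::tolower never sees a negative value.
std::string to_lower(std::string section) {
  std::transform(section.begin(), section.end(), section.begin(),
                 [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
  return section;
}
```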
Casey Bodley [Wed, 12 May 2021 18:13:13 +0000 (14:13 -0400)]
rgw: parse tenant name out of rgwx-bucket-instance
used by multisite bucket full sync to request the listing of a specific
bucket instance. if the bucket lives under a tenant, we need to get that
out of the rgwx-bucket-instance header, because the http request path
only names the bucket.
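A sketch of the parsing, assuming the header value has the form `[tenant/]bucket:instance_id` (an assumption about the format; the actual RGW parser may handle more cases, such as shard ids). `BucketInstance` and `parse_bucket_instance()` are hypothetical names.

```cpp
#include <cassert>
#include <optional>
#include <string>

struct BucketInstance {
  std::string tenant;       // empty when the bucket has no tenant
  std::string name;
  std::string instance_id;
};

// Parses a value of the assumed form "[tenant/]bucket:instance_id".
std::optional<BucketInstance> parse_bucket_instance(const std::string& v) {
  BucketInstance out;
  std::string rest = v;
  if (auto slash = rest.find('/'); slash != std::string::npos) {
    out.tenant = rest.substr(0, slash);
    rest = rest.substr(slash + 1);
  }
  auto colon = rest.find(':');
  if (colon == std::string::npos || colon == 0) return std::nullopt;
  out.name = rest.substr(0, colon);
  out.instance_id = rest.substr(colon + 1);
  if (out.instance_id.empty()) return std::nullopt;
  return out;
}
```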
Jeegn Chen [Wed, 25 Nov 2020 09:15:25 +0000 (17:15 +0800)]
rgw: avoid infinite loop when deleting a bucket
When deleting a bucket with an incomplete multipart upload that
has about 2000 parts uploaded, we noticed an infinite loop, which
prevented s3cmd from ever deleting the bucket.
On investigation, when the bucket index was sharded (for example, 128
shards), the original logic in
RGWRados::cls_bucket_list_unordered() did not calculate
the bucket shard ID correctly when the index key of a data
part was taken as the marker.
The issue does not necessarily reproduce every time; it depends
on the key of the object. To reproduce it in a 128-shard bucket,
we use 334 as the key for the incomplete multipart upload,
which will be located in shard 127 (determined experimentally). In this
setup, the original logic will usually compute a shard ID smaller
than 127 (since 127 is the largest one) from the marker, and
thus a cycle is formed, which results in an infinite loop.
PS: Sometimes the shard ID calculation may incorrectly go forward
instead of backward. Thus, the check logic may skip some shards
that contain regular keys. In such scenarios, some non-empty buckets may
be deleted by accident.
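A hedged sketch of the invariant the fix needs, not the actual RGWRados::cls_bucket_list_unordered() code: the shard to resume from must be derived from the same key that chose the marker's shard (here a separate "placement" key, which for a multipart data part presumably differs from its index key), and the shard index must only move forward. FNV-1a stands in for RGW's real bucket index hash, and all names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the bucket index hash (FNV-1a, 32-bit).
uint32_t shard_of(const std::string& placement_key, uint32_t num_shards) {
  uint32_t h = 2166136261u;
  for (unsigned char c : placement_key) { h ^= c; h *= 16777619u; }
  return h % num_shards;
}

// Each shard maps index key -> placement key (the key whose hash chose it).
using Shard = std::map<std::string, std::string>;

// Unordered listing resumed after `marker` (an index key). The start shard
// comes from the marker's placement key, and `s` only increases, so the
// loop always terminates.
std::vector<std::string> list_after(const std::vector<Shard>& shards,
                                    const std::string& marker,
                                    const std::string& marker_placement) {
  std::vector<std::string> out;
  const uint32_t n = static_cast<uint32_t>(shards.size());
  if (n == 0) return out;
  const uint32_t start = marker.empty() ? 0 : shard_of(marker_placement, n);
  for (uint32_t s = start; s < n; ++s) {
    auto it = (s == start && !marker.empty()) ? shards[s].upper_bound(marker)
                                              : shards[s].begin();
    for (; it != shards[s].end(); ++it) out.push_back(it->first);
  }
  return out;
}
```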
Sage Weil [Fri, 4 Jun 2021 17:49:40 +0000 (12:49 -0500)]
mgr/telemetry: pass leaderboard flag even w/o ident
Allow non-identified clusters to appear in the leaderboard.
The leaderboard option still defaults to false, so the change here
is that if they opt in to leaderboard but not ident we'll see
that on the backend.
Note that a leaderboard still does not exist (yet), so this doesn't
have any immediate impact. But if/when we do create one, it will
allow us to show big clusters (that opt in) on the leaderboard
as 'unidentified' or similar.
Sébastien Han [Thu, 1 Jul 2021 15:23:57 +0000 (17:23 +0200)]
src/pybind/mgr/mirroring/fs/snapshot_mirror.py: do not assume a cephfs-mirror daemon is always running
We should not assume a daemon is running. If the daemon is not running,
we get the default value of None. So let's skip the status if no daemon
is running yet.
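The actual change is in Python, where a missing daemon shows up as None. A C++ analogue of the same guard, using `std::optional` for the missing daemon; `first_mirror_daemon()` and `peer_status()` are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical: id of the first running cephfs-mirror daemon, if any.
std::optional<std::string> first_mirror_daemon(
    const std::map<std::string, bool>& daemons /* id -> running */) {
  for (const auto& [id, running] : daemons)
    if (running) return id;
  return std::nullopt;  // analogous to the Python default of None
}

// Builds status lines, skipping the status entirely when no daemon runs.
std::vector<std::string> peer_status(const std::map<std::string, bool>& daemons) {
  std::vector<std::string> out;
  auto daemon = first_mirror_daemon(daemons);
  if (!daemon) return out;  // nothing to report yet
  out.push_back("daemon " + *daemon + ": ok");
  return out;
}
```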
Dan van der Ster [Tue, 29 Jun 2021 20:36:00 +0000 (22:36 +0200)]
mgr/DaemonServer: skip redundant update of pgp_num_actual
During PG merge, the MGR was observed repeatedly sending identical
"set pgp_num_actual" values, leading to osdmap churn at 2000/hr.
Skip the redundant osd set pgp_num_actual command if
pgp_num is already at our computed next value.
Fixes: https://tracker.ceph.com/issues/51433
Signed-off-by: Dan van der Ster <daniel.vanderster@cern.ch>
(cherry picked from commit 3f15749de0d550a124f8c6afbd457f17ef020963)
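A minimal sketch of the check, with hypothetical `PoolState`, `compute_next()`, and `maybe_update()` names and a simplified step rule; the real DaemonServer logic for computing the next pgp_num_actual is more involved. The point is only that no command is issued when the computed next value equals the current one.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical per-pool state.
struct PoolState {
  uint32_t pgp_num_actual;  // value currently in the osdmap
  uint32_t pgp_num_target;  // value we are converging toward
};

// Hypothetical step rule: move at most `max_step` toward the target.
uint32_t compute_next(const PoolState& p, uint32_t max_step) {
  if (p.pgp_num_actual < p.pgp_num_target)
    return p.pgp_num_actual + std::min(max_step, p.pgp_num_target - p.pgp_num_actual);
  if (p.pgp_num_actual > p.pgp_num_target)
    return p.pgp_num_actual - std::min(max_step, p.pgp_num_actual - p.pgp_num_target);
  return p.pgp_num_actual;
}

// Returns the value to send with a "set pgp_num_actual" command, or
// std::nullopt when sending it would be redundant and only churn the osdmap.
std::optional<uint32_t> maybe_update(const PoolState& p, uint32_t max_step) {
  const uint32_t next = compute_next(p, max_step);
  if (next == p.pgp_num_actual) return std::nullopt;
  return next;
}
```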
qa/workunits/mon/test_mon_config_key: use subprocess.run() instead of proc.communicate()
on python3.6, the loop around proc.communicate() always manages to
get something out of the stdout and/or stderr PIPEs, so `stdout` and
`stderr` keep growing until we run out of memory, and after a while
teuthology considers the command crashed.
This _mkdir_p should never have worked as the first directory it tries
to stat/mkdir is "", the empty string. This causes an assertion in the
client. I'm not sure how this code ever functioned without causing
faults. They look like:
2021-07-01 02:15:04.449 7f7612b5ab80 3 client.178735 statx enter (relpath want 2047)
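A sketch of a `mkdir -p` walk that never produces the empty string, so neither a leading '/' nor a doubled '//' can turn into a stat of "". `mkdir_p_steps()` is a hypothetical helper that only computes the directories to stat/create, in order; the original _mkdir_p is not written in C++.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns the cumulative directory prefixes of `path`, skipping empty
// components, e.g. "/a/b" -> {"/a", "/a/b"}.
std::vector<std::string> mkdir_p_steps(const std::string& path) {
  std::vector<std::string> steps;
  std::string prefix = (!path.empty() && path[0] == '/') ? "/" : "";
  size_t pos = 0;
  while (pos <= path.size()) {
    size_t slash = path.find('/', pos);
    if (slash == std::string::npos) slash = path.size();
    const std::string comp = path.substr(pos, slash - pos);
    if (!comp.empty()) {
      if (!prefix.empty() && prefix.back() != '/') prefix += '/';
      prefix += comp;
      steps.push_back(prefix);
    }
    pos = slash + 1;
  }
  return steps;
}
```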
胡玮文 [Mon, 21 Jun 2021 13:31:49 +0000 (21:31 +0800)]
mgr/dashboard: fix OSD out count
Suppose we have 3 OSDs that are out but up (being prepared for re-formatting to change min_alloc_size), and another OSD that is down but in
(during a reboot). The dashboard displays "1 down, 2 out", which is obviously incorrect. It should be "1 down, 3 out".
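Down/up and out/in are independent flags, so each count should be computed on its own. A minimal sketch with hypothetical `OsdState` and `summarize()` names (the dashboard itself is not written in C++), checked against the scenario above.

```cpp
#include <cassert>
#include <vector>

struct OsdState { bool up; bool in; };
struct Counts { int down; int out; };

// Each flag is counted independently: an OSD that is out but still up
// counts as out, and one that is down but still in counts as down.
Counts summarize(const std::vector<OsdState>& osds) {
  Counts c{0, 0};
  for (const auto& o : osds) {
    if (!o.up) ++c.down;
    if (!o.in) ++c.out;
  }
  return c;
}
```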
Kefu Chai [Sun, 13 Jun 2021 11:56:26 +0000 (19:56 +0800)]
test/pybind/test_ceph_argparse: add a test where args contains comma
to ensure that support for the new syntax "prefix --name bon,jour"
does not break the existing behavior of "prefix hello cruel,world", where the second value
reads "cruel,world"; the parsed result should be
prefix="prefix"
value=["hello", "cruel,world"]
instead of something like
prefix="prefix"
value=["hello", "cruel", "world"]
or
prefix="prefix"
value=["cruel", "world"]
the above test only applies to the case where "value" is a CephString.
if "value" is a CephChoices, the parsed argument should be
Kefu Chai [Wed, 9 Jun 2021 03:11:40 +0000 (11:11 +0800)]
test/pybind: do not test obsoleted command
the "scrub" command was marked obsolete in e9a5ce0897efc6126caeebea9900bf05ec3d2174, so test_ceph_argparse
can no longer retrieve its command description using the "get_command_descriptions"
cli tool; let's drop the related test accordingly.
Kefu Chai [Wed, 9 Jun 2021 02:01:35 +0000 (10:01 +0800)]
pybind/ceph_argparse: validate csv if desc.N
if desc.N is not None, we should treat the argument as comma-separated
values and validate the values individually.
restructure the validate() function and its helpers to explicitly pass the
validated args if desc.N is set, as desc.instance.val should only
hold a single value of desc.instance type; otherwise we would need to reset
it after all the arguments in a CSV string value are parsed.
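A C++ analogue of the behavior described above, not the Python implementation in ceph_argparse: with a hypothetical `is_csv` flag standing in for desc.N, the argument is split on commas and each value is validated individually; without it, "cruel,world" stays a single value. The validated values are returned explicitly instead of being kept in shared per-descriptor state. `validate_arg()` is a hypothetical name.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <vector>

// When `is_csv` is true the argument is split on commas and each piece is
// validated on its own; otherwise the whole string, commas included, is one
// value. Returns std::nullopt if any value fails validation.
std::optional<std::vector<std::string>> validate_arg(
    const std::string& arg, bool is_csv,
    const std::function<bool(const std::string&)>& valid_one) {
  std::vector<std::string> values;
  if (!is_csv) {
    values.push_back(arg);
  } else {
    size_t pos = 0;
    while (true) {
      const size_t comma = arg.find(',', pos);
      values.push_back(arg.substr(
          pos, comma == std::string::npos ? std::string::npos : comma - pos));
      if (comma == std::string::npos) break;
      pos = comma + 1;
    }
  }
  for (const auto& v : values)
    if (!valid_one(v)) return std::nullopt;
  return values;  // handed back explicitly, not stashed in shared state
}
```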