rpm: drop use of $FIRST_ARG in ceph-immutable-object-cache
The use of $FIRST_ARG was probably required because the SUSE-specific
%service_* rpm macros were playing tricks on the shell positional parameters.
This is bad practice and error-prone, so let's assume that no macro does that
anymore, and hence that the positional parameters remain unchanged after any
rpm macro call.
Kefu Chai [Thu, 10 Jun 2021 12:19:09 +0000 (20:19 +0800)]
tasks/ceph_manager: ignore EACCES when waiting for quorum
mon_tick_interval is 5 seconds by default. monitors update their
rotating keys every mon_tick_interval. before the monitors form a
quorum, the auth requests from clients are put into the wait list.
these requests are re-enqueued once the monitors form a quorum. but
there is a small window of mon_tick_interval before they are able
to serve the auth requests, even after they claim to be able to
serve requests. if these re-enqueued requests happen to be served
in this window, and if cephx is enabled, they will be greeted with
errors like
handle_auth_bad_method server allowed_methods [2] but i only support [2]
in the case of ceph cli, the error would look like:
[errno 13] RADOS permission denied (error connecting to the cluster)
so, to address this issue, the EACCES error is ignored when waiting
for a quorum.
ceph-monstore-tool: use a large enough paxos/{first,last}_committed
so the rebuilt paxos transaction won't be overwritten by the ones
created before recovery completes.
when the quorum is recovering, the leader will collect the paxos
transactions from peons. if the quorum accepts the proposal for setting
the fingerprint, the peon will update the monitor with a paxos
transaction carrying a newer "last_committed" than the one created using
update_paxos() in ceph_monstore_tool.cc. the latter "last_committed" is
always 0.
so, to avoid this extra paxos proposal obsoleting the "rebuilding" paxos
transaction, we use a large enough number for {first,last}_committed.
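A minimal C++ sketch of the idea, with an assumed constant and helper function (not the actual ceph-monstore-tool code):
  #include <cstdint>
  // "large enough" placeholder; the real tool picks its own value
  constexpr uint64_t kRebuildPaxosVersion = 1'000'000'000ULL;
  struct PaxosRange {
    uint64_t first_committed;
    uint64_t last_committed;
  };
  PaxosRange rebuild_range() {
    // with 0 here (the old behaviour), a post-recovery proposal carrying a
    // newer last_committed would obsolete the rebuilt transaction
    return {kRebuildPaxosVersion, kRebuildPaxosVersion};
  }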
Adam C. Emerson [Wed, 14 Jul 2021 15:02:21 +0000 (11:02 -0400)]
rgw: Robust notify invalidates on cache timeout
This avoids a potential race condition in which updates are delayed.
Fixes: https://tracker.ceph.com/issues/51674
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
(cherry picked from commit 76247990ff38049ee32dd47d31482b9648353673)
Conflicts:
src/rgw/services/svc_notify.cc
- Skip the renaming, since this is a backport and that's mostly a
matter of futureproofing.
Backport: https://tracker.ceph.com/issues/51679
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
Adam C. Emerson [Wed, 7 Jul 2021 22:47:00 +0000 (18:47 -0400)]
rgw: distribute() takes RGWCacheNotifyInfo
So we don't have to parse the bufferlist back out to find what object
to throw out of the cache.
Fixes: https://tracker.ceph.com/issues/51674
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
(cherry picked from commit 7f952ad80114096322f202ba58279aaa4a002313)
Backport: https://tracker.ceph.com/issues/51679
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
Adam C. Emerson [Tue, 13 Jul 2021 20:05:47 +0000 (16:05 -0400)]
rgw: Don't segfault on datalog trim
Synchronous (or yielded; basically anything other than AioCompletion trim)
would try to dereference the past-the-end iterator if we were trimming
to a point in the most recent generation.
https://tracker.ceph.com/issues/51661
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
(cherry picked from commit 97305f03c16db1cfaceef04a74ee510bc1fc1e80)
https://tracker.ceph.com/issues/51675
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
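A generic, self-contained C++ sketch of the bug class (names and container are illustrative, not the datalog code): the search result must be compared against end() before dereferencing.
  #include <algorithm>
  #include <cstdint>
  #include <vector>
  struct Generation { uint64_t gen_id; };
  // find the first generation newer than the trim point; when trimming to a
  // point in the most recent generation there is none, and the iterator is
  // past-the-end -- dereferencing it without the check is undefined behavior
  const Generation* first_newer(const std::vector<Generation>& gens, uint64_t id) {
    auto it = std::find_if(gens.begin(), gens.end(),
                           [id](const Generation& g) { return g.gen_id > id; });
    return it == gens.end() ? nullptr : &*it;
  }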
pacific: qa: FileNotFoundError: [Errno 2] No such file or directory: '/sys/kernel/debug/ceph/3fab6bea-f243-47a4-a956-8c03a62b61b5.client4721/mds_sessions'
Cherry-pick notes:
- handle differences due to renaming of rgw::sal::RGWObject to rgw::sal::Object
- handle differences due to move of test_ps_s3_metadata_on_master test from tests_ps.py to test_bn.py
Moving the attrs into s->bucket_attrs before setting them results in
setting empty attrs into the bucket. This means that reading them back
later gets empty attrs, which can cause a segfault.
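A small, generic C++ example of this bug class (not the rgw code): after std::move the source map is left empty, so using it afterwards as the source of the attrs stores nothing.
  #include <iostream>
  #include <map>
  #include <string>
  #include <utility>
  int main() {
    std::map<std::string, std::string> attrs{{"acl", "private"}};
    std::map<std::string, std::string> bucket_attrs = std::move(attrs);
    // attrs is now in a moved-from (in practice, empty) state; a later
    // "set attrs" that reads from it would persist empty attributes
    std::cout << "attrs left after move: " << attrs.size() << "\n";  // prints 0
  }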
mgr/dashboard: remove usage of 'rgw_frontend_ssl_key'
Fixes: https://tracker.ceph.com/issues/51643
Signed-off-by: Avan Thakkar <athakkar@redhat.com>
Removing the usage of rgw_frontend_ssl_key from the rgw service form.
Kefu Chai [Thu, 20 May 2021 05:55:13 +0000 (13:55 +0800)]
os/bluestore/bluestore_tool: compare retval stat() with -1
before this change, stat() is always called to check if the
file specified by --dev-target exists, even if this option is not
specified. also, we compare the retval of stat() with ENOENT, while
stat() returns -1 on error.
after this change, stat() is called only if --dev-target is specified,
and we compare the retval of stat() with -1 and 0 only, so if the
--dev-target option is not specified, the tool still behaves as expected.
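For reference, a minimal standalone example of the corrected check (standard POSIX stat(), not the bluestore_tool code verbatim): stat() returns 0 on success and -1 on error, with the reason left in errno.
  #include <cerrno>
  #include <cstdio>
  #include <sys/stat.h>
  bool target_exists(const char* dev_target) {
    struct stat st;
    int r = ::stat(dev_target, &st);
    if (r == -1) {                 // compare with -1, not with ENOENT
      if (errno == ENOENT)
        return false;              // target simply does not exist
      std::perror("stat");         // any other error
      return false;
    }
    return true;                   // r == 0: target exists
  }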
Igor Fedotov [Fri, 19 Feb 2021 11:31:52 +0000 (14:31 +0300)]
ceph-volume: implement bluefs volume migration.
This is a wrapper over ceph-bluestore-tool's bluefs-bdev-migrate command.
Primarily intended to introduce LVM tags manipulation which
ceph-bluestore-tool is lacking.
Conflicts:
qa/standalone/scrub/osd-scrub-repair.sh
- Removed non-existent test TEST_auto_repair_bluestore_tag() and its
associated helper functions initiate_and_fetch_state() and
wait_end_of_scrub()
This wouldn't cause issues if the tests are run individually. But when
running all the tests in the files mentioned above, it could introduce
unexpected test failures down the line. For example, multiple tests may
create pools with the same name, and if they are not cleaned up properly,
this could result in unexpected failures in a subsequent test.
max_misplaced was replaced by target_max_misplaced_ratio in
edbd592ee44e02a5328e1510879555c2f9dcfc9e, but the document was not
sync'ed. let's update it accordingly.
f7c5b01e18 tried to fix this, but adding peer_purged.erase() into
the peer_info loop had no effect, because in purge_strays(), when
inserting an osd into peer_purged, we simultaneously remove it from
peer_info.
So it should be a separate loop through the peer_purged list.
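A generic C++ illustration of the point (the predicate and types are hypothetical, not the PeeringState code): entries that exist only in peer_purged are never visited by a loop over peer_info, so peer_purged needs its own pass.
  #include <set>
  // prune peer_purged entries that are no longer relevant (the predicate
  // here is hypothetical); note the loop runs over peer_purged itself
  void prune_peer_purged(std::set<int>& peer_purged,
                         const std::set<int>& still_relevant) {
    for (auto it = peer_purged.begin(); it != peer_purged.end(); ) {
      if (!still_relevant.count(*it))
        it = peer_purged.erase(it);
      else
        ++it;
    }
  }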
/home/kchai/ceph/src/include/denc.h: In member function ‘void DencDumper<T>::dump() const’:
/home/kchai/ceph/src/include/denc.h:121:60: error: ‘O_BINARY’ was not declared in this scope
int fd = ::open(fn, O_WRONLY|O_TRUNC|O_CREAT|O_CLOEXEC|O_BINARY, 0644);
^~~~~~~~
/home/kchai/ceph/src/include/denc.h:121:60: note: the macro ‘O_BINARY’ had not yet been defined
In file included from /home/kchai/ceph/src/include/statlite.h:14,
from /home/kchai/ceph/src/include/types.h:41,
from /home/kchai/ceph/src/auth/Crypto.h:19,
from /home/kchai/ceph/src/auth/Crypto.cc:21:
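A common way to address this, shown as a hedged sketch rather than the exact Ceph change: O_BINARY exists only on Windows, so define it to 0 where it is missing before using it in the open() flags.
  #include <fcntl.h>
  #ifndef O_BINARY
  #define O_BINARY 0   // no-op on POSIX, where every open is already "binary"
  #endif
  int open_dump_file(const char* fn) {
    return ::open(fn, O_WRONLY|O_TRUNC|O_CREAT|O_CLOEXEC|O_BINARY, 0644);
  }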
../src/mon/Monitor.cc: In member function ‘void Monitor::handle_command(MonOpRequestRef)’:
../src/mon/Monitor.cc:3703:55: warning: ‘osd’ may be used uninitialized in this function [-Wmaybe-uninitialized]
3703 | uint64_t seq = mgrstatmon()->get_last_osd_stat_seq(osd);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~
otherwise it fails to build with gcc-toolset-10, like:
../src/common/Formatter.cc: In member function ‘virtual void ceph::XMLFormatter::close_section()’:
../src/common/Formatter.cc:449:8: error: ‘transform’ is not a member of ‘std’
449 | std::transform(section.begin(), section.end(), section.begin(),
| ^~~~~~~~~
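The usual fix for this class of error, shown here as a generic example rather than the exact Ceph patch: include <algorithm>, which declares std::transform; newer GCC releases no longer pull it in transitively through other headers.
  #include <algorithm>   // declares std::transform
  #include <cctype>
  #include <string>
  void lowercase(std::string& section) {
    std::transform(section.begin(), section.end(), section.begin(),
                   [](unsigned char c) { return std::tolower(c); });
  }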
Casey Bodley [Wed, 12 May 2021 18:13:13 +0000 (14:13 -0400)]
rgw: parse tenant name out of rgwx-bucket-instance
used by multisite bucket full sync to request the listing of a specific
bucket instance. if the bucket lives under a tenant, we need to get that
out of the rgwx-bucket-instance header, because the http request path
only names the bucket
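A rough, self-contained sketch of the parsing involved, assuming the header value has the form "tenant/bucket:instance-id"; the format, struct, and function names here are illustrative assumptions, not the rgw implementation.
  #include <string>
  struct BucketInstance {
    std::string tenant;
    std::string bucket;
    std::string instance_id;
  };
  BucketInstance parse_bucket_instance(const std::string& header_val) {
    BucketInstance bi;
    std::string rest = header_val;
    if (auto slash = rest.find('/'); slash != std::string::npos) {
      bi.tenant = rest.substr(0, slash);   // tenant only appears here,
      rest = rest.substr(slash + 1);       // never in the request path
    }
    if (auto colon = rest.find(':'); colon != std::string::npos) {
      bi.bucket = rest.substr(0, colon);
      bi.instance_id = rest.substr(colon + 1);
    } else {
      bi.bucket = rest;
    }
    return bi;
  }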
Jeegn Chen [Wed, 25 Nov 2020 09:15:25 +0000 (17:15 +0800)]
rgw: avoid infinite loop when deleting a bucket
When deleting a bucket with an incomplete multipart upload that
has about 2000 parts uploaded, we noticed an infinite loop, which
stopped s3cmd from deleting the bucket forever.
Upon checking, we found that when the bucket index was sharded (for
example, 128 shards), the original logic in
RGWRados::cls_bucket_list_unordered() did not calculate
the bucket shard ID correctly when the index key of a data
part was taken as the marker.
The issue is not necessarily reproduced each time; it depends
on the key of the object. To reproduce it in a 128-shard bucket,
we use 334 as the key for the incomplete multipart upload,
which will be located in Shard 127 (known by experiment). In this
setup, the original logic will usually derive a shard ID smaller
than 127 (since 127 is the largest one) from the marker, and
thus a cycle is formed, which results in an infinite loop.
PS: Sometimes the shard ID calculation may incorrectly go forward
instead of backward. Thus, the check logic may skip some shards,
which may hold regular keys. In such scenarios, some non-empty buckets may
be deleted by accident.
Sage Weil [Fri, 4 Jun 2021 17:49:40 +0000 (12:49 -0500)]
mgr/telemetry: pass leaderboard flag even w/o ident
Allow non-identified clusters to appear in the leaderboard.
The leaderboard option still defaults to false, so the change here
is that if they opt in to leaderboard but not ident we'll see
that on the backend.
Note that a leaderboard still does not exist (yet), so this doesn't
have any immediate impact. But if/when we do create one, it will
allow us to show big clusters (that opt in) on the leaderboard
as 'unidentified' or similar.
Sébastien Han [Thu, 1 Jul 2021 15:23:57 +0000 (17:23 +0200)]
src/pybind/mgr/mirroring/fs/snapshot_mirror.py: do not assume a cephfs-mirror daemon is always running
We should not assume a daemon is running. If the daemon is not running,
we get the default value of None. So let's skip the status if no daemon
is running yet.
Dan van der Ster [Tue, 29 Jun 2021 20:36:00 +0000 (22:36 +0200)]
mgr/DaemonServer: skip redundant update of pgp_num_actual
During PG merge the MGR was observed repeatedly sending identical
set pgp_num_actual values, leading to osdmap churn at 2000/hr.
Skip the redundant osd set pgp_num_actual command if the
pgp_num is already our computed next.
Fixes: https://tracker.ceph.com/issues/51433
Signed-off-by: Dan van der Ster <daniel.vanderster@cern.ch>
(cherry picked from commit 3f15749de0d550a124f8c6afbd457f17ef020963)
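A minimal sketch of the guard with stand-in types and names (not the DaemonServer code): queue the "osd pool set ... pgp_num_actual" command only when the computed next value differs from what the pool already has.
  #include <cstdint>
  struct PoolState {              // stand-in for the pool's osdmap state
    uint32_t pgp_num_actual;
  };
  bool needs_pgp_num_update(const PoolState& pool, uint32_t next_pgp_num) {
    // skipping the no-op command avoids generating a new osdmap epoch
    // for a value that would not change
    return pool.pgp_num_actual != next_pgp_num;
  }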
qa/workunits/mon/test_mon_config_key: use subprocess.run() instead of proc.communicate()
this avoids the loop of proc.communicate() on python3.6, where we are always
able to get something out of the stdout and/or stderr PIPEs, and the `stdout`
and `stderr` keep growing until out of memory, and teuthology considers
the command crashed after a while.
This _mkdir_p should never have worked as the first directory it tries
to stat/mkdir is "", the empty string. This causes an assertion in the
client. I'm not sure how this code ever functioned without causing
faults. They look like:
2021-07-01 02:15:04.449 7f7612b5ab80 3 client.178735 statx enter (relpath want 2047)
胡玮文 [Mon, 21 Jun 2021 13:31:49 +0000 (21:31 +0800)]
mgr/dashboard: fix OSD out count
Suppose we have 3 OSDs out but up (in preparation for re-formatting to change min_alloc_size), and another OSD down but in
(during reboot). The dashboard will display "1 down, 2 out", which is obviously incorrect. It should be "1 down, 3 out".