Sage Weil [Fri, 4 Jun 2021 17:49:40 +0000 (12:49 -0500)]
mgr/telemetry: pass leaderboard flag even w/o ident
Allow non-identified clusters to appear in the leaderboard.
The leaderboard option still defaults to false, so the change here
is that if they opt in to leaderboard but not ident we'll see
that on the backend.
Note that a leaderboard still does not exist (yet), so this doesn't
have any immediate impact. But if/when we do create one, it will
allow us to show big clusters (that opt in) on the leaderboard
as 'unidentified' or similar.
After mkfs the store may not yet contain monmap:last_committed but
might be respawning after setting mon_sync:temp_newer_monmap.
Load that stashed map before falling back to the mkfs:monmap.
Fixes: https://tracker.ceph.com/issues/50230 Signed-off-by: Dan van der Ster <daniel.vanderster@cern.ch>
(cherry picked from commit cc0b4c77753962717da8a280a585990f7eec3c7b)
Igor Fedotov [Wed, 19 May 2021 23:17:21 +0000 (02:17 +0300)]
os/bluestore: introduce multithireading sync for bluestore's repairer
In quick-fix mode bluestore uses 2 threads by default to perform the
repair. Due to lacking synchronization they might corrupt repair
transaction batch.
Fixes: https://tracker.ceph.com/issues/50017 Signed-off-by: Igor Fedotov <ifedotov@suse.com>
(cherry picked from commit 38c5b04235402a7908bc4713f617d767ca9fdc56)
Conflicts:
src/os/bluestore/BlueStore.cc - future stuff attempted to sneak
in
src/os/bluestore/BlueStore.h - the same as above
Tatjana Dehler [Thu, 27 May 2021 09:46:50 +0000 (11:46 +0200)]
mgr/dashboard: show partially deleted RBDs
An RBD might be partially deleted if the deletion
process has been started but was interrupted. In
this case return the RBD as part of the RBD list
and mark it as partially deleted.
Fixes: https://tracker.ceph.com/issues/48603 Signed-off-by: Tatjana Dehler <tdehler@suse.com>
(cherry picked from commit d83c277ac1861df31d2a39d16e20c7bebbea676e)
Conflicts:
src/pybind/mgr/dashboard/frontend/src/app/ceph/block/rbd-details/rbd-details.component.html
src/pybind/mgr/dashboard/frontend/src/app/ceph/block/rbd-list/rbd-list.component.html
src/pybind/mgr/dashboard/frontend/src/app/ceph/block/rbd-list/rbd-list.component.spec.ts
src/pybind/mgr/dashboard/frontend/src/app/ceph/block/rbd-list/rbd-list.component.ts
src/pybind/mgr/dashboard/services/rbd.py
src/pybind/mgr/dashboard/tests/test_rbd_service.py
Resolved various conflicts because nautilus and
master diverged a lot.
Neha Ojha [Thu, 3 Jun 2021 16:25:01 +0000 (16:25 +0000)]
osd/PG.cc: handle removal of pgmeta object
In 7f04700, we made the pg removal code
much more efficient. But it started marking the pgmeta object as an unexpected
onode, which in reality is expected to be removed after all the other objects.
This behavior is very easily reproducible in a vstart cluster:
ceph osd pool create test 1 1
rados -p test bench 10 write --no-cleanup
ceph osd pool delete test test --yes-i-really-really-mean-it
Before this patch:
"do_delete_work additional unexpected onode list (new onodes has appeared
since PG removal started[#2:00000000::::head#]" seen in the OSD logs.
After this patch:
"do_delete_work removing pgmeta object #2:00000000::::head#" is seen.
Related to:https://tracker.ceph.com/issues/50466 Signed-off-by: Neha Ojha <nojha@redhat.com>
Manually applied 0e917f1b1e18ca9e48b3f91110d3a46b086f7d83, because
nautilus does not have do_delete_work.
Due to bugs in cache managment in blkid, there are possible to have
nonexistence entries. This entries breaks ceph-volume operations by
passing two or more outputs instead of one (eg. /dev/sdk2).
Kefu Chai [Thu, 20 May 2021 05:55:13 +0000 (13:55 +0800)]
os/bluestore/bluestore_tool: compare retval stat() with -1
before this change, stat() is always called to check if the
file specified by --dev-target exists even if this option is not
specified. also, we compare the retval of stat() with ENOENT, while
state() returns -1 on error.
after this change, stat() is called only if --dev-target is specified,
and we compare the retval of stat() with -1 and 0 only, so if
--dev-target option is not specified, the tool still hehaves.
Igor Fedotov [Fri, 19 Feb 2021 11:31:52 +0000 (14:31 +0300)]
ceph-volume: implement bluefs volume migration.
This is a wrapper over ceph-bluestore-tool's bluefs-bdev-migrate command.
Primarily intended to introduce LVM tags manipulation which
ceph-bluestore-tool is lacking.
Conflicts:
doc/man/8/ceph-volume.rst - a bit different formatting is in use
src/ceph-volume/ceph_volume/api/lvm.py - get_single_lv is the
new name for get_first_lv
Igor Fedotov [Mon, 17 May 2021 19:23:26 +0000 (22:23 +0300)]
os/bluestore: fix unexpected ENOSPC in Avl/Hybrid allocators.
Avl allocator mode was returning unexpected ENOSPC in first-fit mode if all size-
matching available extents were unaligned but applying the alignment made all of
them shorter than required. Since no lookup retry with smaller size -
ENOSPC is returned.
Additionally we should proceed with a lookup in best-fit mode even when
original size has been truncated to match the avail size.
(force_range_size_alloc==true)
Fixes: https://tracker.ceph.com/issues/50656 Signed-off-by: Igor Fedotov <ifedotov@suse.com>
(cherry picked from commit 0eed13a4969d02eeb23681519f2a23130e51ac59)
Conflicts:
src/test/objectstore/Allocator_test.cc - legacy INSTANTIATE_TEST_CASE_P clause is still used in Nautilus
Ilya Dryomov [Wed, 26 May 2021 12:21:22 +0000 (14:21 +0200)]
librbd: don't stop at the first unremovable image when purging
As there is no inherent ordering, there may be multiple removable
images past the unremovable image. On top of that, removing a clone
may make its parent removable so perform an additional pass if any
image gets removed.
monitoring/grafana: Remove erroneous elements in hosts-overview Grafana dashboard
The hosts-overview Grafana dashboard json file contains a repeated element, making
it invalid JSON. Some JSON parsers handle this. However, this prevents Jsonnet
from parsing the dashboard, which prevents the deployment of this dashboard via
Jsonnet.
Deepika Upadhyay [Wed, 26 May 2021 19:25:13 +0000 (00:55 +0530)]
nautilus: qa/upgrade: disable update_features test_notify with older client as lockowner
* with the recent support for async rbd operations from pacific+ when an
older client(non async support) goes on upgrade, and simultaneously
interacts with a newer client which expects the requests to be async,
experiences hang; considering the return code for request completion to
be acknowledgement for async request, which then keeps waiting for
another acknowledgement of request completion.
this if happens should be a rare only when lockowner is an old client
and should be deferred if compatibility issues arises.
* amend upgrade test workunits to use respective stable branches
Xiubo Li [Fri, 7 Aug 2020 07:45:52 +0000 (15:45 +0800)]
msg: throw a system error when center.init fails
In the libcephfs test case, it will run handreds of threads in
parallel, it will possibly reach the open files limit, but there
won't useful logs about what has happened.
This will just throw a system error, just like:
C++ exception with description "(24) Too many open files" thrown in the test body.
Fixes: https://tracker.ceph.com/issues/43039 Signed-off-by: Xiubo Li <xiubli@redhat.com>
(cherry picked from commit 6338050)
Conflicts:
src/msg/async/Stack.cc
- nautilus uses plain "i" as the for loop counter variable, while
master has more fancy "worker_id"
Xiubo Li [Fri, 7 Aug 2020 08:21:07 +0000 (16:21 +0800)]
libcephfs: ignore restoring the open files limit
Let's just ignore restoring the open files limit, the kernel will
defer releasing the file descriptors and then the process will be
possibly reachthe open files limit.
Fixes: https://tracker.ceph.com/issues/43039 Signed-off-by: Xiubo Li <xiubli@redhat.com>
(cherry picked from commit c871d68)
Patrick Donnelly [Mon, 22 Mar 2021 16:17:43 +0000 (09:17 -0700)]
mgr/pybind/volumes: avoid acquiring lock for thread count updates
Perform thread count updates in a dedicated tick thread. This avoids the
mgr Finisher thread from getting potentially hung via a mutex deadlock
in the cloner thread management.
Fixes: https://tracker.ceph.com/issues/49605 Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit b27ddfaed4a3c66bac2343c8315a1fe542edb63e)
Kefu Chai [Tue, 25 May 2021 06:31:02 +0000 (14:31 +0800)]
mon/OSDMonitor: drop stale failure_info even if can_mark_down()
in a124ee85b03e15f4ea371358008ecac65f9f4e50, we add a check to drop
stale failure_info reports. but if osdmap does not prohibit us from
marking the osd in question down, the branch checking the stale info
is not executed. in general, it is allowed to mark an osd down, so
the fix of a124ee85b03e15f4ea371358008ecac65f9f4e50 just fails to
work.
in this change, we check for stale failure report of osd in question
as long as the osd is not marked down in the same function. this should
address the slow ops of failure report issue.
Ilya Dryomov [Mon, 17 May 2021 19:16:16 +0000 (21:16 +0200)]
mon/MonClient: tolerate a rotating key that is slightly out of date
Commit 918c12c2ab5d ("monclient: avoid key renew storm on clock skew")
made wait_auth_rotating() wait for a key set with a valid "current" key
(instead of any key set, including with all keys expired if the clocks
are skewed). While a good idea in general, this is a bit too stringent
because the monitors will hand out key sets with "current" key that is
_just_ about to expire. There is nothing wrong with that as "next" key
is also there, valid for the entire auth_service_ticket_ttl. So even
if the daemon is talking to the leader, it is possible to get a key set
with an expired "current" key. If the daemon is talking to a peon, it
is pretty easy to run into in practice. This, coupled with the fact
that _check_auth_rotating() explicitly allows the keys to go slightly
out of date, can lead to wait_auth_rotating() stalling the boot for up
to 30 seconds:
15:41:11.824+0000 1 ... ==== auth_reply(proto 2 0 (0) Success)
15:41:41.824+0000 0 monclient: wait_auth_rotating timed out after 30
15:41:41.824+0000 -1 mds.b unable to obtain rotating service keys; retrying
Apply the same 30 second or less tolerance in wait_auth_rotating().
client: Fix executeable access check for the root user
Executeable permission check always returned sucessful
even when executeable bit is not set on any of the user,
group or others. This patch fixes it by overiding
executeable permission check for root only if one of
the executeable bit is set
Conflicts:
src/client/Client.cc: The commit 6aa78836548f (cephfs errno aliases) is not present in
nautilus and some other trivial conflict, may be because some patches are missing
in nautilus.
Alfonso Martínez [Fri, 22 May 2020 11:36:10 +0000 (13:36 +0200)]
mgr/dashboard: grafana panels for rgw multisite sync performance
* RGW sync perf. counters are now exposed through grafana panels.
* Sync Performance tab is only shown if rgw realm is detected.
* Prometheus module: added metrics suitable for prometheus consumption (from existing ones, not replacing for backward compatibility).
Fixes: https://tracker.ceph.com/issues/45310 Signed-off-by: Alfonso Martínez <almartin@redhat.com>
(cherry picked from commit cf4ff7d2f03bc285a3fae3f27577333f11dab58a)
Conflicts:
- Solved conflicts from cherry-pick:
src/pybind/mgr/dashboard/controllers/rgw.py
src/pybind/mgr/dashboard/frontend/src/app/ceph/rgw/rgw-bucket-form/rgw-bucket-form.component.spec.ts
src/pybind/mgr/dashboard/frontend/src/app/ceph/rgw/rgw-bucket-form/rgw-bucket-form.component.ts
src/pybind/mgr/dashboard/frontend/src/app/ceph/rgw/rgw-daemon-list/rgw-daemon-list.component.ts
src/pybind/mgr/dashboard/frontend/src/app/shared/api/rgw-site.service.spec.ts
src/pybind/mgr/dashboard/frontend/src/app/shared/api/rgw-site.service.ts
src/pybind/mgr/dashboard/services/rgw_client.py
src/pybind/mgr/dashboard/tests/test_rgw_client.py
- src/pybind/mgr/dashboard/tools.py: added method included in a feature not to be backported.
- src/pybind/mgr/dashboard/module.py: fixed linting issue.
we should not proceed against user's will if dual stack is specified but
only one network for a network family can be found. the right fix is
have better error message and documentation, not to tolerate the
failure.