Kefu Chai [Fri, 27 Aug 2021 14:37:23 +0000 (22:37 +0800)]
make-dist: bump node to 10.16.0
otherwise "npm ci" segfaults, like
```
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007f77f89099ed in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::string const&) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
[Current thread is 1 (Thread 0x7f77f8496740 (LWP 4046307))]
(gdb) bt
#0 0x00007f77f89099ed in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::string const&) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#1 0x00000000008c3127 in node::Environment::Environment(node::IsolateData*, v8::Local<v8::Context>, node::tracing::AgentWriterHandle*) ()
#2 0x00000000008e4d4b in node::Start(v8::Isolate*, node::IsolateData*, std::vector<std::string, std::allocator<std::string> > const&, std::vector<std::string, std::allocator<std::string> > const&) ()
#3 0x00000000008e34a2 in node::Start(int, char**) ()
#4 0x00007f77f84c00b3 in __libc_start_main (main=0x89dc10 <main>, argc=3, argv=0x7ffd1dc8e8a8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffd1dc8e898)
at ../csu/libc-start.c:308
#5 0x000000000089dd45 in _start ()
```
this change is not cherry-picked from master, because the change that
bumped node to 10.16.0 there (7f7f8a443c820f3c77a6f267939c33891342a561)
is way too large and touches lots of places in the dashboard, while we
just need to get the dashboard frontend npm packages ready with minimal
change.
Mykola Golub [Thu, 17 Jun 2021 15:09:31 +0000 (18:09 +0300)]
rbd_mirror: properly handle image replay canceled when starting replay
This fixes a bug where handle_start_replay, on detecting the cancel
via on_replay_interrupted, returned without completing the
m_on_start_finish context.
This is a direct commit to nautilus. The bug was accidentally
fixed in newer versions during refactoring.
Kefu Chai [Thu, 10 Jun 2021 12:19:09 +0000 (20:19 +0800)]
tasks/ceph_manager: ignore EACCES when waiting for quorum
mon_tick_interval is 5 seconds by default. monitors update their
rotating keys every mon_tick_interval. before monitors form a
quorum, the auth requests from clients are put into the wait list.
these requests are re-enqueued once the monitors form a quorum. but
there is a small window of mon_tick_interval in which they are not
yet able to serve the auth requests, even after they claim to be able
to serve requests. if these re-enqueued requests happen to be served
in this window, and if authx is enabled, they will be greeted with
errors like
handle_auth_bad_method server allowed_methods [2] but i only support [2]
in the case of ceph cli, the error would look like:
[errno 13] RADOS permission denied (error connecting to the cluster)
so, to address this issue, the EACCES error is ignored when waiting
for a quorum.
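A minimal sketch of the retry pattern described above, in Python. The names `wait_for_quorum` and `query_quorum` are hypothetical stand-ins, not the actual tasks/ceph_manager code; the only behaviour taken from the commit message is that EACCES is treated as transient while waiting for a quorum.

```python
import errno
import time


def wait_for_quorum(query_quorum, timeout=30.0, interval=0.1):
    """Poll query_quorum() until it succeeds or the timeout expires.

    query_quorum is a hypothetical callable that returns the quorum
    members, or raises OSError when the cluster rejects the request.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            return query_quorum()
        except OSError as e:
            # EACCES can show up briefly after the quorum forms, before
            # the monitors have refreshed their rotating keys; any other
            # error is treated as a real failure.
            if e.errno != errno.EACCES:
                raise
        if time.monotonic() >= deadline:
            raise TimeoutError("quorum not reached within timeout")
        time.sleep(interval)
```

Errors other than EACCES still propagate immediately, so a genuine misconfiguration is not hidden by the retry loop.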
ceph-monstore-tool: use a large enough paxos/{first,last}_committed
so the rebuild paxos transaction won't be overwritten by the ones
created before recovery completes.
when the quorum is recovering, the leader will collect the paxos
transactions from peons. if the quorum accepts the proposal for setting
the fingerprint, the peon will update the monitor with the paxos
transaction with a newer "last_committed" than the one created using
update_paxos() in ceph_monstore_tool.cc. the latter "last_committed" is
always 0.
so, to avoid this extra paxos proposal obsoleting the "rebuilding" paxos
transaction, we use a large enough number for {first,last}_committed.
Sage Weil [Fri, 4 Jun 2021 17:49:40 +0000 (12:49 -0500)]
mgr/telemetry: pass leaderboard flag even w/o ident
Allow non-identified clusters to appear in the leaderboard.
The leaderboard option still defaults to false, so the change here
is that if they opt in to leaderboard but not ident we'll see
that on the backend.
Note that a leaderboard still does not exist (yet), so this doesn't
have any immediate impact. But if/when we do create one, it will
allow us to show big clusters (that opt in) on the leaderboard
as 'unidentified' or similar.
Conflicts:
src/tools/rbd_mirror/ImageReplayer.cc (FunctionContext vs LambdaContext,
update stop's args in handle_remote_journal_metadata_updated)
src/tools/rbd_mirror/ImageReplayer.h (Mutex vs ceph::mutex)
Jason Dillaman [Fri, 17 Apr 2020 15:17:05 +0000 (11:17 -0400)]
rbd-mirror: track in-flight start/stop/restart in instance replayer
The shut down waits for in-flight ops to complete but the
start/stop/restart operations were previously not tracked. This
could cause a potential race and crash between an image replayer
operation and the instance replayer shutting down.
Fixes: https://tracker.ceph.com/issues/45072
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit 31140a940ea1909c4b5d68ef4593cb582a527354)
Conflicts:
src/tools/rbd_mirror/InstanceReplayer.cc:
Mutex::Locker vs std::lock_guard,
m_local_rados->cct() vs m_local_io_ctx.cct(),
no stop(Context *on_finish) function.
Jason Dillaman [Tue, 8 Dec 2020 19:16:49 +0000 (14:16 -0500)]
librbd/deep_copy: added new migrating flag to object copy
The migration operation and the copyup state machine will set
this flag when attempting to perform a deep-copy due to a
live-migration.
This flag will prevent a possible race condition between the
start of the object deep-copy when migration was enabled and
the writing portion of the deep-copy when migration might
have completed via external means.
Fixes: https://tracker.ceph.com/issues/45694
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit 1baba64e213cb808804796575d3f7969cf37a3c6)
Conflicts:
src/librbd/deep_copy/ImageCopyRequest.cc (FunctionContext vs LambdaContext, no handler param for ObjectCopyRequest)
src/librbd/deep_copy/ObjectCopyRequest.cc
src/librbd/deep_copy/ObjectCopyRequest.h
src/librbd/io/CopyupRequest.cc
src/librbd/operation/MigrateRequest.cc
src/test/librbd/deep_copy/test_mock_ImageCopyRequest.cc
src/test/librbd/deep_copy/test_mock_ObjectCopyRequest.cc
src/test/librbd/io/test_mock_CopyupRequest.cc
(no handler param for ObjectCopyRequest)
After mkfs the store may not yet contain monmap:last_committed but
might be respawning after setting mon_sync:temp_newer_monmap.
Load that stashed map before falling back to the mkfs:monmap.
Fixes: https://tracker.ceph.com/issues/50230
Signed-off-by: Dan van der Ster <daniel.vanderster@cern.ch>
(cherry picked from commit cc0b4c77753962717da8a280a585990f7eec3c7b)
Igor Fedotov [Wed, 19 May 2021 23:17:21 +0000 (02:17 +0300)]
os/bluestore: introduce multithreading sync for bluestore's repairer
In quick-fix mode bluestore uses 2 threads by default to perform the
repair. Due to a lack of synchronization they might corrupt the repair
transaction batch.
Fixes: https://tracker.ceph.com/issues/50017
Signed-off-by: Igor Fedotov <ifedotov@suse.com>
(cherry picked from commit 38c5b04235402a7908bc4713f617d767ca9fdc56)
Conflicts:
src/os/bluestore/BlueStore.cc - future stuff attempted to sneak
in
src/os/bluestore/BlueStore.h - the same as above
Tatjana Dehler [Thu, 27 May 2021 09:46:50 +0000 (11:46 +0200)]
mgr/dashboard: show partially deleted RBDs
An RBD might be partially deleted if the deletion
process has been started but was interrupted. In
this case return the RBD as part of the RBD list
and mark it as partially deleted.
Fixes: https://tracker.ceph.com/issues/48603
Signed-off-by: Tatjana Dehler <tdehler@suse.com>
(cherry picked from commit d83c277ac1861df31d2a39d16e20c7bebbea676e)
Conflicts:
src/pybind/mgr/dashboard/frontend/src/app/ceph/block/rbd-details/rbd-details.component.html
src/pybind/mgr/dashboard/frontend/src/app/ceph/block/rbd-list/rbd-list.component.html
src/pybind/mgr/dashboard/frontend/src/app/ceph/block/rbd-list/rbd-list.component.spec.ts
src/pybind/mgr/dashboard/frontend/src/app/ceph/block/rbd-list/rbd-list.component.ts
src/pybind/mgr/dashboard/services/rbd.py
src/pybind/mgr/dashboard/tests/test_rbd_service.py
Resolved various conflicts because nautilus and
master diverged a lot.
Neha Ojha [Thu, 3 Jun 2021 16:25:01 +0000 (16:25 +0000)]
osd/PG.cc: handle removal of pgmeta object
In 7f04700, we made the pg removal code
much more efficient. But it started marking the pgmeta object as an unexpected
onode, which in reality is expected to be removed after all the other objects.
This behavior is very easily reproducible in a vstart cluster:
ceph osd pool create test 1 1
rados -p test bench 10 write --no-cleanup
ceph osd pool delete test test --yes-i-really-really-mean-it
Before this patch:
"do_delete_work additional unexpected onode list (new onodes has appeared
since PG removal started[#2:00000000::::head#]" seen in the OSD logs.
After this patch:
"do_delete_work removing pgmeta object #2:00000000::::head#" is seen.
Related to: https://tracker.ceph.com/issues/50466
Signed-off-by: Neha Ojha <nojha@redhat.com>
Manually applied 0e917f1b1e18ca9e48b3f91110d3a46b086f7d83, because
nautilus does not have do_delete_work.
Due to bugs in cache management in blkid, it is possible to have
nonexistent entries. These entries break ceph-volume operations by
returning two or more outputs instead of one (e.g. /dev/sdk2).
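One way to guard against this, sketched in Python: treat the lookup output as a list of candidate paths and keep only those that still exist. `pick_device` is a hypothetical helper, the one-path-per-line output format is an assumption, and this is not necessarily how ceph-volume handles it.

```python
import os


def pick_device(blkid_output, exists=os.path.exists):
    """Return the single existing device path from blkid-style output.

    blkid_output is assumed to hold one device path per line; stale
    cache entries may add paths that no longer exist. The exists
    callable is injectable so the check can be exercised without
    real devices.
    """
    candidates = [line.strip() for line in blkid_output.splitlines() if line.strip()]
    present = [path for path in candidates if exists(path)]
    if len(present) != 1:
        raise RuntimeError("expected one device, found %d" % len(present))
    return present[0]
```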
Kefu Chai [Thu, 20 May 2021 05:55:13 +0000 (13:55 +0800)]
os/bluestore/bluestore_tool: compare retval stat() with -1
before this change, stat() is always called to check if the
file specified by --dev-target exists even if this option is not
specified. also, we compare the retval of stat() with ENOENT, while
stat() returns -1 on error.
after this change, stat() is called only if --dev-target is specified,
and we compare the retval of stat() with -1 and 0 only, so if
--dev-target option is not specified, the tool still behaves.
Igor Fedotov [Fri, 19 Feb 2021 11:31:52 +0000 (14:31 +0300)]
ceph-volume: implement bluefs volume migration.
This is a wrapper over ceph-bluestore-tool's bluefs-bdev-migrate command.
Primarily intended to introduce LVM tags manipulation which
ceph-bluestore-tool is lacking.
Conflicts:
doc/man/8/ceph-volume.rst - a bit different formatting is in use
src/ceph-volume/ceph_volume/api/lvm.py - get_single_lv is the
new name for get_first_lv
Igor Fedotov [Mon, 17 May 2021 19:23:26 +0000 (22:23 +0300)]
os/bluestore: fix unexpected ENOSPC in Avl/Hybrid allocators.
Avl allocator was returning unexpected ENOSPC in first-fit mode if all
size-matching available extents were unaligned and applying the alignment
made all of them shorter than required. Since there was no lookup retry
with a smaller size, ENOSPC was returned.
Additionally we should proceed with a lookup in best-fit mode even when
the original size has been truncated to match the available size
(force_range_size_alloc==true).
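The failure mode can be illustrated with a toy first-fit lookup in Python. This is a sketch over a plain list of (offset, length) extents, not BlueStore's AVL allocator; all function names are hypothetical, and halving the request on retry is an illustrative policy only.

```python
def align_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def usable_length(offset, length, alignment):
    """Length left in an extent once its start has been aligned."""
    start = align_up(offset, alignment)
    end = offset + length
    return max(0, end - start)


def first_fit(extents, want, alignment):
    """Return (aligned_offset, want) for the first extent that fits, else None."""
    for offset, length in extents:
        if usable_length(offset, length, alignment) >= want:
            return align_up(offset, alignment), want
    return None


def allocate(extents, want, alignment):
    """First-fit lookup that retries with smaller sizes instead of failing."""
    size = want
    while size >= alignment:
        hit = first_fit(extents, size, alignment)
        if hit is not None:
            return hit
        # Shrink the request, keeping it a multiple of the alignment.
        size = (size // 2) // alignment * alignment
    return None  # nothing aligned is available at any size
```

With 4 KiB alignment and two unaligned 16 KiB extents, `first_fit` finds nothing for a 16 KiB request even though both extents match the size, while `allocate` falls back to a smaller aligned chunk.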
Fixes: https://tracker.ceph.com/issues/50656
Signed-off-by: Igor Fedotov <ifedotov@suse.com>
(cherry picked from commit 0eed13a4969d02eeb23681519f2a23130e51ac59)
Conflicts:
src/test/objectstore/Allocator_test.cc - legacy INSTANTIATE_TEST_CASE_P clause is still used in Nautilus
Ilya Dryomov [Wed, 26 May 2021 12:21:22 +0000 (14:21 +0200)]
librbd: don't stop at the first unremovable image when purging
As there is no inherent ordering, there may be multiple removable
images past the unremovable image. On top of that, removing a clone
may make its parent removable so perform an additional pass if any
image gets removed.
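The multi-pass behaviour described above can be sketched in Python as follows. `purge` and `try_remove` are hypothetical names, and the sketch only models the control flow, not the librbd API.

```python
def purge(images, try_remove):
    """Remove as many images as possible; return the ones left over.

    try_remove(name) is a hypothetical callable returning True if the
    image was removed and False if it cannot be removed right now.
    """
    remaining = list(images)
    while remaining:
        # Try every image, rather than stopping at the first failure.
        still_there = [name for name in remaining if not try_remove(name)]
        if len(still_there) == len(remaining):
            break  # a full pass removed nothing, so stop retrying
        # Something was removed (possibly a clone), so a parent may
        # have become removable: run another pass.
        remaining = still_there
    return remaining
```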
monitoring/grafana: Remove erroneous elements in hosts-overview Grafana dashboard
The hosts-overview Grafana dashboard json file contains a repeated element, making
it invalid JSON. Some JSON parsers handle this. However, this prevents Jsonnet
from parsing the dashboard, which prevents the deployment of this dashboard via
Jsonnet.
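Assuming the repeated element is a duplicated object key, a lenient parser such as Python's json module silently keeps the last value. The sketch below uses the module's `object_pairs_hook` parameter to reject such input instead; `reject_duplicate_keys` and `load_strict` are hypothetical helper names.

```python
import json


def reject_duplicate_keys(pairs):
    """object_pairs_hook that raises on a repeated key within one object."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError("duplicate key: %r" % key)
        seen[key] = value
    return seen


def load_strict(text):
    """Parse JSON text, failing where a lenient parser keeps the last value."""
    return json.loads(text, object_pairs_hook=reject_duplicate_keys)
```

A check like this could be run over dashboard files before they are handed to a stricter consumer.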
Deepika Upadhyay [Wed, 26 May 2021 19:25:13 +0000 (00:55 +0530)]
nautilus: qa/upgrade: disable update_features test_notify with older client as lockowner
* with the recent support for async rbd operations in pacific+, when an
older client (without async support) is being upgraded and simultaneously
interacts with a newer client which expects the requests to be async,
a hang occurs: the return code for request completion is taken as the
acknowledgement of the async request, and the requester then keeps
waiting for another acknowledgement of request completion.
this should be rare, happening only when the lockowner is an old client,
and should be deferred if compatibility issues arise.
* amend upgrade test workunits to use respective stable branches
Xiubo Li [Fri, 7 Aug 2020 07:45:52 +0000 (15:45 +0800)]
msg: throw a system error when center.init fails
The libcephfs test case runs hundreds of threads in parallel and can
reach the open files limit, but there are no useful logs about what
has happened.
This will just throw a system error, like:
C++ exception with description "(24) Too many open files" thrown in the test body.
Fixes: https://tracker.ceph.com/issues/43039
Signed-off-by: Xiubo Li <xiubli@redhat.com>
(cherry picked from commit 6338050)
Conflicts:
src/msg/async/Stack.cc
- nautilus uses plain "i" as the for loop counter variable, while
master has more fancy "worker_id"
Xiubo Li [Fri, 7 Aug 2020 08:21:07 +0000 (16:21 +0800)]
libcephfs: ignore restoring the open files limit
Let's just skip restoring the open files limit: the kernel defers
releasing the file descriptors, so the process could otherwise
reach the open files limit.
Fixes: https://tracker.ceph.com/issues/43039
Signed-off-by: Xiubo Li <xiubli@redhat.com>
(cherry picked from commit c871d68)
Patrick Donnelly [Mon, 22 Mar 2021 16:17:43 +0000 (09:17 -0700)]
mgr/pybind/volumes: avoid acquiring lock for thread count updates
Perform thread count updates in a dedicated tick thread. This prevents
the mgr Finisher thread from potentially hanging on a mutex deadlock
in the cloner thread management.
Fixes: https://tracker.ceph.com/issues/49605
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit b27ddfaed4a3c66bac2343c8315a1fe542edb63e)
Kefu Chai [Tue, 25 May 2021 06:31:02 +0000 (14:31 +0800)]
mon/OSDMonitor: drop stale failure_info even if can_mark_down()
in a124ee85b03e15f4ea371358008ecac65f9f4e50, we add a check to drop
stale failure_info reports. but if osdmap does not prohibit us from
marking the osd in question down, the branch checking the stale info
is not executed. in general, it is allowed to mark an osd down, so
the fix of a124ee85b03e15f4ea371358008ecac65f9f4e50 just fails to
work.
in this change, we check for stale failure report of osd in question
as long as the osd is not marked down in the same function. this should
address the slow ops of failure report issue.
Ilya Dryomov [Mon, 17 May 2021 19:16:16 +0000 (21:16 +0200)]
mon/MonClient: tolerate a rotating key that is slightly out of date
Commit 918c12c2ab5d ("monclient: avoid key renew storm on clock skew")
made wait_auth_rotating() wait for a key set with a valid "current" key
(instead of any key set, including with all keys expired if the clocks
are skewed). While a good idea in general, this is a bit too stringent
because the monitors will hand out key sets with "current" key that is
_just_ about to expire. There is nothing wrong with that as "next" key
is also there, valid for the entire auth_service_ticket_ttl. So even
if the daemon is talking to the leader, it is possible to get a key set
with an expired "current" key. If the daemon is talking to a peon, it
is pretty easy to run into in practice. This, coupled with the fact
that _check_auth_rotating() explicitly allows the keys to go slightly
out of date, can lead to wait_auth_rotating() stalling the boot for up
to 30 seconds:
15:41:11.824+0000 1 ... ==== auth_reply(proto 2 0 (0) Success)
15:41:41.824+0000 0 monclient: wait_auth_rotating timed out after 30
15:41:41.824+0000 -1 mds.b unable to obtain rotating service keys; retrying
Apply the same 30 second or less tolerance in wait_auth_rotating().
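A minimal Python sketch of such a tolerance check. The function and parameter names are hypothetical, and scaling the tolerance to a quarter of the ticket TTL for short TTLs is an assumption made here to model "30 second or less"; it is not taken from the MonClient source.

```python
def have_usable_keys(current_expiry, now, ticket_ttl, max_tolerance=30.0):
    """Return True if the "current" key is valid or only slightly expired.

    All times are in seconds. The tolerance is capped at max_tolerance;
    using ticket_ttl / 4 below that cap is an assumption of this sketch.
    """
    tolerance = min(max_tolerance, ticket_ttl / 4.0)
    return current_expiry >= now - tolerance
```

With a one-hour TTL, a key that expired 10 seconds ago still counts as usable, so the wait would end immediately instead of timing out.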
client: Fix executable access check for the root user
The executable permission check always returned success,
even when the executable bit was not set for any of user,
group or others. This patch fixes it by overriding the
executable permission check for root only if one of
the executable bits is set.
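The rule can be sketched in Python with the standard stat constants. `root_may_execute` is a hypothetical helper that models only the permission-bit test, not the Client.cc code.

```python
import stat


def root_may_execute(mode):
    """Root may execute only if some execute bit (user, group or other) is set."""
    any_exec = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
    return bool(mode & any_exec)
```

For example, a file with mode 0644 is not executable even for root, while mode 0654 is, because the group execute bit is set.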
Conflicts:
src/client/Client.cc: The commit 6aa78836548f (cephfs errno aliases) is not present in
nautilus, plus some other trivial conflicts, maybe because some patches are missing
in nautilus.