Nathan Cutler [Mon, 23 Mar 2020 15:46:02 +0000 (16:46 +0100)]
script/ceph-backport.sh: set target_branch in API case
When falling back to the GitHub API to determine the milestone
number, we were not initializing target_branch, so the script was
broken for octopus backports.
Sage Weil [Tue, 24 Mar 2020 01:00:58 +0000 (20:00 -0500)]
Merge PR #34115 into octopus
* refs/pull/34115/head:
doc/releases/octopus: drop stray line
doc/releases/octopus: note about repository locations
doc/releases: include octopus in index
doc/install/get-packages: update package install instructions
doc/releases/octopus: final notes
Neha [Sun, 22 Mar 2020 20:01:23 +0000 (20:01 +0000)]
qa/*/osd-backfill-recovery-log.sh: flush_pg_stats before checking log length
It is possible for the pg dump to not be the latest when we check for newprimary
in _common_test(). This is because mgr_stats_period is 5 seconds, and we may not
have fetched the latest stats just yet. This causes the test to look at the same
stats before and after wait_for_clean.
Sage Weil [Mon, 23 Mar 2020 13:24:06 +0000 (08:24 -0500)]
Merge PR #34105 into master
* refs/pull/34105/head:
Merge PR #34042 into octopus
Merge PR #33959 into octopus
Merge PR #34067 into octopus
mgr/DaemonServer: add explicit check that acting matches for merge
Merge pull request #34040 from dillaman/wip-44396-partial-fix
Merge PR #34098 into octopus
mgr/rook: list rgw services
mgr/rook: tolerate timestamps that are None
mgr/orch: add 'subcluster' property to RGWSpec
mgr/rook: do not create radosgw pools
mgr/rook: refactor apply/add for rgw
Merge PR #34082 into octopus
Merge PR #34068 into octopus
cephadm: relabel /etc/ganesha mount
Merge PR #34046 into octopus
Merge PR #34092 into octopus
Merge pull request #33719 from ukernel/wip-44416
rbd-mirror: leader watcher should not cancel get locker if locker is invalid
rbd-mirror: snapshot sync request needs to check for interruption
librbd: request exclusive lock when moving to trash
rbd-mirror: basic integration with sync throttling
rbd-mirror: don't prematurely finish snapshot replay loop
rbd-mirror: pass InstanceWatcher to snapshot Replayer
doc/releases/octopus.rst: add note about ec recovery below min_size
mgr/cephadm: configure rgw_frontends for rgw service
cephadm: switch grafana image to the ceph repo
Merge PR #34034 into octopus
qa/suites/rados/cephadm/upgrade: update starting version
Merge PR #33540 into octopus
Merge PR #34023 into octopus
Merge PR #34044 into octopus
Merge PR #34030 into octopus
doc/orchestrator: update rgw creation
mgr/cephadm: clean up client.crash.* container_image settings after upgrade
cephadm: make add-repo --release and --version independent
cephadm: env over last used
mgr/orch: accept port and ssl flags to 'apply rgw'
mgr/orch: 'ceph upgrade ...' -> 'ceph orch upgrade ...'
cephadm: fall back to default for infer_image
cephadm: remove outdated check
cephadm: consolidate default image logic
remove ceph_test_rados_watch_notify
python-common/ceph/deployment/service_spec: add ssl to RGWSpec
cephadm: only infer image for shell, run, inspect-image, pull, ceph-volume
mgr/test_orchestrator: fix service filtering when using dummy data
mgr/dashboard: fix adding/removing host errors
mgr/rook: fix 'orch ps' for osds
qa: fix all the fsx.sh-invoking yaml files to install dependencies
mds: pass proper MutationImpl::LockOp to Locker::wrlock_start()
Reviewed-by: Kiefer Chang <kiefer.chang@suse.com>
Reviewed-by: Laura Paduano <lpaduano@suse.com>
The information about SocketConnection::side and
SocketConnection::ephemeral_port is not up to date in the log, because
these fields are not moved with the Socket during connection replacement.
They are actually socket-level information.
Kefu Chai [Sat, 21 Mar 2020 12:18:50 +0000 (20:18 +0800)]
crimson/admin: do not reset connected_sock before closing
* no need to call discard_result(), as `output_stream::close()` already
returns an empty future<>
* free the connected socket only after the background task finishes, because
we should not free it before the promise referencing it is fulfilled.
Otherwise ASan reports errors like:
==287182==ERROR: AddressSanitizer: heap-use-after-free on address 0x611000019aa0 at pc 0x55e2ae2de882 bp 0x7fff7e2bf080 sp 0x7fff7e2bf078
READ of size 8 at 0x611000019aa0 thread T0
#0 0x55e2ae2de881 in seastar::reactor_backend_aio::await_events(int, __sigset_t const*) ../src/seastar/src/core/reactor_backend.cc:396
#1 0x55e2ae2dfb59 in seastar::reactor_backend_aio::reap_kernel_completions() ../src/seastar/src/core/reactor_backend.cc:428
#2 0x55e2adbea397 in seastar::reactor::reap_kernel_completions_pollfn::poll() (/var/ssd/ceph/build/bin/crimson-osd+0x155e9397)
#3 0x55e2adaec6d0 in seastar::reactor::poll_once() ../src/seastar/src/core/reactor.cc:2789
#4 0x55e2adae7cf7 in operator() ../src/seastar/src/core/reactor.cc:2687
#5 0x55e2adb7c595 in __invoke_impl<bool, seastar::reactor::run()::<lambda()>&> /usr/include/c++/10/bits/invoke.h:60
#6 0x55e2adb699b0 in __invoke_r<bool, seastar::reactor::run()::<lambda()>&> /usr/include/c++/10/bits/invoke.h:113
#7 0x55e2adb50222 in _M_invoke /usr/include/c++/10/bits/std_function.h:291
#8 0x55e2adc2ba00 in std::function<bool ()>::operator()() const /usr/include/c++/10/bits/std_function.h:622
#9 0x55e2adaea491 in seastar::reactor::run() ../src/seastar/src/core/reactor.cc:2713
#10 0x55e2ad98f1c7 in seastar::app_template::run_deprecated(int, char**, std::function<void ()>&&) ../src/seastar/src/core/app-template.cc:199
#11 0x55e2a9e57538 in main ../src/crimson/osd/main.cc:148
#12 0x7fae7f20de0a in __libc_start_main ../csu/libc-start.c:308
#13 0x55e2a9d431e9 in _start (/var/ssd/ceph/build/bin/crimson-osd+0x117421e9)
0x611000019aa0 is located 96 bytes inside of 240-byte region [0x611000019a40,0x611000019b30)
freed by thread T0 here:
#0 0x7fae80a4e487 in operator delete(void*, unsigned long) (/usr/lib/x86_64-linux-gnu/libasan.so.6+0xac487)
#1 0x55e2ae302a0a in seastar::aio_pollable_fd_state::~aio_pollable_fd_state() ../src/seastar/src/core/reactor_backend.cc:458
#2 0x55e2ae2e1059 in seastar::reactor_backend_aio::forget(seastar::pollable_fd_state&) ../src/seastar/src/core/reactor_backend.cc:524
#3 0x55e2adab9b9a in seastar::pollable_fd_state::forget() ../src/seastar/src/core/reactor.cc:1396
#4 0x55e2adab9d05 in seastar::intrusive_ptr_release(seastar::pollable_fd_state*) ../src/seastar/src/core/reactor.cc:1401
#5 0x55e2ace1b72b in boost::intrusive_ptr<seastar::pollable_fd_state>::~intrusive_ptr() /opt/ceph/include/boost/smart_ptr/intrusive_ptr.hpp:98
#6 0x55e2ace115a5 in seastar::pollable_fd::~pollable_fd() ../src/seastar/include/seastar/core/internal/pollable_fd.hh:109
#7 0x55e2ae0ed35c in seastar::net::posix_server_socket_impl::~posix_server_socket_impl() ../src/seastar/include/seastar/net/posix-stack.hh:161
#8 0x55e2ae0ed3cf in seastar::net::posix_server_socket_impl::~posix_server_socket_impl() ../src/seastar/include/seastar/net/posix-stack.hh:161
#9 0x55e2ae0ed943 in std::default_delete<seastar::net::api_v2::server_socket_impl>::operator()(seastar::net::api_v2::server_socket_impl*) const /usr/include/c++/10/bits/unique_ptr.h:81
#10 0x55e2ae0db357 in std::unique_ptr<seastar::net::api_v2::server_socket_impl, std::default_delete<seastar::net::api_v2::server_socket_impl> >::~unique_ptr() /usr/include/c++/10/bits/unique_ptr.h:357
#11 0x55e2ae1438b7 in seastar::api_v2::server_socket::~server_socket() ../src/seastar/src/net/stack.cc:195
#12 0x55e2aa1c7656 in std::_Optional_payload_base<seastar::api_v2::server_socket>::_M_destroy() /usr/include/c++/10/optional:260
#13 0x55e2aa16c84b in std::_Optional_payload_base<seastar::api_v2::server_socket>::_M_reset() /usr/include/c++/10/optional:280
#14 0x55e2ac24b2b7 in std::_Optional_base_impl<seastar::api_v2::server_socket, std::_Optional_base<seastar::api_v2::server_socket, false, false> >::_M_reset() /usr/include/c++/10/optional:432
#15 0x55e2ac23f37b in std::optional<seastar::api_v2::server_socket>::reset() /usr/include/c++/10/optional:975
#16 0x55e2ac21a2e7 in crimson::admin::AdminSocket::stop() ../src/crimson/admin/admin_socket.cc:265
#17 0x55e2aa099825 in operator() ../src/crimson/osd/osd.cc:450
#18 0x55e2aa0d4e3e in apply ../src/seastar/include/seastar/core/apply.hh:36
Sage Weil [Sun, 22 Mar 2020 23:32:11 +0000 (18:32 -0500)]
Merge PR #34042 into octopus
* refs/pull/34042/head:
mgr/rook: list rgw services
mgr/rook: tolerate timestamps that are None
mgr/orch: add 'subcluster' property to RGWSpec
mgr/rook: do not create radosgw pools
mgr/rook: refactor apply/add for rgw
mgr/cephadm: configure rgw_frontends for rgw service
mgr/orch: accept port and ssl flags to 'apply rgw'
python-common/ceph/deployment/service_spec: add ssl to RGWSpec
mgr/rook: fix 'orch ps' for osds
Kefu Chai [Sat, 21 Mar 2020 06:07:40 +0000 (14:07 +0800)]
cephadm: init config and keyring with None
We should not assume that both `config` and `keyring` are specified
when calling this method because, for instance, `create_daemon_dirs()`
does handle the case where `config` and/or `keyring` is not specified.
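A minimal sketch of the pattern described above, assuming a hypothetical helper
named `load_config_and_keyring` (not the actual cephadm function): both results
start as None, and each is filled in only when the caller supplies a path.

```python
from typing import Optional, Tuple


def load_config_and_keyring(
    config_path: Optional[str] = None,
    keyring_path: Optional[str] = None,
) -> Tuple[Optional[str], Optional[str]]:
    """Hypothetical helper: read whichever of the two files was given."""
    # Initialise both values with None so the return statement is always
    # valid, even when the caller passes only one path or none at all.
    config: Optional[str] = None
    keyring: Optional[str] = None

    if config_path is not None:
        with open(config_path) as f:
            config = f.read()

    if keyring_path is not None:
        with open(keyring_path) as f:
            keyring = f.read()

    return config, keyring
```

Callers that tolerate a missing `config` or `keyring` can then test each
value against None instead of relying on both being present.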
Sage Weil [Fri, 20 Mar 2020 18:56:47 +0000 (14:56 -0400)]
mgr/rook: do not create radosgw pools
First, we don't know how big they should be or what they should look like.
The caller should already know that, and/or radosgw can create the pools
itself.
This depends on https://github.com/rook/rook/pull/5058
Sage Weil [Wed, 18 Mar 2020 21:20:12 +0000 (17:20 -0400)]
mgr/rook: refactor apply/add for rgw
A few caveats here:
- enforce that realm == zone, since that is all rook does at the moment.
- we force a (bad!) pool configuration, since rook requires that these
be present (instead of allowing radosgw or the caller to create the pools)
Jason Dillaman [Fri, 20 Mar 2020 16:59:14 +0000 (12:59 -0400)]
rbd-mirror: leader watcher should not cancel get locker if locker is invalid
When a new leader acquires the lock, it will send out a lock acquired
notification along with periodic heartbeats. The get locker will attempt to
run immediately, but if a heartbeat arrives before it executes, the heartbeat
will cancel the timer and reschedule it for the future. This process repeats
for each periodic heartbeat, so the locker is never re-read from the OSD.
This is an issue only for namespace replayers due to the delayed fashion in
which the leader instance id is retrieved.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
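The starvation described above can be illustrated with a simplified model on
a simulated clock. This is only a sketch, not rbd-mirror code: the class
`OneShotTimer`, the function `simulate`, and the flag `reschedule_on_heartbeat`
are all hypothetical names chosen for the illustration.

```python
from typing import Optional


class OneShotTimer:
    """Hypothetical one-shot timer driven by an integer simulated clock."""

    def __init__(self, delay: int) -> None:
        self.delay = delay
        self.deadline: Optional[int] = None
        self.fired = False

    def schedule(self, now: int) -> None:
        # Cancels any pending deadline and sets a new one in the future.
        self.deadline = now + self.delay

    def tick(self, now: int) -> None:
        if self.deadline is not None and now >= self.deadline:
            self.fired = True
            self.deadline = None


def simulate(heartbeat_interval: int, timer_delay: int, duration: int,
             reschedule_on_heartbeat: bool) -> Optional[int]:
    """Return the time at which the timer fired, or None if it never did."""
    timer = OneShotTimer(timer_delay)
    timer.schedule(0)
    for now in range(1, duration + 1):
        timer.tick(now)
        if timer.fired:
            return now
        # Every heartbeat pushes the deadline out again when rescheduling
        # is allowed, so a heartbeat period shorter than the timer delay
        # keeps the callback from ever running.
        if reschedule_on_heartbeat and now % heartbeat_interval == 0:
            timer.schedule(now)
    return None
```

With a heartbeat every 2 ticks and a timer delay of 3 ticks, the model never
fires while heartbeats are allowed to reschedule it, and fires at tick 3 once
they are not.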
Jason Dillaman [Fri, 20 Mar 2020 14:54:43 +0000 (10:54 -0400)]
rbd-mirror: snapshot sync request needs to check for interruption
If the sync request was locally canceled, we need to resume the paused
shut down logic instead of just notifying the image replayer state
machine of the change -- since it had already requested a shut down and
will not re-request it.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Jason Dillaman [Thu, 19 Mar 2020 14:57:03 +0000 (10:57 -0400)]
librbd: request exclusive lock when moving to trash
Even if the image is in-use, moving it to the trash does not
remove any data. This also solves a race between snapshot-based
mirroring shutting down and being able to move a mirrored image
to the trash.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Jason Dillaman [Wed, 18 Mar 2020 19:01:32 +0000 (15:01 -0400)]
rbd-mirror: basic integration with sync throttling
Snapshot-based mirroring did not have any throttling to prevent
too many concurrent syncs from running. Since each sync might need
to iterate over every object of an image, that could potentially
put an extreme burden on the remote cluster.
A future PR will add a more intelligent throttle based on the actual
number of objects that need to be scanned.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
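The idea of capping concurrent syncs can be sketched with a counting
semaphore. This is an illustration under stated assumptions, not the
rbd-mirror implementation: the class `SyncThrottle` and its `peak` counter
are hypothetical, and the limit is a fixed number rather than anything
derived from object counts.

```python
import threading
from typing import Any, Callable


class SyncThrottle:
    """Hypothetical throttle allowing at most max_concurrent syncs at once."""

    def __init__(self, max_concurrent: int) -> None:
        self._slots = threading.BoundedSemaphore(max_concurrent)
        self._lock = threading.Lock()
        self.running = 0
        self.peak = 0  # highest concurrency observed, for inspection only

    def run(self, sync_fn: Callable[..., Any], *args: Any) -> Any:
        # Block until a slot is free, then run the sync while holding it.
        with self._slots:
            with self._lock:
                self.running += 1
                self.peak = max(self.peak, self.running)
            try:
                return sync_fn(*args)
            finally:
                with self._lock:
                    self.running -= 1
```

Each worker calls `throttle.run(sync_image, image)` with its own sync
function; excess callers wait on the semaphore until a running sync finishes.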
Although it is not necessary to mark_down the connection in its
ms_handle_reset() event, it can be more convenient to allow it,
and Heartbeat already encounters this assertion failure.
So move the assertion to close_clean(), which will help identify problems
if we happen to make ms_handle_reset() wait for messenger shutdown.
Yingxin Cheng [Fri, 13 Mar 2020 06:22:40 +0000 (14:22 +0800)]
crimson/net: change close() to mark_down()
* be explicit that mark_down() won't trigger reset event;
* return void so no deadlock is possible and memory is still
safeguarded by Messenger::shutdown();
* related changes in crimson/osd;