Changcheng Liu [Thu, 23 Jul 2020 03:09:46 +0000 (11:09 +0800)]
doc: specify RBD_LOCK_MODE_EXCLUSIVE for exclusive-lock
The exclusive lock can be transitioned transparently between clients
once a write operation finishes. To disable this "transparent" transition,
the lock must be acquired with RBD_LOCK_MODE_EXCLUSIVE.
Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
Jason Dillaman [Fri, 24 Jul 2020 16:13:10 +0000 (12:13 -0400)]
librbd: use task finisher thread for image open/close callbacks
There was a potential race condition when using the AsioEngine
to deliver asynchronous image open and close callbacks, which left
the potential for the io_context thread to attempt to destroy itself.
This commit changes the image open and close callbacks to always
delete the ImageCtx (now matching the synchronous API behavior)
and to always invoke the callback in the Finisher thread, whose
lifetime is tied to the CephContext.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
The `ceph-volume lvm zap` command fails under certain conditions.
When passing `--osd-id` or `--osd-fsid` to the `ceph-volume lvm zap` command,
it tries to zap additional devices that have nothing to do with the OSD
being zapped.
When calling `api.get_lvs()` in `ensure_associated_lvs()`, we have to
pass the osd-id/osd-fsid information so that only related devices are
returned by the `get_lvs()` method.
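A minimal Python sketch of the filtering idea, assuming each LV is modelled as a dict whose `tags` carry `ceph.osd_id` and `ceph.osd_fsid` entries. The helper `filter_lvs_for_osd()` is hypothetical and does not reproduce the real `api.get_lvs()` signature; it only shows why an unfiltered lookup returns unrelated LVs.

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for one LV record; only the tags used here are modelled.
LV = Dict[str, Dict[str, str]]


def filter_lvs_for_osd(lvs: List[LV],
                       osd_id: Optional[str] = None,
                       osd_fsid: Optional[str] = None) -> List[LV]:
    """Keep only the LVs tagged with the given OSD id and/or OSD fsid.

    With neither filter set, every LV is returned, which is how devices
    unrelated to the OSD could end up being zapped.
    """
    wanted: Dict[str, str] = {}
    if osd_id is not None:
        wanted["ceph.osd_id"] = osd_id
    if osd_fsid is not None:
        wanted["ceph.osd_fsid"] = osd_fsid
    return [lv for lv in lvs
            if all(lv["tags"].get(k) == v for k, v in wanted.items())]


lvs = [
    {"tags": {"ceph.osd_id": "0", "ceph.osd_fsid": "aaaa"}},
    {"tags": {"ceph.osd_id": "1", "ceph.osd_fsid": "bbbb"}},
]
print(len(filter_lvs_for_osd(lvs)))              # 2: nothing filtered out
print(len(filter_lvs_for_osd(lvs, osd_id="1")))  # 1: only osd.1's LV
```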
crimson/os/alienstore: always use fsid in bluestore
AlienStore should not be stateful in this respect; it should proxy
all access to the fsid to BlueStore.
There are a couple of issues in the existing implementation:
* during mkfs, BlueStore generates a new osd_fsid if the specified
one is empty. Since we explicitly pass the given uuid down to
AlienStore::mkfs() so that BlueStore can use it, we should pass it
down instead of storing it locally.
* when persisting the superblock in OSD::mkfs(), superblock.osd_fsid() is
read from store->get_fsid(). If the user specifies an empty uuid, we
should persist the generated uuid in the superblock.
In this change, all access to the fsid is proxied to the underlying
BlueStore.
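A rough sketch of the proxying design, in Python for brevity. `FakeBlueStore` and `ProxyStore` are hypothetical stand-ins for BlueStore and AlienStore, and treating an all-zero uuid as "empty" is an assumption of the sketch. The point is that the wrapper keeps no fsid of its own, so the value persisted in the superblock is whatever the backend actually generated.

```python
import uuid
from typing import Optional


class FakeBlueStore:
    """Hypothetical backend that owns the fsid, standing in for BlueStore."""

    def __init__(self) -> None:
        self._fsid: Optional[uuid.UUID] = None

    def mkfs(self, osd_fsid: uuid.UUID) -> None:
        # Assumption: an all-zero uuid counts as "empty", so generate one.
        self._fsid = uuid.uuid4() if osd_fsid.int == 0 else osd_fsid

    def get_fsid(self) -> Optional[uuid.UUID]:
        return self._fsid


class ProxyStore:
    """Keeps no fsid of its own; every access goes to the backend."""

    def __init__(self, backend: FakeBlueStore) -> None:
        self._backend = backend

    def mkfs(self, osd_fsid: uuid.UUID) -> None:
        self._backend.mkfs(osd_fsid)  # pass the uuid down, don't cache it

    def get_fsid(self) -> Optional[uuid.UUID]:
        return self._backend.get_fsid()


store = ProxyStore(FakeBlueStore())
store.mkfs(uuid.UUID(int=0))        # the user asked for an "empty" fsid
superblock_fsid = store.get_fsid()  # what OSD::mkfs() would persist
print(superblock_fsid.int != 0)     # True: the generated uuid, not zero
```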
The OSD sends a MOSDMarkMeDown message to the monitor and waits for its
ack until a timeout expires, so if we stop the OSDs before stopping the
mons, stop.sh can return sooner instead of waiting for the timeout.
This avoids attempts to connect an OSD that is bound to a v2 address
to a v1 address of a mgr.
In general, an OSD is bound to both v1 and v2 addresses, but the crimson
msgr does not support multiple bound addresses at the time of writing.
So, to avoid failures when trying to connect to incompatible addresses,
let's filter them out when connecting to the mgr. This change
silences warnings like:
peer_addr_for_me v1:172.21.15.106:60008/0 type doesn't match myaddr
v2:0.0.0.0:6802/26710
This avoids attempts to connect an OSD that is bound to a v2 address
to a v1 address of a monitor.
In general, an OSD is bound to both v1 and v2 addresses, but the crimson
msgr does not support multiple bound addresses at the time of writing.
So, to avoid failures when trying to connect to incompatible addresses,
let's filter them out when connecting to the monitor. This change
silences warnings like:
peer_addr_for_me v1:172.21.15.106:60008/0 type doesn't match myaddr
v2:0.0.0.0:6802/26710
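Both changes above amount to dropping peer addresses whose type differs from the one the messenger is bound to. A minimal sketch, with a hypothetical `PeerAddr` type and `compatible_addrs()` helper (not crimson's actual API); the addresses use the 192.0.2.0/24 documentation range.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PeerAddr:
    kind: str      # "v1" or "v2", as in the log lines above
    endpoint: str  # "ip:port"


def compatible_addrs(addrs: List[PeerAddr], bound_kind: str) -> List[PeerAddr]:
    """Keep only the addresses speaking the protocol our messenger is bound to."""
    return [a for a in addrs if a.kind == bound_kind]


# 3300 and 6789 are the usual v2 and v1 monitor ports.
mon = [PeerAddr("v2", "192.0.2.1:3300"), PeerAddr("v1", "192.0.2.1:6789")]
print(compatible_addrs(mon, bound_kind="v2"))
# [PeerAddr(kind='v2', endpoint='192.0.2.1:3300')]
```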
This will crash the choose_acting() procedure, as it will mistakenly
think that peer 3 should continue to perform asynchronous recovery
(e.g., due to num_objects_missing = 1) instead of being fully
backfill-recovered.
While I did not dig into the real cause, there are a couple of
possible explanations for how num_objects can be off. If a roll-forward
or log replay could delete something twice, there might be an
undercount. Or it might be something as simple as corruption.
Since _update_calc_stats() is going to fix num_objects_missing
for that peer anyway, let's make sure it always starts from a
clean state.
crimson/net: do not reset need_addr before learning it
Because we don't bind both v1 and v2 addresses, when the monitor returns
a v1 peer address, crimson-osd, as the client side, just drops the
connection. But this failed attempt to learn myaddr resets `need_addr`,
which prevents crimson-osd from learning the v2 address returned by the
monitor.
In this change, we reset need_addr only after the address is learned
from the peer.
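The ordering fix can be modelled as a small state machine. This is a hypothetical sketch (`ClientConn` and `on_peer_addr_for_me()` are illustrative names, not crimson's classes): an address of the wrong type is rejected without touching `need_addr`, which is cleared only once an address is actually learned.

```python
from typing import Optional


class ClientConn:
    """Toy model of the client side of the handshake (hypothetical)."""

    def __init__(self, my_kind: str = "v2") -> None:
        self.my_kind = my_kind
        self.need_addr = True
        self.my_addr: Optional[str] = None

    def on_peer_addr_for_me(self, kind: str, addr: str) -> bool:
        """Handle the address the peer sees us at; return True if accepted."""
        if not self.need_addr:
            return True
        if kind != self.my_kind:
            # Incompatible type: drop this attempt but leave need_addr set,
            # so a later v2 reply can still teach us our address.
            return False
        self.my_addr = addr
        self.need_addr = False  # cleared only after the address is learned
        return True


conn = ClientConn()
conn.on_peer_addr_for_me("v1", "192.0.2.10:60008")  # rejected
conn.on_peer_addr_for_me("v2", "192.0.2.10:6802")   # learned
print(conn.need_addr, conn.my_addr)  # False 192.0.2.10:6802
```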
* add a --flavor option, which is "default" by default, so one can, for
example, pass "--flavor crimson" to ceph-debug-docker
* extract $repo_url to avoid repeating the bits shared between the
CentOS and Debian-derivative envs.
Adam King [Fri, 10 Jul 2020 12:09:39 +0000 (08:09 -0400)]
mgr/cephadm: allow use of authenticated registry
Add an option to use a custom authenticated registry during
bootstrap, as well as a registry-login command that lets the
user change the authenticated registry login info.
Fixes: https://tracker.ceph.com/issues/44886
Signed-off-by: Adam King <adking@redhat.com>
ceph.spec.in: cull _FORTIFY_SOURCE macro from CXXFLAGS for seastar
seastar uses setjmp() and longjmp() to implement coroutines, but
longjmp() is redirected to ____longjmp_chk() by glibc if _FORTIFY_SOURCE is
defined. ____longjmp_chk() simply bails out with an error message unless
the destination stack pointer is higher than the source stack pointer, or
the destination stack pointer is in the sigaltstack. In the case of
seastar, the destination %sp is not necessarily higher than the source
stack pointer, and it is not handling a signal when switching the thread
context. That's why we get the "longjmp causes uninitialized stack frame"
error when running crimson-osd on RHEL/CentOS 8 using the prebuilt rpm
packages.
The optflags rpm macro adds -D_FORTIFY_SOURCE=2 to CFLAGS and CXXFLAGS,
so even though seastar tries to pass -U_FORTIFY_SOURCE to GCC, there is a
chance that CMake appends CXXFLAGS at the end of the option list passed
to GCC, which renders seastar's attempt to undefine _FORTIFY_SOURCE
useless.
Another way to address this issue is to undefine this macro in
seastar:src/core/thread.cc. But since seastar tries to neutralize the
macro in its CMake script instead of in the source file, I assume they
have their reasons. Let's drop it in the rpm recipe instead.
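The flag-ordering problem can be reproduced with a toy model of GCC's preprocessor handling, where -D and -U options are processed in command-line order and a bare `-DNAME` defines NAME as 1. `effective_macro()` and `cull_flag()` are hypothetical helpers that ignore the separated `-D NAME` spelling, and the optflags string below is a simplified example, not the literal expansion of the rpm macro.

```python
import shlex
from typing import List, Optional


def effective_macro(args: List[str], name: str) -> Optional[str]:
    """Value `name` ends up with after -D/-U processing; None means undefined.

    Options are applied left to right, so the last -D or -U wins.
    """
    value: Optional[str] = None
    for arg in args:
        if arg == f"-U{name}":
            value = None
        elif arg == f"-D{name}":
            value = "1"  # -DNAME without a value defines NAME as 1
        elif arg.startswith(f"-D{name}="):
            value = arg.split("=", 1)[1]
    return value


def cull_flag(flags: str, flag: str) -> str:
    """Remove every occurrence of `flag` from a space-separated flag string."""
    return " ".join(tok for tok in shlex.split(flags) if tok != flag)


# Simplified example flags; seastar's -U comes first, CXXFLAGS are appended.
optflags = "-O2 -g -D_FORTIFY_SOURCE=2 -fstack-protector-strong"
print(effective_macro(["-U_FORTIFY_SOURCE"] + shlex.split(optflags),
                      "_FORTIFY_SOURCE"))  # 2: the -U is overridden
culled = cull_flag(optflags, "-D_FORTIFY_SOURCE=2")
print(effective_macro(["-U_FORTIFY_SOURCE"] + shlex.split(culled),
                      "_FORTIFY_SOURCE"))  # None: stays undefined
```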
8987f94416f453829eae6dda08837ef5a42531c6 introduced the osd_lock for the
bench command. Taking the osd_lock in bench can lead to deadlocks, causing the
command to hang as seen in https://tracker.ceph.com/issues/43888.
Fixes: https://tracker.ceph.com/issues/43888
Signed-off-by: Adam Kupczyk <akupczyk@redhat.com>
Signed-off-by: Neha Ojha <nojha@redhat.com>
Sebastian Wagner [Wed, 22 Jul 2020 20:12:00 +0000 (22:12 +0200)]
Merge pull request #35667 from jschmid1/cephadm_deterministic_simplescheduler
mgr/cephadm: rework --dry-run/previews
Reviewed-by: Juan Miguel Olmo Martínez <jolmomar@redhat.com>
Reviewed-by: Kyr Shatskyy <kyrylo.shatskyy@suse.com>
Reviewed-by: Sebastian Wagner <sebastian.wagner@suse.com>
Reviewed-by: Stephan Müller <smueller@suse.com>
Jason Dillaman [Wed, 22 Jul 2020 15:25:56 +0000 (11:25 -0400)]
librbd: flush all queued object IO from simple scheduler
Normally, IO is tracked via the AioCompletion's async_op, but the
scheduler will "complete" writes while the IO might still be
executing. Therefore, before shutting down this dispatch layer,
we need to wait for all IO to complete.
Fixes: https://tracker.ceph.com/issues/46668
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
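A minimal sketch of the wait-for-all-IO idea, assuming a counter of IOs that have been handed to the lower layer but not yet finished. `InFlightTracker` is hypothetical and written in Python with `threading.Condition`; librbd's actual implementation is C++ and differs.

```python
import threading
from typing import Optional


class InFlightTracker:
    """Counts IOs that are still executing below the dispatch layer."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._in_flight = 0

    def start_io(self) -> None:
        with self._cond:
            self._in_flight += 1

    def finish_io(self) -> None:
        with self._cond:
            self._in_flight -= 1
            if self._in_flight == 0:
                self._cond.notify_all()

    def shut_down(self, timeout: Optional[float] = None) -> bool:
        """Block until every started IO has finished; False on timeout."""
        with self._cond:
            return self._cond.wait_for(lambda: self._in_flight == 0, timeout)


tracker = InFlightTracker()
for _ in range(3):
    tracker.start_io()
    # The caller may already have seen the write "complete" here, while
    # the backing IO keeps running and finishes on another thread.
    threading.Timer(0.05, tracker.finish_io).start()
print(tracker.shut_down(timeout=5))  # True once all three IOs finish
```

Waiting on a condition variable rather than on the caller-visible completion is what keeps shutdown from racing with writes the scheduler already reported as done.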
* default to centos:8, as we have moved to centos:8 now
* do not assume that the base image is centos:7; use centos:8 if it is
specified.
* install python3-* packages for centos:8 and python36-* packages for
centos:7, as el8 is now a python3 distro and centos:7 now has python36.
* s/screen/tmux/, because screen is now offered by EPEL, while tmux
is in BaseOS.