Conflicts:
src/tools/rbd_mirror/Mirror.cc (std::lock_guard vs Mutex::Locker, ceph_abort_msgf does not exist)
src/tools/rbd_mirror/PoolReplayer.cc (std::lock_guard vs Mutex::Locker)
Ali Maredia [Mon, 25 Nov 2019 02:30:03 +0000 (21:30 -0500)]
mimic: update s3-test download code for s3-test tasks
Fixes: https://tracker.ceph.com/issues/43077
- Ensure the download code for all tasks running s3-tests is consistent.
- Simplify download code to only use the config variable 'force-branch' for the branch being cloned.
- Make ceph-mimic the force-branch for all suites using s3-tests.
- Add force-branch to suites running s3readwrite & s3roundtrip tasks.
Sage Weil [Fri, 15 Feb 2019 14:43:23 +0000 (08:43 -0600)]
osd: do not send peers really old maps
We may receive a message that sat in a queue for a while at low
priority and is tagged with an older epoch. Don't resend a batch of old
maps that we have already sent to the peer.
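A minimal sketch of the idea only, not the actual OSD code (the struct and
function names below are invented for illustration): remember the newest
epoch already shared with each peer and only send maps newer than that.

    // Illustrative sketch -- not the actual OSD implementation.
    #include <algorithm>
    #include <cstdint>
    #include <map>

    using epoch_t = uint32_t;

    struct PeerMapTracker {
      std::map<int, epoch_t> last_sent;  // peer osd id -> newest epoch sent

      // Given a message tagged with msg_epoch and our newest map `current`,
      // return the first epoch that still needs to be sent to `peer`,
      // or 0 if the peer is already up to date.
      epoch_t first_epoch_to_send(int peer, epoch_t msg_epoch, epoch_t current) {
        epoch_t &sent = last_sent[peer];
        // The message may be stale (it sat in a low-priority queue), so start
        // from whichever is newer: its epoch or what we already sent the peer.
        epoch_t start = std::max(msg_epoch, sent) + 1;
        if (start > current)
          return 0;            // nothing new to send
        sent = current;        // remember what we are about to send
        return start;
      }
    };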
Jan Fajerski [Wed, 13 Nov 2019 09:13:01 +0000 (10:13 +0100)]
ceph-volume: assume msgrV1 for all branches containing mimic
With nautilus and newer, OSDs listen on both v1 and v2 ports. Assume
that if mimic (or luminous) occurs in the branch name, the OSDs are
running msgr v1 only.
Fixes: https://tracker.ceph.com/issues/42791
Signed-off-by: Jan Fajerski <jfajerski@suse.com>
(cherry picked from commit b8754919df61b118200e210e0bfc8d6df0261dfd)
Jason Dillaman [Wed, 16 Oct 2019 00:42:29 +0000 (20:42 -0400)]
test/cls_rbd: removed mirror peer pool test cases
The mirror peer pool id has never been used and has been dropped in the
Octopus release. This fixes the breakage in the test cases that checked
the pool id.
Fixes: https://tracker.ceph.com/issues/42333
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit f0492ebd677cb8cb4656fc8f1dc871db1a6e7753)
Conflicts:
src/cls/rbd/cls_rbd_client.cc
src/cls/rbd/cls_rbd_client.h
- I could not find the reason for the conflicts; it seems that git
got confused. Cherry-picked the changes in these two files manually.
Ilya Dryomov [Thu, 24 Oct 2019 15:35:23 +0000 (17:35 +0200)]
krbd: retry on an empty list from udev_enumerate_scan_devices()
systemd 219 doesn't have the issue that is worked around in the
previous commit, but has a different one: udev_enumerate_scan_devices()
always succeeds, but sometimes returns an empty list when the device is
actually there. This happens rarely and at random so I haven't been
able to get to the bottom of it yet, but it looks like another similar
race condition in libudev.
Since an empty list is expected if the device isn't there, retry just
twice with a small sleep in between. This appears to be enough: I got
7 occurrences per 600000 "rbd unmap" invocations, all of which needed
only a single retry.
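A hedged sketch of this retry-on-empty-list logic against the public libudev
API; the function name and the 250 ms sleep are assumptions, only the "retry
twice" bound comes from the description above, and the real krbd code differs.

    // Sketch only: retry when the scan succeeds but reports an empty list
    // for a device that should exist.
    #include <libudev.h>
    #include <chrono>
    #include <thread>

    static udev_device *find_block_device(udev *udev_ctx, const char *sysname) {
      for (int attempt = 0; attempt < 3; ++attempt) {
        udev_enumerate *enm = udev_enumerate_new(udev_ctx);
        if (!enm)
          return nullptr;
        udev_enumerate_add_match_subsystem(enm, "block");
        udev_enumerate_add_match_sysname(enm, sysname);
        udev_device *dev = nullptr;
        if (udev_enumerate_scan_devices(enm) >= 0) {
          if (udev_list_entry *entry = udev_enumerate_get_list_entry(enm)) {
            dev = udev_device_new_from_syspath(udev_ctx,
                                               udev_list_entry_get_name(entry));
          }
        }
        udev_enumerate_unref(enm);
        if (dev)
          return dev;
        // Empty list although the device may well be there: sleep briefly
        // (interval is an assumption) and retry, at most twice.
        std::this_thread::sleep_for(std::chrono::milliseconds(250));
      }
      return nullptr;  // genuinely not mapped
    }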
Ilya Dryomov [Mon, 7 Oct 2019 13:32:39 +0000 (15:32 +0200)]
krbd: retry on transient errors from udev_enumerate_scan_devices()
udev_enumerate_scan_devices() doesn't handle disappearing devices well.
If called while some devices are being removed, it sometimes propagates
ENOENT and ENODEV errors encountered while operating on directory entries
in /sys that no longer exist. Some of these errors are suppressed, but
this isn't reliable and varies across versions. In particular, systemd
239 suppresses ENODEV from sd_device_new_from_syspath() but doesn't
suppress ENODEV from sd_device_get_devnum(). In systemd 243 the call
to sd_device_get_devnum() has been moved, but it still leaks ENOENT
from sd_device_get_is_initialized() (referring to the body of
FOREACH_DIRENT_ALL loop in enumerator_scan_dir_and_add_devices()).
Assume that all ENOENT and ENODEV errors are transient and retry the
call to udev_enumerate_scan_devices(). Don't limit the number of
retries, but log each one.
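A sketch of that unlimited-retry-with-logging approach, assuming the standard
libudev C API; the structure and naming are illustrative, not the actual krbd
code.

    // Sketch only: treat ENOENT/ENODEV from udev_enumerate_scan_devices()
    // as transient, retry without a limit and log every retry.
    #include <libudev.h>
    #include <cerrno>
    #include <cstdio>

    static udev_enumerate *scan_devices_retry(udev *udev_ctx,
                                              const char *subsystem) {
      for (int attempt = 1; ; ++attempt) {
        udev_enumerate *enm = udev_enumerate_new(udev_ctx);
        if (!enm)
          return nullptr;
        udev_enumerate_add_match_subsystem(enm, subsystem);
        int r = udev_enumerate_scan_devices(enm);
        if (r >= 0)
          return enm;   // caller walks the list and unrefs it
        udev_enumerate_unref(enm);
        if (r != -ENOENT && r != -ENODEV) {
          errno = -r;
          return nullptr;   // not one of the known-transient errors
        }
        // A device vanished while the /sys tree was being walked; assume
        // it is transient and simply try again, logging each attempt.
        std::fprintf(stderr,
                     "udev_enumerate_scan_devices() returned %d, retry %d\n",
                     r, attempt);
      }
    }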
Ilya Dryomov [Mon, 14 Oct 2019 10:40:43 +0000 (12:40 +0200)]
krbd: increase udev netlink socket receive buffer to 2M
Even though with the previous commit we no longer block between binding
the socket and starting to handle events, we still want a larger receive
buffer to accommodate scheduling delays. Since the filtering is done in
the listener, an estimate focused on just rbd is not accurate, but for a
sense of scale: a pair of "rbd" and "block" events for "rbd map" takes
2048 bytes in the receive buffer, so 2M allows for roughly a thousand of
them ("rbd map" and "rbd unmap" require root, and libudev uses
SO_RCVBUFFORCE, so the rmem_max limit is ignored).
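For illustration, a sketch of how such a buffer can be requested through
libudev before receiving starts; the helper name and surrounding setup are
assumptions, only udev_monitor_set_receive_buffer_size() and the 2M figure
come from the commit.

    // Sketch only: ask for a 2 MiB receive buffer on the udev netlink
    // socket before we start handling events.
    #include <libudev.h>

    static udev_monitor *make_monitor(udev *udev_ctx) {
      udev_monitor *mon = udev_monitor_new_from_netlink(udev_ctx, "udev");
      if (!mon)
        return nullptr;
      // ~2 KiB per "rbd" + "block" event pair, so 2 MiB buffers roughly a
      // thousand of them.  Running as root, libudev uses SO_RCVBUFFORCE,
      // so the rmem_max limit does not apply.
      udev_monitor_set_receive_buffer_size(mon, 2 * 1024 * 1024);
      if (udev_monitor_enable_receiving(mon) < 0) {
        udev_monitor_unref(mon);
        return nullptr;
      }
      return mon;
    }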
Because the event(s) we are interested in can be delivered while we are
still in the kernel finishing the map or unmap, we start listening for
udev events before going into the kernel. However, if (un)mapping takes
its time, the udev netlink socket can be overrun fairly easily -- the
filtering is done on the listener side, so we get to process everything,
not just rbd events. If any of the events of interest get dropped
(ENOBUFS), we hang in poll().
Go into the kernel in a separate thread and leave the main thread to
run the event loop. The return value is communicated to the reactor
through a pipe.
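A rough sketch of that thread-plus-pipe pattern (error handling trimmed,
names invented; not the actual krbd implementation): the worker thread does
the blocking call and writes its return value to a pipe, while the main
thread polls both the udev monitor fd and the pipe.

    #include <poll.h>
    #include <unistd.h>
    #include <libudev.h>
    #include <thread>

    static int run_with_event_loop(udev_monitor *mon, int (*do_map)()) {
      int fds[2];
      if (pipe(fds) < 0)
        return -1;

      std::thread mapper([&] {
        int r = do_map();                      // blocking call into the kernel
        (void)!write(fds[1], &r, sizeof(r));   // report the result to the reactor
      });

      struct pollfd pfd[2] = {
        { udev_monitor_get_fd(mon), POLLIN, 0 },  // udev events
        { fds[0],                   POLLIN, 0 },  // worker's return value
      };

      int map_result = -1;
      bool have_result = false, have_events = false;
      while (!have_result || !have_events) {
        if (poll(pfd, 2, -1) < 0)
          break;
        if (pfd[0].revents & POLLIN) {
          if (udev_device *dev = udev_monitor_receive_device(mon)) {
            // ... match the "rbd"/"block" events for this mapping here ...
            udev_device_unref(dev);
            have_events = true;               // simplified for the sketch
          }
        }
        if (pfd[1].revents & POLLIN) {
          (void)!read(fds[0], &map_result, sizeof(map_result));
          have_result = true;
        }
      }
      mapper.join();
      close(fds[0]);
      close(fds[1]);
      return map_result;
    }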
Ilya Dryomov [Thu, 10 Oct 2019 08:49:17 +0000 (10:49 +0200)]
krbd: separate event reaping from event processing
Move event processing into UdevMapHandler and UdevUnmapHandler
functors and replace wait_for_udev_{add,remove}() with a single
wait_for_mapping() template.
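For illustration, a minimal version of what such a handler-parameterized wait
loop could look like; this is a sketch assuming a plain poll()-driven loop,
not the actual krbd code.

    // Sketch only: one wait loop parameterized on a handler functor instead
    // of separate wait_for_udev_add()/wait_for_udev_remove() copies.  The
    // handler returns true once it has seen everything it was waiting for.
    #include <libudev.h>
    #include <poll.h>

    template <typename Handler>
    static int wait_for_mapping(udev_monitor *mon, Handler handler) {
      struct pollfd pfd = { udev_monitor_get_fd(mon), POLLIN, 0 };
      for (;;) {
        if (poll(&pfd, 1, -1) < 0)
          return -1;
        udev_device *dev = udev_monitor_receive_device(mon);
        if (!dev)
          continue;
        bool finished = handler(dev);   // e.g. a UdevMapHandler-style functor
        udev_device_unref(dev);
        if (finished)
          return 0;
      }
    }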
This timeout was added as a (very poor) workaround for an issue
addressed in commit 42dd1eae630f ("krbd: fix rbd map hang due to udev
return subsystem unordered").
Kefu Chai [Thu, 7 Feb 2019 13:13:14 +0000 (21:13 +0800)]
msg/msg_types.h: do not cast `ceph_entity_name` to `entity_name_t` for printing
In GCC 9, `-Waddress-of-packed-member` is enabled, so we get warnings like:
src/msg/msg_types.h:142:41: warning: converting a packed 'const
ceph_entity_name' pointer (alignment 1) to a 'const entity_name_t'
pointer (alignment 8) may result in an unaligned pointer value
[-Waddress-of-packed-member]
142 | return out << *(const entity_name_t*)&addr;
| ^~~~
Since the alignments of these two structures are different, we cannot
cast a structure with alignment 1 to a structure with alignment 8: the
code the compiler generates for accessing members with alignment 8 won't
work with members of alignment 1. Instead, we need to create a properly
aligned temporary structure for printing.
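A self-contained illustration of the idea with stand-in types
(wire_entity_name/entity_name are invented here, not Ceph's real structs):
copy the packed data member by member into an aligned temporary and print
that, rather than casting the pointer.

    #include <cstdint>
    #include <iostream>

    // A packed on-wire struct (alignment 1) and its in-memory counterpart.
    struct __attribute__((packed)) wire_entity_name {
      uint8_t type;
      uint64_t num;
    };

    struct entity_name {
      uint8_t type;
      uint64_t num;
    };

    std::ostream &operator<<(std::ostream &out, const entity_name &n) {
      return out << int(n.type) << '.' << n.num;
    }

    // Instead of  out << *(const entity_name *)&wire  (which casts away the
    // packed alignment), fill a properly aligned temporary and print that.
    std::ostream &operator<<(std::ostream &out, const wire_entity_name &wire) {
      entity_name tmp;
      tmp.type = wire.type;
      tmp.num = wire.num;   // member-wise copy, no unaligned pointer involved
      return out << tmp;
    }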
Kefu Chai [Fri, 3 Aug 2018 09:27:20 +0000 (17:27 +0800)]
qa/suites/fs: add python3-cephfs to packages
The default set of packages to install is in
$suite/qa/packages/packages.yaml; see get_package_list() in
teuthology/teuthology/task/install/__init__.py for how we prepare the
package list for the install task.
For running the python3 tests in
fs/basic_functional/tasks/volume-client, we need to install
python3-cephfs. Please note that
_package_override() in teuthology/teuthology/task/install/rpm.py will
take care of the different naming on centos/rhel, where the python3
packages are named python34-*.
task.install.rpm installs the packages listed in
$suites/qa/packages/packages.yaml, and the package list applies to the
upgrade tests as well. But we don't have python3 bindings packages in jewel
-- they were introduced in kraken.
Boris Ranto [Thu, 24 Oct 2019 14:54:05 +0000 (16:54 +0200)]
restful: Query nodes_by_id for items
The node dict that is passed to the _gather_leaf_ids function from the
_gather_osds function does not have 'items' in it. We also can't use
buckets at this point since those only exist for leaf nodes, not all
nodes.
We need to query the nodes_by_id dict to get 'items' for a node inside
the _gather_leaf_ids function instead.
xie xingguo [Wed, 26 Jun 2019 06:24:08 +0000 (14:24 +0800)]
osd/OSD: auto mark heartbeat sessions as stale and tear them down
The primary benefit is that the OSD doesn't need to keep a flood of
blocked heartbeat messages around in memory.
This prevents OSDs from accumulating heartbeat messages due to a
broken switch and then exhausting the whole node's memory:
Jun 11 04:19:26 host-192-168-9-12 kernel: [409881.137077] Out of memory:
Kill process 1471476 (ceph-osd) score 47 or sacrifice child
Jun 11 04:19:26 host-192-168-9-12 kernel: [409881.146054] Killed process 1471476 (ceph-osd) total-vm:4822548kB, anon-rss:3097860kB,
file-rss:2556kB, shmem-rss:0kB
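A sketch of the general idea, with invented data structures rather than the
OSD's real ones: timestamp each heartbeat session and reap the ones that have
gone stale, so the messages queued for unreachable peers are released instead
of piling up.

    #include <chrono>
    #include <map>

    using hb_clock = std::chrono::steady_clock;

    struct HeartbeatSession {
      hb_clock::time_point last_rx;   // last ping reply seen from this peer
      // ... queued messages, connections, etc. ...
    };

    void reap_stale_sessions(std::map<int, HeartbeatSession> &peers,
                             std::chrono::seconds stale_cutoff) {
      const auto now = hb_clock::now();
      for (auto it = peers.begin(); it != peers.end(); ) {
        if (now - it->second.last_rx > stale_cutoff) {
          // Erasing the stale session drops everything queued for the peer,
          // so memory stays bounded even if a broken switch blackholes it.
          it = peers.erase(it);
        } else {
          ++it;
        }
      }
    }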