Matt Benjamin [Thu, 5 Sep 2019 15:38:56 +0000 (11:38 -0400)]
rgw: crypt: permit RGW-AUTO/default with SSE-S3 headers
Permit the existing logic for encryption by a global master key
to take effect when a client has requested AES256 server-side encryption
with S3 managed keys, as well as SSE-KMS.
Fixes: https://tracker.ceph.com/issues/41670
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
(cherry picked from commit 80bffd9ae12f6b5846cf8efbffda71e9f921e18f)
Igor Fedotov [Mon, 7 Oct 2019 13:39:20 +0000 (16:39 +0300)]
os/bluestore: fix improper setting of STATE_KV_SUBMITTED.
Fixes: https://tracker.ceph.com/issues/42209
The issue is specific to Nautilus and earlier releases: master already has
changes that first make the case even worse and then fix the whole bunch.
See https://tracker.ceph.com/issues/42189
Matt Benjamin [Wed, 5 Jun 2019 17:25:32 +0000 (13:25 -0400)]
rgw/OutputDataSocket: actually discard data on full buffer
A dout message in OutputDataSocket::append_output() states that
data will be dropped when appending would cause data_max_backlog
to be exceeded--but the method appends it anyway.
Log output discards at level 0, as messages will be lost. Suppress
repeated warnings mod 100. Switch to vector.
Fixes: http://tracker.ceph.com/issues/40178
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
(cherry picked from commit c806b825dae649829de8847d36cb21ffd2bbee8e)
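The behaviour described in this entry (discard on a full buffer, log the discard, suppress repeated warnings mod 100) can be illustrated with a minimal Python sketch. This is not the RGW code: the class name, its fields and the choice to log the first discard and then every 100th are assumptions made for illustration only.

```python
class BoundedBacklog:
    """Hypothetical stand-in for an output buffer with a backlog limit."""

    def __init__(self, max_backlog, warn_every=100):
        self.max_backlog = max_backlog  # plays the role of data_max_backlog
        self.warn_every = warn_every    # "mod 100" warning suppression
        self.chunks = []                # pending chunks, in arrival order
        self.backlog = 0                # total pending bytes
        self.dropped = 0                # number of discarded appends
        self.warnings = []              # stand-in for level-0 log output

    def append(self, data):
        """Queue data, or discard it if the backlog limit would be exceeded."""
        if self.backlog + len(data) > self.max_backlog:
            # Assumption: warn on the first discard and every warn_every-th after.
            if self.dropped % self.warn_every == 0:
                self.warnings.append(
                    f"dropping data, {self.dropped + 1} discards so far")
            self.dropped += 1
            return False  # the data is really discarded, not appended
        self.chunks.append(data)
        self.backlog += len(data)
        return True
```

The important point is the early return: once the limit would be exceeded, the data never reaches the buffer.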
Tianshan Qu [Sun, 11 Nov 2018 11:56:51 +0000 (19:56 +0800)]
rgw: set null version object acl issues
1. Setting the ACL of a null version object creates an empty index entry.
RGWRados::set_attrs did not clear the instance, so index prepare and complete got instance=null,
which led to an empty index entry 1000_<obj>_i_null.
Creating an empty index entry does no harm, but listomapkeys will find that key.
2. If an object exists with a versioned key, the ACL can be set on a nonexistent null version object.
order:
1) enable bucket version
2) put obj
3) disable bucket version
4) setting the ACL with versioned_id=null succeeds, which it should not
Casey Bodley [Mon, 6 May 2019 19:01:07 +0000 (15:01 -0400)]
rgw: delete_obj_index() takes mtime for bilog
writing an empty timestamp to the bilog prevents other zones from
applying the delete. this means that the --bypass-gc flag for
'radosgw-admin bucket rm' doesn't work in multisite.
metadata sync of a new bucket entrypoint may call rgw_link_bucket()
(which in turn calls into cls user) without deleting/unlinking the
previous bucket entrypoint. this prevented the new bucket entrypoint
from overwriting the creation_time of the old one
Conflicts:
src/rgw/rgw_admin.cc
- cherry-pick was clean, but there was a build failure "error: 'class RGWRados'
has no member named 'svc'", which was fixed by making the following change:
Neha Ojha [Mon, 11 Nov 2019 21:32:15 +0000 (13:32 -0800)]
osd/OSDMap.cc: don't output over/underfull messages to lderr
There can be cases where overfull and underfull (see example in
https://tracker.ceph.com/issues/42756) will be empty, which is not
necessarily an error. These error messages can end up spamming
the ceph-mgr log.
Ilya Dryomov [Thu, 24 Oct 2019 15:35:23 +0000 (17:35 +0200)]
krbd: retry on an empty list from udev_enumerate_scan_devices()
systemd 219 doesn't have the issue that is worked around in the
previous commit, but has a different one: udev_enumerate_scan_devices()
always succeeds, but sometimes returns an empty list when the device is
actually there. This happens rarely and at random so I haven't been
able to get to the bottom of it yet, but it looks like another similar
race condition in libudev.
Since an empty list is expected if the device isn't there, retry just
twice with a small sleep in-between. This appears to be enough: I got
7 occurrences per 600000 "rbd unmap" invocations, all of which needed
a single retry:
Ilya Dryomov [Mon, 7 Oct 2019 13:32:39 +0000 (15:32 +0200)]
krbd: retry on transient errors from udev_enumerate_scan_devices()
udev_enumerate_scan_devices() doesn't handle disappearing devices well.
If called while some devices are being removed, it sometimes propagates
ENOENT and ENODEV errors encountered operating on directory entries in
/sys that no longer exist. Some of these errors are suppressed, but
this isn't reliable and varies across versions. In particular, systemd
239 suppresses ENODEV from sd_device_new_from_syspath() but doesn't
suppress ENODEV from sd_device_get_devnum(). In systemd 243 the call
to sd_device_get_devnum() has been moved, but it still leaks ENOENT
from sd_device_get_is_initialized() (referring to the body of
FOREACH_DIRENT_ALL loop in enumerator_scan_dir_and_add_devices()).
Assume that all ENOENT and ENODEV errors are transient and retry the
call to udev_enumerate_scan_devices(). Don't limit the number, but log
each retry.
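The retry policy described here, together with the bounded empty-list retry from the entry above, can be sketched in Python as follows. The real code is C++ against libudev; the scan callable, the delay value and the log callback are hypothetical stand-ins.

```python
import errno
import time


def enumerate_with_retry(scan, empty_retries=2, delay=0.01, log=print):
    """Call scan() until it yields a usable result.

    scan is a hypothetical callable standing in for
    udev_enumerate_scan_devices(): it returns a list of devices or
    raises OSError.
    """
    empty_seen = 0
    while True:
        try:
            devices = scan()
        except OSError as e:
            # ENOENT/ENODEV are assumed transient: retry without a limit,
            # but log each retry.
            if e.errno in (errno.ENOENT, errno.ENODEV):
                log(f"transient error {e.errno}, retrying")
                continue
            raise
        # An empty list is legitimate if the device is absent, so retry
        # only a bounded number of times.
        if devices or empty_seen >= empty_retries:
            return devices
        empty_seen += 1
        log(f"empty list, retry {empty_seen}")
        time.sleep(delay)
```

The two cases are treated differently on purpose: an error is never a valid answer, while an empty list may be one.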
Ilya Dryomov [Mon, 14 Oct 2019 10:40:43 +0000 (12:40 +0200)]
krbd: increase udev netlink socket receive buffer to 2M
Even though with the previous commit we no longer block between binding
the socket and starting handling events, we still want a larger receive
buffer to accommodate scheduling delays. Since the filtering is
done in the listener, an estimate focused on just rbd is not accurate,
but anyway: a pair of "rbd" and "block" events for "rbd map" take 2048
bytes in the receive buffer. This allows for roughly a thousand of
them ("rbd map" and "rbd unmap" require root and libudev makes use of
SO_RCVBUFFORCE so rmem_max limit is ignored).
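The estimate works out as follows, assuming "2M" means 2 MiB and taking the 2048-byte figure from the text above:

```python
RCVBUF_BYTES = 2 * 1024 * 1024  # assumed meaning of "2M"
BYTES_PER_MAP = 2048            # one "rbd" plus one "block" event, per the text

pairs = RCVBUF_BYTES // BYTES_PER_MAP
print(pairs)  # 1024
```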
Because the event(s) we are interested in can be delivered while we are
still in the kernel finishing map or unmap, we start listening for udev
events before going into the kernel. However, if (un)mapping takes its
time, udev netlink socket can be fairly easily overrun -- the filtering
is done on the listener side, so we get to process everything, not just
rbd events. If any of the events of interest get dropped (ENOBUFS), we
hang in poll().
Go into the kernel in a separate thread and leave the main thread to
run the event loop. The return value is communicated to the reactor
through a pipe.
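The threading arrangement can be sketched in Python as below. This is an illustration only, not the krbd code: blocking_call, event_fd and handle_event are hypothetical stand-ins for the map/unmap call into the kernel, the udev monitor socket and the per-event handler. The sketch also stops as soon as the return value arrives, whereas a real reactor would keep going until the event of interest has been seen.

```python
import os
import select
import struct
import threading


def run_with_event_loop(blocking_call, event_fd, handle_event):
    """Run blocking_call() in a worker thread while polling event_fd."""
    r, w = os.pipe()

    def worker():
        # The blocking work happens off the main thread; its integer
        # return value is sent back through the pipe.
        ret = blocking_call()
        os.write(w, struct.pack("i", ret))

    t = threading.Thread(target=worker)
    t.start()

    poller = select.poll()
    poller.register(r, select.POLLIN)
    poller.register(event_fd, select.POLLIN)

    ret = None
    while ret is None:
        for fd, _ in poller.poll():
            if fd == event_fd:
                # Drain events promptly so the socket is not overrun.
                handle_event(os.read(event_fd, 4096))
            else:
                ret = struct.unpack("i", os.read(r, 4))[0]

    t.join()
    os.close(r)
    os.close(w)
    return ret
```

Because the main thread never blocks in the kernel call, it can keep reading events for as long as that call takes.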
Ilya Dryomov [Thu, 10 Oct 2019 08:49:17 +0000 (10:49 +0200)]
krbd: separate event reaping from event processing
Move event processing into UdevMapHandler and UdevUnmapHandler
functors and replace wait_for_udev_{add,remove}() with a single
wait_for_mapping() template.
This timeout was added as a (very poor) workaround for an issue
addressed in commit 42dd1eae630f ("krbd: fix rbd map hang due to udev
return subsystem unordered").
krbd: fix rbd map hang due to udev return subsystem unordered
The order in which subsystems are returned by udev_device_get_subsystem
might not match the order in which they were added with
udev_monitor_filter_add_match_subsystem_devtype. So if the block
event is returned first and the rbd event is returned next, then
further polling gets nothing back until it times out.
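One way to make such a wait independent of event order is to track which subsystems are still outstanding. The Python sketch below shows the idea only and is not the actual fix: the function name and the events iterable (standing in for the subsystem string of each received event) are hypothetical.

```python
def wait_for_both(events, wanted=("rbd", "block")):
    """Return True once every wanted subsystem has been seen, in any order."""
    pending = set(wanted)
    for subsystem in events:
        pending.discard(subsystem)  # unrelated subsystems are ignored
        if not pending:
            return True
    return False  # the event stream ended before both were seen
```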
Nathan Cutler [Thu, 31 Oct 2019 16:58:46 +0000 (17:58 +0100)]
tests/ceph-disk: drop ceph-detect-init test
This commit fixes an issue with a commit that was cherry-picked into luminous
from mimic.
17bc3dc73a14701f5f6541245955bdd343ffbee2 cherry-picked ceph-detect-init.yaml
from mimic. In mimic, this test works fine because all the supported distros use
systemd. But in luminous we support Ubuntu 14.04 which still uses Upstart
instead of systemd.
Yan, Zheng [Fri, 21 Jun 2019 08:24:51 +0000 (16:24 +0800)]
mds: cleanup truncating inodes when standby replay mds trim log segments
Standby replay mds first trims expired log segments, then replays new
log segments. It's possible that a 'truncate_start' log event is in an expired
segment, but its 'truncate_finish' counterpart is in the new log segments. When mds
replays the 'truncate_finish' log event, the log segment that contains the
'truncate_start' is already trimmed, so mds does nothing. This leaks
Inode::PIN_TRUNCATING and triggers an assertion when removing the
corresponding inode.
Kefu Chai [Fri, 3 Aug 2018 09:27:20 +0000 (17:27 +0800)]
qa/suites/fs: add python3-cephfs to packages
the default set of packages to install is in
$suite/qa/packages/packages.yaml . see get_package_list() in
teuthology/teuthology/task/install/__init__.py for how we prepare a
package list for install task.
for running python3 tests in
fs/basic_functional/tasks/volume-client, we need to install
python3-cephfs. please note that,
_package_override() in teuthology/teuthology/task/install/rpm.py will
take care of the different naming on centos/rhel, where the python3
packages are named python34-*.
task.install.rpm installs packages listed in
$suites/qa/packages/packages.yaml, the package list applies to the
upgrade tests also. but we don't have python3 bindings packages in jewel
-- they were introduced in kraken.
Boris Ranto [Thu, 24 Oct 2019 14:54:05 +0000 (16:54 +0200)]
restful: Query nodes_by_id for items
The node dict that is passed to the _gather_leaf_ids function from the
_gather_osds function does not have 'items' in it. We also can't use
buckets at this point since those only exist for leaf nodes, not all
nodes.
We need to query the nodes_by_id dict to get 'items' for a node inside
the _gather_leaf_ids function instead.
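The lookup can be sketched as below. This is a standalone illustration rather than the module's method: the function signature and the assumed layout of nodes_by_id (node ID mapped to a dict whose 'items' list holds {'id': ...} children) are assumptions, and the sketch relies on the CRUSH convention that devices have non-negative IDs while buckets have negative ones.

```python
def gather_leaf_ids(node_id, nodes_by_id):
    """Collect the IDs of all leaf nodes (OSDs) under node_id."""
    if node_id >= 0:
        # Non-negative IDs are devices, i.e. leaves.
        return {node_id}
    leaves = set()
    # Look the node up by ID to get its 'items', rather than relying
    # on the dict that was passed in by the caller.
    for item in nodes_by_id[node_id]['items']:
        leaves |= gather_leaf_ids(item['id'], nodes_by_id)
    return leaves
```

For example, with a root bucket -1 containing bucket -2 and OSD 2, where bucket -2 contains OSDs 0 and 1, gather_leaf_ids(-1, nodes_by_id) returns {0, 1, 2}.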