Sage Weil [Mon, 2 Dec 2019 13:43:54 +0000 (07:43 -0600)]
mon: cap keys in mon_sync messages
The previous cap was set at 1 MB. However, a user was experiencing mon
timeouts while syncing the purged_snap_epoch * keys, which are ~20 bytes
each. Reducing the max payload to 64K resolved the problem, which maps
to (very!) roughly 1500 keys per message. Set our limit a bit higher than
that since we just made this quite a bit more efficient. Most of the time
the keys are larger than 20 bytes and we wouldn't hit the key limit, but
having one ensures that we won't burn too much CPU in one go when we do
have lots of these little keys.
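A minimal sketch of the idea, not the monitor's actual code: a batch is closed when either a byte budget or a key budget would be exceeded. The function name `chunk_keys` and both parameter names are hypothetical, and the size of an item is approximated as key length plus value length.

```python
def chunk_keys(items, max_bytes, max_keys):
    """Yield lists of (key, value) pairs bounded by both limits."""
    chunk, size = [], 0
    for key, value in items:
        item_size = len(key) + len(value)
        # Close the current chunk if this item would break either cap.
        # A single oversized item still gets a chunk of its own.
        if chunk and (size + item_size > max_bytes or len(chunk) >= max_keys):
            yield chunk
            chunk, size = [], 0
        chunk.append((key, value))
        size += item_size
    if chunk:
        yield chunk
```

With many ~20-byte keys the key cap is what ends each chunk; with larger values the byte cap is reached first, which matches the reasoning in the message above.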
Sage Weil [Tue, 12 Nov 2019 20:51:41 +0000 (14:51 -0600)]
mon/MonitorDBStore: improve get_chunk_tx limits
The old version was horribly inefficient in that it would reencode the
transaction on every iteration.
Instead, estimate the size if we add an item and stop if it looks like it
will go over. This isn't super precise, but it's close enough, since the
limits are approximate.
Drop the single-use helper since it only makes the code harder to
follow.
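The difference between re-encoding and estimating can be sketched as follows. This is an illustration under stated assumptions, not the MonitorDBStore code: the wire format (a 4-byte length prefix before each key and value) and the names `encode`, `item_cost` and `take_until_full` are all hypothetical.

```python
import struct

def encode(items):
    """Hypothetical wire format: u32 length prefix before each key and value."""
    out = bytearray()
    for key, value in items:
        out += struct.pack("<I", len(key)) + key
        out += struct.pack("<I", len(value)) + value
    return bytes(out)

def item_cost(key, value):
    # Cost of one item in the format above: two 4-byte prefixes plus payload.
    return 8 + len(key) + len(value)

def take_until_full(items, max_bytes):
    """Add items while a running estimate stays within max_bytes."""
    taken, estimate = [], 0
    for key, value in items:
        cost = item_cost(key, value)
        if taken and estimate + cost > max_bytes:
            break  # adding this item looks like it would go over
        taken.append((key, value))
        estimate += cost
    return taken, estimate
```

The running estimate costs O(1) per item, whereas calling `encode()` on the whole batch after every addition would cost O(n) per item. In this toy format the estimate is exact; in a real encoder it would only be approximate, which the message above accepts.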
Brad Hubbard [Tue, 3 Mar 2020 05:58:35 +0000 (15:58 +1000)]
mgr/run-tox-tests: Fix issue with PYTHONPATH
Something changed recently on Bionic which caused tox to fail when
PYTHONPATH is a relative path. For some reason the path is mangled by
the time it gets to pytest so we need to ensure we are using an absolute
path. This seems to be nautilus specific, at least ATM.
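The actual change is in a shell script; the Python sketch below only illustrates the idea of resolving every PYTHONPATH entry against a known base directory before handing it to a child process. The helper name `absolutize_pythonpath` is hypothetical.

```python
import os

def absolutize_pythonpath(value, base):
    """Return a PYTHONPATH string whose entries are all absolute paths."""
    parts = [p for p in value.split(os.pathsep) if p]
    # os.path.join leaves entries that are already absolute untouched.
    resolved = [os.path.normpath(os.path.join(base, p)) for p in parts]
    return os.pathsep.join(resolved)
```

A caller would typically pass the directory the relative entries were written against as `base`, then export the result in the environment used to launch tox.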
Alex Zhang [Sun, 29 Sep 2019 09:33:58 +0000 (02:33 -0700)]
common: Fix multiple logical errors in get_device_id.
0. If blkdev.serial exists, the serial should be used. The original implementation seems wrong: if the serial does not exist, it uses the value from an uninitialized buffer or, even worse, the value left over from the previous call (the model).
1. When using fallback methods, the device id should only be returned when both model and serial are present. The original implementation looks like a logic error.
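One reading of the two points above, written as a Python sketch rather than the C++ in question. The function name `build_device_id`, the dict-based inputs and the `model_serial` output format are assumptions made for illustration only.

```python
def build_device_id(primary, fallback):
    """primary/fallback: dicts that may contain 'model' and 'serial'."""
    # Point 0: a value is used only if the lookup actually produced it.
    # A missing field is None, never a leftover from an earlier lookup.
    model = primary.get("model")
    serial = primary.get("serial")
    if not model or not serial:
        # Point 1: the fallback counts only if BOTH fields are present.
        model, serial = fallback.get("model"), fallback.get("serial")
        if not model or not serial:
            return None
    return f"{model}_{serial}".replace(" ", "_")
```

Returning `None` when either field is missing avoids producing an id built from half-initialized data, which is the failure mode the message describes.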
Kefu Chai [Wed, 29 May 2019 09:45:35 +0000 (17:45 +0800)]
common/blkdev.c: check retval of snprintf()
as the snprintf()'ed string could be truncated, we need to check the
return value to use this function properly.
this silences warnings like
../src/common/blkdev.cc: In member function ‘int64_t BlkDev::get_string_property(blkdev_prop_t, char*, size_t) const’:
../src/common/blkdev.cc:165:15: warning: ‘%s’ directive output may be truncated writing up to 4095 bytes into a region of size between 4085 and 4089 [-Wformat-truncation=]
165 | "%s/block/%s/%s", sysfsdir(), dev, propstr);
| ^~
In file included from /usr/include/stdio.h:873,
from /usr/include/c++/9/cstdio:42,
from /usr/include/c++/9/ext/string_conversions.h:43,
from /usr/include/c++/9/bits/basic_string.h:6493,
from /usr/include/c++/9/string:55,
from /usr/include/c++/9/bits/locale_classes.h:40,
from /usr/include/c++/9/bits/ios_base.h:41,
from /usr/include/c++/9/ios:42,
from /usr/include/c++/9/ostream:38,
from /usr/include/c++/9/iterator:64,
from /opt/ceph/include/boost/iterator/iterator_traits.hpp:10,
from /opt/ceph/include/boost/range/iterator_range_core.hpp:26,
from /opt/ceph/include/boost/algorithm/string/replace.hpp:16,
from ../src/common/blkdev.cc:31:
/usr/include/x86_64-linux-gnu/bits/stdio2.h:67:35: note: ‘__builtin___snprintf_chk’ output 9 or more bytes (assuming 4108) into a destination of size 4096
67 | return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
| ~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
68 | __bos (__s), __fmt, __va_arg_pack ());
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Venky Shankar [Wed, 26 Feb 2020 04:52:37 +0000 (23:52 -0500)]
mgr/volumes: unregister job upon async threads exception
If the async threads hit a temporary exception the job is
never unregistered and therefore gets skipped by the async
threads on subsequent scans.
Patrick hit this in nautilus when one of the purge threads
hit an exception when trying to log a message. The trash
entry was never picked up again by the purge threads.
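The failure mode and the fix can be sketched with a `try`/`finally` around the job body. This is a simplified model, not the mgr/volumes code: the class `JobTracker` and its methods are hypothetical names.

```python
import logging

log = logging.getLogger("async_job_sketch")

class JobTracker:
    def __init__(self):
        self.registered = set()

    def next_job(self, candidates):
        """Pick the first candidate that no thread has registered yet."""
        for job in candidates:
            if job not in self.registered:
                self.registered.add(job)
                return job
        return None

    def run(self, job, work):
        try:
            work(job)
            return True
        except Exception:
            log.exception("job %r failed, will retry on a later scan", job)
            return False
        finally:
            # Unregister on success AND on failure; without this a job
            # that raised once would be skipped by every later scan.
            self.registered.discard(job)
```

Putting the unregister step in `finally` means a temporary exception only delays the job until the next scan instead of hiding it permanently.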
Conflicts:
src/tools/rbd_mirror/InstanceReplayer.h (ceph::mutex vs Mutex)
src/tools/rbd_mirror/NamespaceReplayer.cc (does not exist)
src/tools/rbd_mirror/PoolReplayer.cc (code from NamespaceReplayer is here)
src/test/rbd_mirror/test_mock_PoolReplayer.cc (accordingly to PoolReplayer.cc changes)
Patrick Donnelly [Tue, 25 Feb 2020 04:26:30 +0000 (20:26 -0800)]
Merge PR #33526 into nautilus
* refs/pull/33526/head:
test: verify purge queue w/ large number of subvolumes
test: pass timeout argument to mount::wait_for_dir_empty()
mgr/volumes: access volume in lockless mode when fetching async job
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Venky Shankar [Wed, 19 Feb 2020 12:31:40 +0000 (07:31 -0500)]
mgr/volumes: access volume in lockless mode when fetching async job
Saw a deadlock when deleting a lot of subvolumes -- purge threads were
stuck acquiring the global lock for volume access. This can happen
when there is a concurrent remove (which renames and signals the
purge threads) and a purge thread is just about to scan the trash
directory for entries.
For the fix, purge threads fetch entries by accessing the volume
in lockless mode. This is safe from a functionality point of view, as
the rename and the directory scan are handled correctly by the filesystem.
In the worst case the purge thread picks up the trash entry on the next
scan, so a stale trash entry is never left behind.
Tatjana Dehler [Fri, 24 Jan 2020 16:02:22 +0000 (17:02 +0100)]
mgr/dashboard: show checkboxes for booleans
The frontend showed textboxes for the dashboard settings because
the actual type information was missing here. The REST API then
returned the default type 'str'.
Edit the e2e test case in order to update a different setting as
the 'editMgrModule' method can't handle checkboxes.
Fixes: https://tracker.ceph.com/issues/43769
Signed-off-by: Tatjana Dehler <tdehler@suse.com>
(cherry picked from commit 7e7cac98116e76bc9e4c52ac47796e3fcd880667)
Yaarit Hatuka [Mon, 27 Jan 2020 13:57:55 +0000 (08:57 -0500)]
mgr/devicehealth: fix telemetry stops sending device reports after 48 hours
Telemetry module fetches device metrics which were scraped in the last
"telemetry interval"*2 (=48 hours by default) by calling
_get_device_metrics() with min_sample. _get_device_metrics() fetches the
metrics from omap and breaks on the first one that is older than
min_sample. But because it fetched in ascending order (from oldest to
newest) it was breaking on the first one it received, if it was older
than the interval above. We need to pass min_sample to get_omap_vals()
so it will start fetching from that value.
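The bug and the fix can be reproduced on a sorted list of timestamp keys. This is a stand-alone illustration: the key format is an assumption, and the omap is modeled as a plain sorted list rather than through get_omap_vals().

```python
import bisect

def recent_samples_buggy(keys, min_sample):
    """Iterate oldest-to-newest and stop at the first key that is too old."""
    out = []
    for key in keys:
        if key < min_sample:
            break  # the oldest key is hit first, so nothing is collected
        out.append(key)
    return out

def recent_samples_fixed(keys, min_sample):
    """Start the scan at min_sample instead of at the oldest key."""
    start = bisect.bisect_left(keys, min_sample)
    return keys[start:]
```

As soon as the oldest stored sample is older than the cutoff, the first version returns an empty list even though newer samples exist. That corresponds to reports stopping once the stored history exceeded 48 hours.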
Sage Weil [Fri, 4 Oct 2019 20:03:02 +0000 (15:03 -0500)]
mgr/devicehealth: factor _get_device_metrics out of show_device_metrics
Add the min_sample lower-bound argument too
Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 7be5c1323b3814e2634d5cd66d45cab5a77df680)
Conflicts: had to be backported to enable backporting of
https://github.com/ceph/ceph/pull/32903
Backport tracker: https://tracker.ceph.com/issues/43873
Tiago Pasqualini [Fri, 31 Jan 2020 18:22:19 +0000 (15:22 -0300)]
rgw: make max_connections configurable in beast
Beast frontend currently accepts a hardcoded number of connections
that is defined by boost::asio::socket_base::max_connections. This
commit makes it configurable via a 'max_connections' config option
on rgw frontend.
Kefu Chai [Mon, 10 Feb 2020 08:27:22 +0000 (16:27 +0800)]
ceph-monstore-tool: rename mon-ids in initial monmap
when ceph-mon starts, it checks to see if it's listed in the monmap; if
not, it complains
```
no public_addr or public_network specified, and mon.a not present in
monmap or ceph.conf.
```
then bails out. normally, the monitor will try to rename itself in the
monmap when performing "mkfs", but in our case, we are merely using the
"mkfs" monmap for passing the monmap built by ceph-monstore-tool, and
we don't actually go through the "mkfs" process. so, ceph-mon won't
rename itself when booting up.
in this change, the user is allowed to specify the mon-ids on the command
line when rebuilding the mondb; the default mon-ids are a,b,c,... if not
specified.
David Zafman [Fri, 6 Dec 2019 17:01:41 +0000 (09:01 -0800)]
test: run-standalone.sh: Only run execs in the subdirectories of qa/standalone
This will ignore scripts placed at the qa/standalone level, though
I'm not sure if we should be putting any tests there. It does
allow support scripts such as ceph-helper.sh to be present without
modifying run-standalone.sh to ignore them.
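The selection rule can be sketched in Python, although the real script is shell. The function name `find_standalone_tests` is hypothetical, and scanning exactly one directory level below the root is an assumption about how deep the script looks.

```python
import os

def find_standalone_tests(root):
    """Return executables found in the immediate subdirectories of root."""
    tests = []
    for entry in sorted(os.scandir(root), key=lambda e: e.name):
        if not entry.is_dir():
            continue  # top-level files, e.g. helper scripts, are ignored
        for sub in sorted(os.scandir(entry.path), key=lambda e: e.name):
            if sub.is_file() and os.access(sub.path, os.X_OK):
                tests.append(os.path.join(entry.name, sub.name))
    return tests
```

A helper script placed directly in the root is never returned, so it does not need an explicit exclusion.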
the head object for a multipart part should contain the entire stripe,
unlike a normal object where the head only contains the first chunk of
data (because it has to be written atomically)
Casey Bodley [Tue, 7 Jan 2020 18:30:51 +0000 (13:30 -0500)]
rgw: remove spawned_keys filter from incremental data sync
the spawned_keys filtering is valid "as long as we don't yield",
according to code comments. however, proper enforcement of the
spawn window necessitates yielding when we exceed that window.
the key-based filtering provided by spawned_keys is actually already
satisfied by the call to marker_tracker->index_key_to_marker(), which
also takes completions (either from try_update_high_marker() or
finish()) into account
Casey Bodley [Tue, 7 Jan 2020 18:28:19 +0000 (13:28 -0500)]
rgw: incremental data sync respects spawn window
RGWReadRemoteDataLogShardCR will fetch up to 1000 entries. in order for
the spawn window to apply correctly, it has to be enforced inside the
loop over those entries
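A rough model of enforcing a window inside the per-entry loop, using asyncio tasks in place of RGW coroutines. The names `process_entries`, `window` and `handle` are hypothetical, and the exact comparison used by the real code is not reproduced here.

```python
import asyncio

async def process_entries(entries, window, handle):
    """Spawn one task per entry, never keeping more than `window` in flight."""
    in_flight = set()
    peak = 0
    for entry in entries:
        # Enforce the limit per entry, not once per fetched batch: a
        # batch may hold far more entries than the window allows.
        while len(in_flight) >= window:
            _, in_flight = await asyncio.wait(
                in_flight, return_when=asyncio.FIRST_COMPLETED)
        in_flight.add(asyncio.create_task(handle(entry)))
        peak = max(peak, len(in_flight))
    if in_flight:
        await asyncio.wait(in_flight)
    return peak
```

If the check were made only between batches, a single batch of 1000 entries could spawn 1000 tasks at once regardless of the configured window.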
ofriedma [Tue, 3 Dec 2019 14:11:35 +0000 (16:11 +0200)]
rgw: Fix dynamic resharding not working for empty zonegroup in period
Sometimes, when a cluster has been upgraded from jewel, the period's zonegroup list can be empty, which disables dynamic resharding.
This fix returns true when the period has fewer than 1 (i.e. 0) zonegroups, so dynamic resharding works in that case.
Fixes: https://tracker.ceph.com/issues/43188
Signed-off-by: Or Friedmann <ofriedma@redhat.com>
(cherry picked from commit a76e4393728c3e74a943b635d2ac0652e0cc092a)