Alex Zhang [Sun, 29 Sep 2019 09:33:58 +0000 (02:33 -0700)]
common: Fix multiple logical errors in get_device_id.
0. If blkdev.serial exists, the serial should be used. The original implementation was wrong: if the serial did not exist, it used whatever was in the uninitialized buffer or, even worse, the value left over from the previous call (the model).
1. When using the fallback methods, a device id should be returned only when both the model and the serial are present. The original implementation did not enforce this, which looks like a logical error.
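A minimal C++ sketch of the two rules above, for illustration only. `prop_reader`, `read_prop()` and `fallback_device_id()` are hypothetical names, not Ceph's API; the readers stand in for `BlkDev::model()`/`BlkDev::serial()` and are assumed to return a negative value on failure. Joining the two values as `<model>_<serial>` is also an assumption of this sketch.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical stand-in for BlkDev::model()/BlkDev::serial(): writes a
// NUL-terminated string into buf and returns >= 0 on success, or a negative
// errno on failure, in which case buf must not be read.
using prop_reader = int (*)(char* buf, std::size_t len);

// Read one property into a fresh buffer and trust it only on success, so a
// missing property can never leak an uninitialized buffer or the previous
// call's value (e.g. the model) into the result.
static std::optional<std::string> read_prop(prop_reader reader) {
  char buf[1024];
  if (reader(buf, sizeof(buf)) < 0) {
    return std::nullopt;
  }
  buf[sizeof(buf) - 1] = '\0';  // defensive: guarantee termination
  return std::string(buf);
}

// Fallback path: build an id only when BOTH model and serial are present.
// The "<model>_<serial>" join format is an assumption of this sketch.
std::string fallback_device_id(prop_reader model_reader,
                               prop_reader serial_reader) {
  const auto model = read_prop(model_reader);
  const auto serial = read_prop(serial_reader);
  if (!model || !serial || model->empty() || serial->empty()) {
    return {};  // empty string: no usable device id
  }
  return *model + "_" + *serial;
}
```

Returning an empty string rather than a partial id keeps callers from treating "model only" as a unique identifier.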
Kefu Chai [Wed, 29 May 2019 09:45:35 +0000 (17:45 +0800)]
common/blkdev.cc: check retval of snprintf()
as the string written by snprintf() could be truncated, we need to check
its return value to use this function properly. this also silences
warnings like:
../src/common/blkdev.cc: In member function ‘int64_t BlkDev::get_string_property(blkdev_prop_t, char*, size_t) const’:
../src/common/blkdev.cc:165:15: warning: ‘%s’ directive output may be truncated writing up to 4095 bytes into a region of size between 4085 and 4089 [-Wformat-truncation=]
  165 | "%s/block/%s/%s", sysfsdir(), dev, propstr);
      | ^~
In file included from /usr/include/stdio.h:873,
                 from /usr/include/c++/9/cstdio:42,
                 from /usr/include/c++/9/ext/string_conversions.h:43,
                 from /usr/include/c++/9/bits/basic_string.h:6493,
                 from /usr/include/c++/9/string:55,
                 from /usr/include/c++/9/bits/locale_classes.h:40,
                 from /usr/include/c++/9/bits/ios_base.h:41,
                 from /usr/include/c++/9/ios:42,
                 from /usr/include/c++/9/ostream:38,
                 from /usr/include/c++/9/iterator:64,
                 from /opt/ceph/include/boost/iterator/iterator_traits.hpp:10,
                 from /opt/ceph/include/boost/range/iterator_range_core.hpp:26,
                 from /opt/ceph/include/boost/algorithm/string/replace.hpp:16,
                 from ../src/common/blkdev.cc:31:
/usr/include/x86_64-linux-gnu/bits/stdio2.h:67:35: note: ‘__builtin___snprintf_chk’ output 9 or more bytes (assuming 4108) into a destination of size 4096
   67 | return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
      | ~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   68 | __bos (__s), __fmt, __va_arg_pack ());
      | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Venky Shankar [Wed, 26 Feb 2020 04:52:37 +0000 (23:52 -0500)]
mgr/volumes: unregister job upon async threads exception
If an async thread hits a temporary exception, the job is
never unregistered and is therefore skipped by the async
threads on subsequent scans.
Patrick hit this in nautilus when one of the purge threads
hit an exception when trying to log a message. The trash
entry was never picked up again by the purge threads.
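The fix boils down to unregistering the job on every exit path, including when the job raises. mgr/volumes is written in Python, where the natural equivalent would be try/finally; the C++ sketch below shows the same idea with a scope guard. `JobRegistry` and `run_job()` are hypothetical names, not the module's API.

```cpp
#include <functional>
#include <mutex>
#include <set>
#include <string>

// Hypothetical registry of jobs currently owned by some async thread.
class JobRegistry {
 public:
  // Returns false if the job is already registered (another thread owns it).
  bool try_register(const std::string& job) {
    std::lock_guard<std::mutex> l(lock_);
    return jobs_.insert(job).second;
  }
  void unregister(const std::string& job) {
    std::lock_guard<std::mutex> l(lock_);
    jobs_.erase(job);
  }
  bool is_registered(const std::string& job) const {
    std::lock_guard<std::mutex> l(lock_);
    return jobs_.count(job) != 0;
  }

 private:
  mutable std::mutex lock_;
  std::set<std::string> jobs_;
};

// Run one job. The guard unregisters it on normal return AND during stack
// unwinding, so a later scan can pick the same entry up again.
bool run_job(JobRegistry& reg, const std::string& job,
             const std::function<void(const std::string&)>& work) {
  if (!reg.try_register(job)) {
    return false;
  }
  struct Unregister {
    JobRegistry& reg;
    const std::string& job;
    ~Unregister() { reg.unregister(job); }
  } guard{reg, job};
  work(job);  // may throw; the guard still runs
  return true;
}
```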
Patrick Donnelly [Tue, 25 Feb 2020 04:26:30 +0000 (20:26 -0800)]
Merge PR #33526 into nautilus
* refs/pull/33526/head:
test: verify purge queue w/ large number of subvolumes
test: pass timeout argument to mount::wait_for_dir_empty()
mgr/volumes: access volume in lockless mode when fetching async job
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Venky Shankar [Wed, 19 Feb 2020 12:31:40 +0000 (07:31 -0500)]
mgr/volumes: access volume in lockless mode when fetching async job
Saw a deadlock when deleting a lot of subvolumes: purge threads were
stuck acquiring the global lock for volume access. This can happen
when there is a concurrent remove (which renames and signals the
purge threads) and a purge thread is just about to scan the trash
directory for entries.
For the fix, purge threads fetch entries by accessing the volume
in lockless mode. This is safe from a functional point of view, as
the rename and the directory scan are correctly handled by the
filesystem. In the worst case, the purge thread picks up the trash
entry on the next scan, so a stale trash entry is never left behind.
Yaarit Hatuka [Mon, 27 Jan 2020 13:57:55 +0000 (08:57 -0500)]
mgr/devicehealth: fix telemetry stops sending device reports after 48 hours
The telemetry module fetches device metrics that were scraped within the
last "telemetry interval"*2 (48 hours by default) by calling
_get_device_metrics() with min_sample. _get_device_metrics() fetches the
metrics from omap and breaks on the first one that is older than
min_sample. But because it fetched them in ascending order (oldest to
newest), it broke on the very first value it received whenever that
value was older than the interval above, so no metrics were returned.
We need to pass min_sample to get_omap_vals() so that it starts fetching
from that value.
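A sketch of the difference, assuming omap keys are fixed-width timestamp strings so that lexicographic order matches time order (the key format and both function names are hypothetical). A `std::map` stands in for the omap; starting the scan at `lower_bound(min_sample)` roughly plays the role of passing `min_sample` as the starting key to `get_omap_vals()`.

```cpp
#include <map>
#include <string>
#include <vector>

// Stand-in for a device's omap: timestamp key -> metrics blob, kept sorted.
using Omap = std::map<std::string, std::string>;

// Buggy shape: iterate oldest -> newest and stop at the first sample older
// than min_sample. If the oldest stored sample is outside the window, the
// loop breaks immediately and nothing is returned.
std::vector<std::string> metrics_buggy(const Omap& omap,
                                       const std::string& min_sample) {
  std::vector<std::string> out;
  for (const auto& [key, val] : omap) {
    if (key < min_sample) {
      break;
    }
    out.push_back(val);
  }
  return out;
}

// Fixed shape: begin the scan at min_sample, so every returned sample is
// already inside the window.
std::vector<std::string> metrics_fixed(const Omap& omap,
                                       const std::string& min_sample) {
  std::vector<std::string> out;
  for (auto it = omap.lower_bound(min_sample); it != omap.end(); ++it) {
    out.push_back(it->second);
  }
  return out;
}
```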
Sage Weil [Fri, 4 Oct 2019 20:03:02 +0000 (15:03 -0500)]
mgr/devicehealth: factor _get_device_metrics out of show_device_metrics
Add the min_sample lower-bound argument too
Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 7be5c1323b3814e2634d5cd66d45cab5a77df680)
Conflicts: had to be backported to enable backporting of
https://github.com/ceph/ceph/pull/32903
Backport tracker: https://tracker.ceph.com/issues/43873
Tiago Pasqualini [Fri, 31 Jan 2020 18:22:19 +0000 (15:22 -0300)]
rgw: make max_connections configurable in beast
The Beast frontend currently accepts a hardcoded number of connections,
defined by boost::asio::socket_base::max_connections. This
commit makes it configurable via a 'max_connections' config option
on the rgw frontend.
Kefu Chai [Mon, 10 Feb 2020 08:27:22 +0000 (16:27 +0800)]
ceph-monstore-tool: rename mon-ids in initial monmap
when ceph-mon starts, it checks whether it is listed in the monmap; if
not, it complains:
```
no public_addr or public_network specified, and mon.a not present in
monmap or ceph.conf.
```
and then bails out. normally, the monitor renames itself in the monmap
when performing "mkfs", but in our case we are merely using the "mkfs"
monmap to pass in the monmap built by ceph-monstore-tool; we don't
actually go through the "mkfs" process, so ceph-mon won't rename itself
when booting up.
in this change, the user is allowed to specify the mon-ids on the
command line when rebuilding the mondb; if they are not specified, the
default mon-ids are a, b, c, ...
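A minimal sketch of the id selection, assuming the rebuilt monmap has `num_mons` entries to name. `pick_mon_ids()` is a hypothetical helper, not ceph-monstore-tool's code; rejecting a count mismatch and capping the defaults at 26 single letters are simplifications of this sketch.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Return the ids to write into the rebuilt monmap: the user's ids if given,
// otherwise "a", "b", "c", ... (one per monitor).
std::vector<std::string> pick_mon_ids(std::size_t num_mons,
                                      const std::vector<std::string>& user_ids) {
  if (!user_ids.empty()) {
    // Assumption of this sketch: one user-supplied id per monitor.
    if (user_ids.size() != num_mons) {
      throw std::invalid_argument("number of mon ids does not match number of monitors");
    }
    return user_ids;
  }
  if (num_mons > 26) {
    throw std::invalid_argument("too many monitors for single-letter default ids");
  }
  std::vector<std::string> ids;
  for (std::size_t i = 0; i < num_mons; ++i) {
    ids.push_back(std::string(1, static_cast<char>('a' + i)));
  }
  return ids;
}
```

The ids chosen here must match the names the ceph-mon daemons are started with; otherwise each daemon fails the monmap check shown above.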
David Zafman [Fri, 6 Dec 2019 17:01:41 +0000 (09:01 -0800)]
test: run-standalone.sh: Only run execs in the subdirectories of qa/standalone
This ignores scripts placed at the qa/standalone level, though
I'm not sure we should be putting any tests there. It does
allow support scripts such as ceph-helper.sh to live there without
modifying run-standalone.sh to ignore them.
the head object for a multipart part should contain the entire stripe,
unlike a normal object, whose head only contains the first chunk of
data (because it has to be written atomically).
Casey Bodley [Tue, 7 Jan 2020 18:30:51 +0000 (13:30 -0500)]
rgw: remove spawned_keys filter from incremental data sync
the spawned_keys filtering is valid "as long as we don't yield",
according to code comments. however, proper enforcement of the
spawn window necessitates yielding when we exceed that window.
the key-based filtering provided by spawned_keys is actually already
satisfied by the call to marker_tracker->index_key_to_marker(), which
also takes completions (either from try_update_high_marker() or
finish()) into account.
Casey Bodley [Tue, 7 Jan 2020 18:28:19 +0000 (13:28 -0500)]
rgw: incremental data sync respects spawn window
RGWReadRemoteDataLogShardCR fetches up to 1000 entries. in order for
the spawn window to apply correctly, it has to be enforced inside the
loop over those entries.
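A simplified, single-threaded model of why the window has to be enforced per entry. `Spawner` is a hypothetical stand-in for a coroutine's spawned children (`spawn()` starts one, `wait_one()` waits for the oldest); it is not RGW's coroutine API. Checking the window once per fetched batch lets the whole batch run at once, while checking it inside the loop caps concurrency at the window.

```cpp
#include <algorithm>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical stand-in for a coroutine's spawned children.
struct Spawner {
  std::deque<int> in_flight;  // entries whose child is still running
  std::size_t peak = 0;       // most children ever in flight at once

  void spawn(int entry) {
    in_flight.push_back(entry);
    peak = std::max(peak, in_flight.size());
  }
  // Wait for the oldest child to finish (caller ensures one is in flight).
  void wait_one() { in_flight.pop_front(); }
  void drain() {
    while (!in_flight.empty()) {
      wait_one();
    }
  }
};

// A window of 0 would never admit a spawn; treat it as 1.
static std::size_t clamp_window(std::size_t window) {
  return window == 0 ? 1 : window;
}

// Window checked once per fetched batch: every entry in the batch is
// spawned before anything is waited on.
void process_batch_outside(Spawner& s, const std::vector<int>& entries,
                           std::size_t window) {
  const std::size_t limit = clamp_window(window);
  while (s.in_flight.size() >= limit) {
    s.wait_one();
  }
  for (int e : entries) {
    s.spawn(e);
  }
}

// Window enforced inside the loop: before each spawn, wait for a free slot.
void process_batch_inside(Spawner& s, const std::vector<int>& entries,
                          std::size_t window) {
  const std::size_t limit = clamp_window(window);
  for (int e : entries) {
    while (s.in_flight.size() >= limit) {
      s.wait_one();
    }
    s.spawn(e);
  }
}
```

With a window of 20 and a 1000-entry batch, `peak` ends at 1000 for `process_batch_outside()` and at 20 for `process_batch_inside()`.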
ofriedma [Tue, 3 Dec 2019 14:11:35 +0000 (16:11 +0200)]
rgw: Fix dynamic resharding not working for empty zonegroup in period
Sometimes, when a cluster has been upgraded from jewel, the period's zonegroup list can be empty, so dynamic resharding is disabled.
This fix makes the check return true when the period has fewer than one (i.e. 0) zonegroups.
Fixes: https://tracker.ceph.com/issues/43188
Signed-off-by: Or Friedmann <ofriedma@redhat.com>
(cherry picked from commit a76e4393728c3e74a943b635d2ac0652e0cc092a)
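A sketch of the predicate change, with hypothetical function names. It assumes, as the commit message reads, that the pre-fix check required exactly one zonegroup in the period; it does not reproduce RGW's actual code.

```cpp
#include <cstddef>

// Assumed pre-fix shape: "single zonegroup" meant exactly one, so a period
// carried over from jewel with an empty zonegroup list failed the check.
bool single_zonegroup_before_fix(std::size_t num_zonegroups) {
  return num_zonegroups == 1;
}

// Post-fix shape: fewer than one (i.e. zero) zonegroups is treated like one.
bool single_zonegroup_after_fix(std::size_t num_zonegroups) {
  return num_zonegroups <= 1;
}
```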
The lvm batch command fails to prepare the OSDs on the created LV.
When using lvm batch, the LV/VG are created prior to the OSD prepare.
During that creation, multiple tags are set with a null value.