xie xingguo [Thu, 12 Mar 2020 23:59:07 +0000 (07:59 +0800)]
qa/short_pg_log: pass osd_pg_log_trim_min = 0 to exercise short pg logs
We set osd_min_pg_log_entries to 2 (good) but not osd_pg_log_trim_min,
which defaults to 100. Thus, even in those tests we're only rarely vulnerable.
Reset osd_pg_log_trim_min to 0 to make sure we really keep a minimal
pg log on hand.
xie xingguo [Thu, 12 Mar 2020 10:01:45 +0000 (18:01 +0800)]
osd/PeeringState: do not trim pg log past last_update_ondisk
Trimming past last_update_ondisk would be really bad: e.g., a new
interval change could cancel and redo a previous op, and if we had
trimmed past last_update_ondisk, that could lead to object
inconsistencies, as log merging won't necessarily be able to find all
divergent entries later (we lost track of the unfinished op that should
really be reverted).
Kefu Chai [Wed, 6 May 2020 07:48:12 +0000 (15:48 +0800)]
qa/suites/upgrade: disable min pg per osd warning
Disable the TOO_FEW_PGS warning: since 1ac34a5ea3d1aca299b02e574b295dd4bf6167f4 is not backported
to mimic, we would otherwise see TOO_FEW_PGS warnings when a healthy
cluster is expected while upgrading from mimic.
This change disables the warning by setting "mon_pg_warn_min_per_osd" to
"0".
This change is not cherry-picked from master, as 1ac34a5ea3d1aca299b02e574b295dd4bf6167f4 is already
included in master, and we don't perform upgrades from mimic on the
master branch.
qa/tasks: do not cancel pending pg num changes on mimic
mimic does not support auto split/merge, but we do test mimic-x on
nautilus, which ends up with failures like:
Traceback (most recent call last):
File "/home/teuthworker/src/git.ceph.com_git_teuthology_py2/teuthology/contextutil.py", line 34, in nested
yield vars
File "/home/teuthworker/src/git.ceph.com_ceph_nautilus/qa/tasks/ceph.py", line 1928, in task
ctx.managers[config['cluster']].stop_pg_num_changes()
File "/home/teuthworker/src/git.ceph.com_ceph_nautilus/qa/tasks/ceph_manager.py", line 1806, in stop_pg_num_changes
if pool['pg_num'] != pool['pg_num_target']:
KeyError: 'pg_num_target'
So we need to skip this check if 'pg_num_target' is not in pg_pool_t::dump().
This change is not cherry-picked from master, as we don't test
mimic-x on master.
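A minimal sketch of the kind of guard this adds (a sketch only; the
structure and names approximate qa/tasks/ceph_manager.py rather than
copy it):

    def pools_with_pending_pg_num_changes(osd_dump):
        """Return pools whose pg_num has not yet reached its target."""
        pending = []
        for pool in osd_dump['pools']:
            # mimic's pg_pool_t::dump() does not emit 'pg_num_target',
            # so skip the comparison instead of raising KeyError
            if 'pg_num_target' not in pool:
                continue
            if pool['pg_num'] != pool['pg_num_target']:
                pending.append(pool['pool_name'])
        return pending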
Also, mon_pg_warn_min_per_osd is disabled by default now (or set to a
low value in vstart/testing) so there's no need to base the pg count on
this value.
Ideally someday we can remove this so that the default cluster value is
used, but we need to keep it for deployments of older versions of Ceph.
Fixes: https://tracker.ceph.com/issues/42228
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit fc88e6c6c55402120a432ea47f05f321ba4c9bb1)
Conflicts:
qa/tasks/cephfs/filesystem.py: this commit was originally
backported by #34055, but it failed to cherry-pick all necessary
bits. In this change, the missing bit is picked up.
Ramana Raja [Tue, 14 Apr 2020 11:13:33 +0000 (16:43 +0530)]
mon/FSCommands: Fix 'fs new' command
After creating a filesystem using the 'fs new' command, the 'data'
key of the data pool's application metadata 'cephfs' and the
'metadata' key of the metadata pool's application metadata 'cephfs'
should be set to the filesystem's name. This didn't happen when the
data or metadata pool's application metadata 'cephfs' was enabled
before the pool was used in the 'fs new' command. Fix this during the
handling of the 'fs new' command by setting the relevant key of the
pool's application metadata 'cephfs' to the filesystem's name even
when the application metadata 'cephfs' is already enabled or set.
Ramana Raja [Sat, 11 Apr 2020 07:15:39 +0000 (12:45 +0530)]
mon/FSCommands: Fix 'add_data_pool' command
After making a RADOS pool a filesystem's data pool using the
'add_data_pool' command, the value of the 'data' key of the pool's
application metadata 'cephfs' should be the filesystem's name. This
didn't happen when the pool's application metadata 'cephfs' was
enabled before the pool was made the data pool. Fix this during the
handling of the 'add_data_pool' command by setting the value of
the 'data' key of the pool's application metadata 'cephfs' to the
filesystem's name even when the application metadata 'cephfs' is
already enabled or set.
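As a quick way to check the behaviour these two fixes describe, assuming
a running cluster with a filesystem named 'cephfs' and a data pool named
'cephfs_data' (example names only, not part of the change):

    import json
    import subprocess

    # 'ceph osd pool application get <pool>' dumps the pool's application
    # metadata; after the fix the 'cephfs' application should carry the
    # filesystem's name under the 'data' (or 'metadata') key.
    out = subprocess.check_output(
        ['ceph', 'osd', 'pool', 'application', 'get', 'cephfs_data',
         '--format', 'json'])
    app = json.loads(out)
    assert app.get('cephfs', {}).get('data') == 'cephfs'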
Yan, Zheng [Tue, 31 Mar 2020 03:29:47 +0000 (11:29 +0800)]
ceph-fuse: don't get mount options from /etc/fstab when doing remount
If there happens to be a kcephfs entry in /etc/fstab for ceph-fuse's
mount point, 'mount -o remount' may pick up options from that entry.
fuse may not understand some of those options (e.g. the 'name' option).
Jason Dillaman [Wed, 11 Mar 2020 19:11:10 +0000 (15:11 -0400)]
qa/workunits/rbd: wait for nbd map to close after unmap
The unmap action only sends a signal to the kernel to notify the
rbd-nbd daemon to disconnect. Therefore, it's possible that an
unmap followed by an immediate re-map to the same device might
fail since the unmap is still in-progress.
Fixes: https://tracker.ceph.com/issues/44567
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit 80a3f18cafb4add1624cc690bba436a1284dc634)
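The workunit itself is a shell script; purely as an illustration, a
Python sketch of the kind of wait the test now performs after an unmap
(device name and timeout are examples):

    import subprocess
    import time

    def wait_for_unmap(device, timeout=30):
        """Poll 'rbd-nbd list-mapped' until `device` is no longer listed."""
        for _ in range(timeout):
            mapped = subprocess.run(['rbd-nbd', 'list-mapped'],
                                    capture_output=True, text=True).stdout
            if device not in mapped:
                return
            time.sleep(1)
        raise RuntimeError('%s is still mapped after unmap' % device)

    wait_for_unmap('/dev/nbd0')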
rgw: reshard: skip stale bucket id entries from reshard queue
If we encounter a reshard queue entry whose bucket ID is older than the
bucket's current ID, it means that some other process or a manual reshard
has already processed this entry, so skip it this time. An alternative
would be to verify that the num_shards we have in the queue is >= the
current shard count, but that would mean we might reshard a recently
manually resharded bucket again, which might not be intended.
Venky Shankar [Fri, 27 Mar 2020 04:00:08 +0000 (00:00 -0400)]
mgr: force purge normal ceph entities from service map
Normal ceph services can send task status updates to the manager.
Task status is tracked in the service map, implying that normal
ceph services have entries in the service map and the daemon tracking
index (daemon state). But the manager prunes entries from daemon
state when it receives an updated map (fs, mon, etc...). This
causes periodic pruning of service map entries to fail for normal
ceph services (those which send task status updates), since it
expects a corresponding entry in daemon state.
Jan Fajerski [Wed, 4 Mar 2020 10:39:40 +0000 (11:39 +0100)]
ceph-volume: available_lvm: vg space takes precedence
This changes available_lvm to check for generic reasons only if no VGs
were found. A VG can contain a (mounted) lv, which triggers the
ro/locked test, despite the VG having space available.
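A rough sketch of the precedence this establishes; the helper names and
attributes here are hypothetical, not ceph-volume's actual internals:

    def available_lvm(device):
        """Is `device` usable for an LVM-based OSD, and if not, why not?"""
        reasons = []
        if device.vgs:
            # VG space takes precedence: a mounted LV may make the device
            # look ro/locked, but free extents in the VG still make it usable
            if any(vg.free_extents > 0 for vg in device.vgs):
                return True, reasons
            reasons.append('Insufficient space in existing VGs')
            return False, reasons
        # only fall back to the generic checks when no VGs were found
        if device.read_only:
            reasons.append('Device is read-only')
        if device.locked:
            reasons.append('Device is locked')
        return not reasons, reasons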
Volker Theile [Wed, 6 Nov 2019 14:02:49 +0000 (15:02 +0100)]
mgr/dashboard: Refactor Python unittests
* Make use of the KVStoreMockMixin class to get rid of duplicate code.
* Fake the index.html file to be able to run tests/test_home.py locally without building the frontend in production mode.
* Encapsulate helper functions in controllers/home.py, otherwise tests/test_feature_toggles.py would need to fake the filesystem because load_controllers() will load the home.py controller and fail due to missing files in the filesystem.
In tasks/module_selftest.yaml, `TestModuleSelftest.test_telegraf()` is
called, but we fail to prepare a unix domain socket to which the telegraf
module can send stats. The telegraf module does not catch the
FileNotFoundError exception, so the exception propagates to ceph-mgr
and is found by the test; hence the test is marked as a failure whenever
telegraf is tested.
In this change:
* catch this exception, so it won't be caught by ceph-mgr
* whitelist the error message, so the test can pass
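A minimal sketch of the shape of the module-side fix, assuming a
datagram unix socket and a standalone sender (the real mgr/telegraf code
is organized differently):

    import logging
    import socket

    log = logging.getLogger(__name__)

    def send_to_telegraf(path, payload):
        """Best-effort send: a missing socket is logged, not propagated."""
        try:
            with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
                sock.connect(path)
                sock.send(payload)
        except FileNotFoundError:
            # telegraf is not listening on `path`; warn instead of letting
            # the exception reach ceph-mgr
            log.warning('telegraf socket %s does not exist', path)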
Tim Serong [Thu, 2 Apr 2020 05:46:57 +0000 (16:46 +1100)]
mgr/PyModule: fix missing tracebacks in handle_pyerror()
In certain cases, errors raised in mgr modules don't actually result in a
proper traceback in the mgr log; all you see is a message like "'Hello'
object has no attribute 'dneasdfasdf'", but you have no idea where that
came from, which is a complete PITA to debug.
Here's what's going on: handle_pyerror() calls PyErr_Fetch() to get
information about the error that occurred, then passes that information
back to python's traceback.format_exception() function to get the traceback.
If we write code in an mgr module that explicitly raises an exception
(e.g.: 'raise RuntimeError("that didn't work")'), the error value returned
by PyErr_Fetch() is of type RuntimeError, and traceback.format_exception()
does the right thing. If however we accidentally write code that's just
broken (e.g.: 'self.dneasdfasdf += 1'), the error value returned is not
an actual exception, it's just a string. So traceback.format_exception()
freaks out with something like "'str' object has no attribute '__cause__'"
(which we don't actually ever see in the logs), which in turn dumps us in a
"catch (error_already_set const &)" block, which just prints out the
single line error string.
https://docs.python.org/3/c-api/exceptions.html#c.PyErr_NormalizeException
tells us that "Under certain circumstances, the values returned by
PyErr_Fetch() below can be “unnormalized”, meaning that *exc is a class
object but *val is not an instance of the same class.". And that's exactly
the problem we're having here. We're getting a 'str', not an Exception.
Adding a call to PyErr_NormalizeException() turns the value back into a
proper Exception type and traceback.format_exception() now always does the
right thing.
I've also added calls to peek_pyerror() in the catch blocks, so if anything
else ever somehow causes traceback.format_exception to fail, we'll at least
have an idea of what it is in the log.
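The failure mode can be reproduced from plain Python (exact error text
varies by Python version): handing traceback.format_exception an
"unnormalized", string-valued error is effectively what handle_pyerror()
was doing:

    import traceback

    try:
        # value is a plain str, not an exception instance, mimicking the
        # "unnormalized" result of PyErr_Fetch()
        traceback.format_exception(RuntimeError, "that didn't work", None)
    except AttributeError as e:
        print(e)   # typically: 'str' object has no attribute '__cause__'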
Sage Weil [Wed, 11 Sep 2019 22:26:52 +0000 (17:26 -0500)]
mon: disable min pg per osd warning
Now that the pg_autoscaler is on by default, it is "normal" (and okay) to
have a small number of PGs in the cluster if the overall cluster usage is
also low. This setting just results in a health warning out of the box
when you create a pool and haven't written any data yet.
J. Eric Ivancich [Fri, 31 Jan 2020 20:01:40 +0000 (15:01 -0500)]
rgw: fix bug with (un)ordered bucket listing and marker w/ namespace
When listing without specifying a namespace, the returned entries
could be in one or more namespaces. The marker used to continue the
listing may therefore contain a namespace, and that needs to be
preserved. This fixes a bug in both ordered and unordered listings
where it was not preserved.
Adam Kupczyk [Tue, 17 Mar 2020 13:56:58 +0000 (14:56 +0100)]
os/bluestore: open db in read-only when expanding standalone DB/WAL
Replace mount() with cold_open() to force opening db in read-only mode.
Signed-off-by: Adam Kupczyk <akupczyk@redhat.com>
Signed-off-by: Igor Fedotov <ifedotov@suse.com>
(cherry picked from commit 976079216cec202d47b7193d19b16f5a8578f269)
Conflicts:
src/os/bluestore/BlueStore.cc
Caused by the lack of https://github.com/ceph/ceph/pull/30109
Igor Fedotov [Mon, 16 Mar 2020 13:40:54 +0000 (16:40 +0300)]
os/bluestore: show space available from bluestore when dumping bluefs devices.
This effectively allows ceph-bluestore-tool to report space available from bluestore
for the bluefs-bdev-sizes command, i.e. to estimate how utilized the main device is.
Shilpa Jagannath [Tue, 26 Nov 2019 08:03:52 +0000 (13:33 +0530)]
rgw: when a period lookup for oldest_realm_epoch returns an ENOENT,
find the oldest one and update RGWMetadataLogHistory. This is to avoid an
empty cursor being passed in to ceph_assert() in PurgePeriodLogsCR::operate()
in case of incomplete period history.