Kefu Chai [Tue, 25 May 2021 06:31:02 +0000 (14:31 +0800)]
mon/OSDMonitor: drop stale failure_info even if can_mark_down()
in a124ee85b03e15f4ea371358008ecac65f9f4e50, we added a check to drop
stale failure_info reports. but if the osdmap does not prohibit us from
marking the osd in question down, the branch checking for stale info
is not executed. in general, marking an osd down is allowed, so the
fix in a124ee85b03e15f4ea371358008ecac65f9f4e50 does not take effect
in practice.
in this change, we check for stale failure reports on the osd in
question as long as the osd is not marked down in the same function.
this should address the issue of failure reports showing up as slow ops.
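The control-flow change can be illustrated with a small Python model. It is a sketch only: `FailureInfo`, `check_failure`, `min_reporters` and `stale_after` are hypothetical stand-ins, not the actual OSDMonitor C++ code or its config options.

```python
from dataclasses import dataclass, field


@dataclass
class FailureInfo:
    # reporter name -> time (in seconds) its failure report was received
    reports: dict = field(default_factory=dict)


def check_failure(info, now, can_mark_down, min_reporters, stale_after):
    """Toy model of the monitor's per-osd failure check.

    Returns True if the target osd gets marked down in this call.
    """
    if can_mark_down and len(info.reports) >= min_reporters:
        return True  # marked down; the new osdmap acks the reports
    # The fix: prune stale reports whenever the osd was not marked down
    # here, instead of only in the branch where can_mark_down is False.
    for reporter, reported_at in list(info.reports.items()):
        if now - reported_at > stale_after:
            del info.reports[reporter]
    return False


info = FailureInfo({"osd.1": 0.0, "osd.2": 95.0})
marked = check_failure(info, now=100.0, can_mark_down=True,
                       min_reporters=3, stale_after=30.0)
print(marked, info.reports)  # False {'osd.2': 95.0}
```

Before the fix, the same call with `can_mark_down=True` would have left the report from `osd.1` in place indefinitely.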
client: Fix executable access check for the root user
The executable permission check always returned success for root
even when the executable bit was not set for any of user, group
or others. This patch fixes it by granting root execute permission
only if at least one of the executable bits is set.
Conflicts:
src/client/Client.cc: the commit 6aa78836548f (cephfs errno aliases) is not present in
nautilus, and there are some other trivial conflicts, possibly because some patches are
missing in nautilus.
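A minimal Python sketch of the corrected rule for regular files. `may_execute` is hypothetical and is not the Client.cc code; it ignores supplementary groups, ACLs and directory search permission.

```python
import stat

ANY_EXEC = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH


def may_execute(uid, gid, mode, file_uid, file_gid):
    """Simplified execute-permission check for a regular file."""
    if uid == 0:
        # root skips the owner/group/other selection, but may only
        # execute if at least one execute bit is set somewhere
        return bool(mode & ANY_EXEC)
    if uid == file_uid:
        return bool(mode & stat.S_IXUSR)
    if gid == file_gid:
        return bool(mode & stat.S_IXGRP)
    return bool(mode & stat.S_IXOTH)


print(may_execute(0, 0, 0o644, 1000, 1000))  # False: no x bit anywhere
print(may_execute(0, 0, 0o744, 1000, 1000))  # True: owner x bit is set
```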
we should not proceed against the user's will if dual stack is specified
but networks for only one address family can be found. the right fix is
to provide a better error message and documentation, not to tolerate the
failure.
Matthew Oliver [Mon, 10 Aug 2020 04:46:21 +0000 (04:46 +0000)]
pick_address: Warn and continue when you find at least 1 IPv4 or IPv6 address
Currently, if you specify a single public or cluster network yet have both
`ms bind ipv4` and `ms bind ipv6` set, daemons crash when they can't find
both IPs from the same network:
unable to find any IPv4 address in networks '2001:db8:11d::/120' interfaces ''
And rightly so: of course it can't find an IPv4 address in an IPv6
network.
This patch adds a new helper method, networks_address_family_coverage,
that takes the list of networks and returns a bitmap of the address
families supported.
We then check whether we have enough networks defined; if we don't,
it warns and then continues.
Also update the network-config-ref to mention having to define addresses
for both address families for cluster and/or public networks,
as well as a warning about `ms bind ipv4` being enabled by default, which
is easy to miss, thereby enabling dual stack when you may only be
expecting single-stack IPv6.
There is also a drive-by fix for a `note` that wasn't being displayed due
to missing RST syntax.
Signed-off-by: Matthew Oliver <moliver@suse.com>
Fixes: https://tracker.ceph.com/issues/46845
Fixes: https://tracker.ceph.com/issues/39711
(cherry picked from commit 9f75dfbf364f5140b3f291e0a2c6769bc3d8cbac)
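A Python sketch of the idea behind `networks_address_family_coverage`, using the standard `ipaddress` module. The bit values, `missing_families`, and the warning text are illustrative assumptions, not Ceph's C++ implementation.

```python
import ipaddress

# Hypothetical bit values for the coverage bitmap.
FAMILY_IPV4 = 1 << 0
FAMILY_IPV6 = 1 << 1


def networks_address_family_coverage(networks):
    """Return a bitmap of the address families found in `networks`."""
    coverage = 0
    for net in networks:
        version = ipaddress.ip_network(net, strict=False).version
        coverage |= FAMILY_IPV4 if version == 4 else FAMILY_IPV6
    return coverage


def missing_families(networks, bind_ipv4, bind_ipv6):
    """Families requested via the bind options but absent from `networks`."""
    wanted = (FAMILY_IPV4 if bind_ipv4 else 0) | (FAMILY_IPV6 if bind_ipv6 else 0)
    return wanted & ~networks_address_family_coverage(networks)


missing = missing_families(["2001:db8:11d::/120"], bind_ipv4=True, bind_ipv6=True)
if missing & FAMILY_IPV4:
    print("warning: no IPv4 network configured; continuing with IPv6 only")
```

Using a bitmap lets a single `&` against the requested families report every missing family at once.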
Dan van der Ster [Thu, 29 Apr 2021 23:06:17 +0000 (01:06 +0200)]
mgr/progress: ensure progress stays between [0,1]
If _original_pg_count is 0 then progress can be negative.
Fixes: https://tracker.ceph.com/issues/50591
Related-to: https://tracker.ceph.com/issues/50587
Signed-off-by: Dan van der Ster <daniel.vanderster@cern.ch>
(cherry picked from commit 20990a94598d0249745e2ec25c9197d842119d92)
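The guard can be sketched as a clamp. The progress formula and the treatment of `original_pg_count == 0` below are assumptions for illustration, not the mgr/progress code.

```python
def clamp_progress(value):
    """Clamp a progress fraction into the closed interval [0.0, 1.0]."""
    return max(0.0, min(1.0, value))


def pg_progress(original_pg_count, unfinished_pg_count):
    # Assumed formula, for illustration only.
    if original_pg_count == 0:
        return 1.0  # assumption: nothing to wait for counts as complete
    return clamp_progress(1.0 - unfinished_pg_count / original_pg_count)


print(pg_progress(4, 1))  # 0.75
print(pg_progress(2, 5))  # 0.0 instead of -1.5
```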
os/FileStore: fix to handle readdir error correctly
Currently, FileStore code does not handle readdir errors.
As readdir(3) says, readdir() returns NULL both at the end of the
directory stream and on error, so we need to set errno to 0 before
the call and check errno after it returns NULL to tell the two apart.
This patch fixes all readdir() calls to check errno and
handle errors appropriately:
- FileStore.cc ... abort if an EIO error happens
- BtrfsFileStoreBackend.cc/LFNIndex.cc
  ... return the error to the upper layer
Without this fix, the primary PG could fail to correctly perform a
backfill operation, which could lead to the data loss propagation
described in #50558.
Kefu Chai [Thu, 11 Mar 2021 13:13:13 +0000 (21:13 +0800)]
mon/OSDMonitor: drop stale failure_info
failure_info keeps strong references to the MOSDFailure messages
sent by osds or peon monitors. whenever the monitor starts to handle
an MOSDFailure message, it registers the message in its OpTracker, and
the failure report message is unregistered when the monitor acks it,
either by canceling it or by replying to the reporters with a new
osdmap marking the target osd down. but if neither happens, the
failure reports just pile up in OpTracker, the monitor considers
them slow ops, and they are reported as a SLOW_OPS health warning.
in theory, it does not take long to mark an unresponsive osd down if
we have enough reporters. but there is a chance that a reporter fails
to cancel its report before it reboots, and the monitor also fails
to collect enough reports to mark the target osd down. in that case
the target osd never gets an osdmap marking it down, so it won't send
an alive message to the monitor to fix this.
in this change, we check for stale failure info in tick() and
simply drop the stale reports, so the messages can be released and
marked "done".
since this adds a call to trim failures inside the loop, which mutates
failure_info while we are still iterating over this map, the loop has
to be restructured a little.
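The loop restructuring can be illustrated in Python, where deleting from a dict while iterating over it fails loudly. The `failure_info` layout and `drop_stale_failures` are hypothetical simplifications, and removing an osd entry once all its reports are gone is an assumption about what trimming does.

```python
def drop_stale_failures(failure_info, now, stale_after):
    """Drop stale reports from failure_info, in place.

    failure_info maps target osd -> {reporter: report time}; this layout
    is a hypothetical simplification.
    """
    # Iterate over a snapshot of the keys: the loop body deletes entries,
    # and deleting from a dict while iterating over it raises RuntimeError
    # (the C++ analogue is an invalidated std::map iterator).
    for osd in list(failure_info):
        reports = failure_info[osd]
        for reporter in [r for r, t in reports.items() if now - t > stale_after]:
            del reports[reporter]  # releases the stale report
        if not reports:
            del failure_info[osd]  # nothing left for this osd
    return failure_info


fi = {1: {"osd.2": 0.0}, 3: {"osd.4": 0.0, "osd.5": 90.0}}
print(drop_stale_failures(fi, now=100.0, stale_after=30.0))
# {3: {'osd.5': 90.0}}
```

In C++ the same effect is usually achieved with the `it = m.erase(it)` idiom instead of a key snapshot.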
Mark Houghton [Tue, 3 Nov 2020 15:14:06 +0000 (15:14 +0000)]
rgw: rename variable for clarity
Signed-off-by: Mark Houghton <mhoughton@microfocus.com>
(cherry picked from commit 5d22b7d29a25db4f648daf0c51be74702d4149a2)
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
Conflicts:
src/rgw/rgw_op.cc
Mark Houghton [Tue, 3 Nov 2020 11:10:04 +0000 (11:10 +0000)]
rgw: fix RGWDeleteMultiObj::verify_permission
Signed-off-by: Mark Houghton <mhoughton@microfocus.com>
(cherry picked from commit ba23750bea89a0e9818887abe62db0efef02fe3a)
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
Conflicts:
src/rgw/rgw_op.cc
Mark Houghton [Tue, 20 Oct 2020 16:54:32 +0000 (17:54 +0100)]
rgw: Honour governance retention override in multi-object delete.
Allow governance retention to be overridden by a suitably privileged user.
Fixes: http://tracker.ceph.com/issues/47586
Signed-off-by: Mark Houghton <mhoughton@microfocus.com>
(cherry picked from commit 6989da1bcbe59e4d561c9d16f0ff891f6c6ef567)
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
Conflicts:
src/rgw/rgw_op.cc
Mark Houghton [Thu, 15 Oct 2020 11:13:50 +0000 (12:13 +0100)]
rgw: Check S3 object lock date in multi-object delete
Multi-object delete (via the S3 API) will now check each object's retention date in the same way as single-object delete does.
Fixes: http://tracker.ceph.com/issues/47586
Signed-off-by: Mark Houghton <mhoughton@microfocus.com>
(cherry picked from commit 1a3f08550813e719b34a8133b83eefa97dd43d3a)
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
Conflicts:
src/rgw/rgw_common.h
src/rgw/rgw_common.cc
src/rgw/rgw_op.cc
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
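The per-object check described by the multi-object delete commits above can be sketched in Python. In S3, a governance override is requested with the `x-amz-bypass-governance-retention: true` header and requires the `s3:BypassGovernanceRetention` permission; here both are reduced to booleans. `Retention`, `may_delete` and `multi_delete` are hypothetical, do not mirror RGWDeleteMultiObj, and ignore legal holds.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Retention:
    mode: str               # "GOVERNANCE" or "COMPLIANCE"
    retain_until: datetime


def may_delete(retention: Optional[Retention], now: datetime,
               bypass_requested: bool, bypass_permitted: bool) -> bool:
    """Per-object retention check for one key of a multi-object delete."""
    if retention is None or now >= retention.retain_until:
        return True
    # Only GOVERNANCE retention can be overridden, and only when the
    # request asks for it and the user holds the bypass permission.
    return (retention.mode == "GOVERNANCE"
            and bypass_requested and bypass_permitted)


def multi_delete(objects, now, bypass_requested=False, bypass_permitted=False):
    """Split {key: Retention or None} into (deleted, denied) key lists."""
    deleted, denied = [], []
    for key, retention in objects.items():
        if may_delete(retention, now, bypass_requested, bypass_permitted):
            deleted.append(key)
        else:
            denied.append(key)
    return deleted, denied


now = datetime(2021, 1, 1, tzinfo=timezone.utc)
later = datetime(2022, 1, 1, tzinfo=timezone.utc)
objects = {
    "a": None,
    "b": Retention("GOVERNANCE", later),
    "c": Retention("COMPLIANCE", later),
}
print(multi_delete(objects, now))              # (['a'], ['b', 'c'])
print(multi_delete(objects, now, True, True))  # (['a', 'b'], ['c'])
```

Checking each key independently matches the single-object rule and lets one request delete some keys while reporting the rest as denied.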
Kefu Chai [Thu, 21 Feb 2019 14:54:30 +0000 (22:54 +0800)]
pybind: set language_level for cythonize explicitly
Compiling rbd.pyx because it changed.
[1/1] Cythonizing rbd.pyx
/usr/lib/python2.7/dist-packages/Cython/Compiler/Main.py:367:
FutureWarning: Cython directive 'language_level' not set, using 2 for
now (Py2). This will change in a later release! File: /var/ssd/ceph/src/pybind/rbd/rbd.pyx
tree = Parsing.p_module(s, pxd, full_module_name)
J. Eric Ivancich [Wed, 14 Apr 2021 17:55:22 +0000 (13:55 -0400)]
rgw: during reshard lock contention, adjust logging
When RGW fails to get a lock on a reshard log, we log it in such a way
that it looks like an error. Instead we'll make sure that the log
message is informational.
Xiubo Li [Wed, 21 Apr 2021 13:00:19 +0000 (21:00 +0800)]
mds: do not trim the inodes from the lru list in standby_replay
In standby_replay, if some dentries were just added/linked but have not
yet had a chance to replay the EOpen journal entries that follow them,
and upkeep_main() is executed, it may trim them out immediately. Then
the replay will fail when it plays those EOpen journal entries later.
In standby_replay, let's skip trimming them if the dentry's linkage inode
is not nullptr.
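A toy Python LRU illustrating the rule. `Dentry`, `trim_lru` and the OrderedDict-based LRU are hypothetical; the MDS cache is C++ and its trimming logic is considerably more involved.

```python
from collections import OrderedDict


class Dentry:
    def __init__(self, name, linked_inode=None):
        self.name = name
        self.linked_inode = linked_inode  # None plays the role of nullptr


def trim_lru(lru, max_size, standby_replay):
    """Trim an OrderedDict (oldest first) of name -> Dentry toward max_size.

    Returns the names that were trimmed.
    """
    trimmed, kept = [], []
    while lru and len(lru) + len(kept) > max_size:
        name, dn = lru.popitem(last=False)  # least recently used first
        if standby_replay and dn.linked_inode is not None:
            kept.append((name, dn))  # a later EOpen may still need it
        else:
            trimmed.append(name)
    # Put skipped dentries back at the cold end in their original order.
    for name, dn in reversed(kept):
        lru[name] = dn
        lru.move_to_end(name, last=False)
    return trimmed


lru = OrderedDict((n, Dentry(n, ino)) for n, ino in
                  [("a", 1), ("b", None), ("c", 3), ("d", None)])
print(trim_lru(lru, 2, standby_replay=True), list(lru))
# ['b', 'd'] ['a', 'c']
```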
mon: Modifying trim logic to change paxos_service_trim_max dynamically
Currently, the Paxos Service trim logic is bounded by a max value (paxos_service_trim_max). This change dynamically modifies the max value when the number of logs to be trimmed is higher than paxos_service_trim_max.
The paxos_service_trim_max_multiplier option has been added in case we want to increase paxos_service_trim_max by a certain factor. If this option is enabled, we get a new, higher upper bound when trim sizes are high.
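A rough Python model of the dynamic bound, assuming the raised bound is simply `paxos_service_trim_max * paxos_service_trim_max_multiplier` and that a non-positive multiplier disables it. The exact formula in PaxosService may differ, `versions_to_trim` is hypothetical, and paxos_service_trim_min is ignored here.

```python
def versions_to_trim(to_remove, trim_max, trim_max_multiplier):
    """How many versions one trim pass removes (illustrative model).

    trim_max <= 0 means "no upper bound"; a multiplier <= 0 means the
    dynamic increase is disabled. Both conventions are assumptions.
    """
    if trim_max <= 0 or to_remove <= trim_max:
        return to_remove
    if trim_max_multiplier > 0:
        trim_max *= trim_max_multiplier  # backlog is large: raise the bound
    return min(to_remove, trim_max)


print(versions_to_trim(50000, 500, 0))   # 500
print(versions_to_trim(50000, 500, 20))  # 10000
```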
mon: Adding variables for Paxos trim
1. Define variables for paxos_service_trim_min and paxos_service_trim_max.
2. Use them in place of g_conf()->paxos_service_trim_min and g_conf()->paxos_service_trim_max
Jose Castro Leon [Mon, 15 Jun 2020 13:46:55 +0000 (15:46 +0200)]
pybind/ceph_volume_client: Handles purge of unicode directory names in python2.7
This change is not cherry-picked from master, because master has dropped
python2 support. This change makes it possible to print paths of the
"unicode" type in python2.
Fixes: https://tracker.ceph.com/issues/45997
Signed-off-by: Jose Castro Leon <jose.castro.leon@cern.ch>
Kefu Chai [Tue, 30 Mar 2021 18:32:38 +0000 (02:32 +0800)]
mgr/PyModule: put mgr_module_path before Py_GetPath()
pip comes with _vendor/progress, so there is a chance of importing the
vendored version of the "progress" module instead of the "progress" mgr
module, and failing to import the latter.
in this change, the order of paths is rearranged so the configured
`mgr_module_path` is put before the return value of `Py_GetPath()`.
Conflicts:
src/mgr/PyModule.cc
- nautilus has a preprocessor directive "#if PY_MAJOR_VERSION >= 3"
which is not there in master
- since we still need to support python2, apply the same change to
the #else branch at line 351
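The effect of the reordering can be demonstrated in Python with `importlib.machinery.PathFinder.find_spec`, which searches path entries in order and returns the first match. The directory names and both helper functions are made up for illustration; the real change builds the path string in C++ (PyModule.cc) from `mgr_module_path` and `Py_GetPath()`.

```python
import importlib.machinery
import os
import tempfile


def build_search_path(mgr_module_path, default_paths):
    """Search order with mgr_module_path ahead of the interpreter's default."""
    return [mgr_module_path] + [p for p in default_paths if p != mgr_module_path]


def locate_progress(search_path):
    """Return the file the top-level "progress" import would resolve to."""
    spec = importlib.machinery.PathFinder.find_spec("progress", search_path)
    return spec.origin if spec else None


with tempfile.TemporaryDirectory() as root:
    vendored = os.path.join(root, "vendored")  # stands in for a vendored copy
    mgr = os.path.join(root, "mgr")            # stands in for mgr_module_path
    for base in (vendored, mgr):
        os.makedirs(os.path.join(base, "progress"))
        open(os.path.join(base, "progress", "__init__.py"), "w").close()
    print(locate_progress([vendored, mgr]))  # .../vendored/progress/__init__.py
    print(locate_progress(build_search_path(mgr, [vendored])))  # .../mgr/progress/__init__.py
```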