Ronen Friedman [Sat, 28 Oct 2023 16:42:34 +0000 (11:42 -0500)]
osd/scrub: do not clear PG_STATE_REPAIR unconditionally
As we now call clear_pgscrub_state() at the end of each
'Session' state, we must not clear PG_STATE_REPAIR
unconditionally.
Previously - scrubs that reached normal completion, i.e.
reached PgScrubber::scrub_finish(), would have only cleared
that PG flag under specific conditions. That was changed in
previous commits of this PR, and is now fixed.
Ronen Friedman [Fri, 13 Oct 2023 17:14:31 +0000 (12:14 -0500)]
osd/scrub: move ReplicaReservations into the Scrubber FSM
Handle grant/deny messages within the FSM.
One exception at this point: the handling of "granted by everyone"
(due to the technical inconvenience of having to handle the
"0 replicas" case in the FSM state constructor).
Note: after this commit, ScrubMachineListener - an API which is
a subset of the Scrubber API to be used by the Scrubber FSM - no
longer makes sense. The FSM should now have full access to the
scrubber, and that interface will be removed in a subsequent PR.
Ronen Friedman [Fri, 13 Oct 2023 12:48:44 +0000 (07:48 -0500)]
osd/scrub: route grant/deny messages through the scrubber FSM
The scrubber FSM will now be responsible for handling the grant/deny
ops received from the replica OSDs.
For this temporary step - the scrubber FSM will simply forward a
call to the ReplicaReservations object in the Scrubber.
osd/scrub: reserve replicas one by one, and in consistent order
Issuing the reservation requests one by one - waiting for
approval from the secondary before the next request is sent.
The requests are sent in ascending target pg-shard-id order, reducing the
chance of having two PGs repeatedly competing for the same set of
OSDs - and doing so in an interleaved sequence.
Modifying the Session state in the scrubber FSM to react to interval
changes by discarding replica reservations.
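The one-by-one reservation scheme described above can be sketched roughly as follows. This is only an illustrative model in Python, not the Ceph C++ implementation: the class name ReplicaReserver, the integer shard ids and the send_request callback are all assumptions made for the sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ReplicaReserver:
    """Hypothetical model: reserve replicas sequentially, in sorted order."""
    replicas: List[int]                    # hypothetical shard ids
    send_request: Callable[[int], None]    # hypothetical transport hook
    granted: List[int] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Ascending order means every PG asks the OSDs in the same sequence.
        self._pending = sorted(self.replicas)
        self._next = 0  # index of the next replica to ask

    def start(self) -> bool:
        """Send the first request. True means there was nothing to reserve."""
        return self._send_next()

    def on_grant(self, replica: int) -> bool:
        """Record a grant, then ask the next replica. True once all granted."""
        if self._next == 0 or replica != self._pending[self._next - 1]:
            raise ValueError(f"unexpected grant from {replica}")
        self.granted.append(replica)
        return self._send_next()

    def on_deny_or_interval_change(self) -> List[int]:
        """Discard all reservations; return the replicas to be released."""
        to_release, self.granted = self.granted, []
        self._pending, self._next = [], 0
        return to_release

    def _send_next(self) -> bool:
        if self._next >= len(self._pending):
            return True
        # Only one request is outstanding at any time.
        self.send_request(self._pending[self._next])
        self._next += 1
        return False
```

Because only one request is in flight and all PGs use the same ordering, two competing PGs tend to collide on the first shared OSD rather than each holding part of the other's set.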
Ronen Friedman [Mon, 2 Oct 2023 16:29:51 +0000 (11:29 -0500)]
osd/scrub: group all scrub session states into a Session state
The Session state now includes the ReservingReplicas & Active
sub-states.
This new state will hold (in future commits) most of the scrub
state information that relates to a specific scrub session (and
should be cleaned up when that session terminates).
qa/suites/rbd: disable POOL_APP_NOT_ENABLED health check
Commit 990806e635a1 ("mon, qa: issue pool application warning even
if pool is empty") made it impossible to create a pool without raising
a (bogus) health alert. See [1] for details.
Patrick Donnelly [Thu, 21 Sep 2023 15:51:31 +0000 (11:51 -0400)]
Merge PR #50503 into main
* refs/pull/50503/head:
mon: do not change pending if strategy is unchanged
mon/MonmapMonitor: do not propose on error in prepare_update
mon/MonmapMonitor: wait for commit before reply
mon: use wait_for_commit to reply
mon: add context list for commit wait
mon: remove unused method
test/mon: add commit benchmark script
mon/MonClient: provide config to target specific rank
Reviewed-by: Laura Flores <lflores@redhat.com>
Reviewed-by: Ramana Raja <rraja@redhat.com>
osd/scrub: modify schedule_result_t to report error class
(which directly translates to the required followup action)
instead of reporting the exact failure. The specifics of the failure
were never used by the scrub scheduler.
osd/scrub: correct placement for some scheduler-related methods
Moving some member functions to their corresponding files.
Including ScrubQueue::dump_scrubs()
as it was moved in a previous commit,
and some ScrubJob code.
At this phase of the refactoring:
this is the main interface from the scrub scheduler in OsdScrub
to the ScrubQueue. The ScrubQueue provides the ordered list of
all targets (for now - PGs) that are ready for scrubbing.
Scrub initiation code is modified to use the new interface.
For now: OsdScrub is mostly a forwarder to the ScrubQueue object
(which it now owns).
The resource counters moved into a separate object within OsdScrub.
osd/scrub: renaming & fmt support for restrictions structure
Renaming ScrubPreconds, the collection of "environmental"
restrictions on possible scrubs, to OSDRestrictions.
Also - providing fmtlib support for that structure.
John Mulligan [Wed, 6 Sep 2023 20:18:37 +0000 (16:18 -0400)]
doc/cephadm: clarify what cephadm component writes to the cluster log channel
Clarify that the cephadm orchestrator module, a part of the ceph mgr,
logs to the cluster log channel. This prepares for adding a specific
section to cover logging for the cephadm "binary".
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Wed, 6 Sep 2023 18:15:41 +0000 (14:15 -0400)]
cephadm: remember log destination used during bootstrap
Store the log destination(s) specified on the CLI for cephadm bootstrap
as the manager configuration, unless the configuration key is explicitly
set by the input config.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Wed, 6 Sep 2023 17:39:06 +0000 (13:39 -0400)]
mgr/cephadm: add a module option for controlling cephadm log dest
Now that cephadm has multiple possible persistent logging destinations
we need a way to choose which one to use when the command is started by
the mgr. Add the option 'cephadm_log_destination' which can take one
of 'file', 'syslog', or 'file,syslog'. If left unset (empty string)
then the behavior is equivalent to 'file' and that is the same as
previous cephadm versions.
Fixes: https://tracker.ceph.com/issues/62233
Signed-off-by: John Mulligan <jmulligan@redhat.com>
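As a rough illustration of the option semantics described above (empty string behaves like 'file'), a value such as 'file,syslog' could be turned into a set of destinations like this. The function name parse_log_destination and the constant VALID_DESTINATIONS are hypothetical and are not taken from the cephadm module; the sketch is also more lenient than the commit message, in that it accepts the destinations in any order.

```python
from typing import FrozenSet

# The two destination names mentioned in the commit message.
VALID_DESTINATIONS = frozenset({"file", "syslog"})


def parse_log_destination(value: str) -> FrozenSet[str]:
    """Turn an option string such as 'file,syslog' into a set of destinations."""
    if not value.strip():
        # Unset (empty string) keeps the behavior of previous versions.
        return frozenset({"file"})
    parts = frozenset(p.strip() for p in value.split(","))
    unknown = parts - VALID_DESTINATIONS
    if unknown:
        raise ValueError(f"unknown log destination(s): {sorted(unknown)}")
    return parts
```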
John Mulligan [Tue, 22 Aug 2023 19:11:35 +0000 (15:11 -0400)]
cephadm: add cli option to enable logging to syslog
Add the --log-dest option to cephadm. The --log-dest option can be
specified 0, 1 or more times. If unspecified, cephadm will log to
the default location, the log file. If specified one or more times,
each instance will enable the named logging destination.
Example:
John Mulligan [Tue, 22 Aug 2023 19:11:16 +0000 (15:11 -0400)]
cephadm: add support for logging to syslog/journal
Add support to logging.py for persistent logging to syslog and thus to
journald. This is accomplished by switching logging handlers depending
on the log_dest attribute of the context. Setting this value is left
for a future patch.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
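Switching handlers on a destination setting could look roughly like the sketch below, which uses only the Python standard library. It is not the code in cephadm's logging.py: the function build_handlers and its parameters are assumptions, and the default syslog address "/dev/log" is the usual local syslog socket on Linux, which may differ on other systems.

```python
import logging
import logging.handlers
from typing import Iterable, List, Tuple, Union

Address = Union[str, Tuple[str, int]]


def build_handlers(log_dest: Iterable[str], log_file: str,
                   syslog_address: Address = "/dev/log") -> List[logging.Handler]:
    """Return one handler per requested destination (hypothetical helper)."""
    handlers: List[logging.Handler] = []
    dests = set(log_dest) or {"file"}  # nothing requested: fall back to the file
    if "file" in dests:
        # delay=True postpones opening the file until the first record.
        handlers.append(logging.FileHandler(log_file, delay=True))
    if "syslog" in dests:
        # Records sent to the local syslog socket are typically picked up
        # by journald on systemd-based hosts.
        handlers.append(logging.handlers.SysLogHandler(address=syslog_address))
    return handlers
```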
John Mulligan [Tue, 22 Aug 2023 16:42:14 +0000 (12:42 -0400)]
cephadm: move colored output support into logging.py
Rewrite cephadm's colored output support such that it abstracts away
the colorization into extra logging metadata. The new code will not
unconditionally put control characters into the log files. It will
only print the control chars if the stderr is a tty.
In theory this is probably more future proof as well, but it's only
got two callers so it is hard to say how useful it'll be.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
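The idea of carrying colorization as extra logging metadata, and emitting control characters only when the stream is a tty, can be sketched as follows. The names HighlightFormatter, make_logger and the "highlight" record attribute are invented for this example and do not come from cephadm.

```python
import logging
import sys

_RED, _RESET = "\033[31m", "\033[0m"


class HighlightFormatter(logging.Formatter):
    """Wrap a record in color codes only when asked to, and only on a tty."""

    def __init__(self, stream, fmt: str = "%(message)s") -> None:
        super().__init__(fmt)
        self._use_color = hasattr(stream, "isatty") and stream.isatty()

    def format(self, record: logging.LogRecord) -> str:
        text = super().format(record)
        # "highlight" is hypothetical metadata passed via extra={...}.
        if self._use_color and getattr(record, "highlight", False):
            return f"{_RED}{text}{_RESET}"
        return text


def make_logger(name: str, stream=None) -> logging.Logger:
    """Build a logger whose single handler writes to the given stream."""
    stream = stream if stream is not None else sys.stderr
    handler = logging.StreamHandler(stream)
    handler.setFormatter(HighlightFormatter(stream))
    logger = logging.getLogger(name)
    logger.handlers[:] = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

A caller would then write something like make_logger("demo").info("done", extra={"highlight": True}); a file handler with a plain formatter attached to the same logger would receive the message without control characters.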
The CI appears to be really slow, and even a second of wait for inotify
sometimes fails. Add an exponential backoff wait of up to ~25 seconds
to hopefully make the test pass reliably.
Signed-off-by: Daniel Gryniewicz <dang@redhat.com>
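An exponential backoff wait of this kind might be sketched as below. The actual test code and its parameters are not shown in this log, so the starting delay of 0.1 s, the doubling factor and the eight attempts are assumptions chosen so that the total wait comes to about 25.5 seconds.

```python
import time
from typing import Callable


def wait_with_backoff(condition: Callable[[], bool],
                      first_delay: float = 0.1,
                      factor: float = 2.0,
                      max_attempts: int = 8,
                      sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll condition(), sleeping 0.1, 0.2, 0.4, ... seconds between checks."""
    delay = first_delay
    for _ in range(max_attempts):
        if condition():
            return True
        sleep(delay)
        delay *= factor
    # One last check after the final sleep.
    return condition()
```

The sleep parameter exists only so the function can be exercised without actually waiting; a fast machine returns after the first check, while a slow CI host gets the full budget.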
Fix one compiler warning in src/auth/cephx/CephxProtocol.cc:
the variable 'ch' may be used before it is initialized.
auth/cephx/CephxProtocol.cc:595:57: warning: '*((void*)& ch +8)' may be used uninitialized in this function [-Wmaybe-uninitialized]
msg.server_challenge_plus_one = ch.server_challenge + 1;
~~~~~~~~~~~~~~~~~~~~^~~
Patrick Donnelly [Fri, 15 Sep 2023 16:11:37 +0000 (12:11 -0400)]
Merge PR #52199 into main
* refs/pull/52199/head:
mds: continue linking if targeti is temporarily located in stray dir
Revert "mds: wait unlink to finish to avoid conflict when creating same dentries"
Revert "mds: clear the STATE_UNLINKING state when the unlink fails"
Revert "mds: wait reintegrate to finish when unlinking"
Revert "mds: notify the waiters in replica MDSs"
Revert "mds: wait the linkmerge/migrate to finish after unlink"
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
This is the MVP for a driver for RGW that operates on top of a POSIX
filesystem. It supports get, put, list, copy, multipart, external
access via the filesystem itself, and ordered bucket listings via an
LRU-based cache.
Note that this is currently a Filter, intended to run on top of dbstore.
This is because it currently doesn't have any User implementation, so it
depends on dbstore's User. Everything else is implemented in
POSIXDriver. Once there is a User implementation, this will become a
Store, instead of a Filter.
Commit messages from bucket listing cache:
rgw/posixdriver: recycle lmdb database handles as required
While LMDB workflows often do not close/return database handles,
ours continually reuses them. This requires us to close each
handle (atomically) when a cache entry is recycled.
rgw/posixdriver: don't instantiate bucket cache entries from notify events
rgw/posixdriver: incorporate lmdb-safe for now
The current inclusion is based on https://github.com/Martchus/lmdb-safe,
which is actively maintained but currently has some packaging issues the
author has agreed to accept fixes for.
For now, skip the submodule to save time and remove an external dependency.
rgw/posixdriver: fix listing of cached, empty bucket
* check lmdb enumeration result in all cases and w/better style
* add unit test for enumeration of an empty cached directory
rgw/posixdriver: nest lmdbs in a directory under the dbroot path to avoid cleanup issues
rgw/posixdriver: refactor for posix integration
* Derive BucketCache types as templates on a SAL driver and SAL
bucket pair.
* Integrate cache fills as callbacks into SAL layer (or mock, for
tests)
* Renaming and cleanups
rgw/posixdriver: add bucket cache implementation and tests
Adds free-standing cache of buckets and object names, with
bucket names (and listing attributes, upcoming) managed in
a hashed set of lmdb databases, which provides ordering and
a high-performance listing cache.
A framework for notification on new object creation (e.g.,
outside S3 workflow) is provided, and a Linux implementation
using inotify.
FindLMDB.cmake taken with attribution and license.
Signed-off-by: Daniel Gryniewicz <dang@redhat.com>
Signed-off-by: Ali Maredia <amaredia@redhat.com>
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>