Zac Dover [Tue, 19 Nov 2024 00:37:56 +0000 (10:37 +1000)]
doc/start: update os-recommendations.rst
Remove information about the operating systems that support Ceph's
official container images from the "Platforms" table in
doc/start/os-recommendations.rst and add that information to the (new)
table that shows the operating systems that support Ceph's official
container images.
Credit for this change should go to Enrico Bocchi, who noticed a
discrepancy that motivated it.
Oshrey Avraham [Mon, 18 Nov 2024 10:06:22 +0000 (12:06 +0200)]
rgw/notification: fix segmentation fault and topic listing logic
- Fixed a segmentation fault caused by a null bucket pointer in RGWPSListTopicsOp::execute()
- Corrected logic to use get_topics_v2 when supported, with fallback otherwise
mds: client is evicted when an export subtree task is interrupted
The importer force-opens some sessions provided by the exporter, but the client does not know about
the new sessions until the exporter notifies it, and the notifications cannot be sent if the exporter
is interrupted. The client does not regularly renew sessions that it does not know about, so the
importer evicts the client after `session_autoclose` seconds (300 seconds by default).
Sessions that were force-opened in the importer need to be closed when the import process is reversed.
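The eviction path described above can be illustrated with a minimal Python sketch. This is a toy model, not MDS code: the function name `stale_sessions` and the session ids are hypothetical, and only the 300-second default comes from the message above.

```python
SESSION_AUTOCLOSE = 300.0  # seconds; default value quoted in the commit message


def stale_sessions(last_renewed, now, autoclose=SESSION_AUTOCLOSE):
    """Return ids of sessions not renewed within `autoclose` seconds.

    `last_renewed` maps a (hypothetical) session id to the time of its
    last renewal. A session the client never learned about is never
    renewed, so its timestamp stays at the time it was force-opened.
    """
    return sorted(
        sid for sid, renewed_at in last_renewed.items()
        if now - renewed_at > autoclose
    )


# The client renews the session it knows about, but not the force-opened one.
sessions = {"known-to-client": 290.0, "force-opened": 0.0}
print(stale_sessions(sessions, now=301.0))
```

In this model only the force-opened session ages out, which mirrors why closing such sessions when the import is reversed avoids the eviction.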
Zhansong Gao [Fri, 26 May 2023 04:20:17 +0000 (12:20 +0800)]
mds: session in the importing state cannot be cleared if an export subtree task is interrupted while the state of importer is acking
The related sessions in the importer are in the importing state (`Session::is_importing` returns true) when the state of the importer is `acking`.
If the exporter restarts at this time, `Migrator::import_reverse`, called by `MDCache::handle_resolve`, should reverse the process and clear the
importing state, but a bug prevents it from doing so. As a result, these sessions are not cleared when the client is
unmounted (evicted or timed out) until the MDS is restarted.
The bug in `import_reverse` is that it contains the code to handle state `IMPORT_ACKING` but it will never be executed because
the state is modified to `IMPORT_ABORTING` at the beginning. Move `stat.state = IMPORT_ABORTING` to the end of import_reverse
so that it can handle the state `IMPORT_ACKING`.
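The ordering problem can be reproduced in a few lines. The following is a simplified Python model, not the actual C++ in `Migrator::import_reverse`: the class and function names are hypothetical, and the `sessions_importing` flag stands in for `Session::is_importing`.

```python
from enum import Enum, auto


class ImportState(Enum):
    ACKING = auto()
    ABORTING = auto()


class ImportStat:
    def __init__(self, state):
        self.state = state
        # Stand-in for Session::is_importing() on the related sessions.
        self.sessions_importing = True


def import_reverse_buggy(stat):
    # The state is overwritten first, so the ACKING branch is dead code.
    stat.state = ImportState.ABORTING
    if stat.state is ImportState.ACKING:
        stat.sessions_importing = False


def import_reverse_fixed(stat):
    # Handle ACKING while the original state is still visible ...
    if stat.state is ImportState.ACKING:
        stat.sessions_importing = False
    # ... and only then mark the import as aborting.
    stat.state = ImportState.ABORTING
```

With a stat that starts in `ACKING`, the buggy variant leaves `sessions_importing` set, while the fixed variant clears it; both end in `ABORTING`.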
Patrick Donnelly [Wed, 13 Nov 2024 03:17:59 +0000 (22:17 -0500)]
Merge PR #60464 into main
* refs/pull/60464/head:
mds: add or update MDS thread names
log: cache recent threads up to a day
common: cache pthread names
log: concatenate thread names and print once per thread
Patrick Donnelly [Wed, 13 Nov 2024 03:14:20 +0000 (22:14 -0500)]
Merge PR #60381 into main
* refs/pull/60381/head:
doc: remove references to `mds_log_major_segment_event_ratio`
mds: start a new major segment after reaching minor segment threshold
mds: make parts of mdlog reusable to be used by beacon
Reviewed-by: Anthony D Atri <anthony.datri@gmail.com>
Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
Patrick Donnelly [Wed, 13 Nov 2024 03:12:27 +0000 (22:12 -0500)]
Merge PR #60283 into main
* refs/pull/60283/head:
mds: add issue_seq to all cap messages
include/ceph_fs: correct ceph_mds_cap_peer field name
include/ceph_fs: correct ceph_mds_cap_item field name
messages/MClientCaps: use correct ceph_seq_t for cap sequence types
messages/MClientCaps: dump issue_seq for debugging
mds: remove dead code
Ronen Friedman [Tue, 12 Nov 2024 14:21:25 +0000 (08:21 -0600)]
osd/scrub: list additional information when dumping the queue
Extend the information provided for operator dump commands, to
include the basic identity and scheduling information of the
entries in the scrub queue.
This change mostly benefits automatic QA and our internal
testing.
Zac Dover [Mon, 11 Nov 2024 23:31:28 +0000 (09:31 +1000)]
doc/rados: correct "full ratio" note
Correct a note that directed users not to add an OSD after the cluster
has reached its "full ratio". The note now says "Do not let your cluster
reach its full ratio before adding an OSD."
Hat tip: Oskar Berggren
Fixes: https://tracker.ceph.com/issues/68900
Co-authored-by: Oskar Berggren <oskar.berggren@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>
Zac Dover [Tue, 29 Oct 2024 07:27:43 +0000 (17:27 +1000)]
doc/start: separate package chart from container chart
Separate the packages-and-containers chart into two charts:
(1) a chart that shows which OSes Ceph builds packages for
(2) a chart that shows which OSes support Ceph's containers
Vallari Agrawal [Tue, 8 Oct 2024 21:07:48 +0000 (02:37 +0530)]
monitoring: add 2 nvmeof alerts to prometheus_alerts.yaml
- `NVMeoFMissingListener`: trigger if all listeners
are not created for each gateway in a subsystem
- `NVMeoFZeroListenerSubsystem`: trigger if a subsystem has no listeners
Xuehan Xu [Thu, 7 Nov 2024 01:41:18 +0000 (09:41 +0800)]
crimson/os/seastore: move the root meta out of the root block
During massive data backfilling, new osdmaps keep being created due to
frequent pg status changes, which can lead to frequent osd meta updates.
Those updates are translated into "SeaStore::write_meta" calls, each of
which modifies the root block's meta field and invalidates all inflight
transactions. Since the osd meta updates can be very frequent, long
transactions may be invalidated repeatedly, and the corresponding IO
requests hang.
This commit moves the root meta out of the root block, so that updates
to it won't invalidate unrelated transactions.
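The effect of the move can be seen in a toy model of read-set invalidation. This is only a sketch under stated assumptions, not SeaStore's cache: the `Store` class, the block names, and the rule that every transaction reads the root block are all hypothetical simplifications.

```python
class Store:
    """Toy store: a committed write invalidates every inflight reader of that block."""

    def __init__(self):
        self.inflight = []

    def begin(self, reads):
        # Each transaction remembers which blocks it has read.
        txn = {"reads": set(reads), "valid": True}
        self.inflight.append(txn)
        return txn

    def commit_write(self, block):
        # Invalidate only the transactions whose read set contains the block.
        for txn in self.inflight:
            if block in txn["reads"]:
                txn["valid"] = False


def long_txn_survives(meta_block, meta_updates=3):
    """Return True if a long IO transaction stays valid across meta updates."""
    store = Store()
    # Assumption: every transaction reads the root block.
    long_io = store.begin(reads={"root", "data_extent"})
    for _ in range(meta_updates):
        store.commit_write(meta_block)
    return long_io["valid"]


print(long_txn_survives("root"))       # meta stored in the root block
print(long_txn_survives("root_meta"))  # meta stored in a separate block
```

When the meta lives in the root block, every meta update invalidates the long transaction; when it lives in its own block, the long transaction is unaffected.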
Ivo Almeida [Tue, 5 Nov 2024 16:19:09 +0000 (16:19 +0000)]
mgr/dashboard: update carbon-components-angular
* update carbon-components-angular pkg to v5.48.0
* fixed change detection errors on unit tests
* fixed pagination page length when limit is 0 and data is empty
Fixes: https://tracker.ceph.com/issues/68837
Signed-off-by: Ivo Almeida <ialmeida@redhat.com>
Samuel Just [Thu, 10 Oct 2024 00:59:20 +0000 (00:59 +0000)]
crimson/.../client_request: complete_request() only in with_pg_process
This avoids needing to annotate every exit point in
with_pg_process_interruptible with complete_request. Regardless of the
result, completing with_pg_process_interruptible without an interruption
means that the request is over.
Samuel Just [Thu, 3 Oct 2024 00:41:34 +0000 (00:41 +0000)]
crimson: introduce RAII style obc lock mechanic
Currently, we rely on ObjectContextLoader::with_* wrappers to load,
lock, and guarantee release of obcs. That mechanism works well enough,
but the execution pathway is pretty tough to read as it spans
[Internal]ClientRequest, PG, ObjectContextLoader, ObjectContext, and
tri_mutex. This mechanism cuts out PG and ObjectContext (mostly) and
uses coroutine support for auto variables to make the interface easier
to understand.
This mechanism will also allow a future PR to access the ObjectContext
state prior to loading it. This will be important to using the
ObjectContext memory to host per-object pipeline states.
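For readers unfamiliar with the pattern, the core idea of scope-bound lock release can be sketched in Python, where a context manager is the closest analogue to RAII. This is a loose illustration only: crimson's implementation is C++ built on coroutines and tri_mutex, and the `ObjectContext` class and `locked` helper below are hypothetical stand-ins.

```python
import threading
from contextlib import contextmanager


class ObjectContext:
    """Hypothetical stand-in for an obc: an object id plus a lock."""

    def __init__(self, oid):
        self.oid = oid
        self.lock = threading.Lock()


@contextmanager
def locked(obc):
    # Acquire on entry; the finally clause guarantees release on every
    # exit path, including exceptions, without annotating each one.
    obc.lock.acquire()
    try:
        yield obc
    finally:
        obc.lock.release()


obc = ObjectContext("obj.1")
with locked(obc) as held:
    print(held.oid, held.lock.locked())
print(obc.lock.locked())
```

The caller never releases the lock explicitly, which is the readability benefit the message above attributes to the RAII-style mechanism.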
Samuel Just [Thu, 3 Oct 2024 01:26:04 +0000 (18:26 -0700)]
crimson: track obcs unconditionally
Previously, we only interrupted head obcs. I don't think that
distinction actually makes sense -- both head and clone obcs
can have ops blocked on the lock. Let's just track them all.
Ronen Friedman [Wed, 6 Nov 2024 14:43:57 +0000 (08:43 -0600)]
osd/scrub: fix 'schedule-deepscrub' test asok command
The existing implementation of the 'schedule-deepscrub' Asok
command uses the set_last_deep_scrub_stamp() method to "fake"
the last-deep-scrub stamp. Unfortunately, this method also
updates the last-scrub stamp (as required for non-test usage).
Commit 9f3e18b fixed the comparator used when sorting the
scrub targets. An unintended side effect is that, following
'schedule-deepscrub', the shallow target is the one to be
scrubbed next, instead of the deep target.
Zac Dover [Wed, 6 Nov 2024 12:22:14 +0000 (22:22 +1000)]
doc/cephadm: link to "host pattern" matching sect
Link to the "Placement by Pattern Matching" section in
doc/cephadm/services/index.rst from the "Advanced OSD Service
Specifications" section in doc/cephadm/services/osd.rst.