Prior to this set of commits, the MDS would write the ESubtreeMap to the
journal, trim everything up to that segment, then finally force the
trimming of that last segment (`MDLog::trim(0)`). This is awkward in the
new code which preserves a major segment boundary at the beginning of
the journal during trimming. Instead of writing a special case for this
situation, it seems more natural to just use a new "lid" or "cap" event
to mark the beginning of the journal when no subtree map can yet be
written but we need sequence numbers to tie in other MDS tables.
Like ESegment, ELid doesn't actually contain any state. It's just a
marker for the beginning of the log after rank deactivation or rank
creation. It can appear in the middle of the log if the shutdown
sequence is interrupted while writing the event but the MDS will skip it
during replay in that case.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
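A minimal sketch of the replay behaviour described above, written in Python rather than the MDS's C++. The `Event` class, the `EUpdate` kind and the `replay` helper are hypothetical stand-ins for illustration and are not the actual MDLog code; the only property taken from the commit message is that an ELid carries no state and is skipped when it shows up mid-log.

```python
from dataclasses import dataclass


@dataclass
class Event:
    kind: str  # illustrative event kinds: "ELid", "ESegment", "EUpdate"
    seq: int


def replay(events):
    """Return the events a toy replay would process.

    An ELid is expected at the start of the log. If one appears later
    (e.g. shutdown was interrupted while writing it), it holds no state,
    so it is simply skipped.
    """
    processed = []
    for index, event in enumerate(events):
        if event.kind == "ELid" and index > 0:
            continue  # stray lid in the middle of the log: nothing to apply
        processed.append(event)
    return processed


log = [Event("ELid", 1), Event("EUpdate", 2), Event("ELid", 3), Event("EUpdate", 4)]
print([e.seq for e in replay(log)])
```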
Patrick Donnelly [Mon, 30 Jan 2023 19:33:32 +0000 (14:33 -0500)]
qa: add numerous subtree test
When the ESubtreeMap is very large (~5k+ subtrees), the MDS will
end up logging only a few events (as bad as 1) per segment as the
subtree map dominates the segment size.
This test simply creates an artificially large subtree and confirms that
other file system activity completes in a timely manner. The test takes
advantage of minor segments, which allow a normal number of events per
log segment (and fewer subtree maps). The test fails on the current
main HEAD.
Historical note: when I first observed this aberrant behavior, the
vstart cluster was actually using mds_debug_subtrees = True (the default
for every vstart cluster). This caused the MDS to write out the subtree
map (for debugging reasons) with every event. When testing the MDS with
large subtrees (distributed ephemeral pinning), this caused the MDS to
slow to a trickle of operations per second. Despite this unintentional
misconfiguration, the problem still exists but the number of auth
subtrees must be large for a particular rank to replicate the behavior.
On main HEAD, the creation of 10k files (workload stage) takes ~110
seconds. On this branch, it takes ~30 seconds.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
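A back-of-the-envelope model of why a large subtree map starves a segment. This is only a sketch under stated assumptions: it treats a segment as a fixed byte budget, and every size below is invented for illustration, not measured from an MDS. The function name `events_per_segment` is hypothetical.

```python
def events_per_segment(segment_bytes, boundary_event_bytes, event_bytes):
    """Events that fit in one segment after its boundary event is written.

    The floor of 1 mirrors the "as bad as 1" case from the commit message:
    the segment still holds at least one event even when the boundary
    event alone uses up the whole budget.
    """
    remaining = max(segment_bytes - boundary_event_bytes, 0)
    return max(remaining // event_bytes, 1)


SEGMENT = 4 * 2**20   # assumed 4 MiB budget per segment
EVENT = 2048          # assumed size of an ordinary event

# Boundary is a huge subtree map (assumed ~5 MB): only one event fits.
print(events_per_segment(SEGMENT, 5_000_000, EVENT))
# Boundary is a small stateless marker (assumed 64 bytes): thousands fit.
print(events_per_segment(SEGMENT, 64, EVENT))
```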
This commit adds a new ESegment event type which can delineate
LogSegments. This event can be used as an alternative to the heavyweight
ESubtreeMap, which can be very expensive to generate when the MDS
has a large subtree map.
Fixes: https://tracker.ceph.com/issues/58154 Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
The major problem here is that the MDLog::_start_entry method puts the
current event sequence number in the EMetaBlob of the event (if
present). Because of this, no other event can be submitted as this would
invalidate the event sequence. Instead, fix up the event sequence during
submission and simplify related logic that uses it during EMetaBlob
construction.
Secondarily, for the purposes of this commit series, _start_entry
introduced recursive locks when generating the ESubtreeMap within
MDLog::_segment_upkeep. So, this commit is a necessary cleanup.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
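The idea of fixing up the sequence at submission time can be sketched as below. This is a toy Python model, not the MDLog implementation: `ToyLog`, the dict-based events and the `event_seq` key are all hypothetical. It only shows that when the number is stamped at submit time, another event can be submitted in between without invalidating the sequence.

```python
class ToyLog:
    """Toy journal that assigns sequence numbers when an event is submitted."""

    def __init__(self):
        self.next_seq = 1
        self.entries = []

    def submit(self, event):
        # Stamp the sequence number now, not when the event was created,
        # so events built earlier but submitted later stay consecutive.
        event["seq"] = self.next_seq
        if "metablob" in event:
            # Fix up the copy held by the (hypothetical) metablob as well.
            event["metablob"]["event_seq"] = event["seq"]
        self.next_seq += 1
        self.entries.append(event)
        return event["seq"]


log = ToyLog()
update = {"name": "update", "metablob": {}}   # built first
marker = {"name": "segment-marker"}           # built second
log.submit(marker)                            # but submitted first
log.submit(update)
print([(e["name"], e["seq"]) for e in log.entries])
```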
We're currently setting FMT_USE_TZSET=0 when building libfmt
in order to avoid the _tzset function, which is unavailable
under Mingw:
https://github.com/ceph/ceph/commit/aa5769ecf1d80fc9824280d2e90fd4c61a0e7769
The issue is that it still gets used by fmt/chrono.h, which is
why we'll move this definition to the top level cmake file.
Note that the Windows build is currently failing as a result of
a recent change: https://github.com/ceph/ceph/pull/52590/files
In file included from ceph/src/common/ceph_time.h:22,
                 from ceph/src/include/encoding.h:31,
                 from ceph/src/include/uuid.h:9,
                 from ceph/src/include/types.h:21,
                 from ceph/src/crush/CrushWrapper.h:14,
                 from ceph/src/crush/CrushCompiler.h:7,
                 from ceph/src/crush/CrushCompiler.cc:4:
ceph/src/fmt/include/fmt/chrono.h: In lambda function:
ceph/src/fmt/include/fmt/chrono.h:953:5: error: ‘_tzset’ was
not declared in this scope; did you mean ‘tzset’?
  953 |     _tzset();
      |     ^~~~~~
      |     tzset
avanthakkar [Mon, 26 Jun 2023 07:11:24 +0000 (12:41 +0530)]
mgr/dashboard: empty grafana panels for performance of daemons
Fixes: https://tracker.ceph.com/issues/61792 Signed-off-by: avanthakkar <avanjohn@gmail.com>
Remove the `ceph-` prefix from the ceph_daemon label so that it matches the
label format used by queries in the grafana dashboards. Also change the
`instance_id` label for rgw to match the values coming from the
exporter and the prometheus module.
Ville Ojamo [Thu, 27 Jul 2023 07:56:58 +0000 (14:56 +0700)]
doc/radosgw: Add missing space to date option spec in admin.rst
The start time and end time CLI option specification is missing a space between the date and the optional time value. Also expand the text to talk about "optional time" after the date.
Signed-off-by: Ville Ojamo <14869000+bluikko@users.noreply.github.com>
Adam King [Thu, 27 Jul 2023 13:58:51 +0000 (09:58 -0400)]
Merge pull request #52084 from rhcs-dashboard/fix-exporter-addrs
exporter: ceph-exporter scrapes failing on multi-homed server
Reviewed-by: Adam King <adking@redhat.com> Reviewed-by: Juan Miguel Olmo Martínez <jolmomar@ibm.com> Reviewed-by: Redouane Kachach <rkachach@redhat.com>
Add instructions directing the reader to install the "python3-routes"
package. This package is required in order to launch the dashboard after
the installation procedure has completed, but is not yet included in the
install-deps.sh script.
mengxiangrui [Sat, 21 Aug 2021 07:20:00 +0000 (15:20 +0800)]
rgw: fix the Content-Length in response header is inconsistent with response body size when rgw returns default html error page in static website
The default html error page used as the response body should be built completely, including the three closing html tags (/ul, /body and /html), before rgw computes the Content-Length response header. The Content-Length header is then consistent with the response body size, and the client receives the complete page.
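A minimal sketch of the ordering this fix relies on, in Python rather than rgw's C++. The function name `build_error_response` and the page layout are hypothetical; the point is only that the body, closing tags included, is complete before its length is measured.

```python
def build_error_response(status, reason, items):
    """Build a toy HTML error page and headers with a matching Content-Length."""
    lines = [
        f"<html><head><title>{status} {reason}</title></head>",
        "<body>",
        f"<h1>{status} {reason}</h1>",
        "<ul>",
    ]
    lines += [f"<li>{item}</li>" for item in items]
    # Append the closing tags BEFORE measuring the body.
    lines += ["</ul>", "</body>", "</html>"]
    body = "\n".join(lines).encode("utf-8")
    headers = {
        "Content-Type": "text/html",
        "Content-Length": str(len(body)),  # length of the finished body
    }
    return headers, body


headers, body = build_error_response(404, "Not Found", ["Code: NoSuchKey"])
print(headers["Content-Length"], len(body))
```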
Jos Collin [Mon, 24 Jul 2023 08:46:52 +0000 (14:16 +0530)]
qa: fix cephfs-mirror unwinding and 'fs volume create/rm' order
* Ensure 'fs volume create' happens before the cephfs-mirror daemon starts.
* Ensure 'fs volume rm' happens only after the cephfs-mirror daemon has unwound.
- This prevents the mirror daemon from failing to return from a libcephfs call
because the volumes were deleted while cephfs_mirror_thrash was running.
Fixes: https://tracker.ceph.com/issues/61182 Signed-off-by: Jos Collin <jcollin@redhat.com>
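The setup and teardown ordering described above can be sketched with nested context managers. This is an illustration only: the `step` helper and the log strings are hypothetical and do not come from the qa suite. Nesting the daemon inside the volume gives create, start, work, unwind, rm.

```python
from contextlib import contextmanager


@contextmanager
def step(name, log):
    """Record the start of a step on entry and its teardown on exit."""
    log.append(f"start {name}")
    try:
        yield
    finally:
        # Teardown runs in reverse nesting order, even if the body raises.
        log.append(f"stop {name}")


log = []
with step("fs volume", log):           # created first, removed last
    with step("cephfs-mirror", log):   # started after the volume exists
        log.append("thrash")
print(log)
```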
* refs/pull/48038/head:
client test: Add fsync to ll_preadv_pwritev test
libcephfs: Option to write + fsync via ceph_ll_nonblocking_readv_writev
Client: Hook nonblocking fsync into the write path of ll_preadv_pwritev
Client: Add non-blocking fsync
Client/Inode: wait_for_caps fixups
Client: change several waitfor_* to use Context list
test: Add nonblocking I/O client test
libcephfs: Add nonblocking readv/writev I/O interface
Client: Add ll_preadv_pwritev to expose non-blocking I/O to libcephfs
Client: Add non-blocking helper classes
Client: Break some code into new methods in prep for non-blocking I/O
Buffers: Add function to buffer.h to copy bufferlist to an iovec
ObjectCacher: Prepare file_write path for non-blocking I/O
Reviewed-by: Venky Shankar <vshankar@redhat.com> Reviewed-by: Adam C. Emerson <aemerson@redhat.com>
Frank S. Filz [Tue, 6 Sep 2022 18:44:43 +0000 (11:44 -0700)]
Client/Inode: wait_for_caps fixups
The non-blocking flush requires us to be able to re-add to
wait_for_caps but if we simply add to the list, we get stuck in an
infinite loop. Add a wait_for_caps_pending list to add to, and then
when done signalling, we move the wait_for_caps_pending items onto the
wait_for_caps list.
Also in handle_cap_flush_ack(), we need to complete the caps flushing
before signalling since with non-blocking flush, we will be actually
examining the caps from the completion rather than signalling a
condition variable in the completion.
Signed-off-by: Frank S. Filz <ffilzlnx@mindspring.com>
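A toy Python model of the pending-list idea above. `ToyInode`, `add_waiter` and `signal_caps` are hypothetical names, not the Client/Inode code; only the two list names follow the commit message. A waiter that re-adds itself during signalling lands on the pending list, so the signalling loop terminates.

```python
class ToyInode:
    def __init__(self):
        self.wait_for_caps = []
        self.wait_for_caps_pending = []
        self.signalling = False

    def add_waiter(self, callback):
        # While signalling is in progress, park new waiters on the pending
        # list so the loop in signal_caps() never sees them.
        if self.signalling:
            self.wait_for_caps_pending.append(callback)
        else:
            self.wait_for_caps.append(callback)

    def signal_caps(self):
        self.signalling = True
        while self.wait_for_caps:
            callback = self.wait_for_caps.pop(0)
            callback(self)  # may call add_waiter() again
        self.signalling = False
        # Done signalling: move re-added waiters over for the next round.
        self.wait_for_caps.extend(self.wait_for_caps_pending)
        self.wait_for_caps_pending.clear()


calls = []

def waiter(inode):
    calls.append("called")
    if len(calls) < 3:
        inode.add_waiter(waiter)  # re-add, as a non-blocking flush might

inode = ToyInode()
inode.add_waiter(waiter)
inode.signal_caps()
print(len(calls), len(inode.wait_for_caps))
```

Appending directly to `wait_for_caps` inside the loop would keep the list non-empty for as long as waiters re-add themselves, which is the infinite loop the commit avoids.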
Frank S. Filz [Wed, 29 Jun 2022 22:39:12 +0000 (15:39 -0700)]
Client: change several waitfor_* to use Context list
Change waitfor_caps, waitfor_safe and waitfor_commit to Context lists.
To make a non-blocking version of fsync (to be used for non-blocking write
and commit), we need to be able to signal an arbitrary Context on completion
of any of these lists.
add_nonblocking_onfinish_to_context_list adds such a Context to the list.
Signed-off-by: Frank S. Filz <ffilzlnx@mindspring.com>
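A rough Python sketch of why a list of completion objects is more flexible than a condition variable: each waiter can run arbitrary code when signalled. The `Context` class here is a simplified stand-in and `signal_context_list` is a hypothetical helper; neither is the Client code.

```python
class Context:
    """Simplified completion object that wraps a callback."""

    def __init__(self, fn):
        self.fn = fn

    def complete(self, result):
        self.fn(result)


def signal_context_list(contexts, result=0):
    """Complete every waiter currently on the list, then leave it empty."""
    # Take a snapshot and clear first, so a completion that registers a
    # new waiter does not get signalled in this same pass.
    finished = list(contexts)
    contexts.clear()
    for ctx in finished:
        ctx.complete(result)


seen = []
waitfor_commit = [
    Context(lambda r: seen.append(("blocking waiter woken", r))),
    Context(lambda r: seen.append(("non-blocking fsync finished", r))),
]
signal_context_list(waitfor_commit, 0)
print(seen)
```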
Frank S. Filz [Wed, 4 May 2022 20:35:44 +0000 (13:35 -0700)]
ObjectCacher: Prepare file_write path for non-blocking I/O
For non-blocking I/O, we will want to be able to override
block_writes_upfront. Rename the member to cfg_block_writes_upfront, add
an option to pass block_writes_upfront as a parameter, and add a member
access method so that callers can pass cfg_block_writes_upfront.
Signed-off-by: Frank S. Filz <ffilzlnx@mindspring.com>