Once we destruct SharedLRU, SharedLRU::weak_refs map is destroyed.
As a weak refernce might outlive the SharedLRU itself, when destroying
the object via the custom Deleter, we try to access the already
destroyed SharedLRU instance's weak ref map.
Instead, invalidate the custom Deleter (Deleter::cache), when
destructing the SharedLRU.
Zac Dover [Fri, 13 Dec 2024 06:12:49 +0000 (16:12 +1000)]
doc/cephfs: edit 3rd 3rd of mount-using-kernel-driver
Edit the third third of doc/cephfs/mount-using-kernel-driver.rst in
preparation for correcting mount commands that may not work in Reef as
described in this documentation.
This commit edits only English-language strings in
doc/cephfs/mount-using-kernel-driver.rst. No technical content (that is,
no commands and no settings) have been altered in this commit.
Technical alterations to this file will be made only after the English
is unambiguous.
This PR follows the following two PRs:
https://github.com/ceph/ceph/pull/61048 - 1st 3rd
https://github.com/ceph/ceph/pull/61049 - 2nd 3rd
Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com> Signed-off-by: Zac Dover <zac.dover@proton.me>
Ilya Dryomov [Thu, 12 Dec 2024 20:32:39 +0000 (21:32 +0100)]
librbd/migration/HttpClient: socket isn't shut down on some state transitions
If shut_down() gets delayed until a) the state transition from
STATE_RESET_CONNECTING completes and the reconnect is unsuccessful or
b) the state transition from STATE_RESET_DISCONNECTING completes (i.e.
next_state is STATE_UNINITIALIZED or STATE_RESET_CONNECTING), the
socket needs to be shut down before m_on_shutdown is invoked. The line
of thought here is the same as for the corresponding state transitions
that don't involve STATE_SHUTTING_DOWN.
Ilya Dryomov [Wed, 11 Dec 2024 15:25:13 +0000 (16:25 +0100)]
librbd/migration/HttpClient: avoid hitting an assert in advance_state()
If the shutdown gets delayed until the state transition from
STATE_RESET_CONNECTING completes and the reconnect is successful
(i.e. next_state is STATE_READY), we eventually hit "unexpected
state transition" assert in advance_state(). The reason is that
advance_state() would update m_state and call disconnect() under
STATE_READY instead of STATE_SHUTTING_DOWN. After the disconnect
maybe_finalize_shutdown() would enter advance_state() again with
STATE_SHUTDOWN as next_state, but the transition to that from
STATE_READY is invalid.
Plug this by not transitioning to next_state if current_state is
STATE_SHUTTING_DOWN.
Ilya Dryomov [Mon, 9 Dec 2024 10:19:57 +0000 (11:19 +0100)]
librbd/migration/HttpClient: ignore stream_truncated when shutting down SSL
Propagate ec to handle_disconnect() and use it to suppress
stream_truncated errors. Here is a quote from Beast documentation [1]:
// Gracefully shutdown the SSL/TLS connection
error_code ec;
stream.shutdown(ec);
// Non-compliant servers don't participate in the SSL/TLS shutdown process and
// close the underlying transport layer. This causes the shutdown operation to
// complete with a `stream_truncated` error. One might decide not to log such
// errors as there are many non-compliant servers in the wild.
if(ec != net::ssl::error::stream_truncated)
log(ec);
... and a commit that made ignoring stream_truncated safe [2]:
// ssl::error::stream_truncated, also known as an SSL "short read",
// indicates the peer closed the connection without performing the
// required closing handshake
// [...]
// When a short read would cut off the end of an HTTP message,
// Beast returns the error beast::http::error::partial_message.
// Therefore, if we see a short read here, it has occurred
// after the message has been completed, so it is safe to ignore it.
Ilya Dryomov [Sat, 7 Dec 2024 12:52:41 +0000 (13:52 +0100)]
librbd/migration/HttpClient: drop SslHttpSession::m_ssl_enabled
The remaining callers of disconnect() call it only when m_ssl_enabled
is set to true (i.e. after the handshake is completed):
- shut_down(), in STATE_READY
- maybe_finalize_reset(), very shortly after transitioning out of
STATE_READY as part of performing a reset
- advance_state(), on a transition to STATE_READY that is intercepted
by a previously delayed shut down
m_ssl_enabled isn't used outside of disconnect() and on top of that
is never cleared.
Ilya Dryomov [Sat, 7 Dec 2024 11:22:52 +0000 (12:22 +0100)]
librbd/migration/HttpClient: don't call disconnect() in handle_handshake()
With m_ssl_enabled set to false, disconnect() is a no-op. Since
m_ssl_enabled is flipped to true only when the handshake succeeds,
calling disconnect() on "failed to complete handshake" error is bogus
(as would be attempting to shut down SSL there).
Ilya Dryomov [Fri, 6 Dec 2024 15:51:51 +0000 (16:51 +0100)]
librbd/migration/HttpClient: avoid reusing ssl_stream after shut down
ssl_stream objects can't be reused after shut down: despite
a successful reconnect and handshake, any attempt to read or write
fails with "end of stream" (beast.http:1) or "protocol is shutdown"
(asio.ssl:337690831) error respectively. This doesn't appear to be
documented, but Beast and ASIO authors both mention that the stream
must be destroyed and recreated [1][2].
This was missed because the only integration test with a big enough
image used http instead of https.
Ilya Dryomov [Fri, 6 Dec 2024 13:42:55 +0000 (14:42 +0100)]
librbd/migration/HttpClient: don't shut down socket in resolve_host()
resolve_host() is called from init() and issue() when transitioning out
of STATE_UNINITIALIZED and from advance_state() right after the call to
shutdown_socket(). In all three cases the socket should get closed, so
drop the redundant call and place asserts in connect() implementations
instead.
Adam Kupczyk [Wed, 11 Dec 2024 17:33:53 +0000 (17:33 +0000)]
qa/suites/rados: Add ceph_test_bluefs
unittest_bluefs was difficult for jenkins make check.
On jenkins disable the most resource hungry tests.
Make test on teuthology that tests everything.
This change has 2 rationales:
1) The test outgrew initial unittest framework and now executes
component testing
2) We still need to run most of unittest_blues as part of jenkins make check
3) We want to run tests on teuthology. Build process excludes unit
tests, so ceph_test_bluefs was created.
Venky Shankar [Fri, 13 Dec 2024 07:54:05 +0000 (13:24 +0530)]
Merge PR #58376 into main
* refs/pull/58376/head:
Temporarily change the libcephfs dependencies
proxy: Add the design document
proxy: Add the proxy to the deb builds
proxy: Add the proxy to the rpm builds
Initial version of the libcephfs proxy
Hannes Baum [Wed, 6 Nov 2024 08:46:09 +0000 (09:46 +0100)]
mgr: fix subuser creation via dashboard
Subusers couldn't be created through the dashboard, because the get call was overwritten with Python magic due to it being the function under the HTTP call.
The get function was therefore split into an "external" and "internal" function, whereas one
can be used by functions without triggering the magic. Since the user object was then returned correctly, json.loads could be removed.
Signed-off-by: Hannes Baum <hannes.baum@cloudandheat.com>
Zac Dover [Wed, 4 Dec 2024 20:43:12 +0000 (21:43 +0100)]
doc/dev: instruct devs to backport
Add a note to doc/dec/development-workflow.rst that instructs developers
to do their own backports. This change was requested by Laura Flores on
04 Dec 2024.
Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com> Signed-off-by: Zac Dover <zac.dover@proton.me>
Zac Dover [Wed, 11 Dec 2024 21:17:40 +0000 (07:17 +1000)]
doc/cephfs: edit 2nd 3rd of mount-using-kernel-driver
Edit the second third of doc/cephfs/mount-using-kernel-driver.rst in
preparation for correcting mount commands that may not work in Reef as
described in this documentation.
This commit edits only English-language strings in
doc/cephfs/mount-using-kernel-driver.rst. No technical content (that is,
no commands and no settings) have been altered in this commit.
Technical alterations to this file will be made only after the English
is unambiguous.
Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com> Signed-off-by: Zac Dover <zac.dover@proton.me>
Zac Dover [Wed, 11 Dec 2024 14:15:14 +0000 (15:15 +0100)]
doc/cephfs: edit first 3rd of mount-using-kernel-driver
Edit the first third of doc/cephfs/mount-using-kernel-driver.rst in
preparation for correcting mount commands that may not work in Reef as
described in this documentation.
This commit is a cherry-pick from a branch that targeted the Reef
release branch. After some thought I realized that there was no reason
that the Engliish grammar shouldn't be clean in this branch too.
qa/rgw: force s3 java tests to run gradle on Java 8
Previously gradle would run using the default Java version. This looks
for Java 8 using `alternatives` and sets JAVA_HOME to the
corresponding directory prior to launching gradle.
Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
Aashish Sharma [Tue, 10 Dec 2024 07:03:47 +0000 (12:33 +0530)]
mgr/dashboard: Show correct token expiration date in Manage Clusters
page
Currently wrong date is being displayed if we hover of the Token Expires
filed in the Clusters list table in Manage Clusters page. This PR
intends to fix this issue.
Max Kellermann [Sat, 2 Nov 2024 21:32:23 +0000 (22:32 +0100)]
tools/ceph-dencoder/sstring.h: use `char8_t` instead of `unsigned char`
This fixes a build failure with libc++ (clang/LLVM). This build
failure is correct: there exists no specialization for
`std::char_traits<unsigned char>`. The standards-compliant way to use
unsigned chars in strings is to use `char8_t`.
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Yuval Lifshitz [Wed, 6 Dec 2023 18:51:59 +0000 (18:51 +0000)]
rgw/logging: add support for GetBucketLogging and PutBucketLogging
this is based on AWS server access logs:
- https://docs.aws.amazon.com/AmazonS3/latest/userguide/enable-server-access-logging.html
- https://docs.aws.amazon.com/AmazonS3/latest/userguide/ServerLogs.html
- https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutBucketLogging.html
- https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetBucketLogging.html
however, a new mode was added called "journal" where:
- logs of PUT, COPY and MPU are guaranteed
- we have logs of DELETE and multi-DELETE operations (not guaranteed)
- log records hold only minimal amount of information
lu.shasha [Fri, 6 Dec 2024 04:40:27 +0000 (12:40 +0800)]
rgw: shouldn't call index_op.cancel() when rados op return ETIMEDOUT
when rados op return ETIMEOUT, rgw can't determine whether or not the rados op succeeded,
we shouldn't be calling index_op->cancel() in this case
Instead, we should leave that pending entry in the index so than bucket listing can recover with check_disk_state() and cls_rgw_suggest_changes()
Aashish Sharma [Thu, 5 Dec 2024 05:37:13 +0000 (11:07 +0530)]
mgr/dashboard: Update and correct zonegroup delete notification
while deleting zone group from dashboard, notification message says "zone <zg_name> deleted successfully" instead of "zone group <zg_name> deleted successfully"
Aashish Sharma [Thu, 28 Nov 2024 05:58:59 +0000 (11:28 +0530)]
mgr/dashboard: Add ceph_daemon filter to rgw overview grafana panel
queries
Currently rgw_servers filtering is not working in RGW Overview garfana graphs.
It is showing data of all the RGW services, even though filter set to single service.
This PR intends to solve this issue
J. Eric Ivancich [Fri, 22 Nov 2024 17:40:24 +0000 (12:40 -0500)]
rgw: optimize bucket listing to skip past regions of namespaced entries
When listing a bucket and the parameters are such that we're not
listing namespaced entries, this commit adds an optimization to
advance the marker such that we skip past a whole region of namespaced
entries rather than evaluating each entry one-by-one.
Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>