In ProtocolV2::send_server_ident(), global_seq was fetched
from messenger->get_global_seq() and used in the ServerIdentFrame.
However, it was stored only in a local variable and not in the private
class member ProtocolV2::global_seq. This causes problems
where the receiving peer sees a peer_global_seq that
appears older than expected, triggering false-positive reconnect logic:
```
2025-07-15T11:40:50.927+0000 mon.c handle_existing_connection client has clearly restarted
(peer_global_seq < ex_peer_global_seq && cookie changed), dropping existing connection=0x563ffe9a9000 in favor of new one
```
In this case, mon.c received peer_global_seq=75, although mon.d had already logged gs=79 in
its send_server_ident(). Because ProtocolV2::global_seq was never updated, the state became inconsistent and the connection was torn down prematurely.
This commit fixes the issue by also assigning the freshly incremented messenger->get_global_seq() value to the ProtocolV2::global_seq member,
ensuring consistency in the protocol.
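The difference between the two behaviours can be illustrated with a minimal sketch. MessengerSketch and ProtocolSketch are hypothetical stand-ins, not the real AsyncMessenger or ProtocolV2 classes, and the simplified get_global_seq() here only assumes that each call returns a fresh, larger value.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the messenger: hands out increasing values.
struct MessengerSketch {
  std::uint64_t seq = 0;
  std::uint64_t get_global_seq() { return ++seq; }
};

// Hypothetical stand-in for the protocol object.
struct ProtocolSketch {
  MessengerSketch& messenger;
  std::uint64_t global_seq = 0;  // member consulted by later connection logic

  // Before the fix: the fresh value lives only in a local variable.
  std::uint64_t send_ident_before_fix() {
    std::uint64_t gs = messenger.get_global_seq();
    return gs;  // sent to the peer, but the member stays stale
  }

  // After the fix: the member is updated along with the value sent.
  std::uint64_t send_ident_after_fix() {
    std::uint64_t gs = messenger.get_global_seq();
    global_seq = gs;
    return gs;
  }
};

inline void demo_global_seq() {
  MessengerSketch m;
  ProtocolSketch p{m};
  std::uint64_t sent = p.send_ident_before_fix();
  assert(sent != p.global_seq);  // wire value and member disagree
  sent = p.send_ident_after_fix();
  assert(sent == p.global_seq);  // wire value and member agree
}
```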
src/mon/HealthMonitor: Add mon_netsplit_grace_period to suppress transient MON_NETSPLIT warnings
When a monitor is elected leader and begins evaluating connectivity,
it may detect temporary disconnections between monitors that have not
yet fully reconnected to each other—particularly after events like
monitor restarts, SIGSTOP/SIGCONT (as used in mon_thrash), or brief network blips.
This can result in false-positive MON_NETSPLIT health warnings that
quickly disappear within seconds as the cluster topology stabilizes.
This commit introduces a configurable option:
- mon_netsplit_grace_period (default: 9 seconds)
When the leader observes a netsplit between two monitors or locations,
it will wait for the grace period before raising a health warning.
If the split resolves within this window, no warning is emitted.
This reduces test flakiness and alert fatigue while preserving the
accuracy of persistent MON_NETSPLIT detection.
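The grace-period behaviour described above could be sketched roughly as follows. NetsplitGraceSketch and its methods are hypothetical names for illustration, not the HealthMonitor implementation. The sketch assumes the leader records when each split was first observed and warns only once the split has lasted at least the grace period.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>
#include <utility>

using Clock = std::chrono::steady_clock;
using MonPair = std::pair<std::string, std::string>;

// Hypothetical tracker: remembers when each split was first seen.
class NetsplitGraceSketch {
 public:
  explicit NetsplitGraceSketch(std::chrono::seconds grace) : grace_(grace) {}

  // Call whenever a split between the pair is observed.
  // Returns true once the split has persisted for the grace period.
  bool observe_split(const MonPair& pair, Clock::time_point now) {
    auto it = first_seen_.try_emplace(pair, now).first;
    return now - it->second >= grace_;
  }

  // Call when the pair is connected again; forgets the pending split.
  void resolve(const MonPair& pair) { first_seen_.erase(pair); }

 private:
  std::chrono::seconds grace_;
  std::map<MonPair, Clock::time_point> first_seen_;
};

inline void demo_grace() {
  NetsplitGraceSketch tracker{std::chrono::seconds(9)};
  const MonPair pair{"mon.a", "mon.b"};
  const Clock::time_point t0{};
  assert(!tracker.observe_split(pair, t0));                           // just seen
  assert(!tracker.observe_split(pair, t0 + std::chrono::seconds(5))); // in grace
  assert(tracker.observe_split(pair, t0 + std::chrono::seconds(9)));  // warn
}
```

A split that resolves inside the window is erased by resolve(), so a later split between the same pair starts a fresh grace period.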
src/msg/async: Improve logging and prefixes for global_seq
global_seq needs more visibility into how it gets updated,
so this commit adds more logging in AsyncMessenger::get_global_seq
and adds log prefixes for global_seq in both
ProtocolV1 and ProtocolV2.
qa/suites/rados: increase debug && msgr-failures/none && white list
Bump the mon debug level to 30 in RADOS
and bump debug_ms for mons in
rados/monthrash and rados/multimon.
Add a msgr-failures/none scenario to the multimon and monthrash suites.
This is a control scenario, where MON_NETSPLIT can only be generated
organically by an actual monitor network partition.
Whitelist the MON_NETSPLIT health warning in the msgr-failures cases (excluding none)
for both the multimon and monthrash suites. This is because every
msgr-failures variant other than `none` sets ms_inject_socket_failures,
which is not an organic cause of MON_NETSPLIT.
msg/async/ProtocolV2: Server drops existing connection when client restarts
When a client is restarted, it loses its state including global_seq and gets a new client_cookie. This creates an issue during reconnection because the server has an existing connection with a higher global_seq value, causing it to reject the new connection as "stale" with the error:
"this is a stale connection, peer_global_seq="
This commit adds detection logic in ProtocolV2::handle_existing_connection() that identifies client restarts by checking for:
1. peer_global_seq < exproto->peer_global_seq
The global sequence should only increase during a
session, so a decrease strongly indicates a restart.
2. client_cookie has changed (the client generated a new cookie)
When these conditions are met, the server now drops the existing connection and accepts the new one (by sending the server ident to the client, which accepts it, after which both sides are ready to exchange messages). This makes events such as a monitor restarting and rejoining the quorum faster, preventing the MON_NETSPLIT warning from popping up.
This allows clients to reconnect successfully after a restart without having to wait for the server-side callback handler to trigger (the server will also try to connect, and will succeed because it uses the reconnect path, since it holds the client's cookie) or for the client's global_seq to catch up to the server's.
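The two conditions can be expressed as a small predicate. This is a minimal sketch: ExistingConnSketch and looks_like_client_restart are hypothetical names, and the real check lives inside ProtocolV2::handle_existing_connection() alongside other cases that are not modelled here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical view of the state kept for the existing connection.
struct ExistingConnSketch {
  std::uint64_t peer_global_seq;
  std::uint64_t client_cookie;
};

// True when the incoming connection looks like a restarted client:
// its global_seq went backwards AND it presents a different cookie.
inline bool looks_like_client_restart(const ExistingConnSketch& existing,
                                      std::uint64_t peer_global_seq,
                                      std::uint64_t client_cookie) {
  const bool seq_went_backwards = peer_global_seq < existing.peer_global_seq;
  const bool cookie_changed = client_cookie != existing.client_cookie;
  return seq_went_backwards && cookie_changed;
}

inline void demo_restart_check() {
  const ExistingConnSketch existing{79, 0xAAAA};
  assert(looks_like_client_restart(existing, 75, 0xBBBB));   // restart
  assert(!looks_like_client_restart(existing, 75, 0xAAAA));  // same cookie
  assert(!looks_like_client_restart(existing, 80, 0xBBBB));  // seq advanced
}
```

Requiring both conditions keeps a lower global_seq with an unchanged cookie out of this path, so such a connection is not mistaken for a restart.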
Ville Ojamo [Wed, 16 Jul 2025 14:59:08 +0000 (21:59 +0700)]
doc/radosgw: Internal link and single-keystroke improvements
Use ref for hyperlink instead of abusing "external links" feature for
intra-docs link in cloud-transition.rst, add a label in
cloud-restore.rst for it.
Use auto-generated link text instead of manually adding that same
section title as link text in cloud-transition.rst.
One list item was missing a colon in s3_objects_dedup.rst.
Add space between number and units in s3_objects_dedup.rst.
Signed-off-by: Ville Ojamo <14869000+bluikko@users.noreply.github.com>
Ville Ojamo [Wed, 16 Jul 2025 08:21:20 +0000 (15:21 +0700)]
doc/radosgw: Small improvements in s3_objects_dedup.rst
Fix sentence that had "different same" to just "different" (verified the
right one from the original author).
Remove colon at the end of section titles.
Remove rendered horizontal lines between sections.
Use double backticks for command name.
Use regular apostrophe in one sentence to be consistent with the rest.
Add missing full stop at the end of several sentences.
Very small language improvements in a few sentences.
Use consistent indent in one line.
Remove hyphens from many word pairs and don't capitalize a few terms,
for consistency with the rest of the docs.
Fix typos "spliting" to "splitting", "underlined" to "underlying".
Spell out "thousands" instead of using an apostrophe after the number.
Reformat table to use row separators like rest of the docs instead of
empty columns.
Separate number and unit with a space. Remove rendered underscores that
seemed to be an attempt to imprecisely align cell contents to right.
Signed-off-by: Ville Ojamo <14869000+bluikko@users.noreply.github.com>
include/function2.hpp: avoid using std::aligned_storage_t
std::aligned_storage_t was deprecated in C++23. Now that we've switched
to C++23, let's address warnings like:
```
In file included from /mnt/igor/github/salieri11/ceph/src/osdc/Objecter.cc:15:
In file included from /mnt/igor/github/salieri11/ceph/src/osdc/Objecter.h:44:
/mnt/igor/github/salieri11/ceph/src/include/function2.hpp:962:10: error: 'aligned_storage_t' is deprecated [-Werror,-Wdeprecated-declarations]
962 | std::aligned_storage_t<Capacity> capacity_;
```
In this change, we
- update function2.hpp from upstream
- apply a fix that replaces std::aligned_storage_t with an alignas-based
equivalent implementation
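An alignas-based replacement for std::aligned_storage_t generally looks like the sketch below. RawStorageSketch is a hypothetical name and this is not the code in function2.hpp. The default alignment of alignof(std::max_align_t) is also an assumption, since the default used by std::aligned_storage_t is implementation-defined.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical replacement: a suitably aligned, uninitialized byte buffer.
template <std::size_t Capacity,
          std::size_t Align = alignof(std::max_align_t)>
struct RawStorageSketch {
  alignas(Align) std::byte data[Capacity];
};

// Construct an int inside the buffer with placement new and read it back.
inline int round_trip(int value) {
  RawStorageSketch<sizeof(int), alignof(int)> buf;
  int* p = ::new (static_cast<void*>(buf.data)) int(value);
  return *p;  // int is trivially destructible, so no destructor call needed
}

inline void demo_storage() {
  assert(alignof(RawStorageSketch<16>) == alignof(std::max_align_t));
  assert(sizeof(RawStorageSketch<16>) >= 16);
  assert(round_trip(42) == 42);
}
```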
rgw: pass list_parts_each_t function by lvalue reference
list_parts_each_t is an alias of
`fu2::unique_function<int(const Part&) const>`, which is a non-copyable
function, so in theory we cannot copy it. In recent versions of
function2, unique_function is no longer copyable, so if we bump up the
vendored function2.hpp, the build breaks.
So, in this change, we change the virtual function
`Object::list_parts()` from taking `list_parts_each_t` by value
to taking it by rvalue reference, so that we
don't need to copy this non-copyable function. This allows us to
keep in sync with upstream function2 and to be semantically correct
regarding the uniqueness of the functor.
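The following sketch shows the general shape of handing a move-only callable to a function through an rvalue reference, as the message body describes. UniqueEachFn is a hypothetical stand-in for fu2::unique_function, written here only to avoid depending on function2. Part and list_parts_sketch are likewise illustrative and do not reproduce the real Object::list_parts() signature.

```cpp
#include <cassert>
#include <functional>
#include <type_traits>
#include <utility>
#include <vector>

struct Part { int num; };  // hypothetical, simplified part record

// Hypothetical move-only callable: movable, callable, not copyable.
class UniqueEachFn {
 public:
  explicit UniqueEachFn(std::function<int(const Part&)> f) : f_(std::move(f)) {}
  UniqueEachFn(UniqueEachFn&&) = default;
  UniqueEachFn& operator=(UniqueEachFn&&) = default;
  UniqueEachFn(const UniqueEachFn&) = delete;
  UniqueEachFn& operator=(const UniqueEachFn&) = delete;
  int operator()(const Part& p) const { return f_(p); }

 private:
  std::function<int(const Part&)> f_;
};

// Taking the callable by rvalue reference means no copy is ever attempted.
// Iteration stops at the first negative return value.
inline int list_parts_sketch(const std::vector<Part>& parts,
                             UniqueEachFn&& each) {
  for (const auto& p : parts) {
    if (int r = each(p); r < 0) return r;
  }
  return 0;
}

inline void demo_list_parts() {
  static_assert(!std::is_copy_constructible<UniqueEachFn>::value,
                "the callable must not be copyable");
  int sum = 0;
  const std::vector<Part> parts{{1}, {2}, {3}};
  UniqueEachFn add{[&sum](const Part& p) { sum += p.num; return 0; }};
  assert(list_parts_sketch(parts, std::move(add)) == 0);
  assert(sum == 6);
}
```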
Dnyaneshwari [Thu, 22 May 2025 07:08:25 +0000 (12:38 +0530)]
mgr/dashboard: Glacier Storage Class - create and list
Fixes: https://tracker.ceph.com/issues/71897
Signed-off-by: Dnyaneshwari Talwekar <dtalwekar@redhat.com>
Kefu Chai [Wed, 21 May 2025 03:38:47 +0000 (11:38 +0800)]
osd: migrate from boost::variant to std::variant
Replace boost::variant with std::variant throughout the OSD-related
codebase. This change reduces third-party dependencies by leveraging
the C++ standard library alternative.
Changes:
- common/inline_variant.h: Replace the existing match() helper with a
wrapper around std::visit. The previous implementation constructed a
visitor class from given functions; the new implementation provides
equivalent functionality using standard library primitives.
- osd/osd_types.h: Add templated operator<< overload for std::variant.
Since boost::variant provided a built-in operator<< that we relied on,
and std::variant does not include this functionality, we implement our
own formatter. To avoid ambiguous overload resolution (where types
implicitly convertible to variant alternatives could match both the
variant formatter and their native formatters), the template requires
at least one alternative type parameter.
This migration maintains existing functionality while eliminating the
boost::variant dependency.
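The two changes listed above could be sketched as follows. This is an illustration under assumptions, not the contents of common/inline_variant.h or osd/osd_types.h: match_sketch and the overloaded helper are hypothetical names, and the stream operator simply forwards to each alternative's own operator<<.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <variant>

// Combine several callables into one visitor (common C++17 idiom).
template <typename... Fs>
struct overloaded : Fs... { using Fs::operator()...; };
template <typename... Fs>
overloaded(Fs...) -> overloaded<Fs...>;

// Hypothetical match(): a thin wrapper around std::visit.
template <typename Variant, typename... Fs>
decltype(auto) match_sketch(Variant&& v, Fs&&... fs) {
  return std::visit(overloaded{std::forward<Fs>(fs)...},
                    std::forward<Variant>(v));
}

// Stream any std::variant by streaming its active alternative.
// The template names at least one alternative type (First).
template <typename First, typename... Rest>
std::ostream& operator<<(std::ostream& os,
                         const std::variant<First, Rest...>& v) {
  std::visit([&os](const auto& alt) { os << alt; }, v);
  return os;
}

inline void demo_variant() {
  std::variant<int, std::string> v = 7;
  std::string kind = match_sketch(
      v,
      [](int) { return std::string("int"); },
      [](const std::string&) { return std::string("string"); });
  assert(kind == "int");

  std::ostringstream os;
  os << v;
  assert(os.str() == "7");
}
```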
Remove unused header includes from librados_c.cc to reduce unnecessary
dependencies. This cleanup was initially motivated by removing unused
linkage of cls_lock_client, but expanded to address all unused includes
in the file.