Matan Breizman [Mon, 19 Feb 2024 12:24:52 +0000 (12:24 +0000)]
common/buffer_seastar: fix alien threads memory
The underlying raw_seastar_foreign_ptr::ptr is allocated from seastar.
This ptr is wrapped with seastar::foreign_ptr:
```
/// \c foreign_ptr<> wraps smart pointers -- \ref seastar::shared_ptr<>,
/// or similar, and remembers on what core this happened.
/// When the \c foreign_ptr<> object is destroyed, it sends a message to
/// the original core so that the wrapped object can be safely destroyed.
```
The issue is that once the pointer is deallocated from an alien (non-seastar) thread,
it is unable to send a message to the original core.
Fix this issue by making use of seastar::alien's integration with non-seastar applications:
if ~raw_seastar_foreign_ptr() is called from an alien thread, we submit *and wait*
for the memory to be released on the origin core.
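A minimal sketch of that approach, not the Ceph code itself; it assumes seastar's alien API from <seastar/core/alien.hh> (the variant taking an alien::instance) and that the origin shard id was recorded at allocation time:
```
#include <seastar/core/alien.hh>
#include <seastar/core/future.hh>
#include <future>

// Free a seastar-allocated object from a non-seastar ("alien") thread by
// submitting the deletion to the origin core and blocking until it runs.
template <typename T>
void alien_destroy(seastar::alien::instance& inst,
                   unsigned origin_shard, T* ptr) {
  std::future<void> done =
      seastar::alien::submit_to(inst, origin_shard, [ptr] {
        delete ptr;  // executes on the origin reactor thread
        return seastar::make_ready_future<>();
      });
  done.wait();  // *and wait*: block until the origin core has freed it
}
```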
Zac Dover [Thu, 2 May 2024 08:36:34 +0000 (18:36 +1000)]
doc/cephadm: Squid default images procedure
Address Adam King's request for version-specific
cephadm container-image-retrieval procedures, made here: https://github.com/ceph/ceph/pull/57208#discussion_r1586614140
Co-authored-by: Adam King <adking@redhat.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>
Add a list of default monitor images to the documentation. This commit
responds to a request from Eugen Block and uses the information
developed by Mr Block here:
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/QGC66QIFBKRTPZAQMQEYFXOGZJ7RLWBN/.
Explain that an error message received in response to
"redirect_resolve_ip_addr True" might be caused by having an
insufficiently recent release of Ceph running in your cluster.
Casey Bodley [Fri, 22 Mar 2024 14:23:31 +0000 (10:23 -0400)]
rgw: increase default metadata cache size for accounts
account users will put some extra pressure on the metadata cache,
because each request has to load metadata for the account and zero
or more groups, in addition to the user's access key and user metadata.
Adam King [Tue, 23 Apr 2024 16:04:39 +0000 (12:04 -0400)]
doc/cephadm: remove downgrade reference from upgrade docs
This has been in here for years, but cephadm will block
attempted upgrades to lower versions and we generally
don't want people to think this is supported or safe.
Adam King [Tue, 9 Apr 2024 16:19:06 +0000 (12:19 -0400)]
mgr/cephadm: make enable_monitor_client configurable for nvmeof
Currently, the mon client work is not merged on main, but our
default nvmeof container will attempt to make use of it by default,
causing it to crash. This makes it configurable and defaults the
behavior to false. That can be changed once the work is actually
present in main.
Remove references to dual-stack mode in
doc/rados/configuration/network-config-ref.rst and
doc/rados/configuration/msgr2.rst. This feature seems to have been
planned but never completely implemented.
See the tracker issue listed below for an email exchange detailing the
confusion caused by the presence in the documentation of this
now-removed information.
rgw: apply default quota config on account creation
add new default quota config options for accounts analogous to
rgw_user_default_quota_max_objects/size. apply the default bucket quota
config options as-is
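A minimal sketch, not the actual rgw code, of how such defaults might be seeded at account creation; the account_default_quota_* names and the QuotaInfo shape are assumptions modeled on the existing rgw_user_default_quota_max_objects/size options:
```
#include <cstdint>

struct QuotaInfo {
  int64_t max_size = -1;     // bytes; -1 means unlimited
  int64_t max_objects = -1;  // -1 means unlimited
  bool enabled = false;
};

// Assumed config shape; the real options live in rgw's conf machinery.
struct Conf {
  int64_t account_default_quota_max_size = -1;
  int64_t account_default_quota_max_objects = -1;
};

// Called once when a new account is created.
QuotaInfo make_default_account_quota(const Conf& conf) {
  QuotaInfo q;
  q.max_size = conf.account_default_quota_max_size;
  q.max_objects = conf.account_default_quota_max_objects;
  // Quota is only enforced if at least one limit was configured.
  q.enabled = (q.max_size >= 0 || q.max_objects >= 0);
  return q;
}
```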
Nizamudeen A [Tue, 26 Sep 2023 16:08:51 +0000 (21:38 +0530)]
mgr/dashboard: start using alertmanager v2
I was looking into sorting the alerts and saw that alertmanager has a
v2 API, which includes an endpoint like `alerts/groups` that might be
useful for us.
Pierre Riteau [Mon, 22 Apr 2024 09:28:53 +0000 (11:28 +0200)]
doc/rados: fix outdated value for ms_bind_port_max
The highest port number used by OSD or MDS daemons was increased from
7300 to 7568 in [1] but the documentation still refers to 7300 in
multiple locations.
[1] https://github.com/ceph/ceph/pull/42210
Fixes: https://tracker.ceph.com/issues/65609
Signed-off-by: Pierre Riteau <pierre@stackhpc.com>
(cherry picked from commit 23d2740241af2118652fef6e7d6a286f338a18f2)
Incorporate the material in /doc/rados/operations/pg-repair into
/doc/rados/troubleshooting/troubleshooting-pg. Remove
/doc/rados/operations/pg-repair from the documentation. Redirect all
links to the old location to the new location.
Replace the ".. graphviz" directive with an ".. image" directive that
correctly displays an image where previously an unusably zoomed-in image
appeared.
Rishabh Dave [Thu, 18 Apr 2024 08:59:15 +0000 (14:29 +0530)]
qa/vstart_runner: increase timeout for vstart.sh command
Since the timeout bug was fixed (https://tracker.ceph.com/issues/65533),
"Ceph API tests" sometimes fails because the vstart.sh command had to be
aborted due to a timeout.
Currently, "timeout" is set to 300 seconds, which is sometimes not
enough for vstart.sh to run successfully for the "Ceph API tests" CI
job, although 180 seconds usually suffices for vstart.sh when it is run
for CephFS.
Increase the value of "timeout" to avoid such failures in the "Ceph API tests" CI.
luo rixin [Tue, 16 Apr 2024 07:18:06 +0000 (15:18 +0800)]
install-deps: save and restore user's XDG_CACHE_HOME
Since ccache 4.0, ccache uses $XDG_CACHE_HOME/ccache to keep the compile
cache if XDG_CACHE_HOME is set. Here $XDG_CACHE_HOME is overwritten, so
ccache stores the compile cache in $XDG_CACHE_HOME/ccache (creating the
directory if it does not exist), but $XDG_CACHE_HOME is removed on the next
run, so the ccache contents are always lost. So save and restore the user's XDG_CACHE_HOME.
Fixes: https://tracker.ceph.com/issues/65175
Signed-off-by: luo rixin <luorixin@huawei.com>
(cherry picked from commit a17342147d4411211ecf646730987d2633dabb6e)
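The actual fix lives in install-deps.sh (a bash script); the following is only a C++ rendering of the same save-and-restore pattern, with an assumed temporary cache path:
```
#include <cstdlib>   // getenv; setenv/unsetenv are POSIX
#include <optional>
#include <string>

int main() {
  // Save the user's value (if any) before the script overrides it.
  std::optional<std::string> saved;
  if (const char* v = std::getenv("XDG_CACHE_HOME")) {
    saved = v;
  }

  // The build uses its own temporary cache location (assumed path).
  setenv("XDG_CACHE_HOME", "/tmp/install-deps-cache", /*overwrite=*/1);
  // ... run the steps that needed the override ...

  // Restore, so ccache keeps finding $XDG_CACHE_HOME/ccache next run.
  if (saved) {
    setenv("XDG_CACHE_HOME", saved->c_str(), 1);
  } else {
    unsetenv("XDG_CACHE_HOME");
  }
  return 0;
}
```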
Instruct readers to use "mkdir /mnt/cephfs1" to create a mountpoint
before using "ceph-fuse" to mount a filesystem, if "/mnt/cephfs1"
doesn't already exist. cf.
https://github.com/ceph/ceph/pull/56831#discussion_r1561102227
Matt Benjamin [Wed, 27 Mar 2024 22:33:56 +0000 (18:33 -0400)]
rgwlc: check for no-bucket at bucket_lc_process() preamble
Avoids a trivial segfault when dereferencing the bucket pointer.
Fixes: https://tracker.ceph.com/issues/65188
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
(cherry picked from commit d5f6fe772f83d9e6b1ebaafdb1e8274041b0d684)
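A minimal sketch, with simplified stand-in types, of the preamble guard described above:
```
#include <cerrno>
#include <iostream>
#include <memory>

struct Bucket { /* bucket state elided */ };

// Preamble guard: refuse to run lifecycle processing on a null bucket
// (e.g. the bucket was deleted between scheduling and processing),
// instead of segfaulting on the first dereference later on.
int bucket_lc_process(std::unique_ptr<Bucket> bucket) {
  if (!bucket) {
    std::cerr << "bucket_lc_process: no bucket, skipping entry\n";
    return -ENOENT;
  }
  // ... from here on, *bucket can be dereferenced safely ...
  return 0;
}
```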
Tobias Urdin [Thu, 18 Jan 2024 09:29:05 +0000 (09:29 +0000)]
rgw: invalidate and retry keystone admin token
We validate client tokens against the Keystone API by
sending our own "admin token" that is allowed to lookup
client tokens.
This "admin token" is cached and upon checking the cache
we verify the expiration on the token before using it but
we have no logic to invalidate the cache if the response
from the Keystone API says that the "admin token" is invalid.
Since we don't invalidate it and it still has not expired
it will stay in our cache and continue to cause Swift API
requests for clients to be dropped because of the invalid
admin token, until service is restarted, admin token is
expired (which it can already be) or until
the whole cache is dropped or TokenCache::invalidate()
called on the admin token.
There are probably multiple places in Keystone where it
invalidates tokens, but one example where the "admin token"
would be invalidated, causing an HTTP 401 status code, is when
the user configured in rgw_keystone_admin_user has
its password changed (even if it is changed to the same password
as the current one): Keystone will then invalidate its cache and
invalidate existing tokens even if they have not expired yet.
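A minimal sketch of the invalidate-and-retry flow, with hypothetical helper names standing in for rgw's real Keystone plumbing:
```
#include <string>

// Hypothetical stubs standing in for rgw's Keystone plumbing.
std::string get_admin_token(bool force_refresh) {
  return force_refresh ? "fresh-token" : "cached-token";
}
int validate_client_token(const std::string& admin_token,
                          const std::string& /*client_token*/) {
  // Simulate Keystone rejecting a stale cached admin token with 401.
  return admin_token == "cached-token" ? 401 : 200;
}
void invalidate_cached_admin_token() { /* drop entry from TokenCache */ }

// Validate a client token; on 401, assume our cached admin token went
// stale, invalidate it, and retry exactly once with a fresh one.
int check_token_with_retry(const std::string& client_token) {
  int status = validate_client_token(get_admin_token(false), client_token);
  if (status == 401) {
    invalidate_cached_admin_token();
    status = validate_client_token(get_admin_token(true), client_token);
  }
  return status;
}
```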
test_multi.py:test_object_sync is updated to reproduce the issue.
Without the fix, objects "." and ".." are not replicated and the test
fails (times out).
The function is typically invoked on client errors like NoSuchBucket. Logging these errors with level 1 may initially suggest a significant issue, when in fact it's just a client error. Consider raising the logging level to 20 for better clarity.
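As a fragment rather than a standalone program, and assuming Ceph's ldpp_dout logging macro with a DoutPrefixProvider* dpp available (as in most rgw code paths), the suggestion might look like:
```
if (ret == -ENOENT) {
  // Client error (e.g. NoSuchBucket): visible only with verbose
  // logging such as "debug rgw = 20", so it no longer reads as a
  // daemon-side problem.
  ldpp_dout(dpp, 20) << "bucket does not exist: " << bucket_name << dendl;
} else if (ret < 0) {
  // Unexpected failures stay at level 1 so they remain visible.
  ldpp_dout(dpp, 1) << "unexpected error: ret=" << ret << dendl;
}
```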
Adam King [Wed, 3 Apr 2024 17:11:08 +0000 (13:11 -0400)]
mgr/cephadm: pass daemon's current image when reconfiguring
Important to note here is that a reconfig will rewrite
the config files of the daemon and restart it, but not
rewrite the unit files. This led to a bug where the
grafana image we used between the quincy and squid release
used different UIDs internally, which caused us to rewrite
the config files as owned by a UID that worked with the
new image but did not work with the image still specified
in the unit.run file. This meant the grafana daemon was down
from a bit after the mgr upgrade until the end
of the upgrade when we redeploy all the monitoring images.