client: prohibit unprivileged users from setting sgid/suid bits
Prior to fb1b72d, unprivileged users could add mode bits as long as
S_ISUID and S_ISGID were not included in the change.
After fb1b72d, unprivileged users were allowed to modify S_ISUID and
S_ISGID bits only when no other mode bits were changed in the same
operation. This inadvertently permitted unprivileged users to set
S_ISUID and/or S_ISGID bits when they were the sole bits being modified.
This behavior should not be allowed. Unprivileged users should be
prohibited from setting S_ISUID and/or S_ISGID bits under any
circumstances.
This change tightens the permission check to prevent unprivileged
users from setting these privileged bits in all cases.
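The tightened policy can be modelled with a minimal Python sketch. It is not the client code itself: `may_change_mode` and its parameters are hypothetical names, and the choice to still let unprivileged users clear the bits, or keep bits that are already set, is an assumption.

```python
import stat

# The two mode bits that only privileged users may set.
PRIVILEGED_BITS = stat.S_ISUID | stat.S_ISGID


def may_change_mode(old_mode: int, new_mode: int, privileged: bool) -> bool:
    """Return True if the mode change is permitted (hypothetical helper)."""
    if privileged:
        return True
    # Bits that are set in new_mode but were not set in old_mode.
    newly_set = new_mode & ~old_mode
    # Assumption: only *setting* S_ISUID/S_ISGID is refused; clearing them
    # or leaving them untouched is still allowed.
    return (newly_set & PRIVILEGED_BITS) == 0
```

Under this model, a change from 0o755 to 0o4755 is refused for an unprivileged user even though S_ISUID is the only bit being modified.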
Dan Mick [Sun, 19 Oct 2025 00:45:31 +0000 (17:45 -0700)]
install-deps.sh: install proper compiler version on Debian/Ubuntu
This code used to run in a pbuilder hook (because it needed to run
inside the build environment chroot). When building in a container,
you also want the right compiler installed.
This is necessary at least to build reef on ubuntu focal.
John Mulligan [Mon, 20 Oct 2025 19:04:49 +0000 (15:04 -0400)]
script/build-with-container: optionally source WITH_CRIMSON from env file
Add support for optionally sourcing WITH_CRIMSON from the env file that
can be passed to BWC on the command line. When auto-detecting the
crimson variant we previously only looked at the BWC process's
environment. After speaking with Zack we determined that the Jenkinsfile
only writes the WITH_CRIMSON param into the env file, so we add support
to "peek" in the env file for the WITH_CRIMSON variable.
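A rough sketch of the "peek" idea follows. The helper names are hypothetical, the env file is assumed to hold plain KEY=VALUE lines, and letting the process environment win over the env file is an assumption rather than something stated here.

```python
import os
from typing import Optional


def peek_env_file(path: str, name: str) -> Optional[str]:
    """Return the value of `name` from a KEY=VALUE env file, if present."""
    try:
        with open(path, encoding="utf-8") as fh:
            for raw in fh:
                line = raw.strip()
                # Skip blanks, comments and lines that are not assignments.
                if not line or line.startswith("#") or "=" not in line:
                    continue
                key, _, value = line.partition("=")
                if key.strip() == name:
                    return value.strip().strip("'\"")
    except OSError:
        # A missing or unreadable env file means there is nothing to peek at.
        return None
    return None


def with_crimson(env_file: Optional[str] = None) -> Optional[str]:
    """Look in the process environment first, then peek in the env file."""
    value = os.environ.get("WITH_CRIMSON")
    if value is None and env_file:
        value = peek_env_file(env_file, "WITH_CRIMSON")
    return value
```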
John Mulligan [Sat, 18 Oct 2025 00:05:09 +0000 (20:05 -0400)]
script/build-with-container: add more detailed variants
Create two new variants: 'packages.minimal' and 'packages.crimson'.
The first disables test deps (make check) and crimson deps.
The second only disables test deps and explicitly enables crimson deps.
The existing 'packages' variant now tries to determine if it should
switch to 'packages.minimal' or 'packages.crimson' by checking for
the same env var that install-deps.sh did (WITH_CRIMSON).
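The variant selection could look roughly like the sketch below. `resolve_variant` is a hypothetical name, the set of values treated as "enabled" is an assumption, and so is falling back to 'packages.minimal' when WITH_CRIMSON is unset.

```python
from typing import Mapping


def resolve_variant(requested: str, environ: Mapping[str, str]) -> str:
    """Map the generic 'packages' variant onto a more detailed one."""
    if requested != "packages":
        # Explicit variants (including 'packages.minimal' and
        # 'packages.crimson') are passed through unchanged.
        return requested
    value = environ.get("WITH_CRIMSON", "").strip().lower()
    # Assumption: these spellings mean "crimson is wanted".
    if value in ("true", "1", "yes"):
        return "packages.crimson"
    return "packages.minimal"
```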
John Mulligan [Thu, 2 Oct 2025 17:56:28 +0000 (13:56 -0400)]
Dockerfile.build: improve docker compatibility
Try to fix:
```
Step 6/18 : COPY ceph.spec.in do_cmake.sh install-deps.sh run-make-check.sh src/script/buildcontainer-setup.sh ${CEPH_CTR_SRC}
When using COPY with more than one source file, the destination must be a directory and end with a /
```
Allow the user to control the content of the build image with a
high-level `--image-variant=` switch. Currently the supported values are
`default` (the same maximal image we have been generating) and
`packages`, a slimmer image that avoids installing certain test-only
dependencies.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Mon, 13 Oct 2025 20:23:10 +0000 (16:23 -0400)]
install-deps.sh: let FOR_MAKE_CHECK variable take precedence
Previously, the FOR_MAKE_CHECK variable could only enable installing
extra (test) dependencies when install-deps.sh was used and it was
ignored if `tty -s` exited true. This change allows FOR_MAKE_CHECK to
take precedence over the tty check and to specify one of true, 1, yes to
enable extra "for make check" deps or false, 0, no to explicitly disable
the extra deps.
Based-on-work-by: Dan Mick <dan.mick@redhat.com>
Signed-off-by: John Mulligan <jmulligan@redhat.com>
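The precedence rule can be illustrated with a small Python model of the shell logic. This is a sketch, not the script itself: `want_make_check_deps` is a hypothetical name, the outcome of the tty check is passed in as `tty_default`, and treating values case-insensitively and ignoring unrecognized values are assumptions.

```python
from typing import Optional

TRUE_VALUES = {"true", "1", "yes"}
FALSE_VALUES = {"false", "0", "no"}


def want_make_check_deps(for_make_check: Optional[str], tty_default: bool) -> bool:
    """Decide whether to install the extra "for make check" deps."""
    if for_make_check is not None:
        value = for_make_check.strip().lower()
        if value in TRUE_VALUES:
            return True
        if value in FALSE_VALUES:
            return False
    # Variable unset (or, by assumption, unrecognized): fall back to
    # whatever the tty check decided.
    return tty_default
```

With this shape, FOR_MAKE_CHECK=no disables the extra deps even when the tty check alone would have enabled them.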
John Mulligan [Wed, 8 Oct 2025 20:41:36 +0000 (16:41 -0400)]
script/build-with-container: improve error handling for invalid distros
Instead of throwing a long obnoxious traceback at the user if the value
supplied to -d/--distro is invalid, do something nicer. For example:
```
$ ./src/script/build-with-container.py -d trixy -e build
usage: build-with-container.py [-h] [--help-build-steps]
build-with-container.py: error: argument --distro/-d: unknown distro: 'trixy' not in centos10, centos10stream, centos8, centos9, centos9stream, rocky9, rockylinux9, rocky10, rockylinux10, fedora41, fc41, fedora42, fc42, fedora43, fc43, ubuntu20.04, ubuntu-focal, focal, ubuntu22.04, ubuntu-jammy, jammy, ubuntu24.04, ubuntu-noble, noble, debian12, debian-bookworm, bookworm, debian13, debian-trixie, trixie
```
John Mulligan [Wed, 8 Oct 2025 14:23:25 +0000 (10:23 -0400)]
script/build-with-container: be consistent with naming in distro kinds
Update the DistroKind enum and related items so that the naming is
applied consistently. That is: the canonical (no pun intended) form
of the name is "<name><version>" and codenames, such as "jammy" or
"bookworm" are aliases. This matches the previously existing code.
John Mulligan [Thu, 28 Aug 2025 23:39:06 +0000 (19:39 -0400)]
build-with-container: ensure npm dir is set up before configure
When the npm cache path option is passed, the npm cache dir is passed
to all container `run` commands. Ensure the dir has been created
before the first container command (configure) is used.
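A minimal sketch of the idea, assuming the cache path arrives as an optional string; `ensure_npm_cache_dir` is a hypothetical name, not the function used in the script.

```python
from pathlib import Path
from typing import Optional


def ensure_npm_cache_dir(npm_cache_path: Optional[str]) -> Optional[Path]:
    """Create the npm cache dir (if one was requested) and return it."""
    if not npm_cache_path:
        # Option not passed: nothing to create, nothing to mount.
        return None
    path = Path(npm_cache_path)
    # Idempotent, so it is safe to call before every container command.
    path.mkdir(parents=True, exist_ok=True)
    return path
```

Calling this once before the configure step means every later container `run` command finds an existing directory.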
John Mulligan [Sat, 15 Mar 2025 16:44:00 +0000 (12:44 -0400)]
install-deps: extract SUDO variable logic into a reusable function
While the function is pretty simple and could be copy-pasted I
prefer to extract things into functions to indicate that the
logic is used/repeated elsewhere to ward off making changes to
one copy vs the other.
Nitzan Mordechai [Tue, 10 Dec 2024 09:04:34 +0000 (09:04 +0000)]
msg/async: race condition between reset_recv_state and shutdown_connections
When shutting down monitors while valgrind is involved, we can
sometimes hit a race condition and locks that cause the shutdown
process to hang for a long time.
reset_recv_state issues a message without proper locks, which
causes the shutdown to hang during shutdown_connections (draining the network).
Rishabh Dave [Fri, 11 Oct 2024 19:03:29 +0000 (00:33 +0530)]
qa/cephfs: extend wait for trash empty
The trash directory for a volume is not created by default. If
_wait_for_trash_empty() in test_volumes.py finds that the trash
directory is absent, return true.
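The behaviour can be sketched against a local directory. This is a standalone illustration, not the test helper itself: the real method works through the test mount, and the function name, timeout and interval below are hypothetical.

```python
import os
import time


def wait_for_trash_empty(trash_dir: str, timeout: float = 30.0,
                         interval: float = 1.0) -> bool:
    """Poll until the trash dir is empty; an absent dir counts as empty."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            entries = os.listdir(trash_dir)
        except FileNotFoundError:
            # The trash directory was never created, so nothing to purge.
            return True
        if not entries:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```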
Rishabh Dave [Sat, 6 Jan 2024 14:42:31 +0000 (20:12 +0530)]
qa/cephfs: add tests for config option pause_purging
Setting MGR config option mgr/volumes/pause_purging to true halts
all ongoing purges and allows no new purging to begin until this option
is changed to false. Add tests for this.
Conflicts:
qa/tasks/cephfs/test_volumes.py
- First conflict occurred due to the import of safe_while being missing
in the Reef branch compared to the main branch. Along with resolving this
conflict, it has been imported as it is used by the tests.
- Second conflict occurred due to the absence of some test methods right
before where TestPausePurging was to be added.
- Third conflict occurred because the entire contextutil was imported instead
of just safe_while and only CommandFailedError was imported from
teuthology.exceptions while this commit imports MaxWhileTries too.
Rishabh Dave [Fri, 12 Jan 2024 10:28:41 +0000 (15:58 +0530)]
qa/cephfs: don't strip any whitespace for get_shell_stdout
Whitespace is not removed from the end of the stdout returned by the
method get_ceph_cmd_stdout(). Follow the same policy here since it is
better to not do so (this whitespace can be useful, when copying Ceph
auth keyrings from stdout to a file) and also for sake of uniformity of
interfaces.
Conflicts:
qa/tasks/cephfs/mount.py
- Conflict occurred for 2 reasons -
- One, the method get_shell_stdout() is absent on the Reef branch but not in
main, so this patch, which modifies it, will obviously run
into a conflict
- Two, run_shell_payload() lies right next to get_shell_stdout() in
the main branch and its definition is quite different, leading to
a conflict again.
Rishabh Dave [Tue, 3 Sep 2024 10:01:07 +0000 (15:31 +0530)]
mgr/vol: add pause/resume mechanism for async jobs
Add a mechanism that allows pausing/resuming of the entire async job
machinery that queues, launches and picks the next async job, for both
kinds of async jobs: clones as well as purges.
Then add the mgr/vol config options pause_purging and pause_cloning so
that both of these async jobs can be paused and resumed individually.
Fixes: https://tracker.ceph.com/issues/61903
Fixes: https://tracker.ceph.com/issues/68630
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 01d37d5e1ba0e250e9d3a5f28ec7f3fa3597c63f)
Conflicts:
src/pybind/mgr/volumes/module.py
- Code where the patch was to be applied was slightly different
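As a simplified illustration of per-kind pausing (not the mgr/volumes implementation), each kind of async job can own a queue with its own paused flag. `PausableJobQueue` is a hypothetical class, and it models only the "pick next job" step: queueing is still accepted while paused.

```python
import threading
from collections import deque


class PausableJobQueue:
    """Job queue whose consumer side can be paused and resumed."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._jobs = deque()
        self._paused = False

    def set_paused(self, paused: bool) -> None:
        # In this sketch the flag would be driven by a config option such
        # as pause_purging or pause_cloning.
        with self._lock:
            self._paused = paused

    def enqueue(self, job) -> None:
        with self._lock:
            self._jobs.append(job)

    def next_job(self):
        """Return the next job, or None if paused or empty."""
        with self._lock:
            if self._paused or not self._jobs:
                return None
            return self._jobs.popleft()
```

Using one instance for purges and another for clones lets either be paused without affecting the other.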
Adam King [Mon, 22 Sep 2025 21:05:07 +0000 (17:05 -0400)]
pybind/mgr: pin cheroot version in requirements-required.txt
With python 3.10 (didn't seem to happen with python 3.12) the
pybind/mgr/cephadm/tests/test_node_proxy.py test times out.
This appears to be related to a new release of the cheroot
package, and a GitHub issue describing the same problem
we're seeing has been opened by another user:
https://github.com/cherrypy/cheroot/issues/769
It is worth noting that the workaround described in that
issue does also work for us. If you add
John Mulligan [Fri, 12 Sep 2025 17:52:25 +0000 (13:52 -0400)]
build-with-container: add argument groups to organize options
Use the argparse add_argument_group feature to organize the mass of
arguments into more sensible categories. Hopefully, someone reading
over the `--help` output can now more easily see options that
are useful rather than being overwhelmed by a wall of text.
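For reference, argparse argument groups work roughly as in the sketch below. The group titles, defaults and the `steps` destination are illustrative assumptions; only `-d/--distro`, `--image-variant` and `-e` appear elsewhere in this log.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="build-with-container.py")

    # Each group gets its own titled section in the --help output.
    image = parser.add_argument_group(
        "Image options", "Control which build image is used"
    )
    image.add_argument("--distro", "-d", default="centos9")
    image.add_argument("--image-variant", default="default")

    build = parser.add_argument_group("Build options")
    build.add_argument("-e", dest="steps", action="append")

    return parser
```

Groups only affect how the help text is laid out; parsing behaves the same as with ungrouped arguments.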
mgr/dashboard: fix zone update API forcing STANDARD storage class
The zone update REST API (`edit_zone`) always attempted to configure a
placement target for the `STANDARD` storage class, even when the request
was intended for a different storage class name.
This caused failures in deployments where `STANDARD` is not defined.
Changes:
Combine the add placement target and add storage class methods into a single
add_placement_targets_storage_class_zone method, which takes the storage
class as a param alongside the rest of the placement params.
Laura Flores [Tue, 3 Dec 2024 22:15:19 +0000 (16:15 -0600)]
qa/workunits/mon: ensure election strategy is "connectivity" for stretch mode
The election strategy is randomly chosen for this type of test. Sometimes,
the test passes if the "connectivity" election strategy happens to be picked.
But if a different strategy, e.g. "classic", is picked, then the test will fail.
We can ensure that the election strategy is "connectivity" by setting it in the
workunit with the ceph CLI command. Although connectivity was specified in
stretch-mode-5-mons-8-osds.yaml, that config ultimately gets overridden by
the "qa/mon_config" yaml.
Problem:
The current dump for "removed_ranks" and "disallowed_leaders"
is not in a format that the python test
script can parse.
Solution:
Modified the values so that they are in the correct format
Conflicts:
src/mon/MonmapMonitor.cc - replace `goto reply` with
`goto reply_no_propose`
src/mon/OSDMonitor.cc - replace `rule_valid_for_pool_type`
with `get_rule_type` since
`rule_valid_for_pool_type` is not
backported.
mgr/DaemonState: Minimise time we hold the DaemonStateIndex lock
Calling back into python functions whilst holding the lock can result in
this thread being queued for the GIL and resulting in extended delays
for threads waiting to acquire the lock.
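The general pattern can be sketched in Python. The real code is C++ calling back into Python, so `DaemonIndex` and its methods are stand-ins: copy what is needed while holding the lock, release it, and only then invoke the callback.

```python
import threading
from typing import Callable, Dict


class DaemonIndex:
    """Toy index guarded by a lock, standing in for DaemonStateIndex."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._daemons: Dict[str, dict] = {}

    def insert(self, key: str, state: dict) -> None:
        with self._lock:
            self._daemons[key] = state

    def for_each(self, callback: Callable[[str, dict], None]) -> None:
        # Hold the lock only long enough to take a snapshot.
        with self._lock:
            snapshot = list(self._daemons.items())
        # The callback may block (for example, waiting for the GIL in the
        # real code) without delaying other threads that need the lock.
        for key, state in snapshot:
            callback(key, state)
```

A side effect of this shape is that a callback can safely call back into the index, which would deadlock if the non-reentrant lock were still held.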
Igor Fedotov [Thu, 21 Aug 2025 10:42:54 +0000 (13:42 +0300)]
test/libcephfs: use more entries to reproduce snapdiff fragmentation issue.
Snapdiff listing fragments have different boundaries in Reef and Squid+
releases hence original reproducer (made for Reef) doesn't work properly
in S+ releases. This patch fixes that at cost of longer execution.
This might be redundant/senseless when backporting to Reef.
Related-to: https://tracker.ceph.com/issues/72518
Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
(cherry picked from commit 23397d32607fc307359d63cd651df3c83ada3a7f)
Igor Fedotov [Tue, 12 Aug 2025 13:17:49 +0000 (16:17 +0300)]
mds: rollback the snapdiff fragment entries with the same name if needed.
This is required when more entries with the same name don't fit into the
fragment. With the existing means for fragment offset specification, such a split has to be
prohibited.
Fixes: https://tracker.ceph.com/issues/72518
Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
(cherry picked from commit 24955e66f4826f8623d2bec1dbfc580f0e4c39ae)
Problem:
The readdir wouldn't list all the entries in the directory
when the osd is full with rstats enabled.
Cause:
The issue happens only in multi-mds cephfs cluster. If rstats
is enabled, the readdir would request 'Fa' cap on every dentry,
basically to fetch the size of the directories. Note that 'Fa' is
CEPH_CAP_GWREXTEND which maps to CEPH_CAP_FILE_WREXTEND and is
used by CEPH_STAT_RSTAT.
The request for the cap is a getattr call and it need not go to
the auth mds. If rstats is enabled, the getattr would go with
the mask CEPH_STAT_RSTAT which mandates the requirement for
auth-mds in 'handle_client_getattr', so that the request gets
forwarded to auth mds if it's not the auth. But if the osd is full,
the inode is fetched in 'dispatch_client_request' before
calling the handler function of the respective op, to check the
FULL cap access for certain metadata write operations. If the inode
doesn't exist, ESTALE is returned. This is wrong for operations
like getattr, where the inode might not be in memory on the non-auth
mds; returning ESTALE is confusing and the client wouldn't retry. This
was introduced by the commit 6db81d8479b539d, which fixes subvolume
deletion when osd is full.
Fix:
Fetch the inode required for the FULL cap access check for the
relevant operations in osd full scenario. This makes sense because
all the operations would mostly be preceded with lookup and load
the inode in memory or they would handle ESTALE gracefully.
Update the "Disconnected+Remounted FS" section in
doc/cephfs/troubleshooting.rst, as suggested by Venky Shankar in https://github.com/ceph/ceph/pull/65129/files#r2312903062
auth: msgr2 can return incorrect allowed_modes through AuthBadMethodFrame
Update the AuthServer interface to return the correct modes by splitting
get_supported_auth_methods() into two functions that get methods and modes separately.