
mgr/dashboard: start node virtual-env after starting ceph cluster

In the frontend e2e.sh file, we don't need to start the node venv early on,
before the ceph cluster is started. We only need it for the `npm` or
`npx` commands. Starting the node virtual env and then starting ceph
causes the ceph cluster to pick up the node-env python as its python
environment, which breaks the cryptotools call.

So move the node-env venv start to after the ceph cluster is created.

Fixes: https://tracker.ceph.com/issues/73804
Signed-off-by: Nizamudeen A <nia@redhat.com>
(cherry picked from commit a56ae5b0e6d1ed035fbb93591fea7e27858004e5)
(cherry picked from commit 411fcaa78fcf75392dd235533ba9b8d351971b08)

mgr/dashboard: add an option to control the dashboard crypto caller

Add a mgr config option `crypto_caller` that lets a ceph user override
the default behavior of using the remote crypto caller. Supported
values are `internal` and `remote`.
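
As a rough illustration only, selecting a caller from the option value
might look like the sketch below. The class and function names are
hypothetical and are not taken from the actual module; only the two
values and the `remote` default come from the description above.

```python
# Hypothetical sketch: names here are illustrative, not the real
# cryptotools API.
class InternalCryptoCaller:
    """Would call the crypto libraries in-process."""
    name = "internal"


class RemoteCryptoCaller:
    """Would run the crypto functions in a separate process."""
    name = "remote"


_CALLERS = {
    "internal": InternalCryptoCaller,
    "remote": RemoteCryptoCaller,
}


def choose_crypto_caller(value: str = "remote"):
    """Return a caller instance for the given `crypto_caller` value."""
    try:
        return _CALLERS[value]()
    except KeyError:
        raise ValueError(f"unsupported crypto_caller: {value!r}") from None
```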

Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit 27c2050e37ed2556e1b2d0e5f6631d51b506ec6f)

Conflicts:
src/pybind/mgr/dashboard/module.py
- removed the sso oauth2 option

mgr/cephadm: always use the internal cryptocaller

The cephadm module needs to use the python cryptography module for ssh (via
asyncssh) and thus there's no need to use the remote crypto caller in
cephadm. Configure cephadm to always use the internal cryptocaller.

Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit 2128ffa619c9a4a800fb6394503b8ecc5b16fa96)

Conflicts:
src/pybind/mgr/cephadm/module.py
- REQUIRE_POST_OPTIONS import was not present in that file on squid, so
removing it

python-common/cryptotools: catch all failures to read cert

Previously, the internal crypto caller would catch (and convert) some
errors when reading the cert but not all cases. Move the logic to catch
the errors to a common location and do it once consistently.
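
A minimal sketch of the "catch once, in one place" idea, assuming a
hypothetical `CertReadError` type and a caller-supplied `load_cert`
function (neither name comes from the real code):

```python
# Hypothetical sketch; CertReadError and load_cert are illustrative names.
class CertReadError(ValueError):
    """Single error type raised for any failure to read a certificate."""


def read_cert(load_cert, data: str):
    """Call `load_cert(data)`, converting every failure to CertReadError."""
    try:
        return load_cert(data)
    except Exception as err:  # one place to catch all parsing failures
        raise CertReadError(f"invalid certificate: {err}") from err
```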

Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit f6ab08783c0f121d33709a2aaecb6087c69ae3f2)

python-common/cryptotools: create module for selecting crypto caller

Add a module to select a desired crypto caller. Update the callers
to use the crypto caller interface.

Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit 0eb2f4b1327e9a0da11db246fcbd0c4ed4d832f0)

python-common/cryptotools: move internal crypto caller to new file

Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit 0c774d5c767ef9875250de5a95e421a6b837b85e)

python-common/cryptotools: add caller module for base class

Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit c3dc34a0d55e65694a1b7f2c0d423c4f2f0ed252)

python-common/cryptotools: unify and organize all endpoint functions

Lightly reorganize and make the "endpoint" functions in cryptotools.py more
consistent and uniform. Use small functions for input and output
handling so that the handling is done the same way throughout. Pass a
pre-constructed crypto caller via the args to the endpoint functions.
Make generating the private key its own named function rather than
one single (and only) function with overloaded behavior controlled by
a cli switch.

Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit 552d7b4373afa1a93fe47ce234560b9c8485321d)

python-common/cryptotools: use a main function

Use a main function to encapsulate the cli parsing rather than a block
of code in module scope.
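
For illustration, the general pattern could look like the sketch below.
The program name and the `example` subcommand are made up for this
sketch and do not reflect the real cli.

```python
import argparse


def main(argv=None) -> int:
    """Parse the cli inside a function instead of at module scope."""
    parser = argparse.ArgumentParser(prog="cryptotools-sketch")
    sub = parser.add_subparsers(dest="command", required=True)
    sub.add_parser("example", help="hypothetical subcommand")
    args = parser.parse_args(argv)
    print(f"would run: {args.command}")
    return 0
```

Keeping the parsing in `main()` means importing the module has no side
effects, and tests can call `main([...])` with an explicit argument list.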

Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit c98e53f1136ebef2ffeb3d191ab2fc49d9728a3d)

python-common/cryptotools: move actual crypto opts into a class

The functions now handle the i/o but allow the crypto function class
to centralize the functions that actually use the crypto libs.

Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit 4e4cfa58c4b124c0b0406619cc14ced0b2422550)

pybind/mgr: fix test case in test_tls.py

Why violate the typing in a test? mypy never noticed this because tests
are not type checked but there seems to be no need to turn a str into
bytes to pass to a function that is typed only as taking str!

Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit 955143ddfb3ea6f5f7b63902a734f17d393da4d8)

mgr/dashboard: replace direct use of bcrypt in dashboard

Replace a direct usage of bcrypt with our cryptocaller wrapper.

Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit d2fd81eae98d8dee4f3363616ecd3241b05cf560)

python-common/cryptotools: give the parsers more sensible names

Name the parser objects after their functions and not `foo` and `bar`.

Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit 5d4eeff0d5d6aa59fef2a6e2055615df3f94210e)

pybind/mgr: Appropriately rename function.

Signed-off-by: Paulo E. Castro <pecastro@wormholenet.com>
(cherry picked from commit 21d6e1d493dc5652b2242ef2e0dc7e1c12714d20)

Conflicts:
src/pybind/mgr/cephadm/cert_mgr.py
- removed this file since it doesn't exist in squid

python-common/cryptotools: Remove ascii and utf-8 references from encode/decode.

Signed-off-by: Paulo E. Castro <pecastro@wormholenet.com>
(cherry picked from commit e364df3914094b8e1c931a09ff8d6863b6d2845f)

python-common/cryptotools: fix error path in verify tls function

The remote verify_tls function was not raising errors when it should.
Fix the function so that it always returns an object when it succeeds or
fails gracefully. Always parse that function in the crypto caller class.

Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit 96a7a72cf414a3dc5c8587d34e80838cc64b71a4)

pybind/mgr: Correct code to ensure cephadm/tests/test_certmgr.py passes.

Signed-off-by: Paulo E. Castro <pecastro@wormholenet.com>
(cherry picked from commit 218d84fc15d818d2df56c92cd71aeb2aa85f1590)

python-common/cryptotools: Always encode, Err via stderr and signal the exit.

Signed-off-by: Paulo E. Castro <pecastro@wormholenet.com>
(cherry picked from commit 56d508f9dc1b5503a465cb2b25838a1e81182a49)

python-common: Correct typo in private_key naming field.

Signed-off-by: Paulo E. Castro <pecastro@wormholenet.com>
(cherry picked from commit 4bcab139830eead485412219509fbe390b046aec)

pybind/mgr: update mgr_util to use cryptotools CryptoCaller class

Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit 2b9cf2453f13eb48e43e4eb06c78365c397c50cd)

Conflicts:
src/pybind/mgr/mgr_util.py
- accepted incoming changes

python-common: remove unused dir

Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit 9dcfde75e460476ccb2054662e6316236326ca09)

python-common/cryptotools: use one single dir for cryptotools

Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit a5861c153e3dbb5482afe87525056cf194a436ff)

python-common/cryptotools: create CryptoCaller interface class

Create a class to act as a common shim between the cryptotools external
functions and the mgr. It provides common conversion mechanisms and
could possibly act as an abstraction in case we decide to make
the external function calls in different ways in the future.

Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit 84710f9ed414a8d81e7ebc2d21488fd5f91e51ec)

python-common/cryptotools: use json for structured output

Where possible try to use structured output in JSON for easier parsing
and interaction with the parent process.
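
A small sketch of the idea, assuming a stand-in child process rather
than the real tool. The `ok` and `value` keys are invented for this
example and are not the actual output schema.

```python
import json
import subprocess
import sys

# Stand-in for the external tool: a child Python process that prints one
# JSON object on stdout. The "ok"/"value" keys are made up for this sketch.
CHILD = 'import json; print(json.dumps({"ok": True, "value": "abc"}))'


def call_child() -> dict:
    """Run the child and parse its stdout as JSON."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(proc.stdout)
```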

Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit 6f2d92cc6d6cccf6c84af5e3a3cea26f51a73399)

pybind/mgr: Hack around the 'ImportError: PyO3 modules may only be initialized once per interpreter process' issue.

Fixes: https://tracker.ceph.com/issues/64213
Signed-off-by: Paulo E. Castro <pecastro@wormholenet.com>
(cherry picked from commit 717d0a6f3530ad3e07f4423002810327b2addcf1)

Conflicts:
src/pybind/mgr/mgr_util.py
- accepted the incoming changes

mgr: add site package paths in PyModuleRegistry

before this change, we added the paths of site packages to sys.path
when starting subinterpreters for each of the mgr modules. this
works just fine. but Python 3.11 deprecates `PySys_SetPath()`
in favor of the PyConfig machinery, which sets the module search paths
in PyConfig before calling `Py_InitializeFromConfig()`. so, to
set the module search paths with the new machinery, we need to do
this in `PyModuleRegistry`, where we initialize the global Python
interpreter, since we've switched to the new PyConfig machinery when
compiling with Python 3.8 and up.

in this change, we set the module search paths in PyModuleRegistry.
because PyConfig imports the site packages by default, and we are
allowed to append a new path to the existing search paths, we just
append the configured `mgr_module_path`.

this change should silence compiler warnings like:

```
/var/ssd/ceph/src/mgr/PyModule.cc:368:20: warning: ‘void PySys_SetPath(const wchar_t*)’ is deprecated [-Wdeprecated-declarations]
  368 |       PySys_SetPath(const_cast<wchar_t*>(sys_path.c_str()));
      |       ~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/python3.12/sysmodule.h:15:38: note: declared here
   15 | Py_DEPRECATED(3.11) PyAPI_FUNC(void) PySys_SetPath(const wchar_t *);
      |                                      ^~~~~~~~~~~~~
```

Fixes https://tracker.ceph.com/issues/66399
Signed-off-by: Kefu Chai <tchaikov@gmail.com>
(cherry picked from commit 51a5774aa605f3b976ced47902e15ce450f50339)

mgr: set argv for python in PyModuleRegistry

before this change, we set up the progname for the Python interpreter,
but set up the argv for it in PyModule. and we were using a deprecated
API to initialize the Python interpreter.

in this change, let's do this in a single place for better
maintainability. also, take this opportunity to use the non-deprecated
API to initialize the interpreter on Python >= 3.8.

this silences the warning when compiling ceph-mgr with CPython 3.12:
```
/var/ssd/ceph/src/mgr/PyModule.cc: In member function ‘int PyModule::load(PyThreadState*)’:
/var/ssd/ceph/src/mgr/PyModule.cc:363:20: warning: ‘void PySys_SetArgv(int, wchar_t**)’ is deprecated [-Wdeprecated-declarations]
  363 |       PySys_SetArgv(1, (wchar_t**)argv);
      |       ~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~
In file included from /usr/include/python3.12/Python.h:96,
                 from /var/ssd/ceph/src/mgr/BaseMgrModule.h:4,
                 from /var/ssd/ceph/src/mgr/PyModule.cc:14:
/usr/include/python3.12/sysmodule.h:13:38: note: declared here
   13 | Py_DEPRECATED(3.11) PyAPI_FUNC(void) PySys_SetArgv(int, wchar_t **);
      |                                      ^~~~~~~~~~~~~
```

Signed-off-by: Kefu Chai <tchaikov@gmail.com>
(cherry picked from commit 07773617f339a2779aa7cf910c0361c937ffe4c2)

mgr: stop using deprecated API to initialize Python

Py_SetProgramName() is deprecated since CPython 3.11, see
https://docs.python.org/3/c-api/init_config.html .
`Py_InitializeFromConfig()` and friends were introduced by CPython 3.8,
but we previously had to support CPython 3.6, which is shipped by CentOS 8,
so we had to stay backward compatible with the older Python versions.

so let's use the new machinery to initialize the Python interpreter, since
the minimal supported Python version is now CPython 3.9, which comes with
CentOS 9.

this change addresses the following compiler warning:

```
[428/753] Building CXX object src/mgr/CMakeFiles/ceph-mgr.dir/PyModuleRegistry.cc.o
/var/ssd/ceph/src/mgr/PyModuleRegistry.cc: In member function ‘void PyModuleRegistry::init()’:
/var/ssd/ceph/src/mgr/PyModuleRegistry.cc:49:20: warning: ‘void Py_SetProgramName(const wchar_t*)’ is deprecated [-Wdeprecated-declarations]
   49 |   Py_SetProgramName(const_cast<wchar_t*>(WCHAR(MGR_PYTHON_EXECUTABLE)));
      |   ~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/include/python3.12/Python.h:94,
                 from /var/ssd/ceph/src/mgr/PyModule.h:22,
                 from /var/ssd/ceph/src/mgr/PyModuleRegistry.h:18,
                 from /var/ssd/ceph/src/mgr/PyModuleRegistry.cc:14:
/usr/include/python3.12/pylifecycle.h:37:38: note: declared here
   37 | Py_DEPRECATED(3.11) PyAPI_FUNC(void) Py_SetProgramName(const wchar_t *);
      |                                      ^~~~~~~~~~~~~~~~~
```

Signed-off-by: Kefu Chai <tchaikov@gmail.com>
(cherry picked from commit 4cf9b36c66fd64f2dc50f4e1acca3fe93e29b3f2)

cmake: bump up required Python3 to 3.9

since we've dropped the support of CentOS 8 in favor of CentOS 9, and
the minimum Python3 version used by the supported distros is 3.9, let's
bump up the required Python3 version to 3.9, as we are going to remove the
code for backward compatibility with older versions like Python 3.6 and 3.8.

Refs https://tracker.ceph.com/issues/66399
Signed-off-by: Kefu Chai <tchaikov@gmail.com>
(cherry picked from commit 51f71fc17add4a5ed7bc10d35e01fdb90ef11aa0)

do_cmake: use Python 3.12 on ubuntu >= 24

the "official" Python shipped along with Ubuntu 24.04 (Noble Numbat) is
Python 3.12. And some of our builders have been upgraded to Ubuntu
24.04. But we are still using Python 3.10 on Ubuntu >= 22, which breaks
the build. And CMake fails like:

```
CMake Error at /usr/share/cmake-3.28/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
  Could NOT find Python3 (missing: Python3_EXECUTABLE Python3_INCLUDE_DIRS
  Python3_LIBRARIES Interpreter Development Development.Module
  Development.Embed) (Required is exact version "3.10")

      Reason given by package:
          Interpreter: Wrong version for the interpreter "/bin/python3"

Call Stack (most recent call first):
  /usr/share/cmake-3.28/Modules/FindPackageHandleStandardArgs.cmake:600 (_FPHSA_FAILURE_MESSAGE)
  cmake/modules/FindPython/Support.cmake:3863 (find_package_handle_standard_args)
  cmake/modules/FindPython3.cmake:545 (include)
  CMakeLists.txt:597 (find_package)
```

This build failure should also affect developers who build Ceph on
Ubuntu >= 24.

In this change, we use Python 3.12 on Ubuntu >= 24.

Signed-off-by: Kefu Chai <tchaikov@gmail.com>
(cherry picked from commit 4df368381d3a7c09cdc4859eb52b5d29d206aa5a)

spdk: update spdk submodule to include fix for rocky10 linker error

Signed-off-by: Samuel Just <sjust@redhat.com>
(cherry picked from commit 5a155c916dd121aea4131503f2447c15a385a1a8)

spdk: update spdk submodule to fix build with newer glibc

Pick up a change that introduced CONFIG_HAVE_ARC4RANDOM to allow
building with glibc 2.36 and newer.

Fixes: https://tracker.ceph.com/issues/67843
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 77dd0027baf5feebfc05049a584e0e014a98e112)

Merge pull request #67185 from kshtsk/wip-74593-squid

squid: qa/workunits/rgw: drop netstat usage

qa/workunits/rgw: drop netstat usage

`netstat` is deprecated in modern Linux and usually
requires an extra package dependency to be installed.
Usually it is `net-tools`; however, on some distros, for example
openSUSE, `netstat` is not present in that package. Thus, let us
use `ss` as an alternative.

When using `netstat -nltp` we get lines like:
'tcp 0 0 0.0.0.0:443 0.0.0.0:* LISTEN 25156/valgrind.bin \ntcp6 0 0 :::443 :::* LISTEN 25156/valgrind.bin \n'
When using `ss -nltp` we get lines like:
'LISTEN 0 4096 0.0.0.0:443 0.0.0.0:* users:(("memcheck-amd64-",pid=66045,fd=72))'
so we need to filter processes by `memcheck`. However further
parsing code works equivalently as for netstat.
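
As a sketch of the filtering described above (this is not the workunit's
actual parsing code; the function name is made up), the `ss -nltp` sample
line can be handled like this:

```python
def listening_ports(ss_output: str, process: str = "memcheck") -> list:
    """Return local ports of LISTEN sockets owned by a matching process."""
    ports = []
    for line in ss_output.splitlines():
        fields = line.split()
        # Skip the header, non-LISTEN rows, and rows of other processes.
        if len(fields) < 5 or fields[0] != "LISTEN" or process not in line:
            continue
        # fields[3] is the local address, e.g. "0.0.0.0:443" or "[::]:443"
        ports.append(int(fields[3].rsplit(":", 1)[1]))
    return ports


# Sample line taken from the commit message above.
sample = 'LISTEN 0 4096 0.0.0.0:443 0.0.0.0:* users:(("memcheck-amd64-",pid=66045,fd=72))'
```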

Signed-off-by: Kyr Shatskyy <kyrylo.shatskyy@clyso.com>
(cherry picked from commit 82063f99024a8937dfa105e0828beda1bc730247)

Merge pull request #60567 from k0ste/wip-68781-squid

squid: osd: add clear_shards_repaired command

Reviewed-by: Ronen Friedman <rfriedma@redhat.com>

Merge pull request #67074 from idryomov/wip-74513-squid

squid: qa: krbd_blkroset.t: eliminate a race in the open_count test

Reviewed-by: Ramana Raja <rraja@redhat.com>

Merge pull request #67076 from idryomov/wip-74529-squid

squid: qa: don't assume that /dev/sda or /dev/vda is present in unmap.t

Reviewed-by: Ramana Raja <rraja@redhat.com>

Merge pull request #66518 from aclamk/aclamk-ifed-fix-70390-squid

squid: os/bluestore: compact patch to fix extent map resharding

qa: don't assume that /dev/sda or /dev/vda is present in unmap.t

Instead of hard-coding the block device name, use the block device that
is backing the filesystem that the test is running on. We can be quite
sure it won't be an RBD device ;)

Fixes: https://tracker.ceph.com/issues/74529
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 2b5f0f4e7396114f9944a4987c38e18d4ecfbb1f)

qa: krbd_blkroset.t: eliminate a race in the open_count test

Even at QD=1, dd may take less than 10 seconds to work its way to the
end of a 10M image, producing "No space left on device" error instead
of the expected "Operation not permitted" error which is supposed to
arise from the device getting marked read-only while opened.

Fixes: https://tracker.ceph.com/issues/74513
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 006e47e9ca691deb377fb76f7a23b6feec874865)

Merge pull request #66389 from Naveenaidu/wip-70082-squid

squid: mgr/telemetry: add stretch cluster data

Reviewed-by: Yaarit Hatuka <yhatuka@ibm.com>

Merge pull request #66970 from bluikko/wip-doc-2026-01-19-fix-63073-to-squid

squid: doc/cephadm: remove sections that do not apply to Squid in rgw.rst

doc/cephadm: remove sections that do not apply to Squid in rgw.rst

4949311 backported changes that do not apply to Squid.
PR #63073 body and the commit referenced therein as cherry-pick do not
correspond to the diff. Remove the additions that do not apply to Squid:

- Wildcard SAN feature in 3c24753 only since Tentacle.
- Shutdown delay feature in b84bb72 only since Tentacle.

The third feature doc addition is valid, d620ba6 was backported to Squid
in PR #61350 for disable multisite sync traffic, commit 59b3f28. This
backport cherry-picked only the feature addition and missed the docs
commit 8878619. Leave this section in.

Signed-off-by: Ville Ojamo <14869000+bluikko@users.noreply.github.com>

Merge pull request #65363 from ifed01/wip-ifed-fix-snapdiff-fragment-squi

squid: mds: fix snapdiff result fragmentation

Merge pull request #65602 from kotreshhr/wip-73130-squid

squid: cephfs-journal-tool: Journal trimming issue

Merge pull request #65267 from batrick/wip-72277-squid

squid: mds: include auth credential in session dump

Merge pull request #64886 from vshankar/wip-72390-squid

squid: mds/MDSDaemon: unlock `mds_lock` while shutting down Beacon and others

Merge pull request #60398 from rishabh-d-dave/wip-68621-squid

squid: mon,cephfs: require confirmation when changing max_mds on unhealthy cluster

Merge pull request #65785 from NitzanMordhai/wip-71315-squid

squid: memory lock issues causing hangs during connection shutdown

Merge pull request #66797 from bluikko/wip-doc-revert-64033-from-squid

squid: Revert "doc: mgr/dashboard: add OAuth2 SSO documentation"

doc: Revert "doc: mgr/dashboard: add OAuth2 SSO documentation"

This reverts commit 2af5800f5a20ecc1fd592e024a8d03806ab67f89.

The dashboard OAuth2.0 feature was released in Tentacle.

Signed-off-by: Ville Ojamo <14869000+bluikko@users.noreply.github.com>

Merge pull request #66739 from tchaikov/squid-backport-pr-66732

squid: debian/control: add iproute2 to build dependencies

Merge pull request #66707 from tchaikov/squid-backport-pr-66700

squid: mgr/dashboard: update teuth_ref hash in api test

debian/control: add iproute2 to build dependencies

Test scripts like qa/tasks/cephfs/mount.py expect the ip command to be
available in the container environment. Without it, tests fail with:

```
  /bin/bash: line 1: ip: command not found

  File "/ceph/qa/tasks/cephfs/mount.py", line 96, in cleanup_stale_netnses_and_bridge
    p = remote.run(args=['ip', 'netns', 'list'],
  ...
  teuthology.exceptions.CommandFailedError: Command failed with status 127: 'ip netns list'
```

Add iproute2 to the debian package build dependencies when the
<pkg.ceph.check> build profile is enabled. This ensures the package is
available during container-based builds, since buildcontainer-setup.sh
→ script/run-make.sh → install-deps.sh → debian/control → generated
dependency package chain respects build profiles configured via
`FOR_MAKE_CHECK` and `WITH_CRIMSON` environment variables set in
Dockerfile.build.

Signed-off-by: Kefu Chai <k.chai@proxmox.com>
(cherry picked from commit 599922aa582bbaa6fa8c8e274b780fabafb10a9b)

mgr/dashboard: update teuth_ref hash in api test

update the hash to the latest commit where Kefu addressed the distutils
error.

Signed-off-by: Nizamudeen A <nia@redhat.com>
(cherry picked from commit 36fb920c5e88f7da24d0c7289d7e6bafd8b367d2)

Merge pull request #66668 from ceph/apt-mirror-squid

squid: install-deps: Replace apt-mirror

install-deps: Replace apt-mirror

apt-mirror.front.sepia.ceph.com has happened to always work because we set up CNAMEs to gitbuilder.ceph.com.

That host is making its way to a new home upstate (literally and figuratively) so we'll get rid of the front subdomain since it's publicly accessible anyway and add TLS while we're at it.

Signed-off-by: David Galloway <david.galloway@ibm.com>
(cherry picked from commit 0b0c73ad860b20912c862b5376057153a5adab40)

Merge pull request #66289 from henrichter/wip-73704-squid

squid: rgw: beast add ssl hot-reload

Reviewed-by: Casey Bodley <cbodley@redhat.com>

Merge pull request #62055 from k0ste/wip-68956-squid

squid: mds: session in the importing state cannot be cleared if an export subtree task is interrupted while the state of importer is acking

Merge pull request #64954 from batrick/wip-72515-squid

squid: mds: skip charmap handler check for MDS requests

Merge pull request #65449 from rishabh-d-dave/wip-70174-squid

squid: qa/cephfs: ignore warning that pg is stuck peering for upgrade jobs

Merge pull request #66471 from joscollin/wip-73879-squid

squid: cephfs: fix monclient not subscribed monmap/config

Merge pull request #66472 from joscollin/wip-73872-squid

squid: cephfs: MDCache request cleanup

Merge pull request #66473 from joscollin/wip-73870-squid

squid: client: account for mixed quotas in statfs

Merge pull request #64747 from kshtsk/wip-72330-squid

squid: qa/tasks/ceph_manager: population must be a sequence

Merge pull request #66165 from VinayBhaskar-V/wip-73738-squid

squid: rbd-mirror: allow incomplete demote snapshot to sync after rbd-mirror daemon restart

Reviewed-by: Ilya Dryomov <idryomov@redhat.com>

os/bluestore: enforce extent split on shard boundary

Partially fixes: https://tracker.ceph.com/issues/70390

Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
(cherry picked from commit 5beee2ad46cfeb8ffc70d106c1180f531e455e3e)
(cherry picked from commit 0611ed6d8980b8b8839bb0d6c7af07b598fcc089)

Conflicts:
src/os/bluestore/BlueStore.cc

The conflict was not a logical one, more like stemming from
refactor that changed "e"->"extent".

os/bluestore: Fix dirty_range in BlueStore::_do_remove

dirty_range used to have length = 1 byte.
This is fine if the whole extent is inside a shard.
But this has proven not to be the case.
dirty_range(offset, length) is slower only when it crosses a shard boundary.

Partially fixes: https://tracker.ceph.com/issues/70390

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
(cherry picked from commit 4f566eaf6c4646e513ea6747c7df17383d8716e2)
(cherry picked from commit d6c61326a125f8bd278ec1c656d673e53edf47cd)
(cherry picked from commit 37248077f4550c85258b98f184193101a02dae0e)

os/bluestore: Fix reshard on spanning blobs

Make sure that spanning blobs are not allowed to have extents crossing
a shard boundary.

Partially fixes: https://tracker.ceph.com/issues/70390

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
(cherry picked from commit ce05ade7980397cad48e8fc78bebc839c76ba327)
(cherry picked from commit 0f5e240e49a3a16b611fc80cf6ca06cfd8b1b303)
(cherry picked from commit c081b9eae7a7082dbc86b8d50a7044bc085729c5)

qa/cephfs: ignore warning that pg is stuck peering for upgrade jobs

Health warning "pg .* is stuck peering" is seen while Ceph cluster is
under the upgrade process during fs/upgrade QA job. Being an expected
warning, it should be added to the ignorelist.

And besides this one, we already ignore more severe warnings ("pg is
stuck inactive" and "pg is degraded") for fs/upgrade jobs.

Fixes: https://tracker.ceph.com/issues/70023
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 9748de76e02254c6dc284dcc20ec5d5761760dcb)

Conflicts:
qa/cephfs/overrides/pg_health.yaml
- The line before the point where the patch was to be applied is different
compared to the main branch.

test: Add statfs test case for mixed quotas

Signed-off-by: Christopher Hoffman <choffman@redhat.com>
(cherry picked from commit 2b057ec7bb40855e3be3cb0de12b63f8c10b450e)

client: account for mixed quotas in statfs

In statfs, when the quota root for a dir is discovered,
it uses that dir to base values for max_files and max_bytes.

This can be an issue when a dir is found with only one of two potential quota
fields. Take for instance, a dir with only max_files set and parent dir
has only max_bytes set. During a statfs call, it will then use the max_files
value for provided dir, but does not have a value for max_bytes. In this case,
this behavior will cause the size of the filesystem to be displayed.

Instead, find the quota root for max_files and max_bytes separately. This will
allow for mixed quotas to inherit missing values from its parent. In the above
example, max_files from current dir and max_bytes from parent dir will be
displayed.
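
A toy Python model of the lookup described above, assuming a simplified
`Dir` node where 0 means "unset". The real client code is C++ and
differs in detail; the names here are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Dir:
    """Toy directory node; 0 means the quota field is unset."""
    parent: Optional["Dir"] = None
    max_files: int = 0
    max_bytes: int = 0


def nearest_quota(d: Optional[Dir], field: str) -> int:
    """Walk up from `d` and return the first non-zero value of `field`."""
    while d is not None:
        value = getattr(d, field)
        if value:
            return value
        d = d.parent
    return 0  # no quota set anywhere on the path


# The example from the text: child has only max_files, parent only max_bytes.
parent = Dir(max_bytes=1 << 30)
child = Dir(parent=parent, max_files=1000)
```

Looking up each field separately lets `child` report its own `max_files`
together with the `max_bytes` inherited from `parent`.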

Fixes: https://tracker.ceph.com/issues/73487
Signed-off-by: Christopher Hoffman <choffman@redhat.com>
(cherry picked from commit dd02ea9b18502b87ce815eba4286ae3516e334b3)

mds: MDCache: check validity of mdr requests before dispatching

Ignore null requests

Signed-off-by: Abhishek Lekshmanan <abhishek.lekshmanan@cern.ch>
(cherry picked from commit 75cd8c074f37de2a492177c54b3ef1879ab87637)

mds: MDCache request cleanup handles potential null mdr

In cases where there is a single element in a batch_op_map, new_batch_head
is a nullptr. When this is retried at the Finisher, we'd hit one of the asserts
when dereferencing it.

Fixes: https://tracker.ceph.com/issues/70769
Signed-off-by: Abhishek Lekshmanan <abhishek.lekshmanan@cern.ch>
(cherry picked from commit e63f8cc54d03dbdd147cdd2c301adef119a640da)

cephfs: make sure mon authenticate before objecter start

Signed-off-by: Shaohui Wang <wangshaohui.0512@bytedance.com>
(cherry picked from commit 1de46f335ea21c7369c67a021da79f3c7e929e66)

tests: add a test case for cephfs SingletonClient

In SingletonClient::init(), objecter->start() was called before
monc->authenticate(). If the mons reply quickly, monc's connections get
authenticated before monc->authenticate() is called, and in this case
monc will not subscribe to monmap/config.

Signed-off-by: Shaohui Wang <wangshaohui.0512@bytedance.com>
(cherry picked from commit 8cce3277edcb819e5e61a67948f35e5c5358379d)

Conflicts:
src/test/client/CMakeLists.txt
- syncio.cc and fscrypt_conf.cc not backported to squid

qa/workunit: update telemetry quincy/reef workunits with "basic_stretch_cluster" collection

Note, this is not a clean cherry pick. Commit 4dac20e updated the
`test_telemetry_reef_x.sh` and `test_telemetry_squid_x.sh` upgrade
workunits. These upgrade workunits test the upgrade of a cluster from
the reef and squid (X-2) releases to the X version of the cluster.

Since we are cherry picking the commit to squid (X release), we
instead have to update the workunit files of quincy and reef, i.e. the
(X-2) releases.

Signed-off-by: Naveen Naidu <naveen.naidu@ibm.com>
(cherry picked from commit 4dac20e8987e271e4d92a649a6812b655097c6e1)

mgr/telemetry: add stretch_mode information

Stretch Mode information helps us learn how deployments are done
for stretch clusters.

We add a basic_stretch_cluster collection for the "basic" channel
for this purpose.

Fixes: https://tracker.ceph.com/issues/67812
Signed-off-by: Naveen Naidu <naveen.naidu@ibm.com>
(cherry picked from commit 6472b6b9f94affb96be341c9d595e543d734f30b)

Merge pull request #66357 from k0ste/wip-70542-squid

squid: os/bluestore: Disable invoking unittest_deferred

Reviewed-by: Igor Fedotov <igor.fedotov@croit.io>

osd: add clear_shards_repaired command

This command will allow us to clear the OSD_TOO_MANY_REPAIRS alert
by setting the shard repair count to 0. This will help in cases where
the alert was a false positive, or a condition that has since cleared
at the disk level. Often, zeroing out the repair count is
better than muting the alert or restarting the OSD.

Fixes: https://tracker.ceph.com/issues/54182
Co-authored-by: David Zafman <dzafman@redhat.com>
Signed-off-by: Daniel Radjenovic <dradjenovic@digitalocean.com>
(cherry picked from commit 78d6bfe54c3b9b60fab36a640b1ce77c8f022fa9)

osd: remove unnecessary return statements

Signed-off-by: Daniel Radjenovic <dradjenovic@digitalocean.com>
(cherry picked from commit b01453b1c1b8e9034a17d05ce6ed41102c8d9c65)

mds: skip charmap handler check for MDS requests

The MDS uses a rename request to move the primary link to a remote link. For
these requests, there will be no session.

Fixes: https://tracker.ceph.com/issues/72349
Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
(cherry picked from commit ca35c0de42861352bee4cfaeda6e83a7aa0bd094)

qa: test for charmap handling on reintegration

The MDS uses a rename request to move the primary link to a remote link. For
these requests, there will be no session.

Fixes: https://tracker.ceph.com/issues/72349
Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
(cherry picked from commit 5d983a46b010144b71f8c5d092d30a4942d8e6e7)

os/bluestore: Disable invoking unittest_deferred

There is no value in invoking unittest_deferred except via the
run_test_deferred.sh script.

Fixes: https://tracker.ceph.com/issues/68718
Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
(cherry picked from commit 163d8297bd566d25a6897f3b9be0d97f95c3c12b)

mds: client is evicted when an export subtree task is interrupted

The importer will force open some sessions provided by the exporter, but the client does not know about
the new sessions until the exporter notifies it, and the notifications cannot be sent if the exporter
is interrupted. The client does not regularly renew sessions that it does not know about, so the client
will be evicted by the importer after `session_autoclose` seconds (300 seconds by default).

The sessions that are force-opened in the importer need to be closed when the import process is reversed.

Signed-off-by: Zhansong Gao <zhsgao@hotmail.com>
(cherry picked from commit 00b0711188f34ef4ea5c31f39bc70cf1fafbd907)

qa: add test for importer's unexpected client eviction after an export subtree task is interrupted

Signed-off-by: Zhansong Gao <zhsgao@hotmail.com>
(cherry picked from commit f23bd5d0995e4e52e0ac43c7e8a112cd2faf9f27)

mds: session in the importing state cannot be cleared if an export subtree task is interrupted while the state of importer is acking

The related sessions in the importer are in the importing state (`Session::is_importing` returns true) when the state of the importer is `acking`.
`Migrator::import_reverse`, called by `MDCache::handle_resolve`, should reverse the process to clear the importing state if the exporter restarts
at this time, but because of a bug it does not. As a result, these sessions are not cleared when the client is
unmounted (evicted or timed out) until the mds is restarted.

The bug in `import_reverse` is that it contains the code to handle state `IMPORT_ACKING` but it will never be executed because
the state is modified to `IMPORT_ABORTING` at the beginning. Move `stat.state = IMPORT_ABORTING` to the end of import_reverse
so that it can handle the state `IMPORT_ACKING`.
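
The ordering problem can be shown with a toy Python model. The real code
is C++ in the Migrator; the dict-based state and the action string below
are stand-ins invented for this sketch.

```python
IMPORT_ACKING = "acking"
IMPORT_ABORTING = "aborting"


def import_reverse_buggy(stat: dict) -> list:
    """State is overwritten first, so the ACKING branch never runs."""
    actions = []
    stat["state"] = IMPORT_ABORTING
    if stat["state"] == IMPORT_ACKING:
        actions.append("clear importing sessions")
    return actions


def import_reverse_fixed(stat: dict) -> list:
    """State is inspected first and only then set to ABORTING."""
    actions = []
    if stat["state"] == IMPORT_ACKING:
        actions.append("clear importing sessions")
    stat["state"] = IMPORT_ABORTING
    return actions
```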

Fixes: https://tracker.ceph.com/issues/61459
Signed-off-by: Zhansong Gao <zhsgao@hotmail.com>
(cherry picked from commit 057c5b1610c11ad8cc6d0cde43bee1306228275b)

qa: add test for importer's session cleanup after an export subtree task is interrupted

Signed-off-by: Zhansong Gao <zhsgao@hotmail.com>
(cherry picked from commit eccaf85294ae80bb76b75f30d74957c6bf03745b)

mds: the assert should be before the journal entry submit otherwise it's racy

Signed-off-by: Zhansong Gao <zhsgao@hotmail.com>
(cherry picked from commit 11a4303d66fa0355c890a478b33ccc90ee68f6d3)

mds: add `importing_count` to session dump

Signed-off-by: Zhansong Gao <zhsgao@hotmail.com>
(cherry picked from commit 79a33025d506c9a90520633492285bc047ef31f5)

Merge pull request #66003 from cbodley/wip-73598-squid

squid: rgw: fix 'bucket rm --bypass-gc' for copied objects

Reviewed-by: Krunal Chheda <kchheda3@bloomberg.net>

Merge pull request #66152 from ivancich/wip-73747-squid

squid: rgw: fix `radosgw-admin object unlink ...`

Reviewed-by: Krunal Chheda <kchheda3@bloomberg.net>
Reviewed-by: Jane Zhu <jzhu116@bloomberg.net>

rgw: fix 'bucket rm --bypass-gc' for copied objects

the `--bypass-gc` argument to `radosgw-admin bucket rm` causes us to
call `RadosBucket::remove_bypass_gc()`, which loops over the tail
objects and removes each with `RGWRados::delete_raw_obj_aio()`

however, this was removing the objects with `cls_rgw_remove_obj()`,
which is for head objects, not tails. tail objects must be removed with
`cls_refcount_put()`, which preserves them until the last copy is
removed
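
a toy model of why the distinction matters, with a hypothetical
`TailObject` class that is not the real RGW or cls API:

```python
class TailObject:
    """Toy model of a tail object shared by several copies."""

    def __init__(self, refs: int) -> None:
        self.refs = refs
        self.exists = True

    def remove(self) -> None:
        """Unconditional removal: breaks any remaining copies."""
        self.exists = False

    def refcount_put(self) -> None:
        """Drop one reference; delete only when none remain."""
        self.refs -= 1
        if self.refs <= 0:
            self.exists = False
```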

rename `delete_raw_obj_aio()` to `delete_tail_obj_aio()` to clarify its
purpose

Fixes: https://tracker.ceph.com/issues/73348
Signed-off-by: Casey Bodley <cbodley@redhat.com>
(cherry picked from commit 1fba459071da9f7ec13defe2c666f0df8174c8da)

Merge pull request #66104 from shraddhaag/wip-73694-squid

squid: tasks/cbt_performance: Tolerate exceptions during performance data up…

tasks/cbt_performance: Tolerate exceptions during performance data updates

If an exception occurs during the POST request to update CBT performance,
log the error instead of failing the entire job. This ensures that
intermittent update failures do not block the main workflow.
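
A minimal sketch of the pattern, assuming a hypothetical `post` callable
in place of the real HTTP request (the function name is illustrative as
well):

```python
import logging

log = logging.getLogger(__name__)


def update_performance_data(post, payload: dict) -> bool:
    """Call `post(payload)`; log and swallow any failure.

    `post` is a hypothetical callable standing in for the HTTP POST.
    """
    try:
        post(payload)
        return True
    except Exception:
        # Log with traceback, but do not fail the surrounding job.
        log.exception("failed to update performance data; continuing")
        return False
```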

Fixes: https://tracker.ceph.com/issues/68843
Signed-off-by: Nitzan Mordechai <nmordech@ibm.com>
(cherry picked from commit b47880f82de436776acab7ff13fb5e6496e49170)

Merge pull request #66033 from thuvh/bug-kafka-expected-sizes

squid: rgw-test: fix bug kafka unexpected keyword argument 'expected_sizes'

rgw: fix `radosgw-admin object unlink ...`

The unlink subcommand did not handle unsharded bucket indices
appropriately. An index is unsharded when the number of shards listed
in the bucket instance object is 0. In that case there will actually
be 1 shard.

When a shard count of 0 is passed to the function that maps
object names to shards, it returns -1. And that was not handled
properly. That is now fixed.
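
A toy sketch of the case described above. The hash and both function
names are stand-ins, and treating -1 as "shard 0" is an assumption about
what correct handling looks like, not a copy of the actual fix.

```python
import zlib


def shard_of(name: str, num_shards: int) -> int:
    """Toy mapping: returns -1 when the index is unsharded (num_shards == 0).

    zlib.crc32 is a stand-in hash, not the one RGW uses.
    """
    if num_shards <= 0:
        return -1
    return zlib.crc32(name.encode()) % num_shards


def effective_shard(name: str, num_shards: int) -> int:
    """Treat -1 as 'the single shard 0' of an unsharded index."""
    shard = shard_of(name, num_shards)
    return 0 if shard < 0 else shard
```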

Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
(cherry picked from commit 9eeb71e28526df8ce2b920d4f50763a734264416)

squid: rgw-testing: fix unexpected keyword argument 'expected_sizes' for kafka test

Signed-off-by: Hoai-Thu Vuong <thuvh87@gmail.com>

Merge pull request #66242 from kshtsk/wip-73816-squid

squid: rgw: update keystone repo stable branch to 2024.2

Reviewed-by: J. Eric Ivancich <ivancich@redhat.com>

Merge pull request #66251 from kshtsk/wip-73582-squid

squid: qa/tasks/workunit: fix no module named 'pipes'

Merge pull request #65922 from guits/wip-73514-squid

squid: ceph-volume: use udev data instead of LVM subprocess in get_devices()