Casey Bodley [Wed, 26 Jun 2024 14:52:37 +0000 (10:52 -0400)]
rgw: fix multipart get part when count==1
The RGWObjManifest for multipart uploads is subtly different when
there's only a single part. In that case, get_cur_part_id() for the
final rule returns 1, where it otherwise returns (parts_count + 1).
This caused two problems:
* we returned a parts_count of 0 instead of 1, and
* the do-while loop got stuck in an infinite loop, expecting the last
  rule's part id to be higher than the requested part id
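A minimal, self-contained illustration of that loop's failure mode (simplified
stand-in types and fields, not the actual RGW manifest code):

    // illustrative sketch only -- simplified stand-ins for the manifest rules
    #include <cstdint>
    #include <iterator>
    #include <map>

    struct rule_t { uint64_t part_id = 0; };

    // With several parts the final rule reports part_id == parts_count + 1,
    // so a scan for the requested part eventually sees a larger part id and
    // stops.  With a single part the final rule reports part_id == 1, so a
    // do-while loop that only exits once it sees a larger part id spins
    // forever unless it also stops at the last rule.
    uint64_t find_rule_for_part(const std::map<uint64_t, rule_t>& rules,
                                uint64_t requested_part)
    {
      auto it = rules.begin();            // assumes rules is non-empty
      do {
        if (it->second.part_id > requested_part)
          break;                          // passed the requested part
        if (std::next(it) == rules.end())
          break;                          // last rule: must stop here too
        ++it;
      } while (true);
      return it->first;                   // offset of the matching rule
    }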
Casey Bodley [Mon, 20 Feb 2023 22:27:28 +0000 (17:27 -0500)]
rgw/rados: RadosReadOp::prepare only updates object instance
When called on a versioned object, prepare() may follow the OLH and
look up a different object instance.
But when called on a multipart part, we should not overwrite the
original object name with the part's object name (of the form
mymultipart.2~_XLFNqOW0NuiALg7q4-Hi_7hdtAkZUH.1).
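A rough sketch of the intended behaviour, with hypothetical field names (this
is not the actual RadosReadOp code):

    #include <string>

    // hypothetical stand-in for the object key
    struct obj_key { std::string name; std::string instance; };

    // After resolving the OLH, copy back only the instance so a part's
    // internal name never replaces the object name the caller asked for.
    void apply_resolved_instance(obj_key& requested, const obj_key& resolved)
    {
      // requested.name stays e.g. "mymultipart" rather than becoming
      // "mymultipart.2~_XLFNqOW0NuiALg7q4-Hi_7hdtAkZUH.1"
      requested.instance = resolved.instance;
    }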
Casey Bodley [Mon, 20 Feb 2023 13:08:01 +0000 (08:08 -0500)]
rgw/rados: add get_obj_state() overload for RGWObjStateManifest
Add an overload to expose the manifest storage to callers of
get_obj_state(). The existing RGWObjState+RGWObjManifest overload
just calls the RGWObjStateManifest one.
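A simplified sketch of that delegation (the wrapper struct and signatures
here are made-up stand-ins, not the real rados store interface):

    #include <optional>

    struct RGWObjState {};
    struct RGWObjManifest {};
    struct RGWObjStateManifest {
      RGWObjState state;
      std::optional<RGWObjManifest> manifest;
    };

    struct Store {                        // hypothetical holder of the state
      RGWObjStateManifest sm;

      // new overload: hands back the combined state+manifest storage
      int get_obj_state(RGWObjStateManifest** psm) {
        *psm = &sm;
        return 0;
      }

      // existing overload, now implemented in terms of the new one
      int get_obj_state(RGWObjState** pstate, RGWObjManifest** pmanifest) {
        RGWObjStateManifest* s = nullptr;
        int r = get_obj_state(&s);
        if (r < 0) return r;
        *pstate = &s->state;
        *pmanifest = s->manifest ? &*s->manifest : nullptr;
        return 0;
      }
    };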
Zac Dover [Mon, 24 Mar 2025 12:26:11 +0000 (22:26 +1000)]
src/common: add guidance for deep-scrubbing ratio warning
Add an explanation of how to set the value of
"mon_warn_pg_not_deep_scrubbed_ratio" to the confval definition of that
variable. Although the variable name contains the string "mon", it is
set on the Manager; a note has been added directing users to set the
value there.
This issue was pointed out by Petr Tlapa on Slack in late March of 2025.
Nitzan Mordechai [Thu, 20 Feb 2025 07:37:45 +0000 (07:37 +0000)]
LogMonitor: set no_reply for forwarded MLog commands
On stretch mode clusters we can see slow ops when removing and adding
OSDs with --zap --force, when the OSDs are connected to a peon monitor
and forwarding the MLog to the leader.
no_reply was set only when we are connected to the leader; this fix
also handles the other case, so no_reply is set either way.
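A hypothetical, self-contained sketch of the shape of that change (stand-in
types, not the actual Monitor/LogMonitor classes):

    #include <memory>

    struct MonOp { bool marked_no_reply = false; };
    using MonOpRequestRef = std::shared_ptr<MonOp>;

    struct Monitor {
      bool is_leader = false;
      void no_reply(const MonOpRequestRef& op) { op->marked_no_reply = true; }
    };

    void handle_log(Monitor& mon, const MonOpRequestRef& op)
    {
      // Before: only the leader path marked the op no_reply, so an MLog
      // forwarded through a peon was left waiting and surfaced as a slow op.
      // After: mark it no_reply regardless of which monitor handled it.
      mon.no_reply(op);
      // ... existing handling of the MLog continues here ...
    }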
When extending the log, the sequence was left in a bad state: it would first
create a transaction to update with the current seq number, but leave the
"real" transaction with the same sequence number, which should instead be
`extend_log_transaction.seq + 1`.
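A minimal sketch of the sequencing relationship described above (the field
and type names are placeholders, not the actual journal code):

    #include <cstdint>

    struct transaction_t { uint64_t seq = 0; };

    void extend_log(const transaction_t& extend_log_transaction,
                    transaction_t& real_transaction)
    {
      // the extend transaction consumes the current sequence number, so the
      // "real" transaction must not reuse it
      real_transaction.seq = extend_log_transaction.seq + 1;
    }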
This commit fixes documentation about the many-to-many topic relationship
for notifications. The current sentence states the same fact twice instead
of clarifying it.
John Mulligan [Tue, 18 Mar 2025 19:56:25 +0000 (15:56 -0400)]
reef: mgr/diskprediction_local: avoid more mypy errors
Similar to c4111033172db28c4737e8438f27901811919ce4, this patch
suppresses mypy errors in the diskprediction_local mgr module.
I probably put the magic comment on more lines than needed, but mypy
does not currently have a block-comment method to suppress checking
for just a region of code.
This patch is not a backport, as the issue only impacts reef CI jobs,
and so it is applied directly to the reef branch.
Signed-off-by: John Mulligan <phlogistonjohn@asynchrono.us>
Samuel Just [Thu, 13 Feb 2025 04:16:47 +0000 (04:16 +0000)]
dmclock/.../dmclock_server: do not clean clients with requests
PriorityQueueBase::do_clean() shouldn't remove ClientRec instances which
still have queued requests. Otherwise, very low priority clients might
end up having requests actually lost, which shouldn't be possible.
In the OSD, this resulted in PGRecovery items being lost if queued with
background_best_effort while expanding a cluster. Such items can
legitimately sit in the queue for a long period of time as they
represent background data migration which is allowed to be starved by an
aggressive client workload. Dropping the items broke an assumption in
the OSD that all items enqueued would eventually be dequeued, resulting
in resources being leaked.
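An illustrative sketch of the rule do_clean() should follow (ClientRec here
is a stand-in, not the real dmclock type):

    #include <deque>
    #include <map>

    struct ClientRec {
      std::deque<int> requests;          // queued requests (stand-in)
      bool idle_long_enough = false;     // whatever the age/idle criterion is
      bool has_request() const { return !requests.empty(); }
    };

    void do_clean(std::map<int, ClientRec>& clients)
    {
      for (auto it = clients.begin(); it != clients.end(); ) {
        // never erase a client that still has queued requests, however low
        // its priority -- otherwise those requests (e.g. background
        // best-effort PGRecovery items) are silently lost
        if (!it->second.has_request() && it->second.idle_long_enough) {
          it = clients.erase(it);
        } else {
          ++it;
        }
      }
    }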
Samuel Just [Thu, 13 Feb 2025 03:54:28 +0000 (03:54 +0000)]
test/osd/TestMClockScheduler: create_item should pass prio < cutoff
Cutoff is set to 12, so let's pass something < 12 rather than 12.
Comments in some tests suggest that the intent is for create_item
to create things in the mclock queue rather than the high_queue.
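A rough sketch of the routing the test relies on (the constant and names are
illustrative, not the scheduler's actual implementation):

    #include <cassert>

    constexpr unsigned cutoff = 12;

    enum class queue_t { high, mclock };

    // items at or above the cutoff bypass mclock and go to the high queue,
    // so a test that wants to exercise the mclock queue must pass a priority
    // strictly below the cutoff
    queue_t route(unsigned prio)
    {
      return prio >= cutoff ? queue_t::high : queue_t::mclock;
    }

    int main()
    {
      assert(route(12) == queue_t::high);   // old test value: never hit mclock
      assert(route(11) == queue_t::mclock); // what create_item should pass
    }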
Samuel Just [Thu, 13 Feb 2025 02:55:27 +0000 (02:55 +0000)]
test/osd/TestMClockScheduler: add test for very slow dequeue
Related: https://tracker.ceph.com/issues/61594
Signed-off-by: Samuel Just <sjust@redhat.com>
(cherry picked from commit b35589f7eb39e6bfabe7df1c55281f41925eca61)
John Mulligan [Thu, 13 Mar 2025 11:59:42 +0000 (07:59 -0400)]
script: ensure curl is always available in build containers
Ensure that curl is installed in all build containers regardless of
ceph's dependencies or other factors. This allows us to use curl in
any subsequent build steps/scripts.
Fixes: https://tracker.ceph.com/issues/70451
Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit b4e11f75bfa76036b9109485aa1cb4f9d633c8a2)
Conflicts:
src/pybind/mgr/dashboard/frontend/package-lock.json (conflicts
with typescript package version, kept the existing one)
src/pybind/mgr/dashboard/frontend/package.json (conflicts with
typescript package version, kept the existing one)
src/pybind/mgr/dashboard/frontend/src/app/ceph/rgw/rgw-multisite-migrate/rgw-multisite-migrate.component.ts (conflicts with automated system user creation in main)
src/pybind/mgr/dashboard/frontend/src/app/shared/forms/cd-validators.ts (conflicts with oauthAddressTest validator)
Laura Flores [Fri, 7 Mar 2025 06:22:00 +0000 (06:22 +0000)]
mon, osd: add command to remove invalid pg-upmap-primary entries
The current rm-pg-upmap-primary command checks that the pgid exists
in the pgmap before continuing to remove it. Due to https://tracker.ceph.com/issues/66867,
some invalid pg-upmap-primary entries may exist for pools that have been removed.
Currently, these mappings are impossible to remove since the pgids no longer
exist in the pgmap.
This new command, rm-pg-upmap-primary-all, allows users to remove
any and all pg-upmap-primary mappings in the osdmap at once, which includes
valid and invalid entries.
This command may also be helpful when upgrading from versions where users
are plagued by https://tracker.ceph.com/issues/61948. Users may use an upgraded
mon to remove all pg-upmap-primary entries (valid and invalid) so they can continue
to upgrade to a safe version.
See manual testing for this patch here: https://tracker.ceph.com/issues/67179#note-12
Fixes: https://tracker.ceph.com/issues/67179
Fixes: https://tracker.ceph.com/issues/69760
Signed-off-by: Laura Flores <lflores@ibm.com>
(cherry picked from commit 6e9e2033bf0f4779bdfac9a3a4f29115459c8c0e)
Conflicts:
src/osd/OSDMap.cc
src/osd/OSDMap.h
The `rm_all_upmap_prims` per-pool function is part of
https://github.com/ceph/ceph/commit/2953db8b58535605882dff2e1d4ff36e6075e122, which
is related to the "size optimized" read balancer feature that
is only included in Squid and later.
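An illustrative sketch of what removing all pg-upmap-primary entries amounts
to (pg_t and the map type here are stand-ins, not the OSDMap internals):

    #include <cstdint>
    #include <map>
    #include <string>

    using pg_t = std::string;        // stand-in for the real pg_t
    using osd_id_t = int32_t;

    void rm_pg_upmap_primary_all(std::map<pg_t, osd_id_t>& pg_upmap_primaries)
    {
      // unlike rm-pg-upmap-primary, there is no per-entry check that the
      // pgid still exists in the pgmap, so entries left behind by deleted
      // pools are cleared as well
      pg_upmap_primaries.clear();
    }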
John Mulligan [Sat, 8 Feb 2025 20:03:32 +0000 (15:03 -0500)]
container: stop deleting python generated files
Stop deleting the python generated files (pyc, pyo) that RPM packages
have installed. At some point in the misty past someone thought it would
be a good idea to remove these. This practice got carried over to the
new in-tree Containerfile. IMO this was probably done to save space,
but if that's the case then the RPMs should not be carrying them
either. Plus, not having them is going to slow python down as it needs
to compile every py file that gets loaded. Let's be consistent: if the
RPMs have pyc and pyo files then they should be in the image - if
they're bad or too big they should not be in the RPMs either, right?
This has the pleasant side effect of making `rpm -Va` inside the image
happier.
Fixes: https://tracker.ceph.com/issues/69869
Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit 0f178e61de52c6a0b757f8f6937340c002e66c73)
John Mulligan [Sat, 8 Feb 2025 19:51:23 +0000 (14:51 -0500)]
container: avoid installing docs using the dnf configuration
Avoid installing docs by using the dnf configuration tsflags parameter,
passing the nodocs flag. This tells dnf and rpm not to install
documentation, such as manpages. Stop installing the docs just to delete
them later with an `rm -rf`-style command. Now the docs don't get
installed in the first place, saving space, and rpm is happy
(`rpm -Va` no longer shows docs as 'missing').
Fixes: https://tracker.ceph.com/issues/69868
Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit bf9b8d36aba3c7a8c7a3ecfc4d00359985e745b6)
Hannes Baum [Wed, 6 Nov 2024 08:46:09 +0000 (09:46 +0100)]
mgr: fix subuser creation via dashboard
Subusers couldn't be created through the dashboard, because the get call was
overwritten with Python magic due to it being the function under the HTTP
call. The get function was therefore split into an "external" and an
"internal" function, so that the internal one can be used by other functions
without triggering the magic. Since the user object was then returned
correctly, json.loads could be removed.
This test deals with enabling/disabling the modules. My assumption is that
after enabling the module, the test waits for an active mgr but is not able
to find it in time and so it fails. Taking inspiration from
https://github.com/ceph/ceph/pull/58995/commits/6c7253be6f6fbfa6faed7a539cb78847fec04580,
this adds retries and logs to see if that's the case.
Joshua Baergen [Wed, 18 Dec 2024 17:27:58 +0000 (10:27 -0700)]
blk/KernelDevice: Introduce a cap on the number of pending discards
Some disks have a discard performance that is too low to keep up with
write workloads. Using async discard in this case will cause the OSD to
run out of capacity due to the number of outstanding discards preventing
allocations from being freed. While sync discard could be used in this
case to cause backpressure, this might have unacceptable performance
implications.
For the most part, as long as enough discards are getting through to a
device, then it will stay trimmed enough to maintain acceptable
performance. Thus, we can introduce a cap on the pending discard count,
ensuring that the queue of allocations to be freed doesn't get too long
while also issuing sufficient discards to disk. The default value of 1000000
leaves ample room for discard spikes (e.g. from snaptrim); it could
result in multiple minutes of discards being queued up, but at least
it's not unbounded (though if a user really wants unbounded behaviour,
they can choose it by setting the new configuration option to 0).
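A hypothetical sketch of the capping behaviour described above (the names and
the exact policy are assumptions, not KernelDevice's actual implementation):

    #include <cstdint>
    #include <deque>
    #include <utility>

    struct DiscardQueue {
      uint64_t max_pending = 1000000;                    // 0 means unbounded
      std::deque<std::pair<uint64_t, uint64_t>> pending; // (offset, length)

      // returns true if the extent was queued for an async discard; false
      // means the caller should release the allocation right away without
      // trimming, so the free list never backs up behind a slow device
      bool try_queue(uint64_t offset, uint64_t length)
      {
        if (max_pending != 0 && pending.size() >= max_pending) {
          return false;                                  // cap reached
        }
        pending.emplace_back(offset, length);
        return true;
      }
    };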
ceph-volume: allow zapping partitions on multipath devices
ceph-volume refuses to zap a device if it is a partition on a multipath
device due to an overly strict condition. This change ensures that only
full mapper devices (excluding partitions) are blocked from being zapped,
allowing partitions on multipath devices to be processed correctly.