Zac Dover [Mon, 4 Mar 2024 10:41:16 +0000 (20:41 +1000)]
doc/rados: link to pg setting commands
Link to the instructions for manually setting the number of PGs per
pool from the mention of placement groups. These instructions are
included here in response to a request that Ronen Friedman made when
the links to PGcalc were removed (see
https://github.com/ceph/ceph/pull/55899#pullrequestreview-1912940118).
Zac Dover [Sun, 3 Mar 2024 10:28:00 +0000 (20:28 +1000)]
doc/rados: remove PGcalc from docs
Remove mention of the "PG calc" tool from the documentation. I have
removed every mention of it in a single commit so that posterity can
easily restore mention of this tool if we decide we need to do so.
The rbd-wnbd daemon currently caches one rados context per cluster.
However, it registers its command hooks against the global context's
admin socket, which won't be available. As a result,
the "rbd-wnbd stats" command no longer works.
To address this issue, we'll ensure that rbd-wnbd registers its command
hooks against the correct admin socket instance, using the image
context.
The "rbd-wnbd unmap" command currently tells the WNBD driver
to remove the mapping without contacting the rbd-wnbd daemon
and waiting for it to perform its cleanup.
As a result, attempting to delete the image immediately after
unmapping it can fail because watchers still exist.
As a temporary workaround, we'll retry the image remove operation.
At a later time, we'll update the "rbd-wnbd unmap" command to go
through the rbd-wnbd daemon, ensuring that all the necessary
cleanup is performed before returning.
While at it, we're dropping a redundant LOG.error call so that we
won't print expected exceptions.
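The temporary retry described above could look roughly like the following Python sketch. It is not the actual test code: `remove_image` is a caller-supplied callable, `ImageBusyError` is a hypothetical stand-in for whatever error the remove operation reports while watchers remain, and the attempt count and delay are arbitrary.

```python
import time


class ImageBusyError(Exception):
    """Hypothetical error raised while the image still has watchers."""


def retry_remove(remove_image, image_name, attempts=5, delay=0.5):
    """Call remove_image(image_name), retrying while the image is busy.

    remove_image is a caller-supplied callable (hypothetical); a real test
    would invoke the rbd CLI or librbd here instead.
    """
    for attempt in range(1, attempts + 1):
        try:
            return remove_image(image_name)
        except ImageBusyError:
            if attempt == attempts:
                # Out of attempts: surface the last error to the caller.
                raise
            # Give the rbd-wnbd daemon time to finish its cleanup and
            # release its watch on the image before trying again.
            time.sleep(delay)
```

Retrying only on the busy error, rather than on any exception, keeps unrelated failures visible.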
This commit stores the mapping config in the Windows registry
only after the mapping has been initialized. This ensures that we don't
overwrite the registry settings of images that are already mapped.
We'll also check whether the registry setting was added by us before
cleaning it up.
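The ordering and ownership check can be sketched in Python as follows. This is only a model: a plain dict stands in for the Windows registry, and `map_image`, `unmap_cleanup` and the `init_mapping` callable are hypothetical names, not the rbd-wnbd code.

```python
def map_image(registry, image, config, init_mapping):
    """Initialize the mapping first, then persist its config.

    registry is a dict standing in for the Windows registry; init_mapping
    is a caller-supplied callable that raises if the mapping fails (for
    example, because the image is already mapped).
    Returns True if this call added the registry entry.
    """
    # If this raises, nothing has been written, so settings belonging to
    # an existing mapping are left untouched.
    init_mapping(image)
    added = image not in registry
    if added:
        registry[image] = config
    return added


def unmap_cleanup(registry, image, added_by_us):
    """Remove the registry entry only if this process created it."""
    if added_by_us:
        registry.pop(image, None)
```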
Lucian Petrut [Mon, 12 Jun 2023 13:16:39 +0000 (13:16 +0000)]
rbd-wnbd: use one daemon process per host
We're currently using one rbd-wnbd process per image mapping.
Since OSD connections aren't shared across those processes,
we end up with an excessive number of TCP sessions, potentially
exceeding Windows limits:
https://ask.cloudbase.it/question/3598/ceph-for-windows-tcp-session-count/
In order to improve rbd-wnbd's scalability, we're going to use
a single process per host (unless "-f" is passed when mapping the
image, in which case the daemon will run as part of the same
process). This allows OSD sessions to be shared across image
mappings.
Another advantage is that the "ceph-rbd" service starts faster,
especially when there is a large number of image mappings.
Zac Dover [Fri, 1 Mar 2024 12:11:14 +0000 (22:11 +1000)]
doc/install: add manual RADOSGW install procedure
Add a manual RADOSGW installation procedure to
doc/install/manual-deployment.rst. This procedure was developed by Janne
Johansson and reported to the ceph-users mailing list on 29 Jan 2024
here: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/LB3YRIKAPOHXYCW7MKLVUJPYWYRQVARU/
Co-authored-by: Janne Johansson <icepic.dz@gmail.com>
Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>
(cherry picked from commit 565bc9503838906995fa48f59debcd2843775b18)
Samuel Just [Fri, 16 Feb 2024 00:04:05 +0000 (00:04 +0000)]
unittest-seastar-socket: debug to error on unexpected return from dispatch_rw_bounded
Related: https://tracker.ceph.com/issues/64457
Signed-off-by: Samuel Just <sjust@redhat.com>
(cherry picked from commit 66969c07dc0cd5e0f01685ec19de26dae09279f5)
Zac Dover [Thu, 29 Feb 2024 08:08:10 +0000 (18:08 +1000)]
doc/glossary: improve "MDS" entry
Improve the entry for "MDS" in doc/glossary.rst by linking to the
"ceph-mds" man page and mentioning the relationship between clients and
MDS (or MDSes).
Zac Dover [Mon, 26 Feb 2024 10:03:48 +0000 (20:03 +1000)]
doc/rados: add "change public network" procedure
Add a procedure to /doc/rados/operations/add-or-rm-mons.rst that
explains how to change the public_network in a Ceph cluster deployed
with cephadm. This procedure was developed by Eugen Block and can be
seen in its original form here:
https://heiterbiswolkig.blogs.nde.ag/2024/02/22/cephadm-change-public-network/
Casey Bodley [Mon, 26 Feb 2024 14:38:52 +0000 (09:38 -0500)]
test/rgw: increase timeouts in unittest_rgw_dmclock_scheduler
1ms sleeps are generally below the timer's resolution. increase run_for()
durations to 50ms to make the tests far less sensitive to timing. in
practice, none of the sleeps actually wait the full 50ms.
Zac Dover [Fri, 23 Feb 2024 16:05:42 +0000 (02:05 +1000)]
doc/rbd: repair ordered list
Fix the numbering in an ordered list. The numbering was thrown off
because a ".. prompt" directive was improperly indented (it wasn't
indented at all).
See https://github.com/ceph/ceph/pull/55540#discussion_r1500051264
Redouane Kachach [Thu, 22 Feb 2024 09:19:06 +0000 (10:19 +0100)]
mgr/rook: adding empty calls to upgrade_ls and upgrade_status
Added empty implementations of upgrade_ls and upgrade_status to avoid
dashboard errors when entering the Cluster > Upgrade view. Empty
implementations are used because we don't support the upgrade
functionality in Rook as we do for normal Ceph deployments. With Rook,
the user has to follow a different process to upgrade Ceph.
Redouane Kachach [Thu, 22 Feb 2024 09:18:28 +0000 (10:18 +0100)]
mgr/rook: removing all the code related to OSD creation/removal
Fixes: https://tracker.ceph.com/issues/64211
Signed-off-by: Redouane Kachach <rkachach@redhat.com>
Afreen [Tue, 13 Feb 2024 10:26:09 +0000 (15:56 +0530)]
mgr/dashboard: Handle errors for /api/osd/settings
Fixes https://tracker.ceph.com/issues/62089
issue:
=====
/api/osd/settings sometimes returns "TypeError: string indices must be
integers".
The result comes from the `osd dump` command, which sometimes returns
an error message instead of an object, causing an error to be displayed
on the dashboard.
fix:
====
Added a try-catch block to handle the error, and updated the frontend
code to handle such errors.
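A minimal Python sketch of the backend side of this fix, assuming the handler receives the parsed `osd dump` result, which is a dict on success but may be an error string. The helper name `osd_settings` and the `nearfull_ratio`/`full_ratio` keys are illustrative assumptions, not necessarily what the dashboard controller uses.

```python
def osd_settings(osd_dump):
    """Extract OSD settings from an 'osd dump' result (hypothetical helper).

    On success osd_dump is a dict; on failure it may be an error string.
    """
    try:
        return {
            'nearfull_ratio': osd_dump['nearfull_ratio'],
            'full_ratio': osd_dump['full_ratio'],
        }
    except (TypeError, KeyError) as exc:
        # Indexing a str with a str key raises TypeError ("string indices
        # must be integers"); return an error the frontend can display
        # instead of failing the request.
        return {'error': str(exc)}
```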
Oguzhan Ozmen [Tue, 23 Jan 2024 15:25:44 +0000 (10:25 -0500)]
rgw/lc: decorating log events with more details
* fixing some minor typos in the log event strings
* correcting the names of the owning functions in some of the log events
* adding worker index to the events in LCWorker::entry()
* adding worker index to the cycle-finished events
* adding bucket name to the interval budget expired events
* adding bucket name to the events found in RGWLC::bucket_lc_process()
* adding event to capture the end and the return code for the call to
bucket_lc_process()
When doing a PG dump using 'ceph pg dump --format json-pretty',
the output is so large that the command hangs, and ceph-mgr
also hangs and eventually fails over.
The exact size depends on the number of OSDs in the cluster
and the number of peers for each OSD.
Testing identified the network ping times as the largest
component of the output by size, so they are now removed
from the output to limit its overall size.
Ronen Friedman [Mon, 12 Feb 2024 14:50:22 +0000 (08:50 -0600)]
test/osd: fix test_scrub_sched following scrubber changes
Replacing PgScrubber::determine_scrub_time() with a local copy,
as a stop-gap measure to keep the test running.
The scrub scheduling refactoring will remove the need for
this function, and the test will be updated accordingly.
Adam King [Thu, 15 Feb 2024 14:42:50 +0000 (09:42 -0500)]
Merge pull request #55566 from zdover23/wip-doc-2024-02-14-cephadm-services-nfs
doc/cephadm: correct nfs config pool name
Reviewed-by: Adam King <adking@redhat.com>
Reviewed-by: Anthony D'Atri <anthonyeleven@users.noreply.github.com>
Reviewed-by: John Mulligan <jmulligan@redhat.com>
Casey Bodley [Wed, 14 Feb 2024 14:43:14 +0000 (09:43 -0500)]
rgw/putobj: RadosWriter uses part head object for multipart parts
the cleanup logic in the RadosWriter destructor was using the wrong
`head_obj` to avoid races between cleanup and part re-uploads. it
pointed at the final location of the multipart upload, rather than the
head object of the current part.
Ilya Dryomov [Mon, 12 Feb 2024 12:07:22 +0000 (13:07 +0100)]
librbd: refactor merge() for SparseBufferlistExtent
- pass left.length + right.length instead of bl.length()
for consistency and to avoid circumventing the assert in
SparseBufferlistExtent constructor
- claim_append() takes an lvalue reference, no need to move
- follow the pattern used in split()
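The point of the first bullet can be modeled in Python. The `Extent` class below is a hypothetical stand-in for SparseBufferlistExtent whose constructor checks that the data length matches the declared length. Passing the length of the merged data would make that check compare a value with itself; passing the sum of the two declared lengths keeps it meaningful.

```python
from dataclasses import dataclass


@dataclass
class Extent:
    """Hypothetical model of a sparse bufferlist extent."""
    length: int
    data: bytes

    def __post_init__(self):
        # Consistency check, analogous to the constructor assert.
        assert len(self.data) == self.length, "data does not match length"


def merge(left, right):
    """Merge two adjacent extents into one."""
    merged_data = left.data + right.data
    # Use the declared lengths; passing len(merged_data) instead would make
    # the constructor check trivially true.
    return Extent(left.length + right.length, merged_data)
```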
Ilya Dryomov [Mon, 12 Feb 2024 10:00:45 +0000 (11:00 +0100)]
librbd: fix split() for SparseExtent and SparseBufferlistExtent
SparseExtents and SparseBufferlist are typedefs for interval_map. In
both cases, the split() handler is broken: for the former, the extent
isn't actually split, and for the latter, an incorrect bufferlist is
attached to the split extent.
Fortunately, both SnapshotDelta as produced by ObjectListSnapsRequest
and SparseBufferlist used in a couple of places seem to be collections
where only disjoint intervals are inserted and splitting doesn't occur
(at least in the common case). But still, this is a landmine waiting
for someone to step on it.
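What a correct split handler has to do can be sketched in Python. The `Extent` class and the `split(extent, offset, length)` signature are hypothetical and do not reproduce Ceph's interval_map interface; the sketch only shows that both the length and any attached data must be narrowed to the requested sub-range, which are the two things the buggy handlers got wrong.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Extent:
    """Hypothetical model of a sparse extent.

    data is None for extents that carry only a length (like SparseExtent)
    and bytes for extents that carry a buffer (like SparseBufferlistExtent).
    """
    length: int
    data: Optional[bytes] = None

    def __post_init__(self):
        if self.data is not None:
            assert len(self.data) == self.length


def split(extent, offset, length):
    """Return the piece of extent covering [offset, offset + length).

    offset is relative to the start of the extent. Both the length and,
    when present, the data are narrowed to the requested range.
    """
    assert offset >= 0 and length >= 0 and offset + length <= extent.length
    if extent.data is None:
        return Extent(length)
    return Extent(length, extent.data[offset:offset + length])
```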