Nizamudeen A [Wed, 24 Aug 2022 07:47:50 +0000 (13:17 +0530)]
mgr/dashboard: fix unable to create ingress unmanaged
the following snipped is the error from backend
```
File "/lib/python3.6/site-packages/ceph/deployment/service_spec.py", line 698, in _from_json_impl
_cls.validate()
File "/lib/python3.6/site-packages/ceph/deployment/service_spec.py", line 1058, in validate
'Cannot add ingress: No frontend_port specified')
ceph.deployment.hostspec.SpecValidationError: Cannot add ingress: No frontend_port specified
```
It looks like even if we set unmanaged flag, we need to input the
backend_service, frontend_port, monitor_port and virtual_ip, because there is a
validation going for that in the backend.
Image contexts are reopen even though we pass the context as an
argument. This commit changes that so you can forget about reopening
a rbd image context again.
mgr/dashboard: add rbd list search and disable sorting
- Disable sorting in each column because it will not be possible to
sort with this pagination implementation.
- Add search capabilities to the rbd list pagination endpoint.
Adam King [Tue, 2 Aug 2022 20:29:04 +0000 (16:29 -0400)]
mgr/cephadm: recreate osd config when redeploy/reconfiguring
OSDs have a config file that includes addresses for the mon daemons.
We already have in place logic to cause a reconfig of OSDs if the mon map
changes, but when we do we aren't actually regenerating the config
so it's never updated with the new mon addresses. This change is to
have us recreate the OSD config when we redeploy or reconfig an OSD
so it gets the new mon addresses.
Adam King [Wed, 17 Aug 2022 20:54:54 +0000 (16:54 -0400)]
cephadm: return nonzero exit code when applying spec fails in bootstrap
This is mostly useful for testing automation, but right now if applying the
spec provided with --apply-spec fails, the return code remains zero. We don't
want to error out entirely in that case as we still want to print the remaining
output (e.g. the dashboard password). Continuing onward and then returning a
nonzero code could provide a balance where we still give all the output but
still have something to make it easier for those writing automation around bootstrap.
Adam King [Fri, 29 Jul 2022 20:10:09 +0000 (16:10 -0400)]
mgr/cephadm: fix handling of draining hosts with explicit placement specs
Basically, if you have a placement that explicitly defines the hosts
to place on, and then add _no_schedule label to one of the hosts (which
should cause all daemons to be removed from the host) cpehadm will simply
fail to apply the spec, saying the host with the _no_schedule label is "Unknown".
This is due to the fact that we remove hosts with the _no_schedule label from
the pool of hosts the scheduler has to work with entirely. If we also provide
the scheduler with a list of currently draining hosts, it can handle this
better and the daemon can be drained off the host as expected.
To be able to recreate and test pg log duplicate entries, a new option
added to the COT: --op pg-log-inject-dups we will also need to provide
--file json_arry of dups, it can get as many dups that need to be inject
the json for dups is in the following format:
{"reqid": "client.n.n:n", "version": "n'n", "user_version": n, "return_code": n}
Prashant D [Tue, 31 May 2022 05:07:47 +0000 (06:07 +0100)]
osd/scrub: Reintroduce scrub starts message
Whenever the scrubbing process starts for a PG, We used to log
scrub/deep-scrub "starts" message in the cluster log. We are
not seeing scrub "starts" messages anymore. Reintroduce scrub
starts message in order to calculate exact time taken to
scrub/deep-scrub the PG.
Prashant D [Fri, 24 Jun 2022 15:10:03 +0000 (11:10 -0400)]
mgr: Remove service_daemon handling from MMgrUpdate
The service_daemon is only required if client services
needs to be registered with mgr. The service_daemon
handling is not required in case of MMgrUpdate as it
handles mon metadata updates only.
Prashant D [Mon, 28 Mar 2022 13:02:08 +0000 (14:02 +0100)]
mgr, mon: Keep upto date metadata with mgr for MONs
The mgr updates mon metadata through handle_mon_map which
gets triggered when MONs were removed/added from/to cluster or
the active mgr is restarted or mgr failsover.
We could have handled metadata update through MMgrOpen or
early MMgrReport messages but these are sent before monitor
electin completes and lead monitor updates pending metadata
in monstore. Instead of relying on fetching mon metadata using
'ceph mon metadata <id>' command, explicitly send metadata
update request with mon metadata to mgr.
Kefu Chai [Fri, 5 Aug 2022 08:40:41 +0000 (16:40 +0800)]
cmake: remove spaces in macro used for compiling cython code
we are facing following FTBFS on jammy + GCC-11.2 + Cython 0.29 +
CMake 3.22:
creating /home/jenkins-build/build/workspace/ceph-api/build/lib/cython_modules/temp.linux-x86_64-3.10/home/jenkins-build/build/workspace/ceph-api/build/src/pybind/cephfs
compile options: '-I/usr/include/python3.10 -I/usr/include/python3.10 -c'
extra options: '-Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -iquote/home/jenkins-build/build/workspace/ceph-api/src/include -w -Dvoid0=dead_function(void) -D__Pyx_check_single_interpreter(ARG)=ARG ## 0 -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2'
cc: /home/jenkins-build/build/workspace/ceph-api/build/src/pybind/cephfs/cephfs.c
cc: warning: ##: linker input file unused because linking not done
cc: error: ##: linker input file not found: No such file or directory
cc: warning: 0: linker input file unused because linking not done
cc: error: 0: linker input file not found: No such file or directory
it seems cython is not able to escape the space in the "extra options"
anymore, so the "##" and "0" are considered as object files passed to
compiler in addition to cephfs.c.
in this change the spaces are removed to help cython to make the right
decision.
Signed-off-by: Kefu Chai <tchaikov@gmail.com>
(cherry picked from commit 5824fed5b427f1d055fb7104fea2e68cd36e6844)
Resolves: rhbz#118423 Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
Adam C. Emerson [Fri, 8 Jul 2022 18:58:16 +0000 (14:58 -0400)]
rgw: Guard against malformed bucket URLs
Misplaced colons can result in radosgw thinking is has a bucket URL
but with no bucket name, leading to a crash later on.
Fixes: https://tracker.ceph.com/issues/55765 Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
(cherry picked from commit 3ee9a3b41a289a926fed8b8927ca2a93b4f120a6) Fixes: https://tracker.ceph.com/issues/56585 Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
(cherry picked from commit f0da8c00a849ab739ba2adfa600616ce921a2055)
Resolves: rhbz#118423 Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
Conflicts:
src/rgw/rgw_sal.h
- `unique_ptr` overload of empty
Fixes: https://tracker.ceph.com/issues/56585 Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
(cherry picked from commit e40ce4a3511e669e761da5f39d81b14e6cdbdeba)
Resolves: rhbz#118423 Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
Adam King [Mon, 23 May 2022 19:57:14 +0000 (15:57 -0400)]
mgr/cephadm: store device info separately from rest of host cache
device info tends to take up the most space out of
everything, so the hope is by giving it its own
location in the config key store we can avoid hitting
issues where the host cache value we attempt to
place in the config key store exceeds the size limit
When we are activating we may receive several service map updates
initiated by the previous active mgr. Treat them all as initial map.
The code also adds "pending_service_map_dirty == 0" assert, which we
expect is true when receiving an initial map -- otherwise we can't
just initialize pending_service_map with received map.
Fixes: https://tracker.ceph.com/issues/53210 Signed-off-by: Avan Thakkar <athakkar@redhat.com>
Service instances was displaying only the ceph services, but with these changes it'll display instances
of cephadm services as well.
Signed-off-by: Pedro Gonzalez Gomez <pegonzal@redhat.com> Signed-off-by: Pere Diaz Bou <pdiazbou@redhat.com>
(cherry picked from commit 1a17bd27a08d87ef24457788542f064f4e917d54)
One-click button in the case of an orch cluster for configuring the
rbd-mirroring when its not properly setup. This button will create an
rbd-mirror service and also an rbd labelled pool(replicated: size-3) (if they are not
existing)
Fixes: https://tracker.ceph.com/issues/55646 Signed-off-by: Nizamudeen A <nia@redhat.com>
Conflicts:
src/pybind/mgr/dashboard/frontend/src/app/core/error/error.component.html
src/pybind/mgr/dashboard/frontend/src/app/shared/services/module-status-guard.service.ts
This commits had minor conflicts where the addition of dashboardButton
had to be explicitly added and the state of api/orch/status in
module-status-guard had to be updated.
Show "No RBD pools available" error page when accessing block/rbd if there are no rbd pools.
Add a "button_name" and "button_route" property to `ModuleStatusGuardService` config to customize the button on the error page.
Modify `ModuleStatusGuardService` to execute API calls to `/ui-api/<uiApiPath>/status` which uses the `UIRouter`.
Images with 'Replaying' state will be displayed in Syncing tab. syncTmpl
removed as it was unnecessary if sate is provided from the backend.
Replaying images in contrast of Syncing images don't have a progress
percentage, nevertheless, we have an approximation of how much time left
there is until the image is fully synced. Therefore, we can use seconds_until_synced to represent the progress.
Also show the mirror-snapshot in the image where snapshot is enabled
When parsing images if an image has the snapshot mode enabled, it will
try to run commands that don't work with that mode. The solution was
not running those for now and appending the mode in the get call.
Fixes: https://tracker.ceph.com/issues/55648 Signed-off-by: Pere Diaz Bou <pdiazbou@redhat.com> Signed-off-by: Nizamudeen A <nia@redhat.com> Signed-off-by: Aashish Sharma <aasharma@redhat.com> Signed-off-by: Avan Thakkar <athakkar@redhat.com>
(cherry picked from commit 489a385a95d6ffa5dbd4c5f9c53c1f80ea179142)
(cherry picked from commit 3ca9ca7e215562912daf00ca3cca40dbc5d560b5)
mgr/dashboard:Get "Different Storage Class" metrics in Prometheus dashboard
Get metrics of the different "HDDRule" and "MixedUse" classes of the "Raw Storage" for their ceph VMs. So that Prometheus can scrape the data and display it to them in grafana
Ilya Dryomov [Fri, 12 Aug 2022 09:10:45 +0000 (11:10 +0200)]
rbd: remove incorrect use of std::includes()
- std::includes() requires sorted ranges but command specs aren't
sorted
- std::includes() purpose is to check whether the second range is
a subsequence of the first range but here the size of the second
range is always equal to the size of the first range, which means
that, had the ranges been sorted, std::includes() would have checked
straight equality
Ilya Dryomov [Fri, 12 Aug 2022 09:10:45 +0000 (11:10 +0200)]
rbd: find_action() should sort actions first
The order in which objects with static storage duration in
different TUs are initialized is undefined. If the compiler
chooses to initialize Shell::Action objects in action/Trash.cc
before Shell::Action objects in action/TrashPurgeSchedule.cc,
all "rbd trash purge schedule ..." commands get shadowed by
"rbd trash purge" command:
$ rbd trash purge schedule list
rbd: too many arguments
The confusing error arises because "rbd trash purge" takes a single
positional argument. "schedule" gets interpreted as <pool-spec> and
"list" generates an error.
osd: Handle oncommits and wait for future work items from mClock queue
When a worker thread with the smallest thread index waits for future work
items from the mClock queue, oncommit callbacks are called. But after the
callback, the thread has to continue waiting instead of returning back to
the ShardedThreadPool::shardedthreadpool_worker() loop. Returning results
in the threads with the smallest index across all shards to busy loop
causing very high CPU utilization.
The fix involves reacquiring the shard_lock and waiting on sdata_cond
until notified or until time period lapses. After this, the smallest
thread index repopulates the oncommit queue from the context_queue
if there were any additions.
libcephsqlite: ceph-mgr crashes when compiled with gcc12
regex in libcephsqlite, when compiled with GCC12 treats '-' as a range
operator resulting in the following error.
"Invalid start of '[x-x]' range in regular expression"
Kotresh HR [Fri, 4 Feb 2022 09:58:39 +0000 (15:28 +0530)]
qa: validate subvolume discover on upgrade
Validate subvolume discover on upgrade from
legacy subvolume to v1. The handcrafted
".meta" file on legacy subvolume root should
not be used for any subvolume apis like getpath,
authorize.
Kotresh HR [Fri, 4 Feb 2022 09:25:03 +0000 (14:55 +0530)]
mgr/volumes: Fix subvolume discover during upgrade
Fixes the subvolume discover to use the correct
metadata file after an upgrade from legacy subvolume
to v1. The fix makes sure, it doesn't use the
handcrafted metadata file placed in the subvolume
root of legacy subvolume.
Co-authored-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch> Co-authored-by: Dan van der Ster <daniel.vanderster@cern.ch> Co-authored-by: Ramana Raja <rraja@redhat.com> Signed-off-by: Kotresh HR <khiremat@redhat.com>
(cherry picked from commit 7eba9cab6cfb9a13a84062177d7a0fa228311e13)
(cherry picked from commit 50f4ac7b75185c89ede210b2c975672e8bdd73aa)