胡玮文 [Mon, 21 Jun 2021 13:31:49 +0000 (21:31 +0800)]
mgr/dashboard: fix OSD out count
Suppose we have 3 OSDs that are out but up (being prepared for re-formatting to change min_alloc_size), and another OSD
that is down but in (during a reboot). The dashboard displays "1 down, 2 out", which is incorrect. It should be "1 down, 3 out".
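The expected counts can be sketched as follows. This is a minimal illustration, not the dashboard code: the OSD records and the `up`/`in` field names are assumptions. The point is that "down" and "out" are tallied independently, so an OSD that is up but out still counts as out.

```python
# Hypothetical OSD records; the "up" and "in" field names are assumptions.
osds = [
    {"id": 0, "up": True, "in": False},   # out but up
    {"id": 1, "up": True, "in": False},   # out but up
    {"id": 2, "up": True, "in": False},   # out but up
    {"id": 3, "up": False, "in": True},   # down but in (rebooting)
]


def count_down_and_out(osds):
    """Count down and out OSDs independently of each other."""
    down = sum(1 for osd in osds if not osd["up"])
    out = sum(1 for osd in osds if not osd["in"])
    return down, out


down, out = count_down_and_out(osds)
print(f"{down} down, {out} out")  # prints "1 down, 3 out"
```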
The rgw bucket creation form has a Name field which has an async
validator. The validator fetches all the bucket names and checks whether
the entered name is unique. This happens on every keystroke, so if
there are 100 or more buckets, the async validation can be really
slow and cause misvalidations in other fields.
I changed the validation logic and did some cleanups to improve the
performance of the async validation.
Conflicts:
src/pybind/mgr/dashboard/frontend/src/app/ceph/rgw/rgw-bucket-form/rgw-bucket-form.component.ts
- Solved some import conflicts. Used the I18N import and removed the
forkJoin import
src/pybind/mgr/dashboard/frontend/src/app/shared/api/rgw-bucket.service.spec.ts
- Don't need ${RgwHelper.DAEMON_QUERY_PARAM}
src/pybind/mgr/dashboard/frontend/src/app/shared/api/rgw-bucket.service.ts
- Removed enumerate function
Aaryan Porwal [Wed, 26 May 2021 08:58:15 +0000 (14:28 +0530)]
mgr/dashboard: fix for right sidebar nav icon not clickable
Fix the responsive sidebar not opening on the click event, and close the sidebar when a tasks or notifications list item is clicked, because it would otherwise be overshadowed by the sidebar.
Signed-off-by: Aaryan Porwal <aaryanporwal2233@gmail.com>
(cherry picked from commit 4e53a139d96215477d00eb709c1662d8277cba1d)
Conflicts:
src/pybind/mgr/dashboard/frontend/src/app/core/navigation/navigation/navigation.component.html
- Adopt the master branch changes.
Deepika Upadhyay [Wed, 26 May 2021 09:11:55 +0000 (14:41 +0530)]
rados/cephadm/qa/distros: update to latest distros
- removes ubuntu_18.04 support for podman, instead we move to focal.
- use rhel_8.3 for all rhel_8 references
- use {centos/rhel}_8 instead of {rhel/centos}_latest, to keep things the
same in master and octopus, since we use rhel_8 and centos_8 as the latest
version symlinks, which diverged after an octopus-only commit.
This was not cherry-picked from master because some of the symlinks in
octopus were not in sync with master; this commit cleans them up
and tries to make them similar to master.
Sage Weil [Wed, 3 Mar 2021 14:14:29 +0000 (08:14 -0600)]
qa: new kubic distro files; use kubic podman for centos/rhel
The current centos/rhel version of podman (2.2.1) is broken.
- create new qa/distros/podman/* files that install kubic podman
- include centos/rhel variants
- adjust cephadm jobs to use new yaml files
- remove old qa/distros/all/*_podman.yaml files
trivial fix: we do not have cephadm/thrash suite in octopus(removed)
- distro(from octopus) renamed to 0-distro(from pacific)
Tatjana Dehler [Thu, 27 May 2021 09:46:50 +0000 (11:46 +0200)]
mgr/dashboard: show partially deleted RBDs
An RBD might be partially deleted if the deletion
process has been started but was interrupted. In
this case return the RBD as part of the RBD list
and mark it as partially deleted.
Fixes: https://tracker.ceph.com/issues/48603
Signed-off-by: Tatjana Dehler <tdehler@suse.com>
(cherry picked from commit d83c277ac1861df31d2a39d16e20c7bebbea676e)
Conflicts:
src/pybind/mgr/dashboard/frontend/src/app/ceph/block/rbd-details/rbd-details.component.html
src/pybind/mgr/dashboard/frontend/src/app/ceph/block/rbd-list/rbd-list.component.spec.ts
src/pybind/mgr/dashboard/frontend/src/app/ceph/block/rbd-list/rbd-list.component.ts
src/pybind/mgr/dashboard/services/rbd.py
src/pybind/mgr/dashboard/tests/test_rbd_service.py
Resolved various conflicts because octopus and
master diverged a lot.
Conflicts:
src/mds/Server.cc
- most of the master commit was already backported via c5362b8464bdafbea7556acdee9e877b71ed4f8d
This backports just one small part that was missed in that commit.
Kefu Chai [Fri, 4 Jun 2021 03:25:12 +0000 (11:25 +0800)]
debian/control: ceph-mgr-modules-core does not Recommend ceph-mgr-rook anymore
per https://www.debian.org/doc/debian-policy/ch-relationships.html
> Recommends
> This declares a strong, but not absolute, dependency.
>
> The Recommends field should list packages that would be found together
> with this one in all but unusual installations.
ceph-mgr-modules-core provides a set of ceph-mgr modules which are
always enabled. The rook module, in contrast, enables ceph-mgr to install
and configure a Ceph cluster using Rook. This module is very useful, but
it does not have such a strong connection with ceph-mgr-modules-core.
We can always install it separately to get better integration with
Rook.
Sage Weil [Fri, 4 Jun 2021 17:49:40 +0000 (12:49 -0500)]
mgr/telemetry: pass leaderboard flag even w/o ident
Allow non-identified clusters to appear in the leaderboard.
The leaderboard option still defaults to false, so the change here
is that if they opt in to leaderboard but not ident we'll see
that on the backend.
Note that a leaderboard still does not exist (yet), so this doesn't
have any immediate impact. But if/when we do create one, it will
allow us to show big clusters (that opt in) on the leaderboard
as 'unidentified' or similar.
Cory Snyder [Fri, 28 May 2021 19:08:49 +0000 (15:08 -0400)]
mgr/DaemonServer.cc: prevent integer underflow that is triggered by large increases to pg_num/pgp_num
This fixes a scenario where mgrs continually crash while attempting to apply large increases to pg_num/pgp_num. The max step size (estmax) for each incremental update to the pgp_num is calculated as a percentage of the pg_num, so estmax can be greater than the current pgp_num when the increase is large. This causes an integer underflow when estmax is subtracted from the pgp_num in order to calculate the next step size with std::clamp. The underflow results in hi < lo in the arguments passed to std::clamp, which causes a failed assertion, SIGABRT, and ultimately a mgr crash.
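The underflow can be illustrated with a small sketch. It is written in Python for brevity and emulates 32-bit unsigned arithmetic with a mask; the function names, the example values, and the guard shown are assumptions for illustration, not the code from DaemonServer.cc.

```python
U32_MASK = 0xFFFFFFFF  # emulate 32-bit unsigned wraparound (assumed width)


def unguarded_bounds(pgp_num, estmax):
    """Compute the clamp bounds the way unsigned arithmetic would."""
    lo = (pgp_num - estmax) & U32_MASK  # wraps around when estmax > pgp_num
    hi = (pgp_num + estmax) & U32_MASK
    return lo, hi


def guarded_next(pgp_num, target, estmax):
    """One possible guard: never let the lower bound drop below zero."""
    lo = pgp_num - estmax if pgp_num > estmax else 0
    hi = pgp_num + estmax
    return max(lo, min(target, hi))  # same result as clamp(target, lo, hi)


# Hypothetical values: a small pool asked to grow a lot in one go.
lo, hi = unguarded_bounds(pgp_num=32, estmax=50)
print(lo, hi)                          # 4294967278 82, so hi < lo
print(guarded_next(32, 4096, 50))      # 82
```

With the unguarded bounds, `lo` wraps to a huge value and the precondition `lo <= hi` of a clamp no longer holds; the guarded version keeps the bounds ordered.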
Jonas Jelten [Mon, 15 Mar 2021 22:21:07 +0000 (23:21 +0100)]
os/bluestore: strip trailing slash for directory listings
Calls to BlueRocksEnv::GetChildren may contain a trailing / in the
queried directory, which is stripped away with this patch.
If it's not stripped, the directory entry is not found in BlueFS:
```
10 bluefs readdir db/
20 bluefs readdir dir db/ not found
3 rocksdb: [db/db_impl/db_impl_open.cc:1785] Persisting Option File error: OK
```
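The effect of the normalization can be sketched like this. It is a simplified model, assuming a directory table keyed by names without a trailing slash; `dir_map`, `readdir`, and `readdir_stripped` are hypothetical names, not BlueFS or BlueRocksEnv APIs.

```python
# Hypothetical directory table keyed by name without a trailing slash.
dir_map = {"db": ["CURRENT", "OPTIONS-000005"], "db.wal": []}


def readdir(dirname):
    """Exact-match lookup: returns None when the directory is not found."""
    return dir_map.get(dirname)


def readdir_stripped(dirname):
    """Strip trailing slashes before the lookup.

    Note: rstrip removes every trailing "/", not just one.
    """
    return readdir(dirname.rstrip("/"))


print(readdir("db/"))           # None, the entry is not found
print(readdir_stripped("db/"))  # ['CURRENT', 'OPTIONS-000005']
```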
Kefu Chai [Tue, 16 Mar 2021 01:38:41 +0000 (09:38 +0800)]
os/bluestore: use string_view in BlueFS
This improves the performance of lookup. When it comes to mutation
operations, std::map<> always requires an instance of string, but
in general it only requires a single instance of string for each mutation
operation, so this does not incur a performance regression due to
multiple copies of the same string_view object.
Kefu Chai [Tue, 16 Mar 2021 01:13:00 +0000 (09:13 +0800)]
os/bluestore: move BlueRocksEnv::split() to .cc
This helper is only used by the functions in the .cc file, and it does
not reference any BlueRocksEnv member variables or methods, so move it to
an anonymous namespace.
Igor Fedotov [Mon, 17 May 2021 19:23:26 +0000 (22:23 +0300)]
os/bluestore: fix unexpected ENOSPC in Avl/Hybrid allocators.
The Avl allocator was returning an unexpected ENOSPC in first-fit mode if all
size-matching available extents were unaligned and applying the alignment made
all of them shorter than required. Since there was no lookup retry with a
smaller size, ENOSPC was returned.
Additionally, we should proceed with a lookup in best-fit mode even when
the original size has been truncated to match the available size
(force_range_size_alloc==true).
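The failure mode can be sketched with a toy first-fit search. This is not the allocator code: the extent list, the helper names, and the retry policy (shrinking the request by one alignment unit at a time) are assumptions chosen only to show how alignment can make a size-matching extent too short.

```python
def usable_length(offset, length, alignment):
    """Length left in an extent once its start is rounded up to alignment."""
    aligned_start = -(-offset // alignment) * alignment  # round up
    return max(0, offset + length - aligned_start)


def first_fit(extents, want, alignment):
    """Return the aligned start of the first extent that fits, else None."""
    for offset, length in extents:
        if usable_length(offset, length, alignment) >= want:
            return -(-offset // alignment) * alignment
    return None  # the caller would report ENOSPC here


def allocate_with_retry(extents, want, alignment):
    """Retry with smaller sizes instead of failing on the first miss."""
    size = want
    while size >= alignment:
        start = first_fit(extents, size, alignment)
        if start is not None:
            return start, size
        size -= alignment
    return None


# One unaligned extent whose raw length matches the request exactly.
extents = [(0x800, 0x4000)]
print(first_fit(extents, 0x4000, 0x1000))            # None
print(allocate_with_retry(extents, 0x4000, 0x1000))  # (4096, 12288)
```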
Fixes: https://tracker.ceph.com/issues/50656
Signed-off-by: Igor Fedotov <ifedotov@suse.com>
(cherry picked from commit 0eed13a4969d02eeb23681519f2a23130e51ac59)
monitoring/grafana: Remove erroneous elements in hosts-overview Grafana dashboard
The hosts-overview Grafana dashboard json file contains a repeated element, making
it invalid JSON. Some JSON parsers handle this. However, this prevents Jsonnet
from parsing the dashboard, which prevents the deployment of this dashboard via
Jsonnet.
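A repeated element of this kind can be detected before deployment. The sketch below assumes the repeated element is a duplicated object key; it uses the standard `json` module's `object_pairs_hook`, and the function name and sample document are made up for illustration.

```python
import json


def find_duplicate_keys(text):
    """Return the keys that appear more than once within any JSON object."""
    duplicates = []

    def hook(pairs):
        seen = set()
        for key, _value in pairs:
            if key in seen:
                duplicates.append(key)
            seen.add(key)
        return dict(pairs)  # later values win, as with the default parser

    json.loads(text, object_pairs_hook=hook)
    return duplicates


# Made-up dashboard fragment with a repeated "title" key.
sample = '{"title": "Host Overview", "panels": [], "title": "Host Overview"}'
print(find_duplicate_keys(sample))  # ['title']
```

Python's default parser silently keeps the last value, which is why a lenient parser can load such a file while a stricter tool rejects it.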
Deepika Upadhyay [Wed, 26 May 2021 19:18:38 +0000 (00:48 +0530)]
octopus: qa/upgrade: disable update_features test_notify with older client as lockowner
* With the recent support for async rbd operations in pacific and later, when
an older client (without async support) is being upgraded and simultaneously
interacts with a newer client which expects the requests to be async, the
newer client can hang: it takes the return code for request completion to
be the acknowledgement of the async request, and then keeps waiting for
another acknowledgement of request completion.
This should be rare, as it happens only when the lock owner is an old
client, and should be deferred if compatibility issues arise.
* qa/upgrade: amend upgrade test workunits to use respective stable branches
Xiubo Li [Fri, 14 May 2021 02:38:49 +0000 (10:38 +0800)]
mds: place the journaler pointer under the mds_lock
When the _recovery_thread is trying to reformat the journal, it
deletes the old journal pointer and assigns a new one while the
mds_lock is unlocked. That means other threads that are using the
MDSLog::journaler pointer, such as one handling 'flush journal',
can potentially hit a use-after-free bug.
Dan van der Ster [Wed, 28 Apr 2021 12:27:17 +0000 (14:27 +0200)]
mds: completed_requests -> num_completed_requests
Rename this in the session dump so we don't collide with the
completed_requests dump in mdstypes session_info_t::dump.
Signed-off-by: Dan van der Ster <daniel.vanderster@cern.ch>
Fixes: https://tracker.ceph.com/issues/50559
(cherry picked from commit 52098f197e74490d5f2329674dfa93ab5a191063)
max_misplaced was replaced by target_max_misplaced_ratio in edbd592ee44e02a5328e1510879555c2f9dcfc9e, but the document was not
sync'ed. Let's update it accordingly.
In 7f047005fc72e1f37a45cde2d742bb2eb1e62881, we made the pg removal code
much more efficient. But it started marking the pgmeta object as an unexpected
onode, which in reality is expected to be removed after all the other objects.
This behavior is very easily reproducible in a vstart cluster:
ceph osd pool create test 1 1
rados -p test bench 10 write --no-cleanup
ceph osd pool delete test test --yes-i-really-really-mean-it
Before this patch:
"do_delete_work additional unexpected onode list (new onodes has appeared
since PG removal started[#2:00000000::::head#]" seen in the OSD logs.
After this patch:
"do_delete_work removing pgmeta object #2:00000000::::head#" is seen.
Related to: https://tracker.ceph.com/issues/50466
Signed-off-by: Neha Ojha <nojha@redhat.com>
(cherry picked from commit 0e917f1b1e18ca9e48b3f91110d3a46b086f7d83)