Rishabh Dave [Mon, 7 Feb 2022 18:44:42 +0000 (00:14 +0530)]
monitoring: mention PyYAML only once in requirements
The following error occurs while running "sudo install-deps.sh":
ERROR: Double requirement given: PyYAML==6.0 (from -r requirements-lint.txt (line 5)) (already in pyyaml (from -r requirements-alerts.txt (line 1)), name='PyYAML')
PyYAML is listed twice as a requirement, once in each of the following
files:
monitoring/ceph-mixin/requirements-lint.txt
monitoring/ceph-mixin/requirements-alerts.txt
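The error arises because "PyYAML" and "pyyaml" name the same project once names are normalized. Below is a minimal Python sketch of catching such duplicates before pip runs. It assumes PEP 503-style name normalization. The helpers `normalize` and `find_duplicates` are hypothetical, and so is the file content. Only the two PyYAML lines mirror the error above.

```python
import re
from collections import defaultdict

# Leading project name of a requirement line, e.g. "PyYAML" in "PyYAML==6.0".
# Option lines such as "-r other.txt" start with "-" and never match.
_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def normalize(name: str) -> str:
    # PEP 503 normalization: case-insensitive, and runs of "-", "_" and "."
    # are equivalent, so "PyYAML" and "pyyaml" name the same project.
    return re.sub(r"[-_.]+", "-", name).lower()


def find_duplicates(files: dict) -> dict:
    """Return {normalized name: [(file, line), ...]} for every project
    declared more than once across the given requirements files."""
    seen = defaultdict(list)
    for filename, lines in files.items():
        for lineno, line in enumerate(lines, start=1):
            m = _NAME.match(line.split("#", 1)[0])  # ignore comments
            if m:
                seen[normalize(m.group(1))].append((filename, lineno))
    return {name: locs for name, locs in seen.items() if len(locs) > 1}


# Hypothetical file contents; only the PyYAML lines mirror the error above.
files = {
    "requirements-alerts.txt": ["pyyaml"],
    "requirements-lint.txt": ["pkg-a", "pkg-b", "pkg-c", "pkg-d", "PyYAML==6.0"],
}
duplicates = find_duplicates(files)
```

Here `duplicates` reports PyYAML at line 1 of requirements-alerts.txt and line 5 of requirements-lint.txt. Those are the same two locations pip names in its error.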
Nizamudeen A [Mon, 7 Feb 2022 10:53:29 +0000 (16:23 +0530)]
cephadm: change shared_folder directory for prometheus and grafana
After https://github.com/ceph/ceph/pull/44059, the monitoring/prometheus
and monitoring/grafana/dashboards directories were moved to
monitoring/ceph-mixin. That broke the shared_folders in the cephadm
bootstrap script.
Changed all instances of monitoring/prometheus and
monitoring/grafana/dashboards to monitoring/ceph-mixin.
Also renamed all instances of prometheus_alerts.yaml to
prometheus_alerts.yml.
Fixes: https://tracker.ceph.com/issues/54176
Signed-off-by: Nizamudeen A <nia@redhat.com>
Ilya Dryomov [Mon, 31 Jan 2022 13:08:26 +0000 (14:08 +0100)]
qa/suites/krbd: add legacy+rxbounce and crc+rxbounce coverage
For basic, rbd and rbd-nomount subsuites, replace legacy and crc
facets with "legacy or legacy+rxbounce" and "crc or crc+rxbounce"
facets (chosen at random).
For fsx, singleton and thrash subsuites, add legacy+rxbounce and
crc+rxbounce facets and drop prefer-crc facet. The expected behaviour
of the latter depends on cluster configuration and should be tested
separately.
os/BlueStore: NCB fixes recovery code with shared blobs
Replaces the BitmapAllocator used by NCB Recovery code with a dedicated SimpleBitmap.
The SimpleBitmap allows for bits to be set multiple times without any adverse effect.
This is needed because shared blobs will report the same allocation multiple times.
Fixes: https://tracker.ceph.com/issues/53678
Signed-off-by: Gabriel Benhanokh <gbenhano@redhat.com>
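The key property is that marking an extent twice must be harmless. Below is a minimal Python sketch of a bitmap with that idempotent set operation. It is not BlueStore's C++ SimpleBitmap. The class name and its methods (`set`, `is_set`, `count`) are assumptions made for illustration.

```python
class SimpleBitmapSketch:
    """Hypothetical sketch: one bit per allocation unit; setting a bit that
    is already set is a no-op rather than an error."""

    def __init__(self, num_bits: int) -> None:
        self.num_bits = num_bits
        self.bits = bytearray((num_bits + 7) // 8)

    def set(self, offset: int, length: int) -> None:
        # Mark bits [offset, offset + length) as allocated. OR-ing is
        # idempotent, so a range reported twice leaves the same state.
        if offset < 0 or offset + length > self.num_bits:
            raise ValueError("range out of bounds")
        for bit in range(offset, offset + length):
            self.bits[bit // 8] |= 1 << (bit % 8)

    def is_set(self, bit: int) -> bool:
        return bool(self.bits[bit // 8] & (1 << (bit % 8)))

    def count(self) -> int:
        return sum(bin(b).count("1") for b in self.bits)


bm = SimpleBitmapSketch(64)
bm.set(8, 4)
bm.set(8, 4)  # same extent reported again via a second shared-blob reference
```

An allocator that treats "mark used" as an error when the range is already used would fail on the second call. The bitmap simply ends up with four bits set.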
Andriy Tkachuk [Wed, 2 Feb 2022 11:25:59 +0000 (11:25 +0000)]
rgw_sal_motr: fix possible memleak on put
Currently, MotrAtomicWriter::cleanup() is called from
MotrAtomicWriter::commit(), which rgw may not call at all
in case of an md5 checksum failure.
Solution: call cleanup() from process() when data is empty.
rgw calls Writer::process(data, off) with empty data at the
end of the loop to allow writes to flush the data. From:
src/rgw/rgw_op.cc:RGWPutObj::execute():
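As a hedged illustration of the fix's idea, here is a Python sketch with hypothetical names. It is not the actual rgw or Motr C++ code. A zero-length process() call marks the end of the stream, so resources are released there and no longer depend on commit(). cleanup() is idempotent because commit() may still call it.

```python
class WriterSketch:
    """Hypothetical stand-in for a writer whose resources must be released
    even when commit() is never called (e.g. after an md5 mismatch)."""

    def __init__(self) -> None:
        self.pending = bytearray()  # data accepted but not yet flushed
        self.stored = bytearray()   # stand-in for the storage backend
        self.resources_held = True

    def process(self, data: bytes, offset: int) -> None:
        # offset mirrors the process(data, off) call shape; this sketch
        # assumes sequential writes and does not use it otherwise.
        if not data:
            # Empty data marks the end of the upload loop: flush, then
            # release resources here instead of relying on commit().
            self.stored += self.pending
            self.pending.clear()
            self.cleanup()
            return
        self.pending += data

    def cleanup(self) -> None:
        # Idempotent, so calling it from both process() and commit() is safe.
        self.resources_held = False

    def commit(self) -> None:
        self.cleanup()


# Simulated upload whose md5 check fails, so commit() is never reached.
writer = WriterSketch()
offset = 0
for chunk in (b"abc", b"de"):
    writer.process(chunk, offset)
    offset += len(chunk)
writer.process(b"", offset)
```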
Patrick Donnelly [Mon, 31 Jan 2022 22:58:55 +0000 (17:58 -0500)]
mds: add inline feature to MDS bootstrap incompat
File systems that had inline data enabled at some point would have this
bit in the CompatSet "incompat" set. This would conflict during upgrade
with the default v16.2.4 CompatSet assigned to existing (16.2.4-) MDS.
Subsequently, this would cause an assertion in FSMap::sanity during
pending map creation.
This bit will get added anyway during the upgrade process, so we might
as well add it to the MDS CompatSet during bootstrap.
Fixes: https://tracker.ceph.com/issues/54081
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
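Below is a simplified Python model of the conflict. It assumes, for illustration only, that a daemon is compatible with a file system only when the daemon knows every incompat feature the file system has recorded. The real CompatSet and FSMap::sanity logic are C++ and more detailed. The feature labels are hypothetical, not Ceph's actual feature names.

```python
def can_serve(mds_incompat: frozenset, fs_incompat: frozenset) -> bool:
    # Assumed rule for this sketch: a daemon is compatible with a file
    # system only if it knows every incompat feature the fs has recorded.
    return fs_incompat <= mds_incompat


# Hypothetical feature labels, not Ceph's actual feature names.
OLD_BOOTSTRAP = frozenset({"base", "dirfrag-omap"})
NEW_BOOTSTRAP = OLD_BOOTSTRAP | {"inline-data"}

# A file system that had inline data enabled at some point keeps the bit.
fs_incompat = frozenset({"base", "dirfrag-omap", "inline-data"})
```

Under this model, `can_serve(OLD_BOOTSTRAP, fs_incompat)` is False. Adding the inline bit at bootstrap makes it True, and file systems without the bit remain servable.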
Build jsonnet and jb in the tests so that we can build ceph without
internet access and still be able to run the test needed for monitoring
using jsonnet tools.
Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch>
spec: debian: monitoring: build jsonnet from source to use 0.18.0
As this new version was released only recently, it is not yet available in
every distro we use. We now build jsonnet from source so that we can use
this new version of jsonnet. This commit could be reverted later on, once
the new version is available everywhere.
Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch>
mgr/dashboard: monitoring: refactor into ceph-mixin
A mixin is a way to bundle dashboards, Prometheus rules and alerts into
a jsonnet package. Shifting to a mixin will allow easier integration with
the monitoring automation that some users may use.
This commit moves `/monitoring/grafana/dashboards` and
`/monitoring/prometheus` to `/monitoring/ceph-mixin`. The Prometheus alerts
were also converted to Jsonnet in an automated way (from YAML to JSON
to Jsonnet). This commit minimises changes to the generated files
and should change neither the dashboards nor the Prometheus alerts.
In the future, some configuration will also be added to the jsonnet to add
more functionality to the dashboards or alerts (e.g. multi-cluster).
Fixes: https://tracker.ceph.com/issues/53374
Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch>
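The claim that the move changes neither the dashboards nor the alerts can be checked by comparing old and regenerated JSON semantically rather than byte for byte. Below is a minimal Python sketch of one way to do that. It is not necessarily the check the commit used, and the dashboard fragments are hypothetical.

```python
import json


def canonical(text: str) -> str:
    # Re-serialize with sorted keys and fixed separators so that whitespace
    # and object key order are not reported as changes.
    return json.dumps(json.loads(text), sort_keys=True, separators=(",", ":"))


def semantically_equal(old: str, new: str) -> bool:
    return canonical(old) == canonical(new)


# Hypothetical dashboard fragments: same content, different formatting.
before = '{"title": "Ceph", "panels": [{"id": 1}, {"id": 2}]}'
after = '{\n  "panels": [{"id": 1}, {"id": 2}],\n  "title": "Ceph"\n}'
```

Object key order and whitespace are ignored, but list order is not. Reordering panels is therefore still reported as a change.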
Ilya Dryomov [Sat, 29 Jan 2022 14:01:27 +0000 (15:01 +0100)]
qa/suites/rbd: add cram-based mon command API test
With mon command definitions (for the rbd_support mgr module in this case)
generated automatically by the @CLI{Read,Write}Command decorators, it is
very easy to accidentally break the externally facing API.
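The cram test pins the generated API so that accidental changes show up as test failures. Below is a Python sketch of the same idea as a snapshot comparison. This is a swapped-in technique, not cram itself. The command name "schedule add" and its argument names are hypothetical.

```python
def api_diff(expected: dict, actual: dict) -> list:
    """Compare {command: [argument names]} snapshots and list every change
    that could break an external caller."""
    problems = []
    for cmd, args in expected.items():
        if cmd not in actual:
            problems.append(f"missing command: {cmd}")
        elif actual[cmd] != args:
            problems.append(f"{cmd}: {args} -> {actual[cmd]}")
    return problems


# Hypothetical snapshot: the recorded API versus what the code now generates.
recorded = {"schedule add": ["level_spec", "interval"]}
generated = {"schedule add": ["level_spec", "period"]}
```

A renamed argument, `interval` becoming `period`, is reported even though the Python code still works internally. External callers would notice that kind of change first.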
Ilya Dryomov [Sat, 29 Jan 2022 14:01:27 +0000 (15:01 +0100)]
mgr/rbd_support: level_spec is optional for schedule list/status
Commit fea6fdff4c74 ("mgr/rbd_support: level_spec passed to some
commands is not optional") is wrong. While it is true that a valid
level_spec is needed to create a LevelSpec instance, an empty string
is very much a valid level spec -- it signifies "all levels".
This wasn't caught because within Ceph these commands are wrapped by
rbd CLI which injects an empty string in get_level_spec_args().
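To make the point concrete, here is a Python sketch of a level-spec parser in which the empty string is a valid input meaning "all levels". The simplified "pool[/namespace[/image]]" grammar and the names `Level` and `parse_level_spec` are assumptions. This is not rbd_support's actual parser.

```python
from typing import NamedTuple, Optional


class Level(NamedTuple):
    pool: Optional[str]
    namespace: Optional[str]
    image: Optional[str]


def parse_level_spec(spec: str) -> Level:
    """Parse a simplified, hypothetical "pool[/namespace[/image]]" spec.
    The empty string is valid and selects all levels."""
    if spec == "":
        return Level(None, None, None)  # "all levels"
    parts = spec.split("/")
    if len(parts) > 3 or "" in parts:
        raise ValueError(f"invalid level spec: {spec!r}")
    padded = parts + [None] * (3 - len(parts))
    return Level(*padded)
```

Treating `""` as an error here would reproduce the bug: direct mon command callers that omit level_spec would be rejected. Callers going through the rbd CLI would not notice, because the CLI always supplies a value.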
Ilya Dryomov [Fri, 28 Jan 2022 22:01:08 +0000 (23:01 +0100)]
mgr/rbd_support: "trash remove" takes image_id_spec, not image_spec
Because of @CLIWriteCommand, the parameter name has to adhere to
the mon command API. Commit dcb51b067a49 ("mgr/rbd_support: define
commands using CLICommand") accidentally changed image_id_spec to
image_spec, breaking external users such as go-ceph.
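The commit message implies that the decorator turns Python parameter names into the public command argument names. Below is a minimal Python sketch under that assumption. `cli_command`, `COMMANDS` and `arg_names` are hypothetical stand-ins, not the mgr module API.

```python
import inspect

COMMANDS = {}  # hypothetical registry: command prefix -> argument names


def arg_names(func) -> list:
    # The externally visible argument names are the Python parameter names.
    return [p for p in inspect.signature(func).parameters if p != "self"]


def cli_command(prefix: str):
    """Hypothetical stand-in for a @CLIWriteCommand-style decorator."""
    def register(func):
        COMMANDS[prefix] = arg_names(func)
        return func
    return register


@cli_command("trash remove")
def trash_remove(self, image_id_spec: str) -> None:
    """Callers must send image_id_spec=...; renaming the parameter
    silently renames the command argument."""
```

With this scheme, a purely internal-looking rename of `image_id_spec` to `image_spec` changes `COMMANDS["trash remove"]`. That breaks callers such as go-ceph that send the old name.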
Matt Benjamin [Sat, 22 Jan 2022 18:14:31 +0000 (13:14 -0500)]
rgwlc: fix compat-decoding of cls_rgw_lc_get_entry_ret
Fix compat-decode of cls_rgw_lc_get_entry_ret, which had changed
earlier in 394750597. While not initially a problem, the more
recent change to allow radosgw-admin lc process to operate on a single
bucket created a way to decode an un-upgraded structure.
Fixes: https://tracker.ceph.com/issues/53927
Reported by Jeegn Chen <jeegnchen@gmail.com>.
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
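Compat decoding means a newer decoder must branch on the encoded version and only read fields that the encoder actually wrote. Below is a Python sketch with a hypothetical byte layout, similar in spirit to Ceph's versioned ENCODE_START/DECODE_START encoding. The field names are assumptions and are not the real layout of cls_rgw_lc_get_entry_ret.

```python
import struct
from typing import Optional

HEADER = struct.Struct("<BBI")  # struct_v, compat_v, payload length


def encode(bucket: str, status: int, start_time: Optional[int]) -> bytes:
    """v1 carries (bucket, status); v2 appends start_time."""
    name = bucket.encode()
    payload = struct.pack("<I", len(name)) + name + struct.pack("<I", status)
    version = 1
    if start_time is not None:
        payload += struct.pack("<Q", start_time)
        version = 2
    return HEADER.pack(version, 1, len(payload)) + payload


def decode(buf: bytes) -> dict:
    version, _compat, _length = HEADER.unpack_from(buf, 0)
    off = HEADER.size
    (n,) = struct.unpack_from("<I", buf, off)
    off += 4
    bucket = buf[off:off + n].decode()
    off += n
    (status,) = struct.unpack_from("<I", buf, off)
    off += 4
    start_time = 0  # default when an old (v1) encoder omitted the field
    if version >= 2:
        (start_time,) = struct.unpack_from("<Q", buf, off)
    return {"bucket": bucket, "status": status, "start_time": start_time}
```

Without the `version >= 2` check, decoding a buffer from an un-upgraded (v1) encoder would read past the end of the payload.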
J. Eric Ivancich [Wed, 22 Dec 2021 19:45:59 +0000 (14:45 -0500)]
rgw: in bucket reshard list, clarify new num shards is tentative
With dynamic bucket index resharding, when the average number of
objects per shard exceeds the configured value, that bucket is
scheduled for reshard. That bucket may receive more new objects before
the resharding takes place. As a result, the existing code
re-calculates the number of new shards just prior to resharding,
rather than waste a resharding opportunity with too low a value.
The same holds true for a user-scheduled resharding.
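Below is a minimal Python sketch of why the shard count shown at scheduling time is only tentative. It uses plain ceiling division as an assumed sizing rule, which is not rgw's actual formula. The limit and object counts are hypothetical.

```python
def needs_reshard(num_objects: int, num_shards: int, max_per_shard: int) -> bool:
    # Scheduling trigger: average objects per shard exceeds the limit.
    return num_objects > num_shards * max_per_shard


def shards_needed(num_objects: int, max_per_shard: int) -> int:
    # Simple ceiling division; an assumption, not rgw's actual formula.
    return max(1, -(-num_objects // max_per_shard))


MAX_PER_SHARD = 100_000        # hypothetical configured limit
at_schedule_time = 150_000     # objects when the bucket was queued
at_reshard_time = 350_000      # objects when the reshard actually runs

tentative = shards_needed(at_schedule_time, MAX_PER_SHARD)
final = shards_needed(at_reshard_time, MAX_PER_SHARD)
```

The bucket is queued with a tentative count of 2. Because it keeps growing, recomputing just before resharding yields 4, which avoids wasting the reshard on a count that is already too low.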
A user reported confusion that the number reported in `radosgw-admin
reshard list` wasn't the number that the reshard operation ultimately
used. This commit makes it clear that the new number of shards is
"tentative", and test_rgw_reshard.py is updated to reflect this
altered output.
Additionally, this commit modernizes the "reshard list" subcommand and
makes it more efficient.
Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>