Kotresh HR [Thu, 10 Feb 2022 05:34:41 +0000 (11:04 +0530)]
mgr/volumes: Fix clone uid/gid mismatch
This is the regression caused by commit 18b85c53a.
The 'set_attrs' function sets the uid/gid of the
group to the subvolume if uid/gid is not passed.
The attrs of the clone should match the source
snapshot. Hence, don't use the 'set_attrs'
function to set only the quota attrs for the
clone.
Nizamudeen A [Tue, 8 Feb 2022 06:20:29 +0000 (11:50 +0530)]
mgr/dashboard: set appropriate baseline branch for applitools
All the dashboard PRs are checked against a baseline branch called
'default' in the visual regresstion testing. This will cause issues when
testing PRs in different branches. For eg: currently our master and
pacific has to save two different screenshots since the two of them
differ slightly.
Disabling the applitools logs as well because its too 'noisy'
Fixes: https://tracker.ceph.com/issues/54190 Signed-off-by: Nizamudeen A <nia@redhat.com>
Soumya Koduri [Thu, 3 Feb 2022 18:55:22 +0000 (00:25 +0530)]
rgw/dbstore: Handle read vs delete races
Now that tail objects are associated with objectID, they are not deleted
as part of this DeleteObj operation. Such tail objects (with no head object
in *.object.table are cleaned up later by GC thread.
To avoid races between writes/reads & GC delete, mtime is maintained for each
tail object. This mtime is updated when tail object is written and also when
its corresponding head object is deleted (like here in this case).
Ronen Friedman [Sun, 30 Jan 2022 15:23:39 +0000 (15:23 +0000)]
osd/scrub: fix unintended changes to scrub (cluster)logs
OSD logs and cluster logs are monitored by some scrub tests:
some specific strings are required to either appear or not appear in
the logs. The Scrubber backend PR has unintentionally modified some
of these logs, and here we restore the exact logs text.
Rishabh Dave [Mon, 7 Feb 2022 18:44:42 +0000 (00:14 +0530)]
monitoring: mention PyYAML only once in requirements
Following error occurs while running "sudo install-deps.sh" -
ERROR: Double requirement given: PyYAML==6.0 (from -r requirements-lint.txt (line 5)) (already in pyyaml (from -r requirements-alerts.txt (line 1)), name='PyYAML')
PyYAML is mentioned twice as a requirement. It is mentioned once in both
the following files -
monitoring/ceph-mixin/requirements-lint.txt
monitoring/ceph-mixin/requirements-alerts.txt
Nizamudeen A [Mon, 7 Feb 2022 10:53:29 +0000 (16:23 +0530)]
cephadm: change shared_folder directory for prometheus and grafana
After https://github.com/ceph/ceph/pull/44059 the monitoring/prometheus
and monitoring/grafana/dashboards directories are changed to
monitoring/ceph-mixins. That broke the shared_folders in the cephadm
bootstrap script.
Changed all the instances of monitoring/prometheus and
monitoring/grafana/dashboards to monitoring/ceph-mixins
Also, renaming all the instances of prometheus_alerts.yaml to
prometheus_alerts.yml.
Fixes: https://tracker.ceph.com/issues/54176 Signed-off-by: Nizamudeen A <nia@redhat.com>
Soumya Koduri [Thu, 13 Jan 2022 20:44:19 +0000 (02:14 +0530)]
rgw/dbstore: Use Object ID to handle racing writes
Create unique ID for each object upload which will be
atomically updated in the head object at the end. This will
prevent data corruption during concurrent writes.
Incase of Multipart Uploads, upload_id is used as ObjectID.
XXX: The stale or obsolete tail data needs to be deleted
Also addressed invalid usage of CephContext in dbstore tests.
wanghao72 [Thu, 20 Jan 2022 07:57:29 +0000 (15:57 +0800)]
rgw: CopyObject works with x-amz-copy-source-if-* headers
CopyObject api support condition headers, eg x-amz-copy-source-if-match, while radosgw miss out the 'source' keyword Fixes: https://tracker.ceph.com/issues/53945 Signed-off-by: Wang Hao <wanghao72@baidu.com>
Ilya Dryomov [Mon, 31 Jan 2022 13:08:26 +0000 (14:08 +0100)]
qa/suites/krbd: add legacy+rxbounce and crc+rxbounce coverage
For basic, rbd and rbd-nomount subsuites, replace legacy and crc
facets with "legacy or legacy+rxbounce" and "crc or crc+rxbounce"
facets (chosen at random).
For fsx, singleton and thrash subsuites, add legacy+rxbounce and
crc+rxbounce facets and drop prefer-crc facet. The expected behaviour
of the latter depends on cluster configuration and should be tested
separately.
os/BlueStore: NCB fixes recovery code with shared blobs
Replaces the BitmapAllocator used by NCB Recovery code with a dedicated SimpleBitmap.
The SimpleBitmap allows for bits to be set multiple times without any adverse effect.
This is needed beacuse shared-blobs will report the same allocation multiple times.
Fixes: https://tracker.ceph.com/issues/53678 Signed-off-by: Gabriel Benhanokh <gbenhano@redhat.com>
Andriy Tkachuk [Wed, 2 Feb 2022 11:25:59 +0000 (11:25 +0000)]
rgw_sal_motr: fix possible memleak on put
Currently, the MotrAtomicWriter::cleanup() is called from
MotrAtomicWriter::commit(), which may not be called at all
by rgw in case of md5 checksum failure.
Solution: call cleanup() from process() when data is zero.
rgw calls Writer::process(data, off) with zero data at the
end of the loop to allow writes to flush the data. From:
src/rgw/rgw_op.cc:RGWPutObj::execute():
Patrick Donnelly [Mon, 31 Jan 2022 22:58:55 +0000 (17:58 -0500)]
mds: add inline feature to MDS bootstrap incompat
File systems that had inline data enabled at some point would have this
bit in the CompatSet "incompat" set. This would conflict during upgrade
with the default v16.2.4 CompatSet assigned to existing (16.2.4-) MDS.
Subsequently, this would cause an assertion in FSMap::sanity during
pending map creation.
This bit will get added anyway during the upgrade process so might as
well add it to the MDS CompatSet during bootstrap.
Fixes: https://tracker.ceph.com/issues/54081 Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Build jsonnet and jb in the testso that we can build ceph without
internet access and still be able to run the test needed for monitoring
using jsonnet tools.
Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch>
spec: debian: monitoring: build jsonnet from source to use 0.18.0
As this new version is recently released it's still not in every distro
we use. We now build jsonnet from source so that we can use this new
version of jsonnet. This commit could be reverted later on when the new
version would be available everywhere.
Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch>
mgr/dashboard: monitoring: refactor into ceph-mixin
Mixin is a way to bundle dashboards, prometheus rules and alerts into
jsonnet package. Shifting to mixin will allow easier integration with
monitoring automation that some users may use.
This commit moves `/monitoring/grafana/dashboards` and
`/monitoring/prometheus` to `/monitoring/ceph-mixin`. Prometheus alerts
was also converted to Jsonnet using an automated way (from yaml to json
to jsonnet). This commit minimises any change made to the generated files
and should not change neithers the dashboards nor the Prometheus alerts.
In the future some configuration will also be added to jsonnet to add
more functionalities to the dashboards or alerts (i.e.: multi cluster).
Fixes: https://tracker.ceph.com/issues/53374 Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch>
Adam Kupczyk [Wed, 2 Feb 2022 19:28:14 +0000 (20:28 +0100)]
os/bluestore/bluefs: Make volume selector operations atomic
Make all RocksDBBlueFSVolumeSelector files/extents/size tracking atomic.
It used to be synchronized by BlueFS global lock.
Now, in Fine Grain Locking era, it is necessary to prevent corruption.
Fixes: https://tracker.ceph.com/issues/53906 Signed-off-by: Adam Kupczyk <akupczyk@redhat.com>
Adam Kupczyk [Thu, 20 Jan 2022 12:44:35 +0000 (13:44 +0100)]
os/bluestore/bluefs: Code for volume selector check
Adds ability to verify that volume selector properly tracks disk usage.
Creates options:
- bluefs_check_volume_selector_on_umount
- bluefs_check_volume_selector_often
that can be used to validate that vselector does not diverge from
values it should have.