Sungmin Lee [Mon, 14 Feb 2022 05:15:00 +0000 (14:15 +0900)]
test: fix TierFlushDuringFlush to wait until dedup_tier is set on base pool
When start_dedup() is called before dedup_tier is set on the base pool,
it is not possible to know the target pool of the chunk object.
1. User sets the dedup_tier on a base pool with mon_command().
2. User issues tier_flush on the object which has a manifest (base pool)
before the dedup_tier is applied on the base pool.
3. OSD calls start_dedup() to flush the chunk objects to chunk pool.
4. OSD calls get_dedup_tier() to get the chunk pool of the base pool,
but it is not possible to know the chunk pool.
5. get_dedup_tier() returns 0 because it is not applied on the base pool yet.
6. This makes refcount_manifest() lose its way to the chunk pool.
To prevent this issue, start_dedup() has to be called after dedup_tier is set
on the base pool. To do so, this commit prohibits getting the chunk pool id if
dedup_tier is not set.
Fixes: http://tracker.ceph.com/issues/53855
Signed-off-by: Sungmin Lee <sung_min.lee@samsung.com>
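The guard described in this commit can be sketched roughly as follows. This is a minimal Python illustration, not the OSD code: the `Pool` class, the `None` return value, and the raised exception are hypothetical stand-ins. The only facts taken from the message are that an unset dedup_tier reads as 0 and that the flush must not proceed in that state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Pool:
    """Hypothetical stand-in for a base pool's options."""
    dedup_tier: int = 0  # 0 means "not set yet", as in the commit message


def get_dedup_tier(pool: Pool) -> Optional[int]:
    """Return the chunk pool id, or None while dedup_tier is unset."""
    return pool.dedup_tier if pool.dedup_tier > 0 else None


def start_dedup(pool: Pool, oid: str) -> str:
    """Refuse to flush chunks until the target chunk pool is known."""
    chunk_pool = get_dedup_tier(pool)
    if chunk_pool is None:
        # The caller is expected to retry once the setting has been applied.
        raise RuntimeError(f"dedup_tier not set; cannot flush {oid}")
    return f"flush {oid} -> pool {chunk_pool}"
```

A caller such as the test would wait and retry until start_dedup() stops raising, instead of flushing against an unknown chunk pool.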
Soumya Koduri [Thu, 3 Feb 2022 18:55:22 +0000 (00:25 +0530)]
rgw/dbstore: Handle read vs delete races
Now that tail objects are associated with objectID, they are not deleted
as part of this DeleteObj operation. Such tail objects (with no head object
in *.object.table) are cleaned up later by the GC thread.
To avoid races between writes/reads & GC delete, mtime is maintained for each
tail object. This mtime is updated when tail object is written and also when
its corresponding head object is deleted (like here in this case).
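One way to picture how an mtime can keep GC from racing with readers and writers is the sketch below. It is an assumption-laden illustration, not the dbstore implementation: `TailObject`, `gc_candidates`, and the `min_wait` grace period are hypothetical names, and the commit does not say how long GC waits.

```python
from dataclasses import dataclass


@dataclass
class TailObject:
    object_id: str
    mtime: float  # bumped on write and when the head object is deleted


def gc_candidates(tails, live_object_ids, now, min_wait):
    """Pick tail objects that have no head and have been quiet long enough.

    min_wait is a hypothetical grace period; the commit does not name one.
    """
    return [
        t for t in tails
        if t.object_id not in live_object_ids and now - t.mtime >= min_wait
    ]
```

Because deleting the head refreshes the mtime, a read that started just before the delete still finds its tail data until the grace period has passed.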
Ronen Friedman [Sun, 30 Jan 2022 15:23:39 +0000 (15:23 +0000)]
osd/scrub: fix unintended changes to scrub (cluster)logs
OSD logs and cluster logs are monitored by some scrub tests:
some specific strings are required to either appear or not appear in
the logs. The Scrubber backend PR has unintentionally modified some
of these logs, and here we restore the exact log text.
Rishabh Dave [Mon, 7 Feb 2022 18:44:42 +0000 (00:14 +0530)]
monitoring: mention PyYAML only once in requirements
The following error occurs while running "sudo install-deps.sh":
ERROR: Double requirement given: PyYAML==6.0 (from -r requirements-lint.txt (line 5)) (already in pyyaml (from -r requirements-alerts.txt (line 1)), name='PyYAML')
PyYAML is mentioned twice as a requirement, once in each of
the following files:
monitoring/ceph-mixin/requirements-lint.txt
monitoring/ceph-mixin/requirements-alerts.txt
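A duplicate like this can be caught before pip sees it. The following is a minimal sketch, assuming the requirements files have already been read into a dict of filename to text; `requirement_name` and `duplicated_requirements` are hypothetical helpers, not part of install-deps.sh. Names are normalized the way pip compares them (case-insensitive, with "-", "_" and "." treated alike), which is why "PyYAML" and "pyyaml" collide.

```python
import re
from collections import defaultdict


def requirement_name(line: str):
    """Extract a normalized package name from one requirements line."""
    line = line.split("#", 1)[0].strip()
    if not line or line.startswith("-"):
        return None  # blank, comment, or option such as "-r other.txt"
    match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
    if match is None:
        return None
    # Normalize: lowercase, and collapse runs of "-", "_" and "." into "-".
    return re.sub(r"[-_.]+", "-", match.group(0)).lower()


def duplicated_requirements(files: dict):
    """Map each package name to the files listing it, if more than one."""
    seen = defaultdict(list)
    for filename, text in files.items():
        for line in text.splitlines():
            name = requirement_name(line)
            if name is not None:
                seen[name].append(filename)
    return {name: where for name, where in seen.items() if len(where) > 1}
```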
Nizamudeen A [Mon, 7 Feb 2022 10:53:29 +0000 (16:23 +0530)]
cephadm: change shared_folder directory for prometheus and grafana
After https://github.com/ceph/ceph/pull/44059 the monitoring/prometheus
and monitoring/grafana/dashboards directories were moved to
monitoring/ceph-mixin. That broke the shared_folders in the cephadm
bootstrap script.
Changed all instances of monitoring/prometheus and
monitoring/grafana/dashboards to monitoring/ceph-mixin.
Also renamed all instances of prometheus_alerts.yaml to
prometheus_alerts.yml.
Fixes: https://tracker.ceph.com/issues/54176
Signed-off-by: Nizamudeen A <nia@redhat.com>
Soumya Koduri [Thu, 13 Jan 2022 20:44:19 +0000 (02:14 +0530)]
rgw/dbstore: Use Object ID to handle racing writes
Create a unique ID for each object upload which will be
atomically updated in the head object at the end. This will
prevent data corruption during concurrent writes.
In case of multipart uploads, upload_id is used as the ObjectID.
XXX: The stale or obsolete tail data needs to be deleted
Also addressed invalid usage of CephContext in dbstore tests.
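The idea of writing data under a per-upload ID and publishing that ID in the head as the last, atomic step can be sketched as below. This is a toy in-memory model under stated assumptions, not the dbstore schema: the `Store` class, its dicts, and the use of a UUID and a lock are all hypothetical.

```python
import threading
import uuid


class Store:
    """Toy in-memory store; not the dbstore schema."""

    def __init__(self):
        self._lock = threading.Lock()
        self.heads = {}   # object name -> object ID of the current version
        self.tails = {}   # (object name, object ID) -> data

    def put(self, name: str, data: bytes) -> str:
        object_id = uuid.uuid4().hex  # unique per upload
        # Tail data is written under its own ID, so concurrent uploads
        # never overwrite each other's parts.
        self.tails[(name, object_id)] = data
        # Publishing the ID in the head is the single atomic step.
        with self._lock:
            self.heads[name] = object_id
        return object_id

    def get(self, name: str) -> bytes:
        with self._lock:
            object_id = self.heads[name]
        return self.tails[(name, object_id)]
```

Note that the tail of an overwritten upload stays behind in this model, which mirrors the XXX note about stale tail data still needing deletion.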
wanghao72 [Thu, 20 Jan 2022 07:57:29 +0000 (15:57 +0800)]
rgw: CopyObject works with x-amz-copy-source-if-* headers
The CopyObject API supports conditional headers, e.g. x-amz-copy-source-if-match, but radosgw left out the 'source' keyword.
Fixes: https://tracker.ceph.com/issues/53945
Signed-off-by: Wang Hao <wanghao72@baidu.com>
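For reference, the four conditional headers that S3 defines for CopyObject all carry the "copy-source" part in their names. The sketch below shows one way to collect them from a request; `copy_source_conditions` is a hypothetical helper for illustration and does not reflect how radosgw parses requests.

```python
COPY_SOURCE_CONDITIONS = (
    "x-amz-copy-source-if-match",
    "x-amz-copy-source-if-none-match",
    "x-amz-copy-source-if-modified-since",
    "x-amz-copy-source-if-unmodified-since",
)


def copy_source_conditions(headers: dict) -> dict:
    """Collect CopyObject conditions, keyed by the part after 'if-'."""
    lowered = {k.lower(): v for k, v in headers.items()}
    prefix = "x-amz-copy-source-if-"
    return {
        name[len(prefix):]: lowered[name]
        for name in COPY_SOURCE_CONDITIONS
        if name in lowered
    }
```

A header spelled without "source", such as x-amz-copy-if-match, is simply not recognized by this lookup.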
Ilya Dryomov [Mon, 31 Jan 2022 13:08:26 +0000 (14:08 +0100)]
qa/suites/krbd: add legacy+rxbounce and crc+rxbounce coverage
For basic, rbd and rbd-nomount subsuites, replace legacy and crc
facets with "legacy or legacy+rxbounce" and "crc or crc+rxbounce"
facets (chosen at random).
For fsx, singleton and thrash subsuites, add legacy+rxbounce and
crc+rxbounce facets and drop prefer-crc facet. The expected behaviour
of the latter depends on cluster configuration and should be tested
separately.
os/BlueStore: NCB fixes recovery code with shared blobs
Replaces the BitmapAllocator used by NCB Recovery code with a dedicated SimpleBitmap.
The SimpleBitmap allows for bits to be set multiple times without any adverse effect.
This is needed because shared blobs will report the same allocation multiple times.
Fixes: https://tracker.ceph.com/issues/53678
Signed-off-by: Gabriel Benhanokh <gbenhano@redhat.com>
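The property that matters here, that marking the same range twice is harmless, can be illustrated with a toy bitmap. This is a sketch only: `IdempotentBitmap` is a hypothetical class and says nothing about how SimpleBitmap is laid out or what interface it exposes.

```python
class IdempotentBitmap:
    """Toy bitmap where setting an already-set bit is harmless."""

    def __init__(self, nbits: int):
        self.nbits = nbits
        self.bits = bytearray((nbits + 7) // 8)

    def set_range(self, offset: int, length: int) -> None:
        if offset < 0 or length < 0 or offset + length > self.nbits:
            raise ValueError("range out of bounds")
        for i in range(offset, offset + length):
            # OR-ing a bit that is already 1 leaves it unchanged.
            self.bits[i // 8] |= 1 << (i % 8)

    def count_set(self) -> int:
        return sum(bin(b).count("1") for b in self.bits)
```

With this behaviour, a shared blob that reports the same allocation several times ends up counted once.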
Andriy Tkachuk [Wed, 2 Feb 2022 11:25:59 +0000 (11:25 +0000)]
rgw_sal_motr: fix possible memleak on put
Currently, the MotrAtomicWriter::cleanup() is called from
MotrAtomicWriter::commit(), which may not be called at all
by rgw in case of md5 checksum failure.
Solution: call cleanup() from process() when the data is empty (zero length).
rgw calls Writer::process(data, off) with zero data at the
end of the loop to allow writes to flush the data. From:
src/rgw/rgw_op.cc:RGWPutObj::execute():
Patrick Donnelly [Mon, 31 Jan 2022 22:58:55 +0000 (17:58 -0500)]
mds: add inline feature to MDS bootstrap incompat
File systems that had inline data enabled at some point would have this
bit in the CompatSet "incompat" set. This would conflict during upgrade
with the default v16.2.4 CompatSet assigned to existing (16.2.4-) MDS.
Subsequently, this would cause an assertion in FSMap::sanity during
pending map creation.
This bit will get added anyway during the upgrade process so might as
well add it to the MDS CompatSet during bootstrap.
Fixes: https://tracker.ceph.com/issues/54081
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Build jsonnet and jb in the test so that we can build ceph without
internet access and still be able to run the tests needed for monitoring
using jsonnet tools.
Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch>
spec: debian: monitoring: build jsonnet from source to use 0.18.0
As this new version was released only recently, it is not yet in every distro
we use. We now build jsonnet from source so that we can use this new
version of jsonnet. This commit could be reverted later on, once the new
version is available everywhere.
Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch>
mgr/dashboard: monitoring: refactor into ceph-mixin
Mixin is a way to bundle dashboards, prometheus rules and alerts into
a jsonnet package. Shifting to mixin will allow easier integration with
monitoring automation that some users may use.
This commit moves `/monitoring/grafana/dashboards` and
`/monitoring/prometheus` to `/monitoring/ceph-mixin`. Prometheus alerts
were also converted to Jsonnet in an automated way (from yaml to json
to jsonnet). This commit minimises any change made to the generated files
and should change neither the dashboards nor the Prometheus alerts.
In the future some configuration will also be added to jsonnet to add
more functionalities to the dashboards or alerts (i.e.: multi cluster).
Fixes: https://tracker.ceph.com/issues/53374
Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch>
Adam Kupczyk [Wed, 2 Feb 2022 19:28:14 +0000 (20:28 +0100)]
os/bluestore/bluefs: Make volume selector operations atomic
Make all RocksDBBlueFSVolumeSelector files/extents/size tracking atomic.
The tracking used to be synchronized by the BlueFS global lock.
Now, in the fine-grained locking era, atomic updates are necessary to prevent corruption.
Fixes: https://tracker.ceph.com/issues/53906
Signed-off-by: Adam Kupczyk <akupczyk@redhat.com>
Adam Kupczyk [Thu, 20 Jan 2022 12:44:35 +0000 (13:44 +0100)]
os/bluestore/bluefs: Code for volume selector check
Adds ability to verify that volume selector properly tracks disk usage.
Creates options:
- bluefs_check_volume_selector_on_umount
- bluefs_check_volume_selector_often
that can be used to validate that vselector does not diverge from
values it should have.
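Conceptually, such a check recomputes usage from the authoritative file list and compares it with what the selector has tracked. The sketch below shows that comparison under stated assumptions: the function names and the (device_id, length) extent representation are hypothetical and do not describe BlueFS data structures.

```python
from collections import Counter


def recompute_usage(files):
    """Sum extent lengths per device from the authoritative file list.

    `files` is a hypothetical iterable of extent lists, each extent being
    a (device_id, length) pair.
    """
    usage = Counter()
    for extents in files:
        for device_id, length in extents:
            usage[device_id] += length
    return usage


def selector_diverged(tracked, files):
    """Return devices whose tracked usage differs from the recomputed one."""
    actual = recompute_usage(files)
    devices = set(tracked) | set(actual)
    return {
        d: (tracked.get(d, 0), actual.get(d, 0))
        for d in devices
        if tracked.get(d, 0) != actual.get(d, 0)
    }
```

An empty result means the tracked values match; any entry gives the tracked and recomputed sizes for the device that drifted.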