When an OSD has no PGs, osd_sum.num_osds does not equal osd_sum.num_per_pool_osds and per_pool_stats is false; the ceph df command then returns STORED == USED, which is not what we expect.
test/{librbd, rgw}: retry when bind fail with port 0
there is a chance that the bind() call fails if another test happens
to pick the same free port that the operating system picked. in this
case, we just retry up to 42 times.
in theory, this change does not fully eliminate the race, but it should
help to alleviate the issue.
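A minimal sketch of the bounded retry described above. The helper name
bind_with_retry() and the callable parameter are assumptions made for
illustration; the actual tests may structure the loop differently.

```cpp
#include <functional>

// Hypothetical helper, not the code from the commit: call try_bind()
// until it succeeds or max_attempts is exhausted.
// Returns the number of attempts used, or -1 if every attempt failed.
int bind_with_retry(const std::function<bool()>& try_bind,
                    int max_attempts = 42) {
  for (int attempt = 1; attempt <= max_attempts; ++attempt) {
    if (try_bind()) {
      return attempt;
    }
  }
  return -1;
}
```

Here try_bind would wrap whatever picks a port and calls bind(); the race
remains possible on every attempt, so the loop only makes a failure less
likely.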
Ronen Friedman [Tue, 23 Aug 2022 05:12:18 +0000 (05:12 +0000)]
tests/osd: creating a Teuthology test re missing SnapMapper entries
The test (in the standalone/scrub suite) verifies that the scrubber
detects (and issues a cluster-log error) whenever a mapping entry
("SNA_") is missing in the SnapMapper DB.
Specifically, here the entry is corrupted - shortened as per
https://tracker.ceph.com/issues/56147.
Ronen Friedman [Mon, 1 Aug 2022 10:14:58 +0000 (10:14 +0000)]
osd/scrub: verify SnapMapper consistency
Whenever the scrubber accesses the SnapMapper for the snaps of a specific
clone, the mapper will now verify that the snaps have the required
mapping DB entries (the 'SNA_' keys).
crimson/block: Rename Device::get_size() to get_available_size()
Because the super block and other tracking information are stored on
the disk, the entire disk size is not available, so rename the function
to reflect that it actually returns the available size of the device.
get_available_size() covers both the free and the used space available
on the device.
Aravind Ramesh [Tue, 30 Aug 2022 11:33:27 +0000 (17:03 +0530)]
crimson/block: fix the device size calculation.
In BlockSegmentManager, the super block is updated with the device size.
But a small amount of device capacity is reserved to store the
super block information and other tracking information.
The number of segments is calculated after discounting the super block
size and the tracking information size. This creates a mismatch
between the actual available size and the actual number of segments.
Update the available size after considering the reserved device
capacity, the number of segments and the segment size.
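The arithmetic can be sketched as follows. The struct and function names
are assumptions for illustration and do not mirror BlockSegmentManager's
real data structures; the sketch also assumes reserved_size <= raw_size
and segment_size > 0.

```cpp
#include <cstdint>

// Illustrative layout only, not Ceph's super block structure.
struct DeviceLayout {
  std::uint64_t raw_size;       // total bytes on the device
  std::uint64_t reserved_size;  // super block + other tracking information
  std::uint64_t segment_size;   // bytes per segment
};

// Whole segments that fit once the reserved area is discounted.
std::uint64_t num_segments(const DeviceLayout& d) {
  return (d.raw_size - d.reserved_size) / d.segment_size;
}

// Available size is consistent with the segment count by construction.
std::uint64_t available_size(const DeviceLayout& d) {
  return num_segments(d) * d.segment_size;
}
```

Deriving the available size from the segment count means the two values
cannot disagree, which is the mismatch the commit describes.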
Aravind Ramesh [Thu, 25 Aug 2022 03:44:52 +0000 (09:14 +0530)]
crimson/zns: crimson osd crashes when device size is huge
In reset_device(), if the total number of 512B sectors on the device
was more than INT_MAX, an overflow occurred and left nr_sectors
as 0, which caused the ioctl to fail and the OSD to crash
subsequently. Fix the overflow.
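The commit message does not show the offending expression, so the
following is only one way a sector count can collapse to 0: narrowing a
64-bit value to 32 bits. The function names are hypothetical.

```cpp
#include <cstdint>

constexpr std::uint64_t kSectorSize = 512;

// Narrowing the sector count to 32 bits silently drops the high bits.
std::uint32_t sectors_truncated(std::uint64_t device_bytes) {
  return static_cast<std::uint32_t>(device_bytes / kSectorSize);
}

// Keeping the count in 64 bits preserves it for large devices.
std::uint64_t sectors_wide(std::uint64_t device_bytes) {
  return device_bytes / kSectorSize;
}
```

For a 2 TiB device (2^41 bytes) the sector count is 2^32, which the
32-bit version truncates to 0 while the 64-bit version keeps intact.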
Aravind Ramesh [Thu, 18 Aug 2022 15:42:13 +0000 (21:12 +0530)]
crimson/zns: Add zone-capacity support.
ZNS SSDs have an attribute called zone_capacity which can be less than or
equal to zone_size. zone_capacity represents the actual writable media in
a zone. When zone_capacity is less than zone_size, writing to offsets
beyond zone_capacity will cause write errors.
Set the segment size equal to zone_capacity, so that the segment manager
writes only up to the capacity of the zone/segment.
Update the device size to the actual available bytes so that the GC can
kick in at appropriate thresholds.
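A small sketch of the relationship between zone_size and zone_capacity.
ZnsGeometry and its helper functions are assumed names for illustration,
not types from crimson.

```cpp
#include <cstdint>

// Hypothetical geometry description; requires zone_capacity <= zone_size.
struct ZnsGeometry {
  std::uint64_t nr_zones;
  std::uint64_t zone_size;      // address span of a zone
  std::uint64_t zone_capacity;  // writable bytes within a zone
};

// A segment covers only the writable part of a zone.
std::uint64_t segment_size(const ZnsGeometry& g) { return g.zone_capacity; }

// Bytes that can actually be written across the whole device.
std::uint64_t usable_bytes(const ZnsGeometry& g) {
  return g.nr_zones * g.zone_capacity;
}

// A write is valid only if it ends at or before zone_capacity.
bool write_fits(const ZnsGeometry& g, std::uint64_t offset, std::uint64_t len) {
  return offset <= g.zone_capacity && len <= g.zone_capacity - offset;
}
```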
Aravind Ramesh [Thu, 18 Aug 2022 09:06:48 +0000 (14:36 +0530)]
crimson/zns: ZNSSegmentManager::release() should reset the zone.
For a ZNS device, an open/full zone has to be reset before it can be
reused to write from the start. Seastore releases a segment/zone, marks
it empty and expects to be able to write to it from the start. So reset
the zone as part of release, so that it moves to the empty state on the
device.
crimson/zns: segment_close() should finish the zone.
Zones in IMP-OPEN, EXP-OPEN, CLOSED states in a ZNS device are
counted as active resources. ZNS SSDs can have a limit on the
number of zones that can be active at the same time (max_active_resources).
If CLOSED zones reach max_active_zones supported by the device, then
opening/writing to newer zones will fail.
So a close_segment() from Seastore is essentially a FINISH
operation on a ZNS zone.
Do FINISH operation on a zone instead of CLOSE from segment_close().
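A toy model of the active-resource accounting, to show why FINISH frees
a slot while CLOSE does not. ToyZnsDevice is an assumption made for
illustration; it collapses the two open states into one and is not a
device driver.

```cpp
#include <cstddef>
#include <vector>

enum class ZoneState { Empty, Open, Closed, Full };

// Toy model only: Open stands for both IMP-OPEN and EXP-OPEN.
struct ToyZnsDevice {
  std::vector<ZoneState> zones;
  std::size_t max_active;

  // Open and Closed zones both count as active resources.
  std::size_t active() const {
    std::size_t n = 0;
    for (ZoneState s : zones) {
      if (s == ZoneState::Open || s == ZoneState::Closed) ++n;
    }
    return n;
  }
  // Opening fails if the zone is not empty or the active limit is reached.
  bool open(std::size_t z) {
    if (zones[z] != ZoneState::Empty || active() >= max_active) return false;
    zones[z] = ZoneState::Open;
    return true;
  }
  void close(std::size_t z) {
    if (zones[z] == ZoneState::Open) zones[z] = ZoneState::Closed;
  }
  void finish(std::size_t z) { zones[z] = ZoneState::Full; }
  void reset(std::size_t z) { zones[z] = ZoneState::Empty; }
};
```

With max_active set to 2, closing two zones blocks a third open(),
whereas finishing them does not; reset() is what makes a full zone
writable from the start again.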
crimson/zns: advance write pointer before writing tail-info.
SegmentAllocator::close_segment() writes tail information to a
segment before closing the segment, and this is written at the
end of the segment. However, for ZNS SSDs, writes always have to happen
at the write pointer, so writing tail info at the end of a zone fails if
the WP is not at the offset requested by close_segment().
If the write pointer is not at the LBA where the tail information is
written, then advance the write pointer by writing zeroes to the zone
from its current write pointer. Then write the tail information at the
end of the zone.
Added an advance_wp() function which, for ZNS devices, advances the write
pointer before the tail information is written; for a regular device,
the tail information is still written directly at the end of the segment.
Call close_segment() after writing the tail information: closing a
segment first and then writing the tail information can cause race
conditions on a ZNS-backed segment.
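The zero-fill step can be sketched with an in-memory stand-in for a zone.
Zone, write_tail() and this version of advance_wp() are illustrative
assumptions; they do not reproduce the signatures used in crimson.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// In-memory stand-in for a zone: data may only be appended at the WP.
struct Zone {
  std::vector<char> data;  // everything written so far
  std::size_t capacity;

  std::size_t wp() const { return data.size(); }

  // Rejects any write that does not start at the write pointer.
  bool write(std::size_t offset, const std::string& buf) {
    if (offset != wp() || buf.size() > capacity - wp()) return false;
    data.insert(data.end(), buf.begin(), buf.end());
    return true;
  }
};

// Hypothetical analogue of advance_wp(): zero-fill from the WP to target.
bool advance_wp(Zone& z, std::size_t target) {
  if (target < z.wp() || target > z.capacity) return false;
  return z.write(z.wp(), std::string(target - z.wp(), '\0'));
}

// Write the tail at the end of the zone, advancing the WP first.
bool write_tail(Zone& z, const std::string& tail) {
  if (tail.size() > z.capacity) return false;
  const std::size_t tail_off = z.capacity - tail.size();
  return advance_wp(z, tail_off) && z.write(tail_off, tail);
}
```

In this model a direct write at the tail offset fails unless the WP is
already there, which is the failure the commit works around.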
Adam King [Mon, 22 Aug 2022 15:14:12 +0000 (11:14 -0400)]
mgr/cephadm: allow setting prometheus retention time
When we deploy Prometheus server, we don't provide any
ability to define the tsdb retention time - so it defaults to 15d.
This change adds a field that can be passed in a prometheus service
spec that will be passed as an arg to the --storage.tsdb.retention.time
parameter for the prometheus daemon.
Fixes: https://tracker.ceph.com/issues/54308
Signed-off-by: Adam King <adking@redhat.com>
rgw: avoid string_view to temporary in RGWBulkUploadOp
the `else` block below constructs a temporary std::string that destructs
at the end of the statement, leaving `filename` as a dangling view:
```
filename = file_prefix + std::string(header->get_filename());
```
store a copy of the `std::string` instead
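a sketch of the owning alternative, using a hypothetical make_filename()
helper rather than the actual RGWBulkUploadOp code:

```cpp
#include <string>
#include <string_view>

// Hypothetical stand-in for the code path in the commit message.
std::string make_filename(std::string_view file_prefix,
                          std::string_view header_name) {
  // Dangling version (do not do this):
  //   std::string_view filename =
  //       std::string(file_prefix) + std::string(header_name);
  // The temporary std::string is destroyed at the end of that statement.
  std::string filename(file_prefix);  // owning copy keeps the bytes alive
  filename += header_name;
  return filename;
}
```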
David Galloway [Wed, 31 Aug 2022 18:21:16 +0000 (14:21 -0400)]
.github: Give folks 30 seconds to fill out the checklist
Otherwise GitHub sends an annoying e-mail right away when you file a PR that doesn't have the checklist filled out. It's easier IMO to create the PR, then check the boxes instead of putting Xes in brackets while filling out the PR comment.
Signed-off-by: David Galloway <dgallowa@redhat.com>
Aravind Ramesh [Wed, 29 Jun 2022 10:57:40 +0000 (16:27 +0530)]
crimson/zns: ensure writes happen at write pointer.
For ZNS SSDs, every write to a segment/zone has to happen at the zone's
write pointer. Any write request to an offset that is not at the
write pointer will be failed by the drive.
In ZNSSegment::write(), error out if the write offset is not the same
as the WP.
RGW - Zipper - Remove a number of casts from rgw_admin
There are still a ton of casts to RadosStore in rgw_admin. Remove the
easy ones. Many of the rest represent actual operations that are
specific to RadosStore, and need to be split out.
Signed-off-by: Daniel Gryniewicz <dang@redhat.com>
Redouane Kachach [Wed, 31 Aug 2022 11:49:37 +0000 (13:49 +0200)]
mgr/cephadm: Fix how we check if a host belongs to public network
Fixes: https://tracker.ceph.com/issues/57060
Signed-off-by: Redouane Kachach <rkachach@redhat.com>
J. Eric Ivancich [Tue, 23 Aug 2022 20:44:24 +0000 (16:44 -0400)]
rgw: remove dout_subsys defs from header files
Each compilation unit should be able to define its own dout_subsys
without generating a redefinition warning. When dout_subsys is defined
in header files, it complicates this matter. This commit removes
the definitions from header files and makes sure definitions are added to
.cc files as needed.
Additionally, at Adam Emerson's suggestion, use "static constexpr"
rather than "#define" to set "dout_subsys" in a few places as a
reminder to ultimately do it more broadly.
Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
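A sketch of the "static constexpr" form mentioned above, as it might
appear in a .cc file. The enum is a stand-in with a made-up value, not
Ceph's real subsystem table, and current_subsys() exists only to make
the example checkable.

```cpp
// Stand-in for Ceph's subsystem ids; the value is made up for this sketch.
enum { ceph_subsys_rgw = 1 };

// Defined in a .cc file, not a header. Unlike
//   #define dout_subsys ceph_subsys_rgw
// this is typed, scoped to the translation unit, and cannot be silently
// redefined by another header.
static constexpr auto dout_subsys = ceph_subsys_rgw;

// Hypothetical accessor used only to exercise the constant.
int current_subsys() { return dout_subsys; }
```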