Laura Flores [Wed, 7 Sep 2022 19:45:57 +0000 (19:45 +0000)]
install-deps.sh: fix install-deps script for focal and bionic
When run on focal and bionic, install-deps ends early
due to an extra debug message that was added to the
end of `ensure_decent_gcc_on_ubuntu`. The debug message
prints when the script is run in a Jenkins environment.
When the script is not run in a Jenkins environment, the
value returned by that line is "false" or "0", which acts as
an early return. This stops the script from completing.
We can remove this line, as `ensure_decent_gcc_on_ubuntu`
is only called for focal and bionic, and most of the Jenkins
nodes are running jammy. Also, the debug message at the
beginning of the function should suffice.
Fixes: https://tracker.ceph.com/issues/57466
Signed-off-by: Laura Flores <lflores@redhat.com>
With auto-deletion of trashed snapshots, it is relatively easy to lose
a race to "rbd flatten" as follows:
- when V2_GET_PARENT runs, the image is technically still a clone
- when V2_REFRESH_PARENT runs, the image is fully flattened and the
snapshot in the parent image is deleted
This results in a spurious ENOENT error, mainly when trying to open the
image (e.g. for "rbd info"). This race condition has always been there
but auto-deletion of trashed snapshots makes it much worse.
Retry ENOENT in V2_REFRESH_PARENT the same way as in V2_GET_SNAPSHOTS.
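A minimal sketch of the retry policy, assuming a simplified synchronous model in which a refresh runs three steps in order and starts over when the parent refresh reports -ENOENT (the flatten won the race). `RefreshSim`, `run_refresh` and the retry bound of 5 are hypothetical; librbd's RefreshRequest is an asynchronous state machine, not this loop.

```cpp
#include <cerrno>
#include <functional>

// Hypothetical model of a refresh: each step returns 0 or a negative errno.
struct RefreshSim {
  std::function<int()> get_snapshots;   // stands in for V2_GET_SNAPSHOTS
  std::function<int()> get_parent;      // stands in for V2_GET_PARENT
  std::function<int()> refresh_parent;  // stands in for V2_REFRESH_PARENT
  int max_retries = 5;                  // hypothetical bound on restarts
};

// Runs the steps; if refresh_parent loses the race with "rbd flatten" and
// returns -ENOENT, start over so the now-flattened image is re-read.
int run_refresh(const RefreshSim& r, int* restarts) {
  *restarts = 0;
  for (;;) {
    int ret = r.get_snapshots();
    if (ret < 0) return ret;
    ret = r.get_parent();
    if (ret < 0) return ret;
    ret = r.refresh_parent();
    if (ret == -ENOENT && *restarts < r.max_retries) {
      ++*restarts;  // parent snapshot vanished: retry from the first step
      continue;
    }
    return ret;  // success, another error, or retries exhausted
  }
}
```

Bounding the restarts keeps a persistently missing parent from looping forever; the error is then returned as before.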
librbd: fix a bunch of issues with restarting RefreshRequest
Make RefreshRequest properly restartable, at least up to and including
the V2_REFRESH_PARENT step:
- clear m_migration_spec when skipping GET_MIGRATION_HEADER
- don't rely on potentially stale m_incomplete_update on retry
- reset m_legacy_parent when retrying more than just V2_GET_PARENT
- don't rely on potentially stale m_parent_md.overlap and
m_head_parent_overlap on retry
- clear m_metadata before fetching image metadata (but not before
fetching pool metadata)
- clear m_op_features when skipping V2_GET_OP_FEATURES
- clear m_group_spec on EOPNOTSUPP error in V2_GET_GROUP
- reset m_legacy_snapshot when retrying more than just V2_GET_SNAPSHOTS
- don't rely on potentially stale m_snap_parents on retry
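The common thread in these bullets is that any member a step fills in must be reset when that step is skipped or re-run, otherwise a value from an earlier attempt survives the restart. A hypothetical sketch of that pattern (`RefreshState` and the `step_*` functions are illustrative, not librbd code):

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical subset of the state a restartable refresh accumulates.
struct RefreshState {
  std::optional<std::string> migration_spec;    // cf. m_migration_spec
  std::map<std::string, std::string> metadata;  // cf. m_metadata
};

// Either fetches the migration header or skips it; when skipping, the field
// is cleared so a spec from a previous attempt cannot leak into this one.
void step_migration_header(RefreshState& s, bool migrating) {
  if (!migrating) {
    s.migration_spec.reset();  // skipped step: drop the stale value
    return;
  }
  s.migration_spec = "migration-spec";  // placeholder for the decoded header
}

// Image metadata is rebuilt from a listing, so the map is cleared first;
// otherwise keys removed since the previous attempt would linger.
void step_image_metadata(RefreshState& s,
                         const std::map<std::string, std::string>& listed) {
  s.metadata.clear();
  for (const auto& [key, value] : listed) {
    s.metadata[key] = value;
  }
}
```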
test/{librbd, rgw}: retry when bind fails with port 0
there is a chance that the bind() call fails if another test
happens to pick the same free port that the operating system picked
for us. in this case, we just retry up to 42 times.
in theory, this change does not fully address the race, but it should
help to alleviate the issue.
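A minimal sketch of the retry loop, assuming each attempt asks the OS for a fresh port and that losing the race shows up as EADDRINUSE. `bind_with_retry` and the `try_bind` callback are hypothetical; the callback would wrap the real socket()/bind() calls and return 0 or -errno.

```cpp
#include <cerrno>
#include <functional>

// Retries only the "someone else grabbed the port" case; any other error is
// returned immediately because retrying cannot fix it.
int bind_with_retry(const std::function<int()>& try_bind, int max_tries = 42) {
  int ret = -EADDRINUSE;
  for (int i = 0; i < max_tries; ++i) {
    ret = try_bind();
    if (ret != -EADDRINUSE) {
      return ret;  // success, or an unrelated failure
    }
  }
  return ret;  // still losing the race after max_tries attempts
}
```

As the message says, this narrows the window rather than closing it: a port can still be taken between picking it and binding it.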
Ronen Friedman [Tue, 23 Aug 2022 05:12:18 +0000 (05:12 +0000)]
tests/osd: creating a Teuthology test re missing SnapMapper entries
The test (in the standalone/scrub suite) verifies that the scrubber
detects (and issues a cluster-log error) whenever a mapping entry
("SNA_") is missing in the SnapMapper DB.
Specifically, here the entry is corrupted (shortened), as described in
https://tracker.ceph.com/issues/56147.
Ronen Friedman [Mon, 1 Aug 2022 10:14:58 +0000 (10:14 +0000)]
osd/scrub: verify SnapMapper consistency
Whenever the scrubber accesses the SnapMapper for the snaps of a specific
clone, the mapper will now verify that the snaps have the required
mapping DB entries (the 'SNA_' keys).
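A minimal sketch of the consistency check, assuming the mapper DB is modeled as a set of key strings and that every (snap, object) pair should have one `SNA_`-prefixed key. The key layout `SNA_<snap>_<object>` and both helpers are hypothetical simplifications, not SnapMapper's real encoding.

```cpp
#include <cstdint>
#include <set>
#include <string>
#include <vector>

// Hypothetical key builder; SnapMapper's real keys encode more fields.
std::string sna_key(std::uint64_t snap, const std::string& object) {
  return "SNA_" + std::to_string(snap) + "_" + object;
}

// Returns the snaps of `object` whose mapping entry is absent from `db`.
// An empty result means every snap of the clone is mapped.
std::vector<std::uint64_t> missing_mappings(
    const std::set<std::string>& db, const std::string& object,
    const std::vector<std::uint64_t>& snaps) {
  std::vector<std::uint64_t> missing;
  for (std::uint64_t snap : snaps) {
    if (db.count(sna_key(snap, object)) == 0) {
      missing.push_back(snap);  // the scrubber would log a cluster error here
    }
  }
  return missing;
}
```

A shortened (corrupted) key no longer matches the expected one, so it is reported exactly like a missing key.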
crimson/block: Rename Device::get_size() to get_available_size()
Because the super block and other tracking information are stored on the
disk, the entire disk size is not available, so rename the function to
reflect that it actually returns the available size on the device.
get_available_size() covers both the free and the used space available
on the device.
Aravind Ramesh [Tue, 30 Aug 2022 11:33:27 +0000 (17:03 +0530)]
crimson/block: fix the device size calculation.
In BlockSegmentManager, the super block is updated with the device size.
However, a small amount of device capacity is reserved to store the
super block and other tracking information, and the number of segments
is calculated after subtracting the size of that reserved area. This
creates a mismatch between the recorded available size and the actual
number of segments.
Update the available size to account for the reserved device capacity,
the number of segments, and the segment size.
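The arithmetic can be sketched as follows, assuming the reserved area is a single byte count subtracted from the raw size and that only whole segments are usable. `Layout` and `compute_layout` are hypothetical names, not BlockSegmentManager's.

```cpp
#include <cstdint>

struct Layout {
  std::uint64_t num_segments;
  std::uint64_t available_size;  // what get_available_size() should report
};

// Subtract the reserved metadata area, then count only whole segments.
// Reporting raw_size instead would overstate the usable capacity.
// Precondition: segment_size > 0.
Layout compute_layout(std::uint64_t raw_size, std::uint64_t reserved,
                      std::uint64_t segment_size) {
  std::uint64_t usable = raw_size > reserved ? raw_size - reserved : 0;
  std::uint64_t segments = usable / segment_size;
  return {segments, segments * segment_size};
}
```

For example, a 1 GiB device with 1 MiB reserved and 64 MiB segments yields 15 segments and 960 MiB available, not 1 GiB.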
Aravind Ramesh [Thu, 25 Aug 2022 03:44:52 +0000 (09:14 +0530)]
crimson/zns: crimson osd crashes when device size is huge
In reset_device(), if the total number of 512B sectors on the device
was more than INT_MAX, the computation overflowed and nr_sectors became
0, which caused the ioctl to fail and the OSD to crash subsequently.
Fix the overflow.
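The original code's types are not shown here, so this sketch uses an unsigned 32-bit count to make the wraparound well defined (converting an out-of-range value to a signed `int` is implementation-defined before C++20). Both function names are hypothetical.

```cpp
#include <cstdint>

// Buggy pattern: a 32-bit count wraps modulo 2^32, so a device with exactly
// 2^32 sectors (2 TiB of 512 B sectors) ends up with a count of 0.
std::uint32_t nr_sectors_truncated(std::uint64_t device_bytes) {
  return static_cast<std::uint32_t>(device_bytes / 512);
}

// Fixed pattern: keep the sector count in 64 bits end to end.
std::uint64_t nr_sectors(std::uint64_t device_bytes) {
  return device_bytes / 512;
}
```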
Aravind Ramesh [Thu, 18 Aug 2022 15:42:13 +0000 (21:12 +0530)]
crimson/zns: Add zone-capacity support.
ZNS SSDs have an attribute called zone_capacity which can be less than or
equal to zone_size. zone_capacity represents the actual writable media in
a zone. When zone_capacity is less than zone_size, writing to offsets
beyond zone_capacity will cause write errors.
Set the segment size equal to zone_capacity, so that the segment manager
writes only up to the capacity of the zone/segment.
Update the device size to the actual available bytes so that GC can kick
in at the appropriate thresholds.
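A sketch of the sizing rule, assuming a uniform zone geometry. `ZoneGeometry` and the helpers are hypothetical, and the numbers in the usage check (256 MiB zones with 200 MiB capacity) are illustrative, not taken from a real drive.

```cpp
#include <cstdint>

struct ZoneGeometry {
  std::uint64_t zone_size;      // LBA span of each zone, in bytes
  std::uint64_t zone_capacity;  // writable bytes per zone, <= zone_size
  std::uint64_t nr_zones;
};

// Segment size follows the writable capacity, not the zone span.
std::uint64_t segment_size(const ZoneGeometry& g) { return g.zone_capacity; }

// Bytes actually available for data; using this for GC thresholds avoids
// counting the unwritable tail of every zone as free space.
std::uint64_t available_bytes(const ZoneGeometry& g) {
  return g.zone_capacity * g.nr_zones;
}

// True if a write of `len` bytes at `offset` within a zone stays inside
// the writable capacity (writes past it would fail on the device).
bool within_capacity(const ZoneGeometry& g, std::uint64_t offset,
                     std::uint64_t len) {
  return offset <= g.zone_capacity && len <= g.zone_capacity - offset;
}
```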
Aravind Ramesh [Thu, 18 Aug 2022 09:06:48 +0000 (14:36 +0530)]
crimson/zns: ZNSSegmentManager::release() should reset the zone.
For a ZNS device, an open/full zone has to be reset before it can be
written again from the start. Seastore releases a segment/zone, marks
it empty, and expects to be able to write to it from the start. So, as
part of release, reset the zone so that it moves to the empty state on
the device.
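A bookkeeping-only sketch of why release must reset the zone. The enum and `Zone` struct are hypothetical and issue no device commands; on Linux a real reset could be sent with, for example, the BLKRESETZONE ioctl.

```cpp
#include <cstdint>

// Zone states as named in the ZNS model (hypothetical enum, not a Ceph type).
enum class ZoneState { Empty, ImplicitOpen, ExplicitOpen, Closed, Full };

struct Zone {
  std::uint64_t start;  // first byte of the zone
  std::uint64_t wp;     // current write pointer
  ZoneState state;
};

// Marking the segment free in Seastore is not enough: only a zone reset
// rewinds the write pointer to the zone start and returns the zone to Empty.
void release(Zone& z) {
  // A real implementation would issue the reset to the device here.
  z.wp = z.start;
  z.state = ZoneState::Empty;
}

// After release, writing from the zone start is legal again.
bool can_write_from_start(const Zone& z) {
  return z.state == ZoneState::Empty && z.wp == z.start;
}
```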
crimson/zns: segment_close() should finish the zone.
Zones in the IMP-OPEN, EXP-OPEN and CLOSED states in a ZNS device are
counted as active resources. ZNS SSDs can limit the number of zones
that can be active at the same time (max_active_resources). If the
CLOSED zones reach the max_active_zones limit supported by the device,
opening or writing to new zones will fail.
Seastore does not write to a segment after closing it, so a
close_segment() from Seastore is essentially a FINISH operation on a
ZNS zone. Unlike CLOSE, FINISH moves the zone to the FULL state, which
does not count as an active resource.
In segment_close(), perform a FINISH operation on the zone instead of a
CLOSE.
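A sketch of the active-resource accounting described above, modeling only zone states. The enum and helpers are hypothetical and send no commands to a device.

```cpp
#include <vector>

// Zone states as named in the ZNS model (hypothetical enum, not a Ceph type).
enum class ZoneState { Empty, ImplicitOpen, ExplicitOpen, Closed, Full };

// Open and closed zones hold active resources; empty and full zones do not.
bool is_active(ZoneState s) {
  return s == ZoneState::ImplicitOpen || s == ZoneState::ExplicitOpen ||
         s == ZoneState::Closed;
}

int count_active(const std::vector<ZoneState>& zones) {
  int n = 0;
  for (ZoneState s : zones) n += is_active(s) ? 1 : 0;
  return n;
}

// CLOSE keeps the zone active; FINISH moves it to Full and frees the slot.
ZoneState close_zone(ZoneState) { return ZoneState::Closed; }
ZoneState finish_zone(ZoneState) { return ZoneState::Full; }
```

Closing every written segment would therefore accumulate active zones until the device limit is hit, while finishing them keeps the count at zero.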
crimson/zns: advance write pointer before writing tail-info.
SegmentAllocator::close_segment() writes tail information to a
segment before closing the segment, and this is written at the
end of the segment. However, on ZNS SSDs, writes must always happen
at the write pointer, so writing the tail info at the end of a zone
fails if the WP is not at the offset requested by close_segment().
If the write pointer is not at the LBA where the tail information is
written, advance the write pointer by writing zeroes to the zone from
its current write pointer, then write the tail information at the end
of the zone.
Add an advance_wp() function which, for ZNS devices, advances the write
pointer before the tail information is written; for a regular device,
the tail information is still written at the end of the segment.
Call close_segment() after writing the tail information: closing a
segment first and then writing the tail information can cause race
conditions on a ZNS-backed segment.
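An in-memory sketch of the zero-fill idea, assuming a zone modeled as an append-only byte buffer whose size is the write pointer. `ZoneBuf`, `advance_wp` and `write_tail` here are hypothetical stand-ins, not Seastore's functions.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical in-memory zone: bytes may only be appended at the write pointer.
struct ZoneBuf {
  std::vector<std::uint8_t> data;  // data.size() is the write pointer
  void write(std::uint64_t offset, const std::vector<std::uint8_t>& buf) {
    if (offset != data.size()) throw std::runtime_error("write not at WP");
    data.insert(data.end(), buf.begin(), buf.end());
  }
};

// Pads with zeroes from the current write pointer up to `target`, so the
// next write lands exactly at the write pointer.
void advance_wp(ZoneBuf& z, std::uint64_t target) {
  if (target < z.data.size()) throw std::runtime_error("target behind WP");
  z.write(z.data.size(), std::vector<std::uint8_t>(target - z.data.size(), 0));
}

// Writes tail info at the end of the segment; the caller closes the segment
// only afterwards. Precondition: tail.size() <= segment_size.
void write_tail(ZoneBuf& z, std::uint64_t segment_size,
                const std::vector<std::uint8_t>& tail) {
  std::uint64_t tail_off = segment_size - tail.size();
  advance_wp(z, tail_off);
  z.write(tail_off, tail);
}
```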
Adam King [Mon, 22 Aug 2022 15:14:12 +0000 (11:14 -0400)]
mgr/cephadm: allow setting prometheus retention time
When we deploy the Prometheus server, we don't provide any
way to define the TSDB retention time, so it defaults to 15d.
This change adds a field to the prometheus service spec whose
value is passed as the argument of the --storage.tsdb.retention.time
parameter of the prometheus daemon.
Fixes: https://tracker.ceph.com/issues/54308
Signed-off-by: Adam King <adking@redhat.com>
config options with `type: size` are not actually `size_t` but
`uint64_t`, so accessing them with `get_val<size_t>()` leads to
a `bad_variant_access` exception. use the `Option::size_t` type
instead
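A simplified model of the failure, using std::variant as the `bad_variant_access` in the message suggests. It assumes, as the advice to use `Option::size_t` implies, that `type: size` values are stored under their own wrapper alternative; `SizeT`, `ValueT` and this `get_val` are hypothetical stand-ins, not Ceph's definitions.

```cpp
#include <cstdint>
#include <string>
#include <variant>

// Hypothetical wrapper standing in for Option::size_t.
struct SizeT {
  std::uint64_t value;
};

// Hypothetical, reduced set of alternatives for a config value.
using ValueT =
    std::variant<std::string, bool, std::int64_t, std::uint64_t, double, SizeT>;

// Typed getter: std::get throws std::bad_variant_access when the requested
// alternative is not the one currently held.
template <typename T>
T get_val(const ValueT& v) {
  return std::get<T>(v);
}

// True if asking for a plain integer type fails for this value.
bool plain_integer_access_throws(const ValueT& v) {
  try {
    (void)get_val<std::uint64_t>(v);
    return false;
  } catch (const std::bad_variant_access&) {
    return true;
  }
}
```

Asking for the wrapper type (`get_val<SizeT>`) succeeds, which is the point of switching callers to `Option::size_t`.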
rgw: avoid string_view to temporary in RGWBulkUploadOp
the `else` block below constructs a temporary std::string that destructs
at the end of the statement, leaving `filename` as a dangling view:
```
filename = file_prefix + std::string(header->get_filename());
```
store a copy of the `std::string` instead
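A minimal illustration of the fix, assuming a helper that receives the prefix and the header's filename; `build_filename` is hypothetical and not RGWBulkUploadOp's code. The commented-out line shows the dangling pattern for contrast.

```cpp
#include <string>
#include <string_view>

// Returning an owning std::string keeps the concatenated bytes alive for as
// long as the caller holds the result.
std::string build_filename(const std::string& file_prefix,
                           std::string_view header_filename) {
  // Dangling (do not do this): the std::string produced by operator+ is a
  // temporary destroyed at the end of the full expression.
  //   std::string_view bad = file_prefix + std::string(header_filename);
  std::string filename = file_prefix + std::string(header_filename);
  return filename;
}
```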
David Galloway [Wed, 31 Aug 2022 18:21:16 +0000 (14:21 -0400)]
.github: Give folks 30 seconds to fill out the checklist
Otherwise GitHub sends an annoying e-mail right away when you file a PR that doesn't have the checklist filled out. It's easier IMO to create the PR and then check the boxes than to put Xes in brackets while filling out the PR comment.
Signed-off-by: David Galloway <dgallowa@redhat.com>
Aravind Ramesh [Wed, 29 Jun 2022 10:57:40 +0000 (16:27 +0530)]
crimson/zns: ensure writes happen at write pointer.
For ZNS SSDs, every write to a segment/zone has to happen at the zone's
write pointer. Any write request to an offset other than the write
pointer will be failed by the drive.
In ZNSSegment::write(), return an error if the write offset is not the
same as the WP.
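A sketch of the up-front check, assuming the segment tracks its write pointer as an absolute byte offset. `ZnsZone`, `zone_write` and the choice of -EINVAL are hypothetical; the message does not say which error ZNSSegment::write() returns.

```cpp
#include <cerrno>
#include <cstdint>

// Hypothetical zone bookkeeping for the write path.
struct ZnsZone {
  std::uint64_t wp;  // absolute write pointer, in bytes
};

// Rejects writes anywhere but the write pointer before they reach the drive;
// on success the write pointer advances by len.
int zone_write(ZnsZone& z, std::uint64_t offset, std::uint64_t len) {
  if (offset != z.wp) {
    return -EINVAL;  // assumed error code; the drive would fail it anyway
  }
  // The real I/O submission would go here.
  z.wp += len;
  return 0;
}
```

Failing early gives a clear local error instead of a device-level write failure.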