Josh Durgin [Fri, 7 Jan 2022 18:37:13 +0000 (13:37 -0500)]
mon/OSDMonitor: avoid null dereference if stats are not available
It is not yet confirmed whether this was the issue in the bug referenced
below; however, it is a necessary defensive check for the
'osd pool get-quota' command.
All other uses of get_pool_stats() already handle this case.
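Below is a minimal sketch of the defensive pattern described above, with hypothetical type and function names rather than the actual OSDMonitor code: look the pool's stats up and bail out cleanly when none are available instead of dereferencing an absent entry.
```
#include <cstdint>
#include <iostream>
#include <map>

// Hypothetical stand-ins for the types consulted by 'osd pool get-quota'.
struct PoolStats {
  uint64_t num_objects = 0;
  uint64_t num_bytes = 0;
};

// Returns nullptr when no stats have been reported for the pool yet.
const PoolStats* get_pool_stats(const std::map<int64_t, PoolStats>& all,
                                int64_t pool_id) {
  auto it = all.find(pool_id);
  return it == all.end() ? nullptr : &it->second;
}

void print_quota(const std::map<int64_t, PoolStats>& all, int64_t pool_id) {
  const PoolStats* stats = get_pool_stats(all, pool_id);
  if (!stats) {            // defensive check: stats may simply not exist yet
    std::cout << "no stats available for pool " << pool_id << "\n";
    return;
  }
  std::cout << "objects=" << stats->num_objects
            << " bytes=" << stats->num_bytes << "\n";
}

int main() {
  std::map<int64_t, PoolStats> stats = {{1, {42, 4096}}};
  print_quota(stats, 1);   // stats present
  print_quota(stats, 2);   // stats absent: handled, no null dereference
}
```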
Alfonso Martínez [Tue, 23 Nov 2021 14:17:54 +0000 (15:17 +0100)]
mgr/dashboard: upgrade Cypress to the latest stable version
- Remove unneeded dependency that was causing UI performance issues: zone.js
- Ignore 'ResizeObserver loop limit exceeded' error.
- run-frontend-e2e-tests.sh refactoring: create rgw dashboard user through
'ceph dashboard set-rgw-credentials' and use it on rgw buckets' tests.
Fixes: https://tracker.ceph.com/issues/53357
Signed-off-by: Alfonso Martínez <almartin@redhat.com>
(cherry picked from commit 3e4e29590aa1742fc3b44d21389325a13cca8199)
Conflicts:
  src/pybind/mgr/dashboard/frontend/cypress/integration/rgw/buckets.e2e-spec.ts
    Rejected the current changes
  src/pybind/mgr/dashboard/frontend/cypress/integration/rgw/buckets.po.ts
    Rejected the current changes
  src/pybind/mgr/dashboard/frontend/cypress/integration/ui/navigation.po.ts
    Deleted this file since it's not in Octopus
  src/pybind/mgr/dashboard/frontend/package-lock.json
    Generated a new file
  src/pybind/mgr/dashboard/frontend/package.json
    Kept zone.js and changed the Cypress version to 9.0.0
  src/pybind/mgr/dashboard/run-frontend-e2e-tests.sh
    Accepted the current change
This is redundant and makes nsenter emit messages like the following:
```
Failed to find sysfs mount point
dev/block/11:0/holders/: opendir failed: Not a directory
dev/block/252:0/holders/: opendir failed: Not a directory
dev/block/253:0/holders/: opendir failed: Not a directory
dev/block/252:1/holders/: opendir failed: Not a directory
dev/block/253:1/holders/: opendir failed: Not a directory
dev/block/252:2/holders/: opendir failed: Not a directory
dev/block/253:2/holders/: opendir failed: Not a directory
dev/block/252:3/holders/: opendir failed: Not a directory
dev/block/253:3/holders/: opendir failed: Not a directory
dev/block/252:16/holders/: opendir failed: Not a directory
dev/block/252:32/holders/: opendir failed: Not a directory
dev/block/252:48/holders/: opendir failed: Not a directory
dev/block/252:64/holders/: opendir failed: Not a directory
```
ceph-volume should run pv/vg/lv commands in the host namespace rather than
running them inside the container in order to avoid lvm metadata corruption.
Jeff Layton [Wed, 10 Nov 2021 18:10:50 +0000 (13:10 -0500)]
qa: account for split of the kclient "metrics" debugfs file
Recently, Luis posted a patch to turn the metrics debugfs file into a
directory with separate files for the different sections in the old
metrics file.
Account for this change in get_op_read_count().
Fixes: https://tracker.ceph.com/issues/53214
Signed-off-by: Jeff Layton <jlayton@redhat.com>
(cherry picked from commit e9f2bff8cd7df1c81ff8bbfa2530f470d9c6af2c)
Neha Ojha [Mon, 9 Aug 2021 14:35:01 +0000 (14:35 +0000)]
qa/suites/rados/perf/ceph.yaml: remove rgw
This is no longer required because we removed the cosbench workloads in fd350fd0150a2d4072f055658c20314a435a19ba. It also protects us against
failures like the following, or against any other changes that break the rgw task:
```
2021-08-06T20:13:25.812 INFO:teuthology.orchestra.run.smithi060.stderr:curl: (7) Failed to connect to smithi060.front.sepia.ceph.com port 80: Connection refused
2021-08-06T20:15:33.813 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
File "/home/teuthworker/src/git.ceph.com_git_teuthology_04c2febe7099917d97a71271f17abb5710030132/teuthology/contextutil.py", line 31, in nested
vars.append(enter())
File "/usr/lib/python3.6/contextlib.py", line 81, in __enter__
return next(self.gen)
File "/home/teuthworker/src/github.com_ceph_ceph-c_3c0f8c8164075af7aac4d1f2805d3f4580709461/qa/tasks/rgw.py", line 191, in start_rgw
wait_for_radosgw(url, remote)
File "/home/teuthworker/src/github.com_ceph_ceph-c_3c0f8c8164075af7aac4d1f2805d3f4580709461/qa/tasks/util/rgw.py", line 94, in wait_for_radosgw
assert exit_status == 0
AssertionError
```
Yaarit Hatuka [Wed, 25 Aug 2021 02:12:08 +0000 (02:12 +0000)]
rpm, debian: move smartmontools and nvme-cli to ceph-base
We wish to be able to scrape SMART and NVMe metrics from OSD and MON
nodes. For this we require / recommend smartmontools and nvme-cli
dependencies for both the ceph-osd and ceph-mon packages. However, the
sudoers file (which is required for invoking `smartctl` by user 'ceph')
was installed only in the ceph-osd package. Since different packages
cannot own the same file, and because we want to be able to scrape from
every daemon, we move the dependencies and the sudoers installation to
ceph-base. For generalization, we rename:
sudoers.d/ceph-osd-smartctl -> sudoers.d/ceph-smartctl
Igor Fedotov [Thu, 27 May 2021 12:49:05 +0000 (15:49 +0300)]
common/PriorityCache: low perf counters priorities for submodules.
Having too many perf counters with nickname priorities >= PRIO_INTERESTING spoils daemonperf output and results in a missing "osd" section there, presumably due to too many columns.
Fixes: https://tracker.ceph.com/issues/51002
Signed-off-by: Igor Fedotov <ifedotov@suse.com>
(cherry picked from commit 35238d41360a22e22fae7d8ceddf3a2a047e5464)
yanqiang-ux [Mon, 7 Jun 2021 07:54:44 +0000 (15:54 +0800)]
osd: set r only if succeed in FillInVerifyExtent
When a read fails, ret can be taken as the data length in FillInVerifyExtent, which should be avoided.
It may cause errors in CRC repair or in a retried read because of the bogus data length. In my case, we use FillInVerifyExtent for EC reads;
when we hit -EIO we try CRC repair, which needs to read data from the other shards according to the data length.
I hit an assert in ECBackend.cc (loc: line 2288, ceph_assert(range.first != range.second)), though the master branch does not seem to support EC CRC repair.
In short, reusing the readop may cause unpredictable errors.
Fixes: https://tracker.ceph.com/issues/51115
Signed-off-by: yanqiang-ux <yanqiang_ux@163.com>
(cherry picked from commit 127745161fbcdee06b2dfa8464270c3934bcd06a)
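A rough sketch of the pattern (hypothetical types, not the real ECBackend code): the length field is only populated when the read succeeded, so a negative errno such as -EIO can never be reused later as a data length by CRC repair or a retried read.
```
#include <cerrno>
#include <cstdint>
#include <iostream>

// Hypothetical stand-in for the extent bookkeeping filled in after a read.
struct VerifyExtent {
  int32_t* r = nullptr;   // where the read length is recorded
};

// before: *ve.r = ret;  unconditionally, so -EIO could be reused as a length
// after:  only record ret as a length when the read actually succeeded
void fill_in_verify_extent(VerifyExtent& ve, int ret) {
  if (ret >= 0) {
    *ve.r = ret;          // success: ret really is the number of bytes read
  }
  // on error the caller sees the errno separately and must not treat it
  // as a data length when retrying or attempting CRC repair
}

int main() {
  int32_t len = 0;
  VerifyExtent ve{&len};
  fill_in_verify_extent(ve, 4096);   // len becomes 4096
  fill_in_verify_extent(ve, -EIO);   // len stays 4096, not -5
  std::cout << "recorded length: " << len << "\n";
}
```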
J. Eric Ivancich [Wed, 28 Jul 2021 17:52:29 +0000 (13:52 -0400)]
rgw: user stats showing 0 value for "size_utilized" and "size_kb_utilized" fields
When accumulating user stats, the "utilized" fields are not looked
at. Updates RGWStorageStats::dump so it only outputs the "utilized"
data if they're updated.
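A hedged sketch of the idea with a hypothetical, simplified struct (not the actual RGWStorageStats interface): the "utilized" values are only emitted when they were really accumulated, so stale zeros are not reported.
```
#include <cstdint>
#include <iostream>

// Hypothetical, simplified stats with a flag saying whether the
// "utilized" values were actually accumulated.
struct StorageStats {
  uint64_t size = 0;
  uint64_t size_utilized = 0;
  bool dump_utilized = false;   // only true when the values were updated

  void dump(std::ostream& out) const {
    out << "{\"size\":" << size;
    if (dump_utilized) {        // skip the fields instead of printing 0
      out << ",\"size_utilized\":" << size_utilized
          << ",\"size_kb_utilized\":" << (size_utilized + 1023) / 1024;
    }
    out << "}\n";
  }
};

int main() {
  StorageStats accumulated{4096, 0, false};  // utilized never looked at
  accumulated.dump(std::cout);               // no misleading 0 fields
  StorageStats full{4096, 4096, true};
  full.dump(std::cout);
}
```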
Kajetan Janiak [Wed, 18 Nov 2020 10:42:07 +0000 (11:42 +0100)]
rgw: disable prefetch in rgw_file
Each call to rgw_read (rgw_file.cc) invokes three calls to RGWRados::get_obj_state with s->prefetch_data=true. This results in significant read amplification. If the length argument in the rgw_read call is smaller than rgw_max_chunk_size, the amplification is threefold.
Duncan Bellamy [Sat, 8 May 2021 10:52:35 +0000 (11:52 +0100)]
mds: PurgeQueue.cc fix for 32bit compilation
files_high_water is defined as uint64_t, but when compiling on 32-bit these max() calls
fail because gcc 10 does not consider both arguments to be uint64_t, even though they are.
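A sketch of the failure mode and the usual remedy (not the exact PurgeQueue code): on a 32-bit target the two operands of std::max deduce to different types, so template argument deduction fails; forcing the template argument makes the intent explicit.
```
#include <algorithm>
#include <cstdint>
#include <iostream>

int main() {
  uint64_t files_high_water = 0;   // 64-bit even on 32-bit targets
  size_t in_flight = 128;          // only 32-bit on a 32-bit target

  // On 32-bit, `std::max(files_high_water, in_flight)` would not compile:
  // the arguments deduce to different types (uint64_t vs. size_t, which is
  // typically unsigned int there). Forcing the template argument makes both
  // operands uint64_t explicitly.
  files_high_water = std::max<uint64_t>(files_high_water, in_flight);

  std::cout << files_high_water << "\n";
}
```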
Adam Kupczyk [Sat, 13 Nov 2021 10:28:18 +0000 (11:28 +0100)]
os/bluestore: Fix omap upgrade to per-pg scheme
This fixes a regression introduced by an earlier fix to the omap upgrade: https://github.com/ceph/ceph/pull/43687
The problem was that we always skipped the first omap entry.
This worked fine for objects that have an omap header key.
For objects without a header key we skipped the first actual omap key.
Fixes: https://tracker.ceph.com/issues/53260
Signed-off-by: Adam Kupczyk <akupczyk@redhat.com>
(cherry picked from commit 65a3f374aa1c57c5bb9401e57dab98a643b4360a)
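A simplified model of the iteration bug (hypothetical key layout, not the real BlueStore upgrade code): the first entry must only be skipped when it really is the omap header key, not unconditionally.
```
#include <iostream>
#include <string>
#include <vector>

// Hypothetical flat key layout: an optional header key ("-") followed by
// the actual omap keys.
bool is_header_key(const std::string& k) { return k == "-"; }

// The buggy version skipped the first entry unconditionally; this one only
// skips it when it is the header, so objects without a header key keep
// their first real omap key during the upgrade.
std::vector<std::string> keys_to_upgrade(const std::vector<std::string>& keys) {
  std::vector<std::string> out;
  for (size_t i = 0; i < keys.size(); ++i) {
    if (i == 0 && is_header_key(keys[i]))
      continue;                      // skip only a genuine header
    out.push_back(keys[i]);
  }
  return out;
}

int main() {
  for (const auto& k : keys_to_upgrade({"-", "a", "b"}))  // header present
    std::cout << k << " ";
  std::cout << "\n";
  for (const auto& k : keys_to_upgrade({"a", "b"}))       // no header
    std::cout << k << " ";
  std::cout << "\n";
}
```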
These functions should return `None` if 0 or more than 1 item is returned.
Their current names are confusing and can lead to thinking that we just
want the first item returned, even though more than 1 item may be
returned; let's rename them to `get_single_pv()`, `get_single_vg()` and
`get_single_lv()`.
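The intended semantics, shown as a small language-neutral sketch in C++ (the real helpers are Python code in ceph-volume): a value is returned only when exactly one item matches, otherwise nothing.
```
#include <iostream>
#include <optional>
#include <string>
#include <vector>

// Return the item only when exactly one matches; zero or multiple matches
// yield nothing, mirroring the get_single_pv()/get_single_vg()/get_single_lv()
// contract described above.
std::optional<std::string> get_single(const std::vector<std::string>& matches) {
  if (matches.size() != 1)
    return std::nullopt;
  return matches.front();
}

int main() {
  std::cout << get_single({"vg0"}).value_or("none") << "\n";        // "vg0"
  std::cout << get_single({}).value_or("none") << "\n";             // "none"
  std::cout << get_single({"vg0", "vg1"}).value_or("none") << "\n"; // "none"
}
```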
Marcus Watts [Fri, 17 Sep 2021 09:28:53 +0000 (05:28 -0400)]
Fix vault token file access.
Put the vault token file in a location that ceph can read.
Make it readable only by ceph.
On rhel8 (and indeed, any vanilla rhel machine), $HOME is liable to be
mode 700. This means the ceph user can't read things in that user's
directory. This causes radosgw to emit the confusing message "ERROR:
Vault token file ... not found" even though the teuthology log will
plainly show it was created and made readable by ceph.
Fixes: http://tracker.ceph.com/issues/51539
Signed-off-by: Marcus Watts <mwatts@redhat.com>
(cherry picked from commit 454cc8a18c4c3851de5976d3e36e42644dbb1a70)
Conflicts:
qa/tasks/rgw.py
Cherry-pick notes:
- Conflict due to ctx.rgw.vault_role not set in Octopus test
Mark Kogan [Mon, 15 Nov 2021 15:50:49 +0000 (15:50 +0000)]
rgw/beast: stream timer with duration 0 disables timeout
fixes all S3 operations failing with:
`2021-11-15T15:46:05.992+0000 7ffee17fa700 20 failed to read header: Bad file descriptor`
when `--rgw_frontends="beast port=8000 request_timeout_ms=0"`
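A minimal sketch of the intended behavior using a plain Boost.Asio timer (the frontend's actual timer wiring differs): a configured duration of zero means the timer is never armed at all, rather than being armed with a zero deadline that fires immediately and aborts the header read.
```
#include <boost/asio.hpp>
#include <chrono>
#include <iostream>

namespace asio = boost::asio;

int main() {
  asio::io_context ioc;
  asio::steady_timer timer(ioc);

  const std::chrono::milliseconds request_timeout{0};  // request_timeout_ms=0

  if (request_timeout.count() > 0) {
    // only arm the timeout when a positive duration was configured
    timer.expires_after(request_timeout);
    timer.async_wait([](const boost::system::error_code& ec) {
      if (!ec)
        std::cout << "request timed out\n";
    });
  } else {
    std::cout << "timeout disabled; connection may idle indefinitely\n";
  }

  ioc.run();
}
```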
Casey Bodley [Thu, 11 Nov 2021 17:01:06 +0000 (12:01 -0500)]
rgw/beast: reference count Connections for timeout_handler
resolves a use-after-free in the timeout_handler, where a timeout fires
and schedules the timeout_handler for execution, but the coroutine exits
and destroys the socket before asio executes the timeout_handler
timeout_handler now holds a reference on the Connection to extend its
lifetime
now that the Connection is allocated on the heap, we can include the
parse_buffer in this memory instead of allocating it separately
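A condensed sketch of the lifetime pattern (generic Asio code, not the frontend's real Connection class): the connection lives on the heap and the timeout handler captures a shared_ptr, so the socket it touches cannot be destroyed before the handler runs.
```
#include <array>
#include <boost/asio.hpp>
#include <chrono>
#include <memory>

namespace asio = boost::asio;

// Heap-allocated connection; pending handlers keep it alive via shared_ptr.
struct Connection : std::enable_shared_from_this<Connection> {
  asio::ip::tcp::socket socket;
  asio::steady_timer timeout;
  // the parse buffer can live in the same allocation now that the
  // connection itself is heap-allocated
  std::array<char, 8192> parse_buffer{};

  explicit Connection(asio::io_context& ioc) : socket(ioc), timeout(ioc) {}

  void arm_timeout(std::chrono::milliseconds d) {
    timeout.expires_after(d);
    // capturing shared_from_this() extends the Connection's lifetime until
    // the handler has run, even if the request coroutine already finished
    timeout.async_wait([self = shared_from_this()](const auto& ec) {
      if (!ec) {
        boost::system::error_code ignored;
        self->socket.close(ignored);  // safe: *self is still alive here
      }
    });
  }
};

int main() {
  asio::io_context ioc;
  auto conn = std::make_shared<Connection>(ioc);
  conn->arm_timeout(std::chrono::milliseconds(10));
  conn.reset();  // the pending timeout handler still holds a reference
  ioc.run();
}
```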
Casey Bodley [Sat, 30 Oct 2021 23:47:02 +0000 (19:47 -0400)]
rgw/beast: replace beast::tcp_stream with manual timeouts
remove the beast::tcp_stream wrapper from the socket, and track timeouts
manually with a timeout_timer. this timer uses ceph's coarse_mono_clock
which is cheaper to sample than std::chrono::steady_clock
Casey Bodley [Mon, 1 Nov 2021 17:14:16 +0000 (13:14 -0400)]
spawn: use explicit strand executor
the default spawn::yield_context uses the polymorphic boost::asio::executor
to support any executor type
rgw's beast frontend always uses the same executor type for these
coroutines, so we can use that type directly to avoid the overhead of
type erasure and virtual function calls
Cherry-pick notes:
- src/rgw/rgw_d3n_cacherequest.h doesn't exist in Octopus
- src/rgw/rgw_sync_checkpoint.cc doesn't exist in Octopus
- conflicts due to rename of structs after Octopus
- conflicts due to macro for conditional inclusion of beast context in Octopus
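A brief illustration of the executor point made in the spawn commit above (generic Asio, assuming a recent Boost where the erased type is spelled any_io_executor): binding handlers to the concrete strand type avoids the indirection that the polymorphic executor adds.
```
#include <boost/asio.hpp>

namespace asio = boost::asio;

int main() {
  asio::io_context ioc;

  // polymorphic executor: the concrete type is erased, adding a layer of
  // indirection per handler in exchange for flexibility
  asio::any_io_executor erased = asio::make_strand(ioc);

  // concrete strand type: no type erasure, since every coroutine in the
  // frontend uses the same executor type anyway
  using strand_type = asio::strand<asio::io_context::executor_type>;
  strand_type concrete = asio::make_strand(ioc);

  asio::post(concrete, [] { /* runs on the strand */ });
  asio::post(erased,   [] { /* same, via the erased executor */ });
  ioc.run();
}
```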
The current check only allows requesting an OSD id that exists but is
marked as 'destroyed'.
With this small fix, we can now use `--osd-id` with an id that doesn't
exist at all.
ceph-volume: fix bug with miscalculation of required db/wal slot size for VGs with multiple PVs
Previous logic for calculating db/wal slot sizes made the assumption that there would only be
a single PV backing each db/wal VG. This wasn't the case for OSDs deployed prior to v15.2.8,
since ceph-volume previously deployed multiple SSDs in the same VG. This fix removes the
assumption and does the correct calculation in either case.
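A toy calculation illustrating the fix, with made-up sizes (the real logic is Python code in ceph-volume): the per-OSD slot size has to be derived from the whole VG, i.e. the sum of all backing PVs, not from a single PV.
```
#include <cstdint>
#include <iostream>
#include <vector>

// Size of each db/wal slot given the VG's backing PVs and the number of
// slots (OSDs) to carve out of it.
uint64_t slot_size(const std::vector<uint64_t>& pv_sizes, unsigned slots) {
  uint64_t vg_size = 0;
  for (uint64_t pv : pv_sizes)
    vg_size += pv;                 // the VG spans *all* PVs, not just one
  return vg_size / slots;
}

int main() {
  const uint64_t GiB = 1ull << 30;
  // two 480 GiB SSDs in one db VG, 12 OSDs sharing it (pre-v15.2.8 layout)
  std::vector<uint64_t> pvs = {480 * GiB, 480 * GiB};

  std::cout << "wrong (single-PV assumption): "
            << (pvs[0] / 12) / GiB << " GiB per slot\n";
  std::cout << "right (sum of PVs):           "
            << slot_size(pvs, 12) / GiB << " GiB per slot\n";
}
```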
The marker was not working correctly as segments of the bucket index
were listed to shut down any incomplete multipart uploads. This fixes
the marker, so it's maintained properly across iterations.
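A generic sketch of the marker pattern being fixed (hypothetical listing API, not the actual RGW code): each iteration must resume from the last key returned by the previous one, otherwise the same segment of the index is listed again and again.
```
#include <iostream>
#include <string>
#include <vector>

// Hypothetical paged listing: returns up to max_entries keys greater than
// `marker` from a sorted index.
std::vector<std::string> list_after(const std::vector<std::string>& index,
                                    const std::string& marker,
                                    size_t max_entries) {
  std::vector<std::string> page;
  for (const auto& key : index) {
    if (key > marker && page.size() < max_entries)
      page.push_back(key);
  }
  return page;
}

int main() {
  std::vector<std::string> index = {"a", "b", "c", "d", "e"};
  std::string marker;                       // start from the beginning
  for (;;) {
    auto page = list_after(index, marker, 2);
    if (page.empty())
      break;
    for (const auto& key : page)
      std::cout << key << " ";
    marker = page.back();                   // carry the marker forward
  }
  std::cout << "\n";
}
```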
Igor Fedotov [Thu, 15 Jul 2021 12:10:14 +0000 (15:10 +0300)]
os/bluestore: fix improper offset calculation when repairing.
While repairing misreferenced blobs BlueStore could improperly calculate
an offset within the blob being fixed. This could happen when a single
physical extent had been replaced by multiple ones - the following
pextent (if any in the current blob) would then be handled with an improper offset within the blob. The offset calculation accounted only for the last of the new pextents rather than for each of them.
Fixes: https://tracker.ceph.com/issues/51682
Signed-off-by: Igor Fedotov <ifedotov@suse.com>
(cherry picked from commit ca4b6675fc3fd2f4cadad58044c97c5bb23d5938)
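A small numeric sketch of the accounting error (made-up extent sizes, not BlueStore's repair code): once one physical extent is replaced by several smaller ones, the offset of the next pextent within the blob must advance by the sum of all replacements, not only by the last one.
```
#include <cstdint>
#include <iostream>
#include <vector>

int main() {
  // A blob originally backed by two physical extents; repair replaces the
  // first pextent with three smaller ones of these lengths.
  std::vector<uint64_t> replacements = {0x4000, 0x4000, 0x8000};

  // buggy accounting: only the last replacement advances the offset
  uint64_t wrong_offset = replacements.back();

  // correct accounting: every replacement contributes to the offset of the
  // pextent that follows it within the blob
  uint64_t right_offset = 0;
  for (uint64_t len : replacements)
    right_offset += len;

  std::cout << std::hex
            << "offset of next pextent, buggy:   0x" << wrong_offset << "\n"
            << "offset of next pextent, correct: 0x" << right_offset << "\n";
}
```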
Igor Fedotov [Thu, 15 Jul 2021 11:16:39 +0000 (14:16 +0300)]
test/objectstore/bluestore_types: add map_bl test case
Along with the basic bluestore_blob_t::map_any functionality
verification this UT shows how invalid offset might appear in
https://tracker.ceph.com/issues/51682
Venky Shankar [Fri, 1 Oct 2021 08:55:40 +0000 (04:55 -0400)]
mds: skip journaling blocklisted clients when in `replay` state
When a standby MDS is transitioning to active, it passes through
`replay` state. When the MDS is in this state, there are no journal
segments available for recording journal updates. If the MDS receives
an OSDMap update in this state, journaling blocklisted clients causes
a crash since no journal segments are available. This is a bit hard
to reproduce as it requires correct timing of an OSDMap update along
with various other factors.
Note that, when the MDS reaches `reconnect` state, it will journal
the blocklisted clients anyway.
This partially fixes tracker: https://tracker.ceph.com/issues/51589
which mentions a similar crash but in `reconnect` state. However,
that crash was seen in nautilus.
A couple of minor changes include removing hardcoded function names
and carving out reusable parts into a separate function.
so AvlAllocator can switch from the first-fit mode to best-fit mode
without walking through the whole space map tree. in the
highly-fragmented system, iterating the whole tree could hurt the
performance of fast storage system a lot.
the idea comes from openzfs's metaslab allocator.
Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit 5a26875049d13130ffe5954428da0e1b9750359f)
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Conflicts:
src/common/options/global.yaml.in:
- Moved new option into src/common/options.cc
so AvlAllocator can switch from the first-fit mode to best-fit mode
without walking through the whole space map tree. in the
highly-fragmented system, iterating the whole tree could hurt the
performance of fast storage system a lot.
the idea comes from openzfs's metaslab allocator.
Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit 40f05b971f5a8064cf9819f80fc3bbf21d5206da)
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Conflicts:
src/common/options/global.yaml.in
- Moved new option into src/common/options.cc
Kefu Chai [Wed, 2 Jun 2021 08:31:18 +0000 (16:31 +0800)]
os/bluestore/AvlAllocator: use cbit for counting the order of alignment
no need to calculate the alignment first, cbits() would suffice. as it
counts the first set bit and the following 0's in a number. the result
is identical to the cbit(alignment of that number).
Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit 573cbb796e8ba2f433caa308925735101a8161a6)
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
before this change AvlAllocator::_block_picker() is used by both the
best-fit mode and first-fit mode. but since we cannot achieve the
locality by searching in the area pointed to by the cursor in best-fit mode,
we just pass a dummy cursor to AvlAllocator::_block_picker() when
searching in the best-fit mode.
but since the range_size_tree is already sorted by the size of ranges,
if _block_picker() fails to find one by the size, we should just give
up right away, and instead try again using a smaller size.
after this change, instead of sharing AvlAllocator::_block_picker()
across both the first-fit mode and the best-fit mode, this method
is specialize to two different variants: one for first-fit, and the
simplified one for best-fit.
Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit 4837166f9e7a659742d4184f021ad12260247888)
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Adam Kupczyk [Wed, 19 May 2021 10:49:37 +0000 (12:49 +0200)]
os/bluestore: Improve _block_picker function
Make _block_picker function scan (*cursor, end) + (begin, *cursor) instead of (*cursor, end) + (begin, end).
The portion (*cursor, end) of the second run could never yield any results.
Signed-off-by: Adam Kupczyk <akupczyk@redhat.com>
(cherry picked from commit c732060d3e3ef96c6da06c9dde3ed8c064a50965)
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
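A simplified model of the new scan order (generic code, not the AVL tree itself): the search starts at the cursor, runs to the end, then wraps around over the part before the cursor only, since rescanning (*cursor, end) a second time can never find anything new.
```
#include <cstdint>
#include <iostream>
#include <vector>

// Free ranges as (offset, length) pairs, ordered by offset.
struct Range { uint64_t offset, length; };

// First-fit pick starting at *cursor: scan [*cursor, end) then wrap around
// and scan [begin, *cursor) only, instead of [begin, end) again.
const Range* block_picker(const std::vector<Range>& ranges,
                          size_t* cursor, uint64_t want) {
  const size_t n = ranges.size();
  for (size_t step = 0; step < n; ++step) {
    size_t i = (*cursor + step) % n;      // wraps past the end exactly once
    if (ranges[i].length >= want) {
      *cursor = i;
      return &ranges[i];
    }
  }
  return nullptr;                         // nothing large enough anywhere
}

int main() {
  std::vector<Range> ranges = {{0, 4096}, {8192, 65536}, {262144, 16384}};
  size_t cursor = 2;                      // resume near the last allocation
  if (const Range* r = block_picker(ranges, &cursor, 32768))
    std::cout << "picked offset " << r->offset << "\n";  // wraps to index 1
}
```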
Sage Weil [Thu, 18 Mar 2021 16:45:48 +0000 (11:45 -0500)]
mon/MgrStatMonitor: ignore MMgrReport from non-active mgr
If it's not the active mgr, we should ignore it.
Since the mgr instance is best identified by the gid, add that to the
message. (We can't use the source_addrs for the message since that is
the MgrStandby monc addr, not the active mgr addrs in the MgrMap.)
This fixes a problem where a just-demoted mgr report gets processed and a
new mgr gets a ServiceMap with an epoch >= its pending map. (At least,
that is my theory!)
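A hedged sketch of the check (hypothetical message and map fields, not the actual MgrStatMonitor code): the gid carried in the report is compared against the active mgr's gid from the MgrMap and mismatches are dropped.
```
#include <cstdint>
#include <iostream>

// Hypothetical stand-ins: the report now carries the sender's gid, and the
// MgrMap knows which gid is currently active.
struct MgrReport { uint64_t gid; };
struct MgrMap    { uint64_t active_gid; };

bool should_process(const MgrReport& report, const MgrMap& map) {
  // a report from a just-demoted (or standby) mgr is simply ignored
  return report.gid == map.active_gid;
}

int main() {
  MgrMap map{4242};
  std::cout << should_process(MgrReport{4242}, map) << "\n";  // 1: active
  std::cout << should_process(MgrReport{4100}, map) << "\n";  // 0: demoted
}
```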