Ernesto Puerta [Fri, 25 Mar 2022 15:26:48 +0000 (16:26 +0100)]
mgr/dashboard: fix api test issue with pip
Fix the following pip dependency conflict seen in the API tests:
```
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
apache-libcloud 3.5.0 requires requests>=2.26.0, but you have requests 2.25.1 which is incompatible.
Successfully installed CherryPy-13.1.0 PyJWT-2.0.1 Routes-2.4.1 bcrypt-3.1.4 ceph-1.0.0 chardet-4.0.0 cheroot-8.6.0 idna-2.10 jaraco.functools-3.5.0 more-itertools-4.1.0 natsort-8.1.0 portend-3.1.0 pyopenssl-22.0.0 pytz-2022.1 repoze.lru-0.7 requests-2.25.1 tempora-5.0.1
```
Conflicts:
src/cephadm/cephadm
src/pybind/mgr/cephadm/module.py:
- Accept quincy changes and bring only updates in the Grafana,
Prometheus, Alertmanager and Node Exporter versions
os/BlueStore: NCB fix for SimpleBitmap boundary check
The boundary check in SimpleBitmap is off by one, causing an assert to trigger.
Also fix a bug when asking for the next clear_extent on an unaligned map when the last bits in the map are set.
Add unit tests.
Fixes: https://tracker.ceph.com/issues/55145
Signed-off-by: Gabriel BenHanokh <gbenhano@redhat.com>
(cherry picked from commit 7dfa20863090d5eb58c798b6903386dcce6a52f8)
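The two problems described above can be sketched in a few lines. This is a minimal illustration only, not the BlueStore SimpleBitmap code: the class name TinyBitmap and its methods are hypothetical, and the exact form of the original off-by-one is not stated in the commit message.

```python
class TinyBitmap:
    """Illustrative fixed-size bitmap; not the BlueStore SimpleBitmap code."""

    def __init__(self, num_bits):
        self.num_bits = num_bits
        self.bits = [False] * num_bits

    def set_range(self, offset, length):
        # The last bit touched is offset + length - 1, so a range with
        # offset + length == num_bits is still valid. Rejecting it (">="
        # instead of ">") is one way an off-by-one boundary check can arise.
        if offset < 0 or length < 0 or offset + length > self.num_bits:
            raise IndexError("range exceeds bitmap")
        for i in range(offset, offset + length):
            self.bits[i] = True

    def next_clear_extent(self, start):
        """Return (offset, length) of the first clear run at or after start,
        or None when every remaining bit is set."""
        i = start
        while i < self.num_bits and self.bits[i]:
            i += 1
        if i >= self.num_bits:
            return None  # the tail of the map is fully set
        j = i
        while j < self.num_bits and not self.bits[j]:
            j += 1
        return (i, j - i)
```

A map of 70 bits (not a multiple of 64) with its last 10 bits set exercises both cases: the range ending exactly at the last bit is accepted, and a search starting inside the set tail returns None instead of running past the end.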
The trailing '3' was missed in one instance, ceph-mgr-cephadm, leading to:
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:
The following packages have unmet dependencies:
ceph-mgr-cephadm : Depends: python3-cherrypy but it is not installable
osd/osd_types: Increasing decode version of scrub_duration in pg stats
All new fields added to pg stats after the quincy RC need their decode version bumped to avoid decoding errors during an upgrade from the quincy RC to the quincy stable version.
Dimitri Savineau [Mon, 28 Mar 2022 14:50:51 +0000 (10:50 -0400)]
cephadm: set quincy as stable release
Quincy isn't master anymore so we don't need the DEFAULT_IMAGE_IS_MASTER
variable set to true (which produces a warning message).
This also sets the LATEST_STABLE_RELEASE variable to quincy to match the
DEFAULT_IMAGE_RELEASE variable.
qa/standalone: Fix test_activate_osd() test in ceph-helpers.sh
Modify test_activate_osd() to get the type of scheduler in use and then
verify the value of osd_max_backfills. This is because mclock scheduler
overrides this option to 1000 upon OSD initialization.
The test used to pass because the OSD daemon was killed but not
marked down, so when it was brought back up the wait for OSD up check
passed quickly, while the OSD still didn't have the latest config values.
Now, when the OSD is killed, the osd_fast_shutdown sequence notifies the
mon (see PR: https://github.com/ceph/ceph/pull/44807) and the OSD is marked
down and dead. When it is brought back up, the wait for OSD up check takes
longer, which is sufficient for the config values to be updated. This
results in the correct values being read from the config 'Values' map.
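The scheduler-dependent check described above can be sketched as follows. This is an illustration in Python rather than the shell code in ceph-helpers.sh; the function name and the "mclock_scheduler" / "wpq" strings are assumptions, and 1000 is the value quoted in this commit message.

```python
MCLOCK_MAX_BACKFILLS = 1000  # value this commit message says mclock sets


def expected_max_backfills(scheduler_type, configured_value):
    """Pick the osd_max_backfills value a test should expect.

    scheduler_type is assumed to be the name of the op scheduler in use;
    configured_value is whatever the test itself set for the option.
    """
    if scheduler_type == "mclock_scheduler":
        # mclock overrides the option on OSD initialization.
        return MCLOCK_MAX_BACKFILLS
    return configured_value
```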
Satoru Takeuchi [Thu, 18 Nov 2021 20:48:18 +0000 (20:48 +0000)]
osd: make osd_fast_shutdown_notify_mon option true by default
The osd_fast_shutdown_notify_mon option is false by default, so users suffer
from error log floods, slow ops, and long I/O timeouts on voluntary OS
shutdown before they are aware that this option exists. Make this option
true by default.
doc: Improvements to mClock configuration reference documentation
- Improve the documentation around mclock client types.
- Describe mclock config profiles in greater detail.
- Add notes about manually benchmarking OSDs and tuning bluestore throttle
parameters.
- Include a couple of missing mclock configuration options.
Ronen Friedman [Fri, 25 Mar 2022 10:45:47 +0000 (10:45 +0000)]
osd/scrub: restart snap trimming only after scrubbing is done
Snap trimming that was postponed as the target PG was scrubbing
must be restarted at scrub completion.
PR #38111 moved trimming restart to just before the scrub fully
terminated. The current PR fixes that.
Trimming is also restarted in those cases where scrub was
queued but aborted immediately.
During the LibRadosWatchNotify.Watch2Delete test, rados_watch_check can return error -102 if a reconnect happened: in that case a broken pipe triggers the reconnect and -102 is returned.
Fix a problem in store_test::BluestoreBrokenNoSharedBlobRepairTest where the check for active null-fm was wrong and so reported bogus errors when null-fm was inactive.
The check needs to access the dynamic value and not the config setting (which can be overridden).
Signed-off-by: Gabriel BenHanokh <gbenhano@redhat.com>
(cherry picked from commit 2969539d20a8157d62ae27f842c43b801efdc0ee)
Bug-Fix from PR-44370 force setting need_to_destage_allocation_file to True on device expansion without checking if we work in null-fm mode.
Signed-off-by: Gabriel Benhanokh <gbenhano@redhat.com>
(cherry picked from commit f7ebef8a804b8ce193bcbee4284dc28102708f37)
os/bluestore: Disable NCB functionality on rotational drives
NCB code needs to recover allocation map after an OSD crash.
The recovery process on rotational drives is about 20x slower than on SSDs, making this solution unacceptable for that environment.
Casey Bodley [Fri, 4 Feb 2022 14:51:24 +0000 (09:51 -0500)]
ceph.spec.in: seastar drops _FORTIFY_SOURCE from CFLAGS also
the arrow submodule builds some C sources that trip up on _FORTIFY_SOURCE in debug builds:
[ 79%] Building C object src/arrow/CMakeFiles/arrow_objlib.dir/vendored/musl/strptime.c.o
In file included from /usr/include/time.h:25,
from /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-10531-gc73e1fda/rpm/el8/BUILD/ceph-17.0.0-10531-gc73e1fda/src/arrow/cpp/src/arrow/vendored/strptime.h:20,
from /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-10531-gc73e1fda/rpm/el8/BUILD/ceph-17.0.0-10531-gc73e1fda/src/arrow/cpp/src/arrow/vendored/musl/strptime.c:4:
/usr/include/features.h:381:4: error: #warning _FORTIFY_SOURCE requires compiling with optimization (-O) [-Werror=cpp]
381 | # warning _FORTIFY_SOURCE requires compiling with optimization (-O)
| ^~~~~~~
cc1: all warnings being treated as errors
make[5]: *** [src/arrow/CMakeFiles/arrow_objlib.dir/build.make:2543: src/arrow/CMakeFiles/arrow_objlib.dir/vendored/musl/strptime.c.o] Error 1
Casey Bodley [Fri, 28 Jan 2022 18:44:56 +0000 (13:44 -0500)]
cmake: add submodule for utf8proc at v2.2.0
adds utf8proc submodule, needed by the arrow submodule in centos. add a
WITH_SYSTEM_UTF8PROC option that controls whether or not utf8proc is
built from submodule
non-system utf8proc is built as a static library to avoid conflicts with
system-provided libraries
ceph.spec.in sets WITH_SYSTEM_UTF8PROC=OFF until it's available in
centos
Casey Bodley [Thu, 20 Jan 2022 15:22:27 +0000 (10:22 -0500)]
cmake: add submodule for Apache Arrow at v6.0.1
adds an arrow submodule. when WITH_RADOSGW_SELECT_PARQUET is enabled,
the submodule is built as an external project and rgw links against its
imported Arrow::Parquet target
Currently, for CEPH_OSD_OP_OMAPRMKEYRANGE ops, clean_omap gets set to true,
which results in incomplete recovery of objects and in
inconsistent PGs after a scrub.
Ilya Dryomov [Wed, 16 Mar 2022 19:05:56 +0000 (20:05 +0100)]
librados: check latest osdmap on ENOENT in pool_reverse_lookup()
Avoid spurious ENOENT errors from rados_pool_reverse_lookup() and
Rados::pool_reverse_lookup().
This makes lookup by id consistent with lookup by name: the latter
has been checking latest osdmap since commit 7e5669b11b14 ("rados: we
need to get the latest osdmap when pool does not exists").
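The idea of checking the latest map before reporting ENOENT can be sketched as below. This is a hedged illustration, not the librados code: FakeCluster, refresh_map and the module-level pool_reverse_lookup function are hypothetical stand-ins for a client holding a possibly stale pool map.

```python
import errno


class FakeCluster:
    """Hypothetical stand-in for a client with a cached pool map."""

    def __init__(self, cached_pools, latest_pools):
        self.cached_pools = dict(cached_pools)  # pool id -> name, maybe stale
        self.latest_pools = dict(latest_pools)  # what the monitors know now
        self.refreshes = 0

    def refresh_map(self):
        # Stand-in for waiting until the client holds the latest osdmap.
        self.cached_pools = dict(self.latest_pools)
        self.refreshes += 1


def pool_reverse_lookup(cluster, pool_id):
    """Return the pool name, or -ENOENT if the pool really does not exist."""
    name = cluster.cached_pools.get(pool_id)
    if name is None:
        # The cached map may be stale: refresh once and retry before failing.
        cluster.refresh_map()
        name = cluster.cached_pools.get(pool_id)
    return name if name is not None else -errno.ENOENT
```

With this shape, a pool created after the client last fetched its map is found on the retry, and only a pool missing from the latest map yields ENOENT.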
Teoman ONAY [Thu, 11 Nov 2021 15:05:49 +0000 (15:05 +0000)]
cephadm: remove containers pids-limit
The default pids-limit (docker 4096/podman 2048) prevents some
customizations from working (http threads on RGW) or limits the number
of LUNs per iSCSI target.
An iovec has an unsigned length (size_t), but before this patch the
total length was computed by adding each iovec's length to a signed
length variable (ssize_t). While the code checked whether the resulting
length was negative on overflow, the case where the length is positive
after overflow was not checked. This patch fixes the overflow check
by changing length to an unsigned size_t.
Additionally, this patch fixes the case where some iovecs have been
added to the bufferlist and the aio completion has been blocked, but
adding an additional iovec fails because of overflow. This leads to
the UserBufferDeleter trying to unblock the completion on destruction
of the bufferlist but asserting because the completion was never
armed. We avoid this by first computing the total length and checking
for overflows and iovcnt before adding them to the bufferlist.
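The overflow described above, and the validate-first approach, can be sketched as follows. This is an illustration only, not the patched code: Python integers do not overflow, so a 64-bit size_t/ssize_t is modelled explicitly, and SIZE_MAX, IOV_MAX and all function names here are assumptions.

```python
SIZE_MAX = 2**64 - 1  # assumed 64-bit size_t
IOV_MAX = 1024        # assumed limit on the number of iovecs


def as_ssize_t(value):
    """Reinterpret an integer as a 64-bit two's-complement signed value."""
    value &= SIZE_MAX
    return value - 2**64 if value >= 2**63 else value


def total_length_signed(lengths):
    """Old-style check: accumulate in a signed total, reject only negatives."""
    total = 0
    for n in lengths:
        total = as_ssize_t(total + n)
        if total < 0:
            raise OverflowError("negative total")
    return total


def total_length_checked(lengths):
    """Validate iovcnt and the unsigned total before any buffer is touched."""
    if len(lengths) > IOV_MAX:
        raise ValueError("too many iovecs")
    total = 0
    for n in lengths:
        if n > SIZE_MAX - total:  # adding n would wrap past SIZE_MAX
            raise OverflowError("total length overflows size_t")
        total += n
    return total


def build_bufferlist(lengths):
    """Check the total first, then append; nothing is added on failure."""
    total = total_length_checked(lengths)
    bufferlist = list(lengths)  # stand-in for appending each iovec
    return bufferlist, total
```

For example, the lengths 2**63 - 1 and 2**63 + 1 sum to 2**64, which wraps to 0 in the signed model and slips past a negative-only check, while the unsigned check rejects it before anything is appended.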
Ilya Dryomov [Sat, 19 Mar 2022 13:04:52 +0000 (14:04 +0100)]
qa/workunits/rbd/cli_generic.sh: relax trash purge schedule status assert
Commit 08df6e0fd006 ("qa/workunits/rbd: expand LevelSpec parsing
coverage") didn't account for images with a separate data pool. This
was missed because of small-cache-pool.yaml breakage.
Add the snaptrim duration to the json formatted output of the pg dump
stats. Define methods for a PG to set the snaptrim begin time and then to
calculate the total time spent to trim all the objects for the snaps in
the snap_trimq for the PG.
Tests:
- Librados C and C++ API tests to verify the time spent for a snaptrim
operation on a PG. These tests use the self-managed snaps APIs.
- Standalone tests to verify snaptrim duration using rados pool snaps.
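The begin-time and duration bookkeeping described above could look roughly like this. It is a sketch under stated assumptions, not the PG code: SnapTrimTimer, its methods and the "snaptrim_duration" field name are hypothetical, and the clock is injectable only so the example can be checked deterministically.

```python
import time


class SnapTrimTimer:
    """Hypothetical helper tracking how long a PG spends trimming snaps."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.begin = None
        self.duration = 0.0

    def set_begin(self):
        # Called when the PG starts working through its snap_trimq.
        self.begin = self.clock()

    def set_end(self):
        # Called once the queue is drained; records the elapsed time.
        if self.begin is not None:
            self.duration = self.clock() - self.begin
            self.begin = None
        return self.duration


def dump_stats(timer):
    """Expose the duration as a JSON-style dict (field name assumed)."""
    return {"snaptrim_duration": timer.duration}
```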
Add a new column, OBJECTS_TRIMMED, to the pg dump stats that shows the
number of objects trimmed when a snap is removed.
When a pg splits, the stats from the parent pg are copied to the child
pg. In such a case, reset objects_trimmed to 0 for the child pg
(see PeeringState::split_into()). Otherwise, incorrect stats would be
shown for the child pg after the split operation.
Tests:
- Librados C and C++ API tests to verify the number of objects trimmed
during snaptrim operation. These tests use the self-managed snaps APIs.
- Standalone tests to verify objects trimmed using rados pool snaps.
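The reset on split can be sketched as below. This is a simplified illustration, not PeeringState::split_into(): the PGStats dataclass and its fields other than objects_trimmed are hypothetical, and the sketch ignores everything else a real split does to the stats.

```python
import copy
from dataclasses import dataclass


@dataclass
class PGStats:
    num_objects: int = 0
    objects_trimmed: int = 0


def split_into(parent_stats):
    """Copy the parent's stats to a child PG, clearing the trim counter."""
    child = copy.copy(parent_stats)
    child.objects_trimmed = 0  # the child has not trimmed anything itself
    return child
```

The parent keeps its own counter; only the copy handed to the child starts from zero.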