Yin Congmin [Tue, 29 Mar 2022 08:59:05 +0000 (16:59 +0800)]
librbd/cache/pwl: add basic metrics to ImageCacheState
Add basic metrics to ImageCacheState and persist them, including
allocated_bytes, cached_bytes, dirty_bytes, free_bytes and hit/miss
info.
Leverage periodic_stats() timer to call update_image_cache_state.
In order to avoid outputting too much debug information, the original
statistics output log level is changed to 5.
Switch to json_spirit for encoding because encode_json encodes bool as
"true"/"false" string.
Remove rbd_persistent_cache_log_periodic_stats option because we need
to always update cache state.
[ idryomov: add cached_bytes and hits_partial; report misses and
miss_bytes instead of respective totals; naming ]
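The encoding change above matters to anything that parses the persisted state. A minimal Python sketch (not the librbd C++ code; the key name `present` and the helper `is_present` are illustrative assumptions) of how a string-encoded boolean differs from a native JSON boolean:

```python
import json

# The same flag emitted as a JSON string vs. as a native JSON boolean.
as_string = json.dumps({"present": "true"})
as_bool = json.dumps({"present": True})


def is_present(doc: str) -> bool:
    """Return the flag only if it was encoded as a real JSON boolean."""
    value = json.loads(doc)["present"]
    if not isinstance(value, bool):
        raise TypeError(f"expected bool, got {type(value).__name__}")
    return value
```

A strict consumer such as `is_present` accepts `as_bool` and rejects `as_string`, which is the kind of mismatch a native boolean encoding avoids.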
The trailing '3' was missed in one instance, ceph-mgr-cephadm, leading to:
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:
The following packages have unmet dependencies:
ceph-mgr-cephadm : Depends: python3-cherrypy but it is not installable
osd/osd_types: Increasing decode version of scrub_duration in pg stats
All new fields added to pg stats after quincy RC need to have the decode field bumped up to avoid decoding errors during an upgrade from quincy RC to the quincy stable version
Redouane Kachach [Tue, 29 Mar 2022 16:37:10 +0000 (18:37 +0200)]
mgr/cephadm: fallback to normal sorted if cannot import natsorted
Fixes: https://tracker.ceph.com/issues/55113
Signed-off-by: Redouane Kachach <rkachach@redhat.com>
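The fallback named in the title can be sketched as an optional import. This is a minimal illustration, not the cephadm code itself; the host names are made up:

```python
try:
    from natsort import natsorted  # optional third-party dependency
except ImportError:
    # Fall back to the built-in lexicographic sort when natsort is missing.
    natsorted = sorted

hosts = ["host10", "host2", "host1"]  # made-up example data
ordered = natsorted(hosts)
```

With natsort installed the numeric suffixes sort naturally (`host2` before `host10`); with the fallback the order is plain lexicographic, which is less pleasant to read but still deterministic.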
ceph.spec.in: use gcc-toolset-10 for building crimson
This commit bumps up the toolset version but only to build crimson.
That is, the classical OSD stays unaffected.
The reason behind the upgrade is the following FTBFS:
```
[ 32%] Building CXX object src/seastar/CMakeFiles/seastar.dir/src/core/reactor.cc.o
/home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-11345-ga3bb1485/rpm/el8/BUILD/ceph-17.0.0-11345-ga3bb1485/src/seastar/src/core/reactor.cc: In constructor ‘seastar::reactor::reactor(std::shared_ptr<seastar::smp>, seastar::alien::instance&, unsigned int, seastar::reactor_backend_selector, seastar::reactor_config)’:
/home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-11345-ga3bb1485/rpm/el8/BUILD/ceph-17.0.0-11345-ga3bb1485/src/seastar/src/core/reactor.cc:926:90: error: use of deleted function ‘seastar::condition_variable::condition_variable()’
926 | , _thread_pool(std::make_unique<thread_pool>(this, seastar::format("syscall-{}", id))) {
| ^
In file included from /home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-11345-ga3bb1485/rpm/el8/BUILD/ceph-17.0.0-11345-ga3bb1485/src/seastar/include/seastar/core/reactor.hh:74,
from /home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-11345-ga3bb1485/rpm/el8/BUILD/ceph-17.0.0-11345-ga3bb1485/src/seastar/src/core/reactor.cc:32:
/home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-11345-ga3bb1485/rpm/el8/BUILD/ceph-17.0.0-11345-ga3bb1485/src/seastar/include/seastar/core/condition-variable.hh:157:5: note: ‘seastar::condition_variable::condition_variable() noexcept’ is implicitly deleted because its exception-specification does not match the implicit exception-specification ‘’
157 | condition_variable() noexcept = default;
```
Xuehan Xu [Sun, 20 Mar 2022 12:36:05 +0000 (20:36 +0800)]
crimson/os/seastore: coordinate segment seq of journal and ool segments
the segment seq in ool segments' headers also needs to be set to the
current journal segment seq, because we rely on this to judge whether a
delta needs to be replayed
Ernesto Puerta [Fri, 25 Mar 2022 15:26:48 +0000 (16:26 +0100)]
mgr/dashboard: fix api test issue with pip
Fix
```
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
apache-libcloud 3.5.0 requires requests>=2.26.0, but you have requests 2.25.1 which is incompatible.
Successfully installed CherryPy-13.1.0 PyJWT-2.0.1 Routes-2.4.1 bcrypt-3.1.4 ceph-1.0.0 chardet-4.0.0 cheroot-8.6.0 idna-2.10 jaraco.functools-3.5.0 more-itertools-4.1.0 natsort-8.1.0 portend-3.1.0 pyopenssl-22.0.0 pytz-2022.1 repoze.lru-0.7 requests-2.25.1 tempora-5.0.1
```
Tim Serong [Mon, 28 Mar 2022 04:12:10 +0000 (15:12 +1100)]
ceph.spec.in: remove build directory at end of %install
By the time we get to the end of the %install section, all the
built binaries have been installed in the build root, so we can
delete the build directory from the source tree. This frees up
about 17GB of disk space on build hosts, which is helpful in
case other processes later in the RPM build need more disk space.
Fixes: https://tracker.ceph.com/issues/55079
Signed-off-by: Tim Serong <tserong@suse.com>
Kefu Chai [Sat, 26 Mar 2022 17:00:19 +0000 (01:00 +0800)]
doc/dpdk: reword the root access part
root access to the system is not a must-have for running a DPDK
application, so reword the "Configuring OSD DPDKStack" section.
also, manually editing /etc/passwd is not encouraged, so use
"usermod" instead. adding a directory after the user's command
interpreter in /etc/passwd does not make sense, see PASSWD(5),
so drop the paragraph on editing /etc/passwd.
Adam King [Fri, 4 Mar 2022 02:47:47 +0000 (21:47 -0500)]
mgr/cephadm: offline host watcher
To be able to detect more quickly when certain hosts go
offline. Could be useful for the NFS
HA feature, as this requires moving NFS daemons off
offline hosts within 90 seconds.
qa/standalone: Fix test_activate_osd() test in ceph-helpers.sh
Modify test_activate_osd() to get the type of scheduler in use and then
verify the value of osd_max_backfills. This is because mclock scheduler
overrides this option to 1000 upon OSD initialization.
The test earlier used to pass because the OSD daemon was killed but not
marked down and upon being brought up, the wait for OSD up check was
passing quickly. But the OSD still didn't have the latest config values.
But now, upon killing the OSD, the osd_fast_shutdown sequence notifies the
mon (see PR: https://github.com/ceph/ceph/pull/44807) and the OSD is marked
down and dead. Upon bringing it up, the wait for OSD up check takes
longer, and this is sufficient for the config values to be updated. This
results in the correct values being read from the config 'Values' map.
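The scheduler-dependent check described above can be sketched as follows. The real test lives in a shell helper; this Python version is only an illustration, and the function name and the `configured` parameter are assumptions:

```python
MCLOCK_MAX_BACKFILLS = 1000  # override value stated in the commit message


def expected_max_backfills(scheduler: str, configured: int) -> int:
    """Return the osd_max_backfills value the test should expect."""
    if scheduler == "mclock_scheduler":
        # The mclock scheduler overrides the option at OSD initialization.
        return MCLOCK_MAX_BACKFILLS
    # Any other scheduler keeps whatever value the test configured.
    return configured
```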
Kefu Chai [Fri, 25 Mar 2022 15:06:46 +0000 (23:06 +0800)]
cmake/modules: avoid using distutils
to address the following warnings from python 3.9:
<string>:1: DeprecationWarning: The distutils package is deprecated and slated for removal in Python 3.12. Use setuptools or check PEP 632 for potential alternatives
<string>:1: DeprecationWarning: The distutils.sysconfig module is deprecated, use sysconfig instead
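As the second warning suggests, the standard-library `sysconfig` module covers the usual `distutils.sysconfig` lookups. A minimal sketch follows; which values the cmake modules actually query is not stated in the commit message, so the three lookups below are assumptions chosen for illustration:

```python
import sysconfig

# Standard-library lookups that do not trigger the distutils warnings.
purelib = sysconfig.get_path("purelib")      # install dir for pure-Python modules
include_dir = sysconfig.get_path("include")  # directory containing Python.h
ext_suffix = sysconfig.get_config_var("EXT_SUFFIX")  # filename suffix of extension modules
```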
Ronen Friedman [Fri, 25 Mar 2022 10:45:47 +0000 (10:45 +0000)]
osd/scrub: restart snap trimming only after scrubbing is done
Snap trimming that was postponed as the target PG was scrubbing
must be restarted at scrub completion.
PR #38111 moved trimming restart to just before the scrub fully
terminated. The current PR fixes that.
Trimming is also restarted in those cases where scrub was
queued but aborted immediately.
J. Eric Ivancich [Fri, 21 Jan 2022 20:30:45 +0000 (15:30 -0500)]
rgw: configurable instrumentation on bucket index transactions
In order to better understand corner cases with bucket index
operations, extra instrumentation is now added and controlled by a
boolean configuration variable
("rgw_bucket_index_transaction_instrumentation").
When set to true, there is extra logging during all CLS operations
involving bucket index transactions. Additionally, all these log
entries are tagged with "BITX" to make them easier to find in the
logs. This is preferable to setting all OSD logging at a high level
due to the log size issues.
Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
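Because every instrumented entry carries the "BITX" tag, the relevant lines can be pulled out of a log with a simple filter. A minimal sketch; the sample log lines are invented and do not reflect the real log format:

```python
def bitx_lines(lines):
    """Yield only the log lines that carry the BITX tag."""
    for line in lines:
        if "BITX" in line:
            yield line


sample = [
    "osd.0 unrelated message",         # made-up log line
    "osd.0 BITX: hypothetical entry",  # made-up log line
]
matches = list(bitx_lines(sample))
```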
J. Eric Ivancich [Thu, 20 Jan 2022 15:57:32 +0000 (10:57 -0500)]
rgw: make bucket index pending op expiration configurable
Bucket index operations are transactional with data object
manipulation. The operation is prepared by adding a pending operation
record. And when the data object side is complete, the bucket index
operation is committed.
If it fails to be committed, later bucket listings will compare the
pending ops with the current data object state and see whether it
completed or not and then either commit or expire the op. The time
span for expiration is currently hard-coded as 120 seconds (unless
overridden in the bucket header, which can happen during "bucket
check").
This commit allows that expiration time to be configured.
Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
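The expiration decision described above can be sketched as a comparison against a configurable window. This is an illustration under stated assumptions, not the RGW code: the `PendingOp` type, its fields, and the strict `>` comparison are all hypothetical; only the 120-second default comes from the commit message.

```python
from dataclasses import dataclass

DEFAULT_EXPIRATION_SECS = 120.0  # previously hard-coded value per the commit message


@dataclass
class PendingOp:
    tag: str            # hypothetical identifier for the pending operation
    prepared_at: float  # time the op was prepared, in seconds


def is_expired(op: PendingOp, now: float,
               expiration_secs: float = DEFAULT_EXPIRATION_SECS) -> bool:
    """Return True once the op has been pending longer than the window."""
    return now - op.prepared_at > expiration_secs


op = PendingOp(tag="example", prepared_at=1000.0)
```

Passing a smaller `expiration_secs` makes the same op expire sooner, which is the knob the commit exposes as configuration.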
J. Eric Ivancich [Thu, 20 Jan 2022 15:56:49 +0000 (10:56 -0500)]
osd: add new CLS call to retrieve global configuration
Currently there is no easy way to gain access to global configuration
from CLS (objclass) code. This adds a new call to the CLS interface
that returns a "const ConfigProxy&" from which configuration can be
accessed.
NOTE: Working code to provide this functionality in crimson is not
provided.
Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
doc: Improvements to mClock configuration reference documentation
Improve the documentation as follows:
- Cover the mclock client types.
- Describe the mclock config profiles in greater detail.
- Add notes about manually benchmarking OSDs and tuning bluestore throttle
parameters.
- Include a couple of missing mclock configuration options.
build, crimson/osd: do not let Seastar interfere with ELF's program headers.
For the sake of avoiding locking on the `__cxa_throw` paths, Seastar
hijacks `dl_iterate_phdr` of the dynamic linker. Unfortunately, this
has a nasty side effect: it makes it impossible to catch an exception
in a plugin (a DSO loaded via the `dlopen()` machinery).
For more details please consult:
* https://gist.github.com/rzarzynski/3abe9ed6b50cfa1893d34988e1628bfc,
* `seastar/src/core/exception_hacks.cc`.
This patch deals with the problem by simply disabling the problematic
workaround, which could be iatrogenic too. If that turns out to be the
case, we can consider:
* preloading all our Ceph Classes before reaching `smp::configure()`,
* statically linking them.