set(dpdk_LIBRARIES) does not reset this variable; it leaves it
unchanged.
If pkg-config manages to find the DPDK libraries,
pkg_check_modules(dpdk QUIET libdpdk) sets dpdk_LIBRARIES to a string
like "rte_node;rte_graph;...".
But we want this variable to hold the import paths of the required
libraries, so reset it before appending them to it.
This change helps to address the build failure when building Ceph with
DPDK installed system-wide along with its .pc file.
Patrick Donnelly [Tue, 30 Mar 2021 21:26:08 +0000 (14:26 -0700)]
mon,mds: use per-MDS compat to inform replacement
This diff makes the following changes:
- FSMap::compat is now just a "default compat" of currently unknown
utility. It is used when constructing a new file system but does
not really have any effect or current use.
- The `mds compat *` CLI commands are deprecated. They manipulate
the default compat which has no useful effect.
- Each MDS sends its compat to the mons in its beacon. This is from
MDSMap::get_compat_set_all() at MDS boot. This CompatSet does not
change for the duration of the MDS lifetime.
- Mons record each MDS compat in the FSMap to inform standby failover.
An MDS is only promoted if it is compatible with the file system
compat.
- Mons upgrade (merge) the file system compat when (a) the number of
  *in* MDS is 1 (effected by max_mds=1) and (b) the mons are promoting a
  standby with a new compat. A file system is never upgraded when there
  is more than 1 rank, to prevent two MDSs running with incompatible compats.
- A suite of `fs compat` commands exist to manipulate the file system
compat. These exist mostly for testing.
The consequence of these changes is that the MDS upgrade procedure
can be updated to no longer require turning off all MDSs except rank 0
before performing any upgrades. Previously, a CompatSet change would cause
any MDS receiving the new MDSMap to suicide if it was incompatible.
Instead, the monitors will no longer assign an incompatible MDS to a
file system and enforce an upgrade procedure if incompatibilities exist.
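As a rough illustration of the promotion rule above, a minimal sketch follows; the types and helper names are simplified stand-ins, not the actual MDSMonitor code.

  #include <cstdint>
  #include <optional>
  #include <set>
  #include <vector>

  using mds_gid_t = uint64_t;

  struct CompatInfo {
    std::set<uint64_t> incompat;   // incompat feature ids (required by the fs, or understood by an MDS)
  };

  struct StandbyInfo {
    mds_gid_t gid;
    CompatInfo compat;             // reported by the MDS in its beacon at boot
  };

  // A standby may only be promoted if it understands every incompat feature
  // recorded in the file system's compat.
  static bool is_compatible(const CompatInfo& mds, const CompatInfo& fs) {
    for (uint64_t f : fs.incompat) {
      if (!mds.incompat.count(f)) {
        return false;
      }
    }
    return true;
  }

  std::optional<mds_gid_t> pick_standby(const std::vector<StandbyInfo>& standbys,
                                        const CompatInfo& fs_compat) {
    for (const auto& s : standbys) {
      if (is_compatible(s.compat, fs_compat)) {
        return s.gid;              // promote this standby to the failed rank
      }
    }
    return std::nullopt;           // no compatible standby: leave the rank down
  }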
Fixes: https://tracker.ceph.com/issues/49720
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Patrick Donnelly [Tue, 30 Mar 2021 21:07:46 +0000 (14:07 -0700)]
mon: do not update inline incompat except via mds
The MDS_FEATURE_INCOMPAT_INLINE feature indicates that an MDS knows how
to read/write inline data and that the file system may have it. The
separate setting for inline_data protects this file system feature.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
With the implementation of DBStore, it was determined that the API used
for writing in Zipper was too tied to RADOS. Implement a clean writing
API named Writer.
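For illustration, a store-agnostic writing interface along these lines could look as follows; the method names and signatures are guesses for the sketch, not the exact Zipper (rgw_sal) API.

  #include <cstdint>
  #include <string>
  #include <string_view>

  class Writer {
  public:
    virtual ~Writer() = default;

    // Set up any per-object state before the first chunk is written.
    virtual int prepare() = 0;

    // Consume the next chunk of object data at the given offset.
    virtual int process(std::string_view data, uint64_t offset) = 0;

    // Finish the write: publish the object head/metadata atomically.
    virtual int complete(uint64_t accounted_size, const std::string& etag) = 0;
  };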
Signed-off-by: Daniel Gryniewicz <dang@redhat.com>
1. ver-health.sh:
a. TEST_check_version_health_1():
To avoid intermittent timeouts observed in wait_for_health_string(),
increase the wait time to 20 secs.
qa/standalone/scrub: Force a subset of scrub tests to use "wpq" scheduler
The following tests in the test files mentioned below use the
"osd_scrub_sleep" option to introduce delays during scrubbing to help
determine scrubbing states, validate reservations during scrubbing, etc.
This works when using the "wpq" scheduler.
But when the "mclock_scheduler" is enabled, "osd_scrub_sleep" is
disabled and overridden to 0. This is done to delegate the scheduling of
the background scrubs to the "mclock_scheduler" based on the configured QoS
parameters. Because of this, the checks that verify scrub states,
reservations, etc. fail, since the window in which to check them is very
short, with scrubs completing very quickly. This affects the small subset of
scrub tests mentioned below.
Only for the above tests, until there is a reliable way to query scrub
states with "--osd-scrub-sleep" set to 0, the "osd_op_queue" config
option is set to "wpq".
qa/standalone/erasure-code: Modify erasure-code tests for mclock scheduler
Modified test cases:
1. test-erasure-eio.sh:
a. Test_ec_backfill_unfound():
- Set osd_mclock_profile to high_recovery_ops profile.
- Increase the wait for backfill_unfound timeout to 240 secs.
qa/standalone/osd-backfill: Modify backfill tests for mclock scheduler
Modified test cases:
1. osd-backfill-prio.sh:
Set osd_op_queue = wpq for all tests since mclock doesn't
consider recovery priority as part of its scheduling algorithm.
2. osd-backfill-space.sh:
Set osd_mclock_profile to high_recovery_ops and increase the wait
for backfills timeout to 1200 secs for the following tests:
- TEST_backfill_test_simple()
- TEST_backfill_test_multi()
- TEST_backfill_test_sametarget()
- TEST_backfill_multi_partial()
- TEST_ec_backfill_simple()
- TEST_ec_backfill_multi()
- SKIP_TEST_ec_backfill_multi_partial()
3. osd-backfill-stats:
- TEST_backfill_ec_down_all_out():
Set osd_mclock_profile to high_recovery_ops and increase the wait
for recovery timeout to 240 secs.
qa/standalone/osd: Modify osd tests for mclock scheduler
Modified test cases:
1. osd-recovery-prio.sh:
Set osd_op_queue = wpq for all tests since mclock
doesn't consider recovery priority as part of its
scheduling algorithm.
2. osd-recovery-stats.sh:
a. TEST_recovery_undersized():
- Set osd_mclock_profile to high_recovery_ops profile.
- Increase wait for recovery timeout to 300 secs.
3. osd-rep-recov-eio.sh:
a. TEST_rep_backfill_unfound():
- Set osd_mclock_profile to high_recovery_ops profile.
- Increase wait for backfill_unfound to 360 secs.
4. repeer-on-acting-back.sh:
a. TEST_repeer_on_down_act():
- Set osd_mclock_profile to high_recovery_ops profile.
(To improve the test duration)
qa/standalone: Modify ceph-helpers.sh tests for mclock scheduler.
List of changes:
1. Remove the enforcement to use osd_op_queue=wpq when an osd is brought
up in the following functions:
- run_osd()
- run_osd_filestore() and
- activate_osd()
2. New functions:
- get_op_scheduler() - Get the current osd_op_queue for an osd.
3. Modified test cases:
- test_run_osd() - Add check for osd_max_backfill count.
The mclock scheduler overrides the count to 1000.
4. New test cases:
- test_activate_osd_after_mark_down()
- test_get_op_scheduler()
osd: Add a new config option to forcibly run OSD benchmark on init
The new config option "osd_mclock_force_run_benchmark_on_init" is
introduced to allow a user to force the OSD benchmark test to run on every
OSD boot-up, even if historical data about the OSD's iops capacity is
available in the MON config store. The 'force_run_benchmark' flag is set
to the value indicated by the new config option.
By default this new config option is set to false.
The utility of this option is to help refresh the OSD iops capacity
when the underlying device's performance characteristics have changed
significantly. In such cases, the OSD can be restarted with this option
enabled temporarily. Once the new iops capacity is updated to the MON
store, this option can be removed from the OSD's start-up config.
osd: Add mechanism to avoid running OSD benchmark on every OSD boot-up
Use "mon_cmd_set_config()" to store the OSD's max iops capacity to
the MON store during the first bring-up. Don't run the OSD benchmark
test on subsequent boot-ups if a previously persisted iops capacity is
available on the MON store and is different from the default iops
capacity.
Add the 'force_run_benchmark' flag to force a run of the benchmark
in case the default iops capacity cannot be determined.
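A minimal sketch of the resulting boot-time decision follows, using simplified stand-in names rather than the actual OSD code.

  #include <optional>

  bool should_run_osd_bench(bool force_run_benchmark_on_init,   // the new config option
                            bool force_run_benchmark,           // set when the default capacity is unusable
                            std::optional<double> persisted_iops_capacity,
                            double default_iops_capacity) {
    if (force_run_benchmark_on_init || force_run_benchmark) {
      return true;   // explicit request to (re)measure the device
    }
    if (!persisted_iops_capacity) {
      return true;   // first bring-up: nothing stored in the MON config store yet
    }
    // Skip the benchmark only if the persisted value is a real measurement,
    // i.e. it differs from the default iops capacity.
    return *persisted_iops_capacity == default_iops_capacity;
  }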
common/config: Add methods to return the default value of a config option
Add a wrapper method "get_val_default()" to the ConfigProxy class that takes
the config option key to search for. This method in turn calls another method
of the same name added to the md_config_t class, which does the actual work of
searching for the config option. If the option is valid, _get_val_default()
is used to get the default value. Otherwise, the wrapper method returns
std::nullopt.
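A minimal sketch of this lookup flow, with simplified types standing in for the real ConfigProxy/md_config_t classes:

  #include <map>
  #include <optional>
  #include <string>

  struct Option {
    std::string default_value;
  };

  struct md_config_t {
    std::map<std::string, Option> schema;    // stand-in for the option schema

    std::optional<std::string> get_val_default(const std::string& key) const {
      auto it = schema.find(key);
      if (it == schema.end()) {
        return std::nullopt;                 // not a valid config option
      }
      return it->second.default_value;       // the compiled-in default
    }
  };

  struct ConfigProxy {
    md_config_t config;

    // Thin wrapper that delegates to md_config_t, as described above.
    std::optional<std::string> get_val_default(const std::string& key) const {
      return config.get_val_default(key);
    }
  };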
osd: Add method to store config option key/value on the MON store
Add method mon_cmd_set_config() to save config option key and
value to the MON store. The ConfigMonitor command, 'config set' is
used to achieve this.
A corresponding get method is unnecessary since any config option
found on the MON store is loaded during OSD boot-up and set using
the md_config_t::set_mon_vals() method. Therefore, the existing
versions of ConfigProxy::get_val() method are sufficient to get
the latest value for the config option.
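For illustration only, the 'config set' mon command payload could be assembled roughly as below; the helper shape and hand-built JSON are assumptions for the sketch, not the actual mon_cmd_set_config() implementation.

  #include <sstream>
  #include <string>

  std::string make_config_set_cmd(const std::string& who,    // e.g. "osd.3"
                                  const std::string& name,   // e.g. "osd_mclock_max_capacity_iops_ssd"
                                  const std::string& value) {
    std::ostringstream cmd;
    cmd << "{\"prefix\": \"config set\", "
        << "\"who\": \"" << who << "\", "
        << "\"name\": \"" << name << "\", "
        << "\"value\": \"" << value << "\"}";
    // The resulting JSON string is what would be handed to the mon client
    // (e.g. via start_mon_command()) to persist the key/value pair.
    return cmd.str();
  }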
So we check SNAPPY_VERSION to tell whether we should use `uint32_t` or
`uint32`.
In this change, the snappy version used to build the win32 client is bumped
to the latest stable version, v1.1.9, to include the fix of
SNAPPY_VERSION. This paves the way for the fix of https://tracker.ceph.com/issues/50934
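An illustrative version gate for the type choice described above; the exact cutoff value and alias name are assumptions for the sketch, not the verbatim Ceph code. snappy encodes SNAPPY_VERSION as (major << 16) | (minor << 8) | patchlevel.

  #include <cstdint>
  #include <snappy.h>

  #if defined(SNAPPY_VERSION) && SNAPPY_VERSION >= 0x010109
  using snappy_len_t = uint32_t;        // 1.1.9 and later: standard fixed-width type
  #else
  using snappy_len_t = snappy::uint32;  // older releases: snappy's own typedef
  #endif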
The clean_cgroup method assumes that ctx.fsid is set. While this is
true for the bootstrap command, it isn't set for the adopt or deploy commands
(and maybe others).
This causes the adopt command to fail:
Traceback (most recent call last):
  File "/sbin/cephadm", line 8301, in <module>
    main()
  File "/sbin/cephadm", line 8289, in main
    r = ctx.func(ctx)
  File "/sbin/cephadm", line 1764, in _default_image
    return func(ctx)
  File "/sbin/cephadm", line 5091, in command_adopt
    command_adopt_ceph(ctx, daemon_type, daemon_id, fsid)
  File "/sbin/cephadm", line 5299, in command_adopt_ceph
    osd_fsid=osd_fsid)
  File "/sbin/cephadm", line 2884, in deploy_daemon_units
    clean_cgroup(ctx, unit_name)
  File "/sbin/cephadm", line 2724, in clean_cgroup
    if not ctx.fsid:
  File "/sbin/cephadm", line 155, in __getattr__
    return super().__getattribute__(name)
AttributeError: 'CephadmContext' object has no attribute 'fsid'
Since we already have the fsid value in deploy_daemon_units (which calls
clean_cgroup), we can pass it directly.
Adam King [Mon, 19 Jul 2021 16:07:39 +0000 (12:07 -0400)]
mgr/cephadm: stop removal of daemons from offline hosts
This check was only looking at the status of the
host and not at the offline_hosts set, so
it wasn't actually stopping daemons from being removed
from offline hosts.
Patrick Donnelly [Wed, 28 Jul 2021 17:45:08 +0000 (10:45 -0700)]
Merge PR #42349 into master
* refs/pull/42349/head:
mon/MDSMonitor: propose if FSMap struct_v is too old
mon/MDSMonitor: give a proper error message if FSMap struct_v is too old
mds/FSMap: use DECODE_OLDEST to gate FSMap version
qa: add tests for fs dump of epoch and trimming
qa: add file system support for dumping epoch
mon/MDSMonitor: return mon_mds_force_trim_to even if equal to current epoch
mon: add debugging for trimming methods
mon: fix debug spacing
qa: add nofs upgrade suite
Patrick Donnelly [Wed, 28 Jul 2021 17:34:12 +0000 (10:34 -0700)]
Merge PR #41025 into master
* refs/pull/41025/head:
qa: wait pgs to be clean before using the pools
qa: ignore PG_RECOVERY_FULL and PG_DEGRADED for mds-full
qa: wait more time since there are many more pgs than before
qa: do not multiply the full ratio twice
qa: do not raise for kclient for _fsync test
qa: use the pg autoscale mode to calculate the pg_num
qa: set the object_size to 1M
qa: move the is_full() to parent class
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Sometimes, it can happen that the osds being destroyed in those tests
are not yet marked as 'down' for some reason. Let's add some retries on
those tasks to avoid CI failures.
Patrick Donnelly [Thu, 15 Jul 2021 01:02:20 +0000 (18:02 -0700)]
mon/MDSMonitor: propose if FSMap struct_v is too old
To flush older versions which may still be an empty MDSMap (for clusters
that have never used CephFS), we need to force a proposal so older
versions of the struct are trimmed.
This is the main fix of this branch. We removed code which processed old
encodings of the MDSMap in the mon store via 60bc524. That broke old
ceph clusters which never used CephFS (see cited ticket below). This is
because the initial epoch is an empty MDSMap (back in Infernalis/Hammer)
that is never updated. So, the fix here is to just do proposals
periodically until all of the old structs are automatically trimmed by
the mons.
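A simplified sketch of this periodic-proposal idea, with assumed names rather than the actual MDSMonitor code:

  #include <functional>

  void maybe_force_proposal(unsigned oldest_stored_struct_v,
                            unsigned current_struct_v,
                            const std::function<void()>& propose_pending) {
    // Old, pre-CephFS clusters can be stuck with an ancient (possibly empty)
    // MDSMap epoch that never gets rewritten. Keep proposing until those old
    // epochs age out of the mon store and are trimmed.
    if (oldest_stored_struct_v < current_struct_v) {
      propose_pending();
    }
  }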
Fixes: 60bc524827bac072658203e56b1fa3dede9641c5
Fixes: https://tracker.ceph.com/issues/51673
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>