libudev uses fnmatch(3) for matching attributes, meaning that shell
glob pattern matching is employed instead of literal string matching.
Escape glob metacharacters to suppress pattern matching.
See Rook issue https://github.com/rook/rook/issues/7940 for full
information.
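To illustrate the difference between glob and literal matching, here is a minimal Python sketch. The `escape_glob` helper is hypothetical (it is not part of libudev or Rook), and it assumes the consumer honors backslash escapes, as fnmatch(3) does by default. Python's own `fnmatch` module treats backslashes literally, so it is used here only to show the unescaped behavior.

```python
import fnmatch
import re

# Characters that fnmatch(3) treats specially inside a pattern.
_GLOB_META = re.compile(r"([*?\[\]\\])")


def escape_glob(value: str) -> str:
    """Backslash-escape glob metacharacters (hypothetical helper)."""
    return _GLOB_META.sub(r"\\\1", value)


def matches_as_glob(candidate: str, attribute_value: str) -> bool:
    """Show how an unescaped attribute value behaves as a pattern."""
    return fnmatch.fnmatchcase(candidate, attribute_value)
```

With an unescaped value such as `disk-1*`, `matches_as_glob("disk-10", "disk-1*")` is true even though the two strings differ, which is the kind of surprise escaping is meant to prevent.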
Ceph bluestore disks can sometimes appear as though they have "phantom"
Atari (AHDI) partitions created on them when they don't in reality. This
is due to a series of bugs in the Linux kernel when it is built with
Atari support enabled. This behavior does not appear for raw mode OSDs on
partitions, only on disks.
Changing the on-disk format of BlueStore OSDs comes with
backwards-compatibility challenges, and a fix in the kernel could take
years to reach users. Working around the kernel issue in ceph-volume is
therefore the best way to fix the issue for Ceph.
To work around the issue in ceph-volume, there are two behaviors that
need to be adjusted:
1. `ceph-volume inventory` should not report that a partition is
available if the parent device is a BlueStore OSD.
2. `ceph-volume raw list` should report parent disks if the disk is a
BlueStore OSD and not report the disk's children, BUT it should still
report children if the parent disk is not a BlueStore OSD.
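The two behaviors above can be sketched roughly as follows. This is a minimal Python illustration, not ceph-volume code: the `Device` type and both function names are hypothetical, and the `is_bluestore` flag stands in for whatever detection ceph-volume performs.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Device:
    """Hypothetical stand-in for a disk or partition."""
    path: str
    is_bluestore: bool = False
    parent: Optional["Device"] = None


def partition_available(part: Device) -> bool:
    """Inventory rule only: a partition under a BlueStore disk is unavailable."""
    return not (part.parent is not None and part.parent.is_bluestore)


def raw_list(devices: List[Device]) -> List[str]:
    """Raw-list rule: report BlueStore parents and hide only their children."""
    reported = []
    for dev in devices:
        if dev.parent is not None and dev.parent.is_bluestore:
            continue  # child of a BlueStore disk, possibly a phantom partition
        if dev.is_bluestore:
            reported.append(dev.path)
    return reported
```

A BlueStore OSD on a partition whose parent disk is not BlueStore is still reported, which matches the second half of rule 2.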
Using only the exit status of `ceph-bluestore-tool show-label` to
determine if a device is a bluestore OSD could report a false negative
if there is a system error when `ceph-bluestore-tool` opens the device.
A better check is to open the device and read the bluestore device
label (the first 22 bytes of the device) to look for the bluestore
device signature ("bluestore block device"). If ceph-volume fails to
open the device due to a system error, it is safest to assume the device
is BlueStore so that an existing OSD isn't overwritten.
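A minimal Python sketch of the check described above follows. The function names are hypothetical and this is not the ceph-volume implementation; only the 22-byte signature and the "assume BlueStore on error" policy come from the text.

```python
BLUESTORE_SIGNATURE = b"bluestore block device"  # 22 bytes, per the text above


def has_bluestore_signature(label: bytes) -> bool:
    """True if the bytes read from the start of a device carry the signature."""
    return label[:len(BLUESTORE_SIGNATURE)] == BLUESTORE_SIGNATURE


def is_bluestore_device(path: str) -> bool:
    """Read the device label; on a system error, assume BlueStore to be safe."""
    try:
        with open(path, "rb") as dev:
            label = dev.read(len(BLUESTORE_SIGNATURE))
    except OSError:
        # Failing closed avoids overwriting an OSD we could not inspect.
        return True
    return has_bluestore_signature(label)
```

Returning `True` on `OSError` trades a possible false positive for never reporting an unreadable OSD as free.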
David Caro [Wed, 16 Jun 2021 08:32:15 +0000 (10:32 +0200)]
monitoring/grafana/cluster: use per-unit max and limit values
The value we get is per-unit (a fraction of 1), so the limits and the
max value should be relative to 1, not 100. Note that the value being
shown was correct; it was the gauge that was not showing the correct
indicators.
octopus-only: the typo fixed here was introduced unintentionally in
c2486c7239f2efff1f87a0c6064ccbf792e90bf0 while creating a symlink. That parent
commit was octopus-only, hence we don't require this change in other branches.
Fixes failures like the following with RHEL 8.3 in octopus:
```
2021-08-03T17:32:19.328 INFO:tasks.workunit.client.0.smithi148.stdout:No match for argument: libarchive-3.3.3
2021-08-03T17:32:19.338 INFO:tasks.workunit.client.0.smithi148.stderr:Error: Unable to find a match: libarchive-3.3.3
2021-08-03T17:32:19.376 DEBUG:teuthology.orchestra.run:got remote process result: 1
2021-08-03T17:32:19.377 INFO:tasks.workunit:Stopping ['rados/test_envlibrados_for_rocksdb.sh'] on client.0...
```
Follow-up to https://github.com/ceph/ceph/pull/42421
Improvements and some adaptations related to the jenkins job.
Fixes: https://tracker.ceph.com/issues/51612
Signed-off-by: Alfonso Martínez <almartin@redhat.com>
(cherry picked from commit 65b75000b7694cb3cbe617bbec28c513a2522be8)
Conflicts:
doc/dev/developer_guide/dash-devel.rst
- Put changes in HACKING.rst as this file does not exist in the octopus branch.
src/pybind/mgr/dashboard/ci/cephadm/bootstrap-cluster.sh
- Resolve conflict originated by code that is deleted anyway.
Signed-off-by: Alfonso Martínez <almartin@redhat.com>
Patrick Donnelly [Fri, 30 Jul 2021 02:35:13 +0000 (19:35 -0700)]
Merge PR #42537 into octopus
* refs/pull/42537/head:
mon/MDSMonitor: propose if FSMap struct_v is too old
mon/MDSMonitor: give a proper error message if FSMap struct_v is too old
qa: add tests for fs dump of epoch and trimming
qa: add file system support for dumping epoch
mon/MDSMonitor: return mon_mds_force_trim_to even if equal to current epoch
mon: add debugging for trimming methods
mon: fix debug spacing
Reviewed-by: Ramana Raja <rraja@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
Patrick Donnelly [Thu, 15 Jul 2021 01:02:20 +0000 (18:02 -0700)]
mon/MDSMonitor: propose if FSMap struct_v is too old
To flush older versions which may still be an empty MDSMap (for clusters
that have never used CephFS), we need to force a proposal so older
versions of the struct are trimmed.
This is the main fix of this branch. We removed code which processed old
encodings of the MDSMap in the mon store via 60bc524. That broke old
ceph clusters which never used CephFS (see cited ticket below). This is
because the initial epoch is an empty MDSMap (back in Infernalis/Hammer)
that is never updated. So, the fix here is to just do proposals
periodically until all of the old structs are automatically trimmed by
the mons.
Fixes: 60bc524827bac072658203e56b1fa3dede9641c5
Fixes: https://tracker.ceph.com/issues/51673
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 56c3fc802ee8848ba85da4300adcc2ee8bd95416)
Conflicts:
src/mds/FSMap.cc: adjust for octopus which decodes old MDSMaps
src/mon/MDSMonitor.h: trivial conflicts
Patrick Donnelly [Wed, 14 Jul 2021 20:31:21 +0000 (13:31 -0700)]
mon/MDSMonitor: return mon_mds_force_trim_to even if equal to current epoch
The PaxosService code already excludes the value returned by
PaxosService::get_trim_to as the upper bound of the range of epochs to
trim. Without this fix, you need to set mon_mds_force_trim_to to one
greater than the epoch you want to trim _and_ force the current epoch to
be one greater than that; the net result being that you can only force
trimming up to 2 epochs behind the current epoch.
This change is helpful for resolving issue 51673, but not strictly
necessary.
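The off-by-two effect described above can be modelled with a small Python sketch. Both functions are hypothetical and only model the behavior described in this message; the `allow_equal` flag is an illustrative switch between the behavior before and after this change, and the real MDSMonitor logic may differ in detail.

```python
from typing import List, Optional


def accepted_trim_to(force_trim_to: int, current_epoch: int,
                     allow_equal: bool) -> Optional[int]:
    """Return the forced trim bound if the (modelled) monitor accepts it."""
    if force_trim_to <= 0:
        return None
    if force_trim_to < current_epoch:
        return force_trim_to
    if allow_equal and force_trim_to == current_epoch:
        return force_trim_to
    return None


def trimmed_epochs(first_epoch: int, trim_to: Optional[int]) -> List[int]:
    """Epochs removed; the bound itself is excluded (half-open range)."""
    if trim_to is None:
        return []
    return list(range(first_epoch, trim_to))
```

With a current epoch of 10, the old behavior accepts at most a bound of 9 and therefore trims through epoch 8, while accepting a bound equal to the current epoch allows trimming through epoch 9.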
Related-to: https://tracker.ceph.com/issues/51673
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit d9dc2f11d56fb4341ba5823f8d17459d10f3b2c1)
Conflicts:
src/common/options/mon.yaml.in: drop doc change
Sometimes, it can happen that the osds being destroyed in those tests
are not yet marked as 'down' for some reason. Let's add some retries on
those tasks to avoid CI failures.
rpm: drop use of $FIRST_ARG in ceph-immutable-object-cache
The use of $FIRST_ARG was probably required because the SUSE-specific
%service_* rpm macros were playing tricks on the shell positional parameters.
This is bad practice and error-prone, so let's assume that no macros should do
that anymore and hence it's safe to assume that positional parameters remain
unchanged after any rpm macro call.
Since inject_facts_as_vars is set to false in the ansible.cfg file, we
have to update the references to use ansible_facts[<thing>] instead of
ansible_<thing>.
We already install the dependency from the ceph-ansible requirements.txt.
To avoid false positives (such as after rebooting a node), we can retry
failing tests.
Without loading the ansible.cfg file from the ceph-ansible project, we
don't have pipelining enabled, which can yield a significant performance
improvement.
This removes the ANSIBLE_ACTION_PLUGINS, ANSIBLE_RETRY_FILES_ENABLED and
ANSIBLE_SSH_RETRIES environment variables as they are already set in the
ansible.cfg file.
ceph-volume/tests: update ansible ssh_args env var
The ansible ssh_args parameter is usually defined in the ansible.cfg file.
Currently this variable is overridden in tox to manage the vagrant ssh file,
so we lose all the default values.
We should not proceed against the user's will if dual stack is specified
but a network can be found for only one address family. The right fix is
to have a better error message and documentation, not to tolerate the
failure.
Matthew Oliver [Mon, 10 Aug 2020 04:46:21 +0000 (04:46 +0000)]
pick_address: Warn and continue when you find at least 1 IPv4 or IPv6 address
Currently, if you specify a single public or cluster network yet have
both `ms bind ipv4` and `ms bind ipv6` set, daemons crash when they
can't find both IPs from the same network:
unable to find any IPv4 address in networks '2001:db8:11d::/120' interfaces ''
And rightly so, of course it can't find an IPv4 network in an IPv6
network.
This patch adds a new helper method, networks_address_family_coverage,
that takes the list of networks and returns a bitmap of the address
families supported.
We then check whether enough networks are defined; if not, we warn and
then continue.
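The coverage check can be sketched in Python as follows. This is an illustration rather than the Ceph implementation: the flag values, the `missing_families` helper and the comma-separated input format are assumptions, and only the helper name `networks_address_family_coverage` comes from this message.

```python
import ipaddress

# Hypothetical bit values for the address-family bitmap.
IPV4 = 0x01
IPV6 = 0x02


def networks_address_family_coverage(networks: str) -> int:
    """Return a bitmap of the address families present in `networks`."""
    coverage = 0
    for entry in networks.split(","):
        entry = entry.strip()
        if not entry:
            continue
        net = ipaddress.ip_network(entry, strict=False)
        coverage |= IPV4 if net.version == 4 else IPV6
    return coverage


def missing_families(networks: str, bind_ipv4: bool, bind_ipv6: bool) -> int:
    """Bitmap of families that are enabled but have no configured network."""
    wanted = (IPV4 if bind_ipv4 else 0) | (IPV6 if bind_ipv6 else 0)
    return wanted & ~networks_address_family_coverage(networks)
```

For the network from the error message above, `missing_families("2001:db8:11d::/120", True, True)` has only the IPv4 bit set, which is the case that now produces a warning.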
Also update the network-config-ref to mention having to define addresses
from both address families for the cluster and/or public networks.
Also add a warning about `ms bind ipv4` being enabled by default, which
is easy to miss, thereby enabling dual stack when you may be expecting
single-stack IPv6 only.
There is also a drive-by fix for a `note` that wasn't being displayed
due to missing RST syntax.
Signed-off-by: Matthew Oliver <moliver@suse.com>
Fixes: https://tracker.ceph.com/issues/46845
Fixes: https://tracker.ceph.com/issues/39711
(cherry picked from commit 9f75dfbf364f5140b3f291e0a2c6769bc3d8cbac)
Adam Kupczyk [Mon, 24 May 2021 12:27:05 +0000 (14:27 +0200)]
os/bluestore/bluefs: Add test that detects bluefs inconsistency
Add a test that detects a possible scenario that causes BlueFS to have a
file whose contents were never written. This is done by tricking the
replay log into accepting file metadata (size, allocations) while the
actual data stored in these allocations is not yet synced to disk.
Scenario:
1) write to file h1 on SLOW device
2) flush h1 (and trigger h1 mark to be added to bluefs replay log)
3) write to file h2
4) fsync h2 (forces replay log to be written)
The result is:
- bluefs log now has stable state of h1
- SLOW device is not yet flushed (no fdatasync())
rpm: cleanup: drop %service_del_postun_without_restart
SUSE needs %service_del_postun (with or without restart) *only* if there
is a possibility that the RPM containing the unit file will be upgraded
from a version that packaged SysVinit scripts instead of systemd unit
files. (Which is not the case here.)
Adam Kupczyk [Mon, 24 May 2021 12:49:51 +0000 (14:49 +0200)]
os/bluestore/bluefs: Remove possibility of bluefs replay log containing files without data
It had been possible for the bluefs replay log to serialize file metadata (size, allocations)
while the actual data stored in these allocations was not yet synced to disk.
This could happen if _flush_range(h1) allocated space for file h1 on a device (like SLOW) that would not
be used when flushing a future replay log. Such a thing can happen when we have h2 that wrote to WAL and
our replay log is on DB. After fsync(h2) we write to the replay log and wait for fdatasync on WAL and DB.
There is no waiting on SLOW, but h1 was dirty and has been serialized to the replay log.
The solution is to delay notifying the replay log that it has to include h1 until after fdatasync has finished.
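The ordering problem can be illustrated with a toy Python model. This is not BlueFS code: the class, its fields and the `delay_log_mark` switch are hypothetical, and the order of steps inside `fsync` is simplified. It only mirrors the reasoning above, namely that metadata must not reach the replay log before the file's data is synced.

```python
class ToyBlueFS:
    """Toy model of the ordering problem; not BlueFS code."""

    def __init__(self, delay_log_mark: bool):
        self.delay_log_mark = delay_log_mark
        self.unsynced = {}          # device -> files with data not yet synced
        self.log_dirty = set()      # files to serialize in the next log write
        self.durable_data = set()   # files whose data is on stable storage
        self.durable_log = set()    # files whose metadata is in the synced log

    def flush(self, name: str, device: str) -> None:
        self.unsynced.setdefault(device, set()).add(name)
        if not self.delay_log_mark:
            self.log_dirty.add(name)  # old behavior: notify the log at once

    def fdatasync(self, device: str) -> None:
        for name in self.unsynced.pop(device, set()):
            self.durable_data.add(name)
            if self.delay_log_mark:
                self.log_dirty.add(name)  # fixed: notify only after the sync

    def fsync(self, name: str, device: str, log_device: str) -> None:
        self.flush(name, device)
        self.fdatasync(device)              # sync the file's own device
        self.durable_log |= self.log_dirty  # write the replay log ...
        self.log_dirty.clear()
        self.fdatasync(log_device)          # ... and sync the log device


def files_without_data(delay_log_mark: bool) -> set:
    """Files present in the durable log whose data never reached disk."""
    fs = ToyBlueFS(delay_log_mark)
    fs.flush("h1", "SLOW")                   # h1 data sits unsynced on SLOW
    fs.fsync("h2", "WAL", log_device="DB")   # SLOW is never synced here
    return fs.durable_log - fs.durable_data
```

In this model the old behavior leaves h1 in the durable log without durable data, while delaying the notification leaves the log containing only h2.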
Fixes: https://tracker.ceph.com/issues/50965
Signed-off-by: Adam Kupczyk <akupczyk@redhat.com>
(cherry picked from commit 03ac53f7d4c83e56f664ad371ffe3bc2d40e1837)
Dan van der Ster [Tue, 29 Jun 2021 20:36:00 +0000 (22:36 +0200)]
mgr/DaemonServer: skip redundant update of pgp_num_actual
During PG merge the MGR was observed repeatedly sending identical
set pgp_num_actual values, leading to osdmap churn at 2000/hr.
Skip the redundant osd set pgp_num_actual command if the
pgp_num is already our computed next.
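The guard amounts to comparing the current value with the computed target before issuing the command. A minimal Python sketch follows; the function name and the exact command fields are assumptions for illustration and are not taken from DaemonServer.

```python
from typing import Optional


def pgp_num_command(pool: str, current_pgp_num: int,
                    computed_next: int) -> Optional[dict]:
    """Build the set-pgp_num_actual command, or None if it would be a no-op."""
    if current_pgp_num == computed_next:
        return None  # redundant update: skip it to avoid osdmap churn
    return {
        "prefix": "osd pool set",   # assumed command layout
        "pool": pool,
        "var": "pgp_num_actual",
        "val": str(computed_next),
    }
```

Returning `None` for the no-op case lets the caller skip sending anything, so no new osdmap epoch is generated.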
Fixes: https://tracker.ceph.com/issues/51433
Signed-off-by: Dan van der Ster <daniel.vanderster@cern.ch>
(cherry picked from commit 3f15749de0d550a124f8c6afbd457f17ef020963)
Ilya Dryomov [Wed, 26 Aug 2020 12:12:29 +0000 (14:12 +0200)]
rbd: fix default pool handling for krbd map/unmap
The default pool name does not get passed to the kernel since commit 96f05a7956b3 ("rbd: delay determination of default pool name"). The
kernel ends up interpreting the image name as the pool name (and the
snapshot name as the image name).
Alfonso Martínez [Mon, 19 Jul 2021 07:57:26 +0000 (09:57 +0200)]
mgr/dashboard: run cephadm-backend e2e tests with KCLI
Fixes: https://tracker.ceph.com/issues/51300
Signed-off-by: Alfonso Martínez <almartin@redhat.com>
(cherry picked from commit 5c03b49c4da55cf8d0c679ecb2c58182e4d3361a)
Conflicts:
- Added content in HACKING.rst as dash-devel.rst does not exist in octopus:
doc/dev/developer_guide/dash-devel.rst
src/pybind/mgr/dashboard/HACKING.rst
- Adapted code to octopus branch in the following files due to branch divergence:
src/pybind/mgr/dashboard/frontend/cypress.json
src/pybind/mgr/dashboard/frontend/cypress/integration/cluster/configuration.e2e-spec.ts
src/pybind/mgr/dashboard/frontend/cypress/integration/cluster/hosts.po.ts
src/pybind/mgr/dashboard/frontend/cypress/integration/cluster/osds.e2e-spec.ts
src/pybind/mgr/dashboard/frontend/cypress/integration/orchestrator/workflow/01-hosts.e2e-spec.ts
src/pybind/mgr/dashboard/frontend/cypress/integration/page-helper.po.ts
src/pybind/mgr/dashboard/frontend/cypress/integration/ui/dashboard.e2e-spec.ts
Jason Dillaman [Thu, 29 Oct 2020 14:10:56 +0000 (10:10 -0400)]
librbd: refresh full global config when applying metadata
The ConfigProxy contains a point-in-time copy of the global config
that is dynamically updated in CephContext::_conf. Upon an image
refresh, pull the latest version of the global config from the
CephContext and apply it to the config stored within the ImageCtx.
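The stale-copy problem and the refresh step can be modelled in a few lines of Python. The `ToyImageCtx` class is hypothetical and greatly simplified; it only illustrates taking a fresh copy of the global config before applying per-image overrides, not the librbd API.

```python
class ToyImageCtx:
    """Toy model: holds a point-in-time copy of a shared global config."""

    def __init__(self, global_config: dict):
        self._global_config = global_config   # shared, updated elsewhere
        self.config = dict(global_config)     # snapshot taken at open time

    def refresh(self, image_metadata: dict) -> None:
        # Pull the latest global values first, then apply image overrides.
        self.config = dict(self._global_config)
        self.config.update(image_metadata)
```

Without the first line of `refresh`, a global option changed after the image was opened would never reach `config`.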
Fixes: https://tracker.ceph.com/issues/48035
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit 352dec753ead8b61e19b46d096255e06393b740f)