libudev uses fnmatch(3) for matching attributes, meaning that shell
glob pattern matching is employed instead of literal string matching.
Escape glob metacharacters to suppress pattern matching.
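The effect of escaping can be sketched as below. This is only an illustration, not the code from the commit: `escape_glob` and `matches_literally` are hypothetical helpers, and the set of escaped characters is an assumption covering the metacharacters fnmatch(3) interprets when called with flags set to 0.

```cpp
#include <fnmatch.h>
#include <string>

// Hypothetical helper: backslash-escape the characters that fnmatch(3)
// treats specially, so the result only matches the literal string.
std::string escape_glob(const std::string& literal) {
  std::string escaped;
  escaped.reserve(literal.size() * 2);
  for (char c : literal) {
    if (c == '*' || c == '?' || c == '[' || c == ']' || c == '\\') {
      escaped.push_back('\\');
    }
    escaped.push_back(c);
  }
  return escaped;
}

// True only when `value` equals `literal`, even though the comparison
// itself is still done by fnmatch().
bool matches_literally(const std::string& literal, const std::string& value) {
  // flags = 0 keeps backslash escaping enabled (FNM_NOESCAPE would disable it).
  return fnmatch(escape_glob(literal).c_str(), value.c_str(), 0) == 0;
}
```

Without escaping, a pattern such as `sd[a-z]*` matches `sda1`; after escaping, it only matches the literal text `sd[a-z]*`.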
retire_extent_addr can only be called on absolute or record-relative
addrs. Record-relative addrs are only valid on extents allocated as
part of the current transaction.
Samuel Just [Fri, 27 Aug 2021 05:42:38 +0000 (05:42 +0000)]
crimson/os/seastore/lba_manager/btree: fix get_val() paddr value from iterator
This was causing a stray RETIRED_PLACEHOLDER to be created resulting in
a segment cleaner segfault in release on an invalid segment and a crash
upon adding a duplicate lba btree pin since the returned addr didn't
match the addr of the INITIAL_PENDING extent on the transaction.
Fixes: https://tracker.ceph.com/issues/52434
Fixes: https://tracker.ceph.com/issues/52435
Signed-off-by: Samuel Just <sjust@redhat.com>
Patrick Donnelly [Sat, 28 Aug 2021 01:28:02 +0000 (21:28 -0400)]
Merge PR #42620 into master
* refs/pull/42620/head:
mds: switch mds_lock to fair mutex
common/Timer: make SafeTimer a template
common/fair_mutex: add is_locked_by_me support
common: do not compile condition_variable_debug in none debug mode
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Patrick Donnelly [Sat, 28 Aug 2021 01:26:41 +0000 (21:26 -0400)]
Merge PR #38481 into master
* refs/pull/38481/head:
qa/vstart_runner: inherit methods instead of duplicating them
qa/ceph_manager: make it possible to reuse few methods
qa/vstart_runner: don't use "shell=False" in run_ceph_w()
qa/ceph_manager: minor refactor
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Replace previous implementation with one based around an internal
iterator interface. Besides simplifying the implementation and
removing duplicate lookups in the allocation pathway, this implementation
should correct a design problem in the prior implementation wherein
LBALeafNode::find_hole couldn't see the first element of the subsequent
node and therefore assumed that there was one at get_meta().end.
This patch removes the btree logic from lba_btree_node_impl.* leaving
the LBAInternalNode and LBALeafNode layout in lba_btree_node.*.
lba_btree.h/cc now have the main btree update/query logic.
Samuel Just [Fri, 20 Aug 2021 08:03:22 +0000 (01:03 -0700)]
crimson/os/seastore: get_next_dirty_extents: record in transaction read set
Record the extents in the read set after wait_io() as in get_extent. This
should ensure that the interruptible_future machinery will handle the
event that one of them gets invalidated prior to being rewritten.
Kefu Chai [Thu, 26 Aug 2021 15:57:00 +0000 (23:57 +0800)]
cmake: use upstream repo for fio
This change partially reverts 10baab3fc8293b8c30ca90a4acd76f70d011f1b5.
Since the fix for the C++ build is not yet included in any tag or branch,
use the sha1 for now.
Kefu Chai [Wed, 25 Aug 2021 14:25:54 +0000 (22:25 +0800)]
crimson/os: use structured binding in loop
Also avoid using `map[key] = val` to set an item in a map: if the key
does not exist in the map, `map[key]` first has to create a value
using its default ctor and then call `operator=(bufferlist&&)` to
set it.
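The cost difference can be made visible with a small sketch. `Payload` is a hypothetical stand-in for bufferlist that counts default constructions and move assignments, and both fill functions are illustrative rather than taken from the commit. C++17 is assumed for the structured bindings.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for bufferlist that records how it gets built.
struct Payload {
  static inline int default_ctors = 0;
  static inline int move_assigns = 0;
  std::string data;
  Payload() { ++default_ctors; }
  explicit Payload(std::string d) : data(std::move(d)) {}
  Payload(Payload&&) = default;
  Payload& operator=(Payload&& other) {
    ++move_assigns;
    data = std::move(other.data);
    return *this;
  }
};

// The pattern the message advises against: for a missing key, out[key]
// default-constructs a Payload, which is then overwritten by move assignment.
void fill_with_subscript(const std::map<std::string, std::string>& in,
                         std::map<std::string, Payload>& out) {
  for (const auto& [key, val] : in) {
    out[key] = Payload(val);
  }
}

// Alternative: emplace() builds the map node directly from its arguments,
// so no default construction and no assignment take place.
void fill_with_emplace(const std::map<std::string, std::string>& in,
                       std::map<std::string, Payload>& out) {
  for (const auto& [key, val] : in) {
    out.emplace(key, Payload(val));
  }
}
```

Note that `emplace()` leaves an existing entry untouched; if overwriting is wanted, `insert_or_assign()` is the closer replacement for `map[key] = val`.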
Joseph Sawaya [Wed, 25 Aug 2021 13:54:17 +0000 (09:54 -0400)]
mgr/rook: add better error handling to remove
This commit adds better error handling to the remove method
in the DefaultRemover by hiding stack traces from the user
and displaying meaningful error messages.
Yaarit Hatuka [Wed, 25 Aug 2021 02:12:08 +0000 (02:12 +0000)]
rpm, debian: move smartmontools and nvme-cli to ceph-base
We wish to be able to scrape SMART and NVMe metrics from OSD and MON
nodes. For this we require / recommend smartmontools and nvme-cli
dependencies for both the ceph-osd and ceph-mon packages. However, the
sudoers file (which is required for invoking `smartctl` by user 'ceph')
was installed only in the ceph-osd package. Since different packages
cannot own the same file, and because we want to be able to scrape from
every daemon, we move the dependencies and the sudoers installation to
ceph-base. For generalization, we rename:
sudoers.d/ceph-osd-smartctl -> sudoers.d/ceph-smartctl
Jianpeng Ma [Fri, 20 Aug 2021 06:29:37 +0000 (14:29 +0800)]
librbd/cache/pwl/ssd: fix first_valid_entry calculation in retire_entries()
Consider one control_block which contains multiple encode(WriteLogCacheEntry) entries:
Log1: WriteLogEntry
Log2: WriteLogEntry
Log3: Non-WriteLogEntry
For this case, the current calculation is: control_block_pos + sizeof(control_block).
But it should be: control_block_pos + sizeof(control_block) +
data_length(Log1 + Log2).
A wrong first_valid_entry is persisted to the superblock and read back on
restart. Reading then starts from the wrong position, and
decode(WriteLogCacheEntry) reports a bug.
Fixes: https://tracker.ceph.com/issues/52323
Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
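The corrected arithmetic from the message above can be sketched as follows. This is a simplified illustration, not the librbd code: `LogEntry` and `next_valid_pos` are hypothetical names, and the sketch assumes that only write entries carry a payload and that the payloads are stored directly after the control block.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, simplified view of one entry described by a control block.
struct LogEntry {
  bool is_write;         // true for a WriteLogEntry
  uint64_t data_length;  // payload bytes stored after the control block
};

// Position of the first byte after the control block and the payloads of
// its write entries, i.e.
// control_block_pos + sizeof(control_block) + data_length(write entries).
uint64_t next_valid_pos(uint64_t control_block_pos,
                        uint64_t control_block_size,
                        const std::vector<LogEntry>& entries) {
  uint64_t pos = control_block_pos + control_block_size;
  for (const LogEntry& entry : entries) {
    if (entry.is_write) {
      pos += entry.data_length;
    }
  }
  return pos;
}
```

With a 4096-byte control block at position 4096 and two write entries of 512 and 1024 bytes, the old calculation stops at 8192, while the corrected one lands 1536 bytes further on.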
Xiubo Li [Mon, 2 Aug 2021 03:53:37 +0000 (11:53 +0800)]
mds: switch mds_lock to fair mutex
Mutex implementations (e.g. std::mutex in C++) do not guarantee
fairness: they do not guarantee that the lock will be acquired by
threads in the order in which they called lock().
In most cases this works well, but there is a corner case in the
Finisher thread of the MDS daemon, which may call more than one
complete() once the mdlog flush succeeds. After the flush is done,
it calls the queued completion callbacks, and the Finisher thread
may always succeed in acquiring the mds_lock in successive
callbacks even when other threads are already stuck waiting for
the mds_lock. This starves the other threads, and if they are
serving client requests, it delays user operations by several or
even tens of seconds.
This change switches the mds_lock to a fair mutex, which ensures
that all mds_lock waiters are served in FIFO order and that the
Finisher thread does not hold the mds_lock for that long.
At the same time, if the Finisher thread has many completions to
run, the fair mutex guarantees that the Finisher will not be
scheduled out by the fair mutex unlock() when no other mds_lock
waiter is queued.
Fixes: https://tracker.ceph.com/issues/51722
Signed-off-by: Xiubo Li <xiubli@redhat.com>
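One common way to obtain FIFO ordering is a ticket lock. The sketch below illustrates that general technique under that assumption; it is not presented as the common/fair_mutex code from this change, and `TicketMutex` and `tickets_issued()` are hypothetical names.

```cpp
#include <condition_variable>
#include <mutex>

// Minimal ticket-based mutex: each lock() call takes the next ticket and
// waits until that ticket is being served, so waiters proceed in FIFO order.
class TicketMutex {
public:
  void lock() {
    std::unique_lock<std::mutex> guard(state_mutex_);
    const unsigned my_ticket = next_ticket_++;
    turn_changed_.wait(guard, [&] { return my_ticket == now_serving_; });
  }

  void unlock() {
    {
      std::lock_guard<std::mutex> guard(state_mutex_);
      ++now_serving_;  // hand the lock to the next ticket, if any
    }
    turn_changed_.notify_all();
  }

  // Number of lock() calls seen so far (for illustration only).
  unsigned tickets_issued() {
    std::lock_guard<std::mutex> guard(state_mutex_);
    return next_ticket_;
  }

private:
  std::mutex state_mutex_;                // protects the two counters
  std::condition_variable turn_changed_;  // signalled on every unlock()
  unsigned next_ticket_ = 0;
  unsigned now_serving_ = 0;
};
```

Because the class provides `lock()` and `unlock()`, it works with `std::lock_guard<TicketMutex>`. When no other waiter holds a ticket, the unlocking thread's next `lock()` receives the ticket that is now being served and returns without blocking, which corresponds to the last point in the message above.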
crimson/osd: implicitly append '--smp 1' when invoked without it
This commit is basically a hack meant to honor the promise, which we
already made by packaging crimson under `/usr/bin/ceph-osd`, that it
is a drop-in replacement for the classical OSD.
Whether this interface-exactness should be maintained is out of scope
for this commit; it only handles the issue uncovered by the Rook
integration effort: `crimson-osd` is unable to `--mkfs` because
Seastar, if not restricted by passing `--smp N`, considers all CPU
cores available in the system when allocating resources. This leads
to the following error:
```
ERROR 2021-08-24 14:17:32,105 [shard 5] seastar - Could not setup Async I/O: Resource temporarily unavailable. The most common cause is not enough request capacity in /proc/sys/fs/aio-max-nr. Try increasing that number or reducing the amount of logical CPUs available for your application
```
This hack will need to be dropped when integrating multi-reactor
support in crimson.
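The idea of implicitly appending the option can be sketched as below. This is an illustration only: `with_default_smp` is a hypothetical helper, not the function used in crimson, and it assumes the option is spelled either `--smp N` or `--smp=N`.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: return the argument list unchanged when it already
// restricts the reactor count, otherwise append "--smp 1".
std::vector<std::string> with_default_smp(std::vector<std::string> args) {
  for (const std::string& arg : args) {
    // Assumed spellings: "--smp N" or "--smp=N".
    if (arg == "--smp" || arg.rfind("--smp=", 0) == 0) {
      return args;
    }
  }
  args.push_back("--smp");
  args.push_back("1");
  return args;
}
```

Called with `{"ceph-osd", "--mkfs"}`, the helper returns the same list followed by `--smp` and `1`; an explicit `--smp 4` is left as given.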