Laura Flores [Thu, 7 Jul 2022 17:15:10 +0000 (12:15 -0500)]
.github/workflows: run the stale bot every hour
Currently, the stale bot runs once a day at 1:30 UTC.
This will only process a few PRs per day, which is not
enough to handle the volume of PRs that come into the
Ceph repository. Running the bot once every hour with
a limit of 30 operations will keep within the rate
limit, but it will also process more PRs per day.
Sébastien Han [Wed, 6 Jul 2022 15:10:45 +0000 (17:10 +0200)]
.github/workflows: process PRs in ascending order
The Ceph repo has many PRs and we cannot process all the PRs with the
default "operations-per-run" value (30). At the time of writting the bot
processes 408 every day and there are around 938 PRs.
The job itself even informs us that not all PRs might have been processed
and encourages us to increase "operations-per-run" if possible. However,
doing so might expose us to GitHub's API rate limit.
So let's process the oldest PRs first; this alone should close a bunch of
PRs. If that is not enough, we can try increasing "operations-per-run".
osd: Set initial mClock QoS params at CONF_DEFAULT level
Create the initial mClock QoS params at CONF_DEFAULT level using
set_val_default(). This allows switching to a custom profile on a
running OSD and making the necessary changes to the desired QoS params.
Note that switching to the ‘custom’ profile and subsequently changing
the QoS params using “config set osd.n …” will be at a higher level, i.e.
at CONF_MON.
But when switching back to a built-in profile, the new values won’t take
effect since CONF_DEFAULT < CONF_MON. For the values to take effect, the
config keys created as part of the ‘custom’ profile must be removed from
the ConfigMonitor store after switching back to a built-in profile.
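To illustrate the level interplay, here is a minimal sketch (not the actual patch; the option name is one real mClock key picked as an example) of seeding a default via set_val_default() alongside the monitor-level commands that override and then release it:
```cpp
#include "common/ceph_context.h"  // CephContext; cct->_conf is the ConfigProxy

// Minimal sketch, assuming a CephContext* is at hand; not the actual patch.
void seed_mclock_qos_defaults(CephContext* cct)
{
  // Registered at the lowest priority level (CONF_DEFAULT) ...
  cct->_conf.set_val_default("osd_mclock_scheduler_client_res", "1");
  // ... so a later monitor-level value wins, since CONF_DEFAULT < CONF_MON:
  //   $ ceph config set osd.0 osd_mclock_scheduler_client_res 2
  // and must be removed again for a built-in profile to take effect:
  //   $ ceph config rm osd.0 osd_mclock_scheduler_client_res
}
```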
- Added a couple of standalone tests to exercise the scenario.
- Updated the mClock configuration document and the mClock internal
documentation, fixing a couple of typos relating to the best-effort weights.
- Added new sections to the mClock configuration document outlining the
steps to switch from a built-in profile to the custom profile and vice versa.
crimson/os/seastore/circular_bounded_journal: do not split records
* no split record due to relative paddr resolution
* fix md_bl.substr_of(bl, 0, header.mdlength)
* maintain written_to in range [get_start_addr(), get_journal_end())
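The last bullet describes a wrap-around invariant; a toy standalone sketch of it (invented names, not the seastore code):
```cpp
#include <cassert>
#include <cstdint>

// Toy sketch (invented names; not the seastore code): keep a circular
// journal's write cursor inside [start_addr, journal_end) by wrapping it
// around once it runs past the end of the circular area.
uint64_t advance_written_to(uint64_t written_to, uint64_t len,
                            uint64_t start_addr, uint64_t journal_end)
{
  assert(start_addr <= written_to && written_to < journal_end);
  written_to += len;
  if (written_to >= journal_end) {
    written_to = start_addr + (written_to - journal_end);
  }
  return written_to;
}
```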
crimson/osd: fix life-time management of OSDConnectionPriv
Before this patch there was a possibility that `OSDConnectionPriv`
got destructed before a `PipelineHandle` instance that was still using
it. The reason is that our remote-handling operations store `conn` directly
while `handle` is defined in a parent class. Due to the language rules
(derived-class members are destroyed before base-class members), the
former gets deinitialized earlier.
```
==756032==ERROR: AddressSanitizer: heap-use-after-free on address 0x615000039684 at pc 0x0000020bdfa2 bp 0x7ffd3abfa370 sp 0x7ffd3abfa360
READ of size 1 at 0x615000039684 thread T0
Reactor stalled for 261 ms on shard 0. Backtrace: 0x45d9d 0xe90f6d1 0xe6b8a1d 0xe6d1205 0xe6d16a8 0xe6d1938 0xe6d1c03 0x12cdf 0xccebf 0x7f6447161b1e 0x7f644714aee8 0x7f644714eed6 0x7f644714fb36 0x7f64471420b5 0x7f6447143f3a 0xd61d0 0x32412 0xbd8a7 0xbd134 0xbdc1a 0x20bdfa1 0x20c184e 0x352eb7f 0x352fa28 0x20b04a5 0x1be30e5 0xe694bc4 0xe6ebb8a 0xe843a11 0xe845a22 0xe29f497 0xe2a3ccd 0x1ab1841 0x3aca2 0x175698d
#0 0x20bdfa1 in seastar::shared_mutex::unlock() ../src/seastar/include/seastar/core/shared_mutex.hh:122
#1 0x20c184e in crimson::OrderedExclusivePhaseT<crimson::osd::ConnectionPipeline::GetPG>::exit() ../src/crimson/common/operation.h:548
#2 0x20c184e in crimson::OrderedExclusivePhaseT<crimson::osd::ConnectionPipeline::GetPG>::ExitBarrier::exit() ../src/crimson/common/operation.h:533
#3 0x20c184e in crimson::OrderedExclusivePhaseT<crimson::osd::ConnectionPipeline::GetPG>::ExitBarrier::cancel() ../src/crimson/common/operation.h:539
#4 0x20c184e in crimson::OrderedExclusivePhaseT<crimson::osd::ConnectionPipeline::GetPG>::ExitBarrier::~ExitBarrier() ../src/crimson/common/operation.h:543
#5 0x20c184e in crimson::OrderedExclusivePhaseT<crimson::osd::ConnectionPipeline::GetPG>::ExitBarrier::~ExitBarrier() ../src/crimson/common/operation.h:544
#6 0x352eb7f in std::default_delete<crimson::PipelineExitBarrierI>::operator()(crimson::PipelineExitBarrierI*) const /opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/unique_ptr.h:85
#7 0x352eb7f in std::unique_ptr<crimson::PipelineExitBarrierI, std::default_delete<crimson::PipelineExitBarrierI> >::~unique_ptr() /opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/unique_ptr.h:361
#8 0x352eb7f in crimson::PipelineHandle::~PipelineHandle() ../src/crimson/common/operation.h:457
#9 0x352eb7f in crimson::osd::PhasedOperationT<crimson::osd::ClientRequest>::~PhasedOperationT() ../src/crimson/osd/osd_operation.h:152
#10 0x352eb7f in crimson::osd::ClientRequest::~ClientRequest() ../src/crimson/osd/osd_operations/client_request.cc:64
#11 ...
```
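For reference, a standalone sketch of the language rule at play (illustrative names, not the crimson types): members of a derived class are destroyed before members of its base class, which is exactly why a base-class `handle` must not depend on a derived-class `conn` during destruction:
```cpp
#include <cstdio>

struct Member {
  const char* name;
  ~Member() { std::printf("destroying %s\n", name); }
};

struct Base {
  Member handle{"handle (defined in the parent class)"};
};

struct Derived : Base {
  Member conn{"conn (stored directly in the child class)"};
};

int main() {
  Derived d;
  // Prints on scope exit:
  //   destroying conn (stored directly in the child class)
  //   destroying handle (defined in the parent class)
  // Derived members go first, then base members. If ~handle still used
  // conn, as PipelineHandle used OSDConnectionPriv, it would read freed
  // state, matching the ASan report above.
}
```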
Previously we were treating `--no-mon-config` as one of the
`seastar_n_early_args`. However, this was wrong, as it truly
belongs to `config_proxy_args`:
```cpp
int md_config_t::parse_argv(ConfigValues& values,
                            const ConfigTracker& tracker,
                            std::vector<const char*>& args, int level)
{
  // ...
  else if (ceph_argparse_flag(args, i, "--no-mon-config", (char*)NULL)) {
    values.no_mon_config = true;
  }
  // ...
}
```
The net result was that `--no-mon-config` got ignored, which was
the reason behind many dead jobs at Sepia.
doc/cephadm/services: the config section of service specs
Fixes: https://tracker.ceph.com/issues/53997
Signed-off-by: Redouane Kachach <rkachach@redhat.com>
If any clones are in the pending or in-progress state, then
show these clones in the 'fs subvolume snapshot info'
command output. The field only exists if clones are
in the pending or in-progress state.
Kai [Sun, 3 Jul 2022 19:36:27 +0000 (21:36 +0200)]
README.md: HTTP => HTTPS
Switch the link http://ceph.com/ from HTTP to HTTPS, i.e. to https://ceph.com/,
to skip the redirect when opening it
(http://ceph.com/ is being redirected to https://ceph.com/).
Signed-off-by: Kai Hollberg <kai.hollberg@googlemail.com>
In order to support filters, make the base classes in Zipper pure
virtual. Then, derive from them a set of Store base classes that
implement common code used by stores, and a set of Filter base classes
that automatically pass through to the next layer. Modify the stores to
derive from the Store base classes.
This implements the pure virtual base classes, the Store base classes,
and a framework of the Filter base classes.
Signed-off-by: Daniel Gryniewicz <dang@redhat.com>
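A compact sketch of that layering (invented names and a single method, not the actual RGW SAL interfaces):
```cpp
#include <memory>
#include <string>
#include <utility>

// Pure virtual interface layer (cf. the Zipper base classes).
struct Driver {
  virtual ~Driver() = default;
  virtual int put_object(const std::string& key, const std::string& data) = 0;
};

// Store layer: holds code common to all concrete stores.
struct StoreDriver : Driver {
  // shared helpers for concrete stores would live here
};

// Filter layer: passes every call through to the next layer by default,
// so a concrete filter overrides only what it wants to intercept.
struct FilterDriver : Driver {
  explicit FilterDriver(std::unique_ptr<Driver> next) : next(std::move(next)) {}
  int put_object(const std::string& key, const std::string& data) override {
    return next->put_object(key, data);
  }
protected:
  std::unique_ptr<Driver> next;
};
```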
wanwencong [Fri, 24 Jun 2022 15:54:52 +0000 (23:54 +0800)]
rbd-fuse: librados will filter out -r option from command-line
The -r option will be filtered out by librados
when executing the command "rbd-fuse /mountpoint -p pool_name -r rbd_name",
so other rbds can still be seen under the mount point.
crimson/tools: fix FTBFS due to seastore/segment_cleaner.h
```
In file included from ../src/crimson/tools/store_nbd/tm_driver.cc:4:
../src/crimson/tools/store_nbd/tm_driver.h:7:10: fatal error: crimson/os/seastore/segment_cleaner.h: No such file or directory
7 | #include "crimson/os/seastore/segment_cleaner.h"
|
```
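The fix is presumably just repointing the include at the header's new name; assuming it was renamed to async_cleaner.h as part of the cleaner rework (an assumption, not quoted from the patch), the change would look like:
```cpp
// src/crimson/tools/store_nbd/tm_driver.h -- sketch; the new header name
// (async_cleaner.h) is an assumption, not quoted from the patch
#include "crimson/os/seastore/async_cleaner.h"  // was: segment_cleaner.h
```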
crimson/osd: implement CEPH_OSD_OP_SETALLOCHINT in OpsExecuter
This commit brings support for setting allocation hints to `OpsExecuter`.
What is important to note is that `SETALLOCHINT`, at the ops execution
layer, behaves basically like `TOUCH`, and thus the hint should be ignored
(for now) only at the object store layer, so as not to miss the part
constituted by `PGBackend::maybe_create_new_object()`.
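A standalone model of that behavior (invented types, not the crimson code): the op brings the object into existence the way `TOUCH` does, while the hint payload itself is dropped:
```cpp
#include <cstdint>
#include <map>
#include <string>

// Standalone model (invented types; not the crimson code) of the behavior
// described above: SETALLOCHINT creates the object exactly like TOUCH,
// while the hint values themselves are ignored for now.
struct FakeStore {
  std::map<std::string, std::string> objects;
  void touch(const std::string& oid) { objects.try_emplace(oid); }
};

void set_alloc_hint(FakeStore& store, const std::string& oid,
                    uint64_t expected_object_size,
                    uint64_t expected_write_size)
{
  // the TOUCH-like part: do not miss object creation
  store.touch(oid);
  // the hint itself is dropped at the store layer (for now)
  (void)expected_object_size;
  (void)expected_write_size;
}
```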
crimson/os: ignore CEPH_OSD_OP_SETALLOCHINT in SeaStore
At the moment crimson ignores this operation at the ops
execution layer. However, this handles only those alloc
hints that come from clients, while `set_alloc_hint` can also be
called from `ReplicatedRecoveryBackend::prep_push_target()`.
Likely this was the reason behind the following crash:
```
INFO 2022-06-20 11:03:38,952 [shard 0] osd - Entering state: Started/ReplicaActive/RepRecovering
ERROR 2022-06-20 11:03:39,002 [shard 0] seastore - SeaStore::_do_transaction_step: bad op 39
```
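Op 39 in the log corresponds to `OP_SETALLOCHINT` in the ObjectStore transaction encoding. A standalone model of the fix's shape (not the actual SeaStore code): accept the op as a no-op instead of falling into the bad-op path:
```cpp
#include <cstdio>

// Standalone model (not the actual SeaStore code) of the fix's shape.
// Op values follow the ObjectStore transaction encoding, where
// OP_SETALLOCHINT is 39 -- the "bad op 39" from the log above.
enum Op { OP_TOUCH = 9, OP_SETALLOCHINT = 39 };

void do_transaction_step(Op op)
{
  switch (op) {
  case OP_TOUCH:
    std::puts("touch");
    break;
  case OP_SETALLOCHINT:
    // accepted and deliberately ignored (for now), so hints issued during
    // recovery no longer hit the error path below
    break;
  default:
    std::printf("bad op %d\n", static_cast<int>(op));
  }
}
```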