This PR adds a section to the Developer Guide chapter
"Essentials" that explains what Dependabot is. This
section is adapted from an email from Ernesto Puerta
to the CLT that was sent on 08 Jul 2022.
TransactionManager::get_extents_if_live should return a list of
extents located in the range paddr~len. When SegmentCleaner
invokes get_extents_if_live, the target extent may have been split into
multiple pieces by other transactions, so searching with only the paddr
as the key misses the other pieces that need to be rewritten.
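The difference between a keyed lookup and a range lookup can be sketched
as follows. This is a minimal Python illustration, not the SeaStore
implementation: `Extent`, `find_by_key` and `find_in_range` are
hypothetical names, and physical addresses are modeled as plain integers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    paddr: int   # start address (modeled as an integer for illustration)
    length: int  # extent length in bytes


def find_by_key(extents, paddr):
    """Exact-match lookup: only finds an extent that starts at paddr."""
    return [e for e in extents if e.paddr == paddr]


def find_in_range(extents, paddr, length):
    """Range lookup: every extent overlapping [paddr, paddr + length)."""
    end = paddr + length
    return [e for e in extents if e.paddr < end and e.paddr + e.length > paddr]


# An extent at 0x1000 of length 0x3000 that was split into three pieces.
pieces = [Extent(0x1000, 0x1000), Extent(0x2000, 0x1000), Extent(0x3000, 0x1000)]

by_key = find_by_key(pieces, 0x1000)              # only the first piece
in_range = find_in_range(pieces, 0x1000, 0x3000)  # all three pieces
```

Under these assumptions the keyed lookup returns a single piece, while the
range lookup returns every piece that still needs to be rewritten.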
Signed-off-by: Zhang Song <zhangsong325@gmail.com>
cmake: link librados applications against ceph-common
to address link failures like:
[100%] Linking CXX executable ../../../bin/unittest_global_doublefree
/opt/rh/gcc-toolset-12/root/usr/bin/ld: /opt/rh/gcc-toolset-12/root/usr/lib/gcc/x86_64-redhat-linux/12/libstdc++_nonshared.a(sstream-inst80.o): undefined reference to symbol '_ZTVNSt7__cxx1119basic_ostringstreamIwSt11char_traitsIwESaIwEEE@@GLIBCXX_3.4.21'
/opt/rh/gcc-toolset-12/root/usr/bin/ld: /usr/lib64/libstdc++.so.6: error adding symbols: DSO missing from command line
collect2: error: ld returned 1 exit status
this happens when using gcc-toolset to build the tree, because
neither librados.so nor libcephfs exposes libstdc++ symbols
to executables linking against it, while CMake uses "c++" to link
C++ executables. the "c++" executable that comes from GTS links the C++
executables against
/opt/rh/gcc-toolset-12/root/usr/lib/gcc/x86_64-redhat-linux/12/libstdc++.so,
which in turn is an ld script:
```
$ cat /opt/rh/gcc-toolset-12/root/usr/lib/gcc/x86_64-redhat-linux/12/libstdc++.so
/* GNU ld script
Use the shared library, but some functions are only in
the static library, so try that secondarily. */
OUTPUT_FORMAT(elf64-x86-64)
INPUT ( /usr/lib64/libstdc++.so.6 -lstdc++_nonshared )
```
but the thing is, stdc++_nonshared references some symbols
provided by libstdc++.so.6, and libstdc++.so.6 is listed before
stdc++_nonshared. that's why "ld" is not able to resolve the referenced
symbols used by the executable, even though they are provided by
libstdc++ in this case.
in this change, ceph-common is added to the linkage of executables
linked against librados and/or libcephfs, even if the executables
in question do not reference ceph-common symbols. unlike librados,
libcephfs and librgw, ceph-common is an internal library, which does
not hide *any* symbols from its consumers, so it is also able to provide
symbols from the C++ standard library linked by it. so, in our case,
we can link the C++ executables against ceph-common for accessing
the C++ standard library. the reason why we don't link against libstdc++
explicitly is that we should leave this to the C++ compiler instead of
referencing a specific C++ standard library explicitly by its name.
what if the user wants to link against libc++ instead of libstdc++?
another fix could be to remove '-Wl,--as-needed' linker options
from the command line linking the librados applications, so the linker
does not ignore the symbols from libstdc++ when resolving the ones
referenced by stdc++_nonshared, but that would be complicated.
please note, linking against ceph-common does not change the linkage
of
* Ceph executables compiled using a non-gcc-toolset toolchain, because we
always pass '-Wl,--as-needed' to "c++" when linking executables,
so "ld" should be able to drop ceph-common even if we instruct it
to link against ceph-common. so it would be a no-op in this case.
* 3rd party librados executables compiled using a non-gcc-toolset toolchain,
but linked against librados compiled using the gcc-toolset toolchain,
because they still link against /usr/lib64/libstdc++.so.6 when
these executables are compiled and linked, and librados is always
able to access libceph-common. so librados is safe.
`lsblk_all()` should return an empty dict `{}` if nothing was found.
If we raise `RuntimeError()`, the loop in `scan.Scan.main` stops
and ceph-volume fails, because we don't try to catch this exception.
`scan.Scan.main()` has its own logic to detect whether the given path
is a ceph-disk created OSD anyway.
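The intended behaviour can be sketched roughly as below. This is an
illustration under stated assumptions, not the ceph-volume code: the
`run_lsblk` callable, `scan_paths` and `fake_lsblk` are hypothetical
stand-ins, and the real `lsblk_all()` signature may differ.

```python
def lsblk_all(device, run_lsblk):
    """Return a dict of device properties, or {} when nothing is found.

    `run_lsblk` is a hypothetical callable standing in for the real
    lsblk invocation; it returns a list of 'KEY="VALUE"' strings.
    """
    result = {}
    for line in run_lsblk(device):
        key, sep, value = line.partition("=")
        if sep:
            result[key] = value.strip('"')
    return result  # empty dict instead of raising RuntimeError


def scan_paths(paths, run_lsblk):
    """Hypothetical caller loop: an empty result does not abort the scan."""
    found = {}
    for path in paths:
        info = lsblk_all(path, run_lsblk)
        if not info:
            continue  # nothing reported for this path; keep going
        found[path] = info
    return found


def fake_lsblk(device):
    # Stand-in data for the example; only /dev/sdb is "known".
    known = {"/dev/sdb": ['NAME="sdb"', 'TYPE="disk"']}
    return known.get(device, [])


results = scan_paths(["/dev/sda", "/dev/sdb"], fake_lsblk)
```

Because the lookup returns `{}` for the unknown device, the loop continues
to the next path instead of terminating on an uncaught exception.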
Laura Flores [Thu, 7 Jul 2022 17:15:10 +0000 (12:15 -0500)]
.github/workflows: run the stale bot every hour
Currently, the stale bot runs once a day at 1:30 UTC.
This will only process a few PRs per day, which is not
enough to handle the volume of PRs that come into the
Ceph repository. Running the bot once every hour with
a limit of 30 operations will keep within the rate
limit, but it will also process more PRs per day.
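As a rough check of the arithmetic, the daily operations budget under
both schedules can be computed as below. This sketch assumes the per-run
limit of 30 operations applies to both schedules and that every run uses
its full limit; how many PRs that corresponds to is not stated here.

```python
OPERATIONS_PER_RUN = 30   # limit stated in the commit message
RUNS_PER_DAY_DAILY = 1    # previous schedule: once a day
RUNS_PER_DAY_HOURLY = 24  # new schedule: once every hour

# Upper bound on operations per day under each schedule.
daily_budget_before = OPERATIONS_PER_RUN * RUNS_PER_DAY_DAILY
daily_budget_after = OPERATIONS_PER_RUN * RUNS_PER_DAY_HOURLY

print(daily_budget_before, daily_budget_after)
```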
`cephadm` started passing this argv which caused the
problem reported by Li, Jianxin <jianxin1.li@intel.com>.
See:
* https://gist.github.com/rzarzynski/4d1225971b6c28758cb2b68fbda3bf5f?permalink_comment_id=4223998#gistcomment-4223998
* https://docs.ceph.com/en/octopus/cephadm/drivegroups/
doc/cephfs/dirfrags: clarify the unit of threshold limits
Rationale: This doc lists many threshold limits for split and
merge with statements such as:
"A directory fragment is eligible for splitting
when its size exceeds `mds_bal_split_size`
(default 10000)". We need to clarify what 10000 actually
means. This applies to all other such entries in this
doc.
Sébastien Han [Wed, 6 Jul 2022 15:10:45 +0000 (17:10 +0200)]
.github/workflows: process PRs in ascending order
The Ceph repo has many PRs and we cannot process all the PRs with the
default "operations-per-run" value (30). At the time of writing the bot
processes 408 every day and there are around 938 PRs.
The job itself informs us that not enough PRs might have been processed
and encourages us to increase "operations-per-run" if possible. However,
that might expose us to GitHub's API rate limit.
So let's operate on the oldest PRs first; this should already close a
bunch of PRs. If that is not enough, we can try to increase
"operations-per-run".
osd: Set initial mClock QoS params at CONF_DEFAULT level
Create the initial mClock QoS params at CONF_DEFAULT level using
set_val_default(). This allows switching to a custom profile on a
running OSD and to make necessary changes to the desired QoS params.
Note that switching to the ‘custom’ profile and then subsequently changing
the QoS params using “config set osd.n …” will be at a higher level, i.e.
at CONF_MON.
But when switching back to a built-in profile, the new values won’t take
effect since CONF_DEFAULT < CONF_MON. For the values to take effect, the
config keys created as part of the ‘custom’ profile must be removed from
the ConfigMonitor store after switching back to a built-in profile.
- Added a couple of standalone tests to exercise the scenario.
- Updated the mClock configuration document and the mClock internal
documentation to fix a couple of typos relating to the best effort weights.
- Added new sections to the mClock configuration document outlining the
steps to switch between the built-in and custom profile and vice-versa.
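The precedence rule described above can be modeled with a small toy
store. This is a sketch under stated assumptions, not Ceph's config
implementation: `LayeredConfig`, the key `example_qos_weight` and the
numeric values are hypothetical, and only the ordering
CONF_DEFAULT < CONF_MON is taken from the text.

```python
from enum import IntEnum


class Level(IntEnum):
    # Only the two levels named in the commit message; the numeric
    # values are illustrative, only their ordering matters here.
    CONF_DEFAULT = 0
    CONF_MON = 1


class LayeredConfig:
    """Toy store: the value set at the highest level wins."""

    def __init__(self):
        self._values = {}  # key -> {Level: value}

    def set(self, key, value, level):
        self._values.setdefault(key, {})[level] = value

    def remove(self, key, level):
        self._values.get(key, {}).pop(level, None)

    def get(self, key):
        layers = self._values[key]
        return layers[max(layers)]


conf = LayeredConfig()
key = "example_qos_weight"  # hypothetical key name

conf.set(key, 2, Level.CONF_DEFAULT)   # built-in profile sets a default
conf.set(key, 5, Level.CONF_MON)       # custom profile: "config set" override
with_custom = conf.get(key)

conf.set(key, 1, Level.CONF_DEFAULT)   # switch back to a built-in profile
still_masked = conf.get(key)           # the CONF_MON value still wins

conf.remove(key, Level.CONF_MON)       # remove the key from the mon store
after_removal = conf.get(key)          # the built-in default takes effect
```

In this model the built-in profile's value only becomes visible once the
higher-level entry has been removed, which mirrors the step described
above.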
Add ln command to cephfs-shell for enabling creation of hard and soft
links. It allows creating hard links to regular files and soft links to
regular files as well as directories. The behaviour of this cephfs-shell
command is kept as close as possible to the ln command from GNU
coreutils.
crimson/os/seastore/circular_bounded_journal: do not split records
* no split record due to relative paddr resolution
* fix md_bl.substr_of(bl, 0, header.mdlength)
* maintain written_to in range [get_start_addr(), get_journal_end())
crimson/osd: fix life-time management of OSDConnectionPriv
Before the patch there was a possibility that `OSDConnectionPriv`
was destroyed before a `PipelineHandle` instance that was using
it. The reason is that our remote-handling operations store `conn` directly
while `handle` is defined in a parent class. Under the C++ destruction
order, members of a derived class are destroyed before those of its base
class, so the former is destroyed earlier.
```
==756032==ERROR: AddressSanitizer: heap-use-after-free on address 0x615000039684 at pc 0x0000020bdfa2 bp 0x7ffd3abfa370 sp 0x7ffd3abfa360
READ of size 1 at 0x615000039684 thread T0
Reactor stalled for 261 ms on shard 0. Backtrace: 0x45d9d 0xe90f6d1 0xe6b8a1d 0xe6d1205 0xe6d16a8 0xe6d1938 0xe6d1c03 0x12cdf 0xccebf 0x7f6447161b1e 0x7f644714aee8 0x7f644714eed6 0x7f644714fb36 0x7f64471420b5 0x7f6447143f3a 0xd61d0 0x32412 0xbd8a7 0xbd134 0xbdc1a 0x20bdfa1 0x20c184e 0x352eb7f 0x352fa28 0x20b04a5 0x1be30e5 0xe694bc4 0xe6ebb8a 0xe843a11 0xe845a22 0xe29f497 0xe2a3ccd 0x1ab1841 0x3aca2 0x175698d
#0 0x20bdfa1 in seastar::shared_mutex::unlock() ../src/seastar/include/seastar/core/shared_mutex.hh:122
#1 0x20c184e in crimson::OrderedExclusivePhaseT<crimson::osd::ConnectionPipeline::GetPG>::exit() ../src/crimson/common/operation.h:548
#2 0x20c184e in crimson::OrderedExclusivePhaseT<crimson::osd::ConnectionPipeline::GetPG>::ExitBarrier::exit() ../src/crimson/common/operation.h:533
#3 0x20c184e in crimson::OrderedExclusivePhaseT<crimson::osd::ConnectionPipeline::GetPG>::ExitBarrier::cancel() ../src/crimson/common/operation.h:539
#4 0x20c184e in crimson::OrderedExclusivePhaseT<crimson::osd::ConnectionPipeline::GetPG>::ExitBarrier::~ExitBarrier() ../src/crimson/common/operation.h:543
#5 0x20c184e in crimson::OrderedExclusivePhaseT<crimson::osd::ConnectionPipeline::GetPG>::ExitBarrier::~ExitBarrier() ../src/crimson/common/operation.h:544
#6 0x352eb7f in std::default_delete<crimson::PipelineExitBarrierI>::operator()(crimson::PipelineExitBarrierI*) const /opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/unique_ptr.h:85
#7 0x352eb7f in std::unique_ptr<crimson::PipelineExitBarrierI, std::default_delete<crimson::PipelineExitBarrierI> >::~unique_ptr() /opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/unique_ptr.h:361
#8 0x352eb7f in crimson::PipelineHandle::~PipelineHandle() ../src/crimson/common/operation.h:457
#9 0x352eb7f in crimson::osd::PhasedOperationT<crimson::osd::ClientRequest>::~PhasedOperationT() ../src/crimson/osd/osd_operation.h:152
#10 0x352eb7f in crimson::osd::ClientRequest::~ClientRequest() ../src/crimson/osd/osd_operations/client_request.cc:64
#11 ...
```
Previously, we treated `--no-mon-config` as one of the
`seastar_n_early_args`. However, this was wrong, as it truly
belongs to `config_proxy_args`; it is parsed here:
```cpp
int md_config_t::parse_argv(ConfigValues& values,
                            const ConfigTracker& tracker,
                            std::vector<const char*>& args, int level)
{
  // ...
  else if (ceph_argparse_flag(args, i, "--no-mon-config", (char*)NULL)) {
    values.no_mon_config = true;
  }
  // ...
}
```
The net result was that `--no-mon-config` was ignored, which was
the reason behind many dead jobs at Sepia.
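The effect of the misclassification can be illustrated with a toy model
of the argument split. This is a Python sketch under stated assumptions,
not the crimson-osd startup code: `partition_args`, `parse_config_args`
and `--example-flag` are hypothetical.

```python
def partition_args(argv, early_flags):
    """Split argv into (early_args, config_args) by membership in early_flags."""
    early, config = [], []
    for arg in argv:
        (early if arg in early_flags else config).append(arg)
    return early, config


def parse_config_args(config_args):
    """Toy stand-in for the config parser: it only sees config_args."""
    return {"no_mon_config": "--no-mon-config" in config_args}


argv = ["--no-mon-config", "--example-flag"]  # second flag is hypothetical

# Misclassified: the flag is consumed early and never reaches the parser.
_, cfg_wrong = partition_args(argv, early_flags={"--no-mon-config"})
# Correctly routed: the flag reaches the config parser.
_, cfg_right = partition_args(argv, early_flags=set())

wrong = parse_config_args(cfg_wrong)
right = parse_config_args(cfg_right)
```

In the misclassified case the parser never sees the flag, so
`no_mon_config` stays unset, which corresponds to the behaviour described
above.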
Soumya Koduri [Mon, 27 Jun 2022 08:43:19 +0000 (14:13 +0530)]
rgw/dbstore: Lifecycle support
Fixed issues with LC rule processing in dbstore. For object
transition, for now only the target storage class is updated for the
object in the object table; no other special action is taken.
TODO: Once zonegroup, zone and storage classes can be configured for
dbstore, validate the target storage class/placement rules and
perform any other necessary actions (e.g., moving objects to another
table if each storage class needs a separate object table).