common/tracer: fix decoding when jaeger tracing is disabled
We aren't currently using jaeger tracing on Windows. The issue is
that Windows hosts (or any other host that doesn't use jaeger)
are experiencing message decoding failures after a recent change [1].
This change updates the tracer encoding so that messages from
non-jaeger hosts may be decoded by services that use jaeger.
[1] https://github.com/ceph/ceph/pull/47457
Signed-off-by: Lucian Petrut <lpetrut@cloudbasesolutions.com>
This commit reintroduces 3701ffa6733b001d4278a0b68395c5efe2382f25, which
was reverted because of an implicit dependency on another revert. Please
see https://github.com/ceph/ceph/pull/52114#issuecomment-1950288188.
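The compatibility requirement described above can be illustrated with a minimal sketch. The wire layout used here (a one-byte validity flag followed by fixed-size IDs) and all function names are hypothetical, chosen only for illustration; this is not Ceph's actual tracer encoding.

```python
import struct

# Hypothetical layout (an assumption, not Ceph's format):
# 1-byte validity flag, then, only when valid, a 16-byte trace id
# and an 8-byte span id.
TRACE_ID_LEN = 16
SPAN_ID_LEN = 8


def encode_without_tracing():
    """Encoder for a host built without tracing: emit an 'invalid' marker
    so that a tracing-enabled peer still finds a well-formed field."""
    return struct.pack("B", 0)


def encode_with_tracing(trace_id, span_id):
    """Encoder for a host built with tracing enabled."""
    if len(trace_id) != TRACE_ID_LEN or len(span_id) != SPAN_ID_LEN:
        raise ValueError("unexpected id length")
    return struct.pack("B", 1) + trace_id + span_id


def decode(buf):
    """Return (span_or_None, bytes_consumed) for either kind of sender."""
    (valid,) = struct.unpack_from("B", buf, 0)
    if not valid:
        return None, 1
    end = 1 + TRACE_ID_LEN + SPAN_ID_LEN
    if len(buf) < end:
        raise ValueError("truncated span context")
    return (buf[1:1 + TRACE_ID_LEN], buf[1 + TRACE_ID_LEN:end]), end
```

The point of the sketch is that both kinds of host write a field the decoder can consume, so the bytes that follow the trace field stay aligned.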
Omri Zeneva [Wed, 24 Aug 2022 13:57:11 +0000 (09:57 -0400)]
tracer/osd/librados/build/rgw: rgw and osd end2end tracing using opentelemetry
* build: add opentelemetry to cmake system
crimson targets that use Message.cc/h are built before opentelemetry (o-tel), so we need to build o-tel earlier and also add the library to the include path earlier
this should work with the WITH_JAEGER flag both ON and OFF, and for librados, where the compilation flag is ignored
* msg/tracer: add o-tel trace to Messages with decode/encode function in tracer.h
some files that use Message.cc/h need only the encode/decode functions and none of the other functions.
some crimson targets do not link with ceph_context (common), which is required for the tracer.cc file, so we just need to include those functions
* librados: Add opentelemetry trace param for aio_operate and operate methods
in order to propagate the trace info I added the otel-trace as an extra param.
in some places, there already was a blkin trace info, and since it is not used in other places we can safely change it to o-tel trace info.
this will be done in another commit, so the cleanup of blkin trace will be in a dedicated commit
* osd: use the o-tel trace of the msg as a parent span of the osd trace
if there is a valid span in the msg, we will add this op to the request
trace, otherwise it will start a new trace for the OSD op
* rgw: pass put obj trace info to librados
in order to make it possible, I saved the trace info inside the sal::Object, so we can use it later when writing the object to rados
it could be used also later for read ops.
note the trace field of req_state is initialized only in rgw_process, so it's also required in librgw request flow
* prevent breaking changes to kSize: make sure that differences between components built with
different versions of OTEL do not break message compatibility
This commit introduces a major refactor of the main
entrypoint.
- subclass threading.Thread:
- Introduce a new class `BaseThread()` that is a
`threading.Thread()` abstraction class in order
to monitor the different threads.
- `BaseSystem()` inherits from `BaseThread()`.
- Handle `SIGTERM` signal in order to gracefully shutdown
node-proxy (make threads exit gracefully, log out from RedFish API, etc.)
Additionally, this:
- drops the class `Logger()` from util.py which
was not adding value. It is now replaced with a simple `get_logger()`
function.
- changes the node-proxy API port from 8080 to 9456
(8080 being widely used for frontend apps...)
- changes the container entrypoint in order to use the
`ceph-node-proxy` binary from the packaging
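The thread abstraction and graceful shutdown described above can be sketched roughly as follows. Only the names `BaseThread` and `get_logger` come from the commit message; the method names, the stop event, and the handler helpers are assumptions made for illustration and are not the actual node-proxy code.

```python
import logging
import signal
import threading


def get_logger(name, level=logging.INFO):
    """Plain function replacing a wrapper class around logging."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    return logger


class BaseThread(threading.Thread):
    """Thread that can be asked to stop (hypothetical interface)."""

    def __init__(self):
        super().__init__(daemon=True)
        self.stop_event = threading.Event()
        self.iterations = 0

    def run(self):
        # Loop until shutdown() is called, doing one unit of work per pass.
        while not self.stop_event.is_set():
            self.main()
            self.stop_event.wait(0.01)

    def main(self):
        # Subclasses would override this with real work.
        self.iterations += 1

    def shutdown(self):
        self.stop_event.set()


def make_sigterm_handler(threads):
    """Build a signal handler that asks every thread to exit."""
    def handler(signum, frame):
        for thread in threads:
            thread.shutdown()
    return handler


def install_sigterm_handler(threads):
    # signal.signal() must be called from the main thread.
    signal.signal(signal.SIGTERM, make_sigterm_handler(threads))
```

A caller would start its threads, call `install_sigterm_handler(threads)` from the main thread, and then `join()` each thread after the handler has fired.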
Zac Dover [Fri, 2 Feb 2024 01:53:45 +0000 (11:53 +1000)]
doc/rados: update config for autoscaler
Update doc/rados/configuration/pool-pg-config-ref.rst to account for the
behavior of autoscaler.
Previously, this file was last meaningfully altered in 2013, prior to
the invention of autoscaler. A recent confusion was brought to my
attention on the Ceph Slack whereby a user attempted to alter the
default values of a Quincy cluster, as suggested in this documentation.
That alteration caused Ceph to throw the error "Error ERANGE: 'pgp_num'
must be greater than 0 and lower or equal than 'pg_num', which in this
case is one" and a related "rgw_init_ioctx ERROR" reading in part
"Numerical result out of range". The user removed the
"osd_pool_default_pgp_num" configuration line from ceph.conf and the
cluster worked as expected. I presume that this is because the removal
of this configuration line allowed autoscaler to work as intended.
Fixes: https://tracker.ceph.com/issues/64259
Co-authored-by: David Orman <ormandj@corenode.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>
Casey Bodley [Wed, 31 Jan 2024 19:29:43 +0000 (14:29 -0500)]
rgw: SiteConfig::load() falls back to local zonegroup
allow radosgw-admin commands like 'user create' to operate on a new zone
that hasn't been committed to the period yet. this follows similar logic
in RGWSI_Zone::do_start()
Lucian Petrut [Thu, 1 Feb 2024 14:40:03 +0000 (14:40 +0000)]
msg: update MOSDOp() to use ceph_tid_t instead of long
The MOSDOp constructor receives the transaction ID as a long
instead of ceph_tid_t.
The issue is that "long" is 32 bits wide on Windows instead of 64 bits,
so it wraps around after about 2 billion requests. At that point, the OSD
replies are dropped because of transaction ID mismatches.
We'll solve the issue by using the correct type for the transaction
id, specifically ceph_tid_t.
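The wraparound described above can be reproduced with a small sketch that uses `ctypes` fixed-width integers to stand in for a 32-bit signed `long` and a 64-bit unsigned transaction ID. The helper names are hypothetical, and the 64-bit unsigned type is an assumption about what ceph_tid_t maps to.

```python
import ctypes


def as_int32(value):
    """Truncate to a signed 32-bit integer, like a 32-bit 'long'."""
    return ctypes.c_int32(value).value


def as_uint64(value):
    """Keep the value in an unsigned 64-bit integer."""
    return ctypes.c_uint64(value).value


last_safe_tid = 2**31 - 1      # largest value a signed 32-bit long can hold
next_tid = last_safe_tid + 1   # the request that triggers the problem

truncated = as_int32(next_tid)  # what a 32-bit long would carry
preserved = as_uint64(next_tid)  # what a 64-bit type would carry

# After truncation the two sides no longer agree on the transaction ID.
ids_match = truncated == preserved
```

Here `truncated` becomes negative while `preserved` keeps the real value, which is the kind of mismatch that makes replies look like they belong to a different transaction.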
As ScrubResources is no longer involved in remote reservations, some
of the data listed by 'dump_scrub_reservations' is now collected by
OsdScrub itself (prior to this change, OsdScrub just forwarded the
request to ScrubResources).
Josh Salomon [Wed, 24 Jan 2024 12:40:53 +0000 (14:40 +0200)]
osd: Add score for read balance osd size aware policy
This score works for pools in which the read_ratio
value is set.
Current limitations:
- This mechanism ignores osd read affinity
- There is a plan to add support for read affinity 0
in the next version.
- This mechanism works only when all PGs are full
- If read_ratio is not set, the existing mechanism (named
fair score) is used.
Josh Salomon [Tue, 16 Jan 2024 18:33:47 +0000 (20:33 +0200)]
osd: Read balancer for OSDs with different sizes
This commit adds a calculation of the desired primary distribution which
takes the osd size into account. This way smaller OSDs can take more
read operations (by being assigned more primaries), the larger OSDs take
fewer primaries, and the load the cluster can handle can increase. (Under
some conditions this feature partially offsets the weakest-link-in-the-chain
effect.) In order to calculate the loads correctly there is a need to know
the read/write ratio for the pool, and this commit assumes the read_ratio
parameter is available for the pool.
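The reasoning above can be illustrated with a toy model. This is not the balancer's actual algorithm: it assumes that every replica held by an OSD receives the pool's writes, that reads go only to primaries, and that the goal is an equal combined load on every OSD. The function and parameter names are hypothetical.

```python
def desired_primaries(replicas_per_osd, num_pgs, read_ratio):
    """Toy model: number of primaries per OSD that equalizes total load.

    replicas_per_osd: PG replicas held by each OSD (larger OSDs hold more)
    num_pgs:          number of PGs in the pool (one primary each)
    read_ratio:       percentage of reads among all IOs, in (0, 100]
    """
    if not 0 < read_ratio <= 100:
        raise ValueError("read_ratio must be in (0, 100] for this model")
    r = read_ratio / 100.0
    w = 1.0 - r
    osd_count = len(replicas_per_osd)
    # Load on OSD i is modeled as w * replicas_i + r * primaries_i.
    # Requiring equal load everywhere and sum(primaries) == num_pgs gives:
    target_load = (r * num_pgs + w * sum(replicas_per_osd)) / osd_count
    return [(target_load - w * reps) / r for reps in replicas_per_osd]


# Two small OSDs and one OSD twice their size, 200 PGs with 2 replicas each.
example = desired_primaries([100, 100, 200], num_pgs=200, read_ratio=70)
```

In this example the two small OSDs end up with more primaries than the large one, which matches the behavior the commit message describes. The model does not clamp results to the number of replicas an OSD actually holds, so it is only meaningful for moderate size differences.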
Josh Salomon [Tue, 26 Dec 2023 08:41:18 +0000 (10:41 +0200)]
osd: Add 'read_ratio' pool parameter
This parameter is used for better read balancing with non-identical
devices.
- This parameter is controlled using the commands 'ceph osd pool set/get'
- This parameter is applicable only for replicated pools
- Valid values are integers in the range [0..100] and represent the
percentage of read IOs out of all IOs in the pool
- Value of 0 unsets this parameter and the value will be the default
value (this is the generic behavior of the command 'ceph osd pool
set')
- default value can be set by config parameter
`osd_pool_default_read_ratio`
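The rules listed above can be summarized in a short sketch. The function names are hypothetical and the code only mirrors the rules as stated in this message (integer range, 0 meaning "unset", fallback to a configured default); it is not the monitor's actual validation code.

```python
def parse_read_ratio(raw):
    """Validate a user-supplied read_ratio given as a string."""
    value = int(raw)  # raises ValueError for non-integer input
    if not 0 <= value <= 100:
        raise ValueError("read_ratio must be an integer in [0..100]")
    return value


def effective_read_ratio(pool_value, config_default):
    """A pool value of 0 means 'unset', so fall back to the default.

    config_default stands in for the value of the
    osd_pool_default_read_ratio config parameter.
    """
    return config_default if pool_value == 0 else pool_value
```

For instance, `effective_read_ratio(parse_read_ratio("0"), 70)` yields the configured default, while a pool explicitly set to 30 keeps its own value.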
mgr/cephadm: add a new config option 'oob_default_addr'
This adds a default value (169.254.1.1), which is the default
address for the 'OS to iDrac pass-through' interface.
Given that node-proxy will reach the RedFish API through this interface,
we can spare users from passing that address when providing the host spec
at bootstrap time.
Venky Shankar [Tue, 30 Jan 2024 07:40:19 +0000 (13:10 +0530)]
Merge PR #52652 into main
* refs/pull/52652/head:
PendingReleaseNotes: add note about new mdlog trimming configurations
mds: drive mdlog trimming via a separate thread
mds: allow runtime modification of mdlog trimming configuration
mds: remove a bunch of heuristics from MDLog::trim()
mds: add mdlog trimming threshold and decay counter
mds: remove a bunch of heuristics from MDLog::trim()
These were probably introduced to work around some sort of
resource overuse by the MDS during trimming, but now it
looks like they are not really needed, especially if we
introduce a dedicated thread for log trimming.
Venky Shankar [Thu, 25 Jan 2024 09:32:33 +0000 (15:02 +0530)]
qa: remove error string checks and check w/ return value
I ran into this failure once #54972 was merged. The test validates
the error string returned by the failed mount. There aren't any
return value checks, which are a _more_ important check. Generic error
string checks will fail once an error string is changed (typo fix, etc.).
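The difference between the two kinds of check can be sketched as follows. `CommandResult` and both helper functions are hypothetical stand-ins for whatever the test framework returns; the message text and return code are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class CommandResult:
    """Hypothetical stand-in for the result of a mount attempt."""
    returncode: int
    stderr: str


def failed_by_return_value(result):
    """Robust check: any non-zero return code means the mount failed."""
    return result.returncode != 0


def failed_by_error_string(result, expected="mount error"):
    """Brittle check: depends on the exact wording of the message."""
    return expected in result.stderr


# The message wording differs from what the test expects, but the
# return code still reports the failure.
result = CommandResult(returncode=32, stderr="mount failure: permission denied")
```

With this `result`, the return-value check still detects the failed mount, while the string check misses it because the wording no longer matches.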