Tobias Urdin [Tue, 6 Feb 2024 07:50:55 +0000 (07:50 +0000)]
rgw/auth: ignoring signatures for HTTP OPTIONS calls
Before [1] we always sent all HTTP OPTIONS requests to
the S3AnonymousEngine and ignored any provided AWSv4
credentials sent in the request.
That PR changed so that if we got credentials in the
request we instead sent it through the authentication
code in order to solve HTTP OPTIONS requests on tenanted
users to start working (because we need to resolve the
tenant, also called bucket tenant in the code, and we can't
only rely on the bucket name since it will not be found).
We solved this by modifying the canonical HTTP method used
when calculating the AWSv4 signature by instead using the
access-control-request-method header which worked good.
This change did not take into account that when you generated
a presigned URL for a put_object request you can also pass in
extra parameters like a canned ACL [2] to the Params variable
in for example boto3's generated_presigned_url().
Doing that will cause the client to add the x-amz-acl header
to x-amz-signedheaders and also use that in their signature
calculation.
When doing a HTTP OPTIONS calls for CORS on that presigned URL
the browser will never send a x-amz-acl header with the correct
data since that is something that the actual PUT request should
include later, so that HTTP OPTIONS call should pass even though
the signature can never be calculated correctly server-side like
verified against AWS S3 in tracker [3].
This patch as a result skips the signature calculation when doing
EC2 auth using the LocalEngine but we still need to pass the request
there in order to lookup the user to support buckets in a tenant.
For the Keystone EC2 auth we're pretty out of luck in the sense that
Keystone's API itself requires us to send the AWSv4 signature in the
request with the access_key in order to obtain a token, and we cannot
leave the signature out, we also cannot spoof the signature from
rgw -> keystone since we don't have access to the secret_key if it's
not in our cache.
For that approach we simply pass on to get_access_token() that if it's
an HTTP OPTIONS and we find the access_key in the cache we pull that
and ignore verifying signature and pass it on for validation. This means
that the cache must be warm if using Keystone auth and adding extra
params to a presigned URL.
This partly makes some of the commits in [1] redundant for EC2
LocalEngine auth but we still need it for tenanted bucket support.
This commit introduces a major refactor of the main
entrypoint.
- subclass threading.Thread:
- Introduce a new class `BaseThread()` that is a
`threading.Thread()` abstraction class in order
to monitor the different threads.
- `BaseSystem()` inherits from `BaseThread()`.
- Handle `SIGTERM` signal in order to gracefully shutdown
node-proxy (make threads exit gracefully, log out from RedFish API, etc.)
Additionally, this:
- drops the class `Logger()` from util.py which
was not adding value. It is now replaced with a simple `get_logger()`
function.
- changes the node-proxy API port from 8080 to 9456
(8080 being widely used for frontend apps...)
- changes the container entrypoint in order to use the
`ceph-node-proxy` binary from the packaging
Zac Dover [Fri, 2 Feb 2024 01:53:45 +0000 (11:53 +1000)]
doc/rados: update config for autoscaler
Update doc/rados/configuration/pool-pg-config-ref.rst to account for the
behavior of autoscaler.
Previously, this file was last meaningfully altered in 2013, prior to
the invention of autoscaler. A recent confusion was brought to my
attention on the Ceph Slack whereby a user attempted to alter the
default values of a Quincy cluster, as suggested in this documentation.
That alteration caused Ceph to throw the error "Error ERANGE: 'pgp_num'
must be greater than 0 and lower or equal than 'pg_num', which in this
case is one" and a related "rgw_init_ioctx ERROR" reading in part
"Numerical result out of range". The user removed the
"osd_pool_default_pgp_num" configuration line from ceph.conf and the
cluster worked as expected. I presume that this is because the removal
of this configuration line allowed autoscaler to work as intended.
Fixes: https://tracker.ceph.com/issues/64259 Co-authored-by: David Orman <ormandj@corenode.com> Signed-off-by: Zac Dover <zac.dover@proton.me>
Casey Bodley [Wed, 31 Jan 2024 19:29:43 +0000 (14:29 -0500)]
rgw: SiteConfig::load() falls back to local zonegroup
allow radosgw-admin commands like 'user create' to operate on a new zone
that hasn't been committed to the period yet. this follows similar logic
in RGWSI_Zone::do_start()
Lucian Petrut [Thu, 1 Feb 2024 14:40:03 +0000 (14:40 +0000)]
msg: update MOSDOp() to use ceph_tid_t instead of long
The MOSDOp constructor receives the the transaction ID as a long
instead of ceph_tid_t.
The issue is that "long" uses 32b on Windows instead of 64 bits,
so it flips after about 2 billion requests. At that point, the OSD
replies are dropped because of transaction ID mismatches.
We'll solve the issue by using the correct type for the transaction
id, specifically ceph_tid_t.
As ScrubResources is no longer involved in remote reservations, some
of the data listed by 'dump_scrub_reservations' is now collected by
OsdScrub itself (prior to this change, OsdScrub just forwarded the
request to ScrubResources).
Josh Salomon [Wed, 24 Jan 2024 12:40:53 +0000 (14:40 +0200)]
osd: Add score for read balance osd size aware policy
This score works for pools in which the read_ratio
value is set.
Current limitations:
- This mechanism ignores osd read affinty
- There is a plan adding support for read affinity 0
in the next version.
- This mechanism works only when all PGs are full
- If read_ration is not set - the existing mechanism (named
fair score) is used.
Josh Salomon [Tue, 16 Jan 2024 18:33:47 +0000 (20:33 +0200)]
osd: Read balancer for OSDs with different sizes
This commit adds calculation for desired primary distribution which
takes into account the osd size. This way smaller OSDs can take more
read operations (by adding more primaries) and the larger OSDs take less
primaries and the load of the cluater can increase. (This feature offset
a bit the weakest link in the chain effect under some conditions). In
order to calculate the loads correctly there is a need to know the
read/write ratio for the pool, and this commit assumes the read_ratio
parameter is available for the pool.
Josh Salomon [Tue, 26 Dec 2023 08:41:18 +0000 (10:41 +0200)]
osd: Add 'read_ratio' pool parameterr
This parameter is used for better read balancing with non identical
devices.
- This parameter is controlled using the commands 'ceph osd pool set/get'
- This parameter is applicable only for replicated pools
- Valid values are integers in the range [0..100] and represent the
percentage of read IOs out of all IOs in the pool
- Value of 0 unsets this parameter and the value will be the default
value (this is the generic behavior of the command 'ceph osd pool
set'
- default value can be set by config parameter
`osd_pool_default_read_ratio`
mgr/cephadm: add a new config option 'oob_default_addr'
So there's a default value (169.254.1.1) which is the default
address for the 'OS to iDrac pass-through' interface.
Given that node-proxy will reach the RedFish API through this interface,
we can make users avoid to pass that addr when providing the host spec
at bootstrap time.
Venky Shankar [Tue, 30 Jan 2024 07:40:19 +0000 (13:10 +0530)]
Merge PR #52652 into main
* refs/pull/52652/head:
PendingReleaseNotes: add note about new mdlog trimming configurations
mds: drive mdlog trimming via a separate thread
mds: allow runtime modification of mdlog trimming configuration
mds: remove a bunch of heuristics from MDLog::trim()
mds: add mdlog trimming threshold and decay counter
mds: remove a bunch of heuristics from MDLog::trim()
These were probbaly introduced to workaround some sort of
resource overusage by the MDS during trimming, but now it
looks like they are not really neeeded, especially if we
introduce a dedicated thread for log trimming.
Venky Shankar [Thu, 25 Jan 2024 09:32:33 +0000 (15:02 +0530)]
qa: remove error string checks and check w/ return value
I ran into this failure once #54972 was merged. The test is validating
the error string returned due to the failed mount. There aren't any
return value checks - which is a _more_ important check. Generic error
string checks will fail once a (error) string is changed (typo, etc..).
Venky Shankar [Tue, 30 Jan 2024 04:33:57 +0000 (10:03 +0530)]
Merge PR #54808 into main
* refs/pull/54808/head:
client: fix copying bufferlist to iovec structures in Client::_read
src/test: test sync call providing nullptr as ctx to async api
Reviewed-by: Venky Shankar <vshankar@redhat.com> Reviewed-by: Milind Changire <mchangir@redhat.com> Reviewed-by: Frank S. Filz <ffilzlnx@mindspring.com>