git-server-git.apps.pok.os.sepia.ceph.com Git - ceph-ci.git/log


commit | commitdiff | tree

Jane Zhu [Wed, 20 Aug 2025 18:38:23 +0000 (18:38 +0000)]

rgw: discard olh_ attributes when copying object from a versioning-suspended bucket to a versioning-disabled bucket

Resolves: rhbz#2390658

Signed-off-by: Jane Zhu <jzhu116@bloomberg.net>
(cherry picked from commit 3fed58f43c3cb3977130926a2d1bca551deefade)

commit | commitdiff | tree

Matt Benjamin [Mon, 8 Sep 2025 20:26:26 +0000 (16:26 -0400)]

rgw: fix policy enforcement for GetObjectAttributes

Per https://docs.aws.amazon.com/cli/latest/reference/s3api/get-object-attributes.html:

"If the bucket is not versioned, you need the s3:GetObject and s3:GetObjectAttributes permissions."

Fixes: https://tracker.ceph.com/issues/72915
Resolves: rhbz#2313820

Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
(cherry picked from commit 16ab79dacbf7d8e94e70d28192c945cd79c5934c)
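
The permission rule quoted in the GetObjectAttributes commit above can be sketched as a set check. This is a minimal Python illustration for the non-versioned case only, not RGW's C++ code; `REQUIRED_ACTIONS` and `is_allowed` are hypothetical names.

```python
# Actions required on a non-versioned bucket, per the AWS text quoted above.
REQUIRED_ACTIONS = {"s3:GetObject", "s3:GetObjectAttributes"}


def is_allowed(granted: set) -> bool:
    """Allow GetObjectAttributes only if every required action is granted."""
    return REQUIRED_ACTIONS <= granted
```

Holding only one of the two actions is not enough under this rule.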

commit | commitdiff | tree

Casey Bodley [Wed, 3 Sep 2025 13:27:18 +0000 (09:27 -0400)]

rgw/admin: allow listing account's root users

`radosgw-admin user list`, when given `--account-id` or
`--account-name`, lists only the users from that account

add support for the `--account-root` option to list only that account's
root users

Fixes: https://tracker.ceph.com/issues/72847
Resolves: rhbz#2360695

Signed-off-by: Casey Bodley <cbodley@redhat.com>
(cherry picked from commit 772fbbbafcdd1d26ff95ef005211f2200b724741)

commit | commitdiff | tree

Ali Masarwa [Thu, 24 Aug 2023 15:40:22 +0000 (18:40 +0300)]

RGW: When using Keystone auth for RGW, include the Keystone user in ops log

Resolves: rhbz#1769182

Signed-off-by: Ali Masarwa <ali.saed.masarwa@gmail.com>
Signed-off-by: Ali Masarwa <amasarwa@redhat.com>
(cherry picked from commit 47166556c5bbcf1f26621bf24cf04221b65af366)

commit | commitdiff | tree

Oguzhan Ozmen [Thu, 31 Jul 2025 22:15:24 +0000 (22:15 +0000)]

RGW: multi object delete op; skip olh update for all deletes but the last one

Fixes: https://tracker.ceph.com/issues/72375
Resolves: rhbz#2387764

Signed-off-by: Oguzhan Ozmen <oozmen@bloomberg.net>
(cherry picked from commit 9bb170104446bfea0ad87b34244f3a3d47962fcc)
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>

commit | commitdiff | tree

Mark Kogan [Wed, 30 Jul 2025 12:54:19 +0000 (12:54 +0000)]

rgw: add rate limit for LIST & DELETE ops

Add rate limiting specific to LIST and DELETE ops,
similar to the existing rate limiting
(https://docs.ceph.com/en/latest/radosgw/admin/#rate-limit-management)

Example usage:

```
./bin/radosgw-admin ratelimit set --ratelimit-scope=user --uid=<UID> --max_list_ops=2
./bin/radosgw-admin ratelimit set --ratelimit-scope=user --uid=<UID> --max_delete_ops=2
./bin/radosgw-admin ratelimit enable --ratelimit-scope=user --uid=<UID>

./bin/radosgw-admin ratelimit get --ratelimit-scope=user --uid=<UID>
{
  "user_ratelimit": {
    "max_read_ops": 0,
    "max_write_ops": 0,
    "max_list_ops": 2,
    "max_delete_ops": 2,
    "max_read_bytes": 0,
    "max_write_bytes": 0,
    "enabled": true
  }
}

pkill -9 radosgw
./bin/radosgw -c ./ceph.conf ...

aws --endpoint-url 'http://0:8000' s3 mb s3://bkt
aws --endpoint-url 'http://0:8000' s3 cp  ./ceph.conf s3://bkt

aws --endpoint-url http://0:8000 s3api list-objects-v2 --bucket bkt --prefix 'ceph.conf' --delimiter '/'
{
    "Contents": [
        {
            "Key": "ceph.conf",
            "LastModified": "2025-07-30T13:59:38+00:00",
            "ETag": "\"13d11d431ae290134562c019d9e40c0e\"",
            "Size": 32346,
            "StorageClass": "STANDARD"
        }
    ],
    "RequestCharged": null
}

aws --endpoint-url http://0:8000 s3api list-objects-v2 --bucket bkt --prefix 'ceph.conf' --delimiter '/'
{
    "Contents": [
        {
            "Key": "ceph.conf",
            "LastModified": "2025-07-30T13:59:38+00:00",
            "ETag": "\"13d11d431ae290134562c019d9e40c0e\"",
            "Size": 32346,
            "StorageClass": "STANDARD"
        }
    ],
    "RequestCharged": null
}

aws --endpoint-url http://0:8000 s3api list-objects-v2 --bucket bkt --prefix 'ceph.conf' --delimiter '/'
argument of type 'NoneType' is not iterable

tail -F ./out/radosgw.8000.log | grep beast
...
beast: 0x7fffbbe09780:  [30/Jul/2025:15:44:50.359 +0000] " GET /bkt?list-type=2&delimiter=%2F&prefix=ceph.conf&encoding-type=url HTTP/1.1" 200 535 - "aws-cli/2.15.31 Python/3.9.21 Linux/5.14.0-570.28.1.el9_6.x86_64 source/x86_64.rhel.9 prompt/off command/s3api.list-objects-v2" - latency=0.000999995s
beast: 0x7fffbbe09780:  [30/Jul/2025:15:44:53.904 +0000] " GET /bkt?list-type=2&delimiter=%2F&prefix=ceph.conf&encoding-type=url HTTP/1.1" 200 535 - "aws-cli/2.15.31 Python/3.9.21 Linux/5.14.0-570.28.1.el9_6.x86_64 source/x86_64.rhel.9 prompt/off command/s3api.list-objects-v2" - latency=0.000999995s
                                                                                                                                           vvv
beast: 0x7fffbbe09780:  [30/Jul/2025:15:44:58.192 +0000] " GET /bkt?list-type=2&delimiter=%2F&prefix=ceph.conf&encoding-type=url HTTP/1.1" 503 228 - "aws-cli/2.15.31 Python/3.9.21 Linux/5.14.0-570.28.1.el9_6.x86_64 source/x86_64.rhel.9 prompt/off command/s3api.list-objects-v2" - latency=0.000000000s
beast: 0x7fffbbe09780:  [30/Jul/2025:15:44:58.798 +0000] " GET /bkt?list-type=2&delimiter=%2F&prefix=ceph.conf&encoding-type=url HTTP/1.1" 503 228 - "aws-cli/2.15.31 Python/3.9.21 Linux/5.14.0-570.28.1.el9_6.x86_64 source/x86_64.rhel.9 prompt/off command/s3api.list-objects-v2" - latency=0.000999994s
beast: 0x7fffbbe09780:  [30/Jul/2025:15:44:59.807 +0000] " GET /bkt?list-type=2&delimiter=%2F&prefix=ceph.conf&encoding-type=url HTTP/1.1" 503 228 - "aws-cli/2.15.31 Python/3.9.21 Linux/5.14.0-570.28.1.el9_6.x86_64 source/x86_64.rhel.9 prompt/off command/s3api.list-objects-v2" - latency=0.000000000s

s3cmd put ./ceph.conf s3://bkt/1
s3cmd put ./ceph.conf s3://bkt/2
s3cmd put ./ceph.conf s3://bkt/3

s3cmd rm s3://bkt/1
s3cmd rm s3://bkt/2
s3cmd rm s3://bkt/3

delete: 's3://bkt/1'
delete: 's3://bkt/2'
WARNING: Retrying failed request: /3 (503 (SlowDown))
WARNING: Waiting 3 sec...
WARNING: Retrying failed request: /3 (503 (SlowDown))
                                      ^^^
```
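
The behaviour in the transcript above (two LIST requests admitted, the third answered with 503) can be modelled with a per-class counter. This is a rough sketch only: it assumes a fixed one-minute window, which may differ from RGW's actual accounting, and `OpsWindowLimiter` is a hypothetical name.

```python
class OpsWindowLimiter:
    """Fixed one-minute window counter per operation class (illustrative only)."""

    def __init__(self, limits: dict):
        self.limits = limits      # e.g. {"list": 2, "delete": 2}; 0 means unlimited
        self.window = None        # index of the current one-minute window
        self.counts = {}          # ops admitted so far in this window, per class

    def check(self, op: str, now: float) -> int:
        """Return the HTTP status for one request: 200 if admitted, 503 if throttled."""
        window = int(now // 60)
        if window != self.window:     # a new minute started: reset all counters
            self.window = window
            self.counts = {}
        limit = self.limits.get(op, 0)
        used = self.counts.get(op, 0)
        if limit and used >= limit:
            return 503                # the SlowDown case seen in the log above
        self.counts[op] = used + 1
        return 200
```

Each class is counted separately, so exhausting the LIST budget does not throttle DELETE requests in this model.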

Signed-off-by: Mark Kogan <mkogan@ibm.com>
Update PendingReleaseNotes

Co-authored-by: Yuval Lifshitz <yuvalif@yahoo.com>
Signed-off-by: Mark Kogan <31659604+mkogan1@users.noreply.github.com>
Update PendingReleaseNotes

Resolves: rhbz#2391529

Co-authored-by: Yuval Lifshitz <yuvalif@yahoo.com>
Signed-off-by: Mark Kogan <31659604+mkogan1@users.noreply.github.com>
(cherry picked from commit 965eda7a45b12c9ccd78f230076002043f7df65c)
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>

commit | commitdiff | tree

Marcus Watts [Sat, 22 Jun 2024 02:02:00 +0000 (22:02 -0400)]

rgw: trivial cleanup from former fix attribute handling for swift bucket post and put

Trivial "free" cleanup: this commit removes an unused variable "battrs".

This is a remnant of a much larger patch that now has a different
fix upstream.

Signed-off-by: Marcus Watts <mwatts@redhat.com>
Conflicts:
src/rgw/rgw_op.cc
(cherry picked from commit 340d10bf63c8ae53021dd26c7ea7fbd35db5d4b8)

commit | commitdiff | tree

Marcus Watts [Tue, 25 Feb 2025 22:00:06 +0000 (17:00 -0500)]

copy object encryption fixes - fixups

minor fixup on byte ranges.
other updates to match ceph main.

Fixes: https://tracker.ceph.com/issues/23264
Signed-off-by: Marcus Watts <mwatts@redhat.com>
(cherry picked from commit 2292920e188987f37b848cfa1789c02d31173b39)

commit | commitdiff | tree

Soumya Koduri [Tue, 29 Oct 2024 08:44:11 +0000 (14:14 +0530)]

rgw/copy-object: Fix overflow with bufferlist copy

This fixes the issue with bufferlist copy overflow in the `copy-object`
Op path.

Resolves: rhbz#2321269

Reviewed-by: Marcus Watts <mwatts@redhat.com>
Signed-off-by: Soumya Koduri <skoduri@redhat.com>
(cherry picked from commit 95ac4e63be73790474c03d3cd314fec7983f12e9)

commit | commitdiff | tree

Marcus Watts [Wed, 28 Aug 2024 21:21:13 +0000 (17:21 -0400)]

rgw/storage class. Don't inherit storage class for copy object.

When an object is copied, its storage class should be determined
only by data in the request; if the request does not specify one,
it should default to 'STANDARD'. In radosgw, this means that storage
class is another attribute (similar to encryption)
that should not be merged from the source object.
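
A minimal sketch of that rule, assuming the request carries the standard `x-amz-storage-class` header with lower-cased keys; `copy_storage_class` and the shape of `source_attrs` are hypothetical, not RGW's interfaces.

```python
DEFAULT_STORAGE_CLASS = "STANDARD"


def copy_storage_class(request_headers: dict, source_attrs: dict) -> str:
    """Pick the destination storage class for a copy from the request alone."""
    # source_attrs is accepted only to show that it is deliberately ignored.
    return request_headers.get("x-amz-storage-class") or DEFAULT_STORAGE_CLASS
```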

Fixes: https://tracker.ceph.com/issues/67787
Signed-off-by: Marcus Watts <mwatts@redhat.com>
(cherry picked from commit a0e60bda70d4af93aa545a3fdea46eb9e68088c4)

Resolves: rhbz#2300284

commit | commitdiff | tree

Marcus Watts [Wed, 28 Aug 2024 15:42:05 +0000 (11:42 -0400)]

rgw/storage class: don't store/report STANDARD storage class.

While 'STANDARD' is a valid storage class, it is not supposed
to ever be returned when fetching an object. This change suppresses
storing 'STANDARD' as the attribute value, so that objects
explicitly created with 'STANDARD' will in fact be indistinguishable
from those where it was implicitly set.
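
The suppression can be pictured as a normalisation step before the attribute is written. A sketch under assumptions: `storage_class_attr` is a hypothetical helper, and returning `None` stands for "store no attribute".

```python
from typing import Optional


def storage_class_attr(requested: Optional[str]) -> Optional[str]:
    """Return the value to persist, or None when nothing should be stored."""
    if not requested or requested == "STANDARD":
        return None   # explicit STANDARD is treated like an unset storage class
    return requested
```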

Fixes: https://tracker.ceph.com/issues/67786
Signed-off-by: Marcus Watts <mwatts@redhat.com>
(cherry picked from commit b95e743ab9374cd3463a29c5f719ffce1c9fb28a)

Resolves: rhbz#2300284

commit | commitdiff | tree

Marcus Watts [Sat, 25 May 2024 03:45:14 +0000 (23:45 -0400)]

Fix lifecycle transition of encrypted multipart objects.

Lifecycle transition can copy objects to a different storage tier.
When this happens, since the object is repacked, the original
manifest is invalidated.  It is necessary to store a special
"parts_len" attribute to fix this.  There was code in PutObj
to handle this, but that was only used for multisite replication;
it is not used by the lifecycle transition code.  This fix
adds similar logic to the lifecycle transition code path to make the
same thing happen.

Fixes: https://tracker.ceph.com/issues/23264
Signed-off-by: Marcus Watts <mwatts@redhat.com>
(cherry picked from commit 60ddd17d2753b769ba2f5ebde60eb7753649d73f)

Resolves: rhbz#2300284

commit | commitdiff | tree

Marcus Watts [Fri, 14 Apr 2023 09:19:59 +0000 (05:19 -0400)]

copy object encryption fixes

This contains code to allow copyobject to copy encrypted objects.

It includes additional data paths to communicate data from the
rest layer down to the sal layer to handle decrypting
objects. The data paths include logic to use filter chains
from get and put that process encryption and compression.
There are several hacks to deal with quirks of the filter chains.
The "get" path has to propagate flushes around the chain,
because a flush isn't guaranteed to propagate through it.
Also the "get" and "put" chains have conflicting uses of the
buffer list logic, so the buffer list has to be copied so that
they don't step on each other's toes.

Fixes: https://tracker.ceph.com/issues/23264
Signed-off-by: Marcus Watts <mwatts@redhat.com>
(cherry picked from commit bcaaf55f4182da0a980c87c1dbd7e1d3c868626c)

Resolves: rhbz#2300284

commit | commitdiff | tree

Marcus Watts [Tue, 16 Jul 2024 21:16:10 +0000 (17:16 -0400)]

rgw/compression antibug check

If another bug tells the compression filter to decompress more
data than is actually present, an "end_of_buffer" exception
is thrown. The thrown exception unwinds the stack,
including a completion that is pending. The resulting core dump
indicates a failure with this completion rather than the end of buffer
exception, which is misleading and not useful.

With this change, radosgw does not abort, and instead logs
a somewhat useful message before returning an "unknown" error
to the client.

Fixes: https://tracker.ceph.com/issues/23264
Signed-off-by: Marcus Watts <mwatts@redhat.com>
(cherry picked from commit 8c7b0fac53107c5fdfcd1b9d5c5d6933b7ace39f)

Resolves: rhbz#2300284

commit | commitdiff | tree

Kalpesh Pandya [Tue, 30 Jul 2024 10:31:37 +0000 (16:01 +0530)]

src/rgw: Adding "sync error trim" option

List "sync error trim" under the --shard-id option in the output of
`radosgw-admin --help`.

Fixes: https://tracker.ceph.com/issues/68548
Signed-off-by: Kalpesh Pandya <kapandya@redhat.com>
(cherry picked from commit 34312bb253f083fd06a62119727caedb97945d02)
Resolves: rhbz#2282369

commit | commitdiff | tree

Casey Bodley [Wed, 16 Apr 2025 15:18:09 +0000 (11:18 -0400)]

rgw: frontend reads/writes respect rgw_beast_enable_async

rgw_beast_enable_async=0 can be used to run process_request() without a
coroutine context, which can make stack traces easier to view and debug

however, the frontend's reads/writes through ClientIO were still using
the yield_context to suspend/resume. so after ClientIO, the stack traces
came from the coroutine resume instead of process_request()

the beast frontend's ClientIO now issues synchronous reads/writes when
rgw_beast_enable_async is disabled

Signed-off-by: Casey Bodley <cbodley@redhat.com>
(cherry picked from commit 747557da73182fe9d0535af6c2b9ed5c2cccd185)
Resolves: rhbz#2350607
(cherry picked from commit fd80adeba09d0ecb3c53cb2d82c592e9962fcd71)

commit | commitdiff | tree

Ali Masarwa [Thu, 24 Jul 2025 15:25:27 +0000 (18:25 +0300)]

RGW | Added debugs in cases where precondition check fails

Resolves: rhbz#2379914

Signed-off-by: Ali Masarwa <amasarwa@redhat.com>
(cherry picked from commit b99a47f1cb60e98bc2cf1c47f72953fd5accee17)

commit | commitdiff | tree

Ali Masarwa [Mon, 30 Jun 2025 13:07:01 +0000 (16:07 +0300)]

RGW | fix conditional Delete and MultiDelete

size_match supports size 0
checks_preconditions checks for last_modified and size as well
supports versioned object
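
A minimal sketch of a size precondition that treats 0 as a real value. The assumption here, not confirmed by this log, is that an expected size of 0 was previously indistinguishable from "no condition"; `size_matches` is a hypothetical name.

```python
from typing import Optional


def size_matches(expected_size: Optional[int], actual_size: int) -> bool:
    """Evaluate a size precondition; None means the client sent no condition."""
    if expected_size is None:   # compare against None, not truthiness,
        return True             # so an expected size of 0 is still checked
    return expected_size == actual_size
```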

Resolves: rhbz#2375000

Signed-off-by: Ali Masarwa <amasarwa@redhat.com>
(cherry picked from commit 55f5b762c67fd7c177835e1a488692f012042d94)

commit | commitdiff | tree

Matt Benjamin [Sun, 9 Mar 2025 16:30:24 +0000 (12:30 -0400)]

rgw: introduce rgw_bucket_eexist_override

S3: conditionally override 200, OK result for same-owner
CreateBucket requests

* also send an error message to avoid confusing awscli
* maps ERR_BUCKET_EXISTS to the same result, message as EEXIST

Fixes: https://tracker.ceph.com/issues/70369
Resolves: rhbz#2336983

Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
(cherry picked from commit d3009d41bf93a30740db5ca67272b3e303512026)

commit | commitdiff | tree

matt benjamin [Fri, 16 May 2025 16:02:20 +0000 (12:02 -0400)]

rgw: defensive fix for crash attempting part-copy of '%' versioned obj

The proximate cause of the issue actually appears to be in recognizing
the key.name of the object, only failing in rgw_rados due to an assert
on key.name being non-empty.

Resolves: rhbz#2356922

Signed-off-by: matt benjamin <mbenjamin@redhat.com>
(cherry picked from commit 5111b625a174aa2eaeb4be943dec9fe4b9d948af)
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>

commit | commitdiff | tree

Matt Benjamin [Mon, 30 Jun 2025 14:26:25 +0000 (10:26 -0400)]

rgwlc: fix removal of delete markers (SAL)

S3 delete markers do not have head objects, and SAL's Object::load_obj_state()
returns -ENOENT in this case. Handle this case in LC's remove_expired_obj().
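
The handling described above might look like the following in outline. This is a Python sketch, not the SAL code: `load_state` and `remove` are hypothetical callables that return 0 or a negative errno.

```python
import errno


def remove_expired(load_state, remove, is_delete_marker: bool) -> int:
    """Remove an expired entry, tolerating a missing head object for delete markers."""
    ret = load_state()
    if ret == -errno.ENOENT and is_delete_marker:
        ret = 0        # expected: delete markers have no head object
    if ret < 0:
        return ret     # any other failure still aborts the removal
    return remove()
```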

Fixes: https://tracker.ceph.com/issues/70853
Resolves: rhbz#2381933

Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
(cherry picked from commit 224821147f2664e54f81b0bb93ccd23669f31f04)

commit | commitdiff | tree

Matt Benjamin [Sun, 10 Aug 2025 18:05:43 +0000 (14:05 -0400)]

rgw:chksum: pull up aws-sdk-java-v2 and fix S3Builder invocation

This commit pulls up aws-sdk-java-v2 to 2.32.2, which has trailing header
formatting previously seen with golang v2 sdk--for which the upstream
*Reef* logic is not present (see prior commit by Yixin Jin).

And it fixes the construction of S3Client to accept endpoint self-signed
certificates--logic which is present in the main function example code
in jcksum.java, but somehow not in putobjects.java (anymore?).

Resolves: rhbz#2392604

Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
(cherry picked from commit 7c56e11d6a9892d02ec9f750b5b785c501f966a3)

commit | commitdiff | tree

Yixin Jin [Sun, 10 Aug 2025 15:59:18 +0000 (11:59 -0400)]

rgw:cksum: fix two checksum-trailer related signing issues

1. return error code on signature mismatch (should be 400,
   XAmzContentSHA256Mismatch)

2. reorder final chunk extraction and signing to better address
   what we were handling as a special case of a few trailing bytes--
   this is arising because the implementer was working against Reef,
   which I guess doesn't have the extra extraction logic (c.f.,
   ceph/main and its upstream backport)

(A change to catch rgw::io::Exception at rgw_process_authenticated
has been removed, as it is already handled in the only applicable
path.)

Fixes: https://tracker.ceph.com/issues/72253
Resolves: rhbz#2392604

Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
(cherry picked from commit fc7088e84ce2fb38a03ef50996357e54dcd9531c)

commit | commitdiff | tree

Matt Benjamin [Tue, 3 Jun 2025 16:54:38 +0000 (12:54 -0400)]

add explicit checksum matrix

Resolves: rhbz#2392604

Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
(cherry picked from commit 2927e89e725f0fc7b5e11c44f460d3b9584da590)

commit | commitdiff | tree

Matt Benjamin [Fri, 30 May 2025 21:56:10 +0000 (17:56 -0400)]

rgw: framework shell of gosdk tests

Contains two golang functions based on the checksum failure reproducer
provided by Fred Heinecke.

Resolves: rhbz#2392604

Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
(cherry picked from commit 54ef383698a9e8256a709f8e9dbeeb9dbdc28854)

commit | commitdiff | tree

Matt Benjamin [Tue, 3 Jun 2025 16:16:28 +0000 (12:16 -0400)]

missed internal, apparently invalid no-length exception case

Resolves: rhbz#2392604

Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
(cherry picked from commit be1f82f94964fa9fb64f990bb471a4a1addb8b14)

commit | commitdiff | tree

Matt Benjamin [Sun, 18 May 2025 01:02:34 +0000 (21:02 -0400)]

rgw: aws-chunked need not supply any content-length

The updated logic for aws chunked handling (2024) appears sufficient
to handle the cases produced by aws-sdk-go-v2.

Note that https://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-streaming.html
states that "For all requests, you must include the
x-amz-decoded-content-length header, specifying the size of the object in
bytes." (accessed 5/17/2025) (but now we do not enforce it).

Reported (with reproducer!) by: Fred Heinecke.

Fixes: https://tracker.ceph.com/issues/71183
Resolves: rhbz#2392604

Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
(cherry picked from commit 0624dbbc3bd10f816262d5e096fa7b147231b2fb)

commit | commitdiff | tree

Matt Benjamin [Sat, 17 May 2025 23:42:09 +0000 (19:42 -0400)]

rgw_cksum: select checksum algo from only a checksum trailer header

When the checksum payload will be sent in the trailer section, a typed
checksum header name will be one of the values of x-amz-trailer.
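
A sketch of selecting the algorithm from `x-amz-trailer` alone, assuming request headers arrive as a dict with lower-cased keys; `algo_from_trailer` is a hypothetical name.

```python
CHECKSUM_PREFIX = "x-amz-checksum-"


def algo_from_trailer(headers: dict):
    """Return the checksum type named in x-amz-trailer, or None if absent."""
    for name in headers.get("x-amz-trailer", "").split(","):
        name = name.strip().lower()
        if name.startswith(CHECKSUM_PREFIX) and len(name) > len(CHECKSUM_PREFIX):
            return name[len(CHECKSUM_PREFIX):]   # e.g. "crc32c"
    return None
```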

Resolves: rhbz#2392604

Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
(cherry picked from commit 1bd625a613a180a340ec7c9d08e9050ddd498446)

commit | commitdiff | tree

Matt Benjamin [Sat, 17 May 2025 19:52:20 +0000 (15:52 -0400)]

rgw: recognize checksum from x-amz-checksum-{type} alone

Some SDKs may send x-amz-checksum-algorithm or
x-amz-sdk-checksum-algorithm regardless as well, but those are
only required if the checksum header is in the trailer section.
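
Recognising the checksum from the typed header alone could be sketched like this. `KNOWN_TYPES` lists S3 checksum algorithm names as an assumption (the set RGW accepts may differ), and `checksum_from_headers` is a hypothetical name.

```python
KNOWN_TYPES = ("crc32", "crc32c", "sha1", "sha256", "crc64nvme")


def checksum_from_headers(headers: dict):
    """Return (type, value) from an x-amz-checksum-{type} header, or None."""
    lowered = {name.lower(): value for name, value in headers.items()}
    for ctype in KNOWN_TYPES:
        value = lowered.get("x-amz-checksum-" + ctype)
        if value is not None:
            # No x-amz-checksum-algorithm header is needed to get here.
            return ctype, value
    return None
```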

Fixes: https://tracker.ceph.com/issues/71350
Resolves: rhbz#2392604

Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
(cherry picked from commit 572289a2c7fb1cceebef7fefdec032ba95418cf4)

commit | commitdiff | tree

Max Kellermann [Thu, 24 Apr 2025 11:22:55 +0000 (13:22 +0200)]

rgw/rgw_cksum: work around -Wsometimes-uninitialized

clang complains that `cck3` might not be initialized:

```
/home/jenkins-build/build/workspace/ceph-api/src/rgw/rgw_cksum.cc:74:2: error: variable 'cck3' is used uninitialized whenever switch default is taken [-Werror,-Wsometimes-uninitialized]
    74 |         default:
       |         ^~~~~~~
/home/jenkins-build/build/workspace/ceph-api/src/rgw/rgw_cksum.cc:78:31: note: uninitialized use occurs here
    78 |         cck3 = rgw::digest::byteswap(cck3);
       |                                      ^~~~
/home/jenkins-build/build/workspace/ceph-api/src/rgw/rgw_cksum.cc:61:15: note: initialize the variable 'cck3' to silence this warning
    61 |         uint32_t cck3;
       |                      ^
       |                       = 0
```

The `default:` case however is not reachable because `ck1.type` has
already been checked.  Adding initializers to `cck3` would only hide
potential future bugs, therefore I suggest just bailing out of the
function for this unreachable piece of code.  With C++23, we could use
`std::unreachable()` instead.

Resolves: rhbz#2392604

Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
(cherry picked from commit 2afbc2ff9d15e685edb26ce22efd3c377799efb4)

commit | commitdiff | tree

Sachin Punadikar [Thu, 21 Aug 2025 10:09:17 +0000 (06:09 -0400)]

NFS CONF: Disable dentry caching in Ganesha

Disable dentry caching in Ganesha. This caching leads to inconsistent
directory listings on connected NFS clients.

Fixes: https://tracker.ceph.com/issues/72797

Signed-off-by: Sachin Punadikar <sachin.punadikar@ibm.com>

commit | commitdiff | tree

Shilpa Jagannath [Fri, 8 Aug 2025 23:34:54 +0000 (19:34 -0400)]

rgw/multisite: handle secondary zone's response appropriately
depending on primary zone's version.
decode primary's response only when generate-key is true.

Signed-off-by: Shilpa Jagannath <smanjara@redhat.com>
(cherry picked from commit f1f55030a5bc982c3ead6ed756643e33aeec689e)

commit | commitdiff | tree

Shilpa Jagannath [Wed, 30 Jul 2025 19:48:32 +0000 (15:48 -0400)]

rgw/multisite: forward create_key request to master, fetch the newly created key
and store it on secondary. also, include 'create_date' in the user info response to
help identify timestamp of each key.

Signed-off-by: Shilpa Jagannath <smanjara@redhat.com>
(cherry picked from commit e46f3324791c8b6d82d3c40be4b0803538d9cb61)

commit | commitdiff | tree

Casey Bodley [Tue, 1 Jul 2025 14:42:15 +0000 (10:42 -0400)]

rgw/s3: fix PutObject's canned_acl comparisons for BlockPublicAcls

canned_acl.compare() returns 0 for matches, so this was rejecting all canned acls
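
The pitfall is easy to reproduce with any three-way comparison. In this sketch, `compare` is a Python stand-in for C++ `std::string::compare()`, and the ACL names checked are illustrative rather than the exact list RGW uses.

```python
def compare(a: str, b: str) -> int:
    """Three-way comparison: negative, zero, or positive, like string::compare()."""
    return (a > b) - (a < b)


def is_public_canned_acl(canned_acl: str) -> bool:
    # Buggy form: `if compare(canned_acl, "public-read"):` is true for every
    # value EXCEPT a match, because a match returns 0, which is falsy.
    return (compare(canned_acl, "public-read") == 0
            or compare(canned_acl, "public-read-write") == 0)
```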

Fixes: https://tracker.ceph.com/issues/49135
Resolves: rhbz#2344639

Signed-off-by: Casey Bodley <cbodley@redhat.com>
(cherry picked from commit e9eedecdc85609e6d7f7bcb071334fcb6663c504)
(cherry picked from commit 30a57c148f9b4869f454a9dc94bf4d923db6833c)

commit | commitdiff | tree

Shilpa Jagannath [Thu, 1 May 2025 16:46:35 +0000 (12:46 -0400)]

rgw/multisite: sync bucket obj_lock.
add json encoding/decoding to members

Signed-off-by: Shilpa Jagannath <smanjara@redhat.com>
(cherry picked from commit c2d235788c4e6ea0d3c7990cbc93af1ef2d31692)
resolves rhbz#2317768
(cherry picked from commit 1340456fe4e9b9a16c1bf72357b525cd5e8317e3)

commit | commitdiff | tree

Mark Kogan [Mon, 27 May 2024 17:01:01 +0000 (17:01 +0000)]

rgw: qat: if necessary initialize the `qat` supplemental group

When RGW is started as the entry point of a container, the shell
does not have the opportunity to initialize the supplemental groups,
so `sudo usermod -a -G qat <USER>` has not taken effect.
An explicit call to initgroups() (see `man 3 initgroups`) is necessary.

Fixes: https://tracker.ceph.com/issues/66233
Resolves: rhbz#2266529

Signed-off-by: Mark Kogan <mkogan@ibm.com>
(cherry picked from commit d692450f6253d987d05ed63a773183d615f3e719)
(cherry picked from commit d8ca9155e1f2e5e292e8c5f99ca94b1f8ce53c36)

commit | commitdiff | tree

Raja Sharma [Thu, 22 May 2025 11:08:00 +0000 (16:38 +0530)]

rgw/sts: GetCallerIdentity API

Tracker: https://tracker.ceph.com/issues/72157
Resolves: rhbz#2381577

Signed-off-by: Raja Sharma <raja@ibm.com>
(cherry picked from commit 694bffd999442016f39eba9616ade83ce2dedefa)

commit | commitdiff | tree

Raja Sharma [Fri, 13 Jun 2025 14:58:36 +0000 (20:28 +0530)]

get_caller_identity utility

Tracker: https://tracker.ceph.com/issues/72157
Resolves: rhbz#2381577

Signed-off-by: Raja Sharma <raja@ibm.com>
(cherry picked from commit 9965d326b0234cc597a46451d0a5413db5ee9e39)

commit | commitdiff | tree

Raja Sharma [Fri, 6 Jun 2025 08:35:27 +0000 (14:05 +0530)]

rgw/iam: getAccountSummary API

Tracker: https://tracker.ceph.com/issues/72158
Resolves: rhbz#2381576

Signed-off-by: Raja Sharma <raja@ibm.com>
(cherry picked from commit 7e9a6e3a5db524a988c9441b751981670d05322d)

commit | commitdiff | tree

Justin Caratzas [Thu, 18 Sep 2025 20:45:01 +0000 (16:45 -0400)]

cephadm: add ubuntu 24.04 container build test for completeness

Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit e915b3963720a8424e7718fac09bf6954c9e8400)
(cherry picked from commit 349b9415a7848d51336c70cb5b53a62287a753aa)

Resolves: rhbz#2388210

commit | commitdiff | tree

Justin Caratzas [Thu, 18 Sep 2025 20:45:01 +0000 (16:45 -0400)]

cephadm: enable test case for centos10 cephadm rpm build

Now that the build script is updated we can enable the test for
centos 10 based rpm sourced cephadm builds.

Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit 16530fde5e4a84c5690da760f4be5dd91131a69b)
(cherry picked from commit b79cc22ac73b407759f72acfa06271957d9ecc3d)

Resolves: rhbz#2388210

commit | commitdiff | tree

Justin Caratzas [Thu, 18 Sep 2025 20:45:01 +0000 (16:45 -0400)]

cephadm: support cephadm rpm based builds without top_level.txt

Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit 26a499a8da339d870af193ea964368afbc84c694)
(cherry picked from commit 3441c109f3d371c78797f2959a4cb0f91b8319d6)

Resolves: rhbz#2388210

commit | commitdiff | tree

Justin Caratzas [Thu, 18 Sep 2025 20:45:01 +0000 (16:45 -0400)]

cephadm: add centos 10 container images for cephadm build tests

Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit 32e98c484ac1ee518d8a479f17ccf4c5b7a7264b)
(cherry picked from commit eda0d5b64217df2ec0f97716561c64465aeda890)

Resolves: rhbz#2388210

commit | commitdiff | tree

Justin Caratzas [Thu, 18 Sep 2025 20:45:01 +0000 (16:45 -0400)]

cephadm: remove centos 8 from the cephadm build suite containers

Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit 4d3e7b6bcb85f703823b6c414f414a9ff4f379aa)
(cherry picked from commit 0148b5f146718e5df2a11534418748be8fb9e4b9)

Resolves: rhbz#2388210

commit | commitdiff | tree

Justin Caratzas [Thu, 18 Sep 2025 20:45:01 +0000 (16:45 -0400)]

cephadm: fix some issues running existing cephadm build tests

As time has marched on and people changed things our tests no longer
match the expected inputs.

Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit 31c8010faa417ca53614bd30379a9b9c0c9199de)
(cherry picked from commit fdc1b9a80f3af7bc11ba46226d1399903e220d5f)

Resolves: rhbz#2388210

commit | commitdiff | tree

Avan Thakkar [Mon, 4 Aug 2025 14:44:53 +0000 (20:14 +0530)]

doc/mgr/smb: add doc for QoS support for CephFS-backed SMB shares

Signed-off-by: Avan Thakkar <athakkar@redhat.com>
(cherry picked from commit 83db4df25a2d29538eebda7c6efdfb4cf2aedb04)

commit | commitdiff | tree

Avan Thakkar [Mon, 4 Aug 2025 17:41:36 +0000 (23:11 +0530)]

mgr/smb: add test coverage for rate-limiting

Add comprehensive QoS test coverage including:
  * Basic QoS configuration application
  * QoS updates
  * QoS removal
  * QoS delay_max

Signed-off-by: Avan Thakkar <athakkar@redhat.com>
(cherry picked from commit 7700193d3d595b8c200ead79f8a19051335f9d1b)

commit | commitdiff | tree

Avan Thakkar [Thu, 31 Jul 2025 14:47:03 +0000 (20:17 +0530)]

mgr/smb: add rate limiting support

Introduce a new optional `qos` component under the `cephfs` block
of the Share resource to configure rate limiting options per SMB share.

The new structure supports:
- read_iops_limit
- write_iops_limit
- read_bw_limit
- write_bw_limit
- read_delay_max
- write_delay_max

A new CLI command is added:
`ceph smb share update cephfs qos <cluster> <share> [options]`
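
The shape of the new `qos` block can be sketched with a dataclass. Only the six field names come from this commit message; `QoSSketch`, its validation rule, and the example `volume`/`path` values are assumptions, not the smb module's classes.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class QoSSketch:
    """Hypothetical stand-in for the `qos` block under `cephfs`."""
    read_iops_limit: Optional[int] = None
    write_iops_limit: Optional[int] = None
    read_bw_limit: Optional[int] = None
    write_bw_limit: Optional[int] = None
    read_delay_max: Optional[int] = None
    write_delay_max: Optional[int] = None

    def to_simplified(self) -> dict:
        """Emit only the options that were set, rejecting negative values."""
        out = {k: v for k, v in asdict(self).items() if v is not None}
        for key, value in out.items():
            if value < 0:
                raise ValueError(f"{key} must not be negative")
        return out


# Example cephfs block of a share; volume and path are placeholder values.
share_cephfs = {
    "volume": "cephfs",
    "path": "/",
    "qos": QoSSketch(read_iops_limit=100).to_simplified(),
}
```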

Signed-off-by: Avan Thakkar <athakkar@redhat.com>
(cherry picked from commit ffb684f320e01238e3084d1321c620cc5c86e515)

commit | commitdiff | tree

Christopher Hoffman [Thu, 21 Aug 2025 19:24:48 +0000 (19:24 +0000)]

test: Test unsupported fscrypt policy

Signed-off-by: Christopher Hoffman <choffman@redhat.com>
(cherry picked from commit 25743cdc518af1c619b713820e03c0c4e51b7dc2)
Resolves: rhbz#2362686

commit | commitdiff | tree

Christopher Hoffman [Thu, 21 Aug 2025 19:23:44 +0000 (19:23 +0000)]

client: Check for supported fscrypt policy

When setting a policy on a directory, check to make sure
the policy is supported.

Signed-off-by: Christopher Hoffman <choffman@redhat.com>
(cherry picked from commit e3c5d4de0d8f528092e8ed33905e29d460ecb2c6)
Resolves: rhbz#2362686

commit | commitdiff | tree

Christopher Hoffman [Wed, 20 Aug 2025 19:57:39 +0000 (19:57 +0000)]

qa/cephfs: Add test case for enctag too long

Signed-off-by: Christopher Hoffman <choffman@redhat.com>
(cherry picked from commit a33e21e08fa08e4859c4c4c45a3eb02ab9fd730f)
Resolves: rhbz#2359400

commit | commitdiff | tree

Christopher Hoffman [Wed, 20 Aug 2025 19:36:14 +0000 (19:36 +0000)]

mgr/volumes: Enforce enctag max size

Introduce enctag max length. Include error messages when
outside of range.
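
A length check of this kind might look as follows. The limit of 255 is a placeholder, since the real maximum is not given in this log, and `validate_enctag` is a hypothetical name.

```python
ENCTAG_MAX_LEN = 255  # placeholder; the real limit is not stated in this log


def validate_enctag(tag: str, max_len: int = ENCTAG_MAX_LEN) -> str:
    """Return the tag unchanged, or raise with a message naming the limit."""
    if len(tag) > max_len:
        raise ValueError(f"enctag is {len(tag)} characters; maximum is {max_len}")
    return tag
```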

Signed-off-by: Christopher Hoffman <choffman@redhat.com>
(cherry picked from commit df555cf5d027bd816fa5e94706dd914414e47c29)
Resolves: rhbz#2359400

commit | commitdiff | tree

John Mulligan [Wed, 23 Jul 2025 12:42:33 +0000 (08:42 -0400)]

doc: add documentation for keybridge and fscrypt options

Add docs for the keybridge configuration and cephfs fscrypt options
added to the smb mgr module resource definitions.

Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit 3c3bd9414e4d0ddce0855432dd891680143e36e9)

commit | commitdiff | tree

John Mulligan [Fri, 18 Jul 2025 14:24:56 +0000 (10:24 -0400)]

mgr/smb: add some keybridge related unit test cases

Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit 624540dc720d0c5cd01685b4ffec1f8ea001dde3)

commit | commitdiff | tree

John Mulligan [Fri, 18 Jul 2025 15:13:32 +0000 (11:13 -0400)]

mgr/smb: add support for generating keybridge configuration

Add support for generating the sambacc configuration section for
keybridge. Add support for configuring smb shares for keybridge access.

Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit 3b7f511351d8be0caa312efa09d324fa31acdda5)

commit | commitdiff | tree

John Mulligan [Fri, 18 Jul 2025 14:24:45 +0000 (10:24 -0400)]

mgr/smb: add cross-check validation for keybridge scopes

Validate that scope names are not re-used, etc. Check on things that
can't be done in single object validation.
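
Detecting reused names needs the whole set of scopes at once, which is why it cannot live in single-object validation. A sketch with a hypothetical helper, `reused_scope_names`, and made-up scope names:

```python
from collections import Counter


def reused_scope_names(scope_names: list) -> list:
    """Return, sorted, every scope name that appears more than once."""
    counts = Counter(scope_names)
    return sorted(name for name, n in counts.items() if n > 1)
```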

Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit 0f1263cab97e93e8da901cea628ac3da0a2b7a29)

commit | commitdiff | tree

John Mulligan [Tue, 22 Jul 2025 23:24:11 +0000 (19:24 -0400)]

mgr/smb: add new cephfs parameter for getting fscrypt keys

Add a new field to the cephfs configuration section for shares. This
section selects the keybridge scope and key name to use when acquiring
the key to use for fscrypt.

Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit f70dc52e220f6cf85d2bde5c8ca3fb13d82c3802)

commit | commitdiff | tree

John Mulligan [Tue, 22 Jul 2025 23:22:15 +0000 (19:22 -0400)]

mgr/smb: add keybridge configuration to cluster resource

Add keybridge service configuration classes and parameters to the
resources module. This supports enabling the keybridge, setting up
scopes for the keybridge and its access control.

A helper class is added that parses and helps validate the scope names.

Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit b44e5a27d1af3920878bbc0085e8cf0c587d2c5c)

commit | commitdiff | tree

John Mulligan [Wed, 16 Jul 2025 21:55:44 +0000 (17:55 -0400)]

mgr/smb: add enums that will be used for configuring keybridge

Add a pair of enum types that will be used for configuring the
keybridge. The scope type identifies what kind of scope is being
used. The peer policy can be used to allow a dev or other user
more access to the keybridge api for development purposes.

Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit a3e1343e49ad8c550fa0eb89b36db915cac250a7)

commit | commitdiff | tree

John Mulligan [Fri, 18 Jul 2025 14:23:12 +0000 (10:23 -0400)]

mgr/smb: add raw data methods to MemConfigStore

Add the set_data/get_data methods to the MemConfigStore so that future
test updates will not fail to save tls credential objects.

Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit 3a9d30e7ba62ffd971444c96a2654dab710ee557)

commit | commitdiff | tree

John Mulligan [Fri, 18 Jul 2025 14:23:31 +0000 (10:23 -0400)]

mgr/smb: fix a resource error unpacking str instead of list

Add special handling for the case where a string is passed instead of a
list. Without this fix a string will be converted into a list of single
letter items, something pretty much no one ever wants. Raise an
exception instead.

Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit 72017e1224ff731d59c00fe15e1f87b7cb875d21)
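
The failure mode described above comes from Python treating a string as an iterable of characters. A minimal sketch of the guard, assuming the field is meant to hold a list of strings; `unpack_str_list` is a hypothetical name, not the helper used in mgr/smb:

```python
from typing import Any, List


def unpack_str_list(value: Any) -> List[str]:
    """Return value as a list of strings, rejecting a bare string."""
    if isinstance(value, str):
        # Without this check, list("abc") would silently give ['a', 'b', 'c'].
        raise ValueError(f"expected a list of strings, got a string: {value!r}")
    return [str(item) for item in value]
```

Raising here surfaces the mistake at the point where the resource is unpacked instead of producing single-letter items later.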

commit | commitdiff | tree

John Mulligan [Fri, 18 Jul 2025 16:20:17 +0000 (12:20 -0400)]

cephadm: add keybridge sidecar to smb daemon module

The keybridge uses the sambacc configuration but can also be passed
CLI options. Since cephadm writes the cert files, cephadm must also
pass the file names to use to the container args.

Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit a140d9d0c7ffc6837c7fa02fe92082efefe9ffc5)

commit | commitdiff | tree

John Mulligan [Fri, 18 Jul 2025 16:20:29 +0000 (12:20 -0400)]

mgr/cephadm: enable setting up SSL/TLS files for keybridge sidecar

Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit 07705d6ea872274786bffdcea574e1eadb1f9f43)

commit | commitdiff | tree

John Mulligan [Wed, 16 Jul 2025 21:08:49 +0000 (17:08 -0400)]

python-common/deployment: add keybridge feature to smb service spec

The keybridge sidecar is enabled by the keybridge feature flag.
This sidecar will be used to help fetch keys over various protocols
for the ceph module to use to set up fs encryption.

Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit 44e9c408340d5af51a305cf58e5e0d186ffcb808)

commit | commitdiff | tree

Sachin Prabhu [Thu, 1 May 2025 10:59:54 +0000 (11:59 +0100)]

doc/mgr/smb: document the 'provider' option for smb share

Signed-off-by: Sachin Prabhu <sp@spui.uk>
(cherry picked from commit 742659b18a21cd8ccc36a0f0a53bea265a13a541)

commit | commitdiff | tree

Xavi Hernandez [Thu, 18 Sep 2025 10:39:30 +0000 (12:39 +0200)]

libcephfs_proxy: implement support for fscrypt

Signed-off-by: Xavi Hernandez <xhernandez@gmail.com>
(cherry picked from commit 9fd9f8860c4cae3619f654ca2925726f3191064c)

commit | commitdiff | tree

Christopher Hoffman [Wed, 13 Aug 2025 15:45:37 +0000 (15:45 +0000)]

libcephfs: Include libcephfs.h def for ceph_get_fscrypt_key_status

The libcephfs API header was missing a declaration for the call
ceph_get_fscrypt_key_status. Declare this API call in libcephfs.h.

Signed-off-by: Christopher Hoffman <choffman@redhat.com>
(cherry picked from commit ab115142ff0123e7427c575a4c95afa57940b8f9)

commit | commitdiff | tree

Shweta Bhosale [Mon, 1 Sep 2025 12:56:11 +0000 (18:26 +0530)]

mgr/cephadm: set a health warning for host SSH timeout

Fixes: https://tracker.ceph.com/issues/72345
Signed-off-by: Shweta Bhosale <Shweta.Bhosale1@ibm.com>
Resolves: rhbz#2346030

commit | commitdiff | tree

Shweta Bhosale [Thu, 28 Aug 2025 15:01:09 +0000 (20:31 +0530)]

mgr/cephadm: After reapplying the osd spec, the OSD services are continuously applied in each serve loop iteration

Fixes: https://tracker.ceph.com/issues/72774
Signed-off-by: Shweta Bhosale <Shweta.Bhosale1@ibm.com>
Resolves: rhbz#2390044

commit | commitdiff | tree

Shweta Bhosale [Wed, 20 Aug 2025 10:03:04 +0000 (15:33 +0530)]

mgr/cephadm: Config parameter to set the max number of OSDs to upgrade in single iteration

Fixes: https://tracker.ceph.com/issues/72652
Signed-off-by: Shweta Bhosale <Shweta.Bhosale1@ibm.com>
Resolves: rhbz#2390040

commit | commitdiff | tree

Shweta Bhosale [Wed, 23 Jul 2025 12:38:18 +0000 (18:08 +0530)]

mgr/cephadm: Allow registry credentials to define multiple container registries

Fixes: https://tracker.ceph.com/issues/72206
Signed-off-by: Shweta Bhosale <Shweta.Bhosale1@ibm.com>
Resolves: rhbz#2338350

commit | commitdiff | tree

Adam Kupczyk [Mon, 31 Mar 2025 11:38:08 +0000 (13:38 +0200)]

os/bluestore: Add ability to ignore BlueFS zombie files

Under normal circumstances the BlueFS _replay() procedure
does not allow zombie files to be present.
Zombie files are files that are declared but are not attached to any name+dir.
The one exception is the special BlueFS log file (ino 1).

This change introduces the configurable 'bluefs_log_replay_remove_zombie_files'.
When set to 'true', instead of refusing to mount, BlueFS logs the error and ignores the file.
This is equivalent to removing it, with the distinction that until the BlueFS log is
compacted, one cannot revert the option, or the problem will reemerge.

Resolves: rhbz#2354192

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
(cherry picked from commit ea677e0c43e0439e2afd7e8b95c8a1be7f2dba25)
(cherry picked from commit 53d95a2f5949305fa3442a75fb3fb421266fb329)
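
The replay behaviour described above can be sketched as follows. This is only an illustration of the decision, assuming files are identified by inode number and directory links are a name-to-ino mapping; the function and its data model are hypothetical and do not mirror the BlueFS code:

```python
from typing import Dict, Set

LOG_INO = 1  # the BlueFS log file is the one allowed unlinked file


def replay_check(declared: Set[int], linked: Dict[str, int],
                 remove_zombie_files: bool) -> Set[int]:
    """Return the inos kept after replay, ignoring zombies if allowed."""
    linked_inos = set(linked.values())
    zombies = {ino for ino in declared
               if ino != LOG_INO and ino not in linked_inos}
    if zombies and not remove_zombie_files:
        # Default behaviour: refuse to mount.
        raise RuntimeError(f"zombie files found: {sorted(zombies)}")
    # Option enabled: the zombies are ignored, which acts like removal.
    return declared - zombies
```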

commit | commitdiff | tree

Jamie Pryde [Thu, 21 Aug 2025 20:28:27 +0000 (21:28 +0100)]

erasure-code: Enable EC optimizations for ISA-L cauchy

Signed-off-by: Jamie Pryde <jamiepry@uk.ibm.com>
Resolves: rhbz#2369130
(cherry picked from commit c353185a177fbae0576fadbb7dfd26a7baff0f2a)

commit | commitdiff | tree

Jamie Pryde [Wed, 13 Aug 2025 10:57:40 +0000 (11:57 +0100)]

erasure-code: use cauchy if K and M values are not supported by ISA-L reed_sol_van

ISA-L supports a limited set of K and M values when using a vandermonde matrix. There are no such limitations when using a cauchy matrix. If the user specifies reed_sol_van (or does not specify a technique and relies on the default reed_sol_van setting) and an unsupported K/M combination, then we will automatically switch the technique for the new EC profile to cauchy. Benchmarking has not shown any noticeable performance differences between ISA-L in reed_sol_van vs cauchy modes.

Signed-off-by: Jamie Pryde <jamiepry@uk.ibm.com>
(cherry picked from commit c9d3f76b4719ecf602f01f58a0e1a7a577a80949)

Resolves: rhbz#2369130
(cherry picked from commit 2911965aba7dac0f6c38559e39684fd7441d25d3)
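
The automatic switch described above can be sketched as below. The set of K/M values that ISA-L supports with a Vandermonde matrix is not stated in this message, so the sketch takes it as a caller-supplied predicate (`van_supports`, a hypothetical parameter) rather than guessing the limits:

```python
from typing import Callable


def pick_technique(k: int, m: int, requested: str,
                   van_supports: Callable[[int, int], bool]) -> str:
    """Return the technique to record in the new EC profile."""
    if requested == "reed_sol_van" and not van_supports(k, m):
        # Unsupported K/M combination: fall back to cauchy.
        return "cauchy"
    return requested
```

A caller that leaves the technique unset would pass the default, `reed_sol_van`, as `requested`.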

commit | commitdiff | tree

Shraddha Agrawal [Tue, 16 Sep 2025 13:52:27 +0000 (19:22 +0530)]

options/mon: disable availability tracking by default

Signed-off-by: Shraddha Agrawal <shraddhaag@ibm.com>
(cherry picked from commit ef7effaa33bd6b936d7433e668d36f80ed7bee65)
(cherry picked from commit a4b32523949a75b9761122d22bbdc0514158ba2d)

commit | commitdiff | tree

Jamie Pryde [Thu, 18 Sep 2025 21:45:03 +0000 (22:45 +0100)]

osd: Set ec optimizations default to true

We want new pools created in 9.0 to use optimized/fast EC by default

Signed-off-by: Jamie Pryde <jamiepry@ibm.com>
Resolves: rhbz#2396437

commit | commitdiff | tree

Radoslaw Zarzynski [Tue, 5 Aug 2025 14:11:59 +0000 (16:11 +0200)]

osd: stop scrub_purged_snaps() from ignoring osd_beacon_report_interval

OSD beacons can be burdensome to the entire cluster, as they lead
to generation of new `OSDMap` epochs. Therefore their frequency is
restricted through `osd_beacon_report_interval` to 5 mins by default.

Unfortunately, `OSD::send_purged_snaps()` is unaware of this
policy, with the net result being a storm of OSDMaps. This patch unifies
its behavior with `OSD::tick_without_osd_lock()`.

Fixes: https://tracker.ceph.com/issues/72412
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
(cherry picked from commit d4f90ea6d3c5c565d1ddc6f9ecd9499048f054c0)
(cherry picked from commit 814b2766f4e696d38aab9b194ef445e1388e0529)
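
The policy described above is a simple rate limit: every code path that wants to send a beacon has to consult the same interval. A minimal sketch, assuming time is passed in as seconds; the class is hypothetical and is not the OSD implementation:

```python
from typing import Optional


class BeaconThrottle:
    """Allow at most one beacon per report interval."""

    def __init__(self, report_interval: float) -> None:
        self.report_interval = report_interval
        self.last_sent: Optional[float] = None

    def should_send(self, now: float) -> bool:
        """Return True and record the send time if the interval has elapsed."""
        if (self.last_sent is not None
                and now - self.last_sent < self.report_interval):
            return False
        self.last_sent = now
        return True
```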

commit | commitdiff | tree

Brad Hubbard [Wed, 30 Jul 2025 23:12:23 +0000 (09:12 +1000)]

mgr/DaemonState: Minimise time we hold the DaemonStateIndex lock

Calling back into Python functions whilst holding the lock can result in
this thread being queued for the GIL, and in extended delays
for threads waiting to acquire the lock.

Fixes: https://tracker.ceph.com/issues/72337
Signed-off-by: Brad Hubbard <bhubbard@redhat.com>
(cherry picked from commit b4304d521f61b61515cade872824210e7d67f6db)
(cherry picked from commit 2d227ab51fd72313fb359a0bf7b53722fe793fbc)
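
The general pattern behind this kind of fix is to copy what is needed while holding the lock and to run callbacks only after releasing it. A minimal sketch of that pattern in Python; `DaemonIndex` is a hypothetical stand-in, not the C++ DaemonStateIndex:

```python
import threading
from typing import Callable, Dict, List, Tuple


class DaemonIndex:
    """Toy index that never runs callbacks while holding its lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._daemons: Dict[str, dict] = {}

    def add(self, name: str, metadata: dict) -> None:
        with self._lock:
            self._daemons[name] = dict(metadata)

    def for_each(self, callback: Callable[[str, dict], None]) -> None:
        # Hold the lock only long enough to take a snapshot.
        with self._lock:
            snapshot: List[Tuple[str, dict]] = [
                (name, dict(meta)) for name, meta in self._daemons.items()
            ]
        # Callbacks run without the lock, so a slow callback cannot
        # delay other threads waiting to acquire it.
        for name, meta in snapshot:
            callback(name, meta)
```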

commit | commitdiff | tree

Jamie Pryde [Fri, 4 Jul 2025 20:34:36 +0000 (21:34 +0100)]

qa: Run RADOS suites with ec optimizations on and off

Signed-off-by: Jamie Pryde <jamiepry@uk.ibm.com>
(cherry picked from commit 45a1280e8a37a0581f9fbb5d03347b989ac345a2)
(cherry picked from commit 764f858eaf98e8d77a18ec756c39bafe98af0e4b)

commit | commitdiff | tree

Jon Bailey [Wed, 20 Aug 2025 10:11:09 +0000 (11:11 +0100)]

osd: Reduce the number of stats invalidations when rolling shards forwards during peering

Currently stats invalidations happen during peering when rolling forward shards.
We can reduce this so we only invalidate the stats when we don't have any other shards at the version we want to roll the stats forwards to.
In the cases where we have a shard with the stats at the correct version, we use those stats instead of invalidating.
If we do not have any shards with the correct version of stats, we do the invalidate as before.

Signed-off-by: Jon Bailey <jonathan.bailey1@ibm.com>
(cherry picked from commit b5cad2694569b7f0eef173f87a7eecb2ddd6b27e)
(cherry picked from commit ba483851c1642573b0a9b4e2f5b6c768175bff6b)
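
The selection rule described above can be sketched as: reuse stats from any shard already at the target version, and invalidate only when none exists. The data model below (a per-shard version number paired with a stats dict) is an assumption made for illustration:

```python
from typing import Dict, Optional, Tuple


def stats_for_roll_forward(shard_stats: Dict[int, Tuple[int, dict]],
                           target_version: int) -> Optional[dict]:
    """Return stats from a shard at target_version, or None to invalidate."""
    for shard in sorted(shard_stats):
        version, stats = shard_stats[shard]
        if version == target_version:
            return stats
    # No shard has stats at the wanted version: invalidate as before.
    return None
```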

commit | commitdiff | tree

Bill Scales [Wed, 27 Aug 2025 13:44:08 +0000 (14:44 +0100)]

osd: Optimized EC incorrectly rolled backwards write

A bug in choose_acting in this scenario:

* Current primary shard has been absent so has missed the latest few writes
* All the recent writes are partial writes that have not updated shard X
* All the recent writes have completed

The authoritative shard is chosen from the set of primary-capable shards
that have the highest last epoch started; these all have log entries
for the recent writes.

The get log shard is chosen from the set of shards that have the highest
last epoch started; this chooses shard X because it is furthest behind.

The primary shard last update is not less than get log shard last
update so this if statement decides that it has a good enough log:

if ((repeat_getlog != nullptr) &&
    get_log_shard != all_info.end() &&
    (info.last_update < get_log_shard->second.last_update) &&
    pool.info.is_nonprimary_shard(get_log_shard->first.shard)) {

We then proceed through peering using the primary log and the
log from shard X. Neither has details about the recent writes,
which are then incorrectly rolled back.

The if statement should be looking at last_update for the
authoritative shard rather than the get_log_shard. The code
would then realize that it needs to get the log from the
authoritative shard first and then have a second pass
where it gets the log from the get log shard.

Peering would then have information about the partial writes
(obtained from the authoritative shard's log) and could correctly
roll these writes forward by deducing that the get_log_shard
didn't have these log entries because they were partial writes.

Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
(cherry picked from commit ac4e0926bbac4ee4d8e33110b8a434495d730770)
(cherry picked from commit 4eb4b5e01ed0a96ec20022ab106b9f2fe8292cf7)
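
The scenario above can be reproduced with a much-simplified model of the comparison. This sketch keeps only the last_update test and the non-primary check from the if statement, models versions as (epoch, version) tuples, and uses made-up values; it is an illustration of the reasoning, not the peering code:

```python
from typing import Tuple

Version = Tuple[int, int]  # (epoch, version), compared lexicographically


def needs_two_pass_getlog(primary_last_update: Version,
                          compare_last_update: Version,
                          get_log_shard_is_nonprimary: bool) -> bool:
    """True when the primary must fetch a full log before the second pass."""
    return (get_log_shard_is_nonprimary
            and primary_last_update < compare_last_update)


primary = (20, 100)     # was absent, missed the latest few writes
shard_x = (20, 100)     # non-primary shard not updated by the partial writes
auth_shard = (20, 105)  # primary-capable shard with the recent log entries

# Comparing against the get log shard hides the missing writes.
buggy = needs_two_pass_getlog(primary, shard_x, True)
# Comparing against the authoritative shard reveals them.
fixed = needs_two_pass_getlog(primary, auth_shard, True)
```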

commit | commitdiff | tree

Alex Ainscow [Tue, 12 Aug 2025 16:12:45 +0000 (17:12 +0100)]

osd: Clear zero_for_decode for shards where read failed on recovery

Not clearing this can lead to a failed decode, which panics, rather than
a recovery or IO failure.

Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
(cherry picked from commit 6365803275b1b6a142200cc2db9735d48c86ae03)
(cherry picked from commit e05a070bc0077f37b3b504833fd17ac1e712f49d)

commit | commitdiff | tree

Alex Ainscow [Fri, 8 Aug 2025 15:20:32 +0000 (16:20 +0100)]

osd: Reduce buffer-printing debug strings to debug level 30

Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
# Conflicts:
# src/osd/ECBackend.cc
(cherry picked from commit b4ab3b1dcef59a19c67bb3b9e3f90dfa09c4f30b)
(cherry picked from commit ea7481b9b3d039da99ad9fa3269705777bb8df87)

commit | commitdiff | tree

Alex Ainscow [Fri, 8 Aug 2025 09:25:53 +0000 (10:25 +0100)]

osd: Fix segfault in EC debug string

The old debug_string implementation was potentially reading up to 3
bytes off the end of an array. It was also doing lots of unnecessary
bufferlist reconstructs. This refactor of this function fixes both
issues.

Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
(cherry picked from commit da3ccdf4d03e40b747f8876449199102e53e00ce)
(cherry picked from commit 2026d8d0ed76cd6a44251488087f99ec4490f526)

commit | commitdiff | tree

Bill Scales [Fri, 8 Aug 2025 08:58:14 +0000 (09:58 +0100)]

osd: Optimized EC backfill interval has wrong versions

Bug in the optimized EC code creating the backfill
interval on the primary. It is creating a map with
the object version for each backfilling shard. When
there are multiple backfill targets the code was
overwriting oi.version with the version
for a shard that has had partial writes which
can result in the object not being backfilled.

Can manifest as a data integrity issue, scrub
error or snapshot corruption.

Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
(cherry picked from commit acca514f9a3d0995b7329f4577f6881ba093a429)
(cherry picked from commit 73a5468e0b0df9b50f656ebfc894f8f8a7170a31)

commit | commitdiff | tree

Bill Scales [Mon, 4 Aug 2025 15:24:41 +0000 (16:24 +0100)]

osd: Optimized EC choose_acting needs to use best primary shard

There have been a couple of corner case bugs with choose_acting
with optimized EC pools in the scenario where a new primary
with no existing log is chosen and find_best_info selects
a non-primary shard as the authoritative shard.

Non-primary shards don't have a full log so in this scenario
we need to get the log from a shard that does have a complete
log first (so our log is ahead of or equivalent to the authoritative shard)
and then repeat the get log for the authoritative shard.

Problems arise if we make different decisions about the acting
set and backfill/recovery based on these two different shards.
In one bug we oscillated between two different primaries
because one primary used one shard to make peering decisions
and the other primary used the other shard, resulting in
looping flip/flop changes to the acting_set.

In another bug we used one shard to decide that we could do
async recovery but then tried to get the log from another
shard and asserted because we didn't have enough history in
the log to do recovery and should have chosen to do a backfill.

This change makes optimized EC pools always choose the
best !non_primary shard when making decisions about peering
(irrespective of whether the primary has a full log or not).
The best overall shard is now only used for get log when
deciding how far to rollback the log.

It also sets repeat_getlog to false if peering fails because
the PG is incomplete to avoid looping forever trying to get
the log.

Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
(cherry picked from commit f3f45c2ef3e3dd7c7f556b286be21bd5a7620ef7)
(cherry picked from commit 30c287090344279eec5badaf7f545e1b4cdd00ef)

commit | commitdiff | tree

Alex Ainscow [Fri, 1 Aug 2025 14:09:58 +0000 (15:09 +0100)]

osd: Do not send PDWs if read count > k

The main point of PDW (as currently implemented) is to reduce the amount
of reading performed by the primary when preparing for a read-modify-write (RMW).

It was making the assumption that if any recovery was required by a
conventional RMW, then a PDW is always better. This was an incorrect assumption
as a conventional RMW performs at most K reads for any plugin which
supports PDW. As such, we tweak this logic to perform a conventional RMW
if the PDW is going to read k or more shards.

This should improve performance in some minor areas.

Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
(cherry picked from commit cffd10f3cc82e0aef29209e6e823b92bdb0291ce)
(cherry picked from commit cf3dc4ac300b20d07ea4f6870ceda32336f6ee41)
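
The tweak described above reduces to a single comparison. The sketch below follows the wording of the message body ("k or more" shards means a conventional RMW); the function name and the idea of passing a precomputed read count are assumptions:

```python
def use_pdw(pdw_read_count: int, k: int) -> bool:
    """Return True if a PDW should be used instead of a conventional RMW."""
    # A conventional RMW reads at most k shards, so a PDW only pays off
    # when it would read fewer than k.
    return pdw_read_count < k
```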

commit | commitdiff | tree

Alex Ainscow [Wed, 18 Jun 2025 19:46:49 +0000 (20:46 +0100)]

osd: Fix decode for some extent cache reads.

The extent cache in EC can cause the backend to perform some surprising reads. Some
of the patterns discovered in testing caused the decode to attempt to
decode more data than was anticipated during the read planning, leading to an
assert. This simple fix reduces the scope of the decode to the minimum.

Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
(cherry picked from commit 2ab45a22397112916bbcdb82adb85f99599e03c0)
(cherry picked from commit 1a91c3d5f35b22e1fbcf8509d25c96d684301a80)

commit | commitdiff | tree

Bill Scales [Fri, 1 Aug 2025 10:48:18 +0000 (11:48 +0100)]

osd: Optimized EC calculate_maxles_and_minlua needs to use ...
exclude_nonprimary_shards

When an optimized EC pool is searching for the best shard that
isn't a non-primary shard then the calculation for maxles and
minlua needs to exclude non-primary shards.

This bug was seen in a test run where activating a PG was
interrupted by a new epoch and only a couple of non-primary
shards became active and updated les. In the next epoch
a new primary (without log) failed to find a shard that
wasn't non-primary with the latest les. The les of
non-primary shards should be ignored when looking for
an appropriate shard to get the full log from.

This is safe because an epoch cannot start I/O without
at least K shards that have updated les, and there
are always K-1 non-primary shards. If I/O has started
then we will find the latest les even if we skip
non-primary shards. If I/O has not started then the
latest les ignoring non-primary shards is the
last epoch in which I/O was started and has a good
enough log+missing list.

Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
(cherry picked from commit 72d55eec85afa4c00fac8dc18a1fb49751e61985)
(cherry picked from commit 43fdccd8911758df6833c1806e4c797b1bced9a8)
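
The exclusion described above can be sketched for the maxles half of the calculation. The mapping from shard id to les and the set of non-primary shard ids are assumed inputs, and the function is a hypothetical illustration rather than the real calculate_maxles_and_minlua:

```python
from typing import Dict, Set


def calculate_maxles(les_by_shard: Dict[int, int],
                     nonprimary_shards: Set[int],
                     exclude_nonprimary_shards: bool) -> int:
    """Return the highest les, optionally ignoring non-primary shards."""
    candidates = [
        les for shard, les in les_by_shard.items()
        if not (exclude_nonprimary_shards and shard in nonprimary_shards)
    ]
    return max(candidates, default=0)
```

In the interrupted-activation case from the message, only non-primary shards carry the newer les, so excluding them yields the les of the last epoch in which I/O was started.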

commit | commitdiff | tree

Bill Scales [Fri, 1 Aug 2025 09:39:16 +0000 (10:39 +0100)]

osd: Optimized EC choose_async_recovery_ec must use auth_shard

Optimized EC pools modify how GetLog and choose_acting work.
If the auth_shard is a non-primary shard and the (new) primary
is behind the auth_shard then we cannot just get the log from
the non-primary shard because it will be missing entries for
partial writes. Instead we need to get the log from a shard
that has the full log first and then repeat GetLog to get
the log from the auth_shard.

choose_acting was modifying auth_shard in the case where
we need to get the log from another shard first. This is
wrong - the remainder of the logic in choose_acting and
in particular choose_async_recovery_ec needs to use the
auth_shard to calculate what the acting set will be.
Using a different shard can occasionally cause a
different acting set to be selected (because of
thresholds on how many log entries behind a shard
needs to be to perform async recovery) and
this can lead to two shards flip/flopping with
different opinions about what the acting set should be.

Fix is to separate out which shard will be returned
to GetLog from the auth_shard which will be used
for acting set calculations.

Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
(cherry picked from commit 3c2161ee7350a05e0d81a23ce24cd0712dfef5fb)
(cherry picked from commit 56fbf22db3b393fdcf69442b23fae7f694fdef89)

commit | commitdiff | tree

Bill Scales [Fri, 1 Aug 2025 09:22:47 +0000 (10:22 +0100)]

osd: Optimized EC don't try to trim past crt

If there is an exceptionally long sequence of partial writes
that did not update a shard that is followed by a full write
then it is possible that the log trim point is ahead of the
previous write to the shard (and hence crt). We cannot trim
beyond crt. In this scenario it's fine to limit the trim to crt
because the shard doesn't have any of the log entries for the
partial writes so there is nothing more to trim.

Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
(cherry picked from commit 645cdf9f61e79764eca019f58a4d9c6b51768c81)
(cherry picked from commit d85ea954e9ebd8ea7135bc7eecaac41c66e4a7fc)
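
The limit described above amounts to clamping the trim point. A minimal sketch, assuming both values are (epoch, version) tuples that compare lexicographically; the function name is hypothetical:

```python
from typing import Tuple

Version = Tuple[int, int]  # (epoch, version)


def clamp_trim_to(trim_to: Version, crt: Version) -> Version:
    """Never trim beyond crt; otherwise keep the requested trim point."""
    return min(trim_to, crt)
```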

commit | commitdiff | tree

Bill Scales [Fri, 1 Aug 2025 08:56:23 +0000 (09:56 +0100)]

osd: Optimized EC missing call to apply_pwlc after updating pwlc

update_peer_info was updating pwlc with a newer version received
from another shard, but failed to update the peer_info's to
reflect the new pwlc by calling apply_pwlc.

Scenario was primary receiving an update from shard X which had
newer information about shard Y. The code was calling apply_pwlc
for shard X but not for shard Y.

The fix simplifies the logic in update_peer_info - if we are
the primary update all peer_info's that have pwlc. If we
are a non-primary and there is pwlc then update info.

Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
(cherry picked from commit d19f3a3bcbb848e530e4d31cbfe195973fa9a144)
(cherry picked from commit 7e4a694c7fa688a6c32149526a45c5ef610df472)

commit | commitdiff | tree

Bill Scales [Wed, 30 Jul 2025 11:44:10 +0000 (12:44 +0100)]

osd: Optimized EC don't apply pwlc for divergent writes

Split pwlc epoch into a separate variable so that we
can use epoch and version number when comparing if
last_update is within a pwlc range. This ensures that
pwlc is not applied to a shard that has a divergent
write, but still tracks the most recent update of pwlc.

Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
(cherry picked from commit d634f824f229677aa6df7dded57352f7a59f3597)
(cherry picked from commit 286579f5995e85cf474dc9d53c501dda1c0f0f2b)

commit | commitdiff | tree

Bill Scales [Wed, 30 Jul 2025 11:41:34 +0000 (12:41 +0100)]

osd: Optimized EC present_shards no longer needed

present_shards is no longer needed in the PG log entry, this has been
replaced with code in proc_master_log that calculates which shards were
in the last epoch started and are still present.

Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
(cherry picked from commit 880a17e39626d99a0b6cc8259523daa83c72802c)
(cherry picked from commit 8183125027ff369eb3408e9c251b770adf137003)

commit | commitdiff | tree

Bill Scales [Mon, 28 Jul 2025 08:26:36 +0000 (09:26 +0100)]

osd: Optimized EC proc_master_log fix roll-forward logic when shard is absent

Fix bug in optimized EC code where proc_master_log incorrectly did not
roll forward a write if one of the written shards is missing in the current
epoch and there is a stray version of that shard that did not receive the
write.

As long as the currently present shards that participated in les and were
updated by a write have the update then the write should be rolled-forward.

Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
(cherry picked from commit e0e8117769a8b30b2856f940ab9fc00ad1e04f63)
(cherry picked from commit f6657b0de958a89d2b4da5011bddba7b59860544)

commit | commitdiff | tree

Bill Scales [Mon, 28 Jul 2025 08:21:54 +0000 (09:21 +0100)]

osd: Refactor find_best_info and choose_acting

Refactor find_best_info to have separate function to calculate
maxles and minlua. The refactor makes history_les_bound
optional and tidies up the choose_acting interface, removing this
where it is not used.

Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
(cherry picked from commit f1826fdbf136dc7c96756f0fb8a047c9d9dda82a)
(cherry picked from commit 0c94c96ac3ab068221e3f6ddc93a185f4f50e259)

commit | commitdiff | tree

Bill Scales [Thu, 17 Jul 2025 18:17:27 +0000 (19:17 +0100)]

osd: EC Optimizations proc_master_log boundary case bug fixes

Fix a couple of bugs in proc_master_log for optimized EC
pools dealing with boundary conditions such as an empty
log and merging two logs that diverge from the very first
entry.

Refactor the code to handle the boundary conditions and
neaten up the code.

Predicate the code block with if (pool.info.allows_ecoptimizations())
to make it clear this code path is only for optimized EC pools.

Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
(cherry picked from commit 1b44fd9991f5f46b969911440363563ddfad94ad)
(cherry picked from commit b9ecda4c9da4868ca441d4316053215a83e39ef5)

commit | commitdiff | tree

Jon Bailey [Fri, 25 Jul 2025 13:16:35 +0000 (14:16 +0100)]

osd: Invalidate stats during peering if we are rolling a shard forwards.

This change means we always recalculate stats when rolling a shard forwards. This prevents the situation where we end up with incorrect statistics because we always take the stats of the oldest shard during peering. That caused outdated pg stats to be applied in cases where the oldest shards are shards that did not see partial writes, and num_bytes has changed elsewhere since the point those shards were last updated.

Signed-off-by: Jon Bailey <jonathan.bailey1@ibm.com>
(cherry picked from commit b178ce476f4a5b2bb0743e36d78f3a6e23ad5506)
(cherry picked from commit 07d8434b749577b3aa2dd39d68cb2cddcdc3570e)

commit | commitdiff | tree

Radoslaw Zarzynski [Wed, 21 May 2025 16:33:15 +0000 (16:33 +0000)]

osd: ECTransaction.h includes OSDMap.h

Needed for crimson.

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
(cherry picked from commit 6dd393e37f6afb9063c4bed3e573557bd0efb6bd)
(cherry picked from commit bc5025151e1ab93fa0eb1ea96be72ab8b8260815)

commit | commitdiff | tree

Radoslaw Zarzynski [Mon, 21 Apr 2025 08:49:55 +0000 (08:49 +0000)]

osd: bypass messenger for local EC reads

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
(cherry picked from commit b07d1f67625c8b621b2ebf5a7f744c588cae99d3)
(cherry picked from commit f28d21c9ca3cccf0c23b5e607ac609e718ecbcef)
