From: Radosław Zarzyński
Date: Wed, 22 May 2024 13:33:23 +0000 (+0200)
Subject: qa: test-erasure-eio.sh honors the EC partial read support
X-Git-Tag: v20.0.0~1565^2~4
X-Git-Url: http://git-server-git.apps.pok.os.sepia.ceph.com/?a=commitdiff_plain;h=1308da3a8800430e77b64ce33e9223115388e92f;p=ceph.git

qa: test-erasure-eio.sh honors the EC partial read support

This is supposed to fix:

```
2024-05-15T01:19:55.945 INFO:tasks.workunit.client.0.smithi190.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/erasure-code/test-erasure-eio.sh:243: rados_get_data_bad_size: rados_get td/test-erasure-eio pool-jerasure obj-size-81362-1-10 fail
2024-05-15T01:19:55.946 INFO:tasks.workunit.client.0.smithi190.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/erasure-code/test-erasure-eio.sh:104: rados_get: local dir=td/test-erasure-eio
2024-05-15T01:19:55.946 INFO:tasks.workunit.client.0.smithi190.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/erasure-code/test-erasure-eio.sh:105: rados_get: local poolname=pool-jerasure
2024-05-15T01:19:55.946 INFO:tasks.workunit.client.0.smithi190.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/erasure-code/test-erasure-eio.sh:106: rados_get: local objname=obj-size-81362-1-10
2024-05-15T01:19:55.946 INFO:tasks.workunit.client.0.smithi190.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/erasure-code/test-erasure-eio.sh:107: rados_get: local expect=fail
2024-05-15T01:19:55.946 INFO:tasks.workunit.client.0.smithi190.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/erasure-code/test-erasure-eio.sh:112: rados_get: '[' fail = fail ']'
2024-05-15T01:19:55.946 INFO:tasks.workunit.client.0.smithi190.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/erasure-code/test-erasure-eio.sh:114: rados_get: rados --pool pool-jerasure get obj-size-81362-1-10 td/test-erasure-eio/COPY
2024-05-15T01:19:56.175 INFO:tasks.workunit.client.0.smithi190.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/erasure-code/test-erasure-eio.sh:115: rados_get: return
2024-05-15T01:19:56.175 INFO:tasks.workunit.client.0.smithi190.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/erasure-code/test-erasure-eio.sh:243: rados_get_data_bad_size: return 1
2024-05-15T01:19:56.175 INFO:tasks.workunit.client.0.smithi190.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/erasure-code/test-erasure-eio.sh:323: TEST_rados_get_bad_size_shard_1: return 1
2024-05-15T01:19:56.175 INFO:tasks.workunit.client.0.smithi190.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/erasure-code/test-erasure-eio.sh:41: run: return 1
```
(https://pulpito.ceph.com/rzarzynski-2024-05-14_22:09:16-rados-wip-osd-ec-partial-reads-distro-default-smithi/7706517/)

The failed scenario exercised a behavior that genuinely changed with
the introduction of partial reads. Before, regardless of the read size,
the OSD always read the entire stripe and checked all of it for errors.
In this test the first 4 KB is read from an EC pool with m=2 k=1 while
errors have been injected into shards 1 and 2. Handling the first 4 KB
doesn't really require the damaged shards but, because of the
full-stripe alignment, EIO was returned. This is no longer the case.
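As a rough illustration of why the 4 KB read no longer needs the
damaged shards, here is a minimal sketch of the shard arithmetic. The
helper `shards_for_read` is hypothetical and not part of
test-erasure-eio.sh. It assumes chunks are laid out round-robin over
the data shards, and the example values (two data shards, 4 KB chunks)
are taken from the 8 KB stripe mentioned in the test comments rather
than from the OSD code:

```shell
# Hypothetical helper, not part of test-erasure-eio.sh.
# Prints the data shards touched by reading <length> bytes at <offset>,
# assuming chunks of <chunk_size> bytes are laid out round-robin over
# <k> data shards (an assumption made for illustration only).
shards_for_read() {
    local k=$1 chunk_size=$2 offset=$3 length=$4
    local first_chunk=$(( offset / chunk_size ))
    local last_chunk=$(( (offset + length - 1) / chunk_size ))
    local shard=0 nearest out=""
    while [ "$shard" -lt "$k" ]; do
        # lowest chunk index >= first_chunk that is stored on this shard
        nearest=$(( first_chunk + (shard - first_chunk % k + k) % k ))
        if [ "$nearest" -le "$last_chunk" ]; then
            out="${out:+$out }$shard"
        fi
        shard=$(( shard + 1 ))
    done
    echo "$out"
}
```

Under these assumptions `shards_for_read 2 4096 0 4096` prints `0`,
so a partial read of the first 4 KB only touches shard 0, while
`shards_for_read 2 4096 0 8192` prints `0 1`, which is what a
full-stripe read always touched before.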
Signed-off-by: Radosław Zarzyński
---

diff --git a/qa/standalone/erasure-code/test-erasure-eio.sh b/qa/standalone/erasure-code/test-erasure-eio.sh
index 42c538eb9184..4c23b4b4488f 100755
--- a/qa/standalone/erasure-code/test-erasure-eio.sh
+++ b/qa/standalone/erasure-code/test-erasure-eio.sh
@@ -178,9 +178,19 @@ function rados_put_get_data() {
         wait_for_clean || return 1
         # Won't check for eio on get here -- recovery above might have fixed it
     else
-        shard_id=$(expr $shard_id + 1)
-        inject_$inject ec data $poolname $objname $dir $shard_id || return 1
-        rados_get $dir $poolname $objname fail || return 1
+        local another_shard_id=$(expr $shard_id + 1)
+        inject_$inject ec data $poolname $objname $dir $another_shard_id || return 1
+        if [ $shard_id -eq 1 -a $another_shard_id -eq 2 ];
+        then
+            # we're reading 4 kb long object while the stripe size is 8 kb.
+            # as we do partial reads and this request can be satisfied
+            # from the undamaged shard 0, we expect a success.
+            rados_get $dir $poolname $objname || return 1
+        else
+            # both shards 0 and 1 are damaged. there is no way to serve
+            # the requests, regardless of partial reads
+            rados_get $dir $poolname $objname fail || return 1
+        fi
         rm $dir/ORIGINAL
     fi
 
@@ -238,9 +248,19 @@ function rados_get_data_bad_size() {
     rados_get $dir $poolname $objname || return 1
 
     # Leave objname and modify another shard
-    shard_id=$(expr $shard_id + 1)
-    set_size $objname $dir $shard_id $bytes $mode || return 1
-    rados_get $dir $poolname $objname fail || return 1
+    local another_shard_id=$(expr $shard_id + 1)
+    set_size $objname $dir $another_shard_id $bytes $mode || return 1
+    if [ $shard_id -eq 1 -a $another_shard_id -eq 2 ];
+    then
+        # we're reading 4 kb long object while the stripe size is 8 kb.
+        # as we do partial reads and this request can be satisfied
+        # from the undamaged shard 0, we expect a success.
+        rados_get $dir $poolname $objname || return 1
+    else
+        # both shards 0 and 1 are damaged. there is no way to serve
+        # the requests, regardless of partial reads
+        rados_get $dir $poolname $objname fail || return 1
+    fi
     rm $dir/ORIGINAL
 }
 