Soumya Koduri [Wed, 23 Jun 2021 18:00:11 +0000 (23:30 +0530)]
rgw/CloudTransition: Replace Coroutines with RGWRestConn APIs
To avoid the overhead of using coroutines during lifecycle transition,
the RGWRESTStream* APIs are used to transition objects to the remote cloud.
Also handled a few optimizations and cleanups, stated below:
* Store the list of cloud target buckets as part of LCWorker instead
of making it global. This list is maintained for the duration of
RGWLC::process(), after which it is discarded.
* Refactor code to remove coroutine based class definitions which are no
longer needed and use direct function calls instead.
* Check for cloud-transitioned objects using tier-type and return an error
if they are accessed in the RGWGetObj, RGWCopyObj and RGWPutObj ops.
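The per-worker scoping of the target-bucket list can be sketched as below. This is an illustrative model only; `LCWorker`, `ensure_target_bucket` and `process` here are simplified stand-ins, not Ceph's actual classes or signatures.

```python
# Hypothetical sketch: each LC worker keeps its own set of cloud target
# buckets for the duration of one process() run, instead of a
# process-wide global list.

class LCWorker:
    def __init__(self):
        self.cloud_targets = set()  # per-worker, not global

    def ensure_target_bucket(self, name):
        # attempt remote bucket creation only once per process() run
        if name not in self.cloud_targets:
            self.cloud_targets.add(name)
            return True   # creation attempted
        return False      # already known this run, skip

    def process(self, target_buckets):
        created = [self.ensure_target_bucket(b) for b in target_buckets]
        self.cloud_targets.clear()  # discarded once process() ends
        return created
```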
Soumya Koduri [Wed, 17 Mar 2021 21:12:54 +0000 (02:42 +0530)]
rgw/CloudTransition: Handle versioned objects
For versioned and locked objects, semantics similar to those of LifecycleExpiration are applied, as stated below -
If bucket versioning is enabled and the object transitioned to cloud is the
- current version: irrespective of the value of the config option "retain_object", the object is not deleted; instead a delete marker is created on the source RGW server.
- noncurrent version: it is deleted or retained based on the value of the config option "retain_object".
If the object is locked and is the
- current version: it is transitioned to cloud, after which it is made noncurrent with a delete marker created.
- noncurrent version: the transition is skipped.
Also misc rebase fixes and cleanup -
* Rename the config option to "retain_head_object"
to reflect its function: keeping the head object after transitioning
to cloud, if enabled
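The semantics above reduce to a small decision table, sketched below. The function and parameter names are assumptions for illustration, not Ceph's actual code; "retain_object" is used as in the text (renamed "retain_head_object" in this commit).

```python
# Illustrative decision table for post-transition handling of an object.
# All names here are hypothetical stand-ins for the real RGW logic.

def post_transition_action(versioned, is_current, locked, retain_object):
    if locked:
        if not is_current:
            return "skip-transition"          # locked noncurrent: skipped
        # locked current: transitioned, then made noncurrent
        return "make-noncurrent+delete-marker"
    if versioned and is_current:
        # retain_object is ignored for current versions
        return "delete-marker"
    return "retain-head" if retain_object else "delete"
```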
Soumya Koduri [Sun, 7 Mar 2021 14:14:36 +0000 (19:44 +0530)]
rgw/CloudTransition: Skip transition to cloud if the object is locked
If an object is locked, skip its transition to cloud.
@todo: Do we need special checks for bucket versioning too?
If the version is current, instead of deleting the data, do we need to create
a delete marker? And what about the case where retain_object is set to true?
Soumya Koduri [Fri, 26 Feb 2021 16:48:52 +0000 (22:18 +0530)]
rgw/CloudTransition: Change tier-type to cloud-s3
Currently the transition is supported only to cloud providers
that are compatible with AWS S3. Hence change the tier-type to
cloud-s3 to configure the S3-style endpoint details.
Soumya Koduri [Wed, 4 Nov 2020 18:24:47 +0000 (23:54 +0530)]
rgw/CloudTransition: Fail GET on cloud tiered objects
As per https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetObject.html
a GET operation may fail with an "InvalidObjectState" error if the
object is in the GLACIER or DEEP_ARCHIVE storage class and has not been
restored. The same can apply to cloud-tiered objects. However, STAT/HEAD
requests shall return the metadata stored.
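The intended behaviour can be sketched as below. This is a toy model, not the RGW request path: the op names, the dict-based object, and the (status, body) return shape are all illustrative assumptions; only the GET-fails/HEAD-succeeds split comes from the text.

```python
# Toy model: GET on a cloud-tiered object fails, while HEAD/STAT still
# return the locally stored metadata.

def handle_request(op, obj):
    tiered = obj.get("tier_type") == "cloud-s3"
    if op == "GET" and tiered:
        # data lives in the remote cloud tier, mirror S3's behaviour
        return (403, "InvalidObjectState")
    if op in ("HEAD", "STAT"):
        return (200, obj["metadata"])      # metadata is still local
    return (200, obj.get("data"))
```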
Soumya Koduri [Sun, 16 Aug 2020 09:01:50 +0000 (14:31 +0530)]
rgw/CloudTransition: Verify if the object is already tiered
Add a class to fetch headers from the remote endpoint and verify whether
the object is already tiered.
A few other fixes, stated below -
* Erase data in the head of cloud transitioned object
* 'placement rm' command should erase tier_config details
* A new option is added to the object manifest to denote whether the
object is tiered in multiple parts
Soumya Koduri [Tue, 18 Aug 2020 07:02:22 +0000 (12:32 +0530)]
rgw/CloudTransition: Store the status of multipart uploads
Store the status of multipart upload parts to verify that the object
hasn't changed during the transition; if it has, abort the upload.
Also avoid re-creating target buckets -
It's not ideal to try creating the target bucket for every object
transitioned to cloud. To avoid that, cache the bucket creations in
a map, with an expiry period of '2*lc_debug_interval' set for each
entry.
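The creation cache described above can be sketched as a TTL map. The class and method names are hypothetical; the injectable clock is only there to make the expiry behaviour easy to demonstrate.

```python
import time

# Hedged sketch of the target-bucket creation cache: an entry expires
# after 2 * lc_debug_interval, after which the bucket is (re)created.

class TargetBucketCache:
    def __init__(self, lc_debug_interval, clock=time.monotonic):
        self.ttl = 2 * lc_debug_interval
        self.clock = clock
        self.created = {}  # bucket name -> timestamp of last creation

    def needs_create(self, bucket):
        now = self.clock()
        ts = self.created.get(bucket)
        if ts is not None and now - ts < self.ttl:
            return False            # recently created/verified, skip
        self.created[bucket] = now  # (re)create and remember
        return True
```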
Soumya Koduri [Sun, 2 Aug 2020 19:54:19 +0000 (01:24 +0530)]
rgw/CloudTransition: Delete cloud tiered objects by default
Added a new option "retain_object" in tier_config which determines
whether a cloud-tiered object is deleted or its head object is
retained. By default the value is false, i.e., the objects get
deleted.
XXX: verify that if an object is locked (ATTR_RETENTION), the transition is
not processed. Also check whether the transition takes place separately for
each version.
rgw/CloudTransition: Update object metadata and bi post cloud transition
After transitioning the object to cloud, the following updates are made
to the existing object.
* In bi entry, change object category to CloudTiered
* Update cloud-tier details (like endpoint, keys etc) in Object Manifest
* Mark the tail objects expired to be deleted by gc
TODO:
* Update all the cloud config details including multiparts
* Check if any other object metadata needs to be changed
* Optimize to avoid using read_op again to read attrs.
* Check for mtime to resolve conflicts when multiple zones try to transition obj
Soumya Koduri [Wed, 23 Dec 2020 05:44:53 +0000 (11:14 +0530)]
rgw/CloudTransition: Tier objects to remote cloud
If the configured storage class is a cloud tier, transition
the objects to the configured remote endpoint.
If the object size exceeds the multipart size limit (say 5M),
upload the object in multiple parts.
As part of transition, map rgw attributes to http attrs,
including ACLs.
A new attribute (x-amz-meta-source: rgw) is added to denote
that the object is transitioned from RGW source.
Added two new options to the tier-config to configure multipart sizes -
* multipart_sync_threshold - determines the object size limit beyond
which the object is transitioned in multiple parts
* multipart_min_part_size - the minimum size of a multipart upload part
The default value for both options is 32M and the minimum supported
value is 5M.
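The interaction of the two options can be sketched as follows. This is a minimal model under stated assumptions; RGW's actual part sizing and rounding may differ, and `plan_upload` is a hypothetical name.

```python
# Illustrative sketch of how the two tier-config options interact.
MULTIPART_SYNC_THRESHOLD = 32 * 1024 * 1024  # default 32M
MULTIPART_MIN_PART_SIZE = 32 * 1024 * 1024   # default 32M
MIN_SUPPORTED = 5 * 1024 * 1024              # minimum supported: 5M

def plan_upload(obj_size,
                threshold=MULTIPART_SYNC_THRESHOLD,
                min_part=MULTIPART_MIN_PART_SIZE):
    if obj_size <= threshold:
        return ("single", 1)                 # one plain PUT
    part = max(min_part, MIN_SUPPORTED)      # clamp to supported minimum
    parts = -(-obj_size // part)             # ceiling division
    return ("multipart", parts)
```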
rgw/CloudTransition: Add new options to configure tier endpoint
As mentioned in https://docs.google.com/document/d/1IoeITPCF64A5W-UA-9Y3Vp2oSfz3xVQHu31GTu3u3Ug/edit,
the tier storage class will be configured at zonegroup level.
So the existing CLI "radosgw-admin zonegroup placement add <id> --storage-class <class>" will be
used to add tier storage classes as well but with extra tier-config options mentioned below -
Adam Kupczyk [Sat, 13 Nov 2021 10:28:18 +0000 (11:28 +0100)]
os/bluestore: Fix omap upgrade to per-pg scheme
This is a fix to a regression introduced by an earlier fix to the omap upgrade: https://github.com/ceph/ceph/pull/43687
The problem was that we always skipped the first omap entry.
This worked fine for objects having an omap header key, but for
objects without a header key we skipped the first actual omap key.
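A toy model of the bug and the fix, using an illustrative stand-in for the header key (the real BlueStore key layout is not reproduced here):

```python
HEADER_KEY = "-"  # hypothetical stand-in for the per-object omap header key

def omap_keys_buggy(entries):
    # old behaviour: unconditionally skip the first entry
    return entries[1:]

def omap_keys_fixed(entries):
    # fixed behaviour: skip the first entry only if it is the header
    if entries and entries[0] == HEADER_KEY:
        return entries[1:]
    return entries
```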
Fixes: https://tracker.ceph.com/issues/53260
Signed-off-by: Adam Kupczyk <akupczyk@redhat.com>
Ronen Friedman [Sat, 13 Nov 2021 14:12:57 +0000 (14:12 +0000)]
osd/scrub: removing some safeguards against out-of-order scrub calls
As m_scrubber is created in the PG ctor and only deleted in the dtor,
we should not encounter a scrub event dispatched to the PG without
having a valid scrubber sub-object.
Ronen Friedman [Mon, 9 Aug 2021 18:20:37 +0000 (18:20 +0000)]
osd/scrub: mark PG as being scrubbed, from scrub initiation to Inactive state
The scrubber's state-machine changes states only following a message dispatched
via the OSD queue. That creates vulnerability periods, from when the
decision to change the state is made until the message carrying the event
is dequeued and processed by the state-machine.
One of the problems thus created is a second scrub being started on a PG, before
the previous scrub is fully terminated and cleaned up.
Here we add a 'being-scrubbed' flag, that is asserted when the first scrub
initiation message is queued and is only cleared when the state machine reaches
Inactive state after the scrub is done.
To note: scrub_finish() is now part of the FSM transition from WaitDigest to Inactive,
closing this specific vulnerability period.
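The guard described above can be sketched as a tiny flag wrapper. This is illustrative only; the actual scrubber FSM and its class names differ.

```python
# Sketch of the 'being-scrubbed' flag: set when the first scrub
# initiation message is queued, cleared only once the state machine
# reaches Inactive after the scrub is done.

class PGScrubGuard:
    def __init__(self):
        self.being_scrubbed = False

    def try_start_scrub(self):
        if self.being_scrubbed:
            return False            # a scrub is already in flight
        self.being_scrubbed = True  # set at initiation-message queue time
        return True

    def on_fsm_inactive(self):
        # cleared only at the transition into the Inactive state
        self.being_scrubbed = False
```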
Greg Farnum [Thu, 11 Nov 2021 20:20:11 +0000 (20:20 +0000)]
mon: MonMap: do not increase mon_info_t's compatv in stretch mode, for real
This was supposed to be fixed a year ago in commit
2e3643647bfbe955b54c62c8aaf114744dedb86e, but that commit set compat_v
to 4 instead of all the way back to 1 as it should have.
Our testing for stretch mode in these areas is just not very thorough -- the
kernel only supports compat_v 1 and apparently nobody's noticed the issue
since then? :/
As the prior commit says, you can't set locations without being gated on a
server feature bit, so simply cancelling this enforcement is completely safe.
Casey Bodley [Tue, 9 Nov 2021 02:24:52 +0000 (21:24 -0500)]
cls/rgw: index cancelation still cleans up remove_objs
when multipart uploads complete their final bucket index transaction,
they pass the list of part objects in 'remove_objs' for bulk removal -
the part objects, along with their bucket stats, get replaced by the
head object
but if CompleteMultipart races with another upload, the head object
write will fail with ECANCELED and the bucket index transaction gets
canceled with CLS_RGW_OP_CANCEL. these canceled uploads still need to
clean up their 'remove_objs', but cancelation was returning too early.
as a result, these bucket index entries get orphaned and leave the
bucket stats inconsistent
this commit reworks rgw_bucket_complete_op() so that CLS_RGW_OP_CANCEL
is handled the same way as OP_ADD and OP_DEL, so always runs the loop to
clean up 'remove_objs'
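The control-flow change can be sketched as below. Names and the dict-based index are illustrative, not the cls_rgw code; the point is only that OP_CANCEL no longer returns before the remove_objs cleanup loop.

```python
# Toy model of rgw_bucket_complete_op(): CANCEL is handled like ADD and
# DEL, so the remove_objs cleanup loop always runs.

def complete_op(op, index, remove_objs):
    if op == "OP_ADD":
        index["entries"] += 1
    elif op == "OP_DEL":
        index["entries"] -= 1
    elif op == "OP_CANCEL":
        pass  # previously: an early return here skipped the cleanup below
    # cleanup runs for ADD, DEL and CANCEL alike
    for part in remove_objs:
        index["parts"].discard(part)
    return index
```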
Mark Kogan [Mon, 15 Nov 2021 15:50:49 +0000 (15:50 +0000)]
rgw/beast: stream timer with duration 0 disables timeout
fixes all S3 operations failing with:
`2021-11-15T15:46:05.992+0000 7ffee17fa700 20 failed to read header: Bad file descriptor`
when `--rgw_frontends="beast port=8000 request_timeout_ms=0"`
Sage Weil [Mon, 15 Nov 2021 15:43:32 +0000 (10:43 -0500)]
cephadm: only make_log_dir for ceph daemons
For non-ceph daemons, (1) they don't log to /var/log/ceph, and (2) the
container image isn't a ceph image, which means the uid/gid extraction
won't have the correct uid/gid and we'll end up with a log directory that
ceph daemons no longer have write permissions for.
Fixes: https://tracker.ceph.com/issues/53257
Signed-off-by: Sage Weil <sage@newdream.net>
The calls to remove a bucket had parameters to specify a prefix and
delimiter, which does not make sense. This was precipitated by some
existing Swift protocol logic, but buckets are removed irrespective of
prefix and delimiter. So the functions and calls are adjusted to
remove those parameters. Additionally, those same parameters were
removed from the call to abort incomplete multipart uploads.
Additionally, a bug is fixed in which, during bucket removal, multipart
uploads were removed only if the prefix was non-empty.
Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
Greg Farnum [Fri, 12 Nov 2021 23:05:02 +0000 (23:05 +0000)]
mon: MonMap: display disallowed_leaders whenever they're set
In c59a6f89465e3933631afa2ba92e8c1ae1c31c06, I erroneously changed
the CLI display output so it would only dump disallowed_leaders in
stretch mode. But they can also be set in the connectivity or disallow
election modes, and we want users to be able to see them then as well.
J. Eric Ivancich [Thu, 11 Nov 2021 22:20:55 +0000 (17:20 -0500)]
rgw: make some logging easier to read
While __PRETTY_FUNCTION__ includes more information, it can clutter
the logs. So this reverts some uses of __PRETTY_FUNCTION__ back to
__func__.
I'm thinking that a strategy going forward is for function-entry
logging to use __PRETTY_FUNCTION__ to disambiguate overloaded
functions, but for all other logging in the function to simply use __func__.
Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
J. Eric Ivancich [Thu, 11 Nov 2021 16:10:17 +0000 (11:10 -0500)]
rgw: add ability to easily display ListParams
During debugging it can be useful to see all the contents of
rgw::sal::Bucket::ListParams. This allows the structure to be dumped
to an output stream in human-readable format.
Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
J. Eric Ivancich [Thu, 11 Nov 2021 22:20:24 +0000 (17:20 -0500)]
rgw: fix `bi put` not using right bucket index shard
When `radosgw-admin bi put` adds an entry for an incomplete multipart
upload, the bucket index shard is not calculated correctly. It should
be based on the name of the ultimate object; however, the calculation
was including the added organizational suffixes as well. This corrects
that.
NOTE: When entries are not put in the correct index shard, unordered
listing becomes unreliable, perhaps causing entries to be skipped or
infinite loops to form.
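The fix can be illustrated with a toy shard calculation. The crc32-based hash, the suffix format, and the function names here are stand-ins; only the choice of which name gets hashed reflects the commit.

```python
import zlib

# Toy illustration: the shard must be derived from the ultimate object
# name, not from the bucket-index entry name with its organizational
# multipart suffix attached.

def shard_for(key, num_shards):
    return zlib.crc32(key.encode()) % num_shards

def bi_put_shard(object_name, suffixed_entry_name, num_shards, fixed=True):
    # fixed behaviour hashes the ultimate object name; the buggy
    # behaviour hashed the full suffixed entry name
    key = object_name if fixed else suffixed_entry_name
    return shard_for(key, num_shards)
```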
Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>