Aashish Sharma [Fri, 4 Oct 2024 10:54:02 +0000 (16:24 +0530)]
mgr/dashboard: increase timeout to detect replication user in the secondary cluster
Increase the timeout for detecting the replication user in the secondary cluster in the RGW multisite automation wizard. Currently it is set to 2 minutes; increase it to 5 minutes.
When you import the realm token to the secondary cluster, we wait for the replication/system user created in the primary cluster to be present in the secondary cluster, and when we find that user we set the credentials in the secondary cluster using `ceph dashboard set-rgw-credentials`. The timeout for this is set to 2 minutes, and sometimes it takes more than 2 minutes for the user to be replicated to the secondary cluster.
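A minimal sketch of the polling loop involved, in Python, with hypothetical helper names (`check_replication_user`, `set_rgw_credentials`) standing in for the dashboard's actual calls:

```python
import time

# Hypothetical sketch: poll the secondary cluster until the replication/system
# user created on the primary has been replicated, then set the credentials.
REPLICATION_USER_TIMEOUT = 5 * 60  # raised from 2 minutes to 5 minutes
POLL_INTERVAL = 5                  # seconds between checks

def wait_for_replication_user(check_replication_user, set_rgw_credentials) -> bool:
    deadline = time.monotonic() + REPLICATION_USER_TIMEOUT
    while time.monotonic() < deadline:
        if check_replication_user():
            # equivalent of `ceph dashboard set-rgw-credentials` once the user exists
            set_rgw_credentials()
            return True
        time.sleep(POLL_INTERVAL)
    return False
```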
Zac Dover [Fri, 4 Oct 2024 13:21:32 +0000 (23:21 +1000)]
doc/governance: add exec council responsibilities
Add the Ceph Executive Council's responsibilities to the
doc/governance.rst document. It was decided during the weekly CLT
meeting on 30 Sep 2024 to add this to the ceph/ceph git repository.
mgr/smb: fix condition for smb earmark when cluster_id doesn't match
This commit resolves an issue where accessing `earmark.split('.')[2]` would cause a
"list index out of range" error when the earmark is set to just "smb" without additional scopes.
The fix introduces a parsing function to safely handle earmarks, ensuring proper behavior
even when no cluster ID or additional scopes are present.
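A minimal sketch of such a parsing helper, assuming a dot-separated earmark such as "smb" or "smb.cluster.<cluster_id>" (illustrative names, not the module's actual API):

```python
from typing import Optional

def parse_earmark_cluster_id(earmark: str) -> Optional[str]:
    """Return the cluster ID from an earmark, or None if no scope is present."""
    parts = earmark.split('.')
    if len(parts) < 3:
        # A bare "smb" earmark carries no cluster ID; indexing parts[2] here
        # previously raised "list index out of range".
        return None
    return parts[2]

# Example: parse_earmark_cluster_id("smb") -> None
#          parse_earmark_cluster_id("smb.cluster.mycluster") -> "mycluster"
```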
John Mulligan [Tue, 1 Oct 2024 15:27:44 +0000 (11:27 -0400)]
cephadm: use a shared smb.conf for clustered smb container sets
Use a shared smb.conf when deploying CTDB-enabled containers. There was
a problem updating configs on CTDB-enabled clusters: the configwatch
sidecar was not using CTDB; instead it had a "default" copy of smb.conf
that enabled only registry config, but not CTDB. Examining the cluster
showed that this problem was common to all sambacc-based sidecars (those
not starting smbd, winbindd, etc.) as well as the smbmetrics sidecar.
Fixes: https://tracker.ceph.com/issues/68322
Signed-off-by: John Mulligan <jmulligan@redhat.com>
RGW: Cloud Restore CLI and its corresponding responses for the user.
* For the first and for repeated requests, 202 Accepted is the response code.
* For CloudRestored status, 200 OK is the response code.
* For conflicting requests, 409 Conflict is the response code.
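As a rough illustration of the response-code mapping above (hypothetical names, not the actual RGW handler code):

```python
from enum import Enum
from http import HTTPStatus

class RestoreStatus(Enum):
    NONE = 0
    IN_PROGRESS = 1
    CLOUD_RESTORED = 2

def restore_response_code(status: RestoreStatus, conflicting: bool) -> HTTPStatus:
    # Hypothetical mapping of restore state to the HTTP status returned to the user.
    if conflicting:
        return HTTPStatus.CONFLICT   # 409 for conflicting requests
    if status is RestoreStatus.CLOUD_RESTORED:
        return HTTPStatus.OK         # 200 once the object is CloudRestored
    return HTTPStatus.ACCEPTED       # 202 for the first and for repeated requests
```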
Also fixed the storage class update while listing objects.
Earlier, while an object was temporarily restored, list-objects (s3api) and
radosgw-admin bucket list did not show the updated storage class. With this
fix, they now show the cloudtier storage class.
* It allows read-through for cloud-tiered objects via restore_obj_from_cloud.
* New tier config options: the user needs to set allow_read_through to true and
read_through_restore_days to more than 1 for this feature to work; also, only
objects with retain_head_object set are available for this feature.
* The first GET request fails with a "restoring in progress" error; objects
are downloaded asynchronously.
* The object restores are temporary.
* Tested `aws s3api get-object`, `aws s3api head-object` and `aws s3 cp`.
In addition, timeout errors are sent for the first read-through request.
Also addressed lint warnings and other cleanup (review comments).
Signed-off-by: Jiffin Tony Thottan <thottanjiffin@gmail.com>
Soumya Koduri [Thu, 3 Oct 2024 02:33:20 +0000 (08:03 +0530)]
rgw/cloudtier: Restore object from cloud endpoint
1) Add functionality to restore cloud-transitioned objects on demand.
The current commit includes the following:
* Given <bucket, object>, fetch the object from the cloud endpoint.
* If <days> is provided and > 0, the restore is marked temporary, with an expiry date.
* Without <days>, it is marked as a permanent restore.
2) Use the ObjectExpirer/delete_at attr to delete temporary objects.
For temporarily restored objects, set the delete_at attr to the expiration time.
This adds those objects to the ObjectExpirer list. The LC worker thread is used to
scan that list and delete expired objects. "Delete" here means deleting the
restored object data and resetting the HEAD object to the cloud-transitioned
object it was before the restore.
In addition, the below changes are done:
* If temporary, the object is still marked RGWObj::CloudTiered and its mtime is set to
the transition time.
* If permanent, the object is marked RGWObj::Main and its mtime is set to the restore time (now()).
* The rgw_restore_debug_interval option is added to configure restore days (similar to rgw_lc_debug_interval).
There is an issue in the ObjectExpirer code: if an object is added
to the ObjectExpirer list and is then re-written, it is not removed from the expirer
list, and hence the new object may get deleted. Fixed the same and also addressed
minor review comments.
3) Design doc added.
4) ObjCategory should be set to CloudTiered only for cloud-transitioned
objects and temporarily restored objects. Permanent copies are to be
treated as regular objects (see the sketch below).
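A simplified Python sketch of the temporary-vs-permanent bookkeeping described above, with hypothetical attribute names (the actual implementation lives in RGW's C++ code):

```python
from datetime import datetime, timedelta
from typing import Optional

def apply_restore_attrs(obj_attrs: dict, transition_time: datetime,
                        days: Optional[int]) -> None:
    # Hypothetical sketch: temporary restores get a delete_at attribute (picked
    # up by the ObjectExpirer / LC worker); permanent restores become regular objects.
    now = datetime.utcnow()
    if days and days > 0:
        obj_attrs['category'] = 'CloudTiered'                 # still counted as cloud-tiered
        obj_attrs['mtime'] = transition_time                  # mtime kept at transition time
        obj_attrs['delete_at'] = now + timedelta(days=days)   # expirer cleans it up later
    else:
        obj_attrs['category'] = 'Main'                        # permanent: a regular object
        obj_attrs['mtime'] = now                              # mtime set to restore time
        obj_attrs.pop('delete_at', None)
```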
Dan Mick [Wed, 26 Jun 2024 02:07:41 +0000 (19:07 -0700)]
Add Containerfile and build.sh to build it.
The intent is to replace ceph-container.git, at first for CI containers
only, and eventually production containers as well.
There is code present for production containers, including
a separate "make-manifest-list.py" to scan for and glue the two
arch-specific containers into a 'manifest-list' 'fat' container,
but that code is not yet fully tested.
This code will not be used until a corresponding change to the
Jenkins jobs in ceph-build.git is pushed.
Note that this tooling does not authenticate to the container repo;
it is assumed that this will be done elsewhere. Authentication is
verified by pushing a minimal image to the requested repo.
mgr/dashboard: Allow adding all listeners under a subsystem
Issue:
- Currently a user cannot add all listeners under a subsystem
- This results in an error: `Failure adding nqn.2001-07.com.ceph:1725013182540 listener at 10.70.44.140:4420: Gateway's host name must match current host (dhcp47-54)`
Reason:
- The gateway address used while creating a listener is currently chosen at random in the nvmeof client.
- After checking the gateway logs of each node, it was found that no gRPC request for adding a listener was received on the respective node; the request instead went to the node chosen by default in the nvmeof client.
- But the nvmeof backend checks that the current gateway matches the one sent in the add-listener request (ref: https://github.com/ceph/ceph-nvmeof/blob/devel/control/grpc.py#L2104)
Fix:
- Use `traddr` from the listener API to set the current gateway address.
- Since `traddr` gives only the IP address without the port, the full address is extracted from `NvmeofGatewaysConfig.get_gateways_config()`.
- This ensures the correct path is used (sketched below).
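A rough sketch of the address resolution in the fix, assuming a hypothetical shape for the data returned by `NvmeofGatewaysConfig.get_gateways_config()`:

```python
from typing import Optional

def resolve_gateway_addr(traddr: str, gateways_config: dict) -> Optional[str]:
    # Hypothetical sketch: `traddr` from the listener API is only an IP, so look
    # up the matching "ip:port" entry in the gateways config to pick the gateway.
    for gateways in gateways_config.get('gateways', {}).values():
        for gw in gateways:
            service_url = gw.get('service_url', '')  # e.g. "10.70.44.140:5500"
            if service_url.split(':')[0] == traddr:
                return service_url
    return None
```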
This effort was started a long time ago by Mike Perez.[1] I have now completed the remaining steps to achieve the Passing level of the OpenSSF Best Practices Badge.[2]
It should be used as an opportunity to implement best practices in the Ceph community. For example, the Passing level was achieved without meeting the optional compliance with Dynamic Analysis or Security Scanning.
- When no group is set, a bad request is made (`?gw_group=null`).
- This also causes a dashboard exception to be raised.
- Avoid sending any request when no group is present.
- Fixes casing of "hostname" in the gateway column.
doc/rados: edit "Placement Groups Never Get Clean"
Make grammar improvements (and correct a verb disagreement) in the
section "Placement Groups Never Get Clean" in
doc/rados/troubleshooting/troubleshooting-pg.rst.
* the steps performed by the Windows CI job
* artifact structure
* frequently asked questions
The document is meant to assist the Ceph developers in investigating
CI failures. This is especially important as the Windows CI job runs
integration tests that would otherwise only be executed by
Teuthology, thus helping catch potential regressions quickly.
Note that the identified regressions are not necessarily Windows-specific;
they usually affect Linux builds as well.
Prachi Goel [Sun, 11 Aug 2024 17:05:59 +0000 (22:35 +0530)]
mgr/dashboard: rbd table actions enhancements
Fixes: https://tracker.ceph.com/issues/67198
Below are the changes:
- The Delete option is moved to last in the list.
- Delete is disabled when the image is secondary.
- Move to Trash is now the second-to-last option.
- A tooltip is added for ‘Move to Trash’ and ‘Delete’.
- A tooltip is added for the Copy option.
- A subheading is added for the Copy form.
- A tooltip is added for the ‘Flatten’ option.
- The Delete tooltip content has been updated for the case where the image is primary.
Leonid Chernin [Thu, 26 Sep 2024 13:47:00 +0000 (13:47 +0000)]
mon/nvmeofgw*: fix tracking gateways in DELETING state
1. Ignore subsystems of GWs in the DELETING state when calculating the number of namespaces.
2. Always call the tracking function in the monitor's tick, not just when the
beacon is active.
Signed-off-by: Leonid Chernin <leonidc@il.ibm.com>
Instead of calling ioctx.get_last_version() after a rados operation,
callers now pass a version_t* as an output parameter. In the null_yield
case, that version is assigned from ioctx.get_last_version() as normal. In
the async case, we get the version out of librados::async_operate()'s
return value.
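An illustration of the pattern in Python with hypothetical names (the actual change is in RGW's C++ rados layer):

```python
from typing import Tuple

def operate_with_version(ioctx, op) -> Tuple[object, int]:
    # Hypothetical illustration: return the object version together with the
    # result instead of having callers read ioctx state afterwards (the C++
    # code passes a version_t* output parameter for the same purpose).
    result = ioctx.operate(op)          # stand-in for the rados operation
    version = ioctx.get_last_version()  # captured here, not by the caller
    return result, version
```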
In bucket incremental sync, if a generation other than the current generation is requested,
we mark it with -EAGAIN and retry it via the error repo in RGWDataSyncSingleEntry(). Within this block,
we check whether the requested generation is less than the current gen, write it to the error repo,
set -EAGAIN, and then write it to the error repo once again in the outer function. Don't duplicate the
error repo entry addition for this condition.
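A simplified Python sketch of the de-duplication (hypothetical names; the actual logic is in RGWDataSyncSingleEntry() in C++):

```python
EAGAIN = 11

def handle_requested_generation(requested_gen: int, current_gen: int,
                                error_repo: set, entry_key: str) -> int:
    # Hypothetical sketch: when an older generation is requested, record it in
    # the error repo once and return -EAGAIN; the outer retry path must not add
    # a second, duplicate error-repo entry for the same condition.
    if requested_gen < current_gen:
        error_repo.add(entry_key)  # single error-repo write for this entry
        return -EAGAIN             # caller retries later via the error repo
    return 0
```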