Leonid Usov [Thu, 28 Mar 2024 05:32:26 +0000 (01:32 -0400)]
mds/quiesce: prevent an overflow of the wait duration
QuiesceTimeInterval::max() may overflow inside a call to
std::condition_variable::wait_for, making the call time out
immediately and resulting in a busy-loop.
The solution is to cap the wait duration to a value which can
certainly fit in whichever clock the std library uses internally.
Fixes: https://tracker.ceph.com/issues/65276
Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
(cherry picked from commit 508e870ee383265b8489e18a4c73854616a4110a)
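
The overflow described above can be illustrated with plain integer arithmetic. This is a minimal sketch, not the Ceph code: it assumes a clock that counts signed 64-bit nanoseconds and computes its deadline as now + duration with wraparound, which is typical but not guaranteed behaviour. The names wrap_int64, deadline, capped_deadline and MAX_SAFE_WAIT_NS are hypothetical.

```python
INT64_MAX = 2**63 - 1

# Hypothetical cap: one year in nanoseconds, far below INT64_MAX.
MAX_SAFE_WAIT_NS = 365 * 24 * 3600 * 10**9


def wrap_int64(value: int) -> int:
    """Emulate two's-complement wraparound of a signed 64-bit integer."""
    return (value + 2**63) % 2**64 - 2**63


def deadline(now_ns: int, wait_ns: int) -> int:
    """Deadline as an assumed 64-bit clock would compute it: now + wait."""
    return wrap_int64(now_ns + wait_ns)


def capped_deadline(now_ns: int, wait_ns: int) -> int:
    """Same computation, but with the wait capped to a value that fits."""
    return deadline(now_ns, min(wait_ns, MAX_SAFE_WAIT_NS))


now_ns = 1_711_603_946 * 10**9  # an arbitrary "current time" in nanoseconds

# Uncapped: the deadline wraps into the past, so a wait would expire at once.
wrapped = deadline(now_ns, INT64_MAX)

# Capped: the deadline stays in the future.
safe = capped_deadline(now_ns, INT64_MAX)
```

With the maximum duration the uncapped deadline lands before now_ns, which is why a loop around such a wait would spin; the capped deadline stays ahead of now_ns.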
Niklas Hambüchen [Sat, 30 Mar 2024 16:42:48 +0000 (17:42 +0100)]
doc/rados/operations: Improve crush_location docs
* Fix incorrect syntax
* Use underscores for config options, as other Ceph docs do
* Fix incorrect statement that crush_location_hook adds fields; it replaces them
* Explain `root=default host=HOSTNAME` is not set if `crush_location` is given
* Remove duplication across sections
* Point out that `root=default` is important
Afreen [Fri, 1 Mar 2024 07:26:25 +0000 (12:56 +0530)]
mgr/dashboard: Locking improvements in bucket create form
Fixes https://tracker.ceph.com/issues/64658
- Addition of help texts
- Addition of info/warnings related to modes and versioning
- change of Locking section layout
- renaming locking to 'Object Locking'
- changes default retention period to 10
- edit bucket only shows lock when it's enabled
Adam King [Mon, 19 Feb 2024 16:14:11 +0000 (11:14 -0500)]
cephadm: create ceph-exporter sock dir if it's not present
Since this is usually /var/run/ceph/ which ends up getting
created by other daemons as well, it was common to see
ceph-exporter fail to deploy and then deploy fine
once other daemons had been deployed on the host. I don't see any
reason we can't just try to make the directory here instead
of bailing out.
Fixes: https://tracker.ceph.com/issues/64491
Signed-off-by: Adam King <adking@redhat.com>
(cherry picked from commit 862fca945f5bf48144b6a589f1d3cd971444daf7)
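
The "create it if it is missing" approach from the commit above can be sketched with the Python standard library. This is an illustration only, not the cephadm change itself; the helper name ensure_sock_dir and the temporary path used in the demo are hypothetical.

```python
import os
import tempfile


def ensure_sock_dir(path: str, mode: int = 0o755) -> str:
    """Create the socket directory if needed; succeed if it already exists."""
    # exist_ok=True makes this safe when another daemon created it first.
    os.makedirs(path, mode=mode, exist_ok=True)
    return path


# Demo in a throwaway location instead of the real /var/run/ceph/.
with tempfile.TemporaryDirectory() as tmp:
    sock_dir = os.path.join(tmp, "run", "ceph")
    ensure_sock_dir(sock_dir)
    ensure_sock_dir(sock_dir)  # second call is a no-op, not an error
    created = os.path.isdir(sock_dir)
```

Because the call is idempotent, the order in which daemons are deployed no longer matters for this directory.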
John Mulligan [Fri, 24 Nov 2023 19:45:34 +0000 (14:45 -0500)]
cephadm: call container daemon form prepare_data_dir
Instead of always climbing through an "if ladder" based on daemon type
variables we will have the option of using the common method provided
by container daemon form classes. This will initially be used by the
smb daemon. I don't have the energy to refactor all the existing stuff
at the moment.
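
The idea of replacing an "if ladder" with a common method can be sketched as follows. The class names, the method signature and the "etc" subdirectory are assumptions made for illustration; they are not taken from the cephadm source.

```python
import os
import tempfile


class ContainerDaemonForm:
    """Hypothetical stand-in for a container daemon form base class."""

    def prepare_data_dir(self, data_dir: str) -> None:
        """Default hook: nothing extra to prepare."""


class SMBLikeDaemon(ContainerDaemonForm):
    """Hypothetical daemon type that needs an extra subdirectory."""

    def prepare_data_dir(self, data_dir: str) -> None:
        os.makedirs(os.path.join(data_dir, "etc"), exist_ok=True)


def deploy(daemon: ContainerDaemonForm, data_dir: str) -> None:
    """Call the common hook instead of branching on the daemon type."""
    os.makedirs(data_dir, exist_ok=True)
    daemon.prepare_data_dir(data_dir)


with tempfile.TemporaryDirectory() as tmp:
    smb_dir = os.path.join(tmp, "smb.demo")
    deploy(SMBLikeDaemon(), smb_dir)
    smb_has_etc = os.path.isdir(os.path.join(smb_dir, "etc"))

    plain_dir = os.path.join(tmp, "plain.demo")
    deploy(ContainerDaemonForm(), plain_dir)
    plain_has_etc = os.path.isdir(os.path.join(plain_dir, "etc"))
```

Daemon types that are not yet refactored can keep the default no-op hook, so the caller needs no per-type branch for them.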
John Mulligan [Fri, 24 Nov 2023 19:45:15 +0000 (14:45 -0500)]
cephadm: add a prepare_data_dir method to container daemon form
The prepare_data_dir method is a general way for classes to prepare the
data dir (e.g. `/var/lib/ceph/$FSID/$DAEMON_TYPE.$DAEMON_ID`) before
containers will use it.
John Mulligan [Fri, 24 Nov 2023 19:45:01 +0000 (14:45 -0500)]
cephadm: allow passing pathlib.Path objects to file_utils.makedirs
Update the type annotations to allow passing pathlib.Path objects to the
makedirs function. All the calls makedirs uses already can accept Path
objects. This causes mypy to accept calling makedirs with a Path
argument and Paths are nice.
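
A widened annotation of this kind might look like the sketch below. The signature is simplified and hypothetical, not the one in file_utils; the point is only that os.makedirs already accepts path-like objects, so just the type hint has to change.

```python
import os
import tempfile
from pathlib import Path
from typing import Union


def makedirs(path: Union[str, Path], mode: int = 0o755) -> None:
    """Create a directory tree from either a str or a pathlib.Path."""
    # os.makedirs accepts any os.PathLike, so no conversion is needed.
    os.makedirs(path, mode=mode, exist_ok=True)


with tempfile.TemporaryDirectory() as tmp:
    as_path = Path(tmp) / "from_path" / "nested"
    as_str = os.path.join(tmp, "from_str", "nested")
    makedirs(as_path)
    makedirs(as_str)
    both_created = as_path.is_dir() and os.path.isdir(as_str)
```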
Patrick Donnelly [Mon, 25 Mar 2024 17:58:13 +0000 (13:58 -0400)]
Merge PR #56406 into squid
* refs/pull/56406/head:
doc/dev: update quiesce developer document
qa: wrap quiesce verification to dump debugging on error
qa: update quiesce tests for control via locallock
qa: set archive path in vstart_runner
qa: refactor CephFSMount.kill_background to optionally kill all background jobs
qa: use kwarg for rank parameter
qa: simplify calls to (rank|mds)_(tell|asok)
Revert "pybind/mgr/volumes: block quiesce for critical .meta file"
mds: remove is_root indication on quiesce_inode op
mds: prevent new lock cache cons when invalidating an existing one
mds: use XLOCK_WAIT For local lock xlockers
mds: prevent new wrlocks on LocalLock if there exists any xlock waiter
mds: block import discover when parent directory inode is quiesced
mds: avoid issuing exclusive caps to clients lacking w caps
mds: print lock cache during invalidation
mds: use inodeno_t to track quiesce requests
mds: dispatch quiesce_inode ops after dir traversal
mds: remove quiescelock handling for SimpleLock type
mds: quiescelock as local lock + cap masking
qa: run quiesce unit tests in fs:functional
qa: add quiesce protocol unit tests
qa: detect partial migrations during large config of dist epin
qa: use stdin-killer to timeout run_shell_payload
qa: simplify run_shell argument processing
doc: add dev docs for quiesce protocol
pybind/mgr/volumes: block quiesce for critical .meta file
mds: add vxattr to block quiesce on an inode
mds: convert encoded ephemeral dist pin to flags
mds: add counter to throttle quiesce
mds: add quiesce set feature flag
mds: skip non-head inodes for quiesce
mds: add quiesce op
mds: print all SimpleLock flags in debug output
mds: pretty print mutation when dumping lock
mds: add new inode quiescelock
mds: use 128 bits for waiters on MDSCacheObject
mds: provide mechanism to authpin while freezing
mds: add command to get specific op
mds: finish request before completing internal req
mds: complete internal op if killed
mds: avoid killing dead requests
mds: add command to kill request
mds: add path argument to `ops` and `dump tree` to stream result to local file
mds: print internal_request filepaths if present
mds: add more information to debug message
mds: remove redundant parenthesis
mds: implement Mutation::dump method
mds: make LockType fields const
mds: annotate mdr with try_rdlock_snap_layout failure
mds: refactor if into switch
mds: call Locker method using this
mds: simplify assert
mds: dump locks passed to Locker::acquire_locks
mds: add LockOp::print method for debugging
mds: use new insert template via print
mds: add request result to mutation for analysis by tests
mds: add comment on locking order rules
mds: allow specifying rdlock position
mds: remove dead method
common: provide a template for object dumps
common: support long running ops without slow warnings
common: simplify loop
common: add JSONFormatterFile class
common: use more efficient vector for stack
include: use larger int for large gathers
Patrick Donnelly [Mon, 25 Mar 2024 17:57:14 +0000 (13:57 -0400)]
Merge PR #56407 into squid
* refs/pull/56407/head:
qa/cephfs: stop ignoring MON_DOWN globally
qa: extend mon timeout coming up after mondb creation
qa: update dashboard schema for mon_status
mon: do not log MON_DOWN if monitor uptime is less than threshold
Nizamudeen A [Tue, 19 Mar 2024 14:57:13 +0000 (20:27 +0530)]
mgr/dashboard: rm warning/error threshold for cpu usage
For multi-core CPUs the value can be more than 100%, so it doesn't make
sense to show a warning/error when the usage is at or above 100%.
Hence removing it.
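
A small arithmetic sketch shows why a fixed 100% threshold misfires. It assumes the reported value is the sum of per-core utilisation percentages, which is one common convention and not confirmed by the commit; the function name and sample numbers are made up.

```python
from typing import List


def total_cpu_percent(per_core_percent: List[float]) -> float:
    """Sum per-core utilisation; can exceed 100 on multi-core hosts."""
    return sum(per_core_percent)


# Four cores, none of them saturated.
usage = total_cpu_percent([90.0, 85.0, 40.0, 10.0])

# A fixed 100% threshold would flag this host although no core is at 100%.
exceeds_fixed_threshold = usage >= 100.0
```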
Ernesto Puerta [Wed, 13 Mar 2024 13:06:10 +0000 (14:06 +0100)]
mgr/dashboard: fix NVMeoF API
* Update NVMe-oF gRPC Proto to 1.0.0
* Error handling,
* Missing PATCH for certain namespace ops (resize, set QoS, set balance
groups),
* Stop bypassing gRPC payloads and validate those in the back-end,
* Fix incorrect HTTP 1.1 semantics for some POST/DELETE and URIs.
* Catch errors/exceptions.
* Clean-up EndpointDoc Params
* Run Black linter.
* Remove most of NVMeoFClient glue code between gRPC and controller.
* Fix namespace delete endpoint by exposing trsvcid
* nvmeof io_stats support
Nizamudeen A [Tue, 5 Mar 2024 10:39:25 +0000 (16:09 +0530)]
mgr/dashboard: fix nvmeof api documentation
From Aviv:
POST /api/nvmeof/hosts - the description of the command is wrong IMO. It is not about creating a host. It is about allowing a host X to access subsystem Y.
GET /api/nvmeof/hosts/{subsystem_nqn} - also the description is not accurate. The command lists all hosts that are allowed to access this subsystem.
DELETE /api/nvmeof/hosts/{subsystem_nqn}/{host_nqn} - again the description should be changed as above.
POST /api/nvmeof/namespace - bad formatting of the description
GET /api/nvmeof/subsystem - the description is wrong, should say - "List all NVMeoF subsystems". And it shouldn't get any param.
POST /api/nvmeof/subsystem - few issues here. The serial_number, and max_namespaces are optional (we need to mention that). Also it is missing the --enable-ha argument that is also optional.
Some commands are missing: log_level, connection.i