Following support is added,
- Ability to retain snapshots on subvolume deletion
- Modify directory where snapshot is created to the subvolume
- "features" supported to subvolume info output, specifically ability
for a subvolume to retain snashots
- Current state of subvolume in info output
- Auto upgrade to v2 from eligible v1 subvolumes
- Adjust other functions as needed to support the changes
mgr/volumes: Use operation type during subvolume open
Subvolume open currently takes in 2 optional parameters to
denote desired state and type. This enables the open to
allow the operation to suceed based on the (type, state)
tuple.
Instead, pass an operation type to be performed on a subvolume
during open, and decide internal to a subvolume version if the
operation is allowed based on its state and type.
Also modifies the state machine code, to be more amenable to
modifications and improves redability.
Patrick Donnelly [Wed, 29 Jul 2020 18:05:02 +0000 (11:05 -0700)]
Merge PR #24068 into master
* refs/pull/24068/head:
mds: rename {CDir,Migrator}::cache to mdcache
mds: make MDSCacheObject::is_ambiguous_auth() virtual
mds: make sure rename old inode's parent dirfrag is projected.
mds: track projected inode/fnode in Mutation
mds: use smart pointer to manager CDir::fnode
mds: use smart pointer to manage CInode::{inode,xattrs,old_inodes}
osdc/Filer: make layout pointer const
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Yan, Zheng [Thu, 7 May 2020 02:33:12 +0000 (10:33 +0800)]
mds: make sure rename old inode's parent dirfrag is projected.
if rename dest dentry is remote dentry, Server::_rename_prepare() only
pre dirty old inode, but does not project fnode for old inode's parent
dirfrag. This will trigger a assertion (introduced by previous commit)
in CDir::mark_dirty().
we should not remove an element while iterating it in a map, as erasing
the element invalidates the iterator, which causes segmfault when we are
advancing it after erasing the dereferenced element.
in this change, an iterator is used for walking through the map, in
comparision with creating a to-be-removed list, this one is more
efficient and more idiomatic.
If the first message in a map message had a crc error, then the
loop would exit with last < start, which would then cause a null
dereference in _committed_osd_maps.
Fixes: https://tracker.ceph.com/issues/46443 Signed-off-by: Dan van der Ster <daniel.vanderster@cern.ch>
Changcheng Liu [Thu, 23 Jul 2020 03:09:46 +0000 (11:09 +0800)]
doc: specify RBD_LOCK_MODE_EXCLUSIVE for exclusive-lock
The exclusive-lock could be transited transparently between clients
after finishing write operation. To disable "transparent" transition,
it needs to acquire the lock with RBD_LOCK_MODE_EXCLUSIVE.
Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
both has protocol type of "any". so, to enable crimson to use settings
like this, we should let crimson to accept them, and drop the connection
if the peer claim to be using an incompatible protocol, when they are
exchanging banners.
Tatjana Dehler [Tue, 28 Jul 2020 11:18:56 +0000 (13:18 +0200)]
mgr/dashboard: wait longer for health status to be cleared
Because of reasons the cluster needs more time to recover from
HEALTH_WARN while changes are made by `test_pool_update_metadata`.
Lets wait several times for the cluster status to be HEALTH_OK
again.
Fixes: https://tracker.ceph.com/issues/46573 Signed-off-by: Tatjana Dehler <tdehler@suse.com>
Matthew Oliver [Wed, 22 Jul 2020 07:09:12 +0000 (17:09 +1000)]
cephadm: Add tcmu-runner container when deploying ceph-iscsi
Currently when we deploy ceph-iscsi via cephadm it doesn't include a
running tcmu-runner. Which means initiators will be able to login but
you wont see the LUNS on the initiator.
This patch deploys an additional tcmu-runner container along side the
ceph-iscsi container that just runs the tcmu-runner service.
Fixes: https://tracker.ceph.com/issues/46540 Signed-off-by: Matthew Oliver <moliver@suse.com>
Jason Dillaman [Fri, 24 Jul 2020 16:13:10 +0000 (12:13 -0400)]
librbd: use task finisher thread for image open/close callbacks
There was a potential race condition with utilizing the AsioEngine
to deliver asynchronous image open and close callbacks. This left
the potential for the io_context thread to attempt to destroy itself.
This commit changes the behavior of the image open and close callbacks
to always delete the ImageCtx (now matches the synchronous API behavior)
and it always invokes the callback in Finisher thread whose lifetime is
tied to the CephContext.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
`ceph-volume lvm zap` command fails under certain conditions.
when passing `--osd-id` or `--osd-fsid` to `ceph-volume lvm zap` command
it tries to zap additionnal devices that have nothing to do with the osd
being zapped.
When calling `api.get_lvs()` in `ensure_associated_lvs()` we have to
pass the osd-id/osd-fsid information so only related devices are
returned by `get_lvs()` method
crimson/os/alienstore: always use fsid in bluestore
alienstore should not be stateful in this perspective, it should proxy
all acccess of fsid to bluestore.
there are couple issues in existing implementation:
* when mkfs, bluestore tries to generate a new osd_fsid if the specified
one is empty. but we explicitly pass the given uuid down to
AlienStore::mkfs() so the bluestore can use it. so we should pass it
down instad of storing it locally.
* when persisting superblock in OSD::mkfs(), superblock.osd_fsid() is
read from store->get_fsid(), if user specifies an empty uuid, we
should persist the generated uuid in the superblock.
in this change, all access to fsid is proxied to the underlying
bluestore.
osd sends a MOSDMarkMeDown message to monitor and waits for its ack
before timeout, so if we can stop osd before stopping mon, stop.sh can
return sooner without waiting until the timeout.
to avoid the attempts to connect an OSD which is bound to a v2
address to a v1 address of a mgr.
in general, osd is bound to both v1 and v2 addresses, but crimson
msgr does not support multiple bound address at the time of writing, so
to avoid the failures when trying to connect to incompatible addresses,
let's filter out them when connecting to monitor. this change
silence warnings like:
peer_addr_for_me v1:172.21.15.106:60008/0 type doesn't match myaddr
v2:0.0.0.0:6802/26710
to avoid the attempts to connect an OSD which is bound to a v2 address
to a v1 addrss of a monitor.
in general, osd is bound to both v1 and v2 addresses, but crimson msgr
does not support multiple bound address at the time of writing, so to
avoid the failures when trying to connect to incompatible addresses,
let's filter out them when connecting to monitor. this change silence
warnings like:
peer_addr_for_me v1:172.21.15.106:60008/0 type doesn't match myaddr
v2:0.0.0.0:6802/26710
This will crash the choose_acting() procedure as it will mistakenly
think that peer 3 should continue to perform asynchronous recovery
(e.g., due to num_objects_missing = 1) in contrast to fully
backfill-recovered.
While I did not dig into the real cause, there are a couple of
possible explanations of how num_objects can be off. I think that
if a roll forward or log replay could delete something twice, maybe
there would be an undercount. Or maybe something as simple as a
corruption.
Since _update_calc_stats() is going to fix num_objects_missing
for that peer anyway, let's make sure it always starts with a
clean state.