MOSDScrub2 is sent from mgr to serve the "ceph pg
{scrub|deep-scrub|repair}" commands when it is talking to a mimic or newer OSD.
The ceph task checks whether all pgs are scrubbed by looking at the `last_scrub_stamp` fields
in the `ceph pg dump` output, and it requests a deep scrub of the
not-yet-scrubbed pgs to ensure they are scrubbed before the timeout.
In this change, crimson handles MOSDScrub2 by starting a remote peering
request, and the underlying peering_state notifies the corresponding
PG to start the scrub. To get the test to pass, a minimal implementation is
added that updates the scrub timestamp to `now` upon request of
peering_state.
We will need to add proper scrubbing support later, but this is
enough to pass the thrasher test and to prepare for more tests
that use the "ceph" task.
Patrick Donnelly [Wed, 29 Jul 2020 18:05:02 +0000 (11:05 -0700)]
Merge PR #24068 into master
* refs/pull/24068/head:
mds: rename {CDir,Migrator}::cache to mdcache
mds: make MDSCacheObject::is_ambiguous_auth() virtual
mds: make sure rename old inode's parent dirfrag is projected.
mds: track projected inode/fnode in Mutation
mds: use smart pointer to manager CDir::fnode
mds: use smart pointer to manage CInode::{inode,xattrs,old_inodes}
osdc/Filer: make layout pointer const
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Yan, Zheng [Thu, 7 May 2020 02:33:12 +0000 (10:33 +0800)]
mds: make sure rename old inode's parent dirfrag is projected.
If the rename destination dentry is a remote dentry, Server::_rename_prepare()
only pre-dirties the old inode but does not project the fnode for the old inode's
parent dirfrag. This triggers an assertion (introduced by the previous commit)
in CDir::mark_dirty().
client: expose ceph.quota.max_bytes xattr within snapshots
For directories within snapshots, expose the ceph.quota.max_bytes
extended attribute information. This enables fetching the quota
information as it was when the snapshot was taken and is particularly useful
when cloning subvolume snapshots, to enforce the quota on the
clone subvolume as well.
We should not erase an element from a map while iterating over it, as erasing
the element invalidates the iterator, which causes a segfault when we
advance it after erasing the element it pointed to.
In this change, an explicit iterator is used to walk through the map. Compared
with building a to-be-removed list, this is more
efficient and more idiomatic.
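The erase-while-iterating idiom described above can be sketched as follows. This is a minimal illustration, not the code from the commit: the helper name `prune_negative` and the "negative value" predicate are hypothetical, chosen only to show how the iterator returned by `std::map::erase()` replaces the invalidated one.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper: erases every entry whose value is negative and
// returns how many entries were removed.
std::size_t prune_negative(std::map<std::string, int>& m) {
  std::size_t erased = 0;
  for (auto it = m.begin(); it != m.end(); /* advanced in the body */) {
    if (it->second < 0) {
      // erase() returns the iterator following the removed element,
      // so the invalidated iterator is never advanced or dereferenced.
      it = m.erase(it);
      ++erased;
    } else {
      ++it;
    }
  }
  return erased;
}
```

Because the map is walked only once and no side list is allocated, this avoids both the invalid iterator and the extra pass that a to-be-removed list would need.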
If the first message in a map message had a crc error, then the
loop would exit with last < start, which would then cause a null
dereference in _committed_osd_maps.
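The failure mode above can be illustrated with a small sketch. It is an assumption-laden model, not the actual OSD code or the actual fix: `Range` and `usable_range` are hypothetical names, and the CRC results are passed in as a plain vector. It shows why a caller has to check for an empty range (the "last < start" case) before committing anything.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical inclusive epoch range of maps that passed their CRC check.
struct Range {
  uint32_t first;
  uint32_t last;
};

// crc_ok[i] tells whether the map for epoch (start + i) passed its CRC check.
// Processing stops at the first failure. If the very first map is bad, no
// epoch is usable and std::nullopt is returned, so the caller cannot go on
// to commit an empty range.
std::optional<Range> usable_range(uint32_t start, const std::vector<bool>& crc_ok) {
  std::size_t good = 0;
  while (good < crc_ok.size() && crc_ok[good]) {
    ++good;
  }
  if (good == 0) {
    return std::nullopt;  // corresponds to "last < start"
  }
  return Range{start, start + static_cast<uint32_t>(good) - 1};
}
```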
Fixes: https://tracker.ceph.com/issues/46443
Signed-off-by: Dan van der Ster <daniel.vanderster@cern.ch>
Changcheng Liu [Thu, 23 Jul 2020 03:09:46 +0000 (11:09 +0800)]
doc: specify RBD_LOCK_MODE_EXCLUSIVE for exclusive-lock
The exclusive lock can be transitioned transparently between clients
once a write operation finishes. To disable this "transparent" transition,
the client needs to acquire the lock with RBD_LOCK_MODE_EXCLUSIVE.
Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
both have a protocol type of "any". So, to enable crimson to use settings
like this, we should let crimson accept them, and drop the connection
if the peer claims to be using an incompatible protocol when they
exchange banners.
Tatjana Dehler [Tue, 28 Jul 2020 11:18:56 +0000 (13:18 +0200)]
mgr/dashboard: wait longer for health status to be cleared
For some reason the cluster needs more time to recover from
HEALTH_WARN while changes are made by `test_pool_update_metadata`.
Let's wait several times for the cluster status to return to
HEALTH_OK.
Fixes: https://tracker.ceph.com/issues/46573
Signed-off-by: Tatjana Dehler <tdehler@suse.com>
Matthew Oliver [Wed, 22 Jul 2020 07:09:12 +0000 (17:09 +1000)]
cephadm: Add tcmu-runner container when deploying ceph-iscsi
Currently, when we deploy ceph-iscsi via cephadm, it doesn't include a
running tcmu-runner. This means initiators will be able to log in, but
you won't see the LUNs on the initiator.
This patch deploys an additional tcmu-runner container alongside the
ceph-iscsi container that just runs the tcmu-runner service.
Fixes: https://tracker.ceph.com/issues/46540
Signed-off-by: Matthew Oliver <moliver@suse.com>
Jason Dillaman [Fri, 24 Jul 2020 16:13:10 +0000 (12:13 -0400)]
librbd: use task finisher thread for image open/close callbacks
There was a potential race condition with utilizing the AsioEngine
to deliver asynchronous image open and close callbacks. This left
the potential for the io_context thread to attempt to destroy itself.
This commit changes the behavior of the image open and close callbacks
to always delete the ImageCtx (now matches the synchronous API behavior)
and it always invokes the callback in the Finisher thread, whose lifetime is
tied to the CephContext.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
The `ceph-volume lvm zap` command fails under certain conditions.
When `--osd-id` or `--osd-fsid` is passed to the `ceph-volume lvm zap` command,
it tries to zap additional devices that have nothing to do with the osd
being zapped.
When calling `api.get_lvs()` in `ensure_associated_lvs()` we have to
pass the osd-id/osd-fsid information so that only related devices are
returned by the `get_lvs()` method.
crimson/os/alienstore: always use fsid in bluestore
alienstore should not be stateful in this respect; it should proxy
all access to the fsid to bluestore.
There are a couple of issues in the existing implementation:
* when running mkfs, bluestore tries to generate a new osd_fsid if the specified
one is empty. but we explicitly pass the given uuid down to
AlienStore::mkfs() so that bluestore can use it. so we should pass it
down instead of storing it locally.
* when persisting the superblock in OSD::mkfs(), superblock.osd_fsid() is
read from store->get_fsid(). if the user specifies an empty uuid, we
should persist the generated uuid in the superblock.
In this change, all access to the fsid is proxied to the underlying
bluestore.
An osd sends a MOSDMarkMeDown message to the monitor and waits for its ack
until a timeout, so if we stop the osd before stopping the mon, stop.sh can
return sooner instead of waiting for the timeout.
This avoids attempts to connect an OSD that is bound to a v2
address to a v1 address of a mgr.
In general, an osd is bound to both v1 and v2 addresses, but the crimson
msgr does not support multiple bound addresses at the time of writing, so
to avoid failures when trying to connect to incompatible addresses,
let's filter them out when connecting to the monitor. This change
silences warnings like:
peer_addr_for_me v1:172.21.15.106:60008/0 type doesn't match myaddr
v2:0.0.0.0:6802/26710
This avoids attempts to connect an OSD that is bound to a v2 address
to a v1 address of a monitor.
In general, an osd is bound to both v1 and v2 addresses, but the crimson msgr
does not support multiple bound addresses at the time of writing, so to
avoid failures when trying to connect to incompatible addresses,
let's filter them out when connecting to the monitor. This change silences
warnings like:
peer_addr_for_me v1:172.21.15.106:60008/0 type doesn't match myaddr
v2:0.0.0.0:6802/26710
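The address filtering described in the two entries above can be sketched as follows. This is only an illustration under stated assumptions: `AddrType`, `Addr`, and `filter_compatible` are hypothetical stand-ins, not crimson's actual address types or functions. The idea is to keep only the peer addresses whose protocol type matches the single address the local messenger is bound to.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical stand-ins for the messenger's address type and address.
enum class AddrType { V1, V2 };

struct Addr {
  AddrType type;
  std::string hostport;
};

// Returns only the peer addresses whose type matches the locally bound one,
// so that a v2-bound endpoint never tries to connect to a v1 address.
std::vector<Addr> filter_compatible(const std::vector<Addr>& peers, AddrType mine) {
  std::vector<Addr> out;
  std::copy_if(peers.begin(), peers.end(), std::back_inserter(out),
               [mine](const Addr& a) { return a.type == mine; });
  return out;
}
```

With a peer advertising both a v1 and a v2 address, a v2-bound caller would be left with just the v2 entry, which is what keeps the type-mismatch warning from being triggered.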