Venky Shankar [Mon, 17 Jun 2019 12:21:43 +0000 (08:21 -0400)]
mgr/volumes: purge queue for async subvolume delete
Support asynchronous subvolume deletes by handing off the delete
operation to a dedicated set of threads. A subvolume delete operation
renames the subvolume (subdirectory) to a unique trash path entry
and signals the set of worker threads to pick up entries from the
trash directory for background removal.
This commit implements a `thread pool` strategy as a class mixin.
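The rename-then-purge flow described above can be sketched roughly as follows. This is a minimal illustration on a local directory tree, not the mgr/volumes code: `PurgeQueueMixin`, `LocalVolumes`, `init_purge_queue`, `async_delete`, and `wait_for_purge` are all hypothetical names, and the real module operates on a cephfs handle rather than on `os`/`shutil`.

```python
import os
import queue
import shutil
import tempfile
import threading
import uuid


class PurgeQueueMixin:
    """Hypothetical mixin: rename into a trash dir, purge in worker threads."""

    def init_purge_queue(self, trash_dir, nr_workers=2):
        self.trash_dir = trash_dir
        os.makedirs(trash_dir, exist_ok=True)
        self._jobs = queue.Queue()
        self._workers = [
            threading.Thread(target=self._purge_loop, daemon=True)
            for _ in range(nr_workers)
        ]
        for worker in self._workers:
            worker.start()

    def async_delete(self, path):
        # The rename is cheap, so the caller returns without waiting
        # for the (possibly large) tree to be removed.
        trash_entry = os.path.join(self.trash_dir, uuid.uuid4().hex)
        os.rename(path, trash_entry)
        self._jobs.put(trash_entry)  # signal the workers
        return trash_entry

    def _purge_loop(self):
        while True:
            entry = self._jobs.get()
            try:
                shutil.rmtree(entry, ignore_errors=True)
            finally:
                self._jobs.task_done()

    def wait_for_purge(self):
        # Block until every queued trash entry has been processed.
        self._jobs.join()


class LocalVolumes(PurgeQueueMixin):
    """Toy consumer of the mixin, backed by a local directory."""

    def __init__(self, root):
        self.root = root
        self.init_purge_queue(os.path.join(root, "_trash"))


def demo():
    root = tempfile.mkdtemp()
    vols = LocalVolumes(root)
    subvol = os.path.join(root, "subvol0")
    os.makedirs(os.path.join(subvol, "data"))
    entry = vols.async_delete(subvol)
    gone_immediately = not os.path.exists(subvol)
    vols.wait_for_purge()
    purged = not os.path.exists(entry)
    shutil.rmtree(root)
    return gone_immediately, purged
```

Keeping the queue and workers in a mixin means the delete path only has to call `async_delete()`; the unique trash name avoids collisions when the same subvolume name is created and deleted repeatedly.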
Venky Shankar [Mon, 17 Jun 2019 09:36:54 +0000 (05:36 -0400)]
mgr/volumes: maintain connection pool for fs volumes
Right now every [sub]volume call does a connect/disconnect to the
cephfs filesystem. This is unnecessary and can be optimized by
caching the filesystem handle in a connection pool and (re)using
the handle for subsequent [sub]volume operations.
This would be useful for implementing features such as purge queue
for asynchronous subvolume deletes.
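A rough sketch of the caching idea, assuming a pool keyed by filesystem name. `ConnectionPool`, `get_fs_handle`, `close_all`, and `FakeHandle` are hypothetical names for illustration; `FakeHandle` stands in for a real filesystem handle and is not the cephfs binding.

```python
import threading


class FakeHandle:
    """Stand-in for a filesystem handle; NOT the cephfs python binding."""

    def __init__(self, fs_name):
        self.fs_name = fs_name
        self.connected = True

    def shutdown(self):
        self.connected = False


class ConnectionPool:
    """Hypothetical pool that connects once per filesystem and reuses the handle."""

    def __init__(self, connect):
        self._connect = connect  # callable: fs_name -> handle
        self._handles = {}
        self._lock = threading.Lock()
        self.connects = 0  # counts real connects, for illustration only

    def get_fs_handle(self, fs_name):
        with self._lock:
            handle = self._handles.get(fs_name)
            if handle is None:
                handle = self._connect(fs_name)
                self._handles[fs_name] = handle
                self.connects += 1
            return handle

    def close_all(self):
        with self._lock:
            for handle in self._handles.values():
                handle.shutdown()
            self._handles.clear()


pool = ConnectionPool(FakeHandle)
first = pool.get_fs_handle("a")
second = pool.get_fs_handle("a")  # reuses the cached handle
```

A long-lived handle is also what background workers would need, since they cannot reasonably connect and disconnect per trash entry.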
mgr/volumes: use negative error codes everywhere
The cephfs python binding returns positive error codes. mgr/volumes
incorrectly does error code checks assuming the error codes to
be negative.
This was not an issue until now since mgr/volumes mostly does a
`raise VolumeException()`, followed by the exception
being displayed to the operator (one exception is catching the cephfs
ObjectNotFound error, in which case -errno.ENOENT is returned
and checked wherever required).
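One way to keep the sign convention consistent is to normalize codes at the point where a binding error is wrapped. The sketch below is an assumption about how that could look, not the actual change: `to_negative_errno` and `wrap_binding_error` are hypothetical helpers, and this `VolumeException` is a local stand-in for the module's class.

```python
import errno


class VolumeException(Exception):
    """Local stand-in for the module's exception type (assumed shape)."""

    def __init__(self, error_code, error_message):
        super().__init__(error_message)
        self.errno = error_code
        self.error_str = error_message


def to_negative_errno(code):
    # Accept either sign and always hand back the negative form.
    return -abs(code)


def wrap_binding_error(code, message):
    # Hypothetical helper: convert a binding-style positive code into
    # an exception that carries a negative code.
    return VolumeException(to_negative_errno(code), message)
```

With this in place, callers can compare against `-errno.ENOENT` uniformly, whichever layer produced the error.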
test: cleanup removing all subvolumes before removing subvolume group
Test `test_subvolume_create_with_desired_mode_in_group()` creates three
subvolumes in a subvolume group. During cleanup, it only removed two of
the three subvolumes. This causes failure when removing the subvolume
group since it's not empty.
crimson/net: check front_msg correctly during sweep
In order to check whether the front_msg is unchanged, we need to make sure:
* The sent message is not reused;
* The message to be checked is not freed;
In vstart.sh, if MDS is enabled, `ceph fs volume create` is used to
create the cephfs volume. The `fs volume create` command is implemented by
`src/pybind/mgr/volumes/module.py`, which in turn uses the `cephfs` python
binding indirectly. So we need to add `cephfs` to the `vstart` target to
facilitate cephfs development using vstart.
Sage Weil [Thu, 4 Jul 2019 02:26:57 +0000 (21:26 -0500)]
Merge PR #28865 into master
* refs/pull/28865/head:
mon/OSDMonitor: fix _lookup_snap to verify the pool matches
ceph_test_rados_api_*: make failing to clean up namespace non-fatal
osd: store purged_snaps history under separate object
Sage Weil [Wed, 3 Jul 2019 18:29:15 +0000 (13:29 -0500)]
osd: store purged_snaps history under separate object
We can't put this in the snapmapper object because filestore does not
allow multiple concurrent omap iterators on the same object. (This is a
limitation that could be fixed with some read/write locking, but not
without some significant changes to DBObjectMap; since that is old crufty
legacy code let's avoid touching it!)
Paul Emmerich [Tue, 2 Jul 2019 10:58:08 +0000 (12:58 +0200)]
debian/control: add python-routes dependency
the dashboard requires python-routes via cherrypy/_cpdispatch.py during runtime
but the cherrypy debian package only recommends it and doesn't depend on it
Fixes: https://tracker.ceph.com/issues/24420
Signed-off-by: Paul Emmerich <paul.emmerich@croit.io>
Sage Weil [Wed, 3 Jul 2019 13:22:34 +0000 (08:22 -0500)]
Merge PR #28330 into master
* refs/pull/28330/head:
osd: drop osd_lock during scrub
ceph_test_rados_api_tier_pp: tolerate ENOENT or success from deleted snap
osd: automatically scrub purged_snaps every deep scrub interval
osd: move scrub_purged_snaps to helper
osd/OSDMap: SERVER_OCTOPUS feature bit is now significant
ceph_test_rados_api_snapshots_pp: drop unnecessary assert
mon/OSDMonitor: record last_purged_snaps_scrub from beacon to osdmap
osd: report last_purged_snaps_scrub as part of beacon
osd: log purged_snaps scrub to cluster log
osd: record last_purged_snaps_scrub in superblock
osd/OSDMap: add last_purged_snaps_stamp to osd_xinfo_t
mon/OSDMonitor: fix bug in try_prune_purged_snaps
mon/OSDMonitor: record snap removal seq as purged
mon/OSDMonitor: do not bother reporting gaps in removed_snaps
osdc/Objecter: don't worry about gap_removed_snaps from map gaps
mds/SnapServer: make note about pre-octopus compat code
osd: implement scrub_purged_snaps command
osd/PrimaryLogPG: always remove the snap we are trimming
ceph_test_rados_api_snapshots_pp: (partial) test to reproduce stray clones
osd: sync old purged_snaps on startup after upgrade or osd creation
osd: record purged_snaps when we store new maps
mon/OSDMonitor: add messages to get past purged_snaps
mon/OSDMonitor: record pre-octopus purged snaps with first octopus map
mon/OSDMonitor: record purged_snaps for each epoch
mon/OSDMonitor: make_snap_epoch_key -> make_removed_snap_epoch_key
osd/osd_types: add purged_snaps_last to OSDSuperblock
osd/osd_types: clean up initial values for OSDSuperblock
mon/OSDMonitor: make {removed,purged}_snap storage more efficient
mon/OSDMonitor: move (removed, purged) snap update into a helper
mon/OSDMonitor: generalize/refactor lookup_*_snap
mon/OSDMonitor: refactor snap key and value helpers
mon/OSDMonitor: make_snap_key -> make_removed_snap_key, make_purged_snap_key
mon/OSDMonitor: fix lookup_purged_snap implementation
mon/OSDMonitor: lookup_pruned_snap -> lookup_purged_snap
osd: adjust snapmapper keys on first start as octopus
osd/SnapMapper: include poolid in snap index
mon/OSDMonitor: document osd snap metadata format
osd/SnapMapper: document stored keys and values
mon/OSDMonitor: use structured binding for prepare_remove_snaps
mon/OSDMonitor: send MRemoveSnaps back to octopus MDS
mds/SnapServer: handle MRemoveSnaps acks from mon
CMakeLists: include 'cephfs' (which includes libcephfs) in 'vstart' target
mon/PaxosService: add C_ReplyOp
vnewosd.sh: add script to add a new osd to an existing vstart
vstart.sh: remove useless auth add for osds
vstart.sh: wait for mgr volume module to start up
mon/OSDMonitor: make snap removal handle dups safely
mon/OSDMonitor: only update removed_snaps when pre-octopus
ceph_test_rados: stop doing long object names
ceph_test_rados_api_tier_pp: fix osd version checks
osd/PrimaryLogPG: use get_ssc_as_of for snapc for flushing clones
osd/PrimaryLogPG: only maintain SnapSet::snaps for pre-octopus compat
mon/OSDMonitor: only maintain pg_pool_t::removed_snaps for pre-octopus
osd/osd_types: mark SnapSet::snaps as legacy
osd/osd_types: SnapSet::get_ssc_as_of: use clone_snaps
osd/PrimaryLogPG: change fabrication of promoted clone snaps
osd/PrimaryLogPG: only filter SnapSet::snaps for flush for pre-octopus compat
osd/PrimaryLogPG: trim_objects: only filter SnapSet::snaps for pre-octopus
osd/PrimaryLogPG: make best effort to sanitize clones on copy-from
mds/SnapServer: int -> int32_t for encoded type
messages/MRemoveSnaps: int -> int32_t on encoded type
osd/PrimaryLogPG: find_object_context: trust SnapSet's clone_snaps
osd/PrimaryLogPG: use osdmap removed_snaps_queue for snap trimming
mon/OSDMonitor: avoid is_removed_snap()
osd/PeeringState: drop some mimic conditionals
osd/PG: drop pre-mimic snap_trimq code
osd/PeeringState: removed pre-mimic removed snap tracking
osd: move snap_interval_set_t to osd_types
mon: drop mon_debug_no_require_mimic
mon/OSDMonitor: remove pre-mimic snap behavior support
mon/OSDMonitor: remove support for pre-mimic conversion
osd/osd_types: remove build_removed_snaps(), maybe_update_removed_snaps()
osd: remove luminous compat code for removed_snaps
Reviewed-by: Samuel Just <sjust@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
* glibc offers two variants of basename(). one modifies the content of
`path`, the other does not. to be standard compliant, and to fix
the FTBFS with musl-libc, we need to use the POSIX variant.
* #include <libgen.h> for basename(3), the POSIX compliant one.
see
http://pubs.opengroup.org/onlinepubs/009695399/functions/basename.html
Sage Weil [Wed, 26 Jun 2019 20:43:25 +0000 (15:43 -0500)]
osd: automatically scrub purged_snaps every deep scrub interval
With randomization.
We do this from tick() for simplicity. It is a rare event, will take tens
of seconds at most, and nothing else particularly time-sensitive is
happening from tick().
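A minimal sketch of a tick-driven periodic job with a randomized interval. The commit does not say how the randomization is done, so the jitter scheme here (a uniform stretch of up to `jitter_ratio` of the interval) is an assumption, and `PurgedSnapsScrubTimer` is a hypothetical name rather than anything in the OSD.

```python
import random


class PurgedSnapsScrubTimer:
    """Hypothetical timer polled from a periodic tick()."""

    def __init__(self, interval, jitter_ratio=0.25, seed=None):
        self.interval = interval
        self.jitter_ratio = jitter_ratio  # assumed randomization scheme
        self._rng = random.Random(seed)
        self.last_scrub = 0.0
        self.next_due = self._schedule(self.last_scrub)

    def _schedule(self, start):
        # Stretch the interval by a random fraction so that many
        # daemons do not all scrub at the same moment.
        jitter = self._rng.uniform(0.0, self.jitter_ratio)
        return start + self.interval * (1.0 + jitter)

    def tick(self, now):
        """Return True when the scrub should run on this tick."""
        if now < self.next_due:
            return False
        self.last_scrub = now
        self.next_due = self._schedule(now)
        return True
```

Polling from the tick avoids a dedicated timer or thread, at the cost of running the work inline when it comes due.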
Sage Weil [Fri, 21 Jun 2019 02:54:09 +0000 (21:54 -0500)]
mon/OSDMonitor: fix bug in try_prune_purged_snaps
If 'begin' isn't found, we'll get a [pbegin,pend) range back that was
nearby. Only if it overlaps the [begin,end) range do we want to shorten
our range to [begin,pbegin); the old assert was making the assumption
that the lookup would only return a range that was after 'begin', but in
reality it can return one that comes before it too.
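The range handling described above can be illustrated with plain half-open integer intervals. This is a sketch of the stated rule only, not the OSDMonitor code: `clip_to_unpurged` is a hypothetical name, and it assumes `begin` itself was not found inside the returned purged range.

```python
def clip_to_unpurged(begin, end, pbegin, pend):
    """Shorten [begin, end) to [begin, pbegin) only if the nearby purged
    range [pbegin, pend) overlaps it; otherwise keep it unchanged.

    Assumes begin is not inside [pbegin, pend) (the "not found" case).
    """
    assert begin < end and pbegin < pend
    assert not (pbegin <= begin < pend)
    overlaps = pbegin < end and begin < pend
    if overlaps:
        # Since begin is outside the purged range, overlap implies
        # begin < pbegin, so the shortened range is non-empty.
        return begin, pbegin
    # The nearby range lies entirely before or after ours.
    return begin, end
```

For example, with `[10, 20)` a nearby purged range of `[15, 25)` shortens the result to `[10, 15)`, while `[2, 5)` (entirely before `begin`) leaves it at `[10, 20)`.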
Sage Weil [Thu, 20 Jun 2019 17:07:38 +0000 (12:07 -0500)]
mon/OSDMonitor: record snap removal seq as purged
When we delete a selfmanaged snap we have to bump seq. Record this as
purged so that we avoid discontinuities in the history and so our storage
is a bit more efficient.
Sage Weil [Wed, 12 Jun 2019 21:47:29 +0000 (16:47 -0500)]
osdc/Objecter: don't worry about gap_removed_snaps from map gaps
This was an attempt to ensure that we didn't let removed_snaps slip by
when we had a discontiguous stream of OSDMaps. In octopus, this can still
happen, but it's mostly harmless--the OSDs will periodically scrub to
clean up any resulting stray clones. It's not worth the complexity.