Sage Weil [Wed, 3 Jul 2019 18:29:15 +0000 (13:29 -0500)]
osd: store purged_snaps history under separate object
We can't put this in the snapmapper object because filestore does not
allow multiple concurrent omap iterators on the same object. (This is a
limitation that could be fixed with some read/write locking, but not
without some significant changes to DBObjectMap; since that is old crufty
legacy code let's avoid touching it!)
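A minimal sketch of the resulting layout, with a std::map standing in for
the omap of the new, dedicated object (the key and value formats here are
illustrative, not the actual SnapMapper encoding):

    #include <cstdint>
    #include <map>
    #include <utility>

    // Stand-in for the omap of the dedicated purged_snaps object.
    // Key: (pool, last snap of a purged interval); value: first snap.
    // Keeping this under its own object means a purged_snaps scan can
    // hold an iterator here while the snapmapper object's omap is being
    // iterated elsewhere -- filestore allows only one per object.
    using PurgedMap = std::map<std::pair<int64_t,uint64_t>, uint64_t>;

    // Is 'snap' in pool 'pool' recorded as purged?
    bool is_purged(const PurgedMap& purged, int64_t pool, uint64_t snap)
    {
      // First interval whose last snap is >= snap.
      auto it = purged.lower_bound({pool, snap});
      return it != purged.end() &&
             it->first.first == pool &&  // same pool
             it->second <= snap;         // interval starts at or before snap
    }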
Sage Weil [Wed, 26 Jun 2019 20:43:25 +0000 (15:43 -0500)]
osd: automatically scrub purged_snaps every deep scrub interval
With randomization.
We do this from tick() for simplicity. It is a rare event, will take tens
of seconds at most, and nothing else particularly time-sensitive is
happening from tick().
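Roughly, the tick() check looks like this sketch (the names and the
randomization ratio are illustrative, not the actual OSD options):

    #include <chrono>
    #include <random>

    // Called from tick(): is it time to scrub purged_snaps?  The deep
    // scrub interval is stretched by a random factor so that OSDs don't
    // all kick off the scan in lockstep.
    bool purged_snaps_scrub_due(std::chrono::steady_clock::time_point now,
                                std::chrono::steady_clock::time_point last,
                                std::chrono::seconds deep_scrub_interval,
                                double random_ratio /* e.g. 0.2 */)
    {
      static thread_local std::mt19937 rng{std::random_device{}()};
      std::uniform_real_distribution<double> jitter(0.0, random_ratio);
      auto deadline = std::chrono::duration_cast<std::chrono::seconds>(
          deep_scrub_interval * (1.0 + jitter(rng)));
      return now - last >= deadline;
    }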
Sage Weil [Fri, 21 Jun 2019 02:54:09 +0000 (21:54 -0500)]
mon/OSDMonitor: fix bug in try_prune_purged_snaps
If 'begin' isn't found, we'll get a [pbegin,pend) range back that was
nearby. Only if it overlaps the [begin,end) range do we want to shorten
our range to [begin,pbegin); the old assert was making the assumption
that the lookup would only return a range that was after 'begin', but in
reality it can return one that comes before it too.
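In sketch form (a std::map of interval-end -> interval-begin standing in
for the on-disk purged-snaps keys):

    #include <cstdint>
    #include <map>

    // purged_by_end: last snap of each purged interval -> first snap.
    // We want to prune [begin,end); the index lookup hands back the
    // nearest recorded interval, which may not overlap our range at all.
    void clamp_prune_range(const std::map<uint64_t,uint64_t>& purged_by_end,
                           uint64_t begin, uint64_t& end)
    {
      auto it = purged_by_end.lower_bound(begin);
      if (it == purged_by_end.end())
        return;                  // nothing recorded at or after begin
      uint64_t pbegin = it->second;
      if (pbegin >= end)
        return;                  // nearby, but entirely past our range
      if (pbegin > begin)
        end = pbegin;            // overlap on the right: [begin,pbegin)
      // else pbegin <= begin: the record starts before 'begin' -- the
      // case the old assert wrongly ruled out.
    }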
Sage Weil [Thu, 20 Jun 2019 17:07:38 +0000 (12:07 -0500)]
mon/OSDMonitor: record snap removal seq as purged
When we delete a selfmanaged snap we have to bump seq. Record this as
purged so that we avoid discontinuities in the history and so our storage
is a bit more efficient.
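Schematically (a hypothetical helper, not the monitor's actual structures),
the bumped seq is a snapid that will never name a live snap, so it can be
folded straight into the purged set:

    #include <algorithm>
    #include <cstdint>
    #include <iterator>
    #include <map>

    // Merge [lo,hi) into the purged intervals (begin -> end), coalescing
    // with a touching or overlapping predecessor so the history stays
    // free of artificial one-snap gaps.
    void record_purged(std::map<uint64_t,uint64_t>& purged,
                       uint64_t lo, uint64_t hi)
    {
      auto it = purged.upper_bound(lo);
      if (it != purged.begin()) {
        auto prev = std::prev(it);
        if (prev->second >= lo) {        // touches or overlaps previous
          prev->second = std::max(prev->second, hi);
          return;
        }
      }
      purged[lo] = hi;
    }

    // On selfmanaged snap deletion: record_purged(purged, seq, seq + 1);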
Sage Weil [Wed, 12 Jun 2019 21:47:29 +0000 (16:47 -0500)]
osdc/Objecter: don't worry about gap_removed_snaps from map gaps
This was an attempt to ensure that we didn't let removed_snaps slip by
when we had a discontiguous stream of OSDMaps. In octopus, this can still
happen, but it's mostly harmless--the OSDs will periodically scrub to
clean up any resulting stray clones. It's not worth the complexity.
Sage Weil [Mon, 10 Jun 2019 22:31:54 +0000 (17:31 -0500)]
osd: implement scrub_purged_snaps command
This is a naive one-shot implementation that does the full scan synchronously
in the command thread. It shouldn't block any IO except to the extent
that it will compete for IO reading the underlying snapmapper omap object.
When we discover mapped objects that are covered by ranges of snaps that
should be purged, we requeue the snapid for trim on the relevant PG(s).
For these 'repeat' trims we skip the final step(s) to mark the snapid as
purged, since that presumably already happened some time ago.
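A rough sketch of that scan, with containers standing in for the snapmapper
omap and purged intervals, and a hypothetical requeue hook:

    #include <cstdint>
    #include <functional>
    #include <map>
    #include <set>
    #include <utility>

    using SnapKey = std::pair<int64_t,uint64_t>;  // (pool, snap)

    // snapmapper: (pool, snap) -> PGs that still have objects mapped to
    // that snap.  purged: (pool, last snap of interval) -> first snap.
    void scrub_purged_snaps(
        const std::map<SnapKey, std::set<int>>& snapmapper,
        const std::map<SnapKey, uint64_t>& purged,
        std::function<void(int pg, uint64_t snap)> requeue_snap_trim)
    {
      // Full synchronous pass: any snap that still has mapped objects
      // but falls inside a purged interval was missed and must be
      // re-trimmed on the PGs that hold it.
      for (const auto& [key, pgs] : snapmapper) {
        auto [pool, snap] = key;
        auto it = purged.lower_bound({pool, snap});
        bool covered = it != purged.end() &&
                       it->first.first == pool &&
                       it->second <= snap;
        if (covered)
          for (int pg : pgs)
            requeue_snap_trim(pg, snap);  // 'repeat' trim: skips the
                                          // final mark-purged step(s)
      }
    }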
Sage Weil [Mon, 10 Jun 2019 22:25:45 +0000 (17:25 -0500)]
ceph_test_rados_api_snapshots_pp: (partial) test to reproduce stray clones
The test creates a snap, removes it, waits for it to (hopefully) purge,
and then uses that snapid in a snapc to generate a clone.
This isn't a complete test because (1) it doesn't wait for the purge to
happen (e.g., by watching the osdmaps go by), and (2) it doesn't trigger
an osd scrub_purged_snaps afterwards.
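In librados C++ terms the flow is roughly the following (error handling
elided, pool name assumed; the real test lives in
ceph_test_rados_api_snapshots_pp):

    #include <rados/librados.hpp>
    #include <vector>

    int main()
    {
      librados::Rados cluster;
      cluster.init(nullptr);
      cluster.conf_read_file(nullptr);
      cluster.connect();

      librados::IoCtx ioctx;
      cluster.ioctx_create("rbd", ioctx);  // pool name is an assumption

      // Create a selfmanaged snap, then remove it so it becomes
      // eligible for purging.
      uint64_t snapid;
      ioctx.selfmanaged_snap_create(&snapid);
      ioctx.selfmanaged_snap_remove(snapid);
      // (A complete test would wait here for the purge to show up in
      // the osdmaps, and trigger 'osd scrub_purged_snaps' at the end.)

      // Write with a snapc that still names the removed snap; this is
      // what manufactures the stray clone on the OSD.
      std::vector<uint64_t> snaps = {snapid};
      ioctx.selfmanaged_snap_set_write_ctx(snapid, snaps);
      librados::bufferlist bl;
      bl.append("data");
      ioctx.write_full("foo", bl);

      cluster.shutdown();
      return 0;
    }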
Sage Weil [Mon, 10 Jun 2019 15:13:49 +0000 (10:13 -0500)]
osd: sync old purged_snaps on startup after upgrade or osd creation
This path only triggers after an upgrade or osd creation, when
purged_snaps_last < current_epoch. When that happens, we slurp down the
old purged snaps so that we have a full history recorded locally.
Sage Weil [Mon, 10 Jun 2019 15:12:04 +0000 (10:12 -0500)]
osd: record purged_snaps when we store new maps
When we get a new map, record the (new) purged_snaps.
Only do this if the OSD has purged_snaps that are in sync with the latest
OSDMap. That means that after an upgrade, if the OSD didn't sync the
old purged_snaps on startup, it won't sync anything until it *next* starts
up.
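The gate amounts to something like this sketch (hypothetical names):

    // Called while storing a newly received OSDMap at 'epoch'.  Only
    // record its new purged snaps if our local history is already
    // contiguous up to the previous epoch; otherwise leave
    // purged_snaps_last alone until a startup sync catches us up.
    void maybe_record_purged(uint64_t epoch, uint64_t& purged_snaps_last)
    {
      if (purged_snaps_last == epoch - 1) {
        // ... persist this map's newly purged snaps ...
        purged_snaps_last = epoch;
      }
    }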
Sage Weil [Fri, 7 Jun 2019 21:08:27 +0000 (16:08 -0500)]
mon/OSDMonitor: record pre-octopus purged snaps with first octopus map
When we publish our first require_osd_release >= octopus osdmap, record
all prior purged snaps in a key linked to the previous osdmap. We assume
this will encode and fit into a single key and transaction because the
even larger set of removed_snaps is already a member of pg_pool_t, which
is included in every osdmap.
- look at purged, not removed snap keys
- fix the key check to look at the *key name* prefix, not the overall
prefix (the one implemented by the KeyValueDB interface).
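Schematically (generic stand-ins; the real code uses Ceph's encode
machinery on its interval sets):

    #include <cstdint>
    #include <map>
    #include <sstream>
    #include <string>

    // Flatten every pre-octopus purged interval, per pool, into one
    // value stored under a key tied to the last pre-octopus osdmap
    // epoch.  One key/transaction is plausible because the even larger
    // removed_snaps set already rides inside every osdmap's pg_pool_t.
    std::string encode_full_history(
        const std::map<int64_t, std::map<uint64_t,uint64_t>>& purged_by_pool)
    {
      std::ostringstream out;
      for (const auto& [pool, intervals] : purged_by_pool)
        for (const auto& [begin, end] : intervals)
          out << pool << ' ' << begin << ' ' << end << '\n';
      return out.str();
    }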
Sage Weil [Wed, 5 Jun 2019 21:53:25 +0000 (16:53 -0500)]
osd/SnapMapper: include poolid in snap index
We want to sort starting with (pool, snapid, ...) so that we align with
the structure of the purged_snaps. Simply flattening all snaps across
pools is less than ideal because the purge records are intervals (with the
snap in the key being the last snap of the interval); flattening means we'd
have to look at many records (across pools) to conclude anything. Putting
these in the form we really want simplifies things going forward.
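For illustration, a fixed-width hex key like the one below sorts
lexicographically in (pool, snapid) order (the prefix and widths are made
up; the real SnapMapper encoding differs):

    #include <cstdint>
    #include <cstdio>
    #include <string>

    // Fixed-width hex makes lexicographic order match numeric order, so
    // records cluster by pool and then sort by snapid within the pool --
    // mirroring how purged_snaps intervals are keyed (by the interval's
    // last snap).
    std::string make_snap_key(int64_t pool, uint64_t snap)
    {
      char buf[64];
      snprintf(buf, sizeof(buf), "SNA_%016llx_%016llx",
               (unsigned long long)pool, (unsigned long long)snap);
      return buf;
    }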
Sage Weil [Fri, 14 Jun 2019 16:22:54 +0000 (11:22 -0500)]
ceph_test_rados: stop doing long object names
This was there to test filestore's long file name handling, which (1)
works, and (2) we don't care that much about anymore. Meanwhile, the
long names make the OSD log files *really* painful to read.
Sage Weil [Wed, 12 Jun 2019 15:37:51 +0000 (10:37 -0500)]
osd/PrimaryLogPG: use get_ssc_as_of for snapc for flushing clones
We will stop maintaining SnapSet::snaps shortly. Instead, generate this
snapc using the existing SnapSet::get_ssc_as_of() method, which will now
derive the snap list from the clone_snaps member.
Instead of checking the OSDMap's pg_pool_t to see whether a snap exists:
1- Look at the clone_snaps more carefully. If the snap didn't exist when
the clone was last touched (created or partially trimmed) then it still
doesn't exist now (snaps aren't resurrected).
2- Check the OSDMap's removed snaps queue. This will catch anything
that is still being removed but hasn't been reflected in the clone_snaps
yet.
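As a sketch of deriving that snapc from clone_snaps (simplified types; the
real logic is SnapSet::get_ssc_as_of()):

    #include <algorithm>
    #include <cstdint>
    #include <map>
    #include <vector>

    struct SnapContext {
      uint64_t seq;
      std::vector<uint64_t> snaps;  // newest first
    };

    // clone_snaps: clone id -> the snaps that clone is defined for.
    // Collect every snap <= 'as_of'; a snap absent here did not exist
    // when the clones were last touched, and snaps aren't resurrected.
    SnapContext snapc_as_of(
        const std::map<uint64_t, std::vector<uint64_t>>& clone_snaps,
        uint64_t as_of)
    {
      SnapContext out{as_of, {}};
      for (const auto& [clone, snaps] : clone_snaps)
        for (uint64_t s : snaps)
          if (s <= as_of)
            out.snaps.push_back(s);
      std::sort(out.snaps.rbegin(), out.snaps.rend());  // descending
      out.snaps.erase(std::unique(out.snaps.begin(), out.snaps.end()),
                      out.snaps.end());
      return out;
    }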
* refs/pull/27073/head:
qa/tasks: Check MDS failover during mon_thrash
qa/tasks: Compare two FSStatuses
qa/suites/fs: renamed default.yaml to mds.yaml
qa/suites/fs: mon_thrash test for fs
qa/tasks: Fix typo in the comment
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>