git.apps.os.sepia.ceph.com Git - ceph-ci.git/log
ceph-ci.git
7 years ago Merge pull request #23939 from votdev/bug_35685
Lenz Grimmer [Mon, 10 Sep 2018 12:29:24 +0000 (14:29 +0200)]
Merge pull request #23939 from votdev/bug_35685

mgr/dashboard: Fix bug in user form when changing password

Reviewed-by: Stephan Müller <smueller@suse.com>
7 years ago Merge pull request #23839 from trociny/wip-migration-commit-race
Jason Dillaman [Mon, 10 Sep 2018 11:27:24 +0000 (07:27 -0400)]
Merge pull request #23839 from trociny/wip-migration-commit-race

librbd: fix potential live migration after commit issues due to not refreshed image header

Reviewed-by: Jason Dillaman <dillaman@redhat.com>
7 years ago mgr/dashboard: Unable to edit user when making an accidental change to the password...
Volker Theile [Wed, 5 Sep 2018 12:03:26 +0000 (14:03 +0200)]
mgr/dashboard: Unable to edit user when making an accidental change to the password field

Fixes: https://tracker.ceph.com/issues/35685
Signed-off-by: Volker Theile <vtheile@suse.com>
7 years ago Merge pull request #23993 from badone/wip-fedora-build-Cython3-error
Kefu Chai [Mon, 10 Sep 2018 08:46:26 +0000 (16:46 +0800)]
Merge pull request #23993 from badone/wip-fedora-build-Cython3-error

rpm: Fix Fedora error "No matching package to install: 'Cython3'"

Reviewed-by: Kefu Chai <kchai@redhat.com>
7 years ago Merge pull request #23833 from falcon78921/wip-docs-34539
Kefu Chai [Mon, 10 Sep 2018 07:13:20 +0000 (15:13 +0800)]
Merge pull request #23833 from falcon78921/wip-docs-34539

doc/rados: fixed hit set type link

Reviewed-by: Kefu Chai <kchai@redhat.com>
7 years ago Merge pull request #24000 from libingyang-zte/master
Xie Xingguo [Mon, 10 Sep 2018 02:56:01 +0000 (10:56 +0800)]
Merge pull request #24000 from libingyang-zte/master

doc: Fix Spelling Error of Radosgw

Reviewed-by: xie xingguo <xie.xingguo@zte.com.cn>
7 years ago doc: fixed hit set type link
James McClune [Fri, 31 Aug 2018 03:28:24 +0000 (23:28 -0400)]
doc: fixed hit set type link

Fixed reference link for hit set type value. Restructured wording in description.
Fixes: https://tracker.ceph.com/issues/34539
Signed-off-by: James McClune <jmcclune@mcclunetechnologies.net>
7 years ago doc: Fix Spelling Error of Radosgw
李丙洋 10208981 [Mon, 10 Sep 2018 01:21:27 +0000 (09:21 +0800)]
doc: Fix Spelling Error of Radosgw

Signed-off-by: Li Bingyang <li.bingyang1@zte.com.cn>
7 years ago Merge pull request #23895 from xiexingguo/wip-more-async-fixes
Xie Xingguo [Sat, 8 Sep 2018 01:51:04 +0000 (09:51 +0800)]
Merge pull request #23895 from xiexingguo/wip-more-async-fixes

osd/PrimaryLogPG: update missing_loc more carefully

Reviewed-by: Neha Ojha <nojha@redhat.com>
7 years ago Merge pull request #23958 from xiexingguo/wip-heartbeat-stuck
Xie Xingguo [Sat, 8 Sep 2018 01:49:27 +0000 (09:49 +0800)]
Merge pull request #23958 from xiexingguo/wip-heartbeat-stuck

osd/OSD: ping monitor if we are stuck at __waiting_for_healthy__

Reviewed-by: Sage Weil <sage@redhat.com>
7 years ago Merge PR #23449 into master
Sage Weil [Sat, 8 Sep 2018 00:34:00 +0000 (19:34 -0500)]
Merge PR #23449 into master

* refs/pull/23449/head:
osd/OSDMap: cleanup: s/tmpmap/nextmap/
qa/standalone/osd/osd-backfill-stats: fixes
osd/OSDMap: clean out pg_temp mappings that exceed pool size
mon/OSDMonitor: clean temps and upmaps in encode_pending, efficiently
osd/OSDMapMapping: do not crash if acting > pool size

Reviewed-by: David Zafman <dzafman@redhat.com>
Reviewed-by: xie xingguo <xie.xingguo@zte.com.cn>
Reviewed-by: Neha Ojha <nojha@redhat.com>
7 years ago Merge PR #23984 into master
Patrick Donnelly [Fri, 7 Sep 2018 22:35:55 +0000 (15:35 -0700)]
Merge PR #23984 into master

* refs/pull/23984/head:
mon: test if gid exists in pending for prepare_beacon

Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Greg Farnum <gfarnum@redhat.com>
7 years ago osd/OSDMap: cleanup: s/tmpmap/nextmap/
Sage Weil [Fri, 31 Aug 2018 15:54:43 +0000 (10:54 -0500)]
osd/OSDMap: cleanup: s/tmpmap/nextmap/

Be consistent with OSDMap.h

Signed-off-by: Sage Weil <sage@redhat.com>
7 years ago qa/standalone/osd/osd-backfill-stats: fixes
Sage Weil [Fri, 31 Aug 2018 15:52:04 +0000 (10:52 -0500)]
qa/standalone/osd/osd-backfill-stats: fixes

Grep from the primary's log, not every osd's log.

For the backfill_remapped task in particular, after the pg_temp change it
just so happens that the primary changes across the pool size change and
thus two different primaries do (some) backfill.  Fix that test to pass
the correct primary.

Other tests are unaffected as they do not (happen to) trigger a primary
change and already satisfied the (removed) check that only one OSD does
backfill.

Signed-off-by: Sage Weil <sage@redhat.com>
7 years ago osd/OSDMap: clean out pg_temp mappings that exceed pool size
Sage Weil [Mon, 6 Aug 2018 18:12:33 +0000 (13:12 -0500)]
osd/OSDMap: clean out pg_temp mappings that exceed pool size

If the pool size is reduced, we can end up with pg_temp mappings that are
too big.  This can trigger bad behavior elsewhere (e.g., OSDMapMapping,
which assumes that acting and up are always <= pool size).
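
A minimal sketch of the cleanup described above, assuming a simplified model in which pg_temp and the pool sizes are plain Python dicts. The function name and data layout are hypothetical and do not mirror the real OSDMap code:

```python
# Hypothetical, simplified model (not the real OSDMap API):
#   pg_temp:   {(pool_id, ps): [osd ids]}
#   pool_size: {pool_id: replica count}
def clean_oversized_pg_temp(pg_temp, pool_size):
    """Return a copy of pg_temp without mappings longer than the pool size."""
    cleaned = {}
    for (pool_id, ps), osds in pg_temp.items():
        if len(osds) > pool_size[pool_id]:
            # Mapping is too big for the (reduced) pool size: drop it.
            continue
        cleaned[(pool_id, ps)] = list(osds)
    return cleaned


pg_temp = {(1, 0): [0, 1, 2], (1, 1): [3, 4]}
print(clean_oversized_pg_temp(pg_temp, {1: 2}))
```

After pool 1 shrinks from size 3 to size 2, only the two-OSD mapping survives in this model.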

Fixes: http://tracker.ceph.com/issues/26866
Signed-off-by: Sage Weil <sage@redhat.com>
7 years ago mon/OSDMonitor: clean temps and upmaps in encode_pending, efficiently
Sage Weil [Mon, 6 Aug 2018 17:54:55 +0000 (12:54 -0500)]
mon/OSDMonitor: clean temps and upmaps in encode_pending, efficiently

- do not rebuild the next map when we already have it
- do this work in encode_pending, not create_pending, so we get bad
values before they are published.

Signed-off-by: Sage Weil <sage@redhat.com>
7 years ago osd/OSDMapMapping: do not crash if acting > pool size
Sage Weil [Mon, 6 Aug 2018 17:57:27 +0000 (12:57 -0500)]
osd/OSDMapMapping: do not crash if acting > pool size

Existing oversized pg_temp mappings (or some other bug) might make acting
exceed the pool size.  Avoid overrunning our buffer if that happens.

Note that the mapping won't be completely accurate in that case!
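
The guard can be illustrated with a small sketch, assuming a fixed-size row of pool_size slots per PG. The names fill_acting_row and NO_OSD are hypothetical and only stand in for the real mapping buffer:

```python
NO_OSD = -1  # assumed sentinel for an empty slot in this sketch


def fill_acting_row(acting, pool_size):
    """Copy acting into a row of exactly pool_size slots without overrunning it."""
    row = [NO_OSD] * pool_size
    n = min(len(acting), pool_size)  # clamp: never write past the row
    row[:n] = acting[:n]
    return row, n


print(fill_acting_row([3, 4, 5], 2))  # oversized acting set gets truncated
```

The truncated row is why the mapping is not completely accurate when acting exceeds the pool size.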

Signed-off-by: Sage Weil <sage@redhat.com>
7 years ago mon: test if gid exists in pending for prepare_beacon
Patrick Donnelly [Fri, 7 Sep 2018 19:06:11 +0000 (12:06 -0700)]
mon: test if gid exists in pending for prepare_beacon

If it does not, send a null map. Bug introduced by
624efc64323f99b2e843f376879c1080276e036f which made preprocess_beacon only look
at the current fsmap (correctly). prepare_beacon relied on preprocess_beacon
doing that check on pending.

Running:

    while sleep 0.5; do bin/ceph mds fail 0; done

is sufficient to reproduce this bug. You will see:

    2018-09-07 15:33:30.350 7fffe36a8700  5 mon.a@0(leader).mds e69 preprocess_beacon mdsbeacon(24412/a up:reconnect seq 2 v69) v7 from mds.0 127.0.0.1:6813/2891525302 compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2,10=snaprealm v2}
    2018-09-07 15:33:30.350 7fffe36a8700 10 mon.a@0(leader).mds e69 preprocess_beacon: GID exists in map: 24412
    2018-09-07 15:33:30.350 7fffe36a8700  5 mon.a@0(leader).mds e69 _note_beacon mdsbeacon(24412/a up:reconnect seq 2 v69) v7 noting time
    2018-09-07 15:33:30.350 7fffe36a8700  7 mon.a@0(leader).mds e69 prepare_update mdsbeacon(24412/a up:reconnect seq 2 v69) v7
    2018-09-07 15:33:30.350 7fffe36a8700 12 mon.a@0(leader).mds e69 prepare_beacon mdsbeacon(24412/a up:reconnect seq 2 v69) v7 from mds.0 127.0.0.1:6813/2891525302
    2018-09-07 15:33:30.350 7fffe36a8700 15 mon.a@0(leader).mds e69 prepare_beacon got health from gid 24412 with 0 metrics.
    2018-09-07 15:33:30.350 7fffe36a8700  5 mon.a@0(leader).mds e69 mds_beacon mdsbeacon(24412/a up:reconnect seq 2 v69) v7 is not in fsmap (state up:reconnect)

in the mon leader log. The last line indicates the problem was safely handled.
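
A rough sketch of the check, assuming the current and pending fsmaps are modelled as dicts keyed by gid. The function names mirror the ones above, but the bodies and return values are purely illustrative:

```python
# Illustrative model only: each fsmap is {gid: state}.
def preprocess_beacon(current_fsmap, gid):
    """Look only at the current fsmap, as described above."""
    return gid in current_fsmap


def prepare_beacon(pending_fsmap, gid):
    """Re-check the gid against pending; answer with a null map if it is gone."""
    if gid not in pending_fsmap:
        return "send_null_map"
    return "process_beacon"


current = {24412: "up:reconnect"}
pending = {}  # 'ceph mds fail 0' already removed the gid from pending
print(preprocess_beacon(current, 24412), prepare_beacon(pending, 24412))
```

The gid can pass the first check and still be missing from pending, which is the window the repeated mds fail loop hits.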

Fixes: http://tracker.ceph.com/issues/35848
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
7 years ago Merge PR #20469 into master
Sage Weil [Fri, 7 Sep 2018 20:55:21 +0000 (15:55 -0500)]
Merge PR #20469 into master

* refs/pull/20469/head:
osd/PG: remove warn on delete+merge race
osd: base project_pg_history on is_new_interval
osd: make project_pg_history handle concurrent osdmap publish
osd: handle pg delete vs merge race
osd/PG: do not purge strays in premerge state
doc/rados/operations/placement-groups: a few minor corrections
doc/man/8/ceph: drop enumeration of pg states
doc/dev/placement-groups: drop old 'splitting' reference
osd: wait for laggy pgs without osd_lock in handle_osd_map
osd: drain peering wq in start_boot, not _committed_maps
osd: kick split children
osd: no osd_lock for finish_splits
osd/osd_types: remove is_split assert
ceph-objectstore-tool: prevent import of pg that has since merged
qa/suites: test pg merging
qa/tasks/thrashosds: support merging pgs too
mon/OSDMonitor: mon_inject_pg_merge_bounce_probability
doc/rados/operations/placement-groups: update to describe pg_num reductions too
doc/rados/operations: remove reference to lpgs
osd: implement pg merge
osd/PG: implement merge_from
osdc/Objecter: resend ops on pg merge
osd: collect and record pg_num changes by pool
osd: make load_pgs remove message more accurate
osd/osd_types: pg_t: add is_merge_target()
osd/osd_types: pg_t::is_merge -> is_merge_source
osd/osd_types: adding or substracting invalid stats -> invalid stats
osd/PG: clear_ready_to_merge on_shutdown (or final merge source prep)
osd: debug pending_creates_from_osd cleanup, don't use cbegin
ceph-objectstore-tool: debug intervals update
mgr/ClusterState: discard pg updates for pgs >= pg_num
mon/OSDMonitor: fix long line
mon/OSDMonitor: move pool created check into caller
mon/OSDMonitor: adjust pgp_num_target down along with pg_num_target as needed
mon/OSDMonitor: add mon_osd_max_initial_pgs to cap initial pool pgs
osd/OSDMap: set pg[p]_num_target in build_simple*() methods
mon/PGMap: adjust SMALLER_PGP_NUM warning to use *_target values
mon/OSDMonitor: set CREATING flag for force-create-pg
mon/OSDMonitor: start sending new-style pg_create2 messages
mon/OSDMonitor: set last_force_resend_prenautilus for pg_num_pending changes
osd: ignore pg creates when pool FLAG_CREATING is not set
mgr: do not adjust pg_num until FLAG_CREATING removed from pool
mon/OSDMonitor: add FLAG_CREATING on upgrade if pools still creating
mon/OSDMonitor: prevent FLAG_CREATING from getting set pre-nautilus
mon/OSDMonitor: disallow pg_num changes while CREATING flag is set
mon/OSDMonitor: set POOL_CREATING flag until initial pool pgs are created
osd/osd_types: add pg_pool_t FLAG_POOL_CREATING
osd/osd_types: introduce last_force_resend_prenautilus
osd/PGLog: merge_from helper
osd: no cache agent or snap trimming during premerge
osd: notify mon when pending PGs are ready to merge
mgr: add simple controller to adjust pg[p]_num_actual
mon/OSDMonitor: MOSDPGReadyToMerge to complete a pg_num change
mon/OSDMonitor: allow pg_num to adjusted up or down via pg[p]_num_target
osd/osd_types: make pg merge an interval boundary
osd/osd_types: add pg_t::is_merge() method
osd/osd_types: add pg_num_pending to pg_pool_t
osd: allow multiple threads to block on wait_min_pg_epoch
osd: restructure advance_pg() call mechanism
mon/PGMap: prune merged pgs
mon/PGMap: track pgs by state for each pool
osd/SnapMapper: allow split_bits to decrease (merge)
os/bluestore: fix osr_drain before merge
os/bluestore: allow reuse of osr from existing collection
os/filestore: (re)implement merge
os/filestore: add _merge_collections post-check
os: implement merge_collection
os/ObjectStore: add merge_collection operation to Transaction

7 years ago Merge pull request #23894 from xiexingguo/wip-complete-to-2
Yuri Weinstein [Fri, 7 Sep 2018 20:03:28 +0000 (13:03 -0700)]
Merge pull request #23894 from xiexingguo/wip-complete-to-2

osd/PrimaryLogPG: avoid dereferencing invalid complete_to

Reviewed-by: Sage Weil <sage@redhat.com>
7 years ago Merge pull request #23976 from idryomov/wip-cram-git-clone
Ilya Dryomov [Fri, 7 Sep 2018 17:57:42 +0000 (19:57 +0200)]
Merge pull request #23976 from idryomov/wip-cram-git-clone

qa/tasks/cram: tasks now must live in the repository

Reviewed-by: Jason Dillaman <dillaman@redhat.com>
7 years ago Merge pull request #23828 from cbodley/wip-rgw-sync-trace-cleanup
Casey Bodley [Fri, 7 Sep 2018 17:46:37 +0000 (13:46 -0400)]
Merge pull request #23828 from cbodley/wip-rgw-sync-trace-cleanup

rgw: cleanups for sync tracing

Reviewed-by: Yehuda Sadeh <yehuda@redhat.com>
7 years ago Merge pull request #23571 from cbodley/wip-26938
Casey Bodley [Fri, 7 Sep 2018 17:45:36 +0000 (13:45 -0400)]
Merge pull request #23571 from cbodley/wip-26938

rgw: data sync respects error_retry_time for backoff on error_repo

Reviewed-by: Yehuda Sadeh <yehuda@redhat.com>
7 years ago osd/PG: remove warn on delete+merge race
Sage Weil [Fri, 31 Aug 2018 17:10:58 +0000 (12:10 -0500)]
osd/PG: remove warn on delete+merge race

This was there just to confirm that this path was exercised by the
rados suite (it is, several hits per rados run of 1/666).

Signed-off-by: Sage Weil <sage@redhat.com>
7 years ago osd: base project_pg_history on is_new_interval
Sage Weil [Thu, 16 Aug 2018 17:30:02 +0000 (12:30 -0500)]
osd: base project_pg_history on is_new_interval

Signed-off-by: Sage Weil <sage@redhat.com>
7 years ago osd: make project_pg_history handle concurrent osdmap publish
Sage Weil [Thu, 16 Aug 2018 17:22:57 +0000 (12:22 -0500)]
osd: make project_pg_history handle concurrent osdmap publish

The class's osdmap may be updated while we are in our loop.  Pass it in
explicitly instead.

Fixes: http://tracker.ceph.com/issues/26970
Signed-off-by: Sage Weil <sage@redhat.com>
7 years ago osd: handle pg delete vs merge race
Sage Weil [Tue, 14 Aug 2018 17:15:52 +0000 (12:15 -0500)]
osd: handle pg delete vs merge race

Deletion involves an awkward dance between the pg lock and shard locks,
while the merge prep and tracking is "shard down".  If the delete has
finished its work we may find that a merge has since been prepped.

Unwinding the merge tracking is nontrivial, especially because it might
involve a second PG, possibly even a fabricated placeholder one. Instead,
if we delete and find that a merge is coming, undo our deletion and let
things play out in the future map epoch.

Signed-off-by: Sage Weil <sage@redhat.com>
7 years ago osd/PG: do not purge strays in premerge state
Sage Weil [Fri, 10 Aug 2018 13:50:42 +0000 (08:50 -0500)]
osd/PG: do not purge strays in premerge state

The point of premerge is to ensure that the constituent parts of the
target PG are fully clean.  If there is an intervening PG migration and
one of the halves finishes migrating before the other, one half could
get removed and the final merge could result in an incomplete PG.  In the
worst case, the two halves (let's call them A and B) could have started
out together on say [0,1,2], A moves to [3,4,5] and gets deleted from
[0,1,2], and then the final merge happens such that *all* copies of the PG
are incomplete.

We could construct a clever check that does allow removal of strays when
the sibling PG is also ready to go, but it would be complicated.  Do the
simple thing.  In reality, this would be an extremely hard case to hit
because the premerge window is generally very short.

Signed-off-by: Sage Weil <sage@redhat.com>
7 years ago doc/rados/operations/placement-groups: a few minor corrections
Sage Weil [Wed, 8 Aug 2018 17:58:23 +0000 (12:58 -0500)]
doc/rados/operations/placement-groups: a few minor corrections

Signed-off-by: Sage Weil <sage@redhat.com>
7 years ago doc/man/8/ceph: drop enumeration of pg states
Sage Weil [Wed, 8 Aug 2018 17:58:09 +0000 (12:58 -0500)]
doc/man/8/ceph: drop enumeration of pg states

This is more maintainable.

Signed-off-by: Sage Weil <sage@redhat.com>
7 years ago doc/dev/placement-groups: drop old 'splitting' reference
Sage Weil [Wed, 8 Aug 2018 17:57:43 +0000 (12:57 -0500)]
doc/dev/placement-groups: drop old 'splitting' reference

Signed-off-by: Sage Weil <sage@redhat.com>
7 years ago osd: wait for laggy pgs without osd_lock in handle_osd_map
Sage Weil [Fri, 3 Aug 2018 15:45:51 +0000 (10:45 -0500)]
osd: wait for laggy pgs without osd_lock in handle_osd_map

We can't hold osd_lock while blocking because other objectstore completions
need to take osd_lock (e.g., _committed_osd_maps), and those objectstore
completions need to complete in order to finish_splits.  Move the blocking
to the top before we establish any local state in this stack frame since
both the public and cluster dispatchers may race in handle_osd_map and
we are dropping and retaking osd_lock.

Signed-off-by: Sage Weil <sage@redhat.com>
7 years ago osd: drain peering wq in start_boot, not _committed_maps
Sage Weil [Wed, 1 Aug 2018 21:33:22 +0000 (16:33 -0500)]
osd: drain peering wq in start_boot, not _committed_maps

We can't safely block in _committed_osd_maps because we are being run
by the store's finisher threads, and we may have to wait for a PG to split
and then merge via that same queue and deadlock.

Do not hold osd_lock while waiting as this can interfere with *other*
objectstore completions that take osd_lock.

Signed-off-by: Sage Weil <sage@redhat.com>
7 years ago osd: kick split children
Sage Weil [Mon, 30 Jul 2018 14:40:35 +0000 (09:40 -0500)]
osd: kick split children

Ensure that we bring split children up to date to the latest map even in
the absence of new OSDMaps feeding in NullEvts.  This is important when
the handle_osd_map (or boot) thread is blocked waiting for pgs to catch
up, but we also need a newly-split child to catch up (perhaps so that it
can merge).

Signed-off-by: Sage Weil <sage@redhat.com>
7 years ago osd: no osd_lock for finish_splits
Sage Weil [Tue, 31 Jul 2018 21:54:26 +0000 (16:54 -0500)]
osd: no osd_lock for finish_splits

This used to protect the pg registration probably?  There is no need for
it now.

More importantly, having it here can cause a deadlock when we are holding
osd_lock and blocking on wait_min_pg_epoch(), because a PG may need to
finish splitting to advance and then merge with a peer.  (The wait won't
block on *this* PG since it isn't registered in the shard yet, but it
will block on the merge peer.)

Signed-off-by: Sage Weil <sage@redhat.com>
7 years ago osd/osd_types: remove is_split assert
Sage Weil [Wed, 18 Apr 2018 13:01:19 +0000 (08:01 -0500)]
osd/osd_types: remove is_split assert

The problem is:

osd is at epoch 80
import pg 1.a as of e57
1.a and 1.1a merged in epoch 60something
we set up a merge now,
but in should_restart_peering via advance_pg we hit the is_split assert
that the ps is < old_pg_num

We can meaningfully return false (this is not a split) for a pg that is
beyond pg_num.
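
A simplified sketch of the relaxed behavior, assuming placement seeds map to PGs through a stable-mod rule. The helpers stable_mod, pg_mask, split_children and is_split below are illustrative Python stand-ins, not the actual pg_t code:

```python
def stable_mod(x, b, bmask):
    """Map x into [0, b) so that growing b only moves a subset of values."""
    return x & bmask if (x & bmask) < b else x & (bmask >> 1)


def pg_mask(pg_num):
    """Smallest all-ones bit mask that covers pg_num - 1."""
    return (1 << (pg_num - 1).bit_length()) - 1


def split_children(ps, old_pg_num, new_pg_num):
    """Placement seeds that split off from ps when pg_num grows."""
    if ps >= old_pg_num:
        # Beyond old_pg_num: not a split, return nothing instead of asserting.
        return []
    mask = pg_mask(old_pg_num)
    return [x for x in range(old_pg_num, new_pg_num)
            if stable_mod(x, old_pg_num, mask) == ps]


def is_split(ps, old_pg_num, new_pg_num):
    return bool(split_children(ps, old_pg_num, new_pg_num))


print(split_children(0xa, 16, 32))  # children of 1.a when pg_num goes 16 -> 32
print(is_split(0x1a, 16, 32))       # 1.1a is beyond old_pg_num
```

With ps = 0x1a and old_pg_num = 16 the sketch simply reports "not a split", which is the case the removed assert used to trip on.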

Signed-off-by: Sage Weil <sage@redhat.com>
7 years ago ceph-objectstore-tool: prevent import of pg that has since merged
Sage Weil [Fri, 15 Jun 2018 15:53:51 +0000 (10:53 -0500)]
ceph-objectstore-tool: prevent import of pg that has since merged

We currently import a portion of the PG if it has split.  Merge is more
complicated, though, mainly because COT is operating in a mode where it
fast-forwards the PG to the latest OSDMap epoch, which means it has to
implement any transformations to the PG (split/merge) independently.
Avoid doing this for merge.

Signed-off-by: Sage Weil <sage@redhat.com>
7 years ago qa/suites: test pg merging
Sage Weil [Sat, 7 Apr 2018 19:59:54 +0000 (14:59 -0500)]
qa/suites: test pg merging

Signed-off-by: Sage Weil <sage@redhat.com>
7 years ago qa/tasks/thrashosds: support merging pgs too
Sage Weil [Sat, 7 Apr 2018 19:59:57 +0000 (14:59 -0500)]
qa/tasks/thrashosds: support merging pgs too

Signed-off-by: Sage Weil <sage@redhat.com>
7 years ago mon/OSDMonitor: mon_inject_pg_merge_bounce_probability
Sage Weil [Thu, 31 May 2018 18:00:35 +0000 (13:00 -0500)]
mon/OSDMonitor: mon_inject_pg_merge_bounce_probability

Optionally bounce pg_num back up right after we decrease it.  This triggers
conditions in the OSD where the merge and split logic may conflict.

Signed-off-by: Sage Weil <sage@redhat.com>
7 years ago doc/rados/operations/placement-groups: update to describe pg_num reductions too
Sage Weil [Tue, 10 Apr 2018 15:38:44 +0000 (10:38 -0500)]
doc/rados/operations/placement-groups: update to describe pg_num reductions too

Signed-off-by: Sage Weil <sage@redhat.com>
7 years ago doc/rados/operations: remove reference to lpgs
Sage Weil [Tue, 10 Apr 2018 15:34:19 +0000 (10:34 -0500)]
doc/rados/operations: remove reference to lpgs

These were removed years ago.

Signed-off-by: Sage Weil <sage@redhat.com>
7 years ago osd: implement pg merge
Sage Weil [Fri, 6 Apr 2018 15:26:52 +0000 (10:26 -0500)]
osd: implement pg merge

- Revamps the split tracking infrastructure, and adds new tracking for
upcoming merges in consume_map.  These are now unified into the same
identify_ method.  These consume the new pg_num change tracking
infrastructure we just added in the prior commit.
- PGs that are about to merge have a new wait infrastructure, since all
sources and the target have to reach the target epoch before the merge
can happen.
- If one of the sources for a merge does not exist, we create an empty
dummy PG to merge with.  This implies that the resulting merged PG will
be incomplete (and mostly useless), but it unifies the code paths.
- The actual merge (PG::merge_from) happens in advance_pg().

Fixes: http://tracker.ceph.com/issues/85
Signed-off-by: Sage Weil <sage@redhat.com>
7 years ago osd/PG: implement merge_from
Sage Weil [Fri, 27 Jul 2018 13:58:24 +0000 (08:58 -0500)]
osd/PG: implement merge_from

This is the building block that smooshes multiple PGs back into one.  The
resulting combination PG will have no PG log.  That means the sources
need to be clean and quiesced or else the result will end up being
marked incomplete.

Signed-off-by: Sage Weil <sage@redhat.com>
7 years ago osdc/Objecter: resend ops on pg merge
Sage Weil [Fri, 27 Jul 2018 22:12:59 +0000 (17:12 -0500)]
osdc/Objecter: resend ops on pg merge

This matches the split behavior.

Signed-off-by: Sage Weil <sage@redhat.com>
7 years ago osd: collect and record pg_num changes by pool
Sage Weil [Thu, 31 May 2018 19:37:48 +0000 (14:37 -0500)]
osd: collect and record pg_num changes by pool

This will simplify our identification of split and merge events.

Signed-off-by: Sage Weil <sage@redhat.com>
7 years ago osd: make load_pgs remove message more accurate
Sage Weil [Thu, 14 Jun 2018 19:15:10 +0000 (14:15 -0500)]
osd: make load_pgs remove message more accurate

Signed-off-by: Sage Weil <sage@redhat.com>
7 years ago osd/osd_types: pg_t: add is_merge_target()
Sage Weil [Fri, 15 Jun 2018 12:09:04 +0000 (07:09 -0500)]
osd/osd_types: pg_t: add is_merge_target()

Signed-off-by: Sage Weil <sage@redhat.com>
7 years ago osd/osd_types: pg_t::is_merge -> is_merge_source
Sage Weil [Thu, 14 Jun 2018 22:45:55 +0000 (17:45 -0500)]
osd/osd_types: pg_t::is_merge -> is_merge_source

This only checks if a pg is a merge source, not whether it is a merge
target.

Signed-off-by: Sage Weil <sage@redhat.com>
7 years ago osd/osd_types: adding or substracting invalid stats -> invalid stats
Sage Weil [Tue, 12 Jun 2018 22:05:25 +0000 (17:05 -0500)]
osd/osd_types: adding or substracting invalid stats -> invalid stats

Signed-off-by: Sage Weil <sage@redhat.com>
7 years ago osd/PG: clear_ready_to_merge on_shutdown (or final merge source prep)
Sage Weil [Tue, 12 Jun 2018 12:15:02 +0000 (07:15 -0500)]
osd/PG: clear_ready_to_merge on_shutdown (or final merge source prep)

Signed-off-by: Sage Weil <sage@redhat.com>
7 years ago osd: debug pending_creates_from_osd cleanup, don't use cbegin
Sage Weil [Mon, 11 Jun 2018 22:26:01 +0000 (17:26 -0500)]
osd: debug pending_creates_from_osd cleanup, don't use cbegin

Got a segv on the erase line :/

Signed-off-by: Sage Weil <sage@redhat.com>
7 years ago ceph-objectstore-tool: debug intervals update
Sage Weil [Thu, 31 May 2018 13:38:08 +0000 (08:38 -0500)]
ceph-objectstore-tool: debug intervals update

Signed-off-by: Sage Weil <sage@redhat.com>
7 years ago mgr/ClusterState: discard pg updates for pgs >= pg_num
Sage Weil [Mon, 30 Jul 2018 03:23:06 +0000 (22:23 -0500)]
mgr/ClusterState: discard pg updates for pgs >= pg_num

Signed-off-by: Sage Weil <sage@redhat.com>
7 years ago mon/OSDMonitor: fix long line
Sage Weil [Wed, 18 Apr 2018 17:20:58 +0000 (12:20 -0500)]
mon/OSDMonitor: fix long line

Signed-off-by: Sage Weil <sage@redhat.com>
7 years ago mon/OSDMonitor: move pool created check into caller
Sage Weil [Sun, 15 Apr 2018 21:15:40 +0000 (16:15 -0500)]
mon/OSDMonitor: move pool created check into caller

This makes for less confusing debug output.  Speaking from experience.

Signed-off-by: Sage Weil <sage@redhat.com>
7 years ago mon/OSDMonitor: adjust pgp_num_target down along with pg_num_target as needed
Sage Weil [Tue, 10 Apr 2018 16:16:42 +0000 (11:16 -0500)]
mon/OSDMonitor: adjust pgp_num_target down along with pg_num_target as needed

If the user asks to reduce pg_num, reduce pgp_num_target too at the same
time.

Don't completely hide pgp_num yet (by increasing it when pg_num_target
increases).
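
As a rough illustration, assuming a pool is modelled as a dict holding pg_num_target and pgp_num_target (the helper set_pg_num_target is hypothetical), the adjustment could look like:

```python
def set_pg_num_target(pool, new_target):
    """Return an updated pool dict after the user changes pg_num_target."""
    updated = dict(pool)
    updated["pg_num_target"] = new_target
    # Only follow pg_num_target downward; an increase leaves pgp_num_target
    # alone, so pgp_num is not completely hidden from the user yet.
    if updated["pgp_num_target"] > new_target:
        updated["pgp_num_target"] = new_target
    return updated


pool = {"pg_num_target": 64, "pgp_num_target": 64}
print(set_pg_num_target(pool, 32))
print(set_pg_num_target(pool, 128))
```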

Signed-off-by: Sage Weil <sage@redhat.com>
7 years ago mon/OSDMonitor: add mon_osd_max_initial_pgs to cap initial pool pgs
Sage Weil [Mon, 9 Apr 2018 12:20:05 +0000 (07:20 -0500)]
mon/OSDMonitor: add mon_osd_max_initial_pgs to cap initial pool pgs

Configure how many initial PGs we create a pool with.  If the user wants
more than this then we do subsequent splits.

Default to 1024, so that pool creation works in the usual way for most users,
but does some splitting for very large pools/clusters.
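
A small sketch of the capping rule, assuming the option is passed in as a plain integer. The function initial_pg_plan is a hypothetical helper, not part of OSDMonitor:

```python
def initial_pg_plan(requested_pg_num, max_initial_pgs=1024):
    """Return (pg_num to create up front, pg_num to reach via later splits)."""
    initial = min(requested_pg_num, max_initial_pgs)
    return initial, requested_pg_num


print(initial_pg_plan(128))   # small pool: created at full size
print(initial_pg_plan(4096))  # large pool: capped, remainder comes from splits
```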

Signed-off-by: Sage Weil <sage@redhat.com>
7 years ago osd/OSDMap: set pg[p]_num_target in build_simple*() methods
Sage Weil [Tue, 10 Apr 2018 14:32:58 +0000 (09:32 -0500)]
osd/OSDMap: set pg[p]_num_target in build_simple*() methods

These are only used by unit tests and osdmaptool as far as I can tell.

Signed-off-by: Sage Weil <sage@redhat.com>
7 years ago mon/PGMap: adjust SMALLER_PGP_NUM warning to use *_target values
Sage Weil [Sun, 8 Apr 2018 18:30:41 +0000 (13:30 -0500)]
mon/PGMap: adjust SMALLER_PGP_NUM warning to use *_target values

If the cluster is failing to converge on the target values that is a
separate problem.

Signed-off-by: Sage Weil <sage@redhat.com>
7 years ago mon/OSDMonitor: set CREATING flag for force-create-pg
Sage Weil [Sat, 7 Apr 2018 19:35:36 +0000 (14:35 -0500)]
mon/OSDMonitor: set CREATING flag for force-create-pg

In order to recreate a lost PG, we need to set the CREATING flag for the
pool.  This prevents pg_num from changing in future OSDMap epochs until
*after* the PG has successfully been instantiated.

Note that a pg_num change in *this* epoch is fine; the recreated PG will
instantiate in *this* epoch, which is /after/ the split a pg_num in this
epoch would describe.

Signed-off-by: Sage Weil <sage@redhat.com>
7 years ago mon/OSDMonitor: start sending new-style pg_create2 messages
Sage Weil [Fri, 6 Apr 2018 15:26:10 +0000 (10:26 -0500)]
mon/OSDMonitor: start sending new-style pg_create2 messages

The new sharded wq implementation cannot handle a resent mon create
message and a split child already existing.  This is a side effect of the
new pg create path instantiating the PG at the pool create epoch osdmap
and letting it roll forward through splits; the mon may be resending a
create for a pg that was already created elsewhere and split elsewhere,
such that one of those split children has peered back onto this same OSD.
When we roll forward our re-created empty parent it may split and find the
child already exists, crashing.

This is no longer a concern because the mgr-based controller for pg_num
will not split PGs until after the initial PGs are all created.  (We
know this because the pool has the CREATED flag set.)

The old-style path had its own problem
http://tracker.ceph.com/issues/22165.  We would build the history and
instantiate the pg in the latest osdmap epoch, ignoring any split children
that should have been created between the pool create epoch and the
current epoch.  Since we're now taking the new path, that is no longer
a problem.

Fixes: http://tracker.ceph.com/issues/22165
Signed-off-by: Sage Weil <sage@redhat.com>
7 years ago mon/OSDMonitor: set last_force_resend_prenautilus for pg_num_pending changes
Sage Weil [Fri, 6 Apr 2018 16:26:26 +0000 (11:26 -0500)]
mon/OSDMonitor: set last_force_resend_prenautilus for pg_num_pending changes

This will force pre-nautilus clients to resend ops when we are adjusting
pg_num_pending.  This is a big hammer: for nautilus+ clients, we only have
an interval change for the affected PGs (the two PGs that are about to
merge), whereas this compat hack will do an op resend for the whole pool.
However, it is better than requiring all clients be upgraded to nautilus in
order to do PG merges.

Note that we already do the same thing for pre-luminous clients for
splits, so we've already inflicted similar pain in the past (and, to my
knowledge, have not seen any negative feedback or fallout from that).

Signed-off-by: Sage Weil <sage@redhat.com>
7 years ago osd: ignore pg creates when pool FLAG_CREATING is not set
Sage Weil [Sat, 7 Apr 2018 18:54:49 +0000 (13:54 -0500)]
osd: ignore pg creates when pool FLAG_CREATING is not set

We only process mon-initiated PG creates while the pool is in CREATING
mode.  This ensures that we will not have any racing split or merge
operations.

Signed-off-by: Sage Weil <sage@redhat.com>
7 years ago mgr: do not adjust pg_num until FLAG_CREATING removed from pool
Sage Weil [Sat, 7 Apr 2018 02:53:35 +0000 (21:53 -0500)]
mgr: do not adjust pg_num until FLAG_CREATING removed from pool

This is more reliable than looking at PG states because the PG may have
gone active and sent a notification to the mon (pg created!) and mgr
(new state!) but the mon may not have persisted that information yet.

Signed-off-by: Sage Weil <sage@redhat.com>
7 years ago mon/OSDMonitor: add FLAG_CREATING on upgrade if pools still creating
Sage Weil [Fri, 13 Apr 2018 15:26:48 +0000 (10:26 -0500)]
mon/OSDMonitor: add FLAG_CREATING on upgrade if pools still creating

Signed-off-by: Sage Weil <sage@redhat.com>
7 years ago mon/OSDMonitor: prevent FLAG_CREATING from getting set pre-nautilus
Sage Weil [Fri, 13 Apr 2018 15:26:38 +0000 (10:26 -0500)]
mon/OSDMonitor: prevent FLAG_CREATING from getting set pre-nautilus

Signed-off-by: Sage Weil <sage@redhat.com>
7 years ago mon/OSDMonitor: disallow pg_num changes while CREATING flag is set
Sage Weil [Sat, 7 Apr 2018 19:18:52 +0000 (14:18 -0500)]
mon/OSDMonitor: disallow pg_num changes while CREATING flag is set

Signed-off-by: Sage Weil <sage@redhat.com>
7 years ago mon/OSDMonitor: set POOL_CREATING flag until initial pool pgs are created
Sage Weil [Sat, 7 Apr 2018 02:39:14 +0000 (21:39 -0500)]
mon/OSDMonitor: set POOL_CREATING flag until initial pool pgs are created

Set the flag when the pool is created, and clear it when the initial set
of PGs have been created by the mon.  Move the update_creating_pgs()
block so that we can process the pgid removal from the creating list and
the pool flag removal in the same epoch; otherwise we might remove the
pgid but have no cluster activity to roll over another osdmap epoch to
allow the pool flag to be removed.

Signed-off-by: Sage Weil <sage@redhat.com>
7 years ago osd/osd_types: add pg_pool_t FLAG_POOL_CREATING
Sage Weil [Sat, 7 Apr 2018 02:38:26 +0000 (21:38 -0500)]
osd/osd_types: add pg_pool_t FLAG_POOL_CREATING

Signed-off-by: Sage Weil <sage@redhat.com>
7 years ago osd/osd_types: introduce last_force_resend_prenautilus
Sage Weil [Fri, 6 Apr 2018 16:24:02 +0000 (11:24 -0500)]
osd/osd_types: introduce last_force_resend_prenautilus

Previously, we renamed the old last_force_resend to
last_force_resend_preluminous and created a new last_force_resend for
luminous+.  This allowed us to force preluminous clients to resend ops
(because they didn't understand the new pg split => new interval rule)
without affecting luminous clients.

Do the same rename again, adding a last_force_resend_prenautilus (luminous
or mimic).

Adjust the OSD code accordingly so it matches the behavior we'll see from
a luminous client.

Signed-off-by: Sage Weil <sage@redhat.com>
7 years agoosd/PGLog: merge_from helper
Sage Weil [Fri, 13 Apr 2018 22:16:41 +0000 (17:16 -0500)]
osd/PGLog: merge_from helper

When merging two logs, we throw out all of the actual log entries.
However, we need to convert them to dup ops as appropriate, and merge
those together.  Reuse the trim code to do this.

Signed-off-by: Sage Weil <sage@redhat.com>
7 years agoosd: no cache agent or snap trimming during premerge
Sage Weil [Fri, 20 Apr 2018 03:12:19 +0000 (22:12 -0500)]
osd: no cache agent or snap trimming during premerge

The PG is quiesced; no background activity.

Signed-off-by: Sage Weil <sage@redhat.com>
7 years agoosd: notify mon when pending PGs are ready to merge
Sage Weil [Sat, 17 Feb 2018 17:38:57 +0000 (11:38 -0600)]
osd: notify mon when pending PGs are ready to merge

When a PG is in the pending merge state, its id is >= pg_num_pending and <
pg_num.  When this happens, quiesce IO, peer, wait for activate to commit,
and then notify the mon that we are idle and safe to merge.

Signed-off-by: Sage Weil <sage@redhat.com>
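
The pending-merge condition in the commit above can be sketched as a simple range check. This is an illustration only, not the OSD code: the helper name `is_pending_merge` and the plain integer types are assumptions; only `pg_num` and `pg_num_pending` come from the commit message.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from the Ceph tree): a PG with placement seed
// `ps` is pending merge when it falls in the range [pg_num_pending, pg_num).
inline bool is_pending_merge(uint32_t ps, uint32_t pg_num_pending,
                             uint32_t pg_num) {
  return ps >= pg_num_pending && ps < pg_num;
}

// Usage sketch: with pg_num = 8 and pg_num_pending = 6, PGs 6 and 7 are
// the ones that must quiesce and report readiness to the mon.
inline void pending_merge_example() {
  assert(is_pending_merge(6, 6, 8));
  assert(is_pending_merge(7, 6, 8));
  assert(!is_pending_merge(5, 6, 8));
}
```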
7 years agomgr: add simple controller to adjust pg[p]_num_actual
Sage Weil [Fri, 6 Apr 2018 15:26:10 +0000 (10:26 -0500)]
mgr: add simple controller to adjust pg[p]_num_actual

This is a pretty trivial controller.  It adds some constraints that were
obviously not there before when the user could set these values to anything
they wanted, but does not implement all of the "nice" stepping that we'll
eventually want.  That can come later.

Splits:
- throttle pg_num increases, currently using the same config option
(mon_osd_max_creating_pgs) that we used to throttle pg creation
- do not increase pg_num until the initial pg creation has completed.

Merges:
- wait until the source and target pgs for merge are active and clean
before doing a merge.

Adjust pgp_num all at once for now.

Signed-off-by: Sage Weil <sage@redhat.com>
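
The split and merge constraints listed in the commit above could be expressed roughly as follows. This is a minimal sketch under stated assumptions, not the mgr code: the `PoolState` struct, the `next_pg_num_actual` function, and the choice to merge one PG per step are all hypothetical; `max_creating` stands in for the `mon_osd_max_creating_pgs` option named in the message.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical, simplified view of a pool (not a Ceph type).
struct PoolState {
  uint32_t pg_num_actual;   // current raw pg_num
  uint32_t pg_num_target;   // value requested by the user
  bool creating;            // initial PG creation has not completed yet
  bool merge_pair_clean;    // source and target of the next merge are
                            // both active and clean
};

// Returns the pg_num_actual the controller would set on this pass.
inline uint32_t next_pg_num_actual(const PoolState& p, uint32_t max_creating) {
  if (p.pg_num_target > p.pg_num_actual) {
    // Split: do nothing until the initial PGs exist, then throttle.
    if (p.creating) return p.pg_num_actual;
    uint32_t step = std::min(p.pg_num_target - p.pg_num_actual, max_creating);
    return p.pg_num_actual + step;
  }
  if (p.pg_num_target < p.pg_num_actual) {
    // Merge: wait for the PGs involved to be active and clean.
    if (!p.merge_pair_clean) return p.pg_num_actual;
    return p.pg_num_actual - 1;  // assumption: one merge per step
  }
  return p.pg_num_actual;  // already converged
}

inline void controller_example() {
  // A throttled split advances by at most max_creating PGs per pass.
  assert(next_pg_num_actual(PoolState{8, 64, false, true}, 16) == 24);
}
```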
7 years agomon/OSDMonitor: MOSDPGReadyToMerge to complete a pg_num change
Sage Weil [Fri, 16 Feb 2018 03:26:48 +0000 (21:26 -0600)]
mon/OSDMonitor: MOSDPGReadyToMerge to complete a pg_num change

This message allows pg_num to be decremented (once the final PGs are
ready).

Signed-off-by: Sage Weil <sage@redhat.com>
7 years agomon/OSDMonitor: allow pg_num to be adjusted up or down via pg[p]_num_target
Sage Weil [Fri, 16 Feb 2018 03:25:32 +0000 (21:25 -0600)]
mon/OSDMonitor: allow pg_num to be adjusted up or down via pg[p]_num_target

The CLI now sets the *_target values, imposing only the subset of constraints that
the user needs to be concerned with.

New "pg_num_actual" and "pgp_num_actual" properties/commands are added that allow
the underlying raw values to be adjusted.  For the merge case, this sets
pg_num_pending instead of pg_num so that the OSDs can go through the
merge prep process.

A controller (in a future commit) will make pg[p]_num converge to pg[p]_num_target.

Signed-off-by: Sage Weil <sage@redhat.com>
7 years agoosd/osd_types: make pg merge an interval boundary
Sage Weil [Fri, 16 Feb 2018 03:24:17 +0000 (21:24 -0600)]
osd/osd_types: make pg merge an interval boundary

Both the merge itself *and* the pending merge are interval transitions.

Signed-off-by: Sage Weil <sage@redhat.com>
7 years agoosd/osd_types: add pg_t::is_merge() method
Sage Weil [Fri, 16 Feb 2018 03:13:27 +0000 (21:13 -0600)]
osd/osd_types: add pg_t::is_merge() method

This checks if we are a merge *source*, and if so, who the parent (target)
will be.

Signed-off-by: Sage Weil <sage@redhat.com>
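
A check of the kind described in the commit above might look like the sketch below. It assumes that a PG's parent is found by clearing the highest set bit of its seed, repeated until the result falls below the new pg_num; the commit message does not spell this out, and the free functions and their names here are hypothetical stand-ins for the `pg_t` method.

```cpp
#include <cassert>
#include <cstdint>

// Number of bits needed to represent v (0 when v == 0).
inline unsigned bits_needed(uint32_t v) {
  unsigned n = 0;
  while (v) { ++n; v >>= 1; }
  return n;
}

// Assumed parent rule: clear the most significant set bit. Requires seed > 0.
inline uint32_t parent_seed(uint32_t seed) {
  return seed & ~(1u << (bits_needed(seed) - 1));
}

// Returns true if `seed` is a merge source when pg_num shrinks from
// old_pg_num to new_pg_num, and stores the merge target in *parent.
inline bool is_merge_source(uint32_t seed, uint32_t old_pg_num,
                            uint32_t new_pg_num, uint32_t* parent) {
  if (new_pg_num == 0) return false;                  // nothing to merge into
  if (seed >= old_pg_num || seed < new_pg_num) return false;
  uint32_t t = seed;
  while (t >= new_pg_num) t = parent_seed(t);         // t >= 1 inside the loop
  if (parent) *parent = t;
  return true;
}

inline void merge_source_example() {
  uint32_t parent = 0;
  // Shrinking from 8 to 4 PGs: seed 7 (0b111) folds into seed 3 (0b011).
  assert(is_merge_source(7, 8, 4, &parent) && parent == 3);
}
```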
7 years agoosd/osd_types: add pg_num_pending to pg_pool_t
Sage Weil [Fri, 16 Feb 2018 03:12:47 +0000 (21:12 -0600)]
osd/osd_types: add pg_num_pending to pg_pool_t

Signed-off-by: Sage Weil <sage@redhat.com>
7 years agoosd: allow multiple threads to block on wait_min_pg_epoch
Sage Weil [Mon, 9 Jul 2018 21:06:57 +0000 (16:06 -0500)]
osd: allow multiple threads to block on wait_min_pg_epoch

Signed-off-by: Sage Weil <sage@redhat.com>
7 years agoosd: restructure advance_pg() call mechanism
Sage Weil [Mon, 26 Feb 2018 22:23:51 +0000 (16:23 -0600)]
osd: restructure advance_pg() call mechanism

Signed-off-by: Sage Weil <sage@redhat.com>
7 years agomon/PGMap: prune merged pgs
Sage Weil [Wed, 28 Feb 2018 17:52:41 +0000 (11:52 -0600)]
mon/PGMap: prune merged pgs

Signed-off-by: Sage Weil <sage@redhat.com>
7 years agomon/PGMap: track pgs by state for each pool
Sage Weil [Fri, 6 Apr 2018 15:26:35 +0000 (10:26 -0500)]
mon/PGMap: track pgs by state for each pool

We had this globally, but it's useful to have the per-pool breakdowns.

Signed-off-by: Sage Weil <sage@redhat.com>
7 years agoosd/SnapMapper: allow split_bits to decrease (merge)
Sage Weil [Tue, 3 Apr 2018 21:09:17 +0000 (16:09 -0500)]
osd/SnapMapper: allow split_bits to decrease (merge)

Signed-off-by: Sage Weil <sage@redhat.com>
7 years agoos/bluestore: fix osr_drain before merge
Sage Weil [Mon, 9 Jul 2018 22:22:58 +0000 (17:22 -0500)]
os/bluestore: fix osr_drain before merge

We need to make sure the deferred writes on the source collection finish
before the merge so that ops ordered via the final target sequencer will
occur after those writes.

Signed-off-by: Sage Weil <sage@redhat.com>
7 years agoos/bluestore: allow reuse of osr from existing collection
Sage Weil [Sun, 8 Jul 2018 19:24:49 +0000 (14:24 -0500)]
os/bluestore: allow reuse of osr from existing collection

We try to attach an old osr at prepare_new_collection time, but that
happens before a transaction is submitted, and we might have a
transaction that removes and then recreates a collection.

Move the logic to _osr_attach and extend it to include reusing an osr
in use by a collection already in coll_map.  Also adjust the
_osr_register_zombie method to behave if the osr is already there, which
can happen with a remove, create, remove+create transaction sequence.

Fixes: https://tracker.ceph.com/issues/25180
Signed-off-by: Sage Weil <sage@redhat.com>
7 years agoos/filestore: (re)implement merge
Sage Weil [Sat, 4 Aug 2018 18:51:05 +0000 (13:51 -0500)]
os/filestore: (re)implement merge

Merging is a bit different than splitting, because the two collections
may already be hashed at different levels.  Since lookup etc. rely on the
idea that the object is always at the deepest level of hashing, if you
merge collections with different levels that share some common bit prefix
then some objects will end up higher up the hierarchy even though deeper
hashed directories exist.

Signed-off-by: Sage Weil <sage@redhat.com>
7 years agoos/filestore: add _merge_collections post-check
Sage Weil [Mon, 25 Jun 2018 18:08:21 +0000 (13:08 -0500)]
os/filestore: add _merge_collections post-check

Signed-off-by: Sage Weil <sage@redhat.com>
7 years agoos: implement merge_collection
Sage Weil [Fri, 16 Feb 2018 19:12:59 +0000 (13:12 -0600)]
os: implement merge_collection

Signed-off-by: Sage Weil <sage@redhat.com>
7 years agoos/ObjectStore: add merge_collection operation to Transaction
Sage Weil [Fri, 16 Feb 2018 04:43:18 +0000 (22:43 -0600)]
os/ObjectStore: add merge_collection operation to Transaction

Signed-off-by: Sage Weil <sage@redhat.com>
7 years agorgw: data sync respects error_retry_time for backoff on error_repo
Casey Bodley [Tue, 14 Aug 2018 15:16:16 +0000 (11:16 -0400)]
rgw: data sync respects error_retry_time for backoff on error_repo

Don't restart processing the error_repo until error_retry_time. When
data sync is otherwise idle, don't sleep past error_retry_time.

Fixes: http://tracker.ceph.com/issues/26938
Signed-off-by: Casey Bodley <cbodley@redhat.com>
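
The two rules in the commit above can be illustrated with a small timing sketch. It is not the rgw data sync code: `std::chrono::steady_clock` is a stand-in for the coarse clock used in the tree, the function names and the `idle_interval` parameter are hypothetical, and treating a passed retry time as "do not sleep" is a simplification.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>

using Clock = std::chrono::steady_clock;  // stand-in clock for this sketch

// Rule 1: only revisit the error repo once error_retry_time has arrived.
inline bool should_retry_errors(Clock::time_point now,
                                Clock::time_point error_retry_time) {
  return now >= error_retry_time;
}

// Rule 2: when otherwise idle, cap the sleep so it ends no later than
// error_retry_time. If the retry is already due, do not sleep at all.
inline Clock::duration idle_sleep(Clock::time_point now,
                                  Clock::time_point error_retry_time,
                                  Clock::duration idle_interval) {
  if (error_retry_time <= now) return Clock::duration::zero();
  return std::min(idle_interval, error_retry_time - now);
}

inline void backoff_example() {
  using std::chrono::seconds;
  Clock::time_point t0{};
  // Retry is due in 5s, so a 20s idle sleep is shortened to 5s.
  assert(idle_sleep(t0, t0 + seconds(5), seconds(20)) == seconds(5));
}
```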
7 years agocommon: adding missing ceph::coarse_real_clock helpers
Casey Bodley [Tue, 14 Aug 2018 15:12:48 +0000 (11:12 -0400)]
common: adding missing ceph::coarse_real_clock helpers

Signed-off-by: Casey Bodley <cbodley@redhat.com>
7 years agorgw: data sync uses coarse clock for error_retry_time
Casey Bodley [Tue, 14 Aug 2018 15:11:22 +0000 (11:11 -0400)]
rgw: data sync uses coarse clock for error_retry_time

Signed-off-by: Casey Bodley <cbodley@redhat.com>
7 years agoMerge pull request #23634 from cbodley/wip-21154
Casey Bodley [Fri, 7 Sep 2018 15:05:20 +0000 (11:05 -0400)]
Merge pull request #23634 from cbodley/wip-21154

rgw: RGWRadosGetOmapKeysCR takes result by shared_ptr

Reviewed-by: Yehuda Sadeh <yehuda@redhat.com>
7 years agorgw: RGWRadosGetOmapKeysCR takes result by shared_ptr
Casey Bodley [Fri, 17 Aug 2018 17:15:49 +0000 (13:15 -0400)]
rgw: RGWRadosGetOmapKeysCR takes result by shared_ptr

Fixes: http://tracker.ceph.com/issues/21154
Signed-off-by: Casey Bodley <cbodley@redhat.com>
7 years agoMerge pull request #23920 from cbodley/wip-rgw-cr-rados-fixes
Casey Bodley [Fri, 7 Sep 2018 13:30:29 +0000 (09:30 -0400)]
Merge pull request #23920 from cbodley/wip-rgw-cr-rados-fixes

rgw multisite: async rados requests don't access coroutine memory

Reviewed-by: Yehuda Sadeh <yehuda@redhat.com>
7 years agoMerge pull request #23959 from rubenk/doc-remove-unknown-option-from-manpage
Kefu Chai [Fri, 7 Sep 2018 11:12:12 +0000 (19:12 +0800)]
Merge pull request #23959 from rubenk/doc-remove-unknown-option-from-manpage

doc: remove deprecated 'scrubq' from ceph(8)

Reviewed-by: Kefu Chai <kchai@redhat.com>
7 years agoMerge pull request #23931 from cyx1231st/wip-msgr-test
Kefu Chai [Fri, 7 Sep 2018 06:43:47 +0000 (14:43 +0800)]
Merge pull request #23931 from cyx1231st/wip-msgr-test

tests: fix to check server_conn in MessengerTest.NameAddrTest

Reviewed-by: Kefu Chai <kchai@redhat.com>
7 years agotests: fix to check server_conn in MessengerTest.NameAddrTest
Yingxin [Wed, 5 Sep 2018 15:14:09 +0000 (23:14 +0800)]
tests: fix to check server_conn in MessengerTest.NameAddrTest

Signed-off-by: Yingxin <yingxin.cheng@intel.com>