git.apps.os.sepia.ceph.com Git - ceph.git/log
ceph.git
5 years ago osd/OSD: auto mark heartbeat sessions as stale and tear them down 30225/head
xie xingguo [Wed, 26 Jun 2019 06:24:08 +0000 (14:24 +0800)]
osd/OSD: auto mark heartbeat sessions as stale and tear them down

The primary benefit is that the OSD doesn't need to keep a flood of
blocked heartbeat messages around in memory.
This prevents OSDs from accumulating heartbeat messages due to a
broken switch and then exhausting the whole node's memory:

Jun 11 04:19:26 host-192-168-9-12 kernel: [409881.137077] Out of memory:
Kill process 1471476 (ceph-osd) score 47 or sacrifice child
Jun 11 04:19:26 host-192-168-9-12 kernel: [409881.146054] Killed process
1471476 (ceph-osd) total-vm:4822548kB, anon-rss:3097860kB,
file-rss:2556kB, shmem-rss:0kB

Fixes: http://tracker.ceph.com/issues/40586
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
(cherry picked from commit 6cc90f363b8096d2d5fad30e57426d0cea9e3478)

Conflicts:
src/osd/OSD.cc (no boot_finisher.stop() and no lock_guard)
src/osd/OSD.h (trivial)

Fixed get_val() call in reset_heartbeat_peers()

5 years ago osd mon mgr: Changes for rebase and correction for this branch
David Zafman [Fri, 26 Jul 2019 05:23:21 +0000 (22:23 -0700)]
osd mon mgr: Changes for rebase and correction for this branch

Fix use of asok_command() which doesn't do try/catch
Need unregister_command() since unregister_commands() doesn't exist here
Use Mutex::Locker since lock_guard() isn't available
Use new g_conf which isn't g_conf() anymore
cct->_conf is a pointer now
Use ceph_abort() because cct isn't available for ceph_abort_msg()

Signed-off-by: David Zafman <dzafman@redhat.com>

5 years ago test: Ignore OSD_SLOW_PING_TIME* if injecting socket failures
David Zafman [Thu, 3 Oct 2019 16:09:10 +0000 (09:09 -0700)]
test: Ignore OSD_SLOW_PING_TIME* if injecting socket failures

Fixes: https://tracker.ceph.com/issues/41743
Signed-off-by: David Zafman <dzafman@redhat.com>
(cherry picked from commit ded58ef91d6c8a68de49fa2c6b6e01636515c59b)

Conflicts: 3 yamls don't exist in Mimic

5 years ago test: Allow fractional milliseconds to make test possible
David Zafman [Fri, 6 Sep 2019 18:20:10 +0000 (11:20 -0700)]
test: Allow fractional milliseconds to make test possible

Fixes: https://tracker.ceph.com/issues/41689
Signed-off-by: David Zafman <dzafman@redhat.com>
(cherry picked from commit 6d2e4cb109caff8dae5e5e18563b6305131b488b)

5 years ago doc: Document network performance monitoring
David Zafman [Wed, 4 Sep 2019 18:38:09 +0000 (18:38 +0000)]
doc: Document network performance monitoring

Signed-off-by: David Zafman <dzafman@redhat.com>
(cherry picked from commit 71015b94abdf669695754a598a05a4a1c5d46f83)

Conflicts:
doc/rados/operations/monitoring.rst (trivial)

5 years ago osd doc mon mgr: To milliseconds for config value, user input and threshold out
David Zafman [Wed, 4 Sep 2019 17:13:32 +0000 (17:13 +0000)]
osd doc mon mgr: To milliseconds for config value, user input and threshold out

Signed-off-by: David Zafman <dzafman@redhat.com>
(cherry picked from commit 5f83a6158b29944cf8f5a069c50edba3e172cdcc)

Conflicts:
src/common/options.cc (trivial)

5 years ago osd mon mgr: Convert all network ping time output to milliseconds
David Zafman [Tue, 6 Aug 2019 03:57:48 +0000 (20:57 -0700)]
osd mon mgr: Convert all network ping time output to milliseconds

To output milliseconds (usec / 1000), treat as fixed point integers

Signed-off-by: David Zafman <dzafman@redhat.com>
(cherry picked from commit 9d02e5d39d7b5e2806a5d98bdde24f4584e70528)

Conflicts:
src/mon/PGMap.cc (trivial)
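
The fixed-point conversion mentioned in this commit (usec / 1000, rendered without floating point) might look roughly like the sketch below. It is illustrative only: the function name `usec_to_ms_string` and its signature are assumptions made for this example, not the routine added to Ceph.

```python
def usec_to_ms_string(usec: int, scale: int = 3) -> str:
    """Render an integer microsecond count as a millisecond string.

    Hypothetical helper: the integer is treated as a fixed-point value
    with `scale` fractional digits, so no floating-point math is used.
    """
    if usec < 0 or scale < 0:
        raise ValueError("usec and scale must be non-negative")
    if scale == 0:
        return str(usec)
    # Left-pad so there is always at least one digit before the point.
    digits = str(usec).rjust(scale + 1, "0")
    return digits[:-scale] + "." + digits[-scale:]
```

Working on the integer digits avoids rounding artifacts that a float division could introduce in reported ping times.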

5 years ago common: Add support routines to generate strings for fixed point
David Zafman [Fri, 9 Aug 2019 01:06:43 +0000 (18:06 -0700)]
common: Add support routines to generate strings for fixed point

Signed-off-by: David Zafman <dzafman@redhat.com>
(cherry picked from commit 8ac1562b4988fc3d52f92f15eb58075de0bcf27e)

Conflicts:
src/common/Formatter.h (trivial)

5 years ago test: Add basic test for network ping tracking
David Zafman [Sat, 13 Jul 2019 02:35:04 +0000 (19:35 -0700)]
test: Add basic test for network ping tracking

Signed-off-by: David Zafman <dzafman@redhat.com>
(cherry picked from commit 4fb42ea27e7b6acefd081b7b287d38347a6085ce)

5 years ago osd: Add debug_heartbeat_testing_span to allow quicker testing
David Zafman [Wed, 24 Jul 2019 21:19:43 +0000 (14:19 -0700)]
osd: Add debug_heartbeat_testing_span to allow quicker testing

Signed-off-by: David Zafman <dzafman@redhat.com>
(cherry picked from commit 573aea2bb1d48237df5182a6e4421e15c1eea88c)

5 years ago osd: Add debug_disable_randomized_ping config for use in testing
David Zafman [Wed, 24 Jul 2019 01:10:46 +0000 (18:10 -0700)]
osd: Add debug_disable_randomized_ping config for use in testing

Signed-off-by: David Zafman <dzafman@redhat.com>
(cherry picked from commit f2b26d88f0a0727f0362ccd8b287f8bb3f41dc3c)

Conflicts:
src/osd/OSD.cc (trivial)
src/common/options.cc (trivial)

5 years ago osd mgr: Add osd_mon_heartbeat_stat_stale option to time out ping info
David Zafman [Mon, 22 Jul 2019 18:52:41 +0000 (11:52 -0700)]
osd mgr: Add osd_mon_heartbeat_stat_stale option to time out ping info
after 1 hour

Signed-off-by: David Zafman <dzafman@redhat.com>
(cherry picked from commit 048f8096265dd3a647adb970255e4b11c9617b2e)

Conflicts:
src/osd/OSD.cc (trivial)

5 years ago mon: Indicate when an osd with slow ping time is down
David Zafman [Fri, 19 Jul 2019 04:29:49 +0000 (21:29 -0700)]
mon: Indicate when an osd with slow ping time is down

Signed-off-by: David Zafman <dzafman@redhat.com>
(cherry picked from commit 5ab145d6402a2525d69296de95b36214bc4c7431)

5 years ago osd mon: Add last_update to osd_stat_t heartbeat info
David Zafman [Fri, 19 Jul 2019 04:28:16 +0000 (21:28 -0700)]
osd mon: Add last_update to osd_stat_t heartbeat info

Ignore old heartbeat info which hasn't updated

Signed-off-by: David Zafman <dzafman@redhat.com>
(cherry picked from commit ea20d3522aaf644cef989c565e11dd781e420e18)

Conflicts:
src/osd/osd_types.h (osd_stat_t location in file changed)

5 years ago osd: After first interval populate vectors so 5min/15min values aren't 0
David Zafman [Tue, 16 Jul 2019 19:02:43 +0000 (12:02 -0700)]
osd: After first interval populate vectors so 5min/15min values aren't 0

Signed-off-by: David Zafman <dzafman@redhat.com>
(cherry picked from commit 6555699d289769a44e9840424192a1be1a6ba00d)

5 years ago osd mgr: Store last pingtime for possible graphing
David Zafman [Mon, 15 Jul 2019 20:23:53 +0000 (13:23 -0700)]
osd mgr: Store last pingtime for possible graphing

Signed-off-by: David Zafman <dzafman@redhat.com>
(cherry picked from commit 3f846d7c806b7f62ead08f0e9fb2ba927ffe0592)

Conflicts:
src/osd/osd_types.h (osd_stat_t location in file changed)

5 years ago osd mgr: Add minimum and maximum tracking to network ping time
David Zafman [Fri, 12 Jul 2019 01:06:23 +0000 (01:06 +0000)]
osd mgr: Add minimum and maximum tracking to network ping time

Signed-off-by: David Zafman <dzafman@redhat.com>
(cherry picked from commit 297a0e7b1de410c094fc9a6e42be14813d6dac5e)

Conflicts:
src/osd/osd_types.cc (trivial)
src/osd/osd_types.h (osd_stat_t location in file changed)

5 years ago doc: Add documentation and release notes
David Zafman [Thu, 11 Jul 2019 00:05:47 +0000 (00:05 +0000)]
doc: Add documentation and release notes

Signed-off-by: David Zafman <dzafman@redhat.com>
(cherry picked from commit f4a0be2e8707f921d65bf22a6c1090e402905ad3)

Conflicts:
PendingReleaseNotes (trivial)

5 years ago osd mgr mon: Add mon_warn_on_slow_ping_ratio config as 5% of osd_heartbeat_grace
David Zafman [Thu, 11 Jul 2019 21:24:12 +0000 (21:24 +0000)]
osd mgr mon: Add mon_warn_on_slow_ping_ratio config as 5% of osd_heartbeat_grace

Compute network ping threshold based on ratio (5% of 20 seconds is 1 second)
Make the threshold value used part of dump_osd_network for osd and mgr
Keep mon_warn_on_slow_ping_time (default 0) to optionally override the ratio

Signed-off-by: David Zafman <dzafman@redhat.com>
(cherry picked from commit 0d1bbd34e96e2da2027861229b376805d5ea8aa6)
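
The threshold rule described in this commit can be sketched as follows. This is a minimal illustration, assuming the override is expressed in milliseconds and that a value of 0 means "use the ratio"; the function and parameter names are made up for the example and are not Ceph's code.

```python
def slow_ping_threshold_ms(heartbeat_grace_s: float = 20.0,
                           slow_ping_ratio: float = 0.05,
                           slow_ping_time_ms: float = 0.0) -> float:
    """Return the ping time (ms) above which a slow-ping warning is raised.

    Hypothetical helper mirroring the commit message: an explicit time
    overrides the ratio; otherwise the threshold is ratio * grace.
    """
    if slow_ping_time_ms > 0:
        return slow_ping_time_ms
    return heartbeat_grace_s * slow_ping_ratio * 1000.0
```

With the defaults named in the commit message (ratio 5%, grace 20 seconds) this yields a threshold of about 1 second.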

5 years ago mgr: Add "dump_osd_network" mgr admin request to get a sorted report
David Zafman [Tue, 9 Jul 2019 17:22:12 +0000 (17:22 +0000)]
mgr: Add "dump_osd_network" mgr admin request to get a sorted report

Signed-off-by: David Zafman <dzafman@redhat.com>
(cherry picked from commit 5d3c1856415f8b66e31361a0a7b9c75edc46e49e)

Conflicts:
src/mgr/ClusterState.cc (trivial)
src/mgr/ClusterState.h (trivial)

5 years ago osd: Add "dump_osd_network" osd admin request to get a sorted report
David Zafman [Wed, 10 Jul 2019 18:15:44 +0000 (18:15 +0000)]
osd: Add "dump_osd_network" osd admin request to get a sorted report

Signed-off-by: David Zafman <dzafman@redhat.com>
(cherry picked from commit 025b10a5329127734367a6899543f51cd8580d43)

Conflicts:
src/osd/OSD.cc (trivial)

5 years ago osd mon: Track heartbeat ping times and report health warning
David Zafman [Wed, 26 Jun 2019 02:59:06 +0000 (02:59 +0000)]
osd mon: Track heartbeat ping times and report health warning

Fixes: http://tracker.ceph.com/issues/40640
Signed-off-by: David Zafman <dzafman@redhat.com>
(cherry picked from commit 66d44e7f911a57100d650ad7df9445f88ec70140)

Conflicts:
src/common/options.cc (trivial)
src/mon/PGMap.cc (trivial)
src/osd/OSD.cc (trivial)
src/osd/OSD.h (trivial)
src/osd/osd_types.cc (encode version difference)
src/osd/osd_types.h (osd_stat_t location in file changed)

src/mon/PGMap.cc manually get rid of extra argument to checks->add
src/osd/OSD.cc rename ping_stamp to stamp for backport

5 years ago osd/OSD: fix HeartbeatInfo.is_healthy() check
xie xingguo [Mon, 8 Jan 2018 07:02:58 +0000 (15:02 +0800)]
osd/OSD: fix HeartbeatInfo.is_healthy() check

Delay declaring a peer healthy until we have received the first
replies from both the front and back connections.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
(cherry picked from commit d9123158d1fef329fb9bf5ff787f9c84bb51b44c)

5 years ago osd/OSD: use first_tx to calculate failed_for
xie xingguo [Mon, 8 Jan 2018 02:24:09 +0000 (10:24 +0800)]
osd/OSD: use first_tx to calculate failed_for

If we never hear any replies from a heartbeat peer, use first_tx
to calculate failed_for, which is more accurate.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
(cherry picked from commit aba603736cbce94f7e1e5ac851ae4d4f43ea63e6)
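
The fallback described in this commit can be illustrated with a short sketch. It assumes `last_rx` is `None` when no reply has ever arrived; the function itself is hypothetical and only borrows the `first_tx` and `failed_for` names from the commit message.

```python
def failed_for(now: float, first_tx: float, last_rx=None) -> float:
    """Return how long (seconds) a heartbeat peer has been unresponsive.

    Hypothetical helper: if no reply was ever received, measure from
    the first ping we sent rather than from an unset receive time.
    """
    reference = last_rx if last_rx is not None else first_tx
    return now - reference
```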

5 years ago osd: refactor heartbeat health check
xie xingguo [Mon, 16 May 2016 05:50:28 +0000 (13:50 +0800)]
osd: refactor heartbeat health check

The original logic reuses the timestamp at which we sent pings to a
specific heartbeat peer to update the last_rx_front[back] field when
the corresponding replies arrive. That value is later treated as the
exact time we received the replies, and is used to calculate the
heartbeat latency and to determine whether the relevant peer is dead.

However, this is not accurate enough, as there may be a delay between
receiving a reply and calling heartbeat_check(). We can eliminate
the delay by introducing a map that tracks the ping history,
each entry of which consists of three elements:

1. "tx_time", which serves as the map key, is the exact timestamp
   at which we sent the pings.
2. "deadline" is the time by which we must receive all replies;
   otherwise we consider this peer "dead".
3. "unacknowledged" is how many replies to the corresponding
   ping are still outstanding. The initial value is 2 (as we send
   two pings, from the front and back side, for each peer).

We insert an item into the map every time we send out a ping, and
decrease the "unacknowledged" counter by 1 each time we get a reply to
the tracked ping. If "unacknowledged" drops to 0, we know all the replies
have been successfully collected and we can safely erase the relevant
item from the map, as well as any earlier ones.

By comparing the current timestamp with the oldest deadline, we can now
make a much more accurate decision about whether the corresponding peer
is healthy. And by setting last_rx_* to the timestamp at which we
receive the reply, the lower bound on when we stopped hearing replies
from the corresponding connection is also much clearer.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
(cherry picked from commit 477774ceee42641f6d6884536462f92567bfea11)

Conflicts:
src/osd/OSD.cc (send_still_alive() has 1 less argument)
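
The ping-history bookkeeping described in this commit can be sketched in a few lines. This is a simplified model under stated assumptions, not the OSD implementation: the `PingHistory` class, its method names, and the use of plain numeric timestamps are all invented for illustration, and front/back replies are not distinguished.

```python
class PingHistory:
    """Hypothetical per-peer ping history keyed by tx_time."""

    def __init__(self, grace: float):
        self.grace = grace
        self.entries = {}    # tx_time -> {"deadline", "unacknowledged"}
        self.last_rx = None  # time the most recent reply was received

    def on_send(self, tx_time: float) -> None:
        # Two pings go out per peer (front and back), so expect 2 replies.
        self.entries[tx_time] = {"deadline": tx_time + self.grace,
                                 "unacknowledged": 2}

    def on_reply(self, tx_time: float, now: float) -> None:
        self.last_rx = now  # record receive time, not send time
        entry = self.entries.get(tx_time)
        if entry is None:
            return  # reply to a ping that was already erased
        entry["unacknowledged"] -= 1
        if entry["unacknowledged"] == 0:
            # Fully acknowledged: drop this entry and all earlier ones.
            for t in [t for t in self.entries if t <= tx_time]:
                del self.entries[t]

    def is_healthy(self, now: float) -> bool:
        if not self.entries:
            return True
        oldest = min(self.entries)
        return now <= self.entries[oldest]["deadline"]
```

Keying on the send time lets a late but complete acknowledgement clear older outstanding pings in one step, while the oldest remaining deadline alone decides health.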

5 years ago Merge pull request #30808 from jan--f/wip-42233-mimic
Jan Fajerski [Fri, 18 Oct 2019 12:02:01 +0000 (14:02 +0200)]
Merge pull request #30808 from jan--f/wip-42233-mimic

mimic: ceph-volume: VolumeGroups.filter shouldn't purge itself

5 years ago Merge pull request #30936 from smithfarm/wip-42130-mimic
Nathan Cutler [Fri, 18 Oct 2019 11:59:33 +0000 (13:59 +0200)]
Merge pull request #30936 from smithfarm/wip-42130-mimic

mimic: doc/ceph-fuse: mention -k option in ceph-fuse man page

Reviewed-by: Rishabh Dave <ridave@redhat.com>

5 years ago Merge pull request #30806 from jan--f/wip-42235-mimic
Jan Fajerski [Fri, 18 Oct 2019 11:58:51 +0000 (13:58 +0200)]
Merge pull request #30806 from jan--f/wip-42235-mimic

mimic: ceph-volume: PVolumes.filter shouldn't purge itself

5 years ago Merge pull request #29224 from smithfarm/wip-39223-mimic
Yuri Weinstein [Thu, 17 Oct 2019 16:45:12 +0000 (09:45 -0700)]
Merge pull request #29224 from smithfarm/wip-39223-mimic

mimic: mds: behind on trimming and [dentry] was purgeable but no longer is!

Reviewed-by: Venky Shankar <vshankar@redhat.com>

5 years ago Merge pull request #29232 from smithfarm/wip-40439-mimic
Yuri Weinstein [Thu, 17 Oct 2019 16:44:39 +0000 (09:44 -0700)]
Merge pull request #29232 from smithfarm/wip-40439-mimic

mimic: mds: cannot switch mds state from standby-replay to active

Reviewed-by: Venky Shankar <vshankar@redhat.com>

5 years ago Merge pull request #29479 from xiaoxichen/wip-41001
Yuri Weinstein [Thu, 17 Oct 2019 16:44:11 +0000 (09:44 -0700)]
Merge pull request #29479 from xiaoxichen/wip-41001

mimic: cephfs: client: unlink dentry for inode with llref=0

Reviewed-by: Venky Shankar <vshankar@redhat.com>

5 years ago Merge pull request #29218 from smithfarm/wip-38709-mimic
Yuri Weinstein [Thu, 17 Oct 2019 16:04:52 +0000 (09:04 -0700)]
Merge pull request #29218 from smithfarm/wip-38709-mimic

mimic: tests: kclient unmount hangs after file system goes down

Reviewed-by: Venky Shankar <vshankar@redhat.com>

5 years ago Merge pull request #29220 from smithfarm/wip-39210-mimic
Yuri Weinstein [Thu, 17 Oct 2019 16:04:29 +0000 (09:04 -0700)]
Merge pull request #29220 from smithfarm/wip-39210-mimic

mimic: mds: mds_cap_revoke_eviction_timeout is not used to initialize Server::cap_revoke_eviction_timeout

Reviewed-by: Venky Shankar <vshankar@redhat.com>

5 years ago Merge pull request #29222 from smithfarm/wip-39212-mimic
Yuri Weinstein [Thu, 17 Oct 2019 16:03:59 +0000 (09:03 -0700)]
Merge pull request #29222 from smithfarm/wip-39212-mimic

mimic: cephfs: MDSTableServer.cc: 83: FAILED assert(version == tid)

Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Yan, Zheng <zyan@redhat.com>

5 years ago Merge pull request #29223 from smithfarm/wip-39215-mimic
Yuri Weinstein [Thu, 17 Oct 2019 15:58:00 +0000 (08:58 -0700)]
Merge pull request #29223 from smithfarm/wip-39215-mimic

mimic: mds: there is an assertion when calling Beacon::shutdown()

Reviewed-by: Venky Shankar <vshankar@redhat.com>

5 years ago Merge pull request #29228 from smithfarm/wip-40219-mimic
Yuri Weinstein [Thu, 17 Oct 2019 15:57:28 +0000 (08:57 -0700)]
Merge pull request #29228 from smithfarm/wip-40219-mimic

mimic: tests: cephfs: TestMisc.test_evict_client fails

Reviewed-by: Venky Shankar <vshankar@redhat.com>

5 years ago Merge pull request #29230 from smithfarm/wip-40437-mimic
Yuri Weinstein [Thu, 17 Oct 2019 15:56:58 +0000 (08:56 -0700)]
Merge pull request #29230 from smithfarm/wip-40437-mimic

mimic: cephfs: getattr on snap inode stuck

Reviewed-by: Venky Shankar <vshankar@redhat.com>

5 years ago Merge pull request #30796 from dillaman/wip-36122-mimic
Yuri Weinstein [Thu, 17 Oct 2019 15:50:42 +0000 (08:50 -0700)]
Merge pull request #30796 from dillaman/wip-36122-mimic

mimic: librbd: properly handle potential object map failures

Reviewed-by: Jason Dillaman <dillaman@redhat.com>

5 years ago Merge pull request #30828 from dillaman/wip-41882-mimic
Yuri Weinstein [Thu, 17 Oct 2019 15:49:58 +0000 (08:49 -0700)]
Merge pull request #30828 from dillaman/wip-41882-mimic

mimic: rbd-mirror: cannot restore deferred deletion mirrored images

Reviewed-by: Mykola Golub <mgolub@mirantis.com>

5 years ago Merge pull request #30213 from smithfarm/wip-41449-mimic
Yuri Weinstein [Wed, 16 Oct 2019 23:26:47 +0000 (16:26 -0700)]
Merge pull request #30213 from smithfarm/wip-41449-mimic

mimic: core: mon: C_AckMarkedDown has not handled the Callback Arguments

Reviewed-by: David Zafman <dzafman@redhat.com>

5 years ago Merge pull request #30150 from neha-ojha/wip-40769-mimic
Yuri Weinstein [Wed, 16 Oct 2019 23:22:05 +0000 (16:22 -0700)]
Merge pull request #30150 from neha-ojha/wip-40769-mimic

mimic: bluestore: common/options: Set concurrent bluestore rocksdb compactions to 2

Reviewed-by: Igor Fedotov <ifedotov@suse.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>

5 years ago Merge pull request #30069 from smithfarm/wip-40124-mimic
Yuri Weinstein [Tue, 15 Oct 2019 20:06:01 +0000 (13:06 -0700)]
Merge pull request #30069 from smithfarm/wip-40124-mimic

mimic: qa/rgw: don't use ceph-ansible in s3a-hadoop suite

Reviewed-by: Casey Bodley <cbodley@redhat.com>

5 years ago Merge pull request #30133 from smithfarm/wip-40850-mimic
Yuri Weinstein [Tue, 15 Oct 2019 20:05:35 +0000 (13:05 -0700)]
Merge pull request #30133 from smithfarm/wip-40850-mimic

mimic: rgw/multisite: Don't allow certain radosgw-admin commands to run on non-master zone

Reviewed-by: Casey Bodley <cbodley@redhat.com>

5 years ago Merge pull request #29203 from smithfarm/wip-40320-mimic
Yuri Weinstein [Tue, 15 Oct 2019 19:46:27 +0000 (12:46 -0700)]
Merge pull request #29203 from smithfarm/wip-40320-mimic

mimic: tests: make: *** [hello_world_cpp] Error 127 in rados

Reviewed-by: Kefu Chai <kchai@redhat.com>

5 years ago Merge pull request #30219 from vumrao/wip-vumrao-bluefs-shared-alloc-with-log-level...
Yuri Weinstein [Tue, 15 Oct 2019 19:45:57 +0000 (12:45 -0700)]
Merge pull request #30219 from vumrao/wip-vumrao-bluefs-shared-alloc-with-log-level-change-mimic

mimic: os/bluestore: apply shared_alloc_size to shared device with log level change

Reviewed-by: Josh Durgin <jdurgin@redhat.com>

5 years ago Merge pull request #30260 from smithfarm/wip-41723-mimic
Yuri Weinstein [Tue, 15 Oct 2019 19:45:19 +0000 (12:45 -0700)]
Merge pull request #30260 from smithfarm/wip-41723-mimic

mimic: build/ops: fix build fail related to PYTHON_EXECUTABLE variable

Reviewed-by: Kefu Chai <kchai@redhat.com>

5 years ago Merge pull request #30355 from pdvian/wip-41765-mimic
Yuri Weinstein [Tue, 15 Oct 2019 19:44:51 +0000 (12:44 -0700)]
Merge pull request #30355 from pdvian/wip-41765-mimic

mimic: build/ops: ceph.spec.in: reserve 2500MB per build job

Reviewed-by: Nathan Cutler <ncutler@suse.com>

5 years ago Merge pull request #30672 from smithfarm/wip-37520-mimic
Yuri Weinstein [Tue, 15 Oct 2019 19:44:21 +0000 (12:44 -0700)]
Merge pull request #30672 from smithfarm/wip-37520-mimic

mimic: msg/async: do not trigger RESETSESSION from connect fault during connection phase

Reviewed-by: Sage Weil <sage@redhat.com>

5 years ago Merge pull request #30784 from smithfarm/wip-41918-mimic
Yuri Weinstein [Tue, 15 Oct 2019 19:43:52 +0000 (12:43 -0700)]
Merge pull request #30784 from smithfarm/wip-41918-mimic

mimic: core: osd: scrub error on big objects; make bluestore refuse to start on big objects

Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: David Zafman <dzafman@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>

5 years ago doc/ceph-fuse: mention -k option in ceph-fuse man page 30936/head
Rishabh Dave [Wed, 25 Sep 2019 06:12:50 +0000 (11:42 +0530)]
doc/ceph-fuse: mention -k option in ceph-fuse man page

Fixes: https://tracker.ceph.com/issues/42044
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit a96a32bd2ef15e963bdb8c46e45edcb83e2361bb)

5 years ago Merge pull request #29258 from smithfarm/wip-39741-mimic
Yuri Weinstein [Fri, 11 Oct 2019 20:23:59 +0000 (13:23 -0700)]
Merge pull request #29258 from smithfarm/wip-39741-mimic

mimic: rgw: swift object expiry fails when a bucket reshards

Reviewed-by: Casey Bodley <cbodley@redhat.com>

5 years ago Merge pull request #29483 from pdvian/wip-40761-mimic
Yuri Weinstein [Fri, 11 Oct 2019 20:23:12 +0000 (13:23 -0700)]
Merge pull request #29483 from pdvian/wip-40761-mimic

mimic: rgw: Save an unnecessary copy of RGWEnv

Reviewed-by: Casey Bodley <cbodley@redhat.com>

5 years ago Merge pull request #29500 from pdvian/wip-40847-mimic
Yuri Weinstein [Fri, 11 Oct 2019 20:22:43 +0000 (13:22 -0700)]
Merge pull request #29500 from pdvian/wip-40847-mimic

mimic: rgw: Don't crash on copy when metadata directive not supplied

Reviewed-by: Casey Bodley <cbodley@redhat.com>

5 years ago Merge pull request #30073 from smithfarm/wip-40517-mimic
Yuri Weinstein [Fri, 11 Oct 2019 20:22:21 +0000 (13:22 -0700)]
Merge pull request #30073 from smithfarm/wip-40517-mimic

mimic: rgw: perfcounters: add gc retire counter

Reviewed-by: Casey Bodley <cbodley@redhat.com>

5 years ago Merge pull request #30077 from smithfarm/wip-40599-mimic
Yuri Weinstein [Fri, 11 Oct 2019 20:21:50 +0000 (13:21 -0700)]
Merge pull request #30077 from smithfarm/wip-40599-mimic

mimic: rgw_file: fix readdir eof() calc--caller stop implies !eof and introduce fast S3 Unix stats (immutable)

Reviewed-by: Casey Bodley <cbodley@redhat.com>

5 years ago Merge pull request #29276 from smithfarm/wip-40215-mimic
Yuri Weinstein [Fri, 11 Oct 2019 20:19:36 +0000 (13:19 -0700)]
Merge pull request #29276 from smithfarm/wip-40215-mimic

mimic: rgw_file:  fix invalidation of top-level directories

Reviewed-by: Matt Benjamin <mbenjami@redhat.com>

5 years ago Merge pull request #29984 from pdvian/wip-41110-mimic
Yuri Weinstein [Fri, 11 Oct 2019 20:18:44 +0000 (13:18 -0700)]
Merge pull request #29984 from pdvian/wip-41110-mimic

mimic: rgw: fix drain handles error when deleting bucket with bypass-gc option

Reviewed-by: Casey Bodley <cbodley@redhat.com>

5 years ago Merge pull request #30074 from smithfarm/wip-40539-mimic
Yuri Weinstein [Fri, 11 Oct 2019 20:18:15 +0000 (13:18 -0700)]
Merge pull request #30074 from smithfarm/wip-40539-mimic

mimic: cls/rgw: keep issuing bilog trim ops after reset

Reviewed-by: Casey Bodley <cbodley@redhat.com>

5 years ago Merge pull request #30105 from smithfarm/wip-41120-mimic
Yuri Weinstein [Fri, 11 Oct 2019 20:15:56 +0000 (13:15 -0700)]
Merge pull request #30105 from smithfarm/wip-41120-mimic

mimic: rgw: permit rgw-admin to populate user info by access-key

Reviewed-by: Casey Bodley <cbodley@redhat.com>

5 years ago Merge pull request #30130 from smithfarm/wip-40629-mimic
Yuri Weinstein [Fri, 11 Oct 2019 20:15:29 +0000 (13:15 -0700)]
Merge pull request #30130 from smithfarm/wip-40629-mimic

mimic: rgw: data/bilogs are trimmed when no peers are reading them

Reviewed-by: Casey Bodley <cbodley@redhat.com>

5 years ago Merge pull request #30233 from smithfarm/wip-41129-mimic
Yuri Weinstein [Thu, 10 Oct 2019 20:23:19 +0000 (13:23 -0700)]
Merge pull request #30233 from smithfarm/wip-41129-mimic

mimic: qa: use hard_reset to reboot kclient

Reviewed-by: Venky Shankar <vshankar@redhat.com>

5 years ago Merge pull request #30234 from smithfarm/wip-40444-mimic
Yuri Weinstein [Thu, 10 Oct 2019 20:22:54 +0000 (13:22 -0700)]
Merge pull request #30234 from smithfarm/wip-40444-mimic

mimic: mds: cleanup unneeded client_snap_caps when splitting snap inode

Reviewed-by: Venky Shankar <vshankar@redhat.com>

5 years ago Merge pull request #30235 from smithfarm/wip-40844-mimic
Yuri Weinstein [Thu, 10 Oct 2019 20:22:26 +0000 (13:22 -0700)]
Merge pull request #30235 from smithfarm/wip-40844-mimic

mimic: mon/MDSMonitor: use stringstream instead of dout for mds repaired

Reviewed-by: Venky Shankar <vshankar@redhat.com>

5 years ago Merge pull request #30236 from smithfarm/wip-40853-mimic
Yuri Weinstein [Thu, 10 Oct 2019 20:22:01 +0000 (13:22 -0700)]
Merge pull request #30236 from smithfarm/wip-40853-mimic

mimic: cephfs: test_volume_client: fix test_put_object_versioned()

Reviewed-by: Rishabh Dave <ridave@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>

5 years ago Merge pull request #30238 from smithfarm/wip-40896-mimic
Yuri Weinstein [Thu, 10 Oct 2019 20:21:26 +0000 (13:21 -0700)]
Merge pull request #30238 from smithfarm/wip-40896-mimic

mimic: cephfs: ceph_volume_client: convert string to bytes object

Reviewed-by: Rishabh Dave <ridave@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>

5 years ago Merge pull request #30239 from smithfarm/wip-40899-mimic
Yuri Weinstein [Thu, 10 Oct 2019 20:20:55 +0000 (13:20 -0700)]
Merge pull request #30239 from smithfarm/wip-40899-mimic

mimic: mds: evict an unresponsive client only when another client wants its caps

Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Rishabh Dave <ridave@redhat.com>

5 years ago Merge pull request #30240 from smithfarm/wip-41466-mimic
Yuri Weinstein [Thu, 10 Oct 2019 20:19:16 +0000 (13:19 -0700)]
Merge pull request #30240 from smithfarm/wip-41466-mimic

mimic: cephfs: mount.ceph: properly handle -o strictatime

Reviewed-by: Venky Shankar <vshankar@redhat.com>

5 years ago Merge pull request #30241 from smithfarm/wip-41487-mimic
Yuri Weinstein [Thu, 10 Oct 2019 20:18:29 +0000 (13:18 -0700)]
Merge pull request #30241 from smithfarm/wip-41487-mimic

mimic: cephfs: client: return -EIO when sync file which unsafe reqs have been dropped

Reviewed-by: Venky Shankar <vshankar@redhat.com>

5 years ago Merge pull request #30417 from pdvian/wip-41852-mimic
Yuri Weinstein [Thu, 10 Oct 2019 20:18:06 +0000 (13:18 -0700)]
Merge pull request #30417 from pdvian/wip-41852-mimic

mimic: mds: make MDSIOContextBase delete itself when shutting down

Reviewed-by: Venky Shankar <vshankar@redhat.com>

5 years ago Merge pull request #30443 from pdvian/wip-41856-mimic
Yuri Weinstein [Thu, 10 Oct 2019 20:17:41 +0000 (13:17 -0700)]
Merge pull request #30443 from pdvian/wip-41856-mimic

mimic: cephfs: client: nfs-ganesha with cephfs client, removing dir reports not empty

Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>

5 years ago rbd-mirror: prevent restored trash images from being deleted after delay 30828/head
Jason Dillaman [Wed, 11 Sep 2019 20:30:16 +0000 (16:30 -0400)]
rbd-mirror: prevent restored trash images from being deleted after delay

The image deleter wasn't verifying whether or not an image was still in the trash
prior to deleting the image. This would not only incorrectly remove any restored
images but would also leave behind the image id object and the entry within the directory.

Fixes: https://tracker.ceph.com/issues/41780
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit f091a31d5252bba76598fffdc997275ca531621d)

Conflicts:
src/test/rbd_mirror/image_deleter/test_mock_TrashRemoveRequest.cc: removed state test cases
src/tools/rbd_mirror/image_deleter/TrashRemoveRequest.h/cc: removed state validation

5 years ago rbd-mirror: renamed RemoveRequest state machine to TrashRemoveRequest
Jason Dillaman [Wed, 11 Sep 2019 19:28:28 +0000 (15:28 -0400)]
rbd-mirror: renamed RemoveRequest state machine to TrashRemoveRequest

This better matches the current behavior where the images are only
removed from the trash.

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit 55daa8e1e28f457a897070adfabf1583093aadd3)

Conflicts:
src/test/rbd_mirror/CMakeLists.txt: trivial resolution
src/tools/rbd_mirror/ImageDeleter.cc: trivial resolution
src/tools/rbd_mirror/image_deleter/TrashRemoveRequest.cc: trivial resolution

5 years ago rbd-mirror: set image as primary when moving to trash
Jason Dillaman [Wed, 11 Sep 2019 18:50:24 +0000 (14:50 -0400)]
rbd-mirror: set image as primary when moving to trash

This will allow the image to be restored and re-mirrored if
desired.

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit 73d4577d1b9a9bff087b555c9de5005d1120a0ea)

Conflicts:
src/test/rbd_mirror/image_deleter/test_mock_TrashMoveRequest.cc: trivial resolution

5 years ago librbd: reuse async trash remove state machine
Jason Dillaman [Thu, 10 Oct 2019 00:04:32 +0000 (20:04 -0400)]
librbd: reuse async trash remove state machine

Signed-off-by: Jason Dillaman <dillaman@redhat.com>

5 years ago librbd: async trash remove state machine
Mykola Golub [Mon, 15 Apr 2019 10:32:15 +0000 (11:32 +0100)]
librbd: async trash remove state machine

Signed-off-by: Mykola Golub <mgolub@suse.com>
(cherry picked from commit 8455d6611c48ae8a721d307d157b64d8a7041abe)

Conflicts:
src/librbd/trash/RemoveRequest.cc/h: removed set state calls
src/test/librbd/trash/test_mock_RemoveRequest.cc: removed set state test cases

5 years ago Merge pull request #29609 from jtlayton/wip-40162-mimic
Yuri Weinstein [Wed, 9 Oct 2019 19:12:11 +0000 (12:12 -0700)]
Merge pull request #29609 from jtlayton/wip-40162-mimic

mimic: cephfs: client: fix bad error handling in _lookup_parent

Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Yan, Zheng <zyan@redhat.com>

5 years ago Merge pull request #29751 from pdvian/wip-41088-mimic
Yuri Weinstein [Wed, 9 Oct 2019 19:11:38 +0000 (12:11 -0700)]
Merge pull request #29751 from pdvian/wip-41088-mimic

mimic: qa: sleep briefly after resetting kclient

Reviewed-by: Venky Shankar <vshankar@redhat.com>

5 years ago Merge pull request #29812 from pdvian/wip-41094-mimic
Yuri Weinstein [Wed, 9 Oct 2019 19:11:01 +0000 (12:11 -0700)]
Merge pull request #29812 from pdvian/wip-41094-mimic

mimic: qa: ignore expected MDS_CLIENT_LATE_RELEASE warning

Reviewed-by: Venky Shankar <vshankar@redhat.com>

5 years ago Merge pull request #29833 from pdvian/wip-41097-mimic
Yuri Weinstein [Wed, 9 Oct 2019 19:10:40 +0000 (12:10 -0700)]
Merge pull request #29833 from pdvian/wip-41097-mimic

mimic: cephfs: avoid map been inserted by mistake

Reviewed-by: Venky Shankar <vshankar@redhat.com>

5 years ago Merge pull request #29915 from pdvian/wip-41100-mimic
Yuri Weinstein [Wed, 9 Oct 2019 19:10:18 +0000 (12:10 -0700)]
Merge pull request #29915 from pdvian/wip-41100-mimic

mimic: cephfs: fix a memory leak

Reviewed-by: Venky Shankar <vshankar@redhat.com>

5 years ago Merge pull request #29940 from pdvian/wip-41108-mimic
Yuri Weinstein [Wed, 9 Oct 2019 19:09:57 +0000 (12:09 -0700)]
Merge pull request #29940 from pdvian/wip-41108-mimic

mimic: mds: delay exporting directory whose pin value exceeds max rank id

Reviewed-by: Venky Shankar <vshankar@redhat.com>

5 years ago Merge pull request #30108 from smithfarm/wip-40442-mimic
Yuri Weinstein [Wed, 9 Oct 2019 19:09:32 +0000 (12:09 -0700)]
Merge pull request #30108 from smithfarm/wip-40442-mimic

mimic: cephfs: client: set snapdir's link count to 1

Reviewed-by: Venky Shankar <vshankar@redhat.com>

5 years ago Merge pull request #30228 from smithfarm/wip-40841-mimic
Yuri Weinstein [Wed, 9 Oct 2019 19:08:47 +0000 (12:08 -0700)]
Merge pull request #30228 from smithfarm/wip-40841-mimic

mimic: cephfs: client: support the fallocate() when fuse version >= 2.9

Reviewed-by: Venky Shankar <vshankar@redhat.com>

5 years ago ceph-volume: update tests since VolumeGroups.filter returns a list 30808/head
Rishabh Dave [Thu, 3 Oct 2019 12:14:37 +0000 (17:44 +0530)]
ceph-volume: update tests since VolumeGroups.filter returns a list

VolumeGroups.filter returns a VolumeGroups object that contains the VGs
matching the filter. Update the tests to hold the list returned by the
Volumes.filter() call.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit da58d239a0e067cd3ea5fd26aac24a2551b871c1)

5 years ago ceph-volume: VolumeGroups.filter shouldn't purge itself
Rishabh Dave [Thu, 3 Oct 2019 12:09:37 +0000 (17:39 +0530)]
ceph-volume: VolumeGroups.filter shouldn't purge itself

VolumeGroups.filter removes VGs that do not match the filter from its
list. Instead of doing that, return a new list that contains the VGs that
match the filter, so that the VolumeGroups object held by the calling
code is not modified.
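
A minimal sketch of the non-mutating behaviour described above. The
classes below are simplified stand-ins written for illustration, not the
actual ceph-volume implementation; the attribute names (vg_name, vg_tags)
and the filter arguments are assumptions.

```python
class VolumeGroup:
    """Hypothetical stand-in for a single VG; only the fields used below."""

    def __init__(self, vg_name, vg_tags=None):
        self.vg_name = vg_name
        self.vg_tags = vg_tags or {}


class VolumeGroups(list):
    """Illustrative list-like container, not the real ceph-volume class."""

    def filter(self, vg_name=None, vg_tags=None):
        # Collect matches in a new container; self is left untouched.
        matches = VolumeGroups()
        for vg in self:
            if vg_name is not None and vg.vg_name != vg_name:
                continue
            if vg_tags and any(vg.vg_tags.get(k) != v
                               for k, v in vg_tags.items()):
                continue
            matches.append(vg)
        return matches


vgs = VolumeGroups([VolumeGroup("ceph-a"), VolumeGroup("ceph-b")])
only_a = vgs.filter(vg_name="ceph-a")
```

Because filter() builds a fresh container, the caller can keep using vgs
afterwards without having to copy it first.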

Fixes: https://tracker.ceph.com/issues/42171
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 028757813282f764ebcce05572f9e4b76ea4e552)

5 years agoceph-volume: allow creating empty VolumeGroup objects
Rishabh Dave [Thu, 3 Oct 2019 12:10:37 +0000 (17:40 +0530)]
ceph-volume: allow creating empty VolumeGroup objects

Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 2dc4f7de96e51c8117b719640ec4c09388a1412e)

5 years agoceph-volume: update tests since PVolumes.filter returns a list 30806/head
Rishabh Dave [Thu, 3 Oct 2019 11:30:38 +0000 (17:00 +0530)]
ceph-volume: update tests since PVolumes.filter returns a list

...returns a list of filtered PVs instead of removing the items that do
not match the filters from itself.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 38ccfe089a86f86b6e48b9a60286f26bb2444596)

5 years agoceph-volume: PVolumes.filter shouldn't purge itself
Rishabh Dave [Thu, 3 Oct 2019 11:18:46 +0000 (16:48 +0530)]
ceph-volume: PVolumes.filter shouldn't purge itself

PVolumes.filter removes the PVs that do not match the filters from its
list. This approach is problematic since the code calling this method
has to create a copy beforehand. Therefore, it's better to return a new
object that contains the PVs that match the filters.

Fixes: https://tracker.ceph.com/issues/42170
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit fcec33ee50457d43add844aef3b81bbf9dd2ad58)

5 years agoceph-volume: allow creating empty PVolumes objects
Rishabh Dave [Thu, 3 Oct 2019 11:14:35 +0000 (16:44 +0530)]
ceph-volume: allow creating empty PVolumes objects

Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 384aaee8fd2b0de7120e58efc3ebbd2a520d709f)

5 years agoMerge pull request #30678 from sobelek/wip-42048-mimic
Jan Fajerski [Wed, 9 Oct 2019 11:13:56 +0000 (13:13 +0200)]
Merge pull request #30678 from sobelek/wip-42048-mimic

mimic: ceph-volume: fix warnings raised by pytest

5 years agoqa/rgw: update default port in perl workunits 30069/head
Casey Bodley [Tue, 18 Jun 2019 16:44:19 +0000 (12:44 -0400)]
qa/rgw: update default port in perl workunits

Signed-off-by: Casey Bodley <cbodley@redhat.com>
(cherry picked from commit 1643879218638915073d5310b859a94d10ffeac6)

5 years agoqa/rgw: extra s3tests tasks use rgw endpoint configuration
Casey Bodley [Tue, 18 Jun 2019 13:07:33 +0000 (09:07 -0400)]
qa/rgw: extra s3tests tasks use rgw endpoint configuration

Signed-off-by: Casey Bodley <cbodley@redhat.com>
(cherry picked from commit 09e992ff01b4ce286540e1230a30df67103f5968)

5 years agoqa/rgw: add dnsmasq back to s3a-hadoop
Casey Bodley [Tue, 14 May 2019 12:30:59 +0000 (08:30 -0400)]
qa/rgw: add dnsmasq back to s3a-hadoop

Signed-off-by: Casey Bodley <cbodley@redhat.com>
(cherry picked from commit 8c8a706b6f4cb3b2a5b89aa1eb06b7e47909852d)

5 years agoqa/rgw: remove ceph-ansible from s3a-hadoop suite
Casey Bodley [Fri, 10 May 2019 18:40:17 +0000 (14:40 -0400)]
qa/rgw: remove ceph-ansible from s3a-hadoop suite

Fixes: http://tracker.ceph.com/issues/39706
Signed-off-by: Casey Bodley <cbodley@redhat.com>
(cherry picked from commit 0fc2c8ecee2b6233292b9fd1325347fd0fdf9171)

Conflicts:
qa/tasks/s3a_hadoop.py
- trivial context difference

5 years agoqa/rgw: use default ports (80 or 443) unless overridden
Casey Bodley [Tue, 5 Mar 2019 15:50:23 +0000 (10:50 -0500)]
qa/rgw: use default ports (80 or 443) unless overridden

Signed-off-by: Casey Bodley <cbodley@redhat.com>
(cherry picked from commit 3ff5f0c2406e40d1024d8152e0ac9400302a757a)

5 years agoqa/rgw: rgw task can override --rgw-dns-name on the command line
Casey Bodley [Tue, 20 Feb 2018 17:28:24 +0000 (12:28 -0500)]
qa/rgw: rgw task can override --rgw-dns-name on the command line

The value for rgw_dns_name isn't known until a machine is assigned, so
it can't be set via 'override: conf:'. Add a per-client config option
to the rgw task so it can add the endpoint's hostname and/or s3website
hostname on the radosgw command line.
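
A rough sketch of how such a per-client option could be turned into
command-line arguments once the hostname is known. The function name, the
config keys ("dns-name", "dns-s3website-name") and the convention that an
empty value means "use the assigned hostname" are assumptions made for
this example, not necessarily what the rgw task does; the second flag is
assumed to follow the usual option-name-to-flag mapping.

```python
def rgw_dns_args(client_config, endpoint_hostname):
    """Return extra radosgw arguments for one client.

    client_config: per-client dict from the task config (hypothetical keys).
    endpoint_hostname: hostname of the machine assigned to this client,
    which is only known at run time.
    """
    args = []
    dns_name = client_config.get("dns-name")
    if dns_name is not None:
        # Assumption: an empty value means "use the assigned hostname".
        args += ["--rgw-dns-name", dns_name or endpoint_hostname]
    website_name = client_config.get("dns-s3website-name")
    if website_name is not None:
        args += ["--rgw-dns-s3website-name",
                 website_name or endpoint_hostname]
    return args
```

Resolving the hostname at this point, rather than in a static conf
override, is what lets the value depend on the machine that was assigned.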

Signed-off-by: Casey Bodley <cbodley@redhat.com>
(cherry picked from commit 658e5932fb79e2d33b73363b6ce76ff299809e16)

5 years agoqa/rgw: allow rgw client config to override port
Casey Bodley [Tue, 20 Feb 2018 16:23:00 +0000 (11:23 -0500)]
qa/rgw: allow rgw client config to override port

Signed-off-by: Casey Bodley <cbodley@redhat.com>
(cherry picked from commit 921faebb723c11686bf790ca424c952a786f358a)

5 years agolibrbd: do not unblock IO prior to growing object map during resize 30796/head
Jason Dillaman [Wed, 29 May 2019 13:37:34 +0000 (09:37 -0400)]
librbd: do not unblock IO prior to growing object map during resize

This could result in a small race condition where IO is able to write
beyond the current extent of the object map, resulting in an assertion
failure.
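
The ordering constraint can be illustrated with a toy model. This is a
single-threaded sketch with made-up names (ToyImage, grow, write), not
librbd code; it only shows that the object map is extended before IO is
unblocked, so a write that gets through always finds a large enough map.

```python
class ToyImage:
    """Toy model of an image with an object map; not librbd code."""

    def __init__(self, num_objects):
        self.object_map = [False] * num_objects
        self.io_blocked = False

    def write(self, object_no):
        if self.io_blocked:
            raise RuntimeError("IO is blocked during resize")
        # Mirrors the assertion mentioned above: a write must never land
        # beyond the current extent of the object map.
        assert object_no < len(self.object_map), "write beyond object map"
        self.object_map[object_no] = True

    def grow(self, new_num_objects):
        self.io_blocked = True
        # Grow the object map first ...
        extra = new_num_objects - len(self.object_map)
        self.object_map.extend([False] * extra)
        # ... and only then let IO through again.
        self.io_blocked = False
```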

Fixes: http://tracker.ceph.com/issues/39952
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit c8ce520870ef46ac00dfea8acfbff46f8b869913)

5 years agotest/librbd: fix 'Uninteresting mock function call' warning
Mykola Golub [Thu, 7 Feb 2019 15:35:20 +0000 (15:35 +0000)]
test/librbd: fix 'Uninteresting mock function call' warning

Signed-off-by: Mykola Golub <mgolub@suse.com>
(cherry picked from commit 5a4526cc9f7a2434f1bb196d6031abda0c8db221)

5 years agolibrbd: properly handle potential object map failures
Jason Dillaman [Tue, 18 Sep 2018 18:37:12 +0000 (14:37 -0400)]
librbd: properly handle potential object map failures

Remove the "ceph_assert" statements and instead bubble any potential
error code up to the caller. The object map state machines should
attempt to return 0 upon failure unless they were unable to flag the
object map as invalid.
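
The return-code rule above could look roughly like this. The function and
parameter names are hypothetical, and returning the invalidation error
(rather than the original one) is an assumption of this sketch.

```python
import errno


def finish_object_map_update(update_result, invalidate_object_map):
    """Return the code to hand back to the caller of a state machine.

    update_result: 0 on success, negative errno on failure.
    invalidate_object_map: hypothetical callable that flags the object map
    as invalid and returns 0 on success or a negative errno on failure.
    """
    if update_result == 0:
        return 0
    # The update failed: instead of asserting, flag the map as invalid.
    r = invalidate_object_map()
    if r < 0:
        # Could not even flag the map as invalid; bubble the error up.
        return r
    # The map is flagged invalid, so the failure is reported as 0.
    return 0
```

For example, finish_object_map_update(-errno.EIO, lambda: 0) yields 0,
while a failing invalidation passes its own error code to the caller.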

Fixes: http://tracker.ceph.com/issues/36074
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit 765f8ce2536b315a046d0aceff234e9e3c66271f)

Conflicts:
src/librbd/DeepCopyRequest.cc: trivial resolution
src/librbd/deep_copy/ObjectCopyRequest.cc: trivial resolution
src/librbd/deep_copy/SnapshotCopyRequest.cc: trivial resolution
src/librbd/exclusive_lock/PostAcquireRequest.cc: trivial resolution
src/librbd/exclusive_lock/PreReleaseRequest.cc: trivial resolution
src/librbd/image/RefreshRequest.cc: trivial resolution
src/librbd/io/CopyupRequest.cc: trivial resolution
src/librbd/io/ObjectRequest.cc: trivial resolution
src/librbd/object_map/InvalidateRequest.cc: trivial resolution
src/librbd/object_map/RefreshRequest.cc: trivial resolution
src/librbd/object_map/SnapshotRemoveRequest.cc: trivial resolution
src/librbd/operation/ResizeRequest.cc: trivial resolution
src/librbd/operation/SnapshotCreateRequest.cc: trivial resolution
src/librbd/operation/SnapshotRollbackRequest.cc: trivial resolution