git-server-git.apps.pok.os.sepia.ceph.com Git - ceph.git/log

Kefu Chai [Tue, 28 Mar 2017 18:05:07 +0000 (02:05 +0800)]

rbd: use min<uint64_t>() explicitly

on arm32, size_t is a 32-bit unsigned int, which cannot be compared with
uint64_t using std::min() because both arguments must have the same type.

Fixes: http://tracker.ceph.com/issues/18938
Signed-off-by: Kefu Chai <kchai@redhat.com>
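
A minimal sketch of the type problem described above, not the actual rbd change. `std::min(a, b)` deduces a single type from both arguments, so passing a `size_t` and a `uint64_t` fails to compile wherever the two are distinct types. Naming the type explicitly converts both arguments first. The helper name `clamp_length` is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: clamp a buffer length to the bytes remaining in an image.
// std::min(buf_len, remaining) would not compile on targets where size_t and
// uint64_t are different types, because the template type cannot be deduced.
inline uint64_t clamp_length(size_t buf_len, uint64_t remaining) {
  // Both arguments are converted to uint64_t before the comparison.
  return std::min<uint64_t>(buf_len, remaining);
}
```

On 64-bit Linux targets the two types usually coincide, which is why a mismatch like this tends to surface only on 32-bit builds.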

Kefu Chai [Sat, 25 Mar 2017 04:13:17 +0000 (12:13 +0800)]

Merge pull request #14114 from dmick/wip-boost-j

debian/rules, ceph.spec.in: invoke cmake with -DBOOST_J

Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>

Sage Weil [Fri, 24 Mar 2017 21:41:37 +0000 (16:41 -0500)]

Merge pull request #13889 from liewegas/wip-denc-nullptr

include/denc: remove nullptr runtime magic boundedness check

Reviewed-by: Kefu Chai <kchai@redhat.com>

Sage Weil [Fri, 24 Mar 2017 21:41:18 +0000 (16:41 -0500)]

Merge pull request #14096 from baiyanchun/remove_useless_parameter

common: remove useless parameter

Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: Pan Liu <liupan1111@gmail.com>

Sage Weil [Fri, 24 Mar 2017 20:28:27 +0000 (15:28 -0500)]

Merge pull request #14131 from liewegas/wip-crush-encode

crush: only encode class info if SERVER_LUMINOUS

Reviewed-by: Loic Dachary <ldachary@redhat.com>

Sage Weil [Fri, 24 Mar 2017 18:17:39 +0000 (13:17 -0500)]

Merge pull request #13960 from wangzhengyong/kstore

os/kstore: some error handling

Reviewed-by: Sage Weil <sage@redhat.com>

Sage Weil [Fri, 24 Mar 2017 18:16:58 +0000 (13:16 -0500)]

Merge pull request #13973 from shinobu-x/wp-sk-primarylogpg-null-nullptr

osd/PrimaryLogPG: nullptr not NULL

Reviewed-by: Sage Weil <sage@redhat.com>

Sage Weil [Fri, 24 Mar 2017 18:13:39 +0000 (13:13 -0500)]

Merge pull request #13995 from liuhongtong/wip-config

common/config: set rocksdb_cache_size to OPT_U64

Reviewed-by: Sage Weil <sage@redhat.com>

Sage Weil [Fri, 24 Mar 2017 18:12:16 +0000 (13:12 -0500)]

Merge pull request #14013 from ShiqiCooperation/newshiqi

test/unittest_bluefs: check whether add_block_device success

Reviewed-by: Sage Weil <sage@redhat.com>

Sage Weil [Fri, 24 Mar 2017 17:59:34 +0000 (13:59 -0400)]

crush: only encode class info if SERVER_LUMINOUS

This fixes OSDMap reencode crc mismatches on jewel to
luminous upgrades.

Fixes: http://tracker.ceph.com/issues/19361
Signed-off-by: Sage Weil <sage@redhat.com>
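
The idea of gating newer fields on a peer feature bit can be sketched as follows. This is an illustration only, not the CRUSH encoder: the `FEATURE_SERVER_LUMINOUS` value, the `Map` struct, and the byte layout are all assumptions. The intent, as far as the commit message indicates, is that an older peer receives only the fields it knows about, so a CRC computed over the re-encoded map still matches.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical feature bit; the real constant lives in Ceph's feature headers.
constexpr uint64_t FEATURE_SERVER_LUMINOUS = 1ULL << 0;

// Hypothetical map with one legacy field and one newer field.
struct Map {
  uint32_t epoch = 0;
  std::vector<uint32_t> class_ids;  // newer field, unknown to old peers
};

// Append a 32-bit value in little-endian byte order.
inline void put_u32(std::vector<uint8_t>& out, uint32_t v) {
  for (int i = 0; i < 4; ++i) out.push_back(static_cast<uint8_t>(v >> (8 * i)));
}

// Encode for a peer advertising `features`; class info is emitted only when
// the peer declares that it understands it.
inline std::vector<uint8_t> encode(const Map& m, uint64_t features) {
  std::vector<uint8_t> out;
  put_u32(out, m.epoch);
  if (features & FEATURE_SERVER_LUMINOUS) {
    put_u32(out, static_cast<uint32_t>(m.class_ids.size()));
    for (uint32_t id : m.class_ids) put_u32(out, id);
  }
  return out;
}
```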

Dan Mick [Fri, 24 Mar 2017 02:35:08 +0000 (19:35 -0700)]

ceph.spec.in: derive _smp_ncpus and use it for -DBOOST_J

Signed-off-by: Dan Mick <dan.mick@redhat.com>

Dan Mick [Fri, 24 Mar 2017 02:34:28 +0000 (19:34 -0700)]

ceph.spec.in: move lowmem_build setting of _smp_mflags

Signed-off-by: Dan Mick <dan.mick@redhat.com>

Dan Mick [Thu, 23 Mar 2017 23:36:53 +0000 (16:36 -0700)]

debian/rules: invoke cmake with -DBOOST_J

Allow the boost build run during the toplevel cmake of a Debian package
build to benefit from multiple processors. This should speed up the build
considerably on many-processor machines (say, arm64). Use the argument
passed to debhelper.

Signed-off-by: Dan Mick <dan.mick@redhat.com>

Casey Bodley [Fri, 24 Mar 2017 15:15:05 +0000 (11:15 -0400)]

Merge pull request #14082 from idealguo/update-bucket-acl

rgw: enable to update acl of bucket created in slave zonegroup

Reviewed-by: Casey Bodley <cbodley@redhat.com>

Casey Bodley [Fri, 24 Mar 2017 15:11:50 +0000 (11:11 -0400)]

Merge pull request #14043 from zhangsw/fix-rgw-deletebucket

rgw: delete non-empty buckets in slave zonegroup works not well

Reviewed-by: Casey Bodley <cbodley@redhat.com>

Casey Bodley [Fri, 24 Mar 2017 15:10:28 +0000 (11:10 -0400)]

Merge pull request #13991 from Liuchang0812/wip-rgw-optimization

rgw: avoid listing user buckets for rgw_delete_user

Reviewed-by: Orit Wasserman <owasserm@redhat.com>
Reviewed-by: Casey Bodley <cbodley@redhat.com>

Casey Bodley [Fri, 24 Mar 2017 15:08:18 +0000 (11:08 -0400)]

Merge pull request #13504 from rzarzynski/wip-rgw-chunkingfilter-cleanup

rgw: clean up the unneeded rgw::io::ChunkingFilter::has_content_length.

Reviewed-by: Casey Bodley <cbodley@redhat.com>

Kefu Chai [Fri, 24 Mar 2017 14:44:15 +0000 (22:44 +0800)]

Merge pull request #13847 from wjwithagen/wip-wjw-ceph-disk-tests-2

ceph-disk/tests/test_main.py: FreeBSD does not do multipath

Reviewed-by: Kefu Chai <kchai@redhat.com>

Kefu Chai [Fri, 24 Mar 2017 13:44:56 +0000 (21:44 +0800)]

Merge pull request #13974 from tchaikov/wip-vstart-start-mgr

vstart: do not start mgr if not start_all

Reviewed-by: Sage Weil <sage@redhat.com>

Kefu Chai [Fri, 24 Mar 2017 07:53:17 +0000 (15:53 +0800)]

Merge pull request #13197 from asheplyakov/master-18740

systemd/ceph-disk: make it possible to customize timeout

Reviewed-by: Loic Dachary <ldachary@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>

Kefu Chai [Fri, 24 Mar 2017 06:43:48 +0000 (14:43 +0800)]

Merge pull request #14103 from tchaikov/wip-https-github

script: ceph-release-notes: use https instead of http

Reviewed-by: Abhishek Lekshmanan <abhishek@suse.com>

Sage Weil [Fri, 24 Mar 2017 01:47:45 +0000 (20:47 -0500)]

Merge pull request #14085 from wjwithagen/wip-wjw-bluestore-fixture

test/objectstore/store_test_fixture.cc: Exclude bluestore code if required.

Reviewed-by: Kefu Chai <kchai@redhat.com>

Sage Weil [Fri, 24 Mar 2017 01:47:12 +0000 (20:47 -0500)]

Merge pull request #13931 from wangzhengyong/extent

os/bluestore: fix bug for calc extent_avg in reshard function

Reviewed-by: xie xingguo <xie.xingguo@zte.com.cn>
Reviewed-by: Igor Fedotov <ifedotov@mirantis.com>

Sage Weil [Fri, 24 Mar 2017 01:44:59 +0000 (20:44 -0500)]

Merge pull request #14073 from liewegas/wip-bluestore-nullptr

os/bluestore: avoid nullptr in bluestore_extent_ref_map_t::bound_encode

Reviewed-by: Kefu Chai <kchai@redhat.com>

Sage Weil [Fri, 24 Mar 2017 01:44:35 +0000 (20:44 -0500)]

Merge pull request #13577 from yonghengdexin735/wip-zzz-openalloc

os/bluestore: fix bug in _open_alloc()

Reviewed-by: Varada Kari <varada.kari@sandisk.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>

Loic Dachary [Thu, 23 Mar 2017 20:48:00 +0000 (21:48 +0100)]

Merge pull request #14110 from dachary/wip-crush-cleanup

crush: builder: clean the arguments of crush_reweight* methods

Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Loic Dachary <ldachary@redhat.com>

Kefu Chai [Wed, 22 Mar 2017 05:04:06 +0000 (13:04 +0800)]

vstart.sh: do not init fsmap if "$new == 0"

we can no longer create a new cephfs on a non-empty pool without the
'--force' option, so the "ceph fs new" command fails with "vstart.sh -k".

Signed-off-by: Kefu Chai <kchai@redhat.com>

Kefu Chai [Wed, 22 Mar 2017 15:33:30 +0000 (23:33 +0800)]

tests: remove mds,osd,mon args passed to vstart.sh

Signed-off-by: Kefu Chai <kchai@redhat.com>

Sahid Orentino Ferdjaoui [Mon, 13 Mar 2017 16:36:16 +0000 (12:36 -0400)]

crush: builder: clean the arguments of crush_reweight* methods

This commit is just a cleanup that makes the arguments of the
crush_reweight* methods consistent with each other.

Signed-off-by: Sahid Orentino Ferdjaoui <sahid.ferdjaoui@redhat.com>

Kefu Chai [Wed, 22 Mar 2017 03:48:40 +0000 (11:48 +0800)]

vstart.sh: remove start_*

so there are only two ways to override the number of daemons to start:
- using the env var CEPH_NUM_{MON|OSD|MGR|MDS} or {MON|OSD|MGR|MDS}
- command line options: --{mon,osd,mds}_num

To prevent a daemon from running, set the corresponding env var to 0.

Signed-off-by: Kefu Chai <kchai@redhat.com>

Yuri Weinstein [Thu, 23 Mar 2017 15:47:55 +0000 (08:47 -0700)]

Merge pull request #14050 from ovh/bp-dump-ops-by-duration

common/TrackedOp: allow dumping historic ops sorted by duration

Reviewed-by: Sage Weil <sage@redhat.com>

Yuri Weinstein [Thu, 23 Mar 2017 15:46:36 +0000 (08:46 -0700)]

Merge pull request #14060 from LiumxNL/wip-170321

osd: combine unstable stats with info.stats when publish stats to osd

Reviewed-by: Sage Weil <sage@redhat.com>

Yuri Weinstein [Thu, 23 Mar 2017 15:45:58 +0000 (08:45 -0700)]

Merge pull request #13293 from Liuchang0812/cleanup-coverity

test, osd: fix some coverity issues

Reviewed-by: Kefu Chai <kchai@redhat.com>

Casey Bodley [Thu, 23 Mar 2017 13:54:47 +0000 (09:54 -0400)]

Merge pull request #14014 from Liuchang0812/wip-fix-seg-fault

rgw: fix memory leak in RGWGetObjLayout

Reviewed-by: Jos Collin <jcollin@redhat.com>
Reviewed-by: Casey Bodley <cbodley@redhat.com>

Sage Weil [Thu, 23 Mar 2017 13:21:39 +0000 (08:21 -0500)]

os/bluestore: avoid nullptr in bluestore_extent_ref_map_t::bound_encode

Signed-off-by: Sage Weil <sage@redhat.com>

Haomai Wang [Thu, 23 Mar 2017 11:23:34 +0000 (19:23 +0800)]

Merge pull request #14094 from optimistyzy/322

bluestore, NVMeDevice: use task' own lock for (random) read

Reviewed-by: Haomai Wang <haomai@xsky.com>

Kefu Chai [Thu, 23 Mar 2017 11:13:41 +0000 (19:13 +0800)]

script: ceph-release-notes: use https instead of http

Signed-off-by: Kefu Chai <kchai@redhat.com>

Kefu Chai [Thu, 23 Mar 2017 08:09:34 +0000 (16:09 +0800)]

Merge pull request #14004 from liewegas/wip-osd-full-failsafe

osd: fall back to failsafe threshold if osdmap doesn't set [near]full

Reviewed-by: David Zafman <dzafman@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>

Kefu Chai [Thu, 23 Mar 2017 08:08:22 +0000 (16:08 +0800)]

Merge pull request #13903 from wjwithagen/wip-wjw-run-classes-sed

test: sed on FreeBSD requires "-i extension", so use gsed

Reviewed-by: Kefu Chai <kchai@redhat.com>

Kefu Chai [Thu, 23 Mar 2017 08:04:52 +0000 (16:04 +0800)]

Merge pull request #9940 from aclamk/common-recursive-mutex-fix

common: fix lockdep vs recursive mutexes

Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>

baiyanchun [Thu, 23 Mar 2017 02:38:15 +0000 (10:38 +0800)]

common: remove useless parameter

Signed-off-by: baiyanchun <yanchun.bai@istuary.com>

Ziye Yang [Wed, 22 Mar 2017 03:41:00 +0000 (11:41 +0800)]

bluestore, NVMeDevice: use task' own lock for (random) read

The reason is that ioc may be reaped in the _aio_thread function
by the following statements:

    for (auto &&it : registered_devices)
      it->reap_ioc();

So if we still use ioc's lock for (random) read, it will cause a
core dump.

Signed-off-by: optimistyzy <optimistyzy@gmail.com>

Guo Zhandong [Wed, 22 Mar 2017 10:00:37 +0000 (18:00 +0800)]

rgw: enable to update acl of bucket created in slave zonegroup

Fixes: http://tracker.ceph.com/issues/16888
Signed-off-by: Guo Zhandong <guozhandong@cmss.chinamobile.com>

Loic Dachary [Wed, 22 Mar 2017 18:43:37 +0000 (19:43 +0100)]

Merge pull request #14080 from ceph/evelu-ceph-disk

ceph-disk: Reporting /sys directory in get_partition_dev()

Reviewed-by: Loic Dachary <ldachary@redhat.com>

Kefu Chai [Wed, 22 Mar 2017 15:57:13 +0000 (23:57 +0800)]

Merge pull request #13942 from xiexingguo/wip-cleanup-proc-repinfo

osd/PG: conditionally retry on receiving pg-notify when Primary is Incomplete

Reviewed-by: Sage Weil <sage@redhat.com>

Kefu Chai [Wed, 22 Mar 2017 15:56:27 +0000 (23:56 +0800)]

Merge pull request #14061 from tchaikov/wip-19312

tests: ceph_test_rados_api_watch_notify: test timeout using rados_wat…

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>

Casey Bodley [Wed, 22 Mar 2017 15:46:33 +0000 (11:46 -0400)]

Merge pull request #12449 from cbodley/wip-rgw-test-multi-vers-acl

test/rgw: add bucket acl and versioning tests to test_multi.py

Reviewed-by: Orit Wasserman <owasserm@redhat.com>

Kefu Chai [Wed, 22 Mar 2017 14:43:41 +0000 (22:43 +0800)]

Merge pull request #14059 from vumrao/wip-vumrao-19318

common/config_opts.h: Remove deprecated osd_compact_leveldb_on_mount option

Reviewed-by: Jos Collin <jcollin@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>

Mark Nelson [Wed, 22 Mar 2017 14:12:41 +0000 (09:12 -0500)]

Merge pull request #14076 from liewegas/wip-bluestore-min-alloc-size

os/bluestore: default 16KB min_alloc_size on ssd

Willem Jan Withagen [Wed, 22 Mar 2017 14:03:32 +0000 (15:03 +0100)]

test/objectstore/store_test_fixture.cc: Exclude bluestore code if required.

Signed-off-by: Willem Jan Withagen <wjw@digiware.nl>

Haomai Wang [Wed, 22 Mar 2017 13:16:48 +0000 (21:16 +0800)]

Merge pull request #14068 from optimistyzy/321_new

Bluestore, NVMEDevice: add the spdk core mask check

Reviewed-by: Haomai Wang <haomai@xsky.com>

Piotr Dałek [Mon, 20 Mar 2017 12:51:25 +0000 (13:51 +0100)]

TrackedOp: allow dumping historic ops sorted by duration

Currently dump_historic_ops dumps ops sorted by their initiation time,
which may have no relation to how long each op took, and sorting the output
of that command by op duration is neither fast nor convenient.
The new asok command ("dump_historic_ops_by_duration") outputs the same
op list, but ordered by duration (longest first).

Signed-off-by: Piotr Dałek <piotr.dalek@corp.ovh.com>
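
The ordering described above can be sketched like this. It is a simplified illustration, not the TrackedOp code: `OpRecord` and `by_duration` are hypothetical names, and the real history tracker holds considerably more state per op.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for a tracked op.
struct OpRecord {
  std::string description;
  double initiated_at;  // seconds since some reference point
  double duration;      // seconds the op took
};

// Return a copy of the history ordered longest-first. The argument is taken
// by value, so the caller's initiation-time ordering is left untouched.
inline std::vector<OpRecord> by_duration(std::vector<OpRecord> ops) {
  std::stable_sort(ops.begin(), ops.end(),
                   [](const OpRecord& a, const OpRecord& b) {
                     return a.duration > b.duration;
                   });
  return ops;
}
```

A stable sort is used here so that ops with equal durations keep their initiation order; that choice is this sketch's, not something the commit message states.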

optimistyzy [Tue, 21 Mar 2017 11:00:15 +0000 (19:00 +0800)]

Bluestore, NVMEDevice: add the spdk core mask check

This patch adds the spdk core mask check and also
sets the master core for starting DPDK.

Signed-off-by: optimistyzy <optimistyzy@gmail.com>

liuchang0812 [Wed, 22 Mar 2017 09:27:20 +0000 (17:27 +0800)]

rgw/rgw_op: fix memory leak in RGWGetObjLayout

Signed-off-by: liuchang0812 <liuchang0812@gmail.com>

Erwan Velu [Wed, 22 Mar 2017 09:11:44 +0000 (10:11 +0100)]

ceph-disk: Reporting /sys directory in get_partition_dev()

When get_partition_dev() fails, it reports the following message:

    ceph_disk.main.Error: Error: partition 2 for /dev/sdb does not appear to exist

The code searches for a directory inside /sys/block/get_dev_name(os.path.realpath(dev)).

The issue is that the error message does not report that path on failure, even though the path may be part of the problem.

This patch reports where the code was looking when it tried to determine whether the partition was available.

Signed-off-by: Erwan Velu <erwan@redhat.com>

Kefu Chai [Wed, 22 Mar 2017 03:34:21 +0000 (11:34 +0800)]

vstart.sh: do nothing if $CEPH_NUM_* is 0

Signed-off-by: Kefu Chai <kchai@redhat.com>

Kefu Chai [Wed, 15 Mar 2017 07:28:09 +0000 (15:28 +0800)]

vstart.sh: extract start_{osd,mon,mgr,mds} into functions

Signed-off-by: Kefu Chai <kchai@redhat.com>

Sage Weil [Wed, 22 Mar 2017 02:27:23 +0000 (21:27 -0500)]

os/bluestore: default 16KB min_alloc_size on ssd

Signed-off-by: Sage Weil <sage@redhat.com>

Orit Wasserman [Tue, 21 Mar 2017 21:44:22 +0000 (23:44 +0200)]

Merge pull request #13963 from cbodley/wip-18725

rgw-admin: remove deprecated regionmap commands

Reviewed-by: Orit Wasserman <owasserm@redhat.com>

Sage Weil [Tue, 21 Mar 2017 20:05:56 +0000 (15:05 -0500)]

Merge pull request #13888 from liewegas/wip-bluestore-dw

os/bluestore: fix deferred writes; improve flush

Reviewed-by: Igor Fedotov <ifedotov@mirantis.com>

Casey Bodley [Tue, 21 Mar 2017 19:43:48 +0000 (15:43 -0400)]

Merge pull request #13902 from Wilhelmshaven/rm_redundant_code

rgw: remove redundant codes in rgw_cache.h

Reviewed-by: Casey Bodley <cbodley@redhat.com>

Sage Weil [Sat, 18 Mar 2017 17:51:08 +0000 (13:51 -0400)]

os/bluestore: handle zombie OpSequencers

It's possible for the Sequencer to go away while the OpSequencer still has
txcs in flight. We were handling the case where the osr was on the
deferred_queue, but it may be off the deferred_queue but waiting for the
commit to happen, and we still need to wait for that.

Fix this by introducing a 'zombie' state for the osr, in which we keep the
osr in the osr_set.

Clean up the OpSequencer methods and a few other method names.

Signed-off-by: Sage Weil <sage@redhat.com>

Sage Weil [Fri, 17 Mar 2017 21:52:56 +0000 (17:52 -0400)]

os/bluestore: clean up flush_all()

Add assertions if we fail to flush everything.

Signed-off-by: Sage Weil <sage@redhat.com>

Sage Weil [Fri, 17 Mar 2017 14:13:22 +0000 (10:13 -0400)]

os/bluestore: move cached items around on collection split

We've been avoiding doing this for a while and it has finally caught up
with us: the SharedBlob may outlive the split due to deferred IO, and
a read on the child collection may load a competing Blob and SharedBlob
and read from the on-disk blocks that haven't been written yet.

Fix by preserving the one-SharedBlob-instance invariant by moving cache
items to the new Collection and cache shard like we should have from the
beginning.

Signed-off-by: Sage Weil <sage@redhat.com>

Sage Weil [Fri, 17 Mar 2017 17:54:20 +0000 (13:54 -0400)]

os/bluestore: simplify flush() wake-up condition

Clearer, and fewer wakeups.

Signed-off-by: Sage Weil <sage@redhat.com>

Sage Weil [Fri, 17 Mar 2017 14:12:02 +0000 (10:12 -0400)]

ceph_test_objectstore: set bluestore cache shards to 5

Better test coverage!

Signed-off-by: Sage Weil <sage@redhat.com>

Sage Weil [Thu, 16 Mar 2017 20:33:53 +0000 (16:33 -0400)]

unittest_bluestore_types: fix Collection using tests

We can't use a bare Collection since we get/put refs, the last put will
delete it, and the dtor asserts nref == 0 (no faking a ref and deliberately
leaking!).

Signed-off-by: Sage Weil <sage@redhat.com>

Sage Weil [Thu, 16 Mar 2017 16:24:51 +0000 (12:24 -0400)]

os/bluestore/KernelDevice: drop unused flush_lock

Signed-off-by: Sage Weil <sage@redhat.com>

Sage Weil [Thu, 16 Mar 2017 16:19:30 +0000 (12:19 -0400)]

os/bluestore: better debugging around collections

Signed-off-by: Sage Weil <sage@redhat.com>

Sage Weil [Thu, 16 Mar 2017 15:30:59 +0000 (11:30 -0400)]

os/bluestore: nicer Onode dout prefix

Signed-off-by: Sage Weil <sage@redhat.com>

Sage Weil [Thu, 16 Mar 2017 15:30:37 +0000 (11:30 -0400)]

os/bluestore: flush_cache on umount, fsck finish, etc.

Otherwise cache items survive beyond umount into the next mount cycle!

Also, ensure that we flush_cache *before* clearing coll_map, as some cache
items have references back to the Collection.

Signed-off-by: Sage Weil <sage@redhat.com>

Sage Weil [Wed, 15 Mar 2017 19:01:52 +0000 (15:01 -0400)]

os/bluestore: take Collection ref from SharedBlob

These can survive as long as the txc, which can be longer than the
Collection. Make sure we have a valid ref as both finish_write and
~SharedBlob use coll for the SharedBlobSet (and coll->store->cct for
debug).

Signed-off-by: Sage Weil <sage@redhat.com>

Sage Weil [Tue, 14 Mar 2017 20:47:48 +0000 (16:47 -0400)]

os/bluestore: fix perfcounters for deferred io

Signed-off-by: Sage Weil <sage@redhat.com>

Sage Weil [Tue, 14 Mar 2017 20:47:40 +0000 (16:47 -0400)]

os/bluestore: remove dead _do_deferred_op code

Signed-off-by: Sage Weil <sage@redhat.com>

Sage Weil [Tue, 14 Mar 2017 18:17:20 +0000 (14:17 -0400)]

os/bluestore: make throttles tunable online

Signed-off-by: Sage Weil <sage@redhat.com>

Sage Weil [Mon, 13 Mar 2017 11:43:57 +0000 (07:43 -0400)]

os/bluestore: prevent throttle deadlock due to deferred writes

Kick off deferred IOs if we pass the throttle midpoint or if we would
block during submission.

Signed-off-by: Sage Weil <sage@redhat.com>
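
The kick-off condition described above can be written as a small predicate. This is a sketch under assumptions, not the BlueStore throttle: `ThrottleState` and `should_kick_deferred` are hypothetical names, and the throttle is modelled as a plain counter. The likely reasoning is that queued deferred writes hold throttle budget until they are submitted, so a submitter waiting on that budget could otherwise wait forever.

```cpp
#include <cstdint>

// Hypothetical throttle state: `current` units in use out of `max`.
struct ThrottleState {
  uint64_t current;
  uint64_t max;
};

// Decide whether queued deferred IO should be submitted now instead of being
// left to accumulate: either usage has passed the midpoint, or admitting
// `request` more units would exceed the limit and block the submitter.
inline bool should_kick_deferred(const ThrottleState& t, uint64_t request) {
  const bool past_midpoint = t.current > t.max / 2;
  const bool would_block = t.current + request > t.max;
  return past_midpoint || would_block;
}
```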

Sage Weil [Fri, 10 Mar 2017 15:27:52 +0000 (10:27 -0500)]

ceph_test_objectstore: fix Synthetic to never modify bufferlists

We were modifying bufferlists in place, and kludging around it by making
full copies elsewhere. Instead, never modify a buffer.

This fixes issues where the buffer we submit to ObjectStore ends up in
the cache and we modify in place later, corrupting the implementation's
copy. (This was affecting BlueStore.)

Rearrange the data methods to be next to each other and clean them up a
bit too.

Signed-off-by: Sage Weil <sage@redhat.com>

Sage Weil [Fri, 10 Mar 2017 15:20:22 +0000 (10:20 -0500)]

os/bluestore: drop obsolete comment

Signed-off-by: Sage Weil <sage@redhat.com>

Sage Weil [Thu, 9 Mar 2017 22:28:58 +0000 (17:28 -0500)]

os/bluestore: avoid extra dev flush on single device when all io is deferred

If we have no non-deferred IO to flush, and we are running bluefs on a
single shared device, then we can rely on the bluefs flush to make our
current batch of deferred ios stable.

Separate deferred into a "done" and "stable" list. If we do sync, put
everything from "done" onto "stable". Otherwise, after we do our kv
commit via bluefs, move "done" to "stable" then.

Signed-off-by: Sage Weil <sage@redhat.com>
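
The "done" and "stable" split can be sketched as two lists with a promotion step. This is an illustration only: `DeferredBatch` and `DeferredTracker` are hypothetical, and the `in_commit` snapshot list is an addition of this sketch (not mentioned in the commit message) so that batches finishing while a commit is in progress are not promoted early.

```cpp
#include <list>

// Hypothetical stand-in for a completed batch of deferred writes.
struct DeferredBatch {
  int id;
};

struct DeferredTracker {
  std::list<DeferredBatch> done;       // IO finished, durability not yet known
  std::list<DeferredBatch> in_commit;  // sketch-only: snapshot covered by the pending commit
  std::list<DeferredBatch> stable;     // durable; deferred records can be cleaned up

  // Called before the kv commit. If an explicit device flush was issued,
  // everything in `done` is already stable. Otherwise remember which batches
  // the upcoming commit's flush is assumed to cover.
  void begin_commit(bool did_device_flush) {
    if (did_device_flush) {
      stable.splice(stable.end(), done);
    } else {
      in_commit.splice(in_commit.end(), done);
    }
  }

  // Called after the kv commit: on a single shared device, the flush done
  // for the commit is assumed to have made the snapshot durable as well.
  void end_commit() { stable.splice(stable.end(), in_commit); }
};
```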

Sage Weil [Tue, 14 Mar 2017 14:33:23 +0000 (10:33 -0400)]

os/bluestore: debug alloc release

Signed-off-by: Sage Weil <sage@redhat.com>

Sage Weil [Tue, 14 Mar 2017 14:33:17 +0000 (10:33 -0400)]

os/bluestore: flush old/discarded OpSequencers too

When the Sequencer goes away it gets deregistered. If there are still
deferred IOs in flight, we need to wait for those too.

Signed-off-by: Sage Weil <sage@redhat.com>

Sage Weil [Thu, 9 Mar 2017 19:17:47 +0000 (14:17 -0500)]

os/bluestore: batch up to bluestore_deferred_batch_ops before submitting

Allow several deferred writes to accumulate before we submit them. In
general we have no time pressure, and on HDD (and perhaps sometimes SSD)
it is beneficial to accumulate and batch these so that they result in
fewer seeks. On HDD, this is particularly true of seeks away from the
journal. And on sequential workloads this can avoid seeks. It may even
allow the block layer or SSD firmware to merge IOs and perform fewer
writes.

Signed-off-by: Sage Weil <sage@redhat.com>

Sage Weil [Mon, 13 Mar 2017 11:32:12 +0000 (07:32 -0400)]

os/bluestore: only discard deallocated regions of a blob if !shared

If a blob is shared, we can't discard deallocated regions: there may
be deferred buffers in flight and we might get a read via the clone.

Signed-off-by: Sage Weil <sage@redhat.com>

Sage Weil [Thu, 9 Mar 2017 16:53:06 +0000 (11:53 -0500)]

os/bluestore: avoid waking up kv thread on deferred write completion

In a simple HDD workload with queue depth of 1, we halve our throughput
because the kv thread does a full commit twice per IO: once for the
initial commit, and then again to clean up the deferred write record. The
second wakeup is unnecessary; we can clean it up on the next commit.

We do need to do this wakeup in a few cases, though, when draining the
OpSequencers: (1) on replay during startup, and (2) on shutdown in
_osr_drain_all().

Send everything through _osr_drain_all() for simplicity.

This doubles HDD qd=1 IOPS from ~50 to ~100 on my 7200 rpm test device
(rados bench 30 write -b 4096 -t 1).

Signed-off-by: Sage Weil <sage@redhat.com>

Sage Weil [Thu, 9 Mar 2017 15:34:50 +0000 (10:34 -0500)]

os/bluestore: move many initializations into header

This is less fragile, especially with 2 constructors.

Signed-off-by: Sage Weil <sage@redhat.com>

Sage Weil [Thu, 9 Mar 2017 02:53:22 +0000 (21:53 -0500)]

os/bluestore: restructure deferred write queue

First, eliminate the work queue--it's useless.  We are dispatching aio and
should not block.  And if a single thread isn't sufficient to do it, it
probably means we should be parallelizing kv_sync_thread too (which is our
only caller that matters).

Repurpose the old osr-list -> txc-list-per-osr queue structure to manage
the queuing.  For any given osr, dispatch one batch of aios at a time,
taking care to collapse any overwrites so that the latest write wins.

Signed-off-by: Sage Weil <sage@redhat.com>
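
Collapsing overwrites so that the latest write wins can be sketched as below. This is a deliberately simplified model, not the BlueStore queue: `DeferredWrite` and `collapse` are hypothetical names, and the sketch assumes that overlapping writes start at the same offset and have the same length. Partial overlaps would need interval handling that is not shown here.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical deferred write: `data` destined for device offset `offset`.
struct DeferredWrite {
  uint64_t offset;
  std::string data;
};

// Collapse a queue of writes (oldest first) into one batch keyed by offset.
// A later write to the same offset replaces the earlier one, so only the
// newest data per offset is dispatched.
inline std::map<uint64_t, std::string>
collapse(const std::vector<DeferredWrite>& queue) {
  std::map<uint64_t, std::string> batch;
  for (const auto& w : queue) {
    batch[w.offset] = w.data;  // latest write wins
  }
  return batch;
}
```

Keying the batch by offset also yields the writes in ascending device order, which fits the stated goal of fewer seeks, although the commit message does not say the real queue is ordered this way.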

Sage Weil [Fri, 10 Mar 2017 03:56:28 +0000 (22:56 -0500)]

os/bluestore: fix OpSequencer/Sequencer lifecycle

Make osr_set refcounts so that it can tolerate a Sequencer destruction
racing with flush or a Sequencer that outlives the BlueStore instance
itself.

Signed-off-by: Sage Weil <sage@redhat.com>

Sage Weil [Wed, 8 Mar 2017 20:01:35 +0000 (15:01 -0500)]

os/bluestore: move _osr_reap_done

Signed-off-by: Sage Weil <sage@redhat.com>

Sage Weil [Wed, 8 Mar 2017 20:01:28 +0000 (15:01 -0500)]

os/bluestore: reimplement/rename _sync -> _flush_all

The old implementation is racy and doesn't actually work. Instead, rely
on a list of all OpSequencers and drain them all.

Signed-off-by: Sage Weil <sage@redhat.com>

Sage Weil [Tue, 14 Mar 2017 02:49:41 +0000 (22:49 -0400)]

os/bluestore: keep all OpSequencers registered

Maintain the set of all live OpSequencers.

Signed-off-by: Sage Weil <sage@redhat.com>

Sage Weil [Sat, 11 Mar 2017 19:30:53 +0000 (14:30 -0500)]

os/bluestore: keep onode refs for lifetime of obc

This ensures that we don't trim an onode from the cache while it has a
txc that is still in flight. Which in turn ensures that if we try to read
the object, we will have any writing buffers available.

Signed-off-by: Sage Weil <sage@redhat.com>

Sage Weil [Sat, 11 Mar 2017 19:21:47 +0000 (14:21 -0500)]

os/bluestore: make OnodeSpace onode_map private

Signed-off-by: Sage Weil <sage@redhat.com>

Sage Weil [Thu, 9 Mar 2017 23:05:48 +0000 (18:05 -0500)]

os/bluestore: make Sequencer::flush() more efficient

BlueStore collection methods only need preceding transactions to be
applied to the kv db; they do not need to be committed.

Note that this is *only* needed for collection listings; all other read
operations are immediately safe after queue_transactions().

Signed-off-by: Sage Weil <sage@redhat.com>

Sage Weil [Tue, 14 Mar 2017 02:49:37 +0000 (22:49 -0400)]

os/bluestore: add OpSequencer::drain()

Currently this is the same as flush, but more precisely it is an internal
method that means all txc's must complete. Update _wal_apply() to use it
instead of flush(), which is part of the public Sequencer interface.

Signed-off-by: Sage Weil <sage@redhat.com>

Sage Weil [Wed, 8 Mar 2017 20:45:31 +0000 (15:45 -0500)]

os/bluestore: revert throttle perfcounters

This reverts 3e40595f3cd8626cdceffa4a3a4efb088127f726

The individual throttles have their own set of perfcounters; no need to
duplicate them here.

Signed-off-by: Sage Weil <sage@redhat.com>

Sage Weil [Wed, 8 Mar 2017 19:57:52 +0000 (14:57 -0500)]

os/bluestore: release deferred throttle on io finish, before cleanup

The throttle is really about limiting deferred IO; we do not need to
actually remove the deferred record from the kv db before queueing more.
(In fact, the txc that queues more will do the cleanup.)

Signed-off-by: Sage Weil <sage@redhat.com>

Sage Weil [Wed, 8 Mar 2017 19:51:39 +0000 (14:51 -0500)]

os/bluestore: separate _txc_finish_kv into _txc_{applied,committed}_kv

We can unblock flush()ing threads as soon as we have applied to the kv db,
while the callbacks must wait until we have committed.

Move methods around a bit to better match the execution order.

Signed-off-by: Sage Weil <sage@redhat.com>

Sage Weil [Wed, 8 Mar 2017 19:48:12 +0000 (14:48 -0500)]

os/bluestore: make flush() only wait for kv commit

The only remaining flush() users only need to see previous txc's applied
to the kv db (e.g., _omap_clear needs to see the records to delete them).

Signed-off-by: Sage Weil <sage@redhat.com>
# Conflicts:
# src/os/bluestore/BlueStore.h

Sage Weil [Wed, 8 Mar 2017 19:45:27 +0000 (14:45 -0500)]

os/bluestore: no need to Onode::flush() on truncate

We do not release extents until after any deferred IO, so this flush() is
unnecessary.

Signed-off-by: Sage Weil <sage@redhat.com>
# Conflicts:
# src/os/bluestore/BlueStore.cc

Sage Weil [Mon, 6 Mar 2017 18:51:30 +0000 (13:51 -0500)]

os/bluestore: no need to Onode::flush() in _do_read

We now ensure that deferred writes are in cache until the txc retires,
so there is no need to wait here.

Signed-off-by: Sage Weil <sage@redhat.com>
