Kefu Chai [Fri, 19 Aug 2016 06:50:38 +0000 (14:50 +0800)]
os/filestore/FileJournal: bail out if transaction is too large
if a transaction is too large to fit in the free space of the
FileJournal's ring buffer, we will wait. but if its size is larger than
max_size, it's likely due to a bug or an invalid setting. in that case,
we'd better fail early.
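A minimal sketch of the distinction drawn above, not the actual FileJournal code. The names `JournalCheck` and `check_transaction_size` and the parameters are hypothetical, introduced only for illustration:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical outcome of checking a transaction against the journal.
enum class JournalCheck {
  Fits,     // enough free space right now
  Wait,     // fits in the ring buffer, but only after space is freed
  TooLarge  // larger than the journal itself, so waiting cannot help
};

// Illustrative only: names and signature are assumptions, not Ceph's API.
JournalCheck check_transaction_size(uint64_t txn_size,
                                    uint64_t free_space,
                                    uint64_t max_size) {
  assert(free_space <= max_size);   // free space cannot exceed the journal
  if (txn_size > max_size)
    return JournalCheck::TooLarge;  // fail early: bug or invalid setting
  if (txn_size > free_space)
    return JournalCheck::Wait;      // wait for the ring buffer to drain
  return JournalCheck::Fits;
}
```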
Kefu Chai [Fri, 19 Aug 2016 02:33:11 +0000 (10:33 +0800)]
debian: exclude python3* packages in dh_shlibdeps
since we are not using subvar of ${shlibs:Depends} in python3-* packages,
just exclude them in dh_shlibdeps.
this silences warnings like
```
warning: dpkg-gencontrol: package python3-cephfs: unused substitution
variable ${shlibs:Depends}
```
Kefu Chai [Fri, 19 Aug 2016 02:31:40 +0000 (10:31 +0800)]
debian: enable dh_python3 for python3 packages
so we can use subvars like ${python3:Depends} in debian/control.
this silences the warnings like:
```
warning: dpkg-gencontrol: Depends field of package python3-cephfs:
unknown substitution variable ${python3:Depends}
```
Ilsoo Byun [Thu, 18 Aug 2016 18:29:38 +0000 (14:29 -0400)]
os/bluestore: add multiple finishers to bluestore
- The single finisher of a bluestore can be a bottleneck
when using an SSD as a backend device. If too much load
is given to the single finisher, client-side IO latency
increases. So we add multiple finishers to the
bluestore, which shows better performance.
- 'bluestore_shard_finishers' option is added to
configure whether multiple finishers are used or
not.
- a finisher is selected according to the shard id of a
sequencer.
- the number of finishers is decided by
osd_op_num_shards.
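The selection rule in the list above could look roughly like this. It is a sketch under assumptions: `pick_finisher` and its parameters are hypothetical names, and the modulo mapping is one plausible way to select by shard id, not necessarily what bluestore does:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: map a sequencer's shard id to a finisher index.
// num_shards stands in for osd_op_num_shards, shard_finishers for the
// bluestore_shard_finishers option. With sharding disabled, every
// sequencer shares finisher 0.
std::size_t pick_finisher(std::size_t shard_id,
                          std::size_t num_shards,
                          bool shard_finishers) {
  assert(num_shards > 0);
  const std::size_t num_finishers = shard_finishers ? num_shards : 1;
  return shard_id % num_finishers;
}
```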
xie xingguo [Wed, 17 Aug 2016 03:22:38 +0000 (11:22 +0800)]
mon/PGMonitor: skip scrub checking if we can
The PG scrub check can become expensive on a large cluster.
Since the scrub-warn related options default to off,
we should skip this check whenever possible, which
avoids that cost.
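A sketch of the early return, assuming two hypothetical thresholds where 0 means "off". The struct and function names are made up for illustration and are not the PGMonitor code:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scrub-warn settings: 0 disables the corresponding warning.
struct ScrubWarnConfig {
  uint64_t warn_not_scrubbed = 0;
  uint64_t warn_not_deep_scrubbed = 0;
};

// Hypothetical per-PG timestamps, in the same unit as the thresholds.
struct PgScrubStamps {
  uint64_t last_scrub;
  uint64_t last_deep_scrub;
};

std::size_t count_overdue_pgs(const std::vector<PgScrubStamps>& pgs,
                              const ScrubWarnConfig& cfg,
                              uint64_t now) {
  // Both warnings are off: skip the per-PG walk entirely.
  if (cfg.warn_not_scrubbed == 0 && cfg.warn_not_deep_scrubbed == 0)
    return 0;

  std::size_t overdue = 0;
  for (const auto& pg : pgs) {
    const bool late = cfg.warn_not_scrubbed != 0 &&
                      now > pg.last_scrub &&
                      now - pg.last_scrub > cfg.warn_not_scrubbed;
    const bool late_deep = cfg.warn_not_deep_scrubbed != 0 &&
                           now > pg.last_deep_scrub &&
                           now - pg.last_deep_scrub > cfg.warn_not_deep_scrubbed;
    if (late || late_deep)
      ++overdue;
  }
  return overdue;
}
```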
Kefu Chai [Wed, 17 Aug 2016 16:32:03 +0000 (00:32 +0800)]
cmake: recompile erasure src for different variants
* instead of reusing the object libraries, we should recompile jerasure
code for different plugin flavors like neon, sse3, sse4.
* do not version the plugin .so files, as they are not supposed to be
used by users directly.
haodong [Tue, 2 Aug 2016 18:19:03 +0000 (02:19 +0800)]
kv: delete store after pg destructor is called in OSD shutdown.
When using memdb as the bluestore kv backend, we hit a segfault when we
use the 'kill' command to shut down the osd process. Destructing a pg
releases some references to bluestore, but bluestore has already been
deleted by that time.
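The ordering this fix relies on can be sketched as follows. `Store`, `PG` and `shutdown_order` are hypothetical stand-ins that only record destruction order; they are not the OSD classes:

```cpp
#include <memory>
#include <string>
#include <vector>

// Stand-in for the object store; logs when it is deleted.
struct Store {
  explicit Store(std::vector<std::string>& l) : log(l) {}
  ~Store() { log.push_back("store deleted"); }
  std::vector<std::string>& log;
};

// Stand-in for a PG that holds a reference to the store.
struct PG {
  PG(Store& s, std::vector<std::string>& l) : store(s), log(l) {}
  ~PG() { log.push_back("pg destroyed"); }
  Store& store;  // must still be valid while the PG is being destroyed
  std::vector<std::string>& log;
};

std::vector<std::string> shutdown_order() {
  std::vector<std::string> log;
  auto store = std::make_unique<Store>(log);
  auto pg = std::make_unique<PG>(*store, log);
  pg.reset();     // drop the PG, and its references to the store, first
  store.reset();  // only then delete the store
  return log;
}
```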
Tim Serong [Wed, 17 Aug 2016 11:46:40 +0000 (21:46 +1000)]
cmake: Add -pie to CMAKE_EXE_LINKER_FLAGS
Without this, rpmlint (on openSUSE Tumbleweed) fails with:
ceph-radosgw.x86_64: E: non-position-independent-executable
(Badness: 10000) /usr/bin/radosgw
This executable must be position independent. Check that it
is built with -fPIE/-fpie in compiler flags and -pie in linker
flags.
Tim Serong [Wed, 17 Aug 2016 11:14:46 +0000 (21:14 +1000)]
cmake: Fix mismatched librgw VERSION / SOVERSION
Without this, rpmlint (on openSUSE Tumbleweed) fails with:
librgw2.x86_64: E: shlib-policy-name-error (Badness: 10000) librgw1
Your package contains a single shared library but is not named
after its SONAME.
It seems that the VERSION/SOVERSION mismatch results in the
creation of librgw.so.1 and librgw.so.2.0.0, whereas it should
be librgw.so.2 and librgw.so.2.0.0.
Kefu Chai [Wed, 17 Aug 2016 07:15:57 +0000 (15:15 +0800)]
os/filestore: silence compiling warnings
silence warnings like
```
src/os/filestore/FileStore.h:55:27: comparison between signed and
unsigned integer expressions [-Wsign-compare]
^
/srv/autobuild-ceph/gitbuilder.git/build/out~/ceph-11.0.0-1566-ga98ddf7/src/os/filestore/BtrfsFileStoreBackend.cc:269:29:
note: in expansion of macro ‘BTRFS_SUPER_MAGIC’
if (currentfs.f_type == BTRFS_SUPER_MAGIC && basest.st_dev != st.st_dev)
{
^
```
Kefu Chai [Wed, 17 Aug 2016 06:31:16 +0000 (14:31 +0800)]
cls_rgw: fix the compiler warning
fixes the warning of
```
warning:
/srv/autobuild-ceph/gitbuilder.git/build/out~/ceph-11.0.0-1566-ga98ddf7/src/objclass/objclass.h:35:72:
format ‘%lu’ expects argument of type ‘long unsigned int’, but argument
6 has type ‘std::basic_string::size_type {aka unsigned int}’ [-Wformat=]
cls_log(level, " %s:%d: " fmt, __FILE__, __LINE__, ##__VA_ARGS__)
^
/srv/autobuild-ceph/gitbuilder.git/build/out~/ceph-11.0.0-1566-ga98ddf7/src/cls/rgw/cls_rgw.cc:477:7:
note: in expansion of macro ‘CLS_LOG’
CLS_LOG(20, "start_key=%s len=%lu", start_key.c_str(),
start_key.size());
^
```
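One portable way to print a `size_type` without this warning is the `%zu` conversion. This is a sketch only: `format_key_len` is a hypothetical helper, and the log does not say which fix the commit actually applied:

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper mirroring the format string from the warning.
// %zu matches size_t on both 32-bit and 64-bit builds, unlike %lu.
std::string format_key_len(const std::string& start_key) {
  char buf[128];  // long keys are truncated by snprintf, fine for a demo
  std::snprintf(buf, sizeof(buf), "start_key=%s len=%zu",
                start_key.c_str(), start_key.size());
  return std::string(buf);
}
```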
Haomai Wang [Tue, 12 Jul 2016 17:26:04 +0000 (01:26 +0800)]
msg/async/AsyncConnection: fix _conn_prefix racing when stopped
When the connection is lossy and enters fault, it dispatches a reset event.
If the cleanup handler runs at the same time as ms_handle_reset calls
mark_down, there may be a race on "cs": the cleanup handler resets "cs"
while _conn_prefix in mark_down accesses "cs".
Haomai Wang [Tue, 12 Jul 2016 02:16:33 +0000 (10:16 +0800)]
msg/async/Stack: disable smart thread spawn now
The new async msgr runtime needs to spawn threads when binding, but ceph-osd
calls daemon() after binding the port, so we would need to respawn threads
after forking. Delaying thread spawn would add complexity to this change, and
smart spawn is a simple strategy that helps little, so we disable auto spawn
for now.
Haomai Wang [Sun, 10 Jul 2016 08:19:29 +0000 (16:19 +0800)]
msg/async/Event: remove event wakeup flag
Now only dispatching an external event wakes up the event thread (previously
delete_time_event would also call wakeup), so we only need
"external_num_events" to indicate whether we have extra events.
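A rough sketch of using a single counter in place of a separate wakeup flag. `EventQueue` and its methods are hypothetical; only the counter name echoes the message above, and a real event center would also wake the event thread on dispatch:

```cpp
#include <atomic>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

class EventQueue {
 public:
  // Queue an external event and bump the counter.
  void dispatch_external(std::function<void()> cb) {
    std::lock_guard<std::mutex> guard(lock_);
    events_.push_back(std::move(cb));
    external_num_events_.fetch_add(1);
  }

  // Cheap check that needs no lock: are there pending external events?
  bool has_external() const { return external_num_events_.load() > 0; }

  // Run all pending external events and return how many were run.
  std::size_t process_external() {
    if (!has_external())
      return 0;
    std::deque<std::function<void()>> pending;
    {
      std::lock_guard<std::mutex> guard(lock_);
      pending.swap(events_);
      external_num_events_.store(0);
    }
    for (auto& cb : pending)
      cb();
    return pending.size();
  }

 private:
  std::mutex lock_;
  std::deque<std::function<void()>> events_;
  std::atomic<std::size_t> external_num_events_{0};
};
```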
Haomai Wang [Mon, 4 Jul 2016 06:41:13 +0000 (14:41 +0800)]
msg/async/Stack: add abstract Stack
Stack is a network IO framework which encapsulates all the necessary basic
network interfaces and manages the threads that do the work.
Different network backends like posix, dpdk and even RDMA need to inherit the
Stack class and implement the necessary interfaces. This makes it easier for
other network backends to be integrated into ceph. Otherwise, each backend
would need to implement the whole Messenger logic, such as reconnect, policy
handling, session maintenance...
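The split described above can be sketched like this. Everything here is hypothetical and heavily simplified: the two-method `Stack` interface, the `FlakyStack` demo backend and `connect_with_retry` are illustrations, not the interface added by this commit:

```cpp
#include <string>

// Hypothetical minimal backend interface: each network backend only
// implements the basic operations.
class Stack {
 public:
  virtual ~Stack() = default;
  virtual std::string backend_name() const = 0;
  virtual bool connect(const std::string& addr) = 0;
};

// Demo backend that fails a fixed number of times before succeeding.
class FlakyStack : public Stack {
 public:
  explicit FlakyStack(int failures) : failures_left_(failures) {}
  std::string backend_name() const override { return "flaky-demo"; }
  bool connect(const std::string&) override {
    if (failures_left_ > 0) {
      --failures_left_;
      return false;
    }
    return true;
  }

 private:
  int failures_left_;
};

// Messenger-level logic written once against the interface, so a new
// backend does not have to reimplement it.
bool connect_with_retry(Stack& stack, const std::string& addr,
                        int max_attempts) {
  for (int i = 0; i < max_attempts; ++i) {
    if (stack.connect(addr))
      return true;
  }
  return false;
}
```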
A non-primary image's commit position won't accurately reflect
the current demotion/promotion chain. Therefore, directly specify
the predecessor for promotion events.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>