Haomai Wang [Wed, 29 Jun 2016 09:14:16 +0000 (17:14 +0800)]
msg/async: make sure worker started before let msgr ready
When we create the event thread, it needs a little time to enter the event
loop (for example, to call set_owner). If the caller calls create_file_event
before the event thread has entered the event loop, it triggers an assert.
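
The ordering problem above can be sketched as follows. This is a minimal
illustration in Python, not the Ceph C++ code: the Worker class, its fields,
and the simplified assert are hypothetical stand-ins. The point is only that
start() blocks until the thread has recorded itself as owner.

```python
import threading


class Worker:
    """Hypothetical stand-in for an event-loop worker thread."""

    def __init__(self):
        self.owner = None                    # set by the worker thread itself
        self._started = threading.Event()    # signalled once the loop is entered
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        # Analogous to set_owner(): record which thread owns the loop.
        self.owner = threading.get_ident()
        self._started.set()
        # Stand-in for the event loop: sleep until asked to stop.
        self._stop.wait()

    def start(self):
        self._thread.start()
        # Do not report "ready" until the worker has entered its loop.
        self._started.wait()

    def create_file_event(self):
        # Simplified version of the assert: the owner must already be set.
        assert self.owner is not None, "event loop not entered yet"
        return True

    def stop(self):
        self._stop.set()
        self._thread.join()
```

Without the wait in start(), a caller could reach create_file_event() while
owner is still None and hit the assert.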
Haomai Wang [Wed, 29 Jun 2016 08:54:16 +0000 (16:54 +0800)]
msg/async: make EventCenter notify file event creating when set_owner
EventCenter::init is called by another thread rather than the event thread,
so we need to move create_file_event into set_owner, which is called by the
event thread.
Haomai Wang [Wed, 29 Jun 2016 08:26:29 +0000 (16:26 +0800)]
msg/async/AsyncConnection: swap eventcenter when replacing
Previously we only exchanged the fd when replacing a connection. We will
introduce a dpdk plugin in the near future, and it needs every fd to be used
locally, unlike a kernel socket, which is shared by all cores.
So we need to add EventCenter swapping so that each socket stays associated
with its EventCenter.
Haomai Wang [Tue, 8 Mar 2016 05:59:50 +0000 (13:59 +0800)]
AsyncMessenger: make create/delete_file_event within event thread
We are making each AsyncConnection/AsyncMessenger modify its file events only
in the event thread, so make sure create/delete_file_event aren't called
directly.
Haomai Wang [Tue, 8 Mar 2016 07:51:02 +0000 (15:51 +0800)]
Event: no need to delete_file_Event when deconstruct
Since we are going to close all epoll and clean up resources, there is no
need to delete the notify fd resource. Another reason is that
"delete_file_event" doesn't expect to be called from other threads.
Haomai Wang [Thu, 31 Dec 2015 15:46:17 +0000 (23:46 +0800)]
net_handler: adjust set_socket_options to avoid read from conf
We don't want net_handler to rely on config values; the caller may want to
pass different values to set_socket_options depending on the socket type,
such as heartbeat, client socket, or server socket.
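
A minimal sketch of the idea, in Python rather than the C++ of net_handler:
the helper takes its values as arguments instead of reading them from a
global config. The parameter names (nodelay, rcvbuf_size) are assumptions
made for illustration and are not taken from the commit.

```python
import socket


def set_socket_options(sock, nodelay, rcvbuf_size):
    """Apply caller-supplied options; nothing is read from a global config."""
    if nodelay:
        # Disable Nagle's algorithm for latency-sensitive sockets.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    if rcvbuf_size:
        # Request a receive buffer size; the kernel may adjust the value.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf_size)
```

Each caller can then choose its own values, for example no-delay for a
heartbeat socket and a larger receive buffer for a server socket.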
Loic Dachary [Thu, 26 May 2016 07:38:47 +0000 (09:38 +0200)]
ceph-disk: partprobe should block udev induced BLKRRPART
Wrap partprobe with flock to stop udev from issuing BLKRRPART because
this is racy and frequently fails with a message like:
Error: Error informing the kernel about modifications to partition
/dev/vdc1 -- Device or resource busy. This means Linux won't know about
any changes you made to /dev/vdc1 until you reboot -- so you shouldn't
mount it or use it in any way before rebooting.
Opening a device (/dev/vdc for instance) in write mode indirectly
triggers a BLKRRPART ioctl from udev (version 214 and up)
when the device is closed (see below for the udev release note).
However, if udev fails to acquire an exclusive lock (with
flock(fd, LOCK_EX|LOCK_NB); ) the BLKRRPART ioctl is not issued.
Acquiring an exclusive lock before running the process that opens the
device in write mode is therefore an effective way to control this
behavior.
git clone git://anonscm.debian.org/pkg-systemd/systemd.git
systemd/NEWS:
CHANGES WITH 214:
* As an experimental feature, udev now tries to lock the
disk device node (flock(LOCK_SH|LOCK_NB)) while it
executes events for the disk or any of its partitions.
Applications like partitioning programs can lock the
disk device node (flock(LOCK_EX)) and claim temporary
device ownership that way; udev will entirely skip all event
handling for this disk and its partitions. If the disk
was opened for writing, the close will trigger a partition
table rescan in udev's "watch" facility, and if needed
synthesize "change" events for the disk and all its partitions.
This is now unconditionally enabled, and if it turns out to
cause major problems, we might turn it on only for specific
devices, or might need to disable it entirely. Device Mapper
devices are excluded from this logic.
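
The locking approach described in this commit can be sketched as below. This
is an illustration under stated assumptions, not the ceph-disk code: the
helper name run_with_device_lock is hypothetical, and it takes the exclusive
lock in-process with fcntl.flock instead of wrapping the command with the
flock(1) utility.

```python
import fcntl
import os
import subprocess


def run_with_device_lock(device_path, command):
    """Run command while holding an exclusive flock on device_path."""
    # Open read-only so that our own close does not count as a write close.
    fd = os.open(device_path, os.O_RDONLY)
    try:
        # Blocks until the exclusive lock is granted.
        fcntl.flock(fd, fcntl.LOCK_EX)
        # While we hold the lock, a non-blocking lock attempt made through
        # another open of the same node fails instead of succeeding.
        return subprocess.call(command)
    finally:
        # Closing the descriptor releases the lock.
        os.close(fd)
```

A call such as run_with_device_lock("/dev/vdc", ["partprobe", "/dev/vdc"])
would correspond to the case in the commit message; the path is the example
device named there.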
Kefu Chai [Fri, 17 Jun 2016 05:58:55 +0000 (13:58 +0800)]
msg/simple: set close on exec on server sockets
The mds calls execv() when handling the "respawn" command. To avoid leaking
fds and leaving an enormous number of CLOSE_WAIT connections after
respawning, we need to set the FD_CLOEXEC flag on the socket fds.
Kefu Chai [Thu, 16 Jun 2016 17:17:05 +0000 (01:17 +0800)]
msg/async: set close on exec on server sockets
The mds calls execv() when handling the "respawn" command. To avoid leaking
fds and leaving an enormous number of CLOSE_WAIT connections after
respawning, we need to set the FD_CLOEXEC flag on the socket fds.
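
Setting close-on-exec on a descriptor follows the usual F_GETFD/F_SETFD
pattern. The sketch below shows it in Python; the helper name
set_close_on_exec is hypothetical, and the messenger code itself is C++.

```python
import fcntl


def set_close_on_exec(fd):
    """Set FD_CLOEXEC on fd, preserving any other descriptor flags."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFD)
    fcntl.fcntl(fd, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)
```

With the flag set, the kernel closes the descriptor automatically across
execv(), so the respawned process does not inherit the old sockets.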
os/bluestore: add compression required ratio to enable/disable compression
Require the net gain of compression to be at least a specified ratio;
otherwise we don't compress.
By default, ask for compression to save at least 12.5%.
This is for the sake of performance: if the compression turns out to be
meaningless (saving little space), we can simply shut it down, since
compression/decompression can be rather CPU-consuming.
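
As a worked example of the threshold: saving at least 12.5% means the
compressed blob may be at most 87.5% of the raw size. The function below is
a hypothetical sketch of that check; its name, its parameter names, and the
treatment of the exact boundary case are assumptions, not BlueStore code.

```python
def should_store_compressed(raw_len, compressed_len, required_ratio=0.875):
    """Return True if the compressed size is small enough to be worth keeping.

    required_ratio is the largest acceptable compressed/raw size ratio;
    0.875 corresponds to saving at least 12.5%.
    """
    return compressed_len <= raw_len * required_ratio
```

For a 4096-byte blob the cutoff is 4096 * 0.875 = 3584 bytes, so a result of
3000 bytes would be kept compressed and a result of 4000 bytes would be
stored uncompressed.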
crush: reset bucket->h.items[i] when removing tree item
* crush: so we don't see the reference after the removal. This keeps
check_item_loc() happy; move_bucket() uses check_item_loc() to check that
the removed bucket is gone after the removal.
* test: also add unittest_crush_wrapper::CrushWrapper.insert_item
cmake: remove unnecessary linked libs from libcephfs
* some of the libs share the same .cc, which has static C++ variables. If
we link against different libs sharing the same static C++ variables,
and the dtors of those variables have side effects (among other things,
deallocating a memory chunk), we end up with a double free. So the
"osd" lib is removed.
* some of the libs are referenced by the linked lib, so there is no need
to link against them again. For example, BLKID_LIBRARIES is linked by
libcommon, so we can remove it from the linked libs list.
* the libs "os" and "cls_references_objs" are not used by libcephfs at
all, so remove them.
rgw: forward input data when forwarding set_bucket_version to master
Fixes: http://tracker.ceph.com/issues/16494
Needed to keep input data around to be forwarded correctly. Also, master
does not send any data back, so don't try to parse anything.
Sage Weil [Fri, 24 Jun 2016 13:43:52 +0000 (09:43 -0400)]
os/bluestore/BlueFS: make _sync_and_flush_log smarter
If we know what event we need to wait for, only wait long enough for it
to flush. This helps the situation where another thread flushed what we
needed, and more dirty stuff was added to log_t, but we don't need to
wait for that too for our caller to be happy.
Sage Weil [Fri, 24 Jun 2016 13:23:21 +0000 (09:23 -0400)]
os/bluestore: drop lock while we flush the log
Handle cases where we have multiple racing threads trying to flush the
log by allowing only one log flush to be in progress at a time, and
behave correctly if, after flushing, there are no more dirty records to
flush.
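
The pattern described here (drop the lock during I/O, allow a single flusher,
tolerate finding nothing left to flush) can be sketched as follows. This is
an illustrative Python model with hypothetical names, not the BlueFS
implementation.

```python
import threading


class Log:
    """Hypothetical log that allows only one flush in progress at a time."""

    def __init__(self):
        self._cond = threading.Condition()
        self._dirty = []        # records waiting to be flushed
        self._flushing = False  # True while some thread is writing
        self.flushed = []       # stand-in for the on-disk log

    def append(self, record):
        with self._cond:
            self._dirty.append(record)

    def flush(self):
        with self._cond:
            # Wait for any flush that is already in progress.
            while self._flushing:
                self._cond.wait()
            # Another thread may have flushed everything we cared about.
            if not self._dirty:
                return False
            batch, self._dirty = self._dirty, []
            self._flushing = True
        # The lock is dropped here, so append() is not blocked by the I/O.
        try:
            self.flushed.extend(batch)  # stand-in for the slow write
        finally:
            with self._cond:
                self._flushing = False
                self._cond.notify_all()
        return True
```

A thread that loses the race wakes up, sees an empty dirty list, and returns
without writing.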
Sage Weil [Thu, 23 Jun 2016 13:39:31 +0000 (09:39 -0400)]
os/bluestore/BlueFS: drop lock while waiting for user io to complete
_flush_wait is safe to call without a lock, as long as our reference is
stable. Rename it wait_for_aio() to be more clear about what it does and
the fact that it doesn't require a lock.
Sage Weil [Thu, 23 Jun 2016 13:33:09 +0000 (09:33 -0400)]
os/bluestore/BlueFS: track dirty by log_seq, log_seq_stable
Note when we dirty a file, and clean it only if that seq has been
committed. Currently this is always the case because we don't drop the
lock, but that will change shortly.
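
The bookkeeping can be modelled roughly as below. This is a simplified,
hypothetical sketch in Python: the DirtyTracker class and its method names
are invented for illustration, and only log_seq and log_seq_stable come from
the commit subject.

```python
class DirtyTracker:
    """Hypothetical model of tracking dirty files by log sequence number."""

    def __init__(self):
        self.log_seq = 0          # seq of the most recently started flush
        self.log_seq_stable = 0   # highest seq known to be committed
        self.dirty_seq = {}       # file name -> seq that will persist it

    def mark_dirty(self, name):
        # The change will be carried by the next log flush to start.
        self.dirty_seq[name] = self.log_seq + 1

    def start_flush(self):
        self.log_seq += 1
        return self.log_seq

    def finish_flush(self, seq):
        self.log_seq_stable = max(self.log_seq_stable, seq)
        # Clean only files whose dirtying seq has been committed.
        for name, dirtied_at in list(self.dirty_seq.items()):
            if dirtied_at <= self.log_seq_stable:
                del self.dirty_seq[name]
```

A file dirtied while a flush is in flight gets a later seq, so it stays
dirty when that flush completes.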