Sage Weil [Mon, 3 Aug 2015 18:41:28 +0000 (14:41 -0400)]
logrotate: fix log rotation with systemd
systemctl does not have a nice way to enumerate (active) units so we can
reload them. On centos7, the is-active wildcard syntax does not appear to
be supported. On fedora 22, it prints the state only but not which unit
the state belongs to.
David Disseldorp [Mon, 11 May 2015 23:45:34 +0000 (01:45 +0200)]
systemd: activate disks via systemd service instead of udev
The udev(7) man page states:
RUN
...
This can only be used for very short-running foreground tasks. Running
an event process for a long period of time may block all further
events for this or a dependent device.
Starting daemons or other long-running processes is not appropriate
for udev; the forked processes, detached or not, will be
unconditionally killed after the event handling has finished.
ceph-disk activate is far from a short-running task:
- check whether path is a block dev, for dirs call through to
activate_dir()
- call blkid to obtain the filesystem type for the block dev
- pull mount options from hard-coded ceph.conf file
- mount the OSD dev at a temporary path
- check the ceph magic for mounted filesystem
- read cluster uuid and locate corresponding /etc/ceph/{cluster}.conf
path
- read or generate (if missing) the OSD uuid
- create a file indicating init system usage (systemd)
- mount the device at a second (final) location
- umount (lazy) the temporary mount path
- enable the systemd ceph-osd@{osd_id} service
- start the systemd ceph-osd@{osd_id} service
This logic is therefore best left to a systemd service, which is less
limited in terms of execution time and also allows for improved event
handling in future (fsck, dmcrypt mapping etc.).
This change sees 95-ceph-osd.rules.systemd trigger ceph-disk activate or
ceph-disk activate-journal via new ceph-disk-activate-journal@.service,
ceph-disk-activate@.service and ceph-disk-dmcrypt-activate@.service
systemd service files.
ceph-disk-dmcrypt-activate@.service makes use of the newly added
--dmcrypt parameter for ceph-disk activate.
Owen Synge [Wed, 18 Mar 2015 09:17:05 +0000 (10:17 +0100)]
radosgw systemd support
Added radosgw systemd support and an associated prestart script.
- With improved checking over first revision.
- ceph-radosgw-prestart.sh now installed in /usr/lib/ceph-radosgw
Owen Synge [Wed, 3 Jun 2015 10:55:01 +0000 (12:55 +0200)]
Added tmpfiles.d for rgw: templated user and group.
tmpfiles.d is part of systemd and defines how temporary directories are set up.
rgw needs a socket directory. To do this we template tmpfiles.d user and group
for rgw and fill in the values using autotools.
Note1: Added to spec file.
Note2: Name changed from radosgw to rgw, the name preferred by Sage.
Note3: Adds configure options
--with-rgw-user=UserName
--with-rgw-group=GroupName
Note4: Defaults set for debian
Note5: spec file overrides defaults for redhat and suse
rgw: skip prefetch first chunk if range get falls to shadow objects
Currently the head object will be prefetched in each GET:
a) This is unnecessary if the Range GET falls to shadow objects.
b) The GET request would be quite slow if we have a big head object
This patch adds a check on the Range. If the range does not fall in the
head object (offset >= rgw_max_chunk_size), the prefetch is skipped.
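The range check described above can be sketched as follows. This is a minimal illustration, not the rgw code: the function name and the 4 MiB chunk size are hypothetical, and it assumes the head object holds exactly the first rgw_max_chunk_size bytes.

```python
def should_prefetch_head(range_start, max_chunk_size):
    """Return True when a GET starting at range_start touches the head object.

    Assumption: the head object holds bytes [0, max_chunk_size); anything at
    or beyond that offset lives in shadow objects.
    """
    return range_start < max_chunk_size


# Hypothetical 4 MiB chunk size, chosen only for illustration.
CHUNK = 4 * 1024 * 1024
print(should_prefetch_head(0, CHUNK))      # range starts in the head object
print(should_prefetch_head(CHUNK, CHUNK))  # range starts in a shadow object
```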
test_librbd_fsx: invalidate before discard in krbd mode
Commit bd050240ceef ("test_librbd_fsx: flush before discard in krbd
mode") added an fsync() before BLKDISCARD. Don't know what I was
thinking at the time, but I missed the invalidate part, for which we
need to use the BLKFLSBUF ioctl.
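A rough sketch of the flush, invalidate, discard ordering, written in Python rather than the C used by the test itself. The function name and the injectable `ioctl`/`fsync` parameters are hypothetical, and the request numbers assume the `_IO()` encoding used on x86 and arm; other architectures may encode them differently.

```python
import fcntl
import os
import struct

# ioctl request numbers from <linux/fs.h>: _IO(0x12, 97) and _IO(0x12, 119).
# Assumption: these values hold on x86 and arm; some architectures encode
# _IO() differently.
BLKFLSBUF = 0x1261
BLKDISCARD = 0x1277


def flush_invalidate_discard(fd, offset, length,
                             ioctl=fcntl.ioctl, fsync=os.fsync):
    """Write back dirty data, drop cached pages, then discard the range."""
    fsync(fd)                # flush dirty pages to the device
    ioctl(fd, BLKFLSBUF, 0)  # invalidate the block device's buffer cache
    # BLKDISCARD takes a uint64_t[2] holding {start, length} in bytes.
    ioctl(fd, BLKDISCARD, struct.pack("QQ", offset, length))
```

Calling it on a real block device needs root and destroys data in the given range, so the `ioctl` and `fsync` parameters exist only to let the ordering be exercised without a device.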
the first op id was 16 by default, which is okay, but a non-zero
magic number could lead to questions. max_op was mixed up with
max_ops, and changed to 16 in 51e402e3 by mistake.
Jason Dillaman [Tue, 28 Jul 2015 19:27:24 +0000 (15:27 -0400)]
rbd: remove dependency on non-ABI controlled CephContext
The rbd CLI tool no longer attempts to initialize a CephContext
and pass said context to librados since it's possible that the
structure will not be ABI compatible between rbd and librados.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Jason Dillaman [Tue, 28 Jul 2015 17:14:29 +0000 (13:14 -0400)]
crypto: use NSS_InitContext/NSS_ShutdownContext to avoid memory leak
Switched to context-aware NSS init/shutdown functions to avoid conflicts
with the parent application. Use a reference counter to properly shut down the
NSS crypto library when the last CephContext is destroyed. This avoids
memory leaks with the NSS library from users of librados.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
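The reference-counting idea in the crypto commit above can be sketched generically. This is an illustration only, not the Ceph implementation: the class name is hypothetical, and `init_fn`/`shutdown_fn` are stand-ins for the context-aware NSS calls.

```python
import threading


class SharedCryptoInit:
    """Reference-counted init/shutdown of a shared library (illustrative)."""

    def __init__(self, init_fn, shutdown_fn):
        self._init_fn = init_fn          # stand-in for the NSS init call
        self._shutdown_fn = shutdown_fn  # stand-in for the NSS shutdown call
        self._lock = threading.Lock()
        self._refs = 0
        self._handle = None

    def acquire(self):
        with self._lock:
            if self._refs == 0:
                self._handle = self._init_fn()  # first user initializes
            self._refs += 1

    def release(self):
        with self._lock:
            if self._refs == 0:
                raise RuntimeError("release() without matching acquire()")
            self._refs -= 1
            if self._refs == 0:
                self._shutdown_fn(self._handle)  # last user shuts down
                self._handle = None
```

Each context would call `acquire()` on construction and `release()` on destruction, so the library is torn down only when the last user goes away.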
Boris Ranto [Mon, 27 Jul 2015 11:14:32 +0000 (13:14 +0200)]
Annotate all the .s files
Recent update to erasure code library in 59aa6700 caused a regression
where .s files are no longer properly annotated and yasm sets the exec
stack for them. This patch brings back the annotations as was done before
by Dan Mick, see Bug #10114.
rbd: rename --object-extents option to --whole-object
--object-extents is a bit confusing - extent is generally something of
a varying length and here the meaning is "diff whole objects". Rename
it to --whole-object (the name of diff_iterate() parameter).
Change du to take <image-spec> | <snap-spec> as an argument instead of
going through --image option. The new synopsis is
(du | disk-usage) [<image-spec> | <snap-spec>]
This is to make it look more like the rest of the commands: the only
other command that takes pool as an argument is ls and it can't really
serve as a prototype for du, because the latter has to work on images
and snapshots as well.
Examples:
# stats for pool rbd
$ rbd du
$ rbd -p rbd du
# stats for pool foo
$ rbd -p foo du
# stats for snapshot mysnap of image baz in pool rbd
$ rbd du baz@mysnap
# stats for image bar in pool foo
$ rbd du foo/bar
No command uses it as of now, but only clone command fails; cp, mv and
import simply ignore it. Check if it's set and exit with a generic
error message.
rbd: import doesn't require image-spec arg, ditto for export and path
Mark those as such in help and clarify what image-spec defaults to.
Related, all command args in our man page are enclosed in brackets.
I suppose the reason is that they are optional in the sense that you
can have commands like
$ rbd clone --pool a --image b --snap c --dest-pool d --dest e
with no args. Given that we are trying to push people towards
$ rbd clone a/b@c d/e
undo that so that real optional arguments can be marked optional.
While at it, add synopsis for each command and use backticks for
denoting commands more consistently.
This patch changes image-name instances to image-spec and snap-name
instances to snap-spec to try to clarify usage for some commands and
disambiguate the term {image,snap}-name, which has been used to denote
both simple names and compound names (specs).
<image-spec> is [<pool-name>]/<image-name>
<snap-spec> is [<pool-name>]/<image-name>@<snap-name>
This patch also removes duplicate checks for image-name and snap-name.
Signed-off-by: Vikhyat Umrao <vumrao@redhat.com>
[idryomov@gmail.com: some commands take either image-spec or snap-spec,
other fixes, formatting, changelog]
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
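To make the spec forms concrete, here is a small parser sketch. It is not the rbd CLI's own parsing code: the function name is hypothetical, and the default pool "rbd" is an assumption taken from the du examples in this log.

```python
def parse_spec(spec, default_pool="rbd"):
    """Split '[pool/]image[@snap]' into a (pool, image, snap) tuple."""
    rest, at, snap = spec.partition("@")
    if at and not snap:
        raise ValueError("empty snapshot name in %r" % spec)
    pool, slash, image = rest.rpartition("/")
    if not slash:
        pool = default_pool  # no pool given, fall back to the default
    if not image or (slash and not pool):
        raise ValueError("invalid spec %r" % spec)
    return pool, image, snap or None


print(parse_spec("foo/bar"))     # image bar in pool foo
print(parse_spec("baz@mysnap"))  # snapshot mysnap of image baz, default pool
```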
test/perf_local: disable tests on unsupported archs
* maybe we can have div32 tests on aarch64, but I don't
think "udiv|sdiv" supports a 64-bit numerator. Probably
we can use float divide for the benchmark...
* disable cpuid test on non-Intel archs.
Fixes: #12453
Reported-by: Tom Deneau <tom.deneau@amd.com>
Signed-off-by: Kefu Chai <kchai@redhat.com>
Samuel Just [Fri, 24 Jul 2015 22:38:18 +0000 (15:38 -0700)]
Log::reopen_log_file: take m_flush_mutex
Otherwise, _flush() might continue to write to m_fd after it's closed.
This might cause log data to go to a data object if the filestore then
reuses the fd during that time.
Fixes: #12465
Backport: firefly, hammer
Signed-off-by: Samuel Just <sjust@redhat.com>
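The race fixed by the Log::reopen_log_file commit above can be illustrated with a toy logger. This is a sketch in Python, not Ceph's C++ Log class: the class and method names other than reopen_log_file are hypothetical. The point is that reopen takes the same mutex as flush, so no writer can touch the old descriptor after it is closed.

```python
import os
import threading


class MiniLog:
    """Toy logger whose flush and reopen share one mutex (not Ceph's Log)."""

    _FLAGS = os.O_WRONLY | os.O_CREAT | os.O_APPEND

    def __init__(self, path):
        self._path = path
        self._flush_mutex = threading.Lock()
        self._fd = os.open(path, self._FLAGS, 0o644)

    def flush(self, lines):
        with self._flush_mutex:
            for line in lines:
                os.write(self._fd, (line + "\n").encode())

    def reopen_log_file(self):
        # Hold the flush mutex so no writer can use the old descriptor
        # between close() and the new open().
        with self._flush_mutex:
            os.close(self._fd)
            self._fd = os.open(self._path, self._FLAGS, 0o644)

    def close(self):
        with self._flush_mutex:
            os.close(self._fd)
```

Without the lock in `reopen_log_file`, a concurrent `flush` could write to a descriptor number that has already been closed and reused for another file.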
John Spray [Fri, 24 Jul 2015 08:37:15 +0000 (09:37 +0100)]
mds: fix val used in inode->last_journaled
This was getting assigned with LogEvent::get_start_offset
on an uncommitted LogEvent, which is junk. During replay
last_journaled is compared with the metablob's event_seq,
so that's what should be used here.
This change is just from code inspection -- haven't seen this
manifest as an actual misbehaviour.