Jason Dillaman [Tue, 11 Aug 2015 13:26:33 +0000 (09:26 -0400)]
librbd: prevent race condition between resize requests
It was possible that the same resize request could be sent twice
if a completed resize op started a newly created resize op while
it was also being concurrently started by another thread.
Fixes: #12664
Backport: hammer Signed-off-by: Jason Dillaman <dillaman@redhat.com>
John Spray [Thu, 6 Aug 2015 12:10:08 +0000 (13:10 +0100)]
mds: fix setting whole layout in one vxattr
Previously we were validating the layout
after setting each field, so even when
setting multiple fields in one go (to
something valid) the intermediate states
could be invalid and cause a bogus EINVAL.
For example try setting object_size and stripe_unit
both to 2M in one go.
Samuel Just [Fri, 24 Jul 2015 22:38:18 +0000 (15:38 -0700)]
Log::reopen_log_file: take m_flush_mutex
Otherwise, _flush() might continue to write to m_fd after it's closed.
This might cause log data to go to a data object if the filestore then
reuses the fd during that time.
Jason Dillaman [Tue, 28 Jul 2015 19:27:24 +0000 (15:27 -0400)]
rbd: remove dependency on non-ABI controlled CephContext
The rbd CLI tool no longer attempts to initialize a CephContext
and pass said context to librados since it's possible that the
structure will not be ABI compatible between rbd and librados.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Jason Dillaman [Tue, 28 Jul 2015 17:14:29 +0000 (13:14 -0400)]
crypto: use NSS_InitContext/NSS_ShutdownContex to avoid memory leak
Switched to context-aware NSS init/shutdown functions to avoid conflicts
with parent application. Use a reference counter to properly shutdown the
NSS crypto library when the last CephContext is destroyed. This avoids
memory leaks with the NSS library from users of librados.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
rbd: rename --object-extents option to --whole-object
--object-extents is a bit confusing - extent is generally something of
a varying length and here the meaning is "diff whole objects". Rename
it to --whole-object (the name of diff_iterate() parameter).
Change du to take <image-spec> | <snap-spec> as an argument instead of
going through --image option. The new synopsis is
(du | disk-usage) [<image-spec> | <snap-spec>]
This is to make it look more like the rest of the commands: the only
other command that takes pool as an argument is ls and it can't really
serve as a prototype for du, because the latter has to work on images
and snapshots as well.
Examples:
# stats for pool rbd
$ rbd du
$ rbd -p rbd du
# stats for pool foo
$ rbd -p foo du
# stats for snapshot mysnap of image baz in pool rbd
$ rbd du baz@mysnap
# stats for image bar in pool foo
$ rbd du foo/bar
No command uses it as of now, but only clone command fails; cp, mv and
import simply ignore it. Check if it's set and exit with a generic
error message.
rbd: import doesn't require image-spec arg, ditto for export and path
Mark those as such in help and clarify what image-spec defaults to.
Related, all command args in our man page are enclosed into brackets.
I suppose the reason is that they are optional in the sense that you
can have commands like
$ rbd clone --pool a --image b --snap -c --dest-pool d --dest e
with no args. Given that we are trying to push people towards
$ rbd clone a/b@c d/e
undo that so that real optional arguments can be marked optional.
While at it, add synopsis for each command and use backticks for
denoting commands more consistently.
This patch changes image-name instances to image-spec and snap-name
instances to snap-spec to try to clarify usage for some commands and
disambiguate the term {image,snap}-name, which has been used to denote
both simple names and compound names (specs).
<image-spec> is [<pool-name>]/<image-name>
<snap-spec> is [<pool-name>]/<image-name>@<snap-name>
This patch also removes duplicate checks for image-name and snap-name.
Signed-off-by: Vikhyat Umrao <vumrao@redhat.com>
[idryomov@gmail.com: some commands take either image-spec or snap-spec,
other fixes, formatting, changelog] Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
John Spray [Wed, 22 Jul 2015 20:57:33 +0000 (21:57 +0100)]
mds: fix crash while stopping rank
trim() was broken in: 8a91daae mds: fix mds crash when mds_max_log_events smaller
When doing a final trim in shutdown we need to trim everything,
not just all but the last segment. The previous change's intent
was to prevent badness when mds_log_max_events was too small: accomplish
the same thing here by clamping the value to something sane.
Fixes: #12355 Signed-off-by: John Spray <john.spray@redhat.com>
mon: PaxosService: call post_refresh() instead of post_paxos_update()
Whenever the monitor finishes committing a proposal, we call
Monitor::refresh_from_paxos() to nudge the services to refresh. Once
all services have refreshed, we would then call each services
post_paxos_update().
However, due to an unfortunate, non-critical bug, some services (mainly
the LogMonitor) could have messages pending in their
'waiting_for_finished_proposal' callback queue [1], and we need to nudge
those callbacks.
This patch adds a new step during the refresh phase: instead of calling
directly the service's post_paxos_update(), we introduce a
PaxosService::post_refresh() which will call the services
post_paxos_update() function first and then nudge those callbacks when
appropriate.
[1] - Given the monitor will send MLog messages to itself, and given the
service is not readable before its initial state is proposed and
committed, some of the initial MLog's would be stuck waiting for the
proposal to finish. However, by design, we only nudge those message's
callbacks when an election finishes or, if the leader, when the proposal
finishes. On peons, however, we would only nudge those callbacks if an
election happened to be triggered, hence the need for an alternate path
to retry any message waiting for the initial proposal to finish.
Fixes: #11470 Signed-off-by: Joao Eduardo Luis <joao@suse.de>
mon: Monitor: use 'ceph mon sync force' instead of 'ceph sync force'
Makes it easier to identify the command as being related with the
monitor instead of cluster-wide.
This entails adding an exception to module 'mon' in order to have this
command handled by the Monitor class instead of MonmapMonitor (which is
the one traditionally handling 'mon' module commands).
Fixes: #11545 Signed-off-by: Joao Eduardo Luis <joao@suse.de>
mon: Monitor: use 'ceph mon scrub' instead of 'ceph scrub'
Makes it easier to identify the command as being related with the
monitor instead of cluster-wide.
This entails adding an exception to module 'mon' in order to have this
command handled by the Monitor class instead of MonmapMonitor (which is
the one traditionally handling 'mon' module commands).
Fixes: #11545 Signed-off-by: Joao Eduardo Luis <joao@suse.de>
mon: Monitor: use 'ceph mon compact' instead of 'ceph compact'
Makes it easier to identify the command as being related with the
monitor instead of cluster-wide.
This entails adding an exception to module 'mon' in order to have this
command handled by the Monitor class instead of MonmapMonitor (which is
the one traditionally handling 'mon' module commands).
Fixes: #11545 Signed-off-by: Joao Eduardo Luis <joao@suse.de>
mon: MonCommands: accept FLAG(f) instead of 'f' in command sig
This allows us to do nifty stuff like 'FLAG(foo) | FLAG(bar)' and expand
it to (MonCommand::FLAG_foo | MonCommand::FLAG_bar), instead of being
bound by a single flag on macro expansion.
Corrects minor Automake errors which prevented Ceph from building
when configure was invoked from a different directory than
the toplevel source directory.
Signed-off-by: Krzysztof Kosiński <krzysztof.kosinski@intel.com>
time_t is more a arithmetic type. and it's not portable. it's always
defined as "long int" by libc. and we have no encode(int, bl), which
is expected. so a safe way is to use int64_t for presenting the mtime
returned from the stat() call.
Ken Dreyer [Tue, 23 Jun 2015 19:41:53 +0000 (13:41 -0600)]
packaging: RGW depends on /etc/mime.types
If the mimecap RPM or mime-support DEB is not installed, then the
/etc/mime.types file is not present on the system. RGW attempts to read
this file during startup, and if the file is not present, RGW logs an
error:
ext_mime_map_init(): failed to open file=/etc/mime.types ret=-2
Make the radosgw package depend on the mailcap/mime-support packages so
that /etc/mime.types is always available on RGW systems.
ceph.in: print more detailed error message for 'tell' command
* we do not allow user specify a certain daemon when starting an
interactive session. the existing error message could lead to
some confusion. so put more details in it.
John Spray [Fri, 8 May 2015 15:22:20 +0000 (16:22 +0100)]
doc: add some docs about cephfs-data-scan
These are deliberately fairly sparse, because:
* These tools are for experts
* These tools may well be wrapped in a higher
level recovery tool that orchestrates parallel
workers at some stage.