Actually, we are verifying that the variable is an instance of the
specified class. For example, `prepare.data` should be
a `PrepareFilestoreData` if `--bluestore` is not specified.
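A minimal sketch of the kind of check described above, using stand-in classes. Only `PrepareFilestoreData` and the `--bluestore` flag come from the text; `PrepareData`, `PrepareBluestoreData` and `make_data` are hypothetical names introduced for illustration.

```python
class PrepareData(object):
    """Hypothetical common base class for the data objects."""


class PrepareFilestoreData(PrepareData):
    """Stand-in for the class named in the commit message."""


class PrepareBluestoreData(PrepareData):
    """Hypothetical counterpart selected when --bluestore is given."""


def make_data(bluestore):
    # Hypothetical factory: choose the data class from the flag.
    return PrepareBluestoreData() if bluestore else PrepareFilestoreData()


# The check is on the type of the object, not on its value.
data = make_data(bluestore=False)
assert isinstance(data, PrepareFilestoreData)
assert not isinstance(data, PrepareBluestoreData)
```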
Haomai Wang [Sat, 6 Feb 2016 06:52:43 +0000 (14:52 +0800)]
AsyncConnection: avoid debug log in cleanup_handler
The local connection is stopped, and cleanup_handler is called, after
the messenger is down.
Introduced in commit
https://github.com/ceph/ceph/commit/9da2fffd31562ed5d0b795d7862b3ebec66aba40
Matt Benjamin [Fri, 5 Feb 2016 21:43:43 +0000 (16:43 -0500)]
cmake: add libboost_system to EXTRALIBS
This concisely fixes several unittest builds, and reflects the
fact that this library dependency has moved into several areas
of the codebase (libcephfs, librbd, librgw).
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
Jason Dillaman [Wed, 3 Feb 2016 22:33:24 +0000 (17:33 -0500)]
cls_journal: new tag management methods and handling
In the case of librbd, a new tag will be allocated when the
exclusive lock is acquired. All tags for the same dataset
(e.g. librbd image) will belong to the same class. Tags are
automatically pruned on tag create / client unregister
if no other clients' commit position would require the tags.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Adam C. Emerson [Mon, 1 Feb 2016 15:40:54 +0000 (10:40 -0500)]
time: Have skewing-now call non-skewing now
For the real-time clocks, Ceph's testing infrastructure likes to be able to
inject a skew. To avoid pulling CephContext into ceph_time.h these are moved to
ceph_time.cc. The original way this was done called clock_gettime in both
places.
This is an unnecessary duplication and apparently error-prone. So only call
clock_gettime from one place.
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
Sage Weil [Fri, 5 Feb 2016 14:20:40 +0000 (09:20 -0500)]
os/bluestore: fix block device file creation
Just make a separate flag to indicate whether we create a block
file. This lets us drop the weird touch in vstart.sh, and default
to creating a token 'block' file on --mkfs.
Loic Dachary [Mon, 1 Feb 2016 12:32:13 +0000 (19:32 +0700)]
global: do not start two daemons with a single pid-file (part 2)
Fixes the following bugs:
* the fd is open(O_WRONLY) and cannot be read from, safe_read
always fails and never removes the pid file.
* pidfile_open(g_conf) is close(STDOUT_FILENO) and there is a risk that
pidfile_open gets STDOUT_FILENO only to have it closed and redirected
to /dev/null.
* Before writing the file, ftruncate it so that overriding a file
containing the pid 1234 with the pid 89 does not end up being
a file with 8934.
* Before reading the file, lseek back to offset 0 otherwise it
will read nothing.
* tests_pidfile was missing an argument when failing
TEST_without_pidfile and killed all processes with ceph in their name,
leading to chaos and no useful error message.
* lstat(fd) cannot possibly return a result different from the one
obtained right after the file was opened; stat(path) must be used
instead.
In addition to fixing the bugs above, refactor the pidfile.cc
implementation to:
* be systematic about error reporting (using cerr for when removing
the pidfile because derr is not available at this point and derr
when creating the pidfile).
* replace pidfile_open / pidfile_write with just pidfile_write since
there never is a case when they are not used together.
More test cases are added to test_pidfile to verify the bugs above are
fixed.
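The open-mode, truncate and seek fixes listed above can be illustrated with a short Python sketch. This is not the pidfile.cc implementation (which is C++); `write_pid`, `read_pid` and the temporary path are hypothetical and only mirror the system calls the message names.

```python
import os
import tempfile


def write_pid(fd, pid):
    # Truncate first so a shorter pid does not leave stale trailing digits
    # (1234 overwritten with 89 would otherwise read back as 8934).
    os.ftruncate(fd, 0)
    os.lseek(fd, 0, os.SEEK_SET)
    os.write(fd, ("%d\n" % pid).encode("ascii"))


def read_pid(fd):
    # Seek back to offset 0, otherwise the read starts at end of file.
    os.lseek(fd, 0, os.SEEK_SET)
    return int(os.read(fd, 32).decode("ascii").strip())


path = os.path.join(tempfile.mkdtemp(), "example.pid")
# O_RDWR rather than O_WRONLY, so the same descriptor can be read back.
fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)
write_pid(fd, 1234)
write_pid(fd, 89)
pid_read = read_pid(fd)
# A stat on the path notices a file replaced on disk; a stat on the open
# descriptor always describes the file that was originally opened.
same_file = os.stat(path).st_ino == os.fstat(fd).st_ino
os.close(fd)
```

Comparing the inode from the path with the one from the descriptor is one way to detect that the pid file was replaced before removing it.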
Sage Weil [Thu, 4 Feb 2016 18:04:53 +0000 (13:04 -0500)]
os/bluestore/BlueStore: fix enode uniqueness
We were failing to set o->enode, which meant that there were
multiple instances of the same enode alive at once. Avoid this
category of bug by changing _txc_release to take the onode ref
and assign it there, and removing almost all of the local EnodeRef
instances.
Sage Weil [Thu, 4 Feb 2016 14:24:22 +0000 (09:24 -0500)]
os/memstore/MemStore: set Collection::cid on create
This was broken by the collection handles merge in 2e52a8b17c348bb3356eb76a8a0f6ef6efbe5bd3 because the c->cid
value was never initialized and now we started to rely on it.
Loic Dachary [Sun, 24 Jan 2016 10:07:58 +0000 (17:07 +0700)]
ceph-disk: use the type file for bluestore
The type file in the OSD bluestore data exists and contains the
bluestore string. ceph-disk activate should use it instead of
the "osd objectstore" configuration value. It is better in case the
configuration file changes between prepare and activate.
The fsid file cannot be used by bluestore to signify that ceph-osd
--mkfs has completed successfully because it is pre-populated by
ceph-disk. Introduce the mkfs_done file, dedicated to this, instead of
overloading an existing file.
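A rough sketch of the lookup described above, not the ceph-disk code itself. The file names `type`, `fsid` and `mkfs_done` come from the message; the helper names and the fallback to the configured value when no `type` file exists are assumptions made for illustration.

```python
import os
import tempfile


def read_one_line(directory, name):
    # Return the stripped first line of directory/name, or None if absent.
    path = os.path.join(directory, name)
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return f.readline().strip()


def objectstore_type(osd_dir, configured):
    # Prefer what was recorded at prepare time over the current
    # configuration value (the fallback is an assumption of this sketch).
    return read_one_line(osd_dir, "type") or configured


def mkfs_completed(osd_dir):
    # A dedicated marker file; fsid cannot serve because it is
    # pre-populated before mkfs runs.
    return os.path.exists(os.path.join(osd_dir, "mkfs_done"))


osd_dir = tempfile.mkdtemp()
for name, content in (("type", "bluestore\n"), ("fsid", "placeholder\n")):
    with open(os.path.join(osd_dir, name), "w") as f:
        f.write(content)

detected = objectstore_type(osd_dir, configured="filestore")
done_before = mkfs_completed(osd_dir)
open(os.path.join(osd_dir, "mkfs_done"), "w").close()
done_after = mkfs_completed(osd_dir)
```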
Signed-off-by: Sage Weil <sage@redhat.com>
Signed-off-by: Loic Dachary <loic@dachary.org>
Loic Dachary [Mon, 1 Feb 2016 11:26:05 +0000 (18:26 +0700)]
tests: ceph-disk tests pid files must exist
http://tracker.ceph.com/issues/13422 made it so ceph-osd won't start
unless the pidfile can be created successfully. The default location
being the current directory, ceph-osd must explicitly be told to write
in a directory where it has write permissions.
Loic Dachary [Thu, 28 Jan 2016 04:59:10 +0000 (11:59 +0700)]
ceph-disk: bluestore deactivate / destroy
It is straightforward because it entirely relies on information
collected by ceph-disk list which has full support for bluestore.
It loops on all possible auxiliary devices (as found in Spaces.NAMES)
and does the associated deactivate / destruction which is merely about
handling dmcrypt map / unmap.
Loic Dachary [Thu, 28 Jan 2016 04:12:47 +0000 (11:12 +0700)]
ceph-disk: bluestore list
The objectstore journal and the bluestore block auxiliary device are
handled in the same way. Each occurrence of journal in the code is
replaced with a variable.
A few helpers are added to the Ptype class to factorize the most common
lookups but the code logic is unmodified with one exception: the
more_osd_info previously added a journal_uuid entry regardless. If there
was no journal_uuid file, it would be None. It is changed to only add
the {block,journal}_uuid entry if the corresponding file exists.
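The behaviour change can be sketched as follows. `read_uuid` and `add_space_uuids` are hypothetical helpers with a simplified signature, not the real more_osd_info; only the `{block,journal}_uuid` file names come from the message.

```python
import os
import tempfile


def read_uuid(path, name):
    # Content of <path>/<name>_uuid, or None if the file is missing.
    filename = os.path.join(path, name + "_uuid")
    if not os.path.exists(filename):
        return None
    with open(filename) as f:
        return f.read().strip()


def add_space_uuids(path, info, names=("block", "journal")):
    # Add a <name>_uuid entry only when the corresponding file exists,
    # instead of always adding an entry that may be None.
    for name in names:
        uuid = read_uuid(path, name)
        if uuid is not None:
            info[name + "_uuid"] = uuid
    return info


osd_dir = tempfile.mkdtemp()
with open(os.path.join(osd_dir, "journal_uuid"), "w") as f:
    f.write("example-journal-uuid\n")

info = add_space_uuids(osd_dir, {})
```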
Loic Dachary [Thu, 28 Jan 2016 04:53:49 +0000 (11:53 +0700)]
ceph-disk: bluestore trigger
Copy paste the journal code and s/journal/block/
More work will be needed to support multiple auxiliary
devices (block.wal etc). But the goal is to minimize the change because
this commit is part of a series of commits focusing on refactoring
prepare, not the entire ceph-disk codebase.
Loic Dachary [Thu, 28 Jan 2016 04:48:55 +0000 (11:48 +0700)]
ceph-disk: bluestore activate
Only support the block file for now. The refactoring consists of
replacing main_activate_journal with main_activate_space and a name
argument (block, journal). More work will be needed to support multiple
auxiliary devices (block.wal etc). But the goal is to minimize the
change because this commit is part of a series of commits focusing on
refactoring prepare, not the entire ceph-disk codebase.
Loic Dachary [Thu, 28 Jan 2016 04:43:22 +0000 (11:43 +0700)]
ceph-disk: bluestore prepare
Only support the block file for now. It is handled the same as the
journal, only with a different name (block) and its own set of ptypes
depending on multipath or dmcrypt.
Loic Dachary [Tue, 19 Jan 2016 09:49:40 +0000 (16:49 +0700)]
ceph-disk: refactor prepare
The logic / code path is only modified to the extent necessary for the
refactor.
The Prepare class roughly replaces the prepare_main function but also
handles the prepare subcommand argument parsing. It creates the data and
journal objects and delegates the actual work to them via the prepare()
method.
The Prepare class assumes that preparing an OSD consists of the
following phases:
* optionally prepare auxiliary devices, such as the journal
* prepare a data directory or device
* populate the data directory with fsid etc. and optionally
symbolic links to the auxiliary devices
The PrepareDefault class is derived from Prepare and implements the
current model where there is only one auxiliary device, the journal.
The PrepareJournal class implements the *journal* functions
and is based on a generic class, PrepareSpace which handles the
allocation of an auxiliary device. The only journal specific feature is
left to the PrepareJournal class: querying the OSD to figure out if
a journal is wanted or not.
The OSD data directory is prepared via the PrepareData class. It creates
a file system if necessary (i.e. if a device) and populates the data
directory. Further preparation is then delegated to the auxiliary
devices (i.e. adding a symlink to the device for a journal).
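The division of labour between these classes can be sketched as below. The class names come from the message; the method bodies, the `log` list, the `wanted` and `populate_data_path` methods and the `want_journal` flag are simplifications assumed for illustration, not the actual ceph-disk code.

```python
class PrepareSpace(object):
    """Generic handling of an auxiliary device (simplified)."""

    def __init__(self, name):
        self.name = name

    def wanted(self):
        return True

    def prepare(self, log):
        if self.wanted():
            log.append("prepare " + self.name)

    def populate_data_path(self, log):
        # Further preparation of the data directory, e.g. a symlink.
        if self.wanted():
            log.append("symlink " + self.name)


class PrepareJournal(PrepareSpace):
    def __init__(self, want_journal=True):
        super(PrepareJournal, self).__init__("journal")
        self.want_journal = want_journal

    def wanted(self):
        # The real code queries the OSD; a constructor flag stands in here.
        return self.want_journal


class PrepareData(object):
    def prepare(self, log, *spaces):
        log.append("prepare data")
        log.append("populate data")
        for space in spaces:
            space.populate_data_path(log)


class Prepare(object):
    def prepare(self):
        raise NotImplementedError


class PrepareDefault(Prepare):
    """One auxiliary device only: the journal."""

    def __init__(self, want_journal=True):
        self.data = PrepareData()
        self.journal = PrepareJournal(want_journal)

    def prepare(self):
        log = []
        self.journal.prepare(log)             # auxiliary devices first
        self.data.prepare(log, self.journal)  # then data, then symlinks
        return log


steps = PrepareDefault().prepare()
steps_no_journal = PrepareDefault(want_journal=False).prepare()
```

Under this layout, supporting another auxiliary device would mean adding one more PrepareSpace subclass and passing it to the data object.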
There were some code paths related to dmcrypt / multipath devices in
the prepare functions, although they are orthogonal. A class tree for
Devices was created to isolate that.
Although that was the primary reason for adding a new class tree, two
other aspects have also been moved there: ptypes and partition creation.
The ptypes are organized into a data structure with a few helpers in
the hope it will be easier to maintain. All references to the *_UUID
variables have been updated.
The creation of a partition is delegated to sgdisk and a wrapper helps
reduce the code redundancy.
The ptype of a given partition depends on the type of the device (is it
dmcrypt'ed or a multipath device?). It is best implemented by
derivation so the prepare function does not need to be concerned about
how the ptype of a partition is determined.
Many functions could be refactored into a Device class and its
derivatives, but that was not done to minimize the size of the refactor.
Device                     knows how to create a partition and figure out the ptype to use
DevicePartition            a regular device partition
DevicePartitionMultipath   a partition of a multipath device
DevicePartitionCrypt       base class for luks/plain dmcrypt, can map/unmap
DevicePartitionCryptPlain  knows how to set up dmcrypt plain
DevicePartitionCryptLuks   knows how to set up dmcrypt luks
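A sketch of how derivation keeps the caller unaware of how a ptype is chosen. The class names come from the list above; the exact inheritance layout, the `kind` attribute, the `ptype_for_name` method and the placeholder ptype strings (the real values are partition type GUIDs) are assumptions for illustration.

```python
KINDS = ("regular", "mpath", "plain", "luks")
NAMES = ("journal", "block")
# Placeholder strings stand in for the real partition type GUIDs.
PTYPE = {
    kind: {name: "%s-%s-ptype" % (kind, name) for name in NAMES}
    for kind in KINDS
}


class DevicePartition(object):
    kind = "regular"

    def ptype_for_name(self, name):
        # Each subclass only overrides `kind`; the lookup is shared.
        return PTYPE[self.kind][name]


class DevicePartitionMultipath(DevicePartition):
    kind = "mpath"


class DevicePartitionCrypt(DevicePartition):
    """Base class for the dmcrypt variants (map/unmap omitted here)."""


class DevicePartitionCryptPlain(DevicePartitionCrypt):
    kind = "plain"


class DevicePartitionCryptLuks(DevicePartitionCrypt):
    kind = "luks"


def ptype_to_create(partition, name):
    # The caller does not need to know which kind of device it was given.
    return partition.ptype_for_name(name)


luks_journal = ptype_to_create(DevicePartitionCryptLuks(), "journal")
mpath_block = ptype_to_create(DevicePartitionMultipath(), "block")
```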
The CryptHelpers class is introduced to factorize the code snippets that
were duplicated in various places but that do not really belong
because they are convenience wrappers to figure out:
* if dmcrypt should be used
* the keysize
* the dmcrypt type (plain or luks)
Josh Durgin [Fri, 11 Dec 2015 23:20:19 +0000 (15:20 -0800)]
rbd-mirror: skeleton of a mirroring daemon
This isn't functional yet, since ImageReplayer doesn't do any replay
yet. Just the parts for monitoring other clusters for changes
(ClusterWatcher and PoolWatcher) are tested, with simple
unit and functional tests.
Loic Dachary [Tue, 19 Jan 2016 11:33:05 +0000 (18:33 +0700)]
tests: workaround ceph-disk global side effects
Because some variables are global in ceph-disk, tests that modify them
interact with each other in unpredictable ways. This will go away
eventually but requires a significant refactor. Work around it by running
one py.test per test file.
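The workaround amounts to giving each test file its own interpreter, so module-level globals start fresh every time. A minimal sketch, assuming pytest is installed and invoked as `python -m pytest`; the helper names, the `test_*.py` pattern and the demo directory are hypothetical.

```python
import glob
import os
import subprocess
import sys
import tempfile


def pytest_commands(test_dir, pattern="test_*.py"):
    # One command per test file, so no global state is shared between files.
    files = sorted(glob.glob(os.path.join(test_dir, pattern)))
    return [[sys.executable, "-m", "pytest", path] for path in files]


def run_all(test_dir):
    # Run every file even after a failure and return the failing paths.
    failed = []
    for command in pytest_commands(test_dir):
        if subprocess.call(command) != 0:
            failed.append(command[-1])
    return failed


# Demonstration of the command list only; nothing is executed here.
demo_dir = tempfile.mkdtemp()
for name in ("test_a.py", "test_b.py", "helper.py"):
    open(os.path.join(demo_dir, name), "w").close()
commands = pytest_commands(demo_dir)
```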