Loic Dachary [Mon, 29 Feb 2016 11:18:41 +0000 (18:18 +0700)]
ceph-disk: improve trigger verbosity
The ceph-disk activate errors were ignored and not displayed. Capture
stdout/stderr and display them if the exit code is non-zero. Also fail
when an activate fails.
Pass the --verbose flag to activate, if given to trigger.
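
A minimal sketch of the capture-and-report behaviour described above. The
helper name run_and_report is hypothetical and is not taken from ceph-disk;
the sketch only assumes that the command is run as a subprocess.

```python
import subprocess
import sys


def run_and_report(command):
    """Run command, capture its output, and show it only on failure.

    Hypothetical helper for illustration; not the ceph-disk implementation.
    """
    process = subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into the captured stream
    )
    out, _ = process.communicate()
    if process.returncode != 0:
        # Display what was captured, then fail instead of ignoring the error.
        sys.stderr.write(out.decode(errors="replace"))
        raise RuntimeError(
            "%r failed with exit code %d" % (command, process.returncode)
        )
    return out
```

A caller that wants the --verbose behaviour could append the flag to the
command list before calling the helper.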
Loic Dachary [Fri, 26 Feb 2016 11:57:37 +0000 (18:57 +0700)]
ceph-disk: protect list with activate lock
list may try to mount partitions to figure out the OSD id and other
details. If it does so while the OSD is activated, it will race and lead
to errors, either for activation or for list.
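
A sketch of how two operations can be serialized with one lock file,
assuming a POSIX system. The names exclusive_lock and list_devices, and the
choice of fcntl.lockf, are assumptions made for illustration; the commit
does not say how the activate lock is implemented.

```python
import contextlib
import fcntl
import os


@contextlib.contextmanager
def exclusive_lock(path):
    """Hold an exclusive advisory lock on path for the duration of the block."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX)  # blocks until the lock is free
        try:
            yield
        finally:
            fcntl.lockf(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)


def list_devices(lock_path, probe):
    """Run probe() only while holding the same lock activation would take."""
    with exclusive_lock(lock_path):
        return probe()
```

If activation takes the same lock, a list that needs to mount a partition
waits until activation is done, and the other way around.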
Loic Dachary [Thu, 25 Feb 2016 10:20:54 +0000 (17:20 +0700)]
doc: update ceph-disk to refer to ceph-disk --help
The ceph-disk page is often obsolete, mostly because maintaining
it requires a significant amount of copy/paste and re-formatting.
Now that the --help of ceph-disk has been updated to include a more
verbose explanation of each subcommand, simplify the man page to
give an overview of the subcommands and suggest using --help
to get more information.
Loic Dachary [Thu, 25 Feb 2016 05:56:02 +0000 (12:56 +0700)]
ceph-disk: implement lockbox key management
Instead of storing the dmcrypt keys in the /etc/ceph/dmcrypt-keys
directory, they are stored in the monitor. If a machine with
OSDs created with ceph-disk prepare --dmcrypt is lost, it does
not contain the key needed to decrypt their content.
The dmcrypt key is retrieved from the monitor using a different keyring
for each OSD. It is stored in a small partition called the lockbox. At
boot time the lockbox is mounted at
/var/lib/ceph/osd-lockbox/$uuid
and used when the $uuid partition is detected by udev to map it with
cryptsetup.
The OSDs that were prepared prior to the lockbox implementation are
supported by looking up the key found in /etc/ceph/dmcrypt-keys before
looking in /var/lib/ceph/osd-lockbox/$uuid.
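
The lookup order described above can be sketched as follows. The function
name find_key_source is hypothetical, and the sketch assumes that the legacy
key is a file named after the partition uuid, which the commit message does
not state.

```python
import os

LEGACY_KEY_DIR = "/etc/ceph/dmcrypt-keys"
LOCKBOX_DIR = "/var/lib/ceph/osd-lockbox"


def find_key_source(uuid, legacy_dir=LEGACY_KEY_DIR, lockbox_dir=LOCKBOX_DIR):
    """Return ("legacy", path) or ("lockbox", path), or None if neither exists.

    Assumption: the legacy key is stored as legacy_dir/uuid.
    """
    legacy = os.path.join(legacy_dir, uuid)
    if os.path.exists(legacy):
        # OSDs prepared before the lockbox implementation keep working.
        return ("legacy", legacy)
    lockbox = os.path.join(lockbox_dir, uuid)
    if os.path.isdir(lockbox):
        return ("lockbox", lockbox)
    return None
```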
Loic Dachary [Thu, 25 Feb 2016 05:53:10 +0000 (12:53 +0700)]
ceph-disk: simplify trigger
The ceph-disk trigger deals with dmcrypt mapping which is redundant with
what ceph-disk activate-* does when the --dmcrypt flag is set. Remove
the dmcrypt mapping code and add the --dmcrypt flag to ceph-disk
activate-* where relevant.
Yehuda Sadeh [Thu, 3 Mar 2016 22:18:25 +0000 (14:18 -0800)]
Merge pull request #7786 from ceph/wip-rgw-indexless
rgw: indexless buckets (Yehuda Sadeh)
- can define a policy under which buckets are indexless
- users can then create buckets under the specified placement target
- indexless buckets will not be synced across zones
- does not work with (s3) versioned buckets
Reviewed-by: Matt Benjamin <mbenjamin@redhat.com>
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
Piotr Dałek [Thu, 3 Mar 2016 10:30:53 +0000 (11:30 +0100)]
common/obj_bencher.cc: make verify error fatal
When run without "--no-verify", all verification errors are noted,
but they are only reported to cerr, so automated testing ignores
them. Make seq_read_bench and rand_read_bench return -EIO on any
verification error, so that the error is propagated back to the
caller.
Fixes: #14971
Signed-off-by: Piotr Dałek <piotr.dalek@ts.fujitsu.com>
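
A sketch of the idea in Python rather than the C++ of obj_bencher.cc. The
function verify_reads is hypothetical; it only illustrates counting
verification errors and turning them into a negative errno return value
instead of just printing them.

```python
import errno
import sys


def verify_reads(expected_chunks, read_chunks):
    """Return 0 if every chunk matches, -EIO if any verification fails."""
    errors = 0
    for index, (expected, actual) in enumerate(zip(expected_chunks, read_chunks)):
        if expected != actual:
            # Still note the error on stderr, as before.
            sys.stderr.write("chunk %d is not correct!\n" % index)
            errors += 1
    # Previously the errors were only printed; now the caller sees them.
    return -errno.EIO if errors else 0
```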
Piotr Dałek [Wed, 2 Mar 2016 12:22:38 +0000 (13:22 +0100)]
PGMonitor: unconfuse object count skew message
"Pool <pool> has too few pgs" is okay assuming it does not take other
pools into account. And since it does, it is confusing in the following
scenario:
1. Create two pools, one with small pg count and one with large
pg count
2. Put a whole lot of objects in smaller pool, resulting in "too few
pgs" warning on that pool, which is expected behavior.
3. Put a whole lot of objects in larger pool, warning goes away.
Suddenly smaller pool has plenty of PGs?
Current message suggests adding more nodes (or PGs) to pool, when
actually it's warning about significantly more objects in that
particular pool than in the other pools.
Signed-off-by: Piotr Dałek <piotr.dalek@ts.fujitsu.com>
Jianpeng Ma [Thu, 3 Mar 2016 13:46:55 +0000 (21:46 +0800)]
os/bluestore/BlueStore: Don't leak trim overlay data before write.
Suppose bluestore_overlay_max_length = bluestore_min_alloc_size and
bluestore_overlay_max = 2.
For the following ops:
write(off=0, len=4096) --> written into the overlay
write(off=4096, len=4096) --> written into the overlay
write(off=0, len=bluestore_min_alloc_size) --> because
overlay_map.size() >= 2, it allocates an extent.
It should trim the overlay data (0, 4096) and (4096, 4096), and then
write(0, bluestore_min_alloc_size).
But the original code does not trim the overlay data, so a later read
returns the original data rather than the new data.
Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
Jianpeng Ma [Thu, 3 Mar 2016 10:49:28 +0000 (18:49 +0800)]
os/bluestore/BlueStore: Fix bug when calc offset & end whether locate in the a extent.
Suppose bluestore_overlay_max_length == bluestore_min_alloc_size.
The original code checks whether the written content is located in a
single extent with:
(offset / min_alloc_size) == (offset + length) / min_alloc_size
This places the case offset = 0 and length = min_alloc_size in
different extents.
In fact, this content is in the same extent.
Changing the check to use end = offset + length - 1 makes it work.
Fixes: #14954
Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
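
The off-by-one can be checked with a few lines of Python. The function names
are illustrative only and the 64 KiB value for min_alloc_size is an
assumption; the arithmetic follows the expressions quoted in the commit
message and assumes length > 0.

```python
def same_extent_old(offset, length, min_alloc_size):
    """Original check: compares against the first byte past the write."""
    return offset // min_alloc_size == (offset + length) // min_alloc_size


def same_extent_new(offset, length, min_alloc_size):
    """Fixed check: compares against the last byte actually written."""
    end = offset + length - 1
    return offset // min_alloc_size == end // min_alloc_size


MIN_ALLOC_SIZE = 65536  # assumed value, for illustration only

# A write covering exactly one extent.
print(same_extent_old(0, MIN_ALLOC_SIZE, MIN_ALLOC_SIZE))  # False
print(same_extent_new(0, MIN_ALLOC_SIZE, MIN_ALLOC_SIZE))  # True
```

A write that is one byte longer does span two extents, and both checks agree
on that case.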
Piotr Dałek [Thu, 3 Mar 2016 10:22:57 +0000 (11:22 +0100)]
common/obj_bencher.cc: use more readable constant instead of magic number
When clean_up_slow() fails, it returns "-5" which is equal to -EIO.
Change it in source, so it's not confusing for someone who does not
remember all error codes (functionality remains the same).
Signed-off-by: Piotr Dałek <piotr.dalek@ts.fujitsu.com>
Adam Kupczyk [Wed, 2 Mar 2016 11:31:01 +0000 (12:31 +0100)]
[MON] Fixed calculation of %USED. Now it shows (space used by all replicas)/(raw space available on OSDs). Before it was (size of pool)/(raw space available on OSDs).
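
A worked example of the two formulas, with made-up numbers. It assumes a
replicated pool in which the space used by all replicas is the pool size
times the replica count; the function names are hypothetical.

```python
def percent_used_before(pool_size, raw_space):
    """Old formula: (size of pool) / (raw space available on OSDs)."""
    return 100.0 * pool_size / raw_space


def percent_used_after(pool_size, replicas, raw_space):
    """New formula: (space used by all replicas) / (raw space on OSDs).

    Assumption: a replicated pool stores `replicas` full copies.
    """
    return 100.0 * (pool_size * replicas) / raw_space


# 100 GB of data, 3 replicas, 1000 GB of raw space (illustrative values).
print(percent_used_before(100, 1000))    # 10.0
print(percent_used_after(100, 3, 1000))  # 30.0
```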
Nathan Cutler [Tue, 1 Mar 2016 20:25:11 +0000 (21:25 +0100)]
RPM: move scriptlets from ceph to ceph-base
This addresses the following RPMLINT errors:
ceph-base.x86_64: E: library-without-ldconfig-postun (Badness:
300) /usr/lib64/libosd_tp.so.1.0.0
ceph-base.x86_64: E: library-without-ldconfig-postun (Badness:
300) /usr/lib64/libos_tp.so.1.0.0
This package contains a library and provides no %postun scriptlet
containing a call to ldconfig.
ceph-base.x86_64: E: library-without-ldconfig-postin (Badness:
300) /usr/lib64/libosd_tp.so.1.0.0
ceph-base.x86_64: E: library-without-ldconfig-postin (Badness:
300) /usr/lib64/libos_tp.so.1.0.0
This package contains a library and provides no %post scriptlet
containing a call to ldconfig.
Sage Weil [Mon, 1 Feb 2016 18:01:32 +0000 (13:01 -0500)]
mon/MDSMonitor: prevent pool 0 from being used as a data pool
Pool 0 means no change or default in the legacy ceph_file_layout in the
layout ioctl and file create arguments. Prevent it from being used to avoid
putting users in an awkward situation later.
Sage Weil [Tue, 12 Jan 2016 14:57:06 +0000 (09:57 -0500)]
fs_types: file_layout_t: convert pool -1 (undefined) to 0 in legacy encoding
Old code assumes that fl_pg_pool == 0 means the pool is not defined, while
file_layout_t uses -1. Translate between the two.
Note that this means a valid file_layout_t with pool_id == 0 cannot be
accurately translated to a legacy file_layout_t. That is somewhat
unavoidable, and should not be a problem since real clusters create 'rbd'
as pool 0 and it does not use any file layouts.
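
The translation and its limitation can be sketched in Python. The function
names are hypothetical; the real code is C++ and operates on file_layout_t
and the legacy ceph_file_layout.

```python
def pool_to_legacy(pool_id):
    """file_layout_t pool_id -> legacy fl_pg_pool (-1, undefined, becomes 0)."""
    return 0 if pool_id == -1 else pool_id


def pool_from_legacy(fl_pg_pool):
    """Legacy fl_pg_pool -> file_layout_t pool_id (0 becomes -1, undefined)."""
    return -1 if fl_pg_pool == 0 else fl_pg_pool


# An undefined pool and an ordinary pool survive the round trip.
assert pool_from_legacy(pool_to_legacy(-1)) == -1
assert pool_from_legacy(pool_to_legacy(7)) == 7
# A valid pool 0 does not: it comes back as undefined.
assert pool_from_legacy(pool_to_legacy(0)) == -1
```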
Sage Weil [Mon, 4 Jan 2016 15:44:53 +0000 (10:44 -0500)]
struct ceph_file_layout -> file_layout_t
- drop the global
- do not memset!
- encode with features
- field names are different
- use get_period() method where appropriate
- fix is layout empty checks