git.apps.os.sepia.ceph.com Git - ceph.git/log
Sage Weil [Tue, 15 Dec 2015 20:13:08 +0000 (15:13 -0500)]
os/bluestore/bluestore_types: add extent FLAG_COW_{HEAD,TAIL}
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Thu, 17 Dec 2015 19:14:44 +0000 (14:14 -0500)]
unittest_bluefs, unittest_bluestore_types
These should run during make check.
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Mon, 14 Dec 2015 21:58:15 +0000 (16:58 -0500)]
os/bluestore/bluestore_types: add contains(), clear(), empty() to extent_ref_map
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Mon, 14 Dec 2015 20:53:41 +0000 (15:53 -0500)]
os/bluestore/BlueStore: wal_op_t::OP_COPY
Assume block-aligned.
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Fri, 18 Dec 2015 22:33:41 +0000 (17:33 -0500)]
os/bluestore/BlockDevice: fix waiter wakeup use-after-free race
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Tue, 22 Dec 2015 19:03:33 +0000 (14:03 -0500)]
os/bluestore: add bluestore_debug_no_reuse_blocks
This makes debugging a bit easier because we never use the same
extent of the disk twice, leaving useful evidence behind.
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Fri, 18 Dec 2015 22:45:27 +0000 (17:45 -0500)]
ceph_test_objectstore: do Synthetic tests over larger objects
400k for objects.
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Thu, 17 Dec 2015 19:15:48 +0000 (14:15 -0500)]
ceph_test_objectstore: use a few hash values for objects; clone between them
We only guarantee support for clone between objects with the same hash.
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Thu, 17 Dec 2015 19:17:35 +0000 (14:17 -0500)]
ceph_test_objectstore: dump actual vs expected on read data mismatch
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Thu, 17 Dec 2015 19:15:33 +0000 (14:15 -0500)]
ceph_test_objectstore: add many clone tests
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Thu, 17 Dec 2015 18:59:36 +0000 (13:59 -0500)]
ceph_test_objectstore: validate full object contents after writes
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Thu, 17 Dec 2015 16:28:33 +0000 (11:28 -0500)]
ceph_test_objectstore: debug enter/exit points
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Thu, 17 Dec 2015 16:28:14 +0000 (11:28 -0500)]
ceph_test_objectstore: save map lookups for a few ops
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Thu, 17 Dec 2015 16:27:54 +0000 (11:27 -0500)]
ceph_test_objectstore: fix locking for a few ops
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Thu, 17 Dec 2015 16:27:03 +0000 (11:27 -0500)]
ceph_test_objectstore: fix clone
Copy the buffer, in case other threads modify it in place.
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Wed, 16 Dec 2015 18:18:53 +0000 (13:18 -0500)]
ceph_test_objectstore: simplify object name generation
The long names don't exercise useful code paths, and having
consistent naming makes it easier to grep through logs.
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Wed, 16 Dec 2015 14:16:37 +0000 (09:16 -0500)]
ceph_test_objectstore: clone non-empty objects, not empty ones
This condition was backwards.
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Mon, 14 Dec 2015 20:00:37 +0000 (15:00 -0500)]
ceph_test_objectstore: clone objects with same hash
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Tue, 22 Dec 2015 18:31:07 +0000 (13:31 -0500)]
os/bluestore: add some slow debug path
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Tue, 22 Dec 2015 18:10:43 +0000 (13:10 -0500)]
os/bluestore: clean up comments a bit
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Tue, 22 Dec 2015 17:45:58 +0000 (12:45 -0500)]
os/bluestore/BlueStore: note wal releases in fsck
Include these in used_blocks (they are about to be released but
not reflected in the onode).
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Fri, 18 Dec 2015 22:40:17 +0000 (17:40 -0500)]
os/bluestore/BlueStore: fix read bug when there is a hole
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Tue, 22 Dec 2015 18:40:56 +0000 (13:40 -0500)]
os/kstore: fix rename
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Tue, 22 Dec 2015 16:40:23 +0000 (11:40 -0500)]
os/bluestore/BlueStore: fix rename
Install a negative onode entry at the old name position.
Otherwise, a simple transaction like
rename a -> b
touch b
will re-read the old b onode key on the second op, and chaos will
ensue (e.g., because it'll reference the same extents from a
different object).
Signed-off-by: Sage Weil <sage@redhat.com>
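A minimal sketch of the negative-entry idea described in this commit, not the actual BlueStore code. `Onode`, `OnodeCache`, and the `std::set` standing in for the on-disk keys are hypothetical names chosen for illustration. The point shown is only that, after a rename, a lookup of the old name is answered by a cached "does not exist" entry instead of re-reading a stale key from the store.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <set>
#include <string>

// Hypothetical stand-in for an in-memory object node.
struct Onode {
  std::string key;
  bool exists;
};
using OnodeRef = std::shared_ptr<Onode>;

class OnodeCache {
  std::map<std::string, OnodeRef> cache_;
  const std::set<std::string>& store_;  // stand-in for keys already on disk

public:
  explicit OnodeCache(const std::set<std::string>& store) : store_(store) {}

  // Return the cached entry, or populate it from the (possibly stale) store.
  OnodeRef lookup(const std::string& name) {
    auto it = cache_.find(name);
    if (it != cache_.end())
      return it->second;
    auto o = std::make_shared<Onode>(Onode{name, store_.count(name) > 0});
    cache_[name] = o;
    return o;
  }

  // Move the onode to its new name and leave a negative entry behind, so
  // later ops in the same transaction do not fall through to the store.
  void rename(const std::string& from, const std::string& to) {
    OnodeRef o = lookup(from);
    o->key = to;
    cache_[to] = o;
    cache_[from] = std::make_shared<Onode>(Onode{from, false});
  }
};
```

Without the last line of `rename()`, the cache would have no entry for the old name, and the next lookup would consult the store, which still holds the pre-rename key until the transaction commits.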
Sage Weil [Tue, 22 Dec 2015 16:34:07 +0000 (11:34 -0500)]
os/bluestore/BlueStore: remove unused OnodeMap::remove
We install negative entries instead.
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Tue, 22 Dec 2015 15:49:18 +0000 (10:49 -0500)]
os/bluestore/BlockDevice: adjust debug output
5 helpful (read/write offsets)
10 more, with aio completions
20 everything
30 fire hose
40 data hexdumps
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Tue, 22 Dec 2015 15:48:14 +0000 (10:48 -0500)]
os/bluestore/BlockDevice: fix path
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Tue, 22 Dec 2015 15:31:08 +0000 (10:31 -0500)]
os/bluestore/BlueStore: do WAL ops buffered to avoid RMW issues
We may have multiple WAL ops that do read/modify/write covering
the same blocks. To avoid the complexity of identifying those
situations and ensuring that we, say, wait for writes to complete
before reading them back again, just make the IO buffered and let
the page cache handle that for us.
This fixes the failure of LibRadosAio.RoundTripWriteFull.
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Mon, 21 Dec 2015 21:57:12 +0000 (16:57 -0500)]
rocksdb: debug log writes/reads
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Mon, 21 Dec 2015 21:56:42 +0000 (16:56 -0500)]
os/bluestore: handle both buffered and direct+async IO
Prefer aio unless explicitly directed otherwise.
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Mon, 21 Dec 2015 21:45:02 +0000 (16:45 -0500)]
os/bluestore/BlockDevice: rename bdev options
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Mon, 21 Dec 2015 21:18:05 +0000 (16:18 -0500)]
os/bluestore/BlueStore: use BlueFS::get_usage()
...just so we log bdev utilization in the log.
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Mon, 21 Dec 2015 21:17:47 +0000 (16:17 -0500)]
os/bluestore/BlueFS: get_usage()
Return (and log) usage for all bdevs.
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Mon, 21 Dec 2015 20:33:38 +0000 (15:33 -0500)]
os/bluestore/BlueFS: do not dirty file when overwriting bytes
The rocksdb log recycle option allows us to overwrite previously
allocated space in an old log file to avoid updating the file
metadata on normal file systems. Take advantage of that here
by implementing what is effectively O_NOCMTIME semantics: we do
not dirty the file metadata just because mtime is updated.
Instead, we dirty the file only if we allocate new space or if
the size has to be increased.
Note that on my NVMe drive, in a single-thread rados bench test, we
jump from 30MB/sec to 50MB/sec for 128KB writes as soon as we start
recycling previous logs (about 40 seconds into the run).
Signed-off-by: Sage Weil <sage@redhat.com>
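The dirtying rule in this commit can be sketched roughly as follows. This is an illustration under stated assumptions, not the BlueFS implementation: `FileMeta` and `note_write` are hypothetical names, and allocation is modeled as a plain byte count with no rounding to an allocation unit.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-file metadata; real BlueFS tracks extents, mtime, etc.
struct FileMeta {
  uint64_t size;       // logical file size in bytes
  uint64_t allocated;  // bytes of space already allocated to the file
  bool dirty;          // true if metadata must be journaled
};

// Record a write of `len` bytes at `off`. The metadata is dirtied only if
// new space is needed or the file grows; an overwrite of already-allocated
// bytes inside the current size leaves it clean (mtime alone is ignored).
void note_write(FileMeta& f, uint64_t off, uint64_t len) {
  const uint64_t end = off + len;
  if (end > f.allocated) {
    f.allocated = end;  // assumption: no rounding to an allocation unit
    f.dirty = true;
  }
  if (end > f.size) {
    f.size = end;
    f.dirty = true;
  }
}
```

With a recycled log file, every write lands inside space that is already allocated and within the existing size, so neither branch fires and no metadata update is queued.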
Sage Weil [Mon, 21 Dec 2015 20:07:44 +0000 (15:07 -0500)]
os/bluestore/BlueFS: ignore flush when buffer is small
Rocksdb does a flush after every append, each of which is often
less than a full block. This is very inefficient when our
_flush() will send that to disk (and block).
Avoid this most of the time by ignoring small flush requests
entirely, unless the force flag is set (e.g., by fsync).
Signed-off-by: Sage Weil <sage@redhat.com>
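A rough sketch of the flush policy described here, assuming a writer that buffers appends in memory. `BufferedWriter`, `min_flush`, and the counters are hypothetical names for illustration; the real `_flush()` issues block-device IO, which this sketch only counts.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

class BufferedWriter {
  std::string buf_;
  std::size_t min_flush_;         // assumed threshold, e.g. one block
  std::size_t bytes_flushed_ = 0;
  std::size_t io_count_ = 0;

public:
  explicit BufferedWriter(std::size_t min_flush) : min_flush_(min_flush) {}

  void append(const std::string& data) { buf_ += data; }

  // Returns true if buffered data was sent to "disk". Small requests are
  // ignored unless `force` is set (as an fsync would do).
  bool flush(bool force = false) {
    if (buf_.empty())
      return false;
    if (!force && buf_.size() < min_flush_)
      return false;
    bytes_flushed_ += buf_.size();
    ++io_count_;
    buf_.clear();
    return true;
  }

  std::size_t bytes_flushed() const { return bytes_flushed_; }
  std::size_t io_count() const { return io_count_; }
};
```

A caller that appends 100 bytes and flushes after each append would, with a 4096-byte threshold, trigger one IO per roughly 41 appends instead of one per append, while a forced flush still writes whatever is buffered.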
Sage Weil [Mon, 21 Dec 2015 19:45:00 +0000 (14:45 -0500)]
os/bluestore: update freelist in individual transactions
We submit each operation's transaction individually to rocksdb,
and then send a final transaction to flush them all. However,
they may not commit atomically (all together), which means we
need to leave the individual freelist updates within each
transaction.
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Mon, 21 Dec 2015 19:22:58 +0000 (14:22 -0500)]
os/bluestore: better debugging on fsck alloc errors
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Mon, 21 Dec 2015 18:54:35 +0000 (13:54 -0500)]
script/crash_bdev: simple script to inject bdev failures
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Mon, 21 Dec 2015 18:53:34 +0000 (13:53 -0500)]
os/bluestore: fail mount if fsck finds errors
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Mon, 21 Dec 2015 14:49:05 +0000 (09:49 -0500)]
os/fs/FS.h: fix aio_t::pread
Allocate aligned buffer.
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Mon, 21 Dec 2015 14:39:56 +0000 (09:39 -0500)]
os/bluestore/BlueStore: better error msg for bdev label check
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Mon, 21 Dec 2015 14:00:17 +0000 (09:00 -0500)]
os/bluestore: don't create block.{db,wal} by default
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Mon, 21 Dec 2015 13:58:43 +0000 (08:58 -0500)]
vstart.sh: less noisy debug
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Mon, 21 Dec 2015 13:57:18 +0000 (08:57 -0500)]
os/bluestore: fix fsck contains vs intersects
Any overlap is an error.
Signed-off-by: Sage Weil <sage@redhat.com>
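The difference between the two checks can be illustrated with half-open byte ranges. This is a sketch only; `Extent`, `contains`, and `intersects` here are hypothetical free functions, not the interval-set API that fsck actually uses.

```cpp
#include <cassert>
#include <cstdint>

// Half-open range [offset, offset + length).
struct Extent {
  uint64_t offset;
  uint64_t length;
  uint64_t end() const { return offset + length; }
};

// True only if b lies entirely inside a.
bool contains(const Extent& a, const Extent& b) {
  return b.offset >= a.offset && b.end() <= a.end();
}

// True if a and b share at least one byte.
bool intersects(const Extent& a, const Extent& b) {
  return a.offset < b.end() && b.offset < a.end();
}
```

For an already-used range `{0, 100}` and a newly seen extent `{50, 100}`, `contains` is false but `intersects` is true, so a check built on `contains` would miss a partial overlap of this kind.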
Sage Weil [Sat, 19 Dec 2015 19:06:00 +0000 (14:06 -0500)]
os/bluestore: bluestore bluefs = true
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Fri, 18 Dec 2015 20:40:45 +0000 (15:40 -0500)]
rpm, debian: package ceph-bluefs-tool
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Thu, 17 Dec 2015 19:11:07 +0000 (14:11 -0500)]
os/bluestore/BlueStore: fix error path if label set fails
Reported-by: David Zafman <dzafman@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Thu, 17 Dec 2015 19:12:36 +0000 (14:12 -0500)]
rocksdb: fix recycle replay
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Thu, 17 Dec 2015 14:06:48 +0000 (09:06 -0500)]
Makefile-rocksdb.am: update
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Mon, 14 Dec 2015 21:33:38 +0000 (16:33 -0500)]
os/bluestore: default to 64k min_alloc_size
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Mon, 14 Dec 2015 21:28:22 +0000 (16:28 -0500)]
os/bluestore/BlueStore: fix _open_bdev() failure path
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Mon, 14 Dec 2015 21:27:17 +0000 (16:27 -0500)]
kv/RocksDBStore: behave if options string is empty
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Mon, 14 Dec 2015 20:56:33 +0000 (15:56 -0500)]
os/bluestore: clear coll_map on umount, fsck finish
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Mon, 14 Dec 2015 20:55:09 +0000 (15:55 -0500)]
os/kstore/KStore: fix object key decode with key
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Mon, 14 Dec 2015 19:59:17 +0000 (14:59 -0500)]
os/bluestore/BlueStore: fix object key decode with key
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Wed, 9 Dec 2015 21:19:58 +0000 (16:19 -0500)]
ceph_objectstore_test: fix warning
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Wed, 9 Dec 2015 21:19:07 +0000 (16:19 -0500)]
os/KeyValueStore: drop kinetic #include
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Thu, 10 Dec 2015 21:04:32 +0000 (16:04 -0500)]
os/kstore: add new KStore backend
This is based on BlueStore, but with all of the block-related code
and complexity ripped out, and a simple striping strategy added
in its place.
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Thu, 10 Dec 2015 21:03:59 +0000 (16:03 -0500)]
os/bluestore/bluestore_types: localize types
Prefix with bluestore_
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Thu, 10 Dec 2015 22:27:04 +0000 (17:27 -0500)]
os/bluestore: add extent_ref_map_t
This will be used to refcount extents for some subset
of the store (objects with same name or hash value?).
Signed-off-by: Sage Weil <sage@redhat.com>
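As a simplified illustration of extent reference counting, the sketch below counts references per block rather than per extent, which avoids the range-splitting logic a real extent map needs. `BlockRefMap` and its method signatures are assumptions made for this example and do not mirror the actual type's interface.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

class BlockRefMap {
  std::map<uint64_t, uint32_t> refs_;  // block index -> reference count

public:
  // Take one reference on each block in [first, first + count).
  void get(uint64_t first, uint64_t count) {
    for (uint64_t b = first; b < first + count; ++b)
      ++refs_[b];
  }

  // Drop one reference on each block in the range and return the blocks
  // whose count reached zero (i.e., blocks that can now be freed).
  std::vector<uint64_t> put(uint64_t first, uint64_t count) {
    std::vector<uint64_t> released;
    for (uint64_t b = first; b < first + count; ++b) {
      auto it = refs_.find(b);
      if (it == refs_.end())
        continue;  // untracked block: ignored in this sketch
      if (--it->second == 0) {
        refs_.erase(it);
        released.push_back(b);
      }
    }
    return released;
  }

  // True if every block in the range has at least one reference.
  bool contains(uint64_t first, uint64_t count) const {
    for (uint64_t b = first; b < first + count; ++b)
      if (refs_.count(b) == 0)
        return false;
    return true;
  }

  bool empty() const { return refs_.empty(); }
  void clear() { refs_.clear(); }
};
```

When two objects share blocks after a clone, dropping the first object's reference releases only the blocks it held exclusively; the shared blocks are released when the second reference goes away.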
Sage Weil [Thu, 10 Dec 2015 21:03:41 +0000 (16:03 -0500)]
os/bluestore/FreelistManager: drop unused db ref
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Thu, 10 Dec 2015 21:03:23 +0000 (16:03 -0500)]
os/bluestore: record kv backend
Record kv backend at mkfs time instead of relying on current value
of config option.
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Thu, 10 Dec 2015 21:02:45 +0000 (16:02 -0500)]
os/bluestore: statfs
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Fri, 4 Dec 2015 01:03:10 +0000 (20:03 -0500)]
os/bluestore/BlockDevice: inject block failures
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Thu, 3 Dec 2015 21:33:37 +0000 (16:33 -0500)]
ceph_test_objectstore: clean up synthetic collections
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Thu, 10 Dec 2015 22:17:45 +0000 (17:17 -0500)]
os/bluestore: block.db support
Support a mid- to fast device that will preferentially
store the rocksdb data (and wal, if block.wal is not
present).
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Thu, 10 Dec 2015 22:17:10 +0000 (17:17 -0500)]
os/bluestore: less debug noise
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Thu, 10 Dec 2015 22:21:03 +0000 (17:21 -0500)]
os/bluestore/BlueFS: allow overwrites on open_for_write
rocksdb will occasionally overwrite an existing file
if it is not present/valid in the manifest.
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Wed, 25 Nov 2015 19:27:28 +0000 (14:27 -0500)]
os/bluestore/BlueStore: drop internal EnvMirror
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Fri, 11 Dec 2015 14:32:30 +0000 (09:32 -0500)]
rocksdb: pull up to master, include EnvMirror
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Thu, 10 Dec 2015 22:20:25 +0000 (17:20 -0500)]
os/bluestore: label all block devices
Label all of our block devices with a simple label
that includes the osd_uuid. Wire this into the
ObjectStore and OSD probe mechanism.
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Thu, 10 Dec 2015 22:19:29 +0000 (17:19 -0500)]
os/bluestore/BlueFS: flush log if needed
If a file has dirty metadata (but no dirty data), we
still need to flush the log when it is flushed.
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Thu, 10 Dec 2015 22:18:57 +0000 (17:18 -0500)]
os/bluestore/BlueFS: fix replay of unlink
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Thu, 10 Dec 2015 22:15:57 +0000 (17:15 -0500)]
os/bluestore: support second block.wal device
Use this device for the bluefs log.
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Thu, 10 Dec 2015 22:15:33 +0000 (17:15 -0500)]
os/bluestore/BlueStore: fix zero gap bug
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Thu, 10 Dec 2015 22:15:14 +0000 (17:15 -0500)]
os/bluestore: disable overlay for now
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Fri, 27 Nov 2015 16:07:46 +0000 (11:07 -0500)]
os/bluestore/BlockDevice: restructure interface
use atomics, do not track in-flight extents or magically cope
with racing ios (that is the user's responsibility).
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Thu, 10 Dec 2015 21:49:56 +0000 (16:49 -0500)]
os/bluestore/BlueFS: fix overwrite
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Thu, 10 Dec 2015 22:10:02 +0000 (17:10 -0500)]
os/bluestore/BlueFS: fix writes spanning extents
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Thu, 10 Dec 2015 22:09:51 +0000 (17:09 -0500)]
os/bluestore: reenable rocksdb recycling
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Thu, 10 Dec 2015 21:45:04 +0000 (16:45 -0500)]
os/bluestore/BlockDevice: lock device while open
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Thu, 10 Dec 2015 21:44:42 +0000 (16:44 -0500)]
os/bluestore/BlockDevice: debug read result
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Thu, 10 Dec 2015 21:44:29 +0000 (16:44 -0500)]
os/bluestore/BlockDevice: fix alignment check
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Thu, 10 Dec 2015 21:49:14 +0000 (16:49 -0500)]
os/bluestore/BlockDevice: check aio return values
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Thu, 10 Dec 2015 21:43:32 +0000 (16:43 -0500)]
os/bluestore/BlueFS: avoid lock during reads
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Thu, 10 Dec 2015 21:43:14 +0000 (16:43 -0500)]
os/bluestore/BlueFS: prevent read+write sharing
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Thu, 10 Dec 2015 21:38:45 +0000 (16:38 -0500)]
vstart.sh: debug bluefs and rocksdb
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Thu, 10 Dec 2015 21:38:35 +0000 (16:38 -0500)]
os/bluestore/BlueFS: periodically compact log
Rewrite only the current metadata in a fresh log
periodically to free log space.
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Thu, 10 Dec 2015 21:37:55 +0000 (16:37 -0500)]
os/bluestore/BlueFS: simplify extent list
Merge contiguous extents.
Signed-off-by: Sage Weil <sage@redhat.com>
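Merging contiguous extents can be sketched as a single pass over the file's extent list. This assumes the list is kept in logical file order and that only neighbors that are physically adjacent on disk are merged; `Extent` and `merge_contiguous` are hypothetical names, not the BlueFS ones.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Extent {
  uint64_t offset;  // physical offset on the device
  uint64_t length;  // length in bytes
};

// Collapse runs of extents where each one starts exactly where the
// previous one ends. The order of the list is preserved.
std::vector<Extent> merge_contiguous(const std::vector<Extent>& in) {
  std::vector<Extent> out;
  for (const Extent& e : in) {
    if (!out.empty() && out.back().offset + out.back().length == e.offset)
      out.back().length += e.length;
    else
      out.push_back(e);
  }
  return out;
}
```

For example, the list `{0, 4096}, {4096, 4096}, {16384, 4096}` becomes `{0, 8192}, {16384, 4096}`, which shortens the metadata that has to be logged for the file.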
Sage Weil [Thu, 10 Dec 2015 21:37:35 +0000 (16:37 -0500)]
os/bluestore/BlueFS: fix read
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Thu, 10 Dec 2015 21:35:37 +0000 (16:35 -0500)]
ceph_test_objectstore: trivial init fix
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Thu, 10 Dec 2015 21:31:18 +0000 (16:31 -0500)]
kv/RocksDBStore: rocksdb_separate_wal_dir option
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Thu, 10 Dec 2015 21:34:27 +0000 (16:34 -0500)]
os/bluestore/BlueFS: ref count BlueFS::File *
There are FileWriters that exist when the file is
deleted.
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Thu, 10 Dec 2015 21:32:24 +0000 (16:32 -0500)]
os/bluestore/BlueFS: readdir list dirs, too
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Thu, 10 Dec 2015 21:32:06 +0000 (16:32 -0500)]
ceph-bluefs-tool: simple tool to export bluefs content
Currently we just do a dump. We'll add more
functionality later.
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Thu, 10 Dec 2015 21:30:47 +0000 (16:30 -0500)]
os/bluestore/BlueFS: many fixes
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Thu, 10 Dec 2015 21:16:57 +0000 (16:16 -0500)]
os/bluestore/BlueStore: share space with BlueFS
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Thu, 10 Dec 2015 21:17:53 +0000 (16:17 -0500)]
os/bluestore/BlockDevice: move to simple mutex model
Just for now, while we get the rest of this working.
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Thu, 10 Dec 2015 21:07:15 +0000 (16:07 -0500)]
os/bluestore/BlueFS: simple file system to back rocksdb
BlueFS is a simple file system that will back rocksdb.
BlueRocksEnv is the rocksdb::Env implementation that
glues them together.
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Fri, 27 Nov 2015 16:07:36 +0000 (11:07 -0500)]
ceph_test_objectstore: less verbose
Signed-off-by: Sage Weil <sage@redhat.com>