Summary:
Use the rsync tempfile naming convention in our `BackupEngine`. The temp file follows the format, `.<filename>.<suffix>`, which is later renamed to `<filename>`. We fix `tmp` as the `<suffix>` as we don't need to use random bytes for now. The benefit is gluster treats this tempfile naming convention specially and applies hashing only to `<filename>`, so the file won't need to be linked or moved when it's renamed. Our gluster team suggested this will make things operationally easier.
Closes https://github.com/facebook/rocksdb/pull/3463
Sagar Vemuri [Wed, 21 Feb 2018 03:05:21 +0000 (19:05 -0800)]
Add rocksdb.iterator.internal-key property
Summary:
Added a new iterator property: `rocksdb.iterator.internal-key` to get the internal-key (converted to user key) at which the iterator stopped.
Closes https://github.com/facebook/rocksdb/pull/3525
Mike Kolupaev [Fri, 16 Feb 2018 15:58:18 +0000 (07:58 -0800)]
Fix deadlock in ColumnFamilyData::InstallSuperVersion()
Summary:
Deadlock: a memtable flush holds DB::mutex_ and calls ThreadLocalPtr::Scrape(), which locks ThreadLocalPtr mutex; meanwhile, a thread exit handler locks ThreadLocalPtr mutex and calls SuperVersionUnrefHandle, which tries to lock DB::mutex_.
This deadlock is hit all the time on our workload. It blocks our release.
In general, the problem is that ThreadLocalPtr takes an arbitrary callback and calls it while holding a lock on a global mutex. The same global mutex is (at least in some cases) locked by almost all ThreadLocalPtr methods, on any instance of ThreadLocalPtr. So, there'll be a deadlock if the callback tries to do anything to any instance of ThreadLocalPtr, or waits for another thread to do so.
So, probably the only safe way to use ThreadLocalPtr callbacks is to do only do simple and lock-free things in them.
This PR fixes the deadlock by making sure that local_sv_ never holds the last reference to a SuperVersion, and therefore SuperVersionUnrefHandle never has to do any nontrivial cleanup.
I also searched for other uses of ThreadLocalPtr to see if they may have similar bugs. There's only one other use, in transaction_lock_mgr.cc, and it looks fine.
Closes https://github.com/facebook/rocksdb/pull/3510
Siying Dong [Wed, 14 Feb 2018 00:20:13 +0000 (16:20 -0800)]
Direct I/O writable file should do fsync in Close()
Summary:
We don't do fsync() after truncate in direct I/O writeable file (in fact we don't do any fsync ever). This can cause metadata not persistent to disk after the file is generated. We call it instead.
Closes https://github.com/facebook/rocksdb/pull/3500
Wouter Beek [Wed, 20 Dec 2017 16:02:53 +0000 (08:02 -0800)]
FIXED: string buffers potentially too small to fit formatted write
Summary:
This fixes the following warnings when compiled with GCC7:
util/transaction_test_util.cc: In static member function ‘static rocksdb::Status rocksdb::RandomTransactionInserter::DBGet(rocksdb::DB*, rocksdb::Transaction*, rocksdb::ReadOptions&, uint16_t, uint64_t, bool, uint64_t*, std::__cxx11::string*, bool*)’:
util/transaction_test_util.cc:75:8: error: ‘snprintf’ output may be truncated before the last format character [-Werror=format-truncation=]
Status RandomTransactionInserter::DBGet(
^~~~~~~~~~~~~~~~~~~~~~~~~
util/transaction_test_util.cc:84:11: note: ‘snprintf’ output between 5 and 6 bytes into a destination of size 5
snprintf(prefix_buf, sizeof(prefix_buf), "%.4u", set_i + 1);
~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
util/transaction_test_util.cc: In static member function ‘static rocksdb::Status rocksdb::RandomTransactionInserter::Verify(rocksdb::DB*, uint16_t, uint64_t, bool, rocksdb::Random64*)’:
util/transaction_test_util.cc:245:8: error: ‘snprintf’ output may be truncated before the last format character [-Werror=format-truncation=]
Status RandomTransactionInserter::Verify(DB* db, uint16_t num_sets,
^~~~~~~~~~~~~~~~~~~~~~~~~
util/transaction_test_util.cc:268:13: note: ‘snprintf’ output between 5 and 6 bytes into a destination of size 5
snprintf(prefix_buf, sizeof(prefix_buf), "%.4u", set_i + 1);
~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Closes https://github.com/facebook/rocksdb/pull/3295
Jun Wu [Thu, 1 Feb 2018 22:15:28 +0000 (14:15 -0800)]
crc32: suppress -Wimplicit-fallthrough warnings
Summary:
Workaround a bunch of "implicit-fallthrough" compiler errors, like:
```
util/crc32c.cc:533:7: error: this statement may fall through [-Werror=implicit-fallthrough=]
crc = _mm_crc32_u64(crc, *(uint64_t*)(buf + offset));
^
util/crc32c.cc:1016:9: note: in expansion of macro ‘CRCsinglet’
CRCsinglet(crc0, next, -2 * 8);
^~~~~~~~~~
util/crc32c.cc:1017:7: note: here
case 1:
```
Closes https://github.com/facebook/rocksdb/pull/3339
Yi Wu [Fri, 26 Jan 2018 23:12:16 +0000 (15:12 -0800)]
StackableDB optionally take shared ownership of the underlying DB
Summary:
Allow StackableDB optionally takes a shared_ptr on construction and thus hold shared ownership of the underlying DB.
Closes https://github.com/facebook/rocksdb/pull/3423
Sagar Vemuri [Fri, 12 Jan 2018 18:53:43 +0000 (10:53 -0800)]
Fix PowerPC dynamic java build
Summary:
Java build on PPC64le has been broken since a few months, due to #2716. Fixing it with the least amount of changes.
(We should cleanup a little around this code when time permits).
This should fix the build failures seen in http://140.211.168.68:8080/job/Rocksdb/ .
Closes https://github.com/facebook/rocksdb/pull/3359
Yi Wu [Fri, 19 Jan 2018 01:32:50 +0000 (17:32 -0800)]
Fix Flush() keep waiting after flush finish
Summary:
Flush() call could be waiting indefinitely if min_write_buffer_number_to_merge is used. Consider the sequence:
1. User call Flush() with flush_options.wait = true
2. The manual flush started in the background
3. New memtable become immutable because of writes. The new memtable will not trigger flush if min_write_buffer_number_to_merge is not reached.
4. The manual flush finish.
Because of the new memtable created at step 3 not being flush, previous logic of WaitForFlushMemTable() keep waiting, despite the memtables it intent to flush has been flushed.
Here instead of checking if there are any more memtables to flush, WaitForFlushMemTable() also check the id of the earliest memtable. If the id is larger than that of latest memtable at the time flush was initiated, it means all the memtable at the time of flush start has all been flush.
Closes https://github.com/facebook/rocksdb/pull/3378
Sagar Vemuri [Fri, 5 Jan 2018 23:30:12 +0000 (15:30 -0800)]
Fix zstd/zdict include path for java static build
Summary:
With the ZSTD dictionary generator support added in #3057
`PORTABLE=1 ROCKSDB_NO_FBCODE=1 make rocksdbjavastatic` fails as it can't find zdict.h. Specifically due to:
https://github.com/facebook/rocksdb/blob/e3a06f12d27fd50af7b6c5941973f529601f9a3e/util/compression.h#L39
In java static builds zstd code gets directly downloaded from https://github.com/facebook/zstd , and in there zdict.h is under dictBuilder directory. So, I modified libzstd.a target to use `make install` to collect all the header files into a single location and used that as the zstd's include path.
Closes https://github.com/facebook/rocksdb/pull/3260