]> git-server-git.apps.pok.os.sepia.ceph.com Git - rocksdb.git/log
rocksdb.git
4 years agoUpdate HISTORY.md for #8428 (#9001)
Hui Xiao [Fri, 8 Oct 2021 23:59:21 +0000 (16:59 -0700)]
Update HISTORY.md for #8428 (#9001)

Summary:
Context:
HISTORY.md was not properly updated along with the change in https://github.com/facebook/rocksdb/pull/8428, where we introduced a change of accounting compression dictionary buffering memory and an extra condition of triggering data unbuffering.
Updated HISTORY.md for https://github.com/facebook/rocksdb/pull/8428 in 6.25.0 HISTORY.md section.
Updated blog post https://rocksdb.org/blog/2021/05/31/dictionary-compression.html.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9001

Reviewed By: ajkr

Differential Revision: D31517836

Pulled By: hx235

fbshipit-source-id: 01f6b30de4e1ff6b315aa8221139d9b700c7c629

4 years agoCancel manual compactions waiting on automatic compactions to drain (#8991)
Andrew Kryczka [Thu, 7 Oct 2021 22:22:34 +0000 (15:22 -0700)]
Cancel manual compactions waiting on automatic compactions to drain (#8991)

Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8991

Test Plan: the new test hangs forever without this fix and passes with this fix.

Reviewed By: hx235

Differential Revision: D31456419

Pulled By: ajkr

fbshipit-source-id: a82c0e5560b6e6153089dccd8e46163c61b07bff

4 years agoBump version v6.25.1
Yanqin Jin [Tue, 28 Sep 2021 20:24:51 +0000 (13:24 -0700)]
Bump version

4 years agoSort per-file blob read requests by offset (#8953)
Yanqin Jin [Sat, 25 Sep 2021 05:13:19 +0000 (22:13 -0700)]
Sort per-file blob read requests by offset (#8953)

Summary:
`RandomAccessFileReader::MultiRead()` tries to merge requests in direct IO, assuming input IO requests are
sorted by offsets.

Add a test in direct IO mode.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8953

Test Plan: make check

Reviewed By: ltamasi

Differential Revision: D31183546

Pulled By: riversand963

fbshipit-source-id: 5d043ec68e2daa47a3149066150afd41ee3d73e6

4 years agoReturn Status::NotSupported() in RateLimiter::GetTotalPendingRequests default impl...
Hui Xiao [Thu, 23 Sep 2021 02:35:05 +0000 (19:35 -0700)]
Return Status::NotSupported() in RateLimiter::GetTotalPendingRequests default impl (#8950)

Summary:
Context:
After more discussion, a fix in https://github.com/facebook/rocksdb/issues/8938 might turn out to be too restrictive for the case where `GetTotalPendingRequests` might be invoked on RateLimiter classes that does not support the recently added API `RateLimiter::GetTotalPendingRequests` (https://github.com/facebook/rocksdb/issues/8890) due to the `assert(false)` in https://github.com/facebook/rocksdb/issues/8938. Furthermore, sentinel value like `-1` proposed in https://github.com/facebook/rocksdb/issues/8938 is easy to be ignored and unchecked. Therefore we decided to adopt `Status::NotSupported()`, which is also a convention of adding new API to public header in RocksDB.
- Changed return value type of  `RateLimiter::GetTotalPendingRequests` in related declaration/definition
- Passed in pointer argument to hold the output instead of returning it as before
- Adapted to the changes above in calling `RateLimiter::GetTotalPendingRequests` in test
- Minor improvement to `TEST_F(RateLimiterTest, GetTotalPendingRequests)`:  added failure message for assertion and replaced repetitive statements with a loop

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8950

Reviewed By: ajkr, pdillinger

Differential Revision: D31128450

Pulled By: hx235

fbshipit-source-id: 282ac9c4f3dacaa0aec6d0a993161f77ad47a040

4 years agoAdd HISTORY.md entry to a recent bug fix. (#8948)
sdong [Wed, 22 Sep 2021 23:19:57 +0000 (16:19 -0700)]
Add HISTORY.md entry to a recent bug fix. (#8948)

Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8948

Reviewed By: anand1976

Differential Revision: D31127368

fbshipit-source-id: a374cb0baf88c3e15cd587a8f31e8a2d84432928

4 years agoRandomAccessFileReader::MultiRead() should not return read bytes not read (#8941)
sdong [Tue, 21 Sep 2021 19:21:03 +0000 (12:21 -0700)]
RandomAccessFileReader::MultiRead() should not return read bytes not read (#8941)

Summary:
Right now, if underlying read returns fewer bytes than asked for, RandomAccessFileReader::MultiRead() still returns those in the buffer to upper layer. This can be a surprise to upper layer.
This is unlikely to cause incorrect data. To cause incorrect data, checksum checking in upper layer should pass with short reads, whose chance is low.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8941

Test Plan: Run stress tests for a while

Reviewed By: anand1976

Differential Revision: D31085780

fbshipit-source-id: 999adf2d6c2712f1323d14bb68b678df59969973

4 years agoClean up HISTORY
Peter Dillinger [Wed, 22 Sep 2021 04:59:24 +0000 (21:59 -0700)]
Clean up HISTORY

4 years agoMake RateLimiter::GetTotalPendingRequest() non pure virtual for backward compability...
Hui Xiao [Wed, 22 Sep 2021 04:27:26 +0000 (21:27 -0700)]
Make RateLimiter::GetTotalPendingRequest() non pure virtual for backward compability (#8938)

Summary:
Context/Summary:
https://github.com/facebook/rocksdb/pull/8890 added a public API `RateLimiter::GetTotalPendingRequest()` but mistakenly marked it as pure virtual, forcing RateLimiter's derived classes to implement this function and breaking backward compatibility.

This PR makes `RateLimiter::GetTotalPendingRequest()` as non-pure virtual method by providing a trivial implementation in rate_limiter.h

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8938

Test Plan: Passing existing tests

Reviewed By: pdillinger

Differential Revision: D31100661

Pulled By: hx235

fbshipit-source-id: 06eff1005156a6e5a881e393b2c5b2ad706897d8

4 years agoFinish BackupEngine migration to IOStatus (#8940)
Peter Dillinger [Tue, 21 Sep 2021 18:12:19 +0000 (11:12 -0700)]
Finish BackupEngine migration to IOStatus (#8940)

Summary:
Updates a few remaining functions that should have been updated
from Status -> IOStatus, and adds to HISTORY for the overall change
including https://github.com/facebook/rocksdb/issues/8820.

This change is for inclusion in version 6.25.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8940

Test Plan: CI

Reviewed By: zhichao-cao

Differential Revision: D31085029

Pulled By: pdillinger

fbshipit-source-id: 91557c6a39ef1d90357d4f4dcd79af0645d87c7b

4 years agoUpdate version to 6.25.0 (#8935)
Peter Dillinger [Mon, 20 Sep 2021 17:38:03 +0000 (10:38 -0700)]
Update version to 6.25.0 (#8935)

Summary:
for release

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8935

Test Plan: CI

Reviewed By: ajkr

Differential Revision: D31056726

Pulled By: pdillinger

fbshipit-source-id: 6fd022c39c19c35f10a2367df45dd2deb43df510

4 years agoAdd a gflag for IO uring enable/disable (#8931)
anand76 [Sat, 18 Sep 2021 16:31:57 +0000 (09:31 -0700)]
Add a gflag for IO uring enable/disable (#8931)

Summary:
In case of IO uring bugs, we need to provide a way for users to turn it off.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8931

Test Plan: Manually run db_bench with/without the option and verify the behavior

Reviewed By: pdillinger

Differential Revision: D31040252

Pulled By: anand1976

fbshipit-source-id: 56f2537d6ac8488c9e126296d8190ad9e0158f70

4 years agoRemoteCompaction support Fallback to local compaction (#8709)
Jay Zhuang [Sat, 18 Sep 2021 06:24:03 +0000 (23:24 -0700)]
RemoteCompaction support Fallback to local compaction (#8709)

Summary:
Add support for fallback to local compaction, the user can
return `CompactionServiceJobStatus::kUseLocal` to instruct RocksDB to
run the compaction locally instead of waiting for the remote compaction
result.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8709

Test Plan: unittest

Reviewed By: ajkr

Differential Revision: D30560163

Pulled By: jay-zhuang

fbshipit-source-id: 65d8905a4a1bc185a68daa120997f21d3198dbe1

4 years agoBatch blob read IO for MultiGet (#8699)
Yanqin Jin [Sat, 18 Sep 2021 01:43:32 +0000 (18:43 -0700)]
Batch blob read IO for MultiGet (#8699)

Summary:
In batched `MultiGet()`, RocksDB batches blob read IO and uses `RandomAccessFileReader::MultiRead()`
to read the blobs instead of issuing multiple `Read()`.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8699

Test Plan:
```
make check
```

Reviewed By: ltamasi

Differential Revision: D31030861

Pulled By: riversand963

fbshipit-source-id: a0df6060cbfd54cff9515a4eee08807b1dbcb0c8

4 years agoFix ldb --try_load_options doesn't use customized Env (#8929)
sdong [Fri, 17 Sep 2021 21:43:53 +0000 (14:43 -0700)]
Fix ldb --try_load_options doesn't use customized Env (#8929)

Summary:
As title. The reason is that after loading customized options, the env is not set back to the correct one. Fix it.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8929

Test Plan: Manually validate in an environment where the command failed.

Reviewed By: riversand963

Differential Revision: D31026931

fbshipit-source-id: c25dc788bf80ed5bf4b24922c442781943bcd65b

4 years agoChange `SstFileMetaData::size` from `size_t` to `uint64_t` (#8926)
Peter Dillinger [Fri, 17 Sep 2021 19:38:21 +0000 (12:38 -0700)]
Change `SstFileMetaData::size` from `size_t` to `uint64_t` (#8926)

Summary:
Because even 32-bit systems can have large files

This is a "change" that I don't want intermingled with an upcoming refactoring.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8926

Test Plan: CI

Reviewed By: zhichao-cao

Differential Revision: D31020974

Pulled By: pdillinger

fbshipit-source-id: ca9eb4510697df6f1f55e37b37730b88b1809a92

4 years agoImprove rate_limiter_test.cc (#8904)
Hui Xiao [Fri, 17 Sep 2021 15:52:20 +0000 (08:52 -0700)]
Improve rate_limiter_test.cc (#8904)

Summary:
- Fixed a bug in `RateLimiterTest.GeneratePriorityIterationOrder` that the callbacks in this test were not called starting from `i = 1`. Fix by increasing `rate_bytes_per_sec` and requested bytes.
   - The bug is due to the previous `rate_bytes_per_sec` was set too small, resulting in `refill_bytes_per_period`  less than  `kMinRefillBytesPerPeriod`. Hence the actual `refill_bytes_per_period` was equal to `kMinRefillBytesPerPeriod` due to the logic [here](https://github.com/facebook/rocksdb/blob/main/util/rate_limiter.cc#L302-L303)  and it ended up being greater than the previously set requested bytes. Therefore starting from `i = 1`, `RefillBytesAndGrantRequests()` and `GeneratePriorityIterationOrder` won't be called and the test callbacks was not triggered to execute the assertion.
- Added internal flag to assert callbacks are called in `RateLimiterTest.GeneratePriorityIterationOrder` to prevent any future changes defeat the purpose of the test [as suggested](https://github.com/facebook/rocksdb/pull/8890#discussion_r704915134)
- Increased `rate_bytes_per_sec` and bytes of each request in `RateLimiterTest.GetTotalBytesThrough`, `RateLimiterTest.GetTotalRequests`, `RateLimiterTest.GetTotalPendingRequests` to trigger the "long path" of execution (i.e, the one trigger RefillBytesAndGrantRequests()) to increase test coverage
   - This increased the running time of the three tests, see test plan for time difference running locally
- Cleared up sync point effects after each test by calling `SyncPoint::GetInstance()->DisableProcessing();` and `SyncPoint::GetInstance()->ClearAllCallBacks();` in `~RateLimiterTest()` [as suggested](https://github.com/facebook/rocksdb/pull/8595/files#r697534279)
  - It's fine to call these two methods even when `EnableProcessing()` or `SetCallBack()` is not called in the test or is already cleaned up. In those cases, calling these two functions in destructor is effectively no-op.
  - This will allow cleaning up sync point effects of previous test even when the previous test failed in assertion.
- Added missing `SyncPoint::GetInstance()->DisableProcessing();` and `SyncPoint::GetInstance()->ClearCallBacks(..);` in existing tests for completeness
- Called `SyncPoint::GetInstance()->DisableProcessing();` and `SyncPoint::GetInstance()->ClearCallBacks(..);` in loop in `RateLimiterTest.GeneratePriorityIterationOrder` for completeness

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8904

Test Plan:
- Passing existing tests
- To verify the 1st change, run `RateLimiterTest.GeneratePriorityIterationOrder` with assertions of callbacks are indeed called under original `rate_bytes_per_sec` and request byte and under updated `rate_bytes_per_sec` and request byte. The former will fail the assertion while the latter succeeds.
- Here is the increased test time due to the 3rd change mentioned above in the summary. The relevant 3 tests mentioned in total increase the test time by 6s (~6000/33848 = 17.7% of the original total test time), which IMO is acceptable for better test coverage through running the "long path".
   - current (run on branch rate_limiter_ut_improve locally)

   [ RUN      ] RateLimiterTest.GetTotalBytesThrough
   [       OK ] RateLimiterTest.GetTotalBytesThrough (3000 ms)
   [ RUN      ] RateLimiterTest.GetTotalRequests
   [       OK ] RateLimiterTest.GetTotalRequests (3001 ms)
   [ RUN      ] RateLimiterTest.GetTotalPendingRequests
   [       OK ] RateLimiterTest.GetTotalPendingRequests (0 ms)
   ...
   [----------] 10 tests from RateLimiterTest (43349 ms total)

   [----------] Global test environment tear-down
   [==========] 10 tests from 1 test case ran. (43349 ms total)
   [  PASSED  ] 10 tests.

   - previous (run on branch main locally)

   [ RUN      ] RateLimiterTest.GetTotalBytesThrough
   [       OK ] RateLimiterTest.GetTotalBytesThrough (0 ms)
   [ RUN      ] RateLimiterTest.GetTotalRequests
   [       OK ] RateLimiterTest.GetTotalRequests (0 ms)
   [ RUN      ] RateLimiterTest.GetTotalPendingRequests
   [       OK ] RateLimiterTest.GetTotalPendingRequests (0 ms)
   ...
   [----------] 10 tests from RateLimiterTest (33848 ms total)

  [----------] Global test environment tear-down
  [==========] 10 tests from 1 test case ran. (33848 ms total)
  [  PASSED  ] 10 tests.

Reviewed By: ajkr

Differential Revision: D30872544

Pulled By: hx235

fbshipit-source-id: ff894f5c1a4bef70e8e407d53b00be45f776b3e4

4 years agoAdded a default Name method to Statistics (#8918)
mrambacher [Fri, 17 Sep 2021 13:54:40 +0000 (06:54 -0700)]
Added a default Name method to Statistics (#8918)

Summary:
This keeps the implementations/API backward compatible.  Implementations of Statistics will need to override this method (and be registered with the ObjectRegistry) in order to be created via CreateFromString.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8918

Reviewed By: pdillinger

Differential Revision: D30958916

Pulled By: mrambacher

fbshipit-source-id: 75b99a84e9e11fda2a9e8eff9ee1ef69a17517b2

4 years agoExpose blob file information through the EventListener interface (#8675)
Akanksha Mahajan [Fri, 17 Sep 2021 00:17:40 +0000 (17:17 -0700)]
Expose blob file information through the EventListener interface (#8675)

Summary:
1. Extend FlushJobInfo and CompactionJobInfo with information about the blob files generated by flush/compaction jobs. This PR add two structures BlobFileInfo and BlobFileGarbageInfo that contains the required information of blob files.
 2. Notify the creation and deletion of blob files through OnBlobFileCreationStarted, OnBlobFileCreated, and OnBlobFileDeleted.
 3. Test OnFile*Finish operations notifications with Blob Files.
 4. Log the blob file creation/deletion events through EventLogger in Log file.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8675

Test Plan: Add new unit tests in listener_test

Reviewed By: ltamasi

Differential Revision: D30412613

Pulled By: akankshamahajan15

fbshipit-source-id: ca51b63c6e8c8d0485a38c503572bc5a82bd5d07

4 years agoImplement TestFSRandomAccessFile::MultiRead() (#8925)
sdong [Thu, 16 Sep 2021 22:59:57 +0000 (15:59 -0700)]
Implement TestFSRandomAccessFile::MultiRead() (#8925)

Summary:
Right now, the failure injection test for MultiGet() is not sufficient. Improve it with TestFSRandomAccessFile::MultiRead() injecting failures.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8925

Test Plan: Run crash test locally for a while.

Reviewed By: anand1976

Differential Revision: D31000529

fbshipit-source-id: 439c7e02cf7440ac5af82deb609e202abdca3e1f

4 years agoAdd compaction priority information in RemoteCompaction (#8707)
Jay Zhuang [Thu, 16 Sep 2021 22:08:23 +0000 (15:08 -0700)]
Add compaction priority information in RemoteCompaction (#8707)

Summary:
Add compaction priority information in RemoteCompaction, which
can be used to schedule high priority job first.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8707

Test Plan: unittest

Reviewed By: ajkr

Differential Revision: D30548401

Pulled By: jay-zhuang

fbshipit-source-id: b30446511fb31b4583c49edd8565d496cf013a34

4 years agoAdjust contrun name (#8924)
sdong [Thu, 16 Sep 2021 22:05:23 +0000 (15:05 -0700)]
Adjust contrun name (#8924)

Summary:
One contrun name is incorrect, which mixed error reporting with another one. Fix it.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8924

Reviewed By: ltamasi

Differential Revision: D30999477

fbshipit-source-id: 46a04b2e4b48f755181aa9a47c353d91f1128469

4 years agoFix flaky WALTrashCleanupOnOpen (#8917)
Peter Dillinger [Thu, 16 Sep 2021 04:30:23 +0000 (21:30 -0700)]
Fix flaky WALTrashCleanupOnOpen (#8917)

Summary:
Test did not consider that slower deletion rate only kicks in
after a file is deleted

Fixes https://github.com/facebook/rocksdb/issues/7546

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8917

Test Plan:
no longer reproduces using

    buck test mode/dev //internal_repo_rocksdb/repo:db_sst_test -- --exact 'internal_repo_rocksdb/repo:db_sst_test - DBWALTestWithParam/DBWALTestWithParam.WALTrashCleanupOnOpen/0' --jobs 40 --stress-runs 600 --record-results

Reviewed By: siying

Differential Revision: D30949127

Pulled By: pdillinger

fbshipit-source-id: 5d0607f8f548071b07410fe8f532b4618cd225e5

4 years agoFix PrepopulateBlockCache::kFlushOnly (#8750)
Peter Dillinger [Wed, 15 Sep 2021 22:32:07 +0000 (15:32 -0700)]
Fix PrepopulateBlockCache::kFlushOnly (#8750)

Summary:
kFlushOnly currently means "always" except in the case of
remote compaction. This makes it flushes only.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8750

Test Plan: test updated

Reviewed By: akankshamahajan15

Differential Revision: D30968034

Pulled By: pdillinger

fbshipit-source-id: 5dbd24dde18852a0e937a540995fba9bfbe89037

4 years agoReplace Status with IOStatus in the backupable_db (#8820)
Zhichao Cao [Wed, 15 Sep 2021 22:08:39 +0000 (15:08 -0700)]
Replace Status with IOStatus in the backupable_db (#8820)

Summary:
In order to populate the IOStatus up to the higher level, replace some of the Status to IOStatus.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8820

Test Plan: make check

Reviewed By: pdillinger

Differential Revision: D30967215

Pulled By: zhichao-cao

fbshipit-source-id: ccf9d5cfbd9d3de047c464aaa85f9fa43b474903

4 years agoAvoid overwriting first non-OK Status in db_stress setup (#8907)
Andrew Kryczka [Wed, 15 Sep 2021 21:23:17 +0000 (14:23 -0700)]
Avoid overwriting first non-OK Status in db_stress setup (#8907)

Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8907

Reviewed By: zhichao-cao

Differential Revision: D30922081

Pulled By: ajkr

fbshipit-source-id: ad7a32c21d0049342fd20c9b7f555e93674c3671

4 years agoMore robust checking of IO uring completion data (#8894)
anand76 [Wed, 15 Sep 2021 19:43:35 +0000 (12:43 -0700)]
More robust checking of IO uring completion data (#8894)

Summary:
Potential bugs in the IO uring implementation can cause bad data to be returned in the completion queue. Add some checks in the PosixRandomAccessFile::MultiRead completion handling code to catch such errors and fail the entire MultiRead. Also log some diagnostic messages and stack trace.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8894

Reviewed By: siying, pdillinger

Differential Revision: D30826982

Pulled By: anand1976

fbshipit-source-id: af91815ac760e095d6cc0466cf8bd5c10167fd15

4 years agoUse the write amplification value calculated by RocksDB in benchmark.sh (#8915)
Levi Tamasi [Wed, 15 Sep 2021 19:15:47 +0000 (12:15 -0700)]
Use the write amplification value calculated by RocksDB in benchmark.sh (#8915)

Summary:
Currently, `benchmark.sh` computes write amplification itself; the patch
changes the script to use the value calculated by RocksDB (which is
printed as part of the periodic statistics). This also has the benefit
of being correct for BlobDB as well, since it also considers the amount
of data written to blob files.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8915

Test Plan:
```
DB_DIR=/tmp/rocksdbtest/dbbench/ WAL_DIR=/tmp/rocksdbtest/dbbench/ NUM_KEYS=20000000 NUM_THREADS=32 tools/benchmark.sh overwrite --enable_blob_files=1 --enable_blob_garbage_collection=1

...

** Compaction Stats [default] **
Level    Files   Size     Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop Rblob(GB) Wblob(GB)
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  L0      7/5   43.93 MB   0.5      0.3     0.0      0.3       0.5      0.3       0.0   1.0      1.3     59.9    201.35            101.88       109    1.847     22M   499K       0.0      11.2
  L4      4/4   244.03 MB   0.0     11.4     0.3      1.6       1.6      0.0       0.0   1.1     50.6     49.3    231.10            288.84         7   33.014    156M    26M       9.5       9.5
  L5     36/0    3.28 GB   0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0      0.00              0.00         0    0.000       0      0       0.0       0.0
 Sum     47/9    3.56 GB   0.0     11.7     0.3      1.8       2.2      0.3       0.0   2.0     27.6     54.3    432.45            390.72       116    3.728    179M    26M       9.5      20.8
 Int      0/0    0.00 KB   0.0      3.5     0.1      0.5       0.6      0.1       0.0   2.2     31.2     55.6    115.01            109.53        29    3.966     51M  7353K       2.9       5.6

...

Completed overwrite (ID: ) in 289 seconds
ops/sec mb/sec Size-GB L0_GB Sum_GB W-Amp W-MB/s usec/op p50 p75 p99 p99.9 p99.99 Uptime Stall-time Stall% Test Date Version Job-ID
111784 44.8 0.0 0.5 2.2 2.0 9.2 285.9 215.3 264.4 1232 13299 23310 243 00:00:0.000 0.0 overwrite.t32.s0 2021-09-14T11:58:26.000-07:00 6.24
```

Reviewed By: zhichao-cao

Differential Revision: D30940352

Pulled By: ltamasi

fbshipit-source-id: ae7f5cd5440c8529788dda043266121fc2be0853

4 years agoAlways iniitalize ArenaWrappedDBIter::db_iter_ to nullptr (#8889)
sdong [Tue, 14 Sep 2021 21:32:24 +0000 (14:32 -0700)]
Always iniitalize ArenaWrappedDBIter::db_iter_ to nullptr (#8889)

Summary:
ArenaWrappedDBIter::db_iter_ should never be nullptr. However, when debugging a segfault, it's hard to distinguish it is not initialized (not possible) and other corruption. Add this nullptr to help distinguish the case.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8889

Test Plan: Run existing unit tests.

Reviewed By: pdillinger

Differential Revision: D30814756

fbshipit-source-id: 4b1f36896a33dc203d4f1f424ded9554927d61ba

4 years agoAdapt key-value checksum for timestamp-suffixed keys (#8914)
Andrew Kryczka [Tue, 14 Sep 2021 20:13:36 +0000 (13:13 -0700)]
Adapt key-value checksum for timestamp-suffixed keys (#8914)

Summary:
After https://github.com/facebook/rocksdb/issues/8725, keys added to `WriteBatch` may be timestamp-suffixed, while `WriteBatch` has no awareness of the timestamp size. Therefore, `WriteBatch` can no longer calculate timestamp checksum separately from the rest of the key's checksum in all cases.

This PR changes the definition of key in KV checksum to include the timestamp suffix. That way we do not need to worry about where the timestamp begins within the key. I believe the only practical effect of this change is now `AssignTimestamp()` requires recomputing the whole key checksum (`UpdateK()`) rather than just the timestamp portion (`UpdateT()`).

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8914

Test Plan:
run stress command that used to fail

```
$ ./db_stress --batch_protection_bytes_per_key=8 -clear_column_family_one_in=0 -test_batches_snapshots=1
```

Reviewed By: riversand963

Differential Revision: D30925715

Pulled By: ajkr

fbshipit-source-id: c143f7ccb46c0efb390ad57ef415c250d754deff

4 years agoImprove benchmark.sh (#8730)
Adam Retter [Tue, 14 Sep 2021 18:08:46 +0000 (11:08 -0700)]
Improve benchmark.sh (#8730)

Summary:
* Started on some proper usage text to document the options
* Added a `JOB_ID` parameter, so that we can trace jobs and relate them to other assets
* Now generates a correct TSV file of the summary
* Summary has new additional fields:
    * RocksDB Version
    * Date
    * Job ID
* db_bench log files now also include the Job ID

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8730

Reviewed By: mrambacher

Differential Revision: D30747344

Pulled By: jay-zhuang

fbshipit-source-id: 87eb78d20959b6d95804aebf129606fa9c71f407

4 years agoAdd Kafka to USERS (#8911)
Cheng Chang [Tue, 14 Sep 2021 17:25:10 +0000 (10:25 -0700)]
Add Kafka to USERS (#8911)

Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8911

Reviewed By: zhichao-cao

Differential Revision: D30908552

Pulled By: cheng-chang

fbshipit-source-id: df2ab50d94ed46bfb54f0dd520f8a5cdbfa49fd1

4 years agoFix WAL log data corruption #8723 (#8746)
eharry [Tue, 14 Sep 2021 03:15:00 +0000 (20:15 -0700)]
Fix WAL log data corruption #8723 (#8746)

Summary:
Fix WAL log data corruption when using DBOptions.manual_wal_flush(true) and WriteOptions.sync(true) together (https://github.com/facebook/rocksdb/issues/8723)

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8746

Reviewed By: ajkr

Differential Revision: D30758468

Pulled By: riversand963

fbshipit-source-id: 07c20899d5f2447dc77861b4845efc68a59aa4e8

4 years agoFix flaky, dubious LdbCmdTest::*DumpFileChecksum* (#8898)
Peter Dillinger [Tue, 14 Sep 2021 00:06:05 +0000 (17:06 -0700)]
Fix flaky, dubious LdbCmdTest::*DumpFileChecksum* (#8898)

Summary:
These tests would frequently fail to find SST files due to race
condition in running ldb (read-only) on an open DB which might do automatic
compaction. But only sometimes would that failure translate into test
failure because the implementation of ldb file_checksum_dump would
swallow many errors. Now,

* DB closed while running ldb to avoid unnecessary race condition
* Detect and report/propagate more failures in `ldb file_checksum_dump`
* Use --hex so that random binary data is not printed to console

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8898

Test Plan: ./ldb_cmd_test --gtest_filter=*Checksum* --gtest_repeat=100

Reviewed By: zhichao-cao

Differential Revision: D30848738

Pulled By: pdillinger

fbshipit-source-id: 20290b517eeceba99bb538bb5a17088f7e878405

4 years agoBypass unused parameterization in ExternalSSTFileBasicTest.IngestExte… (#8910)
Peter Dillinger [Mon, 13 Sep 2021 19:16:51 +0000 (12:16 -0700)]
Bypass unused parameterization in ExternalSSTFileBasicTest.IngestExte… (#8910)

Summary:
Facebook infrastructure doesn't like continuously skipping
tests, so fixing this permanently disabled parameterization to BYPASS
instead of SKIP. (Internal ref: T100525285)

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8910

Test Plan: manual

Reviewed By: anand1976

Differential Revision: D30905169

Pulled By: pdillinger

fbshipit-source-id: e23d63d2aa800e54676269fad3a093cd3f9f222d

4 years agoUse GetBlobFileSize instead of GetTotalBlobBytes in DB properties (#8902)
Levi Tamasi [Mon, 13 Sep 2021 17:46:07 +0000 (10:46 -0700)]
Use GetBlobFileSize instead of GetTotalBlobBytes in DB properties (#8902)

Summary:
The patch adjusts the definition of BlobDB's DB properties a bit by
switching to `GetBlobFileSize` from `GetTotalBlobBytes`. The
difference is that the value returned by `GetBlobFileSize` includes
the blob file header and footer as well, and thus matches the on-disk
size of blob files. In addition, the patch removes the `Version` number
from the `blob_stats` property, and updates/extends the unit tests a little.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8902

Test Plan: `make check`

Reviewed By: riversand963

Differential Revision: D30859542

Pulled By: ltamasi

fbshipit-source-id: e3426d2d567bd1bd8c8636abdafaafa0743c854c

4 years agoFix minor typo in blog post (#8906)
Romain Péchayre [Mon, 13 Sep 2021 17:30:26 +0000 (10:30 -0700)]
Fix minor typo in blog post (#8906)

Summary:
Hi. Hope this helps :)

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8906

Reviewed By: jay-zhuang

Differential Revision: D30890111

Pulled By: zhichao-cao

fbshipit-source-id: 45a4119158dc38cb4220b1d6d571bb1ca9902ffc

4 years agoChange the File System File Wrappers to std::unique_ptr (#8618)
mrambacher [Mon, 13 Sep 2021 15:45:13 +0000 (08:45 -0700)]
Change the File System File Wrappers to std::unique_ptr (#8618)

Summary:
This allows the wrapper classes to own the wrapped object and eliminates confusion as to ownership.  Previously, many classes implemented their own ownership solutions.  Fixes https://github.com/facebook/rocksdb/issues/8606

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8618

Reviewed By: pdillinger

Differential Revision: D30136064

Pulled By: mrambacher

fbshipit-source-id: d0bf471df8818dbb1770a86335fe98f761cca193

4 years agoAllow WriteBatch to have keys with different timestamp sizes (#8725)
Yanqin Jin [Sun, 12 Sep 2021 22:33:15 +0000 (15:33 -0700)]
Allow WriteBatch to have keys with different timestamp sizes (#8725)

Summary:
In the past, we unnecessarily requires all keys in the same write batch
to be from column families whose timestamps' formats are the same for
simplicity. Specifically, we cannot use the same write batch to write to
two column families, one of which enables timestamp while the other
disables it.

The limitation is due to the member `timestamp_size_` that used to exist
in each `WriteBatch` object. We pass a timestamp_size to the constructor
of `WriteBatch`. Therefore, users can simply use the old
`WriteBatch::Put()`, `WriteBatch::Delete()`, etc APIs for write, while
the internal implementation of `WriteBatch` will take care of memory
allocation for timestamps.

The above is not necessary.
One the one hand, users can set up a memory buffer to store user key and
then contiguously append the timestamp to the user key. Then the user
can pass this buffer to the `WriteBatch::Put(Slice&)` API.
On the other hand, users can set up a SliceParts object which is an
array of Slices and let the last Slice to point to the memory buffer
storing timestamp. Then the user can pass the SliceParts object to the
`WriteBatch::Put(SliceParts&)` API.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8725

Test Plan: make check

Reviewed By: ltamasi

Differential Revision: D30654499

Pulled By: riversand963

fbshipit-source-id: 9d848c77ad3c9dd629aa5fc4e2bc16fb0687b4a2

4 years agoUpdate HISTORY.md for PR 8899 (#8905)
Levi Tamasi [Sun, 12 Sep 2021 15:17:48 +0000 (08:17 -0700)]
Update HISTORY.md for PR 8899 (#8905)

Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8905

Reviewed By: zhichao-cao

Differential Revision: D30873416

Pulled By: ltamasi

fbshipit-source-id: 6e55ec14a7fd2e562aa24cd0274e2436369923f5

4 years agoFix and detect headers with missing dependencies (#8893)
Peter Dillinger [Fri, 10 Sep 2021 16:59:05 +0000 (09:59 -0700)]
Fix and detect headers with missing dependencies (#8893)

Summary:
It's always annoying to find a header does not include its own
dependencies and only works when included after other includes. This
change adds `make check-headers` which validates that each header can
be included at the top of a file. Some headers are excluded e.g. because
of platform or external dependencies.

rocksdb_namespace.h had to be re-worked slightly to enable checking for
failure to include it. (ROCKSDB_NAMESPACE is a valid namespace name.)

Fixes mostly involve adding and cleaning up #includes, but for
FileTraceWriter, a constructor was out-of-lined to make a forward
declaration sufficient.

This check is not currently run with `make check` but is added to
CircleCI build-linux-unity since that one is already relatively fast.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8893

Test Plan: existing tests and resolving issues detected by new check

Reviewed By: mrambacher

Differential Revision: D30823300

Pulled By: pdillinger

fbshipit-source-id: 9fff223944994c83c105e2e6496d24845dc8e572

4 years agoMake Statistics a Customizable Class (#8637)
mrambacher [Fri, 10 Sep 2021 16:46:47 +0000 (09:46 -0700)]
Make Statistics a Customizable Class (#8637)

Summary:
Make the Statistics object into a Customizable object.  Statistics can now be stored and created to/from the Options file.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8637

Reviewed By: zhichao-cao

Differential Revision: D30530550

Pulled By: mrambacher

fbshipit-source-id: 5fc7d01d8431f37b2c205bbbd8342c9f697023bd

4 years agoAdd public API RateLimiter::GetTotalPendingRequests() (#8890)
Hui Xiao [Fri, 10 Sep 2021 15:35:59 +0000 (08:35 -0700)]
Add public API RateLimiter::GetTotalPendingRequests() (#8890)

Summary:
Context/Summary:
As users requested, a public API RateLimiter::GetTotalPendingRequests() is added to expose the total number of pending requests for bytes in the rate limiter, which is the size of the request queue of that priority (or of all priorities, if IO_TOTAL is interested) at the time when this API is called.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8890

Test Plan:
- Passing added new unit tests
- Passing existing unit tests

Reviewed By: ajkr

Differential Revision: D30815500

Pulled By: hx235

fbshipit-source-id: 2dfa990f651c1c47378b6215c751ad76a5824300

4 years agoAdd support to the ObjectRegistry for ManagedObjects (#8658)
mrambacher [Fri, 10 Sep 2021 12:19:47 +0000 (05:19 -0700)]
Add support to the ObjectRegistry for ManagedObjects (#8658)

Summary:
ManagedObjects are  shared pointer objects where RocksDB wants to share a single object between multiple configurations.  For example, the Cache may be shared between multiple column families/tables or the Statistics may be shared between multiple databases.

ManagedObjects are stored in the ObjectRegistry by Type (e.g. Cache) and ID.  For a given type/ID name, a single object is stored.

APIs were added to get/set/create these objects.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8658

Reviewed By: pdillinger

Differential Revision: D30806273

Pulled By: mrambacher

fbshipit-source-id: 832ac4423b210c4c4b4a456b35897334775d3160

4 years agoSupport timestamps in SstFileWriter (#8899)
Levi Tamasi [Fri, 10 Sep 2021 01:57:01 +0000 (18:57 -0700)]
Support timestamps in SstFileWriter (#8899)

Summary:
As a first step of supporting user-defined timestamps with ingestion, the
patch adds timestamp support to `SstFileWriter`; namely, it adds new
versions of the `Put` and `Delete` APIs that take timestamps. (`Merge`
and `DeleteRange` are currently not supported with user-defined timestamps
in general but once those features are implemented, we can handle them
in `SstFileWriter` in a similar fashion.) The new APIs validate the size of
the timestamp provided by the client. Similarly, calls to the pre-existing
timestamp-less APIs are now disallowed when user-defined timestamps are
in use according to the comparator.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8899

Test Plan: `make check`

Reviewed By: riversand963

Differential Revision: D30850699

Pulled By: ltamasi

fbshipit-source-id: 779154373618f19b8f0797976bb7286783c57b67

4 years agoAdd comment for new_memory_used parameter in CacheReservationManager::UpdateCacheRese...
Hui Xiao [Thu, 9 Sep 2021 22:24:15 +0000 (15:24 -0700)]
Add comment for new_memory_used parameter in CacheReservationManager::UpdateCacheReservation (#8895)

Summary:
Context/Summary: this PR is to clarify what the parameter new_memory_used is in CacheReservationManager::UpdateCacheReservation

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8895

Test Plan:
- Passing existing test
- Make format

Reviewed By: jay-zhuang

Differential Revision: D30844814

Pulled By: hx235

fbshipit-source-id: 3177f7abf5668ea9e73818ceaa355566f03acabc

4 years agoUpdate HISTORY.md for new rate limiter io priorities (#8896)
Hui Xiao [Thu, 9 Sep 2021 20:24:20 +0000 (13:24 -0700)]
Update HISTORY.md for new rate limiter io priorities (#8896)

Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8896

Reviewed By: ajkr

Differential Revision: D30846120

Pulled By: hx235

fbshipit-source-id: 9224ebce5437d63b0fb8af9171c6041a9ea5d90f

4 years agoSupport custom Env in db_sst_test and external_sst_file_basic_test (#8888)
anand76 [Thu, 9 Sep 2021 04:20:38 +0000 (21:20 -0700)]
Support custom Env in db_sst_test and external_sst_file_basic_test (#8888)

Summary:
Support custom Env in these tests. Some custom Envs do not support reopening a file for write, either normal mode or Random RW mode. Added some additional checks in external_sst_file_basic_test to accommodate those Envs.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8888

Reviewed By: riversand963

Differential Revision: D30824481

Pulled By: anand1976

fbshipit-source-id: c3ac7a628e6df29e94f42e370e679934a4f77eac

4 years agoCharge read to rate limiter in BackupEngine (#8722)
hx235 [Wed, 8 Sep 2021 23:23:41 +0000 (16:23 -0700)]
Charge read to rate limiter in BackupEngine (#8722)

Summary:
Context:
While all the non-trivial write operations in BackupEngine go through the RateLimiter, reads currently do not. In general, this is not a huge issue because (especially since some I/O efficiency fixes) reads in BackupEngine are mostly limited by corresponding writes, for both backup and restore. But in principle we should charge the RateLimiter for reads as well.
- Charged read operations in `BackupEngineImpl::CopyOrCreateFile`, `BackupEngineImpl::ReadFileAndComputeChecksum`, `BackupEngineImpl::BackupMeta::LoadFromFile` and `BackupEngineImpl::GetFileDbIdentities`

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8722

Test Plan:
- Passed existing tests
- Passed added unit tests

Reviewed By: pdillinger

Differential Revision: D30610464

Pulled By: hx235

fbshipit-source-id: 9b08c9387159a5385c8d390d6666377a0d0117e5

4 years agoprevent stranded LATEST_BACKUP in BackupEngineTest.NoDeleteWithReadOnly (#8887)
Andrew Kryczka [Wed, 8 Sep 2021 20:37:59 +0000 (13:37 -0700)]
prevent stranded LATEST_BACKUP in BackupEngineTest.NoDeleteWithReadOnly (#8887)

Summary:
A "LATEST_BACKUP" file was left in the backup directory by
"BackupEngineTest.NoDeleteWithReadOnly" test, affecting future test
runs. In particular, it caused "BackupEngineTest.IOStats" to fail since
it relies on backup directory containing only data written by its
`BackupEngine`.

The fix is to promote "LATEST_BACKUP" to an explicitly managed file so
it is deleted in `BackupEngineTest` constructor if it exists.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8887

Test Plan:
below command used to fail. Now it passes:

```
$ TEST_TMPDIR=/dev/shm ./backupable_db_test --gtest_filter='BackupEngineTest.NoDeleteWithReadOnly:BackupEngineTest.IOStats'
```

Reviewed By: pdillinger

Differential Revision: D30812336

Pulled By: ajkr

fbshipit-source-id: 32dfbe1368ebdab872e610764bfea5daf9a2af09

4 years agoAccount for dictionary-building buffer in global memory limit (#8428)
Hui Xiao [Wed, 8 Sep 2021 19:34:35 +0000 (12:34 -0700)]
Account for dictionary-building buffer in global memory limit (#8428)

Summary:
Context:
Some data blocks are temporarily buffered in memory in BlockBasedTableBuilder for building compression dictionary used in data block compression. Currently this memory usage is not counted toward our global memory usage utilizing block cache capacity. To improve that, this PR charges that memory usage into the block cache to achieve better memory tracking and limiting.

- Reserve memory in block cache for buffered data blocks that are used to build a compression dictionary
- Release all the memory associated with buffering the data blocks mentioned above in EnterUnbuffered(), which is called when (a) buffer limit is exceeded after buffering OR (b) the block cache becomes full after reservation OR (c) BlockBasedTableBuilder calls Finish()

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8428

Test Plan:
- Passing existing unit tests
- Passing new unit tests

Reviewed By: ajkr

Differential Revision: D30755305

Pulled By: hx235

fbshipit-source-id: 6e66665020b775154a94c4c5e0f2adaeaff13981

4 years agoAdd DB properties for BlobDB (#8734)
Zhiyi Zhang [Wed, 8 Sep 2021 19:19:01 +0000 (12:19 -0700)]
Add DB properties for BlobDB (#8734)

Summary:
RocksDB exposes certain internal statistics via the DB property interface.
However, there are currently no properties related to BlobDB.

For starters, we would like to add the following BlobDB properties:
`rocksdb.num-blob-files`: number of blob files in the current Version (kind of like `num-files-at-level` but note this is not per level, since blob files are not part of the LSM tree).
`rocksdb.blob-stats`: this could return the total number and size of all blob files, and potentially also the total amount of garbage (in bytes) in the blob files in the current Version.
`rocksdb.total-blob-file-size`: the total size of all blob files (as a blob counterpart for `total-sst-file-size`) of all Versions.
`rocksdb.live-blob-file-size`: the total size of all blob files in the current Version.
`rocksdb.estimate-live-data-size`: this is actually an existing property that we can extend so it considers blob files as well. When it comes to blobs, we actually have an exact value for live bytes. Namely, live bytes can be computed simply as total bytes minus garbage bytes, summed over the entire set of blob files in the Version.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8734

Test Plan:
```
âžœ  rocksdb git:(new_feature_blobDB_properties) ./db_blob_basic_test
[==========] Running 16 tests from 2 test cases.
[----------] Global test environment set-up.
[----------] 10 tests from DBBlobBasicTest
[ RUN      ] DBBlobBasicTest.GetBlob
[       OK ] DBBlobBasicTest.GetBlob (12 ms)
[ RUN      ] DBBlobBasicTest.MultiGetBlobs
[       OK ] DBBlobBasicTest.MultiGetBlobs (11 ms)
[ RUN      ] DBBlobBasicTest.GetBlob_CorruptIndex
[       OK ] DBBlobBasicTest.GetBlob_CorruptIndex (10 ms)
[ RUN      ] DBBlobBasicTest.GetBlob_InlinedTTLIndex
[       OK ] DBBlobBasicTest.GetBlob_InlinedTTLIndex (12 ms)
[ RUN      ] DBBlobBasicTest.GetBlob_IndexWithInvalidFileNumber
[       OK ] DBBlobBasicTest.GetBlob_IndexWithInvalidFileNumber (9 ms)
[ RUN      ] DBBlobBasicTest.GenerateIOTracing
[       OK ] DBBlobBasicTest.GenerateIOTracing (11 ms)
[ RUN      ] DBBlobBasicTest.BestEffortsRecovery_MissingNewestBlobFile
[       OK ] DBBlobBasicTest.BestEffortsRecovery_MissingNewestBlobFile (13 ms)
[ RUN      ] DBBlobBasicTest.GetMergeBlobWithPut
[       OK ] DBBlobBasicTest.GetMergeBlobWithPut (11 ms)
[ RUN      ] DBBlobBasicTest.MultiGetMergeBlobWithPut
[       OK ] DBBlobBasicTest.MultiGetMergeBlobWithPut (14 ms)
[ RUN      ] DBBlobBasicTest.BlobDBProperties
[       OK ] DBBlobBasicTest.BlobDBProperties (21 ms)
[----------] 10 tests from DBBlobBasicTest (124 ms total)

[----------] 6 tests from DBBlobBasicTest/DBBlobBasicIOErrorTest
[ RUN      ] DBBlobBasicTest/DBBlobBasicIOErrorTest.GetBlob_IOError/0
[       OK ] DBBlobBasicTest/DBBlobBasicIOErrorTest.GetBlob_IOError/0 (12 ms)
[ RUN      ] DBBlobBasicTest/DBBlobBasicIOErrorTest.GetBlob_IOError/1
[       OK ] DBBlobBasicTest/DBBlobBasicIOErrorTest.GetBlob_IOError/1 (10 ms)
[ RUN      ] DBBlobBasicTest/DBBlobBasicIOErrorTest.MultiGetBlobs_IOError/0
[       OK ] DBBlobBasicTest/DBBlobBasicIOErrorTest.MultiGetBlobs_IOError/0 (10 ms)
[ RUN      ] DBBlobBasicTest/DBBlobBasicIOErrorTest.MultiGetBlobs_IOError/1
[       OK ] DBBlobBasicTest/DBBlobBasicIOErrorTest.MultiGetBlobs_IOError/1 (10 ms)
[ RUN      ] DBBlobBasicTest/DBBlobBasicIOErrorTest.CompactionFilterReadBlob_IOError/0
[       OK ] DBBlobBasicTest/DBBlobBasicIOErrorTest.CompactionFilterReadBlob_IOError/0 (1011 ms)
[ RUN      ] DBBlobBasicTest/DBBlobBasicIOErrorTest.CompactionFilterReadBlob_IOError/1
[       OK ] DBBlobBasicTest/DBBlobBasicIOErrorTest.CompactionFilterReadBlob_IOError/1 (1013 ms)
[----------] 6 tests from DBBlobBasicTest/DBBlobBasicIOErrorTest (2066 ms total)

[----------] Global test environment tear-down
[==========] 16 tests from 2 test cases ran. (2190 ms total)
[  PASSED  ] 16 tests.
```

Reviewed By: ltamasi

Differential Revision: D30690849

Pulled By: Zhiyi-Zhang

fbshipit-source-id: a7567319487ad76bd1a2e24bf143afdbbd9e4346

4 years agoAdd Milvus in USERS (#8822)
Cheng Chang [Wed, 8 Sep 2021 19:17:49 +0000 (12:17 -0700)]
Add Milvus in USERS (#8822)

Summary:
Milvus is a new database using RocksDB.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8822

Test Plan: N/A

Reviewed By: mrambacher

Differential Revision: D30802413

Pulled By: cheng-chang

fbshipit-source-id: 7c506f30688d4bb6b4cb8cddfc90e9414a397a53

4 years agoMake MemTableRepFactory into a Customizable class (#8419)
mrambacher [Wed, 8 Sep 2021 14:45:59 +0000 (07:45 -0700)]
Make MemTableRepFactory into a Customizable class (#8419)

Summary:
This PR does the following:
-> Makes the MemTableRepFactory into a Customizable class and creatable/configurable via CreateFromString
-> Makes the existing implementations compatible with configurations
-> Moves the "SpecialRepFactory" test class into testutil, accessible via the ObjectRegistry or a NewSpecial API

New tests were added to validate the functionality and all existing tests pass.  db_bench and memtablerep_bench were hand-tested to verify the functionality in those tools.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8419

Reviewed By: zhichao-cao

Differential Revision: D29558961

Pulled By: mrambacher

fbshipit-source-id: 81b7229636e4e649a0c914e73ac7b0f8454c931c

4 years agoFix POSIX LockFile after failure to create file (#8747)
Peter Dillinger [Wed, 8 Sep 2021 05:40:37 +0000 (22:40 -0700)]
Fix POSIX LockFile after failure to create file (#8747)

Summary:
Failure to create the lock file (e.g. out of space) could
prevent future LockFile attempts in the same process on the same file
from succeeding.

Also added DEBUG code to fail assertion if PosixFileLock is destroyed
without using UnlockFile (which is a risk because FileLock is in the
public API with virtual destructor).

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8747

Test Plan: test added

Reviewed By: ajkr

Differential Revision: D30732543

Pulled By: pdillinger

fbshipit-source-id: 4c30a959566d91f778d6fad3fbbd5f3941b097c1

4 years agoAdd (& fix) some simple source code checks (#8821)
Peter Dillinger [Wed, 8 Sep 2021 04:18:21 +0000 (21:18 -0700)]
Add (& fix) some simple source code checks (#8821)

Summary:
* Don't hardcode namespace rocksdb (use ROCKSDB_NAMESPACE)
* Don't #include <rocksdb/...> (use double quotes)
* Support putting NOCOMMIT (any case) in source code that should not be
committed/pushed in current state.

These will be run with `make check` and in GitHub actions

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8821

Test Plan: existing tests, manually try out new checks

Reviewed By: zhichao-cao

Differential Revision: D30791726

Pulled By: pdillinger

fbshipit-source-id: 399c883f312be24d9e55c58951d4013e18429d92

4 years agoBytes read/written stats for `CreateNewBackup*()` (#8819)
Andrew Kryczka [Wed, 8 Sep 2021 01:23:58 +0000 (18:23 -0700)]
Bytes read/written stats for `CreateNewBackup*()` (#8819)

Summary:
Gets `Statistics` from the options associated with the `DB` undergoing backup, and populates new ticker stats with the thread-local `IOContext` read/write counters for the threads doing backup work.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8819

Reviewed By: pdillinger

Differential Revision: D30779238

Pulled By: ajkr

fbshipit-source-id: 75ccafc355f90906df5cf80367f7245b985772d8

4 years agoRemove asan_symbolize.py for internal asan build (#8737)
Jay Zhuang [Tue, 7 Sep 2021 22:38:04 +0000 (15:38 -0700)]
Remove asan_symbolize.py for internal asan build (#8737)

Summary:
asan_symbolize.py is not compatible with python3. Also make it
consistent with public CI, which doesn't use asan_symbolize.py
And update coverage_test.sh to use python3.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8737

Test Plan: CI

Reviewed By: pdillinger

Differential Revision: D30702430

Pulled By: jay-zhuang

fbshipit-source-id: ef09947b1232294d31b09a855c2f0ce149097dd9

4 years agoBytes read stat for `VerifyChecksum()` and `VerifyFileChecksums()` APIs (#8741)
Andrew Kryczka [Tue, 7 Sep 2021 20:25:24 +0000 (13:25 -0700)]
Bytes read stat for `VerifyChecksum()` and `VerifyFileChecksums()` APIs (#8741)

Summary:
- Clarified some comments on compatibility for adding new ticker stats
- Added read I/O stats for `VerifyChecksum()` and `VerifyFileChecksums()` APIs

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8741

Test Plan: new unit test

Reviewed By: zhichao-cao

Differential Revision: D30708578

Pulled By: ajkr

fbshipit-source-id: d06b961f7e199ae92c266b683e39870aa8f63449

4 years agoImprove support for using regexes (#8740)
Peter Dillinger [Tue, 7 Sep 2021 20:04:07 +0000 (13:04 -0700)]
Improve support for using regexes (#8740)

Summary:
* Consolidate use of std::regex for testing to testharness.cc, to
minimize Facebook linters constantly flagging uses in non-production
code.
* Improve syntax and error messages for asserting some string matches a
regex in tests.
* Add a public Regex wrapper class to encapsulate existing usage in
ObjectRegistry.
* Remove unnecessary include <regex>
* Put warnings that use of Regex in production code could cause bad
performance or stack overflow.

Intended follow-up work:
* Replace std::regex with another underlying implementation like RE2
* Improve ObjectRegistry interface in terms of possibly confusing literal
string matching vs. regex and in terms of reporting invalid regex.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8740

Test Plan:
tests updated, basic unit test for public Regex, and some manual
testing of temporary changes to see example error messages:

utilities/backupable/backupable_db_test.cc:917: Failure
000010_1162373755_138626.blob (child.name)
does not match regex
[0-9]+_[0-9]+_[0-9]+[.]blobHAHAHA (pattern)

db/db_basic_test.cc:74: Failure
R3SHSBA8C4U0CIMV2ZB0 (sid3)
does not match regex [0-9A-Z]{20}HAHAHA

Reviewed By: mrambacher

Differential Revision: D30706246

Pulled By: pdillinger

fbshipit-source-id: ba845e8f563ccad39bdb58f44f04e9da8f78c3fd

4 years agoReplace most typedef with using= (#8751)
Peter Dillinger [Tue, 7 Sep 2021 18:31:12 +0000 (11:31 -0700)]
Replace most typedef with using= (#8751)

Summary:
Old typedef syntax is confusing

Most but not all changes with

    perl -pi -e 's/typedef (.*) ([a-zA-Z0-9_]+);/using $2 = $1;/g' list_of_files
    make format

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8751

Test Plan: existing

Reviewed By: zhichao-cao

Differential Revision: D30745277

Pulled By: pdillinger

fbshipit-source-id: 6f65f0631c3563382d43347896020413cc2366d9

4 years agoSupport custom env in db_blob_{basic,compaction,corruption,index}_test (#8817)
Levi Tamasi [Tue, 7 Sep 2021 18:12:53 +0000 (11:12 -0700)]
Support custom env in db_blob_{basic,compaction,corruption,index}_test (#8817)

Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8817

Test Plan: Ran `make check` and built/tested using internal custom environment.

Reviewed By: riversand963

Differential Revision: D30768215

Pulled By: ltamasi

fbshipit-source-id: cce96211d4c097612d20247f2e997358f40cc3d3

4 years agoFix a minor regression script issue (#8755)
Jay Zhuang [Sun, 5 Sep 2021 00:34:01 +0000 (17:34 -0700)]
Fix a minor regression script issue (#8755)

Summary:
The system default `time` doesn't support option -v

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8755

Test Plan: CI: https://www.internalfb.com/intern/sandcastle/job/13510799359724405

Reviewed By: ltamasi

Differential Revision: D30757119

Pulled By: jay-zhuang

fbshipit-source-id: 093e5084f3b7cc71f6795b1062f48d4e77ed4518

4 years agoUpdate branch name in rocksdb-lego-determinator (#8754)
Levi Tamasi [Sat, 4 Sep 2021 04:12:06 +0000 (21:12 -0700)]
Update branch name in rocksdb-lego-determinator (#8754)

Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8754

Reviewed By: jay-zhuang

Differential Revision: D30754986

Pulled By: ltamasi

fbshipit-source-id: 4c879a8deaaae07a921a35ff5ed1939d4262f982

4 years agoFix regression test script (#8753)
Jay Zhuang [Sat, 4 Sep 2021 02:03:56 +0000 (19:03 -0700)]
Fix regression test script (#8753)

Summary:
Regression test is broken and not running:
1. failed test is not reporting, fix it by add `set -e`
2. internal regression test is not run inside github, removing that
3. fix a few minor issues to pass the test
4. delete unused binary size build, and regression test is reporting binary size now.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8753

Test Plan: CI: https://www.internalfb.com/intern/sandcastle/job/13510799359573861

Reviewed By: ltamasi

Differential Revision: D30754380

Pulled By: jay-zhuang

fbshipit-source-id: 0cfa008327fff31bc61118a3fe642924090d28e1

4 years agoRe-sync with internal repository (#8748)
Facebook Community Bot [Thu, 2 Sep 2021 17:44:17 +0000 (13:44 -0400)]
Re-sync with internal repository (#8748)

Co-authored-by: Facebook Community Bot <6422482+facebook-github-bot@users.noreply.github.com>
4 years agoUpdate branch as "main" in tools/advisor/README.md (#8744)
Akanksha Mahajan [Thu, 2 Sep 2021 02:38:08 +0000 (19:38 -0700)]
Update branch as "main" in tools/advisor/README.md (#8744)

Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8744

Reviewed By: ltamasi

Differential Revision: D30716145

Pulled By: akankshamahajan15

fbshipit-source-id: c2fcaf9ddcae85a86c0f10496acab28cd795ff12

4 years agoUpdate branch name in WINDOWS_PORT.md (#8745)
Levi Tamasi [Thu, 2 Sep 2021 02:04:13 +0000 (19:04 -0700)]
Update branch name in WINDOWS_PORT.md (#8745)

Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8745

Reviewed By: jay-zhuang

Differential Revision: D30718273

Pulled By: ltamasi

fbshipit-source-id: f0f4d592a71a327e731a5eef0f01488074d99118

4 years agoFix an invalid UTF-8 character in WINDOWS_PORT.md
Levi Tamasi [Wed, 1 Sep 2021 23:35:51 +0000 (16:35 -0700)]
Fix an invalid UTF-8 character in WINDOWS_PORT.md

Reviewed By: ajkr, riversand963

Differential Revision: D30713977

fbshipit-source-id: b46a9a860d32e8fa0cb2b980b9b33d5148f9715f

4 years agoUpdate branch name to main in docs/* (#8743)
Akanksha Mahajan [Wed, 1 Sep 2021 22:56:09 +0000 (15:56 -0700)]
Update branch name to main in docs/* (#8743)

Summary:
Update branch name from master to main in docs/*

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8743

Reviewed By: ltamasi

Differential Revision: D30712263

Pulled By: akankshamahajan15

fbshipit-source-id: a1a5e20d95210e792705030f98dd2b38ca542eb5

4 years agoUpdate branch name to "main" in README/LANGUAGE_BINDINGS (#8727)
Levi Tamasi [Wed, 1 Sep 2021 22:07:15 +0000 (15:07 -0700)]
Update branch name to "main" in README/LANGUAGE_BINDINGS (#8727)

Summary:
While we're at it, also update the links to Travis (.org to .com).

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8727

Reviewed By: jay-zhuang

Differential Revision: D30675223

Pulled By: ltamasi

fbshipit-source-id: ce4b04a72e56ad35b6bddb982cca41fa8ceacf84

4 years agoFix fbcode linker error with `make shared_lib` (#8742)
Peter Dillinger [Wed, 1 Sep 2021 21:32:56 +0000 (14:32 -0700)]
Fix fbcode linker error with `make shared_lib` (#8742)

Summary:
Need proper linker path for linking shared library

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8742

Test Plan: `make shared_lib` on Facebook machine

Reviewed By: jay-zhuang

Differential Revision: D30709012

Pulled By: pdillinger

fbshipit-source-id: 6d17e281204c359aced85e18212e59758cec3313

4 years agoRemove some unneeded code (#8736)
Peter Dillinger [Wed, 1 Sep 2021 21:14:50 +0000 (14:14 -0700)]
Remove some unneeded code (#8736)

Summary:
* FullKey and ParseFullKey appear to serve no purpose in the public API
(or anything else) so removed. Only use in one test updated.
* NumberToString serves no purpose vs. ToString so removed, numerous
calls updated
* Remove unnecessary forward declarations in metadata.h by re-arranging
class definitions.
* Remove some unneeded semicolons

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8736

Test Plan: existing tests

Reviewed By: mrambacher

Differential Revision: D30700039

Pulled By: pdillinger

fbshipit-source-id: 1e436a576f511a6ed8b4d97af7cc8216bc729af2

4 years agoFix a buffer size race condition in BackupEngine (#8732)
Peter Dillinger [Wed, 1 Sep 2021 21:10:36 +0000 (14:10 -0700)]
Fix a buffer size race condition in BackupEngine (#8732)

Summary:
If RateLimiter burst bytes changes during concurrent Restore
operations

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8732

Test Plan: updated unit test fails with TSAN before change, passes after

Reviewed By: ajkr

Differential Revision: D30683879

Pulled By: pdillinger

fbshipit-source-id: d0ddb3587ade91ee2a4d926b475acf7781b03086

4 years agoUpdate branch name to main in env_librados.md (#8738)
Akanksha Mahajan [Wed, 1 Sep 2021 19:44:55 +0000 (12:44 -0700)]
Update branch name to main in env_librados.md (#8738)

Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8738

Reviewed By: ltamasi

Differential Revision: D30705691

Pulled By: akankshamahajan15

fbshipit-source-id: 44ac8e1c906b1d1d31e9017a700aab5eefe94253

4 years agoCreate CNAME
Joel Marcey [Wed, 1 Sep 2021 20:28:13 +0000 (13:28 -0700)]
Create CNAME

4 years agoUpdate branch name to "main" in CircleCI config (#8726)
Levi Tamasi [Wed, 1 Sep 2021 18:56:37 +0000 (11:56 -0700)]
Update branch name to "main" in CircleCI config (#8726)

Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8726

Reviewed By: jay-zhuang

Differential Revision: D30675182

Pulled By: ltamasi

fbshipit-source-id: d5746931f6d942ed3a9d21325335cfc9e111a7f3

4 years agoMake format-diff.sh branch name agnostic (#8731)
Levi Tamasi [Wed, 1 Sep 2021 18:56:33 +0000 (11:56 -0700)]
Make format-diff.sh branch name agnostic (#8731)

Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8731

Reviewed By: jay-zhuang

Differential Revision: D30678124

Pulled By: ltamasi

fbshipit-source-id: 0131b6707f0c5d1d887bcd45781623143b5ccae0

4 years agobuild with platform assembler (#8733)
Jay Zhuang [Wed, 1 Sep 2021 17:05:57 +0000 (10:05 -0700)]
build with platform assembler (#8733)

Summary:
Required for platform009, which is incompat with the centos
assembler.
author: pbrady@fb.com D29099768

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8733

Test Plan: CI

Reviewed By: pixelb

Differential Revision: D30687156

Pulled By: jay-zhuang

fbshipit-source-id: 796f90842cbf0ca11bad07e7d654bce1fafc4ba0

4 years agoImplement superior user & mid IO priority level in GenericRateLimiter (#8595)
Hui Xiao [Tue, 31 Aug 2021 17:59:14 +0000 (10:59 -0700)]
Implement superior user & mid IO priority level in GenericRateLimiter (#8595)

Summary:
Context:
An extra IO_USER priority in rate limiter allows users to optionally charge WAL writes / SST reads to rate limiter at this priority level, which then has higher priority than IO_HIGH and IO_LOW. With an extra IO_USER priority, it allows users to better specify the relative urgency/importance among different requests in rate limiter. As a consequence, IO resource management can better prioritize and limit resource based on user's need.

The IO_USER is implemented as superior priority in GenericRateLimiter, in the sense that its request queue will always be iterated first without being constrained to fairness. The reason is that the notion of fairness is only meaningful in helping lower priorities in background IO (i.e, IO_HIGH/MID/LOW) to gain some fair chance to run so that it does not block foreground IO (i.e, the ones that are charged at the level of IO_USER). As we can see, the ultimate goal here is to not blocking foreground IO at IO_USER level, which justifies the superiority of IO_USER.

Similar benefits exist for IO_MID priority.
- Rewrote the logic of deciding the order of iterating request queues of high/low priorities to include the extra user/mid priority w/o affecting the existing behavior (see PR's [comment](https://github.com/facebook/rocksdb/pull/8595/files#r678749331))
- Included the request queue of user-pri/mid-pri in the code path of next-leader-candidate signaling and GenericRateLimiter's destructor
- Included the extra user/mid-pri in bookkeeping data structures: total_bytes_through_ and total_requests_
- Re-written the previous impl of explicitly iterating priorities with a loop from Env::IO_LOW to Env::IO_TOTAL

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8595

Test Plan:
- passed existing rate_limiter_test.cc
- passed added unit tests in rate_limiter_test.cc
- run performance test to verify performance with only high/low requests is not affected by this change
   - Set-up command:
   `TEST_TMPDIR=/dev/shm ./db_bench --benchmarks=fillrandom --duration=5 --compression_type=none --num=100000000 --disable_auto_compactions=true --write_buffer_size=1048576 --writable_file_max_buffer_size=65536 --target_file_size_base=1048576 --max_bytes_for_level_base=4194304 --level0_slowdown_writes_trigger=$(((1 << 31) - 1)) --level0_stop_writes_trigger=$(((1 << 31) - 1))`

    - Test command:
   `TEST_TMPDIR=/dev/shm ./db_bench --benchmarks=overwrite --use_existing_db=true --disable_wal=true --duration=30 --compression_type=none --num=100000000 --write_buffer_size=1048576 --writable_file_max_buffer_size=65536 --target_file_size_base=1048576 --max_bytes_for_level_base=4194304 --level0_slowdown_writes_trigger=$(((1 << 31) - 1)) --level0_stop_writes_trigger=$(((1 << 31) - 1)) --statistics=true --rate_limiter_bytes_per_sec=1048576 --rate_limiter_refill_period_us=1000  --threads=32 |& grep -E '(flush|compact)\.write\.bytes'`

   - Before (on branch upstream/master):
   `rocksdb.compact.write.bytes COUNT : 4014162`
   `rocksdb.flush.write.bytes COUNT : 26715832`
    rocksdb.flush.write.bytes/rocksdb.compact.write.bytes ~= 6.66

   - After (on branch rate_limiter_user_pri):
  `rocksdb.compact.write.bytes COUNT : 3807822`
  `rocksdb.flush.write.bytes COUNT : 26098659`
   rocksdb.flush.write.bytes/rocksdb.compact.write.bytes ~= 6.85

Reviewed By: ajkr

Differential Revision: D30577783

Pulled By: hx235

fbshipit-source-id: 0881f2705ffd13ecd331256bde7e8ec874a353f4

4 years agoReplace `std::shared_ptr<SystemClock>` by `SystemClock*` in `TraceExecutionHandler...
Qizhong Mao [Tue, 31 Aug 2021 17:56:08 +0000 (10:56 -0700)]
Replace `std::shared_ptr<SystemClock>` by `SystemClock*` in `TraceExecutionHandler` (#8729)

Summary:
All/most trace related APIs directly use `SystemClock*` (https://github.com/facebook/rocksdb/pull/8033). Do the same in `TraceExecutionHandler`.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8729

Test Plan: None

Reviewed By: zhichao-cao

Differential Revision: D30672159

Pulled By: autopear

fbshipit-source-id: 017db4912c6ac1cfede842b8b122cf569a394f25

4 years agoFix a race in LRUCacheShard::Promote (#8717)
anand76 [Tue, 31 Aug 2021 02:09:43 +0000 (19:09 -0700)]
Fix a race in LRUCacheShard::Promote (#8717)

Summary:
In ```LRUCacheShard::Promote```, a reference is released outside the LRU mutex. Fix the race condition.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8717

Reviewed By: zhichao-cao

Differential Revision: D30649206

Pulled By: anand1976

fbshipit-source-id: 09c0af05b2294a7fe2c02876a61b0bad6e3ada61

4 years agoBuilt-in support for generating unique IDs, bug fix (#8708)
Peter Dillinger [Mon, 30 Aug 2021 22:19:39 +0000 (15:19 -0700)]
Built-in support for generating unique IDs, bug fix (#8708)

Summary:
Env::GenerateUniqueId() works fine on Windows and on POSIX
where /proc/sys/kernel/random/uuid exists. Our other implementation is
flawed and easily produces collision in a new multi-threaded test.
As we rely more heavily on DB session ID uniqueness, this becomes a
serious issue.

This change combines several individually suitable entropy sources
for reliable generation of random unique IDs, with goal of uniqueness
and portability, not cryptographic strength nor maximum speed.

Specifically:
* Moves code for getting UUIDs from the OS to port::GenerateRfcUuid
rather than in Env implementation details. Callers are now told whether
the operation fails or succeeds.
* Adds an internal API GenerateRawUniqueId for generating high-quality
128-bit unique identifiers, by combining entropy from three "tracks":
  * Lots of info from default Env like time, process id, and hostname.
  * std::random_device
  * port::GenerateRfcUuid (when working)
* Built-in implementations of Env::GenerateUniqueId() will now always
produce an RFC 4122 UUID string, either from platform-specific API or
by converting the output of GenerateRawUniqueId.

DB session IDs now use GenerateRawUniqueId while DB IDs (not as
critical) try to use port::GenerateRfcUuid but fall back on
GenerateRawUniqueId with conversion to an RFC 4122 UUID.

GenerateRawUniqueId is declared and defined under env/ rather than util/
or even port/ because of the Env dependency.

Likely follow-up: enhance GenerateRawUniqueId to be faster after the
first call and to guarantee uniqueness within the lifetime of a single
process (imparting the same property onto DB session IDs).

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8708

Test Plan:
A new mini-stress test in env_test checks the various public
and internal APIs for uniqueness, including each track of
GenerateRawUniqueId individually. We can't hope to verify anywhere close
to 128 bits of entropy, but it can at least detect flaws as bad as the
old code. Serial execution of the new tests takes about 350 ms on
my machine.

Reviewed By: zhichao-cao, mrambacher

Differential Revision: D30563780

Pulled By: pdillinger

fbshipit-source-id: de4c9ff4b2f581cf784fcedb5f39f16e5185c364

4 years agoUpdate comments, fix typos. (#8721)
Merlin Mao [Fri, 27 Aug 2021 20:15:25 +0000 (13:15 -0700)]
Update comments, fix typos. (#8721)

Summary:
- Removed the default empty constructors of `TraceWriter` and `TraceReader`.
- Removed unused `ReadFooter()` from `ReplayerImpl`.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8721

Test Plan: None

Reviewed By: zhichao-cao

Differential Revision: D30609743

Pulled By: autopear

fbshipit-source-id: 7e2626b015bd57ebb408a2836b4b4217cea10002

4 years agoRefactor with VersionBuilder (#8706)
Zaorang Yang [Fri, 27 Aug 2021 19:10:01 +0000 (12:10 -0700)]
Refactor with VersionBuilder (#8706)

Summary:
Introduce a new function to save sst files.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8706

Reviewed By: jay-zhuang

Differential Revision: D30544242

Pulled By: riversand963

fbshipit-source-id: 554755852daff7ae1c7864b0029f51b27099ee09

4 years agoFix typo in the comment of log_empty_ (#8711)
James Yin [Fri, 27 Aug 2021 19:09:13 +0000 (12:09 -0700)]
Fix typo in the comment of log_empty_ (#8711)

Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8711

Reviewed By: riversand963

Differential Revision: D30566761

Pulled By: jay-zhuang

fbshipit-source-id: dd4690f5e2af2d263ed75ea1b9ed24692fe81362

4 years agoAdd 6.24 to the format compatibility checker (#8716)
Levi Tamasi [Thu, 26 Aug 2021 23:33:55 +0000 (16:33 -0700)]
Add 6.24 to the format compatibility checker (#8716)

Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8716

Reviewed By: jay-zhuang

Differential Revision: D30587722

Pulled By: ltamasi

fbshipit-source-id: d2f5b08084778779c5a8b85635977babd8194d5c

4 years agoFix a race condition in DumpStats() during iteration of the ColumnFamilySet (#8714)
anand76 [Thu, 26 Aug 2021 22:39:32 +0000 (15:39 -0700)]
Fix a race condition in DumpStats() during iteration of the ColumnFamilySet (#8714)

Summary:
DumpStats() iterates through the ColumnFamilySet. There is a potential
race condition because it does Ref the cfd, and the cfd could get
destroyed during the iteration.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8714

Test Plan: make check

Reviewed By: ltamasi

Differential Revision: D30580199

Pulled By: anand1976

fbshipit-source-id: 60a3443ad0d4f7ac6a977dec780e6d2c1b70b850

4 years agoDeflake test `CompactionJobTest.InputSerialization` (#8712)
Jay Zhuang [Thu, 26 Aug 2021 16:26:41 +0000 (09:26 -0700)]
Deflake test `CompactionJobTest.InputSerialization` (#8712)

Summary:
It's invalid to have an empty file name.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8712

Test Plan:
```
$ gtest-parallel ./compaction_job_test --gtest_filter=CompactionJobTest.InputSerialization -r 10000
```

Reviewed By: pdillinger

Differential Revision: D30566739

Pulled By: jay-zhuang

fbshipit-source-id: 41e73175e3c95c4b73b4fdcd33470788d4e29d37

4 years agoMake Configurable/Customizable options copyable (#8704)
mrambacher [Thu, 26 Aug 2021 00:46:31 +0000 (17:46 -0700)]
Make Configurable/Customizable options copyable (#8704)

Summary:
The atomic variable "is_prepared_" was keeping Configurable objects from being copy-constructed.  Removed the atomic to allow copies.

Since the variable is only changed from false to true (and never back), there is no reason it had to be atomic.

Added tests that simple Configurable and Customizable objects can be put on the stack and copied.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8704

Reviewed By: anand1976

Differential Revision: D30530526

Pulled By: ltamasi

fbshipit-source-id: 4dd4439b3e5ad7fa396573d0b25d9fb709160576

4 years agoFix legocastle Python commands for CentOS 8 (#8701)
Andrew Kryczka [Wed, 25 Aug 2021 23:52:50 +0000 (16:52 -0700)]
Fix legocastle Python commands for CentOS 8 (#8701)

Summary:
There is no longer an unversioned `python` command that refers to Python
3; the recommended alternative is `/usr/bin/env python3`.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8701

Test Plan: - [internal link] https://www.internalfb.com/intern/sandcastle/group/nonce/5100000000000001/

Reviewed By: riversand963

Differential Revision: D30520380

Pulled By: ajkr

fbshipit-source-id: 2af459a64a15fb2a011e98b156f31d322f6b2d25

4 years agoTemporarily disable block-based filter when stress testing timestamp (#8703)
Yanqin Jin [Wed, 25 Aug 2021 02:03:58 +0000 (19:03 -0700)]
Temporarily disable block-based filter when stress testing timestamp (#8703)

Summary:
Current implementation does not support user-defined timestamp when
block-based filter is used. Will implement the support in the future, or
wait to see if block-based filter can be deprecated and removed.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8703

Test Plan: make whitebox_crash_test_with_ts

Reviewed By: pdillinger

Differential Revision: D30528931

Pulled By: riversand963

fbshipit-source-id: 60dd74ee0a6194e69072069d8c4bd876f249f38d

4 years agoFix a bug of secondary instance sequence going backward (#8653)
Yanqin Jin [Wed, 25 Aug 2021 01:17:32 +0000 (18:17 -0700)]
Fix a bug of secondary instance sequence going backward (#8653)

Summary:
Recent refactor of `ReactiveVersionSet::ReadAndApply()` uses
`ManifestTailer` whose `Iterate()` method can cause the db's
`last_sequence_` to go backward. Consequently, read requests can see
out-dated data. For example, latest changes to the primary will not be
seen on the secondary even after a `TryCatchUpWithPrimary()` if no new
write batches are read from the WALs and no new MANIFEST entries are
read from the MANIFEST.

Fix the bug so that `VersionEditHandler::CheckIterationResult` will
never decrease `last_sequence_`, `last_allocated_sequence_` and
`last_published_sequence_`.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8653

Test Plan: make check

Reviewed By: jay-zhuang

Differential Revision: D30272084

Pulled By: riversand963

fbshipit-source-id: c6a49c534b2509b93ef62d8936ed0acd5b860eaa

4 years agoSimplify `TraceAnalyzer` (#8697)
Merlin Mao [Wed, 25 Aug 2021 01:17:31 +0000 (18:17 -0700)]
Simplify `TraceAnalyzer` (#8697)

Summary:
Handler functions now use a common output function to output to stdout/files.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8697

Test Plan: `trace_analyzer_test` can pass.

Reviewed By: zhichao-cao

Differential Revision: D30527696

Pulled By: autopear

fbshipit-source-id: c626cf4d53a39665a9c4bcf0cb019c448434abe4

4 years agoAdd port::GetProcessID() (#8693)
Peter Dillinger [Wed, 25 Aug 2021 00:45:01 +0000 (17:45 -0700)]
Add port::GetProcessID() (#8693)

Summary:
Useful in some places for object uniqueness across processes.
Currently used for generating a host-wide identifier of Cache objects
but expected to be used soon in some unique id generation code.

`int64_t` is chosen for return type because POSIX uses signed integer type,
usually `int`, for `pid_t` and Windows uses `DWORD`, which is `uint32_t`.

Future work: avoid copy-pasted declarations in port_*.h, perhaps with
port_common.h always included from port.h

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8693

Test Plan: manual for now

Reviewed By: ajkr, anand1976

Differential Revision: D30492876

Pulled By: pdillinger

fbshipit-source-id: 39fc2788623cc9f4787866bdb67a4d183dde7eef

4 years agoAllow iterate refresh for secondary instance (#8700)
Yanqin Jin [Tue, 24 Aug 2021 22:39:31 +0000 (15:39 -0700)]
Allow iterate refresh for secondary instance (#8700)

Summary:
Test plan
make check

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8700

Reviewed By: zhichao-cao

Differential Revision: D30523907

Pulled By: riversand963

fbshipit-source-id: 68928ab4dafb64ce80ab7bc69d83727a4713ab91

4 years agoRefactor WriteBufferManager::CacheRep into CacheReservationManager (#8506)
Hui Xiao [Tue, 24 Aug 2021 19:42:31 +0000 (12:42 -0700)]
Refactor WriteBufferManager::CacheRep into CacheReservationManager (#8506)

Summary:
Context:
To help cap various memory usage by a single limit of the block cache capacity, we charge the memory usage through inserting/releasing dummy entries in the block cache. CacheReservationManager is such a class (non thread-safe) responsible for  inserting/removing dummy entries to reserve cache space for memory used by the class user.

- Refactored the inner private class CacheRep of WriteBufferManager into public CacheReservationManager class for reusability such as for https://github.com/facebook/rocksdb/pull/8428

- Encapsulated implementation details of cache key generation and dummy entries insertion/release in cache reservation as discussed in https://github.com/facebook/rocksdb/pull/8506#discussion_r666550838

- Consolidated increase/decrease cache reservation into one API - UpdateCacheReservation.

- Adjusted the previous dummy entry release algorithm in decreasing cache reservation to be loop-releasing dummy entries to stay symmetric to dummy entry insertion algorithm

- Made the previous dummy entry release algorithm in delayed decrease mode more aggressive for better decreasing cache reservation when memory used is less likely to increase back.

  Previously, the algorithms only release 1 dummy entries when new_mem_used < 3/4 * cache_allocated_size_ and cache_allocated_size_ - kSizeDummyEntry > new_mem_used.
Now, the algorithms loop-releases as many dummy entries as possible when new_mem_used < 3/4 * cache_allocated_size_.

- Updated WriteBufferManager's test cases to adapt to changes on the release algorithm mentioned above and left comment for some test cases for clarity

- Replaced the previous cache key prefix generation (utilizing object address related to the cache client) with one that utilizes Cache->NewID() to prevent cache-key collision among dummy entry clients sharing the same cache.

  The specific collision we are preventing happens when the object address is reused for a new cache-key prefix while the old cache-key using that same object address in its prefix still exists in the cache. This could happen due to that, under LRU cache policy, there is a possible delay in releasing a cache entry after the cache client object owning that cache entry get deallocated. In this case, the object address related to the cache client object can get reused for other client object to generate a new cache-key prefix.

  This prefix generation can be made obsolete after Peter's unification of all the code generating cache key, mentioned in https://github.com/facebook/rocksdb/pull/8506#discussion_r667265255

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8506

Test Plan:
- Passing the added unit tests cache_reservation_manager_test.cc
- Passing existing and adjusted write_buffer_manager_test.cc

Reviewed By: ajkr

Differential Revision: D29644135

Pulled By: hx235

fbshipit-source-id: 0fc93fbfe4a40bb41be85c314f8f2bafa8b741f7

4 years agoDeflake write-prepared and write-unprepared tests (#8696)
Andrew Kryczka [Tue, 24 Aug 2021 06:08:25 +0000 (23:08 -0700)]
Deflake write-prepared and write-unprepared tests (#8696)

Summary:
The `JobContext::job_snapshot` referenced DB state but could
have been deleted by a BG thread after the signal/unlock allowing
shutdown to proceed. Then we would see an error like this (valgrind):

```
==354104== Thread 2:
==354104== Invalid read of size 8
==354104==    at 0x694C4D: rocksdb::ManagedSnapshot::~ManagedSnapshot() (snapshot_impl.cc:20)
==354104==    by 0x58F5BA: operator() (unique_ptr.h:81)
==354104==    by 0x58F5BA: operator() (unique_ptr.h:75)
==354104==    by 0x58F5BA: ~unique_ptr (unique_ptr.h:292)
==354104==    by 0x58F5BA: rocksdb::JobContext::~JobContext() (job_context.h:221)
==354104==    by 0x5F155E: rocksdb::DBImpl::BackgroundCallCompaction(rocksdb::DBImpl::PrepickedCompaction*, rocksdb::Env::Priority) (db_impl_compaction_flush.cc:2696)
==354104==    by 0x5F1BC2: rocksdb::DBImpl::BGWorkCompaction(void*) (db_impl_compaction_flush.cc:2468)
==354104==    by 0x83707A: operator() (std_function.h:688)
==354104==    by 0x83707A: rocksdb::ThreadPoolImpl::Impl::BGThread(unsigned long) (threadpool_imp.cc:266)
==354104==    by 0x8373ED: rocksdb::ThreadPoolImpl::Impl::BGThreadWrapper(void*) (threadpool_imp.cc:307)
==354104==    by 0x492A800: execute_native_thread_routine (in /usr/local/fbcode/platform009/lib/libstdc++.so.6.0.28)
==354104==    by 0x4A5020B: start_thread (in /usr/local/fbcode/platform009/lib/libpthread-2.30.so)
==354104==    by 0x4CF281E: clone (in /usr/local/fbcode/platform009/lib/libc-2.30.so)
```

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8696

Test Plan: unable to repro

Reviewed By: pdillinger

Differential Revision: D30505277

Pulled By: ajkr

fbshipit-source-id: 5a99f34137cd14d06b0f624add6d37a70a61135d

4 years agoRefactor TraceAnalyzer to use `TraceRecord::Handler` to avoid casting. (#8678)
Merlin Mao [Tue, 24 Aug 2021 00:17:13 +0000 (17:17 -0700)]
Refactor TraceAnalyzer to use `TraceRecord::Handler` to avoid casting. (#8678)

Summary:
`TraceAnalyzer` privately inherits `TraceRecord::Handler` and `WriteBatch::Handler`.

`trace_analyzer_test` can pass with this change.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8678

Reviewed By: zhichao-cao

Differential Revision: D30459814

Pulled By: autopear

fbshipit-source-id: a27f59ac4600f7c3682830c9b1d9dc79e53425be

4 years agoAdd extra information to RemoteCompaction APIs (#8680)
Jay Zhuang [Mon, 23 Aug 2021 23:26:13 +0000 (16:26 -0700)]
Add extra information to RemoteCompaction APIs (#8680)

Summary:
Currently, we only provide job_id in RemoteCompaction APIs, the
main problem of `job_id` is it cannot uniquely identify a compaction job
between DB instances or between sessions.
Providing DB and session id to the user, which will make building cross
DB compaction service easier.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8680

Test Plan: unittest

Reviewed By: ajkr

Differential Revision: D30444859

Pulled By: jay-zhuang

fbshipit-source-id: fdf107f4286564049637f154193c6d94c3c59448