README

   1 About this Fork:
   2
   3 This is a fork of FFSB. The original source, licensed under GPL2, is available
   4 at http://sourceforge.net/projects/ffsb/ . This fork includes some bugfixes
   5 and some new features. As the original project seems dead, a fork was necessary
   6 to publish the changes.
   7
   8 Introduction:
   9
  10 The Flexible Filesystem Benchmark (FFSB) is a filesystem performance
  11 measurement tool.  It is a multi-threaded application (using
  12 pthreads), written entirely in C with cross-platform portability in
  13 mind.  It differs from other filesystem benchmarks in that the user
  14 may supply a profile to create custom workloads, while most other
  15 filesystem benchmarks use a fixed set of workloads.
  16
  17 As of version 5.1, it supports seven different basic operations,
  18 multiple groups of threads with different operation mixtures,
  19 operation across multiple filesystems, and filesystem aging prior to
  20 benchmarking.
  21
  22
  23 Differences from version 4.0 and older:
  24
  25 Version 5.0 and above represent almost a total re-write and many
  26 things have changed.  In version 5.0 and above FFSB moved to a
  27 time-regulated run versus doing a set number of different operations
  28 and timing the whole thing.  This is primarily to better deal with the
  29 use of multiple threadgroups which would otherwise not be synchronized
  30 at termination time.
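
The difference between a time-regulated run and a fixed operation count
can be sketched as follows.  This is a minimal single-threaded
illustration, not FFSB's actual code: run_timed, now_seconds, op_fn and
count_op are hypothetical names, and the use of a monotonic clock is an
assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical operation callback: performs one unit of work. */
typedef void (*op_fn)(void *ctx);

/* Current time on a monotonic clock, in seconds. */
double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Run op repeatedly until run_time seconds have elapsed and return the
 * number of completed operations.  Every caller given the same run_time
 * stops at roughly the same moment, whatever its operation mix. */
unsigned long run_timed(double run_time, op_fn op, void *ctx)
{
    assert(run_time > 0.0 && op != NULL);
    double deadline = now_seconds() + run_time;
    unsigned long done = 0;
    do {
        op(ctx);
        done++;
    } while (now_seconds() < deadline);
    return done;
}

/* Trivial demonstration operation: increment a counter. */
void count_op(void *ctx)
{
    (*(unsigned long *)ctx)++;
}
```

With this shape the result of a run is the number of operations finished
in the allotted time, rather than the time needed to finish a fixed
number of operations.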
  31
  32 Additionally, the FFSB configuration file format has changed in
  33 version 5.0, although we do support old-style configuration files
  34 along with a run-time passed on the command line.  In this mode,
  35 version 5.0 and above ignores the iterations parameter, and simply
  36 uses the time specified on the command line.
  37
  38 Behaviorally, most of the old operations are the same -- sequential
  39 reads and sequential writes work as they did before.  One change in
  40 version 5.0 is that the skip-read behavior (reading, seeking forward
  41 a fixed amount, then reading again) has been removed.  We now support
  42 fully randomized reads and writes from random offsets within the file.
  43
  44 Version 4.0 didn't support overwrites (only appends) so we interpret
  45 writes in old config files to be append operations.
  46
  47 On Linux, CPU utilization information will only be accurate for
  48 systems using NPTL.  Older Linuxthreads systems will probably only see
  49 zeros for CPU utilization because Linuxthreads is not POSIX-compliant.
  50 Version 4.0 and older could be recompiled to work on
  51 Linuxthreads, but in 5.0 and later we no longer support this.
  52
  53 We no longer support the "outputfile" argument on the command line.
  54
  55 One should simply use tee or similar to capture the output.  FFSB
  56 unbuffers standard out for this purpose, and errors are sent to
  57 standard error.
  58
  59 Global options:
  60
  61 There are eight valid global options placed at the beginning of the
  62 profile.  Three of them are required: num_filesystems (number of
  63 filesystems), num_threadgroups (number of threadgroups), and time
  64 (running time of the benchmark).  The other five options are:
  65
  66 directio   - each call to open will be made using O_DIRECT
  67 alignio    - aligns all block operations for random reads and writes
  68              on 4k boundaries.
  69 bufferedio - currently ignored: it is intended to use libc
  70              fread, fwrite instead of just unix read and write calls
  71 verbose    - currently ignored
  72
  73 callout    - calls an external command and waits for its termination
  74              before FFSB begins the benchmark phase.
  75              This is useful for synchronizing distributed clients,
  76              starting profilers, etc.
  77
  78 They must be specified in the above order (num_filesystems,
  79 num_threadgroups, time, directio, alignio, bufferedio, verbose,
  80 callout).
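
The callout behaviour (run an external command and wait for it to
finish before the benchmark phase) could look roughly like the sketch
below.  It assumes the command is handed to the shell through the
standard system() call; run_callout is a hypothetical name and this is
not taken from the FFSB source.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Run the callout command through the shell and block until it exits.
 * Returns 0 if the command ran and exited with status 0, -1 otherwise. */
int run_callout(const char *command)
{
    assert(command != NULL);
    int status = system(command);   /* returns only after the child terminates */
    if (status != 0) {
        fprintf(stderr, "callout '%s' failed (status %d)\n", command, status);
        return -1;
    }
    return 0;
}
```

Because the call blocks, a script that waits on a barrier shared by
several hosts is enough to line up distributed clients before the
benchmark phase starts.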
  81
  82
  83
  84 Filesystems:
  85
  86 Filesystems are specified to FFSB in the form of a directory.  FFSB
  87 assumes that the filesystem is mounted at this directory and will not
  88 do any verification of this fact beyond ensuring it can read/write to
  89 the location.  So be careful to ensure something with enough space to
  90 handle the dataset is in fact mounted at the specified location.
  91
  92 In the filesystem clause of the profile, one may set the starting
  93 number of files and directories as well as a minimum and maximum
  94 filesize for the filesystem.  One may also specify the blocksize
  95 used for creating the files separately in the filesystem clause.
  96
  97 Also, if a filesystem is to be aged, a special threadgroup clause may
  98 be embedded in a filesystem clause to specify the operation mixture
  99 and number of threads used to age the filesystem.  This threadgroup is
 100 run until filesystem utilization reaches the specified amount.
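
One way to picture the "run until utilization reaches the specified
amount" check is the sketch below, which derives utilization from the
POSIX statvfs() call.  Whether FFSB measures utilization this way (and
whether it uses free or available blocks) is an assumption;
fs_utilization and keep_aging are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/statvfs.h>

/* Fraction of the filesystem containing path that is in use, in the
 * range [0.0, 1.0], or -1.0 if it cannot be determined. */
double fs_utilization(const char *path)
{
    struct statvfs sv;
    assert(path != NULL);
    if (statvfs(path, &sv) != 0 || sv.f_blocks == 0)
        return -1.0;
    return 1.0 - (double)sv.f_bfree / (double)sv.f_blocks;
}

/* Nonzero while the aging threadgroup should keep running. */
int keep_aging(double current_util, double desired_util)
{
    return current_util >= 0.0 && current_util < desired_util;
}
```

For example, with a desired utilization of 0.20, aging would continue
while fs_utilization() reports less than 0.20 for the location.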
 101
 102 Inheritance --  if you are using multiple filesystems, all attributes
 103 except the location should be inherited from the previous filesystem.
 104 This is done to make it easier to add groups of similar filesystems.
 105 In this case, only the location is required in the filesystem clause.
 106
 107 As of version 5.1, filesystem re-use is supported if a given
 108 filesystem hasn't been modified beyond its original specifications
 109 (number of files and directories is correct, and file sizes are within
 110 specifications).  This can be a huge time saver if one wishes to do
 111 multiple runs on the same data-set without altering it during a run,
 112 because the fileset doesn't need to be recreated before each run.
 113
 114 To do this, specify "reuse=1" in the filesystem clause, and FFSB will
 115 verify the fileset first, and if it checks out it will use it.
 116 Otherwise, it will remove everything and re-create the filesets for
 117 that filesystem.
 118
 119 Threadgroups:
 120
 121 An arbitrary number of threadgroups with differing numbers of threads
 122 and operation mixes can be specified.  The operations are specified
 123 using a weighting for each operation; if an operation isn't specified,
 124 its weighting is assumed to be zero (not used).
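
Weighted selection of this kind can be sketched as below: an operation
with weight w is picked with probability w divided by the sum of all
weights, so a weight of zero is never picked.  pick_operation is a
hypothetical helper and the random value is supplied by the caller; how
FFSB itself draws random numbers is not shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Map a random draw r onto an operation index, where operation i is
 * chosen with probability weights[i] / sum(weights).
 * r may be any unsigned value; it is reduced modulo the total weight. */
size_t pick_operation(const unsigned *weights, size_t n, unsigned r)
{
    unsigned total = 0;
    for (size_t i = 0; i < n; i++)
        total += weights[i];
    assert(total > 0);              /* at least one weight must be nonzero */

    unsigned slot = r % total;
    for (size_t i = 0; i < n; i++) {
        if (slot < weights[i])
            return i;
        slot -= weights[i];
    }
    return n - 1;                   /* not reached when total > 0 */
}
```

With weights {4, 0, 1}, four out of every five draws select the first
operation, one selects the third, and the second is never selected.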
 125
 126 "Think-time" for a threadgroup may also be specified in millisecond
 127 amounts using the "op_delay" parameter, where every thread will wait
 128 for the specified amount between each operation.
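
A per-operation delay given in milliseconds could be implemented along
these lines.  The sketch assumes POSIX nanosleep(); ms_to_timespec and
op_delay_sleep are hypothetical names, not FFSB functions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Convert a delay in milliseconds to a struct timespec. */
struct timespec ms_to_timespec(unsigned delay_ms)
{
    struct timespec ts;
    ts.tv_sec = delay_ms / 1000;
    ts.tv_nsec = (long)(delay_ms % 1000) * 1000000L;
    assert(ts.tv_nsec < 1000000000L);
    return ts;
}

/* Sleep for delay_ms milliseconds between operations; 0 means no delay. */
void op_delay_sleep(unsigned delay_ms)
{
    if (delay_ms == 0)
        return;
    struct timespec ts = ms_to_timespec(delay_ms);
    /* May return early if interrupted by a signal; this sketch does not retry. */
    (void)nanosleep(&ts, NULL);
}
```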
 129
 130 Operations:
 131
 132 All operations begin by randomly selecting a filesystem from the list
 133 of filesystems specified in the profile.  The distribution aims to be
 134 uniform across all filesystems.
 135
 136
 137 The seven operations are:
 138
 139 reads  - read() calls with an overall amount and a blocksize;
 140          operates on existing files.  Care must be taken to ensure
 141          that the read amount is smaller than the size of any possible
 142          file.
 143
 144          If read_random is specified, then each individual block
 145          will be read starting from a random point within the file, and
 146          this will continue until the entire amount specified has been
 147          read.  The offset of each random block will be totally
 148          random to the byte level, unless the "alignio" global parameter
 149          is on, in which case the reads will be 4096-byte aligned.  Turning
 150          alignio on is generally recommended.
 151
 152
 153 readall - Very similar to reads above, except it doesn't take an
 154           amount; it simply reads the entire file sequentially using the
 155           read_blocksize.   This is useful for situations where
 156           different filesystems have differently sized files, and sequential
 157           read patterns across all filesystems are desired.
 158
 159 writes - write() calls with an overall amount and blocksize;
 160          this is an overwrite operation and will not enlarge an existing
 161          file.  Again, one must be careful not to specify a write amount
 162          that is larger than any possible file in the data set.
 163
 164          If write_random is specified, then each individual block
 165          will be written starting from a random point within the file, and
 166          this will continue until the entire amount specified has been
 167          written out.  The offset of each random block will be totally
 168          random to the byte level, unless the "alignio" global parameter
 169          is on, in which case the writes will be 4096-byte aligned.  Turning
 170          alignio on is generally recommended.
 171
 172          If the fsync_file parameter for the threadgroup is non-zero,
 173          then after all of the write calls are finished, fsync() will
 174          be called on the file descriptor before the file is closed.
 175
 176
 177 creates - creates a file using an open() call and determines the size
 178           randomly within the constraints (min_filesize and
 179           max_filesize) for the selected filesystem. Write operations will
 180           be done using the same blocksize as is specified for the
 181           write operation.
 182 deletes - calls unlink() on a filename and removes it from the
 183           internal data-structures.  One must be careful to ensure
 184           there are enough files to delete at all times or else the benchmark
 185           will terminate.
 186 appends - calls write() using the append flag with an overall amount
 187           and a blocksize to be appended onto a randomly chosen file.
 188 metas   - this is actually a mix of several different directory
 189           operations.  Each "meta" operation consists of two directory
 190           creates, one directory remove, and a directory rename.
 191           These operations are all carried out separately from the
 192           other six operations.
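
For the random read and write modes described above, choosing the
starting offset of each block can be sketched as follows.  The helper
pick_block_offset is hypothetical; it assumes the block must fit
entirely inside the file and that alignment rounds the offset down to a
4096-byte boundary.

```c
#include <assert.h>
#include <stdint.h>

#define ALIGN_BYTES 4096u   /* alignment described for "alignio" */

/* Choose the starting offset of one block inside a file.
 * rnd is a caller-supplied random value; the result always leaves room
 * for a full block, i.e. offset + blocksize <= filesize. */
uint64_t pick_block_offset(uint64_t filesize, uint64_t blocksize,
                           uint64_t rnd, int alignio)
{
    assert(blocksize > 0 && blocksize <= filesize);
    uint64_t max_start = filesize - blocksize;   /* last valid start */
    uint64_t offset = rnd % (max_start + 1);
    if (alignio)
        offset -= offset % ALIGN_BYTES;          /* round down to 4 KiB */
    return offset;
}
```

Without alignment a draw of 5000 in a 65536-byte file starts the block
at byte 5000; with alignment the same draw starts it at byte 4096.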
 193
 194 Operation accounting:
 195
 196 Each operation which uses a blocksize counts each read/write of a
 197 blocksize as an operation (reads,writes,creates, and appends) whereas
 198 deletes and metas are considered single operations.
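
As a worked example of this accounting, a read with read_size=40960 and
read_blocksize=4096 is counted as 10 operations, while one delete is
counted as 1.  The helper below is a hypothetical sketch; counting a
trailing partial block as one operation is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Number of operations counted for one read, write, create or append
 * that moves `amount` bytes in `blocksize` chunks.  A trailing partial
 * block is counted as one operation (an assumption of this sketch). */
uint64_t counted_ops(uint64_t amount, uint64_t blocksize)
{
    assert(blocksize > 0);
    return (amount + blocksize - 1) / blocksize;
}
```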
 199
 200 Running the benchmark:
 201
 202 There are three phases to running the benchmark: aging, fileset
 203 creation, and the benchmark phase.
 204
 205 The create phase is carried out across all filesystems simultaneously
 206 with one dedicated thread per filesystem.
 207
 208 After the create phase, sync() is called to ensure all dirty data gets
 209 written out before the benchmark phase begins, and sync() is again
 210 called at the end of the benchmark phase.  The time in sync() at the
 211 end of the benchmark phase is counted as part of the benchmark phase.
 212
 213 Caveats/Holes/Bugs:
 214
 215 Aging and aging across multiple filesystems simultaneously hasn't been tested
 216 very much.
 217
 218 If *any* I/O operation or system call/libc call fails, the benchmark
 219 will terminate immediately.
 220
 221 The parser doesn't handle malformed or incorrect profiles very well
 222 (or at all).
 223
 224 The parser doesn't check to make sure all of the appropriate options
 225 have been specified.  For example, if writes are specified in a
 226 threadgroup but write_blocksize isn't specified, the parser won't catch
 227 it, but the benchmark run will fail later on.
 228
 229
 230 Configuration Files (new style):
 231
 232 New-style configuration allows for arbitrary newlines between lines,
 233 and comments using '#' at the start of a line.  It also allows tabs
 234 and spaces before and after configuration parameters.
 235
 236 The new style configuration file is broken up into three main parts:
 237
 238 global parameters, filesystems, and threadgroups
 239
 240 The sections must be in the above order.
 241
 242 Global parameters:
 243
 244 Global parameters are described above; the first three are always
 245 required. Example:
 246
 247 ----------
 248
 249 num_filesystems=1
 250 num_threadgroups=1
 251 time=30                 # time is in seconds
 252
 253 directio=0              # don't use direct io
 254 alignio=1               # align random IOs to 4k
 255 bufferedio=0            # this does nothing right now
 256 verbose=0               # this does nothing right now
 257
 258                         # calls an external command and waits
 259                         # everything until the newline is taken
 260                         # so you can have arbitrary parameters
 261 callout=synchronize.sh myhostname
 262
 263 ---------
 264
 265 All of these must appear in this order, though you can leave out the
 266 optional ones.
 267
 268 Filesystems:
 269
 270 Filesystems describe different logical sets of files residing in
 271 different directories.  There is no strict requirement that they
 272 actually be on different filesystems, only that the directory
 273 specified already exists.
 274
 275 Filesystems are specified by a clause with a filesystem number like
 276 this:
 277
 278 [filesystem0]
 279         location=/mnt/testing/
 280         num_files=10
 281         num_dirs=1
 282         max_filesize=4096
 283         min_filesize=4096
 284 [end0]
 285
 286
 287 The clause must always begin with [filesystemX] and end with [endX]
 288 where X is the number of that filesystem.
 289
 290 You should start with X = 0, and increment by one for each following
 291 filesystem.  If they are out of order, things will likely break.
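
Since the parser is described as fragile, a profile can be checked
before a run with something like the sketch below, which recognises a
[filesystemX] header and confirms that the numbers appear in sequence.
Both function names are hypothetical and this is not how FFSB's own
parser is written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* If line is "[filesystemN]" (leading whitespace allowed) return N,
 * otherwise return -1. */
int parse_filesystem_header(const char *line)
{
    int index = -1;
    int consumed = 0;
    assert(line != NULL);
    /* %n is stored only if everything before it, including ']', matched. */
    if (sscanf(line, " [filesystem%d]%n", &index, &consumed) == 1
        && consumed > 0 && index >= 0)
        return index;
    return -1;
}

/* Headers are expected to appear as 0, 1, 2, ... in file order. */
int header_in_sequence(const char *line, int expected)
{
    return parse_filesystem_header(line) == expected;
}
```

A checker would keep a counter starting at 0, call header_in_sequence()
on each header line, and increment the counter after every match.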
 292
 293 The required information for each filesystem is: location, num_files,
 294 num_dirs, max_filesize, and min_filesize.  Beyond those the following
 295 four options are supported:
 296
 297
 298
 299 reuse=1 # check the filesystem to see if it is reusable
 300
 301         # filesystem aging, three components required
 302         # takes agefs=1 to turn it on
 303         # then a valid threadgroup specification
 304         # then a desired utilization percentage
 305
 306 agefs=1 # age the filesystem according to the following threadgroup
 307         [threadgroup0]
 308                 num_threads=10
 309                 write_size=40960
 310                 write_blocksize=4096
 311                 create_weight=10
 312                 append_weight=10
 313                 delete_weight=1
 314         [end0]
 315 desired_util=0.20       # In this case, age until the fs is 20% full
 316
 317 create_blocksize=4096   # specify the blocksize to write()
 318                         # for creating the fileset, defaults to 4096
 319
 320 age_blocksize=4096      # specify the blocksize to write() for aging
 321
 322
 323 Also, to allow lazy people to use lots of filesystems, we support
 324 filesystem inheritance, which simply copies all options but the
 325 location from the previous filesystem clause if nothing is specified.
 326 Obviously, this doesn't work for filesystem0. (May not work for aging
 327 either?)
 328
 329 Full-blown filesystem clause example:
 330
 331 ----
 332
 333 [filesystem0]
 334
 335         # required parts
 336
 337         location=/home/sonny/tmp
 338         num_files=100
 339         num_dirs=100
 340         max_filesize=65536
 341         min_filesize=4096
 342
 343         # aging part
 344         agefs=0
 345         [threadgroup0]
 346                 num_threads=10
 347                 write_size=40960
 348                 write_blocksize=4096
 349                 create_weight=10
 350                 append_weight=10
 351                 delete_weight=1
 352         [end0]
 353                 desired_util=0.02       # age until 2% full
 354
 355         # other optional commands
 356
 357         create_blocksize=1024           # use a small create blocksize
 358         age_blocksize=1024              # and a small aging blocksize
 359         reuse=0                         # don't reuse it
 360 [end0]
 361
 362
 363
 364 --
 365
 366 Threadgroups:
 367
 368 Threadgroups are very similar to filesystems in that any number of
 369 them can be specified in clauses, and they must be in order starting
 370 with threadgroup0.
 371
 372 Example:
 373
 374 ---
 375
 376 [threadgroup0]
 377         num_threads=32
 378         read_weight=4
 379         append_weight=1
 380
 381         write_size=4096
 382         write_blocksize=4096
 383
 384         read_size=4096
 385         read_blocksize=4096
 386 [end0]
 387
 388 ---
 389
 390 In a threadgroup clause, num_threads is required and must be at least
 391 1.  Then, at least one operation must be given a weight greater than 0
 392 to be a valid threadgroup.  Operations can be given a weighting of 0,
 393 and in this case they are ignored.
 394
 395 Certain operations will also require other commands, for example, if
 396 read_weight is greater than zero, then one must also include a
 397 read_size and a read_blocksize.  Here's the table of requirements and
 398 options:
 399
 400
 401 Operation          Requirements                           Options
 402 --                 --                                     --
 403 read_weight        read_size, read_blocksize              read_random
 404 readall_weight     read_blocksize                         none
 405 write_weight       write_size, write_blocksize            write_random, fsync_file
 406 create_weight      write_blocksize or create_blocksize    none
 407 append_weight      write_blocksize, write_size            none
 408 delete_weight      none                                   none
 409 meta_weight        none                                   none
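
The requirements in the table can be written down as a small check,
which may be handy because the parser itself does not verify them.  The
struct and function below are hypothetical: they model a threadgroup
clause in memory with 0 meaning "not set", and are not FFSB's own data
structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-memory form of a threadgroup clause; 0 means "not set". */
struct threadgroup_cfg {
    unsigned num_threads;
    unsigned read_weight, readall_weight, write_weight, create_weight,
             append_weight, delete_weight, meta_weight;
    unsigned read_size, read_blocksize;
    unsigned write_size, write_blocksize, create_blocksize;
};

/* Return NULL if the clause satisfies the table, else a short message. */
const char *check_threadgroup(const struct threadgroup_cfg *tg)
{
    assert(tg != NULL);
    if (tg->num_threads < 1)
        return "num_threads must be at least 1";
    if (!(tg->read_weight || tg->readall_weight || tg->write_weight ||
          tg->create_weight || tg->append_weight || tg->delete_weight ||
          tg->meta_weight))
        return "at least one operation needs a weight greater than 0";
    if (tg->read_weight && !(tg->read_size && tg->read_blocksize))
        return "read_weight requires read_size and read_blocksize";
    if (tg->readall_weight && !tg->read_blocksize)
        return "readall_weight requires read_blocksize";
    if (tg->write_weight && !(tg->write_size && tg->write_blocksize))
        return "write_weight requires write_size and write_blocksize";
    if (tg->create_weight && !(tg->write_blocksize || tg->create_blocksize))
        return "create_weight requires write_blocksize or create_blocksize";
    if (tg->append_weight && !(tg->write_size && tg->write_blocksize))
        return "append_weight requires write_size and write_blocksize";
    return NULL;   /* delete_weight and meta_weight need nothing else */
}
```

Applied to the example threadgroup0 above (read_weight=4,
append_weight=1, with both sizes and blocksizes set to 4096) the check
passes; removing write_blocksize from that clause makes it fail on the
append_weight rule.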
 410
 411
 412
 413 Other threadgroup options:
 414
 415 op_delay=10  # specify a wait between operations in milliseconds
 416
 417 bindfs=3     # This allows you to restrict a threadgroup's operation
 418              # to a specific filesystem number.  Currently only
 419              # binding to one specific filesystem is supported
 420