[ Upstream commit 919edbadebe17a67193533f531c2920c03e40fa4 ]
Jan Kara reported a performance regression in dbench that he
bisected down to commit bad77c375e ("xfs: CIL checkpoint
flushes caches unconditionally").
Whilst developing the journal flush/fua optimisations this cache was
part of, it appeared to made a significant difference to
performance. However, now that this patchset has settled and all the
correctness issues fixed, there does not appear to be any
significant performance benefit to asynchronous cache flushes.
In fact, the opposite is true on some storage types and workloads,
where additional cache flushes that can occur from fsync heavy
workloads have measurable and significant impact on overall
throughput.
Local dbench testing shows little difference on dbench runs with
sync vs async cache flushes on either fast or slow SSD storage, and
no difference in streaming concurrent async transaction workloads
like fs-mark.
Fast NVME storage.
>From `dbench -t 30`, CIL scale:
clients async sync
BW Latency BW Latency
1 935.18 0.855 915.64 0.903
8 2404.51 6.873 2341.77 6.511
16 3003.42 6.460 2931.57 6.529
32 3697.23 7.939 3596.28 7.894
128 7237.43 15.495 7217.74 11.588
512 5079.24 90.587 5167.08 95.822
fsmark, 32 threads, create w/ 64 byte xattr w/32k logbsize
create chown unlink
async 1m41s 1m16s 2m03s
sync 1m40s 1m19s 1m54s
Slower SATA SSD storage:
>From `dbench -t 30`, CIL scale:
clients async sync
BW Latency BW Latency
1 78.59 15.792 83.78 10.729
8 367.88 92.067 404.63 59.943
16 564.51 72.524 602.71 76.089
32 831.66 105.984 870.26 110.482
128 1659.76 102.969 1624.73 91.356
512 2135.91 223.054 2603.07 161.160
fsmark, 16 threads, create w/32k logbsize
create unlink
async 5m06s 4m15s
sync 5m00s 4m22s
And on Jan's test machine:
5.18-rc8-vanilla 5.18-rc8-patched
Amean 1 71.22 ( 0.00%) 64.94 * 8.81%*
Amean 2 93.03 ( 0.00%) 84.80 * 8.85%*
Amean 4 150.54 ( 0.00%) 137.51 * 8.66%*
Amean 8 252.53 ( 0.00%) 242.24 * 4.08%*
Amean 16 454.13 ( 0.00%) 439.08 * 3.31%*
Amean 32 835.24 ( 0.00%) 829.74 * 0.66%*
Amean 64 1740.59 ( 0.00%) 1686.73 * 3.09%*
Performance and cache flush behaviour is restored to pre-regression
levels.
As such, we can now consider the async cache flush mechanism an
unnecessary exercise in premature optimisation and hence we can
now remove it and the infrastructure it requires completely.
Fixes: bad77c375e ("xfs: CIL checkpoint flushes caches unconditionally")
Reported-and-tested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The new checkpoint cache flush mechanism requires us to issue an
unconditional cache flush before we start a new checkpoint. We don't
want to block for this if we can help it, and we have a fair chunk
of CPU work to do between starting the checkpoint and issuing the
first journal IO.
Hence it makes sense to amortise the latency cost of the cache flush
by issuing it asynchronously and then waiting for it only when we
need to issue the first IO in the transaction.
To do this, we need async cache flush primitives to submit the cache
flush bio and to wait on it. The block layer has no such primitives
for filesystems, so roll our own for the moment.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
It's often inconvenient to use BIO_MAX_PAGES due to min() requiring the
sign to be the same. Introduce bio_max_segs() and change BIO_MAX_PAGES to
be unsigned to make it easier for the users.
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We need to chain the earlier bios to the later ones, so that
submit_bio_wait waits on the bio that all the completions are
dispatched to.
Fixes: 6ad5b3255b ("xfs: use bios directly to read and write the log recovery buffers")
Reported-by: Dave Chinner <david@fromorbit.com>
Tested-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
The xfs_buf structure is basically used as a glorified container for
a memory allocation in the log recovery code. Replace it with a
call to kmem_alloc_large and a simple abstraction to read into or
write from it synchronously using chained bios.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>