Currently, commit e9e2eae89d dropped a (int) decoration from
XFS_LITINO(mp), and since sizeof() expression is also involved,
the result of XFS_LITINO(mp) is simply as the size_t type
(commonly unsigned long).
Considering the expression in xfs_attr_shortform_bytesfit():
offset = (XFS_LITINO(mp) - bytes) >> 3;
let "bytes" be (int)340, and
"XFS_LITINO(mp)" be (unsigned long)336.
on 64-bit platform, the expression is
offset = ((unsigned long)336 - (int)340) >> 3 =
(int)(0xfffffffffffffffcUL >> 3) = -1
but on 32-bit platform, the expression is
offset = ((unsigned long)336 - (int)340) >> 3 =
(int)(0xfffffffcUL >> 3) = 0x1fffffff
instead.
so offset becomes a large positive number on 32-bit platform, and
cause xfs_attr_shortform_bytesfit() returns maxforkoff rather than 0.
Therefore, one result is
"ASSERT(new_size <= XFS_IFORK_SIZE(ip, whichfork));"
assertion failure in xfs_idata_realloc(), which was also the root
cause of the original bugreport from Dennis, see:
https://bugzilla.redhat.com/show_bug.cgi?id=1894177
And it can also be manually triggered with the following commands:
$ touch a;
$ setfattr -n user.0 -v "`seq 0 80`" a;
$ setfattr -n user.1 -v "`seq 0 80`" a
on 32-bit platform.
Fix the case in xfs_attr_shortform_bytesfit() by bailing out
"XFS_LITINO(mp) < bytes" in advance suggested by Eric and a misleading
comment together with this bugfix suggested by Darrick. It seems the
other users of XFS_LITINO(mp) are not impacted.
Fixes: e9e2eae89d ("xfs: only check the superblock version for dinode size calculation")
Cc: <stable@vger.kernel.org> # 5.7+
Reported-and-tested-by: Dennis Gilmore <dgilmore@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Teach the directory scrubber to check all the bestfree entries,
including the null ones. We want to be able to detect the case where
the entry is null but there actually /is/ a directory data block.
Found by fuzzing lbests[0] = ones in xfs/391.
Fixes: df481968f3 ("xfs: scrub directory freespace")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
We always know the correct state of the rmap record flags (attr, bmbt,
unwritten) so check them by direct comparison.
Fixes: d852657ccf ("xfs: cross-reference reverse-mapping btree")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
The comment and logic in xchk_btree_check_minrecs for dealing with
inode-rooted btrees isn't quite correct. While the direct children of
the inode root are allowed to have fewer records than what would
normally be allowed for a regular ondisk btree block, this is only true
if there is only one child block and the number of records don't fit in
the inode root.
Fixes: 08a3a692ef ("xfs: btree scrub should check minrecs")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
We also need to drop the iolock when invalidate_inode_pages2 fails, not
only on all other error or successful cases.
Fixes: 527851124d ("xfs: implement pNFS export operations")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Fix some serious WTF in the reference count scrubber's rmap fragment
processing. The code comment says that this loop is supposed to move
all fragment records starting at or before bno onto the worklist, but
there's no obvious reason why nr (the number of items added) should
increment starting from 1, and breaking the loop when we've added the
target number seems dubious since we could have more rmap fragments that
should have been added to the worklist.
This seems to manifest in xfs/411 when adding one to the refcount field.
Fixes: dbde19da96 ("xfs: cross-reference the rmapbt data with the refcountbt")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Keys for extent interval records in the reverse mapping btree are
supposed to be computed as follows:
(physical block, owner, fork, is_btree, is_unwritten, offset)
This provides users the ability to look up a reverse mapping from a bmbt
record -- start with the physical block; then if there are multiple
records for the same block, move on to the owner; then the inode fork
type; and so on to the file offset.
However, the key comparison functions incorrectly remove the
fork/btree/unwritten information that's encoded in the on-disk offset.
This means that lookup comparisons are only done with:
(physical block, owner, offset)
This means that queries can return incorrect results. On consistent
filesystems this hasn't been an issue because blocks are never shared
between forks or with bmbt blocks; and are never unwritten. However,
this bug means that online repair cannot always detect corruption in the
key information in internal rmapbt nodes.
Found by fuzzing keys[1].attrfork = ones on xfs/371.
Fixes: 4b8ed67794 ("xfs: add rmap btree operations")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
When the bmbt scrubber is looking up rmap extents, we need to set the
extent flags from the bmbt record fully. This will matter once we fix
the rmap btree comparison functions to check those flags correctly.
Fixes: d852657ccf ("xfs: cross-reference reverse-mapping btree")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Pass the same oldext argument (which contains the existing rmapping's
unwritten state) to xfs_rmap_lookup_le_range at the start of
xfs_rmap_convert_shared. At this point in the code, flags is zero,
which means that we perform lookups using the wrong key.
Fixes: 3f165b334e ("xfs: convert unwritten status of reverse mappings for shared files")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
There's no reason to flush an entire file when we're unsharing part of
a file. Therefore, only initiate writeback on the selected range.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
The kernel has always allowed directories to have the rtinherit flag
set, even if there is no rt device, so this check is wrong.
Fixes: 80e4e12688 ("xfs: scrub inodes")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
In commit 7588cbeec6, we tried to fix a race stemming from the lack of
coordination between higher level code that wants to allocate and remap
CoW fork extents into the data fork. Christoph cites as examples the
always_cow mode, and a directio write completion racing with writeback.
According to the comments before the goto retry, we want to restart the
lookup to catch the extent in the data fork, but we don't actually reset
whichfork or cow_fsb, which means the second try executes using stale
information. Up until now I think we've gotten lucky that either
there's something left in the CoW fork to cause cow_fsb to be reset, or
either data/cow fork sequence numbers have advanced enough to force a
fresh lookup from the data fork. However, if we reach the retry with an
empty stable CoW fork and a stable data fork, neither of those things
happens. The retry foolishly re-calls xfs_convert_blocks on the CoW
fork which fails again. This time, we toss the write.
I've recently been working on extending reflink to the realtime device.
When the realtime extent size is larger than a single block, we have to
force the page cache to CoW the entire rt extent if a write (or
fallocate) are not aligned with the rt extent size. The strategy I've
chosen to deal with this is derived from Dave's blocksize > pagesize
series: dirtying around the write range, and ensuring that writeback
always starts mapping on an rt extent boundary. This has brought this
race front and center, since generic/522 blows up immediately.
However, I'm pretty sure this is a bug outright, independent of that.
Fixes: 7588cbeec6 ("xfs: retry COW fork delalloc conversion when no extent was found")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
The iomap writepage error handling logic is a mash of old and
slightly broken XFS writepage logic. When keepwrite writeback state
tracking was introduced in XFS in commit 0d085a529b ("xfs: ensure
WB_SYNC_ALL writeback handles partial pages correctly"), XFS had an
additional cluster writeback context that scanned ahead of
->writepage() to process dirty pages over the current ->writepage()
extent mapping. This context expected a dirty page and required
retention of the TOWRITE tag on partial page processing so the
higher level writeback context would revisit the page (in contrast
to ->writepage(), which passes a page with the dirty bit already
cleared).
The cluster writeback mechanism was eventually removed and some of
the error handling logic folded into the primary writeback path in
commit 150d5be09c ("xfs: remove xfs_cancel_ioend"). This patch
accidentally conflated the two contexts by using the keepwrite logic
in ->writepage() without accounting for the fact that the page is
not dirty. Further, the keepwrite logic has no practical effect on
the core ->writepage() caller (write_cache_pages()) because it never
revisits a page in the current function invocation.
Technically, the page should be redirtied for the keepwrite logic to
have any effect. Otherwise, write_cache_pages() may find the tagged
page but will skip it since it is clean. Even if the page was
redirtied, however, there is still no practical effect to keepwrite
since write_cache_pages() does not wrap around within a single
invocation of the function. Therefore, the dirty page would simply
end up retagged on the next writeback sequence over the associated
range.
All that being said, none of this really matters because redirtying
a partially processed page introduces a potential infinite redirty
-> writeback failure loop that deviates from the current design
principle of clearing the dirty state on writepage failure to avoid
building up too much dirty, unreclaimable memory on the system.
Therefore, drop the spurious keepwrite usage and dirty state
clearing logic from iomap_writepage_map(), treat the partially
processed page the same as a fully processed page, and let the
imminent ioend failure clean up the writeback state.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
iomap writeback mapping failure only calls into ->discard_page() if
the current page has not been added to the ioend. Accordingly, the
XFS callback assumes a full page discard and invalidation. This is
problematic for sub-page block size filesystems where some portion
of a page might have been mapped successfully before a failure to
map a delalloc block occurs. ->discard_page() is not called in that
error scenario and the bio is explicitly failed by iomap via the
error return from ->prepare_ioend(). As a result, the filesystem
leaks delalloc blocks and corrupts the filesystem block counters.
Since XFS is the only user of ->discard_page(), tweak the semantics
to invoke the callback unconditionally on mapping errors and provide
the file offset that failed to map. Update xfs_discard_page() to
discard the corresponding portion of the file and pass the range
along to iomap_invalidatepage(). The latter already properly handles
both full and sub-page scenarios by not changing any iomap or page
state on sub-page invalidations.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
It is possible to expose non-zeroed post-EOF data in XFS if the new
EOF page is dirty, backed by an unwritten block and the truncate
happens to race with writeback. iomap_truncate_page() will not zero
the post-EOF portion of the page if the underlying block is
unwritten. The subsequent call to truncate_setsize() will, but
doesn't dirty the page. Therefore, if writeback happens to complete
after iomap_truncate_page() (so it still sees the unwritten block)
but before truncate_setsize(), the cached page becomes inconsistent
with the on-disk block. A mapped read after the associated page is
reclaimed or invalidated exposes non-zero post-EOF data.
For example, consider the following sequence when run on a kernel
modified to explicitly flush the new EOF page within the race
window:
$ xfs_io -fc "falloc 0 4k" -c fsync /mnt/file
$ xfs_io -c "pwrite 0 4k" -c "truncate 1k" /mnt/file
...
$ xfs_io -c "mmap 0 4k" -c "mread -v 1k 8" /mnt/file
00000400: 00 00 00 00 00 00 00 00 ........
$ umount /mnt/; mount <dev> /mnt/
$ xfs_io -c "mmap 0 4k" -c "mread -v 1k 8" /mnt/file
00000400: cd cd cd cd cd cd cd cd ........
Update xfs_setattr_size() to explicitly flush the new EOF page prior
to the page truncate to ensure iomap has the latest state of the
underlying block.
Fixes: 68a9f5e700 ("xfs: implement iomap based buffered write path")
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Make sure that we actually initialize xefi_discard when we're scheduling
a deferred free of an AGFL block. This was (eventually) found by the
UBSAN while I was banging on realtime rmap problems, but it exists in
the upstream codebase. While we're at it, rearrange the structure to
reduce the struct size from 64 to 56 bytes.
Fixes: fcb762f5de ("xfs: add bmapi nodiscard flag")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Pull more cifs updates from Steve French:
"Add support for stat of various special file types (WSL reparse points
for char, block, fifo)"
* tag '5.10-rc-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
cifs: update internal module version number
smb3: add some missing definitions from MS-FSCC
smb3: remove two unused variables
smb3: add support for stat of WSL reparse points for special file types
Pull io_uring fixes from Jens Axboe:
- fsize was missed in previous unification of work flags
- Few fixes cleaning up the flags unification creds cases (Pavel)
- Fix NUMA affinities for completely unplugged/replugged node for io-wq
- Two fallout fixes from the set_fs changes. One local to io_uring, one
for the splice entry point that io_uring uses.
- Linked timeout fixes (Pavel)
- Removal of ->flush() ->files work-around that we don't need anymore
with referenced files (Pavel)
- Various cleanups (Pavel)
* tag 'io_uring-5.10-2020-10-24' of git://git.kernel.dk/linux-block:
splice: change exported internal do_splice() helper to take kernel offset
io_uring: make loop_rw_iter() use original user supplied pointers
io_uring: remove req cancel in ->flush()
io-wq: re-set NUMA node affinities if CPUs come online
io_uring: don't reuse linked_timeout
io_uring: unify fsize with def->work_flags
io_uring: fix racy REQ_F_LINK_TIMEOUT clearing
io_uring: do poll's hash_node init in common code
io_uring: inline io_poll_task_handler()
io_uring: remove extra ->file check in poll prep
io_uring: make cached_cq_overflow non atomic_t
io_uring: inline io_fail_links()
io_uring: kill ref get/drop in personality init
io_uring: flags-based creds init in queue
Pull misc vfs updates from Al Viro:
"Assorted stuff all over the place (the largest group here is
Christoph's stat cleanups)"
* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs: remove KSTAT_QUERY_FLAGS
fs: remove vfs_stat_set_lookup_flags
fs: move vfs_fstatat out of line
fs: implement vfs_stat and vfs_lstat in terms of vfs_fstatat
fs: remove vfs_statx_fd
fs: omfs: use kmemdup() rather than kmalloc+memcpy
[PATCH] reduce boilerplate in fsid handling
fs: Remove duplicated flag O_NDELAY occurring twice in VALID_OPEN_FLAGS
selftests: mount: add nosymfollow tests
Add a "nosymfollow" mount option.
Pull xfs fixes from Darrick Wong:
"Two bug fixes that trickled in during the merge window:
- Make fallocate check the alignment of its arguments against the
fundamental allocation unit of the volume the file lives on, so
that we don't trigger the fs' alignment checks.
- Cancel unprocessed log intents immediately when log recovery fails,
to avoid a log deadlock"
* tag 'xfs-5.10-merge-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: cancel intents immediately if process_intents fails
xfs: fix fallocate functions when rtextsize is larger than 1
Add some structures and defines that were recently added to
the protocol documentation (see MS-FSCC sections 2.3.29-2.3.34).
Signed-off-by: Steve French <stfrench@microsoft.com>
Fix two unused variables in commit
"add support for stat of WSL reparse points for special file types"
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Pull gfs2 updates from Andreas Gruenbacher:
- Use iomap for non-journaled buffered I/O. This largely eliminates
buffer heads on filesystems where the block size matches the page
size. Many thanks to Christoph Hellwig for this patch!
- Fixes for some more journaled data filesystem bugs, found by running
xfstests with data journaling on for all files (chattr +j $MNT) (Bob
Peterson)
- gfs2_evict_inode refactoring (Bob Peterson)
- Use the statfs data in the journal during recovery instead of reading
it in from the local statfs inodes (Abhi Das)
- Several other minor fixes by various people
* tag 'gfs2-for-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: (30 commits)
gfs2: Recover statfs info in journal head
gfs2: lookup local statfs inodes prior to journal recovery
gfs2: Add fields for statfs info in struct gfs2_log_header_host
gfs2: Ignore subsequent errors after withdraw in rgrp_go_sync
gfs2: Eliminate gl_vm
gfs2: Only access gl_delete for iopen glocks
gfs2: Fix comments to glock_hash_walk
gfs2: eliminate GLF_QUEUED flag in favor of list_empty(gl_holders)
gfs2: Ignore journal log writes for jdata holes
gfs2: simplify gfs2_block_map
gfs2: Only set PageChecked if we have a transaction
gfs2: don't lock sd_ail_lock in gfs2_releasepage
gfs2: make gfs2_ail1_empty_one return the count of active items
gfs2: Wipe jdata and ail1 in gfs2_journal_wipe, formerly gfs2_meta_wipe
gfs2: enhance log_blocks trace point to show log blocks free
gfs2: add missing log_blocks trace points in gfs2_write_revokes
gfs2: rename gfs2_write_full_page to gfs2_write_jdata_page, remove parm
gfs2: add validation checks for size of superblock
gfs2: use-after-free in sysfs deregistration
gfs2: Fix NULL pointer dereference in gfs2_rgrp_dump
...
Pull cifs updates from Steve French:
- add support for recognizing special file types (char/block/fifo/
symlink) for files created by Linux on WSL (a format we plan to move
to as the default for creating special files on Linux, as it has
advantages over the other current option, the SFU format) in readdir.
- fix double queries to root directory when directory leases not
supported (e.g. Samba)
- fix querying mode bits (modefromsid mount option) for special file
types
- stronger encryption (gcm256), disabled by default until tested more
broadly
- allow querying owner when server reports 'well known SID' on query
dir with SMB3.1.1 POSIX extensions
* tag '5.10-rc-smb3-fixes-part1' of git://git.samba.org/sfrench/cifs-2.6: (30 commits)
SMB3: add support for recognizing WSL reparse tags
cifs: remove bogus debug code
smb3.1.1: fix typo in compression flag
cifs: move smb version mount options into fs_context.c
cifs: move cache mount options to fs_context.ch
cifs: move security mount options into fs_context.ch
cifs: add files to host new mount api
smb3: do not try to cache root directory if dir leases not supported
smb3: fix stat when special device file and mounted with modefromsid
cifs: Print the address and port we are connecting to in generic_ip_connect()
SMB3: Resolve data corruption of TCP server info fields
cifs: make const array static, makes object smaller
SMB3.1.1: Fix ids returned in POSIX query dir
smb3: add dynamic trace point to trace when credits obtained
smb3.1.1: do not fail if no encryption required but server doesn't support it
cifs: Return the error from crypt_message when enc/dec key not found.
smb3.1.1: set gcm256 when requested
smb3.1.1: rename nonces used for GCM and CCM encryption
smb3.1.1: print warning if server does not support requested encryption type
smb3.1.1: add new module load parm enable_gcm_256
...
Pull clone/dedupe/remap code refactoring from Darrick Wong:
"Move the generic file range remap (aka reflink and dedupe) functions
out of mm/filemap.c and fs/read_write.c and into fs/remap_range.c to
reduce clutter in the first two files"
* tag 'vfs-5.10-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
vfs: move the generic write and copy checks out of mm
vfs: move the remap range helpers to remap_range.c
vfs: move generic_remap_checks out of mm
Pull arch task_work cleanups from Jens Axboe:
"Two cleanups that don't fit other categories:
- Finally get the task_work_add() cleanup done properly, so we don't
have random 0/1/false/true/TWA_SIGNAL confusing use cases. Updates
all callers, and also fixes up the documentation for
task_work_add().
- While working on some TIF related changes for 5.11, this
TIF_NOTIFY_RESUME cleanup fell out of that. Remove some arch
duplication for how that is handled"
* tag 'arch-cleanup-2020-10-22' of git://git.kernel.dk/linux-block:
task_work: cleanup notification modes
tracehook: clear TIF_NOTIFY_RESUME in tracehook_notify_resume()
Apply the outstanding statfs changes in the journal head to the
master statfs file. Zero out the local statfs file for good measure.
Previously, statfs updates would be read in from the local statfs inode and
synced to the master statfs inode during recovery.
We now use the statfs updates in the journal head to update the master statfs
inode instead of reading in from the local statfs inode. To preserve backward
compatibility with kernels that can't do this, we still need to keep the
local statfs inode up to date by writing changes to it. At some point in the
future, we can do away with the local statfs inodes altogether and keep the
statfs changes solely in the journal.
Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
We need to lookup the master statfs inode and the local statfs
inodes earlier in the mount process (in init_journal) so journal
recovery can use them when it attempts to recover the statfs info.
We lookup all the local statfs inodes and store them in a linked
list to allow a node to recover statfs info for other nodes in the
cluster.
Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
With the set_fs change, we can no longer rely on copy_{to,from}_user()
accepting a kernel pointer, and it was bad form to do so anyway. Clean
this up and change the internal helper that io_uring uses to deal with
kernel pointers instead. This puts the offset copy in/out in __do_splice()
instead, which just calls the same helper.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We jump through a hoop for fixed buffers, where we first map these to
a bvec(), then kmap() the bvec to obtain the pointer we copy to/from.
This was always a bit ugly, and with the set_fs changes, it ends up
being practically problematic as well.
There's no need to jump through these hoops, just use the original user
pointers and length for the non iter based read/write.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull ext4 updates from Ted Ts'o:
"The siginificant new ext4 feature this time around is Harshad's new
fast_commit mode.
In addition, thanks to Mauricio for fixing a race where mmap'ed pages
that are being changed in parallel with a data=journal transaction
commit could result in bad checksums in the failure that could cause
journal replays to fail.
Also notable is Ritesh's buffered write optimization which can result
in significant improvements on parallel write workloads. (The kernel
test robot reported a 330.6% improvement on fio.write_iops on a 96
core system using DAX)
Besides that, we have the usual miscellaneous cleanups and bug fixes"
Link: https://lore.kernel.org/r/20200925071217.GO28663@shao2-debian
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (46 commits)
ext4: fix invalid inode checksum
ext4: add fast commit stats in procfs
ext4: add a mount opt to forcefully turn fast commits on
ext4: fast commit recovery path
jbd2: fast commit recovery path
ext4: main fast-commit commit path
jbd2: add fast commit machinery
ext4 / jbd2: add fast commit initialization
ext4: add fast_commit feature and handling for extended mount options
doc: update ext4 and journalling docs to include fast commit feature
ext4: Detect already used quota file early
jbd2: avoid transaction reuse after reformatting
ext4: use the normal helper to get the actual inode
ext4: fix bs < ps issue reported with dioread_nolock mount opt
ext4: data=journal: write-protect pages on j_submit_inode_data_buffers()
ext4: data=journal: fixes for ext4_page_mkwrite()
jbd2, ext4, ocfs2: introduce/use journal callbacks j_submit|finish_inode_data_buffers()
jbd2: introduce/export functions jbd2_journal_submit|finish_inode_data_buffers()
ext4: introduce ext4_sb_bread_unmovable() to replace sb_bread_unmovable()
ext4: use ext4_sb_bread() instead of sb_bread()
...
The IO_REPARSE_TAG_LX_ tags originally were used by WSL but they
are preferred by the Linux client in some cases since, unlike
the NFS reparse tag (or EAs), they don't require an extra query
to determine which type of special file they represent.
Add support for readdir to recognize special file types of
FIFO, SOCKET, CHAR, BLOCK and SYMLINK. This can be tested
by creating these special files in WSL Linux and then
sharing that location on the Windows server and mounting
to the Windows server to access them.
Prior to this patch all of the special files would show up
as being of type 'file' but with this patch they can be seen
with the correct file type as can be seen below:
brwxr-xr-x 1 root root 0, 0 Oct 21 17:10 block
crwxr-xr-x 1 root root 0, 0 Oct 21 17:46 char
drwxr-xr-x 2 root root 0 Oct 21 18:27 dir
prwxr-xr-x 1 root root 0 Oct 21 16:21 fifo
-rwxr-xr-x 1 root root 0 Oct 21 15:48 file
lrwxr-xr-x 1 root root 0 Oct 21 15:52 symlink-to-file
TODO: go through all documented reparse tags to see if we can
reasonably map some of them to directories vs. files vs. symlinks
and also add support for device numbers for block and char
devices.
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
The "end" pointer is either NULL or it points to the next byte to parse.
If there isn't a next byte then dereferencing "end" is an off-by-one out
of bounds error. And, of course, if it's NULL that leads to an Oops.
Printing "*end" doesn't seem very useful so let's delete this code.
Also for the last debug statement, I noticed that it should be printing
"sequence_end" instead of "end" so fix that as well.
Reported-by: Dominik Maier <dmaier@sect.tu-berlin.de>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
This and related patches which move mount related
code to fs_context.c has the advantage of
shriking the code in fs/cifs/connect.c (which had
the second most lines of code of any of the files
in cifs.ko and was getting harder to read due
to its size) and will also make it easier to
switch over to the new mount API in the future.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Helps to shrink connect.c and make it more readable
by moving mount related code to fs_context.c and
fs_context.h
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
This patch moves the parsing of security mount options into
fs_context.ch. There are no changes to any logic.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
This will make it easier in the future, but also will allow us to
shrink connect.c which is getting too big, and harder to read
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Pull initial set_fs() removal from Al Viro:
"Christoph's set_fs base series + fixups"
* 'work.set_fs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs: Allow a NULL pos pointer to __kernel_read
fs: Allow a NULL pos pointer to __kernel_write
powerpc: remove address space overrides using set_fs()
powerpc: use non-set_fs based maccess routines
x86: remove address space overrides using set_fs()
x86: make TASK_SIZE_MAX usable from assembly code
x86: move PAGE_OFFSET, TASK_SIZE & friends to page_{32,64}_types.h
lkdtm: remove set_fs-based tests
test_bitmap: remove user bitmap tests
uaccess: add infrastructure for kernel builds with set_fs()
fs: don't allow splice read/write without explicit ops
fs: don't allow kernel reads and writes without iter ops
sysctl: Convert to iter interfaces
proc: add a read_iter method to proc proc_ops
proc: cleanup the compat vs no compat file ops
proc: remove a level of indentation in proc_get_inode
Pull nfsd updates from Bruce Fields:
"The one new feature this time, from Anna Schumaker, is READ_PLUS,
which has the same arguments as READ but allows the server to return
an array of data and hole extents.
Otherwise it's a lot of cleanup and bugfixes"
* tag 'nfsd-5.10' of git://linux-nfs.org/~bfields/linux: (43 commits)
NFSv4.2: Fix NFS4ERR_STALE error when doing inter server copy
SUNRPC: fix copying of multiple pages in gss_read_proxy_verf()
sunrpc: raise kernel RPC channel buffer size
svcrdma: fix bounce buffers for unaligned offsets and multiple pages
nfsd: remove unneeded break
net/sunrpc: Fix return value for sysctl sunrpc.transports
NFSD: Encode a full READ_PLUS reply
NFSD: Return both a hole and a data segment
NFSD: Add READ_PLUS hole segment encoding
NFSD: Add READ_PLUS data support
NFSD: Hoist status code encoding into XDR encoder functions
NFSD: Map nfserr_wrongsec outside of nfsd_dispatch
NFSD: Remove the RETURN_STATUS() macro
NFSD: Call NFSv2 encoders on error returns
NFSD: Fix .pc_release method for NFSv2
NFSD: Remove vestigial typedefs
NFSD: Refactor nfsd_dispatch() error paths
NFSD: Clean up nfsd_dispatch() variables
NFSD: Clean up stale comments in nfsd_dispatch()
NFSD: Clean up switch statement in nfsd_dispatch()
...
Pull exfat updates from Namjae Jeon:
- Replace memcpy with structure assignment
- Remove unneeded codes and use helper function i_blocksize()
- Fix typos found by codespell
* tag 'exfat-for-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat:
exfat: remove useless check in exfat_move_file()
exfat: remove 'rwoffset' in exfat_inode_info
exfat: replace memcpy with structure assignment
exfat: remove useless directory scan in exfat_add_entry()
exfat: eliminate dead code in exfat_find()
exfat: use i_blocksize() to get blocksize
exfat: fix misspellings using codespell tool
Pull 9p updates from Dominique Martinet:
"A couple of small fixes (loff_t overflow on 32bit, syzbot
uninitialized variable warning) and code cleanup (xen)"
* tag '9p-for-5.10-rc1' of git://github.com/martinetd/linux:
net: 9p: initialize sun_server.sun_path to have addr's value only when addr is valid
9p/xen: Fix format argument warning
9P: Cast to loff_t before multiplying
Every close(io_uring) causes cancellation of all inflight requests
carrying ->files. That's not nice but was neccessary up until recently.
Now task->files removal is handled in the core code, so that part of
flush can be removed.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We correctly set io-wq NUMA node affinities when the io-wq context is
setup, but if an entire node CPU set is offlined and then brought back
online, the per node affinities are broken. Ensure that we set them
again whenever a CPU comes online. This ensures that we always track
the right node affinity. The usual cpuhp notifiers are used to drive it.
Reported-by: Zhang Qiang <qiang.zhang@windriver.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
During the stability test, there are some errors:
ext4_lookup:1590: inode #6967: comm fsstress: iget: checksum invalid.
If the inode->i_iblocks too big and doesn't set huge file flag, checksum
will not be recalculated when update the inode information to it's buffer.
If other inode marks the buffer dirty, then the inconsistent inode will
be flushed to disk.
Fix this problem by checking i_blocks in advance.
Cc: stable@kernel.org
Signed-off-by: Luo Meng <luomeng12@huawei.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Link: https://lore.kernel.org/r/20201020013631.3796673-1-luomeng12@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>