This shifts the responsibility of setting up dentry operations from
fscrypt to the individual filesystems, allowing them to have their own
operations while still setting fscrypt's d_revalidate as appropriate.
Also added helper function to libfs to unify ext4 and f2fs
implementations.
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Test: Boots, /data/media is case insensitive
Bug: 138322712
Link: https://lore.kernel.org/linux-f2fs-devel/20200208013552.241832-1-drosen@google.com/T/#t
Change-Id: Iaf77f8c5961ecf22e22478701ab0b7fe2025225d
This adds general supporting functions for filesystems that use
utf8 casefolding. It provides standard dentry_operations and adds the
necessary structures in struct super_block to allow this standardization.
Ext4 and F2fs are switch to these implementations.
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Note: Fixed issue with non-strictly enforced fallback hash
Test: Boots, /data/media is case insensitive
Bug: 138322712
Link: https://lore.kernel.org/linux-f2fs-devel/20200208013552.241832-1-drosen@google.com/T/#t
Change-Id: I81b5fb5d3ce0259a60712ae2505c1e4b03dbafde
Changes in 5.4.4
usb: gadget: configfs: Fix missing spin_lock_init()
usb: gadget: pch_udc: fix use after free
nvme: Namepace identification descriptor list is optional
Revert "nvme: Add quirk for Kingston NVME SSD running FW E8FK11.T"
scsi: lpfc: Fix bad ndlp ptr in xri aborted handling
scsi: zfcp: trace channel log even for FCP command responses
scsi: qla2xxx: Do command completion on abort timeout
scsi: qla2xxx: Fix driver unload hang
scsi: qla2xxx: Fix double scsi_done for abort path
scsi: qla2xxx: Fix memory leak when sending I/O fails
compat_ioctl: add compat_ptr_ioctl()
ceph: fix compat_ioctl for ceph_dir_operations
media: venus: remove invalid compat_ioctl32 handler
USB: uas: honor flag to avoid CAPACITY16
USB: uas: heed CAPACITY_HEURISTICS
USB: documentation: flags on usb-storage versus UAS
usb: Allow USB device to be warm reset in suspended state
usb: host: xhci-tegra: Correct phy enable sequence
binder: fix incorrect calculation for num_valid
staging: exfat: fix multiple definition error of `rename_file'
staging: rtl8188eu: fix interface sanity check
staging: rtl8712: fix interface sanity check
staging: vchiq: call unregister_chrdev_region() when driver registration fails
staging: gigaset: fix general protection fault on probe
staging: gigaset: fix illegal free on probe errors
staging: gigaset: add endpoint-type sanity check
usb: xhci: only set D3hot for pci device
xhci: Fix memory leak in xhci_add_in_port()
xhci: fix USB3 device initiated resume race with roothub autosuspend
xhci: Increase STS_HALT timeout in xhci_suspend()
xhci: handle some XHCI_TRUST_TX_LENGTH quirks cases as default behaviour.
xhci: make sure interrupts are restored to correct state
interconnect: qcom: sdm845: Walk the list safely on node removal
interconnect: qcom: qcs404: Walk the list safely on node removal
usb: common: usb-conn-gpio: Don't log an error on probe deferral
ARM: dts: pandora-common: define wl1251 as child node of mmc3
iio: adis16480: Add debugfs_reg_access entry
iio: imu: st_lsm6dsx: fix ODR check in st_lsm6dsx_write_raw
iio: adis16480: Fix scales factors
iio: humidity: hdc100x: fix IIO_HUMIDITYRELATIVE channel reporting
iio: imu: inv_mpu6050: fix temperature reporting using bad unit
iio: adc: ad7606: fix reading unnecessary data from device
iio: adc: ad7124: Enable internal reference
USB: atm: ueagle-atm: add missing endpoint check
USB: idmouse: fix interface sanity checks
USB: serial: io_edgeport: fix epic endpoint lookup
usb: roles: fix a potential use after free
USB: adutux: fix interface sanity check
usb: core: urb: fix URB structure initialization function
usb: mon: Fix a deadlock in usbmon between mmap and read
tpm: add check after commands attribs tab allocation
tpm: Switch to platform_get_irq_optional()
EDAC/altera: Use fast register IO for S10 IRQs
brcmfmac: disable PCIe interrupts before bus reset
mtd: spear_smi: Fix Write Burst mode
mtd: rawnand: Change calculating of position page containing BBM
virt_wifi: fix use-after-free in virt_wifi_newlink()
virtio-balloon: fix managed page counts when migrating pages between zones
usb: dwc3: pci: add ID for the Intel Comet Lake -H variant
usb: dwc3: gadget: Fix logical condition
usb: dwc3: gadget: Clear started flag for non-IOC
usb: dwc3: ep0: Clear started flag on completion
phy: renesas: rcar-gen3-usb2: Fix sysfs interface of "role"
usb: typec: fix use after free in typec_register_port()
iwlwifi: pcie: fix support for transmitting SKBs with fraglist
btrfs: check page->mapping when loading free space cache
btrfs: use btrfs_block_group_cache_done in update_block_group
btrfs: use refcount_inc_not_zero in kill_all_nodes
Btrfs: fix metadata space leak on fixup worker failure to set range as delalloc
Btrfs: fix negative subv_writers counter and data space leak after buffered write
btrfs: Avoid getting stuck during cyclic writebacks
btrfs: Remove btrfs_bio::flags member
Btrfs: send, skip backreference walking for extents with many references
btrfs: record all roots for rename exchange on a subvol
rtlwifi: rtl8192de: Fix missing code to retrieve RX buffer address
rtlwifi: rtl8192de: Fix missing callback that tests for hw release of buffer
rtlwifi: rtl8192de: Fix missing enable interrupt flag
lib: raid6: fix awk build warnings
ovl: fix lookup failure on multi lower squashfs
ovl: fix corner case of non-unique st_dev;st_ino
ovl: relax WARN_ON() on rename to self
hwrng: omap - Fix RNG wait loop timeout
dm writecache: handle REQ_FUA
dm zoned: reduce overhead of backing device checks
workqueue: Fix spurious sanity check failures in destroy_workqueue()
workqueue: Fix pwq ref leak in rescuer_thread()
ASoC: rt5645: Fixed buddy jack support.
ASoC: rt5645: Fixed typo for buddy jack support.
ASoC: Jack: Fix NULL pointer dereference in snd_soc_jack_report
ASoC: fsl_audmix: Add spin lock to protect tdms
md: improve handling of bio with REQ_PREFLUSH in md_flush_request()
blk-mq: avoid sysfs buffer overflow with too many CPU cores
cgroup: pids: use atomic64_t for pids->limit
wil6210: check len before memcpy() calls
ar5523: check NULL before memcpy() in ar5523_cmd()
s390/mm: properly clear _PAGE_NOEXEC bit when it is not supported
media: hantro: Fix s_fmt for dynamic resolution changes
media: hantro: Fix motion vectors usage condition
media: hantro: Fix picture order count table enable
media: vimc: sen: remove unused kthread_sen field
media: bdisp: fix memleak on release
media: radio: wl1273: fix interrupt masking on release
media: cec.h: CEC_OP_REC_FLAG_ values were swapped
cpuidle: Do not unset the driver if it is there already
cpuidle: teo: Ignore disabled idle states that are too deep
cpuidle: teo: Rename local variable in teo_select()
cpuidle: teo: Consider hits and misses metrics of disabled states
cpuidle: teo: Fix "early hits" handling for disabled idle states
cpuidle: use first valid target residency as poll time
erofs: zero out when listxattr is called with no xattr
perf tests: Fix out of bounds memory access
drm/panfrost: Open/close the perfcnt BO
powerpc/perf: Disable trace_imc pmu
intel_th: Fix a double put_device() in error path
intel_th: pci: Add Ice Lake CPU support
intel_th: pci: Add Tiger Lake CPU support
PM / devfreq: Lock devfreq in trans_stat_show
cpufreq: powernv: fix stack bloat and hard limit on number of CPUs
ALSA: fireface: fix return value in error path of isochronous resources reservation
ALSA: oxfw: fix return value in error path of isochronous resources reservation
ALSA: hda/realtek - Line-out jack doesn't work on a Dell AIO
ACPI / utils: Move acpi_dev_get_first_match_dev() under CONFIG_ACPI
ACPI: LPSS: Add LNXVIDEO -> BYT I2C7 to lpss_device_links
ACPI: LPSS: Add LNXVIDEO -> BYT I2C1 to lpss_device_links
ACPI: LPSS: Add dmi quirk for skipping _DEP check for some device-links
ACPI / hotplug / PCI: Allocate resources directly under the non-hotplug bridge
ACPI: OSL: only free map once in osl.c
ACPI: bus: Fix NULL pointer check in acpi_bus_get_private_data()
ACPI: EC: Rework flushing of pending work
ACPI: PM: Avoid attaching ACPI PM domain to certain devices
pinctrl: rza2: Fix gpio name typos
pinctrl: armada-37xx: Fix irq mask access in armada_37xx_irq_set_type()
pinctrl: samsung: Add of_node_put() before return in error path
pinctrl: samsung: Fix device node refcount leaks in Exynos wakeup controller init
pinctrl: samsung: Fix device node refcount leaks in S3C24xx wakeup controller init
pinctrl: samsung: Fix device node refcount leaks in init code
pinctrl: samsung: Fix device node refcount leaks in S3C64xx wakeup controller init
mmc: host: omap_hsmmc: add code for special init of wl1251 to get rid of pandora_wl1251_init_card
ARM: dts: omap3-tao3530: Fix incorrect MMC card detection GPIO polarity
RDMA/core: Fix ib_dma_max_seg_size()
ppdev: fix PPGETTIME/PPSETTIME ioctls
stm class: Lose the protocol driver when dropping its reference
coresight: Serialize enabling/disabling a link device.
powerpc: Allow 64bit VDSO __kernel_sync_dicache to work across ranges >4GB
powerpc/xive: Prevent page fault issues in the machine crash handler
powerpc: Allow flush_icache_range to work across ranges >4GB
powerpc/xive: Skip ioremap() of ESB pages for LSI interrupts
video/hdmi: Fix AVI bar unpack
quota: Check that quota is not dirty before release
ext2: check err when partial != NULL
seccomp: avoid overflow in implicit constant conversion
quota: fix livelock in dquot_writeback_dquots
ext4: Fix credit estimate for final inode freeing
reiserfs: fix extended attributes on the root directory
scsi: qla2xxx: Fix SRB leak on switch command timeout
scsi: qla2xxx: Fix a dma_pool_free() call
Revert "scsi: qla2xxx: Fix memory leak when sending I/O fails"
iio: ad7949: kill pointless "readback"-handling code
iio: ad7949: fix channels mixups
omap: pdata-quirks: revert pandora specific gpiod additions
omap: pdata-quirks: remove openpandora quirks for mmc3 and wl1251
powerpc: Avoid clang warnings around setjmp and longjmp
powerpc: Fix vDSO clock_getres()
mm, memfd: fix COW issue on MAP_PRIVATE and F_SEAL_FUTURE_WRITE mappings
mfd: rk808: Fix RK818 ID template
mm: memcg/slab: wait for !root kmem_cache refcnt killing on root kmem_cache destruction
ext4: work around deleting a file with i_nlink == 0 safely
firmware: qcom: scm: Ensure 'a0' status code is treated as signed
s390/smp,vdso: fix ASCE handling
s390/kaslr: store KASLR offset for early dumps
mm/shmem.c: cast the type of unmap_start to u64
powerpc: Define arch_is_kernel_initmem_freed() for lockdep
USB: dummy-hcd: increase max number of devices to 32
rtc: disable uie before setting time and enable after
splice: only read in as much information as there is pipe buffer space
ext4: fix a bug in ext4_wait_for_tail_page_commit
ext4: fix leak of quota reservations
blk-mq: make sure that line break can be printed
workqueue: Fix missing kfree(rescuer) in destroy_workqueue()
r8169: fix rtl_hw_jumbo_disable for RTL8168evl
EDAC/ghes: Do not warn when incrementing refcount on 0
Linux 5.4.4
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I8949a5fb2fbd836ce34907e70906e3aeb8a58b7c
commit 2952db0fd5 upstream.
Many drivers have ioctl() handlers that are completely compatible between
32-bit and 64-bit architectures, except for the argument that is passed
down from user space and may have to be passed through compat_ptr()
in order to become a valid 64-bit pointer.
Using ".compat_ptr = compat_ptr_ioctl" in file operations should let
us simplify a lot of those drivers to avoid #ifdef checks, and convert
additional drivers that don't have proper compat handling yet.
On most architectures, the compat_ptr_ioctl() just passes all arguments
to the corresponding ->ioctl handler. The exception is arch/s390, where
compat_ptr() clears the top bit of a 32-bit pointer value, so user space
pointers to the second 2GB alias the first 2GB, as is the case for native
32-bit s390 user space.
The compat_ptr_ioctl() function must therefore be used only with
ioctl functions that either ignore the argument or pass a pointer to a
compatible data type.
If any ioctl command handled by fops->unlocked_ioctl passes a plain
integer instead of a pointer, or any of the passed data types is
incompatible between 32-bit and 64-bit architectures, a proper handler
is required instead of compat_ptr_ioctl.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
To make the 5.4-rc1 merge easier, merge at a prerelease point in time
before the final release happens.
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: If613d657fd0abf9910c5bf3435a745f01b89765e
Pull nfsd updates from Bruce Fields:
"Highlights:
- Add a new knfsd file cache, so that we don't have to open and close
on each (NFSv2/v3) READ or WRITE. This can speed up read and write
in some cases. It also replaces our readahead cache.
- Prevent silent data loss on write errors, by treating write errors
like server reboots for the purposes of write caching, thus forcing
clients to resend their writes.
- Tweak the code that allocates sessions to be more forgiving, so
that NFSv4.1 mounts are less likely to hang when a server already
has a lot of clients.
- Eliminate an arbitrary limit on NFSv4 ACL sizes; they should now be
limited only by the backend filesystem and the maximum RPC size.
- Allow the server to enforce use of the correct kerberos credentials
when a client reclaims state after a reboot.
And some miscellaneous smaller bugfixes and cleanup"
* tag 'nfsd-5.4' of git://linux-nfs.org/~bfields/linux: (34 commits)
sunrpc: clean up indentation issue
nfsd: fix nfs read eof detection
nfsd: Make nfsd_reset_boot_verifier_locked static
nfsd: degraded slot-count more gracefully as allocation nears exhaustion.
nfsd: handle drc over-allocation gracefully.
nfsd: add support for upcall version 2
nfsd: add a "GetVersion" upcall for nfsdcld
nfsd: Reset the boot verifier on all write I/O errors
nfsd: Don't garbage collect files that might contain write errors
nfsd: Support the server resetting the boot verifier
nfsd: nfsd_file cache entries should be per net namespace
nfsd: eliminate an unnecessary acl size limit
Deprecate nfsd fault injection
nfsd: remove duplicated include from filecache.c
nfsd: Fix the documentation for svcxdr_tmpalloc()
nfsd: Fix up some unused variable warnings
nfsd: close cached files prior to a REMOVE or RENAME that would replace target
nfsd: rip out the raparms cache
nfsd: have nfsd_test_lock use the nfsd_file cache
nfsd: hook up nfs4_preprocess_stateid_op to the nfsd_file cache
...
To make the 5.4-rc1 merge easier, merge at a prerelease point in time
before the final release happens.
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I29b683c837ed1a3324644dbf9bf863f30740cd0b
This merges Linus's tree as of commit b41dae061b ("Merge tag
'xfs-5.4-merge-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux")
into android-mainline.
This "early" merge makes it easier to test and handle merge conflicts
instead of having to wait until the "end" of the merge window and handle
all 10000+ commits at once.
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I6bebf55e5e2353f814e3c87f5033607b1ae5d812
Pull y2038 vfs updates from Arnd Bergmann:
"Add inode timestamp clamping.
This series from Deepa Dinamani adds a per-superblock minimum/maximum
timestamp limit for a file system, and clamps timestamps as they are
written, to avoid random behavior from integer overflow as well as
having different time stamps on disk vs in memory.
At mount time, a warning is now printed for any file system that can
represent current timestamps but not future timestamps more than 30
years into the future, similar to the arbitrary 30 year limit that was
added to settimeofday().
This was picked as a compromise to warn users to migrate to other file
systems (e.g. ext4 instead of ext3) when they need the file system to
survive beyond 2038 (or similar limits in other file systems), but not
get in the way of normal usage"
* tag 'y2038-vfs' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground:
ext4: Reduce ext4 timestamp warnings
isofs: Initialize filesystem timestamp ranges
pstore: fs superblock limits
fs: omfs: Initialize filesystem timestamp ranges
fs: hpfs: Initialize filesystem timestamp ranges
fs: ceph: Initialize filesystem timestamp ranges
fs: sysv: Initialize filesystem timestamp ranges
fs: affs: Initialize filesystem timestamp ranges
fs: fat: Initialize filesystem timestamp ranges
fs: cifs: Initialize filesystem timestamp ranges
fs: nfs: Initialize filesystem timestamp ranges
ext4: Initialize timestamps limits
9p: Fill min and max timestamps in sb
fs: Fill in max and min timestamps in superblock
utimes: Clamp the timestamps before update
mount: Add mount warning for impending timestamp expiry
timestamp_truncate: Replace users of timespec64_trunc
vfs: Add timestamp_truncate() api
vfs: Add file timestamp range support
Pull xfs updates from Darrick Wong:
"For this cycle we have the usual pile of cleanups and bug fixes, some
performance improvements for online metadata scrubbing, massive
speedups in the directory entry creation code, some performance
improvement in the file ACL lookup code, a fix for a logging stall
during mount, and fixes for concurrency problems.
It has survived a couple of weeks of xfstests runs and merges cleanly.
Summary:
- Remove KM_SLEEP/KM_NOSLEEP.
- Ensure that memory buffers for IO are properly sector-aligned to
avoid problems that the block layer doesn't check.
- Make the bmap scrubber more efficient in its record checking.
- Don't crash xfs_db when superblock inode geometry is corrupt.
- Fix btree key helper functions.
- Remove unneeded error returns for things that can't fail.
- Fix buffer logging bugs in repair.
- Clean up iterator return values.
- Speed up directory entry creation.
- Enable allocation of xattr value memory buffer during lookup.
- Fix readahead racing with truncate/punch hole.
- Other minor cleanups.
- Fix one AGI/AGF deadlock with RENAME_WHITEOUT.
- More BUG -> WARN whackamole.
- Fix various problems with the log failing to advance under certain
circumstances, which results in stalls during mount"
* tag 'xfs-5.4-merge-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (45 commits)
xfs: push the grant head when the log head moves forward
xfs: push iclog state cleaning into xlog_state_clean_log
xfs: factor iclog state processing out of xlog_state_do_callback()
xfs: factor callbacks out of xlog_state_do_callback()
xfs: factor debug code out of xlog_state_do_callback()
xfs: prevent CIL push holdoff in log recovery
xfs: fix missed wakeup on l_flush_wait
xfs: push the AIL in xlog_grant_head_wake
xfs: Use WARN_ON_ONCE for bailout mount-operation
xfs: Fix deadlock between AGI and AGF with RENAME_WHITEOUT
xfs: define a flags field for the AG geometry ioctl structure
xfs: add a xfs_valid_startblock helper
xfs: remove the unused XFS_ALLOC_USERDATA flag
xfs: cleanup xfs_fsb_to_db
xfs: fix the dax supported check in xfs_ioctl_setattr_dax_invalidate
xfs: Fix stale data exposure when readahead races with hole punch
fs: Export generic_fadvise()
mm: Handle MADV_WILLNEED through vfs_fadvise()
xfs: allocate xattr buffer on demand
xfs: consolidate attribute value copying
...
Pull swap access updates from Darrick Wong:
"Prohibit writing to active swap files and swap partitions.
There's no non-malicious use case for allowing userspace to scribble
on storage that the kernel thinks it owns"
* tag 'vfs-5.4-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
vfs: don't allow writes to swap files
mm: set S_SWAPFILE on blockdev swap devices
Pull fs-verity support from Eric Biggers:
"fs-verity is a filesystem feature that provides Merkle tree based
hashing (similar to dm-verity) for individual readonly files, mainly
for the purpose of efficient authenticity verification.
This pull request includes:
(a) The fs/verity/ support layer and documentation.
(b) fs-verity support for ext4 and f2fs.
Compared to the original fs-verity patchset from last year, the UAPI
to enable fs-verity on a file has been greatly simplified. Lots of
other things were cleaned up too.
fs-verity is planned to be used by two different projects on Android;
most of the userspace code is in place already. Another userspace tool
("fsverity-utils"), and xfstests, are also available. e2fsprogs and
f2fs-tools already have fs-verity support. Other people have shown
interest in using fs-verity too.
I've tested this on ext4 and f2fs with xfstests, both the existing
tests and the new fs-verity tests. This has also been in linux-next
since July 30 with no reported issues except a couple minor ones I
found myself and folded in fixes for.
Ted and I will be co-maintaining fs-verity"
* tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt:
f2fs: add fs-verity support
ext4: update on-disk format documentation for fs-verity
ext4: add fs-verity read support
ext4: add basic fs-verity support
fs-verity: support builtin file signatures
fs-verity: add SHA-512 support
fs-verity: implement FS_IOC_MEASURE_VERITY ioctl
fs-verity: implement FS_IOC_ENABLE_VERITY ioctl
fs-verity: add data verification hooks for ->readpages()
fs-verity: add the hook for file ->setattr()
fs-verity: add the hook for file ->open()
fs-verity: add inode and superblock fields
fs-verity: add Kconfig and the helper functions for hashing
fs: uapi: define verity bit for FS_IOC_GETFLAGS
fs-verity: add UAPI header
fs-verity: add MAINTAINERS file entry
fs-verity: add a documentation file
timespec_trunc() function is used to truncate a
filesystem timestamp to the right granularity.
But, the function does not clamp tv_sec part of the
timestamps according to the filesystem timestamp limits.
The replacement api: timestamp_truncate() also alters the
signature of the function to accommodate filesystem
timestamp clamping according to flesystem limits.
Note that the tv_nsec part is set to 0 if tv_sec is not within
the range supported for the filesystem.
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
Add fields to the superblock to track the min and max
timestamps supported by filesystems.
Initially, when a superblock is allocated, initialize
it to the max and min values the fields can hold.
Individual filesystems override these to match their
actual limits.
Pseudo filesystems are assumed to always support the
min and max allowable values for the fields.
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
Don't let userspace write to an active swap file because the kernel
effectively has a long term lease on the storage and things could get
seriously corrupted if we let this happen.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
With the new file caching infrastructure in nfsd, we can end up holding
files open for an indefinite period of time, even when they are still
idle. This may prevent the kernel from handing out leases on the file,
which is something we don't want to block.
Fix this by running a SRCU notifier call chain whenever on any
lease attempt. nfsd can then purge the cache for that inode before
returning.
Since SRCU is only conditionally compiled in, we must only define the
new chain if it's enabled, and users of the chain must ensure that
SRCU is enabled.
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Add a new fscrypt ioctl, FS_IOC_ADD_ENCRYPTION_KEY. This ioctl adds an
encryption key to the filesystem's fscrypt keyring ->s_master_keys,
making any files encrypted with that key appear "unlocked".
Why we need this
~~~~~~~~~~~~~~~~
The main problem is that the "locked/unlocked" (ciphertext/plaintext)
status of encrypted files is global, but the fscrypt keys are not.
fscrypt only looks for keys in the keyring(s) the process accessing the
filesystem is subscribed to: the thread keyring, process keyring, and
session keyring, where the session keyring may contain the user keyring.
Therefore, userspace has to put fscrypt keys in the keyrings for
individual users or sessions. But this means that when a process with a
different keyring tries to access encrypted files, whether they appear
"unlocked" or not is nondeterministic. This is because it depends on
whether the files are currently present in the inode cache.
Fixing this by consistently providing each process its own view of the
filesystem depending on whether it has the key or not isn't feasible due
to how the VFS caches work. Furthermore, while sometimes users expect
this behavior, it is misguided for two reasons. First, it would be an
OS-level access control mechanism largely redundant with existing access
control mechanisms such as UNIX file permissions, ACLs, LSMs, etc.
Encryption is actually for protecting the data at rest.
Second, almost all users of fscrypt actually do need the keys to be
global. The largest users of fscrypt, Android and Chromium OS, achieve
this by having PID 1 create a "session keyring" that is inherited by
every process. This works, but it isn't scalable because it prevents
session keyrings from being used for any other purpose.
On general-purpose Linux distros, the 'fscrypt' userspace tool [1] can't
similarly abuse the session keyring, so to make 'sudo' work on all
systems it has to link all the user keyrings into root's user keyring
[2]. This is ugly and raises security concerns. Moreover it can't make
the keys available to system services, such as sshd trying to access the
user's '~/.ssh' directory (see [3], [4]) or NetworkManager trying to
read certificates from the user's home directory (see [5]); or to Docker
containers (see [6], [7]).
By having an API to add a key to the *filesystem* we'll be able to fix
the above bugs, remove userspace workarounds, and clearly express the
intended semantics: the locked/unlocked status of an encrypted directory
is global, and encryption is orthogonal to OS-level access control.
Why not use the add_key() syscall
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
We use an ioctl for this API rather than the existing add_key() system
call because the ioctl gives us the flexibility needed to implement
fscrypt-specific semantics that will be introduced in later patches:
- Supporting key removal with the semantics such that the secret is
removed immediately and any unused inodes using the key are evicted;
also, the eviction of any in-use inodes can be retried.
- Calculating a key-dependent cryptographic identifier and returning it
to userspace.
- Allowing keys to be added and removed by non-root users, but only keys
for v2 encryption policies; and to prevent denial-of-service attacks,
users can only remove keys they themselves have added, and a key is
only really removed after all users who added it have removed it.
Trying to shoehorn these semantics into the keyrings syscalls would be
very difficult, whereas the ioctls make things much easier.
However, to reuse code the implementation still uses the keyrings
service internally. Thus we get lockless RCU-mode key lookups without
having to re-implement it, and the keys automatically show up in
/proc/keys for debugging purposes.
References:
[1] https://github.com/google/fscrypt
[2] https://goo.gl/55cCrI#heading=h.vf09isp98isb
[3] https://github.com/google/fscrypt/issues/111#issuecomment-444347939
[4] https://github.com/google/fscrypt/issues/116
[5] https://bugs.launchpad.net/ubuntu/+source/fscrypt/+bug/1770715
[6] https://github.com/google/fscrypt/issues/128
[7] https://askubuntu.com/questions/1130306/cannot-run-docker-on-an-encrypted-filesystem
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Commit 33ec3e53e7 ("loop: Don't change loop device under exclusive
opener") made LOOP_SET_FD ioctl acquire exclusive block device reference
while it updates loop device binding. However this can make perfectly
valid mount(2) fail with EBUSY due to racing LOOP_SET_FD holding
temporarily the exclusive bdev reference in cases like this:
for i in {a..z}{a..z}; do
dd if=/dev/zero of=$i.image bs=1k count=0 seek=1024
mkfs.ext2 $i.image
mkdir mnt$i
done
echo "Run"
for i in {a..z}{a..z}; do
mount -o loop -t ext2 $i.image mnt$i &
done
Fix the problem by not getting full exclusive bdev reference in
LOOP_SET_FD but instead just mark the bdev as being claimed while we
update the binding information. This just blocks new exclusive openers
instead of failing them with EBUSY thus fixing the problem.
Fixes: 33ec3e53e7 ("loop: Don't change loop device under exclusive opener")
Cc: stable@vger.kernel.org
Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Analogous to fs/crypto/, add fields to the VFS inode and superblock for
use by the fs/verity/ support layer:
- ->s_vop: points to the fsverity_operations if the filesystem supports
fs-verity, otherwise is NULL.
- ->i_verity_info: points to cached fs-verity information for the inode
after someone opens it, otherwise is NULL.
- S_VERITY: bit in ->i_flags that identifies verity inodes, even when
they haven't been opened yet and thus still have NULL ->i_verity_info.
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
This starts to add private data associated directly
to mount points. The intent is to give filesystems
a sense of where they have come from, as a means of
letting a filesystem take different actions based on
this information.
Bug: 62094374
Bug: 120446149
Bug: 122428178
Change-Id: Ie769d7b3bb2f5972afe05c1bf16cf88c91647ab2
Signed-off-by: Daniel Rosenberg <drosen@google.com>
[astrachan: Folded 89a54ed3bf68 ("ANDROID: mnt: Fix next_descendent")
into this patch]
[drosen: Folded 138993ea820 ("Android: mnt: Propagate remount
correctly") into this patch, integrated fs_context things
Now has update_mnt_data instead of needing remount2
Signed-off-by: Alistair Strachan <astrachan@google.com>
Exposes private fs data via show_options2
Bug: 120446149
Change-Id: I2d1c06fae274eeac03ac1924ef162f7bbb2f29d0
Signed-off-by: Daniel Rosenberg <drosen@google.com>
This allows filesystems to use their mount private data to
influence the permssions they return in permission2. It has
been separated into a new call to avoid disrupting current
permission users.
Test: HiKey/X15 + Pie + android-mainline,
and HiKey + AOSP Maser + android-mainline,
directories under /sdcard created,
output of mount is right,
CTS test collecting device infor works
Bug: 35848445
Change-Id: I9d416e3b8b6eca84ef3e336bd2af89ddd51df6ca
Signed-off-by: Daniel Rosenberg <drosen@google.com>
[AmitP: Minor refactoring of original patch to align with
changes from the following upstream commit
4bfd054ae1 ("fs: fold __inode_permission() into inode_permission()").
Also introduce vfs_mkobj2(), because do_create()
moved from using vfs_create() to vfs_mkobj()
eecec19d9e ("mqueue: switch to vfs_mkobj(), quit abusing ->d_fsdata")
do_create() is dropped/cleaned-up upstream so a
minor refactoring there as well.
066cc813e9 ("do_mq_open(): move all work prior to dentry_open() into a helper")]
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
[astrachan: Folded the following changes into this patch:
f46c9d62dd81 ("ANDROID: fs: Export vfs_rmdir2")
9992eb8b9a1e ("ANDROID: xattr: Pass EOPNOTSUPP to permission2")]
Signed-off-by: Alistair Strachan <astrachan@google.com>
Signed-off-by: Yongqin Liu <yongqin.liu@linaro.org>
This allows filesystems to use their mount private data to
influence the permssions they use in setattr2. It has
been separated into a new call to avoid disrupting current
setattr users.
Test: HiKey/X15 + Pie + android-mainline,
and HiKey + AOSP Maser + android-mainline,
directories under /sdcard created,
output of mount is right,
CTS test collecting device infor works
Change-Id: I19959038309284448f1b7f232d579674ef546385
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Signed-off-by: Yongqin Liu <yongqin.liu@linaro.org>
Pull vfs mount updates from Al Viro:
"The first part of mount updates.
Convert filesystems to use the new mount API"
* 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
mnt_init(): call shmem_init() unconditionally
constify ksys_mount() string arguments
don't bother with registering rootfs
init_rootfs(): don't bother with init_ramfs_fs()
vfs: Convert smackfs to use the new mount API
vfs: Convert selinuxfs to use the new mount API
vfs: Convert securityfs to use the new mount API
vfs: Convert apparmorfs to use the new mount API
vfs: Convert openpromfs to use the new mount API
vfs: Convert xenfs to use the new mount API
vfs: Convert gadgetfs to use the new mount API
vfs: Convert oprofilefs to use the new mount API
vfs: Convert ibmasmfs to use the new mount API
vfs: Convert qib_fs/ipathfs to use the new mount API
vfs: Convert efivarfs to use the new mount API
vfs: Convert configfs to use the new mount API
vfs: Convert binfmt_misc to use the new mount API
convenience helper: get_tree_single()
convenience helper get_tree_nodev()
vfs: Kill sget_userns()
...
Pull common SETFLAGS/FSSETXATTR parameter checking from Darrick Wong:
"Here's a patch series that sets up common parameter checking functions
for the FS_IOC_SETFLAGS and FS_IOC_FSSETXATTR ioctl implementations.
The goal here is to reduce the amount of behaviorial variance between
the filesystems where those ioctls originated (ext2 and XFS,
respectively) and everybody else.
- Standardize parameter checking for the SETFLAGS and FSSETXATTR
ioctls (which were the file attribute setters for ext4 and xfs and
have now been hoisted to the vfs)
- Only allow the DAX flag to be set on files and directories"
* tag 'vfs-fix-ioctl-checking-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
vfs: only allow FSSETXATTR to set DAX flag on files and dirs
vfs: teach vfs_ioc_fssetxattr_check to check extent size hints
vfs: teach vfs_ioc_fssetxattr_check to check project id info
vfs: create a generic checking function for FS_IOC_FSSETXATTR
vfs: create a generic checking and prep function for FS_IOC_SETFLAGS
Pull nfsd updates from Bruce Fields:
"Highlights:
- Add a new /proc/fs/nfsd/clients/ directory which exposes some
long-requested information about NFSv4 clients (like open files)
and allows forced revocation of client state.
- Replace the global duplicate reply cache by a cache per network
namespace; previously, a request in one network namespace could
incorrectly match an entry from another, though we haven't seen
this in production. This is the last remaining container bug that
I'm aware of; at this point you should be able to run separate
nfsd's in each network namespace, each with their own set of
exports, and everything should work.
- Cleanup and modify lock code to show the pid of lockd as the owner
of NLM locks. This is the correct version of the bugfix originally
attempted in b8eee0e90f ("lockd: Show pid of lockd for remote
locks")"
* tag 'nfsd-5.3' of git://linux-nfs.org/~bfields/linux: (34 commits)
nfsd: Make __get_nfsdfs_client() static
nfsd: Make two functions static
nfsd: Fix misuse of strlcpy
sunrpc/cache: remove the exporting of cache_seq_next
nfsd: decode implementation id
nfsd: create xdr_netobj_dup helper
nfsd: allow forced expiration of NFSv4 clients
nfsd: create get_nfsdfs_clp helper
nfsd4: show layout stateids
nfsd: show lock and deleg stateids
nfsd4: add file to display list of client's opens
nfsd: add more information to client info file
nfsd: escape high characters in binary data
nfsd: copy client's address including port number to cl_addr
nfsd4: add a client info file
nfsd: make client/ directory names small ints
nfsd: add nfsd/clients directory
nfsd4: use reference count to free client
nfsd: rename cl_refcount
nfsd: persist nfsd filesystem across mounts
...
Pull ext4 updates from Ted Ts'o:
"Many bug fixes and cleanups, and an optimization for case-insensitive
lookups"
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: fix coverity warning on error path of filename setup
ext4: replace ktype default_attrs with default_groups
ext4: rename htree_inline_dir_to_tree() to ext4_inlinedir_to_tree()
ext4: refactor initialize_dirent_tail()
ext4: rename "dirent_csum" functions to use "dirblock"
ext4: allow directory holes
jbd2: drop declaration of journal_sync_buffer()
ext4: use jbd2_inode dirty range scoping
jbd2: introduce jbd2_inode dirty range scoping
mm: add filemap_fdatawait_range_keep_errors()
ext4: remove redundant assignment to node
ext4: optimize case-insensitive lookups
ext4: make __ext4_get_inode_loc plug
ext4: clean up kerneldoc warnigns when building with W=1
ext4: only set project inherit bit for directory
ext4: enforce the immutable flag on open files
ext4: don't allow any modifications to an immutable file
jbd2: fix typo in comment of journal_submit_inode_data_buffers
jbd2: fix some print format mistakes
ext4: gracefully handle ext4_break_layouts() failure during truncate
Pull copy_file_range updates from Darrick Wong:
"This fixes numerous parameter checking problems and inconsistent
behaviors in the new(ish) copy_file_range system call.
Now the system call will actually check its range parameters
correctly; refuse to copy into files for which the caller does not
have sufficient privileges; update mtime and strip setuid like file
writes are supposed to do; and allows copying up to the EOF of the
source file instead of failing the call like we used to.
Summary:
- Create a generic copy_file_range handler and make individual
filesystems responsible for calling it (i.e. no more assuming that
do_splice_direct will work or is appropriate)
- Refactor copy_file_range and remap_range parameter checking where
they are the same
- Install missing copy_file_range parameter checking(!)
- Remove suid/sgid and update mtime like any other file write
- Change the behavior so that a copy range crossing the source file's
eof will result in a short copy to the source file's eof instead of
EINVAL
- Permit filesystems to decide if they want to handle
cross-superblock copy_file_range in their local handlers"
* tag 'copy-file-range-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
fuse: copy_file_range needs to strip setuid bits and update timestamps
vfs: allow copy_file_range to copy across devices
xfs: use file_modified() helper
vfs: introduce file_modified() helper
vfs: add missing checks to copy_file_range
vfs: remove redundant checks from generic_remap_checks()
vfs: introduce generic_file_rw_checks()
vfs: no fallback for ->copy_file_range
vfs: introduce generic_copy_file_range()
Pull fsnotify updates from Jan Kara:
"This contains cleanups of the fsnotify name removal hook and also a
patch to disable fanotify permission events for 'proc' filesystem"
* tag 'fsnotify_for_v5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
fsnotify: get rid of fsnotify_nameremove()
fsnotify: move fsnotify_nameremove() hook out of d_delete()
configfs: call fsnotify_rmdir() hook
debugfs: call fsnotify_{unlink,rmdir}() hooks
debugfs: simplify __debugfs_remove_file()
devpts: call fsnotify_unlink() hook
tracefs: call fsnotify_{unlink,rmdir}() hooks
rpc_pipefs: call fsnotify_{unlink,rmdir}() hooks
btrfs: call fsnotify_rmdir() hook
fsnotify: add empty fsnotify_{unlink,rmdir}() hooks
fanotify: Disallow permission events for proc filesystem
Pull file locking updates from Jeff Layton:
"Just a couple of small lease-related patches this cycle.
One from Ira to add a new tracepoint that fires during lease conflict
checks, and another patch from Amir to reduce false positives when
checking for lease conflicts"
* tag 'locks-v5.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
locks: eliminate false positive conflicts for write lease
locks: Add trace_leases_conflict
After the update to use nlm_lockowners for the NLM server, there are no
more users of lm_compare_owner and lm_owner_key.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Create a generic checking function for the incoming FS_IOC_FSSETXATTR
fsxattr values so that we can standardize some of the implementation
behaviors.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Create a generic function to check incoming FS_IOC_SETFLAGS flag values
and later prepare the inode for updates so that we can standardize the
implementations that follow ext4's flag values.
Note that the efivarfs implementation no longer fails a no-op SETFLAGS
without CAP_LINUX_IMMUTABLE since that's the behavior in ext*.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: David Sterba <dsterba@suse.com>
Reviewed-by: Bob Peterson <rpeterso@redhat.com>
In the spirit of filemap_fdatawait_range() and
filemap_fdatawait_keep_errors(), introduce
filemap_fdatawait_range_keep_errors() which both takes a range upon
which to wait and does not clear errors from the address space.
Signed-off-by: Ross Zwisler <zwisler@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: stable@vger.kernel.org
check_conflicting_open() is checking for existing fd's open for read or
for write before allowing to take a write lease. The check that was
implemented using i_count and d_count is an approximation that has
several false positives. For example, overlayfs since v4.19, takes an
extra reference on the dentry; An open with O_PATH takes a reference on
the dentry although the file cannot be read nor written.
Change the implementation to use i_readcount and i_writecount to
eliminate the false positive conflicts and allow a write lease to be
taken on an overlayfs file.
The change of behavior with existing fd's open with O_PATH is symmetric
w.r.t. current behavior of lease breakers - an open with O_PATH currently
does not break a write lease.
This increases the size of struct inode by 4 bytes on 32bit archs when
CONFIG_FILE_LOCKING is defined and CONFIG_IMA was not already
defined.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
The combination of file_remove_privs() and file_update_mtime() is
quite common in filesystem ->write_iter() methods.
Modelled after the helper file_accessed(), introduce file_modified()
and use it from generic_remap_file_range_prep().
Note that the order of calling file_remove_privs() before
file_update_mtime() in the helper was matched to the more common order by
filesystems and not the current order in generic_remap_file_range_prep().
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Like the clone and dedupe interfaces we've recently fixed, the
copy_file_range() implementation is missing basic sanity, limits and
boundary condition tests on the parameters that are passed to it
from userspace. Create a new "generic_copy_file_checks()" function
modelled on the generic_remap_checks() function to provide this
missing functionality.
[Amir] Shorten copy length instead of checking pos_in limits
because input file size already abides by the limits.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Right now if vfs_copy_file_range() does not use any offload
mechanism, it falls back to calling do_splice_direct(). This fails
to do basic sanity checks on the files being copied. Before we
start adding this necessarily functionality to the fallback path,
separate it out into generic_copy_file_range().
generic_copy_file_range() has the same prototype as
->copy_file_range() so that filesystems can use it in their custom
->copy_file_range() method if they so choose.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
A recent documentation conversion renamed this file but forgot
to update the links.
Fixes: af96c1e304 ("docs: filesystems: vfs: Convert vfs.txt to RST")
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Proc filesystem has special locking rules for various files. Thus
fanotify which opens files on event delivery can easily deadlock
against another process that waits for fanotify permission event to be
handled. Since permission events on /proc have doubtful value anyway,
just disallow them.
Link: https://lore.kernel.org/linux-fsdevel/20190320131642.GE9485@quack2.suse.cz/
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Once upon a time we used to set ->d_name of e.g. pipefs root
so that d_path() on pipes would work. These days it's
completely pointless - dentries of pipes are not even connected
to pipefs root. However, mount_pseudo() had set the root
dentry name (passed as the second argument) and callers
kept inventing names to pass to it. Including those that
didn't *have* any non-root dentries to start with...
All of that had been pointless for about 8 years now; it's
time to get rid of that cargo-culting...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>