Changes in 5.15.52
tick/nohz: unexport __init-annotated tick_nohz_full_setup()
x86, kvm: use proper ASM macros for kvm_vcpu_is_preempted
bcache: memset on stack variables in bch_btree_check() and bch_sectors_dirty_init()
xfs: use kmem_cache_free() for kmem_cache objects
xfs: punch out data fork delalloc blocks on COW writeback failure
xfs: Fix the free logic of state in xfs_attr_node_hasname
xfs: remove all COW fork extents when remounting readonly
xfs: check sb_meta_uuid for dabuf buffer recovery
xfs: prevent UAF in xfs_log_item_in_current_chkpt
xfs: only bother with sync_filesystem during readonly remount
powerpc/ftrace: Remove ftrace init tramp once kernel init is complete
fs: add is_idmapped_mnt() helper
fs: move mapping helpers
fs: tweak fsuidgid_has_mapping()
fs: account for filesystem mappings
docs: update mapping documentation
fs: use low-level mapping helpers
fs: remove unused low-level mapping helpers
fs: port higher-level mapping helpers
fs: add i_user_ns() helper
fs: support mapped mounts of mapped filesystems
fs: fix acl translation
fs: account for group membership
rtw88: 8821c: support RFE type4 wifi NIC
rtw88: rtw8821c: enable rfe 6 devices
net: mscc: ocelot: allow unregistered IP multicast flooding to CPU
io_uring: fix not locked access to fixed buf table
Linux 5.15.52
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Icfb690703efd8cab1dffa7ca6cce28bbca635c3d
commit bd303368b776eead1c29e6cdda82bde7128b82a7 upstream.
In previous patches we added new and modified existing helpers to handle
idmapped mounts of filesystems mounted with an idmapping. In this final
patch we convert all relevant places in the vfs to actually pass the
filesystem's idmapping into these helpers.
With this the vfs is in shape to handle idmapped mounts of filesystems
mounted with an idmapping. Note that this is just the generic
infrastructure. Actually adding support for idmapped mounts to a
filesystem mountable with an idmapping is follow-up work.
In this patch we extend the definition of an idmapped mount from a mount
that has an idmapping other than the initial idmapping attached to it to
a mount that has an idmapping attached to it which is not the same as
the idmapping the filesystem was mounted with.
As before, we do not allow the initial idmapping to be attached to a
mount. In addition, this patch prevents the idmapping the filesystem
was mounted with from being attached to a mount created from this
filesystem.
This has multiple reasons and advantages. First, attaching the initial
idmapping or the filesystem's idmapping doesn't make much sense, as in
both cases the values of i_{g,u}id and of other places where k{g,u}ids
are used do not change. Second, a user who really wants to do this for
whatever reason can just create a separate, dedicated identical
idmapping to attach to the mount. Third, we can continue to use the
initial idmapping as an indicator that a mount is not idmapped, allowing
us to keep passing the initial idmapping into the mapping helpers to
tell them that something isn't an idmapped mount even if the
filesystem is mounted with an idmapping.
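The rule described above can be modeled in a few lines of userspace C. This is a sketch under stated assumptions, not the kernel code: the kernel compares struct user_namespace pointers, whereas here an idmapping is a hypothetical `struct idmapping` compared by address, and `is_idmapped_mnt()`/`attach_idmapping()` are simplified stand-ins that follow the definition in this message (the initial idmapping still marks a mount as not idmapped).

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in: in the kernel an idmapping is a struct
 * user_namespace; here it is an object identified by its address. */
struct idmapping {
	const char *name;
};

struct superblock {
	const struct idmapping *s_idmap; /* idmapping the fs was mounted with */
};

struct mount {
	const struct superblock *sb;
	const struct idmapping *mnt_idmap; /* idmapping attached to the mount */
};

const struct idmapping initial_idmapping = { "initial" };

/* Per the definition above: the initial idmapping still marks a mount as
 * not idmapped, and so does the filesystem's own idmapping. */
bool is_idmapped_mnt(const struct mount *mnt)
{
	return mnt->mnt_idmap != &initial_idmapping &&
	       mnt->mnt_idmap != mnt->sb->s_idmap;
}

/* Refuse both idmappings that would leave every k{g,u}id unchanged. */
int attach_idmapping(struct mount *mnt, const struct idmapping *map)
{
	if (map == &initial_idmapping || map == mnt->sb->s_idmap)
		return -EINVAL;
	mnt->mnt_idmap = map;
	return 0;
}
```

For a filesystem mounted with a container idmapping, a fresh mount carrying the initial idmapping is not idmapped; attaching either the initial or the container idmapping fails with -EINVAL, while attaching any other idmapping succeeds and makes the mount idmapped.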
Link: https://lore.kernel.org/r/20211123114227.3124056-11-brauner@kernel.org (v1)
Link: https://lore.kernel.org/r/20211130121032.3753852-11-brauner@kernel.org (v2)
Link: https://lore.kernel.org/r/20211203111707.3901969-11-brauner@kernel.org
Cc: Seth Forshee <sforshee@digitalocean.com>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
CC: linux-fsdevel@vger.kernel.org
Reviewed-by: Seth Forshee <sforshee@digitalocean.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1ac2a4104968e0a60b4b3572216a92aab5c1b025 upstream.
Currently we only support idmapped mounts for filesystems mounted
without an idmapping. This was a conscious decision mentioned in
multiple places (cf. e.g. [1]).
As explained at length in [3], it is perfectly fine to extend support for
idmapped mounts to filesystems mounted with an idmapping should the
need arise. The need has been there for some time now: various container
projects in userspace need this to run unprivileged and nested
unprivileged containers (cf. [2]).
Before we can port any filesystem that is mountable with an idmapping to
support idmapped mounts, we first need to extend the mapping helpers to
account for the filesystem's idmapping. Again, this is explained at
length in our documentation [3], but I'll give an overview here.
Currently, the low-level mapping helpers implement the remapping
algorithms described in [3] in a simplified manner. Because we could
rely on the fact that all filesystems supporting idmapped mounts are
mounted without an idmapping, the translation step from or into the
filesystem idmapping could be skipped.
In order to support idmapped mounts of filesystems mountable with an
idmapping, the translation step we were able to skip before can no
longer be skipped. A filesystem mounted with an idmapping is very likely
to use a non-identity mapping rather than an identity mapping, so the
translation step from or into the filesystem's idmapping in the
remapping algorithm cannot be skipped for such filesystems. More
details with examples can be found in [3].
This patch adds a few new low-level mapping helpers and prepares some
existing ones to perform the full translation algorithm explained in
[3]. The low-level helpers can be written so that they only
perform the additional translation step when the filesystem is indeed
mounted with an idmapping.
If the low-level helpers detect that they are not dealing with an
idmapped mount they can simply return the relevant k{g,u}id unchanged;
no remapping needs to be performed at all. The no_idmapping() helper
detects whether the shortcut can be used.
If the low-level helpers detect that they are dealing with an idmapped
mount but the underlying filesystem is mounted without an idmapping, we
can rely on the previous shortcut and continue to skip the
translation step from or into the filesystem's idmapping.
These checks guarantee that only the minimal amount of work is
performed. As before, if idmapped mounts aren't used, the low-level
helpers return the k{g,u}id unchanged and no work is performed at all.
This patch adds the helpers mapped_k{g,u}id_fs() and
mapped_k{g,u}id_user(). Following patches will port all places that use
the old k{g,u}id_into_mnt() and k{g,u}id_from_mnt() helpers to these
new helpers. After the conversion is done, k{g,u}id_into_mnt() and
k{g,u}id_from_mnt() will be removed. This also concludes the renaming of
the mapping helpers we started in [4]. Now all mapping helpers
start with the "mapped_" prefix, making everything nice and consistent.
The mapped_k{g,u}id_fs() helpers replace the k{g,u}id_into_mnt()
helpers. They are to be used when k{g,u}ids are to be mapped from the
vfs, e.g. from struct inode's i_{g,u}id. Conversely, the
mapped_k{g,u}id_user() helpers replace the k{g,u}id_from_mnt() helpers.
They are to be used when k{g,u}ids are to be written to disk, e.g. when
entering from a system call to change ownership of a file.
This patch only introduces the helpers. It doesn't yet convert the
relevant places to account for filesystems mounted with an idmapping.
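To make the translation steps concrete, here is a userspace model of the two helper families. It is a sketch, not the kernel implementation: each idmapping is a hypothetical single-extent `struct idmap`, `map_down()`/`map_up()` stand in for make_kuid()/from_kuid(), and `mapped_id_fs()`/`mapped_id_user()` are illustrative names for mapped_k{g,u}id_fs() and mapped_k{g,u}id_user().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical single-extent idmapping: ids [first, first + count) inside
 * the mapping correspond to kernel ids [lower_first, lower_first + count). */
struct idmap {
	uint32_t first;
	uint32_t lower_first;
	uint32_t count;
};

#define INVALID_ID ((uint32_t)-1)

/* Models the initial idmapping (identity); recognized by address. */
const struct idmap init_idmap = { 0, 0, UINT32_MAX };

bool initial_idmapping(const struct idmap *m)
{
	return m == &init_idmap;
}

/* Analogue of make_kuid(): id inside the mapping -> kernel id. */
uint32_t map_down(const struct idmap *m, uint32_t id)
{
	if (id >= m->first && id - m->first < m->count)
		return m->lower_first + (id - m->first);
	return INVALID_ID;
}

/* Analogue of from_kuid(): kernel id -> id inside the mapping. */
uint32_t map_up(const struct idmap *m, uint32_t kid)
{
	if (kid >= m->lower_first && kid - m->lower_first < m->count)
		return m->first + (kid - m->lower_first);
	return INVALID_ID;
}

/* Shortcut: not an idmapped mount, or mount and fs share one idmapping. */
bool no_idmapping(const struct idmap *mnt, const struct idmap *fs)
{
	return initial_idmapping(mnt) || mnt == fs;
}

/* Inode kernel id -> kernel id as seen through the mount. */
uint32_t mapped_id_fs(const struct idmap *mnt, const struct idmap *fs,
		      uint32_t kid)
{
	uint32_t id;

	if (no_idmapping(mnt, fs))
		return kid;
	/* Leave the filesystem's idmapping; skippable if it is the initial one. */
	id = initial_idmapping(fs) ? kid : map_up(fs, kid);
	if (id == INVALID_ID)
		return INVALID_ID;
	/* Enter the mount's idmapping. */
	return map_down(mnt, id);
}

/* Caller's kernel id -> kernel id to store in the inode / on disk. */
uint32_t mapped_id_user(const struct idmap *mnt, const struct idmap *fs,
			uint32_t kid)
{
	uint32_t id;

	if (no_idmapping(mnt, fs))
		return kid;
	/* Leave the mount's idmapping. */
	id = map_up(mnt, kid);
	if (id == INVALID_ID)
		return INVALID_ID;
	/* Enter the filesystem's idmapping; skippable if it is the initial one. */
	return initial_idmapping(fs) ? id : map_down(fs, id);
}
```

With a filesystem idmapping of 0:10000:65536 and a mount idmapping of 0:20000:65536, an inode owned by kernel id 11000 (uid 1000 inside the filesystem's idmapping) maps to kernel id 21000 on the mount, and mapped_id_user() turns 21000 back into 11000 before it is stored. If the filesystem uses the initial idmapping, the intermediate step is skipped and 1000 maps straight to 21000.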
[1]: commit 2ca4dcc490 ("fs/mount_setattr: tighten permission checks")
[2]: https://github.com/containers/podman/issues/10374
[3]: Documentation/filesystems/idmappings.rst
[4]: commit a65e58e791 ("fs: document and rename fsid helpers")
Link: https://lore.kernel.org/r/20211123114227.3124056-5-brauner@kernel.org (v1)
Link: https://lore.kernel.org/r/20211130121032.3753852-5-brauner@kernel.org (v2)
Link: https://lore.kernel.org/r/20211203111707.3901969-5-brauner@kernel.org
Cc: Seth Forshee <sforshee@digitalocean.com>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
CC: linux-fsdevel@vger.kernel.org
Reviewed-by: Seth Forshee <sforshee@digitalocean.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There are a lot of different structures that need to have a "frozen" ABI
for the next 5+ years. Add padding to a lot of them in order to be able
to handle any future changes that might be needed due to LTS and
security fixes that might come up.
It's a best guess, based on what has happened in the past across the
5.10.0..5.10.110 releases (1 1/2 years). Yes, past changes do not mean
that future changes will also be needed in the same area, but they are a
hint that those areas are both well maintained and looked after, and
that problems have previously been found in them.
Also, the list of structures required by OEM usage, as recorded in the
android/ symbol lists, was consulted, as that is a larger list than
what has been changed in the past.
Hopefully we caught everything we need to worry about; only time will
tell...
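The padding idea can be sketched in plain C11. The macro names KABI_RESERVE/KABI_USE and the widget structs are hypothetical (the Android tree has its own ANDROID_KABI_* helpers whose details differ); the point is that a later field can take over a reserved 64-bit slot without changing the structure's size or the offsets of the members after it, which the static assertions check at compile time.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical macro: reserve one 64-bit slot for future use. */
#define KABI_RESERVE(n) uint64_t kabi_reserved##n

/* Hypothetical macro: reuse a reserved slot. The anonymous union keeps the
 * slot's size and offset as long as the new member fits in 64 bits. */
#define KABI_USE(n, _new)                  \
	union {                            \
		_new;                      \
		uint64_t kabi_reserved##n; \
	}

/* Layout as frozen at KMI freeze time. */
struct widget_v1 {
	int id;
	void *priv;
	KABI_RESERVE(1);
	KABI_RESERVE(2);
};

/* A later LTS fix needs a new field: consume slot 1 instead of growing
 * the structure. */
struct widget_v2 {
	int id;
	void *priv;
	KABI_USE(1, uint32_t flags);
	KABI_RESERVE(2);
};

_Static_assert(sizeof(struct widget_v1) == sizeof(struct widget_v2),
	       "structure size must not change");
_Static_assert(offsetof(struct widget_v1, kabi_reserved2) ==
	       offsetof(struct widget_v2, kabi_reserved2),
	       "later members must not move");
```

This only preserves the memory layout; keeping MODVERSIONS CRCs stable across such a change is a separate problem that the sketch does not address.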
Bug: 151154716
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I880bbcda0628a7459988eeb49d18655522697664
Try to mitigate potential future driver core API changes by adding
padding to a bunch of filesystem structures.
Based on a change made to the RHEL/CentOS 8 kernel.
Bug: 151154716
Change-Id: Ida6d98d30f292c980ab07e0250fec5268c4c87ed
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
'struct fscrypt_operations' shouldn't really be part of the KMI, as
there's no reason for loadable modules to use it. However, due to the
way MODVERSIONS calculates symbol CRCs by recursively dereferencing
structures, changes to 'struct fscrypt_operations' affect the CRCs of
KMI functions exported from certain core kernel files such as
fs/dcache.c. That brings it in-scope for the KMI freeze.
Therefore, add some reserved fields to this struct for LTS updates.
Bug: 151154716
Change-Id: Ic3bf66c93a9be167a0a5b257bd55e2719d99a1b4
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jung Jinwoo <j7093.jung@samsung.com>
Changes in 5.15.5
arm64: zynqmp: Do not duplicate flash partition label property
arm64: zynqmp: Fix serial compatible string
clk: sunxi-ng: Unregister clocks/resets when unbinding
ARM: dts: sunxi: Fix OPPs node name
arm64: dts: allwinner: h5: Fix GPU thermal zone node name
arm64: dts: allwinner: a100: Fix thermal zone node name
staging: wfx: ensure IRQ is ready before enabling it
ARM: dts: BCM5301X: Fix nodes names
ARM: dts: BCM5301X: Fix MDIO mux binding
ARM: dts: NSP: Fix mpcore, mmc node names
arm64: dts: broadcom: bcm4908: Move reboot syscon out of bus
scsi: pm80xx: Fix memory leak during rmmod
scsi: lpfc: Fix list_add() corruption in lpfc_drain_txq()
ASoC: mediatek: mt8195: Add missing of_node_put()
arm64: dts: rockchip: Disable CDN DP on Pinebook Pro
arm64: dts: hisilicon: fix arm,sp805 compatible string
RDMA/bnxt_re: Check if the vlan is valid before reporting
bus: ti-sysc: Add quirk handling for reinit on context lost
bus: ti-sysc: Use context lost quirk for otg
usb: musb: tusb6010: check return value after calling platform_get_resource()
usb: typec: tipd: Remove WARN_ON in tps6598x_block_read
ARM: dts: ux500: Skomer regulator fixes
staging: rtl8723bs: remove possible deadlock when disconnect (v2)
staging: rtl8723bs: remove a second possible deadlock
staging: rtl8723bs: remove a third possible deadlock
ARM: BCM53016: Specify switch ports for Meraki MR32
arm64: dts: qcom: msm8998: Fix CPU/L2 idle state latency and residency
arm64: dts: qcom: ipq6018: Fix qcom,controlled-remotely property
arm64: dts: qcom: ipq8074: Fix qcom,controlled-remotely property
arm64: dts: qcom: sdm845: Fix qcom,controlled-remotely property
arm64: dts: freescale: fix arm,sp805 compatible string
arm64: dts: ls1012a: Add serial alias for ls1012a-rdb
RDMA/rxe: Separate HW and SW l/rkeys
ASoC: SOF: Intel: hda-dai: fix potential locking issue
scsi: core: Fix scsi_mode_sense() buffer length handling
ALSA: usb-audio: disable implicit feedback sync for Behringer UFX1204 and UFX1604
clk: imx: imx6ul: Move csi_sel mux to correct base register
ASoC: es8316: Use IRQF_NO_AUTOEN when requesting the IRQ
ASoC: rt5651: Use IRQF_NO_AUTOEN when requesting the IRQ
ASoC: nau8824: Add DMI quirk mechanism for active-high jack-detect
scsi: advansys: Fix kernel pointer leak
scsi: smartpqi: Add controller handshake during kdump
arm64: dts: imx8mm-kontron: Fix reset delays for ethernet PHY
ALSA: intel-dsp-config: add quirk for APL/GLK/TGL devices based on ES8336 codec
ASoC: Intel: soc-acpi: add missing quirk for TGL SDCA single amp
ASoC: Intel: sof_sdw: add missing quirk for Dell SKU 0A45
firmware_loader: fix pre-allocated buf built-in firmware use
HID: multitouch: disable sticky fingers for UPERFECT Y
ALSA: usb-audio: Add support for the Pioneer DJM 750MK2 Mixer/Soundcard
ARM: dts: omap: fix gpmc,mux-add-data type
usb: host: ohci-tmio: check return value after calling platform_get_resource()
ASoC: rt5682: fix a little pop while playback
ARM: dts: ls1021a: move thermal-zones node out of soc/
ARM: dts: ls1021a-tsn: use generic "jedec,spi-nor" compatible for flash
ALSA: ISA: not for M68K
iommu/vt-d: Do not falsely log intel_iommu is unsupported kernel option
tty: tty_buffer: Fix the softlockup issue in flush_to_ldisc
MIPS: sni: Fix the build
scsi: scsi_debug: Fix out-of-bound read in resp_readcap16()
scsi: scsi_debug: Fix out-of-bound read in resp_report_tgtpgs()
scsi: target: Fix ordered tag handling
scsi: target: Fix alua_tg_pt_gps_count tracking
iio: imu: st_lsm6dsx: Avoid potential array overflow in st_lsm6dsx_set_odr()
RDMA/core: Use kvzalloc when allocating the struct ib_port
scsi: lpfc: Fix use-after-free in lpfc_unreg_rpi() routine
scsi: lpfc: Fix link down processing to address NULL pointer dereference
scsi: lpfc: Allow fabric node recovery if recovery is in progress before devloss
memory: tegra20-emc: Add runtime dependency on devfreq governor module
powerpc/5200: dts: fix memory node unit name
ARM: dts: qcom: fix memory and mdio nodes naming for RB3011
arm64: dts: qcom: Fix node name of rpm-msg-ram device nodes
ALSA: gus: fix null pointer dereference on pointer block
ALSA: usb-audio: fix null pointer dereference on pointer cs_desc
clk: at91: sama7g5: remove prescaler part of master clock
iommu/dart: Initialize DART_STREAMS_ENABLE
powerpc/dcr: Use cmplwi instead of 3-argument cmpli
powerpc/8xx: Fix Oops with STRICT_KERNEL_RWX without DEBUG_RODATA_TEST
sh: check return code of request_irq
maple: fix wrong return value of maple_bus_init().
f2fs: fix up f2fs_lookup tracepoints
f2fs: fix to use WHINT_MODE
f2fs: fix wrong condition to trigger background checkpoint correctly
sh: fix kconfig unmet dependency warning for FRAME_POINTER
sh: math-emu: drop unused functions
sh: define __BIG_ENDIAN for math-emu
f2fs: compress: disallow disabling compress on non-empty compressed file
f2fs: fix incorrect return value in f2fs_sanity_check_ckpt()
clk: ingenic: Fix bugs with divided dividers
clk/ast2600: Fix soc revision for AHB
clk: qcom: gcc-msm8996: Drop (again) gcc_aggre1_pnoc_ahb_clk
KVM: arm64: Fix host stage-2 finalization
mips: BCM63XX: ensure that CPU_SUPPORTS_32BIT_KERNEL is set
MIPS: boot/compressed/: add __bswapdi2() to target for ZSTD decompression
sched/core: Mitigate race cpus_share_cache()/update_top_cache_domain()
sched/fair: Prevent dead task groups from regaining cfs_rq's
perf/x86/vlbr: Add c->flags to vlbr event constraints
blkcg: Remove extra blkcg_bio_issue_init
tracing/histogram: Do not copy the fixed-size char array field over the field size
perf bpf: Avoid memory leak from perf_env__insert_btf()
perf bench futex: Fix memory leak of perf_cpu_map__new()
perf tests: Remove bash construct from record+zstd_comp_decomp.sh
drm/nouveau: hdmigv100.c: fix corrupted HDMI Vendor InfoFrame
bpf: Fix inner map state pruning regression.
samples/bpf: Fix summary per-sec stats in xdp_sample_user
samples/bpf: Fix incorrect use of strlen in xdp_redirect_cpu
selftests: net: switch to socat in the GSO GRE test
net/ipa: ipa_resource: Fix wrong for loop range
tcp: Fix uninitialized access in skb frags array for Rx 0cp.
tracing: Add length protection to histogram string copies
nl80211: fix radio statistics in survey dump
mac80211: fix monitor_sdata RCU/locking assertions
net: ipa: HOLB register sometimes must be written twice
net: ipa: disable HOLB drop when updating timer
selftests: gpio: fix gpio compiling error
net: bnx2x: fix variable dereferenced before check
bnxt_en: reject indirect blk offload when hw-tc-offload is off
tipc: only accept encrypted MSG_CRYPTO msgs
sock: fix /proc/net/sockstat underflow in sk_clone_lock()
net/smc: Make sure the link_id is unique
NFSD: Fix exposure in nfsd4_decode_bitmap()
iavf: Fix return of set the new channel count
iavf: check for null in iavf_fix_features
iavf: free q_vectors before queues in iavf_disable_vf
iavf: don't clear a lock we don't hold
iavf: Fix failure to exit out from last all-multicast mode
iavf: prevent accidental free of filter structure
iavf: validate pointers
iavf: Fix for the false positive ASQ/ARQ errors while issuing VF reset
iavf: Fix for setting queues to 0
iavf: Restore VLAN filters after link down
bpf: Fix toctou on read-only map's constant scalar tracking
MIPS: generic/yamon-dt: fix uninitialized variable error
mips: bcm63xx: add support for clk_get_parent()
mips: lantiq: add support for clk_get_parent()
gpio: rockchip: needs GENERIC_IRQ_CHIP to fix build errors
platform/x86: hp_accel: Fix an error handling path in 'lis3lv02d_probe()'
platform/x86: think-lmi: Abort probe on analyze failure
udp: Validate checksum in udp_read_sock()
btrfs: make 1-bit bit-fields of scrub_page unsigned int
RDMA/core: Set send and receive CQ before forwarding to the driver
net/mlx5e: kTLS, Fix crash in RX resync flow
net/mlx5e: Wait for concurrent flow deletion during neigh/fib events
net/mlx5: E-Switch, Fix resetting of encap mode when entering switchdev
net/mlx5e: nullify cq->dbg pointer in mlx5_debug_cq_remove()
net/mlx5: Update error handler for UCTX and UMEM
net/mlx5: E-Switch, rebuild lag only when needed
net/mlx5e: CT, Fix multiple allocations and memleak of mod acts
net/mlx5: Lag, update tracker when state change event received
net/mlx5: E-Switch, return error if encap isn't supported
scsi: ufs: core: Improve SCSI abort handling
scsi: core: sysfs: Fix hang when device state is set via sysfs
scsi: ufs: core: Fix task management completion timeout race
scsi: ufs: core: Fix another task management completion race
net: mvmdio: fix compilation warning
net: sched: act_mirred: drop dst for the direction from egress to ingress
net: dpaa2-eth: fix use-after-free in dpaa2_eth_remove
net: virtio_net_hdr_to_skb: count transport header in UFO
i40e: Fix correct max_pkt_size on VF RX queue
i40e: Fix NULL ptr dereference on VSI filter sync
i40e: Fix changing previously set num_queue_pairs for PFs
i40e: Fix ping is lost after configuring ADq on VF
RDMA/mlx4: Do not fail the registration on port stats
i40e: Fix warning message and call stack during rmmod i40e driver
i40e: Fix creation of first queue by omitting it if is not power of two
i40e: Fix display error code in dmesg
NFC: reorganize the functions in nci_request
NFC: reorder the logic in nfc_{un,}register_device
NFC: add NCI_UNREG flag to eliminate the race
e100: fix device suspend/resume
ptp: ocp: Fix a couple NULL vs IS_ERR() checks
tools build: Fix removal of feature-sync-compare-and-swap feature detection
riscv: fix building external modules
KVM: PPC: Book3S HV: Use GLOBAL_TOC for kvmppc_h_set_dabr/xdabr()
powerpc: clean vdso32 and vdso64 directories
powerpc/pseries: rename numa_dist_table to form2_distances
powerpc/pseries: Fix numa FORM2 parsing fallback code
pinctrl: qcom: sdm845: Enable dual edge errata
pinctrl: qcom: sm8350: Correct UFS and SDC offsets
perf/x86/intel/uncore: Fix filter_tid mask for CHA events on Skylake Server
perf/x86/intel/uncore: Fix IIO event constraints for Skylake Server
perf/x86/intel/uncore: Fix IIO event constraints for Snowridge
s390/kexec: fix return code handling
blk-cgroup: fix missing put device in error path from blkg_conf_pref()
dmaengine: remove debugfs #ifdef
tun: fix bonding active backup with arp monitoring
Revert "mark pstore-blk as broken"
pstore/blk: Use "%lu" to format unsigned long
hexagon: export raw I/O routines for modules
hexagon: clean up timer-regs.h
tipc: check for null after calling kmemdup
ipc: WARN if trying to remove ipc object which is absent
shm: extend forced shm destroy to support objects from several IPC nses
mm: kmemleak: slob: respect SLAB_NOLEAKTRACE flag
hugetlb, userfaultfd: fix reservation restore on userfaultfd error
kmap_local: don't assume kmap PTEs are linear arrays in memory
mm/damon/dbgfs: use '__GFP_NOWARN' for user-specified size buffer allocation
mm/damon/dbgfs: fix missed use of damon_dbgfs_lock
x86/boot: Pull up cmdline preparation and early param parsing
x86/sgx: Fix free page accounting
x86/hyperv: Fix NULL deref in set_hv_tscchange_cb() if Hyper-V setup fails
KVM: x86: Assume a 64-bit hypercall for guests with protected state
KVM: x86: Fix uninitialized eoi_exit_bitmap usage in vcpu_load_eoi_exitmap()
KVM: x86/mmu: include EFER.LMA in extended mmu role
KVM: x86/xen: Fix get_attr of KVM_XEN_ATTR_TYPE_SHARED_INFO
powerpc/signal32: Fix sigset_t copy
powerpc/xive: Change IRQ domain to a tree domain
powerpc/8xx: Fix pinned TLBs with CONFIG_STRICT_KERNEL_RWX
Revert "drm/i915/tgl/dsi: Gate the ddi clocks after pll mapping"
Revert "parisc: Reduce sigreturn trampoline to 3 instructions"
ata: libata: improve ata_read_log_page() error message
ata: libata: add missing ata_identify_page_supported() calls
scsi: qla2xxx: Fix mailbox direction flags in qla2xxx_get_adapter_id()
pinctrl: ralink: include 'ralink_regs.h' in 'pinctrl-mt7620.c'
s390/setup: avoid reserving memory above identity mapping
s390/boot: simplify and fix kernel memory layout setup
s390/vdso: filter out -mstack-guard and -mstack-size
s390/kexec: fix memory leak of ipl report buffer
s390/dump: fix copying to user-space of swapped kdump oldmem
block: Check ADMIN before NICE for IOPRIO_CLASS_RT
fbdev: Prevent probing generic drivers if a FB is already registered
KVM: SEV: Disallow COPY_ENC_CONTEXT_FROM if target has created vCPUs
KVM: nVMX: don't use vcpu->arch.efer when checking host state on nested state load
drm/cma-helper: Release non-coherent memory with dma_free_noncoherent()
printk: restore flushing of NMI buffers on remote CPUs after NMI backtraces
udf: Fix crash after seekdir
spi: fix use-after-free of the add_lock mutex
net: stmmac: socfpga: add runtime suspend/resume callback for stratix10 platform
Drivers: hv: balloon: Use VMBUS_RING_SIZE() wrapper for dm_ring_size
btrfs: fix memory ordering between normal and ordered work functions
fs: handle circular mappings correctly
net: stmmac: Fix signed/unsigned wreckage
parisc/sticon: fix reverse colors
cfg80211: call cfg80211_stop_ap when switch from P2P_GO type
mac80211: fix radiotap header generation
mac80211: drop check for DONT_REORDER in __ieee80211_select_queue
drm/amd/display: Update swizzle mode enums
drm/amd/display: Limit max DSC target bpp for specific monitors
drm/i915/guc: Fix outstanding G2H accounting
drm/i915/guc: Don't enable scheduling on a banned context, guc_id invalid, not registered
drm/i915/guc: Workaround reset G2H is received after schedule done G2H
drm/i915/guc: Don't drop ce->guc_active.lock when unwinding context
drm/i915/guc: Unwind context requests in reverse order
drm/udl: fix control-message timeout
drm/prime: Fix use after free in mmap with drm_gem_ttm_mmap
drm/nouveau: Add a dedicated mutex for the clients list
drm/nouveau: use drm_dev_unplug() during device removal
drm/nouveau: clean up all clients on device removal
drm/i915/dp: Ensure sink rate values are always valid
drm/i915/dp: Ensure max link params are always valid
drm/i915: Fix type1 DVI DP dual mode adapter heuristic for modern platforms
drm/amdgpu: fix set scaling mode Full/Full aspect/Center not works on vga and dvi connectors
drm/amd/pm: avoid duplicate powergate/ungate setting
signal: Implement force_fatal_sig
exit/syscall_user_dispatch: Send ordinary signals on failure
signal/powerpc: On swapcontext failure force SIGSEGV
signal/s390: Use force_sigsegv in default_trap_handler
signal/sparc32: Exit with a fatal signal when try_to_clear_window_buffer fails
signal/sparc32: In setup_rt_frame and setup_fram use force_fatal_sig
signal/vm86_32: Properly send SIGSEGV when the vm86 state cannot be saved.
signal/x86: In emulate_vsyscall force a signal instead of calling do_exit
signal: Replace force_sigsegv(SIGSEGV) with force_fatal_sig(SIGSEGV)
signal: Don't always set SA_IMMUTABLE for forced signals
signal: Replace force_fatal_sig with force_exit_sig when in doubt
hugetlbfs: flush TLBs correctly after huge_pmd_unshare
RDMA/netlink: Add __maybe_unused to static inline in C file
bpf: Forbid bpf_ktime_get_coarse_ns and bpf_timer_* in tracing progs
selinux: fix NULL-pointer dereference when hashtab allocation fails
ASoC: DAPM: Cover regression by kctl change notification fix
ASoC: rsnd: fixup DMAEngine API
usb: max-3421: Use driver data instead of maintaining a list of bound devices
ice: Fix VF true promiscuous mode
ice: Delete always true check of PF pointer
fs: export an inode_update_time helper
btrfs: update device path inode time instead of bd_inode
net: add and use skb_unclone_keeptruesize() helper
x86/Kconfig: Fix an unused variable error in dell-smm-hwmon
ALSA: hda: hdac_ext_stream: fix potential locking issues
ALSA: hda: hdac_stream: fix potential locking issue in snd_hdac_stream_assign()
Linux 5.15.5
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: If86a02ba2cf9af765d9838ada3b9a2cbcea9a08d
commit e60feb445fce9e51c1558a6aa7faf9dd5ded533b upstream.
If you already have an inode and need to update its time, there is
currently no proper way to do so. Export this helper to allow
filesystems to update the time on an inode so that the appropriate
handler is called: either ->update_time or generic_update_time().
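The dispatch this helper performs can be modeled as follows. This is a simplified userspace sketch: timestamps are plain `long` values rather than struct timespec64, the S_* flag values and structure layouts are chosen for the example, and only the choice between ->update_time and generic_update_time() mirrors the description above.

```c
/* Flag values chosen for this sketch. */
#define S_ATIME 1
#define S_MTIME 2
#define S_CTIME 4

struct inode;

struct inode_operations {
	/* Optional hook; a filesystem may provide its own implementation. */
	int (*update_time)(struct inode *inode, long now, int flags);
};

struct inode {
	const struct inode_operations *i_op;
	long i_atime;
	long i_mtime;
	long i_ctime;
};

/* Fallback: just store the new time in the requested fields. */
int generic_update_time(struct inode *inode, long now, int flags)
{
	if (flags & S_ATIME)
		inode->i_atime = now;
	if (flags & S_MTIME)
		inode->i_mtime = now;
	if (flags & S_CTIME)
		inode->i_ctime = now;
	return 0;
}

/* Prefer the filesystem's ->update_time, else use the generic version. */
int inode_update_time(struct inode *inode, long now, int flags)
{
	if (inode->i_op && inode->i_op->update_time)
		return inode->i_op->update_time(inode, now, flags);
	return generic_update_time(inode, now, flags);
}
```

Routing every caller through one exported helper means a filesystem's ->update_time hook cannot be bypassed by code that would otherwise poke at the timestamps directly.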
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We currently plan to disallow use of filp_open() from drivers in GKI;
however, the ZRAM driver still needs it. Add a new GKI-only variant of
filp_open() which only permits a block device to be opened, which can
be exported instead. This keeps ZRAM working but cuts down on drivers
that attempt to open and write files in kernel mode.
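A userspace analogue of the restriction: open a path, then refuse anything that is not a block device. `open_block_device_only()` is a hypothetical name, not the GKI helper, and it uses open(2)/fstat(2) rather than the in-kernel filp_open(); it returns a file descriptor or a negative errno, with -ENOTBLK for non-block files.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: returns an fd for a block device, or -errno.
 * Anything that is not a block device is rejected with -ENOTBLK. */
int open_block_device_only(const char *path, int flags)
{
	struct stat st;
	int fd = open(path, flags | O_CLOEXEC);

	if (fd < 0)
		return -errno;
	/* Inspect the object actually opened, not the path name. */
	if (fstat(fd, &st) < 0) {
		int err = -errno;

		close(fd);
		return err;
	}
	if (!S_ISBLK(st.st_mode)) {
		close(fd);
		return -ENOTBLK;
	}
	return fd;
}
```

Checking the opened descriptor rather than stat'ing the path first avoids a race where the path is swapped between the check and the open, at the cost of briefly opening files that are then rejected.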
Bug: 179220339
Bug: 205141088
Change-Id: Id696b4aaf204b0499ce0a1b6416648670236e570
Signed-off-by: Alistair Delva <adelva@google.com>
Pull gfs2 setattr updates from Al Viro:
"Make it possible for filesystems to use a generic 'may_setattr()' and
switch gfs2 to using it"
* 'work.gfs2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
gfs2: Switch to may_setattr in gfs2_setattr
fs: Move notify_change permission checks into may_setattr
Pull root filesystem type handling updates from Al Viro:
"Teach init/do_mounts.c to handle non-block filesystems, hopefully
preventing even more special-cased kludges (such as root=/dev/nfs,
etc)"
* 'work.init' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs: simplify get_filesystem_list / get_all_fs_names
init: allow mounting arbitrary non-blockdevice filesystems as root
init: split get_fs_names
Pull MAP_DENYWRITE removal from David Hildenbrand:
"Remove all in-tree usage of MAP_DENYWRITE from the kernel and remove
VM_DENYWRITE.
There are some (minor) user-visible changes:
- We no longer deny write access to shared libraries loaded via legacy
uselib(); this behavior matches modern user space e.g. dlopen().
- We no longer deny write access to the elf interpreter after exec
completed, treating it just like shared libraries (which it often
is).
- We always deny write access to the file linked via /proc/pid/exe:
sys_prctl(PR_SET_MM_MAP/EXE_FILE) will fail if write access to the
file cannot be denied, and write access to the file will remain
denied until the link is effectively gone (exec, termination,
sys_prctl(PR_SET_MM_MAP/EXE_FILE)) -- just as if exec'ing the file.
Cross-compiled for a bunch of architectures (alpha, microblaze, i386,
s390x, ...) and verified via ltp that especially the relevant tests
(i.e., creat07 and execve04) continue working as expected"
* tag 'denywrite-for-5.15' of git://github.com/davidhildenbrand/linux:
fs: update documentation of get_write_access() and friends
mm: ignore MAP_DENYWRITE in ksys_mmap_pgoff()
mm: remove VM_DENYWRITE
binfmt: remove in-tree usage of MAP_DENYWRITE
kernel/fork: always deny write access to current MM exe_file
kernel/fork: factor out replacing the current MM exe_file
binfmt: don't use MAP_DENYWRITE when loading shared libraries via uselib()
As VM_DENYWRITE no longer exists, let's spring-clean the
documentation of get_write_access() and friends.
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Pull overlayfs update from Miklos Szeredi:
- Copy up immutable/append/sync/noatime attributes (Amir Goldstein)
- Improve performance by enabling RCU lookup.
- Misc fixes and improvements
The reason this touches so many files is that the ->get_acl() method now
gets a "bool rcu" argument. The ->get_acl() API was updated based on
comments from Al and Linus:
Link: https://lore.kernel.org/linux-fsdevel/CAJfpeguQxpd6Wgc0Jd3ks77zcsAv_bn0q17L3VNnnmPKu11t8A@mail.gmail.com/
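The contract behind the new `rcu` argument can be sketched as follows: in RCU (lockless) mode the filesystem must not block, so if it cannot answer from cached state it returns -ECHILD and the caller retries in a mode that may sleep. Everything here (`struct inode_model`, `model_get_acl()`, `lookup_acl()`) is a hypothetical userspace model of that pattern, not the VFS or overlayfs code.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, minimal ACL and inode stand-ins. */
struct acl {
	int entries;
};

struct inode_model {
	const struct acl *cached_acl; /* NULL until the ACL has been loaded */
	struct acl on_disk_acl;       /* what a blocking read would return */
};

/* Model of ->get_acl(inode, type, rcu): never block when rcu is true. */
int model_get_acl(struct inode_model *inode, bool rcu, const struct acl **out)
{
	if (inode->cached_acl) {
		*out = inode->cached_acl;
		return 0;
	}
	if (rcu)
		return -ECHILD; /* would need to block: ask the caller to retry */
	/* Non-RCU mode may sleep: "read from disk" and cache the result. */
	inode->cached_acl = &inode->on_disk_acl;
	*out = inode->cached_acl;
	return 0;
}

/* Caller side: try the lockless path first, fall back on -ECHILD. */
int lookup_acl(struct inode_model *inode, const struct acl **out)
{
	int ret = model_get_acl(inode, true, out);

	if (ret == -ECHILD)
		ret = model_get_acl(inode, false, out);
	return ret;
}
```

Once the ACL is cached, subsequent RCU-mode calls succeed without falling back, which is where the lookup performance gain comes from.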
* tag 'ovl-update-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
ovl: enable RCU'd ->get_acl()
vfs: add rcu argument to ->get_acl() callback
ovl: fix BUG_ON() in may_delete() when called from ovl_cleanup()
ovl: use kvalloc in xattr copy-up
ovl: update ctime when changing fileattr
ovl: skip checking lower file's i_writecount on truncate
ovl: relax lookup error on mismatch origin ftype
ovl: do not set overlay.opaque for new directories
ovl: add ovl_allow_offline_changes() helper
ovl: disable decoding null uuid with redirect_dir
ovl: consistent behavior for immutable/append-only inodes
ovl: copy up sync/noatime fileattr flags
ovl: pass ovl_fs to ovl_check_setxattr()
fs: add generic helper for filling statx attribute flags
Pull nfsd updates from Chuck Lever:
"New features:
- Support for server-side disconnect injection via debugfs
- Protocol definitions for new RPC_AUTH_TLS authentication flavor
Performance improvements:
- Reduce page allocator traffic in the NFSD splice read actor
- Reduce CPU utilization in svcrdma's Send completion handler
Notable bug fixes:
- Stabilize lockd operation when re-exporting NFS mounts
- Fix the use of %.*s in NFSD tracepoints
- Fix /proc/sys/fs/nfs/nsm_use_hostnames"
* tag 'nfsd-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (31 commits)
nfsd: fix crash on LOCKT on reexported NFSv3
nfs: don't allow reexport reclaims
lockd: don't attempt blocking locks on nfs reexports
nfs: don't atempt blocking locks on nfs reexports
Keep read and write fds with each nlm_file
lockd: update nlm_lookup_file reexport comment
nlm: minor refactoring
nlm: minor nlm_lookup_file argument change
lockd: lockd server-side shouldn't set fl_ops
SUNRPC: Add documentation for the fail_sunrpc/ directory
SUNRPC: Server-side disconnect injection
SUNRPC: Move client-side disconnect injection
SUNRPC: Add a /sys/kernel/debug/fail_sunrpc/ directory
svcrdma: xpt_bc_xprt is already clear in __svc_rdma_free()
nfsd4: Fix forced-expiry locking
rpc: fix gss_svc_init cleanup on failure
SUNRPC: Add RPC_AUTH_TLS protocol numbers
lockd: change the proc_handler for nsm_use_hostnames
sysctl: introduce new proc handler proc_dobool
SUNRPC: Fix a NULL pointer deref in trace_svc_stats_latency()
...
Pull btrfs updates from David Sterba:
"The highlights of this round are integrations with fs-verity and
idmapped mounts, the rest is the usual mix of minor improvements, speedups
and cleanups.
There are some patches outside of btrfs, namely updating some VFS
interfaces, all straightforward and acked.
Features:
- fs-verity support, using standard ioctls, backward compatible with
read-only limitation on inodes with previously enabled fs-verity
- idmapped mount support
- make mount with rescue=ibadroots more tolerant to partially damaged
trees
- allow raid0 on a single device and raid10 on two devices,
degenerate cases but might be useful as an intermediate step during
conversion to other profiles
- zoned mode block group auto reclaim can be disabled via sysfs knob
Performance improvements:
- continue readahead of node siblings even if target node is in
memory, could speed up full send (on sample test +11%)
- batching of delayed items can speed up creating many files
- fsync/tree-log speedups
- avoid unnecessary work (gains +2% throughput, -2% run time on
sample load)
- reduced lock contention on renames (on dbench +4% throughput,
up to -30% latency)
Fixes:
- various zoned mode fixes
- preemptive flushing threshold tuning, avoid excessive work on
almost full filesystems
Core:
- continued subpage support, preparation for implementing remaining
features like compression and defragmentation; with some
limitations, write is now enabled on 64K page systems with 4K
sectors, still considered experimental
  - no readahead on compressed reads
  - inline extents disabled
  - disabled raid56 profile conversion and mount
- improved flushing logic, fixing early ENOSPC on some workloads
- inode flags have been internally split to read-only and read-write
incompat bit parts, used by fs-verity
- new tree items for fs-verity
  - descriptor item
  - Merkle tree item
- inode operations extended to be namespace-aware
- cleanups and refactoring
Generic code changes:
- fs: new export filemap_fdatawrite_wbc
- fs: removed sync_inode
- block: bio_trim argument type fixups
- vfs: add namespace-aware lookup"
* tag 'for-5.15-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (114 commits)
btrfs: reset replace target device to allocation state on close
btrfs: zoned: fix ordered extent boundary calculation
btrfs: do not do preemptive flushing if the majority is global rsv
btrfs: reduce the preemptive flushing threshold to 90%
btrfs: tree-log: check btrfs_lookup_data_extent return value
btrfs: avoid unnecessarily logging directories that had no changes
btrfs: allow idmapped mount
btrfs: handle ACLs on idmapped mounts
btrfs: allow idmapped INO_LOOKUP_USER ioctl
btrfs: allow idmapped SUBVOL_SETFLAGS ioctl
btrfs: allow idmapped SET_RECEIVED_SUBVOL ioctls
btrfs: relax restrictions for SNAP_DESTROY_V2 with subvolids
btrfs: allow idmapped SNAP_DESTROY ioctls
btrfs: allow idmapped SNAP_CREATE/SUBVOL_CREATE ioctls
btrfs: check whether fsgid/fsuid are mapped during subvolume creation
btrfs: allow idmapped permission inode op
btrfs: allow idmapped setattr inode op
btrfs: allow idmapped tmpfile inode op
btrfs: allow idmapped symlink inode op
btrfs: allow idmapped mkdir inode op
...
Pull io_uring mkdirat/symlinkat/linkat support from Jens Axboe:
"This adds io_uring support for mkdirat, symlinkat, and linkat"
* tag 'for-5.15/io_uring-vfs-2021-08-30' of git://git.kernel.dk/linux-block:
io_uring: add support for IORING_OP_LINKAT
io_uring: add support for IORING_OP_SYMLINKAT
io_uring: add support for IORING_OP_MKDIRAT
namei: update do_*() helpers to return ints
namei: make do_linkat() take struct filename
namei: add getname_uflags()
namei: make do_symlinkat() take struct filename
namei: make do_mknodat() take struct filename
namei: make do_mkdirat() take struct filename
namei: change filename_parentat() calling conventions
namei: ignore ERR/NULL names in putname()
Pull support for struct bio recycling from Jens Axboe:
"This adds bio recycling support for polled IO, allowing quick reuse of
a bio for high IOPS scenarios via a percpu bio_set list.
It's good for almost a 10% improvement in performance, bumping our
per-core IO limit from ~3.2M IOPS to ~3.5M IOPS"
* tag 'io_uring-bio-cache.5-2021-08-30' of git://git.kernel.dk/linux-block:
bio: improve kerneldoc documentation for bio_alloc_kiocb()
block: provide bio_clear_hipri() helper
block: use the percpu bio cache in __blkdev_direct_IO
io_uring: enable use of bio alloc cache
block: clear BIO_PERCPU_CACHE flag if polling isn't supported
bio: add allocation cache abstraction
fs: add kiocb alloc cache flag
bio: optimize initialization of a bio
Pull file locking updates from Jeff Layton:
"This starts with a couple of fixes for potential deadlocks in the
fowner/fasync handling.
The next patch removes the old mandatory locking code from the kernel
altogether.
The last patch cleans up rw_verify_area a bit more after the mandatory
locking removal"
* tag 'locks-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
fs: clean up after mandatory file locking support removal
fs: remove mandatory file locking support
fcntl: fix potential deadlock for &fasync_struct.fa_lock
fcntl: fix potential deadlocks for &fown_struct.lock
Pull fs hole punching vs cache filling race fixes from Jan Kara:
"Fix races leading to possible data corruption or stale data exposure
in multiple filesystems when hole punching races with operations such
as readahead.
This is the series I was sending for the last merge window but with
your objection fixed - now filemap_fault() has been modified to take
invalidate_lock only when we need to create a new page in the page cache
and / or bring it uptodate"
* tag 'hole_punch_for_v5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
filesystems/locking: fix Malformed table warning
cifs: Fix race between hole punch and page fault
ceph: Fix race between hole punch and page fault
fuse: Convert to using invalidate_lock
f2fs: Convert to using invalidate_lock
zonefs: Convert to using invalidate_lock
xfs: Convert double locking of MMAPLOCK to use VFS helpers
xfs: Convert to use invalidate_lock
xfs: Refactor xfs_isilocked()
ext2: Convert to using invalidate_lock
ext4: Convert to use mapping->invalidate_lock
mm: Add functions to lock invalidate_lock for two mappings
mm: Protect operations adding pages to page cache with invalidate_lock
documentation: Sync file_operations members with reality
mm: Fix comments mentioning i_mutex
In the reexport case, nfsd is currently passing along locks with the
reclaim bit set. The client sends a new lock request, which is granted
if there's currently no conflict--even if it's possible a conflicting
lock could have been briefly held in the interim.
We don't currently have any way to safely grant reclaim, so for now
let's just deny them all.
I'm doing this by passing the reclaim bit to nfs and letting it fail the
call, with the idea that eventually the client might be able to do
something more forgiving here.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Acked-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
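A minimal model of the policy described above, assuming a simplified lock request carrying only a reclaim bit and a re-export marker. The struct, the function name and the -ENOLCK return value are hypothetical placeholders, not the actual nfs/nfsd code; the point is only that a re-exporting server refuses every reclaim rather than risk granting one that conflicts.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical lock request as seen by the NFS client on a re-exporting server. */
struct lock_req {
	bool reclaim;  /* client is reclaiming a lock after a server restart */
	bool reexport; /* request arrived via nfsd re-exporting this NFS mount */
};

/*
 * There is no way to prove that no conflicting lock was held in the
 * interim, so on a re-export every reclaim is refused. -ENOLCK is a
 * placeholder error value chosen for this sketch.
 */
static int forward_lock(const struct lock_req *req)
{
	if (req->reexport && req->reclaim)
		return -ENOLCK;
	return 0; /* forwarded as an ordinary lock request */
}
```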
If this kiocb can safely use the polled bio allocation cache, then this
flag must be set. Generally this can be set for polled IO, where we will
not see IRQ completions of the request.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
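The sketch below models why the flag is tied to polled IO. It uses a single free list of recycled bios that is touched only from task context, so nothing like an IRQ completion can race with it. All names (MODEL_IOCB_ALLOC_CACHE, bio_get(), bio_put()) are hypothetical stand-ins, not the kernel's bio_set cache API. The real cache is per-CPU, and that is what lets it skip locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins; the kernel's flags and bio_set layout differ. */
#define MODEL_IOCB_HIPRI       (1u << 0) /* polled I/O: no IRQ completion */
#define MODEL_IOCB_ALLOC_CACHE (1u << 1) /* may use the bio free list */

struct model_bio {
	struct model_bio *next;
};

/* One unlocked free list; safe only because polled completions run in task context. */
static struct model_bio *free_list;
static unsigned int free_count;

static struct model_bio *bio_get(unsigned int iocb_flags)
{
	if ((iocb_flags & MODEL_IOCB_ALLOC_CACHE) && free_list) {
		struct model_bio *bio = free_list; /* reuse a recycled bio */
		free_list = bio->next;
		free_count--;
		return bio;
	}
	return malloc(sizeof(struct model_bio));
}

static void bio_put(struct model_bio *bio, unsigned int iocb_flags)
{
	if (iocb_flags & MODEL_IOCB_ALLOC_CACHE) {
		bio->next = free_list; /* recycle instead of freeing */
		free_list = bio;
		free_count++;
		return;
	}
	free(bio);
}

/* Only polled I/O gets the cache flag. */
static unsigned int iocb_flags_for(bool polled)
{
	return polled ? (MODEL_IOCB_HIPRI | MODEL_IOCB_ALLOC_CACHE) : 0;
}
```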
Btrfs sometimes needs to flush dirty pages on a bunch of dirty inodes in
order to reclaim metadata reservations. Unfortunately most helpers in
this area are too smart for us:
1) The normal filemap_fdata* helpers only take range and sync modes, and
don't give any indication of how much was written, so we can only
flush full inodes, which isn't what we want in most cases.
2) The normal writeback path requires us to have the s_umount sem held,
but we can't unconditionally take it in this path because we could
deadlock.
3) The normal writeback path also skips inodes with I_SYNC set if we
write with WB_SYNC_NONE. This isn't the behavior we want under heavy
ENOSPC pressure, we want to actually make sure the pages are under
writeback before returning, and if another thread is in the middle of
writing the file we may return before they're under writeback and
miss our ordered extents and not properly wait for completion.
4) sync_inode() uses the normal writeback path and has the same problem
as #3.
What we really want is to call do_writepages() with our wbc. This way
we can make sure that writeback is actually started on the pages, and we
can control how many pages are written as a whole as we write many
inodes using the same wbc. Accomplish this with a new helper that does
just that so we can use it for our ENOSPC flushing infrastructure.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
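The key idea is one writeback_control shared across many inodes, so a single page budget governs the whole flush. The model below shows this under heavy simplification: the wbc is reduced to nr_to_write, and model_do_writepages() is a hypothetical stand-in that just moves pages from dirty to "under writeback". It is not the btrfs or mm code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified writeback control: just a page budget. */
struct model_wbc {
	long nr_to_write; /* pages we may still start writeback on */
};

struct model_inode {
	long dirty_pages;
	long writeback_pages; /* pages whose writeback has been started */
};

/* Stand-in for do_writepages(): start writeback on as many pages as the budget allows. */
static void model_do_writepages(struct model_inode *inode, struct model_wbc *wbc)
{
	long n = inode->dirty_pages < wbc->nr_to_write ? inode->dirty_pages
						       : wbc->nr_to_write;

	inode->dirty_pages -= n;
	inode->writeback_pages += n;
	wbc->nr_to_write -= n;
}

/* Flush many inodes against one shared wbc until the budget runs out. */
static void flush_inodes(struct model_inode *inodes, size_t count,
			 struct model_wbc *wbc)
{
	for (size_t i = 0; i < count && wbc->nr_to_write > 0; i++)
		model_do_writepages(&inodes[i], wbc);
}
```

With a budget of 8 pages and inodes holding 4, 10 and 5 dirty pages, the first inode is flushed fully, the second partially (4 of 10), and the third not at all.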
We added CONFIG_MANDATORY_FILE_LOCKING in 2015, and soon after turned it
off in Fedora and RHEL8. Several other distros have followed suit.
I've heard of one problem in all that time: Someone migrated from an
older distro that supported "-o mand" to one that didn't, and the host
had a fstab entry with "mand" in it which broke on reboot. They didn't
actually _use_ mandatory locking so they just removed the mount option
and moved on.
This patch rips out mandatory locking support wholesale from the kernel,
along with the Kconfig option and the Documentation file. It also
changes the mount code to ignore the "mand" mount option instead of
erroring out, and to throw a big, ugly warning.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
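The sketch below models the new mount-time behaviour: when mandatory locking is requested, the mount warns and drops the request instead of failing. The MODEL_MS_* constants are local stand-ins for the MS_RDONLY and MS_MANDLOCK mount flags. The function name and the warning text are hypothetical, not the kernel's.

```c
#include <assert.h>
#include <stdio.h>

/* Local stand-ins for the MS_RDONLY and MS_MANDLOCK mount flags. */
#define MODEL_MS_RDONLY   1UL
#define MODEL_MS_MANDLOCK 64UL

static unsigned int mand_warnings;

/* Hypothetical filter: warn loudly and strip the flag rather than error out. */
static unsigned long model_filter_mount_flags(unsigned long flags)
{
	if (flags & MODEL_MS_MANDLOCK) {
		mand_warnings++;
		fprintf(stderr, "WARNING: \"mand\" mount option ignored, "
				"mandatory locking is no longer supported\n");
		flags &= ~MODEL_MS_MANDLOCK;
	}
	return flags;
}
```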
Just output the '\0'-separated list of supported file systems for block
devices directly rather than going through a pointless round of string
manipulation.
Based on an earlier patch from Al Viro <viro@zeniv.linux.org.uk>.
Vivek:
Modified list_bdev_fs_names() and split_fs_names() to return number of
null terminted strings to caller. Callers now use that information to
loop through all the strings instead of relying on one extra null char
being present at the end.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
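The calling convention Vivek describes can be sketched as follows: the producer returns how many NUL-terminated strings it packed, and the consumer walks exactly that many strings instead of scanning for a trailing empty string. fill_fs_names() and try_each_fs() are hypothetical stand-ins for list_bdev_fs_names() and its callers, not their actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Pack names into buf as consecutive NUL-terminated strings; return how many fit. */
static int fill_fs_names(char *buf, size_t size, const char *const *names, int n)
{
	char *p = buf;
	int count = 0;

	for (int i = 0; i < n; i++) {
		size_t len = strlen(names[i]) + 1; /* include the terminating NUL */

		if ((size_t)(p - buf) + len > size)
			break;
		memcpy(p, names[i], len);
		p += len;
		count++;
	}
	return count;
}

/* Walk exactly `count` strings; no extra NUL sentinel is needed at the end. */
static int try_each_fs(const char *buf, int count, const char *wanted)
{
	const char *p = buf;

	for (int i = 0; i < count; i++, p += strlen(p) + 1) {
		if (strcmp(p, wanted) == 0)
			return i;
	}
	return -1;
}
```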
Overlayfs does not cache ACLs (to avoid double caching). Instead it just
calls the underlying filesystem's i_op->get_acl(), which will return the
cached value, if possible.
In rcu path walk, however, get_cached_acl_rcu() is employed to get the
value from the cache, which will fail on overlayfs resulting in dropping
out of rcu walk mode. This can result in a big performance hit in certain
situations.
Fix by calling ->get_acl() with rcu=true in case of ACL_DONT_CACHE (which
indicates pass-through).
Reported-by: garyhuang <zjh.20052005@163.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
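The following is a sketch of the RCU-walk lookup after this change, assuming a simplified inode with one cached-ACL slot. When the slot holds the "don't cache" marker, the lookup asks ->get_acl() in rcu mode. An overlay-like implementation answers from the real inode's cache when it can and otherwise returns an error, which here stands for dropping out of RCU walk. The sentinels, struct layout and function names are hypothetical models of the behaviour described, not the kernel's posix_acl code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_acl {
	int id;
};

/* Distinct static objects serve as sentinel pointers in this model. */
static struct model_acl not_cached_marker, dont_cache_marker, echild_marker;
#define MODEL_ACL_NOT_CACHED (&not_cached_marker)
#define MODEL_ACL_DONT_CACHE (&dont_cache_marker)
#define MODEL_ERR_ECHILD     (&echild_marker) /* "cannot answer without blocking" */

struct model_inode {
	struct model_acl *cached_acl;
	struct model_acl *on_disk; /* what a blocking lookup would load */
	struct model_inode *real;  /* underlying inode, for the overlay case */
	/* ->get_acl() with the new rcu argument: must not block when rcu is true. */
	struct model_acl *(*get_acl)(struct model_inode *inode, bool rcu);
};

/* Overlay-like ->get_acl(): pass through to the real inode's cache. */
static struct model_acl *ovl_model_get_acl(struct model_inode *inode, bool rcu)
{
	struct model_inode *real = inode->real;

	if (real->cached_acl != MODEL_ACL_NOT_CACHED)
		return real->cached_acl;
	if (rcu)
		return MODEL_ERR_ECHILD; /* would have to block */
	real->cached_acl = real->on_disk; /* stands in for a blocking load */
	return real->cached_acl;
}

/* RCU-walk lookup: consult ->get_acl() only for pass-through inodes. */
static struct model_acl *model_get_cached_acl_rcu(struct model_inode *inode)
{
	struct model_acl *acl = inode->cached_acl;

	if (acl == MODEL_ACL_DONT_CACHE) {
		struct model_acl *ret = inode->get_acl(inode, true);

		if (ret != MODEL_ERR_ECHILD)
			acl = ret;
	}
	return acl;
}
```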
Add an rcu argument to the ->get_acl() callback to allow
get_cached_acl_rcu() to call the ->get_acl() method in the next patch.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
The immutable and append-only properties on an inode are published on
the inode's i_flags and enforced by the VFS.
Create a helper to fill the corresponding STATX_ATTR_ flags in the kstat
structure from the inode's i_flags.
Only orangefs was converted to use this helper.
Other filesystems could use it in the future.
Suggested-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
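Roughly, such a helper maps VFS-enforced inode flags onto statx attribute bits and also advertises those bits in the attribute mask. In the model below the MODEL_* constants and structs are local stand-ins for S_IMMUTABLE/S_APPEND, STATX_ATTR_IMMUTABLE/STATX_ATTR_APPEND, struct inode and struct kstat, and the function is a sketch, not the upstream helper.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for the kernel's inode flags and statx attribute bits. */
#define MODEL_S_APPEND             (1u << 2)
#define MODEL_S_IMMUTABLE          (1u << 3)
#define MODEL_STATX_ATTR_IMMUTABLE 0x00000010ULL
#define MODEL_STATX_ATTR_APPEND    0x00000020ULL

struct model_inode {
	unsigned int i_flags;
};

struct model_kstat {
	uint64_t attributes;
	uint64_t attributes_mask;
};

/* Translate VFS-enforced inode flags into the statx attributes they imply. */
static void model_fill_statx_attr(const struct model_inode *inode,
				  struct model_kstat *stat)
{
	if (inode->i_flags & MODEL_S_IMMUTABLE)
		stat->attributes |= MODEL_STATX_ATTR_IMMUTABLE;
	if (inode->i_flags & MODEL_S_APPEND)
		stat->attributes |= MODEL_STATX_ATTR_APPEND;
	/* Report these attributes as supported whether or not they are set. */
	stat->attributes_mask |= MODEL_STATX_ATTR_IMMUTABLE | MODEL_STATX_ATTR_APPEND;
}
```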
Move the permission checks in notify_change into a separate function to
make them available to filesystems.
When notify_change is called, the vfs performs those checks before
calling into iop->setattr. However, a filesystem like gfs2 can only
lock and revalidate the inode inside ->setattr, and it must then repeat
those checks to err on the safe side.
It would be nice to get rid of the double checking, but moving the
permission check into iop->setattr altogether isn't really an option.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
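This model shows why a factored-out check helps a filesystem like gfs2. The VFS runs the check once before calling ->setattr, and the filesystem repeats the same check after locking and revalidating, when the inode may have changed underneath it. The struct, the "remote_immutable" field and all function names are hypothetical. Only the immutable/append-only rule for mode changes is modelled, and none of this is the upstream implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MODEL_ATTR_MODE (1u << 0)

/* Hypothetical inode: just the state the check needs. */
struct model_inode {
	bool immutable;
	bool append_only;
	bool remote_immutable; /* set by another node, seen only after revalidation */
	int revalidations;
};

/* The factored-out permission check, callable from the VFS and from ->setattr. */
static int model_may_setattr(const struct model_inode *inode, unsigned int ia_valid)
{
	if ((ia_valid & MODEL_ATTR_MODE) && (inode->immutable || inode->append_only))
		return -EPERM;
	return 0;
}

/* gfs2-like ->setattr: lock, revalidate, then repeat the check. */
static int model_fs_setattr(struct model_inode *inode, unsigned int ia_valid)
{
	inode->revalidations++; /* stands in for taking the cluster lock */
	if (inode->remote_immutable)
		inode->immutable = true;
	return model_may_setattr(inode, ia_valid);
}

static int model_notify_change(struct model_inode *inode, unsigned int ia_valid,
			       int (*setattr)(struct model_inode *, unsigned int))
{
	int err = model_may_setattr(inode, ia_valid); /* VFS-side check */

	if (err)
		return err;
	return setattr(inode, ia_valid);
}
```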
Some operations such as reflinking blocks among files will need to lock
invalidate_lock for two mappings. Add helper functions to do that.
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Currently, serializing operations such as page fault, read, or readahead
against hole punching is rather difficult. The basic race scheme is
like:
fallocate(FALLOC_FL_PUNCH_HOLE)                 read / fault / ..
  truncate_inode_pages_range()
                                                  <create pages in page
                                                   cache here>
  <update fs block mapping and free blocks>
The problem is that, this way, read / page fault / readahead can
instantiate pages in the page cache with potentially stale data (if blocks
get quickly reused). Avoiding this race is not simple - page locks do
not work because we want to make sure there are *no* pages in given
range. inode->i_rwsem does not work because page fault happens under
mmap_sem which ranks below inode->i_rwsem. Also using it for reads makes
the performance for mixed read-write workloads suffer.
So create a new rw_semaphore in the address_space - invalidate_lock -
that protects adding of pages to page cache for page faults / reads /
readahead.
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>