Commit Graph

439 Commits

Author SHA1 Message Date
Greg Kroah-Hartman
17126d43d4 Merge 5.15.90 into android14-5.15
Changes in 5.15.90
	btrfs: fix trace event name typo for FLUSH_DELAYED_REFS
	pNFS/filelayout: Fix coalescing test for single DS
	selftests/bpf: check null propagation only neither reg is PTR_TO_BTF_ID
	tools/virtio: initialize spinlocks in vring_test.c
	virtio_pci: modify ENOENT to EINVAL
	vduse: Validate vq_num in vduse_validate_config()
	net/ethtool/ioctl: return -EOPNOTSUPP if we have no phy stats
	r8169: move rtl_wol_enable_rx() and rtl_prepare_power_down()
	RDMA/srp: Move large values to a new enum for gcc13
	btrfs: always report error in run_one_delayed_ref()
	x86/asm: Fix an assembler warning with current binutils
	f2fs: let's avoid panic if extent_tree is not created
	perf/x86/rapl: Treat Tigerlake like Icelake
	fbdev: omapfb: avoid stack overflow warning
	Bluetooth: hci_qca: Fix driver shutdown on closed serdev
	wifi: brcmfmac: fix regression for Broadcom PCIe wifi devices
	wifi: mac80211: sdata can be NULL during AMPDU start
	Add exception protection processing for vd in axi_chan_handle_err function
	zonefs: Detect append writes at invalid locations
	nilfs2: fix general protection fault in nilfs_btree_insert()
	efi: fix userspace infinite retry read efivars after EFI runtime services page fault
	ALSA: hda/realtek: fix mute/micmute LEDs for a HP ProBook
	ALSA: hda/realtek: fix mute/micmute LEDs don't work for a HP platform
	drm/amdgpu: disable runtime pm on several sienna cichlid cards(v2)
	drm/amd: Delay removal of the firmware framebuffer
	hugetlb: unshare some PMDs when splitting VMAs
	io_uring: don't gate task_work run on TIF_NOTIFY_SIGNAL
	eventpoll: add EPOLL_URING_WAKE poll wakeup flag
	eventfd: provide a eventfd_signal_mask() helper
	io_uring: pass in EPOLL_URING_WAKE for eventfd signaling and wakeups
	io_uring: improve send/recv error handling
	io_uring: ensure recv and recvmsg handle MSG_WAITALL correctly
	io_uring: add flag for disabling provided buffer recycling
	io_uring: support MSG_WAITALL for IORING_OP_SEND(MSG)
	io_uring: allow re-poll if we made progress
	io_uring: fix async accept on O_NONBLOCK sockets
	io_uring: ensure that cached task references are always put on exit
	io_uring: remove duplicated calls to io_kiocb_ppos
	io_uring: update kiocb->ki_pos at execution time
	io_uring: do not recalculate ppos unnecessarily
	io_uring/rw: defer fsnotify calls to task context
	xhci-pci: set the dma max_seg_size
	usb: xhci: Check endpoint is valid before dereferencing it
	xhci: Fix null pointer dereference when host dies
	xhci: Add update_hub_device override for PCI xHCI hosts
	xhci: Add a flag to disable USB3 lpm on a xhci root port level.
	usb: acpi: add helper to check port lpm capability using acpi _DSM
	xhci: Detect lpm incapable xHC USB3 roothub ports from ACPI tables
	prlimit: do_prlimit needs to have a speculation check
	USB: serial: option: add Quectel EM05-G (GR) modem
	USB: serial: option: add Quectel EM05-G (CS) modem
	USB: serial: option: add Quectel EM05-G (RS) modem
	USB: serial: option: add Quectel EC200U modem
	USB: serial: option: add Quectel EM05CN (SG) modem
	USB: serial: option: add Quectel EM05CN modem
	staging: vchiq_arm: fix enum vchiq_status return types
	USB: misc: iowarrior: fix up header size for USB_DEVICE_ID_CODEMERCS_IOW100
	misc: fastrpc: Don't remove map on creater_process and device_release
	misc: fastrpc: Fix use-after-free race condition for maps
	usb: core: hub: disable autosuspend for TI TUSB8041
	comedi: adv_pci1760: Fix PWM instruction handling
	ACPI: PRM: Check whether EFI runtime is available
	mmc: sunxi-mmc: Fix clock refcount imbalance during unbind
	mmc: sdhci-esdhc-imx: correct the tuning start tap and step setting
	btrfs: do not abort transaction on failure to write log tree when syncing log
	btrfs: fix race between quota rescan and disable leading to NULL pointer deref
	cifs: do not include page data when checking signature
	thunderbolt: Use correct function to calculate maximum USB3 link rate
	riscv: dts: sifive: fu740: fix size of pcie 32bit memory
	bpf: restore the ebpf program ID for BPF_AUDIT_UNLOAD and PERF_BPF_EVENT_PROG_UNLOAD
	staging: mt7621-dts: change some node hex addresses to lower case
	tty: serial: qcom-geni-serial: fix slab-out-of-bounds on RX FIFO buffer
	tty: fix possible null-ptr-defer in spk_ttyio_release
	USB: gadgetfs: Fix race between mounting and unmounting
	USB: serial: cp210x: add SCALANCE LPE-9000 device id
	usb: cdns3: remove fetched trb from cache before dequeuing
	usb: host: ehci-fsl: Fix module alias
	usb: typec: tcpm: Fix altmode re-registration causes sysfs create fail
	usb: typec: altmodes/displayport: Add pin assignment helper
	usb: typec: altmodes/displayport: Fix pin assignment calculation
	usb: gadget: g_webcam: Send color matching descriptor per frame
	usb: gadget: f_ncm: fix potential NULL ptr deref in ncm_bitrate()
	usb-storage: apply IGNORE_UAS only for HIKSEMI MD202 on RTL9210
	dt-bindings: phy: g12a-usb2-phy: fix compatible string documentation
	dt-bindings: phy: g12a-usb3-pcie-phy: fix compatible string documentation
	serial: pch_uart: Pass correct sg to dma_unmap_sg()
	dmaengine: lgm: Move DT parsing after initialization
	dmaengine: tegra210-adma: fix global intr clear
	dmaengine: idxd: Let probe fail when workqueue cannot be enabled
	serial: amba-pl011: fix high priority character transmission in rs486 mode
	serial: atmel: fix incorrect baudrate setup
	gsmi: fix null-deref in gsmi_get_variable
	mei: me: add meteor lake point M DID
	drm/i915: re-disable RC6p on Sandy Bridge
	drm/i915/display: Check source height is > 0
	drm/amd/display: Fix set scaling doesn's work
	drm/amd/display: Calculate output_color_space after pixel encoding adjustment
	drm/amd/display: Fix COLOR_SPACE_YCBCR2020_TYPE matrix
	drm/amdgpu: drop experimental flag on aldebaran
	fs/ntfs3: Fix attr_punch_hole() null pointer derenference
	arm64: efi: Execute runtime services from a dedicated stack
	efi: rt-wrapper: Add missing include
	Revert "drm/amdgpu: make display pinning more flexible (v2)"
	x86/fpu: Use _Alignof to avoid undefined behavior in TYPE_ALIGN
	tracing: Use alignof__(struct {type b;}) instead of offsetof()
	io_uring: io_kiocb_update_pos() should not touch file for non -1 offset
	io_uring/net: fix fast_iov assignment in io_setup_async_msg()
	net/ulp: use consistent error code when blocking ULP
	net/mlx5: fix missing mutex_unlock in mlx5_fw_fatal_reporter_err_work()
	block: mq-deadline: Rename deadline_is_seq_writes()
	Revert "wifi: mac80211: fix memory leak in ieee80211_if_add()"
	soc: qcom: apr: Make qcom,protection-domain optional again
	mm/khugepaged: fix collapse_pte_mapped_thp() to allow anon_vma
	io_uring: Clean up a false-positive warning from GCC 9.3.0
	io_uring: fix double poll leak on repolling
	io_uring/rw: ensure kiocb_end_write() is always called
	io_uring/rw: remove leftover debug statement
	Linux 5.15.90

Change-Id: I5abc2e695f7183a1d3be9d8f62633bb1df8e8a48
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-01-25 16:13:42 +00:00
Greg Kroah-Hartman
f01aefe374 prlimit: do_prlimit needs to have a speculation check
commit 739790605705ddcf18f21782b9c99ad7d53a8c11 upstream.

do_prlimit() adds the user-controlled resource value to a pointer that
will subsequently be dereferenced.  In order to help prevent this
codepath from being used as a spectre "gadget" a barrier needs to be
added after checking the range.

Reported-by: Jordy Zomer <jordyzomer@google.com>
Tested-by: Jordy Zomer <jordyzomer@google.com>
Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-01-24 07:22:44 +01:00
Mark Brown
12efc92c13 UPSTREAM: arm64/sme: Implement vector length configuration prctl()s
As for SVE provide a prctl() interface which allows processes to
configure their SME vector length.

Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20220419112247.711548-12-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
(cherry picked from commit 9e4ab6c89109472082616f8d2f6ada7deaffe161)
Signed-off-by: Will Deacon <willdeacon@google.com>
Bug: 233587962
Bug: 233588291
Change-Id: I99b778aea77b19bac68f1ddb61d77b51f64b49cb
2022-08-10 08:58:11 +01:00
Suren Baghdasaryan
0fd37220d8 UPSTREAM: mm: refactor vm_area_struct::anon_vma_name usage code
Avoid mixing strings and their anon_vma_name referenced pointers by
using struct anon_vma_name whenever possible.  This simplifies the code
and allows easier sharing of anon_vma_name structures when they
represent the same name.

[surenb@google.com: fix comment]

Link: https://lkml.kernel.org/r/20220223153613.835563-1-surenb@google.com
Link: https://lkml.kernel.org/r/20220224231834.1481408-1-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Suggested-by: Michal Hocko <mhocko@suse.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Colin Cross <ccross@google.com>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Alexey Gladkov <legion@kernel.org>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Chris Hyser <chris.hyser@oracle.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Xiaofeng Cao <caoxiaofeng@yulong.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit 5c26f6ac9416b63d093e29c30e79b3297e425472)

Bug: 218352794
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I4a6b5602ce7151d1a4b88fac489f86d68089bd4d
2022-03-24 18:44:39 -07:00
Greg Kroah-Hartman
2ded03fd7c Merge 5.15.25 into android13-5.15
Changes in 5.15.25
	drm/nouveau/pmu/gm200-: use alternate falcon reset sequence
	fs/proc: task_mmu.c: don't read mapcount for migration entry
	btrfs: zoned: cache reported zone during mount
	scsi: lpfc: Fix mailbox command failure during driver initialization
	HID:Add support for UGTABLET WP5540
	Revert "svm: Add warning message for AVIC IPI invalid target"
	parisc: Show error if wrong 32/64-bit compiler is being used
	serial: parisc: GSC: fix build when IOSAPIC is not set
	parisc: Drop __init from map_pages declaration
	parisc: Fix data TLB miss in sba_unmap_sg
	parisc: Fix sglist access in ccio-dma.c
	mmc: block: fix read single on recovery logic
	mm: don't try to NUMA-migrate COW pages that have other uses
	HID: amd_sfh: Add illuminance mask to limit ALS max value
	HID: i2c-hid: goodix: Fix a lockdep splat
	HID: amd_sfh: Increase sensor command timeout
	HID: amd_sfh: Correct the structure field name
	PCI: hv: Fix NUMA node assignment when kernel boots with custom NUMA topology
	parisc: Add ioread64_lo_hi() and iowrite64_lo_hi()
	btrfs: send: in case of IO error log it
	platform/x86: touchscreen_dmi: Add info for the RWC NANOTE P8 AY07J 2-in-1
	platform/x86: ISST: Fix possible circular locking dependency detected
	kunit: tool: Import missing importlib.abc
	selftests: rtc: Increase test timeout so that all tests run
	kselftest: signal all child processes
	net: ieee802154: at86rf230: Stop leaking skb's
	selftests/zram: Skip max_comp_streams interface on newer kernel
	selftests/zram01.sh: Fix compression ratio calculation
	selftests/zram: Adapt the situation that /dev/zram0 is being used
	selftests: openat2: Print also errno in failure messages
	selftests: openat2: Add missing dependency in Makefile
	selftests: openat2: Skip testcases that fail with EOPNOTSUPP
	selftests: skip mincore.check_file_mmap when fs lacks needed support
	ax25: improve the incomplete fix to avoid UAF and NPD bugs
	pinctrl: bcm63xx: fix unmet dependency on REGMAP for GPIO_REGMAP
	vfs: make freeze_super abort when sync_filesystem returns error
	quota: make dquot_quota_sync return errors from ->sync_fs
	scsi: pm80xx: Fix double completion for SATA devices
	kselftest: Fix vdso_test_abi return status
	scsi: core: Reallocate device's budget map on queue depth change
	scsi: pm8001: Fix use-after-free for aborted TMF sas_task
	scsi: pm8001: Fix use-after-free for aborted SSP/STP sas_task
	drm/amd: Warn users about potential s0ix problems
	nvme: fix a possible use-after-free in controller reset during load
	nvme-tcp: fix possible use-after-free in transport error_recovery work
	nvme-rdma: fix possible use-after-free in transport error_recovery work
	net: sparx5: do not refer to skb after passing it on
	drm/amd: add support to check whether the system is set to s3
	drm/amd: Only run s3 or s0ix if system is configured properly
	drm/amdgpu: fix logic inversion in check
	x86/Xen: streamline (and fix) PV CPU enumeration
	Revert "module, async: async_synchronize_full() on module init iff async is used"
	gcc-plugins/stackleak: Use noinstr in favor of notrace
	random: wake up /dev/random writers after zap
	KVM: x86/xen: Fix runstate updates to be atomic when preempting vCPU
	KVM: x86: nSVM/nVMX: set nested_run_pending on VM entry which is a result of RSM
	KVM: x86: SVM: don't passthrough SMAP/SMEP/PKE bits in !NPT && !gCR0.PG case
	KVM: x86: nSVM: fix potential NULL derefernce on nested migration
	KVM: x86: nSVM: mark vmcb01 as dirty when restoring SMM saved state
	iwlwifi: fix use-after-free
	drm/radeon: Fix backlight control on iMac 12,1
	drm/atomic: Don't pollute crtc_state->mode_blob with error pointers
	drm/amd/pm: correct the sequence of sending gpu reset msg
	drm/amdgpu: skipping SDMA hw_init and hw_fini for S0ix.
	drm/i915/opregion: check port number bounds for SWSCI display power state
	drm/i915: Fix dbuf slice config lookup
	drm/i915: Fix mbus join config lookup
	vsock: remove vsock from connected table when connect is interrupted by a signal
	drm/cma-helper: Set VM_DONTEXPAND for mmap
	drm/i915/gvt: Make DRM_I915_GVT depend on X86
	drm/i915/ttm: tweak priority hint selection
	iwlwifi: pcie: fix locking when "HW not ready"
	iwlwifi: pcie: gen2: fix locking when "HW not ready"
	iwlwifi: mvm: don't send SAR GEO command for 3160 devices
	selftests: netfilter: fix exit value for nft_concat_range
	netfilter: nft_synproxy: unregister hooks on init error path
	selftests: netfilter: disable rp_filter on router
	ipv4: fix data races in fib_alias_hw_flags_set
	ipv6: fix data-race in fib6_info_hw_flags_set / fib6_purge_rt
	ipv6: mcast: use rcu-safe version of ipv6_get_lladdr()
	ipv6: per-netns exclusive flowlabel checks
	Revert "net: ethernet: bgmac: Use devm_platform_ioremap_resource_byname"
	mac80211: mlme: check for null after calling kmemdup
	brcmfmac: firmware: Fix crash in brcm_alt_fw_path
	cfg80211: fix race in netlink owner interface destruction
	net: dsa: lan9303: fix reset on probe
	net: dsa: mv88e6xxx: flush switchdev FDB workqueue before removing VLAN
	net: dsa: lantiq_gswip: fix use after free in gswip_remove()
	net: dsa: lan9303: handle hwaccel VLAN tags
	net: dsa: lan9303: add VLAN IDs to master device
	net: ieee802154: ca8210: Fix lifs/sifs periods
	ping: fix the dif and sdif check in ping_lookup
	bonding: force carrier update when releasing slave
	drop_monitor: fix data-race in dropmon_net_event / trace_napi_poll_hit
	net_sched: add __rcu annotation to netdev->qdisc
	bonding: fix data-races around agg_select_timer
	libsubcmd: Fix use-after-free for realloc(..., 0)
	net/smc: Avoid overwriting the copies of clcsock callback functions
	net: phy: mediatek: remove PHY mode check on MT7531
	atl1c: fix tx timeout after link flap on Mikrotik 10/25G NIC
	tipc: fix wrong publisher node address in link publications
	dpaa2-switch: fix default return of dpaa2_switch_flower_parse_mirror_key
	dpaa2-eth: Initialize mutex used in one step timestamping path
	net: bridge: multicast: notify switchdev driver whenever MC processing gets disabled
	perf bpf: Defer freeing string after possible strlen() on it
	selftests/exec: Add non-regular to TEST_GEN_PROGS
	arm64: Correct wrong label in macro __init_el2_gicv3
	ALSA: usb-audio: revert to IMPLICIT_FB_FIXED_DEV for M-Audio FastTrack Ultra
	ALSA: hda/realtek: Add quirk for Legion Y9000X 2019
	ALSA: hda/realtek: Fix deadlock by COEF mutex
	ALSA: hda: Fix regression on forced probe mask option
	ALSA: hda: Fix missing codec probe on Shenker Dock 15
	ASoC: ops: Fix stereo change notifications in snd_soc_put_volsw()
	ASoC: ops: Fix stereo change notifications in snd_soc_put_volsw_range()
	ASoC: ops: Fix stereo change notifications in snd_soc_put_volsw_sx()
	ASoC: ops: Fix stereo change notifications in snd_soc_put_xr_sx()
	cifs: fix set of group SID via NTSD xattrs
	powerpc/603: Fix boot failure with DEBUG_PAGEALLOC and KFENCE
	powerpc/lib/sstep: fix 'ptesync' build error
	mtd: rawnand: gpmi: don't leak PM reference in error path
	smb3: fix snapshot mount option
	tipc: fix wrong notification node addresses
	scsi: ufs: Remove dead code
	scsi: ufs: Fix a deadlock in the error handler
	ASoC: tas2770: Insert post reset delay
	ASoC: qcom: Actually clear DMA interrupt register for HDMI
	block/wbt: fix negative inflight counter when remove scsi device
	NFS: Remove an incorrect revalidation in nfs4_update_changeattr_locked()
	NFS: LOOKUP_DIRECTORY is also ok with symlinks
	NFS: Do not report writeback errors in nfs_getattr()
	tty: n_tty: do not look ahead for EOL character past the end of the buffer
	block: fix surprise removal for drivers calling blk_set_queue_dying
	mtd: rawnand: qcom: Fix clock sequencing in qcom_nandc_probe()
	mtd: parsers: qcom: Fix kernel panic on skipped partition
	mtd: parsers: qcom: Fix missing free for pparts in cleanup
	mtd: phram: Prevent divide by zero bug in phram_setup()
	mtd: rawnand: brcmnand: Fixed incorrect sub-page ECC status
	HID: elo: fix memory leak in elo_probe
	mtd: rawnand: ingenic: Fix missing put_device in ingenic_ecc_get
	Drivers: hv: vmbus: Fix memory leak in vmbus_add_channel_kobj
	KVM: x86/pmu: Refactoring find_arch_event() to pmc_perf_hw_id()
	KVM: x86/pmu: Don't truncate the PerfEvtSeln MSR when creating a perf event
	KVM: x86/pmu: Use AMD64_RAW_EVENT_MASK for PERF_TYPE_RAW
	ARM: OMAP2+: hwmod: Add of_node_put() before break
	ARM: OMAP2+: adjust the location of put_device() call in omapdss_init_of
	phy: usb: Leave some clocks running during suspend
	staging: vc04_services: Fix RCU dereference check
	phy: phy-mtk-tphy: Fix duplicated argument in phy-mtk-tphy
	irqchip/sifive-plic: Add missing thead,c900-plic match string
	x86/bug: Merge annotate_reachable() into _BUG_FLAGS() asm
	netfilter: conntrack: don't refresh sctp entries in closed state
	ksmbd: fix same UniqueId for dot and dotdot entries
	ksmbd: don't align last entry offset in smb2 query directory
	arm64: dts: meson-gx: add ATF BL32 reserved-memory region
	arm64: dts: meson-g12: add ATF BL32 reserved-memory region
	arm64: dts: meson-g12: drop BL32 region from SEI510/SEI610
	pidfd: fix test failure due to stack overflow on some arches
	selftests: fixup build warnings in pidfd / clone3 tests
	mm: io_uring: allow oom-killer from io_uring_setup
	ACPI: PM: Revert "Only mark EC GPE for wakeup on Intel systems"
	kconfig: let 'shell' return enough output for deep path names
	ata: libata-core: Disable TRIM on M88V29
	soc: aspeed: lpc-ctrl: Block error printing on probe defer cases
	xprtrdma: fix pointer derefs in error cases of rpcrdma_ep_create
	drm/rockchip: dw_hdmi: Do not leave clock enabled in error case
	tracing: Fix tp_printk option related with tp_printk_stop_on_boot
	display/amd: decrease message verbosity about watermarks table failure
	drm/amd/display: Cap pflip irqs per max otg number
	drm/amd/display: fix yellow carp wm clamping
	net: usb: qmi_wwan: Add support for Dell DW5829e
	net: macb: Align the dma and coherent dma masks
	kconfig: fix failing to generate auto.conf
	scsi: lpfc: Fix pt2pt NVMe PRLI reject LOGO loop
	EDAC: Fix calculation of returned address and next offset in edac_align_ptr()
	ucounts: Handle wrapping in is_ucounts_overlimit
	ucounts: In set_cred_ucounts assume new->ucounts is non-NULL
	ucounts: Base set_cred_ucounts changes on the real user
	ucounts: Enforce RLIMIT_NPROC not RLIMIT_NPROC+1
	lib/iov_iter: initialize "flags" in new pipe_buffer
	rlimit: Fix RLIMIT_NPROC enforcement failure caused by capability calls in set_user
	ucounts: Move RLIMIT_NPROC handling after set_user
	net: sched: limit TC_ACT_REPEAT loops
	dmaengine: sh: rcar-dmac: Check for error num after setting mask
	dmaengine: stm32-dmamux: Fix PM disable depth imbalance in stm32_dmamux_probe
	dmaengine: sh: rcar-dmac: Check for error num after dma_set_max_seg_size
	tests: fix idmapped mount_setattr test
	i2c: qcom-cci: don't delete an unregistered adapter
	i2c: qcom-cci: don't put a device tree node before i2c_add_adapter()
	dmaengine: ptdma: Fix the error handling path in pt_core_init()
	copy_process(): Move fd_install() out of sighand->siglock critical section
	scsi: qedi: Fix ABBA deadlock in qedi_process_tmf_resp() and qedi_process_cmd_cleanup_resp()
	ice: enable parsing IPSEC SPI headers for RSS
	i2c: brcmstb: fix support for DSL and CM variants
	lockdep: Correct lock_classes index mapping
	Linux 5.15.25

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ib129a0e11f5e82d67563329a5de1b0aef1d87928
2022-02-23 12:30:26 +01:00
Eric W. Biederman
2b2be95b60 ucounts: Move RLIMIT_NPROC handling after set_user
commit c923a8e7edb010da67424077cbf1a6f1396ebd2e upstream.

During set*id() which cred->ucounts to charge the the current process
to is not known until after set_cred_ucounts.  So move the
RLIMIT_NPROC checking into a new helper flag_nproc_exceeded and call
flag_nproc_exceeded after set_cred_ucounts.

This is very much an arbitrary subset of the places where we currently
change the RLIMIT_NPROC accounting, designed to preserve the existing
logic.

Fixing the existing logic will be the subject of another series of
changes.

Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20220216155832.680775-4-ebiederm@xmission.com
Fixes: 21d1c5e386 ("Reimplement RLIMIT_NPROC on top of ucounts")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-23 12:03:20 +01:00
Eric W. Biederman
b5f949d9a9 rlimit: Fix RLIMIT_NPROC enforcement failure caused by capability calls in set_user
commit c16bdeb5a39ffa3f32b32f812831a2092d2a3061 upstream.

Solar Designer <solar@openwall.com> wrote:
> I'm not aware of anyone actually running into this issue and reporting
> it.  The systems that I personally know use suexec along with rlimits
> still run older/distro kernels, so would not yet be affected.
>
> So my mention was based on my understanding of how suexec works, and
> code review.  Specifically, Apache httpd has the setting RLimitNPROC,
> which makes it set RLIMIT_NPROC:
>
> https://httpd.apache.org/docs/2.4/mod/core.html#rlimitnproc
>
> The above documentation for it includes:
>
> "This applies to processes forked from Apache httpd children servicing
> requests, not the Apache httpd children themselves. This includes CGI
> scripts and SSI exec commands, but not any processes forked from the
> Apache httpd parent, such as piped logs."
>
> In code, there are:
>
> ./modules/generators/mod_cgid.c:        ( (cgid_req.limits.limit_nproc_set) && ((rc = apr_procattr_limit_set(procattr, APR_LIMIT_NPROC,
> ./modules/generators/mod_cgi.c:        ((rc = apr_procattr_limit_set(procattr, APR_LIMIT_NPROC,
> ./modules/filters/mod_ext_filter.c:    rv = apr_procattr_limit_set(procattr, APR_LIMIT_NPROC, conf->limit_nproc);
>
> For example, in mod_cgi.c this is in run_cgi_child().
>
> I think this means an httpd child sets RLIMIT_NPROC shortly before it
> execs suexec, which is a SUID root program.  suexec then switches to the
> target user and execs the CGI script.
>
> Before 2863643fb8, the setuid() in suexec would set the flag, and the
> target user's process count would be checked against RLIMIT_NPROC on
> execve().  After 2863643fb8, the setuid() in suexec wouldn't set the
> flag because setuid() is (naturally) called when the process is still
> running as root (thus, has those limits bypass capabilities), and
> accordingly execve() would not check the target user's process count
> against RLIMIT_NPROC.

In commit 2863643fb8 ("set_user: add capability check when
rlimit(RLIMIT_NPROC) exceeds") capable calls were added to set_user to
make it more consistent with fork.  Unfortunately because of call site
differences those capable calls were checking the credentials of the
user before set*id() instead of after set*id().

This breaks enforcement of RLIMIT_NPROC for applications that set the
rlimit and then call set*id() while holding a full set of
capabilities.  The capabilities are only changed in the new credential
in security_task_fix_setuid().

The code in apache suexec appears to follow this pattern.

Commit 909cc4ae86f3 ("[PATCH] Fix two bugs with process limits
(RLIMIT_NPROC)") where this check was added describes the targes of this
capability check as:

  2/ When a root-owned process (e.g. cgiwrap) sets up process limits and then
      calls setuid, the setuid should fail if the user would then be running
      more than rlim_cur[RLIMIT_NPROC] processes, but it doesn't.  This patch
      adds an appropriate test.  With this patch, and per-user process limit
      imposed in cgiwrap really works.

So the original use case of this check also appears to match the broken
pattern.

Restore the enforcement of RLIMIT_NPROC by removing the bad capable
checks added in set_user.  This unfortunately restores the
inconsistent state the code has been in for the last 11 years, but
dealing with the inconsistencies looks like a larger problem.

Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20210907213042.GA22626@openwall.com/
Link: https://lkml.kernel.org/r/20220212221412.GA29214@openwall.com
Link: https://lkml.kernel.org/r/20220216155832.680775-1-ebiederm@xmission.com
Fixes: 2863643fb8 ("set_user: add capability check when rlimit(RLIMIT_NPROC) exceeds")
History-Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Reviewed-by: Solar Designer <solar@openwall.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-23 12:03:20 +01:00
Suren Baghdasaryan
2c37ba0309 ANDROID: Change anon vma name limit from 80 to 256
Android uses vma names of up to 256 characters. Change the max limit
for anonymous vma names to support Android legacy use cases.

Bug: 120441514
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Ia0126ab3919281ce4d5c597a43a47de80eadf71a
2022-01-18 16:03:55 -08:00
Colin Cross
301c56064d UPSTREAM: mm: add a field to store names for private anonymous memory
In many userspace applications, and especially in VM based applications
like Android uses heavily, there are multiple different allocators in
use.  At a minimum there is libc malloc and the stack, and in many cases
there are libc malloc, the stack, direct syscalls to mmap anonymous
memory, and multiple VM heaps (one for small objects, one for big
objects, etc.).  Each of these layers usually has its own tools to
inspect its usage; malloc by compiling a debug version, the VM through
heap inspection tools, and for direct syscalls there is usually no way
to track them.

On Android we heavily use a set of tools that use an extended version of
the logic covered in Documentation/vm/pagemap.txt to walk all pages
mapped in userspace and slice their usage by process, shared (COW) vs.
unique mappings, backing, etc.  This can account for real physical
memory usage even in cases like fork without exec (which Android uses
heavily to share as many private COW pages as possible between
processes), Kernel SamePage Merging, and clean zero pages.  It produces
a measurement of the pages that only exist in that process (USS, for
unique), and a measurement of the physical memory usage of that process
with the cost of shared pages being evenly split between processes that
share them (PSS).

If all anonymous memory is indistinguishable then figuring out the real
physical memory usage (PSS) of each heap requires either a pagemap
walking tool that can understand the heap debugging of every layer, or
for every layer's heap debugging tools to implement the pagemap walking
logic, in which case it is hard to get a consistent view of memory
across the whole system.

Tracking the information in userspace leads to all sorts of problems.
It either needs to be stored inside the process, which means every
process has to have an API to export its current heap information upon
request, or it has to be stored externally in a filesystem that somebody
needs to clean up on crashes.  It needs to be readable while the process
is still running, so it has to have some sort of synchronization with
every layer of userspace.  Efficiently tracking the ranges requires
reimplementing something like the kernel vma trees, and linking to it
from every layer of userspace.  It requires more memory, more syscalls,
more runtime cost, and more complexity to separately track regions that
the kernel is already tracking.

This patch adds a field to /proc/pid/maps and /proc/pid/smaps to show a
userspace-provided name for anonymous vmas.  The names of named
anonymous vmas are shown in /proc/pid/maps and /proc/pid/smaps as
[anon:<name>].

Userspace can set the name for a region of memory by calling

   prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, start, len, (unsigned long)name)

Setting the name to NULL clears it.  The name length limit is 80 bytes
including NUL-terminator and is checked to contain only printable ascii
characters (including space), except '[',']','\','$' and '`'.

Ascii strings are being used to have a descriptive identifiers for vmas,
which can be understood by the users reading /proc/pid/maps or
/proc/pid/smaps.  Names can be standardized for a given system and they
can include some variable parts such as the name of the allocator or a
library, tid of the thread using it, etc.

The name is stored in a pointer in the shared union in vm_area_struct
that points to a null terminated string.  Anonymous vmas with the same
name (equivalent strings) and are otherwise mergeable will be merged.
The name pointers are not shared between vmas even if they contain the
same name.  The name pointer is stored in a union with fields that are
only used on file-backed mappings, so it does not increase memory usage.

CONFIG_ANON_VMA_NAME kernel configuration is introduced to enable this
feature.  It keeps the feature disabled by default to prevent any
additional memory overhead and to avoid confusing procfs parsers on
systems which are not ready to support named anonymous vmas.

The patch is based on the original patch developed by Colin Cross, more
specifically on its latest version [1] posted upstream by Sumit Semwal.
It used a userspace pointer to store vma names.  In that design, name
pointers could be shared between vmas.  However during the last
upstreaming attempt, Kees Cook raised concerns [2] about this approach
and suggested to copy the name into kernel memory space, perform
validity checks [3] and store as a string referenced from
vm_area_struct.

One big concern is about fork() performance which would need to strdup
anonymous vma names.  Dave Hansen suggested experimenting with
worst-case scenario of forking a process with 64k vmas having longest
possible names [4].  I ran this experiment on an ARM64 Android device
and recorded a worst-case regression of almost 40% when forking such a
process.

This regression is addressed in the followup patch which replaces the
pointer to a name with a refcounted structure that allows sharing the
name pointer between vmas of the same name.  Instead of duplicating the
string during fork() or when splitting a vma it increments the refcount.

[1] https://lore.kernel.org/linux-mm/20200901161459.11772-4-sumit.semwal@linaro.org/
[2] https://lore.kernel.org/linux-mm/202009031031.D32EF57ED@keescook/
[3] https://lore.kernel.org/linux-mm/202009031022.3834F692@keescook/
[4] https://lore.kernel.org/linux-mm/5d0358ab-8c47-2f5f-8e43-23b89d6a8e95@intel.com/

Changes for prctl(2) manual page (in the options section):

PR_SET_VMA
	Sets an attribute specified in arg2 for virtual memory areas
	starting from the address specified in arg3 and spanning the
	size specified	in arg4. arg5 specifies the value of the attribute
	to be set. Note that assigning an attribute to a virtual memory
	area might prevent it from being merged with adjacent virtual
	memory areas due to the difference in that attribute's value.

	Currently, arg2 must be one of:

	PR_SET_VMA_ANON_NAME
		Set a name for anonymous virtual memory areas. arg5 should
		be a pointer to a null-terminated string containing the
		name. The name length including null byte cannot exceed
		80 bytes. If arg5 is NULL, the name of the appropriate
		anonymous virtual memory areas will be reset. The name
		can contain only printable ascii characters (including
                space), except '[',']','\','$' and '`'.

                This feature is available only if the kernel is built with
                the CONFIG_ANON_VMA_NAME option enabled.

[surenb@google.com: docs: proc.rst: /proc/PID/maps: fix malformed table]
  Link: https://lkml.kernel.org/r/20211123185928.2513763-1-surenb@google.com
[surenb: rebased over v5.15-rc6, replaced userpointer with a kernel copy,
 added input sanitization and CONFIG_ANON_VMA_NAME config. The bulk of the
 work here was done by Colin Cross, therefore, with his permission, keeping
 him as the author]

Link: https://lkml.kernel.org/r/20211019215511.3771969-2-surenb@google.com
Signed-off-by: Colin Cross <ccross@google.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jan Glauber <jan.glauber@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rob Landley <rob@landley.net>
Cc: "Serge E. Hallyn" <serge.hallyn@ubuntu.com>
Cc: Shaohua Li <shli@fusionio.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit 9a10064f5625d5572c3626c1516e0bebc6c9fe9b)

Bug: 120441514
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I53d56d551a7d62f75341304751814294b447c04e
2022-01-18 15:30:27 -08:00
Suren Baghdasaryan
f355f9635d Revert "ANDROID: mm: add a field to store names for private anonymous memory"
This reverts commit 60500a4228.
Replacing out-of-tree implementation with the upstream one.

Bug: 120441514
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Ic34c8e16d51ccf9f00cb59d2de341e911bcb2828
2022-01-18 14:54:47 -08:00
Greg Kroah-Hartman
c0d1ebaba1 Merge 2d338201d5 ("Merge branch 'akpm' (patches from Andrew)") into android-mainline
Steps on the way to 5.15-rc1

Resolves merge conflict in:
	fs/proc/base.c

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ic554ca8447961e52fbc6f27d91470a816b59a771
2021-09-15 14:34:48 +02:00
Greg Kroah-Hartman
c5cd945b24 Merge fd47ff55c9 ("Merge tag 'usb-5.15-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb") into android-mainline
Steps on the way to 5.15-rc1.

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I42ffa8818bbb2072f043923553c4d8f91d9647a5
2021-09-14 14:42:51 +02:00
Linus Torvalds
2d338201d5 Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:
 "147 patches, based on 7d2a07b769.

  Subsystems affected by this patch series: mm (memory-hotplug, rmap,
  ioremap, highmem, cleanups, secretmem, kfence, damon, and vmscan),
  alpha, percpu, procfs, misc, core-kernel, MAINTAINERS, lib,
  checkpatch, epoll, init, nilfs2, coredump, fork, pids, criu, kconfig,
  selftests, ipc, and scripts"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (94 commits)
  scripts: check_extable: fix typo in user error message
  mm/workingset: correct kernel-doc notations
  ipc: replace costly bailout check in sysvipc_find_ipc()
  selftests/memfd: remove unused variable
  Kconfig.debug: drop selecting non-existing HARDLOCKUP_DETECTOR_ARCH
  configs: remove the obsolete CONFIG_INPUT_POLLDEV
  prctl: allow to setup brk for et_dyn executables
  pid: cleanup the stale comment mentioning pidmap_init().
  kernel/fork.c: unexport get_{mm,task}_exe_file
  coredump: fix memleak in dump_vma_snapshot()
  fs/coredump.c: log if a core dump is aborted due to changed file permissions
  nilfs2: use refcount_dec_and_lock() to fix potential UAF
  nilfs2: fix memory leak in nilfs_sysfs_delete_snapshot_group
  nilfs2: fix memory leak in nilfs_sysfs_create_snapshot_group
  nilfs2: fix memory leak in nilfs_sysfs_delete_##name##_group
  nilfs2: fix memory leak in nilfs_sysfs_create_##name##_group
  nilfs2: fix NULL pointer in nilfs_##name##_attr_release
  nilfs2: fix memory leak in nilfs_sysfs_create_device_group
  trap: cleanup trap_init()
  init: move usermodehelper_enable() to populate_rootfs()
  ...
2021-09-08 12:55:35 -07:00
Cyrill Gorcunov
e1fbbd0731 prctl: allow to setup brk for et_dyn executables
Keno Fischer reported that when a binray loaded via ld-linux-x the
prctl(PR_SET_MM_MAP) doesn't allow to setup brk value because it lays
before mm:end_data.

For example a test program shows

 | # ~/t
 |
 | start_code      401000
 | end_code        401a15
 | start_stack     7ffce4577dd0
 | start_data	   403e10
 | end_data        40408c
 | start_brk	   b5b000
 | sbrk(0)         b5b000

and when executed via ld-linux

 | # /lib64/ld-linux-x86-64.so.2 ~/t
 |
 | start_code      7fc25b0a4000
 | end_code        7fc25b0c4524
 | start_stack     7fffcc6b2400
 | start_data	   7fc25b0ce4c0
 | end_data        7fc25b0cff98
 | start_brk	   55555710c000
 | sbrk(0)         55555710c000

This of course prevent criu from restoring such programs.  Looking into
how kernel operates with brk/start_brk inside brk() syscall I don't see
any problem if we allow to setup brk/start_brk without checking for
end_data.  Even if someone pass some weird address here on a purpose then
the worst possible result will be an unexpected unmapping of existing vma
(own vma, since prctl works with the callers memory) but test for
RLIMIT_DATA is still valid and a user won't be able to gain more memory in
case of expanding VMAs via new values shipped with prctl call.

Link: https://lkml.kernel.org/r/20210121221207.GB2174@grain
Fixes: bbdc6076d2 ("binfmt_elf: move brk out of mmap when doing direct loader exec")
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Reported-by: Keno Fischer <keno@juliacomputing.com>
Acked-by: Andrey Vagin <avagin@gmail.com>
Tested-by: Andrey Vagin <avagin@gmail.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Cc: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-08 11:50:28 -07:00
Greg Kroah-Hartman
bc2f6edebd Merge 9e9fb7655e ("Merge tag 'net-next-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next") into android-mainline
Steps on the way to 5.15-rc1

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I49577d606b2710975407eae3fee60bc331397810
2021-09-07 14:40:30 +02:00
Linus Torvalds
49624efa65 Merge tag 'denywrite-for-5.15' of git://github.com/davidhildenbrand/linux
Pull MAP_DENYWRITE removal from David Hildenbrand:
 "Remove all in-tree usage of MAP_DENYWRITE from the kernel and remove
  VM_DENYWRITE.

  There are some (minor) user-visible changes:

   - We no longer deny write access to shared libaries loaded via legacy
     uselib(); this behavior matches modern user space e.g. dlopen().

   - We no longer deny write access to the elf interpreter after exec
     completed, treating it just like shared libraries (which it often
     is).

   - We always deny write access to the file linked via /proc/pid/exe:
     sys_prctl(PR_SET_MM_MAP/EXE_FILE) will fail if write access to the
     file cannot be denied, and write access to the file will remain
     denied until the link is effectivel gone (exec, termination,
     sys_prctl(PR_SET_MM_MAP/EXE_FILE)) -- just as if exec'ing the file.

  Cross-compiled for a bunch of architectures (alpha, microblaze, i386,
  s390x, ...) and verified via ltp that especially the relevant tests
  (i.e., creat07 and execve04) continue working as expected"

* tag 'denywrite-for-5.15' of git://github.com/davidhildenbrand/linux:
  fs: update documentation of get_write_access() and friends
  mm: ignore MAP_DENYWRITE in ksys_mmap_pgoff()
  mm: remove VM_DENYWRITE
  binfmt: remove in-tree usage of MAP_DENYWRITE
  kernel/fork: always deny write access to current MM exe_file
  kernel/fork: factor out replacing the current MM exe_file
  binfmt: don't use MAP_DENYWRITE when loading shared libraries via uselib()
2021-09-04 11:35:47 -07:00
David Hildenbrand
35d7bdc860 kernel/fork: factor out replacing the current MM exe_file
Let's factor the main logic out into replace_mm_exe_file(), such that
all mm->exe_file logic is contained in kernel/fork.c.

While at it, perform some simple cleanups that are possible now that
we're simplifying the individual functions.

Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
2021-09-03 18:42:01 +02:00
Ran Xiaokai
2863643fb8 set_user: add capability check when rlimit(RLIMIT_NPROC) exceeds
in copy_process(): non root users but with capability CAP_SYS_RESOURCE
or CAP_SYS_ADMIN will clean PF_NPROC_EXCEEDED flag even
rlimit(RLIMIT_NPROC) exceeds. Add the same capability check logic here.

Align the permission checks in copy_process() and set_user(). In
copy_process() CAP_SYS_RESOURCE or CAP_SYS_ADMIN capable users will be
able to circumvent and clear the PF_NPROC_EXCEEDED flag whereas they
aren't able to the same in set_user(). There's no obvious logic to this
and trying to unearth the reason in the thread didn't go anywhere.

The gist seems to be that this code wants to make sure that a program
can't successfully exec if it has gone through a set*id() transition
while exceeding its RLIMIT_NPROC.
A capable but non-INIT_USER caller getting PF_NPROC_EXCEEDED set during
a set*id() transition wouldn't be able to exec right away if they still
exceed their RLIMIT_NPROC at the time of exec. So their exec would fail
in fs/exec.c:

        if ((current->flags & PF_NPROC_EXCEEDED) &&
            is_ucounts_overlimit(current_ucounts(), UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC))) {
                retval = -EAGAIN;
                goto out_ret;
        }

However, if the caller were to fork() right after the set*id()
transition but before the exec while still exceeding their RLIMIT_NPROC
then they would get PF_NPROC_EXCEEDED cleared (while the child would
inherit it):

        retval = -EAGAIN;
        if (is_ucounts_overlimit(task_ucounts(p), UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC))) {
                if (p->real_cred->user != INIT_USER &&
                    !capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN))
                        goto bad_fork_free;
        }
        current->flags &= ~PF_NPROC_EXCEEDED;

which means a subsequent exec by the capable caller would now succeed
even though they could still exceed their RLIMIT_NPROC limit. This seems
inconsistent. Allow a CAP_SYS_ADMIN or CAP_SYS_RESOURCE capable user to
avoid PF_NPROC_EXCEEDED as they already can in copy_process().

Cc: peterz@infradead.org, tglx@linutronix.de, linux-kernel@vger.kernel.org, Ran Xiaokai <ran.xiaokai@zte.com.cn>, , ,

Link: https://lore.kernel.org/r/20210728072629.530435-1-ran.xiaokai@zte.com.cn
Cc: Neil Brown <neilb@suse.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: James Morris <jamorris@linux.microsoft.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ran Xiaokai <ran.xiaokai@zte.com.cn>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-08-12 14:54:25 +02:00
Lee Jones
8e658623d4 Merge commit c288d9cd71 ("Merge tag 'for-5.14/io_uring-2021-06-30' of git://git.kernel.dk/linux-block") into android-mainline
Another small step en route to v5.14-rc1

Change-Id: I24899ab78da7d367574ed69ceaa82ab0837d9556
Signed-off-by: Lee Jones <lee.jones@linaro.org>
2021-07-12 10:02:27 +01:00
Lee Jones
7889eed917 Merge 54a728dc5e ("Merge tag 'sched-core-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip") into android-mainline
A little step towards 5.14-rc1

Signed-off-by: Lee Jones <lee.jones@linaro.org>
Change-Id: I2573a6df9f4e7b67194327ac6db6082a574d2809
2021-07-09 10:55:21 +01:00
Linus Torvalds
c54b245d01 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull user namespace rlimit handling update from Eric Biederman:
 "This is the work mainly by Alexey Gladkov to limit rlimits to the
  rlimits of the user that created a user namespace, and to allow users
  to have stricter limits on the resources created within a user
  namespace."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  cred: add missing return error code when set_cred_ucounts() failed
  ucounts: Silence warning in dec_rlimit_ucounts
  ucounts: Set ucount_max to the largest positive value the type can hold
  kselftests: Add test to check for rlimit changes in different user namespaces
  Reimplement RLIMIT_MEMLOCK on top of ucounts
  Reimplement RLIMIT_SIGPENDING on top of ucounts
  Reimplement RLIMIT_MSGQUEUE on top of ucounts
  Reimplement RLIMIT_NPROC on top of ucounts
  Use atomic_t for ucounts reference counting
  Add a reference to ucounts for each cred
  Increase size of ucounts to atomic_long_t
2021-06-28 20:39:26 -07:00
Lee Jones
85adc860fd Merge 6efb943b86 Linux 5.13-rc1 into android-mainline
One giant leap, all the way up to 5.13-rc1

Also take the opportunity to re-align (a.k.a. fix a couple of previous
merge conflict fix-up issues) which occurred during this merge-window.

Fixes: 4797acfb9c ("Merge 16b3d0cf5b Merge tag 'sched-core-2021-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into android-mainline")
Fixes: 92f282f338 ("Merge 8ca5297e7e Merge tag 'kconfig-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild into android-mainline")

Signed-off-by: Lee Jones <lee.jones@linaro.org>
Change-Id: Ie9389f595776e8f66bba6eaf0fa7a3587c6a5749
2021-05-15 09:09:01 +01:00
Chris Hyser
7ac592aa35 sched: prctl() core-scheduling interface
This patch provides support for setting and copying core scheduling
'task cookies' between threads (PID), processes (TGID), and process
groups (PGID).

The value of core scheduling isn't that tasks don't share a core,
'nosmt' can do that. The value lies in exploiting all the sharing
opportunities that exist to recover possible lost performance and that
requires a degree of flexibility in the API.

From a security perspective (and there are others), the thread,
process and process group distinction is an existent hierarchal
categorization of tasks that reflects many of the security concerns
about 'data sharing'. For example, protecting against cache-snooping
by a thread that can just read the memory directly isn't all that
useful.

With this in mind, subcommands to CREATE/SHARE (TO/FROM) provide a
mechanism to create and share cookies. CREATE/SHARE_TO specify a
target pid with enum pidtype used to specify the scope of the targeted
tasks. For example, PIDTYPE_TGID will share the cookie with the
process and all of it's threads as typically desired in a security
scenario.

API:

  prctl(PR_SCHED_CORE, PR_SCHED_CORE_GET, tgtpid, pidtype, &cookie)
  prctl(PR_SCHED_CORE, PR_SCHED_CORE_CREATE, tgtpid, pidtype, NULL)
  prctl(PR_SCHED_CORE, PR_SCHED_CORE_SHARE_TO, tgtpid, pidtype, NULL)
  prctl(PR_SCHED_CORE, PR_SCHED_CORE_SHARE_FROM, srcpid, pidtype, NULL)

where 'tgtpid/srcpid == 0' implies the current process and pidtype is
kernel enum pid_type {PIDTYPE_PID, PIDTYPE_TGID, PIDTYPE_PGID, ...}.

For return values, EINVAL, ENOMEM are what they say. ESRCH means the
tgtpid/srcpid was not found. EPERM indicates lack of PTRACE permission
access to tgtpid/srcpid. ENODEV indicates your machines lacks SMT.

[peterz: complete rewrite]
Signed-off-by: Chris Hyser <chris.hyser@oracle.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Don Hiatt <dhiatt@digitalocean.com>
Tested-by: Hongyu Ning <hongyu.ning@linux.intel.com>
Tested-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20210422123309.039845339@infradead.org
2021-05-12 11:43:31 +02:00
Xiaofeng Cao
5afe69c2cc kernel/sys.c: fix typo
change 'infite'     to 'infinite'
change 'concurent'  to 'concurrent'
change 'memvers'    to 'members'
change 'decendants' to 'descendants'
change 'argumets'   to 'arguments'

Link: https://lkml.kernel.org/r/20210316112904.10661-1-cxfcosmos@gmail.com
Signed-off-by: Xiaofeng Cao <caoxiaofeng@yulong.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-07 00:26:34 -07:00
Alexey Gladkov
21d1c5e386 Reimplement RLIMIT_NPROC on top of ucounts
The rlimit counter is tied to uid in the user_namespace. This allows
rlimit values to be specified in userns even if they are already
globally exceeded by the user. However, the value of the previous
user_namespaces cannot be exceeded.

To illustrate the impact of rlimits, let's say there is a program that
does not fork. Some service-A wants to run this program as user X in
multiple containers. Since the program never fork the service wants to
set RLIMIT_NPROC=1.

service-A
 \- program (uid=1000, container1, rlimit_nproc=1)
 \- program (uid=1000, container2, rlimit_nproc=1)

The service-A sets RLIMIT_NPROC=1 and runs the program in container1.
When the service-A tries to run a program with RLIMIT_NPROC=1 in
container2 it fails since user X already has one running process.

We cannot use existing inc_ucounts / dec_ucounts because they do not
allow us to exceed the maximum for the counter. Some rlimits can be
overlimited by root or if the user has the appropriate capability.

Changelog

v11:
* Change inc_rlimit_ucounts() which now returns top value of ucounts.
* Drop inc_rlimit_ucounts_and_test() because the return code of
  inc_rlimit_ucounts() can be checked.

Signed-off-by: Alexey Gladkov <legion@kernel.org>
Link: https://lkml.kernel.org/r/c5286a8aa16d2d698c222f7532f3d735c82bc6bc.1619094428.git.legion@kernel.org
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2021-04-30 14:14:01 -05:00
Alexey Gladkov
905ae01c4a Add a reference to ucounts for each cred
For RLIMIT_NPROC and some other rlimits the user_struct that holds the
global limit is kept alive for the lifetime of a process by keeping it
in struct cred. Adding a pointer to ucounts in the struct cred will
allow to track RLIMIT_NPROC not only for user in the system, but for
user in the user_namespace.

Updating ucounts may require memory allocation which may fail. So, we
cannot change cred.ucounts in the commit_creds() because this function
cannot fail and it should always return 0. For this reason, we modify
cred.ucounts before calling the commit_creds().

Changelog

v6:
* Fix null-ptr-deref in is_ucounts_overlimit() detected by trinity. This
  error was caused by the fact that cred_alloc_blank() left the ucounts
  pointer empty.

Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Alexey Gladkov <legion@kernel.org>
Link: https://lkml.kernel.org/r/b37aaef28d8b9b0d757e07ba6dd27281bbe39259.1619094428.git.legion@kernel.org
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2021-04-30 14:14:00 -05:00
Lee Jones
7561514944 Merge commit e7c6e405e1 ("Fix misc new gcc warnings") into android-mainline
Steps on the way to 5.13-rc1

Signed-off-by: Lee Jones <lee.jones@linaro.org>
Change-Id: Iff6fb6b3991943905d20a8b40e2b2dd87c0d792b
2021-04-29 10:20:06 +01:00
Peter Collingbourne
201698626f arm64: Introduce prctl(PR_PAC_{SET,GET}_ENABLED_KEYS)
This change introduces a prctl that allows the user program to control
which PAC keys are enabled in a particular task. The main reason
why this is useful is to enable a userspace ABI that uses PAC to
sign and authenticate function pointers and other pointers exposed
outside of the function, while still allowing binaries conforming
to the ABI to interoperate with legacy binaries that do not sign or
authenticate pointers.

The idea is that a dynamic loader or early startup code would issue
this prctl very early after establishing that a process may load legacy
binaries, but before executing any PAC instructions.

This change adds a small amount of overhead to kernel entry and exit
due to additional required instruction sequences.

On a DragonBoard 845c (Cortex-A75) with the powersave governor, the
overhead of similar instruction sequences was measured as 4.9ns when
simulating the common case where IA is left enabled, or 43.7ns when
simulating the uncommon case where IA is disabled. These numbers can
be seen as the worst case scenario, since in more realistic scenarios
a better performing governor would be used and a newer chip would be
used that would support PAC unlike Cortex-A75 and would be expected
to be faster than Cortex-A75.

On an Apple M1 under a hypervisor, the overhead of the entry/exit
instruction sequences introduced by this patch was measured as 0.3ns
in the case where IA is left enabled, and 33.0ns in the case where
IA is disabled.

Signed-off-by: Peter Collingbourne <pcc@google.com>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Link: https://linux-review.googlesource.com/id/Ibc41a5e6a76b275efbaa126b31119dc197b927a5
Link: https://lore.kernel.org/r/d6609065f8f40397a4124654eb68c9f490b4d477.1616123271.git.pcc@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-04-13 17:31:44 +01:00
Greg Kroah-Hartman
6bb5c8b522 Merge 5.12-rc3 into android-mainline
Linux 5.12-rc3

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ibc683be6a28885f39d477e3f86ec2ecd1a9e7e21
2021-03-15 08:48:29 +01:00
Alexey Dobriyan
c995f12ad8 prctl: fix PR_SET_MM_AUXV kernel stack leak
Doing a

	prctl(PR_SET_MM, PR_SET_MM_AUXV, addr, 1);

will copy 1 byte from userspace to (quite big) on-stack array
and then stash everything to mm->saved_auxv.
AT_NULL terminator will be inserted at the very end.

/proc/*/auxv handler will find that AT_NULL terminator
and copy original stack contents to userspace.

This devious scheme requires CAP_SYS_RESOURCE.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-03-14 14:33:27 -07:00
Frankie Chang
583b91c3ea ANDROID: Add vendor hooks when syscall prctl finished
Add vendor hook when syscall prctl finished for vendor-specific tuning.

Bug: 181819699

Signed-off-by: Frankie Chang <frankie.chang@mediatek.com>
Change-Id: Ica42d80ab4b540045330e9c5b211e0e814eed0ff
(cherry picked from commit d150b26653e7a3d15383a09384aace140b537ff4)
2021-03-09 01:59:59 +00:00
Greg Kroah-Hartman
14f1854cdb Merge fecfd01539 ("Merge tag 'leds-5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/pavel/linux-leds") into android-mainline
Steps on the way to 5.12-rc1

Resolves merge conflicts in:
	drivers/dma-buf/dma-heap.c
	drivers/dma-buf/heaps/cma_heap.c
	drivers/dma-buf/heaps/system_heap.c
	include/linux/dma-heap.h

Cc: John Stultz <john.stultz@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ibb32dbdba5183c9e19f5d1e94016cc1ae9616173
2021-03-07 08:45:40 +01:00
Greg Kroah-Hartman
bbf615b8e1 Merge 7d6beb71da ("Merge tag 'idmapped-mounts-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux") into android-mainline
Steps on the way to 5.12-rc1.

Resolves conflicts in:
	fs/overlayfs/inode.c

Note, incfs is broken here, will mark it as BROKEN in another patch to
make it more obvious.

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I95a5938fc3cfa27d521c5d4fd18b2adfb13a6d84
2021-03-06 17:42:01 +01:00
Greg Kroah-Hartman
56817939a8 Merge bd018bbaa5 ("Merge tag 'for-5.12/libata-2021-02-17' of git://git.kernel.dk/linux-block") into android-mainline
Steps on the way to 5.12-rc1

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ibaed5eb91d252bd7fedfb3c504f1b3cb5d1825d8
2021-02-26 08:43:21 +01:00
Linus Torvalds
6fbd6cf85a Merge tag 'kbuild-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:

 - Fix false-positive build warnings for ARCH=ia64 builds

 - Optimize dictionary size for module compression with xz

 - Check the compiler and linker versions in Kconfig

 - Fix misuse of extra-y

 - Support DWARF v5 debug info

 - Clamp SUBLEVEL to 255 because stable releases 4.4.x and 4.9.x
   exceeded the limit

 - Add generic syscall{tbl,hdr}.sh for cleanups across arches

 - Minor cleanups of genksyms

 - Minor cleanups of Kconfig

* tag 'kbuild-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (38 commits)
  initramfs: Remove redundant dependency of RD_ZSTD on BLK_DEV_INITRD
  kbuild: remove deprecated 'always' and 'hostprogs-y/m'
  kbuild: parse C= and M= before changing the working directory
  kbuild: reuse this-makefile to define abs_srctree
  kconfig: unify rule of config, menuconfig, nconfig, gconfig, xconfig
  kconfig: omit --oldaskconfig option for 'make config'
  kconfig: fix 'invalid option' for help option
  kconfig: remove dead code in conf_askvalue()
  kconfig: clean up nested if-conditionals in check_conf()
  kconfig: Remove duplicate call to sym_get_string_value()
  Makefile: Remove # characters from compiler string
  Makefile: reuse CC_VERSION_TEXT
  kbuild: check the minimum linker version in Kconfig
  kbuild: remove ld-version macro
  scripts: add generic syscallhdr.sh
  scripts: add generic syscalltbl.sh
  arch: syscalls: remove $(srctree)/ prefix from syscall tables
  arch: syscalls: add missing FORCE and fix 'targets' to make if_changed work
  gen_compile_commands: prune some directories
  kbuild: simplify access to the kernel's version
  ...
2021-02-25 10:17:31 -08:00
Linus Torvalds
7d6beb71da Merge tag 'idmapped-mounts-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull idmapped mounts from Christian Brauner:
 "This introduces idmapped mounts which has been in the making for some
  time. Simply put, different mounts can expose the same file or
  directory with different ownership. This initial implementation comes
  with ports for fat, ext4 and with Christoph's port for xfs with more
  filesystems being actively worked on by independent people and
  maintainers.

  Idmapping mounts handle a wide range of long standing use-cases. Here
  are just a few:

   - Idmapped mounts make it possible to easily share files between
     multiple users or multiple machines especially in complex
     scenarios. For example, idmapped mounts will be used in the
     implementation of portable home directories in
     systemd-homed.service(8) where they allow users to move their home
     directory to an external storage device and use it on multiple
     computers where they are assigned different uids and gids. This
     effectively makes it possible to assign random uids and gids at
     login time.

   - It is possible to share files from the host with unprivileged
     containers without having to change ownership permanently through
     chown(2).

   - It is possible to idmap a container's rootfs and without having to
     mangle every file. For example, Chromebooks use it to share the
     user's Download folder with their unprivileged containers in their
     Linux subsystem.

   - It is possible to share files between containers with
     non-overlapping idmappings.

   - Filesystem that lack a proper concept of ownership such as fat can
     use idmapped mounts to implement discretionary access (DAC)
     permission checking.

   - They allow users to efficiently changing ownership on a per-mount
     basis without having to (recursively) chown(2) all files. In
     contrast to chown (2) changing ownership of large sets of files is
     instantenous with idmapped mounts. This is especially useful when
     ownership of a whole root filesystem of a virtual machine or
     container is changed. With idmapped mounts a single syscall
     mount_setattr syscall will be sufficient to change the ownership of
     all files.

   - Idmapped mounts always take the current ownership into account as
     idmappings specify what a given uid or gid is supposed to be mapped
     to. This contrasts with the chown(2) syscall which cannot by itself
     take the current ownership of the files it changes into account. It
     simply changes the ownership to the specified uid and gid. This is
     especially problematic when recursively chown(2)ing a large set of
     files which is commong with the aforementioned portable home
     directory and container and vm scenario.

   - Idmapped mounts allow to change ownership locally, restricting it
     to specific mounts, and temporarily as the ownership changes only
     apply as long as the mount exists.

  Several userspace projects have either already put up patches and
  pull-requests for this feature or will do so should you decide to pull
  this:

   - systemd: In a wide variety of scenarios but especially right away
     in their implementation of portable home directories.

         https://systemd.io/HOME_DIRECTORY/

   - container runtimes: containerd, runC, LXD:To share data between
     host and unprivileged containers, unprivileged and privileged
     containers, etc. The pull request for idmapped mounts support in
     containerd, the default Kubernetes runtime is already up for quite
     a while now: https://github.com/containerd/containerd/pull/4734

   - The virtio-fs developers and several users have expressed interest
     in using this feature with virtual machines once virtio-fs is
     ported.

   - ChromeOS: Sharing host-directories with unprivileged containers.

  I've tightly synced with all those projects and all of those listed
  here have also expressed their need/desire for this feature on the
  mailing list. For more info on how people use this there's a bunch of
  talks about this too. Here's just two recent ones:

      https://www.cncf.io/wp-content/uploads/2020/12/Rootless-Containers-in-Gitpod.pdf
      https://fosdem.org/2021/schedule/event/containers_idmap/

  This comes with an extensive xfstests suite covering both ext4 and
  xfs:

      https://git.kernel.org/brauner/xfstests-dev/h/idmapped_mounts

  It covers truncation, creation, opening, xattrs, vfscaps, setid
  execution, setgid inheritance and more both with idmapped and
  non-idmapped mounts. It already helped to discover an unrelated xfs
  setgid inheritance bug which has since been fixed in mainline. It will
  be sent for inclusion with the xfstests project should you decide to
  merge this.

  In order to support per-mount idmappings vfsmounts are marked with
  user namespaces. The idmapping of the user namespace will be used to
  map the ids of vfs objects when they are accessed through that mount.
  By default all vfsmounts are marked with the initial user namespace.
  The initial user namespace is used to indicate that a mount is not
  idmapped. All operations behave as before and this is verified in the
  testsuite.

  Based on prior discussions we want to attach the whole user namespace
  and not just a dedicated idmapping struct. This allows us to reuse all
  the helpers that already exist for dealing with idmappings instead of
  introducing a whole new range of helpers. In addition, if we decide in
  the future that we are confident enough to enable unprivileged users
  to setup idmapped mounts the permission checking can take into account
  whether the caller is privileged in the user namespace the mount is
  currently marked with.

  The user namespace the mount will be marked with can be specified by
  passing a file descriptor refering to the user namespace as an
  argument to the new mount_setattr() syscall together with the new
  MOUNT_ATTR_IDMAP flag. The system call follows the openat2() pattern
  of extensibility.

  The following conditions must be met in order to create an idmapped
  mount:

   - The caller must currently have the CAP_SYS_ADMIN capability in the
     user namespace the underlying filesystem has been mounted in.

   - The underlying filesystem must support idmapped mounts.

   - The mount must not already be idmapped. This also implies that the
     idmapping of a mount cannot be altered once it has been idmapped.

   - The mount must be a detached/anonymous mount, i.e. it must have
     been created by calling open_tree() with the OPEN_TREE_CLONE flag
     and it must not already have been visible in the filesystem.

  The last two points guarantee easier semantics for userspace and the
  kernel and make the implementation significantly simpler.

  By default vfsmounts are marked with the initial user namespace and no
  behavioral or performance changes are observed.

  The manpage with a detailed description can be found here:

      1d7b902e28

  In order to support idmapped mounts, filesystems need to be changed
  and mark themselves with the FS_ALLOW_IDMAP flag in fs_flags. The
  patches to convert individual filesystem are not very large or
  complicated overall as can be seen from the included fat, ext4, and
  xfs ports. Patches for other filesystems are actively worked on and
  will be sent out separately. The xfstestsuite can be used to verify
  that port has been done correctly.

  The mount_setattr() syscall is motivated independent of the idmapped
  mounts patches and it's been around since July 2019. One of the most
  valuable features of the new mount api is the ability to perform
  mounts based on file descriptors only.

  Together with the lookup restrictions available in the openat2()
  RESOLVE_* flag namespace which we added in v5.6 this is the first time
  we are close to hardened and race-free (e.g. symlinks) mounting and
  path resolution.

  While userspace has started porting to the new mount api to mount
  proper filesystems and create new bind-mounts it is currently not
  possible to change mount options of an already existing bind mount in
  the new mount api since the mount_setattr() syscall is missing.

  With the addition of the mount_setattr() syscall we remove this last
  restriction and userspace can now fully port to the new mount api,
  covering every use-case the old mount api could. We also add the
  crucial ability to recursively change mount options for a whole mount
  tree, both removing and adding mount options at the same time. This
  syscall has been requested multiple times by various people and
  projects.

  There is a simple tool available at

      https://github.com/brauner/mount-idmapped

  that allows to create idmapped mounts so people can play with this
  patch series. I'll add support for the regular mount binary should you
  decide to pull this in the following weeks:

  Here's an example to a simple idmapped mount of another user's home
  directory:

	u1001@f2-vm:/$ sudo ./mount --idmap both:1000:1001:1 /home/ubuntu/ /mnt

	u1001@f2-vm:/$ ls -al /home/ubuntu/
	total 28
	drwxr-xr-x 2 ubuntu ubuntu 4096 Oct 28 22:07 .
	drwxr-xr-x 4 root   root   4096 Oct 28 04:00 ..
	-rw------- 1 ubuntu ubuntu 3154 Oct 28 22:12 .bash_history
	-rw-r--r-- 1 ubuntu ubuntu  220 Feb 25  2020 .bash_logout
	-rw-r--r-- 1 ubuntu ubuntu 3771 Feb 25  2020 .bashrc
	-rw-r--r-- 1 ubuntu ubuntu  807 Feb 25  2020 .profile
	-rw-r--r-- 1 ubuntu ubuntu    0 Oct 16 16:11 .sudo_as_admin_successful
	-rw------- 1 ubuntu ubuntu 1144 Oct 28 00:43 .viminfo

	u1001@f2-vm:/$ ls -al /mnt/
	total 28
	drwxr-xr-x  2 u1001 u1001 4096 Oct 28 22:07 .
	drwxr-xr-x 29 root  root  4096 Oct 28 22:01 ..
	-rw-------  1 u1001 u1001 3154 Oct 28 22:12 .bash_history
	-rw-r--r--  1 u1001 u1001  220 Feb 25  2020 .bash_logout
	-rw-r--r--  1 u1001 u1001 3771 Feb 25  2020 .bashrc
	-rw-r--r--  1 u1001 u1001  807 Feb 25  2020 .profile
	-rw-r--r--  1 u1001 u1001    0 Oct 16 16:11 .sudo_as_admin_successful
	-rw-------  1 u1001 u1001 1144 Oct 28 00:43 .viminfo

	u1001@f2-vm:/$ touch /mnt/my-file

	u1001@f2-vm:/$ setfacl -m u:1001:rwx /mnt/my-file

	u1001@f2-vm:/$ sudo setcap -n 1001 cap_net_raw+ep /mnt/my-file

	u1001@f2-vm:/$ ls -al /mnt/my-file
	-rw-rwxr--+ 1 u1001 u1001 0 Oct 28 22:14 /mnt/my-file

	u1001@f2-vm:/$ ls -al /home/ubuntu/my-file
	-rw-rwxr--+ 1 ubuntu ubuntu 0 Oct 28 22:14 /home/ubuntu/my-file

	u1001@f2-vm:/$ getfacl /mnt/my-file
	getfacl: Removing leading '/' from absolute path names
	# file: mnt/my-file
	# owner: u1001
	# group: u1001
	user::rw-
	user:u1001:rwx
	group::rw-
	mask::rwx
	other::r--

	u1001@f2-vm:/$ getfacl /home/ubuntu/my-file
	getfacl: Removing leading '/' from absolute path names
	# file: home/ubuntu/my-file
	# owner: ubuntu
	# group: ubuntu
	user::rw-
	user:ubuntu:rwx
	group::rw-
	mask::rwx
	other::r--"

* tag 'idmapped-mounts-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: (41 commits)
  xfs: remove the possibly unused mp variable in xfs_file_compat_ioctl
  xfs: support idmapped mounts
  ext4: support idmapped mounts
  fat: handle idmapped mounts
  tests: add mount_setattr() selftests
  fs: introduce MOUNT_ATTR_IDMAP
  fs: add mount_setattr()
  fs: add attr_flags_to_mnt_flags helper
  fs: split out functions to hold writers
  namespace: only take read lock in do_reconfigure_mnt()
  mount: make {lock,unlock}_mount_hash() static
  namespace: take lock_mount_hash() directly when changing flags
  nfs: do not export idmapped mounts
  overlayfs: do not mount on top of idmapped mounts
  ecryptfs: do not mount on top of idmapped mounts
  ima: handle idmapped mounts
  apparmor: handle idmapped mounts
  fs: make helpers idmap mount aware
  exec: handle idmapped mounts
  would_dump: handle idmapped mounts
  ...
2021-02-23 13:39:45 -08:00
Sasha Levin
88a686728b kbuild: simplify access to the kernel's version
Instead of storing the version in a single integer and having various
kernel (and userspace) code how it's constructed, export individual
(major, patchlevel, sublevel) components and simplify kernel code that
uses it.

This should also make it easier on userspace.

Signed-off-by: Sasha Levin <sashal@kernel.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2021-02-16 12:01:45 +09:00
Viresh Kumar
be65de6b03 fs: Remove dcookies support
The dcookies stuff was only used by the kernel's old oprofile code. Now
that oprofile's support is removed from the kernel, there is no need for
dcookies as well. Remove it.

Suggested-by: Christoph Hellwig <hch@infradead.org>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Robert Richter <rric@kernel.org>
Acked-by: William Cohen <wcohen@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
2021-01-29 10:06:46 +05:30
Christian Brauner
02f92b3868 fs: add file and path permissions helpers
Add two simple helpers to check permissions on a file and path
respectively and convert over some callers. It simplifies quite a few
codepaths and also reduces the churn in later patches quite a bit.
Christoph also correctly points out that this makes codepaths (e.g.
ioctls) way easier to follow that would otherwise have to do more
complex argument passing than necessary.

Link: https://lore.kernel.org/r/20210121131959.646623-4-christian.brauner@ubuntu.com
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Suggested-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: James Morris <jamorris@linux.microsoft.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-01-24 14:27:16 +01:00
Greg Kroah-Hartman
482ed74e40 Merge 533369b145 ("Merge tag 'timers-core-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip") into android-mailine
Steps on the way to 5.11-rc1

Resolves merge conflicts in:
	include/uapi/linux/prctl.h
	kernel/sys.c

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I85ea8cffcd22f93277b357872254fe21d68bd82c
2020-12-15 16:43:58 +01:00
Gabriel Krisman Bertazi
1446e1df9e kernel: Implement selective syscall userspace redirection
Introduce a mechanism to quickly disable/enable syscall handling for a
specific process and redirect to userspace via SIGSYS.  This is useful
for processes with parts that require syscall redirection and parts that
don't, but who need to perform this boundary crossing really fast,
without paying the cost of a system call to reconfigure syscall handling
on each boundary transition.  This is particularly important for Windows
games running over Wine.

The proposed interface looks like this:

  prctl(PR_SET_SYSCALL_USER_DISPATCH, <op>, <off>, <length>, [selector])

The range [<offset>,<offset>+<length>) is a part of the process memory
map that is allowed to by-pass the redirection code and dispatch
syscalls directly, such that in fast paths a process doesn't need to
disable the trap nor the kernel has to check the selector.  This is
essential to return from SIGSYS to a blocked area without triggering
another SIGSYS from rt_sigreturn.

selector is an optional pointer to a char-sized userspace memory region
that has a key switch for the mechanism. This key switch is set to
either PR_SYS_DISPATCH_ON, PR_SYS_DISPATCH_OFF to enable and disable the
redirection without calling the kernel.

The feature is meant to be set per-thread and it is disabled on
fork/clone/execv.

Internally, this doesn't add overhead to the syscall hot path, and it
requires very little per-architecture support.  I avoided using seccomp,
even though it duplicates some functionality, due to previous feedback
that maybe it shouldn't mix with seccomp since it is not a security
mechanism.  And obviously, this should never be considered a security
mechanism, since any part of the program can by-pass it by using the
syscall dispatcher.

For the sysinfo benchmark, which measures the overhead added to
executing a native syscall that doesn't require interception, the
overhead using only the direct dispatcher region to issue syscalls is
pretty much irrelevant.  The overhead of using the selector goes around
40ns for a native (unredirected) syscall in my system, and it is (as
expected) dominated by the supervisor-mode user-address access.  In
fact, with SMAP off, the overhead is consistently less than 5ns on my
test box.

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20201127193238.821364-4-krisman@collabora.com
2020-12-02 15:07:56 +01:00
Greg Kroah-Hartman
67d3ed5765 Merge 'v5.10-rc1' into android-mainline
Linux 5.10-rc1

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Iace3fc84a00d3023c75caa086a266de17dc1847c
2020-10-29 06:32:38 +01:00
Greg Kroah-Hartman
338f8e80b4 Merge 0746c4a9f3 ("Merge branch 'i2c/for-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux") into android-mainline
Steps on the way to 5.10-rc1

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Iec426c6de4a59a517e5fa575a9424b883d958f08
2020-10-28 21:20:34 +01:00
Greg Kroah-Hartman
05d2a661fd Merge 54a4c789ca ("Merge tag 'docs/v5.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media") into android-mainline
Steps on the way to 5.10-rc1

Resolves conflicts in:
	fs/userfaultfd.c

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ie3fe3c818f1f6565cfd4fa551de72d2b72ef60af
2020-10-26 09:23:33 +01:00
Rasmus Villemoes
986b9eacb2 kernel/sys.c: fix prototype of prctl_get_tid_address()
tid_addr is not a "pointer to (pointer to int in userspace)"; it is in
fact a "pointer to (pointer to int in userspace) in userspace".  So
sparse rightfully complains about passing a kernel pointer to
put_user().

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-25 11:44:16 -07:00
Linus Torvalds
81ecf91eab Merge tag 'safesetid-5.10' of git://github.com/micah-morton/linux
Pull SafeSetID updates from Micah Morton:
 "The changes are mostly contained to within the SafeSetID LSM, with the
  exception of a few 1-line changes to change some ns_capable() calls to
  ns_capable_setid() -- causing a flag (CAP_OPT_INSETID) to be set that
  is examined by SafeSetID code and nothing else in the kernel.

  The changes to SafeSetID internally allow for setting up GID
  transition security policies, as already existed for UIDs"

* tag 'safesetid-5.10' of git://github.com/micah-morton/linux:
  LSM: SafeSetID: Fix warnings reported by test bot
  LSM: SafeSetID: Add GID security policy handling
  LSM: Signal to SafeSetID when setting group IDs
2020-10-25 10:45:26 -07:00
Liao Pingfang
15ec0fcff6 kernel/sys.c: replace do_brk with do_brk_flags in comment of prctl_set_mm_map()
Replace do_brk with do_brk_flags in comment of prctl_set_mm_map(), since
do_brk was removed in following commit.

Fixes: bb177a732c ("mm: do not bug_on on incorrect length in __mm_populate()")
Signed-off-by: Liao Pingfang <liao.pingfang@zte.com.cn>
Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Link: https://lkml.kernel.org/r/1600650751-43127-1-git-send-email-wang.yi59@zte.com.cn
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16 11:11:19 -07:00
Thomas Cedeno
111767c1d8 LSM: Signal to SafeSetID when setting group IDs
For SafeSetID to properly gate set*gid() calls, it needs to know whether
ns_capable() is being called from within a sys_set*gid() function or is
being called from elsewhere in the kernel. This allows SafeSetID to deny
CAP_SETGID to restricted groups when they are attempting to use the
capability for code paths other than updating GIDs (e.g. setting up
userns GID mappings). This is the identical approach to what is
currently done for CAP_SETUID.

NOTE: We also add signaling to SafeSetID from the setgroups() syscall,
as we have future plans to restrict a process' ability to set
supplementary groups in addition to what is added in this series for
restricting setting of the primary group.

Signed-off-by: Thomas Cedeno <thomascedeno@google.com>
Signed-off-by: Micah Morton <mortonm@chromium.org>
2020-10-13 09:17:34 -07:00
Greg Kroah-Hartman
f022c0602c Merge 5.9-rc3 into android-mainline
Linux 5.9-rc3

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ic7758bc57a7d91861657388ddd015db5c5db5480
2020-08-31 19:51:25 +02:00
Gustavo A. R. Silva
df561f6688 treewide: Use fallthrough pseudo-keyword
Replace the existing /* fall through */ comments and its variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings when it is the case.

[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2020-08-23 17:36:59 -05:00