Changes in 5.10.20
    vmlinux.lds.h: add DWARF v5 sections
    vdpa/mlx5: fix param validation in mlx5_vdpa_get_config()
    debugfs: be more robust at handling improper input in debugfs_lookup()
    debugfs: do not attempt to create a new file before the filesystem is initalized
    scsi: libsas: docs: Remove notify_ha_event()
    scsi: qla2xxx: Fix mailbox Ch erroneous error
    kdb: Make memory allocations more robust
    w1: w1_therm: Fix conversion result for negative temperatures
    PCI: qcom: Use PHY_REFCLK_USE_PAD only for ipq8064
    PCI: Decline to resize resources if boot config must be preserved
    virt: vbox: Do not use wait_event_interruptible when called from kernel context
    bfq: Avoid false bfq queue merging
    ALSA: usb-audio: Fix PCM buffer allocation in non-vmalloc mode
    MIPS: vmlinux.lds.S: add missing PAGE_ALIGNED_DATA() section
    vmlinux.lds.h: Define SANTIZER_DISCARDS with CONFIG_GCOV_KERNEL=y
    random: fix the RNDRESEEDCRNG ioctl
    ALSA: pcm: Call sync_stop at disconnection
    ALSA: pcm: Assure sync with the pending stop operation at suspend
    ALSA: pcm: Don't call sync_stop if it hasn't been stopped
    drm/i915/gt: One more flush for Baytrail clear residuals
    ath10k: Fix error handling in case of CE pipe init failure
    Bluetooth: btqcomsmd: Fix a resource leak in error handling paths in the probe function
    Bluetooth: hci_uart: Fix a race for write_work scheduling
    Bluetooth: Fix initializing response id after clearing struct
    arm64: dts: renesas: beacon kit: Fix choppy Bluetooth Audio
    arm64: dts: renesas: beacon: Fix audio-1.8V pin enable
    ARM: dts: exynos: correct PMIC interrupt trigger level on Artik 5
    ARM: dts: exynos: correct PMIC interrupt trigger level on Monk
    ARM: dts: exynos: correct PMIC interrupt trigger level on Rinato
    ARM: dts: exynos: correct PMIC interrupt trigger level on Spring
    ARM: dts: exynos: correct PMIC interrupt trigger level on Arndale Octa
    ARM: dts: exynos: correct PMIC interrupt trigger level on Odroid XU3 family
    arm64: dts: exynos: correct PMIC interrupt trigger level on TM2
    arm64: dts: exynos: correct PMIC interrupt trigger level on Espresso
    memory: mtk-smi: Fix PM usage counter unbalance in mtk_smi ops
    Bluetooth: hci_qca: Fix memleak in qca_controller_memdump
    staging: vchiq: Fix bulk userdata handling
    staging: vchiq: Fix bulk transfers on 64-bit builds
    arm64: dts: qcom: msm8916-samsung-a5u: Fix iris compatible
    net: stmmac: dwmac-meson8b: fix enabling the timing-adjustment clock
    bpf: Add bpf_patch_call_args prototype to include/linux/bpf.h
    bpf: Avoid warning when re-casting __bpf_call_base into __bpf_call_base_args
    firmware: arm_scmi: Fix call site of scmi_notification_exit
    arm64: dts: allwinner: A64: properly connect USB PHY to port 0
    arm64: dts: allwinner: H6: properly connect USB PHY to port 0
    arm64: dts: allwinner: Drop non-removable from SoPine/LTS SD card
    arm64: dts: allwinner: H6: Allow up to 150 MHz MMC bus frequency
    arm64: dts: allwinner: A64: Limit MMC2 bus frequency to 150 MHz
    arm64: dts: qcom: msm8916-samsung-a2015: Fix sensors
    cpufreq: brcmstb-avs-cpufreq: Free resources in error path
    cpufreq: brcmstb-avs-cpufreq: Fix resource leaks in ->remove()
    arm64: dts: rockchip: rk3328: Add clock_in_out property to gmac2phy node
    ACPICA: Fix exception code class checks
    usb: gadget: u_audio: Free requests only after callback
    arm64: dts: qcom: sdm845-db845c: Fix reset-pin of ov8856 node
    soc: qcom: socinfo: Fix an off by one in qcom_show_pmic_model()
    soc: ti: pm33xx: Fix some resource leak in the error handling paths of the probe function
    staging: media: atomisp: Fix size_t format specifier in hmm_alloc() debug statemenet
    Bluetooth: drop HCI device reference before return
    Bluetooth: Put HCI device if inquiry procedure interrupts
    memory: ti-aemif: Drop child node when jumping out loop
    ARM: dts: Configure missing thermal interrupt for 4430
    usb: dwc2: Do not update data length if it is 0 on inbound transfers
    usb: dwc2: Abort transaction after errors with unknown reason
    usb: dwc2: Make "trimming xfer length" a debug message
    staging: rtl8723bs: wifi_regd.c: Fix incorrect number of regulatory rules
    x86/MSR: Filter MSR writes through X86_IOC_WRMSR_REGS ioctl too
    arm64: dts: renesas: beacon: Fix EEPROM compatible value
    can: mcp251xfd: mcp251xfd_probe(): fix errata reference
    ARM: dts: armada388-helios4: assign pinctrl to LEDs
    ARM: dts: armada388-helios4: assign pinctrl to each fan
    arm64: dts: armada-3720-turris-mox: rename u-boot mtd partition to a53-firmware
    opp: Correct debug message in _opp_add_static_v2()
    Bluetooth: btusb: Fix memory leak in btusb_mtk_wmt_recv
    soc: qcom: ocmem: don't return NULL in of_get_ocmem
    arm64: dts: msm8916: Fix reserved and rfsa nodes unit address
    arm64: dts: meson: fix broken wifi node for Khadas VIM3L
    iwlwifi: mvm: set enabled in the PPAG command properly
    ARM: s3c: fix fiq for clang IAS
    optee: simplify i2c access
    staging: wfx: fix possible panic with re-queued frames
    ARM: at91: use proper asm syntax in pm_suspend
    ath10k: Fix suspicious RCU usage warning in ath10k_wmi_tlv_parse_peer_stats_info()
    ath10k: Fix lockdep assertion warning in ath10k_sta_statistics
    ath11k: fix a locking bug in ath11k_mac_op_start()
    soc: aspeed: snoop: Add clock control logic
    iwlwifi: mvm: fix the type we use in the PPAG table validity checks
    iwlwifi: mvm: store PPAG enabled/disabled flag properly
    iwlwifi: mvm: send stored PPAG command instead of local
    iwlwifi: mvm: assign SAR table revision to the command later
    iwlwifi: mvm: don't check if CSA event is running before removing
    bpf_lru_list: Read double-checked variable once without lock
    iwlwifi: pnvm: set the PNVM again if it was already loaded
    iwlwifi: pnvm: increment the pointer before checking the TLV
    ath9k: fix data bus crash when setting nf_override via debugfs
    selftests/bpf: Convert test_xdp_redirect.sh to bash
    ibmvnic: Set to CLOSED state even on error
    bnxt_en: reverse order of TX disable and carrier off
    bnxt_en: Fix devlink info's stored fw.psid version format.
    xen/netback: fix spurious event detection for common event case
    dpaa2-eth: fix memory leak in XDP_REDIRECT
    net: phy: consider that suspend2ram may cut off PHY power
    net/mlx5e: Don't change interrupt moderation params when DIM is enabled
    net/mlx5e: Change interrupt moderation channel params also when channels are closed
    net/mlx5: Fix health error state handling
    net/mlx5e: Replace synchronize_rcu with synchronize_net
    net/mlx5e: kTLS, Use refcounts to free kTLS RX priv context
    net/mlx5: Disable devlink reload for multi port slave device
    net/mlx5: Disallow RoCE on multi port slave device
    net/mlx5: Disallow RoCE on lag device
    net/mlx5: Disable devlink reload for lag devices
    net/mlx5e: CT: manage the lifetime of the ct entry object
    net/mlx5e: Check tunnel offload is required before setting SWP
    mac80211: fix potential overflow when multiplying to u32 integers
    libbpf: Ignore non function pointer member in struct_ops
    bpf: Fix an unitialized value in bpf_iter
    bpf, devmap: Use GFP_KERNEL for xdp bulk queue allocation
    bpf: Fix bpf_fib_lookup helper MTU check for SKB ctx
    selftests: mptcp: fix ACKRX debug message
    tcp: fix SO_RCVLOWAT related hangs under mem pressure
    net: axienet: Handle deferred probe on clock properly
    cxgb4/chtls/cxgbit: Keeping the max ofld immediate data size same in cxgb4 and ulds
    b43: N-PHY: Fix the update of coef for the PHY revision >= 3case
    bpf: Clear subreg_def for global function return values
    ibmvnic: add memory barrier to protect long term buffer
    ibmvnic: skip send_request_unmap for timeout reset
    net: dsa: felix: perform teardown in reverse order of setup
    net: dsa: felix: don't deinitialize unused ports
    net: phy: mscc: adding LCPLL reset to VSC8514
    net: amd-xgbe: Reset the PHY rx data path when mailbox command timeout
    net: amd-xgbe: Fix NETDEV WATCHDOG transmit queue timeout warning
    net: amd-xgbe: Reset link when the link never comes back
    net: amd-xgbe: Fix network fluctuations when using 1G BELFUSE SFP
    net: mvneta: Remove per-cpu queue mapping for Armada 3700
    net: enetc: fix destroyed phylink dereference during unbind
    tty: convert tty_ldisc_ops 'read()' function to take a kernel pointer
    tty: implement read_iter
    fbdev: aty: SPARC64 requires FB_ATY_CT
    drm/gma500: Fix error return code in psb_driver_load()
    gma500: clean up error handling in init
    drm/fb-helper: Add missed unlocks in setcmap_legacy()
    drm/panel: mantix: Tweak init sequence
    drm/vc4: hdmi: Take into account the clock doubling flag in atomic_check
    crypto: sun4i-ss - linearize buffers content must be kept
    crypto: sun4i-ss - fix kmap usage
    crypto: arm64/aes-ce - really hide slower algos when faster ones are enabled
    hwrng: ingenic - Fix a resource leak in an error handling path
    media: allegro: Fix use after free on error
    kcsan: Rewrite kcsan_prandom_u32_max() without prandom_u32_state()
    drm: rcar-du: Fix PM reference leak in rcar_cmm_enable()
    drm: rcar-du: Fix crash when using LVDS1 clock for CRTC
    drm: rcar-du: Fix the return check of of_parse_phandle and of_find_device_by_node
    drm/amdgpu: Fix macro name _AMDGPU_TRACE_H_ in preprocessor if condition
    MIPS: c-r4k: Fix section mismatch for loongson2_sc_init
    MIPS: lantiq: Explicitly compare LTQ_EBU_PCC_ISTAT against 0
    drm/virtio: make sure context is created in gem open
    drm/fourcc: fix Amlogic format modifier masks
    media: ipu3-cio2: Build only for x86
    media: i2c: ov5670: Fix PIXEL_RATE minimum value
    media: imx: Unregister csc/scaler only if registered
    media: imx: Fix csc/scaler unregister
    media: mtk-vcodec: fix error return code in vdec_vp9_decode()
    media: camss: missing error code in msm_video_register()
    media: vsp1: Fix an error handling path in the probe function
    media: em28xx: Fix use-after-free in em28xx_alloc_urbs
    media: media/pci: Fix memleak in empress_init
    media: tm6000: Fix memleak in tm6000_start_stream
    media: aspeed: fix error return code in aspeed_video_setup_video()
    ASoC: cs42l56: fix up error handling in probe
    ASoC: qcom: qdsp6: Move frontend AIFs to q6asm-dai
    evm: Fix memleak in init_desc
    crypto: bcm - Rename struct device_private to bcm_device_private
    sched/fair: Avoid stale CPU util_est value for schedutil in task dequeue
    drm/sun4i: tcon: fix inverted DCLK polarity
    media: imx7: csi: Fix regression for parallel cameras on i.MX6UL
    media: imx7: csi: Fix pad link validation
    media: ti-vpe: cal: fix write to unallocated memory
    MIPS: properly stop .eh_frame generation
    MIPS: Compare __SYNC_loongson3_war against 0
    drm/tegra: Fix reference leak when pm_runtime_get_sync() fails
    drm/amdgpu: toggle on DF Cstate after finishing xgmi injection
    bsg: free the request before return error code
    macintosh/adb-iop: Use big-endian autopoll mask
    drm/amd/display: Fix 10/12 bpc setup in DCE output bit depth reduction.
    drm/amd/display: Fix HDMI deep color output for DCE 6-11.
    media: software_node: Fix refcounts in software_node_get_next_child()
    media: lmedm04: Fix misuse of comma
    media: vidtv: psi: fix missing crc for PMT
    media: atomisp: Fix a buffer overflow in debug code
    media: qm1d1c0042: fix error return code in qm1d1c0042_init()
    media: cx25821: Fix a bug when reallocating some dma memory
    media: mtk-vcodec: fix argument used when DEBUG is defined
    media: pxa_camera: declare variable when DEBUG is defined
    media: uvcvideo: Accept invalid bFormatIndex and bFrameIndex values
    sched/eas: Don't update misfit status if the task is pinned
    f2fs: compress: fix potential deadlock
    ASoC: qcom: lpass-cpu: Remove bit clock state check
    ASoC: SOF: Intel: hda: cancel D0i3 work during runtime suspend
    perf/arm-cmn: Fix PMU instance naming
    perf/arm-cmn: Move IRQs when migrating context
    mtd: parser: imagetag: fix error codes in bcm963xx_parse_imagetag_partitions()
    crypto: talitos - Work around SEC6 ERRATA (AES-CTR mode data size error)
    crypto: talitos - Fix ctr(aes) on SEC1
    drm/nouveau: bail out of nouveau_channel_new if channel init fails
    mm: proc: Invalidate TLB after clearing soft-dirty page state
    ata: ahci_brcm: Add back regulators management
    ASoC: cpcap: fix microphone timeslot mask
    ASoC: codecs: add missing max_register in regmap config
    mtd: parsers: afs: Fix freeing the part name memory in failure
    f2fs: fix to avoid inconsistent quota data
    drm/amdgpu: Prevent shift wrapping in amdgpu_read_mask()
    f2fs: fix a wrong condition in __submit_bio
    ASoC: qcom: Fix typo error in HDMI regmap config callbacks
    KVM: nSVM: Don't strip host's C-bit from guest's CR3 when reading PDPTRs
    drm/mediatek: Check if fb is null
    Drivers: hv: vmbus: Avoid use-after-free in vmbus_onoffer_rescind()
    ASoC: Intel: sof_sdw: add missing TGL_HDMI quirk for Dell SKU 0A5E
    ASoC: Intel: sof_sdw: add missing TGL_HDMI quirk for Dell SKU 0A3E
    locking/lockdep: Avoid unmatched unlock
    ASoC: qcom: lpass: Fix i2s ctl register bit map
    ASoC: rt5682: Fix panic in rt5682_jack_detect_handler happening during system shutdown
    ASoC: SOF: debug: Fix a potential issue on string buffer termination
    btrfs: clarify error returns values in __load_free_space_cache
    btrfs: fix double accounting of ordered extent for subpage case in btrfs_invalidapge
    KVM: x86: Restore all 64 bits of DR6 and DR7 during RSM on x86-64
    s390/zcrypt: return EIO when msg retry limit reached
    drm/vc4: hdmi: Move hdmi reset to bind
    drm/vc4: hdmi: Fix register offset with longer CEC messages
    drm/vc4: hdmi: Fix up CEC registers
    drm/vc4: hdmi: Restore cec physical address on reconnect
    drm/vc4: hdmi: Compute the CEC clock divider from the clock rate
    drm/vc4: hdmi: Update the CEC clock divider on HSM rate change
    drm/lima: fix reference leak in lima_pm_busy
    drm/dp_mst: Don't cache EDIDs for physical ports
    hwrng: timeriomem - Fix cooldown period calculation
    crypto: ecdh_helper - Ensure 'len >= secret.len' in decode_key()
    io_uring: fix possible deadlock in io_uring_poll
    nvmet-tcp: fix receive data digest calculation for multiple h2cdata PDUs
    nvmet-tcp: fix potential race of tcp socket closing accept_work
    nvme-multipath: set nr_zones for zoned namespaces
    nvmet: remove extra variable in identify ns
    nvmet: set status to 0 in case for invalid nsid
    ASoC: SOF: sof-pci-dev: add missing Up-Extreme quirk
    ima: Free IMA measurement buffer on error
    ima: Free IMA measurement buffer after kexec syscall
    ASoC: simple-card-utils: Fix device module clock
    fs/jfs: fix potential integer overflow on shift of a int
    jffs2: fix use after free in jffs2_sum_write_data()
    ubifs: Fix memleak in ubifs_init_authentication
    ubifs: replay: Fix high stack usage, again
    ubifs: Fix error return code in alloc_wbufs()
    irqchip/imx: IMX_INTMUX should not default to y, unconditionally
    smp: Process pending softirqs in flush_smp_call_function_from_idle()
    drm/amdgpu/display: remove hdcp_srm sysfs on device removal
    capabilities: Don't allow writing ambiguous v3 file capabilities
    HSI: Fix PM usage counter unbalance in ssi_hw_init
    power: supply: cpcap: Add missing IRQF_ONESHOT to fix regression
    clk: meson: clk-pll: fix initializing the old rate (fallback) for a PLL
    clk: meson: clk-pll: make "ret" a signed integer
    clk: meson: clk-pll: propagate the error from meson_clk_pll_set_rate()
    selftests/powerpc: Make the test check in eeh-basic.sh posix compliant
    regulator: qcom-rpmh-regulator: add pm8009-1 chip revision
    arm64: dts: qcom: qrb5165-rb5: fix pm8009 regulators
    quota: Fix memory leak when handling corrupted quota file
    i2c: iproc: handle only slave interrupts which are enabled
    i2c: iproc: update slave isr mask (ISR_MASK_SLAVE)
    i2c: iproc: handle master read request
    spi: cadence-quadspi: Abort read if dummy cycles required are too many
    clk: sunxi-ng: h6: Fix CEC clock
    clk: renesas: r8a779a0: Remove non-existent S2 clock
    clk: renesas: r8a779a0: Fix parent of CBFUSA clock
    HID: core: detect and skip invalid inputs to snto32()
    RDMA/siw: Fix handling of zero-sized Read and Receive Queues.
    dmaengine: fsldma: Fix a resource leak in the remove function
    dmaengine: fsldma: Fix a resource leak in an error handling path of the probe function
    dmaengine: owl-dma: Fix a resource leak in the remove function
    dmaengine: hsu: disable spurious interrupt
    mfd: bd9571mwv: Use devm_mfd_add_devices()
    power: supply: cpcap-charger: Fix missing power_supply_put()
    power: supply: cpcap-battery: Fix missing power_supply_put()
    power: supply: cpcap-charger: Fix power_supply_put on null battery pointer
    fdt: Properly handle "no-map" field in the memory region
    of/fdt: Make sure no-map does not remove already reserved regions
    RDMA/rtrs: Extend ibtrs_cq_qp_create
    RDMA/rtrs-srv: Release lock before call into close_sess
    RDMA/rtrs-srv: Use sysfs_remove_file_self for disconnect
    RDMA/rtrs-clt: Set mininum limit when create QP
    RDMA/rtrs: Call kobject_put in the failure path
    RDMA/rtrs-srv: Fix missing wr_cqe
    RDMA/rtrs-clt: Refactor the failure cases in alloc_clt
    RDMA/rtrs-srv: Init wr_cnt as 1
    power: reset: at91-sama5d2_shdwc: fix wkupdbc mask
    rtc: s5m: select REGMAP_I2C
    dmaengine: idxd: set DMA channel to be private
    power: supply: fix sbs-charger build, needs REGMAP_I2C
    clocksource/drivers/ixp4xx: Select TIMER_OF when needed
    clocksource/drivers/mxs_timer: Add missing semicolon when DEBUG is defined
    spi: imx: Don't print error on -EPROBEDEFER
    RDMA/mlx5: Use the correct obj_id upon DEVX TIR creation
    IB/mlx5: Add mutex destroy call to cap_mask_mutex mutex
    clk: sunxi-ng: h6: Fix clock divider range on some clocks
    platform/chrome: cros_ec_proto: Use EC_HOST_EVENT_MASK not BIT
    platform/chrome: cros_ec_proto: Add LID and BATTERY to default mask
    regulator: axp20x: Fix reference cout leak
    watch_queue: Drop references to /dev/watch_queue
    certs: Fix blacklist flag type confusion
    regulator: s5m8767: Fix reference count leak
    spi: atmel: Put allocated master before return
    regulator: s5m8767: Drop regulators OF node reference
    power: supply: axp20x_usb_power: Init work before enabling IRQs
    power: supply: smb347-charger: Fix interrupt usage if interrupt is unavailable
    regulator: core: Avoid debugfs: Directory ... already present! error
    isofs: release buffer head before return
    watchdog: intel-mid_wdt: Postpone IRQ handler registration till SCU is ready
    auxdisplay: ht16k33: Fix refresh rate handling
    objtool: Fix error handling for STD/CLD warnings
    objtool: Fix retpoline detection in asm code
    objtool: Fix ".cold" section suffix check for newer versions of GCC
    scsi: lpfc: Fix ancient double free
    iommu: Switch gather->end to the inclusive end
    IB/umad: Return EIO in case of when device disassociated
    IB/umad: Return EPOLLERR in case of when device disassociated
    KVM: PPC: Make the VMX instruction emulation routines static
    powerpc/47x: Disable 256k page size
    powerpc/time: Enable sched clock for irqtime
    mmc: owl-mmc: Fix a resource leak in an error handling path and in the remove function
    mmc: sdhci-sprd: Fix some resource leaks in the remove function
    mmc: usdhi6rol0: Fix a resource leak in the error handling path of the probe
    mmc: renesas_sdhi_internal_dmac: Fix DMA buffer alignment from 8 to 128-bytes
    ARM: 9046/1: decompressor: Do not clear SCTLR.nTLSMD for ARMv7+ cores
    i2c: qcom-geni: Store DMA mapping data in geni_i2c_dev struct
    amba: Fix resource leak for drivers without .remove
    iommu: Move iotlb_sync_map out from __iommu_map
    iommu: Properly pass gfp_t in _iommu_map() to avoid atomic sleeping
    IB/mlx5: Return appropriate error code instead of ENOMEM
    IB/cm: Avoid a loop when device has 255 ports
    tracepoint: Do not fail unregistering a probe due to memory failure
    rtc: zynqmp: depend on HAS_IOMEM
    perf tools: Fix DSO filtering when not finding a map for a sampled address
    perf vendor events arm64: Fix Ampere eMag event typo
    RDMA/rxe: Fix coding error in rxe_recv.c
    RDMA/rxe: Fix coding error in rxe_rcv_mcast_pkt
    RDMA/rxe: Correct skb on loopback path
    spi: stm32: properly handle 0 byte transfer
    mfd: altera-sysmgr: Fix physical address storing more
    mfd: wm831x-auxadc: Prevent use after free in wm831x_auxadc_read_irq()
    powerpc/pseries/dlpar: handle ibm, configure-connector delay status
    powerpc/8xx: Fix software emulation interrupt
    clk: qcom: gcc-msm8998: Fix Alpha PLL type for all GPLLs
    kunit: tool: fix unit test cleanup handling
    kselftests: dmabuf-heaps: Fix Makefile's inclusion of the kernel's usr/include dir
    RDMA/hns: Fixed wrong judgments in the goto branch
    RDMA/siw: Fix calculation of tx_valid_cpus size
    RDMA/hns: Fix type of sq_signal_bits
    RDMA/hns: Disable RQ inline by default
    clk: divider: fix initialization with parent_hw
    spi: pxa2xx: Fix the controller numbering for Wildcat Point
    powerpc/uaccess: Avoid might_fault() when user access is enabled
    powerpc/kuap: Restore AMR after replaying soft interrupts
    regulator: qcom-rpmh: fix pm8009 ldo7
    clk: aspeed: Fix APLL calculate formula from ast2600-A2
    selftests/ftrace: Update synthetic event syntax errors
    perf symbols: Use (long) for iterator for bfd symbols
    regulator: bd718x7, bd71828, Fix dvs voltage levels
    spi: dw: Avoid stack content exposure
    spi: Skip zero-length transfers in spi_transfer_one_message()
    printk: avoid prb_first_valid_seq() where possible
    perf symbols: Fix return value when loading PE DSO
    nfsd: register pernet ops last, unregister first
    svcrdma: Hold private mutex while invoking rdma_accept()
    ceph: fix flush_snap logic after putting caps
    RDMA/hns: Fixes missing error code of CMDQ
    RDMA/ucma: Fix use-after-free bug in ucma_create_uevent
    RDMA/rtrs-srv: Fix stack-out-of-bounds
    RDMA/rtrs: Only allow addition of path to an already established session
    RDMA/rtrs-srv: fix memory leak by missing kobject free
    RDMA/rtrs-srv-sysfs: fix missing put_device
    RDMA/rtrs-srv: Do not pass a valid pointer to PTR_ERR()
    Input: sur40 - fix an error code in sur40_probe()
    perf record: Fix continue profiling after draining the buffer
    perf intel-pt: Fix missing CYC processing in PSB
    perf intel-pt: Fix premature IPC
    perf intel-pt: Fix IPC with CYC threshold
    perf test: Fix unaligned access in sample parsing test
    Input: elo - fix an error code in elo_connect()
    sparc64: only select COMPAT_BINFMT_ELF if BINFMT_ELF is set
    sparc: fix led.c driver when PROC_FS is not enabled
    Input: zinitix - fix return type of zinitix_init_touch()
    ARM: 9065/1: OABI compat: fix build when EPOLL is not enabled
    misc: eeprom_93xx46: Fix module alias to enable module autoprobe
    phy: rockchip-emmc: emmc_phy_init() always return 0
    phy: cadence-torrent: Fix error code in cdns_torrent_phy_probe()
    misc: eeprom_93xx46: Add module alias to avoid breaking support for non device tree users
    PCI: rcar: Always allocate MSI addresses in 32bit space
    soundwire: cadence: fix ACK/NAK handling
    pwm: rockchip: Enable APB clock during register access while probing
    pwm: rockchip: rockchip_pwm_probe(): Remove superfluous clk_unprepare()
    pwm: rockchip: Eliminate potential race condition when probing
    PCI: xilinx-cpm: Fix reference count leak on error path
    VMCI: Use set_page_dirty_lock() when unregistering guest memory
    PCI: Align checking of syscall user config accessors
    mei: hbm: call mei_set_devstate() on hbm stop response
    drm/msm: Fix MSM_INFO_GET_IOVA with carveout
    drm/msm/dsi: Correct io_start for MSM8994 (20nm PHY)
    drm/msm/mdp5: Fix wait-for-commit for cmd panels
    drm/msm: Fix race of GPU init vs timestamp power management.
    drm/msm: Fix races managing the OOB state for timestamp vs timestamps.
    drm/msm/dp: trigger unplug event in msm_dp_display_disable
    vfio/iommu_type1: Populate full dirty when detach non-pinned group
    vfio/iommu_type1: Fix some sanity checks in detach group
    vfio-pci/zdev: fix possible segmentation fault issue
    ext4: fix potential htree index checksum corruption
    phy: USB_LGM_PHY should depend on X86
    coresight: etm4x: Skip accessing TRCPDCR in save/restore
    nvmem: core: Fix a resource leak on error in nvmem_add_cells_from_of()
    nvmem: core: skip child nodes not matching binding
    soundwire: bus: use sdw_update_no_pm when initializing a device
    soundwire: bus: use sdw_write_no_pm when setting the bus scale registers
    soundwire: export sdw_write/read_no_pm functions
    soundwire: bus: fix confusion on device used by pm_runtime
    misc: fastrpc: fix incorrect usage of dma_map_sgtable
    remoteproc/mediatek: acknowledge watchdog IRQ after handled
    regmap: sdw: use _no_pm functions in regmap_read/write
    ext: EXT4_KUNIT_TESTS should depend on EXT4_FS instead of selecting it
    mailbox: sprd: correct definition of SPRD_OUTBOX_FIFO_FULL
    device-dax: Fix default return code of range_parse()
    PCI: pci-bridge-emul: Fix array overruns, improve safety
    PCI: cadence: Fix DMA range mapping early return error
    i40e: Fix flow for IPv6 next header (extension header)
    i40e: Add zero-initialization of AQ command structures
    i40e: Fix overwriting flow control settings during driver loading
    i40e: Fix addition of RX filters after enabling FW LLDP agent
    i40e: Fix VFs not created
    Take mmap lock in cacheflush syscall
    nios2: fixed broken sys_clone syscall
    i40e: Fix add TC filter for IPv6
    octeontx2-af: Fix an off by one in rvu_dbg_qsize_write()
    pwm: iqs620a: Fix overflow and optimize calculations
    vfio/type1: Use follow_pte()
    ice: report correct max number of TCs
    ice: Account for port VLAN in VF max packet size calculation
    ice: Fix state bits on LLDP mode switch
    ice: update the number of available RSS queues
    net: stmmac: fix CBS idleslope and sendslope calculation
    net/mlx4_core: Add missed mlx4_free_cmd_mailbox()
    PCI: rockchip: Make 'ep-gpios' DT property optional
    vxlan: move debug check after netdev unregister
    wireguard: device: do not generate ICMP for non-IP packets
    wireguard: kconfig: use arm chacha even with no neon
    ocfs2: fix a use after free on error
    mm: memcontrol: fix NR_ANON_THPS accounting in charge moving
    mm: memcontrol: fix slub memory accounting
    mm/memory.c: fix potential pte_unmap_unlock pte error
    mm/hugetlb: fix potential double free in hugetlb_register_node() error path
    mm/hugetlb: suppress wrong warning info when alloc gigantic page
    mm/compaction: fix misbehaviors of fast_find_migrateblock()
    r8169: fix jumbo packet handling on RTL8168e
    NFSv4: Fixes for nfs4_bitmask_adjust()
    KVM: SVM: Intercept INVPCID when it's disabled to inject #UD
    KVM: x86/mmu: Expand collapsible SPTE zap for TDP MMU to ZONE_DEVICE and HugeTLB pages
    arm64: Add missing ISB after invalidating TLB in __primary_switch
    i2c: brcmstb: Fix brcmstd_send_i2c_cmd condition
    i2c: exynos5: Preserve high speed master code
    mm,thp,shmem: make khugepaged obey tmpfs mount flags
    mm: fix memory_failure() handling of dax-namespace metadata
    mm/rmap: fix potential pte_unmap on an not mapped pte
    proc: use kvzalloc for our kernel buffer
    csky: Fix a size determination in gpr_get()
    scsi: bnx2fc: Fix Kconfig warning & CNIC build errors
    scsi: sd: sd_zbc: Don't pass GFP_NOIO to kvcalloc
    block: reopen the device in blkdev_reread_part
    ide/falconide: Fix module unload
    scsi: sd: Fix Opal support
    blk-settings: align max_sectors on "logical_block_size" boundary
    soundwire: intel: fix possible crash when no device is detected
    ACPI: property: Fix fwnode string properties matching
    ACPI: configfs: add missing check after configfs_register_default_group()
    cpufreq: ACPI: Set cpuinfo.max_freq directly if max boost is known
    HID: logitech-dj: add support for keyboard events in eQUAD step 4 Gaming
    HID: wacom: Ignore attempts to overwrite the touch_max value from HID
    Input: raydium_ts_i2c - do not send zero length
    Input: xpad - add support for PowerA Enhanced Wired Controller for Xbox Series X|S
    Input: joydev - prevent potential read overflow in ioctl
    Input: i8042 - add ASUS Zenbook Flip to noselftest list
    media: mceusb: Fix potential out-of-bounds shift
    USB: serial: option: update interface mapping for ZTE P685M
    usb: musb: Fix runtime PM race in musb_queue_resume_work
    usb: dwc3: gadget: Fix setting of DEPCFG.bInterval_m1
    usb: dwc3: gadget: Fix dep->interval for fullspeed interrupt
    USB: serial: ftdi_sio: fix FTX sub-integer prescaler
    USB: serial: pl2303: fix line-speed handling on newer chips
    USB: serial: mos7840: fix error code in mos7840_write()
    USB: serial: mos7720: fix error code in mos7720_write()
    phy: lantiq: rcu-usb2: wait after clock enable
    ALSA: fireface: fix to parse sync status register of latter protocol
    ALSA: hda: Add another CometLake-H PCI ID
    ALSA: hda/hdmi: Drop bogus check at closing a stream
    ALSA: hda/realtek: modify EAPD in the ALC886
    ALSA: hda/realtek: Quirk for HP Spectre x360 14 amp setup
    MIPS: Ingenic: Disable HPTLB for D0 XBurst CPUs too
    MIPS: Support binutils configured with --enable-mips-fix-loongson3-llsc=yes
    MIPS: VDSO: Use CLANG_FLAGS instead of filtering out '--target='
    Revert "MIPS: Octeon: Remove special handling of CONFIG_MIPS_ELF_APPENDED_DTB=y"
    Revert "bcache: Kill btree_io_wq"
    bcache: Give btree_io_wq correct semantics again
    bcache: Move journal work to new flush wq
    Revert "drm/amd/display: Update NV1x SR latency values"
    drm/amd/display: Add FPU wrappers to dcn21_validate_bandwidth()
    drm/amd/display: Remove Assert from dcn10_get_dig_frontend
    drm/amd/display: Add vupdate_no_lock interrupts for DCN2.1
    drm/amdkfd: Fix recursive lock warnings
    drm/amdgpu: Set reference clock to 100Mhz on Renoir (v2)
    drm/nouveau/kms: handle mDP connectors
    drm/modes: Switch to 64bit maths to avoid integer overflow
    drm/sched: Cancel and flush all outstanding jobs before finish.
    drm/panel: kd35t133: allow using non-continuous dsi clock
    drm/rockchip: Require the YTR modifier for AFBC
    ASoC: siu: Fix build error by a wrong const prefix
    selinux: fix inconsistency between inode_getxattr and inode_listsecurity
    erofs: initialized fields can only be observed after bit is set
    tpm_tis: Fix check_locality for correct locality acquisition
    tpm_tis: Clean up locality release
    KEYS: trusted: Fix incorrect handling of tpm_get_random()
    KEYS: trusted: Fix migratable=1 failing
    KEYS: trusted: Reserve TPM for seal and unseal operations
    btrfs: do not cleanup upper nodes in btrfs_backref_cleanup_node
    btrfs: do not warn if we can't find the reloc root when looking up backref
    btrfs: add asserts for deleting backref cache nodes
    btrfs: abort the transaction if we fail to inc ref in btrfs_copy_root
    btrfs: fix reloc root leak with 0 ref reloc roots on recovery
    btrfs: splice remaining dirty_bg's onto the transaction dirty bg list
    btrfs: handle space_info::total_bytes_pinned inside the delayed ref itself
    btrfs: account for new extents being deleted in total_bytes_pinned
    btrfs: fix extent buffer leak on failure to copy root
    drm/i915/gt: Flush before changing register state
    drm/i915/gt: Correct surface base address for renderclear
    crypto: arm64/sha - add missing module aliases
    crypto: aesni - prevent misaligned buffers on the stack
    crypto: michael_mic - fix broken misalignment handling
    crypto: sun4i-ss - checking sg length is not sufficient
    crypto: sun4i-ss - IV register does not work on A10 and A13
    crypto: sun4i-ss - handle BigEndian for cipher
    crypto: sun4i-ss - initialize need_fallback
    soc: samsung: exynos-asv: don't defer early on not-supported SoCs
    soc: samsung: exynos-asv: handle reading revision register error
    seccomp: Add missing return in non-void function
    arm64: ptrace: Fix seccomp of traced syscall -1 (NO_SYSCALL)
    misc: rtsx: init of rts522a add OCP power off when no card is present
    drivers/misc/vmw_vmci: restrict too big queue size in qp_host_alloc_queue
    pstore: Fix typo in compression option name
    dts64: mt7622: fix slow sd card access
    arm64: dts: agilex: fix phy interface bit shift for gmac1 and gmac2
    staging/mt7621-dma: mtk-hsdma.c->hsdma-mt7621.c
    staging: gdm724x: Fix DMA from stack
    staging: rtl8188eu: Add Edimax EW-7811UN V2 to device table
    floppy: reintroduce O_NDELAY fix
    media: i2c: max9286: fix access to unallocated memory
    media: ir_toy: add another IR Droid device
    media: ipu3-cio2: Fix mbus_code processing in cio2_subdev_set_fmt()
    media: marvell-ccic: power up the device on mclk enable
    media: smipcie: fix interrupt handling and IR timeout
    x86/virt: Eat faults on VMXOFF in reboot flows
    x86/reboot: Force all cpus to exit VMX root if VMX is supported
    x86/fault: Fix AMD erratum #91 errata fixup for user code
    x86/entry: Fix instrumentation annotation
    powerpc/prom: Fix "ibm,arch-vec-5-platform-support" scan
    rcu: Pull deferred rcuog wake up to rcu_eqs_enter() callers
    rcu/nocb: Perform deferred wake up before last idle's need_resched() check
    kprobes: Fix to delay the kprobes jump optimization
    arm64: Extend workaround for erratum 1024718 to all versions of Cortex-A55
    iommu/arm-smmu-qcom: Fix mask extraction for bootloader programmed SMRs
    arm64: kexec_file: fix memory leakage in create_dtb() when fdt_open_into() fails
    arm64: uprobe: Return EOPNOTSUPP for AARCH32 instruction probing
    arm64 module: set plt* section addresses to 0x0
    arm64: spectre: Prevent lockdep splat on v4 mitigation enable path
    riscv: Disable KSAN_SANITIZE for vDSO
    watchdog: qcom: Remove incorrect usage of QCOM_WDT_ENABLE_IRQ
    watchdog: mei_wdt: request stop on unregister
    coresight: etm4x: Handle accesses to TRCSTALLCTLR
    mtd: spi-nor: sfdp: Fix last erase region marking
    mtd: spi-nor: sfdp: Fix wrong erase type bitmask for overlaid region
    mtd: spi-nor: core: Fix erase type discovery for overlaid region
    mtd: spi-nor: core: Add erase size check for erase command initialization
    mtd: spi-nor: hisi-sfc: Put child node np on error path
    fs/affs: release old buffer head on error path
    seq_file: document how per-entry resources are managed.
    x86: fix seq_file iteration for pat/memtype.c
    mm: memcontrol: fix swap undercounting in cgroup2
    mm: memcontrol: fix get_active_memcg return value
    hugetlb: fix update_and_free_page contig page struct assumption
    hugetlb: fix copy_huge_page_from_user contig page struct assumption
    mm/vmscan: restore zone_reclaim_mode ABI
    mm, compaction: make fast_isolate_freepages() stay within zone
    KVM: nSVM: fix running nested guests when npt=0
    nvmem: qcom-spmi-sdam: Fix uninitialized pdev pointer
    module: Ignore _GLOBAL_OFFSET_TABLE_ when warning for undefined symbols
    mmc: sdhci-esdhc-imx: fix kernel panic when remove module
    mmc: sdhci-pci-o2micro: Bug fix for SDR104 HW tuning failure
    powerpc/32: Preserve cr1 in exception prolog stack check to fix build error
    powerpc/kexec_file: fix FDT size estimation for kdump kernel
    powerpc/32s: Add missing call to kuep_lock on syscall entry
    spmi: spmi-pmic-arb: Fix hw_irq overflow
    mei: fix transfer over dma with extended header
    mei: me: emmitsburg workstation DID
    mei: me: add adler lake point S DID
    mei: me: add adler lake point LP DID
    gpio: pcf857x: Fix missing first interrupt
    mfd: gateworks-gsc: Fix interrupt type
    printk: fix deadlock when kernel panic
    exfat: fix shift-out-of-bounds in exfat_fill_super()
    zonefs: Fix file size of zones in full condition
    kcmp: Support selection of SYS_kcmp without CHECKPOINT_RESTORE
    thermal: cpufreq_cooling: freq_qos_update_request() returns < 0 on error
    cpufreq: qcom-hw: drop devm_xxx() calls from init/exit hooks
    cpufreq: intel_pstate: Change intel_pstate_get_hwp_max() argument
    cpufreq: intel_pstate: Get per-CPU max freq via MSR_HWP_CAPABILITIES if available
    proc: don't allow async path resolution of /proc/thread-self components
    s390/vtime: fix inline assembly clobber list
    virtio/s390: implement virtio-ccw revision 2 correctly
    um: mm: check more comprehensively for stub changes
    um: defer killing userspace on page table update failures
    irqchip/loongson-pch-msi: Use bitmap_zalloc() to allocate bitmap
    f2fs: fix out-of-repair __setattr_copy()
    f2fs: enforce the immutable flag on open files
    f2fs: flush data when enabling checkpoint back
    sparc32: fix a user-triggerable oops in clear_user()
    spi: fsl: invert spisel_boot signal on MPC8309
    spi: spi-synquacer: fix set_cs handling
    gfs2: fix glock confusion in function signal_our_withdraw
    gfs2: Don't skip dlm unlock if glock has an lvb
    gfs2: Lock imbalance on error path in gfs2_recover_one
    gfs2: Recursive gfs2_quota_hold in gfs2_iomap_end
    dm: fix deadlock when swapping to encrypted device
    dm table: fix iterate_devices based device capability checks
    dm table: fix DAX iterate_devices based device capability checks
    dm table: fix zoned iterate_devices based device capability checks
    dm writecache: fix performance degradation in ssd mode
    dm writecache: return the exact table values that were set
    dm writecache: fix writing beyond end of underlying device when shrinking
    dm era: Recover committed writeset after crash
    dm era: Update in-core bitset after committing the metadata
    dm era: Verify the data block size hasn't changed
    dm era: Fix bitset memory leaks
    dm era: Use correct value size in equality function of writeset tree
    dm era: Reinitialize bitset cache before digesting a new writeset
    dm era: only resize metadata in preresume
    drm/i915: Reject 446-480MHz HDMI clock on GLK
    kgdb: fix to kill breakpoints on initmem after boot
    ipv6: silence compilation warning for non-IPV6 builds
    net: icmp: pass zeroed opts from icmp{,v6}_ndo_send before sending
    wireguard: selftests: test multiple parallel streams
    wireguard: queueing: get rid of per-peer ring buffers
    net: sched: fix police ext initialization
    net: qrtr: Fix memory leak in qrtr_tun_open
    net_sched: fix RTNL deadlock again caused by request_module()
    ARM: dts: aspeed: Add LCLK to lpc-snoop
    Linux 5.10.20

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I3fbcecd9413ce212dac68d5cc800c9457feba56a
1410 lines
40 KiB
C
// SPDX-License-Identifier: GPL-2.0-or-later
/* Common capabilities, needed by capability.o.
 */

#include <linux/capability.h>
#include <linux/audit.h>
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/lsm_hooks.h>
#include <linux/file.h>
#include <linux/mm.h>
#include <linux/mman.h>
#include <linux/pagemap.h>
#include <linux/swap.h>
#include <linux/skbuff.h>
#include <linux/netlink.h>
#include <linux/ptrace.h>
#include <linux/xattr.h>
#include <linux/hugetlb.h>
#include <linux/mount.h>
#include <linux/sched.h>
#include <linux/prctl.h>
#include <linux/securebits.h>
#include <linux/user_namespace.h>
#include <linux/binfmts.h>
#include <linux/personality.h>

/*
 * If a non-root user executes a setuid-root binary in
 * !secure(SECURE_NOROOT) mode, then we raise capabilities.
 * However if fE is also set, then the intent is for only
 * the file capabilities to be applied, and the setuid-root
 * bit is left on either to change the uid (plausible) or
 * to get full privilege on a kernel without file capabilities
 * support.  So in that case we do not raise capabilities.
 *
 * Warn if that happens, once per boot.
 */
static void warn_setuid_and_fcaps_mixed(const char *fname)
{
	static int warned;
	if (!warned) {
		printk(KERN_INFO "warning: `%s' has both setuid-root and"
			" effective capabilities. Therefore not raising all"
			" capabilities.\n", fname);
		warned = 1;
	}
}

/**
 * cap_capable - Determine whether a task has a particular effective capability
 * @cred: The credentials to use
 * @targ_ns: The user namespace in which we need the capability
 * @cap: The capability to check for
 * @opts: Bitmask of options defined in include/linux/security.h
 *
 * Determine whether the nominated task has the specified capability amongst
 * its effective set, returning 0 if it does, -ve if it does not.
 *
 * NOTE WELL: cap_capable() cannot be used like the kernel's capable()
 * and has_capability() functions.  That is, it has the reverse semantics:
 * cap_capable() returns 0 when a task has a capability, but the
 * kernel's capable() and has_capability() return 1 for this case.
 */
int cap_capable(const struct cred *cred, struct user_namespace *targ_ns,
		int cap, unsigned int opts)
{
	struct user_namespace *ns = targ_ns;

	/* See if cred has the capability in the target user namespace
	 * by examining the target user namespace and all of the target
	 * user namespace's parents.
	 */
	for (;;) {
		/* Do we have the necessary capabilities? */
		if (ns == cred->user_ns)
			return cap_raised(cred->cap_effective, cap) ? 0 : -EPERM;

		/*
		 * If we're already at a lower level than we're looking for,
		 * we're done searching.
		 */
		if (ns->level <= cred->user_ns->level)
			return -EPERM;

		/*
		 * The owner of the user namespace in the parent of the
		 * user namespace has all caps.
		 */
		if ((ns->parent == cred->user_ns) && uid_eq(ns->owner, cred->euid))
			return 0;

		/*
		 * If you have a capability in a parent user ns, then you have
		 * it over all children user namespaces as well.
		 */
		ns = ns->parent;
	}

	/* We never get here */
}

/**
 * cap_settime - Determine whether the current process may set the system clock
 * @ts: The time to set
 * @tz: The timezone to set
 *
 * Determine whether the current process may set the system clock and timezone
 * information, returning 0 if permission granted, -ve if denied.
 */
int cap_settime(const struct timespec64 *ts, const struct timezone *tz)
{
	if (!capable(CAP_SYS_TIME))
		return -EPERM;
	return 0;
}

/**
 * cap_ptrace_access_check - Determine whether the current process may access
 *			   another
 * @child: The process to be accessed
 * @mode: The mode of attachment.
 *
 * If we are in the same or an ancestor user_ns and have all the target
 * task's capabilities, then ptrace access is allowed.
 * If we have the ptrace capability to the target user_ns, then ptrace
 * access is allowed.
 * Else denied.
 *
 * Determine whether a process may access another, returning 0 if permission
 * granted, -ve if denied.
 */
int cap_ptrace_access_check(struct task_struct *child, unsigned int mode)
{
	int ret = 0;
	const struct cred *cred, *child_cred;
	const kernel_cap_t *caller_caps;

	rcu_read_lock();
	cred = current_cred();
	child_cred = __task_cred(child);
	if (mode & PTRACE_MODE_FSCREDS)
		caller_caps = &cred->cap_effective;
	else
		caller_caps = &cred->cap_permitted;
	if (cred->user_ns == child_cred->user_ns &&
	    cap_issubset(child_cred->cap_permitted, *caller_caps))
		goto out;
	if (ns_capable(child_cred->user_ns, CAP_SYS_PTRACE))
		goto out;
	ret = -EPERM;
out:
	rcu_read_unlock();
	return ret;
}

/**
 * cap_ptrace_traceme - Determine whether another process may trace the current
 * @parent: The task proposed to be the tracer
 *
 * If parent is in the same or an ancestor user_ns and has all current's
 * capabilities, then ptrace access is allowed.
 * If parent has the ptrace capability to current's user_ns, then ptrace
 * access is allowed.
 * Else denied.
 *
 * Determine whether the nominated task is permitted to trace the current
 * process, returning 0 if permission is granted, -ve if denied.
 */
int cap_ptrace_traceme(struct task_struct *parent)
{
	int ret = 0;
	const struct cred *cred, *child_cred;

	rcu_read_lock();
	cred = __task_cred(parent);
	child_cred = current_cred();
	if (cred->user_ns == child_cred->user_ns &&
	    cap_issubset(child_cred->cap_permitted, cred->cap_permitted))
		goto out;
	if (has_ns_capability(parent, child_cred->user_ns, CAP_SYS_PTRACE))
		goto out;
	ret = -EPERM;
out:
	rcu_read_unlock();
	return ret;
}

/**
 * cap_capget - Retrieve a task's capability sets
 * @target: The task from which to retrieve the capability sets
 * @effective: The place to record the effective set
 * @inheritable: The place to record the inheritable set
 * @permitted: The place to record the permitted set
 *
 * This function retrieves the capabilities of the nominated task and returns
 * them to the caller.
 */
int cap_capget(struct task_struct *target, kernel_cap_t *effective,
	       kernel_cap_t *inheritable, kernel_cap_t *permitted)
{
	const struct cred *cred;

	/* Derived from kernel/capability.c:sys_capget. */
	rcu_read_lock();
	cred = __task_cred(target);
	*effective   = cred->cap_effective;
	*inheritable = cred->cap_inheritable;
	*permitted   = cred->cap_permitted;
	rcu_read_unlock();
	return 0;
}

/*
 * Determine whether the inheritable capabilities are limited to the old
 * permitted set.  Returns 1 if they are limited, 0 if they are not.
 */
static inline int cap_inh_is_capped(void)
{
	/* they are so limited unless the current task has the CAP_SETPCAP
	 * capability
	 */
	if (cap_capable(current_cred(), current_cred()->user_ns,
			CAP_SETPCAP, CAP_OPT_NONE) == 0)
		return 0;
	return 1;
}

/**
 * cap_capset - Validate and apply proposed changes to current's capabilities
 * @new: The proposed new credentials; alterations should be made here
 * @old: The current task's current credentials
 * @effective: A pointer to the proposed new effective capabilities set
 * @inheritable: A pointer to the proposed new inheritable capabilities set
 * @permitted: A pointer to the proposed new permitted capabilities set
 *
 * This function validates and applies a proposed mass change to the current
 * process's capability sets.  The changes are made to the proposed new
 * credentials, and assuming no error, will be committed by the caller of LSM.
 */
int cap_capset(struct cred *new,
	       const struct cred *old,
	       const kernel_cap_t *effective,
	       const kernel_cap_t *inheritable,
	       const kernel_cap_t *permitted)
{
	if (cap_inh_is_capped() &&
	    !cap_issubset(*inheritable,
			  cap_combine(old->cap_inheritable,
				      old->cap_permitted)))
		/* incapable of using this inheritable set */
		return -EPERM;

	if (!cap_issubset(*inheritable,
			  cap_combine(old->cap_inheritable,
				      old->cap_bset)))
		/* no new pI capabilities outside bounding set */
		return -EPERM;

	/* verify restrictions on target's new Permitted set */
	if (!cap_issubset(*permitted, old->cap_permitted))
		return -EPERM;

	/* verify the _new_Effective_ is a subset of the _new_Permitted_ */
	if (!cap_issubset(*effective, *permitted))
		return -EPERM;

	new->cap_effective   = *effective;
	new->cap_inheritable = *inheritable;
	new->cap_permitted   = *permitted;

	/*
	 * Mask off ambient bits that are no longer both permitted and
	 * inheritable.
	 */
	new->cap_ambient = cap_intersect(new->cap_ambient,
					 cap_intersect(*permitted,
						       *inheritable));
	if (WARN_ON(!cap_ambient_invariant_ok(new)))
		return -EINVAL;
	return 0;
}

/**
 * cap_inode_need_killpriv - Determine if inode change affects privileges
 * @dentry: The inode/dentry being changed with change marked ATTR_KILL_PRIV
 *
 * Determine whether an inode change marked ATTR_KILL_PRIV affects the
 * security markings on that inode and, if it does, whether
 * inode_killpriv() should be invoked or the change rejected.
 *
 * Returns 1 if security.capability has a value, meaning inode_killpriv()
 * is required, 0 otherwise, meaning inode_killpriv() is not required.
 */
int cap_inode_need_killpriv(struct dentry *dentry)
{
	struct inode *inode = d_backing_inode(dentry);
	int error;

	error = __vfs_getxattr(dentry, inode, XATTR_NAME_CAPS, NULL, 0,
			       XATTR_NOSECURITY);
	return error > 0;
}

/**
 * cap_inode_killpriv - Erase the security markings on an inode
 * @dentry: The inode/dentry to alter
 *
 * Erase the privilege-enhancing security markings on an inode.
 *
 * Returns 0 if successful, -ve on error.
 */
int cap_inode_killpriv(struct dentry *dentry)
{
	int error;

	error = __vfs_removexattr(dentry, XATTR_NAME_CAPS);
	if (error == -EOPNOTSUPP)
		error = 0;
	return error;
}

static bool rootid_owns_currentns(kuid_t kroot)
{
	struct user_namespace *ns;

	if (!uid_valid(kroot))
		return false;

	for (ns = current_user_ns(); ; ns = ns->parent) {
		if (from_kuid(ns, kroot) == 0)
			return true;
		if (ns == &init_user_ns)
			break;
	}

	return false;
}

static __u32 sansflags(__u32 m)
{
	return m & ~VFS_CAP_FLAGS_EFFECTIVE;
}

static bool is_v2header(size_t size, const struct vfs_cap_data *cap)
{
	if (size != XATTR_CAPS_SZ_2)
		return false;
	return sansflags(le32_to_cpu(cap->magic_etc)) == VFS_CAP_REVISION_2;
}

static bool is_v3header(size_t size, const struct vfs_cap_data *cap)
{
	if (size != XATTR_CAPS_SZ_3)
		return false;
	return sansflags(le32_to_cpu(cap->magic_etc)) == VFS_CAP_REVISION_3;
}

/*
 * getsecurity: We are called for security.* before any attempt to read the
 * xattr from the inode itself.
 *
 * This gives us a chance to read the on-disk value and convert it.  If we
 * return -EOPNOTSUPP, then vfs_getxattr() will call the i_op handler.
 *
 * Note we are not called by vfs_getxattr_alloc(), but that is only called
 * by the integrity subsystem, which really wants the unconverted values -
 * so that's good.
 */
int cap_inode_getsecurity(struct inode *inode, const char *name, void **buffer,
			  bool alloc)
{
	int size, ret;
	kuid_t kroot;
	u32 nsmagic, magic;
	uid_t root, mappedroot;
	char *tmpbuf = NULL;
	struct vfs_cap_data *cap;
	struct vfs_ns_cap_data *nscap = NULL;
	struct dentry *dentry;
	struct user_namespace *fs_ns;

	if (strcmp(name, "capability") != 0)
		return -EOPNOTSUPP;

	dentry = d_find_any_alias(inode);
	if (!dentry)
		return -EINVAL;

	size = sizeof(struct vfs_ns_cap_data);
	ret = (int) vfs_getxattr_alloc(dentry, XATTR_NAME_CAPS,
				       &tmpbuf, size, GFP_NOFS);
	dput(dentry);

	if (ret < 0)
		return ret;

	fs_ns = inode->i_sb->s_user_ns;
	cap = (struct vfs_cap_data *) tmpbuf;
	if (is_v2header((size_t) ret, cap)) {
		root = 0;
	} else if (is_v3header((size_t) ret, cap)) {
		nscap = (struct vfs_ns_cap_data *) tmpbuf;
		root = le32_to_cpu(nscap->rootid);
	} else {
		size = -EINVAL;
		goto out_free;
	}

	kroot = make_kuid(fs_ns, root);

	/* If the root kuid maps to a valid uid in current ns, then return
	 * this as a nscap. */
	mappedroot = from_kuid(current_user_ns(), kroot);
	if (mappedroot != (uid_t)-1 && mappedroot != (uid_t)0) {
		size = sizeof(struct vfs_ns_cap_data);
		if (alloc) {
			if (!nscap) {
				/* v2 -> v3 conversion */
				nscap = kzalloc(size, GFP_ATOMIC);
				if (!nscap) {
					size = -ENOMEM;
					goto out_free;
				}
				nsmagic = VFS_CAP_REVISION_3;
				magic = le32_to_cpu(cap->magic_etc);
				if (magic & VFS_CAP_FLAGS_EFFECTIVE)
					nsmagic |= VFS_CAP_FLAGS_EFFECTIVE;
				memcpy(&nscap->data, &cap->data, sizeof(__le32) * 2 * VFS_CAP_U32);
				nscap->magic_etc = cpu_to_le32(nsmagic);
			} else {
				/* use allocated v3 buffer */
				tmpbuf = NULL;
			}
			nscap->rootid = cpu_to_le32(mappedroot);
			*buffer = nscap;
		}
		goto out_free;
	}

	if (!rootid_owns_currentns(kroot)) {
		size = -EOVERFLOW;
		goto out_free;
	}

	/* This comes from a parent namespace.  Return as a v2 capability */
	size = sizeof(struct vfs_cap_data);
	if (alloc) {
		if (nscap) {
			/* v3 -> v2 conversion */
			cap = kzalloc(size, GFP_ATOMIC);
			if (!cap) {
				size = -ENOMEM;
				goto out_free;
			}
			magic = VFS_CAP_REVISION_2;
			nsmagic = le32_to_cpu(nscap->magic_etc);
			if (nsmagic & VFS_CAP_FLAGS_EFFECTIVE)
				magic |= VFS_CAP_FLAGS_EFFECTIVE;
			memcpy(&cap->data, &nscap->data, sizeof(__le32) * 2 * VFS_CAP_U32);
			cap->magic_etc = cpu_to_le32(magic);
		} else {
			/* use unconverted v2 */
			tmpbuf = NULL;
		}
		*buffer = cap;
	}
out_free:
	kfree(tmpbuf);
	return size;
}

static kuid_t rootid_from_xattr(const void *value, size_t size,
				struct user_namespace *task_ns)
{
	const struct vfs_ns_cap_data *nscap = value;
	uid_t rootid = 0;

	if (size == XATTR_CAPS_SZ_3)
		rootid = le32_to_cpu(nscap->rootid);

	return make_kuid(task_ns, rootid);
}

static bool validheader(size_t size, const struct vfs_cap_data *cap)
{
	return is_v2header(size, cap) || is_v3header(size, cap);
}

/*
 * User requested a write of security.capability.  If needed, update the
 * xattr to change from v2 to v3, or to fixup the v3 rootid.
 *
 * If all is ok, we return the new size, on error return < 0.
 */
int cap_convert_nscap(struct dentry *dentry, void **ivalue, size_t size)
{
	struct vfs_ns_cap_data *nscap;
	uid_t nsrootid;
	const struct vfs_cap_data *cap = *ivalue;
	__u32 magic, nsmagic;
	struct inode *inode = d_backing_inode(dentry);
	struct user_namespace *task_ns = current_user_ns(),
		*fs_ns = inode->i_sb->s_user_ns,
		*ancestor;
	kuid_t rootid;
	size_t newsize;

	if (!*ivalue)
		return -EINVAL;
	if (!validheader(size, cap))
		return -EINVAL;
	if (!capable_wrt_inode_uidgid(inode, CAP_SETFCAP))
		return -EPERM;
	if (size == XATTR_CAPS_SZ_2)
		if (ns_capable(inode->i_sb->s_user_ns, CAP_SETFCAP))
			/* user is privileged, just write the v2 */
			return size;

	rootid = rootid_from_xattr(*ivalue, size, task_ns);
	if (!uid_valid(rootid))
		return -EINVAL;

	nsrootid = from_kuid(fs_ns, rootid);
	if (nsrootid == -1)
		return -EINVAL;

	/*
	 * Do not allow adding a v3 filesystem capability xattr
	 * if the rootid field is ambiguous.
	 */
	for (ancestor = task_ns->parent; ancestor; ancestor = ancestor->parent) {
		if (from_kuid(ancestor, rootid) == 0)
			return -EINVAL;
	}

	newsize = sizeof(struct vfs_ns_cap_data);
	nscap = kmalloc(newsize, GFP_ATOMIC);
	if (!nscap)
		return -ENOMEM;
	nscap->rootid = cpu_to_le32(nsrootid);
	nsmagic = VFS_CAP_REVISION_3;
	magic = le32_to_cpu(cap->magic_etc);
	if (magic & VFS_CAP_FLAGS_EFFECTIVE)
		nsmagic |= VFS_CAP_FLAGS_EFFECTIVE;
	nscap->magic_etc = cpu_to_le32(nsmagic);
	memcpy(&nscap->data, &cap->data, sizeof(__le32) * 2 * VFS_CAP_U32);

	kvfree(*ivalue);
	*ivalue = nscap;
	return newsize;
}

/*
 * Calculate the new process capability sets from the capability sets attached
 * to a file.
 */
static inline int bprm_caps_from_vfs_caps(struct cpu_vfs_cap_data *caps,
					  struct linux_binprm *bprm,
					  bool *effective,
					  bool *has_fcap)
{
	struct cred *new = bprm->cred;
	unsigned i;
	int ret = 0;

	if (caps->magic_etc & VFS_CAP_FLAGS_EFFECTIVE)
		*effective = true;

	if (caps->magic_etc & VFS_CAP_REVISION_MASK)
		*has_fcap = true;

	CAP_FOR_EACH_U32(i) {
		__u32 permitted = caps->permitted.cap[i];
		__u32 inheritable = caps->inheritable.cap[i];

		/*
		 * pP' = (X & fP) | (pI & fI)
		 * The addition of pA' is handled later.
		 */
		new->cap_permitted.cap[i] =
			(new->cap_bset.cap[i] & permitted) |
			(new->cap_inheritable.cap[i] & inheritable);

		if (permitted & ~new->cap_permitted.cap[i])
			/* insufficient to execute correctly */
			ret = -EPERM;
	}

	/*
	 * For legacy apps, with no internal support for recognizing they
	 * do not have enough capabilities, we return an error if they are
	 * missing some "forced" (aka file-permitted) capabilities.
	 */
	return *effective ? ret : 0;
}

/*
 * Extract the on-exec-apply capability sets for an executable file.
 */
int get_vfs_caps_from_disk(const struct dentry *dentry, struct cpu_vfs_cap_data *cpu_caps)
{
	struct inode *inode = d_backing_inode(dentry);
	__u32 magic_etc;
	unsigned tocopy, i;
	int size;
	struct vfs_ns_cap_data data, *nscaps = &data;
	struct vfs_cap_data *caps = (struct vfs_cap_data *) &data;
	kuid_t rootkuid;
	struct user_namespace *fs_ns;

	memset(cpu_caps, 0, sizeof(struct cpu_vfs_cap_data));

	if (!inode)
		return -ENODATA;

	fs_ns = inode->i_sb->s_user_ns;
	size = __vfs_getxattr((struct dentry *)dentry, inode,
			      XATTR_NAME_CAPS, &data, XATTR_CAPS_SZ,
			      XATTR_NOSECURITY);
	if (size == -ENODATA || size == -EOPNOTSUPP)
		/* no data, that's ok */
		return -ENODATA;

	if (size < 0)
		return size;

	if (size < sizeof(magic_etc))
		return -EINVAL;

	cpu_caps->magic_etc = magic_etc = le32_to_cpu(caps->magic_etc);

	rootkuid = make_kuid(fs_ns, 0);
	switch (magic_etc & VFS_CAP_REVISION_MASK) {
	case VFS_CAP_REVISION_1:
		if (size != XATTR_CAPS_SZ_1)
			return -EINVAL;
		tocopy = VFS_CAP_U32_1;
		break;
	case VFS_CAP_REVISION_2:
		if (size != XATTR_CAPS_SZ_2)
			return -EINVAL;
		tocopy = VFS_CAP_U32_2;
		break;
	case VFS_CAP_REVISION_3:
		if (size != XATTR_CAPS_SZ_3)
			return -EINVAL;
		tocopy = VFS_CAP_U32_3;
		rootkuid = make_kuid(fs_ns, le32_to_cpu(nscaps->rootid));
		break;

	default:
		return -EINVAL;
	}
	/* Limit the caps to the mounter of the filesystem
	 * or the more limited uid specified in the xattr.
	 */
	if (!rootid_owns_currentns(rootkuid))
		return -ENODATA;

	CAP_FOR_EACH_U32(i) {
		if (i >= tocopy)
			break;
		cpu_caps->permitted.cap[i] = le32_to_cpu(caps->data[i].permitted);
		cpu_caps->inheritable.cap[i] = le32_to_cpu(caps->data[i].inheritable);
	}

	cpu_caps->permitted.cap[CAP_LAST_U32] &= CAP_LAST_U32_VALID_MASK;
	cpu_caps->inheritable.cap[CAP_LAST_U32] &= CAP_LAST_U32_VALID_MASK;

	cpu_caps->rootid = rootkuid;

	return 0;
}

/*
 * Attempt to get the on-exec apply capability sets for an executable file from
 * its xattrs and, if present, apply them to the proposed credentials being
 * constructed by execve().
 */
static int get_file_caps(struct linux_binprm *bprm, struct file *file,
			 bool *effective, bool *has_fcap)
{
	int rc = 0;
	struct cpu_vfs_cap_data vcaps;

	cap_clear(bprm->cred->cap_permitted);

	if (!file_caps_enabled)
		return 0;

	if (!mnt_may_suid(file->f_path.mnt))
		return 0;

	/*
	 * This check is redundant with mnt_may_suid() but is kept to make
	 * explicit that capability bits are limited to s_user_ns and its
	 * descendants.
	 */
	if (!current_in_userns(file->f_path.mnt->mnt_sb->s_user_ns))
		return 0;

	rc = get_vfs_caps_from_disk(file->f_path.dentry, &vcaps);
	if (rc < 0) {
		if (rc == -EINVAL)
			printk(KERN_NOTICE "Invalid argument reading file caps for %s\n",
					bprm->filename);
		else if (rc == -ENODATA)
			rc = 0;
		goto out;
	}

	rc = bprm_caps_from_vfs_caps(&vcaps, bprm, effective, has_fcap);

out:
	if (rc)
		cap_clear(bprm->cred->cap_permitted);

	return rc;
}

static inline bool root_privileged(void) { return !issecure(SECURE_NOROOT); }

static inline bool __is_real(kuid_t uid, struct cred *cred)
{ return uid_eq(cred->uid, uid); }

static inline bool __is_eff(kuid_t uid, struct cred *cred)
{ return uid_eq(cred->euid, uid); }

static inline bool __is_suid(kuid_t uid, struct cred *cred)
{ return !__is_real(uid, cred) && __is_eff(uid, cred); }

/*
 * handle_privileged_root - Handle case of privileged root
 * @bprm: The execution parameters, including the proposed creds
 * @has_fcap: Are any file capabilities set?
 * @effective: Do we have effective root privilege?
 * @root_uid: This namespace's root UID WRT the initial user namespace
 *
 * Handle the case where root is privileged and hasn't been neutered by
 * SECURE_NOROOT.  If file capabilities are set, they won't be combined with
 * set UID root and nothing is changed.  If we are root, cap_permitted is
 * updated.  If we have become set UID root, the effective bit is set.
 */
static void handle_privileged_root(struct linux_binprm *bprm, bool has_fcap,
				   bool *effective, kuid_t root_uid)
{
	const struct cred *old = current_cred();
	struct cred *new = bprm->cred;

	if (!root_privileged())
		return;
	/*
	 * If the legacy file capability is set, then don't set privs
	 * for a setuid root binary run by a non-root user.  Do set it
	 * for a root user just to cause least surprise to an admin.
	 */
	if (has_fcap && __is_suid(root_uid, new)) {
		warn_setuid_and_fcaps_mixed(bprm->filename);
		return;
	}
	/*
	 * To support inheritance of root-permissions and suid-root
	 * executables under compatibility mode, we override the
	 * capability sets for the file.
	 */
	if (__is_eff(root_uid, new) || __is_real(root_uid, new)) {
		/* pP' = (cap_bset & ~0) | (pI & ~0) */
		new->cap_permitted = cap_combine(old->cap_bset,
						 old->cap_inheritable);
	}
	/*
	 * If only the real uid is 0, we do not set the effective bit.
	 */
	if (__is_eff(root_uid, new))
		*effective = true;
}

#define __cap_gained(field, target, source) \
	!cap_issubset(target->cap_##field, source->cap_##field)
#define __cap_grew(target, source, cred) \
	!cap_issubset(cred->cap_##target, cred->cap_##source)
#define __cap_full(field, cred) \
	cap_issubset(CAP_FULL_SET, cred->cap_##field)

static inline bool __is_setuid(struct cred *new, const struct cred *old)
{ return !uid_eq(new->euid, old->uid); }

static inline bool __is_setgid(struct cred *new, const struct cred *old)
{ return !gid_eq(new->egid, old->gid); }

/*
 * 1) Audit candidate if current->cap_effective is set
 *
 * We do not bother to audit if 3 things are true:
 *   1) cap_effective has all caps
 *   2) we became root *OR* were already root
 *   3) root is supposed to have all caps (SECURE_NOROOT)
 * Since this is just a normal root execing a process.
 *
 * Number 1 above might fail if you don't have a full bset, but I think
 * that is interesting information to audit.
 *
 * A number of other conditions require logging:
 * 2) something prevented setuid root getting all caps
 * 3) non-setuid root gets fcaps
 * 4) non-setuid root gets ambient
 */
static inline bool nonroot_raised_pE(struct cred *new, const struct cred *old,
				     kuid_t root, bool has_fcap)
{
	bool ret = false;

	if ((__cap_grew(effective, ambient, new) &&
	     !(__cap_full(effective, new) &&
	       (__is_eff(root, new) || __is_real(root, new)) &&
	       root_privileged())) ||
	    (root_privileged() &&
	     __is_suid(root, new) &&
	     !__cap_full(effective, new)) ||
	    (!__is_setuid(new, old) &&
	     ((has_fcap &&
	       __cap_gained(permitted, new, old)) ||
	      __cap_gained(ambient, new, old))))

		ret = true;

	return ret;
}

/**
 * cap_bprm_creds_from_file - Set up the proposed credentials for execve().
 * @bprm: The execution parameters, including the proposed creds
 * @file: The file to pull the credentials from
 *
 * Set up the proposed credentials for a new execution context being
 * constructed by execve().  The proposed creds in @bprm->cred are altered,
 * but the changes won't take effect immediately.  Returns 0 if successful,
 * -ve on error.
 */
int cap_bprm_creds_from_file(struct linux_binprm *bprm, struct file *file)
|
|
{
|
|
/* Process setpcap binaries and capabilities for uid 0 */
|
|
const struct cred *old = current_cred();
|
|
struct cred *new = bprm->cred;
|
|
bool effective = false, has_fcap = false, is_setid;
|
|
int ret;
|
|
kuid_t root_uid;
|
|
|
|
if (WARN_ON(!cap_ambient_invariant_ok(old)))
|
|
return -EPERM;
|
|
|
|
ret = get_file_caps(bprm, file, &effective, &has_fcap);
|
|
if (ret < 0)
|
|
return ret;
|
|
|
|
root_uid = make_kuid(new->user_ns, 0);
|
|
|
|
handle_privileged_root(bprm, has_fcap, &effective, root_uid);
|
|
|
|
/* if we have fs caps, clear dangerous personality flags */
|
|
if (__cap_gained(permitted, new, old))
|
|
bprm->per_clear |= PER_CLEAR_ON_SETID;
|
|
|
|
/* Don't let someone trace a set[ug]id/setpcap binary with the revised
|
|
* credentials unless they have the appropriate permit.
|
|
*
|
|
* In addition, if NO_NEW_PRIVS, then ensure we get no new privs.
|
|
*/
|
|
is_setid = __is_setuid(new, old) || __is_setgid(new, old);
|
|
|
|
if ((is_setid || __cap_gained(permitted, new, old)) &&
|
|
((bprm->unsafe & ~LSM_UNSAFE_PTRACE) ||
|
|
!ptracer_capable(current, new->user_ns))) {
|
|
/* downgrade; they get no more than they had, and maybe less */
|
|
if (!ns_capable(new->user_ns, CAP_SETUID) ||
|
|
(bprm->unsafe & LSM_UNSAFE_NO_NEW_PRIVS)) {
|
|
new->euid = new->uid;
|
|
new->egid = new->gid;
|
|
}
|
|
new->cap_permitted = cap_intersect(new->cap_permitted,
|
|
old->cap_permitted);
|
|
}
|
|
|
|
new->suid = new->fsuid = new->euid;
|
|
new->sgid = new->fsgid = new->egid;
|
|
|
|
/* File caps or setid cancels ambient. */
|
|
if (has_fcap || is_setid)
|
|
cap_clear(new->cap_ambient);
|
|
|
|
/*
|
|
* Now that we've computed pA', update pP' to give:
|
|
* pP' = (X & fP) | (pI & fI) | pA'
|
|
*/
|
|
new->cap_permitted = cap_combine(new->cap_permitted, new->cap_ambient);
|
|
|
|
/*
|
|
* Set pE' = (fE ? pP' : pA'). Because pA' is zero if fE is set,
|
|
* this is the same as pE' = (fE ? pP' : 0) | pA'.
|
|
*/
|
|
if (effective)
|
|
new->cap_effective = new->cap_permitted;
|
|
else
|
|
new->cap_effective = new->cap_ambient;
|
|
|
|
if (WARN_ON(!cap_ambient_invariant_ok(new)))
|
|
return -EPERM;
|
|
|
|
if (nonroot_raised_pE(new, old, root_uid, has_fcap)) {
|
|
ret = audit_log_bprm_fcaps(bprm, new, old);
|
|
if (ret < 0)
|
|
return ret;
|
|
}
|
|
|
|
new->securebits &= ~issecure_mask(SECURE_KEEP_CAPS);
|
|
|
|
if (WARN_ON(!cap_ambient_invariant_ok(new)))
|
|
return -EPERM;
|
|
|
|
/* Check for privilege-elevated exec. */
|
|
if (is_setid ||
|
|
(!__is_real(root_uid, new) &&
|
|
(effective ||
|
|
__cap_grew(permitted, ambient, new))))
|
|
bprm->secureexec = 1;
|
|
|
|
return 0;
|
|
}
|
|
|
|
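The set arithmetic described in the comments of cap_bprm_creds_from_file() can be sketched in userspace as plain bitmask operations. This is a simplified model under stated assumptions, not the kernel implementation: `capset_t`, `struct exec_caps` and `model_exec_caps()` are hypothetical names, each capability set is reduced to one 64-bit mask, and the ptrace downgrade and root handling paths are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model type: one bit per capability. */
typedef uint64_t capset_t;

/* Hypothetical result holder for the post-exec sets. */
struct exec_caps {
	capset_t permitted;	/* pP' */
	capset_t effective;	/* pE' */
	capset_t ambient;	/* pA' */
};

/*
 * x:        bounding set (X)
 * f_p, f_i: file permitted / inheritable sets (fP, fI)
 * p_i, p_a: process inheritable / ambient sets before exec (pI, pA)
 * has_fcap: the file carries file capabilities
 * f_e:      file effective flag (fE)
 * is_setid: the exec changes uid or gid
 */
struct exec_caps model_exec_caps(capset_t x, capset_t f_p, capset_t f_i,
				 capset_t p_i, capset_t p_a,
				 bool has_fcap, bool f_e, bool is_setid)
{
	struct exec_caps out;

	/* File caps or setid cancels ambient. */
	out.ambient = (has_fcap || is_setid) ? 0 : p_a;

	/* pP' = (X & fP) | (pI & fI) | pA' */
	out.permitted = (x & f_p) | (p_i & f_i) | out.ambient;

	/* pE' = fE ? pP' : pA' */
	out.effective = f_e ? out.permitted : out.ambient;

	return out;
}
```

In this model a file with capabilities always starts with an empty ambient set, so the effective set is either the full permitted set (fE set) or empty.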
/**
 * cap_inode_setxattr - Determine whether an xattr may be altered
 * @dentry: The inode/dentry being altered
 * @name: The name of the xattr to be changed
 * @value: The value that the xattr will be changed to
 * @size: The size of value
 * @flags: The replacement flag
 *
 * Determine whether an xattr may be altered or set on an inode, returning 0 if
 * permission is granted, -ve if denied.
 *
 * This is used to make sure security xattrs don't get updated or set by those
 * who aren't privileged to do so.
 */
int cap_inode_setxattr(struct dentry *dentry, const char *name,
		       const void *value, size_t size, int flags)
{
	struct user_namespace *user_ns = dentry->d_sb->s_user_ns;

	/* Ignore non-security xattrs */
	if (strncmp(name, XATTR_SECURITY_PREFIX,
		    XATTR_SECURITY_PREFIX_LEN) != 0)
		return 0;

	/*
	 * For XATTR_NAME_CAPS the check will be done in
	 * cap_convert_nscap(), called by setxattr()
	 */
	if (strcmp(name, XATTR_NAME_CAPS) == 0)
		return 0;

	if (!ns_capable(user_ns, CAP_SYS_ADMIN))
		return -EPERM;
	return 0;
}

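The name-based dispatch in cap_inode_setxattr() can be illustrated with a small userspace sketch. The enum and function names are hypothetical, and the two string literals are assumed to match the kernel's XATTR_SECURITY_PREFIX and XATTR_NAME_CAPS definitions; the sketch only classifies a name and performs no capability check.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical outcome labels for the three branches of the hook. */
enum model_xattr_check {
	MODEL_XATTR_ALLOW,		/* not a security.* xattr */
	MODEL_XATTR_DEFER_TO_NSCAP,	/* security.capability: checked later */
	MODEL_XATTR_NEED_SYS_ADMIN	/* any other security.* xattr */
};

/* Assumed values of XATTR_SECURITY_PREFIX and XATTR_NAME_CAPS. */
#define MODEL_SECURITY_PREFIX	"security."
#define MODEL_NAME_CAPS		"security.capability"

enum model_xattr_check model_setxattr_check(const char *name)
{
	/* Ignore non-security xattrs. */
	if (strncmp(name, MODEL_SECURITY_PREFIX,
		    strlen(MODEL_SECURITY_PREFIX)) != 0)
		return MODEL_XATTR_ALLOW;

	/* security.capability is validated elsewhere. */
	if (strcmp(name, MODEL_NAME_CAPS) == 0)
		return MODEL_XATTR_DEFER_TO_NSCAP;

	return MODEL_XATTR_NEED_SYS_ADMIN;
}
```

For example, `user.comment` falls through as allowed, while `security.selinux` lands in the branch that the hook guards with CAP_SYS_ADMIN.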
/**
 * cap_inode_removexattr - Determine whether an xattr may be removed
 * @dentry: The inode/dentry being altered
 * @name: The name of the xattr to be changed
 *
 * Determine whether an xattr may be removed from an inode, returning 0 if
 * permission is granted, -ve if denied.
 *
 * This is used to make sure security xattrs don't get removed by those who
 * aren't privileged to remove them.
 */
int cap_inode_removexattr(struct dentry *dentry, const char *name)
{
	struct user_namespace *user_ns = dentry->d_sb->s_user_ns;

	/* Ignore non-security xattrs */
	if (strncmp(name, XATTR_SECURITY_PREFIX,
		    XATTR_SECURITY_PREFIX_LEN) != 0)
		return 0;

	if (strcmp(name, XATTR_NAME_CAPS) == 0) {
		/* security.capability gets namespaced */
		struct inode *inode = d_backing_inode(dentry);
		if (!inode)
			return -EINVAL;
		if (!capable_wrt_inode_uidgid(inode, CAP_SETFCAP))
			return -EPERM;
		return 0;
	}

	if (!ns_capable(user_ns, CAP_SYS_ADMIN))
		return -EPERM;
	return 0;
}

/*
 * cap_emulate_setxuid() fixes the effective / permitted capabilities of
 * a process after a call to setuid, setreuid, or setresuid.
 *
 *  1) When set*uiding _from_ one of {r,e,s}uid == 0 _to_ all of
 *  {r,e,s}uid != 0, the permitted and effective capabilities are
 *  cleared.
 *
 *  2) When set*uiding _from_ euid == 0 _to_ euid != 0, the effective
 *  capabilities of the process are cleared.
 *
 *  3) When set*uiding _from_ euid != 0 _to_ euid == 0, the effective
 *  capabilities are set to the permitted capabilities.
 *
 *  fsuid is handled elsewhere. fsuid == 0 and {r,e,s}uid!= 0 should
 *  never happen.
 *
 *  -astor
 *
 * cevans - New behaviour, Oct '99
 * A process may, via prctl(), elect to keep its capabilities when it
 * calls setuid() and switches away from uid==0. Both permitted and
 * effective sets will be retained.
 * Without this change, it was impossible for a daemon to drop only some
 * of its privilege. The call to setuid(!=0) would drop all privileges!
 * Keeping uid 0 is not an option because uid 0 owns too many vital
 * files..
 * Thanks to Olaf Kirch and Peter Benie for spotting this.
 */
static inline void cap_emulate_setxuid(struct cred *new, const struct cred *old)
{
	kuid_t root_uid = make_kuid(old->user_ns, 0);

	if ((uid_eq(old->uid, root_uid) ||
	     uid_eq(old->euid, root_uid) ||
	     uid_eq(old->suid, root_uid)) &&
	    (!uid_eq(new->uid, root_uid) &&
	     !uid_eq(new->euid, root_uid) &&
	     !uid_eq(new->suid, root_uid))) {
		if (!issecure(SECURE_KEEP_CAPS)) {
			cap_clear(new->cap_permitted);
			cap_clear(new->cap_effective);
		}

		/*
		 * Pre-ambient programs expect setresuid to nonroot followed
		 * by exec to drop capabilities. We should make sure that
		 * this remains the case.
		 */
		cap_clear(new->cap_ambient);
	}
	if (uid_eq(old->euid, root_uid) && !uid_eq(new->euid, root_uid))
		cap_clear(new->cap_effective);
	if (!uid_eq(old->euid, root_uid) && uid_eq(new->euid, root_uid))
		new->cap_effective = new->cap_permitted;
}

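The three rules listed in the cap_emulate_setxuid() comment can be modelled in userspace as follows. This is a sketch under stated assumptions: `struct model_cred` is a hypothetical, heavily reduced stand-in for `struct cred`, uid 0 is taken as root (no user namespaces), and the SECURE_KEEP_CAPS securebit is passed in as a plain `keep_caps` flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the kernel's struct cred. */
struct model_cred {
	unsigned int uid, euid, suid;
	uint64_t permitted, effective, ambient;
};

void model_emulate_setxuid(struct model_cred *next,
			   const struct model_cred *prev, bool keep_caps)
{
	bool had_root = prev->uid == 0 || prev->euid == 0 || prev->suid == 0;
	bool all_nonroot = next->uid != 0 && next->euid != 0 && next->suid != 0;

	/* Rule 1: leaving root entirely clears permitted and effective. */
	if (had_root && all_nonroot) {
		if (!keep_caps) {
			next->permitted = 0;
			next->effective = 0;
		}
		/* Ambient is cleared regardless of keep_caps. */
		next->ambient = 0;
	}

	/* Rule 2: euid 0 -> nonzero clears the effective set. */
	if (prev->euid == 0 && next->euid != 0)
		next->effective = 0;

	/* Rule 3: euid nonzero -> 0 copies permitted into effective. */
	if (prev->euid != 0 && next->euid == 0)
		next->effective = next->permitted;
}
```

Note that in this model, as in the listing, rule 2 still empties the effective set when `keep_caps` is true; only the permitted set survives the switch away from euid 0.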
/**
 * cap_task_fix_setuid - Fix up the results of setuid() call
 * @new: The proposed credentials
 * @old: The current task's current credentials
 * @flags: Indications of what has changed
 *
 * Fix up the results of setuid() call before the credential changes are
 * actually applied, returning 0 to grant the changes, -ve to deny them.
 */
int cap_task_fix_setuid(struct cred *new, const struct cred *old, int flags)
{
	switch (flags) {
	case LSM_SETID_RE:
	case LSM_SETID_ID:
	case LSM_SETID_RES:
		/* juggle the capabilities to follow [RES]UID changes unless
		 * otherwise suppressed */
		if (!issecure(SECURE_NO_SETUID_FIXUP))
			cap_emulate_setxuid(new, old);
		break;

	case LSM_SETID_FS:
		/* juggle the capabilities to follow FSUID changes, unless
		 * otherwise suppressed
		 *
		 * FIXME - is fsuser used for all CAP_FS_MASK capabilities?
		 *          if not, we might be a bit too harsh here.
		 */
		if (!issecure(SECURE_NO_SETUID_FIXUP)) {
			kuid_t root_uid = make_kuid(old->user_ns, 0);
			if (uid_eq(old->fsuid, root_uid) && !uid_eq(new->fsuid, root_uid))
				new->cap_effective =
					cap_drop_fs_set(new->cap_effective);

			if (!uid_eq(old->fsuid, root_uid) && uid_eq(new->fsuid, root_uid))
				new->cap_effective =
					cap_raise_fs_set(new->cap_effective,
							 new->cap_permitted);
		}
		break;

	default:
		return -EINVAL;
	}

	return 0;
}

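The LSM_SETID_FS branch of cap_task_fix_setuid() drops or restores the filesystem-related capabilities as fsuid moves away from or back to root. A minimal userspace sketch of that behaviour, assuming that dropping means masking the filesystem set out of the effective set and raising means adding back whatever part of it is permitted: `MODEL_FS_SET` is a hypothetical mask chosen for illustration and does not reproduce the kernel's CAP_FS_SET bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mask standing in for the kernel's filesystem capability set. */
#define MODEL_FS_SET 0x1fULL

/* Remove every filesystem capability from the effective set. */
uint64_t model_drop_fs_set(uint64_t effective)
{
	return effective & ~MODEL_FS_SET;
}

/* Re-add the filesystem capabilities that are still permitted. */
uint64_t model_raise_fs_set(uint64_t effective, uint64_t permitted)
{
	return effective | (permitted & MODEL_FS_SET);
}

/* Returns the new effective set after an fsuid change (0 is root here). */
uint64_t model_fix_fsuid(uint64_t effective, uint64_t permitted,
			 unsigned int old_fsuid, unsigned int new_fsuid)
{
	if (old_fsuid == 0 && new_fsuid != 0)
		effective = model_drop_fs_set(effective);
	if (old_fsuid != 0 && new_fsuid == 0)
		effective = model_raise_fs_set(effective, permitted);
	return effective;
}
```

Capabilities outside the filesystem mask are untouched in both directions, which is the point of handling fsuid separately from the other uid changes.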
/*
 * Rationale: code calling task_setscheduler, task_setioprio, and
 * task_setnice, assumes that
 *   . if capable(cap_sys_nice), then those actions should be allowed
 *   . if not capable(cap_sys_nice), but acting on your own processes,
 *   	then those actions should be allowed
 * This is insufficient now since you can call code without suid, but
 * yet with increased caps.
 * So we check for increased caps on the target process.
 */
static int cap_safe_nice(struct task_struct *p)
{
	int is_subset, ret = 0;

	rcu_read_lock();
	is_subset = cap_issubset(__task_cred(p)->cap_permitted,
				 current_cred()->cap_permitted);
	if (!is_subset && !ns_capable(__task_cred(p)->user_ns, CAP_SYS_NICE))
		ret = -EPERM;
	rcu_read_unlock();

	return ret;
}

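The decision made by cap_safe_nice() reduces to a subset test on the permitted sets plus a CAP_SYS_NICE fallback. A userspace sketch of that logic, with hypothetical function names, 64-bit masks in place of `kernel_cap_t`, and the namespace-aware capability check replaced by a boolean argument:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* True when every bit set in a is also set in set. */
bool model_cap_issubset(uint64_t a, uint64_t set)
{
	return (a & ~set) == 0;
}

/*
 * Returns 0 when the caller may adjust the target's scheduling
 * parameters, -EPERM otherwise.
 */
int model_safe_nice(uint64_t target_permitted, uint64_t caller_permitted,
		    bool caller_has_sys_nice)
{
	/* A target with caps the caller lacks needs CAP_SYS_NICE. */
	if (!model_cap_issubset(target_permitted, caller_permitted) &&
	    !caller_has_sys_nice)
		return -EPERM;
	return 0;
}
```

A caller can therefore always act on a target whose permitted set is no larger than its own, and needs the extra capability only when the target holds something the caller does not.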
/**
 * cap_task_setscheduler - Determine if scheduler policy change is permitted
 * @p: The task to affect
 *
 * Determine if the requested scheduler policy change is permitted for the
 * specified task, returning 0 if permission is granted, -ve if denied.
 */
int cap_task_setscheduler(struct task_struct *p)
{
	return cap_safe_nice(p);
}

/**
 * cap_task_setioprio - Determine if I/O priority change is permitted
 * @p: The task to affect
 * @ioprio: The I/O priority to set
 *
 * Determine if the requested I/O priority change is permitted for the specified
 * task, returning 0 if permission is granted, -ve if denied.
 */
int cap_task_setioprio(struct task_struct *p, int ioprio)
{
	return cap_safe_nice(p);
}

/**
 * cap_task_setnice - Determine if task priority change is permitted
 * @p: The task to affect
 * @nice: The nice value to set
 *
 * Determine if the requested task priority change is permitted for the
 * specified task, returning 0 if permission is granted, -ve if denied.
 */
int cap_task_setnice(struct task_struct *p, int nice)
{
	return cap_safe_nice(p);
}

/*
 * Implement PR_CAPBSET_DROP. Attempt to remove the specified capability from
 * the current task's bounding set. Returns 0 on success, -ve on error.
 */
static int cap_prctl_drop(unsigned long cap)
{
	struct cred *new;

	if (!ns_capable(current_user_ns(), CAP_SETPCAP))
		return -EPERM;
	if (!cap_valid(cap))
		return -EINVAL;

	new = prepare_creds();
	if (!new)
		return -ENOMEM;
	cap_lower(new->cap_bset, cap);
	return commit_creds(new);
}

/**
 * cap_task_prctl - Implement process control functions for this security module
 * @option: The process control function requested
 * @arg2, @arg3, @arg4, @arg5: The argument data for this function
 *
 * Allow process control functions (sys_prctl()) to alter capabilities; may
 * also deny access to other functions not otherwise implemented here.
 *
 * Returns 0 or +ve on success, -ENOSYS if this function is not implemented
 * here, other -ve on error. If -ENOSYS is returned, sys_prctl() and other LSM
 * modules will consider performing the function.
 */
int cap_task_prctl(int option, unsigned long arg2, unsigned long arg3,
		   unsigned long arg4, unsigned long arg5)
{
	const struct cred *old = current_cred();
	struct cred *new;

	switch (option) {
	case PR_CAPBSET_READ:
		if (!cap_valid(arg2))
			return -EINVAL;
		return !!cap_raised(old->cap_bset, arg2);

	case PR_CAPBSET_DROP:
		return cap_prctl_drop(arg2);

	/*
	 * The next four prctl's remain to assist with transitioning a
	 * system from legacy UID=0 based privilege (when filesystem
	 * capabilities are not in use) to a system using filesystem
	 * capabilities only - as the POSIX.1e draft intended.
	 *
	 * Note:
	 *
	 *  PR_SET_SECUREBITS =
	 *      issecure_mask(SECURE_KEEP_CAPS_LOCKED)
	 *    | issecure_mask(SECURE_NOROOT)
	 *    | issecure_mask(SECURE_NOROOT_LOCKED)
	 *    | issecure_mask(SECURE_NO_SETUID_FIXUP)
	 *    | issecure_mask(SECURE_NO_SETUID_FIXUP_LOCKED)
	 *
	 * will ensure that the current process and all of its
	 * children will be locked into a pure
	 * capability-based-privilege environment.
	 */
	case PR_SET_SECUREBITS:
		if ((((old->securebits & SECURE_ALL_LOCKS) >> 1)
		     & (old->securebits ^ arg2))			/*[1]*/
		    || ((old->securebits & SECURE_ALL_LOCKS & ~arg2))	/*[2]*/
		    || (arg2 & ~(SECURE_ALL_LOCKS | SECURE_ALL_BITS))	/*[3]*/
		    || (cap_capable(current_cred(),
				    current_cred()->user_ns,
				    CAP_SETPCAP,
				    CAP_OPT_NONE) != 0)			/*[4]*/
			/*
			 * [1] no changing of bits that are locked
			 * [2] no unlocking of locks
			 * [3] no setting of unsupported bits
			 * [4] doing anything requires privilege (go read about
			 *     the "sendmail capabilities bug")
			 */
		    )
			/* cannot change a locked bit */
			return -EPERM;

		new = prepare_creds();
		if (!new)
			return -ENOMEM;
		new->securebits = arg2;
		return commit_creds(new);

	case PR_GET_SECUREBITS:
		return old->securebits;

	case PR_GET_KEEPCAPS:
		return !!issecure(SECURE_KEEP_CAPS);

	case PR_SET_KEEPCAPS:
		if (arg2 > 1) /* Note, we rely on arg2 being unsigned here */
			return -EINVAL;
		if (issecure(SECURE_KEEP_CAPS_LOCKED))
			return -EPERM;

		new = prepare_creds();
		if (!new)
			return -ENOMEM;
		if (arg2)
			new->securebits |= issecure_mask(SECURE_KEEP_CAPS);
		else
			new->securebits &= ~issecure_mask(SECURE_KEEP_CAPS);
		return commit_creds(new);

	case PR_CAP_AMBIENT:
		if (arg2 == PR_CAP_AMBIENT_CLEAR_ALL) {
			if (arg3 | arg4 | arg5)
				return -EINVAL;

			new = prepare_creds();
			if (!new)
				return -ENOMEM;
			cap_clear(new->cap_ambient);
			return commit_creds(new);
		}

		if (((!cap_valid(arg3)) | arg4 | arg5))
			return -EINVAL;

		if (arg2 == PR_CAP_AMBIENT_IS_SET) {
			return !!cap_raised(current_cred()->cap_ambient, arg3);
		} else if (arg2 != PR_CAP_AMBIENT_RAISE &&
			   arg2 != PR_CAP_AMBIENT_LOWER) {
			return -EINVAL;
		} else {
			if (arg2 == PR_CAP_AMBIENT_RAISE &&
			    (!cap_raised(current_cred()->cap_permitted, arg3) ||
			     !cap_raised(current_cred()->cap_inheritable,
					 arg3) ||
			     issecure(SECURE_NO_CAP_AMBIENT_RAISE)))
				return -EPERM;

			new = prepare_creds();
			if (!new)
				return -ENOMEM;
			if (arg2 == PR_CAP_AMBIENT_RAISE)
				cap_raise(new->cap_ambient, arg3);
			else
				cap_lower(new->cap_ambient, arg3);
			return commit_creds(new);
		}

	default:
		/* No functionality available - continue with default */
		return -ENOSYS;
	}
}

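Conditions [1] to [3] of the PR_SET_SECUREBITS check can be exercised in isolation with a small userspace model. The sketch assumes the usual securebits layout, where each flag occupies an even bit and its lock the odd bit directly above it, so the flag mask is taken as 0x55 and the lock mask as that value shifted left by one. The `MODEL_` names are hypothetical, and the privilege check [4] is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed layout: flags on bits 0, 2, 4, 6; locks on bits 1, 3, 5, 7. */
#define MODEL_SECURE_ALL_BITS	0x55UL
#define MODEL_SECURE_ALL_LOCKS	(MODEL_SECURE_ALL_BITS << 1)

/* Returns true when replacing old_bits with req passes checks [1]-[3]. */
bool model_securebits_change_ok(unsigned long old_bits, unsigned long req)
{
	/* [1] no changing of bits that are locked */
	if (((old_bits & MODEL_SECURE_ALL_LOCKS) >> 1) & (old_bits ^ req))
		return false;

	/* [2] no unlocking of locks */
	if (old_bits & MODEL_SECURE_ALL_LOCKS & ~req)
		return false;

	/* [3] no setting of unsupported bits */
	if (req & ~(MODEL_SECURE_ALL_LOCKS | MODEL_SECURE_ALL_BITS))
		return false;

	return true;
}
```

Shifting the held locks right by one lines each lock up with the flag it protects, so the XOR in [1] only has to report whether any protected flag differs between the old and requested values.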
/**
 * cap_vm_enough_memory - Determine whether a new virtual mapping is permitted
 * @mm: The VM space in which the new mapping is to be made
 * @pages: The size of the mapping
 *
 * Determine whether the allocation of a new virtual mapping by the current
 * task is permitted, returning 1 if permission is granted, 0 if not.
 */
int cap_vm_enough_memory(struct mm_struct *mm, long pages)
{
	int cap_sys_admin = 0;

	if (cap_capable(current_cred(), &init_user_ns,
			CAP_SYS_ADMIN, CAP_OPT_NOAUDIT) == 0)
		cap_sys_admin = 1;

	return cap_sys_admin;
}

/*
 * cap_mmap_addr - check if able to map given addr
 * @addr: address attempting to be mapped
 *
 * If the process is attempting to map memory below dac_mmap_min_addr they need
 * CAP_SYS_RAWIO. The other parameters to this function are unused by the
 * capability security module. Returns 0 if this mapping should be allowed,
 * -EPERM if not.
 */
int cap_mmap_addr(unsigned long addr)
{
	int ret = 0;

	if (addr < dac_mmap_min_addr) {
		ret = cap_capable(current_cred(), &init_user_ns, CAP_SYS_RAWIO,
				  CAP_OPT_NONE);
		/* set PF_SUPERPRIV if it turns out we allow the low mmap */
		if (ret == 0)
			current->flags |= PF_SUPERPRIV;
	}
	return ret;
}

int cap_mmap_file(struct file *file, unsigned long reqprot,
		  unsigned long prot, unsigned long flags)
{
	return 0;
}

#ifdef CONFIG_SECURITY

static struct security_hook_list capability_hooks[] __lsm_ro_after_init = {
	LSM_HOOK_INIT(capable, cap_capable),
	LSM_HOOK_INIT(settime, cap_settime),
	LSM_HOOK_INIT(ptrace_access_check, cap_ptrace_access_check),
	LSM_HOOK_INIT(ptrace_traceme, cap_ptrace_traceme),
	LSM_HOOK_INIT(capget, cap_capget),
	LSM_HOOK_INIT(capset, cap_capset),
	LSM_HOOK_INIT(bprm_creds_from_file, cap_bprm_creds_from_file),
	LSM_HOOK_INIT(inode_need_killpriv, cap_inode_need_killpriv),
	LSM_HOOK_INIT(inode_killpriv, cap_inode_killpriv),
	LSM_HOOK_INIT(inode_getsecurity, cap_inode_getsecurity),
	LSM_HOOK_INIT(mmap_addr, cap_mmap_addr),
	LSM_HOOK_INIT(mmap_file, cap_mmap_file),
	LSM_HOOK_INIT(task_fix_setuid, cap_task_fix_setuid),
	LSM_HOOK_INIT(task_prctl, cap_task_prctl),
	LSM_HOOK_INIT(task_setscheduler, cap_task_setscheduler),
	LSM_HOOK_INIT(task_setioprio, cap_task_setioprio),
	LSM_HOOK_INIT(task_setnice, cap_task_setnice),
	LSM_HOOK_INIT(vm_enough_memory, cap_vm_enough_memory),
};

static int __init capability_init(void)
{
	security_add_hooks(capability_hooks, ARRAY_SIZE(capability_hooks),
				"capability");
	return 0;
}

DEFINE_LSM(capability) = {
	.name = "capability",
	.order = LSM_ORDER_FIRST,
	.init = capability_init,
};

#endif /* CONFIG_SECURITY */