Commit Graph

33 Commits

Author SHA1 Message Date
Greg Kroah-Hartman
49716e996a Merge 5.4.49 into android-5.4
Changes in 5.4.49
	power: supply: bq24257_charger: Replace depends on REGMAP_I2C with select
	clk: sunxi: Fix incorrect usage of round_down()
	ASoC: tegra: tegra_wm8903: Support nvidia, headset property
	i2c: piix4: Detect secondary SMBus controller on AMD AM4 chipsets
	ASoC: SOF: imx8: Fix randbuild error
	iio: pressure: bmp280: Tolerate IRQ before registering
	remoteproc: Fix IDR initialisation in rproc_alloc()
	clk: qcom: msm8916: Fix the address location of pll->config_reg
	ASoC: fsl_esai: Disable exception interrupt before scheduling tasklet
	backlight: lp855x: Ensure regulators are disabled on probe failure
	ARM: dts: renesas: Fix IOMMU device node names
	ASoC: davinci-mcasp: Fix dma_chan refcnt leak when getting dma type
	ARM: integrator: Add some Kconfig selections
	ARM: dts: stm32: Add missing ethernet PHY reset on AV96
	scsi: core: free sgtables in case command setup fails
	scsi: qedi: Check for buffer overflow in qedi_set_path()
	arm64: dts: meson: fixup SCP sram nodes
	ALSA: hda/realtek - Introduce polarity for micmute LED GPIO
	ALSA: isa/wavefront: prevent out of bounds write in ioctl
	PCI: Allow pci_resize_resource() for devices on root bus
	scsi: qla2xxx: Fix issue with adapter's stopping state
	Input: edt-ft5x06 - fix get_default register write access
	powerpc/kasan: Fix stack overflow by increasing THREAD_SHIFT
	rtc: mc13xxx: fix a double-unlock issue
	iio: bmp280: fix compensation of humidity
	f2fs: report delalloc reserve as non-free in statfs for project quota
	i2c: pxa: clear all master action bits in i2c_pxa_stop_message()
	remoteproc: qcom_q6v5_mss: map/unmap mpss segments before/after use
	clk: samsung: Mark top ISP and CAM clocks on Exynos542x as critical
	usblp: poison URBs upon disconnect
	serial: 8250: Fix max baud limit in generic 8250 port
	misc: fastrpc: Fix an incomplete memory release in fastrpc_rpmsg_probe()
	misc: fastrpc: fix potential fastrpc_invoke_ctx leak
	dm mpath: switch paths in dm_blk_ioctl() code path
	arm64: dts: armada-3720-turris-mox: forbid SDR104 on SDIO for FCC purposes
	arm64: dts: armada-3720-turris-mox: fix SFP binding
	arm64: dts: juno: Fix GIC child nodes
	pinctrl: ocelot: Fix GPIO interrupt decoding on Jaguar2
	clk: renesas: cpg-mssr: Fix STBCR suspend/resume handling
	ASoC: SOF: Do nothing when DSP PM callbacks are not set
	arm64: dts: fvp: Fix GIC child nodes
	PCI: aardvark: Don't blindly enable ASPM L0s and don't write to read-only register
	ps3disk: use the default segment boundary
	arm64: dts: fvp/juno: Fix node address fields
	vfio/pci: fix memory leaks in alloc_perm_bits()
	coresight: tmc: Fix TMC mode read in tmc_read_prepare_etb()
	RDMA/mlx5: Add init2init as a modify command
	scsi: hisi_sas: Do not reset phy timer to wait for stray phy up
	PCI: pci-bridge-emul: Fix PCIe bit conflicts
	m68k/PCI: Fix a memory leak in an error handling path
	gpio: dwapb: Call acpi_gpiochip_free_interrupts() on GPIO chip de-registration
	usb: gadget: core: sync interrupt before unbind the udc
	powerpc/ptdump: Add _PAGE_COHERENT flag
	mfd: wm8994: Fix driver operation if loaded as modules
	scsi: cxgb3i: Fix some leaks in init_act_open()
	clk: zynqmp: fix memory leak in zynqmp_register_clocks
	scsi: lpfc: Fix lpfc_nodelist leak when processing unsolicited event
	scsi: vhost: Notify TCM about the maximum sg entries supported per command
	clk: clk-flexgen: fix clock-critical handling
	IB/mlx5: Fix DEVX support for MLX5_CMD_OP_INIT2INIT_QP command
	powerpc/perf/hv-24x7: Fix inconsistent output values incase multiple hv-24x7 events run
	nfsd: Fix svc_xprt refcnt leak when setup callback client failed
	PCI: vmd: Filter resource type bits from shadow register
	RDMA/core: Fix several reference count leaks.
	cifs: set up next DFS target before generic_ip_connect()
	ASoC: qcom: q6asm-dai: kCFI fix
	powerpc/crashkernel: Take "mem=" option into account
	pwm: img: Call pm_runtime_put() in pm_runtime_get_sync() failed case
	sparc32: mm: Don't try to free page-table pages if ctor() fails
	yam: fix possible memory leak in yam_init_driver
	NTB: ntb_pingpong: Choose doorbells based on port number
	NTB: Fix the default port and peer numbers for legacy drivers
	mksysmap: Fix the mismatch of '.L' symbols in System.map
	apparmor: fix introspection of of task mode for unconfined tasks
	net: dsa: lantiq_gswip: fix and improve the unsupported interface error
	apparmor: check/put label on apparmor_sk_clone_security()
	f2fs: handle readonly filesystem in f2fs_ioc_shutdown()
	ASoC: meson: add missing free_irq() in error path
	bpf, sockhash: Fix memory leak when unlinking sockets in sock_hash_free
	scsi: sr: Fix sr_probe() missing deallocate of device minor
	scsi: ibmvscsi: Don't send host info in adapter info MAD after LPM
	apparmor: fix nnp subset test for unconfined
	x86/purgatory: Disable various profiling and sanitizing options
	staging: greybus: fix a missing-check bug in gb_lights_light_config()
	arm64: dts: mt8173: fix unit name warnings
	scsi: qedi: Do not flush offload work if ARP not resolved
	arm64: dts: qcom: msm8916: remove unit name for thermal trip points
	ARM: dts: sun8i-h2-plus-bananapi-m2-zero: Fix led polarity
	RDMA/mlx5: Fix udata response upon SRQ creation
	gpio: dwapb: Append MODULE_ALIAS for platform driver
	scsi: qedf: Fix crash when MFW calls for protocol stats while function is still probing
	pinctrl: rza1: Fix wrong array assignment of rza1l_swio_entries
	virtiofs: schedule blocking async replies in separate worker
	arm64: dts: qcom: fix pm8150 gpio interrupts
	firmware: qcom_scm: fix bogous abuse of dma-direct internals
	staging: gasket: Fix mapping refcnt leak when put attribute fails
	staging: gasket: Fix mapping refcnt leak when register/store fails
	ALSA: usb-audio: Improve frames size computation
	ALSA: usb-audio: Fix racy list management in output queue
	s390/qdio: put thinint indicator after early error
	tty: hvc: Fix data abort due to race in hvc_open
	slimbus: ngd: get drvdata from correct device
	clk: meson: meson8b: Fix the first parent of vid_pll_in_sel
	clk: meson: meson8b: Fix the polarity of the RESET_N lines
	clk: meson: meson8b: Fix the vclk_div{1, 2, 4, 6, 12}_en gate bits
	gpio: pca953x: fix handling of automatic address incrementing
	thermal/drivers/ti-soc-thermal: Avoid dereferencing ERR_PTR
	clk: meson: meson8b: Don't rely on u-boot to init all GP_PLL registers
	ASoC: max98373: reorder max98373_reset() in resume
	soundwire: slave: don't init debugfs on device registration error
	HID: intel-ish-hid: avoid bogus uninitialized-variable warning
	usb: dwc3: gadget: Properly handle ClearFeature(halt)
	usb: dwc3: gadget: Properly handle failed kick_transfer
	staging: wilc1000: Increase the size of wid_list array
	staging: sm750fb: add missing case while setting FB_VISUAL
	PCI: v3-semi: Fix a memory leak in v3_pci_probe() error handling paths
	i2c: pxa: fix i2c_pxa_scream_blue_murder() debug output
	serial: amba-pl011: Make sure we initialize the port.lock spinlock
	drivers: base: Fix NULL pointer exception in __platform_driver_probe() if a driver developer is foolish
	PCI: rcar: Fix incorrect programming of OB windows
	PCI/ASPM: Allow ASPM on links to PCIe-to-PCI/PCI-X Bridges
	scsi: qla2xxx: Fix warning after FC target reset
	ALSA: firewire-lib: fix invalid assignment to union data for directional parameter
	power: supply: lp8788: Fix an error handling path in 'lp8788_charger_probe()'
	power: supply: smb347-charger: IRQSTAT_D is volatile
	ASoC: SOF: core: fix error return code in sof_probe_continue()
	arm64: dts: msm8996: Fix CSI IRQ types
	scsi: target: loopback: Fix READ with data and sensebytes
	scsi: mpt3sas: Fix double free warnings
	SoC: rsnd: add interrupt support for SSI BUSIF buffer
	ASoC: ux500: mop500: Fix some refcounted resources issues
	ASoC: ti: omap-mcbsp: Fix an error handling path in 'asoc_mcbsp_probe()'
	pinctrl: rockchip: fix memleak in rockchip_dt_node_to_map
	dlm: remove BUG() before panic()
	USB: ohci-sm501: fix error return code in ohci_hcd_sm501_drv_probe()
	clk: ti: composite: fix memory leak
	PCI: Fix pci_register_host_bridge() device_register() error handling
	powerpc/64: Don't initialise init_task->thread.regs
	tty: n_gsm: Fix SOF skipping
	tty: n_gsm: Fix waking up upper tty layer when room available
	ALSA: usb-audio: Add duplex sound support for USB devices using implicit feedback
	HID: Add quirks for Trust Panora Graphic Tablet
	PCI/PM: Assume ports without DLL Link Active train links in 100 ms
	habanalabs: increase timeout during reset
	ipmi: use vzalloc instead of kmalloc for user creation
	powerpc/64s/exception: Fix machine check no-loss idle wakeup
	powerpc/pseries/ras: Fix FWNMI_VALID off by one
	drivers: phy: sr-usb: do not use internal fsm for USB2 phy init
	powerpc/ps3: Fix kexec shutdown hang
	vfio-pci: Mask cap zero
	usb/ohci-platform: Fix a warning when hibernating
	drm/msm/mdp5: Fix mdp5_init error path for failed mdp5_kms allocation
	ASoC: Intel: bytcr_rt5640: Add quirk for Toshiba Encore WT8-A tablet
	USB: host: ehci-mxc: Add error handling in ehci_mxc_drv_probe()
	tty: n_gsm: Fix bogus i++ in gsm_data_kick
	fpga: dfl: afu: Corrected error handling levels
	clk: samsung: exynos5433: Add IGNORE_UNUSED flag to sclk_i2s1
	RDMA/hns: Bugfix for querying qkey
	RDMA/hns: Fix cmdq parameter of querying pf timer resource
	scsi: target: tcmu: Userspace must not complete queued commands
	firmware: imx: scu: Fix possible memory leak in imx_scu_probe()
	fuse: fix copy_file_range cache issues
	fuse: copy_file_range should truncate cache
	arm64: tegra: Fix ethernet phy-mode for Jetson Xavier
	arm64: tegra: Fix flag for 64-bit resources in 'ranges' property
	powerpc/64s/pgtable: fix an undefined behaviour
	dm zoned: return NULL if dmz_get_zone_for_reclaim() fails to find a zone
	PCI/PTM: Inherit Switch Downstream Port PTM settings from Upstream Port
	PCI: dwc: Fix inner MSI IRQ domain registration
	PCI: amlogic: meson: Don't use FAST_LINK_MODE to set up link
	IB/cma: Fix ports memory leak in cma_configfs
	watchdog: da9062: No need to ping manually before setting timeout
	usb: dwc2: gadget: move gadget resume after the core is in L0 state
	USB: gadget: udc: s3c2410_udc: Remove pointless NULL check in s3c2410_udc_nuke
	usb: gadget: lpc32xx_udc: don't dereference ep pointer before null check
	usb: gadget: fix potential double-free in m66592_probe.
	usb: gadget: Fix issue with config_ep_by_speed function
	scripts: headers_install: Exit with error on config leak
	RDMA/iw_cxgb4: cleanup device debugfs entries on ULD remove
	x86/apic: Make TSC deadline timer detection message visible
	mfd: stmfx: Reset chip on resume as supply was disabled
	mfd: stmfx: Fix stmfx_irq_init error path
	mfd: stmfx: Disable IRQ in suspend to avoid spurious interrupt
	powerpc/32s: Don't warn when mapping RO data ROX.
	ASoC: fix incomplete error-handling in img_i2s_in_probe.
	scsi: target: tcmu: Fix a use after free in tcmu_check_expired_queue_cmd()
	clk: bcm2835: Fix return type of bcm2835_register_gate
	scsi: ufs-qcom: Fix scheduling while atomic issue
	KVM: PPC: Book3S HV: Ignore kmemleak false positives
	KVM: PPC: Book3S: Fix some RCU-list locks
	clk: sprd: return correct type of value for _sprd_pll_recalc_rate
	clk: ast2600: Fix AHB clock divider for A1
	misc: xilinx-sdfec: improve get_user_pages_fast() error handling
	/dev/mem: Revoke mappings when a driver claims the region
	net: sunrpc: Fix off-by-one issues in 'rpc_ntop6'
	NFSv4.1 fix rpc_call_done assignment for BIND_CONN_TO_SESSION
	of: Fix a refcounting bug in __of_attach_node_sysfs()
	input: i8042 - Remove special PowerPC handling
	powerpc/4xx: Don't unmap NULL mbase
	extcon: adc-jack: Fix an error handling path in 'adc_jack_probe()'
	ASoC: fsl_asrc_dma: Fix dma_chan leak when config DMA channel failed
	vfio/mdev: Fix reference count leak in add_mdev_supported_type
	rtc: rv3028: Add missed check for devm_regmap_init_i2c()
	mailbox: zynqmp-ipi: Fix NULL vs IS_ERR() check in zynqmp_ipi_mbox_probe()
	rxrpc: Adjust /proc/net/rxrpc/calls to display call->debug_id not user_ID
	openrisc: Fix issue with argument clobbering for clone/fork
	drm/nouveau/disp/gm200-: fix NV_PDISP_SOR_HDMI2_CTRL(n) selection
	ceph: don't return -ESTALE if there's still an open file
	nfsd4: make drc_slab global, not per-net
	gfs2: Allow lock_nolock mount to specify jid=X
	scsi: iscsi: Fix reference count leak in iscsi_boot_create_kobj
	scsi: ufs: Don't update urgent bkops level when toggling auto bkops
	pinctrl: imxl: Fix an error handling path in 'imx1_pinctrl_core_probe()'
	pinctrl: freescale: imx: Fix an error handling path in 'imx_pinctrl_probe()'
	nfsd: safer handling of corrupted c_type
	drm/amd/display: Revalidate bandwidth before commiting DC updates
	crypto: omap-sham - add proper load balancing support for multicore
	geneve: change from tx_error to tx_dropped on missing metadata
	lib/zlib: remove outdated and incorrect pre-increment optimization
	include/linux/bitops.h: avoid clang shift-count-overflow warnings
	selftests/vm/pkeys: fix alloc_random_pkey() to make it really random
	blktrace: use errno instead of bi_status
	blktrace: fix endianness in get_pdu_int()
	blktrace: fix endianness for blk_log_remap()
	gfs2: fix use-after-free on transaction ail lists
	net: marvell: Fix OF_MDIO config check
	ntb_perf: pass correct struct device to dma_alloc_coherent
	ntb_tool: pass correct struct device to dma_alloc_coherent
	NTB: ntb_tool: reading the link file should not end in a NULL byte
	NTB: Revert the change to use the NTB device dev for DMA allocations
	NTB: perf: Don't require one more memory window than number of peers
	NTB: perf: Fix support for hardware that doesn't have port numbers
	NTB: perf: Fix race condition when run with ntb_test
	NTB: ntb_test: Fix bug when counting remote files
	i2c: icy: Fix build with CONFIG_AMIGA_PCMCIA=n
	drivers/perf: hisi: Fix wrong value for all counters enable
	selftests/net: in timestamping, strncpy needs to preserve null byte
	f2fs: don't return vmalloc() memory from f2fs_kmalloc()
	afs: Fix memory leak in afs_put_sysnames()
	ASoC: core: only convert non DPCM link to DPCM link
	ASoC: SOF: nocodec: conditionally set dpcm_capture/dpcm_playback flags
	ASoC: Intel: bytcr_rt5640: Add quirk for Toshiba Encore WT10-A tablet
	ASoC: rt5645: Add platform-data for Asus T101HA
	bpf/sockmap: Fix kernel panic at __tcp_bpf_recvmsg
	bpf, sockhash: Synchronize delete from bucket list on map free
	tracing/probe: Fix bpf_task_fd_query() for kprobes and uprobes
	drm/sun4i: hdmi ddc clk: Fix size of m divider
	libbpf: Handle GCC noreturn-turned-volatile quirk
	scsi: acornscsi: Fix an error handling path in acornscsi_probe()
	x86/idt: Keep spurious entries unset in system_vectors
	net/filter: Permit reading NET in load_bytes_relative when MAC not set
	nvme-pci: use simple suspend when a HMB is enabled
	nfs: set invalid blocks after NFSv4 writes
	xdp: Fix xsk_generic_xmit errno
	iavf: fix speed reporting over virtchnl
	bpf: Fix memlock accounting for sock_hash
	usb/xhci-plat: Set PM runtime as active on resume
	usb: host: ehci-platform: add a quirk to avoid stuck
	usb/ehci-platform: Set PM runtime as active on resume
	perf report: Fix NULL pointer dereference in hists__fprintf_nr_sample_events()
	perf stat: Fix NULL pointer dereference
	ext4: stop overwrite the errcode in ext4_setup_super
	bcache: fix potential deadlock problem in btree_gc_coalesce
	powerpc: Fix kernel crash in show_instructions() w/DEBUG_VIRTUAL
	afs: Fix non-setting of mtime when writing into mmap
	afs: afs_write_end() should change i_size under the right lock
	afs: Fix EOF corruption
	afs: Always include dir in bulk status fetch from afs_do_lookup()
	afs: Set error flag rather than return error from file status decode
	afs: Fix the mapping of the UAEOVERFLOW abort code
	bnxt_en: Return from timer if interface is not in open state.
	scsi: ufs-bsg: Fix runtime PM imbalance on error
	block: Fix use-after-free in blkdev_get()
	mvpp2: remove module bugfix
	arm64: hw_breakpoint: Don't invoke overflow handler on uaccess watchpoints
	libata: Use per port sync for detach
	drm: encoder_slave: fix refcouting error for modules
	ext4: fix partial cluster initialization when splitting extent
	ext4: avoid utf8_strncasecmp() with unstable name
	drm/dp_mst: Reformat drm_dp_check_act_status() a bit
	drm/qxl: Use correct notify port address when creating cursor ring
	drm/amdgpu: Replace invalid device ID with a valid device ID
	selinux: fix double free
	jbd2: clean __jbd2_journal_abort_hard() and __journal_abort_soft()
	ext4: avoid race conditions when remounting with options that change dax
	drm/dp_mst: Increase ACT retry timeout to 3s
	drm/amd/display: Use swap() where appropriate
	x86/boot/compressed: Relax sed symbol type regex for LLVM ld.lld
	block: nr_sects_write(): Disable preemption on seqcount write
	net/mlx5: DR, Fix freeing in dr_create_rc_qp()
	f2fs: split f2fs_d_compare() from f2fs_match_name()
	f2fs: avoid utf8_strncasecmp() with unstable name
	s390: fix syscall_get_error for compat processes
	drm/i915: Fix AUX power domain toggling across TypeC mode resets
	drm/msm: Check for powered down HW in the devfreq callbacks
	drm/i915/gem: Avoid iterating an empty list
	drm/i915: Whitelist context-local timestamp in the gen9 cmdparser
	drm/connector: notify userspace on hotplug after register complete
	drm/amd/display: Use kvfree() to free coeff in build_regamma()
	drm/i915/icl+: Fix hotplug interrupt disabling after storm detection
	Revert "drm/amd/display: disable dcn20 abm feature for bring up"
	crypto: algif_skcipher - Cap recv SG list at ctx->used
	crypto: algboss - don't wait during notifier callback
	tracing/probe: Fix memleak in fetch_op_data operations
	kprobes: Fix to protect kick_kprobe_optimizer() by kprobe_mutex
	kretprobe: Prevent triggering kretprobe from within kprobe_flush_task
	e1000e: Do not wake up the system via WOL if device wakeup is disabled
	net: octeon: mgmt: Repair filling of RX ring
	pwm: jz4740: Enhance precision in calculation of duty cycle
	sched/rt, net: Use CONFIG_PREEMPTION.patch
	net: core: device_rename: Use rwsem instead of a seqcount
	Linux 5.4.49

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I1c2b90800677a958e02061bc77b2fe413882e42e
2020-06-25 13:08:15 +02:00
Dan Williams
ece3a3337c /dev/mem: Revoke mappings when a driver claims the region
[ Upstream commit 3234ac664a ]

Close the hole of holding a mapping over kernel driver takeover event of
a given address range.

Commit 90a545e981 ("restrict /dev/mem to idle io memory ranges")
introduced CONFIG_IO_STRICT_DEVMEM with the goal of protecting the
kernel against scenarios where a /dev/mem user tramples memory that a
kernel driver owns. However, this protection only prevents *new* read(),
write() and mmap() requests. Established mappings prior to the driver
calling request_mem_region() are left alone.

Especially with persistent memory, and the core kernel metadata that is
stored there, there are plentiful scenarios for a /dev/mem user to
violate the expectations of the driver and cause amplified damage.

Teach request_mem_region() to find and shoot down active /dev/mem
mappings that it believes it has successfully claimed for the exclusive
use of the driver. Effectively a driver call to request_mem_region()
becomes a hole-punch on the /dev/mem device.

The typical usage of unmap_mapping_range() is part of
truncate_pagecache() to punch a hole in a file, but in this case the
implementation is only doing the "first half" of a hole punch. Namely it
is just evacuating current established mappings of the "hole", and it
relies on the fact that /dev/mem establishes mappings in terms of
absolute physical address offsets. Once existing mmap users are
invalidated they can attempt to re-establish the mapping, or attempt to
continue issuing read(2) / write(2) to the invalidated extent, but they
will then be subject to the CONFIG_IO_STRICT_DEVMEM checking that can
block those subsequent accesses.

Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fixes: 90a545e981 ("restrict /dev/mem to idle io memory ranges")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/159009507306.847224.8502634072429766747.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-24 17:50:35 +02:00
Greg Kroah-Hartman
bfa0399bc8 Merge Linus's 5.4-rc1-prerelease branch into android-mainline
This merges Linus's tree as of commit b41dae061b ("Merge tag
'xfs-5.4-merge-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux")
into android-mainline.

This "early" merge makes it easier to test and handle merge conflicts
instead of having to wait until the "end" of the merge window and handle
all 10000+ commits at once.

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I6bebf55e5e2353f814e3c87f5033607b1ae5d812
2019-09-20 16:07:54 -07:00
Gao Xiang
47e4937a4a erofs: move erofs out of staging
EROFS filesystem has been merged into linux-staging for a year.

EROFS is designed to be a better solution of saving extra storage
space with guaranteed end-to-end performance for read-only files
with the help of reduced metadata, fixed-sized output compression
and decompression inplace technologies.

In the past year, EROFS was greatly improved by many people as
a staging driver, self-tested, betaed by a large number of our
internal users, successfully applied to almost all in-service
HUAWEI smartphones as the part of EMUI 9.1 and proven to be stable
enough to be moved out of staging.

EROFS is a self-contained filesystem driver. Although there are
still some TODOs to be more generic, we have a dedicated team
actively keeping on working on EROFS in order to make it better
with the evolution of Linux kernel as the other in-kernel filesystems.

As Pavel suggested, it's better to do as one commit since git
can do moves and all histories will be saved in this way.

Let's promote it from staging and enhance it more actively as
a "real" part of kernel for more wider scenarios!

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Pavel Machek <pavel@denx.de>
Cc: David Sterba <dsterba@suse.cz>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Darrick J . Wong <darrick.wong@oracle.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Richard Weinberger <richard@nod.at>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Chao Yu <yuchao0@huawei.com>
Cc: Miao Xie <miaoxie@huawei.com>
Cc: Li Guifu <bluce.liguifu@huawei.com>
Cc: Fang Wei <fangwei1@huawei.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Link: https://lore.kernel.org/r/20190822213659.5501-1-hsiangkao@aol.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-24 14:20:10 +02:00
Greg Kroah-Hartman
37766c2946 Merge 5.3.0-rc1 into android-mainline
Linus 5.3-rc1 release

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ic171e37d4c21ffa495240c5538852bbb5a9dcce8
2019-07-23 16:21:59 -07:00
Daniel Rosenberg
d104edcb4f ANDROID: sdcardfs: Define magic value
Define a filesystem magic value to be used for sdcardfs.

Test: HiKey/X15 + Pie + android-mainline,
      and HiKey + AOSP Maser + android-mainline,
      directories under /sdcard created,
      output of mount is right,
      CTS test collecting device infor works

Change-Id: I54daa1452aa6a3ce0401d7d923e0a897f0c26d96
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Signed-off-by: Yongqin Liu <yongqin.liu@linaro.org>
2019-07-19 12:39:18 -07:00
Linus Torvalds
933a90bf4f Merge branch 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs mount updates from Al Viro:
 "The first part of mount updates.

  Convert filesystems to use the new mount API"

* 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
  mnt_init(): call shmem_init() unconditionally
  constify ksys_mount() string arguments
  don't bother with registering rootfs
  init_rootfs(): don't bother with init_ramfs_fs()
  vfs: Convert smackfs to use the new mount API
  vfs: Convert selinuxfs to use the new mount API
  vfs: Convert securityfs to use the new mount API
  vfs: Convert apparmorfs to use the new mount API
  vfs: Convert openpromfs to use the new mount API
  vfs: Convert xenfs to use the new mount API
  vfs: Convert gadgetfs to use the new mount API
  vfs: Convert oprofilefs to use the new mount API
  vfs: Convert ibmasmfs to use the new mount API
  vfs: Convert qib_fs/ipathfs to use the new mount API
  vfs: Convert efivarfs to use the new mount API
  vfs: Convert configfs to use the new mount API
  vfs: Convert binfmt_misc to use the new mount API
  convenience helper: get_tree_single()
  convenience helper get_tree_nodev()
  vfs: Kill sget_userns()
  ...
2019-07-19 10:42:02 -07:00
Greg Hackmann
ed63bb1d1f dma-buf: give each buffer a full-fledged inode
By traversing /proc/*/fd and /proc/*/map_files, processes with CAP_ADMIN
can get a lot of fine-grained data about how shmem buffers are shared
among processes.  stat(2) on each entry gives the caller a unique
ID (st_ino), the buffer's size (st_size), and even the number of pages
currently charged to the buffer (st_blocks / 512).

In contrast, all dma-bufs share the same anonymous inode.  So while we
can count how many dma-buf fds or mappings a process has, we can't get
the size of the backing buffers or tell if two entries point to the same
dma-buf.  On systems with debugfs, we can get a per-buffer breakdown of
size and reference count, but can't tell which processes are actually
holding the references to each buffer.

Replace the singleton inode with full-fledged inodes allocated by
alloc_anon_inode().  This involves creating and mounting a
mini-pseudo-filesystem for dma-buf, following the example in fs/aio.c.

Signed-off-by: Greg Hackmann <ghackmann@google.com>
Signed-off-by: Chenbo Feng <fengc@google.com>
Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20190613223408.139221-2-fengc@google.com
2019-06-14 15:00:50 +05:30
David Howells
ea8157ab2a zsfold: Convert zsfold to use the new mount API
Convert the zsfold filesystem to the new internal mount API as the old one
will be obsoleted and removed.  This allows greater flexibility in
communication of mount parameters between userspace, the VFS and the
filesystem.

See Documentation/filesystems/mount_api.txt for more information.

Signed-off-by: David Howells <dhowells@redhat.com>
2019-05-25 18:06:01 -04:00
Christian Brauner
3ad20fe393 binder: implement binderfs
As discussed at Linux Plumbers Conference 2018 in Vancouver [1] this is the
implementation of binderfs.

/* Abstract */
binderfs is a backwards-compatible filesystem for Android's binder ipc
mechanism. Each ipc namespace will mount a new binderfs instance. Mounting
binderfs multiple times at different locations in the same ipc namespace
will not cause a new super block to be allocated and hence it will be the
same filesystem instance.
Each new binderfs mount will have its own set of binder devices only
visible in the ipc namespace it has been mounted in. All devices in a new
binderfs mount will follow the scheme binder%d and numbering will always
start at 0.

/* Backwards compatibility */
Devices requested in the Kconfig via CONFIG_ANDROID_BINDER_DEVICES for the
initial ipc namespace will work as before. They will be registered via
misc_register() and appear in the devtmpfs mount. Specifically, the
standard devices binder, hwbinder, and vndbinder will all appear in their
standard locations in /dev. Mounting or unmounting the binderfs mount in
the initial ipc namespace will have no effect on these devices, i.e. they
will neither show up in the binderfs mount nor will they disappear when the
binderfs mount is gone.

/* binder-control */
Each new binderfs instance comes with a binder-control device. No other
devices will be present at first. The binder-control device can be used to
dynamically allocate binder devices. All requests operate on the binderfs
mount the binder-control device resides in.
Assuming a new instance of binderfs has been mounted at /dev/binderfs
via mount -t binderfs binderfs /dev/binderfs. Then a request to create a
new binder device can be made as illustrated in [2].
Binderfs devices can simply be removed via unlink().

/* Implementation details */
- dynamic major number allocation:
  When binderfs is registered as a new filesystem it will dynamically
  allocate a new major number. The allocated major number will be returned
  in struct binderfs_device when a new binder device is allocated.
- global minor number tracking:
  Minor are tracked in a global idr struct that is capped at
  BINDERFS_MAX_MINOR. The minor number tracker is protected by a global
  mutex. This is the only point of contention between binderfs mounts.
- struct binderfs_info:
  Each binderfs super block has its own struct binderfs_info that tracks
  specific details about a binderfs instance:
  - ipc namespace
  - dentry of the binder-control device
  - root uid and root gid of the user namespace the binderfs instance
    was mounted in
- mountable by user namespace root:
  binderfs can be mounted by user namespace root in a non-initial user
  namespace. The devices will be owned by user namespace root.
- binderfs binder devices without misc infrastructure:
  New binder devices associated with a binderfs mount do not use the
  full misc_register() infrastructure.
  The misc_register() infrastructure can only create new devices in the
  host's devtmpfs mount. binderfs does however only make devices appear
  under its own mountpoint and thus allocates new character device nodes
  from the inode of the root dentry of the super block. This will have
  the side-effect that binderfs specific device nodes do not appear in
  sysfs. This behavior is similar to devpts allocated pts devices and
  has no effect on the functionality of the ipc mechanism itself.

[1]: https://goo.gl/JL2tfX
[2]: program to allocate a new binderfs binder device:

     #define _GNU_SOURCE
     #include <errno.h>
     #include <fcntl.h>
     #include <stdio.h>
     #include <stdlib.h>
     #include <string.h>
     #include <sys/ioctl.h>
     #include <sys/stat.h>
     #include <sys/types.h>
     #include <unistd.h>
     #include <linux/android/binder_ctl.h>

     int main(int argc, char *argv[])
     {
             int fd, ret, saved_errno;
             size_t len;
             struct binderfs_device device = { 0 };

             if (argc < 2)
                     exit(EXIT_FAILURE);

             len = strlen(argv[1]);
             if (len > BINDERFS_MAX_NAME)
                     exit(EXIT_FAILURE);

             memcpy(device.name, argv[1], len);

             fd = open("/dev/binderfs/binder-control", O_RDONLY | O_CLOEXEC);
             if (fd < 0) {
                     printf("%s - Failed to open binder-control device\n",
                            strerror(errno));
                     exit(EXIT_FAILURE);
             }

             ret = ioctl(fd, BINDER_CTL_ADD, &device);
             saved_errno = errno;
             close(fd);
             errno = saved_errno;
             if (ret < 0) {
                     printf("%s - Failed to allocate new binder device\n",
                            strerror(errno));
                     exit(EXIT_FAILURE);
             }

             printf("Allocated new binder device with major %d, minor %d, and "
                    "name %s\n", device.major, device.minor,
                    device.name);

             exit(EXIT_SUCCESS);
     }

Cc: Martijn Coenen <maco@android.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Acked-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-19 09:40:13 +01:00
Adam Borowski
dddde68b8f xfs: add a define for statfs magic to uapi
Needed by userspace programs that call fstatfs().

It'd be natural to publish XFS_SB_MAGIC in uapi, but while these two
have identical values, they have different semantic meaning: one is
an enum cookie meant for statfs, the other a signature of the
on-disk format.

Signed-off-by: Adam Borowski <kilobyte@angband.pl>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-18 17:20:19 +11:00
David Howells
f044c8847b afs: Lay the groundwork for supporting network namespaces
Lay the groundwork for supporting network namespaces (netns) to the AFS
filesystem by moving various global features to a network-namespace struct
(afs_net) and providing an instance of this as a temporary global variable
that everything uses via accessor functions for the moment.

The following changes have been made:

 (1) Store the netns in the superblock info.  This will be obtained from
     the mounter's nsproxy on a manual mount and inherited from the parent
     superblock on an automount.

 (2) The cell list is made per-netns.  It can be viewed through
     /proc/net/afs/cells and also be modified by writing commands to that
     file.

 (3) The local workstation cell is set per-ns in /proc/net/afs/rootcell.
     This is unset by default.

 (4) The 'rootcell' module parameter, which sets a cell and VL server list
     modifies the init net namespace, thereby allowing an AFS root fs to be
     theoretically used.

 (5) The volume location lists and the file lock manager are made
     per-netns.

 (6) The AF_RXRPC socket and associated I/O bits are made per-ns.

The various workqueues remain global for the moment.

Changes still to be made:

 (1) /proc/fs/afs/ should be moved to /proc/net/afs/ and a symlink emplaced
     from the old name.

 (2) A per-netns subsys needs to be registered for AFS into which it can
     store its per-netns data.

 (3) Rather than the AF_RXRPC socket being opened on module init, it needs
     to be opened on the creation of a superblock in that netns.

 (4) The socket needs to be closed when the last superblock using it is
     destroyed and all outstanding client calls on it have been completed.
     This prevents a reference loop on the namespace.

 (5) It is possible that several namespaces will want to use AFS, in which
     case each one will need its own UDP port.  These can either be set
     through /proc/net/afs/cm_port or the kernel can pick one at random.
     The init_ns gets 7001 by default.

Other issues that need resolving:

 (1) The DNS keyring needs net-namespacing.

 (2) Where do upcalls go (eg. DNS request-key upcall)?

 (3) Need something like open_socket_in_file_ns() syscall so that AFS
     command line tools attempting to operate on an AFS file/volume have
     their RPC calls go to the right place.

Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-13 15:38:16 +00:00
Greg Kroah-Hartman
6f52b16c5b License cleanup: add SPDX license identifier to uapi header files with no license
Many user space API headers are missing licensing information, which
makes it hard for compliance tools to determine the correct license.

By default are files without license information under the default
license of the kernel, which is GPLV2.  Marking them GPLV2 would exclude
them from being included in non GPLV2 code, which is obviously not
intended. The user space API headers fall under the syscall exception
which is in the kernels COPYING file:

   NOTE! This copyright does *not* cover user programs that use kernel
   services by normal system calls - this is merely considered normal use
   of the kernel, and does *not* fall under the heading of "derived work".

otherwise syscall usage would not be possible.

Update the files which contain no license information with an SPDX
license identifier.  The chosen identifier is 'GPL-2.0 WITH
Linux-syscall-note' which is the officially assigned identifier for the
Linux syscall exception.  SPDX license identifiers are a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.  See the previous patch in this series for the
methodology of how this patch was researched.

Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-02 11:19:54 +01:00
Fabian Frederick
62aa81d7c4 ocfs2: use magic.h
Filesystems generally use SUPER_MAGIC values from magic.h instead of a
local definition.

Link: http://lkml.kernel.org/r/20170521154217.27917-1-fabf@skynet.be
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Reviewed-by: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06 16:24:30 -07:00
John Johansen
a481f4d917 apparmor: add custom apparmorfs that will be used by policy namespace files
AppArmor policy needs to be able to be resolved based on the policy
namespace a task is confined by. Add a base apparmorfs filesystem that
(like nsfs) will exist as a kern mount and be accessed via jump_link
through a securityfs file.

Setup the base apparmorfs fns and data, but don't use it yet.

Signed-off-by: John Johansen <john.johansen@canonical.com>
Reviewed-by: Seth Arnold <seth.arnold@canonical.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
2017-06-08 12:51:51 -07:00
Fenghua Yu
5ff193fbde x86/intel_rdt: Add basic resctrl filesystem support
Use kernfs as basis for our user interface filesystem. This patch
supports mount/umount, and one mount parameter "cdp" to enable code/data
prioritization (though all we do at this point is ensure that the system
can support CDP).  The file system is not populated yet in this patch.

[ tglx: Fixed up a few nits and added cdp handling in case of error ]

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "Shaohua Li" <shli@fb.com>
Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com>
Cc: "Peter Zijlstra" <peterz@infradead.org>
Cc: "Stephane Eranian" <eranian@google.com>
Cc: "Dave Hansen" <dave.hansen@intel.com>
Cc: "David Carrillo-Cisneros" <davidcc@google.com>
Cc: "Nilay Vaish" <nilayvaish@gmail.com>
Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com>
Cc: "Ingo Molnar" <mingo@elte.hu>
Cc: "Borislav Petkov" <bp@suse.de>
Cc: "H. Peter Anvin" <h.peter.anvin@intel.com>
Link: http://lkml.kernel.org/r/1477692289-37412-4-git-send-email-fenghua.yu@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-10-30 19:10:14 -06:00
Dan Williams
3bc52c45ba dax: define a unified inode/address_space for device-dax mappings
In support of enabling resize / truncate of device-dax instances, define
a pseudo-fs to provide a unified inode/address space for vm operations.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-08-23 22:58:51 -07:00
Minchan Kim
48b4800a1c zsmalloc: page migration support
This patch introduces run-time migration feature for zspage.

For migration, VM uses page.lru field so it would be better to not use
page.next field which is unified with page.lru for own purpose.  For
that, firstly, we can get first object offset of the page via runtime
calculation instead of using page.index so we can use page.index as link
for page chaining instead of page.next.

In case of huge object, it stores handle to page.index instead of next
link of page chaining because huge object doesn't need to next link for
page chaining.  So get_next_page need to identify huge object to return
NULL.  For it, this patch uses PG_owner_priv_1 flag of the page flag.

For migration, it supports three functions

* zs_page_isolate

It isolates a zspage which includes a subpage VM want to migrate from
class so anyone cannot allocate new object from the zspage.

We could try to isolate a zspage by the number of subpage so subsequent
isolation trial of other subpage of the zpsage shouldn't fail.  For
that, we introduce zspage.isolated count.  With that, zs_page_isolate
can know whether zspage is already isolated or not for migration so if
it is isolated for migration, subsequent isolation trial can be
successful without trying further isolation.

* zs_page_migrate

First of all, it holds write-side zspage->lock to prevent migrate other
subpage in zspage.  Then, lock all objects in the page VM want to
migrate.  The reason we should lock all objects in the page is due to
race between zs_map_object and zs_page_migrate.

  zs_map_object				zs_page_migrate

  pin_tag(handle)
  obj = handle_to_obj(handle)
  obj_to_location(obj, &page, &obj_idx);

					write_lock(&zspage->lock)
					if (!trypin_tag(handle))
						goto unpin_object

  zspage = get_zspage(page);
  read_lock(&zspage->lock);

If zs_page_migrate doesn't do trypin_tag, zs_map_object's page can be
stale by migration so it goes crash.

If it locks all of objects successfully, it copies content from old page
to new one, finally, create new zspage chain with new page.  And if it's
last isolated subpage in the zspage, put the zspage back to class.

* zs_page_putback

It returns isolated zspage to right fullness_group list if it fails to
migrate a page.  If it find a zspage is ZS_EMPTY, it queues zspage
freeing to workqueue.  See below about async zspage freeing.

This patch introduces asynchronous zspage free.  The reason to need it
is we need page_lock to clear PG_movable but unfortunately, zs_free path
should be atomic so the apporach is try to grab page_lock.  If it got
page_lock of all of pages successfully, it can free zspage immediately.
Otherwise, it queues free request and free zspage via workqueue in
process context.

If zs_free finds the zspage is isolated when it try to free zspage, it
delays the freeing until zs_page_putback finds it so it will free free
the zspage finally.

In this patch, we expand fullness_list from ZS_EMPTY to ZS_FULL.  First
of all, it will use ZS_EMPTY list for delay freeing.  And with adding
ZS_FULL list, it makes to identify whether zspage is isolated or not via
list_empty(&zspage->list) test.

[minchan@kernel.org: zsmalloc: keep first object offset in struct page]
  Link: http://lkml.kernel.org/r/1465788015-23195-1-git-send-email-minchan@kernel.org
[minchan@kernel.org: zsmalloc: zspage sanity check]
  Link: http://lkml.kernel.org/r/20160603010129.GC3304@bbox
Link: http://lkml.kernel.org/r/1464736881-24886-12-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-26 16:19:19 -07:00
Minchan Kim
b1123ea6d3 mm: balloon: use general non-lru movable page feature
Now, VM has a feature to migrate non-lru movable pages so balloon
doesn't need custom migration hooks in migrate.c and compaction.c.

Instead, this patch implements the page->mapping->a_ops->
{isolate|migrate|putback} functions.

With that, we could remove hooks for ballooning in general migration
functions and make balloon compaction simple.

[akpm@linux-foundation.org: compaction.h requires that the includer first include node.h]
Link: http://lkml.kernel.org/r/1464736881-24886-4-git-send-email-minchan@kernel.org
Signed-off-by: Gioh Kim <gi-oh.kim@profitbricks.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Rafael Aquini <aquini@redhat.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-26 16:19:19 -07:00
Jan Kara
2a28900be2 udf: Export superblock magic to userspace
Currently UDF superblock magic doesn't appear in any userspace header
files and thus userspace apps have hard time checking for this fs. Let's
export the magic to userspace as with any other filesystem.

Signed-off-by: Jan Kara <jack@suse.cz>
2016-04-28 10:43:31 +02:00
Linus Torvalds
e9f57ebcba Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
Pull overlayfs updates from Miklos Szeredi:
 "This contains several bug fixes and a new mount option
  'default_permissions' that allows read-only exported NFS
  filesystems to be used as lower layer"

* 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
  ovl: check dentry positiveness in ovl_cleanup_whiteouts()
  ovl: setattr: check permissions before copy-up
  ovl: root: copy attr
  ovl: move super block magic number to magic.h
  ovl: use a minimal buffer in ovl_copy_xattr
  ovl: allow zero size xattr
  ovl: default permissions
2016-01-21 12:20:46 -08:00
Tejun Heo
67e9c74b8a cgroup: replace __DEVEL__sane_behavior with cgroup2 fs type
With major controllers - cpu, memory and io - shaping up for the
unified hierarchy, cgroup2 is about ready to be, gradually, released
into the wild.  Replace __DEVEL__sane_behavior flag which was used to
select the unified hierarchy with a separate filesystem type "cgroup2"
so that unified hierarchy can be mounted as follows.

  mount -t cgroup2 none $MOUNT_POINT

The cgroup2 fs has its own magic number - 0x63677270 ("cgrp").

v2: Assign a different magic number to cgroup2 fs.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
2015-11-16 11:13:34 -05:00
Stephen Hemminger
257f871993 ovl: move super block magic number to magic.h
The overlayfs file system is not recognized by programs
like tail because the magic number is not in standard header location.

Move it so that the value will propagate on for the GNU library
and utilities. Needs to go in the fstatfs manual page as well.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
2015-11-10 17:13:35 +01:00
Daniel Borkmann
b2197755b2 bpf: add support for persistent maps/progs
This work adds support for "persistent" eBPF maps/programs. The term
"persistent" is to be understood that maps/programs have a facility
that lets them survive process termination. This is desired by various
eBPF subsystem users.

Just to name one example: tc classifier/action. Whenever tc parses
the ELF object, extracts and loads maps/progs into the kernel, these
file descriptors will be out of reach after the tc instance exits.
So a subsequent tc invocation won't be able to access/relocate on this
resource, and therefore maps cannot easily be shared, f.e. between the
ingress and egress networking data path.

The current workaround is that Unix domain sockets (UDS) need to be
instrumented in order to pass the created eBPF map/program file
descriptors to a third party management daemon through UDS' socket
passing facility. This makes it a bit complicated to deploy shared
eBPF maps or programs (programs f.e. for tail calls) among various
processes.

We've been brainstorming on how we could tackle this issue and various
approches have been tried out so far, which can be read up further in
the below reference.

The architecture we eventually ended up with is a minimal file system
that can hold map/prog objects. The file system is a per mount namespace
singleton, and the default mount point is /sys/fs/bpf/. Any subsequent
mounts within a given namespace will point to the same instance. The
file system allows for creating a user-defined directory structure.
The objects for maps/progs are created/fetched through bpf(2) with
two new commands (BPF_OBJ_PIN/BPF_OBJ_GET). I.e. a bpf file descriptor
along with a pathname is being passed to bpf(2) that in turn creates
(we call it eBPF object pinning) the file system nodes. Only the pathname
is being passed to bpf(2) for getting a new BPF file descriptor to an
existing node. The user can use that to access maps and progs later on,
through bpf(2). Removal of file system nodes is being managed through
normal VFS functions such as unlink(2), etc. The file system code is
kept to a very minimum and can be further extended later on.

The next step I'm working on is to add dump eBPF map/prog commands
to bpf(2), so that a specification from a given file descriptor can
be retrieved. This can be used by things like CRIU but also applications
can inspect the meta data after calling BPF_OBJ_GET.

Big thanks also to Alexei and Hannes who significantly contributed
in the design discussion that eventually let us end up with this
architecture here.

Reference: https://lkml.org/lkml/2015/10/15/925
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-02 22:48:39 -05:00
Steven Rostedt (Red Hat)
4282d60689 tracefs: Add new tracefs file system
Add a separate file system to handle the tracing directory. Currently it
is part of debugfs, but that is starting to show its limits.

One thing is that in order to access the tracing infrastructure, you need
to mount debugfs. As that includes debugging from all sorts of sub systems
in the kernel, it is not considered advisable to mount such an all
encompassing debugging system.

Having the tracing system in its own file systems gives access to the
tracing sub system without needing to include all other systems.

Another problem with tracing using the debugfs system is that the
instances use mkdir to create sub buffers. debugfs does not support mkdir
from userspace so to implement it, special hacks were used. By controlling
the file system that the tracing infrastructure uses, this can be properly
done without hacks.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-02-03 12:48:40 -05:00
Al Viro
e149ed2b80 take the targets of /proc/*/ns/* symlinks to separate fs
New pseudo-filesystem: nsfs.  Targets of /proc/*/ns/* live there now.
It's not mountable (not even registered, so it's not in /proc/filesystems,
etc.).  Files on it *are* bindable - we explicitly permit that in do_loopback().

This stuff lives in fs/nsfs.c now; proc_ns_fget() moved there as well.
get_proc_ns() is a macro now (it's simply returning ->i_private; would
have been an inline, if not for header ordering headache).
proc_ns_inode() is an ex-parrot.  The interface used in procfs is
ns_get_path(path, task, ops) and ns_get_name(buf, size, task, ops).

Dentries and inodes are never hashed; a non-counting reference to dentry
is stashed in ns_common (removed by ->d_prune()) and reused by ns_get_path()
if present.  See ns_get_path()/ns_prune_dentry/nsfs_evict() for details
of that mechanism.

As the result, proc_ns_follow_link() has stopped poking in nd->path.mnt;
it does nd_jump_link() on a consistent <vfsmount,dentry> pair it gets
from ns_get_path().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-12-10 21:30:20 -05:00
Josef Bacik
294e30fee3 Btrfs: add tests for find_lock_delalloc_range
So both Liu and I made huge messes of find_lock_delalloc_range trying to fix
stuff, me first by fixing extent size, then him by fixing something I broke and
then me again telling him to fix it a different way.  So this is obviously a
candidate for some testing.  This patch adds a pseudo fs so we can allocate fake
inodes for tests that need an inode or pages.  Then it addes a bunch of tests to
make sure find_lock_delalloc_range is acting the way it is supposed to.  With
this patch and all of our previous patches to find_lock_delalloc_range I am sure
it is working as expected now.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-11-11 21:56:51 -05:00
James Hogan
2b3b9bb03a hostfs: move HOSTFS_SUPER_MAGIC to <linux/magic.h>
Move HOSTFS_SUPER_MAGIC to <linux/magic.h> to be with it's magical
friends from other file systems.

Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-05-04 15:48:44 -04:00
Jarkko Sakkinen
cee7e44334 smack: SMACK_MAGIC to include/uapi/linux/magic.h
SMACK_MAGIC moved to a proper place for easy user space access
(i.e. libsmack).

Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@iki.fi>
2013-03-19 14:16:15 -07:00
Linus Torvalds
a13eea6bd9 Merge tag 'for-3.8-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull new F2FS filesystem from Jaegeuk Kim:
 "Introduce a new file system, Flash-Friendly File System (F2FS), to
  Linux 3.8.

  Highlights:
   - Add initial f2fs source codes
   - Fix an endian conversion bug
   - Fix build failures on random configs
   - Fix the power-off-recovery routine
   - Minor cleanup, coding style, and typos patches"

From the Kconfig help text:

  F2FS is based on Log-structured File System (LFS), which supports
  versatile "flash-friendly" features. The design has been focused on
  addressing the fundamental issues in LFS, which are snowball effect
  of wandering tree and high cleaning overhead.

  Since flash-based storages show different characteristics according to
  the internal geometry or flash memory management schemes aka FTL, F2FS
  and tools support various parameters not only for configuring on-disk
  layout, but also for selecting allocation and cleaning algorithms.

and there's an article by Neil Brown about it on lwn.net:

  http://lwn.net/Articles/518988/

* tag 'for-3.8-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (36 commits)
  f2fs: fix tracking parent inode number
  f2fs: cleanup the f2fs_bio_alloc routine
  f2fs: introduce accessor to retrieve number of dentry slots
  f2fs: remove redundant call to f2fs_put_page in delete entry
  f2fs: make use of GFP_F2FS_ZERO for setting gfp_mask
  f2fs: rewrite f2fs_bio_alloc to make it simpler
  f2fs: fix a typo in f2fs documentation
  f2fs: remove unused variable
  f2fs: move error condition for mkdir at proper place
  f2fs: remove unneeded initialization
  f2fs: check read only condition before beginning write out
  f2fs: remove unneeded memset from init_once
  f2fs: show error in case of invalid mount arguments
  f2fs: fix the compiler warning for uninitialized use of variable
  f2fs: resolve build failures
  f2fs: adjust kernel coding style
  f2fs: fix endian conversion bugs reported by sparse
  f2fs: remove unneeded version.h header file from f2fs.h
  f2fs: update the f2fs document
  f2fs: update Kconfig and Makefile
  ...
2012-12-20 13:54:52 -08:00
Jaegeuk Kim
39a53e0ce0 f2fs: add superblock and major in-memory structure
This adds the following major in-memory structures in f2fs.

- f2fs_sb_info:
  contains f2fs-specific information, two special inode pointers for node and
  meta address spaces, and orphan inode management.

- f2fs_inode_info:
  contains vfs_inode and other fs-specific information.

- f2fs_nm_info:
  contains node manager information such as NAT entry cache, free nid list,
  and NAT page management.

- f2fs_node_info:
  represents a node as node id, inode number, block address, and its version.

- f2fs_sm_info:
  contains segment manager information such as SIT entry cache, free segment
  map, current active logs, dirty segment management, and segment utilization.
  The specific structures are sit_info, free_segmap_info, dirty_seglist_info,
  curseg_info.

In addition, add F2FS_SUPER_MAGIC in magic.h.

Signed-off-by: Chul Lee <chur.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-12-11 13:43:40 +09:00
Matt Fleming
91716322d8 efivarfs: Add unique magic number
Using pstore's superblock magic number is no doubt going to cause
problems in the future. Give efivarfs its own magic number.

Acked-by: Jeremy Kerr <jeremy.kerr@canonical.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2012-10-30 10:39:27 +00:00
David Howells
607ca46e97 UAPI: (Scripted) Disintegrate include/linux
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Michael Kerrisk <mtk.manpages@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Dave Jones <davej@redhat.com>
2012-10-13 10:46:48 +01:00