Changes in 5.15.79
thunderbolt: Tear down existing tunnels when resuming from hibernate
thunderbolt: Add DP OUT resource when DP tunnel is discovered
fuse: fix readdir cache race
drm/amdkfd: avoid recursive lock in migrations back to RAM
drm/amdkfd: handle CPU fault on COW mapping
drm/amdkfd: Fix NULL pointer dereference in svm_migrate_to_ram()
hwspinlock: qcom: correct MMIO max register for newer SoCs
phy: stm32: fix an error code in probe
wifi: cfg80211: silence a sparse RCU warning
wifi: cfg80211: fix memory leak in query_regdb_file()
soundwire: qcom: reinit broadcast completion
soundwire: qcom: check for outanding writes before doing a read
bpf, verifier: Fix memory leak in array reallocation for stack state
bpf, sockmap: Fix the sk->sk_forward_alloc warning of sk_stream_kill_queues
wifi: mac80211: Set TWT Information Frame Disabled bit as 1
bpftool: Fix NULL pointer dereference when pin {PROG, MAP, LINK} without FILE
HID: hyperv: fix possible memory leak in mousevsc_probe()
bpf, sockmap: Fix sk->sk_forward_alloc warn_on in sk_stream_kill_queues
bpf: Fix sockmap calling sleepable function in teardown path
bpf, sock_map: Move cancel_work_sync() out of sock lock
bpf: Add helper macro bpf_for_each_reg_in_vstate
bpf: Fix wrong reg type conversion in release_reference()
net: gso: fix panic on frag_list with mixed head alloc types
macsec: delete new rxsc when offload fails
macsec: fix secy->n_rx_sc accounting
macsec: fix detection of RXSCs when toggling offloading
macsec: clear encryption keys from the stack after setting up offload
octeontx2-pf: Use hardware register for CQE count
octeontx2-pf: NIX TX overwrites SQ_CTX_HW_S[SQ_INT]
net: tun: Fix memory leaks of napi_get_frags
bnxt_en: Fix possible crash in bnxt_hwrm_set_coal()
bnxt_en: fix potentially incorrect return value for ndo_rx_flow_steer
net: fman: Unregister ethernet device on removal
capabilities: fix undefined behavior in bit shift for CAP_TO_MASK
phy: ralink: mt7621-pci: add sentinel to quirks table
KVM: s390: pv: don't allow userspace to set the clock under PV
net: lapbether: fix issue of dev reference count leakage in lapbeth_device_event()
hamradio: fix issue of dev reference count leakage in bpq_device_event()
net: wwan: iosm: fix memory leak in ipc_wwan_dellink
net: wwan: mhi: fix memory leak in mhi_mbim_dellink
drm/vc4: Fix missing platform_unregister_drivers() call in vc4_drm_register()
tcp: prohibit TCP_REPAIR_OPTIONS if data was already sent
ipv6: addrlabel: fix infoleak when sending struct ifaddrlblmsg to network
can: af_can: fix NULL pointer dereference in can_rx_register()
net: stmmac: dwmac-meson8b: fix meson8b_devm_clk_prepare_enable()
net: broadcom: Fix BCMGENET Kconfig
tipc: fix the msg->req tlv len check in tipc_nl_compat_name_table_dump_header
dmaengine: pxa_dma: use platform_get_irq_optional
dmaengine: mv_xor_v2: Fix a resource leak in mv_xor_v2_remove()
dmaengine: ti: k3-udma-glue: fix memory leak when register device fail
net: lapbether: fix issue of invalid opcode in lapbeth_open()
drivers: net: xgene: disable napi when register irq failed in xgene_enet_open()
perf stat: Fix printing os->prefix in CSV metrics output
perf tools: Add the include/perf/ directory to .gitignore
netfilter: nfnetlink: fix potential dead lock in nfnetlink_rcv_msg()
netfilter: Cleanup nft_net->module_list from nf_tables_exit_net()
net: marvell: prestera: fix memory leak in prestera_rxtx_switch_init()
net: nixge: disable napi when enable interrupts failed in nixge_open()
net: wwan: iosm: fix memory leak in ipc_pcie_read_bios_cfg
net/mlx5: Bridge, verify LAG state when adding bond to bridge
net/mlx5: Allow async trigger completion execution on single CPU systems
net/mlx5e: E-Switch, Fix comparing termination table instance
net: cpsw: disable napi in cpsw_ndo_open()
net: cxgb3_main: disable napi when bind qsets failed in cxgb_up()
stmmac: intel: Enable 2.5Gbps for Intel AlderLake-S
stmmac: intel: Update PCH PTP clock rate from 200MHz to 204.8MHz
mctp: Fix an error handling path in mctp_init()
cxgb4vf: shut down the adapter when t4vf_update_port_info() failed in cxgb4vf_open()
stmmac: dwmac-loongson: fix missing pci_disable_msi() while module exiting
stmmac: dwmac-loongson: fix missing pci_disable_device() in loongson_dwmac_probe()
stmmac: dwmac-loongson: fix missing of_node_put() while module exiting
net: phy: mscc: macsec: clear encryption keys when freeing a flow
net: atlantic: macsec: clear encryption keys from the stack
ethernet: s2io: disable napi when start nic failed in s2io_card_up()
net: mv643xx_eth: disable napi when init rxq or txq failed in mv643xx_eth_open()
ethernet: tundra: free irq when alloc ring failed in tsi108_open()
net: macvlan: fix memory leaks of macvlan_common_newlink
riscv: process: fix kernel info leakage
riscv: vdso: fix build with llvm
riscv: fix reserved memory setup
arm64: efi: Fix handling of misaligned runtime regions and drop warning
MIPS: jump_label: Fix compat branch range check
mmc: cqhci: Provide helper for resetting both SDHCI and CQHCI
mmc: sdhci-of-arasan: Fix SDHCI_RESET_ALL for CQHCI
mmc: sdhci_am654: Fix SDHCI_RESET_ALL for CQHCI
mmc: sdhci-tegra: Fix SDHCI_RESET_ALL for CQHCI
mmc: sdhci-esdhc-imx: use the correct host caps for MMC_CAP_8_BIT_DATA
ALSA: hda/hdmi - enable runtime pm for more AMD display audio
ALSA: hda/ca0132: add quirk for EVGA Z390 DARK
ALSA: hda: fix potential memleak in 'add_widget_node'
ALSA: hda/realtek: Add Positivo C6300 model quirk
ALSA: usb-audio: Yet more regression for for the delayed card registration
ALSA: usb-audio: Add quirk entry for M-Audio Micro
ALSA: usb-audio: Add DSD support for Accuphase DAC-60
vmlinux.lds.h: Fix placement of '.data..decrypted' section
ata: libata-scsi: fix SYNCHRONIZE CACHE (16) command failure
nilfs2: fix deadlock in nilfs_count_free_blocks()
nilfs2: fix use-after-free bug of ns_writer on remount
drm/i915/dmabuf: fix sg_table handling in map_dma_buf
drm/amdgpu: disable BACO on special BEIGE_GOBY card
platform/x86: hp_wmi: Fix rfkill causing soft blocked wifi
wifi: ath11k: avoid deadlock during regulatory update in ath11k_regd_update()
btrfs: fix match incorrectly in dev_args_match_device
btrfs: selftests: fix wrong error check in btrfs_free_dummy_root()
btrfs: zoned: initialize device's zone info for seeding
mms: sdhci-esdhc-imx: Fix SDHCI_RESET_ALL for CQHCI
udf: Fix a slab-out-of-bounds write bug in udf_find_entry()
mm/damon/dbgfs: check if rm_contexts input is for a real context
mm/memremap.c: map FS_DAX device memory as decrypted
mm/shmem: use page_mapping() to detect page cache for uffd continue
can: j1939: j1939_send_one(): fix missing CAN header initialization
cert host tools: Stop complaining about deprecated OpenSSL functions
dmaengine: at_hdmac: Fix at_lli struct definition
dmaengine: at_hdmac: Don't start transactions at tx_submit level
dmaengine: at_hdmac: Start transfer for cyclic channels in issue_pending
dmaengine: at_hdmac: Fix premature completion of desc in issue_pending
dmaengine: at_hdmac: Do not call the complete callback on device_terminate_all
dmaengine: at_hdmac: Protect atchan->status with the channel lock
dmaengine: at_hdmac: Fix concurrency problems by removing atc_complete_all()
dmaengine: at_hdmac: Fix concurrency over descriptor
dmaengine: at_hdmac: Free the memset buf without holding the chan lock
dmaengine: at_hdmac: Fix concurrency over the active list
dmaengine: at_hdmac: Fix descriptor handling when issuing it to hardware
dmaengine: at_hdmac: Fix completion of unissued descriptor in case of errors
dmaengine: at_hdmac: Don't allow CPU to reorder channel enable
dmaengine: at_hdmac: Fix impossible condition
dmaengine: at_hdmac: Check return code of dma_async_device_register
marvell: octeontx2: build error: unknown type name 'u64'
drm/amdkfd: Migrate in CPU page fault use current mm
net: tun: call napi_schedule_prep() to ensure we own a napi
x86/cpu: Restore AMD's DE_CFG MSR after resume
Linux 5.15.79
Change-Id: I395d5b480d2abd70e94c3505a4bd2ad728424fb3
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 8cccf05fe857a18ee26e20d11a8455a73ffd4efd upstream.
If a nilfs2 filesystem is downgraded to read-only due to metadata
corruption on disk and is remounted read/write, or if emergency read-only
remount is performed, detaching a log writer and synchronizing the
filesystem can be done at the same time.
In these cases, use-after-free of the log writer (hereinafter
nilfs->ns_writer) can happen as shown in the scenario below:
Task1 Task2
-------------------------------- ------------------------------
nilfs_construct_segment
nilfs_segctor_sync
init_wait
init_waitqueue_entry
add_wait_queue
schedule
nilfs_remount (R/W remount case)
nilfs_attach_log_writer
nilfs_detach_log_writer
nilfs_segctor_destroy
kfree
finish_wait
_raw_spin_lock_irqsave
__raw_spin_lock_irqsave
do_raw_spin_lock
debug_spin_lock_before <-- use-after-free
While Task1 is sleeping, nilfs->ns_writer is freed by Task2. After Task1
waked up, Task1 accesses nilfs->ns_writer which is already freed. This
scenario diagram is based on the Shigeru Yoshida's post [1].
This patch fixes the issue by not detaching nilfs->ns_writer on remount so
that this UAF race doesn't happen. Along with this change, this patch
also inserts a few necessary read-only checks with superblock instance
where only the ns_writer pointer was used to check if the filesystem is
read-only.
Link: https://syzkaller.appspot.com/bug?id=79a4c002e960419ca173d55e863bd09e8112df8b
Link: https://lkml.kernel.org/r/20221103141759.1836312-1-syoshida@redhat.com [1]
Link: https://lkml.kernel.org/r/20221104142959.28296-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+f816fa82f8783f7a02bb@syzkaller.appspotmail.com
Reported-by: Shigeru Yoshida <syoshida@redhat.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Changes in 5.15.42
usb: gadget: fix race when gadget driver register via ioctl
io_uring: arm poll for non-nowait files
floppy: use a statically allocated error counter
kernel/resource: Introduce request_mem_region_muxed()
i2c: piix4: Replace hardcoded memory map size with a #define
i2c: piix4: Move port I/O region request/release code into functions
i2c: piix4: Move SMBus controller base address detect into function
i2c: piix4: Move SMBus port selection into function
i2c: piix4: Add EFCH MMIO support to region request and release
i2c: piix4: Add EFCH MMIO support to SMBus base address detect
i2c: piix4: Add EFCH MMIO support for SMBus port select
i2c: piix4: Enable EFCH MMIO for Family 17h+
Watchdog: sp5100_tco: Move timer initialization into function
Watchdog: sp5100_tco: Refactor MMIO base address initialization
Watchdog: sp5100_tco: Add initialization using EFCH MMIO
Watchdog: sp5100_tco: Enable Family 17h+ CPUs
mm/kfence: reset PG_slab and memcg_data before freeing __kfence_pool
Revert "drm/i915/opregion: check port number bounds for SWSCI display power state"
rtc: fix use-after-free on device removal
rtc: pcf2127: fix bug when reading alarm registers
um: Cleanup syscall_handler_t definition/cast, fix warning
Input: add bounds checking to input_set_capability()
Input: stmfts - fix reference leak in stmfts_input_open
nvme-pci: add quirks for Samsung X5 SSDs
gfs2: Disable page faults during lockless buffered reads
rtc: sun6i: Fix time overflow handling
crypto: stm32 - fix reference leak in stm32_crc_remove
crypto: x86/chacha20 - Avoid spurious jumps to other functions
ALSA: hda/realtek: Enable headset mic on Lenovo P360
s390/traps: improve panic message for translation-specification exception
s390/pci: improve zpci_dev reference counting
vhost_vdpa: don't setup irq offloading when irq_num < 0
tools/virtio: compile with -pthread
nvmet: use a private workqueue instead of the system workqueue
nvme-multipath: fix hang when disk goes live over reconnect
rtc: mc146818-lib: Fix the AltCentury for AMD platforms
fs: fix an infinite loop in iomap_fiemap
MIPS: lantiq: check the return value of kzalloc()
drbd: remove usage of list iterator variable after loop
platform/chrome: cros_ec_debugfs: detach log reader wq from devm
ARM: 9191/1: arm/stacktrace, kasan: Silence KASAN warnings in unwind_frame()
nilfs2: fix lockdep warnings in page operations for btree nodes
nilfs2: fix lockdep warnings during disk space reclamation
ALSA: usb-audio: Restore Rane SL-1 quirk
ALSA: wavefront: Proper check of get_user() error
ALSA: hda/realtek: Add quirk for TongFang devices with pop noise
perf: Fix sys_perf_event_open() race against self
selinux: fix bad cleanup on error in hashtab_duplicate()
Fix double fget() in vhost_net_set_backend()
PCI/PM: Avoid putting Elo i2 PCIe Ports in D3cold
Revert "can: m_can: pci: use custom bit timings for Elkhart Lake"
KVM: x86/mmu: Update number of zapped pages even if page list is stable
arm64: paravirt: Use RCU read locks to guard stolen_time
arm64: mte: Ensure the cleared tags are visible before setting the PTE
crypto: qcom-rng - fix infinite loop on requests not multiple of WORD_SZ
libceph: fix potential use-after-free on linger ping and resends
drm/amd: Don't reset dGPUs if the system is going to s2idle
drm/i915/dmc: Add MMIO range restrictions
drm/dp/mst: fix a possible memory leak in fetch_monitor_name()
dma-buf: fix use of DMA_BUF_SET_NAME_{A,B} in userspace
dma-buf: ensure unique directory name for dmabuf stats
ARM: dts: aspeed-g6: remove FWQSPID group in pinctrl dtsi
pinctrl: pinctrl-aspeed-g6: remove FWQSPID group in pinctrl
ARM: dts: aspeed-g6: fix SPI1/SPI2 quad pin group
ARM: dts: aspeed: Add ADC for AST2600 and enable for Rainier and Everest
ARM: dts: aspeed: Add secure boot controller node
ARM: dts: aspeed: Add video engine to g6
pinctrl: mediatek: mt8365: fix IES control pins
ALSA: hda - fix unused Realtek function when PM is not enabled
net: ipa: record proper RX transaction count
net: macb: Increment rx bd head after allocating skb and buffer
xfrm: rework default policy structure
xfrm: fix "disable_policy" flag use when arriving from different devices
net/sched: act_pedit: sanitize shift argument before usage
netfilter: flowtable: fix excessive hw offload attempts after failure
netfilter: nft_flow_offload: skip dst neigh lookup for ppp devices
net: fix dev_fill_forward_path with pppoe + bridge
netfilter: nft_flow_offload: fix offload with pppoe + vlan
Revert "PCI: aardvark: Rewrite IRQ code to chained IRQ handler"
net: systemport: Fix an error handling path in bcm_sysport_probe()
net: vmxnet3: fix possible use-after-free bugs in vmxnet3_rq_alloc_rx_buf()
net: vmxnet3: fix possible NULL pointer dereference in vmxnet3_rq_cleanup()
ice: fix crash when writing timestamp on RX rings
ice: fix possible under reporting of ethtool Tx and Rx statistics
ice: move ice_container_type onto ice_ring_container
ice: Fix interrupt moderation settings getting cleared
clk: at91: generated: consider range when calculating best rate
net/qla3xxx: Fix a test in ql_reset_work()
NFC: nci: fix sleep in atomic context bugs caused by nci_skb_alloc
net/mlx5: DR, Fix missing flow_source when creating multi-destination FW table
net/mlx5e: Properly block LRO when XDP is enabled
net: af_key: add check for pfkey_broadcast in function pfkey_process
ARM: 9196/1: spectre-bhb: enable for Cortex-A15
ARM: 9197/1: spectre-bhb: fix loop8 sequence for Thumb2
mptcp: change the parameter of __mptcp_make_csum
mptcp: reuse __mptcp_make_csum in validate_data_csum
mptcp: fix checksum byte order
igb: skip phy status check where unavailable
netfilter: flowtable: fix TCP flow teardown
netfilter: flowtable: pass flowtable to nf_flow_table_iterate()
netfilter: flowtable: move dst_check to packet path
net: bridge: Clear offload_fwd_mark when passing frame up bridge interface.
riscv: dts: sifive: fu540-c000: align dma node name with dtschema
scsi: ufs: core: Fix referencing invalid rsp field
perf build: Fix check for btf__load_from_kernel_by_id() in libbpf
gpio: gpio-vf610: do not touch other bits when set the target bit
gpio: mvebu/pwm: Refuse requests with inverted polarity
perf regs x86: Fix arch__intr_reg_mask() for the hybrid platform
perf bench numa: Address compiler error on s390
scsi: scsi_dh_alua: Properly handle the ALUA transitioning state
scsi: qla2xxx: Fix missed DMA unmap for aborted commands
mac80211: fix rx reordering with non explicit / psmp ack policy
nl80211: validate S1G channel width
selftests: add ping test with ping_group_range tuned
Revert "fbdev: Make fb_release() return -ENODEV if fbdev was unregistered"
fbdev: Prevent possible use-after-free in fb_release()
net: fix wrong network header length
nl80211: fix locking in nl80211_set_tx_bitrate_mask()
ethernet: tulip: fix missing pci_disable_device() on error in tulip_init_one()
net: stmmac: fix missing pci_disable_device() on error in stmmac_pci_probe()
net: atlantic: fix "frag[0] not initialized"
net: atlantic: reduce scope of is_rsc_complete
net: atlantic: add check for MAX_SKB_FRAGS
net: atlantic: verify hw_head_ lies within TX buffer ring
arm64: Enable repeat tlbi workaround on KRYO4XX gold CPUs
Input: ili210x - fix reset timing
dt-bindings: pinctrl: aspeed-g6: remove FWQSPID group
mt76: mt7921e: fix possible probe failure after reboot
lockdown: also lock down previous kgdb use
i2c: mt7621: fix missing clk_disable_unprepare() on error in mtk_i2c_probe()
afs: Fix afs_getattr() to refetch file status if callback break occurred
Linux 5.15.42
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Id86177ba790bc19748595e22e6b9d3f95d7f00f6
[ Upstream commit e897be17a441fa637cd166fc3de1445131e57692 ]
Patch series "nilfs2 lockdep warning fixes".
The first two are to resolve the lockdep warning issue, and the last one
is the accompanying cleanup and low priority.
Based on your comment, this series solves the issue by separating inode
object as needed. Since I was worried about the impact of the object
composition changes, I tested the series carefully not to cause
regressions especially for delicate functions such like disk space
reclamation and snapshots.
This patch (of 3):
If CONFIG_LOCKDEP is enabled, nilfs2 hits lockdep warnings at
inode_to_wb() during page/folio operations for btree nodes:
WARNING: CPU: 0 PID: 6575 at include/linux/backing-dev.h:269 inode_to_wb include/linux/backing-dev.h:269 [inline]
WARNING: CPU: 0 PID: 6575 at include/linux/backing-dev.h:269 folio_account_dirtied mm/page-writeback.c:2460 [inline]
WARNING: CPU: 0 PID: 6575 at include/linux/backing-dev.h:269 __folio_mark_dirty+0xa7c/0xe30 mm/page-writeback.c:2509
Modules linked in:
...
RIP: 0010:inode_to_wb include/linux/backing-dev.h:269 [inline]
RIP: 0010:folio_account_dirtied mm/page-writeback.c:2460 [inline]
RIP: 0010:__folio_mark_dirty+0xa7c/0xe30 mm/page-writeback.c:2509
...
Call Trace:
__set_page_dirty include/linux/pagemap.h:834 [inline]
mark_buffer_dirty+0x4e6/0x650 fs/buffer.c:1145
nilfs_btree_propagate_p fs/nilfs2/btree.c:1889 [inline]
nilfs_btree_propagate+0x4ae/0xea0 fs/nilfs2/btree.c:2085
nilfs_bmap_propagate+0x73/0x170 fs/nilfs2/bmap.c:337
nilfs_collect_dat_data+0x45/0xd0 fs/nilfs2/segment.c:625
nilfs_segctor_apply_buffers+0x14a/0x470 fs/nilfs2/segment.c:1009
nilfs_segctor_scan_file+0x47a/0x700 fs/nilfs2/segment.c:1048
nilfs_segctor_collect_blocks fs/nilfs2/segment.c:1224 [inline]
nilfs_segctor_collect fs/nilfs2/segment.c:1494 [inline]
nilfs_segctor_do_construct+0x14f3/0x6c60 fs/nilfs2/segment.c:2036
nilfs_segctor_construct+0x7a7/0xb30 fs/nilfs2/segment.c:2372
nilfs_segctor_thread_construct fs/nilfs2/segment.c:2480 [inline]
nilfs_segctor_thread+0x3c3/0xf90 fs/nilfs2/segment.c:2563
kthread+0x405/0x4f0 kernel/kthread.c:327
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
This is because nilfs2 uses two page caches for each inode and
inode->i_mapping never points to one of them, the btree node cache.
This causes inode_to_wb(inode) to refer to a different page cache than
the caller page/folio operations such like __folio_start_writeback(),
__folio_end_writeback(), or __folio_mark_dirty() acquired the lock.
This patch resolves the issue by allocating and using an additional
inode to hold the page cache of btree nodes. The inode is attached
one-to-one to the traditional nilfs2 inode if it requires a block
mapping with b-tree. This setup change is in memory only and does not
affect the disk format.
Link: https://lkml.kernel.org/r/1647867427-30498-1-git-send-email-konishi.ryusuke@gmail.com
Link: https://lkml.kernel.org/r/1647867427-30498-2-git-send-email-konishi.ryusuke@gmail.com
Link: https://lore.kernel.org/r/YXrYvIo8YRnAOJCj@casper.infradead.org
Link: https://lore.kernel.org/r/9a20b33d-b38f-b4a2-4742-c1eb5b8e4d6c@redhat.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+0d5b462a6f07447991b3@syzkaller.appspotmail.com
Reported-by: syzbot+34ef28bb2aeb28724aa0@syzkaller.appspotmail.com
Reported-by: Hao Sun <sunhao.th@gmail.com>
Reported-by: David Hildenbrand <david@redhat.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
We have namespaces, so use them for all vfs-exported namespaces so that
filesystems can use them, but not anything else.
Some in-kernel drivers that do direct filesystem accesses (because they
serve up files) are also allowed access to these symbols to keep 'make
allmodconfig' builds working properly, but it is not needed for Android
kernel images.
Bug: 157965270
Bug: 210074446
Cc: Matthias Maennich <maennich@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Iaf6140baf3a18a516ab2d5c3966235c42f3f70de
The superblock and segment timestamps are used only internally in nilfs2
and can be read out using sysfs.
Since we are using the old 'get_seconds()' interface and store the data
as timestamps, the behavior differs slightly between 64-bit and 32-bit
kernels, the latter will show incorrect timestamps after 2038 in sysfs,
and presumably fail completely in 2106 as comparisons go wrong.
This changes nilfs2 to use time64_t with ktime_get_real_seconds() to
handle timestamps, making the behavior consistent and correct on both
32-bit and 64-bit machines.
The on-disk format already uses 64-bit timestamps, so nothing changes
there.
Link: http://lkml.kernel.org/r/20180122211050.1286441-1-arnd@arndb.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is a pure automated search-and-replace of the internal kernel
superblock flags.
The s_flags are now called SB_*, with the names and the values for the
moment mirroring the MS_* flags that they're equivalent to.
Note how the MS_xyz flags are the ones passed to the mount system call,
while the SB_xyz flags are what we then use in sb->s_flags.
The script to do this was:
# places to look in; re security/*: it generally should *not* be
# touched (that stuff parses mount(2) arguments directly), but
# there are two places where we really deal with superblock flags.
FILES="drivers/mtd drivers/staging/lustre fs ipc mm \
include/linux/fs.h include/uapi/linux/bfs_fs.h \
security/apparmor/apparmorfs.c security/apparmor/include/lib.h"
# the list of MS_... constants
SYMS="RDONLY NOSUID NODEV NOEXEC SYNCHRONOUS REMOUNT MANDLOCK \
DIRSYNC NOATIME NODIRATIME BIND MOVE REC VERBOSE SILENT \
POSIXACL UNBINDABLE PRIVATE SLAVE SHARED RELATIME KERNMOUNT \
I_VERSION STRICTATIME LAZYTIME SUBMOUNT NOREMOTELOCK NOSEC BORN \
ACTIVE NOUSER"
SED_PROG=
for i in $SYMS; do SED_PROG="$SED_PROG -e s/MS_$i/SB_$i/g"; done
# we want files that contain at least one of MS_...,
# with fs/namespace.c and fs/pnode.c excluded.
L=$(for i in $SYMS; do git grep -w -l MS_$i $FILES; done| sort|uniq|grep -v '^fs/namespace.c'|grep -v '^fs/pnode.c')
for f in $L; do sed -i $f $SED_PROG; done
Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Firstly by applying the following with coccinelle's spatch:
@@ expression SB; @@
-SB->s_flags & MS_RDONLY
+sb_rdonly(SB)
to effect the conversion to sb_rdonly(sb), then by applying:
@@ expression A, SB; @@
(
-(!sb_rdonly(SB)) && A
+!sb_rdonly(SB) && A
|
-A != (sb_rdonly(SB))
+A != sb_rdonly(SB)
|
-A == (sb_rdonly(SB))
+A == sb_rdonly(SB)
|
-!(sb_rdonly(SB))
+!sb_rdonly(SB)
|
-A && (sb_rdonly(SB))
+A && sb_rdonly(SB)
|
-A || (sb_rdonly(SB))
+A || sb_rdonly(SB)
|
-(sb_rdonly(SB)) != A
+sb_rdonly(SB) != A
|
-(sb_rdonly(SB)) == A
+sb_rdonly(SB) == A
|
-(sb_rdonly(SB)) && A
+sb_rdonly(SB) && A
|
-(sb_rdonly(SB)) || A
+sb_rdonly(SB) || A
)
@@ expression A, B, SB; @@
(
-(sb_rdonly(SB)) ? 1 : 0
+sb_rdonly(SB)
|
-(sb_rdonly(SB)) ? A : B
+sb_rdonly(SB) ? A : B
)
to remove left over excess bracketage and finally by applying:
@@ expression A, SB; @@
(
-(A & MS_RDONLY) != sb_rdonly(SB)
+(bool)(A & MS_RDONLY) != sb_rdonly(SB)
|
-(A & MS_RDONLY) == sb_rdonly(SB)
+(bool)(A & MS_RDONLY) == sb_rdonly(SB)
)
to make comparisons against the result of sb_rdonly() (which is a bool)
work correctly.
Signed-off-by: David Howells <dhowells@redhat.com>
Now that all bdi structures filesystems use are properly refcounted, we
can remove the SB_I_DYNBDI flag.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
Similarly to set_bdev_super() NILFS2 just used block device reference to
bdi. Convert it to properly getting bdi reference. The reference will
get automatically dropped on superblock destruction.
CC: linux-nilfs@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <axboe@fb.com>
We will want to have struct backing_dev_info allocated separately from
struct request_queue. As the first step add pointer to backing_dev_info
to request_queue and convert all users touching it. No functional
changes in this patch.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
Remove the WRITE_* and READ_SYNC wrappers, and just use the flags
directly. Where applicable this also drops usage of the
bio_set_op_attrs wrapper.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
Replace most use of printk() in nilfs2 implementation with nilfs_msg(),
and reduce the following checkpatch.pl warning:
"WARNING: Prefer [subsystem eg: netdev]_crit([subsystem]dev, ...
then dev_crit(dev, ... then pr_crit(... to printk(KERN_CRIT ..."
This patch also fixes a minor checkpatch warning "WARNING: quoted string
split across lines" that often accompanies the prior warning, and amends
message format as needed.
Link: http://lkml.kernel.org/r/1464875891-5443-5-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Define an own output routine to replace bare use of printk() function.
The output routine is implemented with a macro and a helper function,
which are named nilfs_msg() and __nilfs_msg(), respectively.
__nilfs_msg() formats a message like "NILFS (<device-name>): <message>",
prefixing it with a given log level, and terminates the statement with a
newline. The "device-name" is optional to make it available in early
stages; it will be omitted if a NULL pointer is passed to super block
instance argument. nilfs_msg() wraps __nilfs_msg() and is removed if
CONFIG_PRINTK is not set.
Link: http://lkml.kernel.org/r/1464875891-5443-3-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Simplify nilfs_error(), an output function used to report critical
issues in file system. This renames the original nilfs_error() function
to __nilfs_error() and redefines it as a macro to hide its function name
argument within the macro.
Every call site of nilfs_error() is changed to strip __func__ argument
except nilfs_bmap_convert_error(); nilfs_bmap_convert_error() directly
calls __nilfs_error() because it inherits caller's function name.
Link: http://lkml.kernel.org/r/1464875891-5443-2-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mark those kmem allocations that are known to be easily triggered from
userspace as __GFP_ACCOUNT/SLAB_ACCOUNT, which makes them accounted to
memcg. For the list, see below:
- threadinfo
- task_struct
- task_delay_info
- pid
- cred
- mm_struct
- vm_area_struct and vm_region (nommu)
- anon_vma and anon_vma_chain
- signal_struct
- sighand_struct
- fs_struct
- files_struct
- fdtable and fdtable->full_fds_bits
- dentry and external_name
- inode for all filesystems. This is the most tedious part, because
most filesystems overwrite the alloc_inode method.
The list is far from complete, so feel free to add more objects.
Nevertheless, it should be close to "account everything" approach and
keep most workloads within bounds. Malevolent users will be able to
breach the limit, but this was possible even with the former "account
everything" approach (simply because it did not account everything in
fact).
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Greg Thelen <gthelen@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some false positive warnings are reported for powerpc build.
The following warnings are reported in
http://kisskb.ellerman.id.au/kisskb/buildresult/12519703/
CC fs/nilfs2/super.o
fs/nilfs2/super.c: In function 'nilfs_resize_fs':
fs/nilfs2/super.c:376:2: warning: 'blocknr' may be used uninitialized in this function [-Wuninitialized]
fs/nilfs2/super.c:362:11: note: 'blocknr' was declared here
CC fs/nilfs2/recovery.o
fs/nilfs2/recovery.c: In function 'nilfs_salvage_orphan_logs':
fs/nilfs2/recovery.c:631:21: warning: 'sum' may be used uninitialized in this function [-Wuninitialized]
fs/nilfs2/recovery.c:585:32: note: 'sum' was declared here
fs/nilfs2/recovery.c: In function 'nilfs_search_super_root':
fs/nilfs2/recovery.c:873:11: warning: 'sum' may be used uninitialized in this function [-Wuninitialized]
Another similar warning is reported in
http://kisskb.ellerman.id.au/kisskb/buildresult/12520079/
CC fs/nilfs2/btree.o
fs/nilfs2/btree.c: In function 'nilfs_btree_convert_and_insert':
include/asm-generic/bitops/non-atomic.h:105:20: warning: 'bh' may be used uninitialized in this function [-Wuninitialized]
fs/nilfs2/btree.c:1859:22: note: 'bh' was declared here
This cleans out these warnings by forcing the variables to be initialized.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull fourth vfs update from Al Viro:
"d_inode() annotations from David Howells (sat in for-next since before
the beginning of merge window) + four assorted fixes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
RCU pathwalk breakage when running into a symlink overmounting something
fix I_DIO_WAKEUP definition
direct-io: only inc/dec inode->i_dio_count for file systems
fs/9p: fix readdir()
VFS: assorted d_backing_inode() annotations
VFS: fs/inode.c helpers: d_inode() annotations
VFS: fs/cachefiles: d_backing_inode() annotations
VFS: fs library helpers: d_inode() annotations
VFS: assorted weird filesystems: d_inode() annotations
VFS: normal filesystems (and lustre): d_inode() annotations
VFS: security/: d_inode() annotations
VFS: security/: d_backing_inode() annotations
VFS: net/: d_inode() annotations
VFS: net/unix: d_backing_inode() annotations
VFS: kernel/: d_inode() annotations
VFS: audit: d_backing_inode() annotations
VFS: Fix up some ->d_inode accesses in the chelsio driver
VFS: Cachefiles should perform fs modifications on the top layer only
VFS: AF_UNIX sockets should call mknod on the top layer only
Fix the following build warning:
fs/nilfs2/super.c: In function 'nilfs_checkpoint_is_mounted':
fs/nilfs2/super.c:1023:10: warning: comparison of unsigned expression < 0 is always false [-Wtype-limits]
if (cno < 0 || cno > nilfs->ns_cno)
^
This warning indicates that the comparision "cno < 0" is useless because
variable "cno" has an unsigned integer type "__u64".
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Reported-by: David Binderman <dcb314@hotmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
that's the bulk of filesystem drivers dealing with inodes of their own
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Now that we never use the backing_dev_info pointer in struct address_space
we can simply remove it and save 4 to 8 bytes in every inode.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Reviewed-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
Under normal circumstances nilfs_sync_fs() writes out the super block,
which causes a flush of the underlying block device. But this depends
on the THE_NILFS_SB_DIRTY flag, which is only set if the pointer to the
last segment crosses a segment boundary. So if only a small amount of
data is written before the call to nilfs_sync_fs(), no flush of the
block device occurs.
In the above case an additional call to blkdev_issue_flush() is needed.
To prevent unnecessary overhead, the new flag nilfs->ns_flushed_device
is introduced, which is cleared whenever new logs are written and set
whenever the block device is flushed. For convenience the function
nilfs_flush_device() is added, which contains the above logic.
Signed-off-by: Andreas Rohner <andreas.rohner@gmx.net>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull vfs updates from Al Viro:
"Stuff in here:
- acct.c fixes and general rework of mnt_pin mechanism. That allows
to go for delayed-mntput stuff, which will permit mntput() on deep
stack without worrying about stack overflows - fs shutdown will
happen on shallow stack. IOW, we can do Eric's umount-on-rmdir
series without introducing tons of stack overflows on new mntput()
call chains it introduces.
- Bruce's d_splice_alias() patches
- more Miklos' rename() stuff.
- a couple of regression fixes (stable fodder, in the end of branch)
and a fix for API idiocy in iov_iter.c.
There definitely will be another pile, maybe even two. I'd like to
get Eric's series in this time, but even if we miss it, it'll go right
in the beginning of for-next in the next cycle - the tricky part of
prereqs is in this pile"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (40 commits)
fix copy_tree() regression
__generic_file_write_iter(): fix handling of sync error after DIO
switch iov_iter_get_pages() to passing maximal number of pages
fs: mark __d_obtain_alias static
dcache: d_splice_alias should detect loops
exportfs: update Exporting documentation
dcache: d_find_alias needn't recheck IS_ROOT && DCACHE_DISCONNECTED
dcache: remove unused d_find_alias parameter
dcache: d_obtain_alias callers don't all want DISCONNECTED
dcache: d_splice_alias should ignore DCACHE_DISCONNECTED
dcache: d_splice_alias mustn't create directory aliases
dcache: close d_move race in d_splice_alias
dcache: move d_splice_alias
namei: trivial fix to vfs_rename_dir comment
VFS: allow ->d_manage() to declare -EISDIR in rcu_walk mode.
cifs: support RENAME_NOREPLACE
hostfs: support rename flags
shmem: support RENAME_EXCHANGE
shmem: support RENAME_NOREPLACE
btrfs: add RENAME_NOREPLACE
...
This patch integrates creation of sysfs groups and
attributes into NILFS file system driver.
It was found the issue with nilfs_sysfs_{create/delete}_snapshot_group
functions by Michael L Semon <mlsemon35@gmail.com> in the first
version of the patch:
BUG: sleeping function called from invalid context at kernel/locking/mutex.c:579
in_atomic(): 1, irqs_disabled(): 0, pid: 32676, name: umount.nilfs2
2 locks held by umount.nilfs2/32676:
#0: (&type->s_umount_key#21){++++..}, at: [<790c18e2>] deactivate_super+0x37/0x58
#1: (&(&nilfs->ns_cptree_lock)->rlock){+.+...}, at: [<791bf659>] nilfs_put_root+0x23/0x5a
Preemption disabled at:[<791bf659>] nilfs_put_root+0x23/0x5a
CPU: 0 PID: 32676 Comm: umount.nilfs2 Not tainted 3.14.0+ #2
Hardware name: Dell Computer Corporation Dimension 2350/07W080, BIOS A01 12/17/2002
Call Trace:
dump_stack+0x4b/0x75
__might_sleep+0x111/0x16f
mutex_lock_nested+0x1e/0x3ad
kernfs_remove+0x12/0x26
sysfs_remove_dir+0x3d/0x62
kobject_del+0x13/0x38
nilfs_sysfs_delete_snapshot_group+0xb/0xd
nilfs_put_root+0x2a/0x5a
nilfs_detach_log_writer+0x1ab/0x2c1
nilfs_put_super+0x13/0x68
generic_shutdown_super+0x60/0xd1
kill_block_super+0x1d/0x60
deactivate_locked_super+0x22/0x3f
deactivate_super+0x3e/0x58
mntput_no_expire+0xe2/0x141
SyS_oldumount+0x70/0xa5
syscall_call+0x7/0xb
The reason of the issue was placement of
nilfs_sysfs_{create/delete}_snapshot_group() call under
nilfs->ns_cptree_lock protection. But this protection is unnecessary and
wrong solution. The second version of the patch fixes this issue.
[fengguang.wu@intel.com: nilfs_sysfs_create_mounted_snapshots_group can be static]
Reported-by: Michael L. Semon <mlsemon35@gmail.com>
Signed-off-by: Vyacheslav Dubeyko <Vyacheslav.Dubeyko@hgst.com>
Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Tested-by: Michael L. Semon <mlsemon35@gmail.com>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are a few d_obtain_alias callers that are using it to get the
root of a filesystem which may already have an alias somewhere else.
This is not the same as the filehandle-lookup case, and none of them
actually need DCACHE_DISCONNECTED set.
It isn't really a serious problem, but it would really be clearer if we
reserved DCACHE_DISCONNECTED for those cases where it's actually needed.
In the btrfs case this was causing a spurious printk from
nfsd/nfsfh.c:fh_verify when it found an unexpected DCACHE_DISCONNECTED
dentry. Josef worked around this by unsetting DCACHE_DISCONNECTED
manually in 3a0dfa6a12 "Btrfs: unset DCACHE_DISCONNECTED when mounting
default subvol", and this replaces that workaround.
Cc: Josef Bacik <jbacik@fb.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The cp_inodes_count and cp_blocks_count are represented as __le64 type in
on-disk structure (struct nilfs_checkpoint). But analogous fields in
in-core structure (struct nilfs_root) are represented by atomic_t type.
This patch replaces atomic_t on atomic64_t type in representation of
inodes_count and blocks_count fields in struct nilfs_root.
Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Acked-by: Joern Engel <joern@logfs.org>
Cc: Clemens Eisserer <linuxhippy@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>