Updates the branch to the 5.10.26 upstream kernel version.
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I84aa29bf4e4e809051eb346830c4c4b5acb78c8c
Try to mitigate potential future driver core api changes by adding a
pointer to struct signal_struct, struct sched_entity, struct
sched_rt_entity, and struct task_struct.
Based on a patch from Michal Marek <mmarek@suse.cz> from the SLES kernel
Leaf changes summary: 3 artifacts changed
Changed leaf types summary: 3 leaf types changed
Removed/Changed/Added functions summary: 0 Removed, 0 Changed, 0 Added function
Removed/Changed/Added variables summary: 0 Removed, 0 Changed, 0 Added variable
'struct sched_entity at sched.h:444:1' changed:
type size changed from 3584 to 4096 (in bits)
4 data member insertions:
'u64 sched_entity::android_kabi_reserved1', at offset 3584 (in bits) at sched.h:481:1
'u64 sched_entity::android_kabi_reserved2', at offset 3648 (in bits) at sched.h:482:1
'u64 sched_entity::android_kabi_reserved3', at offset 3712 (in bits) at sched.h:483:1
'u64 sched_entity::android_kabi_reserved4', at offset 3776 (in bits) at sched.h:484:1
1435 impacted interfaces:
'struct sched_rt_entity at sched.h:481:1' changed:
type size changed from 384 to 640 (in bits)
4 data member insertions:
'u64 sched_rt_entity::android_kabi_reserved1', at offset 384 (in bits) at sched.h:504:1
'u64 sched_rt_entity::android_kabi_reserved2', at offset 448 (in bits) at sched.h:505:1
'u64 sched_rt_entity::android_kabi_reserved3', at offset 512 (in bits) at sched.h:506:1
'u64 sched_rt_entity::android_kabi_reserved4', at offset 576 (in bits) at sched.h:507:1
1435 impacted interfaces:
'struct task_struct at sched.h:624:1' changed:
type size changed from 28672 to 30208 (in bits)
8 data member insertions:
'u64 task_struct::android_kabi_reserved1', at offset 20992 (in bits) at sched.h:1294:1
'u64 task_struct::android_kabi_reserved2', at offset 21056 (in bits) at sched.h:1295:1
'u64 task_struct::android_kabi_reserved3', at offset 21120 (in bits) at sched.h:1296:1
'u64 task_struct::android_kabi_reserved4', at offset 21184 (in bits) at sched.h:1297:1
'u64 task_struct::android_kabi_reserved5', at offset 21248 (in bits) at sched.h:1298:1
'u64 task_struct::android_kabi_reserved6', at offset 21312 (in bits) at sched.h:1299:1
'u64 task_struct::android_kabi_reserved7', at offset 21376 (in bits) at sched.h:1300:1
'u64 task_struct::android_kabi_reserved8', at offset 21440 (in bits) at sched.h:1301:1
there are data member changes:
1435 impacted interfaces:
Bug: 151154716
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I1449735b836399e9b356608727092334daf5c36b
Changes in 5.10.24
uapi: nfnetlink_cthelper.h: fix userspace compilation error
powerpc/perf: Fix handling of privilege level checks in perf interrupt context
powerpc/pseries: Don't enforce MSI affinity with kdump
ethernet: alx: fix order of calls on resume
crypto: mips/poly1305 - enable for all MIPS processors
ath9k: fix transmitting to stations in dynamic SMPS mode
net: Fix gro aggregation for udp encaps with zero csum
net: check if protocol extracted by virtio_net_hdr_set_proto is correct
net: avoid infinite loop in mpls_gso_segment when mpls_hlen == 0
net: l2tp: reduce log level of messages in receive path, add counter instead
can: skb: can_skb_set_owner(): fix ref counting if socket was closed before setting skb ownership
can: flexcan: assert FRZ bit in flexcan_chip_freeze()
can: flexcan: enable RX FIFO after FRZ/HALT valid
can: flexcan: invoke flexcan_chip_freeze() to enter freeze mode
can: tcan4x5x: tcan4x5x_init(): fix initialization - clear MRAM before entering Normal Mode
tcp: Fix sign comparison bug in getsockopt(TCP_ZEROCOPY_RECEIVE)
tcp: add sanity tests to TCP_QUEUE_SEQ
netfilter: nf_nat: undo erroneous tcp edemux lookup
netfilter: x_tables: gpf inside xt_find_revision()
net: always use icmp{,v6}_ndo_send from ndo_start_xmit
net: phy: fix save wrong speed and duplex problem if autoneg is on
selftests/bpf: Use the last page in test_snprintf_btf on s390
selftests/bpf: No need to drop the packet when there is no geneve opt
selftests/bpf: Mask bpf_csum_diff() return value to 16 bits in test_verifier
samples, bpf: Add missing munmap in xdpsock
libbpf: Clear map_info before each bpf_obj_get_info_by_fd
ibmvnic: Fix possibly uninitialized old_num_tx_queues variable warning.
ibmvnic: always store valid MAC address
mt76: dma: do not report truncated frames to mac80211
powerpc/603: Fix protection of user pages mapped with PROT_NONE
mount: fix mounting of detached mounts onto targets that reside on shared mounts
cifs: return proper error code in statfs(2)
Revert "mm, slub: consider rest of partial list if acquire_slab() fails"
docs: networking: drop special stable handling
net: dsa: tag_rtl4_a: fix egress tags
sh_eth: fix TRSCER mask for SH771x
net: enetc: don't overwrite the RSS indirection table when initializing
net: enetc: take the MDIO lock only once per NAPI poll cycle
net: enetc: fix incorrect TPID when receiving 802.1ad tagged packets
net: enetc: don't disable VLAN filtering in IFF_PROMISC mode
net: enetc: force the RGMII speed and duplex instead of operating in inband mode
net: enetc: remove bogus write to SIRXIDR from enetc_setup_rxbdr
net: enetc: keep RX ring consumer index in sync with hardware
net: ethernet: mtk-star-emac: fix wrong unmap in RX handling
net/mlx4_en: update moderation when config reset
net: stmmac: fix incorrect DMA channel intr enable setting of EQoS v4.10
nexthop: Do not flush blackhole nexthops when loopback goes down
net: sched: avoid duplicates in classes dump
net: mscc: ocelot: properly reject destination IP keys in VCAP IS1
net: dsa: sja1105: fix SGMII PCS being forced to SPEED_UNKNOWN instead of SPEED_10
net: usb: qmi_wwan: allow qmimux add/del with master up
netdevsim: init u64 stats for 32bit hardware
cipso,calipso: resolve a number of problems with the DOI refcounts
net: stmmac: Fix VLAN filter delete timeout issue in Intel mGBE SGMII
stmmac: intel: Fixes clock registration error seen for multiple interfaces
net: lapbether: Remove netif_start_queue / netif_stop_queue
net: davicom: Fix regulator not turned off on failed probe
net: davicom: Fix regulator not turned off on driver removal
net: enetc: allow hardware timestamping on TX queues with tc-etf enabled
net: qrtr: fix error return code of qrtr_sendmsg()
s390/qeth: fix memory leak after failed TX Buffer allocation
r8169: fix r8168fp_adjust_ocp_cmd function
ixgbe: fail to create xfrm offload of IPsec tunnel mode SA
tools/resolve_btfids: Fix build error with older host toolchains
perf build: Fix ccache usage in $(CC) when generating arch errno table
net: stmmac: stop each tx channel independently
net: stmmac: fix watchdog timeout during suspend/resume stress test
net: stmmac: fix wrongly set buffer2 valid when sph unsupport
ethtool: fix the check logic of at least one channel for RX/TX
net: phy: make mdio_bus_phy_suspend/resume as __maybe_unused
selftests: forwarding: Fix race condition in mirror installation
mlxsw: spectrum_ethtool: Add an external speed to PTYS register
perf traceevent: Ensure read cmdlines are null terminated.
perf report: Fix -F for branch & mem modes
net: hns3: fix query vlan mask value error for flow director
net: hns3: fix bug when calculating the TCAM table info
s390/cio: return -EFAULT if copy_to_user() fails again
bnxt_en: reliably allocate IRQ table on reset to avoid crash
gpiolib: acpi: Add ACPI_GPIO_QUIRK_ABSOLUTE_NUMBER quirk
gpiolib: acpi: Allow to find GpioInt() resource by name and index
gpio: pca953x: Set IRQ type when handle Intel Galileo Gen 2
gpio: fix gpio-device list corruption
drm/compat: Clear bounce structures
drm/amd/display: Add a backlight module option
drm/amdgpu/display: use GFP_ATOMIC in dcn21_validate_bandwidth_fp()
drm/amd/display: Fix nested FPU context in dcn21_validate_bandwidth()
drm/amd/pm: bug fix for pcie dpm
drm/amdgpu/display: simplify backlight setting
drm/amdgpu/display: don't assert in set backlight function
drm/amdgpu/display: handle aux backlight in backlight_get_brightness
drm/shmem-helper: Check for purged buffers in fault handler
drm/shmem-helper: Don't remove the offset in vm_area_struct pgoff
drm: Use USB controller's DMA mask when importing dmabufs
drm: meson_drv add shutdown function
drm/shmem-helpers: vunmap: Don't put pages for dma-buf
drm/i915: Wedge the GPU if command parser setup fails
s390/cio: return -EFAULT if copy_to_user() fails
s390/crypto: return -EFAULT if copy_to_user() fails
qxl: Fix uninitialised struct field head.surface_id
sh_eth: fix TRSCER mask for R7S9210
media: usbtv: Fix deadlock on suspend
media: rkisp1: params: fix wrong bits settings
media: v4l: vsp1: Fix uif null pointer access
media: v4l: vsp1: Fix bru null pointer access
media: rc: compile rc-cec.c into rc-core
cifs: fix credit accounting for extra channel
net: hns3: fix error mask definition of flow director
s390/qeth: don't replace a fully completed async TX buffer
s390/qeth: remove QETH_QDIO_BUF_HANDLED_DELAYED state
s390/qeth: improve completion of pending TX buffers
s390/qeth: fix notification for pending buffers during teardown
net: dsa: implement a central TX reallocation procedure
net: dsa: tag_ksz: don't allocate additional memory for padding/tagging
net: dsa: trailer: don't allocate additional memory for padding/tagging
net: dsa: tag_qca: let DSA core deal with TX reallocation
net: dsa: tag_ocelot: let DSA core deal with TX reallocation
net: dsa: tag_mtk: let DSA core deal with TX reallocation
net: dsa: tag_lan9303: let DSA core deal with TX reallocation
net: dsa: tag_edsa: let DSA core deal with TX reallocation
net: dsa: tag_brcm: let DSA core deal with TX reallocation
net: dsa: tag_dsa: let DSA core deal with TX reallocation
net: dsa: tag_gswip: let DSA core deal with TX reallocation
net: dsa: tag_ar9331: let DSA core deal with TX reallocation
net: dsa: tag_mtk: fix 802.1ad VLAN egress
enetc: Fix unused var build warning for CONFIG_OF
net: enetc: initialize RFS/RSS memories for unused ports too
ath11k: peer delete synchronization with firmware
ath11k: start vdev if a bss peer is already created
ath11k: fix AP mode for QCA6390
i2c: rcar: faster irq code to minimize HW race condition
i2c: rcar: optimize cacheline to minimize HW race condition
scsi: ufs: WB is only available on LUN #0 to #7
udf: fix silent AED tagLocation corruption
iommu/vt-d: Clear PRQ overflow only when PRQ is empty
mmc: mxs-mmc: Fix a resource leak in an error handling path in 'mxs_mmc_probe()'
mmc: mediatek: fix race condition between msdc_request_timeout and irq
mmc: sdhci-iproc: Add ACPI bindings for the RPi
Platform: OLPC: Fix probe error handling
powerpc/pci: Add ppc_md.discover_phbs()
spi: stm32: make spurious and overrun interrupts visible
powerpc: improve handling of unrecoverable system reset
powerpc/perf: Record counter overflow always if SAMPLE_IP is unset
HID: logitech-dj: add support for the new lightspeed connection iteration
powerpc/64: Fix stack trace not displaying final frame
iommu/amd: Fix performance counter initialization
clk: qcom: gdsc: Implement NO_RET_PERIPH flag
sparc32: Limit memblock allocation to low memory
sparc64: Use arch_validate_flags() to validate ADI flag
Input: applespi - don't wait for responses to commands indefinitely.
PCI: xgene-msi: Fix race in installing chained irq handler
PCI: mediatek: Add missing of_node_put() to fix reference leak
drivers/base: build kunit tests without structleak plugin
PCI/LINK: Remove bandwidth notification
ext4: don't try to processed freed blocks until mballoc is initialized
kbuild: clamp SUBLEVEL to 255
PCI: Fix pci_register_io_range() memory leak
i40e: Fix memory leak in i40e_probe
kasan: fix memory corruption in kasan_bitops_tags test
s390/smp: __smp_rescan_cpus() - move cpumask away from stack
drivers/base/memory: don't store phys_device in memory blocks
sysctl.c: fix underflow value setting risk in vm_table
scsi: libiscsi: Fix iscsi_prep_scsi_cmd_pdu() error handling
scsi: target: core: Add cmd length set before cmd complete
scsi: target: core: Prevent underflow for service actions
clk: qcom: gpucc-msm8998: Add resets, cxc, fix flags on gpu_gx_gdsc
mmc: sdhci: Update firmware interface API
ARM: 9029/1: Make iwmmxt.S support Clang's integrated assembler
ARM: assembler: introduce adr_l, ldr_l and str_l macros
ARM: efistub: replace adrl pseudo-op with adr_l macro invocation
ALSA: usb: Add Plantronics C320-M USB ctrl msg delay quirk
ALSA: hda/hdmi: Cancel pending works before suspend
ALSA: hda/conexant: Add quirk for mute LED control on HP ZBook G5
ALSA: hda/ca0132: Add Sound BlasterX AE-5 Plus support
ALSA: hda: Drop the BATCH workaround for AMD controllers
ALSA: hda: Flush pending unsolicited events before suspend
ALSA: hda: Avoid spurious unsol event handling during S3/S4
ALSA: usb-audio: Fix "cannot get freq eq" errors on Dell AE515 sound bar
ALSA: usb-audio: Apply the control quirk to Plantronics headsets
ALSA: usb-audio: Disable USB autosuspend properly in setup_disable_autosuspend()
ALSA: usb-audio: fix NULL ptr dereference in usb_audio_probe
ALSA: usb-audio: fix use after free in usb_audio_disconnect
Revert 95ebabde38 ("capabilities: Don't allow writing ambiguous v3 file capabilities")
block: Discard page cache of zone reset target range
block: Try to handle busy underlying device on discard
arm64: kasan: fix page_alloc tagging with DEBUG_VIRTUAL
arm64: mte: Map hotplugged memory as Normal Tagged
arm64: perf: Fix 64-bit event counter read truncation
s390/dasd: fix hanging DASD driver unbind
s390/dasd: fix hanging IO request during DASD driver unbind
software node: Fix node registration
xen/events: reset affinity of 2-level event when tearing it down
mmc: mmci: Add MMC_CAP_NEED_RSP_BUSY for the stm32 variants
mmc: core: Fix partition switch time for eMMC
mmc: cqhci: Fix random crash when remove mmc module/card
cifs: do not send close in compound create+close requests
Goodix Fingerprint device is not a modem
USB: gadget: udc: s3c2410_udc: fix return value check in s3c2410_udc_probe()
USB: gadget: u_ether: Fix a configfs return code
usb: gadget: f_uac2: always increase endpoint max_packet_size by one audio slot
usb: gadget: f_uac1: stop playback on function disable
usb: dwc3: qcom: Add missing DWC3 OF node refcount decrement
usb: dwc3: qcom: add URS Host support for sdm845 ACPI boot
usb: dwc3: qcom: add ACPI device id for sc8180x
usb: dwc3: qcom: Honor wakeup enabled/disabled state
USB: usblp: fix a hang in poll() if disconnected
usb: renesas_usbhs: Clear PIPECFG for re-enabling pipe with other EPNUM
usb: xhci: do not perform Soft Retry for some xHCI hosts
xhci: Improve detection of device initiated wake signal.
usb: xhci: Fix ASMedia ASM1042A and ASM3242 DMA addressing
xhci: Fix repeated xhci wake after suspend due to uncleared internal wake state
USB: serial: io_edgeport: fix memory leak in edge_startup
USB: serial: ch341: add new Product ID
USB: serial: cp210x: add ID for Acuity Brands nLight Air Adapter
USB: serial: cp210x: add some more GE USB IDs
usbip: fix stub_dev to check for stream socket
usbip: fix vhci_hcd to check for stream socket
usbip: fix vudc to check for stream socket
usbip: fix stub_dev usbip_sockfd_store() races leading to gpf
usbip: fix vhci_hcd attach_store() races leading to gpf
usbip: fix vudc usbip_sockfd_store races leading to gpf
Revert "serial: max310x: rework RX interrupt handling"
misc/pvpanic: Export module FDT device table
misc: fastrpc: restrict user apps from sending kernel RPC messages
staging: rtl8192u: fix ->ssid overflow in r8192_wx_set_scan()
staging: rtl8188eu: prevent ->ssid overflow in rtw_wx_set_scan()
staging: rtl8712: unterminated string leads to read overflow
staging: rtl8188eu: fix potential memory corruption in rtw_check_beacon_data()
staging: ks7010: prevent buffer overflow in ks_wlan_set_scan()
staging: rtl8712: Fix possible buffer overflow in r8712_sitesurvey_cmd
staging: rtl8192e: Fix possible buffer overflow in _rtl92e_wx_set_scan
staging: comedi: addi_apci_1032: Fix endian problem for COS sample
staging: comedi: addi_apci_1500: Fix endian problem for command sample
staging: comedi: adv_pci1710: Fix endian problem for AI command data
staging: comedi: das6402: Fix endian problem for AI command data
staging: comedi: das800: Fix endian problem for AI command data
staging: comedi: dmm32at: Fix endian problem for AI command data
staging: comedi: me4000: Fix endian problem for AI command data
staging: comedi: pcl711: Fix endian problem for AI command data
staging: comedi: pcl818: Fix endian problem for AI command data
sh_eth: fix TRSCER mask for R7S72100
cpufreq: qcom-hw: fix dereferencing freed memory 'data'
cpufreq: qcom-hw: Fix return value check in qcom_cpufreq_hw_cpu_init()
arm64/mm: Fix pfn_valid() for ZONE_DEVICE based memory
SUNRPC: Set memalloc_nofs_save() for sync tasks
NFS: Don't revalidate the directory permissions on a lookup failure
NFS: Don't gratuitously clear the inode cache when lookup failed
NFSv4.2: fix return value of _nfs4_get_security_label()
block: rsxx: fix error return code of rsxx_pci_probe()
nvme-fc: fix racing controller reset and create association
configfs: fix a use-after-free in __configfs_open_file
arm64: mm: use a 48-bit ID map when possible on 52-bit VA builds
perf/core: Flush PMU internal buffers for per-CPU events
perf/x86/intel: Set PERF_ATTACH_SCHED_CB for large PEBS and LBR
hrtimer: Update softirq_expires_next correctly after __hrtimer_get_next_event()
powerpc/64s/exception: Clean up a missed SRR specifier
seqlock,lockdep: Fix seqcount_latch_init()
stop_machine: mark helpers __always_inline
include/linux/sched/mm.h: use rcu_dereference in in_vfork()
zram: fix return value on writeback_store
linux/compiler-clang.h: define HAVE_BUILTIN_BSWAP*
sched/membarrier: fix missing local execution of ipi_sync_rq_state()
efi: stub: omit SetVirtualAddressMap() if marked unsupported in RT_PROP table
powerpc/64s: Fix instruction encoding for lis in ppc_function_entry()
powerpc: Fix inverted SET_FULL_REGS bitop
powerpc: Fix missing declaration of [en/dis]able_kernel_vsx()
binfmt_misc: fix possible deadlock in bm_register_write
x86/unwind/orc: Disable KASAN checking in the ORC unwinder, part 2
x86/sev-es: Introduce ip_within_syscall_gap() helper
x86/sev-es: Check regs->sp is trusted before adjusting #VC IST stack
x86/entry: Move nmi entry/exit into common code
x86/sev-es: Correctly track IRQ states in runtime #VC handler
x86/sev-es: Use __copy_from_user_inatomic()
x86/entry: Fix entry/exit mismatch on failed fast 32-bit syscalls
KVM: x86: Ensure deadline timer has truly expired before posting its IRQ
KVM: kvmclock: Fix vCPUs > 64 can't be online/hotpluged
KVM: arm64: Fix range alignment when walking page tables
KVM: arm64: Avoid corrupting vCPU context register in guest exit
KVM: arm64: nvhe: Save the SPE context early
KVM: arm64: Reject VM creation when the default IPA size is unsupported
KVM: arm64: Fix exclusive limit for IPA size
mm/userfaultfd: fix memory corruption due to writeprotect
mm/madvise: replace ptrace attach requirement for process_madvise
KVM: arm64: Ensure I-cache isolation between vcpus of a same VM
mm/page_alloc.c: refactor initialization of struct page for holes in memory layout
xen/events: don't unmask an event channel when an eoi is pending
xen/events: avoid handling the same event on two cpus at the same time
KVM: arm64: Fix nVHE hyp panic host context restore
RDMA/umem: Use ib_dma_max_seg_size instead of dma_get_max_seg_size
Linux 5.10.24
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ie53a3c1963066a18d41357b6be41cff00690bd40
Changes in 5.10.6
Revert "drm/amd/display: Fix memory leaks in S3 resume"
Revert "mtd: spinand: Fix OOB read"
rtc: pcf2127: move watchdog initialisation to a separate function
rtc: pcf2127: only use watchdog when explicitly available
dt-bindings: rtc: add reset-source property
kdev_t: always inline major/minor helper functions
Bluetooth: Fix attempting to set RPA timeout when unsupported
ALSA: hda/realtek - Modify Dell platform name
ALSA: hda/hdmi: Fix incorrect mutex unlock in silent_stream_disable()
drm/i915/tgl: Fix Combo PHY DPLL fractional divider for 38.4MHz ref clock
scsi: ufs: Allow an error return value from ->device_reset()
scsi: ufs: Re-enable WriteBooster after device reset
RDMA/core: remove use of dma_virt_ops
RDMA/siw,rxe: Make emulated devices virtual in the device tree
fuse: fix bad inode
perf: Break deadlock involving exec_update_mutex
rwsem: Implement down_read_killable_nested
rwsem: Implement down_read_interruptible
exec: Transform exec_update_mutex into a rw_semaphore
mwifiex: Fix possible buffer overflows in mwifiex_cmd_802_11_ad_hoc_start
Linux 5.10.6
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Id4c57a151a1e8f2162163d2337b6055f04edbe9b
paused_cpus intending to force CPUs to go idle as quickly as possible,
adding a migration step, to drain the rq from any running task.
Two steps are actually needed. The first one, "lazy", will run before the
cpu_active_mask has been synced. The second one will run after. It is
possible for another CPU, to observe an outdated version of that mask and
to enqueue a task on a rq that has just been marked inactive. The second
migration is there to catch any of those spurious move, while the first
one will drain the rq as quickly as possible to let the CPU reach an idle
state.
Bug: 161210528
Change-Id: Ie26c2e4c42665dd61d41a899a84536e56bf2b887
Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
pause_cpus intends to have a way to force a CPU to go idle and to resume
as quickly as possible, with as little disruption as possible on the
system. This is a way of saving energy or meet thermal constraints, for
which a full CPU hotunplug is too slow. A paused CPU is simply deactivated
from the scheduler point of view. This corresponds to the first hotunplug
step.
Each pause operation still needs some heavy synchronization. Allowing to
pause several CPUs in one go mitigate that issue.
Paused CPUs can be resumed with resume_cpus(), which also takes a cpumask
as an input.
Few limitations:
* It isn't possible to pause a CPU which is running SCHED_DEADLINE task.
* A paused CPU will be removed from any cpuset it is part of. Resuming
the CPU won't put back this CPU in the cpuset if using cgroup1.
Cgroup2 doesn't have this limitation.
* per-CPU kthreads are still allowed to run on a paused CPU.
Bug: 161210528
Change-Id: I1f5cb28190f8ec979bb8640a89b022f2f7266bcf
Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
Signed-off-by: Todd Kjos <tkjos@google.com>
Some partners have value-adds based on aosp/540066, which cannot be
carried in ACK in its entirety as it no longer makes sense as-is (the
select_idle_capacity() rework upstream solved the issue differently).
It seems that those partners do not actually need the wake-wide tweaks,
they only need to access the wake_q length for wake-up balance. To
support this, add minimal tracking to the wake_q infrastructure in the
core kernel, but do that by adding a pointer to the wake_q_head to
task_struct directly to not litter all sched classes with an additional
sibling_count_hint argument to the select_task_rq callbacks.
Modules needing to access the wake_q length can do so by dereferencing
p->wake_q_head in the wake-up path when it is non-NULL.
Bug: 173981591
Signed-off-by: Quentin Perret <qperret@google.com>
Change-Id: I9a98167face92e70aba847d9f04d0c216065478c
Steps on the way to 5.10-rc1
Resolves conflicts in:
fs/userfaultfd.c
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ie3fe3c818f1f6565cfd4fa551de72d2b72ef60af
Remote memcg charging API uses current->active_memcg to store the
currently active memory cgroup, which overwrites the memory cgroup of the
current process. It works well for normal contexts, but doesn't work for
interrupt contexts: indeed, if an interrupt occurs during the execution of
a section with an active memcg set, all allocations inside the interrupt
will be charged to the active memcg set (given that we'll enable
accounting for allocations from an interrupt context). But because the
interrupt might have no relation to the active memcg set outside, it's
obviously wrong from the accounting prospective.
To resolve this problem, let's add a global percpu int_active_memcg
variable, which will be used to store an active memory cgroup which will
be used from interrupt contexts. set_active_memcg() will transparently
use current->active_memcg or int_active_memcg depending on the context.
To make the read part simple and transparent for the caller, let's
introduce two new functions:
- struct mem_cgroup *active_memcg(void),
- struct mem_cgroup *get_active_memcg(void).
They are returning the active memcg if it's set, hiding all implementation
details: where to get it depending on the current context.
Signed-off-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Link: http://lkml.kernel.org/r/20200827225843.1270629-4-guro@fb.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently the remote memcg charging API consists of two functions:
memalloc_use_memcg() and memalloc_unuse_memcg(), which set and clear the
memcg value, which overwrites the memcg of the current task.
memalloc_use_memcg(target_memcg);
<...>
memalloc_unuse_memcg();
It works perfectly for allocations performed from a normal context,
however an attempt to call it from an interrupt context or just nest two
remote charging blocks will lead to an incorrect accounting. On exit from
the inner block the active memcg will be cleared instead of being
restored.
memalloc_use_memcg(target_memcg);
memalloc_use_memcg(target_memcg_2);
<...>
memalloc_unuse_memcg();
Error: allocation here are charged to the memcg of the current
process instead of target_memcg.
memalloc_unuse_memcg();
This patch extends the remote charging API by switching to a single
function: struct mem_cgroup *set_active_memcg(struct mem_cgroup *memcg),
which sets the new value and returns the old one. So a remote charging
block will look like:
old_memcg = set_active_memcg(target_memcg);
<...>
set_active_memcg(old_memcg);
This patch is heavily based on the patch by Johannes Weiner, which can be
found here: https://lkml.org/lkml/2020/5/28/806 .
Signed-off-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Dan Schatzberg <dschatzberg@fb.com>
Link: https://lkml.kernel.org/r/20200821212056.3769116-1-guro@fb.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull kernel_clone() updates from Christian Brauner:
"During the v5.9 merge window we reworked the process creation
codepaths across multiple architectures. After this work we were only
left with the _do_fork() helper based on the struct kernel_clone_args
calling convention. As was pointed out _do_fork() isn't valid
kernelese especially for a helper that isn't just static.
This series removes the _do_fork() helper and introduces the new
kernel_clone() helper. The process creation cleanup didn't change the
name to something more reasonable mainly because _do_fork() was used
in quite a few places. So sending this as a separate series seemed the
better strategy.
I originally intended to send this early in the v5.9 development cycle
after the merge window had closed but given that this was touching
quite a few places I decided to defer this until the v5.10 merge
window"
* tag 'kernel-clone-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
sched: remove _do_fork()
tracing: switch to kernel_clone()
kgdbts: switch to kernel_clone()
kprobes: switch to kernel_clone()
x86: switch to kernel_clone()
sparc: switch to kernel_clone()
nios2: switch to kernel_clone()
m68k: switch to kernel_clone()
ia64: switch to kernel_clone()
h8300: switch to kernel_clone()
fork: introduce kernel_clone()
Currently __set_oom_adj loops through all processes in the system to keep
oom_score_adj and oom_score_adj_min in sync between processes sharing
their mm. This is done for any task with more that one mm_users, which
includes processes with multiple threads (sharing mm and signals).
However for such processes the loop is unnecessary because their signal
structure is shared as well.
Android updates oom_score_adj whenever a tasks changes its role
(background/foreground/...) or binds to/unbinds from a service, making it
more/less important. Such operation can happen frequently. We noticed
that updates to oom_score_adj became more expensive and after further
investigation found out that the patch mentioned in "Fixes" introduced a
regression. Using Pixel 4 with a typical Android workload, write time to
oom_score_adj increased from ~3.57us to ~362us. Moreover this regression
linearly depends on the number of multi-threaded processes running on the
system.
Mark the mm with a new MMF_MULTIPROCESS flag bit when task is created with
(CLONE_VM && !CLONE_THREAD && !CLONE_VFORK). Change __set_oom_adj to use
MMF_MULTIPROCESS instead of mm_users to decide whether oom_score_adj
update should be synchronized between multiple processes. To prevent
races between clone() and __set_oom_adj(), when oom_score_adj of the
process being cloned might be modified from userspace, we use
oom_adj_mutex. Its scope is changed to global.
The combination of (CLONE_VM && !CLONE_THREAD) is rarely used except for
the case of vfork(). To prevent performance regressions of vfork(), we
skip taking oom_adj_mutex and setting MMF_MULTIPROCESS when CLONE_VFORK is
specified. Clearing the MMF_MULTIPROCESS flag (when the last process
sharing the mm exits) is left out of this patch to keep it simple and
because it is believed that this threading model is rare. Should there
ever be a need for optimizing that case as well, it can be done by hooking
into the exit path, likely following the mm_update_next_owner pattern.
With the combination of (CLONE_VM && !CLONE_THREAD && !CLONE_VFORK) being
quite rare, the regression is gone after the change is applied.
[surenb@google.com: v3]
Link: https://lkml.kernel.org/r/20200902012558.2335613-1-surenb@google.com
Fixes: 44a70adec9 ("mm, oom_adj: make sure processes sharing mm have same view of oom_score_adj")
Reported-by: Tim Murray <timmurray@google.com>
Suggested-by: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Eugene Syromiatnikov <esyr@redhat.com>
Cc: Christian Kellner <christian@kellner.me>
Cc: Adrian Reber <areber@redhat.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Alexey Gladkov <gladkov.alexey@gmail.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Bernd Edlinger <bernd.edlinger@hotmail.de>
Cc: John Johansen <john.johansen@canonical.com>
Cc: Yafang Shao <laoar.shao@gmail.com>
Link: https://lkml.kernel.org/r/20200824153036.3201505-1-surenb@google.com
Debugged-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patchset is based on Google-internal RSEQ work done by Paul
Turner and Andrew Hunter.
When working with per-CPU RSEQ-based memory allocations, it is
sometimes important to make sure that a global memory location is no
longer accessed from RSEQ critical sections. For example, there can be
two per-CPU lists, one is "active" and accessed per-CPU, while another
one is inactive and worked on asynchronously "off CPU" (e.g. garbage
collection is performed). Then at some point the two lists are
swapped, and a fast RCU-like mechanism is required to make sure that
the previously active list is no longer accessed.
This patch introduces such a mechanism: in short, membarrier() syscall
issues an IPI to a CPU, restarting a potentially active RSEQ critical
section on the CPU.
Signed-off-by: Peter Oskolkov <posk@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lkml.kernel.org/r/20200923233618.2572849-1-posk@google.com
The old _do_fork() helper doesn't follow naming conventions of in-kernel
helpers for syscalls. The process creation cleanup in [1] didn't change the
name to something more reasonable mainly because _do_fork() was used in quite a
few places. So sending this as a separate series seemed the better strategy.
This commit does two things:
1. renames _do_fork() to kernel_clone() but keeps _do_fork() as a simple static
inline wrapper around kernel_clone().
2. Changes the return type from long to pid_t. This aligns kernel_thread() and
kernel_clone(). Also, the return value from kernel_clone that is surfaced in
fork(), vfork(), clone(), and clone3() is taken from pid_vrn() which returns
a pid_t too.
Follow-up patches will switch each caller of _do_fork() and each place where it
is referenced over to kernel_clone(). After all these changes are done, we can
remove _do_fork() completely and will only be left with kernel_clone().
[1]: 9ba27414f2 ("Merge tag 'fork-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux")
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Link: https://lore.kernel.org/r/20200819104655.436656-2-christian.brauner@ubuntu.com
We currently set this flag *only* on domains whose topology level exactly
match the level where we detect asymmetry (as returned by
asym_cpu_capacity_level()). This is rather problematic.
Say there are two clusters in the system, one with a lone big CPU and the
other with a mix of big and LITTLE CPUs (as is allowed by DynamIQ):
DIE [ ]
MC [ ][ ]
0 1 2 3 4
L L B B B
asym_cpu_capacity_level() will figure out that the MC level is the one
where all CPUs can see a CPU of max capacity, and we will thus set
SD_ASYM_CPUCAPACITY at MC level for all CPUs.
That lone big CPU will degenerate its MC domain, since it would be alone in
there, and will end up with just a DIE domain. Since the flag was only set
at MC, this CPU ends up not seeing any SD with the flag set, which is
broken.
Rather than clearing dflags at every topology level, clear it before
entering the topology level loop. This will properly propagate upwards
flags that are set starting from a certain level.
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Quentin Perret <qperret@google.com>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: https://lore.kernel.org/r/20200817113003.20802-11-valentin.schneider@arm.com
There are some expectations regarding how sched domain flags should be laid
out, but none of them are checked or asserted in
sched_domain_debug_one(). After staring at said flags for a while, I've
come to realize there's two repeating patterns:
- Shared with children: those flags are set from the base CPU domain
upwards. Any domain that has it set will have it set in its children. It
hints at "some property holds true / some behaviour is enabled until this
level".
- Shared with parents: those flags are set from the topmost domain
downwards. Any domain that has it set will have it set in its parents. It
hints at "some property isn't visible / some behaviour is disabled until
this level".
There are two outliers that (currently) do not map to either of these:
o SD_PREFER_SIBLING, which is cleared below levels with
SD_ASYM_CPUCAPACITY. The change was introduced by commit:
9c63e84db2 ("sched/core: Disable SD_PREFER_SIBLING on asymmetric CPU capacity domains")
as it could break misfit migration on some systems. In light of this, we
might want to change it back to make it fit one of the two categories and
fix the issue another way.
o SD_ASYM_CPUCAPACITY, which gets set on a single level and isn't
propagated up nor down. From a topology description point of view, it
really wants to be SDF_SHARED_PARENT; this will be rectified in a later
patch.
Tweak the sched_domain flag declaration to assign each flag an expected
layout, and include the rationale for each flag "meta type" assignment as a
comment. Consolidate the flag metadata into an array; the index of a flag's
metadata can easily be found with log2(flag), IOW __ffs(flag).
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: https://lore.kernel.org/r/20200817113003.20802-5-valentin.schneider@arm.com
To associate the SD flags with some metadata, we need some more structure
in the way they are declared.
Rather than shove that in a free-standing macro list, move the declaration
in a separate file that can be re-imported with different SD_FLAG
definitions. This is inspired by what is done with the syscall
table (see uapi/asm/unistd.h and sys_call_table).
The value assigned to a given SD flag now depends on the order it appears
in sd_flags.h. No change in functionality.
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: https://lore.kernel.org/r/20200817113003.20802-4-valentin.schneider@arm.com
Impose a limit on the number of watches that a user can hold so that
they can't use this mechanism to fill up all the available memory.
This is done by putting a counter in user_struct that's incremented when
a watch is allocated and decreased when it is released. If the number
exceeds the RLIMIT_NOFILE limit, the watch is rejected with EAGAIN.
This can be tested by the following means:
(1) Create a watch queue and attach it to fd 5 in the program given - in
this case, bash:
keyctl watch_session /tmp/nlog /tmp/gclog 5 bash
(2) In the shell, set the maximum number of files to, say, 99:
ulimit -n 99
(3) Add 200 keyrings:
for ((i=0; i<200; i++)); do keyctl newring a$i @s || break; done
(4) Try to watch all of the keyrings:
for ((i=0; i<200; i++)); do echo $i; keyctl watch_add 5 %:a$i || break; done
This should fail when the number of watches belonging to the user hits
99.
(5) Remove all the keyrings and all of those watches should go away:
for ((i=0; i<200; i++)); do keyctl unlink %:a$i; done
(6) Kill off the watch queue by exiting the shell spawned by
watch_session.
Fixes: c73be61ced ("pipe: Add general notification queue support")
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The current_gfp_context() converts a number of PF_MEMALLOC_* per-process
flags into the corresponding GFP_* flags for memory allocation. In that
function, current->flags is accessed 3 times. That may lead to duplicated
access of the same memory location.
This is not usually a problem with minimal debug config options on as the
compiler can optimize away the duplicated memory accesses. With most of
the debug config options on, however, that may not be the case. For
example, the x86-64 object size of the __need_fs_reclaim() in a debug
kernel that calls current_gfp_context() was 309 bytes. With this patch
applied, the object size is reduced to 202 bytes. This is a saving of 107
bytes and will probably be slightly faster too.
Use READ_ONCE() to access current->flags to prevent the compiler from
possibly accessing current->flags multiple times.
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Michel Lespinasse <walken@google.com>
Link: http://lkml.kernel.org/r/20200618212936.9776-1-longman@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merges along the way to 5.9-rc1
resolves conflicts in:
Documentation/ABI/testing/sysfs-class-power
drivers/power/supply/power_supply_sysfs.c
fs/crypto/inline_crypt.c
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ia087834f54fb4e5269d68c3c404747ceed240701
Currently, memalloc_nocma_{save/restore} API that prevents CMA area
in page allocation is implemented by using current_gfp_context(). However,
there are two problems of this implementation.
First, this doesn't work for allocation fastpath. In the fastpath,
original gfp_mask is used since current_gfp_context() is introduced in
order to control reclaim and it is on slowpath. So, CMA area can be
allocated through the allocation fastpath even if
memalloc_nocma_{save/restore} APIs are used. Currently, there is just
one user for these APIs and it has a fallback method to prevent actual
problem.
Second, clearing __GFP_MOVABLE in current_gfp_context() has a side effect
to exclude the memory on the ZONE_MOVABLE for allocation target.
To fix these problems, this patch changes the implementation to exclude
CMA area in page allocation. Main point of this change is using the
alloc_flags. alloc_flags is mainly used to control allocation so it fits
for excluding CMA area in allocation.
Fixes: d7fefcc8de (mm/cma: add PF flag to force non cma alloc)
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
Link: http://lkml.kernel.org/r/1595468942-29687-1-git-send-email-iamjoonsoo.kim@lge.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Steps on the way to 5.9-rc1
Resolves conflicts in:
drivers/irqchip/qcom-pdc.c
include/linux/device.h
net/xfrm/xfrm_state.c
security/lsm_audit.c
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I4aeb3d04f4717714a421721eb3ce690c099bb30a
Pull fork cleanups from Christian Brauner:
"This is cleanup series from when we reworked a chunk of the process
creation paths in the kernel and switched to struct
{kernel_}clone_args.
High-level this does two main things:
- Remove the double export of both do_fork() and _do_fork() where
do_fork() used the incosistent legacy clone calling convention.
Now we only export _do_fork() which is based on struct
kernel_clone_args.
- Remove the copy_thread_tls()/copy_thread() split making the
architecture specific HAVE_COYP_THREAD_TLS config option obsolete.
This switches all remaining architectures to select
HAVE_COPY_THREAD_TLS and thus to the copy_thread_tls() calling
convention. The current split makes the process creation codepaths
more convoluted than they need to be. Each architecture has their own
copy_thread() function unless it selects HAVE_COPY_THREAD_TLS then it
has a copy_thread_tls() function.
The split is not needed anymore nowadays, all architectures support
CLONE_SETTLS but quite a few of them never bothered to select
HAVE_COPY_THREAD_TLS and instead simply continued to use copy_thread()
and use the old calling convention. Removing this split cleans up the
process creation codepaths and paves the way for implementing clone3()
on such architectures since it requires the copy_thread_tls() calling
convention.
After having made each architectures support copy_thread_tls() this
series simply renames that function back to copy_thread(). It also
switches all architectures that call do_fork() directly over to
_do_fork() and the struct kernel_clone_args calling convention. This
is a corollary of switching the architectures that did not yet support
it over to copy_thread_tls() since do_fork() is conditional on not
supporting copy_thread_tls() (Mostly because it lacks a separate
argument for tls which is trivial to fix but there's no need for this
function to exist.).
The do_fork() removal is in itself already useful as it allows to to
remove the export of both do_fork() and _do_fork() we currently have
in favor of only _do_fork(). This has already been discussed back when
we added clone3(). The legacy clone() calling convention is - as is
probably well-known - somewhat odd:
#
# ABI hall of shame
#
config CLONE_BACKWARDS
config CLONE_BACKWARDS2
config CLONE_BACKWARDS3
that is aggravated by the fact that some architectures such as sparc
follow the CLONE_BACKWARDSx calling convention but don't really select
the corresponding config option since they call do_fork() directly.
So do_fork() enforces a somewhat arbitrary calling convention in the
first place that doesn't really help the individual architectures that
deviate from it. They can thus simply be switched to _do_fork()
enforcing a single calling convention. (I really hope that any new
architectures will __not__ try to implement their own calling
conventions...)
Most architectures already have made a similar switch (m68k comes to
mind).
Overall this removes more code than it adds even with a good portion
of added comments. It simplifies a chunk of arch specific assembly
either by moving the code into C or by simply rewriting the assembly.
Architectures that have been touched in non-trivial ways have all been
actually boot and stress tested: sparc and ia64 have been tested with
Debian 9 images. They are the two architectures which have been
touched the most. All non-trivial changes to architectures have seen
acks from the relevant maintainers. nios2 with a custom built
buildroot image. h8300 I couldn't get something bootable to test on
but the changes have been fairly automatic and I'm sure we'll hear
people yell if I broke something there.
All other architectures that have been touched in trivial ways have
been compile tested for each single patch of the series via git rebase
-x "make ..." v5.8-rc2. arm{64} and x86{_64} have been boot tested
even though they have just been trivially touched (removal of the
HAVE_COPY_THREAD_TLS macro from their Kconfig) because well they are
basically "core architectures" and since it is trivial to get your
hands on a useable image"
* tag 'fork-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
arch: rename copy_thread_tls() back to copy_thread()
arch: remove HAVE_COPY_THREAD_TLS
unicore: switch to copy_thread_tls()
sh: switch to copy_thread_tls()
nds32: switch to copy_thread_tls()
microblaze: switch to copy_thread_tls()
hexagon: switch to copy_thread_tls()
c6x: switch to copy_thread_tls()
alpha: switch to copy_thread_tls()
fork: remove do_fork()
h8300: select HAVE_COPY_THREAD_TLS, switch to kernel_clone_args
nios2: enable HAVE_COPY_THREAD_TLS, switch to kernel_clone_args
ia64: enable HAVE_COPY_THREAD_TLS, switch to kernel_clone_args
sparc: unconditionally enable HAVE_COPY_THREAD_TLS
sparc: share process creation helpers between sparc and sparc64
sparc64: enable HAVE_COPY_THREAD_TLS
fork: fold legacy_clone_args_valid() into _do_fork()