The MODULE_IMPORT_NS() macro does not work properly when it is passed a
defined string (a macro), so add a layer of indirection to allow this to happen.
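A general C illustration of the indirection trick (a sketch, not the exact
patch): stringifying through an intermediate macro lets a #define'd argument
expand before it is turned into a string literal.

  /* Hedged sketch: two-level expand-then-stringify indirection. */
  #define STR_1(x) #x
  #define STR(x)   STR_1(x)

  #define NS_NAME  VENDOR_NS   /* hypothetical "defined string" */

  /* STR_1(NS_NAME) yields "NS_NAME", but STR(NS_NAME) yields "VENDOR_NS",
   * which is what a namespace macro like MODULE_IMPORT_NS() needs. */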
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: Matthias Maennich <maennich@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Matthias Maennich <maennich@google.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
(cherry picked from commit ca321ec74322e3c49552fc1ffc80b42d0dbf1a84)
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ibd64ba139912ea10e81ac22490831129b23a31e1
There is a race where the DWC3 runtime resume runs in parallel with
the UDC unbind sequence. This can eventually lead to a scenario where
the run/stop bit is enabled without a valid composition defined.
Thread#1 (handling UDC unbind):
usb_gadget_remove_driver()
-->usb_gadget_disconnect()
-->dwc3_gadget_pullup(0)
--> continue UDC unbind sequence
-->Thread#2 is running in parallel here
Thread#2 (handling next cable connect)
__dwc3_set_mode()
-->pm_runtime_get_sync()
-->dwc3_gadget_resume()
-->dwc->gadget_driver is NOT NULL yet
-->dwc3_gadget_run_stop(1)
--> __dwc3_gadget_start()
...
Fix this by tracking the pullup disable routine and avoiding resuming
the DWC3 gadget. Once the UDC is re-bound, that will trigger the
pullup enable routine, which will handle enabling the DWC3 gadget.
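A minimal sketch of that approach (illustrative names only, not the actual
driver code): record that pullup was disabled and have the runtime-resume
path skip starting the controller until pullup is enabled again.

  /* Hedged sketch with illustrative names. */
  #include <stdbool.h>

  struct gadget_ctx {
          bool pullup_enabled;   /* tracked by the pullup routine */
  };

  static int set_run_stop(struct gadget_ctx *ctx, bool on);   /* illustrative */

  static int gadget_pullup(struct gadget_ctx *ctx, bool is_on)
  {
          ctx->pullup_enabled = is_on;
          return set_run_stop(ctx, is_on);
  }

  static int gadget_runtime_resume(struct gadget_ctx *ctx)
  {
          /* Don't set run/stop while pullup is disabled; the next pullup
           * enable (after UDC re-bind) starts the controller instead. */
          if (!ctx->pullup_enabled)
                  return 0;
          return set_run_stop(ctx, true);
  }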
Acked-by: Felipe Balbi <balbi@kernel.org>
Signed-off-by: Wesley Cheng <wcheng@codeaurora.org>
Link: https://lore.kernel.org/r/20210917021852.2037-1-wcheng@codeaurora.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bug: 200287549
(cherry picked from commit 8217f07a50236779880f13e87f99224cd9117f83)
Change-Id: Ie7d378c5ae5637cca10f254fa403ef2cd3af7d50
Signed-off-by: Wesley Cheng <quic_wcheng@quicinc.com>
For various reasons based on the allocator behaviour and typical
use-cases at the time, when the max32_alloc_size optimisation was
introduced it seemed reasonable to couple the reset of the tracked
size to the update of cached32_node upon freeing a relevant IOVA.
However, since subsequent optimisations focused on helping genuine
32-bit devices make best use of even more limited address spaces, it
is now a lot more likely for cached32_node to be anywhere in a "full"
32-bit address space, and as such more likely for space to become
available from IOVAs below that node being freed.
At this point, the short-cut in __cached_rbnode_delete_update() really
doesn't hold up any more, and we need to fix the logic to reliably
provide the expected behaviour. We still want cached32_node to only move
upwards, but we should reset the allocation size if *any* 32-bit space
has become available.
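In code terms, the fixed helper ends up looking roughly like this
(paraphrased from the upstream change, not quoted verbatim; field names as
in drivers/iommu/iova.c):

  /* Paraphrased sketch of __cached_rbnode_delete_update(): */
  cached_iova = to_iova(iovad->cached32_node);
  if (free == cached_iova ||
      (free->pfn_hi < iovad->dma_32bit_pfn &&
       free->pfn_lo >= cached_iova->pfn_lo))
          iovad->cached32_node = rb_next(&free->node);

  /* Reset the tracked size whenever *any* 32-bit space is freed, not
   * only when the cached node itself moves. */
  if (free->pfn_lo < iovad->dma_32bit_pfn)
          iovad->max32_alloc_size = iovad->dma_32bit_pfn;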
Reported-by: Yunfei Wang <yf.wang@mediatek.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Miles Chen <miles.chen@mediatek.com>
Link: https://lore.kernel.org/r/033815732d83ca73b13c11485ac39336f15c3b40.1646318408.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Bug: 223712131
(cherry picked from commit 5b61343b50590fb04a3f6be2cdc4868091757262
https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git core)
Signed-off-by: Yunfei Wang <yf.wang@mediatek.com>
Change-Id: I5026411dd022c6ddea5c0e4da6e69c7b14162c3f
(cherry picked from commit ec48b1892eb5425c7cb9136f19b00ac14da7f0c2)
Set KMI_GENERATION=3 for 4/6 KMI update
Leaf changes summary: 3064 artifacts changed
Changed leaf types summary: 5 leaf types changed
Removed/Changed/Added functions summary: 11 Removed, 2960 Changed, 0 Added function
Removed/Changed/Added variables summary: 0 Removed, 88 Changed, 0 Added variable
11 Removed functions:
[D] 'function void rndis_deregister(rndis_params*)'
[D] 'function void rndis_free_response(rndis_params*, u8*)'
[D] 'function u8* rndis_get_next_response(rndis_params*, u32*)'
[D] 'function int rndis_msg_parser(rndis_params*, u8*)'
[D] 'function rndis_params* rndis_register(void (void*)*, void*)'
[D] 'function void rndis_set_host_mac(rndis_params*, const u8*)'
[D] 'function int rndis_set_param_dev(rndis_params*, net_device*, u16*)'
[D] 'function int rndis_set_param_medium(rndis_params*, u32, u32)'
[D] 'function int rndis_set_param_vendor(rndis_params*, u32, const char*)'
[D] 'function int rndis_signal_connect(rndis_params*)'
[D] 'function void rndis_uninit(rndis_params*)'
2960 functions with some sub-type change:
[C] 'function void* PDE_DATA(const inode*)' at generic.c:794:1 has some sub-type changes:
CRC (modversions) changed from 0xedd5d462 to 0x6a6d7264
[C] 'function void __ClearPageMovable(page*)' at compaction.c:138:1 has some sub-type changes:
CRC (modversions) changed from 0x3aeae4f2 to 0x2500d324
[C] 'function void __SetPageMovable(page*, address_space*)' at compaction.c:130:1 has some sub-type changes:
CRC (modversions) changed from 0x96ef33e3 to 0x3bc05121
... 2957 omitted; 2960 symbols have only CRC changes
88 Changed variables:
[C] 'pglist_data contig_page_data' was changed at memblock.c:96:1:
size of symbol changed from 5632 to 5760
CRC (modversions) changed from 0xafbdb526 to 0x42a6e924
type of variable changed:
type size changed from 45056 to 46080 (in bits)
1 data member insertion:
'task_struct* mkswapd[16]', at offset 39680 (in bits) at mmzone.h:848:1
there are data member changes:
16 ('int kswapd_order' .. 'atomic_long_t vm_stat[40]') offsets changed (by +1024 bits)
3276 impacted interfaces
[C] 'rq runqueues' was changed at core.c:49:1:
size of symbol changed from 4160 to 4416
CRC (modversions) changed from 0x3be19baa to 0x6043515f
type of variable changed:
type size changed from 33280 to 35328 (in bits)
there are data member changes:
'uclamp_rq uclamp[2]' size changed from 768 to 2688 (in bits) (by +1920 bits)
'unsigned int uclamp_flags' offset changed (by +1920 bits)
65 ('cfs_rq cfs' .. 'u64 android_vendor_data1[96]') offsets changed (by +2048 bits)
3276 impacted interfaces
[C] 'tracepoint __tracepoint_android_vh_aes_decrypt' was changed at fips140.h:40:1:
CRC (modversions) changed from 0xde5b1cc7 to 0x64eaf879
[C] 'tracepoint __tracepoint_android_vh_aes_encrypt' was changed at fips140.h:33:1:
CRC (modversions) changed from 0x10f648a3 to 0x85db9ebb
[C] 'tracepoint __tracepoint_android_vh_aes_expandkey' was changed at fips140.h:26:1:
CRC (modversions) changed from 0xf7274615 to 0x71396455
... 83 omitted; 86 symbols have only CRC changes
'struct pglist_data at mmzone.h:800:1' changed:
details were reported earlier
'struct rq at sched.h:931:1' changed:
details were reported earlier
'struct snd_pcm_runtime at pcm.h:344:1' changed:
type size changed from 6144 to 6400 (in bits)
1 data member insertion:
'mutex buffer_mutex', at offset 2752 (in bits) at pcm.h:401:1
there are data member changes:
14 ('void* private_data' .. 'timespec64 driver_tstamp') offsets changed (by +256 bits)
68 impacted interfaces
'struct uclamp_rq at sched.h:916:1' changed:
type size changed from 384 to 1344 (in bits)
there are data member changes:
type 'uclamp_bucket[5]' of 'uclamp_rq::bucket' changed:
type name changed from 'uclamp_bucket[5]' to 'uclamp_bucket[20]'
array type size changed from 320 to 1280
array type subrange 1 changed length from 5 to 20
3276 impacted interfaces
'struct uclamp_se at sched.h:690:1' changed:
type size hasn't changed
there are data member changes:
2 ('unsigned int active' .. 'unsigned int user_defined') offsets changed (by +2 bits)
3276 impacted interfaces
Bug: 228318757
Signed-off-by: Todd Kjos <tkjos@google.com>
Change-Id: I72e72f07f1d6c95ecca451925d8aaf017db2d404
This reverts commit 39111fc404.
The bug in uclamp has now been fixed, so we can switch back to 20 buckets.
Bug: 186415778
Signed-off-by: Quentin Perret <qperret@google.com>
Change-Id: I4b780e45398318e70cd1ce6e0dcebed45747bdf1
(cherry picked from commit 70094f39c7)
When vendor hooks are added to a file that previously didn't have any
vendor hooks, we end up indirectly including linux/tracepoint.h. This
causes some data types that used to be opaque (forward declared) to become
visible to the code.
Modversions correctly catches this change in visibility, but we don't
really care about the data types made visible when linux/tracepoint.h is
included. So, hide this from modversions in the central vendor_hooks.h file
instead of having to fix this on a case-by-case basis.
This change itself will cause a one time CRC breakage/churn because it's
fixing the existing vendor hook headers, but should reduce unnecessary CRC
churns in the future.
To avoid future pointless CRC churn, vendor hook header files that include
vendor_hooks.h should not include linux/tracepoint.h directly.
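One common way to do that hiding, shown here as a sketch (assuming the
standard __GENKSYMS__ mechanism rather than quoting the actual header
change), is to keep the include invisible to the CRC generator:

  /* Hedged sketch for vendor_hooks.h: genksyms never sees the tracepoint
   * types, so they no longer feed into modversions CRCs. */
  #ifndef __GENKSYMS__
  #include <linux/tracepoint.h>
  #endif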
Bug: 227513263
Bug: 226140073
Signed-off-by: Saravana Kannan <saravanak@google.com>
Change-Id: Ia88e6af11dd94fe475c464eb30a6e5e1e24c938b
Page replacement is handled in the Linux Kernel in one of two ways:
1) Asynchronously via kswapd
2) Synchronously, via direct reclaim
At page allocation time the allocating task is immediately given a page
from the zone free list, allowing it to go right back to work doing
whatever it was doing, probably directly or indirectly executing business
logic.
Just prior to satisfying the allocation, the free page count is checked to
see if it has reached the zone low watermark and, if so, kswapd is awakened.
Kswapd will start scanning pages looking for inactive pages to evict to
make room for new page allocations. The work of kswapd allows tasks to
continue allocating memory from their respective zone free list without
incurring any delay.
When the demand for free pages exceeds the rate that kswapd tasks can
supply them, page allocation works differently. Once the allocating task
finds that the number of free pages is at or below the zone min watermark,
the task will no longer pull pages from the free list. Instead, the task
will run the same CPU-bound routines as kswapd to satisfy its own
allocation by scanning and evicting pages. This is called a direct reclaim.
The time spent performing a direct reclaim can be substantial, often
taking tens to hundreds of milliseconds for small order-0 allocations to
half a second or more for order-9 huge-page allocations. In fact, kswapd is
not actually required on a Linux system. It exists for the sole purpose of
optimizing performance by preventing direct reclaims.
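In rough code terms, the decision described above looks like this (a
simplified sketch of core-MM behaviour, not an excerpt from the allocator):

  /* Simplified sketch of the wakeup/direct-reclaim decision. */
  if (zone_page_state(zone, NR_FREE_PAGES) <= low_wmark_pages(zone))
          wakeup_kswapd(zone, gfp_mask, order, zone_idx(zone));   /* async */

  if (zone_page_state(zone, NR_FREE_PAGES) <= min_wmark_pages(zone))
          /* slow path: the allocating task scans and evicts pages itself */
          try_to_free_pages(zonelist, order, gfp_mask, NULL);     /* direct */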
When memory shortfall is sufficient to trigger direct reclaims, they can
occur in any task that is running on the system. A single aggressive
memory allocating task can set the stage for collateral damage to occur in
small tasks that rarely allocate additional memory. Consider the impact of
injecting an additional 100ms of latency when nscd allocates memory to
facilitate caching of a DNS query.
The presence of direct reclaims 10 years ago was a fairly reliable
indicator that too much was being asked of a Linux system. Kswapd was
likely wasting time scanning pages that were ineligible for eviction.
Adding RAM or reducing the working set size would usually make the problem
go away. Since then hardware has evolved to bring a new struggle for
kswapd. Storage speeds have increased by orders of magnitude while CPU
clock speeds stayed the same or even slowed down in exchange for more
cores per package. This presents a throughput problem for a single
threaded kswapd that will get worse with each generation of new hardware.
Test Details
NOTE: The tests below were run with shadow entries disabled. See the
associated patch and cover letter for details
The tests below were designed with the assumption that a kswapd bottleneck
is best demonstrated using filesystem reads. This way, the inactive list
will be full of clean pages, simplifying the analysis and allowing kswapd
to achieve the highest possible steal rate. Maximum steal rates for kswapd
are likely to be the same or lower for any other mix of page types on the
system.
Tests were run on a 2U Oracle X7-2L with 52 Intel Xeon Skylake 2GHz cores,
756GB of RAM and 8 x 3.6 TB NVMe Solid State Disk drives. Each drive has
an XFS file system mounted separately as /d0 through /d7. SSD drives
require multiple concurrent streams to show their potential, so I created
eleven 250GB zero-filled files on each drive so that I could test with
parallel reads.
The test script runs in multiple stages. At each stage, the number of dd
tasks run concurrently is increased by 2. I did not include all of the
test output for brevity.
During each stage dd tasks are launched to read from each drive in a round
robin fashion until the specified number of tasks for the stage has been
reached. Then iostat, vmstat and top are started in the background with 10
second intervals. After five minutes, all of the dd tasks are killed and
the iostat, vmstat and top output is parsed in order to report the
following:
CPU consumption
- sy - aggregate kernel mode CPU consumption from vmstat output. The value
doesn't tend to fluctuate much so I just grab the highest value.
Each sample is averaged over 10 seconds
- dd_cpu - CPU consumption for all of the dd tasks, averaged across the top
samples since there is a lot of variation.
Throughput
- in Kbytes
- Command is iostat -x -d 10 -g total
This first test performs reads using O_DIRECT in order to show the maximum
throughput that can be obtained using these drives. It also demonstrates
how rapidly throughput scales as the number of dd tasks are increased.
The dd command for this test looks like this:
Command Used: dd iflag=direct if=/d${i}/$n of=/dev/null bs=4M
Test #1: Direct IO
dd sy dd_cpu throughput
6 0 2.33 14726026.40
10 1 2.95 19954974.80
16 1 2.63 24419689.30
22 1 2.63 25430303.20
28 1 2.91 26026513.20
34 1 2.53 26178618.00
40 1 2.18 26239229.20
46 1 1.91 26250550.40
52 1 1.69 26251845.60
58 1 1.54 26253205.60
64 1 1.43 26253780.80
70 1 1.31 26254154.80
76 1 1.21 26253660.80
82 1 1.12 26254214.80
88 1 1.07 26253770.00
90 1 1.04 26252406.40
Throughput was close to peak with only 22 dd tasks. Very little system CPU
was consumed, as expected, since the drives DMA directly into the user
address space when using direct IO.
In this next test, the iflag=direct option is removed and we only run the
test until the pgscan_kswapd from /proc/vmstat starts to increment. At
that point metrics are parsed and reported and the pagecache contents are
dropped prior to the next test. Lather, rinse, repeat.
Test #2: standard file system IO, no page replacement
dd sy dd_cpu throughput
6 2 28.78 5134316.40
10 3 31.40 8051218.40
16 5 34.73 11438106.80
22 7 33.65 14140596.40
28 8 31.24 16393455.20
34 10 29.88 18219463.60
40 11 28.33 19644159.60
46 11 25.05 20802497.60
52 13 26.92 22092370.00
58 13 23.29 22884881.20
64 14 23.12 23452248.80
70 15 22.40 23916468.00
76 16 22.06 24328737.20
82 17 20.97 24718693.20
88 16 18.57 25149404.40
90 16 18.31 25245565.60
Each read has to pause after the buffer in kernel space is populated while
those pages are added to the pagecache and copied into the user address
space. For this reason, more parallel streams are required to achieve peak
throughput. The copy operation consumes substantially more CPU than direct
IO as expected.
The next test measures throughput after kswapd starts running. This is the
same test only we wait for kswapd to wake up before we start collecting
metrics. The script actually keeps track of a few things that were not
mentioned earlier. It tracks direct reclaims and page scans by watching
the metrics in /proc/vmstat. CPU consumption for kswapd is tracked the
same way it is tracked for dd.
Since the test is 100% reads, you can assume that the page steal rate for
kswapd and direct reclaims is almost identical to the scan rate.
Test #3: 1 kswapd thread per node
dd sy dd_cpu kswapd0 kswapd1 throughput dr pgscan_kswapd pgscan_direct
10 4 26.07 28.56 27.03 7355924.40 0 459316976 0
16 7 34.94 69.33 69.66 10867895.20 0 872661643 0
22 10 36.03 93.99 99.33 13130613.60 489 1037654473 11268334
28 10 30.34 95.90 98.60 14601509.60 671 1182591373 15429142
34 14 34.77 97.50 99.23 16468012.00 10850 1069005644 249839515
40 17 36.32 91.49 97.11 17335987.60 18903 975417728 434467710
46 19 38.40 90.54 91.61 17705394.40 25369 855737040 582427973
52 22 40.88 83.97 83.70 17607680.40 31250 709532935 724282458
58 25 40.89 82.19 80.14 17976905.60 35060 657796473 804117540
64 28 41.77 73.49 75.20 18001910.00 39073 561813658 895289337
70 33 45.51 63.78 64.39 17061897.20 44523 379465571 1020726436
76 36 46.95 57.96 60.32 16964459.60 47717 291299464 1093172384
82 39 47.16 55.43 56.16 16949956.00 49479 247071062 1134163008
88 42 47.41 53.75 47.62 16930911.20 51521 195449924 1180442208
90 43 47.18 51.40 50.59 16864428.00 51618 190758156 1183203901
In the previous test where kswapd was not involved, the system-wide kernel
mode CPU consumption with 90 dd tasks was 16%. In this test CPU consumption
with 90 tasks is at 43%. With 52 cores, and two kswapd tasks (one per NUMA
node), kswapd can only be responsible for a little over 4% of the increase.
The rest is likely caused by 51,618 direct reclaims that scanned 1.2
billion pages over the five minute time period of the test.
Same test, more kswapd tasks:
Test #4: 4 kswapd threads per node
dd sy dd_cpu kswapd0 kswapd1 throughput dr pgscan_kswapd pgscan_direct
10 5 27.09 16.65 14.17 7842605.60 0 459105291 0
16 10 37.12 26.02 24.85 11352920.40 15 920527796 358515
22 11 36.94 37.13 35.82 13771869.60 0 1132169011 0
28 13 35.23 48.43 46.86 16089746.00 0 1312902070 0
34 15 33.37 53.02 55.69 18314856.40 0 1476169080 0
40 19 35.90 69.60 64.41 19836126.80 0 1629999149 0
46 22 36.82 88.55 57.20 20740216.40 0 1708478106 0
52 24 34.38 93.76 68.34 21758352.00 0 1794055559 0
58 24 30.51 79.20 82.33 22735594.00 0 1872794397 0
64 26 30.21 97.12 76.73 23302203.60 176 1916593721 4206821
70 33 32.92 92.91 92.87 23776588.00 3575 1817685086 85574159
76 37 31.62 91.20 89.83 24308196.80 4752 1812262569 113981763
82 29 25.53 93.23 92.33 24802791.20 306 2032093122 7350704
88 43 37.12 76.18 77.01 25145694.40 20310 1253204719 487048202
90 42 38.56 73.90 74.57 22516787.60 22774 1193637495 545463615
By increasing the number of kswapd threads, throughput increased by ~50%
while kernel mode CPU utilization decreased or stayed the same, likely due
to a decrease in the number of parallel tasks at any given time doing page
replacement.
Signed-off-by: Buddy Lumpkin <buddy.lumpkin@oracle.com>
Bug: 201263306
Link: https://lore.kernel.org/lkml/1522661062-39745-1-git-send-email-buddy.lumpkin@oracle.com
[charante@codeaurora.org]: Changes made to select number of kswapds through uapi
Change-Id: I8425cab7f40cbeaf65af0ea118c1a9ac7da0930e
Signed-off-by: Charan Teja Reddy <charante@codeaurora.org>
[quic_vjitta@quicinc.com]: Changes made to move multiple kswapd threads logic to vendor hooks
Signed-off-by: Vijayanand Jitta <quic_vjitta@quicinc.com>
(cherry picked from commit 0d61a651e4)
This reverts commit 07566786dc.
It fixes CVE-2022-1048 and originally had to be reverted due to ABI
issues. Add it back now, as the ABI "break" is allowed and we want to
fix this real problem.
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I2989057ebdab8269c2f98702a9afb01d4814285b
This reverts commit f9e40dc812.
It fixes CVE-2022-1048 and originally had to be reverted due to ABI
issues. Add it back now, as the ABI "break" is allowed and we want to
fix this real problem.
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ife1c36ac31c25f77aab42b7d96e9552d4acc3184
This reverts commit 9f368dfefd.
It fixes CVE-2022-1048 and originally had to be reverted due to ABI
issues. Add it back now, as the ABI "break" is allowed and we want to
fix this real problem.
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I00cdc5c3d932a882ec759f455024a3e14d311032
This reverts commit 162cbdd807.
It fixes CVE-2022-1048 and originally had to be reverted due to ABI
issues. Add it back now, as the ABI "break" is allowed and we want to
fix this real problem.
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I230096f0f827d4a49b0574a5cf27667ef132d5d9
It's obsolete, only works for unsupported Windows hosts, is totally
insecure, and should never be used. Remove it in order to remove
a potential attack vector on Android systems.
NCM is a much better interface to use and it also works on other
operating system hosts.
Bug: 157965270
Bug: 226303025
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I45acc8b894220cdc9f170f9d5428aca195e9af38
When dealing with a guest with SVE enabled, we don't populate
the shadow SVE state, nor pin the SVE state at S1 EL2.
Fix both issues in one go.
Bug: 227292021
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Marc Zyngier <mzyngier@google.com>
Change-Id: I88dc7e9c84e5970ec2466a0aa98ad4e3c94711a0
It isn't always obvious which vcpu (host or shadow) should be used
in which context, nor whether the provided vcpu is valid or not.
To make this less error prone, provide a helper that will always
return a vcpu in the HYP address space as well as the corresponding
shadow state if we're in protected mode. If the host-provided vcpu
doesn't match the loaded vcpu, NULL is returned for both pointers.
In non-protected mode, no state is provided, of course, but the vcpu
is converted to its HYP pointer.
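As a purely illustrative sketch of the helper's shape (all names below are
hypothetical; the real code lives in the pKVM nVHE sources):

  /* Hedged sketch, hypothetical names throughout. */
  struct shadow_state;    /* hypothetical pKVM-only per-vcpu state */

  static struct kvm_vcpu *resolve_vcpu(struct kvm_vcpu *host_vcpu,
                                       struct shadow_state **shadow)
  {
          struct kvm_vcpu *vcpu = kern_hyp_va(host_vcpu); /* host VA -> HYP VA */

          *shadow = NULL;
          if (!is_protected_kvm_enabled())
                  return vcpu;              /* non-protected: no shadow state */

          /* Protected mode: only the currently loaded vcpu is trusted. */
          if (vcpu != loaded_vcpu())        /* hypothetical accessor */
                  return NULL;

          *shadow = shadow_state_of(vcpu);  /* hypothetical accessor */
          return vcpu;
  }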
Bug: 227292021
Bug: 227768863
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Marc Zyngier <mzyngier@google.com>
Change-Id: Idcfc4e042ff05d97ae52f7991666935c1c570f10
Now that we have a bionic-based sysroot available from the NDK, we can
build the header tests.
Bug: 190019968
Change-Id: I661851b1215fa75e39bd0874bf777299b73fe12e
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
In order to support CONFIG_UAPI_HEADER_TEST=y with a bionic sysroot
using prebuilt bionic from the NDK, we need to set a different target
triple for USERCFLAGS than what's used when cross compiling the kernel
(so that the correct headers and libc.{a|so} are found+used).
Bug: 190019968
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Change-Id: Ib3e60c41b862cda9f79ff8d2c812aaa8bfb571af
These aren't portable yet when building with a Bionic-based sysroot.
Disable building them for now so that we can land support for
UAPI_HEADER_TEST.
Bug: 190019968
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Change-Id: Ice3d3c55bbf79dd08265f168e49e2058231c181d
In include/uapi/linux/tipc_config.h, there's a comment that it includes
arpa/inet.h for ntohs, but ntohs is not defined in any UAPI header. For
now, reuse the definitions from include/linux/byteorder/generic.h, since
the various conversion functions do exist in UAPI headers:
include/uapi/linux/byteorder/big_endian.h
include/uapi/linux/byteorder/little_endian.h
We would like to get to the point where we can build UAPI header tests
with -nostdinc, meaning that kernel UAPI headers should not have a
circular dependency on libc headers.
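For example (a hedged sketch, not part of the patch itself), a UAPI-only
equivalent of ntohs() can be built from those exported helpers:

  /* Hedged sketch: byte-order conversion using only UAPI headers. */
  #include <linux/types.h>
  #include <asm/byteorder.h>   /* selects big_endian.h or little_endian.h */

  static inline __u16 uapi_ntohs(__be16 x)
  {
          return __be16_to_cpu(x);   /* UAPI counterpart of ntohs() */
  }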
Link: https://android-review.googlesource.com/c/platform/bionic/+/2048127
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lore.kernel.org/netdev/20220404175448.46200-1-ndesaulniers@google.com/
Bug: 190019968
Change-Id: I5c45248fb88f03024da809b503ff3b9cd4ed66a9
When you compile-test UAPI headers (CONFIG_UAPI_HEADER_TEST=y) with
Clang, they are currently compiled for the host target (likely x86_64)
regardless of the given ARCH=.
In fact, some exported headers include libc headers. For example,
include/uapi/linux/agpgart.h includes <stdlib.h> after being exported.
The header search paths should match the target we are compiling
them for.
Pick up the --target triple from KBUILD_CFLAGS in the same way as
commit 7f58b487e9 ("kbuild: make Clang build userprogs for target
architecture").
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Bug: 190019968
Change-Id: I9f4867b4fc1d0c4cf1af2bb3d6f53c3a8a8f4437
(cherry picked from commit 9fbed27a7a1101c926718dfa9b49aff1d04477b5)
The 'kunit_test' member variable in task_struct is defined
under CONFIG_KUNIT. Besides, there are supporting functions
in slub and kasan which get conditionally compiled out.
Allow kunit to be built as a vendor module by removing
compile-time dependencies.
Bug: 215096354
Change-Id: If57b1df6e479aa0388aabc53af5ae10e20a844b2
Signed-off-by: Shiraz Hashim <quic_shashim@quicinc.com>
Add two new symbols to the aarch64 kernel ABI:
* pkvm_iommu_sysmmu_sync_register
* pkvm_iommu_finalize
The former allows vendor modules to register a SYSMMU_SYNC device with
the hypervisor, and the latter tells the hypervisor to stop accepting
new device registrations.
Bug: 190463801
Signed-off-by: David Brazdil <dbrazdil@google.com>
Change-Id: I6c6948d94cb6494f07d52b4e2b7e91db40e2fcd6
Add a new hypercall that the host can use to inform the hypervisor that
all hypervisor-controlled IOMMUs have been registered and no new
registrations should be allowed. This will typically be called at the
end of kernel module initialization phase.
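A hedged usage sketch (the exact signature is not shown in this log;
assuming it takes no arguments and returns an errno-style int) from a
vendor driver's init path:

  /* Hedged sketch, hypothetical driver/helper names. */
  static int __init vendor_s2mpu_init(void)
  {
          int ret = register_all_iommu_devices();   /* hypothetical */

          if (ret)
                  return ret;

          /* Seal registration: no further IOMMU devices will be accepted. */
          return pkvm_iommu_finalize();
  }
  module_init(vendor_s2mpu_init);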
Bug: 190463801
Signed-off-by: David Brazdil <dbrazdil@google.com>
Change-Id: I8c175310d5b262a67947443c5a0154056a8ebf3e
The IOMMU DABT handler currently checks if the device is considered
powered by hyp before resolving the request. If the power tracking does
not reflect reality, the IOMMU may trigger issues in the host, but the
incorrect state prevents the issue from being diagnosed.
Drop the powered check from the generic IOMMU code. The host accessing
the device's SFR means that it assumes it is powered, and individual
drivers can choose to reject that DABT request.
Bug: 224891559
Bug: 190463801
Signed-off-by: David Brazdil <dbrazdil@google.com>
Change-Id: I1c132c4030a61a90be4675867c9658e3bc696118
SysMMU_SYNC devices expose an interface to start a sync counter and
poll its SFR until the device signals that all memory transactions in
flight at the start have drained. This gives the hypervisor a reliable
indicator that S2MPU invalidation has fully completed and all new
transactions will use the new MPTs.
Add a new pKVM IOMMU driver that the host can use to register
SysMMU_SYNCs. Each device is expected to be a supplier to exactly one
S2MPU (parent), but multiple SYNCs can supply a single S2MPU.
To keep things simple, the SYNCs do not implement suspend/resume and are
assumed to follow the power transitions of their parent.
Following an invalidation, the S2MPU driver iterates over its children
and waits for each SYNC to signal that its transactions have drained.
The algorithm currently waits on each SYNC in turn. If latency proves to
be an issue, this could be optimized to initiate a SYNC on all powered
devices before starting to poll.
Bug: 190463801
Signed-off-by: David Brazdil <dbrazdil@google.com>
Change-Id: I45b832fd11d76b65987935c8548e2a214ee2fa2a
In preparation for adding new IOMMU devices that act as suppliers to
others, add the notion of a parent IOMMU device. Such a device must be
registered after its parent and the driver of the parent device must
validate the addition.
The relation has no generic implications; it is up to drivers to make
use of it.
Bug: 190463801
Signed-off-by: David Brazdil <dbrazdil@google.com>
Change-Id: I4ee3675e5529bb73ad4546fa32380f237f054177
In preparation for needing to validate more aspects of a device that is
about to be registered, change the callback to accept the to-be-added
'struct pkvm_iommu' rather than individual inputs.
Bug: 190463801
Signed-off-by: David Brazdil <dbrazdil@google.com>
Change-Id: I3fb911e4280c220ddd779cf6a5fc9c302a5617f7
Private EL2 mappings currently cannot be removed. Move the creation of
IOMMU device mappings to the end of the registration function so that
other errors do not result in unnecessary mappings.
Bug: 190463801
Signed-off-by: David Brazdil <dbrazdil@google.com>
Change-Id: I3139e9af3345f157295eb72441a7cf3cc055116d
Memory for IOMMU device entries gets allocated from a pool donated by
the host. It is possible for pkvm_iommu_register() to allocate the
memory and then fail, in which case the memory remains unused but not
freed.
Refactor the code such that the host lock covers the entire section
where the memory is allocated. This way we can return the memory to
the linear allocator if an error is returned.
Bug: 190463801
Signed-off-by: David Brazdil <dbrazdil@google.com>
Change-Id: I8c1650ba3e545741144d793de506e93c4066896f
Currently __pkvm_iommu_pm_notify always changes the value of
dev->powered following a suspend/resume attempt. This could potentially
be abused to force the hypervisor to stop issuing updates to an S2MPU,
preserving an old/invalid state.
Modify it to only update the power state if suspend/resume was successful.
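Sketch of the intended behaviour (callback and field names are illustrative,
apart from dev->powered, which is mentioned above):

  /* Hedged sketch: commit the new power state only on success. */
  static int iommu_pm_notify(struct pkvm_iommu *dev, bool resume)
  {
          int ret = resume ? dev->ops->resume(dev) : dev->ops->suspend(dev);

          if (!ret)
                  dev->powered = resume; /* was previously updated unconditionally */
          return ret;
  }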
Bug: 190463801
Signed-off-by: David Brazdil <dbrazdil@google.com>
Change-Id: I285fc822e9fc926c49b9b5e69446790e1edccafb
The synchronous wakeup interface is available only for the
interruptible wakeup. Add it for normal wakeup and use this
synchronous wakeup interface to wake up the userspace daemon.
The scheduler can make use of this hint to find a better CPU for
the waker task (a rough sketch of the interface follows the table below).
With this change, the performance numbers for compress, decompress
and copy use-cases on the /sdcard path have improved by ~30%.
Use-case details:
1. copy 10000 files of each 4k size into /sdcard path
2. use any File explorer application that has compress/decompress
support
3. start compress/decompress and capture the time.
-------------------------------------------------
| Default | wakeup support | Improvement/Diff |
-------------------------------------------------
| 13.8 sec | 9.9 sec | 3.9 sec (28.26%) |
-------------------------------------------------
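A minimal sketch of the added wakeup flavour (assuming it mirrors the
existing wake_up_interruptible_sync() macro; not a verbatim excerpt):

  /* Hedged sketch, mirroring the interruptible variant: */
  #define wake_up_sync(x)  __wake_up_sync((x), TASK_NORMAL)

  /* Caller side: hint that the waker is about to sleep, so the woken
   * daemon may be placed on a nearby/better CPU. */
  wake_up_sync(&daemon_waitq);   /* hypothetical waitqueue */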
Co-developed-by: Pavankumar Kondeti <quic_pkondeti@quicinc.com>
Signed-off-by: Pradeep P V K <quic_pragalla@quicinc.com>
Bug: 216261533
Link: https://lore.kernel.org/lkml/1638780405-38026-1-git-send-email-quic_pragalla@quicinc.com/
Change-Id: I9ac89064e34b1e0605064bf4d2d3a310679cb605
Signed-off-by: Pradeep P V K <quic_pragalla@quicinc.com>
Signed-off-by: Alessio Balsini <balsini@google.com>
Currently, the fixed 512KB prealloc buffer size is too large for
tiny-memory kernels (such as 16MB of memory). This patch adds the module
option "prealloc_buffer_size_kbytes" to specify the prealloc buffer size.
It's suitable for cards which use the generic dmaengine PCM driver
with no config.
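Roughly, the new option looks like this (a sketch based on the description
above, not a verbatim excerpt of the driver):

  /* Hedged sketch: module option replacing the fixed 512KB default. */
  static unsigned long prealloc_buffer_size_kbytes = 512;
  module_param(prealloc_buffer_size_kbytes, ulong, 0444);
  MODULE_PARM_DESC(prealloc_buffer_size_kbytes,
                   "Preallocate DMA buffer size (KB), default 512");

  /* Used when sizing the preallocated substream buffer: */
  size_t prealloc = prealloc_buffer_size_kbytes * 1024;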
Signed-off-by: Sugar Zhang <sugar.zhang@rock-chips.com>
Link: https://lore.kernel.org/r/1632394246-59341-1-git-send-email-sugar.zhang@rock-chips.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Bug: 226673331
Change-Id: I13e4e65502e8e9625577e9a9bb9f3e02154f31f4
(cherry picked from commit b0e3b0a7078d71455747025e7deee766d4d43432)
Signed-off-by: Zhipeng Wang <zhipeng.wang_1@nxp.com>
For the SoC's skin temperature, we have to use more stringent temperature
control so that IPA can monitor and mitigate temperature earlier
and faster, so add it to meet the platform thermal requirement.
Bug: 211564753
Signed-off-by: Jeson Gao <jeson.gao@unisoc.com>
Signed-off-by: Di Shen <di.shen@unisoc.com>
Change-Id: Iaef87287eef93d6fdbc3c58c93f70c1525e38296
(cherry picked from commit 6709f523251f77dc1e9ea643668c630db1f7db80)
android_rvh_effective_cpu_util:
Used in EAS/schedutil/thermal to perform vendor-specific cpu util calculations.
effective_cpu_util() is called when thermal calculates the dynamic power;
this is a non-atomic context, so make the hook restricted.
Bug: 226686099
Test: build pass
Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com>
Change-Id: I6fd77f44ca4328f5ef37d96989aa2e08d65e29bb
virtio pci config structures may in the future have non-standard bar
values in the bar field. We should anticipate this by skipping any
structures containing such a reserved value.
The bar value should never change: check for harmfully modified values
when we re-read it from the config space in vp_modern_map_capability().
Also clean up an existing check to consistently use PCI_STD_NUM_BARS.
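A hedged sketch of the kind of check involved (paraphrasing rather than
quoting the patch):

  /* Hedged sketch: treat out-of-range BAR indices as reserved and skip. */
  u8 bar;

  pci_read_config_byte(dev, pos + offsetof(struct virtio_pci_cap, bar), &bar);
  if (bar >= PCI_STD_NUM_BARS)
          continue;   /* reserved value: ignore this capability structure */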
Signed-off-by: Keir Fraser <keirf@google.com>
Link: https://lore.kernel.org/r/20220323140727.3499235-1-keirf@google.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 3f63a1d7f6f500b6891b1003cec3e23ea4996a2e)
Bug: 222232623
Signed-off-by: Keir Fraser <keirf@google.com>
Change-Id: Idbba48154a051cf173b9cb0bd40c77fcf02902a4
This reverts commit 9e35276a5344f74d4a3600fc4100b3dd251d5c56. Issues
were reported for drivers that use affinity-managed IRQs, where
manually toggling the IRQ status is not expected. We also forgot to
enable the interrupts in the restore path.
In the future, we will rework the interrupt hardening.
Fixes: 9e35276a5344 ("virtio_pci: harden MSI-X interrupts")
Reported-by: Marc Zyngier <maz@kernel.org>
Reported-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20220323031524.6555-2-jasowang@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit eb4cecb453a19b34d5454b49532e09e9cb0c1529)
Bug: 196772804
Signed-off-by: Keir Fraser <keirf@google.com>
Change-Id: I05264d9e61d558522a8a20cf87399aa3578b3a6e
This reverts commit 080cd7c3ac8701081d143a15ba17dd9475313188, since
the MSI-X interrupt hardening will be reverted in the next patch. We
will rework the interrupt hardening in the future.
Fixes: 080cd7c3ac87 ("virtio-pci: harden INTX interrupts")
Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20220323031524.6555-1-jasowang@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 7b79edfb862d6b1ecc66479419ae67a7db2d02e3)
Bug: 196772804
Signed-off-by: Keir Fraser <keirf@google.com>
Change-Id: I11ea35dcff3cfd43a535acf2033d7921f39c8be7
We no longer need to map the host's .rodata and .bss sections in the
pkvm hypervisor, so let's remove those mappings. This will avoid
creating dependencies at EL2 on host-controlled data structures.
Signed-off-by: Quentin Perret <qperret@google.com>
Bug: 225169428
Change-Id: I0fcb0e1b34d3c7c0c226b3fd30cdec0e8d7bfb44
The pkvm hypervisor may need to read the kvm_vgic_global_state variable
at EL2. Make sure to explicitly map it in its stage-1 page-table
rather than relying on mapping all of .rodata.
Signed-off-by: Quentin Perret <qperret@google.com>
Bug: 225169428
Change-Id: I72d1eba78fb6b7593d236539cd81269480856fdf
In pKVM mode, we can't trust the host not to mess with the hypervisor
per-cpu offsets, so let's move the array containing them to the nVHE
code.
Signed-off-by: Quentin Perret <qperret@google.com>
Bug: 225169428
Change-Id: I9ef4175ce9cf00d6ff1c0e358551a565358f2408
The host KVM PMU code can currently index kvm_arm_hyp_percpu_base[]
through this_cpu_ptr_hyp_sym(), but will not actually dereference that
pointer when protected KVM is enabled. In preparation for making
kvm_arm_hyp_percpu_base[] inaccessible to the host, let's make sure the
indexing in hyp per-cpu pages is also done after the static key check to
avoid spurious accesses to EL2-private data from EL1.
Signed-off-by: Quentin Perret <qperret@google.com>
Bug: 225169428
Change-Id: I3f4e3f7ee789c31a1ae1f67e07edf8fb34f520b9
Add code to head.S's el2_setup to detect MPAM and disable any EL2 traps.
This register resets to an unknown value; setting it to the default
partitions/pmg before we enable the MMU is the best thing to do.
Kexec/kdump will depend on this if the previous kernel left the CPU
configured with a restrictive configuration.
If Linux is booted at the highest implemented exception level, el2_setup
will clear the enable bit, disabling MPAM.
Signed-off-by: James Morse <james.morse@arm.com>
Bug: 221768437
(cherry picked from commit fa0ff38f06b397d8a92d88eb8083c2c5a20ac87f
git://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git mpam/snapshot/v5.16)
Change-Id: I2758f7f7b236d09a207e13d1165efb6887e8611a
Signed-off-by: Valentin Schneider <Valentin.Schneider@arm.com>
[bm: amended commit msg, dropped config option and switched to named labels]
Signed-off-by: Beata Michalska <beata.michalska@arm.com>
This is a partial cherry-pick of commit:
7fe77616f156 ("arm64: cpufeature: discover CPU support for MPAM")
from git://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git
Bug: 221768437
Change-Id: I77101abb07f9b73dbc7cc2a53ac44fbf772f0b1d
Signed-off-by: Valentin Schneider <Valentin.Schneider@arm.com>
Signed-off-by: Beata Michalska <beata.michalska@arm.com>
During the 5.15.27 LTS merge, the change from the patch below might have
been omitted.
- 1921d1fd0e ("arm64: Mark start_backtrace() notrace and NOKPROBE_SYMBOL")
To correct this, we need to restore the NOKPROBE_SYMBOL macro along
with the EXPORT_SYMBOL_GPL macro that was added by the patch below.
- b7ca6bc390 ("ANDROID: arm64: stacktrace: export start_backtrace symbol")
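In other words, both annotations should remain on the symbol (sketch):

  /* Sketch: keep both annotations on start_backtrace(). */
  NOKPROBE_SYMBOL(start_backtrace);
  EXPORT_SYMBOL_GPL(start_backtrace);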
Bug: 227151759
Signed-off-by: Chanho Park <chanho61.park@samsung.com>
Change-Id: I2e4cf0654c6bd65a8ad606709642a27cdc5b2f72
(recover commit from 1921d1fd0e)