Pablo Neira Ayuso says:
====================
netfilter/IPVS updates for net-next
The following patchset contains Netfilter updates for your net-next tree,
they are:
* Add full port randomization support. Some crazy researchers found a way
to reconstruct the secure ephemeral ports that are allocated in random mode
by sending off-path bursts of UDP packets to overrun the socket buffer of
the DNS resolver to trigger retransmissions, then if the timing for the
DNS resolution done by a client is larger than usual, then they conclude
that the port that received the burst of UDP packets is the one that was
opened. It seems a bit aggressive method to me but it seems to work for
them. As a result, Daniel Borkmann and Hannes Frederic Sowa came up with a
new NAT mode to fully randomize ports using prandom.
* Add a new classifier to x_tables based on the socket net_cls set via
cgroups. These includes two patches to prepare the field as requested by
Zefan Li. Also from Daniel Borkmann.
* Use prandom instead of get_random_bytes in several locations of the
netfilter code, from Florian Westphal.
* Allow to use the CTA_MARK_MASK in ctnetlink when mangling the conntrack
mark, also from Florian Westphal.
* Fix compilation warning due to unused variable in IPVS, from Geert
Uytterhoeven.
* Add support for UID/GID via nfnetlink_queue, from Valentina Giusti.
* Add IPComp extension to x_tables, from Fan Du.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates
This series contains updates to i40e and pci_regs.h.
Anjali provides a patch to prevent messages from stray HMC events, except
at interrupt message level, and refactors the HMC error handling.
Catherine adds routines in probe to populate/check PCI bus speed and width,
then verify we are in a 8GT/s x8 PCIe slot and warn when we are not.
Shannon adds Wake-on-LAN support for i40e, fixes curly brace use as well as
return type for i40e_vsi_clear_rings().
Joseph implements receive offload for VXLAN for i40e, where the hardware
supports checksum offload/verification of the inner/outer header.
Mitch provides the bulk of the changes, where he refactors the VF reset
code so that it works on real hardware. Then does code cleanup by
calling existing functions to enable and disable queues for VFs and
remove unused functions. Removes a unnecessary log messages that are
seen at every VF reset, for example complaining about disabling queues
that are already disabled. Fixes an error return when the VF asks to
add an invalid MAC address and if the VF sends a bad message, make it
more informative about what is actually going on.
Jesse refactors the LED function to flash LED lights correctly.
v2:
- removed patch 5 "i40e: add set settings and pauseparam" based on
feedback from Ben Hutchings, will re-work that patch for later
submission
- Added patch "i40e: Implementation of vxlan ndo's" from Joseph to
address Or Gerlitz's questions and concerns. This patch adds the
implementation for the VXLAN ndo's and allows the hardware to do
receive checksum offload for inner packets on the UDP ports that
VXLAN notifies us about.
- Added patch "i40e: using for_each_set_bit to simplify the code"
from Wei Yongjun. This patch uses for_each_set_bit() to simply
the code.
v3:
- fixed indentation issue in patch 11 based on feedback from
Sergei Shtylyov.
Sorry for the delayed release of v4, it was delayed to the holidays.
v4:
- Addressed Or Gerlitz's concerns about trying to get a hold of a mutex
while holding a spin lock in patch 6 by executing the AQ commands from
a subtask.
- Addressed David Miller's Kconfig concerns by creating a Kconfig VXLAN
option for i40e and wrapped appropriate code with the config option in
patch 6.
- Updated patch 7 based on the changes made in patch 6 in the above two
bullets.
v5:
- Added the patch to pci_regs.h based on David Miller's feedback to add
PCI defines for speed and width
- Updated patch 3 description to better explain the changes based on
feedback from David Miller
- Updated patch 4 to use the newly added defines to pci_regs.h instead
of local defines
- Updated patch 7 to use <net/vxlan.h> in the #include based on feedback
from David Miller
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add IFLA_BOND_AD_SELECT to allow get/set of bonding parameter
ad_select via netlink.
Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add IFLA_BOND_AD_LACP_RATE to allow get/set of bonding parameter
lacp_rate via netlink.
Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It would be useful e.g. in a server or desktop environment to have
a facility in the notion of fine-grained "per application" or "per
application group" firewall policies. Probably, users in the mobile,
embedded area (e.g. Android based) with different security policy
requirements for application groups could have great benefit from
that as well. For example, with a little bit of configuration effort,
an admin could whitelist well-known applications, and thus block
otherwise unwanted "hard-to-track" applications like [1] from a
user's machine. Blocking is just one example, but it is not limited
to that, meaning we can have much different scenarios/policies that
netfilter allows us than just blocking, e.g. fine grained settings
where applications are allowed to connect/send traffic to, application
traffic marking/conntracking, application-specific packet mangling,
and so on.
Implementation of PID-based matching would not be appropriate
as they frequently change, and child tracking would make that
even more complex and ugly. Cgroups would be a perfect candidate
for accomplishing that as they associate a set of tasks with a
set of parameters for one or more subsystems, in our case the
netfilter subsystem, which, of course, can be combined with other
cgroup subsystems into something more complex if needed.
As mentioned, to overcome this constraint, such processes could
be placed into one or multiple cgroups where different fine-grained
rules can be defined depending on the application scenario, while
e.g. everything else that is not part of that could be dropped (or
vice versa), thus making life harder for unwanted processes to
communicate to the outside world. So, we make use of cgroups here
to track jobs and limit their resources in terms of iptables
policies; in other words, limiting, tracking, etc what they are
allowed to communicate.
In our case we're working on outgoing traffic based on which local
socket that originated from. Also, one doesn't even need to have
an a-prio knowledge of the application internals regarding their
particular use of ports or protocols. Matching is *extremly*
lightweight as we just test for the sk_classid marker of sockets,
originating from net_cls. net_cls and netfilter do not contradict
each other; in fact, each construct can live as standalone or they
can be used in combination with each other, which is perfectly fine,
plus it serves Tejun's requirement to not introduce a new cgroups
subsystem. Through this, we result in a very minimal and efficient
module, and don't add anything except netfilter code.
One possible, minimal usage example (many other iptables options
can be applied obviously):
1) Configuring cgroups if not already done, e.g.:
mkdir /sys/fs/cgroup/net_cls
mount -t cgroup -o net_cls net_cls /sys/fs/cgroup/net_cls
mkdir /sys/fs/cgroup/net_cls/0
echo 1 > /sys/fs/cgroup/net_cls/0/net_cls.classid
(resp. a real flow handle id for tc)
2) Configuring netfilter (iptables-nftables), e.g.:
iptables -A OUTPUT -m cgroup ! --cgroup 1 -j DROP
3) Running applications, e.g.:
ping 208.67.222.222 <pid:1799>
echo 1799 > /sys/fs/cgroup/net_cls/0/tasks
64 bytes from 208.67.222.222: icmp_seq=44 ttl=49 time=11.9 ms
[...]
ping 208.67.220.220 <pid:1804>
ping: sendmsg: Operation not permitted
[...]
echo 1804 > /sys/fs/cgroup/net_cls/0/tasks
64 bytes from 208.67.220.220: icmp_seq=89 ttl=56 time=19.0 ms
[...]
Of course, real-world deployments would make use of cgroups user
space toolsuite, or own custom policy daemons dynamically moving
applications from/to various cgroups.
[1] http://www.blackhat.com/presentations/bh-europe-06/bh-eu-06-biondi/bh-eu-06-biondi-up.pdf
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: cgroups@vger.kernel.org
Acked-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
We currently use prandom_u32() for allocation of ports in tcp bind(0)
and udp code. In case of plain SNAT we try to keep the ports as is
or increment on collision.
SNAT --random mode does use per-destination incrementing port
allocation. As a recent paper pointed out in [1] that this mode of
port allocation makes it possible to an attacker to find the randomly
allocated ports through a timing side-channel in a socket overloading
attack conducted through an off-path attacker.
So, NF_NAT_RANGE_PROTO_RANDOM actually weakens the port randomization
in regard to the attack described in this paper. As we need to keep
compatibility, add another flag called NF_NAT_RANGE_PROTO_RANDOM_FULLY
that would replace the NF_NAT_RANGE_PROTO_RANDOM hash-based port
selection algorithm with a simple prandom_u32() in order to mitigate
this attack vector. Note that the lfsr113's internal state is
periodically reseeded by the kernel through a local secure entropy
source.
More details can be found in [1], the basic idea is to send bursts
of packets to a socket to overflow its receive queue and measure
the latency to detect a possible retransmit when the port is found.
Because of increasing ports to given destination and port, further
allocations can be predicted. This information could then be used by
an attacker for e.g. for cache-poisoning, NS pinning, and degradation
of service attacks against DNS servers [1]:
The best defense against the poisoning attacks is to properly
deploy and validate DNSSEC; DNSSEC provides security not only
against off-path attacker but even against MitM attacker. We hope
that our results will help motivate administrators to adopt DNSSEC.
However, full DNSSEC deployment make take significant time, and
until that happens, we recommend short-term, non-cryptographic
defenses. We recommend to support full port randomisation,
according to practices recommended in [2], and to avoid
per-destination sequential port allocation, which we show may be
vulnerable to derandomisation attacks.
Joint work between Hannes Frederic Sowa and Daniel Borkmann.
[1] https://sites.google.com/site/hayashulman/files/NIC-derandomisation.pdf
[2] http://arxiv.org/pdf/1205.5190v1.pdf
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
In order to facilitate development for netlink protocol dissector,
fill the unused field skb->pkt_type of the cloned skb with a hint
of the address space of the new owner (receiver) socket in the
notion of "to kernel" resp. "to user".
At the time we invoke __netlink_deliver_tap_skb(), we already have
set the new skb owner via netlink_skb_set_owner_r(), so we can use
that for netlink_is_kernel() probing.
In normal PF_PACKET network traffic, this field denotes if the
packet is destined for us (PACKET_HOST), if it's broadcast
(PACKET_BROADCAST), etc.
As we only have 3 bit reserved, we can use the value (= 6) of
PACKET_FASTROUTE as it's _not used_ anywhere in the whole kernel
and not supported anywhere, and packets of such type were never
exposed to user space, so there are no overlapping users of such
kind. Thus, as wished, that seems the only way to make both
PACKET_* values non-overlapping and therefore device agnostic.
By using those two flags for netlink skbs on nlmon devices, they
can be made available and picked up via sll_pkttype (previously
unused in netlink context) in struct sockaddr_ll. We now have
these two directions:
- PACKET_USER (= 6) -> to user space
- PACKET_KERNEL (= 7) -> to kernel space
Partial `ip a` example strace for sa_family=AF_NETLINK with
detected nl msg direction:
syscall: direction:
sendto(3, ...) = 40 /* to kernel */
recvmsg(3, ...) = 3404 /* to user */
recvmsg(3, ...) = 1120 /* to user */
recvmsg(3, ...) = 20 /* to user */
sendto(3, ...) = 40 /* to kernel */
recvmsg(3, ...) = 168 /* to user */
recvmsg(3, ...) = 144 /* to user */
recvmsg(3, ...) = 20 /* to user */
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Jakub Zawadzki <darkjames-ws@darkjames.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
When we set burst to 1514 with low rate in userspace,
the kernel get a value of burst that less than 1514,
which doesn't work.
Because it may make some loss when transform burst
to buffer in userspace. This makes burst lose some
bytes, when the kernel transform the buffer back to
burst.
This patch adds two new attributes to support sending
burst/mtu to kernel directly to avoid the loss.
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With this plugin, user could specify IPComp tagged with certain
CPI that host not interested will be DROPped or any other action.
For example:
iptables -A INPUT -p 108 -m ipcomp --ipcompspi 0x87 -j DROP
ip6tables -A INPUT -p 108 -m ipcomp --ipcompspi 0x87 -j DROP
Then input IPComp packet with CPI equates 0x87 will not reach
upper layer anymore.
Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Thanks to commits 41063e9 (ipv4: Early TCP socket demux) and 421b388
(udp: ipv4: Add udp early demux) it is now possible to parse UID and
GID socket info also for incoming TCP and UDP connections. Having
this info available, it is convenient to let NFQUEUE parse it in
order to improve and refine the traffic analysis in userspace.
Signed-off-by: Valentina Giusti <valentina.giusti@bmw-carit.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Add IFLA_BOND_PACKETS_PER_SLAVE to allow get/set of bonding parameter
packets_per_slave via netlink.
Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add IFLA_BOND_ALL_SLAVES_ACTIVE to allow get/set of bonding parameter
all_slaves_active via netlink.
Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add IFLA_BOND_NUM_PEER_NOTIF to allow get/set of bonding parameter
num_grat_arp via netlink. Bonding parameter num_unsol_na is
synonymous with num_grat_arp, so add only one netlink attribute
to represent both bonding parameters.
Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch implements the first size-based qdisc that attempts to
differentiate between small flows and heavy-hitters. The goal is to
catch the heavy-hitters and move them to a separate queue with less
priority so that bulk traffic does not affect the latency of critical
traffic. Currently "less priority" means less weight (2:1 in
particular) in a Weighted Deficit Round Robin (WDRR) scheduler.
In essence, this patch addresses the "delay-bloat" problem due to
bloated buffers. In some systems, large queues may be necessary for
obtaining CPU efficiency, or due to the presence of unresponsive
traffic like UDP, or just a large number of connections with each
having a small amount of outstanding traffic. In these circumstances,
HHF aims to reduce the HoL blocking for latency sensitive traffic,
while not impacting the queues built up by bulk traffic. HHF can also
be used in conjunction with other AQM mechanisms such as CoDel.
To capture heavy-hitters, we implement the "multi-stage filter" design
in the following paper:
C. Estan and G. Varghese, "New Directions in Traffic Measurement and
Accounting", in ACM SIGCOMM, 2002.
Some configurable qdisc settings through 'tc':
- hhf_reset_timeout: period to reset counter values in the multi-stage
filter (default 40ms)
- hhf_admit_bytes: threshold to classify heavy-hitters
(default 128KB)
- hhf_evict_timeout: threshold to evict idle heavy-hitters
(default 1s)
- hhf_non_hh_weight: Weighted Deficit Round Robin (WDRR) weight for
non-heavy-hitters (default 2)
- hh_flows_limit: max number of heavy-hitter flow entries
(default 2048)
Note that the ratio between hhf_admit_bytes and hhf_reset_timeout
reflects the bandwidth of heavy-hitters that we attempt to capture
(25Mbps with the above default settings).
The false negative rate (heavy-hitter flows getting away unclassified)
is zero by the design of the multi-stage filter algorithm.
With 100 heavy-hitter flows, using four hashes and 4000 counters yields
a false positive rate (non-heavy-hitters mistakenly classified as
heavy-hitters) of less than 1e-4.
Signed-off-by: Terry Lam <vtlam@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
IPV6_PMTU_INTERFACE is the same as IPV6_PMTU_PROBE for ipv6. Add it
nontheless for symmetry with IPv4 sockets. Also drop incoming MTU
information if this mode is enabled.
The additional bit in ipv6_pinfo just eats in the padding behind the
bitfield. There are no changes to the layout of the struct at all.
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/ethernet/intel/i40e/i40e_main.c
drivers/net/macvtap.c
Both minor merge hassles, simple overlapping changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
struct tpacket{2,3}_hdr is aligned to a multiple of TPACKET_ALIGNMENT.
Explicitly defining and zeroing the gap of this makes additional changes
easier.
Signed-off-by: Atzm Watanabe <atzm@stratosphere.co.jp>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add IFLA_BOND_RESEND_IGMP to allow get/set of bonding parameter
resend_igmp via netlink.
Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add IFLA_BOND_XMIT_HASH_POLICY to allow get/set of bonding parameter
xmit_hash_policy via netlink.
Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add IFLA_BOND_FAIL_OVER_MAC to allow get/set of bonding parameter
fail_over_mac via netlink.
Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add IFLA_BOND_PRIMARY_SELECT to allow get/set of bonding parameter
primary_select via netlink.
Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add IFLA_BOND_PRIMARY to allow get/set of bonding parameter
primary via netlink.
Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add VHT MCS/NSS set support for nl80211_set_tx_bitrate_mask().
This should be used mainly for test purpose, to check
different MCS/NSS VHT combinations.
Signed-off-by: Janusz Dziedzic <janusz.dziedzic@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Add IFLA_BOND_ARP_ALL_TARGETS to allow get/set of bonding parameter
arp_all_targets via netlink.
Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add IFLA_BOND_ARP_IP_TARGET to allow get/set of bonding parameter
arp_ip_target via netlink.
Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support to netconf to show changes to proxy-arp status on a per
interface basis via netlink in a manner similar to forwarding
and reverse path state.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch introduces a PACKET_QDISC_BYPASS socket option, that
allows for using a similar xmit() function as in pktgen instead
of taking the dev_queue_xmit() path. This can be very useful when
PF_PACKET applications are required to be used in a similar
scenario as pktgen, but with full, flexible packet payload that
needs to be provided, for example.
On default, nothing changes in behaviour for normal PF_PACKET
TX users, so everything stays as is for applications. New users,
however, can now set PACKET_QDISC_BYPASS if needed to prevent
own packets from i) reentering packet_rcv() and ii) to directly
push the frame to the driver.
In doing so we can increase pps (here 64 byte packets) for
PF_PACKET a bit:
# CPUs -- QDISC_BYPASS -- qdisc path -- qdisc path[**]
1 CPU == 1,509,628 pps -- 1,208,708 -- 1,247,436
2 CPUs == 3,198,659 pps -- 2,536,012 -- 1,605,779
3 CPUs == 4,787,992 pps -- 3,788,740 -- 1,735,610
4 CPUs == 6,173,956 pps -- 4,907,799 -- 1,909,114
5 CPUs == 7,495,676 pps -- 5,956,499 -- 2,014,422
6 CPUs == 9,001,496 pps -- 7,145,064 -- 2,155,261
7 CPUs == 10,229,776 pps -- 8,190,596 -- 2,220,619
8 CPUs == 11,040,732 pps -- 9,188,544 -- 2,241,879
9 CPUs == 12,009,076 pps -- 10,275,936 -- 2,068,447
10 CPUs == 11,380,052 pps -- 11,265,337 -- 1,578,689
11 CPUs == 11,672,676 pps -- 11,845,344 -- 1,297,412
[...]
20 CPUs == 11,363,192 pps -- 11,014,933 -- 1,245,081
[**]: qdisc path with packet_rcv(), how probably most people
seem to use it (hopefully not anymore if not needed)
The test was done using a modified trafgen, sending a simple
static 64 bytes packet, on all CPUs. The trick in the fast
"qdisc path" case, is to avoid reentering packet_rcv() by
setting the RAW socket protocol to zero, like:
socket(PF_PACKET, SOCK_RAW, 0);
Tradeoffs are documented as well in this patch, clearly, if
queues are busy, we will drop more packets, tc disciplines are
ignored, and these packets are not visible to taps anymore. For
a pktgen like scenario, we argue that this is acceptable.
The pointer to the xmit function has been placed in packet
socket structure hole between cached_dev and prot_hook that
is hot anyway as we're working on cached_dev in each send path.
Done in joint work together with Jesper Dangaard Brouer.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull input updates from Dmitry Torokhov:
"An update to ALPS to support devices on Dell XT2 (hopefully working
better this time around and although it is largish it should not
affect any other ALPS devices) and a tiny update to Elantech driver to
support newer devices as well.
Also a coupe of new input event codes have been defined"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: ALPS - add support for DualPoint device on Dell XT2 model
Input: elantech - add support for newer (August 2013) devices
Input: add SW_MUTE_DEVICE switch definition
Input: usbtouchscreen - separate report and transmit buffer size handling
Input: sur40 - suppress false uninitialized variable warning
Input: add key code for ambient light sensor button
Input: keyboard - "keycode & KEY_MAX" changes some keycode values
Pull char/misc driver fixes from Greg KH:
"Nothing huge, just a few small bugfixes for problems reported, and a
device id update"
* tag 'char-misc-3.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
mei: add 9 series PCH mei device ids
drivers/char/i8k.c: add Dell XPLS L421X
MAINTAINERS: add HSI subsystem
misc: mic: Suppress memory space sparse warnings
misc: mic: Fix endianness issues.
misc: mic: Fix user space namespace pollution from mic_common.h.
misc: mic: Bug fix for sysfs poll usage.
misc: mic: Minor bug fix in 'retry' loops.
misc: mic: Change mic_notify(...) to return true.
extcon: remove freed groups caused the panic or warning in unregister flow
extcon: arizona: Get pdata from arizona structure not device
Creating an address with this flag set will result in kernel taking care
of temporary addresses in the same way as if the address was created by
kernel itself (after RA receive). This allows userspace applications
implementing the autoconfiguration (NetworkManager for example) to
implement ipv6 addresses privacy.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is no more space in u8 ifa_flags. So do what davem suffested and
add another netlink attr called IFA_FLAGS for carry more flags.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
John W. Linville says:
====================
Please pull this batch of updates intended for the 3.14 stream...
For the mac80211 bits, Johannes says:
"I have various improvements/cleanups/fixes all over, but the shortlog
shows that Luis's regulatory work and mesh work from the cozybit folks
are the biggest ones, along with the CSA fixes."
Along with that, we have big batches of updates to brcmfmac, rtlwifi,
and ath9k. There are updates to wcn36xx, rt2x00, and a handful of
others as well.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
With the introduction of TCP Small Queues, TSO auto sizing, and TCP
pacing, we can implement Automatic Corking in the kernel, to help
applications doing small write()/sendmsg() to TCP sockets.
Idea is to change tcp_push() to check if the current skb payload is
under skb optimal size (a multiple of MSS bytes)
If under 'size_goal', and at least one packet is still in Qdisc or
NIC TX queues, set the TCP Small Queue Throttled bit, so that the push
will be delayed up to TX completion time.
This delay might allow the application to coalesce more bytes
in the skb in following write()/sendmsg()/sendfile() system calls.
The exact duration of the delay is depending on the dynamics
of the system, and might be zero if no packet for this flow
is actually held in Qdisc or NIC TX ring.
Using FQ/pacing is a way to increase the probability of
autocorking being triggered.
Add a new sysctl (/proc/sys/net/ipv4/tcp_autocorking) to control
this feature and default it to 1 (enabled)
Add a new SNMP counter : nstat -a | grep TcpExtTCPAutoCorking
This counter is incremented every time we detected skb was under used
and its flush was deferred.
Tested:
Interesting effects when using line buffered commands under ssh.
Excellent performance results in term of cpu usage and total throughput.
lpq83:~# echo 1 >/proc/sys/net/ipv4/tcp_autocorking
lpq83:~# perf stat ./super_netperf 4 -t TCP_STREAM -H lpq84 -- -m 128
9410.39
Performance counter stats for './super_netperf 4 -t TCP_STREAM -H lpq84 -- -m 128':
35209.439626 task-clock # 2.901 CPUs utilized
2,294 context-switches # 0.065 K/sec
101 CPU-migrations # 0.003 K/sec
4,079 page-faults # 0.116 K/sec
97,923,241,298 cycles # 2.781 GHz [83.31%]
51,832,908,236 stalled-cycles-frontend # 52.93% frontend cycles idle [83.30%]
25,697,986,603 stalled-cycles-backend # 26.24% backend cycles idle [66.70%]
102,225,978,536 instructions # 1.04 insns per cycle
# 0.51 stalled cycles per insn [83.38%]
18,657,696,819 branches # 529.906 M/sec [83.29%]
91,679,646 branch-misses # 0.49% of all branches [83.40%]
12.136204899 seconds time elapsed
lpq83:~# echo 0 >/proc/sys/net/ipv4/tcp_autocorking
lpq83:~# perf stat ./super_netperf 4 -t TCP_STREAM -H lpq84 -- -m 128
6624.89
Performance counter stats for './super_netperf 4 -t TCP_STREAM -H lpq84 -- -m 128':
40045.864494 task-clock # 3.301 CPUs utilized
171 context-switches # 0.004 K/sec
53 CPU-migrations # 0.001 K/sec
4,080 page-faults # 0.102 K/sec
111,340,458,645 cycles # 2.780 GHz [83.34%]
61,778,039,277 stalled-cycles-frontend # 55.49% frontend cycles idle [83.31%]
29,295,522,759 stalled-cycles-backend # 26.31% backend cycles idle [66.67%]
108,654,349,355 instructions # 0.98 insns per cycle
# 0.57 stalled cycles per insn [83.34%]
19,552,170,748 branches # 488.244 M/sec [83.34%]
157,875,417 branch-misses # 0.81% of all branches [83.34%]
12.130267788 seconds time elapsed
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>