According to the perf_event_map_fd and index, the function
bpf_perf_event_read() can convert the corresponding map
value to the pointer to struct perf_event and return the
Hardware PMU counter value.
Signed-off-by: Kaixu Xia <xiakaixu@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce a new bpf map type 'BPF_MAP_TYPE_PERF_EVENT_ARRAY'.
This map only stores the pointer to struct perf_event. The
user space event FDs from perf_event_open() syscall are converted
to the pointer to struct perf_event and stored in map.
Signed-off-by: Kaixu Xia <xiakaixu@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
IFLA_VXLAN_FLOWBASED is useless without IFLA_VXLAN_COLLECT_METADATA,
so combine them into single IFLA_VXLAN_COLLECT_METADATA flag.
'flowbased' doesn't convey real meaning of the vxlan tunnel mode.
This mode can be used by routing, tc+bpf and ovs.
Only ovs is strictly flow based, so 'collect metadata' is a better
name for this tunnel mode.
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains Netfilter updates for net-next, they are:
1) A couple of cleanups for the netfilter core hook from Eric Biederman.
2) Net namespace hook registration, also from Eric. This adds a dependency with
the rtnl_lock. This should be fine by now but we have to keep an eye on this
because if we ever get the per-subsys nfnl_lock before rtnl we have may
problems in the future. But we have room to remove this in the future by
propagating the complexity to the clients, by registering hooks for the init
netns functions.
3) Update nf_tables to use the new net namespace hook infrastructure, also from
Eric.
4) Three patches to refine and to address problems from the new net namespace
hook infrastructure.
5) Switch to alternate jumpstack in xtables iff the packet is reentering. This
only applies to a very special case, the TEE target, but Eric Dumazet
reports that this is slowing down things for everyone else. So let's only
switch to the alternate jumpstack if the tee target is in used through a
static key. This batch also comes with offline precalculation of the
jumpstack based on the callchain depth. From Florian Westphal.
6) Minimal SCTP multihoming support for our conntrack helper, from Michal
Kubecek.
7) Reduce nf_bridge_info per skbuff scratchpad area to 32 bytes, from Florian
Westphal.
8) Fix several checkpatch errors in bridge netfilter, from Bernhard Thaler.
9) Get rid of useless debug message in ip6t_REJECT, from Subash Abhinov.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
In multiple locations there are checks for whether the label in hand
is a reserved label or not using the arbritray value of 16. Factor
this out into a #define for better maintainability and for
documentation.
Signed-off-by: Robert Shearman <rshearma@brocade.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add skb->hash to the __sk_buff offset map, so it can be accessed from
an eBPF program. We currently already do this for classic BPF filters,
but not yet on eBPF, it might be useful as a demuxer in combination with
helpers like bpf_clone_redirect(), toy example:
__section("cls-lb") int ingress_main(struct __sk_buff *skb)
{
unsigned int which = 3 + (skb->hash & 7);
/* bpf_skb_store_bytes(skb, ...); */
/* bpf_l{3,4}_csum_replace(skb, ...); */
bpf_clone_redirect(skb, which, 0);
return -1;
}
I was thinking whether to add skb_get_hash(), but then concluded the
raw skb->hash seems fine in this case: we can directly access the hash
w/o extra eBPF helper function call, it's filled out by many NICs on
ingress, and in case the entropy level would not be sufficient, people
can still implement their own specific sw fallback hash mix anyway.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
arch/s390/net/bpf_jit_comp.c
drivers/net/ethernet/ti/netcp_ethss.c
net/bridge/br_multicast.c
net/ipv4/ip_fragment.c
All four conflicts were cases of simple overlapping
changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
tlb_dynamic_lb could be set only via sysfs, this patch allows it to be
set via netlink.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Two vxlan driver flags FLOWBASED and COLLECT_METADATA need to be set to
make use of its new flow mode. The former already exposed. Expose the latter.
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce helpers to let eBPF programs attached to TC manipulate tunnel metadata:
bpf_skb_[gs]et_tunnel_key(skb, key, size, flags)
skb: pointer to skb
key: pointer to 'struct bpf_tunnel_key'
size: size of 'struct bpf_tunnel_key'
flags: room for future extensions
First eBPF program that uses these helpers will allocate per_cpu
metadata_dst structures that will be used on TX.
On RX metadata_dst is allocated by tunnel driver.
Typical usage for TX:
struct bpf_tunnel_key tkey;
... populate tkey ...
bpf_skb_set_tunnel_key(skb, &tkey, sizeof(tkey), 0);
bpf_clone_redirect(skb, vxlan_dev_ifindex, 0);
RX:
struct bpf_tunnel_key tkey = {};
bpf_skb_get_tunnel_key(skb, &tkey, sizeof(tkey), 0);
... lookup or redirect based on tkey ...
'struct bpf_tunnel_key' will be extended in the future by adding
elements to the end and the 'size' argument will indicate which fields
are populated, thereby keeping backwards compatibility.
The 'flags' argument may be used as well when the 'size' is not enough or
to indicate completely different layout of bpf_tunnel_key.
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 6fd99094de ("ipv6: Don't reduce hop limit for an interface")
disabled accept hop limit from RA if it is smaller than the current hop
limit for security stuff. But this behavior kind of break the RFC definition.
RFC 4861, 6.3.4. Processing Received Router Advertisements
A Router Advertisement field (e.g., Cur Hop Limit, Reachable Time,
and Retrans Timer) may contain a value denoting that it is
unspecified. In such cases, the parameter should be ignored and the
host should continue using whatever value it is already using.
If the received Cur Hop Limit value is non-zero, the host SHOULD set
its CurHopLimit variable to the received value.
So add sysctl option accept_ra_min_hop_limit to let user choose the minimum
hop limit value they can accept from RA. And set default to 1 to meet RFC
standards.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: YOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently nf_conntrack_proto_sctp module handles only packets between
primary addresses used to establish the connection. Any packets between
secondary addresses are classified as invalid so that usual firewall
configurations drop them. Allowing HEARTBEAT and HEARTBEAT-ACK chunks to
establish a new conntrack would allow traffic between secondary
addresses to pass through. A more sophisticated solution based on the
addresses advertised in the initial handshake (and possibly also later
dynamic address addition and removal) would be much harder to implement.
Moreover, in general we cannot assume to always see the initial
handshake as it can be routed through a different path.
The patch adds two new conntrack states:
SCTP_CONNTRACK_HEARTBEAT_SENT - a HEARTBEAT chunk seen but not acked
SCTP_CONNTRACK_HEARTBEAT_ACKED - a HEARTBEAT acked by HEARTBEAT-ACK
State transition rules:
- HEARTBEAT_SENT responds to usual chunks the same way as NONE (so that
the behaviour changes as little as possible)
- HEARTBEAT_ACKED responds to usual chunks the same way as ESTABLISHED
does, except the resulting state is HEARTBEAT_ACKED rather than
ESTABLISHED
- previously existing states except NONE are preserved when HEARTBEAT or
HEARTBEAT-ACK is seen
- NONE (in the initial direction) changes to HEARTBEAT_SENT on HEARTBEAT
and to CLOSED on HEARTBEAT-ACK
- HEARTBEAT_SENT changes to HEARTBEAT_ACKED on HEARTBEAT-ACK in the
reply direction
- HEARTBEAT_SENT and HEARTBEAT_ACKED are preserved on HEARTBEAT and
HEARTBEAT-ACK otherwise
Normally, vtag is set from the INIT chunk for the reply direction and
from the INIT-ACK chunk for the originating direction (i.e. each of
these defines vtag value for the opposite direction). For secondary
conntracks, we can't rely on seeing INIT/INIT-ACK and even if we have
seen them, we would need to connect two different conntracks. Therefore
simplified logic is applied: vtag of first packet in each direction
(HEARTBEAT in the originating and HEARTBEAT-ACK in reply direction) is
saved and all following packets in that direction are compared with this
saved value. While INIT and INIT-ACK define vtag for the opposite
direction, vtags extracted from HEARTBEAT and HEARTBEAT-ACK are always
for their direction.
Default timeout values for new states are
HEARTBEAT_SENT: 30 seconds (default hb_interval)
HEARTBEAT_ACKED: 210 seconds (hb_interval * path_max_retry + max_rto)
(We cannot expect to see the shutdown sequence so that, unlike
ESTABLISHED, the HEARTBEAT_ACKED timeout shouldn't be too long.)
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pull virtio/vhost fixes from Michael Tsirkin:
"Bugfixes and documentation fixes.
Igor's patch that allows users to tweak memory table size is
borderline, but it does fix known crashes, so I merged it"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
vhost: add max_mem_regions module parameter
vhost: extend memory regions allocation to vmalloc
9p/trans_virtio: reset virtio device on remove
virtio/s390: rename drivers/s390/kvm -> drivers/s390/virtio
MAINTAINERS: separate section for s390 virtio drivers
virtio: define virtio_pci_cfg_cap in header.
virtio: Fix typecast of pointer in vring_init()
virtio scsi: fix unused variable warning
vhost: use binary search instead of linear in find_region()
virtio_net: document VIRTIO_NET_CTRL_GUEST_OFFLOADS
Per RFC 6724, section 4, "Candidate Source Addresses":
It is RECOMMENDED that the candidate source addresses be the set
of unicast addresses assigned to the interface that will be used
to send to the destination (the "outgoing" interface).
Add a sysctl to enable this behaviour.
Signed-off-by: Erik Kline <ek@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This add the ability to select a routing table based on the tunnel
id which allows to maintain separate routing tables for each virtual
tunnel network.
ip rule add from all tunnel-id 100 lookup 100
ip rule add from all tunnel-id 200 lookup 200
A new static key controls the collection of metadata at tunnel level
upon demand.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
This introduces a new IP tunnel lightweight tunnel type which allows
to specify IP tunnel instructions per route. Only IPv4 is supported
at this point.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allows putting a VXLAN device into a new flow-based mode in which
skbs with a ip_tunnel_info dst metadata attached will be encapsulated
according to the instructions stored in there with the VXLAN device
defaults taken into consideration.
Similar on the receive side, if the VXLAN_F_COLLECT_METADATA flag is
set, the packet processing will populate a ip_tunnel_info struct for
each packet received and attach it to the skb using the new metadata
dst. The metadata structure will contain the outer header and tunnel
header fields which have been stripped off. Layers further up in the
stack such as routing, tc or netfitler can later match on these fields
and perform forwarding. It is the responsibility of upper layers to
ensure that the flag is set if the metadata is needed. The flag limits
the additional cost of metadata collecting based on demand.
This prepares the VXLAN device to be steered by the routing and other
subsystems which allows to support encapsulation for a large number
of tunnel endpoints and tunnel ids through a single net_device which
improves the scalability.
It also allows for OVS to leverage this mode which in turn allows for
the removal of the OVS specific VXLAN code.
Because the skb is currently scrubed in vxlan_rcv(), the attachment of
the new dst metadata is postponed until after scrubing which requires
the temporary addition of a new member to vxlan_metadata. This member
is removed again in a later commit after the indirect VXLAN receive API
has been removed.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rename the tunnel metadata data structures currently internal to
OVS and make them generic for use by all IP tunnels.
Both structures are kernel internal and will stay that way. Their
members are exposed to user space through individual Netlink
attributes by OVS. It will therefore be possible to extend/modify
these structures without affecting user ABI.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
This implementation uses lwtunnel infrastructure to register
hooks for mpls tunnel encaps.
It picks cues from iptunnel_encaps infrastructure and previous
mpls iptunnel RFC patches from Eric W. Biederman and Robert Shearman
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Provides infrastructure to parse/dump/store encap information for
light weight tunnels like mpls. Encap information for such tunnels
is associated with fib routes.
This infrastructure is based on previous suggestions from
Eric Biederman to follow the xfrm infrastructure.
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch introduces two new RTA attributes to attach encap
data to fib routes.
Example iproute2 command to attach mpls encap data to ipv4 routes
$ip route add 10.1.1.0/30 encap mpls 200 via inet 10.1.1.1 dev swp1
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Suggested-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow eBPF programs attached to TC qdiscs call skb_vlan_push/pop via
helper functions. These functions may change skb->data/hlen which are
cached by some JITs to improve performance of ld_abs/ld_ind instructions.
Therefore JITs need to recognize bpf_skb_vlan_push/pop() calls,
re-compute header len and re-cache skb->data/hlen back into cpu registers.
Note, skb->data/hlen are not directly accessible from the programs,
so any changes to skb->data done either by these helpers or by other
TC actions are safe.
eBPF JIT supported by three architectures:
- arm64 JIT is using bpf_load_pointer() without caching, so it's ok as-is.
- x64 JIT re-caches skb->data/hlen unconditionally after vlan_push/pop calls
(experiments showed that conditional re-caching is slower).
- s390 JIT falls back to interpreter for now when bpf_skb_vlan_push() is present
in the program (re-caching is tbd).
These helpers allow more scalable handling of vlan from the programs.
Instead of creating thousands of vlan netdevs on top of eth0 and attaching
TC+ingress+bpf to all of them, the program can be attached to eth0 directly
and manipulate vlans as necessary.
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates 2015-07-17
This series contains updates to igb, ixgbe, ixgbevf, i40e, bnx2x,
freescale, siena and dp83640.
Jacob provides several patches to clarify the intended way to implement
both SIOCSHWTSTAMP and ethtool's get_ts_info(). It is okay to support
the specific filters in SIOCSHWTSTAMP by upscaling them to the generic
filters.
Alex Duyck provides a igb patch to pull the time stamp from the fragment
before it gets added to the skb, to avoid a possible issue in which the
fragment can possibly be less than IGB_RX_HDR_LEN due to the time stamp
being pulled after the copybreak check. Also provides a ixgbevf patch to
fold the ixgbevf_pull_tail() call into ixgbevf_add_rx_frag(), which gives
the advantage that the fragment does not have to be modified after it is
added to the skb.
Fan provides patches for ixgbe/ixgbevf to set the receive hash type
based on receive descriptor RSS type.
Todd provides a fix for igb where on check for link on any media other
than copper was not being detected since it was looking on the incorrect
PHY page (due to the page being used gets switched before the function
to check link gets executed).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
It would be very useful to retrieve the net_cls's classid from an eBPF
program to allow for a more fine-grained classification, it could be
directly used or in conjunction with additional policies. I.e. docker,
but also tooling such as cgexec, can easily run applications via net_cls
cgroups:
cgcreate -g net_cls:/foo
echo 42 > foo/net_cls.classid
cgexec -g net_cls:foo <prog>
Thus, their respecitve classid cookie of foo can then be looked up on
the egress path to apply further policies. The helper is desigend such
that a non-zero value returns the cgroup id.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Thomas Graf <tgraf@suug.ch>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds some clarification about the intended way to implement
both SIOCSHWTSTAMP and ethtool's get_ts_info. The HWTSTAMP API has
several Rx filters which are very specific, as well as more general
filters. The specific filters really only exist to support some broken
hardware which can't fully implement the generic filters. This patch
adds clarification that it is okay to support the specific filters in
SIOCSHWTSTAMP by upscaling them to the generic filters. In addition,
update the header for ethtool_ts_info to specify that drivers ought to
only report the filters they support without upscaling in this manner.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Reviewed-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Conflicts:
net/bridge/br_mdb.c
Minor conflict in br_mdb.c, in 'net' we added a memset of the
on-stack 'ip' variable whereas in 'net-next' we assign a new
member 'vid'.
Signed-off-by: David S. Miller <davem@davemloft.net>
Until now all user mdb entries were added in vlan 0, this patch adds
support to allow the user to specify the vlan for the entry.
About the uapi change a hole in struct br_mdb_entry is used so the size
and offsets are kept the same (verified with pahole and tested with older
iproute2).
Example:
$ bridge mdb
dev br0 port eth1 grp 239.0.0.1 permanent vlan 2000
dev br0 port eth1 grp 239.0.0.1 permanent vlan 200
dev br0 port eth1 grp 239.0.0.1 permanent
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
1) Missing list head init in bluetooth hidp session creation, from Tedd
Ho-Jeong An.
2) Don't leak SKB in bridge netfilter error paths, from Florian
Westphal.
3) ipv6 netdevice private leak in netfilter bridging, fixed by Julien
Grall.
4) Fix regression in IP over hamradio bpq encapsulation, from Ralf
Baechle.
5) Fix race between rhashtable resize events and table walks, from Phil
Sutter.
6) Missing validation of IFLA_VF_INFO netlink attributes, fix from
Daniel Borkmann.
7) Missing security layer socket state initialization in tipc code,
from Stephen Smalley.
8) Fix shared IRQ handling in boomerang 3c59x interrupt handler, from
Denys Vlasenko.
9) Missing minor_idr destroy on module unload on macvtap driver, from
Johannes Thumshirn.
10) Various pktgen kernel thread races, from Oleg Nesterov.
11) Fix races that can cause packets to be processed in the backlog even
after a device attached to that SKB has been fully unregistered.
From Julian Anastasov.
12) bcmgenet driver doesn't account packet drops vs. errors properly,
fix from Petri Gynther.
13) Array index validation and off by one fix in DSA layer from Florian
Fainelli
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (66 commits)
can: replace timestamp as unique skb attribute
ARM: dts: dra7x-evm: Prevent glitch on DCAN1 pinmux
can: c_can: Fix default pinmux glitch at init
can: rcar_can: unify error messages
can: rcar_can: print request_irq() error code
can: rcar_can: fix typo in error message
can: rcar_can: print signed IRQ #
can: rcar_can: fix IRQ check
net: dsa: Fix off-by-one in switch address parsing
net: dsa: Test array index before use
net: switchdev: don't abort unsupported operations
net: bcmgenet: fix accounting of packet drops vs errors
cdc_ncm: update specs URL
Doc: z8530book: Fix typo in API-z8530-sync-txdma-open.html
net: inet_diag: always export IPV6_V6ONLY sockopt for listening sockets
bridge: mdb: allow the user to delete mdb entry if there's a querier
net: call rcu_read_lock early in process_backlog
net: do not process device backlog during unregistration
bridge: fix potential crash in __netdev_pick_tx()
net: axienet: Fix devm_ioremap_resource return value check
...
This kernel patch exports the value of the new
ignore_routes_with_linkdown via netconf.
v2: changes to notify userspace via netlink when sysctl values change
and proposed for 'net' since this could be considered a bugfix
Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Suggested-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The virtio_ring.h header is used in userspace programs (ie. QEMU),
too. Here we can not assume that sizeof(pointer) is the same as
sizeof(long), e.g. when compiling for Windows, so the typecast in
vring_init() should be done with (uintptr_t) instead of (unsigned long).
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Pull virtio/vhost cross endian support from Michael Tsirkin:
"I have just queued some more bugfix patches today but none fix
regressions and none are related to these ones, so it looks like a
good time for a merge for -rc1.
The motivation for this is support for legacy BE guests on the new LE
hosts. There are two redeeming properties that made me merge this:
- It's a trivial amount of code: since we wrap host/guest accesses
anyway, almost all of it is well hidden from drivers.
- Sane platforms would never set flags like VHOST_CROSS_ENDIAN_LEGACY,
and when it's clear, there's zero overhead (as some point it was
tested by compiling with and without the patches, got the same
stripped binary).
Maybe we could create a Kconfig symbol to enforce the second point:
prevent people from enabling it eg on x86. I will look into this"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
virtio-pci: alloc only resources actually used.
macvtap/tun: cross-endian support for little-endian hosts
vhost: cross-endian support for legacy devices
virtio: add explicit big-endian support to memory accessors
vhost: introduce vhost_is_little_endian() helper
vringh: introduce vringh_is_little_endian() helper
macvtap: introduce macvtap_is_little_endian() helper
tun: add tun_is_little_endian() helper
virtio: introduce virtio_is_little_endian() helper
Pull fuse updates from Miklos Szeredi:
"This is the start of improving fuse scalability.
An input queue and a processing queue is split out from the monolithic
fuse connection, each of those having their own spinlock. The end of
the patchset adds the ability to clone a fuse connection. This means,
that instead of having to read/write requests/answers on a single fuse
device fd, the fuse daemon can have multiple distinct file descriptors
open. Each of those can be used to receive requests and send answers,
currently the only constraint is that a request must be answered on
the same fd as it was read from.
This can be extended further to allow binding a device clone to a
specific CPU or NUMA node.
Based on a patchset by Srinivas Eeda and Ashish Samant. Thanks to
Ashish for the review of this series"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: (40 commits)
fuse: update MAINTAINERS entry
fuse: separate pqueue for clones
fuse: introduce per-instance fuse_dev structure
fuse: device fd clone
fuse: abort: no fc->lock needed for request ending
fuse: no fc->lock for pqueue parts
fuse: no fc->lock in request_end()
fuse: cleanup request_end()
fuse: request_end(): do once
fuse: add req flag for private list
fuse: pqueue locking
fuse: abort: group pqueue accesses
fuse: cleanup fuse_dev_do_read()
fuse: move list_del_init() from request_end() into callers
fuse: duplicate ->connected in pqueue
fuse: separate out processing queue
fuse: simplify request_wait()
fuse: no fc->lock for iqueue parts
fuse: allow interrupt queuing without fc->lock
fuse: iqueue locking
...
Pull networking fixes from David Miller:
1) mlx4 driver bug fixes (TX queue wakeups, csum complete indications)
from Ido Shamay, Eran Ben Elisha, and Or Gerlitz.
2) Missing unlock in error path of PTP support in renesas driver, from
Dan Carpenter.
3) Add Vitesse 8641 phy IDs to vitesse PHY driver, from Shaohui Xie.
4) Bnx2x driver bug fixes (linearization of encap packets, scratchpad
parity error notifications, flow-control and speed settings) from
Yuval Mintz, Manish Chopra, Shahed Shaikh, and Ariel Elior.
5) ipv6 extension header parsing in the igb chip has a HW errata,
disable it. Frm Todd Fujinaka.
6) Fix PCI link state locking issue in e1000e driver, from Yanir
Lubetkin.
7) Cure panics during MTU change in i40e, from Mitch Williams.
8) Don't leak promisc refs in DSA slave driver, from Gilad Ben-Yossef.
9) Add missing HAS_DMA dep to VIA Rhine driver, from Geery
Uytterhoeven.
10) Make sure DMA map/unmap calls are symmetric in bnx2x driver, from
Michal Schmidt.
11) Workaround for MDIO access problems in bcm7xxx devices, from FLorian
Fainelli.
12) Fix races in SCTP protocol between OTTB responses and route
removals, from Alexander Sverdlin.
13) Fix jumbo frame checksum issue with some mvneta devices, from Simon
Guinot.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (58 commits)
sock_diag: don't broadcast kernel sockets
net: mvneta: disable IP checksum with jumbo frames for Armada 370
ARM: mvebu: update Ethernet compatible string for Armada XP
net: mvneta: introduce compatible string "marvell, armada-xp-neta"
api: fix compatibility of linux/in.h with netinet/in.h
net: icplus: fix typo in constant name
sis900: Trivial: Fix typos in enums
stmmac: Trivial: fix typo in constant name
sctp: Fix race between OOTB responce and route removal
net-Liquidio: Delete unnecessary checks before the function call "vfree"
vmxnet3: Bump up driver version number
amd-xgbe: Add the __GFP_NOWARN flag to Rx buffer allocation
net: phy: mdio-bcm-unimac: workaround initial read failures for integrated PHYs
net: bcmgenet: workaround initial read failures for integrated PHYs
net: phy: bcm7xxx: workaround MDIO management controller initial read
bnx2x: fix DMA API usage
net: via: VIA_RHINE and VIA_VELOCITY should depend on HAS_DMA
net/phy: tune get_phy_c45_ids to support more c45 phy
bnx2x: fix lockdep splat
net: fec: don't access RACC register when not available
...
Allow an open fuse device to be "cloned". Userspace can create a clone by:
newfd = open("/dev/fuse", O_RDWR)
ioctl(newfd, FUSE_DEV_IOC_CLONE, &oldfd);
At this point newfd will refer to the same fuse connection as oldfd.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
Document VIRTIO_NET_CTRL_GUEST_OFFLOADS and the
relevant feature bits.
Will allow ethtool control of the offloads down the road.
Reported-by: Yan Vugenfirer <yan@daynix.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
u
This fixes breakage to iproute2 build with recent kernel headers
caused by:
commit a263653ed7
Author: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Wed Jun 17 10:28:27 2015 -0500
netfilter: don't pull include/linux/netfilter.h from netns headers
The issue is that definitions in linux/in.h overlap with those
in netinet/in.h. This patch solves this by introducing the same
mechanism as was used to solve the same problem with linux/in6.h
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull libnvdimm subsystem from Dan Williams:
"The libnvdimm sub-system introduces, in addition to the
libnvdimm-core, 4 drivers / enabling modules:
NFIT:
Instantiates an "nvdimm bus" with the core and registers memory
devices (NVDIMMs) enumerated by the ACPI 6.0 NFIT (NVDIMM Firmware
Interface table).
After registering NVDIMMs the NFIT driver then registers "region"
devices. A libnvdimm-region defines an access mode and the
boundaries of persistent memory media. A region may span multiple
NVDIMMs that are interleaved by the hardware memory controller. In
turn, a libnvdimm-region can be carved into a "namespace" device and
bound to the PMEM or BLK driver which will attach a Linux block
device (disk) interface to the memory.
PMEM:
Initially merged in v4.1 this driver for contiguous spans of
persistent memory address ranges is re-worked to drive
PMEM-namespaces emitted by the libnvdimm-core.
In this update the PMEM driver, on x86, gains the ability to assert
that writes to persistent memory have been flushed all the way
through the caches and buffers in the platform to persistent media.
See memcpy_to_pmem() and wmb_pmem().
BLK:
This new driver enables access to persistent memory media through
"Block Data Windows" as defined by the NFIT. The primary difference
of this driver to PMEM is that only a small window of persistent
memory is mapped into system address space at any given point in
time.
Per-NVDIMM windows are reprogrammed at run time, per-I/O, to access
different portions of the media. BLK-mode, by definition, does not
support DAX.
BTT:
This is a library, optionally consumed by either PMEM or BLK, that
converts a byte-accessible namespace into a disk with atomic sector
update semantics (prevents sector tearing on crash or power loss).
The sinister aspect of sector tearing is that most applications do
not know they have a atomic sector dependency. At least today's
disk's rarely ever tear sectors and if they do one almost certainly
gets a CRC error on access. NVDIMMs will always tear and always
silently. Until an application is audited to be robust in the
presence of sector-tearing the usage of BTT is recommended.
Thanks to: Ross Zwisler, Jeff Moyer, Vishal Verma, Christoph Hellwig,
Ingo Molnar, Neil Brown, Boaz Harrosh, Robert Elliott, Matthew Wilcox,
Andy Rudoff, Linda Knippers, Toshi Kani, Nicholas Moulin, Rafael
Wysocki, and Bob Moore"
* tag 'libnvdimm-for-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm: (33 commits)
arch, x86: pmem api for ensuring durability of persistent memory updates
libnvdimm: Add sysfs numa_node to NVDIMM devices
libnvdimm: Set numa_node to NVDIMM devices
acpi: Add acpi_map_pxm_to_online_node()
libnvdimm, nfit: handle unarmed dimms, mark namespaces read-only
pmem: flag pmem block devices as non-rotational
libnvdimm: enable iostat
pmem: make_request cleanups
libnvdimm, pmem: fix up max_hw_sectors
libnvdimm, blk: add support for blk integrity
libnvdimm, btt: add support for blk integrity
fs/block_dev.c: skip rw_page if bdev has integrity
libnvdimm: Non-Volatile Devices
tools/testing/nvdimm: libnvdimm unit test infrastructure
libnvdimm, nfit, nd_blk: driver for BLK-mode access persistent memory
nd_btt: atomic sector updates
libnvdimm: infrastructure for btt devices
libnvdimm: write blk label set
libnvdimm: write pmem label set
libnvdimm: blk labels and namespace instantiation
...
Pull nfsd updates from Bruce Fields:
"A relatively quiet cycle, with a mix of cleanup and smaller bugfixes"
* 'for-4.2' of git://linux-nfs.org/~bfields/linux: (24 commits)
sunrpc: use sg_init_one() in krb5_rc4_setup_enc/seq_key()
nfsd: wrap too long lines in nfsd4_encode_read
nfsd: fput rd_file from XDR encode context
nfsd: take struct file setup fully into nfs4_preprocess_stateid_op
nfsd: refactor nfs4_preprocess_stateid_op
nfsd: clean up raparams handling
nfsd: use swap() in sort_pacl_range()
rpcrdma: Merge svcrdma and xprtrdma modules into one
svcrdma: Add a separate "max data segs macro for svcrdma
svcrdma: Replace GFP_KERNEL in a loop with GFP_NOFAIL
svcrdma: Keep rpcrdma_msg fields in network byte-order
svcrdma: Fix byte-swapping in svc_rdma_sendto.c
nfsd: Update callback sequnce id only CB_SEQUENCE success
nfsd: Reset cb_status in nfsd4_cb_prepare() at retrying
svcrdma: Remove svc_rdma_xdr_decode_deferred_req()
SUNRPC: Move EXPORT_SYMBOL for svc_process
uapi/nfs: Add NFSv4.1 ACL definitions
nfsd: Remove dead declarations
nfsd: work around a gcc-5.1 warning
nfsd: Checking for acl support does not require fetching any acls
...
Pull tty/serial driver updates from Greg KH:
"Here's the tty and serial driver patches for 4.2-rc1.
A number of individual driver updates, some code cleanups, and other
minor things, full details in the shortlog.
All have been in linux-next for a while with no reported issues"
* tag 'tty-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (152 commits)
Doc: serial-rs485.txt: update RS485 driver interface
Doc: tty.txt: remove mention of the BKL
MAINTAINERS: tty: add serial docs directory
serial: sprd: check for NULL after calling devm_clk_get
serial: 8250_pci: Correct uartclk for xr17v35x expansion chips
serial: 8250_pci: Add support for 12 port Exar boards
serial: 8250_uniphier: add bindings document for UniPhier UART
serial: core: cleanup in uart_get_baud_rate()
serial: stm32-usart: Add STM32 USART Driver
tty/serial: kill off set_irq_flags usage
tty: move linux/gsmmux.h to uapi
doc: dt: add documentation for nxp,lpc1850-uart
serial: 8250: add LPC18xx/43xx UART driver
serial: 8250_uniphier: add UniPhier serial driver
serial: 8250_dw: support ACPI platforms with integrated DMA engine
serial: of_serial: check the return value of clk_prepare_enable()
serial: of_serial: use devm_clk_get() instead of clk_get()
serial: earlycon: Add support for big-endian MMIO accesses
serial: sirf: use hrtimer for data rx
serial: sirf: correct the fifo empty_bit
...
Pull staging driver updates from Greg KH:
"Here's the big, really big, staging tree patches for 4.2-rc1.
Loads of stuff in here, almost all just coding style fixes / churn,
and a few new drivers as well, one of which I just disabled from the
build a few minutes ago due to way too many build warnings.
Other than the one "disable this driver" patch, all of these have been
in linux-next for quite a while with no reported issues"
* tag 'staging-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (1163 commits)
staging: wilc1000: disable driver due to build warnings
Staging: rts5208: fix CHANGE_LINK_STATE value
Staging: sm750fb: ddk750_swi2c.c: Insert spaces before parenthesis
Staging: sm750fb: ddk750_swi2c.c: Place braces on correct lines
Staging: sm750fb: ddk750_swi2c.c: Insert spaces around operators
Staging: sm750fb: ddk750_swi2c.c: Replace spaces with tabs
Staging: sm750fb: ddk750_swi2c.h: Shorten lines to under 80 characters
Staging: sm750fb: ddk750_swi2c.h: Replace spaces with tabs
Staging: sm750fb: modedb.h: Shorten lines to under 80 characters
Staging: sm750fb: modedb.h: Replace spaces with tabs
staging: comedi: addi_apci_3120: rename 'this_board' variables
staging: comedi: addi_apci_1516: rename 'this_board' variables
staging: comedi: ni_atmio: cleanup ni_getboardtype()
staging: comedi: vmk80xx: sanity check context used to get the boardinfo
staging: comedi: vmk80xx: rename 'boardinfo' variables
staging: comedi: dt3000: rename 'this_board' variables
staging: comedi: adv_pci_dio: rename 'this_board' variables
staging: comedi: cb_pcidas64: rename 'thisboard' variables
staging: comedi: cb_pcidas: rename 'thisboard' variables
staging: comedi: me4000: rename 'thisboard' variables
...
Pull char/misc driver updates from Greg KH:
"Here's the big char/misc driver pull request for 4.2-rc1.
Lots of mei, extcon, coresight, uio, mic, and other driver updates in
here. Full details in the shortlog. All of these have been in
linux-next for some time with no reported problems"
* tag 'char-misc-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (176 commits)
mei: me: wait for power gating exit confirmation
mei: reset flow control on the last client disconnection
MAINTAINERS: mei: add mei_cl_bus.h to maintained file list
misc: sram: sort and clean up included headers
misc: sram: move reserved block logic out of probe function
misc: sram: add private struct device and virt_base members
misc: sram: report correct SRAM pool size
misc: sram: bump error message level on unclean driver unbinding
misc: sram: fix device node reference leak on error
misc: sram: fix enabled clock leak on error path
misc: mic: Fix reported static checker warning
misc: mic: Fix randconfig build error by including errno.h
uio: pruss: Drop depends on ARCH_DAVINCI_DA850 from config
uio: pruss: Add CONFIG_HAS_IOMEM dependence
uio: pruss: Include <linux/sizes.h>
extcon: Redefine the unique id of supported external connectors without 'enum extcon' type
char:xilinx_hwicap:buffer_icap - change 1/0 to true/false for bool type variable in function buffer_icap_set_configuration().
Drivers: hv: vmbus: Allocate ring buffer memory in NUMA aware fashion
parport: check exclusive access before register
w1: use correct lock on error in w1_seq_show()
...
Pull drm updates from Dave Airlie:
"This is the main drm pull request for v4.2.
I've one other new driver from freescale on my radar, it's been posted
and reviewed, I'd just like to get someone to give it a last look, so
maybe I'll send it or maybe I'll leave it.
There is no major nouveau changes in here, Ben was working on
something big, and we agreed it was a bit late, there wasn't anything
else he considered urgent to merge.
There might be another msm pull for some bits that are waiting on
arm-soc, I'll see how we time it.
This touches some "of" stuff, acks are in place except for the fixes
to the build in various configs,t hat I just applied.
Summary:
New drivers:
- virtio-gpu:
KMS only pieces of driver for virtio-gpu in qemu.
This is just the first part of this driver, enough to run
unaccelerated userspace on. As qemu merges more we'll start
adding the 3D features for the virgl 3d work.
- amdgpu:
a new driver from AMD to driver their newer GPUs. (VI+)
It contains a new cleaner userspace API, and is a clean
break from radeon moving forward, that AMD are going to
concentrate on. It also contains a set of register headers
auto generated from AMD internal database.
core:
- atomic modesetting API completed, enabled by default now.
- Add support for mode_id blob to atomic ioctl to complete interface.
- bunch of Displayport MST fixes
- lots of misc fixes.
panel:
- new simple panels
- fix some long-standing build issues with bridge drivers
radeon:
- VCE1 support
- add a GPU reset counter for userspace
- lots of fixes.
amdkfd:
- H/W debugger support module
- static user-mode queues
- support killing all the waves when a process terminates
- use standard DECLARE_BITMAP
i915:
- Add Broxton support
- S3, rotation support for Skylake
- RPS booting tuning
- CPT modeset sequence fixes
- ns2501 dither support
- enable cmd parser on haswell
- cdclk handling fixes
- gen8 dynamic pte allocation
- lots of atomic conversion work
exynos:
- Add atomic modesetting support
- Add iommu support
- Consolidate drm driver initialization
- and MIC, DECON and MIPI-DSI support for exynos5433
omapdrm:
- atomic modesetting support (fixes lots of things in rewrite)
tegra:
- DP aux transaction fixes
- iommu support fix
msm:
- adreno a306 support
- various dsi bits
- various 64-bit fixes
- NV12MT support
rcar-du:
- atomic and misc fixes
sti:
- fix HDMI timing complaince
tilcdc:
- use drm component API to access tda998x driver
- fix module unloading
qxl:
- stability fixes"
* 'drm-next' of git://people.freedesktop.org/~airlied/linux: (872 commits)
drm/nouveau: Pause between setting gpu to D3hot and cutting the power
drm/dp/mst: close deadlock in connector destruction.
drm: Always enable atomic API
drm/vgem: Set unique to "vgem"
of: fix a build error to of_graph_get_endpoint_by_regs function
drm/dp/mst: take lock around looking up the branch device on hpd irq
drm/dp/mst: make sure mst_primary mstb is valid in work function
of: add EXPORT_SYMBOL for of_graph_get_endpoint_by_regs
ARM: dts: rename the clock of MIPI DSI 'pll_clk' to 'sclk_mipi'
drm/atomic: Don't set crtc_state->enable manually
drm/exynos: dsi: do not set TE GPIO direction by input
drm/exynos: dsi: add support for MIC driver as a bridge
drm/exynos: dsi: add support for Exynos5433
drm/exynos: dsi: make use of array for clock access
drm/exynos: dsi: make use of driver data for static values
drm/exynos: dsi: add macros for register access
drm/exynos: dsi: rename pll_clk to sclk_clk
drm/exynos: mic: add MIC driver
of: add helper for getting endpoint node of specific identifiers
drm/exynos: add Exynos5433 decon driver
...
Pull device mapper fixes from Mike Snitzer:
"Apologies for not pressing this request-based DM partial completion
issue further, it was an oversight on my part. We'll have to get it
fixed up properly and revisit for a future release.
- Revert block and DM core changes the removed request-based DM's
ability to handle partial request completions -- otherwise with the
current SCSI LLDs these changes could lead to silent data
corruption.
- Fix two DM version bumps that were missing from the initial 4.2 DM
pull request (enabled userspace lvm2 to know certain changes have
been made)"
* tag 'dm-4.2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm cache policy smq: fix "default" version to be 1.4.0
dm: bump the ioctl version to 4.32.0
Revert "block, dm: don't copy bios for request clones"
Revert "dm: do not allocate any mempools for blk-mq request-based DM"
Pull ARM SoC platform support updates from Kevin Hilman:
"Our SoC branch usually contains expanded support for new SoCs and
other core platform code. Some highlights from this round:
- sunxi: SMP support for A23 SoC
- socpga: big-endian support
- pxa: conversion to common clock framework
- bcm: SMP support for BCM63138
- imx: support new I.MX7D SoC
- zte: basic support for ZX296702 SoC"
* tag 'armsoc-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (134 commits)
ARM: zx: Add basic defconfig support for ZX296702
ARM: dts: zx: add an initial zx296702 dts and doc
clk: zx: add clock support to zx296702
dt-bindings: Add #defines for ZTE ZX296702 clocks
ARM: socfpga: fix build error due to secondary_startup
MAINTAINERS: ARM64: EXYNOS: Extend entry for ARM64 DTS
ARM: ep93xx: simone: support for SPI-based MMC/SD cards
MAINTAINERS: update Shawn's email to use kernel.org one
ARM: socfpga: support suspend to ram
ARM: socfpga: add CPU_METHOD_OF_DECLARE for Arria 10
ARM: socfpga: use CPU_METHOD_OF_DECLARE for socfpga_cyclone5
ARM: EXYNOS: register power domain driver from core_initcall
ARM: EXYNOS: use PS_HOLD based poweroff for all supported SoCs
ARM: SAMSUNG: Constify platform_device_id
ARM: EXYNOS: Constify irq_domain_ops
ARM: EXYNOS: add coupled cpuidle support for Exynos3250
ARM: EXYNOS: add exynos_get_boot_addr() helper
ARM: EXYNOS: add exynos_set_boot_addr() helper
ARM: EXYNOS: make exynos_core_restart() less verbose
ARM: EXYNOS: fix exynos_boot_secondary() return value on timeout
...
This fix enables userspace to detect that the dm-stats changes from the
4.2 merge are in place.
Reported-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Pull i2c updates from Wolfram Sang:
"Highlights:
- new drivers for Mediatek I2C, APM X-Gene, Broadcom Settop
- major updates to at91, davinci
- bugfixes to the mux infrastructure when dealing with the new quirk
mechanism
- more users for the bus recovery feature
- further improvements to the slave framework
Plus the usual bunch of smaller driver and core improvements and
fixes.
There is one patch removing old code from an ARM platform. This has
been acked by the sh_mobile maintainer Simon Horman"
* 'i2c/for-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (48 commits)
i2c: busses: i2c-bcm2835: limits cdiv to allowed values
i2c: sh_mobile: use proper type for timeout
i2c: sh_mobile: use adapter default for timeout
i2c: rcar: use proper type for timeout
i2c: rcar: use adapter default for timeout
i2c: designware: Make sure the device is suspended before disabling runtime PM
i2c: tegra: apply size limit quirk
i2c: tegra: don't advertise SMBUS_QUICK
i2c: octeon: remove unused signal handling
i2c: davinci: Optimize SCL generation
i2c: mux: pca954x: Use __i2c_transfer because of quirks
i2c: mux: Use __i2c_transfer() instead of calling parent's master_xfer()
i2c: use parent adapter quirks in mux
i2c: bcm2835: clear reserved bits in S-Register
ARM: shmobile: r8a7740: remove I2C errata handling
i2c: sh_mobile: add errata workaround
i2c: at91: fix code checker warnings
i2c: busses: xgene-slimpro: fix incorrect __init declation for probe
i2c: davinci: Avoid sending to own address
i2c: davinci: Refactor i2c_davinci_wait_bus_not_busy()
...