Commit Graph

2102 Commits

Author SHA1 Message Date
Alexei Starovoitov
a166151cbe bpf: fix bpf helpers to use skb->mac_header relative offsets
For the short-term solution, lets fix bpf helper functions to use
skb->mac_header relative offsets instead of skb->data in order to
get the same eBPF programs with cls_bpf and act_bpf work on ingress
and egress qdisc path. We need to ensure that mac_header is set
before calling into programs. This is effectively the first option
from below referenced discussion.

More long term solution for LD_ABS|LD_IND instructions will be more
intrusive but also more beneficial than this, and implemented later
as it's too risky at this point in time.

I.e., we plan to look into the option of moving skb_pull() out of
eth_type_trans() and into netif_receive_skb() as has been suggested
as second option. Meanwhile, this solution ensures ingress can be
used with eBPF, too, and that we won't run into ABI troubles later.
For dealing with negative offsets inside eBPF helper functions,
we've implemented bpf_skb_clone_unwritable() to test for unwriteable
headers.

Reference: http://thread.gmane.org/gmane.linux.network/359129/focus=359694
Fixes: 608cd71a9c ("tc: bpf: generalize pedit action")
Fixes: 91bc4822c3 ("tc: bpf: add checksum helpers")
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-16 14:08:49 -04:00
Mike Snitzer
bfebd1cdb4 dm: add full blk-mq support to request-based DM
Commit e5863d9ad ("dm: allocate requests in target when stacking on
blk-mq devices") served as the first step toward fully utilizing blk-mq
in request-based DM -- it enabled stacking an old-style (request_fn)
request_queue ontop of the underlying blk-mq device(s).  That first step
didn't improve performance of DM multipath ontop of fast blk-mq devices
(e.g. NVMe) because the top-level old-style request_queue was severely
limited by the queue_lock.

The second step offered here enables stacking a blk-mq request_queue
ontop of the underlying blk-mq device(s).  This unlocks significant
performance gains on fast blk-mq devices, Keith Busch tested on his NVMe
testbed and offered this really positive news:

 "Just providing a performance update. All my fio tests are getting
  roughly equal performance whether accessed through the raw block
  device or the multipath device mapper (~470k IOPS). I could only push
  ~20% of the raw iops through dm before this conversion, so this latest
  tree is looking really solid from a performance standpoint."

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Tested-by: Keith Busch <keith.busch@intel.com>
2015-04-15 12:10:16 -04:00
Linus Torvalds
6c373ca893 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) Add BQL support to via-rhine, from Tino Reichardt.

 2) Integrate SWITCHDEV layer support into the DSA layer, so DSA drivers
    can support hw switch offloading.  From Floria Fainelli.

 3) Allow 'ip address' commands to initiate multicast group join/leave,
    from Madhu Challa.

 4) Many ipv4 FIB lookup optimizations from Alexander Duyck.

 5) Support EBPF in cls_bpf classifier and act_bpf action, from Daniel
    Borkmann.

 6) Remove the ugly compat support in ARP for ugly layers like ax25,
    rose, etc.  And use this to clean up the neigh layer, then use it to
    implement MPLS support.  All from Eric Biederman.

 7) Support L3 forwarding offloading in switches, from Scott Feldman.

 8) Collapse the LOCAL and MAIN ipv4 FIB tables when possible, to speed
    up route lookups even further.  From Alexander Duyck.

 9) Many improvements and bug fixes to the rhashtable implementation,
    from Herbert Xu and Thomas Graf.  In particular, in the case where
    an rhashtable user bulk adds a large number of items into an empty
    table, we expand the table much more sanely.

10) Don't make the tcp_metrics hash table per-namespace, from Eric
    Biederman.

11) Extend EBPF to access SKB fields, from Alexei Starovoitov.

12) Split out new connection request sockets so that they can be
    established in the main hash table.  Much less false sharing since
    hash lookups go direct to the request sockets instead of having to
    go first to the listener then to the request socks hashed
    underneath.  From Eric Dumazet.

13) Add async I/O support for crytpo AF_ALG sockets, from Tadeusz Struk.

14) Support stable privacy address generation for RFC7217 in IPV6.  From
    Hannes Frederic Sowa.

15) Hash network namespace into IP frag IDs, also from Hannes Frederic
    Sowa.

16) Convert PTP get/set methods to use 64-bit time, from Richard
    Cochran.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1816 commits)
  fm10k: Bump driver version to 0.15.2
  fm10k: corrected VF multicast update
  fm10k: mbx_update_max_size does not drop all oversized messages
  fm10k: reset head instead of calling update_max_size
  fm10k: renamed mbx_tx_dropped to mbx_tx_oversized
  fm10k: update xcast mode before synchronizing multicast addresses
  fm10k: start service timer on probe
  fm10k: fix function header comment
  fm10k: comment next_vf_mbx flow
  fm10k: don't handle mailbox events in iov_event path and always process mailbox
  fm10k: use separate workqueue for fm10k driver
  fm10k: Set PF queues to unlimited bandwidth during virtualization
  fm10k: expose tx_timeout_count as an ethtool stat
  fm10k: only increment tx_timeout_count in Tx hang path
  fm10k: remove extraneous "Reset interface" message
  fm10k: separate PF only stats so that VF does not display them
  fm10k: use hw->mac.max_queues for stats
  fm10k: only show actual queues, not the maximum in hardware
  fm10k: allow creation of VLAN on default vid
  fm10k: fix unused warnings
  ...
2015-04-15 09:00:47 -07:00
Michael S. Tsirkin
df81b29c7b virtio_balloon: transitional interface
Virtio 1.0 doesn't include a modern balloon device.
But it's not a big change to support a transitional
balloon device: this has the advantage of supporting
existing drivers, transparently.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-04-15 12:41:09 +09:30
Linus Torvalds
8691c130fa Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input subsystem updates from Dmitry Torokhov:
 "You will get the following new drivers:

   - Qualcomm PM8941 power key drver
   - ChipOne icn8318 touchscreen controller driver
   - Broadcom iProc touchscreen and keypad drivers
   - Semtech SX8654 I2C touchscreen controller driver

  ALPS driver now supports newer SS4 devices; Elantech got a fix that
  should make it work on some ASUS laptops; and a slew of other
  enhancements and random fixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (51 commits)
  Input: alps - non interleaved V2 dualpoint has separate stick button bits
  Input: alps - fix touchpad buttons getting stuck when used with trackpoint
  Input: atkbd - document "no new force-release quirks" policy
  Input: ALPS - make alps_get_pkt_id_ss4_v2() and others static
  Input: ALPS - V7 devices can report 5-finger taps
  Input: ALPS - add support for SS4 touchpad devices
  Input: ALPS - refactor alps_set_abs_params_mt()
  Input: elantech - fix absolute mode setting on some ASUS laptops
  Input: atmel_mxt_ts - split out touchpad initialisation logic
  Input: atmel_mxt_ts - implement support for T100 touch object
  Input: cros_ec_keyb - fix clearing keyboard state on wakeup
  Input: gscps2 - drop pci_ids dependency
  Input: synaptics - allocate 3 slots to keep stability in image sensors
  Input: Revert "Revert "synaptics - use dmax in input_mt_assign_slots""
  Input: MT - make slot assignment work for overcovered solutions
  mfd: tc3589x: enforce device-tree only mode
  Input: tc3589x - localize platform data
  Input: tsc2007 - Convert msecs to jiffies only once
  Input: edt-ft5x06 - remove EV_SYN event report
  Input: edt-ft5x06 - allow to setting the maximum axes value through the DT
  ...
2015-04-14 18:25:15 -07:00
Linus Torvalds
8c194f3bd3 Merge tag 'vfio-v4.1-rc1' of git://github.com/awilliam/linux-vfio
Pull VFIO updates from Alex Williamson:

 - VFIO platform bus driver support (Baptiste Reynal, Antonios Motakis,
   testing and review by Eric Auger)

 - Split VFIO irqfd support to separate module (Alex Williamson)

 - vfio-pci VGA arbiter client (Alex Williamson)

 - New vfio-pci.ids= module option (Alex Williamson)

 - vfio-pci D3 power state support for idle devices (Alex Williamson)

* tag 'vfio-v4.1-rc1' of git://github.com/awilliam/linux-vfio: (30 commits)
  vfio-pci: Fix use after free
  vfio-pci: Move idle devices to D3hot power state
  vfio-pci: Remove warning if try-reset fails
  vfio-pci: Allow PCI IDs to be specified as module options
  vfio-pci: Add VGA arbiter client
  vfio-pci: Add module option to disable VGA region access
  vgaarb: Stub vga_set_legacy_decoding()
  vfio: Split virqfd into a separate module for vfio bus drivers
  vfio: virqfd_lock can be static
  vfio: put off the allocation of "minor" in vfio_create_group
  vfio/platform: implement IRQ masking/unmasking via an eventfd
  vfio: initialize the virqfd workqueue in VFIO generic code
  vfio: move eventfd support code for VFIO_PCI to a separate file
  vfio: pass an opaque pointer on virqfd initialization
  vfio: add local lock for virqfd instead of depending on VFIO PCI
  vfio: virqfd: rename vfio_pci_virqfd_init and vfio_pci_virqfd_exit
  vfio: add a vfio_ prefix to virqfd_enable and virqfd_disable and export
  vfio/platform: support for level sensitive interrupts
  vfio/platform: trigger an interrupt via eventfd
  vfio/platform: initial interrupts support code
  ...
2015-04-14 18:06:47 -07:00
Linus Torvalds
6c8a53c9e6 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf changes from Ingo Molnar:
 "Core kernel changes:

   - One of the more interesting features in this cycle is the ability
     to attach eBPF programs (user-defined, sandboxed bytecode executed
     by the kernel) to kprobes.

     This allows user-defined instrumentation on a live kernel image
     that can never crash, hang or interfere with the kernel negatively.
     (Right now it's limited to root-only, but in the future we might
     allow unprivileged use as well.)

     (Alexei Starovoitov)

   - Another non-trivial feature is per event clockid support: this
     allows, amongst other things, the selection of different clock
     sources for event timestamps traced via perf.

     This feature is sought by people who'd like to merge perf generated
     events with external events that were measured with different
     clocks:

       - cluster wide profiling

       - for system wide tracing with user-space events,

       - JIT profiling events

     etc.  Matching perf tooling support is added as well, available via
     the -k, --clockid <clockid> parameter to perf record et al.

     (Peter Zijlstra)

  Hardware enablement kernel changes:

   - x86 Intel Processor Trace (PT) support: which is a hardware tracer
     on steroids, available on Broadwell CPUs.

     The hardware trace stream is directly output into the user-space
     ring-buffer, using the 'AUX' data format extension that was added
     to the perf core to support hardware constraints such as the
     necessity to have the tracing buffer physically contiguous.

     This patch-set was developed for two years and this is the result.
     A simple way to make use of this is to use BTS tracing, the PT
     driver emulates BTS output - available via the 'intel_bts' PMU.
     More explicit PT specific tooling support is in the works as well -
     will probably be ready by 4.2.

     (Alexander Shishkin, Peter Zijlstra)

   - x86 Intel Cache QoS Monitoring (CQM) support: this is a hardware
     feature of Intel Xeon CPUs that allows the measurement and
     allocation/partitioning of caches to individual workloads.

     These kernel changes expose the measurement side as a new PMU
     driver, which exposes various QoS related PMU events.  (The
     partitioning change is work in progress and is planned to be merged
     as a cgroup extension.)

     (Matt Fleming, Peter Zijlstra; CPU feature detection by Peter P
     Waskiewicz Jr)

   - x86 Intel Haswell LBR call stack support: this is a new Haswell
     feature that allows the hardware recording of call chains, plus
     tooling support.  To activate this feature you have to enable it
     via the new 'lbr' call-graph recording option:

        perf record --call-graph lbr
        perf report

     or:

        perf top --call-graph lbr

     This hardware feature is a lot faster than stack walk or dwarf
     based unwinding, but has some limitations:

       - It reuses the current LBR facility, so LBR call stack and
         branch record can not be enabled at the same time.

       - It is only available for user-space callchains.

     (Yan, Zheng)

   - x86 Intel Broadwell CPU support and various event constraints and
     event table fixes for earlier models.

     (Andi Kleen)

   - x86 Intel HT CPUs event scheduling workarounds.  This is a complex
     CPU bug affecting the SNB,IVB,HSW families that results in counter
     value corruption.  The mitigation code is automatically enabled and
     is transparent.

     (Maria Dimakopoulou, Stephane Eranian)

  The perf tooling side had a ton of changes in this cycle as well, so
  I'm only able to list the user visible changes here, in addition to
  the tooling changes outlined above:

  User visible changes affecting all tools:

      - Improve support of compressed kernel modules (Jiri Olsa)
      - Save DSO loading errno to better report errors (Arnaldo Carvalho de Melo)
      - Bash completion for subcommands (Yunlong Song)
      - Add 'I' event modifier for perf_event_attr.exclude_idle bit (Jiri Olsa)
      - Support missing -f to override perf.data file ownership. (Yunlong Song)
      - Show the first event with an invalid filter (David Ahern, Arnaldo Carvalho de Melo)

  User visible changes in individual tools:

    'perf data':

        New tool for converting perf.data to other formats, initially
        for the CTF (Common Trace Format) from LTTng (Jiri Olsa,
        Sebastian Siewior)

    'perf diff':

        Add --kallsyms option (David Ahern)

    'perf list':

        Allow listing events with 'tracepoint' prefix (Yunlong Song)

        Sort the output of the command (Yunlong Song)

    'perf kmem':

        Respect -i option (Jiri Olsa)

        Print big numbers using thousands' group (Namhyung Kim)

        Allow -v option (Namhyung Kim)

        Fix alignment of slab result table (Namhyung Kim)

    'perf probe':

        Support multiple probes on different binaries on the same command line (Masami Hiramatsu)

        Support unnamed union/structure members data collection. (Masami Hiramatsu)

        Check kprobes blacklist when adding new events. (Masami Hiramatsu)

    'perf record':

        Teach 'perf record' about perf_event_attr.clockid (Peter Zijlstra)

        Support recording running/enabled time (Andi Kleen)

    'perf sched':

        Improve the performance of 'perf sched replay' on high CPU core count machines (Yunlong Song)

    'perf report' and 'perf top':

        Allow annotating entries in callchains in the hists browser (Arnaldo Carvalho de Melo)

        Indicate which callchain entries are annotated in the
        TUI hists browser (Arnaldo Carvalho de Melo)

        Add pid/tid filtering to 'report' and 'script' commands (David Ahern)

        Consider PERF_RECORD_ events with cpumode == 0 in 'perf top', removing one
        cause of long term memory usage buildup, i.e. not processing PERF_RECORD_EXIT
        events (Arnaldo Carvalho de Melo)

    'perf stat':

        Report unsupported events properly (Suzuki K. Poulose)

        Output running time and run/enabled ratio in CSV mode (Andi Kleen)

    'perf trace':

        Handle legacy syscalls tracepoints (David Ahern, Arnaldo Carvalho de Melo)

        Only insert blank duration bracket when tracing syscalls (Arnaldo Carvalho de Melo)

        Filter out the trace pid when no threads are specified (Arnaldo Carvalho de Melo)

        Dump stack on segfaults (Arnaldo Carvalho de Melo)

        No need to explicitely enable evsels for workload started from perf, let it
        be enabled via perf_event_attr.enable_on_exec, removing some events that take
        place in the 'perf trace' before a workload is really started by it.
        (Arnaldo Carvalho de Melo)

        Allow mixing with tracepoints and suppressing plain syscalls. (Arnaldo Carvalho de Melo)

  There's also been a ton of infrastructure work done, such as the
  split-out of perf's build system into tools/build/ and other changes -
  see the shortlog and changelog for details"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (358 commits)
  perf/x86/intel/pt: Clean up the control flow in pt_pmu_hw_init()
  perf evlist: Fix type for references to data_head/tail
  perf probe: Check the orphaned -x option
  perf probe: Support multiple probes on different binaries
  perf buildid-list: Fix segfault when show DSOs with hits
  perf tools: Fix cross-endian analysis
  perf tools: Fix error path to do closedir() when synthesizing threads
  perf tools: Fix synthesizing fork_event.ppid for non-main thread
  perf tools: Add 'I' event modifier for exclude_idle bit
  perf report: Don't call map__kmap if map is NULL.
  perf tests: Fix attr tests
  perf probe: Fix ARM 32 building error
  perf tools: Merge all perf_event_attr print functions
  perf record: Add clockid parameter
  perf sched replay: Use replay_repeat to calculate the runavg of cpu usage instead of the default value 10
  perf sched replay: Support using -f to override perf.data file ownership
  perf sched replay: Fix the EMFILE error caused by the limitation of the maximum open files
  perf sched replay: Handle the dead halt of sem_wait when create_tasks() fails for any task
  perf sched replay: Fix the segmentation fault problem caused by pr_err in threads
  perf sched replay: Realloc the memory of pid_to_task stepwise to adapt to the different pid_max configurations
  ...
2015-04-14 14:37:47 -07:00
Linus Torvalds
3f3c73de77 Merge tag 'trace-4.1-tracefs' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracefs from Steven Rostedt:
 "This adds the new tracefs file system.

  This has been in linux-next for more than one release, as I had it
  ready for the 4.0 merge window, but a last minute thing that needed to
  go into Linux first had to be done.  That was that perf hard coded the
  file system number when reading /sys/kernel/debugfs/tracing directory
  making sure that the path had the debugfs mount # before it would
  parse the tracing file.  This broke other use cases of perf, and the
  check is removed.

  Now when mounting /sys/kernel/debug, tracefs is automatically mounted
  in /sys/kernel/debug/tracing such that old tools will still see that
  path as expected.  But now system admins can mount tracefs directly
  and not need to mount debugfs, which can expose security issues.  A
  new directory is created when tracefs is configured such that system
  admins can now mount it separately (/sys/kernel/tracing)"

* tag 'trace-4.1-tracefs' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Have mkdir and rmdir be part of tracefs
  tracefs: Add directory /sys/kernel/tracing
  tracing: Automatically mount tracefs on debugfs/tracing
  tracing: Convert the tracing facility over to use tracefs
  tracefs: Add new tracefs file system
  tracing: Create cmdline tracer options on tracing fs init
  tracing: Only create tracer options files if directory exists
  debugfs: Provide a file creation function that also takes an initial size
2015-04-14 10:22:29 -07:00
Linus Torvalds
8de29a35dc Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid
Pull HID updates from Jiri Kosina:

 - quite a few firmware fixes for RMI driver by Andrew Duggan

 - huion and uclogic drivers have been substantially overlaping in
   functionality laterly.  This redundancy is fixed by hid-huion driver
   being merged into hid-uclogic; work done by Benjamin Tissoires and
   Nikolai Kondrashov

 - i2c-hid now supports ACPI GPIO interrupts; patch from Mika Westerberg

 - Some of the quirks, that got separated into individual drivers, have
   historically had EXPERT dependency.  As HID subsystem matured (as
   well as the individual drivers), this made less and less sense.  This
   dependency is now being removed by patch from Jean Delvare

 - Logitech lg4ff driver received a couple of improvements for mode
   switching, by Michal Malý

 - multitouch driver now supports clickpads, patches by Benjamin
   Tissoires and Seth Forshee

 - hid-sensor framework received a substantial update; namely support
   for Custom and Generic pages is being added; work done by Srinivas
   Pandruvada

 - wacom driver received substantial update; it now supports
   i2c-conntected devices (Mika Westerberg), Bamboo PADs are now
   properly supported (Benjamin Tissoires), much improved battery
   reporting (Jason Gerecke) and pen proximity cleanups (Ping Cheng)

 - small assorted fixes and device ID additions

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: (68 commits)
  HID: sensor: Update document for custom sensor
  HID: sensor: Custom and Generic sensor support
  HID: debug: fix error handling in hid_debug_events_read()
  Input - mt: Fix input_mt_get_slot_by_key
  HID: logitech-hidpp: fix error return code
  HID: wacom: Add support for Cintiq 13HD Touch
  HID: logitech-hidpp: add a module parameter to keep firmware gestures
  HID: usbhid: yet another mouse with ALWAYS_POLL
  HID: usbhid: more mice with ALWAYS_POLL
  HID: wacom: set stylus_in_proximity before checking touch_down
  HID: wacom: use wacom_wac_finger_count_touches to set touch_down
  HID: wacom: remove hardcoded WACOM_QUIRK_MULTI_INPUT
  HID: pidff: effect can't be NULL
  HID: add quirk for PIXART OEM mouse used by HP
  HID: add HP OEM mouse to quirk ALWAYS_POLL
  HID: wacom: ask for a in-prox report when it was missed
  HID: hid-sensor-hub: Fix sparse warning
  HID: hid-sensor-hub: fix attribute read for logical usage id
  HID: plantronics: fix Kconfig default
  HID: pidff: support more than one concurrent effect
  ...
2015-04-14 09:25:26 -07:00
Linus Torvalds
b79013b244 Merge tag 'staging-4.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging driver updates from Greg KH:
 "Here's the big staging driver patchset for 4.1-rc1.

  There's a lot of patches here, the Outreachy application period
  happened during this development cycle, so that means that there was a
  lot of cleanup patches accepted.  Other than the normal coding style
  and sparse fixes here, there are some driver updates and work toward
  making some of the drivers into "mergable" shape (like the Unisys
  drivers.)

  All of these have been in linux-next for a while"

* tag 'staging-4.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (1214 commits)
  staging: lustre: orthography & coding style
  staging: lustre: lnet: lnet: fix error return code
  staging: lustre: fix sparse warning
  Revert "Staging: sm750fb: Fix C99 Comments"
  Staging: rtl8192u: use correct array for debug output
  staging: rtl8192e: Remove dead code
  staging: rtl8192e: Comment cleanup (style/format)
  staging: rtl8192e: Fix indentation in rtllib_rx_auth_resp()
  staging: rtl8192e: Decrease nesting of rtllib_rx_auth_resp()
  staging: rtl8192e: Divide rtllib_rx_auth()
  staging: rtl8192e: Fix PRINTK_WITHOUT_KERN_LEVEL warnings
  staging: rtl8192e: Fix DO_WHILE_MACRO_WITH_TRAILING_SEMICOLON warning
  staging: rtl8192e: Fix BRACES warning
  staging: rtl8192e: Fix LINE_CONTINUATIONS warning
  staging: rtl8192e: Fix UNNECESSARY_PARENTHESES warnings
  staging: rtl8192e: remove unused EXPORT_SYMBOL_RSL macro
  staging: rtl8192e: Fix RETURN_VOID warnings
  staging: rtl8192e: Fix UNNECESSARY_ELSE warning
  staging: rtl8723au: Remove unneeded comments
  staging: rtl8723au: Use __func__ in trace logs
  ...
2015-04-13 17:37:33 -07:00
Linus Torvalds
392b46f31f Merge tag 'hsi-for-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-hsi
Pull HSI changes from Sebastian Reichel:

 - nokia-modem: support speech data
 - misc fixes

* tag 'hsi-for-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-hsi:
  HSI: cmt_speech: fix error return code
  HSI: nokia-modem: Add cmt-speech support
  HSI: cmt_speech: Add cmt-speech driver
  HSI: nokia-modem: fix error return code
2015-04-13 15:36:31 -07:00
Jiri Kosina
05f6d02521 Merge branches 'for-4.0/upstream-fixes', 'for-4.1/genius', 'for-4.1/huion-uclogic-merge', 'for-4.1/i2c-hid', 'for-4.1/kconfig-drop-expert-dependency', 'for-4.1/logitech', 'for-4.1/multitouch', 'for-4.1/rmi', 'for-4.1/sony', 'for-4.1/upstream' and 'for-4.1/wacom' into for-linus 2015-04-13 23:41:15 +02:00
Patrick McHardy
3e135cd499 netfilter: nft_dynset: dynamic stateful expression instantiation
Support instantiating stateful expressions based on a template that
are associated with dynamically created set entries. The expressions
are evaluated when adding or updating the set element.

This allows to maintain per flow state using the existing set
infrastructure and expression types, with arbitrary definitions of
a flow.

Usage is currently restricted to anonymous sets, meaning only a single
binding can exist, since the desired semantics of multiple independant
bindings haven't been defined so far.

Examples (userspace syntax is still WIP):

1. Limit the rate of new SSH connections per host, similar to iptables
   hashlimit:

	flow ip saddr timeout 60s \
	limit 10/second \
	accept

2. Account network traffic between each set of /24 networks:

	flow ip saddr & 255.255.255.0 . ip daddr & 255.255.255.0 \
	counter

3. Account traffic to each host per user:

	flow skuid . ip daddr \
	counter

4. Account traffic for each combination of source address and TCP flags:

	flow ip saddr . tcp flags \
	counter

The resulting set content after a Xmas-scan look like this:

{
	192.168.122.1 . fin | psh | urg : counter packets 1001 bytes 40040,
	192.168.122.1 . ack : counter packets 74 bytes 3848,
	192.168.122.1 . psh | ack : counter packets 35 bytes 3144
}

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-13 20:19:55 +02:00
Patrick McHardy
7c6c6e95a1 netfilter: nf_tables: add flag to indicate set contains expressions
Add a set flag to indicate that the set is used as a state table and
contains expressions for evaluation. This operation is mutually
exclusive with the mapping operation, so sets specifying both are
rejected. The lookup expression also rejects binding to state tables
since it only deals with loopup and map operations.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-13 20:12:32 +02:00
Patrick McHardy
f25ad2e907 netfilter: nf_tables: prepare for expressions associated to set elements
Preparation to attach expressions to set elements: add a set extension
type to hold an expression and dump the expression information with the
set element.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-13 20:12:31 +02:00
Pablo Neira Ayuso
24477e5741 uapi: ebtables: don't include linux/if.h
linux/if.h creates conflicts in userspace with net/if.h

By using it here we force userspace to use linux/if.h while
net/if.h may be needed.

Note that:

include/linux/netfilter_ipv4/ip_tables.h and
include/linux/netfilter_ipv6/ip6_tables.h

don't include linux/if.h and they also refer to IFNAMSIZ, so they are
expecting userspace to include use net/if.h from the client program.

Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-13 20:08:38 +02:00
Linus Torvalds
9003601310 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
 "First batch of KVM changes for 4.1

  The most interesting bit here is irqfd/ioeventfd support for ARM and
  ARM64.

  Summary:

  ARM/ARM64:
     fixes for live migration, irqfd and ioeventfd support (enabling
     vhost, too), page aging

  s390:
     interrupt handling rework, allowing to inject all local interrupts
     via new ioctl and to get/set the full local irq state for migration
     and introspection.  New ioctls to access memory by virtual address,
     and to get/set the guest storage keys.  SIMD support.

  MIPS:
     FPU and MIPS SIMD Architecture (MSA) support.  Includes some
     patches from Ralf Baechle's MIPS tree.

  x86:
     bugfixes (notably for pvclock, the others are small) and cleanups.
     Another small latency improvement for the TSC deadline timer"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (146 commits)
  KVM: use slowpath for cross page cached accesses
  kvm: mmu: lazy collapse small sptes into large sptes
  KVM: x86: Clear CR2 on VCPU reset
  KVM: x86: DR0-DR3 are not clear on reset
  KVM: x86: BSP in MSR_IA32_APICBASE is writable
  KVM: x86: simplify kvm_apic_map
  KVM: x86: avoid logical_map when it is invalid
  KVM: x86: fix mixed APIC mode broadcast
  KVM: x86: use MDA for interrupt matching
  kvm/ppc/mpic: drop unused IRQ_testbit
  KVM: nVMX: remove unnecessary double caching of MAXPHYADDR
  KVM: nVMX: checks for address bits beyond MAXPHYADDR on VM-entry
  KVM: x86: cache maxphyaddr CPUID leaf in struct kvm_vcpu
  KVM: vmx: pass error code with internal error #2
  x86: vdso: fix pvclock races with task migration
  KVM: remove kvm_read_hva and kvm_read_hva_atomic
  KVM: x86: optimize delivery of TSC deadline timer interrupt
  KVM: x86: extract blocking logic from __vcpu_run
  kvm: x86: fix x86 eflags fixed bit
  KVM: s390: migrate vcpu interrupt state
  ...
2015-04-13 09:47:01 -07:00
Patrick McHardy
7d7402642e netfilter: nf_tables: variable sized set element keys / data
This patch changes sets to support variable sized set element keys / data
up to 64 bytes each by using variable sized set extensions. This allows
to use concatenations with bigger data items suchs as IPv6 addresses.

As a side effect, small keys/data now don't require the full 16 bytes
of struct nft_data anymore but just the space they need.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-13 17:17:31 +02:00
Patrick McHardy
49499c3e6e netfilter: nf_tables: switch registers to 32 bit addressing
Switch the nf_tables registers from 128 bit addressing to 32 bit
addressing to support so called concatenations, where multiple values
can be concatenated over multiple registers for O(1) exact matches of
multiple dimensions using sets.

The old register values are mapped to areas of 128 bits for compatibility.
When dumping register numbers, values are expressed using the old values
if they refer to the beginning of a 128 bit area for compatibility.

To support concatenations, register loads of less than a full 32 bit
value need to be padded. This mainly affects the payload and exthdr
expressions, which both unconditionally zero the last word before
copying the data.

Userspace fully passes the testsuite using both old and new register
addressing.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-13 17:17:29 +02:00
Dave Airlie
1d2add28ed Merge tag 'imx-drm-next-2015-03-31' of git://git.pengutronix.de/git/pza/linux into drm-next
imx-drm changes to use media bus formats and LDB drm_panel support

- Add media bus formats needed by imx-drm
- Switch to use media bus formats to describe the pixel format
  on the internal parallel bus between display interface and
  encoders
- Some preparations for TV Output via TVEv2 on i.MX5
- Add drm_panel support to the i.MX LVDS driver, allow to
  determine the bus pixel format from the panel descriptor.

* tag 'imx-drm-next-2015-03-31' of git://git.pengutronix.de/git/pza/linux:
  drm/imx: imx-ldb: allow to determine bus format from the connected panel
  drm/imx: imx-ldb: reset display clock input when disabling LVDS
  drm/imx: imx-ldb: add drm_panel support
  drm/imx: consolidate bus format variable names
  drm/imx: switch to use media bus formats
  Add RGB666_1X24_CPADHI media bus format
  Add YUV8_1X24 media bus format
  Add BGR888_1X24 and GBR888_1X24 media bus formats
  Add LVDS RGB media bus formats
  Add RGB444_1X12 and RGB565_1X16 media bus formats
  drm/imx: ipuv3-crtc: Allow to divide DI clock from TVEv2
  drm/imx: Add support for interlaced scanout
2015-04-13 17:28:57 +10:00
David S. Miller
e60a9de49c Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2015-04-11

This series contains updates to iflink, ixgbe and ixgbevf.

The entire set of changes come from Vlad Zolotarov to ultimately add
the ethtool ops to VF driver to allow querying the RSS indirection table
and RSS random key.

Currently we support only 82599 and x540 devices.  On those devices, VFs
share the RSS redirection table and hash key with a PF.  Letting the VF
query this information may introduce some security risks, therefore this
feature will be disabled by default.

The new netdev op allows a system administrator to change the default
behaviour with "ip link set" command.  The relevant iproute2 patch has
already been sent and awaits for this series upstream.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-12 21:36:57 -04:00
WANG Cong
7a6c8c34e5 fou: implement FOU_CMD_GET
Cc: Tom Herbert <tom@herbertland.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-12 21:25:13 -04:00
Vlad Zolotarov
01a3d79681 if_link: Add an additional parameter to ifla_vf_info for RSS querying
Add configuration setting for drivers to allow/block an RSS Redirection
Table and a Hash Key querying for discrete VFs.

On some devices VF share the mentioned above information with PF and
querying it may adduce a theoretical security risk. We want to let a
system administrator to decide if he/she wants to take this risk or not.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-04-10 21:57:22 -07:00
Pablo Neira Ayuso
aadd51aa71 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Resolve conflicts between 5888b93 ("Merge branch 'nf-hook-compress'") and
Florian Westphal br_netfilter works.

Conflicts:
        net/bridge/br_netfilter.c

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-08 18:30:21 +02:00
Patrick McHardy
68e942e88a netfilter: nf_tables: support optional userdata for set elements
Add an userdata set extension and allow the user to attach arbitrary
data to set elements. This is intended to hold TLV encoded data like
comments or DNS annotations that have no meaning to the kernel.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-08 16:58:27 +02:00
Patrick McHardy
22fe54d5fe netfilter: nf_tables: add support for dynamic set updates
Add a new "dynset" expression for dynamic set updates.

A new set op ->update() is added which, for non existant elements,
invokes an initialization callback and inserts the new element.
For both new or existing elements the extenstion pointer is returned
to the caller to optionally perform timer updates or other actions.

Element removal is not supported so far, however that seems to be a
rather exotic need and can be added later on.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-04-08 16:58:27 +02:00
Hans Verkuil
5ce65d1f87 [media] videodev2.h/v4l2-dv-timings.h: add V4L2_DV_FL_IS_CE_VIDEO flag
In the past the V4L2_DV_BT_STD_CEA861 standard bit was used to
determine whether the format is a CE (Consumer Electronics) format
or not. However, the 640x480p59.94 format is part of the CEA-861
standard, but it is *not* a CE video format.

Add a new flag to make this explicit. This information is needed
in order to determine the default R'G'B' encoding for the format:
for CE video this is limited range (16-235) instead of full range
(0-255).

The header with all the timings has been updated with this new
flag.

Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Cc: Martin Bugge <marbugge@cisco.com>
Cc: Mats Randgaard <mats.randgaard@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
2015-04-08 06:36:52 -03:00
Ricardo Ribalda
b6e5b8f1a9 [media] media: New flag V4L2_CTRL_FLAG_EXECUTE_ON_WRITE
Create a new flag that represent controls which its value needs to be
passed to the driver even if it has not changed.

They typically represent actions, like triggering a flash or clearing an
error flag. So writing to such a control means some action is executed.

Signed-off-by: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com>
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
2015-04-08 06:33:26 -03:00
Nicolas Dichtel
9a9634545c netns: notify netns id events
With this patch, netns ids that are created and deleted are advertised into the
group RTNLGRP_NSID.

Because callers of rtnl_net_notifyid() already know the id of the peer, there is
no need to call __peernet2id() in rtnl_net_fill().

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-07 17:29:41 -04:00
Paolo Bonzini
7f22b45d66 Merge tag 'kvm-s390-next-20150331' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
Features and fixes for 4.1 (kvm/next)

1. Assorted changes
1.1 allow more feature bits for the guest
1.2 Store breaking event address on program interrupts

2. Interrupt handling rework
2.1 Fix copy_to_user while holding a spinlock (cc stable)
2.2 Rework floating interrupts to follow the priorities
2.3 Allow to inject all local interrupts via new ioctl
2.4 allow to get/set the full local irq state, e.g. for migration
    and introspection
2015-04-07 18:10:03 +02:00
Dmitry Torokhov
5f63e885ac Merge tag 'v4.0-rc7' into next
Sync up with Linux 4.0-rc7 to bring in ALPS changes.
2015-04-07 08:46:23 -07:00
Greg Kroah-Hartman
b3e3bf2ef2 Merge 4.0-rc7 into tty-next
We want the fixes in here as well, also to help out with merge issues.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-04-07 11:07:20 +02:00
Greg Kroah-Hartman
c610f7f772 Merge 4.0-rc7 into staging-next
We want those fixes (iio primarily) into the -next branch to help with
merge and testing issues.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-04-07 11:03:02 +02:00
David S. Miller
c85d6975ef Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/mellanox/mlx4/cmd.c
	net/core/fib_rules.c
	net/ipv4/fib_frontend.c

The fib_rules.c and fib_frontend.c conflicts were locking adjustments
in 'net' overlapping addition and removal of code in 'net-next'.

The mlx4 conflict was a bug fix in 'net' happening in the same
place a constant was being replaced with a more suitable macro.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-06 22:34:15 -04:00
Alexei Starovoitov
91bc4822c3 tc: bpf: add checksum helpers
Commit 608cd71a9c ("tc: bpf: generalize pedit action") has added the
possibility to mangle packet data to BPF programs in the tc pipeline.
This patch adds two helpers bpf_l3_csum_replace() and bpf_l4_csum_replace()
for fixing up the protocol checksums after the packet mangling.

It also adds 'flags' argument to bpf_skb_store_bytes() helper to avoid
unnecessary checksum recomputations when BPF programs adjusting l3/l4
checksums and documents all three helpers in uapi header.

Moreover, a sample program is added to show how BPF programs can make use
of the mangle and csum helpers.

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-06 16:42:35 -04:00
Linus Torvalds
1cced5015b Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input subsystem fixes from Dmitry Torokhov:
 "A fix for ALPS driver for issue introduced in the latest update and a
  tweak for yet another Lenovo box in Synaptics.

  There will be more ALPS tweaks coming.."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: define INPUT_PROP_ACCELEROMETER behavior
  Input: synaptics - fix min-max quirk value for E440
  Input: synaptics - add quirk for Thinkpad E440
  Input: ALPS - fix max coordinates for v5 and v7 protocols
  Input: add MT_TOOL_PALM
2015-04-03 14:58:48 -07:00
Daniel Borkmann
bcad571824 ebpf: add skb->priority to offset map for usage in {cls, act}_bpf
This adds the ability to read out the skb->priority from an eBPF
program, so that it can be taken into account from a tc filter
or action for the use-case where the priority is not being used
to directly override the filter classification in a qdisc, but
to tag traffic otherwise for the classifier; the priority can be
assigned from various places incl. user space, in future we may
also mangle it from an eBPF program.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-03 14:59:15 -04:00
Christoph Hellwig
9b3075c59f nfsd: add NFSEXP_PNFS to the exflags array
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-04-03 10:00:59 -04:00
Laurent Pinchart
a5562f65b1 [media] v4l: xilinx: Add Test Pattern Generator driver
The TPG generates multiple static or dynamic test patterns. The driver
currently hardcodes the pattern to the moving box pattern.

Signed-off-by: Christian Kohn <christian.kohn@xilinx.com>
Signed-off-by: Hyun Kwon <hyun.kwon@xilinx.com>
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Acked-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
2015-04-03 01:04:18 -03:00
Hyun Kwon
2dca0551e4 [media] v4l: Add VUY8 24 bits bus format
Add VUY8 24 bits bus format, V4L2_MBUS_FMT_VUY8_1X24.

Signed-off-by: Hyun Kwon <hyun.kwon@xilinx.com>
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Acked-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
2015-04-03 00:49:45 -03:00
Hyun Kwon
e8b2d7a565 [media] v4l: Sort YUV formats of v4l2_mbus_pixelcode
Keep the formats sorted by type, bus_width, bits per component, samples
per pixel and order of subsamples, in that order.

Signed-off-by: Hyun Kwon <hyun.kwon@xilinx.com>
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Acked-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
2015-04-03 00:49:07 -03:00
Laurent Pinchart
7b0fd4568b [media] v4l: Add RBG and RGB 8:8:8 media bus formats on 24 and 32 bit busses
Add support and documentation for two media bus formats:
MEDIA_BUS_FMT_RBG888_1X24 and MEDIA_BUS_FMT_RGB888_1X32_PADHI

Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Acked-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
2015-04-03 00:47:29 -03:00
Hans Verkuil
cc7d2dfb75 [media] v4l2_plane_pix_format: use __u32 bytesperline instead of __u16
While running v4l2-compliance tests on vivid I suddenly got errors due to
a call to vmalloc_user with size 0 from vb2.

Digging deeper into the cause I discovered that this was due to the fact that
struct v4l2_plane_pix_format defines bytesperline as a __u16 instead of a __u32.

The test I was running selected a format of 4 * 4096 by 4 * 2048 with a 32
bit pixelformat.

So bytesperline was 4 * 4 * 4096 = 65536, which becomes 0 in a __u16. And
bytesperline * height is suddenly 0 as well. While the vivid driver may be
a virtual driver, it is to be expected that this limit will be hit for real
hardware as well in the near future: 8k deep-color video will already reach
it.

The solution is to change the type to __u32. The only drivers besides vivid
that use the multiplanar API are little-endian ARM and SH platforms (exynos,
ti-vpe, vsp1), so this is safe.

Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
2015-04-02 23:54:47 -03:00
Hans Verkuil
aa05b979f1 [media] videodev2.h: fix comment
The quantization comment in the header was incorrect w.r.t. BT.2020.
Fix this.

Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
2015-04-02 18:11:00 -03:00
Masatake YAMATO
64bf8049a2 [media] am437x: include linux/videodev2.h for expanding BASE_VIDIOC_PRIVATE
In am437x-vpfe.h BASE_VIDIOC_PRIVATE is used for
making the name of ioctl command(VIDIOC_AM437X_CCDC_CFG).
The definition of BASE_VIDIOC_PRIVATE is in linux/videodev2.h.
However, linux/videodev2.h is not included in am437x-vpfe.h.
As the result an application using has to include both
am437x-vpfe.h and linux/videodev2.h.

With this patch, the application can include just am437x-vpfe.h.

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
Acked-by: Lad, Prabhakar <prabhakar.csengg@gmail.com>
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
2015-04-02 18:10:35 -03:00
David S. Miller
9f0d34bc34 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/usb/asix_common.c
	drivers/net/usb/sr9800.c
	drivers/net/usb/usbnet.c
	include/linux/usb/usbnet.h
	net/ipv4/tcp_ipv4.c
	net/ipv6/tcp_ipv6.c

The TCP conflicts were overlapping changes.  In 'net' we added a
READ_ONCE() to the socket cached RX route read, whilst in 'net-next'
Eric Dumazet touched the surrounding code dealing with how mini
sockets are handled.

With USB, it's a case of the same bug fix first going into net-next
and then I cherry picked it back into net.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-02 16:16:53 -04:00
Alexander Shishkin
ec0d7729bb perf: Add ITRACE_START record to indicate that tracing has started
For counters that generate AUX data that is bound to the context of a
running task, such as instruction tracing, the decoder needs to know
exactly which task is running when the event is first scheduled in,
before the first sched_switch. The decoder's need to know this stems
from the fact that instruction flow trace decoding will almost always
require program's object code in order to reconstruct said flow and
for that we need at least its pid/tid in the perf stream.

To single out such instruction tracing pmus, this patch introduces
ITRACE PMU capability. The reason this is not part of RECORD_AUX
record is that not all pmus capable of generating AUX data need this,
and the opposite is *probably* also true.

While sched_switch covers for most cases, there are two problems with it:
the consumer will need to process events out of order (that is, having
found RECORD_AUX, it will have to skip forward to the nearest sched_switch
to figure out which task it was, then go back to the actual trace to
decode it) and it completely misses the case when the tracing is enabled
and disabled before sched_switch, for example, via PERF_EVENT_IOC_DISABLE.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kaixu Xia <kaixu.xia@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Robert Richter <rric@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@infradead.org
Cc: adrian.hunter@intel.com
Cc: kan.liang@intel.com
Cc: markus.t.metzger@intel.com
Cc: mathieu.poirier@linaro.org
Link: http://lkml.kernel.org/r/1421237903-181015-15-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02 17:14:17 +02:00
Alexander Shishkin
1a59413124 perf: Add wakeup watermark control to the AUX area
When AUX area gets a certain amount of new data, we want to wake up
userspace to collect it. This adds a new control to specify how much
data will cause a wakeup. This is then passed down to pmu drivers via
output handle's "wakeup" field, so that the driver can find the nearest
point where it can generate an interrupt.

We repurpose __reserved_2 in the event attribute for this, even though
it was never checked to be zero before, aux_watermark will only matter
for new AUX-aware code, so the old code should still be fine.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kaixu Xia <kaixu.xia@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Robert Richter <rric@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@infradead.org
Cc: adrian.hunter@intel.com
Cc: kan.liang@intel.com
Cc: markus.t.metzger@intel.com
Cc: mathieu.poirier@linaro.org
Link: http://lkml.kernel.org/r/1421237903-181015-10-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02 17:14:16 +02:00
Alexander Shishkin
2023a0d282 perf: Support overwrite mode for the AUX area
This adds support for overwrite mode in the AUX area, which means "keep
collecting data till you're stopped", turning AUX area into a circular
buffer, where new data overwrites old data. It does not depend on data
buffer's overwrite mode, so that it doesn't lose sideband data that is
instrumental for processing AUX data.

Overwrite mode is enabled at mapping AUX area read only. Even though
aux_tail in the buffer's user page might be user writable, it will be
ignored in this mode.

A PERF_RECORD_AUX with PERF_AUX_FLAG_OVERWRITE set is written to the perf
data stream every time an event writes new data to the AUX area. The pmu
driver might not be able to infer the exact beginning of the new data in
each snapshot, some drivers will only provide the tail, which is
aux_offset + aux_size in the AUX record. Consumer has to be able to tell
the new data from the old one, for example, by means of time stamps if
such are provided in the trace.

Consumer is also responsible for disabling any events that might write
to the AUX area (thus potentially racing with the consumer) before
collecting the data.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kaixu Xia <kaixu.xia@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Robert Richter <rric@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@infradead.org
Cc: adrian.hunter@intel.com
Cc: kan.liang@intel.com
Cc: markus.t.metzger@intel.com
Cc: mathieu.poirier@linaro.org
Link: http://lkml.kernel.org/r/1421237903-181015-9-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02 17:14:15 +02:00
Alexander Shishkin
68db7e98c3 perf: Add AUX record
When there's new data in the AUX space, output a record indicating its
offset and size and a set of flags, such as PERF_AUX_FLAG_TRUNCATED, to
mean the described data was truncated to fit in the ring buffer.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kaixu Xia <kaixu.xia@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Robert Richter <rric@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: adrian.hunter@intel.com
Cc: kan.liang@intel.com
Cc: markus.t.metzger@intel.com
Cc: mathieu.poirier@linaro.org
Link: http://lkml.kernel.org/r/1421237903-181015-7-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02 17:14:12 +02:00