kernel_arpi/include/uapi/linux
Adrian Reber 49cb2fc42c fork: extend clone3() to support setting a PID
The main motivation to add set_tid to clone3() is CRIU.

To restore a process with the same PID/TID CRIU currently uses
/proc/sys/kernel/ns_last_pid. It writes the desired (PID - 1) to
ns_last_pid and then (quickly) does a clone(). This works most of the
time, but it is racy. It is also slow as it requires multiple syscalls.

Extending clone3() to support *set_tid makes it possible to restore a
process using CRIU without accessing /proc/sys/kernel/ns_last_pid and
without the race (as long as the desired PID/TID is available).

This clone3() extension places the same restriction (CAP_SYS_ADMIN)
on clone3() with *set_tid as is currently in place for ns_last_pid.

The original version of this change was using a single value for
set_tid. At the 2019 LPC, after presenting set_tid, it was, however,
decided to change set_tid to an array to enable setting the PID of a
process in multiple PID namespaces at the same time. If a process is
created in a PID namespace it is possible to influence the PID inside
and outside of the PID namespace. Details are also in the
corresponding selftest.

To create a process with the following PIDs:

      PID NS level         Requested PID
        0 (host)              31496
        1                        42
        2                         1

For that example the two newly introduced parameters to struct
clone_args (set_tid and set_tid_size) would need to be:

  set_tid[0] = 1;
  set_tid[1] = 42;
  set_tid[2] = 31496;
  set_tid_size = 3;

If only the PIDs of the two innermost nested PID namespaces should be
defined it would look like this:

  set_tid[0] = 1;
  set_tid[1] = 42;
  set_tid_size = 2;

The PID of the newly created process would then be the next available
free PID in the PID namespace level 0 (host) and 42 in the PID namespace
at level 1 and the PID of the process in the innermost PID namespace
would be 1.
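
The two examples above list the requested PIDs from the outermost level
down in the table, but from the innermost level up in set_tid. A minimal
sketch of that reordering follows; fill_set_tid() is a hypothetical helper
written for illustration and is not part of this change.

```c
#include <stddef.h>
#include <sys/types.h>

/*
 * Hypothetical helper, for illustration only.
 *
 * by_level[0] holds the PID requested for the outermost PID namespace
 * that should be pinned, by_level[n - 1] the PID for the innermost one.
 * set_tid[] is filled in the order clone3() expects: innermost first.
 * Returns the value to store in set_tid_size.
 */
size_t fill_set_tid(const pid_t *by_level, size_t n, pid_t *set_tid)
{
	for (size_t i = 0; i < n; i++)
		set_tid[i] = by_level[n - 1 - i];

	return n;
}
```

Passing { 31496, 42, 1 } reproduces the first example, and passing
{ 42, 1 } reproduces the second one, where the PID at level 0 is left
for the kernel to pick.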

The set_tid array is used to specify the PID of a process starting
from the innermost nested PID namespaces up to set_tid_size PID namespaces.

set_tid_size cannot be larger than the current PID namespace level.
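
As a rough userspace sketch of how the two new fields could be passed to
clone3(): the struct below is a local mirror of struct clone_args with
set_tid and set_tid_size appended, so that the example does not depend on
the installed UAPI headers. The names clone_args_set_tid,
init_set_tid_args() and clone3_set_tid() are made up for this example,
and the fallback syscall number 435 is an assumption that holds on most,
but not necessarily all, architectures.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef __NR_clone3
#define __NR_clone3 435	/* assumption: value used on most architectures */
#endif

/* Local mirror of struct clone_args including the two new fields. */
struct clone_args_set_tid {
	uint64_t flags;
	uint64_t pidfd;
	uint64_t child_tid;
	uint64_t parent_tid;
	uint64_t exit_signal;
	uint64_t stack;
	uint64_t stack_size;
	uint64_t tls;
	uint64_t set_tid;	/* pointer to an array of pid_t */
	uint64_t set_tid_size;	/* number of elements in that array */
};

/* Fill args for a fork-like clone3() that requests specific PIDs. */
void init_set_tid_args(struct clone_args_set_tid *args,
		       const pid_t *set_tid, size_t set_tid_size)
{
	memset(args, 0, sizeof(*args));
	args->exit_signal = SIGCHLD;
	args->set_tid = (uint64_t)(uintptr_t)set_tid;
	args->set_tid_size = set_tid_size;
}

/*
 * Returns the child's PID in the parent, 0 in the child, or -1 with
 * errno set, for example when the caller lacks CAP_SYS_ADMIN or the
 * kernel rejects the requested values.
 */
long clone3_set_tid(const pid_t *set_tid, size_t set_tid_size)
{
	struct clone_args_set_tid args;

	init_set_tid_args(&args, set_tid, set_tid_size);
	return syscall(__NR_clone3, &args, sizeof(args));
}
```

With pid_t tids[] = { 1, 42, 31496 }, calling clone3_set_tid(tids, 3)
corresponds to the first example above. The selftest mentioned earlier
remains the reference for the exact error behaviour.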

Signed-off-by: Adrian Reber <areber@redhat.com>
Reviewed-by: Christian Brauner <christian.brauner@ubuntu.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Acked-by: Andrei Vagin <avagin@gmail.com>
Link: https://lore.kernel.org/r/20191115123621.142252-1-areber@redhat.com
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2019-11-15 23:49:22 +01:00