Merge remote-tracking branch 'aosp/upstream-f2fs-stable-linux-5.15.y' into android14-5.15

* aosp/upstream-f2fs-stable-linux-5.15.y:
  f2fs: attach inline_data after setting compression
  f2fs: fix to tag gcing flag on page during file defragment
  f2fs: replace F2FS_I(inode) and sbi by the local variable
  f2fs: add f2fs_init_write_merge_io function
  f2fs: avoid unneeded error handling for revoke_entry_slab allocation
  f2fs: allow compression for mmap files in compress_mode=user
  f2fs: fix typo in comment
  f2fs: make f2fs_read_inline_data() more readable
  f2fs: fix to do sanity check for inline inode
  f2fs: fix fallocate to use file_modified to update permissions consistently
  f2fs: don't use casefolded comparison for "." and ".."
  f2fs: do not stop GC when requiring a free section
  f2fs: keep wait_ms if EAGAIN happens
  f2fs: introduce f2fs_gc_control to consolidate f2fs_gc parameters
  f2fs: reject test_dummy_encryption when !CONFIG_FS_ENCRYPTION
  f2fs: kill volatile write support
  f2fs: change the current atomic write way
  f2fs: don't need inode lock for system hidden quota
  f2fs: stop allocating pinned sections if EAGAIN happens
  f2fs: skip GC if possible when checkpoint disabling
  f2fs: give priority to select unpinned section for foreground GC
  f2fs: fix to do sanity check on total_data_blocks
  f2fs: fix deadloop in foreground GC
  f2fs: fix to do sanity check on block address in f2fs_do_zero_range()
  f2fs: fix to avoid f2fs_bug_on() in dec_valid_node_count()
  f2fs: write checkpoint during FG_GC
  f2fs: fix to clear dirty inode in f2fs_evict_inode()
  f2fs: ensure only power of 2 zone sizes are allowed
  f2fs: call bdev_zone_sectors() only once on init_blkz_info()
  f2fs: extend stat_lock to avoid potential race in statfs
  f2fs: avoid infinite loop to flush node pages
  f2fs: use flush command instead of FUA for zoned device
  f2fs: remove WARN_ON in f2fs_is_valid_blkaddr
  f2fs: replace usage of found with dedicated list iterator variable
  f2fs: Remove usage of list iterator past the loop for list_move_tail()
  f2fs: fix dereference of stale list iterator after loop body
  f2fs: fix to do sanity check on inline_dots inode
  f2fs: introduce data read/write showing path info
  f2fs: remove unnecessary f2fs_lock_op in f2fs_new_inode
  f2fs: don't set GC_FAILURE_PIN for background GC
  f2fs: check pinfile in gc_data_segment() in advance
  f2fs: should not truncate blocks during roll-forward recovery
  f2fs: fix wrong condition check when failing metapage read
  f2fs: keep io_flags to avoid IO split due to different op_flags in two fio holders
  f2fs: remove obsolete whint_mode
  f2fs: pass the bio operation to bio_alloc_bioset
  f2fs: don't pass a bio to f2fs_target_device
  f2fs: replace congestion_wait() calls with io_schedule_timeout()
  FROMGIT: scsi: scsi_debug: Add gap zone support
  FROMGIT: scsi: scsi_debug: Rename zone type constants
  FROMGIT: scsi: scsi_debug: Fix a typo
  FROMGIT: scsi: sd: sd_zbc: Hide gap zones
  FROMGIT: scsi: sd: sd_zbc: Return early in sd_zbc_check_zoned_characteristics()
  FROMGIT: scsi: sd: sd_zbc: Introduce struct zoned_disk_info
  FROMGIT: scsi: sd: sd_zbc: Use logical blocks as unit when querying zones
  FROMGIT: scsi: sd: sd_zbc: Verify that the zone size is a power of two
  FROMGIT: scsi: sd: sd_zbc: Improve source code documentation

Bug: 228919347
Signed-off-by: Jaegeuk Kim <jaegeuk@google.com>
Change-Id: If51d1a03be757e74034b297c4f54df23b501da71
Committed-by: Jaegeuk Kim
Date: 2022-06-07 16:31:14 -07:00

 24 files changed, 1357 insertions(+), 1234 deletions(-)

diff --git a/Documentation/filesystems/f2fs.rst b/Documentation/filesystems/f2fs.rst

@@ -235,12 +235,6 @@ offgrpjquota Turn off group journalled quota.
 offprjjquota		 Turn off project journalled quota.
 quota			 Enable plain user disk quota accounting.
 noquota		 Disable all plain disk quota option.
-whint_mode=%s		 Control which write hints are passed down to block
-			 layer. This supports "off", "user-based", and
-			 "fs-based". In "off" mode (default), f2fs does not pass
-			 down hints. In "user-based" mode, f2fs tries to pass
-			 down hints given by users. And in "fs-based" mode, f2fs
-			 passes down hints with its policy.
 alloc_mode=%s		 Adjust block allocation policy, which supports "reuse"
			 and "default".
 fsync_mode=%s		 Control the policy of fsync. Currently supports "posix",
@@ -751,70 +745,6 @@ In order to identify whether the data in the victim segment are valid or not,
 F2FS manages a bitmap. Each bit represents the validity of a block, and the
 bitmap is composed of a bit stream covering whole blocks in main area.

-Write-hint Policy
------------------
-
-1) whint_mode=off. F2FS only passes down WRITE_LIFE_NOT_SET.
-
-2) whint_mode=user-based. F2FS tries to pass down hints given by
-users.
-
-===================== ======================== ===================
-User                  F2FS                     Block
-===================== ======================== ===================
-N/A                   META                     WRITE_LIFE_NOT_SET
-N/A                   HOT_NODE                 "
-N/A                   WARM_NODE                "
-N/A                   COLD_NODE                "
-ioctl(COLD)           COLD_DATA                WRITE_LIFE_EXTREME
-extension list        "                        "
-
--- buffered io
-WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
-WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
-WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_NOT_SET
-WRITE_LIFE_NONE       "                        "
-WRITE_LIFE_MEDIUM     "                        "
-WRITE_LIFE_LONG       "                        "
-
--- direct io
-WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
-WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
-WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_NOT_SET
-WRITE_LIFE_NONE       "                        WRITE_LIFE_NONE
-WRITE_LIFE_MEDIUM     "                        WRITE_LIFE_MEDIUM
-WRITE_LIFE_LONG       "                        WRITE_LIFE_LONG
-===================== ======================== ===================
-
-3) whint_mode=fs-based. F2FS passes down hints with its policy.
-
-===================== ======================== ===================
-User                  F2FS                     Block
-===================== ======================== ===================
-N/A                   META                     WRITE_LIFE_MEDIUM;
-N/A                   HOT_NODE                 WRITE_LIFE_NOT_SET
-N/A                   WARM_NODE                "
-N/A                   COLD_NODE                WRITE_LIFE_NONE
-ioctl(COLD)           COLD_DATA                WRITE_LIFE_EXTREME
-extension list        "                        "
-
--- buffered io
-WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
-WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
-WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_LONG
-WRITE_LIFE_NONE       "                        "
-WRITE_LIFE_MEDIUM     "                        "
-WRITE_LIFE_LONG       "                        "
-
--- direct io
-WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
-WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
-WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_NOT_SET
-WRITE_LIFE_NONE       "                        WRITE_LIFE_NONE
-WRITE_LIFE_MEDIUM     "                        WRITE_LIFE_MEDIUM
-WRITE_LIFE_LONG       "                        WRITE_LIFE_LONG
-===================== ======================== ===================
-
 Fallocate(2) Policy
 -------------------

diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c

@@ -16,7 +16,7 @@
 #define pr_fmt(fmt) KBUILD_MODNAME ":%s: " fmt, __func__

 #include <linux/module.h>
-
+#include <linux/align.h>
 #include <linux/kernel.h>
 #include <linux/errno.h>
 #include <linux/jiffies.h>
@@ -98,6 +98,7 @@ static const char *sdebug_version_date = "20200710";
 #define WRITE_BOUNDARY_ASCQ 0x5
 #define READ_INVDATA_ASCQ 0x6
 #define READ_BOUNDARY_ASCQ 0x7
+#define ATTEMPT_ACCESS_GAP 0x9
 #define INSUFF_ZONE_ASCQ 0xe

 /* Additional Sense Code Qualifier (ASCQ) */
@@ -250,9 +251,11 @@ static const char *sdebug_version_date = "20200710";

 /* Zone types (zbcr05 table 25) */
 enum sdebug_z_type {
-	ZBC_ZONE_TYPE_CNV	= 0x1,
-	ZBC_ZONE_TYPE_SWR	= 0x2,
-	ZBC_ZONE_TYPE_SWP	= 0x3,
+	ZBC_ZTYPE_CNV	= 0x1,
+	ZBC_ZTYPE_SWR	= 0x2,
+	ZBC_ZTYPE_SWP	= 0x3,
+	/* ZBC_ZTYPE_SOBR = 0x4, */
+	ZBC_ZTYPE_GAP	= 0x5,
 };

 /* enumeration names taken from table 26, zbcr05 */
@@ -290,10 +293,12 @@ struct sdebug_dev_info {
 	/* For ZBC devices */
 	enum blk_zoned_model zmodel;
+	unsigned int zcap;
 	unsigned int zsize;
 	unsigned int zsize_shift;
 	unsigned int nr_zones;
 	unsigned int nr_conv_zones;
+	unsigned int nr_seq_zones;
 	unsigned int nr_imp_open;
 	unsigned int nr_exp_open;
 	unsigned int nr_closed;
@@ -827,6 +832,7 @@ static int dif_errors;

 /* ZBC global data */
 static bool sdeb_zbc_in_use;	/* true for host-aware and host-managed disks */
+static int sdeb_zbc_zone_cap_mb;
 static int sdeb_zbc_zone_size_mb;
 static int sdeb_zbc_max_open = DEF_ZBC_MAX_OPEN_ZONES;
 static int sdeb_zbc_nr_conv = DEF_ZBC_NR_CONV_ZONES;
@@ -1551,6 +1557,12 @@ static int inquiry_vpd_b6(struct sdebug_dev_info *devip, unsigned char *arr)
 		put_unaligned_be32(devip->max_open, &arr[12]);
 	else
 		put_unaligned_be32(0xffffffff, &arr[12]);
+	if (devip->zcap < devip->zsize) {
+		arr[19] = ZBC_CONSTANT_ZONE_START_OFFSET;
+		put_unaligned_be64(devip->zsize, &arr[20]);
+	} else {
+		arr[19] = 0;
+	}
 	return 0x3c;
 }
@@ -2670,12 +2682,38 @@ static inline bool sdebug_dev_is_zoned(struct sdebug_dev_info *devip)
 static struct sdeb_zone_state *zbc_zone(struct sdebug_dev_info *devip,
					unsigned long long lba)
 {
-	return &devip->zstate[lba >> devip->zsize_shift];
+	u32 zno = lba >> devip->zsize_shift;
+	struct sdeb_zone_state *zsp;
+
+	if (devip->zcap == devip->zsize || zno < devip->nr_conv_zones)
+		return &devip->zstate[zno];
+
+	/*
+	 * If the zone capacity is less than the zone size, adjust for gap
+	 * zones.
+	 */
+	zno = 2 * zno - devip->nr_conv_zones;
+	WARN_ONCE(zno >= devip->nr_zones, "%u > %u\n", zno, devip->nr_zones);
+	zsp = &devip->zstate[zno];
+	if (lba >= zsp->z_start + zsp->z_size)
+		zsp++;
+	WARN_ON_ONCE(lba >= zsp->z_start + zsp->z_size);
+	return zsp;
 }

 static inline bool zbc_zone_is_conv(struct sdeb_zone_state *zsp)
 {
-	return zsp->z_type == ZBC_ZONE_TYPE_CNV;
+	return zsp->z_type == ZBC_ZTYPE_CNV;
+}
+
+static inline bool zbc_zone_is_gap(struct sdeb_zone_state *zsp)
+{
+	return zsp->z_type == ZBC_ZTYPE_GAP;
+}
+
+static inline bool zbc_zone_is_seq(struct sdeb_zone_state *zsp)
+{
+	return !zbc_zone_is_conv(zsp) && !zbc_zone_is_gap(zsp);
 }
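A note on the index arithmetic in zbc_zone() above: when zcap < zsize, devip->zstate[] holds a gap-zone entry right after every sequential-zone entry, while conventional zones have no companion entry. A minimal standalone sketch of that mapping (hypothetical helper for illustration only, not part of the patch):

	/* Map a logical zone number to the index of its sequential zone in
	 * zstate[]; the matching gap zone, if any, sits at the next index.
	 */
	static u32 zstate_seq_index(u32 zno, u32 nr_conv_zones)
	{
		if (zno < nr_conv_zones)
			return zno;	/* conventional zones map 1:1 */
		return 2 * zno - nr_conv_zones;
	}

For example, with nr_conv_zones = 1, logical zone 3 maps to index 2 * 3 - 1 = 5, and an LBA beyond that entry's capacity lands in the gap zone at index 6, which is what the zsp++ adjustment above handles.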
@@ -2683,7 +2721,7 @@ static void zbc_close_zone(struct sdebug_dev_info *devip,
 {
	enum sdebug_z_cond zc;

-	if (zbc_zone_is_conv(zsp))
+	if (!zbc_zone_is_seq(zsp))
		return;

	zc = zsp->z_cond;
@@ -2721,7 +2759,7 @@ static void zbc_open_zone(struct sdebug_dev_info *devip,
 {
	enum sdebug_z_cond zc;

-	if (zbc_zone_is_conv(zsp))
+	if (!zbc_zone_is_seq(zsp))
		return;

	zc = zsp->z_cond;
@@ -2753,10 +2791,10 @@ static void zbc_inc_wp(struct sdebug_dev_info *devip,
	struct sdeb_zone_state *zsp = zbc_zone(devip, lba);
	unsigned long long n, end, zend = zsp->z_start + zsp->z_size;

-	if (zbc_zone_is_conv(zsp))
+	if (!zbc_zone_is_seq(zsp))
		return;

-	if (zsp->z_type == ZBC_ZONE_TYPE_SWR) {
+	if (zsp->z_type == ZBC_ZTYPE_SWR) {
		zsp->z_wp += num;
		if (zsp->z_wp >= zend)
			zsp->z_cond = ZC5_FULL;
@@ -2801,9 +2839,7 @@ static int check_zbc_access_params(struct scsi_cmnd *scp,
		if (devip->zmodel == BLK_ZONED_HA)
			return 0;
		/* For host-managed, reads cannot cross zone types boundaries */
-		if (zsp_end != zsp &&
-		    zbc_zone_is_conv(zsp) &&
-		    !zbc_zone_is_conv(zsp_end)) {
+		if (zsp->z_type != zsp_end->z_type) {
			mk_sense_buffer(scp, ILLEGAL_REQUEST,
					LBA_OUT_OF_RANGE,
					READ_INVDATA_ASCQ);
@@ -2812,6 +2848,13 @@ static int check_zbc_access_params(struct scsi_cmnd *scp,
		return 0;
	}

+	/* Writing into a gap zone is not allowed */
+	if (zbc_zone_is_gap(zsp)) {
+		mk_sense_buffer(scp, ILLEGAL_REQUEST, LBA_OUT_OF_RANGE,
+				ATTEMPT_ACCESS_GAP);
+		return check_condition_result;
+	}
+
	/* No restrictions for writes within conventional zones */
	if (zbc_zone_is_conv(zsp)) {
		if (!zbc_zone_is_conv(zsp_end)) {
@@ -2823,7 +2866,7 @@ static int check_zbc_access_params(struct scsi_cmnd *scp,
		return 0;
	}

-	if (zsp->z_type == ZBC_ZONE_TYPE_SWR) {
+	if (zsp->z_type == ZBC_ZTYPE_SWR) {
		/* Writes cannot cross sequential zone boundaries */
		if (zsp_end != zsp) {
			mk_sense_buffer(scp,
@@ -4307,18 +4350,18 @@ cleanup:

 #define RZONES_DESC_HD 64

-/* Report zones depending on start LBA nad reporting options */
+/* Report zones depending on start LBA and reporting options */
 static int resp_report_zones(struct scsi_cmnd *scp,
			     struct sdebug_dev_info *devip)
 {
-	unsigned int i, max_zones, rep_max_zones, nrz = 0;
+	unsigned int rep_max_zones, nrz = 0;
	int ret = 0;
	u32 alloc_len, rep_opts, rep_len;
	bool partial;
	u64 lba, zs_lba;
	u8 *arr = NULL, *desc;
	u8 *cmd = scp->cmnd;
-	struct sdeb_zone_state *zsp;
+	struct sdeb_zone_state *zsp = NULL;
	struct sdeb_store_info *sip = devip2sip(devip, false);
	rwlock_t *macc_lckp = sip ? &sip->macc_lck : &sdeb_fake_rw_lck;
@@ -4338,9 +4381,7 @@ static int resp_report_zones(struct scsi_cmnd *scp,
		return check_condition_result;
	}

-	max_zones = devip->nr_zones - (zs_lba >> devip->zsize_shift);
-	rep_max_zones = min((alloc_len - 64) >> ilog2(RZONES_DESC_HD),
-			    max_zones);
+	rep_max_zones = (alloc_len - 64) >> ilog2(RZONES_DESC_HD);

	arr = kzalloc(alloc_len, GFP_ATOMIC);
	if (!arr) {
@@ -4352,9 +4393,9 @@ static int resp_report_zones(struct scsi_cmnd *scp,
	read_lock(macc_lckp);

	desc = arr + 64;
-	for (i = 0; i < max_zones; i++) {
-		lba = zs_lba + devip->zsize * i;
-		if (lba > sdebug_capacity)
+	for (lba = zs_lba; lba < sdebug_capacity;
+	     lba = zsp->z_start + zsp->z_size) {
+		if (WARN_ONCE(zbc_zone(devip, lba) == zsp, "lba = %llu\n", lba))
			break;
		zsp = zbc_zone(devip, lba);
		switch (rep_opts) {
@@ -4399,9 +4440,14 @@ static int resp_report_zones(struct scsi_cmnd *scp,
			if (!zsp->z_non_seq_resource)
				continue;
			break;
+		case 0x3e:
+			/* All zones except gap zones. */
+			if (zbc_zone_is_gap(zsp))
+				continue;
+			break;
		case 0x3f:
			/* Not write pointer (conventional) zones */
-			if (!zbc_zone_is_conv(zsp))
+			if (zbc_zone_is_seq(zsp))
				continue;
			break;
		default:
@@ -4430,8 +4476,13 @@ static int resp_report_zones(struct scsi_cmnd *scp,
	}

	/* Report header */
+	/* Zone list length. */
	put_unaligned_be32(nrz * RZONES_DESC_HD, arr + 0);
+	/* Maximum LBA */
	put_unaligned_be64(sdebug_capacity - 1, arr + 8);
+	/* Zone starting LBA granularity. */
+	if (devip->zcap < devip->zsize)
+		put_unaligned_be64(devip->zsize, arr + 16);

	rep_len = (unsigned long)desc - (unsigned long)arr;
	ret = fill_from_dev_buffer(scp, arr, min_t(u32, alloc_len, rep_len));
@@ -4659,7 +4710,7 @@ static void zbc_rwp_zone(struct sdebug_dev_info *devip,
	enum sdebug_z_cond zc;
	struct sdeb_store_info *sip = devip2sip(devip, false);

-	if (zbc_zone_is_conv(zsp))
+	if (!zbc_zone_is_seq(zsp))
		return;

	zc = zsp->z_cond;
@@ -4850,6 +4901,7 @@ static int sdebug_device_create_zones(struct sdebug_dev_info *devip)
 {
	struct sdeb_zone_state *zsp;
	sector_t capacity = get_sdebug_capacity();
+	sector_t conv_capacity;
	sector_t zstart = 0;
	unsigned int i;
@@ -4884,11 +4936,30 @@ static int sdebug_device_create_zones(struct sdebug_dev_info *devip)
	devip->zsize_shift = ilog2(devip->zsize);
	devip->nr_zones = (capacity + devip->zsize - 1) >> devip->zsize_shift;

-	if (sdeb_zbc_nr_conv >= devip->nr_zones) {
+	if (sdeb_zbc_zone_cap_mb == 0) {
+		devip->zcap = devip->zsize;
+	} else {
+		devip->zcap = (sdeb_zbc_zone_cap_mb * SZ_1M) >>
+			      ilog2(sdebug_sector_size);
+		if (devip->zcap > devip->zsize) {
+			pr_err("Zone capacity too large\n");
+			return -EINVAL;
+		}
+	}
+
+	conv_capacity = (sector_t)sdeb_zbc_nr_conv << devip->zsize_shift;
+	if (conv_capacity >= capacity) {
		pr_err("Number of conventional zones too large\n");
		return -EINVAL;
	}
	devip->nr_conv_zones = sdeb_zbc_nr_conv;
+	devip->nr_seq_zones = ALIGN(capacity - conv_capacity, devip->zsize) >>
+		devip->zsize_shift;
+	devip->nr_zones = devip->nr_conv_zones + devip->nr_seq_zones;
+
+	/* Add gap zones if zone capacity is smaller than the zone size */
+	if (devip->zcap < devip->zsize)
+		devip->nr_zones += devip->nr_seq_zones;

	if (devip->zmodel == BLK_ZONED_HM) {
		/* zbc_max_open_zones can be 0, meaning "not reported" */
@@ -4909,23 +4980,29 @@ static int sdebug_device_create_zones(struct sdebug_dev_info *devip)
		zsp->z_start = zstart;

		if (i < devip->nr_conv_zones) {
-			zsp->z_type = ZBC_ZONE_TYPE_CNV;
+			zsp->z_type = ZBC_ZTYPE_CNV;
			zsp->z_cond = ZBC_NOT_WRITE_POINTER;
			zsp->z_wp = (sector_t)-1;
-		} else {
+			zsp->z_size =
+				min_t(u64, devip->zsize, capacity - zstart);
+		} else if ((zstart & (devip->zsize - 1)) == 0) {
			if (devip->zmodel == BLK_ZONED_HM)
-				zsp->z_type = ZBC_ZONE_TYPE_SWR;
+				zsp->z_type = ZBC_ZTYPE_SWR;
			else
-				zsp->z_type = ZBC_ZONE_TYPE_SWP;
+				zsp->z_type = ZBC_ZTYPE_SWP;
			zsp->z_cond = ZC1_EMPTY;
			zsp->z_wp = zsp->z_start;
+			zsp->z_size =
+				min_t(u64, devip->zcap, capacity - zstart);
+		} else {
+			zsp->z_type = ZBC_ZTYPE_GAP;
+			zsp->z_cond = ZBC_NOT_WRITE_POINTER;
+			zsp->z_wp = (sector_t)-1;
+			zsp->z_size = min_t(u64, devip->zsize - devip->zcap,
+					    capacity - zstart);
		}

-		if (zsp->z_start + devip->zsize < capacity)
-			zsp->z_size = devip->zsize;
-		else
-			zsp->z_size = capacity - zsp->z_start;
+		WARN_ON_ONCE((int)zsp->z_size <= 0);

		zstart += zsp->z_size;
	}
@@ -5693,6 +5770,7 @@ module_param_named(wp, sdebug_wp, bool, S_IRUGO | S_IWUSR);
 module_param_named(write_same_length, sdebug_write_same_length, int,
		   S_IRUGO | S_IWUSR);
 module_param_named(zbc, sdeb_zbc_model_s, charp, S_IRUGO);
+module_param_named(zone_cap_mb, sdeb_zbc_zone_cap_mb, int, S_IRUGO);
 module_param_named(zone_max_open, sdeb_zbc_max_open, int, S_IRUGO);
 module_param_named(zone_nr_conv, sdeb_zbc_nr_conv, int, S_IRUGO);
 module_param_named(zone_size_mb, sdeb_zbc_zone_size_mb, int, S_IRUGO);
@@ -5763,6 +5841,7 @@ MODULE_PARM_DESC(vpd_use_hostno, "0 -> dev ids ignore hostno (def=1 -> unique de
 MODULE_PARM_DESC(wp, "Write Protect (def=0)");
 MODULE_PARM_DESC(write_same_length, "Maximum blocks per WRITE SAME cmd (def=0xffff)");
 MODULE_PARM_DESC(zbc, "'none' [0]; 'aware' [1]; 'managed' [2] (def=0). Can have 'host-' prefix");
+MODULE_PARM_DESC(zone_cap_mb, "Zone capacity in MiB (def=zone size)");
 MODULE_PARM_DESC(zone_max_open, "Maximum number of open zones; [0] for no limit (def=auto)");
 MODULE_PARM_DESC(zone_nr_conv, "Number of conventional zones (def=1)");
 MODULE_PARM_DESC(zone_size_mb, "Zone size in MiB (def=auto)");
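For reference, a hypothetical invocation exercising the new parameter (per the checks in sdebug_device_create_zones(), zone_cap_mb must not exceed zone_size_mb, and gap zones are emulated whenever the capacity is smaller than the zone size):

	modprobe scsi_debug zbc=host-managed zone_size_mb=128 zone_cap_mb=96 zone_nr_conv=2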

diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h

@@ -67,6 +67,20 @@ enum {
	SD_ZERO_WS10_UNMAP,	/* Use WRITE SAME(10) with UNMAP */
 };

+/**
+ * struct zoned_disk_info - Specific properties of a ZBC SCSI device.
+ * @nr_zones: number of zones.
+ * @zone_blocks: number of logical blocks per zone.
+ *
+ * This data structure holds the ZBC SCSI device properties that are retrieved
+ * twice: a first time before the gendisk capacity is known and a second time
+ * after the gendisk capacity is known.
+ */
+struct zoned_disk_info {
+	u32	nr_zones;
+	u32	zone_blocks;
+};
+
 struct scsi_disk {
	struct scsi_driver *driver;	/* always &sd_template */
	struct scsi_device *device;
@@ -74,13 +88,18 @@ struct scsi_disk {
	struct gendisk	*disk;
	struct opal_dev *opal_dev;
 #ifdef CONFIG_BLK_DEV_ZONED
-	u32		nr_zones;
-	u32		rev_nr_zones;
-	u32		zone_blocks;
-	u32		rev_zone_blocks;
+	/* Updated during revalidation before the gendisk capacity is known. */
+	struct zoned_disk_info	early_zone_info;
+	/* Updated during revalidation after the gendisk capacity is known. */
+	struct zoned_disk_info	zone_info;
	u32		zones_optimal_open;
	u32		zones_optimal_nonseq;
	u32		zones_max_open;
+	/*
+	 * Either zero or a power of two. If not zero it means that the offset
+	 * between zone starting LBAs is constant.
+	 */
+	u32		zone_starting_lba_gran;
	u32		*zones_wp_offset;
	spinlock_t	zones_wp_offset_lock;
	u32		*rev_wp_offset;
@@ -217,7 +236,7 @@ static inline int sd_is_zoned(struct scsi_disk *sdkp)
 #ifdef CONFIG_BLK_DEV_ZONED

 void sd_zbc_release_disk(struct scsi_disk *sdkp);
-int sd_zbc_read_zones(struct scsi_disk *sdkp, unsigned char *buffer);
+int sd_zbc_read_zones(struct scsi_disk *sdkp, u8 buf[SD_BUF_SIZE]);
 int sd_zbc_revalidate_zones(struct scsi_disk *sdkp);
 blk_status_t sd_zbc_setup_zone_mgmt_cmnd(struct scsi_cmnd *cmd,
					 unsigned char op, bool all);
@@ -233,8 +252,7 @@ blk_status_t sd_zbc_prepare_zone_append(struct scsi_cmnd *cmd, sector_t *lba,

 static inline void sd_zbc_release_disk(struct scsi_disk *sdkp) {}

-static inline int sd_zbc_read_zones(struct scsi_disk *sdkp,
-				    unsigned char *buf)
+static inline int sd_zbc_read_zones(struct scsi_disk *sdkp, u8 buf[SD_BUF_SIZE])
 {
	return 0;
 }

diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c

@@ -20,6 +20,12 @@

 #include "sd.h"

+/**
+ * sd_zbc_get_zone_wp_offset - Get zone write pointer offset.
+ * @zone: Zone for which to return the write pointer offset.
+ *
+ * Return: offset of the write pointer from the start of the zone.
+ */
 static unsigned int sd_zbc_get_zone_wp_offset(struct blk_zone *zone)
 {
	if (zone->type == ZBC_ZONE_TYPE_CONV)
@@ -44,13 +50,37 @@ static unsigned int sd_zbc_get_zone_wp_offset(struct blk_zone *zone)
	}
 }

-static int sd_zbc_parse_report(struct scsi_disk *sdkp, u8 *buf,
+/* Whether or not a SCSI zone descriptor describes a gap zone. */
+static bool sd_zbc_is_gap_zone(const u8 buf[64])
+{
+	return (buf[0] & 0xf) == ZBC_ZONE_TYPE_GAP;
+}
+
+/**
+ * sd_zbc_parse_report - Parse a SCSI zone descriptor
+ * @sdkp: SCSI disk pointer.
+ * @buf: SCSI zone descriptor.
+ * @idx: Index of the zone relative to the first zone reported by the current
+ *	sd_zbc_report_zones() call.
+ * @cb: Callback function pointer.
+ * @data: Second argument passed to @cb.
+ *
+ * Return: Value returned by @cb.
+ *
+ * Convert a SCSI zone descriptor into struct blk_zone format. Additionally,
+ * call @cb(blk_zone, @data).
+ */
+static int sd_zbc_parse_report(struct scsi_disk *sdkp, const u8 buf[64],
			       unsigned int idx, report_zones_cb cb, void *data)
 {
	struct scsi_device *sdp = sdkp->device;
	struct blk_zone zone = { 0 };
+	sector_t start_lba, gran;
	int ret;

+	if (WARN_ON_ONCE(sd_zbc_is_gap_zone(buf)))
+		return -EINVAL;
+
	zone.type = buf[0] & 0x0f;
	zone.cond = (buf[1] >> 4) & 0xf;
	if (buf[1] & 0x01)
@@ -58,13 +88,31 @@ static int sd_zbc_parse_report(struct scsi_disk *sdkp, u8 *buf,
	if (buf[1] & 0x02)
		zone.non_seq = 1;

-	zone.len = logical_to_sectors(sdp, get_unaligned_be64(&buf[8]));
-	zone.capacity = zone.len;
-	zone.start = logical_to_sectors(sdp, get_unaligned_be64(&buf[16]));
-	zone.wp = logical_to_sectors(sdp, get_unaligned_be64(&buf[24]));
-	if (zone.type != ZBC_ZONE_TYPE_CONV &&
-	    zone.cond == ZBC_ZONE_COND_FULL)
+	start_lba = get_unaligned_be64(&buf[16]);
+	zone.start = logical_to_sectors(sdp, start_lba);
+	zone.capacity = logical_to_sectors(sdp, get_unaligned_be64(&buf[8]));
+	zone.len = zone.capacity;
+	if (sdkp->zone_starting_lba_gran) {
+		gran = logical_to_sectors(sdp, sdkp->zone_starting_lba_gran);
+		if (zone.len > gran) {
+			sd_printk(KERN_ERR, sdkp,
+				  "Invalid zone at LBA %llu with capacity %llu and length %llu; granularity = %llu\n",
+				  start_lba,
+				  sectors_to_logical(sdp, zone.capacity),
+				  sectors_to_logical(sdp, zone.len),
+				  sectors_to_logical(sdp, gran));
+			return -EINVAL;
+		}
+		/*
+		 * Use the starting LBA granularity instead of the zone length
+		 * obtained from the REPORT ZONES command.
+		 */
+		zone.len = gran;
+	}
+	if (zone.cond == ZBC_ZONE_COND_FULL)
		zone.wp = zone.start + zone.len;
+	else
+		zone.wp = logical_to_sectors(sdp, get_unaligned_be64(&buf[24]));

	ret = cb(&zone, idx, data);
	if (ret)
@@ -161,7 +209,7 @@ static void *sd_zbc_alloc_report_buffer(struct scsi_disk *sdkp,
	 * sure that the allocated buffer can always be mapped by limiting the
	 * number of pages allocated to the HBA max segments limit.
	 */
-	nr_zones = min(nr_zones, sdkp->nr_zones);
+	nr_zones = min(nr_zones, sdkp->zone_info.nr_zones);
	bufsize = roundup((nr_zones + 1) * 64, SECTOR_SIZE);
	bufsize = min_t(size_t, bufsize,
			queue_max_hw_sectors(q) << SECTOR_SHIFT);
@@ -186,16 +234,28 @@ static void *sd_zbc_alloc_report_buffer(struct scsi_disk *sdkp,
 */
 static inline sector_t sd_zbc_zone_sectors(struct scsi_disk *sdkp)
 {
-	return logical_to_sectors(sdkp->device, sdkp->zone_blocks);
+	return logical_to_sectors(sdkp->device, sdkp->zone_info.zone_blocks);
 }

+/**
+ * sd_zbc_report_zones - SCSI .report_zones() callback.
+ * @disk: Disk to report zones for.
+ * @sector: Start sector.
+ * @nr_zones: Maximum number of zones to report.
+ * @cb: Callback function called to report zone information.
+ * @data: Second argument passed to @cb.
+ *
+ * Called by the block layer to iterate over zone information. See also the
+ * disk->fops->report_zones() calls in block/blk-zoned.c.
+ */
 int sd_zbc_report_zones(struct gendisk *disk, sector_t sector,
			unsigned int nr_zones, report_zones_cb cb, void *data)
 {
	struct scsi_disk *sdkp = scsi_disk(disk);
-	sector_t capacity = logical_to_sectors(sdkp->device, sdkp->capacity);
+	sector_t lba = sectors_to_logical(sdkp->device, sector);
	unsigned int nr, i;
	unsigned char *buf;
+	u64 zone_length, start_lba;
	size_t offset, buflen = 0;
	int zone_idx = 0;
	int ret;
@@ -204,7 +264,7 @@ int sd_zbc_report_zones(struct gendisk *disk, sector_t sector,
		/* Not a zoned device */
		return -EOPNOTSUPP;

-	if (!capacity)
+	if (!sdkp->capacity)
		/* Device gone or invalid */
		return -ENODEV;

@@ -212,9 +272,8 @@ int sd_zbc_report_zones(struct gendisk *disk, sector_t sector,
	if (!buf)
		return -ENOMEM;

-	while (zone_idx < nr_zones && sector < capacity) {
-		ret = sd_zbc_do_report_zones(sdkp, buf, buflen,
-				sectors_to_logical(sdkp->device, sector), true);
+	while (zone_idx < nr_zones && lba < sdkp->capacity) {
+		ret = sd_zbc_do_report_zones(sdkp, buf, buflen, lba, true);
		if (ret)
			goto out;
@@ -225,14 +284,36 @@ int sd_zbc_report_zones(struct gendisk *disk, sector_t sector,
		for (i = 0; i < nr && zone_idx < nr_zones; i++) {
			offset += 64;
+			start_lba = get_unaligned_be64(&buf[offset + 16]);
+			zone_length = get_unaligned_be64(&buf[offset + 8]);
+			if ((zone_idx == 0 &&
+			     (lba < start_lba ||
+			      lba >= start_lba + zone_length)) ||
+			    (zone_idx > 0 && start_lba != lba) ||
+			    start_lba + zone_length < start_lba) {
+				sd_printk(KERN_ERR, sdkp,
+					  "Zone %d at LBA %llu is invalid: %llu + %llu\n",
+					  zone_idx, lba, start_lba, zone_length);
+				ret = -EINVAL;
+				goto out;
+			}
+			lba = start_lba + zone_length;
+			if (sd_zbc_is_gap_zone(&buf[offset])) {
+				if (sdkp->zone_starting_lba_gran)
+					continue;
+				sd_printk(KERN_ERR, sdkp,
+					  "Gap zone without constant LBA offsets\n");
+				ret = -EINVAL;
+				goto out;
+			}
+
			ret = sd_zbc_parse_report(sdkp, buf + offset, zone_idx,
						  cb, data);
			if (ret)
				goto out;
+
			zone_idx++;
		}
-		sector += sd_zbc_zone_sectors(sdkp) * i;
	}

	ret = zone_idx;
@@ -276,6 +357,10 @@ static int sd_zbc_update_wp_offset_cb(struct blk_zone *zone, unsigned int idx,
	return 0;
 }

+/*
+ * An attempt to append a zone triggered an invalid write pointer error.
+ * Reread the write pointer of the zone(s) in which the append failed.
+ */
 static void sd_zbc_update_wp_offset_workfn(struct work_struct *work)
 {
	struct scsi_disk *sdkp;
@@ -286,14 +371,14 @@ static void sd_zbc_update_wp_offset_workfn(struct work_struct *work)
	sdkp = container_of(work, struct scsi_disk, zone_wp_offset_work);

	spin_lock_irqsave(&sdkp->zones_wp_offset_lock, flags);
-	for (zno = 0; zno < sdkp->nr_zones; zno++) {
+	for (zno = 0; zno < sdkp->zone_info.nr_zones; zno++) {
		if (sdkp->zones_wp_offset[zno] != SD_ZBC_UPDATING_WP_OFST)
			continue;

		spin_unlock_irqrestore(&sdkp->zones_wp_offset_lock, flags);
		ret = sd_zbc_do_report_zones(sdkp, sdkp->zone_wp_update_buf,
					     SD_BUF_SIZE,
-					     zno * sdkp->zone_blocks, true);
+					     zno * sdkp->zone_info.zone_blocks, true);
		spin_lock_irqsave(&sdkp->zones_wp_offset_lock, flags);
		if (!ret)
			sd_zbc_parse_report(sdkp, sdkp->zone_wp_update_buf + 64,
@@ -360,7 +445,7 @@ blk_status_t sd_zbc_prepare_zone_append(struct scsi_cmnd *cmd, sector_t *lba,
		break;
	default:
		wp_offset = sectors_to_logical(sdkp->device, wp_offset);
-		if (wp_offset + nr_blocks > sdkp->zone_blocks) {
+		if (wp_offset + nr_blocks > sdkp->zone_info.zone_blocks) {
			ret = BLK_STS_IOERR;
			break;
		}
@@ -491,7 +576,7 @@ static unsigned int sd_zbc_zone_wp_update(struct scsi_cmnd *cmd,
		break;
	case REQ_OP_ZONE_RESET_ALL:
		memset(sdkp->zones_wp_offset, 0,
-		       sdkp->nr_zones * sizeof(unsigned int));
+		       sdkp->zone_info.nr_zones * sizeof(unsigned int));
		break;
	default:
		break;
@@ -547,6 +632,7 @@ unsigned int sd_zbc_complete(struct scsi_cmnd *cmd, unsigned int good_bytes,
 static int sd_zbc_check_zoned_characteristics(struct scsi_disk *sdkp,
					      unsigned char *buf)
 {
+	u64 zone_starting_lba_gran;

	if (scsi_get_vpd_page(sdkp->device, 0xb6, buf, 64)) {
		sd_printk(KERN_NOTICE, sdkp,
@@ -560,12 +646,36 @@ static int sd_zbc_check_zoned_characteristics(struct scsi_disk *sdkp,
		sdkp->zones_optimal_open = get_unaligned_be32(&buf[8]);
		sdkp->zones_optimal_nonseq = get_unaligned_be32(&buf[12]);
		sdkp->zones_max_open = 0;
-	} else {
-		/* Host-managed */
-		sdkp->urswrz = buf[4] & 1;
-		sdkp->zones_optimal_open = 0;
-		sdkp->zones_optimal_nonseq = 0;
-		sdkp->zones_max_open = get_unaligned_be32(&buf[16]);
+		return 0;
+	}
+
+	/* Host-managed */
+	sdkp->urswrz = buf[4] & 1;
+	sdkp->zones_optimal_open = 0;
+	sdkp->zones_optimal_nonseq = 0;
+	sdkp->zones_max_open = get_unaligned_be32(&buf[16]);
+
+	/* Check zone alignment method */
+	switch (buf[23] & 0xf) {
+	case 0:
+	case ZBC_CONSTANT_ZONE_LENGTH:
+		/* Use zone length */
+		break;
+	case ZBC_CONSTANT_ZONE_START_OFFSET:
+		zone_starting_lba_gran = get_unaligned_be64(&buf[24]);
+		if (zone_starting_lba_gran == 0 ||
+		    !is_power_of_2(zone_starting_lba_gran) ||
+		    logical_to_sectors(sdkp->device, zone_starting_lba_gran) >
+		    UINT_MAX) {
+			sd_printk(KERN_ERR, sdkp,
+				  "Invalid zone starting LBA granularity %llu\n",
+				  zone_starting_lba_gran);
+			return -ENODEV;
+		}
+		sdkp->zone_starting_lba_gran = zone_starting_lba_gran;
+		break;
+	default:
+		sd_printk(KERN_ERR, sdkp, "Invalid zone alignment method\n");
+		return -ENODEV;
	}

	/*
@@ -587,7 +697,7 @@ static int sd_zbc_check_zoned_characteristics(struct scsi_disk *sdkp,
 * sd_zbc_check_capacity - Check the device capacity
 * @sdkp: Target disk
 * @buf: command buffer
- * @zblocks: zone size in number of blocks
+ * @zblocks: zone size in logical blocks
 *
 * Get the device zone size and check that the device capacity as reported
 * by READ CAPACITY matches the max_lba value (plus one) of the report zones
@@ -621,6 +731,7 @@ static int sd_zbc_check_capacity(struct scsi_disk *sdkp, unsigned char *buf,
		}
	}

+	if (sdkp->zone_starting_lba_gran == 0) {
	/* Get the size of the first reported zone */
	rec = buf + 64;
	zone_blocks = get_unaligned_be64(&rec[8]);
@@ -630,6 +741,16 @@ static int sd_zbc_check_capacity(struct scsi_disk *sdkp, unsigned char *buf,
				  "Zone size too large\n");
		return -EFBIG;
	}
+	} else {
+		zone_blocks = sdkp->zone_starting_lba_gran;
+	}
+
+	if (!is_power_of_2(zone_blocks)) {
+		sd_printk(KERN_ERR, sdkp,
+			  "Zone size %llu is not a power of two.\n",
+			  zone_blocks);
+		return -EINVAL;
+	}

	*zblocks = zone_blocks;
@@ -641,16 +762,16 @@ static void sd_zbc_print_zones(struct scsi_disk *sdkp)
	if (!sd_is_zoned(sdkp) || !sdkp->capacity)
		return;

-	if (sdkp->capacity & (sdkp->zone_blocks - 1))
+	if (sdkp->capacity & (sdkp->zone_info.zone_blocks - 1))
		sd_printk(KERN_NOTICE, sdkp,
			  "%u zones of %u logical blocks + 1 runt zone\n",
-			  sdkp->nr_zones - 1,
-			  sdkp->zone_blocks);
+			  sdkp->zone_info.nr_zones - 1,
+			  sdkp->zone_info.zone_blocks);
	else
		sd_printk(KERN_NOTICE, sdkp,
			  "%u zones of %u logical blocks\n",
-			  sdkp->nr_zones,
-			  sdkp->zone_blocks);
+			  sdkp->zone_info.nr_zones,
+			  sdkp->zone_info.zone_blocks);
 }

 static int sd_zbc_init_disk(struct scsi_disk *sdkp)
@@ -677,10 +798,8 @@ static void sd_zbc_clear_zone_info(struct scsi_disk *sdkp)
	kfree(sdkp->zone_wp_update_buf);
	sdkp->zone_wp_update_buf = NULL;

-	sdkp->nr_zones = 0;
-	sdkp->rev_nr_zones = 0;
-	sdkp->zone_blocks = 0;
-	sdkp->rev_zone_blocks = 0;
+	sdkp->early_zone_info = (struct zoned_disk_info){ };
+	sdkp->zone_info = (struct zoned_disk_info){ };

	mutex_unlock(&sdkp->rev_mutex);
 }
@@ -698,12 +817,17 @@ static void sd_zbc_revalidate_zones_cb(struct gendisk *disk)
	swap(sdkp->zones_wp_offset, sdkp->rev_wp_offset);
 }

+/*
+ * Call blk_revalidate_disk_zones() if any of the zoned disk properties have
+ * changed that make it necessary to call that function. Called by
+ * sd_revalidate_disk() after the gendisk capacity has been set.
+ */
 int sd_zbc_revalidate_zones(struct scsi_disk *sdkp)
 {
	struct gendisk *disk = sdkp->disk;
	struct request_queue *q = disk->queue;
-	u32 zone_blocks = sdkp->rev_zone_blocks;
-	unsigned int nr_zones = sdkp->rev_nr_zones;
+	u32 zone_blocks = sdkp->early_zone_info.zone_blocks;
+	unsigned int nr_zones = sdkp->early_zone_info.nr_zones;
	u32 max_append;
	int ret = 0;
	unsigned int flags;
@@ -734,14 +858,14 @@ int sd_zbc_revalidate_zones(struct scsi_disk *sdkp)
	 */
	mutex_lock(&sdkp->rev_mutex);

-	if (sdkp->zone_blocks == zone_blocks &&
-	    sdkp->nr_zones == nr_zones &&
+	if (sdkp->zone_info.zone_blocks == zone_blocks &&
+	    sdkp->zone_info.nr_zones == nr_zones &&
	    disk->queue->nr_zones == nr_zones)
		goto unlock;

	flags = memalloc_noio_save();
-	sdkp->zone_blocks = zone_blocks;
-	sdkp->nr_zones = nr_zones;
+	sdkp->zone_info.zone_blocks = zone_blocks;
+	sdkp->zone_info.nr_zones = nr_zones;
	sdkp->rev_wp_offset = kvcalloc(nr_zones, sizeof(u32), GFP_KERNEL);
	if (!sdkp->rev_wp_offset) {
		ret = -ENOMEM;
@@ -756,8 +880,7 @@ int sd_zbc_revalidate_zones(struct scsi_disk *sdkp)
	sdkp->rev_wp_offset = NULL;

	if (ret) {
-		sdkp->zone_blocks = 0;
-		sdkp->nr_zones = 0;
+		sdkp->zone_info = (struct zoned_disk_info){ };
		sdkp->capacity = 0;
		goto unlock;
	}
@@ -776,7 +899,16 @@ unlock:
	return ret;
 }

-int sd_zbc_read_zones(struct scsi_disk *sdkp, unsigned char *buf)
+/**
+ * sd_zbc_read_zones - Read zone information and update the request queue
+ * @sdkp: SCSI disk pointer.
+ * @buf: 512 byte buffer used for storing SCSI command output.
+ *
+ * Read zone information and update the request queue zone characteristics and
+ * also the zoned device information in *sdkp. Called by sd_revalidate_disk()
+ * before the gendisk capacity has been set.
+ */
+int sd_zbc_read_zones(struct scsi_disk *sdkp, u8 buf[SD_BUF_SIZE])
 {
	struct gendisk *disk = sdkp->disk;
	struct request_queue *q = disk->queue;
@@ -834,8 +966,8 @@ int sd_zbc_read_zones(struct scsi_disk *sdkp, unsigned char *buf)
	if (blk_queue_zoned_model(q) == BLK_ZONED_HM)
		blk_queue_zone_write_granularity(q, sdkp->physical_block_size);

-	sdkp->rev_nr_zones = nr_zones;
-	sdkp->rev_zone_blocks = zone_blocks;
+	sdkp->early_zone_info.nr_zones = nr_zones;
+	sdkp->early_zone_info.zone_blocks = zone_blocks;

	return 0;

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c

@@ -98,13 +98,7 @@ repeat:
	}

	if (unlikely(!PageUptodate(page))) {
-		if (page->index == sbi->metapage_eio_ofs &&
-		    sbi->metapage_eio_cnt++ == MAX_RETRY_META_PAGE_EIO) {
-			set_ckpt_flags(sbi, CP_ERROR_FLAG);
-		} else {
-			sbi->metapage_eio_ofs = page->index;
-			sbi->metapage_eio_cnt = 0;
-		}
+		f2fs_handle_page_eio(sbi, page->index, META);
		f2fs_put_page(page, 1);
		return ERR_PTR(-EIO);
	}
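The removed open-coded retry logic moves into a new f2fs_handle_page_eio() helper. A sketch of its likely shape, reconstructed from the lines removed above (the per-page-type arrays are an assumption; the helper itself is not shown in this excerpt):

	/* Hypothetical reconstruction based on the removed logic */
	static inline void f2fs_handle_page_eio(struct f2fs_sb_info *sbi,
						pgoff_t ofs, enum page_type type)
	{
		if (ofs == sbi->page_eio_ofs[type]) {
			if (sbi->page_eio_cnt[type]++ == MAX_RETRY_META_PAGE_EIO)
				set_ckpt_flags(sbi, CP_ERROR_FLAG);
		} else {
			sbi->page_eio_ofs[type] = ofs;
			sbi->page_eio_cnt[type] = 0;
		}
	}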
@@ -158,7 +152,7 @@ static bool __is_bitmap_valid(struct f2fs_sb_info *sbi, block_t blkaddr,
		f2fs_err(sbi, "Inconsistent error blkaddr:%u, sit bitmap:%d",
			 blkaddr, exist);
		set_sbi_flag(sbi, SBI_NEED_FSCK);
-		WARN_ON(1);
+		dump_stack();
	}
	return exist;
 }
@@ -196,7 +190,7 @@ bool f2fs_is_valid_blkaddr(struct f2fs_sb_info *sbi,
			f2fs_warn(sbi, "access invalid blkaddr:%u",
				  blkaddr);
			set_sbi_flag(sbi, SBI_NEED_FSCK);
-			WARN_ON(1);
+			dump_stack();
			return false;
		} else {
			return __is_bitmap_valid(sbi, blkaddr, type);
@@ -1009,9 +1003,7 @@ static void __add_dirty_inode(struct inode *inode, enum inode_type type)
		return;

	set_inode_flag(inode, flag);
-	if (!f2fs_is_volatile_file(inode))
-		list_add_tail(&F2FS_I(inode)->dirty_list,
-						&sbi->inode_list[type]);
+	list_add_tail(&F2FS_I(inode)->dirty_list, &sbi->inode_list[type]);
	stat_inc_dirty_inode(sbi, type);
 }

diff --git a/fs/f2fs/compress.c b/fs/f2fs/compress.c

@@ -1503,9 +1503,7 @@ continue_unlock:
		if (IS_NOQUOTA(cc->inode))
			return 0;
		ret = 0;
-		cond_resched();
-		congestion_wait(BLK_RW_ASYNC,
-				DEFAULT_IO_TIMEOUT);
+		f2fs_io_schedule_timeout(DEFAULT_IO_TIMEOUT);
		goto retry_write;
	}
	return ret;

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c

@@ -71,8 +71,7 @@ static bool __is_cp_guaranteed(struct page *page)
	if (f2fs_is_compressed_page(page))
		return false;

-	if ((S_ISREG(inode->i_mode) &&
-			(f2fs_is_atomic_file(inode) || IS_NOQUOTA(inode))) ||
+	if ((S_ISREG(inode->i_mode) && IS_NOQUOTA(inode)) ||
			page_private_gcing(page))
		return true;
	return false;
@@ -356,7 +355,7 @@ static void f2fs_write_end_io(struct bio *bio)
 }

 struct block_device *f2fs_target_device(struct f2fs_sb_info *sbi,
-		block_t blk_addr, struct bio *bio)
+		block_t blk_addr, sector_t *sector)
 {
	struct block_device *bdev = sbi->sb->s_bdev;
	int i;
@@ -371,10 +370,9 @@ struct block_device *f2fs_target_device(struct f2fs_sb_info *sbi,
		}
	}

-	if (bio) {
-		bio_set_dev(bio, bdev);
-		bio->bi_iter.bi_sector = SECTOR_FROM_BLOCK(blk_addr);
-	}
+	if (sector)
+		*sector = SECTOR_FROM_BLOCK(blk_addr);
	return bdev;
 }
@@ -391,22 +389,55 @@ int f2fs_target_device_index(struct f2fs_sb_info *sbi, block_t blkaddr)
	return 0;
 }

+static unsigned int f2fs_io_flags(struct f2fs_io_info *fio)
+{
+	unsigned int temp_mask = (1 << NR_TEMP_TYPE) - 1;
+	unsigned int fua_flag, meta_flag, io_flag;
+	unsigned int op_flags = 0;
+
+	if (fio->op != REQ_OP_WRITE)
+		return 0;
+	if (fio->type == DATA)
+		io_flag = fio->sbi->data_io_flag;
+	else if (fio->type == NODE)
+		io_flag = fio->sbi->node_io_flag;
+	else
+		return 0;
+
+	fua_flag = io_flag & temp_mask;
+	meta_flag = (io_flag >> NR_TEMP_TYPE) & temp_mask;
+
+	/*
+	 * data/node io flag bits per temp:
+	 *      REQ_META     |      REQ_FUA      |
+	 *    5 |    4 |   3 |    2 |    1 |   0 |
+	 * Cold | Warm | Hot | Cold | Warm | Hot |
+	 */
+	if ((1 << fio->temp) & meta_flag)
+		op_flags |= REQ_META;
+	if ((1 << fio->temp) & fua_flag)
+		op_flags |= REQ_FUA;
+	return op_flags;
+}
+
 static struct bio *__bio_alloc(struct f2fs_io_info *fio, int npages)
 {
	struct f2fs_sb_info *sbi = fio->sbi;
+	struct block_device *bdev;
+	sector_t sector;
	struct bio *bio;

+	bdev = f2fs_target_device(sbi, fio->new_blkaddr, &sector);
	bio = bio_alloc_bioset(GFP_NOIO, npages, &f2fs_bioset);
+	bio_set_dev(bio, bdev);
+	bio_set_op_attrs(bio, fio->op, fio->op_flags | f2fs_io_flags(fio));
+	bio->bi_iter.bi_sector = sector;

-	f2fs_target_device(sbi, fio->new_blkaddr, bio);
	if (is_read_io(fio->op)) {
		bio->bi_end_io = f2fs_read_end_io;
		bio->bi_private = NULL;
	} else {
		bio->bi_end_io = f2fs_write_end_io;
		bio->bi_private = sbi;
-		bio->bi_write_hint = f2fs_io_type_to_rw_hint(sbi,
-						fio->type, fio->temp);
	}
	iostat_alloc_and_bind_ctx(sbi, bio, NULL);
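As an illustration of the bit layout documented in f2fs_io_flags() above: with NR_TEMP_TYPE = 3, the low three bits select REQ_FUA and the next three select REQ_META, per temperature. So, assuming the usual per-device f2fs sysfs directory, writing 0x9 sets bit 0 (FUA for hot data) and bit 3 (META for hot data), and hot-temperature data writes are then issued with REQ_FUA | REQ_META:

	# hypothetical example; <disk> is the f2fs-backed block device
	echo 0x9 > /sys/fs/f2fs/<disk>/data_io_flag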
@@ -506,34 +537,6 @@ void f2fs_submit_bio(struct f2fs_sb_info *sbi,
	__submit_bio(sbi, bio, type);
 }

-static void __attach_io_flag(struct f2fs_io_info *fio)
-{
-	struct f2fs_sb_info *sbi = fio->sbi;
-	unsigned int temp_mask = (1 << NR_TEMP_TYPE) - 1;
-	unsigned int io_flag, fua_flag, meta_flag;
-
-	if (fio->type == DATA)
-		io_flag = sbi->data_io_flag;
-	else if (fio->type == NODE)
-		io_flag = sbi->node_io_flag;
-	else
-		return;
-
-	fua_flag = io_flag & temp_mask;
-	meta_flag = (io_flag >> NR_TEMP_TYPE) & temp_mask;
-
-	/*
-	 * data/node io flag bits per temp:
-	 *      REQ_META     |      REQ_FUA      |
-	 *    5 |    4 |   3 |    2 |    1 |   0 |
-	 * Cold | Warm | Hot | Cold | Warm | Hot |
-	 */
-	if ((1 << fio->temp) & meta_flag)
-		fio->op_flags |= REQ_META;
-	if ((1 << fio->temp) & fua_flag)
-		fio->op_flags |= REQ_FUA;
-}
-
 static void __submit_merged_bio(struct f2fs_bio_info *io)
 {
	struct f2fs_io_info *fio = &io->fio;
@@ -541,9 +544,6 @@ static void __submit_merged_bio(struct f2fs_bio_info *io)
	if (!io->bio)
		return;

-	__attach_io_flag(fio);
-	bio_set_op_attrs(io->bio, fio->op, fio->op_flags);
-
	if (is_read_io(fio->op))
		trace_f2fs_prepare_read_bio(io->sbi->sb, fio->type, io->bio);
	else
@@ -590,6 +590,34 @@ static bool __has_merged_page(struct bio *bio, struct inode *inode,
	return false;
 }
+int f2fs_init_write_merge_io(struct f2fs_sb_info *sbi)
+{
+	int i;
+
+	for (i = 0; i < NR_PAGE_TYPE; i++) {
+		int n = (i == META) ? 1 : NR_TEMP_TYPE;
+		int j;
+
+		sbi->write_io[i] = f2fs_kmalloc(sbi,
+				array_size(n, sizeof(struct f2fs_bio_info)),
+				GFP_KERNEL);
+		if (!sbi->write_io[i])
+			return -ENOMEM;
+
+		for (j = HOT; j < n; j++) {
+			init_f2fs_rwsem(&sbi->write_io[i][j].io_rwsem);
+			sbi->write_io[i][j].sbi = sbi;
+			sbi->write_io[i][j].bio = NULL;
+			spin_lock_init(&sbi->write_io[i][j].io_lock);
+			INIT_LIST_HEAD(&sbi->write_io[i][j].io_list);
+			INIT_LIST_HEAD(&sbi->write_io[i][j].bio_list);
+			init_f2fs_rwsem(&sbi->write_io[i][j].bio_list_lock);
+		}
+	}
+
+	return 0;
+}
+
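f2fs_init_write_merge_io() consolidates the per-type, per-temperature write IO state setup into one helper. A minimal sketch of the intended call site (hypothetical placement in f2fs_fill_super(); the error label is an assumption, as the caller is not shown in this excerpt):

	err = f2fs_init_write_merge_io(sbi);
	if (err)
		goto free_bio_info;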
 static void __f2fs_submit_merged_write(struct f2fs_sb_info *sbi,
				enum page_type type, enum temp_type temp)
 {
@@ -601,10 +629,9 @@ static void __f2fs_submit_merged_write(struct f2fs_sb_info *sbi,
		/* change META to META_FLUSH in the checkpoint procedure */
		if (type >= META_FLUSH) {
			io->fio.type = META_FLUSH;
-			io->fio.op = REQ_OP_WRITE;
-			io->fio.op_flags = REQ_META | REQ_PRIO | REQ_SYNC;
+			io->bio->bi_opf |= REQ_META | REQ_PRIO | REQ_SYNC;
			if (!test_opt(sbi, NOBARRIER))
-				io->fio.op_flags |= REQ_PREFLUSH | REQ_FUA;
+				io->bio->bi_opf |= REQ_PREFLUSH | REQ_FUA;
		}
		__submit_merged_bio(io);
		f2fs_up_write(&io->io_rwsem);
@@ -685,9 +712,6 @@ int f2fs_submit_page_bio(struct f2fs_io_info *fio)
	if (fio->io_wbc && !is_read_io(fio->op))
		wbc_account_cgroup_owner(fio->io_wbc, page, PAGE_SIZE);

-	__attach_io_flag(fio);
-	bio_set_op_attrs(bio, fio->op, fio->op_flags);
-
	inc_page_count(fio->sbi, is_read_io(fio->op) ?
			__read_io_type(page) : WB_DATA_TYPE(fio->page));

@@ -881,10 +905,8 @@ int f2fs_merge_page_bio(struct f2fs_io_info *fio)
 alloc_new:
	if (!bio) {
		bio = __bio_alloc(fio, BIO_MAX_VECS);
-		__attach_io_flag(fio);
		f2fs_set_bio_crypt_ctx(bio, fio->page->mapping->host,
				       fio->page->index, fio, GFP_NOIO);
-		bio_set_op_attrs(bio, fio->op, fio->op_flags);

		add_bio_entry(fio->sbi, bio, page, fio->temp);
	} else {
@@ -990,17 +1012,18 @@ static struct bio *f2fs_grab_read_bio(struct inode *inode, block_t blkaddr,
	struct bio *bio;
	struct bio_post_read_ctx *ctx = NULL;
	unsigned int post_read_steps = 0;
+	sector_t sector;
+	struct block_device *bdev = f2fs_target_device(sbi, blkaddr, &sector);

	bio = bio_alloc_bioset(for_write ? GFP_NOIO : GFP_KERNEL,
			       bio_max_segs(nr_pages), &f2fs_bioset);
+	bio_set_dev(bio, bdev);
+	bio_set_op_attrs(bio, REQ_OP_READ, op_flag);
	if (!bio)
		return ERR_PTR(-ENOMEM);
+	bio->bi_iter.bi_sector = sector;
	f2fs_set_bio_crypt_ctx(bio, inode, first_idx, NULL, GFP_NOFS);

-	f2fs_target_device(sbi, blkaddr, bio);
	bio->bi_end_io = f2fs_read_end_io;
-	bio_set_op_attrs(bio, REQ_OP_READ, op_flag);

	if (fscrypt_inode_uses_fs_layer_crypto(inode))
		post_read_steps |= STEP_DECRYPT;
bool ipu_force = false; bool ipu_force = false;
int err = 0; int err = 0;
/* Use COW inode to make dnode_of_data for atomic write */
if (f2fs_is_atomic_file(inode))
set_new_dnode(&dn, F2FS_I(inode)->cow_inode, NULL, NULL, 0);
else
set_new_dnode(&dn, inode, NULL, NULL, 0); set_new_dnode(&dn, inode, NULL, NULL, 0);
if (need_inplace_update(fio) && if (need_inplace_update(fio) &&
f2fs_lookup_extent_cache(inode, page->index, &ei)) { f2fs_lookup_extent_cache(inode, page->index, &ei)) {
fio->old_blkaddr = ei.blk + page->index - ei.fofs; fio->old_blkaddr = ei.blk + page->index - ei.fofs;
@@ -2623,6 +2651,7 @@ got_it:
err = -EFSCORRUPTED; err = -EFSCORRUPTED;
goto out_writepage; goto out_writepage;
} }
/* /*
* If current allocation needs SSR, * If current allocation needs SSR,
* it had better in-place writes for updated data. * it had better in-place writes for updated data.
@@ -2759,11 +2788,6 @@ int f2fs_write_single_data_page(struct page *page, int *submitted,
write: write:
if (f2fs_is_drop_cache(inode)) if (f2fs_is_drop_cache(inode))
goto out; goto out;
/* we should not write 0'th page having journal header */
if (f2fs_is_volatile_file(inode) && (!page->index ||
(!wbc->for_reclaim &&
f2fs_available_free_memory(sbi, BASE_CHECK))))
goto redirty_out;
/* Dentry/quota blocks are controlled by checkpoint */ /* Dentry/quota blocks are controlled by checkpoint */
if (S_ISDIR(inode->i_mode) || IS_NOQUOTA(inode)) { if (S_ISDIR(inode->i_mode) || IS_NOQUOTA(inode)) {
@@ -3071,8 +3095,7 @@ result:
} else if (ret == -EAGAIN) { } else if (ret == -EAGAIN) {
ret = 0; ret = 0;
if (wbc->sync_mode == WB_SYNC_ALL) { if (wbc->sync_mode == WB_SYNC_ALL) {
cond_resched(); f2fs_io_schedule_timeout(
congestion_wait(BLK_RW_ASYNC,
DEFAULT_IO_TIMEOUT); DEFAULT_IO_TIMEOUT);
goto retry_write; goto retry_write;
} }
@@ -3337,6 +3360,100 @@ unlock_out:
return err; return err;
} }
+static int __find_data_block(struct inode *inode, pgoff_t index,
+				block_t *blk_addr)
+{
+	struct dnode_of_data dn;
+	struct page *ipage;
+	struct extent_info ei = {0, };
+	int err = 0;
+
+	ipage = f2fs_get_node_page(F2FS_I_SB(inode), inode->i_ino);
+	if (IS_ERR(ipage))
+		return PTR_ERR(ipage);
+
+	set_new_dnode(&dn, inode, ipage, ipage, 0);
+
+	if (f2fs_lookup_extent_cache(inode, index, &ei)) {
+		dn.data_blkaddr = ei.blk + index - ei.fofs;
+	} else {
+		/* hole case */
+		err = f2fs_get_dnode_of_data(&dn, index, LOOKUP_NODE);
+		if (err) {
+			dn.data_blkaddr = NULL_ADDR;
+			err = 0;
+		}
+	}
+
+	*blk_addr = dn.data_blkaddr;
+	f2fs_put_dnode(&dn);
+	return err;
+}
+
+static int __reserve_data_block(struct inode *inode, pgoff_t index,
+				block_t *blk_addr, bool *node_changed)
+{
+	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
+	struct dnode_of_data dn;
+	struct page *ipage;
+	int err = 0;
+
+	f2fs_do_map_lock(sbi, F2FS_GET_BLOCK_PRE_AIO, true);
+
+	ipage = f2fs_get_node_page(sbi, inode->i_ino);
+	if (IS_ERR(ipage)) {
+		err = PTR_ERR(ipage);
+		goto unlock_out;
+	}
+	set_new_dnode(&dn, inode, ipage, ipage, 0);
+
+	err = f2fs_get_block(&dn, index);
+
+	*blk_addr = dn.data_blkaddr;
+	*node_changed = dn.node_changed;
+	f2fs_put_dnode(&dn);
+
+unlock_out:
+	f2fs_do_map_lock(sbi, F2FS_GET_BLOCK_PRE_AIO, false);
+	return err;
+}
+
+static int prepare_atomic_write_begin(struct f2fs_sb_info *sbi,
+			struct page *page, loff_t pos, unsigned int len,
+			block_t *blk_addr, bool *node_changed)
+{
+	struct inode *inode = page->mapping->host;
+	struct inode *cow_inode = F2FS_I(inode)->cow_inode;
+	pgoff_t index = page->index;
+	int err = 0;
+	block_t ori_blk_addr;
+
+	/* If pos is beyond the end of file, reserve a new block in COW inode */
+	if ((pos & PAGE_MASK) >= i_size_read(inode))
+		return __reserve_data_block(cow_inode, index, blk_addr,
+						node_changed);
+
+	/* Look for the block in COW inode first */
+	err = __find_data_block(cow_inode, index, blk_addr);
+	if (err)
+		return err;
+	else if (*blk_addr != NULL_ADDR)
+		return 0;
+
+	/* Look for the block in the original inode */
+	err = __find_data_block(inode, index, &ori_blk_addr);
+	if (err)
+		return err;
+
+	/* Finally, we should reserve a new block in COW inode for the update */
+	err = __reserve_data_block(cow_inode, index, blk_addr, node_changed);
+	if (err)
+		return err;
+
+	if (ori_blk_addr != NULL_ADDR)
+		*blk_addr = ori_blk_addr;
+	return 0;
+}
+
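prepare_atomic_write_begin() implements the block lookup order for the new COW-based atomic writes: positions beyond i_size get a fresh block in the COW inode, blocks already copied into the COW inode are reused, and otherwise a new COW block is reserved while the original block address is remembered so the data can be written back in place on commit. The userspace ABI this serves is unchanged; a hypothetical minimal sequence (error handling omitted):

	ioctl(fd, F2FS_IOC_START_ATOMIC_WRITE);
	pwrite(fd, buf, len, off);	/* buffered writes now land in the COW inode */
	ioctl(fd, F2FS_IOC_COMMIT_ATOMIC_WRITE);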
 static int f2fs_write_begin(struct file *file, struct address_space *mapping,
		loff_t pos, unsigned len, unsigned flags,
		struct page **pagep, void **fsdata)
@@ -3345,7 +3462,7 @@ static int f2fs_write_begin(struct file *file, struct address_space *mapping,
	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
	struct page *page = NULL;
	pgoff_t index = ((unsigned long long) pos) >> PAGE_SHIFT;
-	bool need_balance = false, drop_atomic = false;
+	bool need_balance = false;
	block_t blkaddr = NULL_ADDR;
	int err = 0;
@@ -3372,14 +3489,6 @@ static int f2fs_write_begin(struct file *file, struct address_space *mapping,
		goto fail;
	}

-	if ((f2fs_is_atomic_file(inode) &&
-			!f2fs_available_free_memory(sbi, INMEM_PAGES)) ||
-			is_inode_flag_set(inode, FI_ATOMIC_REVOKE_REQUEST)) {
-		err = -ENOMEM;
-		drop_atomic = true;
-		goto fail;
-	}
-
	/*
	 * We should check this at this moment to avoid deadlock on inode page
	 * and #0 page. The locking rule for inline_data conversion should be:
@@ -3427,6 +3536,10 @@ repeat:
	*pagep = page;

+	if (f2fs_is_atomic_file(inode))
+		err = prepare_atomic_write_begin(sbi, page, pos, len,
+					&blkaddr, &need_balance);
+	else
		err = prepare_write_begin(sbi, page, pos, len,
					&blkaddr, &need_balance);
	if (err)
@@ -3483,8 +3596,6 @@ repeat:
 fail:
	f2fs_put_page(page, 1);
	f2fs_write_failed(inode, pos + len);
-	if (drop_atomic)
-		f2fs_drop_inmem_pages_all(sbi, false);
	return err;
 }
@@ -3529,8 +3640,12 @@ static int f2fs_write_end(struct file *file,
	set_page_dirty(page);

	if (pos + copied > i_size_read(inode) &&
-			!f2fs_verity_in_progress(inode))
+			!f2fs_verity_in_progress(inode)) {
		f2fs_i_size_write(inode, pos + copied);
+		if (f2fs_is_atomic_file(inode))
+			f2fs_i_size_write(F2FS_I(inode)->cow_inode,
+					pos + copied);
+	}
 unlock_out:
	f2fs_put_page(page, 1);
	f2fs_update_time(F2FS_I_SB(inode), REQ_TIME);
@@ -3564,9 +3679,6 @@ void f2fs_invalidate_page(struct page *page, unsigned int offset,
			inode->i_ino == F2FS_COMPRESS_INO(sbi))
		clear_page_private_data(page);

-	if (page_private_atomic(page))
-		return f2fs_drop_inmem_page(inode, page);
-
	detach_page_private(page);
	set_page_private(page, 0);
 }
@@ -3577,10 +3689,6 @@ int f2fs_release_page(struct page *page, gfp_t wait)
	if (PageDirty(page))
		return 0;

-	/* This is atomic written page, keep Private */
-	if (page_private_atomic(page))
-		return 0;
-
	if (test_opt(F2FS_P_SB(page), COMPRESS_CACHE)) {
		struct inode *inode = page->mapping->host;
@@ -3606,18 +3714,6 @@ static int f2fs_set_data_page_dirty(struct page *page)
	if (PageSwapCache(page))
		return __set_page_dirty_nobuffers(page);

-	if (f2fs_is_atomic_file(inode) && !f2fs_is_commit_atomic_write(inode)) {
-		if (!page_private_atomic(page)) {
-			f2fs_register_inmem_page(inode, page);
-			return 1;
-		}
-		/*
-		 * Previously, this page has been registered, we just
-		 * return here.
-		 */
-		return 0;
-	}
-
	if (!PageDirty(page)) {
		__set_page_dirty_nobuffers(page);
		f2fs_update_dirty_page(inode, page);
@@ -3697,42 +3793,14 @@ out:
 int f2fs_migrate_page(struct address_space *mapping,
		struct page *newpage, struct page *page, enum migrate_mode mode)
 {
-	int rc, extra_count;
-	struct f2fs_inode_info *fi = F2FS_I(mapping->host);
-	bool atomic_written = page_private_atomic(page);
+	int rc, extra_count = 0;

	BUG_ON(PageWriteback(page));

-	/* migrating an atomic written page is safe with the inmem_lock hold */
-	if (atomic_written) {
-		if (mode != MIGRATE_SYNC)
-			return -EBUSY;
-		if (!mutex_trylock(&fi->inmem_lock))
-			return -EAGAIN;
-	}
-
-	/* one extra reference was held for atomic_write page */
-	extra_count = atomic_written ? 1 : 0;
	rc = migrate_page_move_mapping(mapping, newpage,
				page, extra_count);
-	if (rc != MIGRATEPAGE_SUCCESS) {
-		if (atomic_written)
-			mutex_unlock(&fi->inmem_lock);
+	if (rc != MIGRATEPAGE_SUCCESS)
		return rc;
-	}
-
-	if (atomic_written) {
-		struct inmem_pages *cur;
-
-		list_for_each_entry(cur, &fi->inmem_pages, list)
-			if (cur->page == page) {
-				cur->page = newpage;
-				break;
-			}
-		mutex_unlock(&fi->inmem_lock);
-		put_page(page);
-		get_page(newpage);
-	}

	/* guarantee to start from no stale private field */
	set_page_private(newpage, 0);

diff --git a/fs/f2fs/debug.c b/fs/f2fs/debug.c

@@ -91,11 +91,8 @@ static void update_general_status(struct f2fs_sb_info *sbi)
	si->ndirty_files = sbi->ndirty_inode[FILE_INODE];
	si->nquota_files = sbi->nquota_files;
	si->ndirty_all = sbi->ndirty_inode[DIRTY_META];
-	si->inmem_pages = get_pages(sbi, F2FS_INMEM_PAGES);
	si->aw_cnt = sbi->atomic_files;
-	si->vw_cnt = atomic_read(&sbi->vw_cnt);
	si->max_aw_cnt = atomic_read(&sbi->max_aw_cnt);
-	si->max_vw_cnt = atomic_read(&sbi->max_vw_cnt);
	si->nr_dio_read = get_pages(sbi, F2FS_DIO_READ);
	si->nr_dio_write = get_pages(sbi, F2FS_DIO_WRITE);
	si->nr_wb_cp_data = get_pages(sbi, F2FS_WB_CP_DATA);
@@ -167,8 +164,6 @@ static void update_general_status(struct f2fs_sb_info *sbi)
	si->alloc_nids = NM_I(sbi)->nid_cnt[PREALLOC_NID];
	si->io_skip_bggc = sbi->io_skip_bggc;
	si->other_skip_bggc = sbi->other_skip_bggc;
-	si->skipped_atomic_files[BG_GC] = sbi->skipped_atomic_files[BG_GC];
-	si->skipped_atomic_files[FG_GC] = sbi->skipped_atomic_files[FG_GC];
	si->util_free = (int)(free_user_blocks(sbi) >> sbi->log_blocks_per_seg)
		* 100 / (int)(sbi->user_block_count >> sbi->log_blocks_per_seg)
		/ 2;
@@ -296,7 +291,6 @@ get_cache:
					sizeof(struct nat_entry);
	si->cache_mem += NM_I(sbi)->nat_cnt[DIRTY_NAT] *
					sizeof(struct nat_entry_set);
-	si->cache_mem += si->inmem_pages * sizeof(struct inmem_pages);
	for (i = 0; i < MAX_INO_ENTRY; i++)
		si->cache_mem += sbi->im[i].ino_num * sizeof(struct ino_entry);
	si->cache_mem += atomic_read(&sbi->total_ext_tree) *
@@ -491,10 +485,6 @@ static int stat_show(struct seq_file *s, void *v)
			   si->bg_data_blks);
		seq_printf(s, "  - node blocks : %d (%d)\n", si->node_blks,
			   si->bg_node_blks);
-		seq_printf(s, "Skipped : atomic write %llu (%llu)\n",
-			   si->skipped_atomic_files[BG_GC] +
-			   si->skipped_atomic_files[FG_GC],
-			   si->skipped_atomic_files[BG_GC]);
		seq_printf(s, "BG skip : IO: %u, Other: %u\n",
			   si->io_skip_bggc, si->other_skip_bggc);
		seq_puts(s, "\nExtent Cache:\n");
@@ -519,10 +509,8 @@ static int stat_show(struct seq_file *s, void *v)
			   si->flush_list_empty,
			   si->nr_discarding, si->nr_discarded,
			   si->nr_discard_cmd, si->undiscard_blks);
-		seq_printf(s, "  - inmem: %4d, atomic IO: %4d (Max. %4d), "
-			"volatile IO: %4d (Max. %4d)\n",
-			   si->inmem_pages, si->aw_cnt, si->max_aw_cnt,
-			   si->vw_cnt, si->max_vw_cnt);
+		seq_printf(s, "  - atomic IO: %4d (Max. %4d)\n",
+			   si->aw_cnt, si->max_aw_cnt);
		seq_printf(s, "  - compress: %4d, hit:%8d\n", si->compress_pages, si->compress_page_hit);
		seq_printf(s, "  - nodes: %4d in %4d\n",
			   si->ndirty_node, si->node_pages);
@@ -623,9 +611,7 @@ int f2fs_build_stats(struct f2fs_sb_info *sbi)
	for (i = META_CP; i < META_MAX; i++)
		atomic_set(&sbi->meta_count[i], 0);
-	atomic_set(&sbi->vw_cnt, 0);
	atomic_set(&sbi->max_aw_cnt, 0);
-	atomic_set(&sbi->max_vw_cnt, 0);

	raw_spin_lock_irqsave(&f2fs_stat_lock, flags);
	list_add_tail(&si->stat_list, &f2fs_stat_list);

diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c

@@ -82,7 +82,8 @@ int f2fs_init_casefolded_name(const struct inode *dir,
 #ifdef CONFIG_UNICODE
	struct super_block *sb = dir->i_sb;

-	if (IS_CASEFOLDED(dir)) {
+	if (IS_CASEFOLDED(dir) &&
+	    !is_dot_dotdot(fname->usr_fname->name, fname->usr_fname->len)) {
		fname->cf_name.name = f2fs_kmem_cache_alloc(f2fs_cf_name_slab,
					GFP_NOFS, false, F2FS_SB(sb));
		if (!fname->cf_name.name)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h

@@ -152,7 +152,6 @@ struct f2fs_mount_info {
	int s_jquota_fmt;			/* Format of quota to use */
 #endif
-	/* For which write hints are passed down to block layer */
-	int whint_mode;
	int alloc_mode;				/* segment allocation policy */
	int fsync_mode;				/* fsync policy */
	int fs_mode;				/* fs mode: LFS or ADAPTIVE */
@@ -507,11 +506,11 @@ struct f2fs_filename {
 #ifdef CONFIG_UNICODE
	/*
	 * For casefolded directories: the casefolded name, but it's left NULL
-	 * if the original name is not valid Unicode, if the directory is both
-	 * casefolded and encrypted and its encryption key is unavailable, or if
-	 * the filesystem is doing an internal operation where usr_fname is also
-	 * NULL.  In all these cases we fall back to treating the name as an
-	 * opaque byte sequence.
+	 * if the original name is not valid Unicode, if the original name is
+	 * "." or "..", if the directory is both casefolded and encrypted and
+	 * its encryption key is unavailable, or if the filesystem is doing an
+	 * internal operation where usr_fname is also NULL.  In all these cases
+	 * we fall back to treating the name as an opaque byte sequence.
	 */
	struct fscrypt_str cf_name;
 #endif
@@ -577,8 +576,8 @@ enum {
 /* maximum retry quota flush count */
 #define DEFAULT_RETRY_QUOTA_FLUSH_COUNT		8

-/* maximum retry of EIO'ed meta page */
-#define MAX_RETRY_META_PAGE_EIO			100
+/* maximum retry of EIO'ed page */
+#define MAX_RETRY_PAGE_EIO			100

 #define F2FS_LINK_MAX	0xffffffff	/* maximum link count per file */
@@ -715,7 +714,6 @@ enum {
 enum {
	GC_FAILURE_PIN,
-	GC_FAILURE_ATOMIC,
	MAX_GC_FAILURE
 };
@@ -737,8 +735,6 @@ enum {
	FI_UPDATE_WRITE,	/* inode has in-place-update data */
	FI_NEED_IPU,		/* used for ipu per file */
	FI_ATOMIC_FILE,		/* indicate atomic file */
-	FI_ATOMIC_COMMIT,	/* indicate the state of atomical committing */
-	FI_VOLATILE_FILE,	/* indicate volatile file */
	FI_FIRST_BLOCK_WRITTEN,	/* indicate #0 data block was written */
	FI_DROP_CACHE,		/* drop dirty page cache */
	FI_DATA_EXIST,		/* indicate data exists */
@@ -751,7 +747,6 @@ enum {
	FI_EXTRA_ATTR,		/* indicate file has extra attribute */
	FI_PROJ_INHERIT,	/* indicate file inherits projectid */
	FI_PIN_FILE,		/* indicate file should not be gced */
-	FI_ATOMIC_REVOKE_REQUEST, /* request to drop atomic data */
	FI_VERITY_IN_PROGRESS,	/* building fs-verity Merkle tree */
	FI_COMPRESSED_FILE,	/* indicate file's data can be compressed */
	FI_COMPRESS_CORRUPT,	/* indicate compressed cluster is corrupted */
@@ -793,11 +788,9 @@ struct f2fs_inode_info {
 #endif
	struct list_head dirty_list;	/* dirty list for dirs and files */
	struct list_head gdirty_list;	/* linked in global dirty list */
-	struct list_head inmem_ilist;	/* list for inmem inodes */
-	struct list_head inmem_pages;	/* inmemory pages managed by f2fs */
-	struct task_struct *inmem_task;	/* store inmemory task */
-	struct mutex inmem_lock;	/* lock for inmemory pages */
+	struct task_struct *atomic_write_task;	/* store atomic write task */
	struct extent_tree *extent_tree;	/* cached extent_tree entry */
+	struct inode *cow_inode;	/* copy-on-write inode for atomic write */

	/* avoid racing between foreground op and gc */
	struct f2fs_rwsem i_gc_rwsem[2];
@@ -1091,7 +1084,6 @@ enum count_type {
	F2FS_DIRTY_QDATA,
	F2FS_DIRTY_NODES,
	F2FS_DIRTY_META,
-	F2FS_INMEM_PAGES,
	F2FS_DIRTY_IMETA,
	F2FS_WB_CP_DATA,
	F2FS_WB_DATA,
@@ -1116,16 +1108,12 @@ enum count_type {
 */
 #define PAGE_TYPE_OF_BIO(type)	((type) > META ? META : (type))
 enum page_type {
-	DATA,
-	NODE,
+	DATA = 0,
+	NODE = 1,	/* should not change this */
	META,
	NR_PAGE_TYPE,
	META_FLUSH,
-	INMEM,		/* the below types are used by tracepoints only. */
-	INMEM_DROP,
-	INMEM_INVALIDATE,
-	INMEM_REVOKE,
-	IPU,
+	IPU,		/* the below types are used by tracepoints only. */
	OPU,
 };
@@ -1275,6 +1263,15 @@ struct atgc_management {
	unsigned long long age_threshold;	/* age threshold */
 };

+struct f2fs_gc_control {
+	unsigned int victim_segno;	/* target victim segment number */
+	int init_gc_type;		/* FG_GC or BG_GC */
+	bool no_bg_gc;			/* check the space and stop bg_gc */
+	bool should_migrate_blocks;	/* should migrate blocks */
+	bool err_gc_skipped;		/* return EAGAIN if GC skipped */
+	unsigned int nr_free_secs;	/* # of free sections to do GC */
+};
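
This struct replaces the old five-argument f2fs_gc(sbi, sync, background, force, segno) signature: every call site now fills a stack-allocated control block and passes its address, as the converted callers later in this diff do. A representative initialization (illustrative; field values vary per caller):

	struct f2fs_gc_control gc_control = {
		.victim_segno = NULL_SEGNO,
		.init_gc_type = FG_GC,
		.no_bg_gc = false,
		.should_migrate_blocks = false,
		.err_gc_skipped = true,
		.nr_free_secs = 1 };

	ret = f2fs_gc(sbi, &gc_control);

Designated initializers zero the unnamed members, so callers only need to spell out the fields they care about.
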
 /* For s_flag in struct f2fs_sb_info */
 enum {
	SBI_IS_DIRTY,			/* dirty flag for checkpoint */
@@ -1331,12 +1328,6 @@ enum {
	FS_MODE_FRAGMENT_BLK,		/* block fragmentation mode */
 };

-enum {
-	WHINT_MODE_OFF,		/* not pass down write hints */
-	WHINT_MODE_USER,	/* try to pass down hints given by users */
-	WHINT_MODE_FS,		/* pass down hints with F2FS policy */
-};
-
 enum {
	ALLOC_MODE_DEFAULT,	/* stay default */
	ALLOC_MODE_REUSE,	/* reuse segments as much as possible */
@@ -1619,8 +1610,8 @@ struct f2fs_sb_info {
	/* keep migration IO order for LFS mode */
	struct f2fs_rwsem io_order_lock;
	mempool_t *write_io_dummy;	/* Dummy pages */
-	pgoff_t metapage_eio_ofs;	/* EIO page offset */
-	int metapage_eio_cnt;		/* EIO count */
+	pgoff_t page_eio_ofs[NR_PAGE_TYPE];	/* EIO page offset */
+	int page_eio_cnt[NR_PAGE_TYPE];		/* EIO count */

	/* for checkpoint */
	struct f2fs_checkpoint *ckpt;		/* raw checkpoint pointer */
@@ -1723,7 +1714,6 @@ struct f2fs_sb_info {
	/* for skip statistic */
	unsigned int atomic_files;	/* # of opened atomic file */
-	unsigned long long skipped_atomic_files[2];	/* FG_GC and BG_GC */
	unsigned long long skipped_gc_rwsem;		/* FG_GC only */

	/* threshold for gc trials on pinned files */
@@ -1754,9 +1744,7 @@ struct f2fs_sb_info {
	atomic_t inline_dir;			/* # of inline_dentry inodes */
	atomic_t compr_inode;			/* # of compressed inodes */
	atomic64_t compr_blocks;		/* # of compressed blocks */
-	atomic_t vw_cnt;			/* # of volatile writes */
	atomic_t max_aw_cnt;			/* max # of atomic writes */
-	atomic_t max_vw_cnt;			/* max # of volatile writes */
	unsigned int io_skip_bggc;		/* skip background gc for in-flight IO */
	unsigned int other_skip_bggc;		/* skip background gc for other reasons */
	unsigned int ndirty_inode[NR_INODE_TYPE];	/* # of dirty inodes */
@@ -1767,7 +1755,7 @@ struct f2fs_sb_info {
	unsigned int data_io_flag;
	unsigned int node_io_flag;

-	/* For sysfs suppport */
+	/* For sysfs support */
	struct kobject s_kobj;		/* /sys/fs/f2fs/<devname> */
	struct completion s_kobj_unregister;
@@ -2610,11 +2598,17 @@ static inline void dec_valid_node_count(struct f2fs_sb_info *sbi,
 {
	spin_lock(&sbi->stat_lock);

-	f2fs_bug_on(sbi, !sbi->total_valid_block_count);
-	f2fs_bug_on(sbi, !sbi->total_valid_node_count);
+	if (unlikely(!sbi->total_valid_block_count ||
+			!sbi->total_valid_node_count)) {
+		f2fs_warn(sbi, "dec_valid_node_count: inconsistent block counts, total_valid_block:%u, total_valid_node:%u",
+			  sbi->total_valid_block_count,
+			  sbi->total_valid_node_count);
+		set_sbi_flag(sbi, SBI_NEED_FSCK);
+	} else {
+		sbi->total_valid_block_count--;
+		sbi->total_valid_node_count--;
+	}

-	sbi->total_valid_node_count--;
-	sbi->total_valid_block_count--;
	if (sbi->reserved_blocks &&
			sbi->current_reserved_blocks < sbi->reserved_blocks)
		sbi->current_reserved_blocks++;
@@ -3171,6 +3165,10 @@ static inline int inline_xattr_size(struct inode *inode)
	return 0;
 }

+/*
+ * Notice: check inline_data flag without inode page lock is unsafe.
+ * It could change at any time by f2fs_convert_inline_page().
+ */
 static inline int f2fs_has_inline_data(struct inode *inode)
 {
	return is_inode_flag_set(inode, FI_INLINE_DATA);
@@ -3201,16 +3199,6 @@ static inline bool f2fs_is_atomic_file(struct inode *inode)
	return is_inode_flag_set(inode, FI_ATOMIC_FILE);
 }

-static inline bool f2fs_is_commit_atomic_write(struct inode *inode)
-{
-	return is_inode_flag_set(inode, FI_ATOMIC_COMMIT);
-}
-
-static inline bool f2fs_is_volatile_file(struct inode *inode)
-{
-	return is_inode_flag_set(inode, FI_VOLATILE_FILE);
-}
-
 static inline bool f2fs_is_first_block_written(struct inode *inode)
 {
	return is_inode_flag_set(inode, FI_FIRST_BLOCK_WRITTEN);
@@ -3443,6 +3431,8 @@ void f2fs_handle_failed_inode(struct inode *inode);
 int f2fs_update_extension_list(struct f2fs_sb_info *sbi, const char *name,
							bool hot, bool set);
 struct dentry *f2fs_get_parent(struct dentry *child);
+int f2fs_get_tmpfile(struct user_namespace *mnt_userns, struct inode *dir,
+		     struct inode **new_inode);

 /*
  * dir.c
@@ -3578,11 +3568,8 @@ void f2fs_destroy_node_manager_caches(void);
  * segment.c
  */
 bool f2fs_need_SSR(struct f2fs_sb_info *sbi);
-void f2fs_register_inmem_page(struct inode *inode, struct page *page);
-void f2fs_drop_inmem_pages_all(struct f2fs_sb_info *sbi, bool gc_failure);
-void f2fs_drop_inmem_pages(struct inode *inode);
-void f2fs_drop_inmem_page(struct inode *inode, struct page *page);
-int f2fs_commit_inmem_pages(struct inode *inode);
+int f2fs_commit_atomic_write(struct inode *inode);
+void f2fs_abort_atomic_write(struct inode *inode, bool clean);
 void f2fs_balance_fs(struct f2fs_sb_info *sbi, bool need);
 void f2fs_balance_fs_bg(struct f2fs_sb_info *sbi, bool from_bg);
 int f2fs_issue_flush(struct f2fs_sb_info *sbi, nid_t ino);
@@ -3655,8 +3642,6 @@ void f2fs_destroy_segment_manager(struct f2fs_sb_info *sbi);
 int __init f2fs_create_segment_manager_caches(void);
 void f2fs_destroy_segment_manager_caches(void);
 int f2fs_rw_hint_to_seg_type(enum rw_hint hint);
-enum rw_hint f2fs_io_type_to_rw_hint(struct f2fs_sb_info *sbi,
-			enum page_type type, enum temp_type temp);
 unsigned int f2fs_usable_segs_in_sec(struct f2fs_sb_info *sbi,
			unsigned int segno);
 unsigned int f2fs_usable_blks_in_seg(struct f2fs_sb_info *sbi,
@@ -3726,6 +3711,7 @@ int f2fs_init_bio_entry_cache(void);
 void f2fs_destroy_bio_entry_cache(void);
 void f2fs_submit_bio(struct f2fs_sb_info *sbi,
				struct bio *bio, enum page_type type);
+int f2fs_init_write_merge_io(struct f2fs_sb_info *sbi);
 void f2fs_submit_merged_write(struct f2fs_sb_info *sbi, enum page_type type);
 void f2fs_submit_merged_write_cond(struct f2fs_sb_info *sbi,
				struct inode *inode, struct page *page,
@@ -3737,7 +3723,7 @@ int f2fs_submit_page_bio(struct f2fs_io_info *fio);
 int f2fs_merge_page_bio(struct f2fs_io_info *fio);
 void f2fs_submit_page_write(struct f2fs_io_info *fio);
 struct block_device *f2fs_target_device(struct f2fs_sb_info *sbi,
-		block_t blk_addr, struct bio *bio);
+		block_t blk_addr, sector_t *sector);
 int f2fs_target_device_index(struct f2fs_sb_info *sbi, block_t blkaddr);
 void f2fs_set_data_blkaddr(struct dnode_of_data *dn);
 void f2fs_update_data_blkaddr(struct dnode_of_data *dn, block_t blkaddr);
@@ -3788,8 +3774,7 @@ extern const struct iomap_ops f2fs_iomap_ops;
 int f2fs_start_gc_thread(struct f2fs_sb_info *sbi);
 void f2fs_stop_gc_thread(struct f2fs_sb_info *sbi);
 block_t f2fs_start_bidx_of_node(unsigned int node_ofs, struct inode *inode);
-int f2fs_gc(struct f2fs_sb_info *sbi, bool sync, bool background, bool force,
-			unsigned int segno);
+int f2fs_gc(struct f2fs_sb_info *sbi, struct f2fs_gc_control *gc_control);
 void f2fs_build_gc_manager(struct f2fs_sb_info *sbi);
 int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count);
 int __init f2fs_create_garbage_collection_cache(void);
@@ -3817,7 +3802,6 @@ struct f2fs_stat_info {
	int ext_tree, zombie_tree, ext_node;
	int ndirty_node, ndirty_dent, ndirty_meta, ndirty_imeta;
	int ndirty_data, ndirty_qdata;
-	int inmem_pages;
	unsigned int ndirty_dirs, ndirty_files, nquota_files, ndirty_all;
	int nats, dirty_nats, sits, dirty_sits;
	int free_nids, avail_nids, alloc_nids;
@@ -3835,7 +3819,7 @@ struct f2fs_stat_info {
	int inline_xattr, inline_inode, inline_dir, append, update, orphans;
	int compr_inode;
	unsigned long long compr_blocks;
-	int aw_cnt, max_aw_cnt, vw_cnt, max_vw_cnt;
+	int aw_cnt, max_aw_cnt;
	unsigned int valid_count, valid_node_count, valid_inode_count, discard_blks;
	unsigned int bimodal, avg_vblocks;
	int util_free, util_valid, util_invalid;
@@ -3847,7 +3831,6 @@ struct f2fs_stat_info {
	int bg_node_segs, bg_data_segs;
	int tot_blks, data_blks, node_blks;
	int bg_data_blks, bg_node_blks;
-	unsigned long long skipped_atomic_files[2];
	int curseg[NR_CURSEG_TYPE];
	int cursec[NR_CURSEG_TYPE];
	int curzone[NR_CURSEG_TYPE];
@@ -3947,17 +3930,6 @@ static inline struct f2fs_stat_info *F2FS_STAT(struct f2fs_sb_info *sbi)
		if (cur > max)						\
			atomic_set(&F2FS_I_SB(inode)->max_aw_cnt, cur);	\
	} while (0)
-#define stat_inc_volatile_write(inode)					\
-		(atomic_inc(&F2FS_I_SB(inode)->vw_cnt))
-#define stat_dec_volatile_write(inode)					\
-		(atomic_dec(&F2FS_I_SB(inode)->vw_cnt))
-#define stat_update_max_volatile_write(inode)				\
-	do {								\
-		int cur = atomic_read(&F2FS_I_SB(inode)->vw_cnt);	\
-		int max = atomic_read(&F2FS_I_SB(inode)->max_vw_cnt);	\
-		if (cur > max)						\
-			atomic_set(&F2FS_I_SB(inode)->max_vw_cnt, cur);	\
-	} while (0)
 #define stat_inc_seg_count(sbi, type, gc_type)				\
	do {								\
		struct f2fs_stat_info *si = F2FS_STAT(sbi);		\
@@ -4019,9 +3991,6 @@ void f2fs_update_sit_info(struct f2fs_sb_info *sbi);
 #define stat_add_compr_blocks(inode, blocks)		do { } while (0)
 #define stat_sub_compr_blocks(inode, blocks)		do { } while (0)
 #define stat_update_max_atomic_write(inode)		do { } while (0)
-#define stat_inc_volatile_write(inode)			do { } while (0)
-#define stat_dec_volatile_write(inode)			do { } while (0)
-#define stat_update_max_volatile_write(inode)		do { } while (0)
 #define stat_inc_meta_count(sbi, blkaddr)		do { } while (0)
 #define stat_inc_seg_type(sbi, curseg)			do { } while (0)
 #define stat_inc_block_count(sbi, curseg)		do { } while (0)
@@ -4054,6 +4023,7 @@ extern struct kmem_cache *f2fs_inode_entry_slab;
  * inline.c
  */
 bool f2fs_may_inline_data(struct inode *inode);
+bool f2fs_sanity_check_inline_data(struct inode *inode);
 bool f2fs_may_inline_dentry(struct inode *inode);
 void f2fs_do_read_inline_data(struct page *page, struct page *ipage);
 void f2fs_truncate_inline_inode(struct inode *inode,
@@ -4424,8 +4394,7 @@ static inline bool f2fs_lfs_mode(struct f2fs_sb_info *sbi)
 static inline bool f2fs_may_compress(struct inode *inode)
 {
	if (IS_SWAPFILE(inode) || f2fs_is_pinned_file(inode) ||
-		f2fs_is_atomic_file(inode) ||
-		f2fs_is_volatile_file(inode))
+		f2fs_is_atomic_file(inode))
		return false;
	return S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode);
 }
@@ -4433,8 +4402,8 @@ static inline bool f2fs_may_compress(struct inode *inode)
 static inline void f2fs_i_compr_blocks_update(struct inode *inode,
						u64 blocks, bool add)
 {
-	int diff = F2FS_I(inode)->i_cluster_size - blocks;
	struct f2fs_inode_info *fi = F2FS_I(inode);
+	int diff = fi->i_cluster_size - blocks;

	/* don't update i_compr_blocks if saved blocks were released */
	if (!add && !atomic_read(&fi->i_compr_blocks))
@@ -4536,6 +4505,27 @@ static inline bool f2fs_block_unit_discard(struct f2fs_sb_info *sbi)
	return F2FS_OPTION(sbi).discard_unit == DISCARD_UNIT_BLOCK;
 }

+static inline void f2fs_io_schedule_timeout(long timeout)
+{
+	set_current_state(TASK_UNINTERRUPTIBLE);
+	io_schedule_timeout(timeout);
+}
+
+static inline void f2fs_handle_page_eio(struct f2fs_sb_info *sbi, pgoff_t ofs,
+					enum page_type type)
+{
+	if (unlikely(f2fs_cp_error(sbi)))
+		return;
+
+	if (ofs == sbi->page_eio_ofs[type]) {
+		if (sbi->page_eio_cnt[type]++ == MAX_RETRY_PAGE_EIO)
+			set_ckpt_flags(sbi, CP_ERROR_FLAG);
+	} else {
+		sbi->page_eio_ofs[type] = ofs;
+		sbi->page_eio_cnt[type] = 0;
+	}
+}
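
f2fs_handle_page_eio() only escalates when the same page offset of the same page type keeps failing: the counter resets whenever a different offset reports EIO, and once one offset has failed MAX_RETRY_PAGE_EIO times in a row the whole filesystem is flagged with CP_ERROR_FLAG. A user-space model of that counting logic (illustration only; the variable names here are invented):

	#include <stdio.h>

	#define MAX_RETRY_PAGE_EIO 100

	static long eio_ofs = -1;
	static int eio_cnt;
	static int cp_error;

	static void handle_page_eio(long ofs)
	{
		if (ofs == eio_ofs) {
			/* same page keeps failing: give up after enough retries */
			if (eio_cnt++ == MAX_RETRY_PAGE_EIO)
				cp_error = 1;
		} else {
			/* a different page failed: restart the count */
			eio_ofs = ofs;
			eio_cnt = 0;
		}
	}

	int main(void)
	{
		for (int i = 0; i <= MAX_RETRY_PAGE_EIO + 1; i++)
			handle_page_eio(7);
		printf("cp_error=%d\n", cp_error);	/* 1 */
		return 0;
	}
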
 #define EFSBADCRC	EBADMSG		/* Bad CRC detected */
 #define EFSCORRUPTED	EUCLEAN		/* Filesystem is corrupted */

diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c

@@ -374,7 +374,8 @@ sync_nodes:
	f2fs_remove_ino_entry(sbi, ino, APPEND_INO);
	clear_inode_flag(inode, FI_APPEND_WRITE);
 flush_out:
-	if (!atomic && F2FS_OPTION(sbi).fsync_mode != FSYNC_MODE_NOBARRIER)
+	if ((!atomic && F2FS_OPTION(sbi).fsync_mode != FSYNC_MODE_NOBARRIER) ||
+	    (atomic && !test_opt(sbi, NOBARRIER) && f2fs_sb_has_blkzoned(sbi)))
		ret = f2fs_issue_flush(sbi, inode->i_ino);
	if (!ret) {
		f2fs_remove_ino_entry(sbi, ino, UPDATE_INO);
@@ -1439,12 +1440,20 @@ static int f2fs_do_zero_range(struct dnode_of_data *dn, pgoff_t start,
			ret = -ENOSPC;
			break;
		}

-		if (dn->data_blkaddr == NEW_ADDR)
-			continue;
+		if (dn->data_blkaddr != NEW_ADDR) {
+			if (!f2fs_is_valid_blkaddr(sbi, dn->data_blkaddr,
+						DATA_GENERIC_ENHANCE)) {
+				ret = -EFSCORRUPTED;
+				break;
+			}

-		f2fs_invalidate_blocks(sbi, dn->data_blkaddr);
-		dn->data_blkaddr = NEW_ADDR;
-		f2fs_set_data_blkaddr(dn);
+			f2fs_invalidate_blocks(sbi, dn->data_blkaddr);
+			dn->data_blkaddr = NEW_ADDR;
+			f2fs_set_data_blkaddr(dn);
+		}
	}

	f2fs_update_extent_cache_range(dn, start, 0, index - start);
@@ -1640,6 +1649,11 @@ static int expand_inode_data(struct inode *inode, loff_t offset,
	struct f2fs_map_blocks map = { .m_next_pgofs = NULL,
			.m_next_extent = NULL, .m_seg_type = NO_CHECK_TYPE,
			.m_may_create = true };
+	struct f2fs_gc_control gc_control = { .victim_segno = NULL_SEGNO,
+			.init_gc_type = FG_GC,
+			.should_migrate_blocks = false,
+			.err_gc_skipped = true,
+			.nr_free_secs = 0 };
	pgoff_t pg_start, pg_end;
	loff_t new_size = i_size_read(inode);
	loff_t off_end;
@@ -1677,8 +1691,8 @@ next_alloc:
	if (has_not_enough_free_secs(sbi, 0,
		GET_SEC_FROM_SEG(sbi, overprovision_segments(sbi)))) {
		f2fs_down_write(&sbi->gc_lock);
-		err = f2fs_gc(sbi, true, false, false, NULL_SEGNO);
-		if (err && err != -ENODATA && err != -EAGAIN)
+		err = f2fs_gc(sbi, &gc_control);
+		if (err && err != -ENODATA)
			goto out_err;
	}
@@ -1768,6 +1782,10 @@ static long f2fs_fallocate(struct file *file, int mode,
	inode_lock(inode);

+	ret = file_modified(file);
+	if (ret)
+		goto out;
+
	if (mode & FALLOC_FL_PUNCH_HOLE) {
		if (offset >= inode->i_size)
			goto out;
@@ -1806,16 +1824,8 @@ static int f2fs_release_file(struct inode *inode, struct file *filp)
			atomic_read(&inode->i_writecount) != 1)
		return 0;

-	/* some remained atomic pages should discarded */
	if (f2fs_is_atomic_file(inode))
-		f2fs_drop_inmem_pages(inode);
-	if (f2fs_is_volatile_file(inode)) {
-		set_inode_flag(inode, FI_DROP_CACHE);
-		filemap_fdatawrite(inode->i_mapping);
-		clear_inode_flag(inode, FI_DROP_CACHE);
-		clear_inode_flag(inode, FI_VOLATILE_FILE);
-		stat_dec_volatile_write(inode);
-	}
+		f2fs_abort_atomic_write(inode, true);
+
	return 0;
 }
@@ -1830,8 +1840,8 @@ static int f2fs_file_flush(struct file *file, fl_owner_t id)
	 * before dropping file lock, it needs to do in ->flush.
	 */
	if (f2fs_is_atomic_file(inode) &&
-			F2FS_I(inode)->inmem_task == current)
-		f2fs_drop_inmem_pages(inode);
+			F2FS_I(inode)->atomic_write_task == current)
+		f2fs_abort_atomic_write(inode, true);
	return 0;
 }
@@ -1994,6 +2004,7 @@ static int f2fs_ioc_start_atomic_write(struct file *filp)
	struct user_namespace *mnt_userns = file_mnt_user_ns(filp);
	struct f2fs_inode_info *fi = F2FS_I(inode);
	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
+	struct inode *pinode;
	int ret;

	if (!inode_owner_or_capable(mnt_userns, inode))
@@ -2016,44 +2027,55 @@ static int f2fs_ioc_start_atomic_write(struct file *filp)
		goto out;
	}

-	if (f2fs_is_atomic_file(inode)) {
-		if (is_inode_flag_set(inode, FI_ATOMIC_REVOKE_REQUEST))
-			ret = -EINVAL;
+	if (f2fs_is_atomic_file(inode))
		goto out;
-	}

	ret = f2fs_convert_inline_inode(inode);
	if (ret)
		goto out;

-	f2fs_down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
+	f2fs_down_write(&fi->i_gc_rwsem[WRITE]);

	/*
	 * Should wait end_io to count F2FS_WB_CP_DATA correctly by
	 * f2fs_is_atomic_file.
	 */
	if (get_dirty_pages(inode))
-		f2fs_warn(F2FS_I_SB(inode), "Unexpected flush for atomic writes: ino=%lu, npages=%u",
+		f2fs_warn(sbi, "Unexpected flush for atomic writes: ino=%lu, npages=%u",
			  inode->i_ino, get_dirty_pages(inode));
	ret = filemap_write_and_wait_range(inode->i_mapping, 0, LLONG_MAX);
	if (ret) {
-		f2fs_up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
+		f2fs_up_write(&fi->i_gc_rwsem[WRITE]);
		goto out;
	}

+	/* Create a COW inode for atomic write */
+	pinode = f2fs_iget(inode->i_sb, fi->i_pino);
+	if (IS_ERR(pinode)) {
+		f2fs_up_write(&fi->i_gc_rwsem[WRITE]);
+		ret = PTR_ERR(pinode);
+		goto out;
+	}
+
+	ret = f2fs_get_tmpfile(mnt_userns, pinode, &fi->cow_inode);
+	iput(pinode);
+	if (ret) {
+		f2fs_up_write(&fi->i_gc_rwsem[WRITE]);
+		goto out;
+	}
+	f2fs_i_size_write(fi->cow_inode, i_size_read(inode));
+
	spin_lock(&sbi->inode_lock[ATOMIC_FILE]);
-	if (list_empty(&fi->inmem_ilist))
-		list_add_tail(&fi->inmem_ilist, &sbi->inode_list[ATOMIC_FILE]);
	sbi->atomic_files++;
	spin_unlock(&sbi->inode_lock[ATOMIC_FILE]);

-	/* add inode in inmem_list first and set atomic_file */
	set_inode_flag(inode, FI_ATOMIC_FILE);
-	clear_inode_flag(inode, FI_ATOMIC_REVOKE_REQUEST);
-	f2fs_up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
+	set_inode_flag(fi->cow_inode, FI_ATOMIC_FILE);
+	clear_inode_flag(fi->cow_inode, FI_INLINE_DATA);
+	f2fs_up_write(&fi->i_gc_rwsem[WRITE]);

-	f2fs_update_time(F2FS_I_SB(inode), REQ_TIME);
-	F2FS_I(inode)->inmem_task = current;
+	f2fs_update_time(sbi, REQ_TIME);
+	fi->atomic_write_task = current;
	stat_update_max_atomic_write(inode);
 out:
	inode_unlock(inode);
@@ -2078,130 +2100,23 @@ static int f2fs_ioc_commit_atomic_write(struct file *filp)
	inode_lock(inode);

-	if (f2fs_is_volatile_file(inode)) {
-		ret = -EINVAL;
-		goto err_out;
-	}
-
	if (f2fs_is_atomic_file(inode)) {
-		ret = f2fs_commit_inmem_pages(inode);
+		ret = f2fs_commit_atomic_write(inode);
		if (ret)
-			goto err_out;
+			goto unlock_out;

		ret = f2fs_do_sync_file(filp, 0, LLONG_MAX, 0, true);
		if (!ret)
-			f2fs_drop_inmem_pages(inode);
+			f2fs_abort_atomic_write(inode, false);
	} else {
		ret = f2fs_do_sync_file(filp, 0, LLONG_MAX, 1, false);
	}
-err_out:
-	if (is_inode_flag_set(inode, FI_ATOMIC_REVOKE_REQUEST)) {
-		clear_inode_flag(inode, FI_ATOMIC_REVOKE_REQUEST);
-		ret = -EINVAL;
-	}
+unlock_out:
	inode_unlock(inode);
	mnt_drop_write_file(filp);
	return ret;
 }
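
From user space the atomic-write contract is unchanged by the COW rework: wrap the updates between the start and commit ioctls on the same fd. A minimal sketch (the file path is made up; F2FS_IOC_START_ATOMIC_WRITE and F2FS_IOC_COMMIT_ATOMIC_WRITE come from the uapi header on kernels that ship <linux/f2fs.h>, otherwise they must be defined locally):

	#include <fcntl.h>
	#include <unistd.h>
	#include <sys/ioctl.h>
	#include <linux/f2fs.h>

	int main(void)
	{
		int fd = open("/mnt/f2fs/db", O_RDWR);

		if (fd < 0)
			return 1;
		/* writes after this land in the per-inode COW file */
		if (ioctl(fd, F2FS_IOC_START_ATOMIC_WRITE) < 0)
			return 1;
		if (pwrite(fd, "new", 3, 0) != 3)
			return 1;
		/* commit publishes the COW blocks, then syncs */
		if (ioctl(fd, F2FS_IOC_COMMIT_ATOMIC_WRITE) < 0)
			return 1;
		return close(fd);
	}
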
-static int f2fs_ioc_start_volatile_write(struct file *filp)
-{
-	struct inode *inode = file_inode(filp);
-	struct user_namespace *mnt_userns = file_mnt_user_ns(filp);
-	int ret;
-
-	if (!inode_owner_or_capable(mnt_userns, inode))
-		return -EACCES;
-
-	if (!S_ISREG(inode->i_mode))
-		return -EINVAL;
-
-	ret = mnt_want_write_file(filp);
-	if (ret)
-		return ret;
-
-	inode_lock(inode);
-
-	if (f2fs_is_volatile_file(inode))
-		goto out;
-
-	ret = f2fs_convert_inline_inode(inode);
-	if (ret)
-		goto out;
-
-	stat_inc_volatile_write(inode);
-	stat_update_max_volatile_write(inode);
-
-	set_inode_flag(inode, FI_VOLATILE_FILE);
-	f2fs_update_time(F2FS_I_SB(inode), REQ_TIME);
-out:
-	inode_unlock(inode);
-	mnt_drop_write_file(filp);
-	return ret;
-}
-
-static int f2fs_ioc_release_volatile_write(struct file *filp)
-{
-	struct inode *inode = file_inode(filp);
-	struct user_namespace *mnt_userns = file_mnt_user_ns(filp);
-	int ret;
-
-	if (!inode_owner_or_capable(mnt_userns, inode))
-		return -EACCES;
-
-	ret = mnt_want_write_file(filp);
-	if (ret)
-		return ret;
-
-	inode_lock(inode);
-
-	if (!f2fs_is_volatile_file(inode))
-		goto out;
-
-	if (!f2fs_is_first_block_written(inode)) {
-		ret = truncate_partial_data_page(inode, 0, true);
-		goto out;
-	}
-
-	ret = punch_hole(inode, 0, F2FS_BLKSIZE);
-out:
-	inode_unlock(inode);
-	mnt_drop_write_file(filp);
-	return ret;
-}
-
-static int f2fs_ioc_abort_volatile_write(struct file *filp)
-{
-	struct inode *inode = file_inode(filp);
-	struct user_namespace *mnt_userns = file_mnt_user_ns(filp);
-	int ret;
-
-	if (!inode_owner_or_capable(mnt_userns, inode))
-		return -EACCES;
-
-	ret = mnt_want_write_file(filp);
-	if (ret)
-		return ret;
-
-	inode_lock(inode);
-
-	if (f2fs_is_atomic_file(inode))
-		f2fs_drop_inmem_pages(inode);
-	if (f2fs_is_volatile_file(inode)) {
-		clear_inode_flag(inode, FI_VOLATILE_FILE);
-		stat_dec_volatile_write(inode);
-		ret = f2fs_do_sync_file(filp, 0, LLONG_MAX, 0, true);
-	}
-
-	clear_inode_flag(inode, FI_ATOMIC_REVOKE_REQUEST);
-
-	inode_unlock(inode);
-
-	mnt_drop_write_file(filp);
-	f2fs_update_time(F2FS_I_SB(inode), REQ_TIME);
-	return ret;
-}
-
 static int f2fs_ioc_shutdown(struct file *filp, unsigned long arg)
 {
	struct inode *inode = file_inode(filp);
@@ -2440,6 +2355,10 @@ static int f2fs_ioc_gc(struct file *filp, unsigned long arg)
 {
	struct inode *inode = file_inode(filp);
	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
+	struct f2fs_gc_control gc_control = { .victim_segno = NULL_SEGNO,
+			.no_bg_gc = false,
+			.should_migrate_blocks = false,
+			.nr_free_secs = 0 };
	__u32 sync;
	int ret;
@@ -2465,7 +2384,9 @@ static int f2fs_ioc_gc(struct file *filp, unsigned long arg)
		f2fs_down_write(&sbi->gc_lock);
	}

-	ret = f2fs_gc(sbi, sync, true, false, NULL_SEGNO);
+	gc_control.init_gc_type = sync ? FG_GC : BG_GC;
+	gc_control.err_gc_skipped = sync;
+	ret = f2fs_gc(sbi, &gc_control);
 out:
	mnt_drop_write_file(filp);
	return ret;
@@ -2474,6 +2395,12 @@ out:
 static int __f2fs_ioc_gc_range(struct file *filp, struct f2fs_gc_range *range)
 {
	struct f2fs_sb_info *sbi = F2FS_I_SB(file_inode(filp));
+	struct f2fs_gc_control gc_control = {
+			.init_gc_type = range->sync ? FG_GC : BG_GC,
+			.no_bg_gc = false,
+			.should_migrate_blocks = false,
+			.err_gc_skipped = range->sync,
+			.nr_free_secs = 0 };
	u64 end;
	int ret;
@@ -2501,8 +2428,8 @@ do_more:
		f2fs_down_write(&sbi->gc_lock);
	}

-	ret = f2fs_gc(sbi, range->sync, true, false,
-					GET_SEGNO(sbi, range->start));
+	gc_control.victim_segno = GET_SEGNO(sbi, range->start);
+	ret = f2fs_gc(sbi, &gc_control);
	if (ret) {
		if (ret == -EBUSY)
			ret = -EAGAIN;
@@ -2677,6 +2604,7 @@ do_map:
		}

		set_page_dirty(page);
+		set_page_private_gcing(page);
		f2fs_put_page(page, 1);

		idx++;
@@ -2916,6 +2844,11 @@ static int f2fs_ioc_flush_device(struct file *filp, unsigned long arg)
	unsigned int start_segno = 0, end_segno = 0;
	unsigned int dev_start_segno = 0, dev_end_segno = 0;
	struct f2fs_flush_device range;
+	struct f2fs_gc_control gc_control = {
+			.init_gc_type = FG_GC,
+			.should_migrate_blocks = true,
+			.err_gc_skipped = true,
+			.nr_free_secs = 0 };
	int ret;

	if (!capable(CAP_SYS_ADMIN))
@@ -2959,7 +2892,9 @@ static int f2fs_ioc_flush_device(struct file *filp, unsigned long arg)
		sm->last_victim[GC_CB] = end_segno + 1;
		sm->last_victim[GC_GREEDY] = end_segno + 1;
		sm->last_victim[ALLOC_NEXT] = end_segno + 1;
-		ret = f2fs_gc(sbi, true, true, true, start_segno);
+
+		gc_control.victim_segno = start_segno;
+		ret = f2fs_gc(sbi, &gc_control);
		if (ret == -EAGAIN)
			ret = 0;
		else if (ret < 0)
@@ -3020,7 +2955,7 @@ static int f2fs_ioc_setproject(struct inode *inode, __u32 projid)
	kprojid = make_kprojid(&init_user_ns, (projid_t)projid);

-	if (projid_eq(kprojid, F2FS_I(inode)->i_projid))
+	if (projid_eq(kprojid, fi->i_projid))
		return 0;

	err = -EPERM;
@@ -3040,7 +2975,7 @@ static int f2fs_ioc_setproject(struct inode *inode, __u32 projid)
	if (err)
		goto out_unlock;

-	F2FS_I(inode)->i_projid = kprojid;
+	fi->i_projid = kprojid;
	inode->i_ctime = current_time(inode);
	f2fs_mark_inode_dirty_sync(inode, true);
 out_unlock:
@@ -3990,7 +3925,7 @@ static int f2fs_ioc_decompress_file(struct file *filp, unsigned long arg)
	struct f2fs_inode_info *fi = F2FS_I(inode);
	pgoff_t page_idx = 0, last_idx;
	unsigned int blk_per_seg = sbi->blocks_per_seg;
-	int cluster_size = F2FS_I(inode)->i_cluster_size;
+	int cluster_size = fi->i_cluster_size;
	int count, ret;

	if (!f2fs_sb_has_compression(sbi) ||
@@ -4013,11 +3948,6 @@ static int f2fs_ioc_decompress_file(struct file *filp, unsigned long arg)
		goto out;
	}

-	if (f2fs_is_mmap_file(inode)) {
-		ret = -EBUSY;
-		goto out;
-	}
-
	ret = filemap_write_and_wait_range(inode->i_mapping, 0, LLONG_MAX);
	if (ret)
		goto out;
@@ -4085,11 +4015,6 @@ static int f2fs_ioc_compress_file(struct file *filp, unsigned long arg)
		goto out;
	}

-	if (f2fs_is_mmap_file(inode)) {
-		ret = -EBUSY;
-		goto out;
-	}
-
	ret = filemap_write_and_wait_range(inode->i_mapping, 0, LLONG_MAX);
	if (ret)
		goto out;
@@ -4139,11 +4064,9 @@ static long __f2fs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
	case F2FS_IOC_COMMIT_ATOMIC_WRITE:
		return f2fs_ioc_commit_atomic_write(filp);
	case F2FS_IOC_START_VOLATILE_WRITE:
-		return f2fs_ioc_start_volatile_write(filp);
	case F2FS_IOC_RELEASE_VOLATILE_WRITE:
-		return f2fs_ioc_release_volatile_write(filp);
	case F2FS_IOC_ABORT_VOLATILE_WRITE:
-		return f2fs_ioc_abort_volatile_write(filp);
+		return -EOPNOTSUPP;
	case F2FS_IOC_SHUTDOWN:
		return f2fs_ioc_shutdown(filp, arg);
	case FITRIM:
@@ -4343,17 +4266,39 @@ out:
 static ssize_t f2fs_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
 {
	struct inode *inode = file_inode(iocb->ki_filp);
+	const loff_t pos = iocb->ki_pos;
	ssize_t ret;

	if (!f2fs_is_compress_backend_ready(inode))
		return -EOPNOTSUPP;

-	if (f2fs_should_use_dio(inode, iocb, to))
-		return f2fs_dio_read_iter(iocb, to);
+	if (trace_f2fs_dataread_start_enabled()) {
+		char *p = f2fs_kmalloc(F2FS_I_SB(inode), PATH_MAX, GFP_KERNEL);
+		char *path;
+
+		if (!p)
+			goto skip_read_trace;
+
+		path = dentry_path_raw(file_dentry(iocb->ki_filp), p, PATH_MAX);
+		if (IS_ERR(path)) {
+			kfree(p);
+			goto skip_read_trace;
+		}
+
+		trace_f2fs_dataread_start(inode, pos, iov_iter_count(to),
+					current->pid, path, current->comm);
+		kfree(p);
+	}
+skip_read_trace:
+	if (f2fs_should_use_dio(inode, iocb, to)) {
+		ret = f2fs_dio_read_iter(iocb, to);
+	} else {
+		ret = filemap_read(iocb, to, 0);
+		if (ret > 0)
+			f2fs_update_iostat(F2FS_I_SB(inode), APP_BUFFERED_READ_IO, ret);
+	}
+	if (trace_f2fs_dataread_end_enabled())
+		trace_f2fs_dataread_end(inode, pos, ret);

-	ret = filemap_read(iocb, to, 0);
-	if (ret > 0)
-		f2fs_update_iostat(F2FS_I_SB(inode), APP_BUFFERED_READ_IO, ret);
	return ret;
 }
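
Note the trace_..._enabled() guards: the PATH_MAX allocation and dentry_path_raw() walk are only paid when the tracepoint is actually enabled, and an allocation failure just skips the trace rather than failing the read. The same pattern in a self-contained user-space analogue (names invented for this sketch):

	#include <stdbool.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <string.h>

	static bool trace_enabled;	/* stands in for trace_..._enabled() */

	static void traced_read(const char *path, size_t len)
	{
		if (trace_enabled) {
			/* expensive argument preparation only happens here */
			char *p = strdup(path);

			if (p) {
				printf("trace: read %s len=%zu\n", p, len);
				free(p);
			}
		}
		/* ...the actual read would follow, traced or not... */
	}

	int main(void)
	{
		traced_read("/a/b", 16);	/* silent: tracing off */
		trace_enabled = true;
		traced_read("/a/b", 16);	/* emits the trace line */
		return 0;
	}
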
@@ -4496,10 +4441,8 @@ static ssize_t f2fs_dio_write_iter(struct kiocb *iocb, struct iov_iter *from,
	struct f2fs_inode_info *fi = F2FS_I(inode);
	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
	const bool do_opu = f2fs_lfs_mode(sbi);
-	const int whint_mode = F2FS_OPTION(sbi).whint_mode;
	const loff_t pos = iocb->ki_pos;
	const ssize_t count = iov_iter_count(from);
-	const enum rw_hint hint = iocb->ki_hint;
	unsigned int dio_flags;
	struct iomap_dio *dio;
	ssize_t ret;
@@ -4542,9 +4485,6 @@ static ssize_t f2fs_dio_write_iter(struct kiocb *iocb, struct iov_iter *from,
		if (do_opu)
			f2fs_down_read(&fi->i_gc_rwsem[READ]);
	}
-	if (whint_mode == WHINT_MODE_OFF)
-		iocb->ki_hint = WRITE_LIFE_NOT_SET;

	/*
	 * We have to use __iomap_dio_rw() and iomap_dio_complete() instead of
	 * the higher-level function iomap_dio_rw() in order to ensure that the
@@ -4566,8 +4506,6 @@ static ssize_t f2fs_dio_write_iter(struct kiocb *iocb, struct iov_iter *from,
		ret = iomap_dio_complete(dio);
	}

-	if (whint_mode == WHINT_MODE_OFF)
-		iocb->ki_hint = hint;
	if (do_opu)
		f2fs_up_read(&fi->i_gc_rwsem[READ]);
	f2fs_up_read(&fi->i_gc_rwsem[WRITE]);
@@ -4663,14 +4601,36 @@ static ssize_t f2fs_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
	/* Possibly preallocate the blocks for the write. */
	target_size = iocb->ki_pos + iov_iter_count(from);
	preallocated = f2fs_preallocate_blocks(iocb, from, dio);
-	if (preallocated < 0)
+	if (preallocated < 0) {
		ret = preallocated;
-	else
+	} else {
+		if (trace_f2fs_datawrite_start_enabled()) {
+			char *p = f2fs_kmalloc(F2FS_I_SB(inode),
+						PATH_MAX, GFP_KERNEL);
+			char *path;
+
+			if (!p)
+				goto skip_write_trace;
+
+			path = dentry_path_raw(file_dentry(iocb->ki_filp),
+						p, PATH_MAX);
+			if (IS_ERR(path)) {
+				kfree(p);
+				goto skip_write_trace;
+			}
+
+			trace_f2fs_datawrite_start(inode, orig_pos, orig_count,
+					current->pid, path, current->comm);
+			kfree(p);
+		}
+skip_write_trace:
		/* Do the actual write. */
		ret = dio ?
			f2fs_dio_write_iter(iocb, from, &may_need_sync):
			f2fs_buffered_write_iter(iocb, from);

+		if (trace_f2fs_datawrite_end_enabled())
+			trace_f2fs_datawrite_end(inode, orig_pos, ret);
+	}
+
	/* Don't leave any preallocated blocks around past i_size. */
	if (preallocated && i_size_read(inode) < target_size) {
		f2fs_down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);

diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c

@@ -35,6 +35,10 @@ static int gc_thread_func(void *data)
	wait_queue_head_t *wq = &sbi->gc_thread->gc_wait_queue_head;
	wait_queue_head_t *fggc_wq = &sbi->gc_thread->fggc_wq;
	unsigned int wait_ms;
+	struct f2fs_gc_control gc_control = {
+		.victim_segno = NULL_SEGNO,
+		.should_migrate_blocks = false,
+		.err_gc_skipped = false };

	wait_ms = gc_th->min_sleep_time;
@@ -141,8 +145,12 @@ do_gc:
		if (foreground)
			sync_mode = false;

+		gc_control.init_gc_type = sync_mode ? FG_GC : BG_GC;
+		gc_control.no_bg_gc = foreground;
+		gc_control.nr_free_secs = foreground ? 1 : 0;
+
		/* if return value is not zero, no victim was selected */
-		if (f2fs_gc(sbi, sync_mode, !foreground, false, NULL_SEGNO))
+		if (f2fs_gc(sbi, &gc_control))
			wait_ms = gc_th->no_gc_sleep_time;

		if (foreground)
@@ -646,6 +654,54 @@ static void release_victim_entry(struct f2fs_sb_info *sbi)
	f2fs_bug_on(sbi, !list_empty(&am->victim_list));
 }

+static bool f2fs_pin_section(struct f2fs_sb_info *sbi, unsigned int segno)
+{
+	struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
+	unsigned int secno = GET_SEC_FROM_SEG(sbi, segno);
+
+	if (!dirty_i->enable_pin_section)
+		return false;
+	if (!test_and_set_bit(secno, dirty_i->pinned_secmap))
+		dirty_i->pinned_secmap_cnt++;
+	return true;
+}
+
+static bool f2fs_pinned_section_exists(struct dirty_seglist_info *dirty_i)
+{
+	return dirty_i->pinned_secmap_cnt;
+}
+
+static bool f2fs_section_is_pinned(struct dirty_seglist_info *dirty_i,
+						unsigned int secno)
+{
+	return dirty_i->enable_pin_section &&
+		f2fs_pinned_section_exists(dirty_i) &&
+		test_bit(secno, dirty_i->pinned_secmap);
+}
+
+static void f2fs_unpin_all_sections(struct f2fs_sb_info *sbi, bool enable)
+{
+	unsigned int bitmap_size = f2fs_bitmap_size(MAIN_SECS(sbi));
+
+	if (f2fs_pinned_section_exists(DIRTY_I(sbi))) {
+		memset(DIRTY_I(sbi)->pinned_secmap, 0, bitmap_size);
+		DIRTY_I(sbi)->pinned_secmap_cnt = 0;
+	}
+	DIRTY_I(sbi)->enable_pin_section = enable;
+}
+
+static int f2fs_gc_pinned_control(struct inode *inode, int gc_type,
+							unsigned int segno)
+{
+	if (!f2fs_is_pinned_file(inode))
+		return 0;
+	if (gc_type != FG_GC)
+		return -EBUSY;
+	if (!f2fs_pin_section(F2FS_I_SB(inode), segno))
+		f2fs_pin_file_control(inode, true);
+	return -EAGAIN;
+}
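
Together these helpers let foreground GC mark a section that holds pinned file blocks and skip it on the next victim search instead of spinning on it: f2fs_pin_section() records the section in pinned_secmap, f2fs_section_is_pinned() filters it out during victim selection, and f2fs_unpin_all_sections() clears the map once GC finishes or gives up. A user-space model of the bookkeeping (illustration only; sizes and names are invented):

	#include <stdio.h>
	#include <string.h>

	#define MAIN_SECS 64

	static unsigned char pinned_secmap[MAIN_SECS];
	static unsigned int pinned_secmap_cnt;
	static int enable_pin_section = 1;

	static int pin_section(unsigned int secno)
	{
		if (!enable_pin_section)
			return 0;	/* caller falls back to unpinning the file */
		if (!pinned_secmap[secno]) {
			pinned_secmap[secno] = 1;
			pinned_secmap_cnt++;
		}
		return 1;
	}

	static void unpin_all(int enable)
	{
		if (pinned_secmap_cnt) {
			memset(pinned_secmap, 0, sizeof(pinned_secmap));
			pinned_secmap_cnt = 0;
		}
		enable_pin_section = enable;
	}

	int main(void)
	{
		pin_section(3);
		pin_section(3);				/* idempotent */
		printf("%u\n", pinned_secmap_cnt);	/* 1 */
		unpin_all(0);
		printf("%d\n", pin_section(5));		/* 0: disabled */
		return 0;
	}
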
 /*
  * This function is called from two paths.
  * One is garbage collection and the other is SSR segment selection.
@@ -787,6 +843,9 @@ retry:
		if (gc_type == BG_GC && test_bit(secno, dirty_i->victim_secmap))
			goto next;

+		if (gc_type == FG_GC && f2fs_section_is_pinned(dirty_i, secno))
+			goto next;
+
		if (is_atgc) {
			add_victim_entry(sbi, &p, segno);
			goto next;
@@ -1194,18 +1253,9 @@ static int move_data_block(struct inode *inode, block_t bidx,
		goto out;
	}

-	if (f2fs_is_atomic_file(inode)) {
-		F2FS_I(inode)->i_gc_failures[GC_FAILURE_ATOMIC]++;
-		F2FS_I_SB(inode)->skipped_atomic_files[gc_type]++;
-		err = -EAGAIN;
-		goto out;
-	}
-
-	if (f2fs_is_pinned_file(inode)) {
-		f2fs_pin_file_control(inode, true);
-		err = -EAGAIN;
-		goto out;
-	}
+	err = f2fs_gc_pinned_control(inode, gc_type, segno);
+	if (err)
+		goto out;

	set_new_dnode(&dn, inode, NULL, NULL, 0);
	err = f2fs_get_dnode_of_data(&dn, bidx, LOOKUP_NODE);
@@ -1344,18 +1394,9 @@ static int move_data_page(struct inode *inode, block_t bidx, int gc_type,
		goto out;
	}

-	if (f2fs_is_atomic_file(inode)) {
-		F2FS_I(inode)->i_gc_failures[GC_FAILURE_ATOMIC]++;
-		F2FS_I_SB(inode)->skipped_atomic_files[gc_type]++;
-		err = -EAGAIN;
-		goto out;
-	}
-
-	if (f2fs_is_pinned_file(inode)) {
-		if (gc_type == FG_GC)
-			f2fs_pin_file_control(inode, true);
-		err = -EAGAIN;
-		goto out;
-	}
+	err = f2fs_gc_pinned_control(inode, gc_type, segno);
+	if (err)
+		goto out;

	if (gc_type == BG_GC) {
		if (PageWriteback(page)) {
@@ -1476,11 +1517,19 @@ next_step:
ofs_in_node = le16_to_cpu(entry->ofs_in_node); ofs_in_node = le16_to_cpu(entry->ofs_in_node);
if (phase == 3) { if (phase == 3) {
int err;
inode = f2fs_iget(sb, dni.ino); inode = f2fs_iget(sb, dni.ino);
if (IS_ERR(inode) || is_bad_inode(inode) || if (IS_ERR(inode) || is_bad_inode(inode) ||
special_file(inode->i_mode)) special_file(inode->i_mode))
continue; continue;
err = f2fs_gc_pinned_control(inode, gc_type, segno);
if (err == -EAGAIN) {
iput(inode);
return submitted;
}
if (!f2fs_down_write_trylock( if (!f2fs_down_write_trylock(
&F2FS_I(inode)->i_gc_rwsem[WRITE])) { &F2FS_I(inode)->i_gc_rwsem[WRITE])) {
iput(inode); iput(inode);
@@ -1700,23 +1749,21 @@ skip:
 	return seg_freed;
 }
 
-int f2fs_gc(struct f2fs_sb_info *sbi, bool sync,
-			bool background, bool force, unsigned int segno)
+int f2fs_gc(struct f2fs_sb_info *sbi, struct f2fs_gc_control *gc_control)
 {
-	int gc_type = sync ? FG_GC : BG_GC;
+	int gc_type = gc_control->init_gc_type;
+	unsigned int segno = gc_control->victim_segno;
 	int sec_freed = 0, seg_freed = 0, total_freed = 0;
 	int ret = 0;
 	struct cp_control cpc;
-	unsigned int init_segno = segno;
 	struct gc_inode_list gc_list = {
 		.ilist = LIST_HEAD_INIT(gc_list.ilist),
 		.iroot = RADIX_TREE_INIT(gc_list.iroot, GFP_NOFS),
 	};
-	unsigned long long last_skipped = sbi->skipped_atomic_files[FG_GC];
-	unsigned long long first_skipped;
 	unsigned int skipped_round = 0, round = 0;
 
-	trace_f2fs_gc_begin(sbi->sb, sync, background,
+	trace_f2fs_gc_begin(sbi->sb, gc_type, gc_control->no_bg_gc,
+				gc_control->nr_free_secs,
 				get_pages(sbi, F2FS_DIRTY_NODES),
 				get_pages(sbi, F2FS_DIRTY_DENTS),
 				get_pages(sbi, F2FS_DIRTY_IMETA),
@@ -1727,7 +1774,6 @@ int f2fs_gc(struct f2fs_sb_info *sbi, struct f2fs_gc_control *gc_control)
 
 	cpc.reason = __get_cp_reason(sbi);
 	sbi->skipped_gc_rwsem = 0;
-	first_skipped = last_skipped;
 gc_more:
 	if (unlikely(!(sbi->sb->s_flags & SB_ACTIVE))) {
 		ret = -EINVAL;
@@ -1744,8 +1790,7 @@ gc_more:
 		 * threshold, we can make them free by checkpoint. Then, we
 		 * secure free segments which doesn't need fggc any more.
 		 */
-		if (prefree_segments(sbi) &&
-				!is_sbi_flag_set(sbi, SBI_CP_DISABLED)) {
+		if (prefree_segments(sbi)) {
 			ret = f2fs_write_checkpoint(sbi, &cpc);
 			if (ret)
 				goto stop;
@@ -1755,54 +1800,69 @@ gc_more:
 	}
 
 	/* f2fs_balance_fs doesn't need to do BG_GC in critical path. */
-	if (gc_type == BG_GC && !background) {
+	if (gc_type == BG_GC && gc_control->no_bg_gc) {
 		ret = -EINVAL;
 		goto stop;
 	}
+retry:
 	ret = __get_victim(sbi, &segno, gc_type);
-	if (ret)
+	if (ret) {
+		/* allow to search victim from sections has pinned data */
+		if (ret == -ENODATA && gc_type == FG_GC &&
+				f2fs_pinned_section_exists(DIRTY_I(sbi))) {
+			f2fs_unpin_all_sections(sbi, false);
+			goto retry;
+		}
 		goto stop;
+	}
 
-	seg_freed = do_garbage_collect(sbi, segno, &gc_list, gc_type, force);
-	if (gc_type == FG_GC &&
-			seg_freed == f2fs_usable_segs_in_sec(sbi, segno))
-		sec_freed++;
+	seg_freed = do_garbage_collect(sbi, segno, &gc_list, gc_type,
+				gc_control->should_migrate_blocks);
 	total_freed += seg_freed;
 
-	if (gc_type == FG_GC) {
-		if (sbi->skipped_atomic_files[FG_GC] > last_skipped ||
-						sbi->skipped_gc_rwsem)
-			skipped_round++;
-		last_skipped = sbi->skipped_atomic_files[FG_GC];
-		round++;
-	}
+	if (seg_freed == f2fs_usable_segs_in_sec(sbi, segno))
+		sec_freed++;
 
 	if (gc_type == FG_GC)
 		sbi->cur_victim_sec = NULL_SEGNO;
 
-	if (sync)
+	if (gc_control->init_gc_type == FG_GC ||
+	    !has_not_enough_free_secs(sbi,
+				(gc_type == FG_GC) ? sec_freed : 0, 0)) {
+		if (gc_type == FG_GC && sec_freed < gc_control->nr_free_secs)
+			goto go_gc_more;
 		goto stop;
+	}
 
-	if (has_not_enough_free_secs(sbi, sec_freed, 0)) {
-		if (skipped_round <= MAX_SKIP_GC_COUNT ||
-					skipped_round * 2 < round) {
-			segno = NULL_SEGNO;
-			goto gc_more;
-		}
+	/* FG_GC stops GC by skip_count */
+	if (gc_type == FG_GC) {
+		if (sbi->skipped_gc_rwsem)
+			skipped_round++;
+		round++;
+		if (skipped_round > MAX_SKIP_GC_COUNT &&
+				skipped_round * 2 >= round) {
+			ret = f2fs_write_checkpoint(sbi, &cpc);
+			goto stop;
+		}
+	}
 
-		if (first_skipped < last_skipped &&
-				(last_skipped - first_skipped) >
-						sbi->skipped_gc_rwsem) {
-			f2fs_drop_inmem_pages_all(sbi, true);
-			segno = NULL_SEGNO;
-			goto gc_more;
-		}
-		if (gc_type == FG_GC && !is_sbi_flag_set(sbi, SBI_CP_DISABLED))
-			ret = f2fs_write_checkpoint(sbi, &cpc);
-	}
+	/* Write checkpoint to reclaim prefree segments */
+	if (free_sections(sbi) < NR_CURSEG_PERSIST_TYPE &&
+				prefree_segments(sbi)) {
+		ret = f2fs_write_checkpoint(sbi, &cpc);
+		if (ret)
+			goto stop;
+	}
+go_gc_more:
+	segno = NULL_SEGNO;
+	goto gc_more;
+
 stop:
 	SIT_I(sbi)->last_victim[ALLOC_NEXT] = 0;
-	SIT_I(sbi)->last_victim[FLUSH_DEVICE] = init_segno;
+	SIT_I(sbi)->last_victim[FLUSH_DEVICE] = gc_control->victim_segno;
+
+	if (gc_type == FG_GC)
+		f2fs_unpin_all_sections(sbi, true);
 
 	trace_f2fs_gc_end(sbi->sb, ret, total_freed, sec_freed,
 				get_pages(sbi, F2FS_DIRTY_NODES),
@@ -1817,7 +1877,7 @@ stop:
 	put_gc_inode(&gc_list);
 
-	if (sync && !ret)
+	if (gc_control->err_gc_skipped && !ret)
 		ret = sec_freed ? 0 : -EAGAIN;
 	return ret;
 }
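The FG_GC bail-out in the rewritten f2fs_gc() hinges on one predicate: stop and checkpoint only once the skip count exceeds MAX_SKIP_GC_COUNT and skips make up at least half of all rounds. A minimal standalone check of that predicate, with invented round counts:

#include <stdbool.h>
#include <stdio.h>

#define MAX_SKIP_GC_COUNT 16

/* mirrors the condition guarding the checkpoint-and-stop path above */
static bool should_stop_fggc(unsigned int skipped_round, unsigned int round)
{
	return skipped_round > MAX_SKIP_GC_COUNT &&
	       skipped_round * 2 >= round;
}

int main(void)
{
	/* 17 skips out of 40 rounds: over the count but under half - keep going */
	printf("%d\n", should_stop_fggc(17, 40));	/* prints 0 */
	/* 17 skips out of 30 rounds: over the count and at least half - stop */
	printf("%d\n", should_stop_fggc(17, 30));	/* prints 1 */
	return 0;
}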

View File: fs/f2fs/hash.c

@@ -91,7 +91,7 @@ static u32 TEA_hash_name(const u8 *p, size_t len)
 /*
  * Compute @fname->hash.  For all directories, @fname->disk_name must be set.
  * For casefolded directories, @fname->usr_fname must be set, and also
- * @fname->cf_name if the filename is valid Unicode.
+ * @fname->cf_name if the filename is valid Unicode and is not "." or "..".
  */
 void f2fs_hash_filename(const struct inode *dir, struct f2fs_filename *fname)
 {
@@ -110,10 +110,11 @@ void f2fs_hash_filename(const struct inode *dir, struct f2fs_filename *fname)
 		/*
 		 * If the casefolded name is provided, hash it instead of the
 		 * on-disk name.  If the casefolded name is *not* provided, that
-		 * should only be because the name wasn't valid Unicode, so fall
-		 * back to treating the name as an opaque byte sequence.  Note
-		 * that to handle encrypted directories, the fallback must use
-		 * usr_fname (plaintext) rather than disk_name (ciphertext).
+		 * should only be because the name wasn't valid Unicode or was
+		 * "." or "..", so fall back to treating the name as an opaque
+		 * byte sequence.  Note that to handle encrypted directories,
+		 * the fallback must use usr_fname (plaintext) rather than
+		 * disk_name (ciphertext).
 		 */
 		WARN_ON_ONCE(!fname->usr_fname->name);
 		if (fname->cf_name.name) {

View File: fs/f2fs/inline.c

@@ -15,21 +15,40 @@
 #include <trace/events/f2fs.h>
 #include <trace/events/android_fs.h>
 
-bool f2fs_may_inline_data(struct inode *inode)
+static bool support_inline_data(struct inode *inode)
 {
 	if (f2fs_is_atomic_file(inode))
 		return false;
 	if (!S_ISREG(inode->i_mode) && !S_ISLNK(inode->i_mode))
 		return false;
 	if (i_size_read(inode) > MAX_INLINE_DATA(inode))
 		return false;
+	return true;
+}
 
-	if (f2fs_post_read_required(inode))
+bool f2fs_may_inline_data(struct inode *inode)
+{
+	if (!support_inline_data(inode))
 		return false;
 
-	return true;
+	return !f2fs_post_read_required(inode);
+}
+
+bool f2fs_sanity_check_inline_data(struct inode *inode)
+{
+	if (!f2fs_has_inline_data(inode))
+		return false;
+
+	if (!support_inline_data(inode))
+		return true;
+
+	/*
+	 * used by sanity_check_inode(), when disk layout fields has not
+	 * been synchronized to inmem fields.
+	 */
+	return (S_ISREG(inode->i_mode) &&
+		(file_is_encrypt(inode) || file_is_verity(inode) ||
+		 (F2FS_I(inode)->i_flags & F2FS_COMPR_FL)));
 }
 
 bool f2fs_may_inline_dentry(struct inode *inode)
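f2fs_sanity_check_inline_data() returns true when the inline_data flag is inconsistent with the rest of the inode. A compact userspace model of that decision, with plain booleans standing in for the inode flags (a sketch of the logic only, not the kernel API):

#include <stdbool.h>
#include <stdio.h>

static bool sanity_check_inline_data(bool has_inline, bool supportable,
				     bool is_reg, bool post_read_or_compr)
{
	if (!has_inline)
		return false;	/* nothing to check */
	if (!supportable)
		return true;	/* e.g. inline_data on a non-regular inode */
	/* regular file that is encrypted, verity, or compressed: inconsistent */
	return is_reg && post_read_or_compr;
}

int main(void)
{
	/* regular, plain file: inline_data is legitimate */
	printf("%d\n", sanity_check_inline_data(true, true, true, false)); /* 0 */
	/* regular but encrypted/verity/compressed: flag it for fsck */
	printf("%d\n", sanity_check_inline_data(true, true, true, true));  /* 1 */
	return 0;
}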

View File: fs/f2fs/inode.c

@@ -260,8 +260,8 @@ static bool sanity_check_inode(struct inode *inode, struct page *node_page)
 		return false;
 	}
 
-	if (F2FS_I(inode)->extent_tree) {
-		struct extent_info *ei = &F2FS_I(inode)->extent_tree->largest;
+	if (fi->extent_tree) {
+		struct extent_info *ei = &fi->extent_tree->largest;
 
 		if (ei->len &&
 			(!f2fs_is_valid_blkaddr(sbi, ei->blk,
@@ -276,8 +276,7 @@ static bool sanity_check_inode(struct inode *inode, struct page *node_page)
 		}
 	}
 
-	if (f2fs_has_inline_data(inode) &&
-			(!S_ISREG(inode->i_mode) && !S_ISLNK(inode->i_mode))) {
+	if (f2fs_sanity_check_inline_data(inode)) {
 		set_sbi_flag(sbi, SBI_NEED_FSCK);
 		f2fs_warn(sbi, "%s: inode (ino=%lx, mode=%u) should not have inline_data, run fsck to fix",
 			  __func__, inode->i_ino, inode->i_mode);
@@ -466,10 +465,10 @@ static int do_read_inode(struct inode *inode)
 		}
 	}
 
-	F2FS_I(inode)->i_disk_time[0] = inode->i_atime;
-	F2FS_I(inode)->i_disk_time[1] = inode->i_ctime;
-	F2FS_I(inode)->i_disk_time[2] = inode->i_mtime;
-	F2FS_I(inode)->i_disk_time[3] = F2FS_I(inode)->i_crtime;
+	fi->i_disk_time[0] = inode->i_atime;
+	fi->i_disk_time[1] = inode->i_ctime;
+	fi->i_disk_time[2] = inode->i_mtime;
+	fi->i_disk_time[3] = fi->i_crtime;
 	f2fs_put_page(node_page, 1);
 
 	stat_inc_inline_xattr(inode);
@@ -745,9 +744,8 @@ void f2fs_evict_inode(struct inode *inode)
 	nid_t xnid = F2FS_I(inode)->i_xattr_nid;
 	int err = 0;
 
-	/* some remained atomic pages should discarded */
 	if (f2fs_is_atomic_file(inode))
-		f2fs_drop_inmem_pages(inode);
+		f2fs_abort_atomic_write(inode, true);
 
 	trace_f2fs_evict_inode(inode);
 	truncate_inode_pages_final(&inode->i_data);
@@ -796,8 +794,22 @@ retry:
 		f2fs_lock_op(sbi);
 		err = f2fs_remove_inode_page(inode);
 		f2fs_unlock_op(sbi);
-		if (err == -ENOENT)
+		if (err == -ENOENT) {
 			err = 0;
+
+			/*
+			 * in fuzzed image, another node may has the same
+			 * block address as inode's, if it was truncated
+			 * previously, truncation of inode node will fail.
+			 */
+			if (is_inode_flag_set(inode, FI_DIRTY_INODE)) {
+				f2fs_warn(F2FS_I_SB(inode),
+					"f2fs_evict_inode: inconsistent node id, ino:%lu",
+					inode->i_ino);
+				f2fs_inode_synced(inode);
+				set_sbi_flag(sbi, SBI_NEED_FSCK);
+			}
+		}
 	}
 
 	/* give more chances, if ENOMEM case */

View File: fs/f2fs/namei.c

@@ -37,13 +37,10 @@ static struct inode *f2fs_new_inode(struct user_namespace *mnt_userns,
 	if (!inode)
 		return ERR_PTR(-ENOMEM);
 
-	f2fs_lock_op(sbi);
 	if (!f2fs_alloc_nid(sbi, &ino)) {
-		f2fs_unlock_op(sbi);
 		err = -ENOSPC;
 		goto fail;
 	}
-	f2fs_unlock_op(sbi);
 
 	nid_free = true;
 
@@ -92,8 +89,6 @@ static struct inode *f2fs_new_inode(struct user_namespace *mnt_userns,
 	if (test_opt(sbi, INLINE_XATTR))
 		set_inode_flag(inode, FI_INLINE_XATTR);
 
-	if (test_opt(sbi, INLINE_DATA) && f2fs_may_inline_data(inode))
-		set_inode_flag(inode, FI_INLINE_DATA);
 	if (f2fs_may_inline_dentry(inode))
 		set_inode_flag(inode, FI_INLINE_DENTRY);
 
@@ -110,10 +105,6 @@ static struct inode *f2fs_new_inode(struct user_namespace *mnt_userns,
 
 	f2fs_init_extent_tree(inode, NULL);
 
-	stat_inc_inline_xattr(inode);
-	stat_inc_inline_inode(inode);
-	stat_inc_inline_dir(inode);
-
 	F2FS_I(inode)->i_flags =
 		f2fs_mask_flags(mode, F2FS_I(dir)->i_flags & F2FS_FL_INHERITED);
 
@@ -130,6 +121,14 @@ static struct inode *f2fs_new_inode(struct user_namespace *mnt_userns,
 			set_compress_context(inode);
 	}
 
+	/* Should enable inline_data after compression set */
+	if (test_opt(sbi, INLINE_DATA) && f2fs_may_inline_data(inode))
+		set_inode_flag(inode, FI_INLINE_DATA);
+
+	stat_inc_inline_xattr(inode);
+	stat_inc_inline_inode(inode);
+	stat_inc_inline_dir(inode);
+
 	f2fs_set_inode_flags(inode);
 
 	trace_f2fs_new_inode(inode, 0);
@@ -328,6 +327,8 @@ static void set_compress_inode(struct f2fs_sb_info *sbi, struct inode *inode,
 		if (!is_extension_exist(name, ext[i], false))
 			continue;
 
+		/* Do not use inline_data with compression */
+		clear_inode_flag(inode, FI_INLINE_DATA);
 		set_compress_context(inode);
 		return;
 	}
@@ -461,6 +462,13 @@ static int __recover_dot_dentries(struct inode *dir, nid_t pino)
 		return 0;
 	}
 
+	if (!S_ISDIR(dir->i_mode)) {
+		f2fs_err(sbi, "inconsistent inode status, skip recovering inline_dots inode (ino:%lu, i_mode:%u, pino:%u)",
+			  dir->i_ino, dir->i_mode, pino);
+		set_sbi_flag(sbi, SBI_NEED_FSCK);
+		return -ENOTDIR;
+	}
+
 	err = f2fs_dquot_initialize(dir);
 	if (err)
 		return err;
@@ -836,8 +844,8 @@ out:
 }
 
 static int __f2fs_tmpfile(struct user_namespace *mnt_userns, struct inode *dir,
-			  struct dentry *dentry, umode_t mode,
-			  struct inode **whiteout)
+			  struct dentry *dentry, umode_t mode, bool is_whiteout,
+			  struct inode **new_inode)
 {
 	struct f2fs_sb_info *sbi = F2FS_I_SB(dir);
 	struct inode *inode;
@@ -851,7 +859,7 @@ static int __f2fs_tmpfile(struct user_namespace *mnt_userns, struct inode *dir,
 	if (IS_ERR(inode))
 		return PTR_ERR(inode);
 
-	if (whiteout) {
+	if (is_whiteout) {
 		init_special_inode(inode, inode->i_mode, WHITEOUT_DEV);
 		inode->i_op = &f2fs_special_inode_operations;
 	} else {
@@ -876,21 +884,25 @@ static int __f2fs_tmpfile(struct user_namespace *mnt_userns, struct inode *dir,
 	f2fs_add_orphan_inode(inode);
 	f2fs_alloc_nid_done(sbi, inode->i_ino);
 
-	if (whiteout) {
+	if (is_whiteout) {
 		f2fs_i_links_write(inode, false);
 
 		spin_lock(&inode->i_lock);
 		inode->i_state |= I_LINKABLE;
 		spin_unlock(&inode->i_lock);
-
-		*whiteout = inode;
 	} else {
-		d_tmpfile(dentry, inode);
+		if (dentry)
+			d_tmpfile(dentry, inode);
+		else
+			f2fs_i_links_write(inode, false);
 	}
 	/* link_count was changed by d_tmpfile as well. */
 	f2fs_unlock_op(sbi);
 	unlock_new_inode(inode);
 
+	if (new_inode)
+		*new_inode = inode;
+
 	f2fs_balance_fs(sbi, true);
 	return 0;
 
@@ -911,7 +923,7 @@ static int f2fs_tmpfile(struct user_namespace *mnt_userns, struct inode *dir,
 	if (!f2fs_is_checkpoint_ready(sbi))
 		return -ENOSPC;
 
-	return __f2fs_tmpfile(mnt_userns, dir, dentry, mode, NULL);
+	return __f2fs_tmpfile(mnt_userns, dir, dentry, mode, false, NULL);
 }
 
 static int f2fs_create_whiteout(struct user_namespace *mnt_userns,
@@ -921,7 +933,13 @@ static int f2fs_create_whiteout(struct user_namespace *mnt_userns,
 		return -EIO;
 
 	return __f2fs_tmpfile(mnt_userns, dir, NULL,
-				S_IFCHR | WHITEOUT_MODE, whiteout);
+				S_IFCHR | WHITEOUT_MODE, true, whiteout);
+}
+
+int f2fs_get_tmpfile(struct user_namespace *mnt_userns, struct inode *dir,
+		     struct inode **new_inode)
+{
+	return __f2fs_tmpfile(mnt_userns, dir, NULL, S_IFREG, false, new_inode);
 }
 
 static int f2fs_rename(struct user_namespace *mnt_userns, struct inode *old_dir,

View File: fs/f2fs/node.c

@@ -90,10 +90,6 @@ bool f2fs_available_free_memory(struct f2fs_sb_info *sbi, int type)
 				atomic_read(&sbi->total_ext_node) *
 				sizeof(struct extent_node)) >> PAGE_SHIFT;
 		res = mem_size < ((avail_ram * nm_i->ram_thresh / 100) >> 1);
-	} else if (type == INMEM_PAGES) {
-		/* it allows 20% / total_ram for inmemory pages */
-		mem_size = get_pages(sbi, F2FS_INMEM_PAGES);
-		res = mem_size < (val.totalram / 5);
 	} else if (type == DISCARD_CACHE) {
 		mem_size = (atomic_read(&dcc->discard_cmd_cnt) *
 				sizeof(struct discard_cmd)) >> PAGE_SHIFT;
@@ -1416,8 +1412,7 @@ repeat:
 
 	err = read_node_page(page, 0);
 	if (err < 0) {
-		f2fs_put_page(page, 1);
-		return ERR_PTR(err);
+		goto out_put_err;
 	} else if (err == LOCKED_PAGE) {
 		err = 0;
 		goto page_hit;
@@ -1443,7 +1438,9 @@ repeat:
 		goto out_err;
 	}
 page_hit:
-	if (unlikely(nid != nid_of_node(page))) {
+	if (likely(nid == nid_of_node(page)))
+		return page;
+
 	f2fs_warn(sbi, "inconsistent node block, nid:%lu, node_footer[nid:%u,ino:%u,ofs:%u,cpver:%llu,blkaddr:%u]",
 		  nid, nid_of_node(page), ino_of_node(page),
 		  ofs_of_node(page), cpver_of_node(page),
@@ -1452,11 +1449,11 @@ page_hit:
 	err = -EINVAL;
 out_err:
 	ClearPageUptodate(page);
+out_put_err:
+	f2fs_handle_page_eio(sbi, page->index, NODE);
 	f2fs_put_page(page, 1);
 	return ERR_PTR(err);
-	}
-	return page;
 }
 
 struct page *f2fs_get_node_page(struct f2fs_sb_info *sbi, pgoff_t nid)
 {
@@ -1631,7 +1628,7 @@ static int __write_node_page(struct page *page, bool atomic, bool *submitted,
 		goto redirty_out;
 	}
 
-	if (atomic && !test_opt(sbi, NOBARRIER))
+	if (atomic && !test_opt(sbi, NOBARRIER) && !f2fs_sb_has_blkzoned(sbi))
 		fio.op_flags |= REQ_PREFLUSH | REQ_FUA;
 
 	/* should add to global list before clearing PAGECACHE status */

View File: fs/f2fs/node.h

@@ -147,7 +147,6 @@ enum mem_type {
 	DIRTY_DENTS,	/* indicates dirty dentry pages */
 	INO_ENTRIES,	/* indicates inode entries */
 	EXTENT_CACHE,	/* indicates extent cache */
-	INMEM_PAGES,	/* indicates inmemory pages */
 	DISCARD_CACHE,	/* indicates memory of cached discard cmds */
 	COMPRESS_PAGE,	/* indicates memory of cached compressed pages */
 	BASE_CHECK,	/* check kernel status */

View File: fs/f2fs/segment.c

@@ -29,7 +29,7 @@
 static struct kmem_cache *discard_entry_slab;
 static struct kmem_cache *discard_cmd_slab;
 static struct kmem_cache *sit_entry_set_slab;
-static struct kmem_cache *inmem_entry_slab;
+static struct kmem_cache *revoke_entry_slab;
 
 static unsigned long __reverse_ulong(unsigned char *str)
 {
@@ -184,74 +184,42 @@ bool f2fs_need_SSR(struct f2fs_sb_info *sbi)
 			SM_I(sbi)->min_ssr_sections + reserved_sections(sbi));
 }
 
-void f2fs_register_inmem_page(struct inode *inode, struct page *page)
-{
-	struct inmem_pages *new;
-
-	set_page_private_atomic(page);
-
-	new = f2fs_kmem_cache_alloc(inmem_entry_slab,
-					GFP_NOFS, true, NULL);
-
-	/* add atomic page indices to the list */
-	new->page = page;
-	INIT_LIST_HEAD(&new->list);
-
-	/* increase reference count with clean state */
-	get_page(page);
-	mutex_lock(&F2FS_I(inode)->inmem_lock);
-	list_add_tail(&new->list, &F2FS_I(inode)->inmem_pages);
-	inc_page_count(F2FS_I_SB(inode), F2FS_INMEM_PAGES);
-	mutex_unlock(&F2FS_I(inode)->inmem_lock);
-
-	trace_f2fs_register_inmem_page(page, INMEM);
-}
-
-static int __revoke_inmem_pages(struct inode *inode,
-				struct list_head *head, bool drop, bool recover,
-				bool trylock)
+void f2fs_abort_atomic_write(struct inode *inode, bool clean)
 {
 	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
-	struct inmem_pages *cur, *tmp;
-	int err = 0;
+	struct f2fs_inode_info *fi = F2FS_I(inode);
 
-	list_for_each_entry_safe(cur, tmp, head, list) {
-		struct page *page = cur->page;
+	if (f2fs_is_atomic_file(inode)) {
+		if (clean)
+			truncate_inode_pages_final(inode->i_mapping);
+		clear_inode_flag(fi->cow_inode, FI_ATOMIC_FILE);
+		iput(fi->cow_inode);
+		fi->cow_inode = NULL;
+		clear_inode_flag(inode, FI_ATOMIC_FILE);
 
-		if (drop)
-			trace_f2fs_commit_inmem_page(page, INMEM_DROP);
+		spin_lock(&sbi->inode_lock[ATOMIC_FILE]);
+		sbi->atomic_files--;
+		spin_unlock(&sbi->inode_lock[ATOMIC_FILE]);
+	}
+}
 
-		if (trylock) {
-			/*
-			 * to avoid deadlock in between page lock and
-			 * inmem_lock.
-			 */
-			if (!trylock_page(page))
-				continue;
-		} else {
-			lock_page(page);
-		}
+static int __replace_atomic_write_block(struct inode *inode, pgoff_t index,
+			block_t new_addr, block_t *old_addr, bool recover)
+{
+	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
+	struct dnode_of_data dn;
+	struct node_info ni;
+	int err;
 
-		f2fs_wait_on_page_writeback(page, DATA, true, true);
-
-		if (recover) {
-			struct dnode_of_data dn;
-			struct node_info ni;
-
-			trace_f2fs_commit_inmem_page(page, INMEM_REVOKE);
 retry:
-			set_new_dnode(&dn, inode, NULL, NULL, 0);
-			err = f2fs_get_dnode_of_data(&dn, page->index,
-								LOOKUP_NODE);
-			if (err) {
-				if (err == -ENOMEM) {
-					congestion_wait(BLK_RW_ASYNC,
-							DEFAULT_IO_TIMEOUT);
-					cond_resched();
-					goto retry;
-				}
-				err = -EAGAIN;
-				goto next;
-			}
+	set_new_dnode(&dn, inode, NULL, NULL, 0);
+	err = f2fs_get_dnode_of_data(&dn, index, LOOKUP_NODE_RA);
+	if (err) {
+		if (err == -ENOMEM) {
+			f2fs_io_schedule_timeout(DEFAULT_IO_TIMEOUT);
+			goto retry;
+		}
+		return err;
+	}
 
-			err = f2fs_get_node_info(sbi, dn.nid, &ni, false);
+	err = f2fs_get_node_info(sbi, dn.nid, &ni, false);
@@ -260,230 +228,131 @@ retry:
 		return err;
 	}
 
-			if (cur->old_addr == NEW_ADDR) {
-				f2fs_invalidate_blocks(sbi, dn.data_blkaddr);
-				f2fs_update_data_blkaddr(&dn, NEW_ADDR);
-			} else
-				f2fs_replace_block(sbi, &dn, dn.data_blkaddr,
-					cur->old_addr, ni.version, true, true);
-			f2fs_put_dnode(&dn);
-		}
-next:
-		/* we don't need to invalidate this in the sccessful status */
-		if (drop || recover) {
-			ClearPageUptodate(page);
-			clear_page_private_gcing(page);
-		}
-		detach_page_private(page);
-		set_page_private(page, 0);
-		f2fs_put_page(page, 1);
-
-		list_del(&cur->list);
-		kmem_cache_free(inmem_entry_slab, cur);
-		dec_page_count(F2FS_I_SB(inode), F2FS_INMEM_PAGES);
-	}
-	return err;
-}
-
-void f2fs_drop_inmem_pages_all(struct f2fs_sb_info *sbi, bool gc_failure)
-{
-	struct list_head *head = &sbi->inode_list[ATOMIC_FILE];
-	struct inode *inode;
-	struct f2fs_inode_info *fi;
-	unsigned int count = sbi->atomic_files;
-	unsigned int looped = 0;
-next:
-	spin_lock(&sbi->inode_lock[ATOMIC_FILE]);
-	if (list_empty(head)) {
-		spin_unlock(&sbi->inode_lock[ATOMIC_FILE]);
-		return;
-	}
-	fi = list_first_entry(head, struct f2fs_inode_info, inmem_ilist);
-	inode = igrab(&fi->vfs_inode);
-	if (inode)
-		list_move_tail(&fi->inmem_ilist, head);
-	spin_unlock(&sbi->inode_lock[ATOMIC_FILE]);
-
-	if (inode) {
-		if (gc_failure) {
-			if (!fi->i_gc_failures[GC_FAILURE_ATOMIC])
-				goto skip;
-		}
-		set_inode_flag(inode, FI_ATOMIC_REVOKE_REQUEST);
-		f2fs_drop_inmem_pages(inode);
-skip:
-		iput(inode);
-	}
-	congestion_wait(BLK_RW_ASYNC, DEFAULT_IO_TIMEOUT);
-	cond_resched();
-	if (gc_failure) {
-		if (++looped >= count)
-			return;
-	}
-	goto next;
-}
-
-void f2fs_drop_inmem_pages(struct inode *inode)
-{
-	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
-	struct f2fs_inode_info *fi = F2FS_I(inode);
-
-	do {
-		mutex_lock(&fi->inmem_lock);
-		if (list_empty(&fi->inmem_pages)) {
-			fi->i_gc_failures[GC_FAILURE_ATOMIC] = 0;
-
-			spin_lock(&sbi->inode_lock[ATOMIC_FILE]);
-			if (!list_empty(&fi->inmem_ilist))
-				list_del_init(&fi->inmem_ilist);
-			if (f2fs_is_atomic_file(inode)) {
-				clear_inode_flag(inode, FI_ATOMIC_FILE);
-				sbi->atomic_files--;
-			}
-			spin_unlock(&sbi->inode_lock[ATOMIC_FILE]);
-
-			mutex_unlock(&fi->inmem_lock);
-			break;
-		}
-		__revoke_inmem_pages(inode, &fi->inmem_pages,
-						true, false, true);
-		mutex_unlock(&fi->inmem_lock);
-	} while (1);
-}
-
-void f2fs_drop_inmem_page(struct inode *inode, struct page *page)
-{
-	struct f2fs_inode_info *fi = F2FS_I(inode);
-	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
-	struct list_head *head = &fi->inmem_pages;
-	struct inmem_pages *cur = NULL;
-
-	f2fs_bug_on(sbi, !page_private_atomic(page));
-
-	mutex_lock(&fi->inmem_lock);
-	list_for_each_entry(cur, head, list) {
-		if (cur->page == page)
-			break;
-	}
-
-	f2fs_bug_on(sbi, list_empty(head) || cur->page != page);
-	list_del(&cur->list);
-	mutex_unlock(&fi->inmem_lock);
-
-	dec_page_count(sbi, F2FS_INMEM_PAGES);
-	kmem_cache_free(inmem_entry_slab, cur);
-
-	ClearPageUptodate(page);
-	clear_page_private_atomic(page);
-	f2fs_put_page(page, 0);
-
-	detach_page_private(page);
-	set_page_private(page, 0);
-
-	trace_f2fs_commit_inmem_page(page, INMEM_INVALIDATE);
-}
-
-static int __f2fs_commit_inmem_pages(struct inode *inode)
-{
-	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
-	struct f2fs_inode_info *fi = F2FS_I(inode);
-	struct inmem_pages *cur, *tmp;
-	struct f2fs_io_info fio = {
-		.sbi = sbi,
-		.ino = inode->i_ino,
-		.type = DATA,
-		.op = REQ_OP_WRITE,
-		.op_flags = REQ_SYNC | REQ_PRIO,
-		.io_type = FS_DATA_IO,
-	};
-	struct list_head revoke_list;
-	bool submit_bio = false;
-	int err = 0;
-
-	INIT_LIST_HEAD(&revoke_list);
-
-	list_for_each_entry_safe(cur, tmp, &fi->inmem_pages, list) {
-		struct page *page = cur->page;
-
-		lock_page(page);
-		if (page->mapping == inode->i_mapping) {
-			trace_f2fs_commit_inmem_page(page, INMEM);
-
-			f2fs_wait_on_page_writeback(page, DATA, true, true);
-
-			set_page_dirty(page);
-			if (clear_page_dirty_for_io(page)) {
-				inode_dec_dirty_pages(inode);
-				f2fs_remove_dirty_inode(inode);
-			}
-retry:
-			fio.page = page;
-			fio.old_blkaddr = NULL_ADDR;
-			fio.encrypted_page = NULL;
-			fio.need_lock = LOCK_DONE;
-			err = f2fs_do_write_data_page(&fio);
-			if (err) {
-				if (err == -ENOMEM) {
-					congestion_wait(BLK_RW_ASYNC,
-							DEFAULT_IO_TIMEOUT);
-					cond_resched();
-					goto retry;
-				}
-				unlock_page(page);
-				break;
-			}
-			/* record old blkaddr for revoking */
-			cur->old_addr = fio.old_blkaddr;
-			submit_bio = true;
-		}
-		unlock_page(page);
-		list_move_tail(&cur->list, &revoke_list);
-	}
-
-	if (submit_bio)
-		f2fs_submit_merged_write_cond(sbi, inode, NULL, 0, DATA);
-
-	if (err) {
-		/*
-		 * try to revoke all committed pages, but still we could fail
-		 * due to no memory or other reason, if that happened, EAGAIN
-		 * will be returned, which means in such case, transaction is
-		 * already not integrity, caller should use journal to do the
-		 * recovery or rewrite & commit last transaction. For other
-		 * error number, revoking was done by filesystem itself.
-		 */
-		err = __revoke_inmem_pages(inode, &revoke_list,
-						false, true, false);
-
-		/* drop all uncommitted pages */
-		__revoke_inmem_pages(inode, &fi->inmem_pages,
-						true, false, false);
-	} else {
-		__revoke_inmem_pages(inode, &revoke_list,
-						false, false, false);
-	}
-
-	return err;
-}
-
-int f2fs_commit_inmem_pages(struct inode *inode)
+	if (recover) {
+		/* dn.data_blkaddr is always valid */
+		if (!__is_valid_data_blkaddr(new_addr)) {
+			if (new_addr == NULL_ADDR)
+				dec_valid_block_count(sbi, inode, 1);
+			f2fs_invalidate_blocks(sbi, dn.data_blkaddr);
+			f2fs_update_data_blkaddr(&dn, new_addr);
+		} else {
+			f2fs_replace_block(sbi, &dn, dn.data_blkaddr,
+				new_addr, ni.version, true, true);
+		}
+	} else {
+		blkcnt_t count = 1;
+
+		*old_addr = dn.data_blkaddr;
+		f2fs_truncate_data_blocks_range(&dn, 1);
+		dec_valid_block_count(sbi, F2FS_I(inode)->cow_inode, count);
+		inc_valid_block_count(sbi, inode, &count);
+		f2fs_replace_block(sbi, &dn, dn.data_blkaddr, new_addr,
+					ni.version, true, false);
+	}
+
+	f2fs_put_dnode(&dn);
+	return 0;
+}
+
+static void __complete_revoke_list(struct inode *inode, struct list_head *head,
+					bool revoke)
+{
+	struct revoke_entry *cur, *tmp;
+
+	list_for_each_entry_safe(cur, tmp, head, list) {
+		if (revoke)
+			__replace_atomic_write_block(inode, cur->index,
+						cur->old_addr, NULL, true);
+		list_del(&cur->list);
+		kmem_cache_free(revoke_entry_slab, cur);
+	}
+}
+
+static int __f2fs_commit_atomic_write(struct inode *inode)
+{
+	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
+	struct f2fs_inode_info *fi = F2FS_I(inode);
+	struct inode *cow_inode = fi->cow_inode;
+	struct revoke_entry *new;
+	struct list_head revoke_list;
+	block_t blkaddr;
+	struct dnode_of_data dn;
+	pgoff_t len = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE);
+	pgoff_t off = 0, blen, index;
+	int ret = 0, i;
+
+	INIT_LIST_HEAD(&revoke_list);
+
+	while (len) {
+		blen = min_t(pgoff_t, ADDRS_PER_BLOCK(cow_inode), len);
+
+		set_new_dnode(&dn, cow_inode, NULL, NULL, 0);
+		ret = f2fs_get_dnode_of_data(&dn, off, LOOKUP_NODE_RA);
+		if (ret && ret != -ENOENT) {
+			goto out;
+		} else if (ret == -ENOENT) {
+			ret = 0;
+			if (dn.max_level == 0)
+				goto out;
+			goto next;
+		}
+
+		blen = min((pgoff_t)ADDRS_PER_PAGE(dn.node_page, cow_inode),
+				len);
+		index = off;
+		for (i = 0; i < blen; i++, dn.ofs_in_node++, index++) {
+			blkaddr = f2fs_data_blkaddr(&dn);
+
+			if (!__is_valid_data_blkaddr(blkaddr)) {
+				continue;
+			} else if (!f2fs_is_valid_blkaddr(sbi, blkaddr,
+					DATA_GENERIC_ENHANCE)) {
+				f2fs_put_dnode(&dn);
+				ret = -EFSCORRUPTED;
+				goto out;
+			}
+
+			new = f2fs_kmem_cache_alloc(revoke_entry_slab, GFP_NOFS,
+							true, NULL);
+
+			ret = __replace_atomic_write_block(inode, index, blkaddr,
+							&new->old_addr, false);
+			if (ret) {
+				f2fs_put_dnode(&dn);
+				kmem_cache_free(revoke_entry_slab, new);
+				goto out;
+			}
+
+			f2fs_update_data_blkaddr(&dn, NULL_ADDR);
+			new->index = index;
+			list_add_tail(&new->list, &revoke_list);
+		}
+
+		f2fs_put_dnode(&dn);
+next:
+		off += blen;
+		len -= blen;
+	}
+
+out:
+	__complete_revoke_list(inode, &revoke_list, ret ? true : false);
+
+	return ret;
+}
+
+int f2fs_commit_atomic_write(struct inode *inode)
 {
 	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
 	struct f2fs_inode_info *fi = F2FS_I(inode);
 	int err;
 
-	f2fs_balance_fs(sbi, true);
+	err = filemap_write_and_wait_range(inode->i_mapping, 0, LLONG_MAX);
+	if (err)
+		return err;
 
 	f2fs_down_write(&fi->i_gc_rwsem[WRITE]);
-
 	f2fs_lock_op(sbi);
-	set_inode_flag(inode, FI_ATOMIC_COMMIT);
 
-	mutex_lock(&fi->inmem_lock);
-	err = __f2fs_commit_inmem_pages(inode);
-	mutex_unlock(&fi->inmem_lock);
-
-	clear_inode_flag(inode, FI_ATOMIC_COMMIT);
+	err = __f2fs_commit_atomic_write(inode);
 
 	f2fs_unlock_op(sbi);
 	f2fs_up_write(&fi->i_gc_rwsem[WRITE]);
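The commit path above is a move-and-remember protocol: each staged COW block replaces the corresponding block of the original file while the old address goes onto a revoke list, and any failure rolls every moved block back. A toy model of that protocol with arrays in place of the dnode walk (all names and addresses invented, not f2fs code):

#include <stdio.h>

#define NBLK 4

static int orig[NBLK] = { 10, 11, 12, 13 };	/* original block addrs */
static int cow[NBLK]  = { -1, 21, -1, 23 };	/* COW file, -1 = never dirtied */

static int commit(int fail_at, int old_addr[NBLK], int moved[NBLK])
{
	for (int i = 0; i < NBLK; i++) {
		moved[i] = 0;
		if (cow[i] < 0)
			continue;	/* hole: this block was not written */
		if (i == fail_at)
			return -1;	/* simulated mid-commit failure */
		old_addr[i] = orig[i];	/* like revoke_entry->old_addr */
		orig[i] = cow[i];	/* move the staged block in */
		cow[i] = -1;		/* like f2fs_update_data_blkaddr(NULL_ADDR) */
		moved[i] = 1;
	}
	return 0;
}

static void revoke(const int old_addr[NBLK], const int moved[NBLK])
{
	for (int i = 0; i < NBLK; i++)
		if (moved[i])
			orig[i] = old_addr[i];	/* put the old block back */
}

int main(void)
{
	int old_addr[NBLK], moved[NBLK];

	if (commit(3, old_addr, moved))	/* fail on the last staged block */
		revoke(old_addr, moved);
	for (int i = 0; i < NBLK; i++)
		printf("%d ", orig[i]);	/* prints 10 11 12 13: fully rolled back */
	printf("\n");
	return 0;
}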
@@ -524,8 +393,15 @@ void f2fs_balance_fs(struct f2fs_sb_info *sbi, bool need)
 			io_schedule();
 			finish_wait(&sbi->gc_thread->fggc_wq, &wait);
 		} else {
+			struct f2fs_gc_control gc_control = {
+				.victim_segno = NULL_SEGNO,
+				.init_gc_type = BG_GC,
+				.no_bg_gc = true,
+				.should_migrate_blocks = false,
+				.err_gc_skipped = false,
+				.nr_free_secs = 1 };
 			f2fs_down_write(&sbi->gc_lock);
-			f2fs_gc(sbi, false, false, false, NULL_SEGNO);
+			f2fs_gc(sbi, &gc_control);
 		}
 	}
 }
@@ -806,8 +682,7 @@ int f2fs_flush_device_cache(struct f2fs_sb_info *sbi)
 		do {
 			ret = __submit_flush_wait(sbi, FDEV(i).bdev);
 			if (ret)
-				congestion_wait(BLK_RW_ASYNC,
-						DEFAULT_IO_TIMEOUT);
+				f2fs_io_schedule_timeout(DEFAULT_IO_TIMEOUT);
 		} while (ret && --count);
 
 		if (ret) {
@@ -1671,33 +1546,32 @@ static unsigned int __wait_discard_cmd_range(struct f2fs_sb_info *sbi,
 	struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
 	struct list_head *wait_list = (dpolicy->type == DPOLICY_FSTRIM) ?
 					&(dcc->fstrim_list) : &(dcc->wait_list);
-	struct discard_cmd *dc, *tmp;
-	bool need_wait;
+	struct discard_cmd *dc = NULL, *iter, *tmp;
 	unsigned int trimmed = 0;
 
 next:
-	need_wait = false;
+	dc = NULL;
 
 	mutex_lock(&dcc->cmd_lock);
-	list_for_each_entry_safe(dc, tmp, wait_list, list) {
-		if (dc->lstart + dc->len <= start || end <= dc->lstart)
+	list_for_each_entry_safe(iter, tmp, wait_list, list) {
+		if (iter->lstart + iter->len <= start || end <= iter->lstart)
 			continue;
-		if (dc->len < dpolicy->granularity)
+		if (iter->len < dpolicy->granularity)
 			continue;
-		if (dc->state == D_DONE && !dc->ref) {
-			wait_for_completion_io(&dc->wait);
-			if (!dc->error)
-				trimmed += dc->len;
-			__remove_discard_cmd(sbi, dc);
+		if (iter->state == D_DONE && !iter->ref) {
+			wait_for_completion_io(&iter->wait);
+			if (!iter->error)
+				trimmed += iter->len;
+			__remove_discard_cmd(sbi, iter);
 		} else {
-			dc->ref++;
-			need_wait = true;
+			iter->ref++;
+			dc = iter;
 			break;
 		}
 	}
 	mutex_unlock(&dcc->cmd_lock);
 
-	if (need_wait) {
+	if (dc) {
 		trimmed += __wait_one_discard_bio(sbi, dc);
 		goto next;
 	}
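The iterator change above is the generic use-after-loop fix: the loop variable is never trusted once the loop ends; only a pointer explicitly assigned inside the loop is used afterwards. The same pattern on a self-contained singly linked list (toy types, not the kernel's list API):

#include <stdio.h>

struct cmd {
	int len;
	int ref;
	struct cmd *next;
};

int main(void)
{
	struct cmd a = { 8, 1, NULL }, b = { 32, 0, &a }, *head = &b;
	struct cmd *dc = NULL, *iter;

	for (iter = head; iter; iter = iter->next) {
		if (iter->len < 16)
			continue;	/* like the granularity check */
		if (iter->ref) {
			dc = iter;	/* record the node only when we mean to */
			break;
		}
	}

	/* after the loop, 'iter' may be NULL or stale; only 'dc' is trusted */
	if (dc)
		printf("waiting on cmd len=%d\n", dc->len);
	else
		printf("nothing to wait for\n");	/* this branch prints */
	return 0;
}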
@@ -3140,7 +3014,7 @@ next:
 			blk_finish_plug(&plug);
 			mutex_unlock(&dcc->cmd_lock);
 			trimmed += __wait_all_discard_cmd(sbi, NULL);
-			congestion_wait(BLK_RW_ASYNC, DEFAULT_IO_TIMEOUT);
+			f2fs_io_schedule_timeout(DEFAULT_IO_TIMEOUT);
 			goto next;
 		}
 skip:
@@ -3248,101 +3122,6 @@ int f2fs_rw_hint_to_seg_type(enum rw_hint hint)
 	}
 }
 
-/* This returns write hints for each segment type. This hints will be
- * passed down to block layer. There are mapping tables which depend on
- * the mount option 'whint_mode'.
- *
- * 1) whint_mode=off. F2FS only passes down WRITE_LIFE_NOT_SET.
- *
- * 2) whint_mode=user-based. F2FS tries to pass down hints given by users.
- *
- * User                  F2FS                     Block
- * ----                  ----                     -----
- *                       META                     WRITE_LIFE_NOT_SET
- *                       HOT_NODE                 "
- *                       WARM_NODE                "
- *                       COLD_NODE                "
- * ioctl(COLD)           COLD_DATA                WRITE_LIFE_EXTREME
- * extension list        "                        "
- *
- * -- buffered io
- * WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
- * WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
- * WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_NOT_SET
- * WRITE_LIFE_NONE       "                        "
- * WRITE_LIFE_MEDIUM     "                        "
- * WRITE_LIFE_LONG       "                        "
- *
- * -- direct io
- * WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
- * WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
- * WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_NOT_SET
- * WRITE_LIFE_NONE       "                        WRITE_LIFE_NONE
- * WRITE_LIFE_MEDIUM     "                        WRITE_LIFE_MEDIUM
- * WRITE_LIFE_LONG       "                        WRITE_LIFE_LONG
- *
- * 3) whint_mode=fs-based. F2FS passes down hints with its policy.
- *
- * User                  F2FS                     Block
- * ----                  ----                     -----
- *                       META                     WRITE_LIFE_MEDIUM;
- *                       HOT_NODE                 WRITE_LIFE_NOT_SET
- *                       WARM_NODE                "
- *                       COLD_NODE                WRITE_LIFE_NONE
- * ioctl(COLD)           COLD_DATA                WRITE_LIFE_EXTREME
- * extension list        "                        "
- *
- * -- buffered io
- * WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
- * WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
- * WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_LONG
- * WRITE_LIFE_NONE       "                        "
- * WRITE_LIFE_MEDIUM     "                        "
- * WRITE_LIFE_LONG       "                        "
- *
- * -- direct io
- * WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
- * WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
- * WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_NOT_SET
- * WRITE_LIFE_NONE       "                        WRITE_LIFE_NONE
- * WRITE_LIFE_MEDIUM     "                        WRITE_LIFE_MEDIUM
- * WRITE_LIFE_LONG       "                        WRITE_LIFE_LONG
- */
-enum rw_hint f2fs_io_type_to_rw_hint(struct f2fs_sb_info *sbi,
-				enum page_type type, enum temp_type temp)
-{
-	if (F2FS_OPTION(sbi).whint_mode == WHINT_MODE_USER) {
-		if (type == DATA) {
-			if (temp == WARM)
-				return WRITE_LIFE_NOT_SET;
-			else if (temp == HOT)
-				return WRITE_LIFE_SHORT;
-			else if (temp == COLD)
-				return WRITE_LIFE_EXTREME;
-		} else {
-			return WRITE_LIFE_NOT_SET;
-		}
-	} else if (F2FS_OPTION(sbi).whint_mode == WHINT_MODE_FS) {
-		if (type == DATA) {
-			if (temp == WARM)
-				return WRITE_LIFE_LONG;
-			else if (temp == HOT)
-				return WRITE_LIFE_SHORT;
-			else if (temp == COLD)
-				return WRITE_LIFE_EXTREME;
-		} else if (type == NODE) {
-			if (temp == WARM || temp == HOT)
-				return WRITE_LIFE_NOT_SET;
-			else if (temp == COLD)
-				return WRITE_LIFE_NONE;
-		} else if (type == META) {
-			return WRITE_LIFE_MEDIUM;
-		}
-	}
-	return WRITE_LIFE_NOT_SET;
-}
-
 static int __get_segment_type_2(struct f2fs_io_info *fio)
 {
 	if (fio->type == DATA)
@@ -3388,8 +3167,7 @@ static int __get_segment_type_6(struct f2fs_io_info *fio)
 			return CURSEG_COLD_DATA;
 		if (file_is_hot(inode) ||
 				is_inode_flag_set(inode, FI_HOT_DATA) ||
-				f2fs_is_atomic_file(inode) ||
-				f2fs_is_volatile_file(inode))
+				f2fs_is_atomic_file(inode))
 			return CURSEG_HOT_DATA;
 		return f2fs_rw_hint_to_seg_type(inode->i_write_hint);
 	} else {
@@ -4186,10 +3964,12 @@ static void adjust_sit_entry_set(struct sit_entry_set *ses,
 		return;
 
 	list_for_each_entry_continue(next, head, set_list)
-		if (ses->entry_cnt <= next->entry_cnt)
-			break;
+		if (ses->entry_cnt <= next->entry_cnt) {
+			list_move_tail(&ses->set_list, &next->set_list);
+			return;
+		}
 
-	list_move_tail(&ses->set_list, &next->set_list);
+	list_move_tail(&ses->set_list, head);
 }
 
 static void add_sit_entry(unsigned int segno, struct list_head *head)
@@ -4557,7 +4337,7 @@ static int build_sit_entries(struct f2fs_sb_info *sbi)
 	unsigned int i, start, end;
 	unsigned int readed, start_blk = 0;
 	int err = 0;
-	block_t total_node_blocks = 0;
+	block_t sit_valid_blocks[2] = {0, 0};
 
 	do {
 		readed = f2fs_ra_meta_pages(sbi, start_blk, BIO_MAX_VECS,
@@ -4582,8 +4362,8 @@ static int build_sit_entries(struct f2fs_sb_info *sbi)
 			if (err)
 				return err;
 			seg_info_from_raw_sit(se, &sit);
-			if (IS_NODESEG(se->type))
-				total_node_blocks += se->valid_blocks;
+
+			sit_valid_blocks[SE_PAGETYPE(se)] += se->valid_blocks;
 
 			if (f2fs_block_unit_discard(sbi)) {
 				/* build discard map only one time */
@@ -4623,15 +4403,15 @@ static int build_sit_entries(struct f2fs_sb_info *sbi)
 		sit = sit_in_journal(journal, i);
 
 		old_valid_blocks = se->valid_blocks;
-		if (IS_NODESEG(se->type))
-			total_node_blocks -= old_valid_blocks;
+
+		sit_valid_blocks[SE_PAGETYPE(se)] -= old_valid_blocks;
 
 		err = check_block_count(sbi, start, &sit);
 		if (err)
 			break;
 		seg_info_from_raw_sit(se, &sit);
-		if (IS_NODESEG(se->type))
-			total_node_blocks += se->valid_blocks;
+
+		sit_valid_blocks[SE_PAGETYPE(se)] += se->valid_blocks;
 
 		if (f2fs_block_unit_discard(sbi)) {
 			if (is_set_ckpt_flags(sbi, CP_TRIMMED_FLAG)) {
@@ -4653,13 +4433,24 @@ static int build_sit_entries(struct f2fs_sb_info *sbi)
 	}
 	up_read(&curseg->journal_rwsem);
 
-	if (!err && total_node_blocks != valid_node_count(sbi)) {
+	if (err)
+		return err;
+
+	if (sit_valid_blocks[NODE] != valid_node_count(sbi)) {
 		f2fs_err(sbi, "SIT is corrupted node# %u vs %u",
-			 total_node_blocks, valid_node_count(sbi));
+			 sit_valid_blocks[NODE], valid_node_count(sbi));
-		err = -EFSCORRUPTED;
+		return -EFSCORRUPTED;
 	}
 
-	return err;
+	if (sit_valid_blocks[DATA] + sit_valid_blocks[NODE] >
+				valid_user_blocks(sbi)) {
+		f2fs_err(sbi, "SIT is corrupted data# %u %u vs %u",
+			 sit_valid_blocks[DATA], sit_valid_blocks[NODE],
+			 valid_user_blocks(sbi));
+		return -EFSCORRUPTED;
+	}
+
+	return 0;
 }
 
 static void init_free_segmap(struct f2fs_sb_info *sbi)
@@ -4739,6 +4530,13 @@ static int init_victim_secmap(struct f2fs_sb_info *sbi)
 	dirty_i->victim_secmap = f2fs_kvzalloc(sbi, bitmap_size, GFP_KERNEL);
 	if (!dirty_i->victim_secmap)
 		return -ENOMEM;
+
+	dirty_i->pinned_secmap = f2fs_kvzalloc(sbi, bitmap_size, GFP_KERNEL);
+	if (!dirty_i->pinned_secmap)
+		return -ENOMEM;
+
+	dirty_i->pinned_secmap_cnt = 0;
+	dirty_i->enable_pin_section = true;
 	return 0;
 }
 
@@ -5327,6 +5125,7 @@ static void destroy_victim_secmap(struct f2fs_sb_info *sbi)
 {
 	struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
 
+	kvfree(dirty_i->pinned_secmap);
 	kvfree(dirty_i->victim_secmap);
 }
 
@@ -5437,9 +5236,9 @@ int __init f2fs_create_segment_manager_caches(void)
 	if (!sit_entry_set_slab)
 		goto destroy_discard_cmd;
 
-	inmem_entry_slab = f2fs_kmem_cache_create("f2fs_inmem_page_entry",
-			sizeof(struct inmem_pages));
-	if (!inmem_entry_slab)
+	revoke_entry_slab = f2fs_kmem_cache_create("f2fs_revoke_entry",
+			sizeof(struct revoke_entry));
+	if (!revoke_entry_slab)
 		goto destroy_sit_entry_set;
 	return 0;
 
@@ -5458,5 +5257,5 @@ void f2fs_destroy_segment_manager_caches(void)
 	kmem_cache_destroy(sit_entry_set_slab);
 	kmem_cache_destroy(discard_cmd_slab);
 	kmem_cache_destroy(discard_entry_slab);
-	kmem_cache_destroy(inmem_entry_slab);
+	kmem_cache_destroy(revoke_entry_slab);
 }

View File: fs/f2fs/segment.h

@@ -24,6 +24,7 @@
 
 #define IS_DATASEG(t)	((t) <= CURSEG_COLD_DATA)
 #define IS_NODESEG(t)	((t) >= CURSEG_HOT_NODE && (t) <= CURSEG_COLD_NODE)
+#define SE_PAGETYPE(se)	((IS_NODESEG((se)->type) ? NODE : DATA))
 
 static inline void sanity_check_seg_type(struct f2fs_sb_info *sbi,
 						unsigned short seg_type)
@@ -224,10 +225,10 @@ struct segment_allocation {
 
 #define MAX_SKIP_GC_COUNT			16
 
-struct inmem_pages {
+struct revoke_entry {
 	struct list_head list;
-	struct page *page;
 	block_t old_addr;		/* for revoking when fail to commit */
+	pgoff_t index;
 };
 
 struct sit_info {
@@ -294,6 +295,9 @@ struct dirty_seglist_info {
 	struct mutex seglist_lock;		/* lock for segment bitmaps */
 	int nr_dirty[NR_DIRTY_TYPE];		/* # of dirty segments */
 	unsigned long *victim_secmap;		/* background GC victims */
+	unsigned long *pinned_secmap;		/* pinned victims from foreground GC */
+	unsigned int pinned_secmap_cnt;		/* count of victims which has pinned data */
+	bool enable_pin_section;		/* enable pinning section */
 };
 
 /* victim selection function for cleaning and SSR */
@@ -572,11 +576,10 @@ static inline int reserved_sections(struct f2fs_sb_info *sbi)
 	return GET_SEC_FROM_SEG(sbi, reserved_segments(sbi));
 }
 
-static inline bool has_curseg_enough_space(struct f2fs_sb_info *sbi)
+static inline bool has_curseg_enough_space(struct f2fs_sb_info *sbi,
+			unsigned int node_blocks, unsigned int dent_blocks)
 {
-	unsigned int node_blocks = get_pages(sbi, F2FS_DIRTY_NODES) +
-					get_pages(sbi, F2FS_DIRTY_DENTS);
-	unsigned int dent_blocks = get_pages(sbi, F2FS_DIRTY_DENTS);
 	unsigned int segno, left_blocks;
 	int i;
 
@@ -602,19 +605,28 @@ static inline bool has_curseg_enough_space(struct f2fs_sb_info *sbi,
 static inline bool has_not_enough_free_secs(struct f2fs_sb_info *sbi,
 					int freed, int needed)
 {
-	int node_secs = get_blocktype_secs(sbi, F2FS_DIRTY_NODES);
-	int dent_secs = get_blocktype_secs(sbi, F2FS_DIRTY_DENTS);
-	int imeta_secs = get_blocktype_secs(sbi, F2FS_DIRTY_IMETA);
+	unsigned int total_node_blocks = get_pages(sbi, F2FS_DIRTY_NODES) +
+					get_pages(sbi, F2FS_DIRTY_DENTS) +
+					get_pages(sbi, F2FS_DIRTY_IMETA);
+	unsigned int total_dent_blocks = get_pages(sbi, F2FS_DIRTY_DENTS);
+	unsigned int node_secs = total_node_blocks / BLKS_PER_SEC(sbi);
+	unsigned int dent_secs = total_dent_blocks / BLKS_PER_SEC(sbi);
+	unsigned int node_blocks = total_node_blocks % BLKS_PER_SEC(sbi);
+	unsigned int dent_blocks = total_dent_blocks % BLKS_PER_SEC(sbi);
+	unsigned int free, need_lower, need_upper;
 
 	if (unlikely(is_sbi_flag_set(sbi, SBI_POR_DOING)))
 		return false;
 
-	if (free_sections(sbi) + freed == reserved_sections(sbi) + needed &&
-			has_curseg_enough_space(sbi))
+	free = free_sections(sbi) + freed;
+	need_lower = node_secs + dent_secs + reserved_sections(sbi) + needed;
+	need_upper = need_lower + (node_blocks ? 1 : 0) + (dent_blocks ? 1 : 0);
+
+	if (free > need_upper)
 		return false;
-	return (free_sections(sbi) + freed) <=
-		(node_secs + 2 * dent_secs + imeta_secs +
-		reserved_sections(sbi) + needed);
+	else if (free <= need_lower)
+		return true;
+	return !has_curseg_enough_space(sbi, node_blocks, dent_blocks);
 }
 
 static inline bool f2fs_is_checkpoint_ready(struct f2fs_sb_info *sbi)
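Worked through with invented numbers, the new has_not_enough_free_secs() accounting splits dirty blocks into whole sections plus partial remainders, then compares free sections against a lower and an upper bound:

#include <stdio.h>

int main(void)
{
	unsigned int blks_per_sec = 2048, reserved = 10, freed = 0, needed = 0;
	unsigned int total_node_blocks = 5000, total_dent_blocks = 900;

	unsigned int node_secs = total_node_blocks / blks_per_sec;	/* 2 */
	unsigned int dent_secs = total_dent_blocks / blks_per_sec;	/* 0 */
	unsigned int node_blocks = total_node_blocks % blks_per_sec;	/* 904 */
	unsigned int dent_blocks = total_dent_blocks % blks_per_sec;	/* 900 */

	unsigned int need_lower = node_secs + dent_secs + reserved + needed; /* 12 */
	unsigned int need_upper = need_lower + (node_blocks ? 1 : 0) +
						(dent_blocks ? 1 : 0);	/* 14 */

	/* above the upper bound: clearly enough; at or below the lower
	 * bound: clearly not; in between, the open segments decide */
	for (unsigned int free = 11; free <= 15; free++)
		printf("free=%u -> %s\n", free,
			free + freed > need_upper ? "enough" :
			free + freed <= need_lower ? "not enough" :
			"ask has_curseg_enough_space()");
	return 0;
}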

View File: fs/f2fs/super.c

@@ -138,7 +138,6 @@ enum {
 	Opt_jqfmt_vfsold,
 	Opt_jqfmt_vfsv0,
 	Opt_jqfmt_vfsv1,
-	Opt_whint,
 	Opt_alloc,
 	Opt_fsync,
 	Opt_test_dummy_encryption,
@@ -214,7 +213,6 @@ static match_table_t f2fs_tokens = {
 	{Opt_jqfmt_vfsold, "jqfmt=vfsold"},
 	{Opt_jqfmt_vfsv0, "jqfmt=vfsv0"},
 	{Opt_jqfmt_vfsv1, "jqfmt=vfsv1"},
-	{Opt_whint, "whint_mode=%s"},
 	{Opt_alloc, "alloc_mode=%s"},
 	{Opt_fsync, "fsync_mode=%s"},
 	{Opt_test_dummy_encryption, "test_dummy_encryption=%s"},
@@ -534,10 +532,11 @@ static int f2fs_set_test_dummy_encryption(struct super_block *sb,
 		return -EINVAL;
 	}
 	f2fs_warn(sbi, "Test dummy encryption mode enabled");
-#else
-	f2fs_warn(sbi, "Test dummy encryption mount option ignored");
-#endif
 	return 0;
+#else
+	f2fs_warn(sbi, "test_dummy_encryption option not supported");
+	return -EINVAL;
+#endif
 }
 
 #ifdef CONFIG_F2FS_FS_COMPRESSION
@@ -982,22 +981,6 @@ static int parse_options(struct super_block *sb, char *options, bool is_remount)
 			f2fs_info(sbi, "quota operations not supported");
 			break;
 #endif
-		case Opt_whint:
-			name = match_strdup(&args[0]);
-			if (!name)
-				return -ENOMEM;
-			if (!strcmp(name, "user-based")) {
-				F2FS_OPTION(sbi).whint_mode = WHINT_MODE_USER;
-			} else if (!strcmp(name, "off")) {
-				F2FS_OPTION(sbi).whint_mode = WHINT_MODE_OFF;
-			} else if (!strcmp(name, "fs-based")) {
-				F2FS_OPTION(sbi).whint_mode = WHINT_MODE_FS;
-			} else {
-				kfree(name);
-				return -EINVAL;
-			}
-			kfree(name);
-			break;
 		case Opt_alloc:
 			name = match_strdup(&args[0]);
 			if (!name)
@@ -1335,12 +1318,6 @@ default_check:
 		return -EINVAL;
 	}
 
-	/* Not pass down write hints if the number of active logs is lesser
-	 * than NR_CURSEG_PERSIST_TYPE.
-	 */
-	if (F2FS_OPTION(sbi).active_logs != NR_CURSEG_PERSIST_TYPE)
-		F2FS_OPTION(sbi).whint_mode = WHINT_MODE_OFF;
-
 	if (f2fs_sb_has_readonly(sbi) && !f2fs_readonly(sbi->sb)) {
 		f2fs_err(sbi, "Allow to mount readonly mode only");
 		return -EROFS;
@@ -1366,9 +1343,6 @@ static struct inode *f2fs_alloc_inode(struct super_block *sb)
 	spin_lock_init(&fi->i_size_lock);
 	INIT_LIST_HEAD(&fi->dirty_list);
 	INIT_LIST_HEAD(&fi->gdirty_list);
-	INIT_LIST_HEAD(&fi->inmem_ilist);
-	INIT_LIST_HEAD(&fi->inmem_pages);
-	mutex_init(&fi->inmem_lock);
 	init_f2fs_rwsem(&fi->i_gc_rwsem[READ]);
 	init_f2fs_rwsem(&fi->i_gc_rwsem[WRITE]);
 	init_f2fs_rwsem(&fi->i_xattr_sem);
@@ -1409,9 +1383,8 @@ static int f2fs_drop_inode(struct inode *inode)
 			atomic_inc(&inode->i_count);
 			spin_unlock(&inode->i_lock);
 
-			/* some remained atomic pages should discarded */
 			if (f2fs_is_atomic_file(inode))
-				f2fs_drop_inmem_pages(inode);
+				f2fs_abort_atomic_write(inode, true);
 
 			/* should remain fi->extent_tree for writepage */
 			f2fs_destroy_extent_node(inode);
@@ -1734,18 +1707,23 @@ static int f2fs_statfs(struct dentry *dentry, struct kstatfs *buf)
 	u64 id = huge_encode_dev(sb->s_bdev->bd_dev);
 	block_t total_count, user_block_count, start_count;
 	u64 avail_node_count;
+	unsigned int total_valid_node_count;
 
 	total_count = le64_to_cpu(sbi->raw_super->block_count);
-	user_block_count = sbi->user_block_count;
 	start_count = le32_to_cpu(sbi->raw_super->segment0_blkaddr);
 	buf->f_type = F2FS_SUPER_MAGIC;
 	buf->f_bsize = sbi->blocksize;
 
 	buf->f_blocks = total_count - start_count;
+
+	spin_lock(&sbi->stat_lock);
+
+	user_block_count = sbi->user_block_count;
+	total_valid_node_count = valid_node_count(sbi);
+	avail_node_count = sbi->total_node_count - F2FS_RESERVED_NODE_NUM;
 	buf->f_bfree = user_block_count - valid_user_blocks(sbi) -
 						sbi->current_reserved_blocks;
 
-	spin_lock(&sbi->stat_lock);
 	if (unlikely(buf->f_bfree <= sbi->unusable_block_count))
 		buf->f_bfree = 0;
 	else
@@ -1758,14 +1736,12 @@ static int f2fs_statfs(struct dentry *dentry, struct kstatfs *buf)
 	else
 		buf->f_bavail = 0;
 
-	avail_node_count = sbi->total_node_count - F2FS_RESERVED_NODE_NUM;
-
 	if (avail_node_count > user_block_count) {
 		buf->f_files = user_block_count;
 		buf->f_ffree = buf->f_bavail;
 	} else {
 		buf->f_files = avail_node_count;
-		buf->f_ffree = min(avail_node_count - valid_node_count(sbi),
+		buf->f_ffree = min(avail_node_count - total_valid_node_count,
 					buf->f_bavail);
 	}
 
@@ -1981,10 +1957,6 @@ static int f2fs_show_options(struct seq_file *seq, struct dentry *root)
 		seq_puts(seq, ",prjquota");
 #endif
 	f2fs_show_quota_options(seq, sbi->sb);
-	if (F2FS_OPTION(sbi).whint_mode == WHINT_MODE_USER)
-		seq_printf(seq, ",whint_mode=%s", "user-based");
-	else if (F2FS_OPTION(sbi).whint_mode == WHINT_MODE_FS)
-		seq_printf(seq, ",whint_mode=%s", "fs-based");
 
 	fscrypt_show_test_dummy_encryption(seq, ',', sbi->sb);
 
@@ -2036,7 +2008,6 @@ static void default_options(struct f2fs_sb_info *sbi)
 	F2FS_OPTION(sbi).active_logs = NR_CURSEG_PERSIST_TYPE;
 	F2FS_OPTION(sbi).inline_xattr_size = DEFAULT_INLINE_XATTR_ADDRS;
-	F2FS_OPTION(sbi).whint_mode = WHINT_MODE_OFF;
 	F2FS_OPTION(sbi).alloc_mode = ALLOC_MODE_DEFAULT;
 	F2FS_OPTION(sbi).fsync_mode = FSYNC_MODE_POSIX;
 	F2FS_OPTION(sbi).s_resuid = make_kuid(&init_user_ns, F2FS_DEF_RESUID);
@@ -2087,7 +2058,7 @@ static int f2fs_disable_checkpoint(struct f2fs_sb_info *sbi)
{ {
unsigned int s_flags = sbi->sb->s_flags; unsigned int s_flags = sbi->sb->s_flags;
struct cp_control cpc; struct cp_control cpc;
unsigned int gc_mode; unsigned int gc_mode = sbi->gc_mode;
int err = 0; int err = 0;
int ret; int ret;
block_t unusable; block_t unusable;
@@ -2098,14 +2069,25 @@ static int f2fs_disable_checkpoint(struct f2fs_sb_info *sbi)
} }
sbi->sb->s_flags |= SB_ACTIVE; sbi->sb->s_flags |= SB_ACTIVE;
/* check if we need more GC first */
unusable = f2fs_get_unusable_blocks(sbi);
if (!f2fs_disable_cp_again(sbi, unusable))
goto skip_gc;
f2fs_update_time(sbi, DISABLE_TIME); f2fs_update_time(sbi, DISABLE_TIME);
gc_mode = sbi->gc_mode;
sbi->gc_mode = GC_URGENT_HIGH; sbi->gc_mode = GC_URGENT_HIGH;
while (!f2fs_time_over(sbi, DISABLE_TIME)) { while (!f2fs_time_over(sbi, DISABLE_TIME)) {
struct f2fs_gc_control gc_control = {
.victim_segno = NULL_SEGNO,
.init_gc_type = FG_GC,
.should_migrate_blocks = false,
.err_gc_skipped = true,
.nr_free_secs = 1 };
f2fs_down_write(&sbi->gc_lock); f2fs_down_write(&sbi->gc_lock);
err = f2fs_gc(sbi, true, false, false, NULL_SEGNO); err = f2fs_gc(sbi, &gc_control);
if (err == -ENODATA) { if (err == -ENODATA) {
err = 0; err = 0;
break; break;
@@ -2126,6 +2108,7 @@ static int f2fs_disable_checkpoint(struct f2fs_sb_info *sbi)
goto restore_flag; goto restore_flag;
} }
skip_gc:
f2fs_down_write(&sbi->gc_lock); f2fs_down_write(&sbi->gc_lock);
cpc.reason = CP_PAUSE; cpc.reason = CP_PAUSE;
set_sbi_flag(sbi, SBI_CP_DISABLED); set_sbi_flag(sbi, SBI_CP_DISABLED);
@@ -2152,8 +2135,7 @@ static void f2fs_enable_checkpoint(struct f2fs_sb_info *sbi)
/* we should flush all the data to keep data consistency */ /* we should flush all the data to keep data consistency */
do { do {
sync_inodes_sb(sbi->sb); sync_inodes_sb(sbi->sb);
cond_resched(); f2fs_io_schedule_timeout(DEFAULT_IO_TIMEOUT);
congestion_wait(BLK_RW_ASYNC, DEFAULT_IO_TIMEOUT);
} while (get_pages(sbi, F2FS_DIRTY_DATA) && retry--); } while (get_pages(sbi, F2FS_DIRTY_DATA) && retry--);
if (unlikely(retry < 0)) if (unlikely(retry < 0))
@@ -2318,8 +2300,7 @@ static int f2fs_remount(struct super_block *sb, int *flags, char *data)
 		need_stop_gc = true;
 	}
 
-	if (*flags & SB_RDONLY ||
-		F2FS_OPTION(sbi).whint_mode != org_mount_opt.whint_mode) {
+	if (*flags & SB_RDONLY) {
 		sync_inodes_sb(sb);
 
 		set_sbi_flag(sbi, SBI_IS_DIRTY);
@@ -2522,8 +2503,7 @@ retry:
 						&page, &fsdata);
 		if (unlikely(err)) {
 			if (err == -ENOMEM) {
-				congestion_wait(BLK_RW_ASYNC,
-						DEFAULT_IO_TIMEOUT);
+				f2fs_io_schedule_timeout(DEFAULT_IO_TIMEOUT);
 				goto retry;
 			}
 			set_sbi_flag(F2FS_SB(sb), SBI_QUOTA_NEED_REPAIR);
@@ -2720,6 +2700,7 @@ int f2fs_quota_sync(struct super_block *sb, int type)
 		if (!sb_has_quota_active(sb, cnt))
 			continue;
 
+		if (!f2fs_sb_has_quota_ino(sbi))
 			inode_lock(dqopt->files[cnt]);
 
 		/*
@@ -2739,6 +2720,7 @@ int f2fs_quota_sync(struct super_block *sb, int type)
 		f2fs_up_read(&sbi->quota_sem);
 		f2fs_unlock_op(sbi);
 
+		if (!f2fs_sb_has_quota_ino(sbi))
 			inode_unlock(dqopt->files[cnt]);
 
 		if (ret)
@@ -3684,22 +3666,29 @@ static int init_blkz_info(struct f2fs_sb_info *sbi, int devi)
 	struct block_device *bdev = FDEV(devi).bdev;
 	sector_t nr_sectors = bdev_nr_sectors(bdev);
 	struct f2fs_report_zones_args rep_zone_arg;
+	u64 zone_sectors;
 	int ret;
 
 	if (!f2fs_sb_has_blkzoned(sbi))
 		return 0;
 
+	zone_sectors = bdev_zone_sectors(bdev);
+	if (!is_power_of_2(zone_sectors)) {
+		f2fs_err(sbi, "F2FS does not support non power of 2 zone sizes\n");
+		return -EINVAL;
+	}
+
 	if (sbi->blocks_per_blkz && sbi->blocks_per_blkz !=
-				SECTOR_TO_BLOCK(bdev_zone_sectors(bdev)))
+				SECTOR_TO_BLOCK(zone_sectors))
 		return -EINVAL;
-	sbi->blocks_per_blkz = SECTOR_TO_BLOCK(bdev_zone_sectors(bdev));
+	sbi->blocks_per_blkz = SECTOR_TO_BLOCK(zone_sectors);
 	if (sbi->log_blocks_per_blkz && sbi->log_blocks_per_blkz !=
 				__ilog2_u32(sbi->blocks_per_blkz))
 		return -EINVAL;
 	sbi->log_blocks_per_blkz = __ilog2_u32(sbi->blocks_per_blkz);
 	FDEV(devi).nr_blkz = SECTOR_TO_BLOCK(nr_sectors) >>
 					sbi->log_blocks_per_blkz;
-	if (nr_sectors & (bdev_zone_sectors(bdev) - 1))
+	if (nr_sectors & (zone_sectors - 1))
 		FDEV(devi).nr_blkz++;
 
 	FDEV(devi).blkz_seq = f2fs_kvzalloc(sbi,
@@ -4099,30 +4088,9 @@ try_onemore:
 	set_sbi_flag(sbi, SBI_POR_DOING);
 	spin_lock_init(&sbi->stat_lock);
 
-	for (i = 0; i < NR_PAGE_TYPE; i++) {
-		int n = (i == META) ? 1 : NR_TEMP_TYPE;
-		int j;
-
-		sbi->write_io[i] =
-			f2fs_kmalloc(sbi,
-				array_size(n,
-					sizeof(struct f2fs_bio_info)),
-				GFP_KERNEL);
-		if (!sbi->write_io[i]) {
-			err = -ENOMEM;
-			goto free_bio_info;
-		}
-
-		for (j = HOT; j < n; j++) {
-			init_f2fs_rwsem(&sbi->write_io[i][j].io_rwsem);
-			sbi->write_io[i][j].sbi = sbi;
-			sbi->write_io[i][j].bio = NULL;
-			spin_lock_init(&sbi->write_io[i][j].io_lock);
-			INIT_LIST_HEAD(&sbi->write_io[i][j].io_list);
-			INIT_LIST_HEAD(&sbi->write_io[i][j].bio_list);
-			init_f2fs_rwsem(&sbi->write_io[i][j].bio_list_lock);
-		}
-	}
+	err = f2fs_init_write_merge_io(sbi);
+	if (err)
+		goto free_bio_info;
 
 	init_f2fs_rwsem(&sbi->cp_rwsem);
 	init_f2fs_rwsem(&sbi->quota_sem);
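
The block removed from f2fs_fill_super() above moved essentially verbatim into the new f2fs_init_write_merge_io() helper that this series adds (per the commit list). Reassembled from the removed lines, the helper plausibly looks like this kernel-context sketch, with the -ENOMEM path turned into a return value:

/* Sketch of the new helper, reconstructed from the block removed from
 * f2fs_fill_super(); allocates and initializes the per-page-type,
 * per-temperature write-merge bio state. Kernel context assumed. */
int f2fs_init_write_merge_io(struct f2fs_sb_info *sbi)
{
	int i, j;

	for (i = 0; i < NR_PAGE_TYPE; i++) {
		/* META has one queue; DATA/NODE have one per temperature. */
		int n = (i == META) ? 1 : NR_TEMP_TYPE;

		sbi->write_io[i] = f2fs_kmalloc(sbi,
				array_size(n, sizeof(struct f2fs_bio_info)),
				GFP_KERNEL);
		if (!sbi->write_io[i])
			return -ENOMEM;

		for (j = HOT; j < n; j++) {
			init_f2fs_rwsem(&sbi->write_io[i][j].io_rwsem);
			sbi->write_io[i][j].sbi = sbi;
			sbi->write_io[i][j].bio = NULL;
			spin_lock_init(&sbi->write_io[i][j].io_lock);
			INIT_LIST_HEAD(&sbi->write_io[i][j].io_list);
			INIT_LIST_HEAD(&sbi->write_io[i][j].bio_list);
			init_f2fs_rwsem(&sbi->write_io[i][j].bio_list_lock);
		}
	}
	return 0;
}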

diff --git a/fs/f2fs/verity.c b/fs/f2fs/verity.c

@@ -128,7 +128,7 @@ static int f2fs_begin_enable_verity(struct file *filp)
 	if (f2fs_verity_in_progress(inode))
 		return -EBUSY;
 
-	if (f2fs_is_atomic_file(inode) || f2fs_is_volatile_file(inode))
+	if (f2fs_is_atomic_file(inode))
 		return -EOPNOTSUPP;
 
 	/*

diff --git a/include/scsi/scsi_proto.h b/include/scsi/scsi_proto.h

@@ -307,7 +307,9 @@ enum zbc_zone_type {
 	ZBC_ZONE_TYPE_CONV		= 0x1,
 	ZBC_ZONE_TYPE_SEQWRITE_REQ	= 0x2,
 	ZBC_ZONE_TYPE_SEQWRITE_PREF	= 0x3,
-	/* 0x4 to 0xf are reserved */
+	ZBC_ZONE_TYPE_SEQ_OR_BEFORE_REQ	= 0x4,
+	ZBC_ZONE_TYPE_GAP		= 0x5,
+	/* 0x6 to 0xf are reserved */
 };
 
 /* Zone conditions of REPORT ZONES zone descriptors */
@@ -323,6 +325,11 @@ enum zbc_zone_cond {
 	ZBC_ZONE_COND_OFFLINE	= 0xf,
 };
 
+enum zbc_zone_alignment_method {
+	ZBC_CONSTANT_ZONE_LENGTH	= 0x1,
+	ZBC_CONSTANT_ZONE_START_OFFSET	= 0x8,
+};
+
 /* Version descriptor values for INQUIRY */
 enum scsi_version_descriptor {
 	SCSI_VERSION_DESCRIPTOR_FCP4	= 0x0a40,
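
The scsi_proto.h hunks add the two zone-type codes (and the zone-alignment-method constants) that the gap-zone support in the scsi_debug and sd_zbc patches of this series relies on. A tiny self-contained table restating the resulting zone-type code space; the helper and names are illustrative, mirroring the enum above:

#include <stdio.h>

static const char *zbc_zone_type_name(int type)
{
	switch (type) {
	case 0x1: return "conventional";
	case 0x2: return "sequential-write-required";
	case 0x3: return "sequential-write-preferred";
	case 0x4: return "sequential-or-before-required";
	case 0x5: return "gap";
	default:  return "reserved";	/* 0x6 to 0xf */
	}
}

int main(void)
{
	for (int type = 0x1; type <= 0x6; type++)
		printf("0x%x: %s\n", type, zbc_zone_type_name(type));
	return 0;
}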

diff --git a/include/trace/events/f2fs.h b/include/trace/events/f2fs.h

@@ -15,10 +15,6 @@ TRACE_DEFINE_ENUM(NODE);
 TRACE_DEFINE_ENUM(DATA);
 TRACE_DEFINE_ENUM(META);
 TRACE_DEFINE_ENUM(META_FLUSH);
-TRACE_DEFINE_ENUM(INMEM);
-TRACE_DEFINE_ENUM(INMEM_DROP);
-TRACE_DEFINE_ENUM(INMEM_INVALIDATE);
-TRACE_DEFINE_ENUM(INMEM_REVOKE);
 TRACE_DEFINE_ENUM(IPU);
 TRACE_DEFINE_ENUM(OPU);
 TRACE_DEFINE_ENUM(HOT);
@@ -59,10 +55,6 @@ TRACE_DEFINE_ENUM(CP_RESIZE);
 		{ DATA,		"DATA" },				\
 		{ META,		"META" },				\
 		{ META_FLUSH,	"META_FLUSH" },				\
-		{ INMEM,	"INMEM" },				\
-		{ INMEM_DROP,	"INMEM_DROP" },				\
-		{ INMEM_INVALIDATE, "INMEM_INVALIDATE" },		\
-		{ INMEM_REVOKE,	"INMEM_REVOKE" },			\
 		{ IPU,		"IN-PLACE" },				\
 		{ OPU,		"OUT-OF-PLACE" })
@@ -652,19 +644,22 @@ TRACE_EVENT(f2fs_background_gc,
 
 TRACE_EVENT(f2fs_gc_begin,
 
-	TP_PROTO(struct super_block *sb, bool sync, bool background,
+	TP_PROTO(struct super_block *sb, int gc_type, bool no_bg_gc,
+			unsigned int nr_free_secs,
 			long long dirty_nodes, long long dirty_dents,
 			long long dirty_imeta, unsigned int free_sec,
 			unsigned int free_seg, int reserved_seg,
 			unsigned int prefree_seg),
 
-	TP_ARGS(sb, sync, background, dirty_nodes, dirty_dents, dirty_imeta,
+	TP_ARGS(sb, gc_type, no_bg_gc, nr_free_secs, dirty_nodes,
+		dirty_dents, dirty_imeta,
 		free_sec, free_seg, reserved_seg, prefree_seg),
 
 	TP_STRUCT__entry(
 		__field(dev_t,	dev)
-		__field(bool,	sync)
-		__field(bool,	background)
+		__field(int,	gc_type)
+		__field(bool,	no_bg_gc)
+		__field(unsigned int,	nr_free_secs)
 		__field(long long,	dirty_nodes)
 		__field(long long,	dirty_dents)
 		__field(long long,	dirty_imeta)
@@ -676,8 +671,9 @@ TRACE_EVENT(f2fs_gc_begin,
 	TP_fast_assign(
 		__entry->dev		= sb->s_dev;
-		__entry->sync		= sync;
-		__entry->background	= background;
+		__entry->gc_type	= gc_type;
+		__entry->no_bg_gc	= no_bg_gc;
+		__entry->nr_free_secs	= nr_free_secs;
 		__entry->dirty_nodes	= dirty_nodes;
 		__entry->dirty_dents	= dirty_dents;
 		__entry->dirty_imeta	= dirty_imeta;
@@ -687,12 +683,13 @@ TRACE_EVENT(f2fs_gc_begin,
 		__entry->prefree_seg	= prefree_seg;
 	),
 
-	TP_printk("dev = (%d,%d), sync = %d, background = %d, nodes = %lld, "
-		"dents = %lld, imeta = %lld, free_sec:%u, free_seg:%u, "
+	TP_printk("dev = (%d,%d), gc_type = %s, no_background_GC = %d, nr_free_secs = %u, "
+		"nodes = %lld, dents = %lld, imeta = %lld, free_sec:%u, free_seg:%u, "
 		"rsv_seg:%d, prefree_seg:%u",
 		show_dev(__entry->dev),
-		__entry->sync,
-		__entry->background,
+		show_gc_type(__entry->gc_type),
+		(__entry->gc_type == BG_GC) ? __entry->no_bg_gc : -1,
+		__entry->nr_free_secs,
 		__entry->dirty_nodes,
 		__entry->dirty_dents,
 		__entry->dirty_imeta,
@@ -1290,20 +1287,6 @@ DEFINE_EVENT(f2fs__page, f2fs_vm_page_mkwrite,
 
 	TP_ARGS(page, type)
 );
 
-DEFINE_EVENT(f2fs__page, f2fs_register_inmem_page,
-
-	TP_PROTO(struct page *page, int type),
-
-	TP_ARGS(page, type)
-);
-
-DEFINE_EVENT(f2fs__page, f2fs_commit_inmem_page,
-
-	TP_PROTO(struct page *page, int type),
-
-	TP_ARGS(page, type)
-);
-
 TRACE_EVENT(f2fs_filemap_fault,
 
 	TP_PROTO(struct inode *inode, pgoff_t index, unsigned long ret),
@@ -2068,6 +2051,100 @@ TRACE_EVENT(f2fs_fiemap,
 		__entry->ret)
 );
 
+DECLARE_EVENT_CLASS(f2fs__rw_start,
+
+	TP_PROTO(struct inode *inode, loff_t offset, int bytes,
+			pid_t pid, char *pathname, char *command),
+
+	TP_ARGS(inode, offset, bytes, pid, pathname, command),
+
+	TP_STRUCT__entry(
+		__string(pathbuf, pathname)
+		__field(loff_t, offset)
+		__field(int, bytes)
+		__field(loff_t, i_size)
+		__string(cmdline, command)
+		__field(pid_t, pid)
+		__field(ino_t, ino)
+	),
+
+	TP_fast_assign(
+		/*
+		 * Replace the spaces in filenames and cmdlines
+		 * because this screws up the tooling that parses
+		 * the traces.
+		 */
+		__assign_str(pathbuf, pathname);
+		(void)strreplace(__get_str(pathbuf), ' ', '_');
+		__entry->offset = offset;
+		__entry->bytes = bytes;
+		__entry->i_size = i_size_read(inode);
+		__assign_str(cmdline, command);
+		(void)strreplace(__get_str(cmdline), ' ', '_');
+		__entry->pid = pid;
+		__entry->ino = inode->i_ino;
+	),
+
+	TP_printk("entry_name %s, offset %llu, bytes %d, cmdline %s,"
+		" pid %d, i_size %llu, ino %lu",
+		__get_str(pathbuf), __entry->offset, __entry->bytes,
+		__get_str(cmdline), __entry->pid, __entry->i_size,
+		(unsigned long) __entry->ino)
+);
+
+DECLARE_EVENT_CLASS(f2fs__rw_end,
+
+	TP_PROTO(struct inode *inode, loff_t offset, int bytes),
+
+	TP_ARGS(inode, offset, bytes),
+
+	TP_STRUCT__entry(
+		__field(ino_t, ino)
+		__field(loff_t, offset)
+		__field(int, bytes)
+	),
+
+	TP_fast_assign(
+		__entry->ino = inode->i_ino;
+		__entry->offset = offset;
+		__entry->bytes = bytes;
+	),
+
+	TP_printk("ino %lu, offset %llu, bytes %d",
+		(unsigned long) __entry->ino,
+		__entry->offset, __entry->bytes)
+);
+
+DEFINE_EVENT(f2fs__rw_start, f2fs_dataread_start,
+
+	TP_PROTO(struct inode *inode, loff_t offset, int bytes,
+		pid_t pid, char *pathname, char *command),
+
+	TP_ARGS(inode, offset, bytes, pid, pathname, command)
+);
+
+DEFINE_EVENT(f2fs__rw_end, f2fs_dataread_end,
+
+	TP_PROTO(struct inode *inode, loff_t offset, int bytes),
+
+	TP_ARGS(inode, offset, bytes)
+);
+
+DEFINE_EVENT(f2fs__rw_start, f2fs_datawrite_start,
+
+	TP_PROTO(struct inode *inode, loff_t offset, int bytes,
+		pid_t pid, char *pathname, char *command),
+
+	TP_ARGS(inode, offset, bytes, pid, pathname, command)
+);
+
+DEFINE_EVENT(f2fs__rw_end, f2fs_datawrite_end,
+
+	TP_PROTO(struct inode *inode, loff_t offset, int bytes),
+
+	TP_ARGS(inode, offset, bytes)
+);
+
 #endif /* _TRACE_F2FS_H */
 
 /* This part must be outside protection */
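
The new f2fs__rw_start class above copies the file path and the caller's command line into the trace record and replaces spaces with underscores, since whitespace-delimited trace parsers would otherwise misread the fields. A userspace rendering of that sanitization; the sample strings are made up, and strreplace() is re-implemented here to match the kernel helper's contract (in-place replace, returns the terminating NUL):

#include <stdio.h>

static char *strreplace(char *s, char old, char new)
{
	for (; *s; ++s)
		if (*s == old)
			*s = new;
	return s;
}

int main(void)
{
	char pathbuf[] = "/data/My Documents/report 1.txt";	/* sample path */
	char cmdline[] = "some tool --flag";			/* sample cmdline */

	(void)strreplace(pathbuf, ' ', '_');
	(void)strreplace(cmdline, ' ', '_');

	/* Roughly the "entry_name ..., cmdline ..." layout of TP_printk. */
	printf("entry_name %s, cmdline %s\n", pathbuf, cmdline);
	return 0;
}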