svn commit: r206815 - in stable/8/sys: boot/zfs
cddl/contrib/opensolaris/uts/common/fs/zfs modules/zfs sys
Pawel Jakub Dawidek
pjd at FreeBSD.org
Sun Apr 18 21:36:35 UTC 2010
Author: pjd
Date: Sun Apr 18 21:36:34 2010
New Revision: 206815
URL: http://svn.freebsd.org/changeset/base/206815
Log:
MFC r203504,r204067,r204073,r204101,r204804,r205079,r205080,r205132,r205133,
r205134,r205231,r205253,r205264,r205346,r206051,r206667,r206792,r206793,
r206794,r206795,r206796,r206797:
r203504:
Open the provider for writing only when we find the right one. Opening
too many providers for writing provokes huge traffic related to taste
events sent by GEOM on close. This can lead to various problems with
opening GEOM providers that are created on top of other GEOM providers.
Reported by: Kurt Touet <ktouet at gmail.com>, mr
Tested by: mr, Baginski Darren <kickbsd at ya.ru>
r204067:
Update comment. We also look for GPT partitions.
r204073:
Add tunable and sysctl to skip hostid check on pool import.
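As a rough sketch, the knob presumably follows the standard FreeBSD
tunable/sysctl pattern below; the check_hostid name and default are
assumptions for illustration, not taken from this log:
  /* Sketch of the tunable/sysctl pattern; name and default are assumed. */
  static int zfs_check_hostid = 1;

  TUNABLE_INT("vfs.zfs.check_hostid", &zfs_check_hostid);
  SYSCTL_DECL(_vfs_zfs);
  SYSCTL_INT(_vfs_zfs, OID_AUTO, check_hostid, CTLFLAG_RDTUN,
      &zfs_check_hostid, 0, "Check hostid on pool import (0 skips the check)");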
r204101:
Don't set f_bsize to recordsize. It might confuse some software (like squid).
Submitted by: Alexander Zagrebin <alexz at visp.ru>
r204804:
Remove racy assertion.
Reported by: Attila Nagy <bra at fsn.hu>
Obtained from: OpenSolaris, Bug ID 6827260
r205079:
Remove bogus assertion.
Reported by: Johan Ström <johan at stromnet.se>
Obtained from: OpenSolaris, Bug ID 6920880
r205080:
Force commit to correct Bug ID:
Obtained from: OpenSolaris, Bug ID 6920880
r205132:
Don't bottleneck on acquiring the stream locks - this avoids a massive
drop-off in throughput with large numbers of simultaneous reads.
r205133:
Fix compilation under ZIO_USE_UMA.
r205134:
Make UMA the default allocator for ZFS buffers; this avoids a great
deal of contention in kmem_alloc.
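For context, ZFS satisfies buffer allocations from an array of per-size
kmem caches, so backing those caches with UMA removes a single contended
allocation path. A minimal sketch of the size-class lookup, modeled on
the OpenSolaris zio.c scheme (not the verbatim FreeBSD code):
  /* Sketch: map a buffer size to its per-size-class cache (assumed scheme). */
  extern kmem_cache_t *zio_buf_cache[];  /* one cache per SPA_MINBLOCKSIZE step */

  static void *
  sketch_zio_buf_alloc(size_t size)
  {
          size_t c = (size - 1) >> SPA_MINBLOCKSHIFT;  /* size-class index */

          return (kmem_cache_alloc(zio_buf_cache[c], KM_PUSHPAGE));
  }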
r205231:
- reduce contention by breaking up the ARC state locks into 16 for data
and 16 for metadata (see the sketch after this list)
- export L2ARC tunables as sysctls
- add several kstats to track L2ARC state more precisely
- avoid holding a contended lock when atomically incrementing a
contended counter (no lock protection needed for atomics)
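The full change is in the arc.c diff below; in essence each buffer
hashes to one of 16 cache-line-padded locks per state instead of a
single per-state mutex. A condensed sketch:
  /* Condensed from the arc.c diff below; power-of-two count for mask use. */
  #define ARC_BUFC_NUMMETADATALISTS 16

  struct arcs_lock {
          kmutex_t arcs_lock;
          unsigned char pad[CACHE_LINE_SIZE - sizeof (kmutex_t)];
  };

  static struct arcs_lock arcs_locks[ARC_BUFC_NUMMETADATALISTS]
      __aligned(CACHE_LINE_SIZE);

  static kmutex_t *
  sketch_lock_for(uint64_t hash)
  {
          return (&arcs_locks[hash & (ARC_BUFC_NUMMETADATALISTS - 1)].arcs_lock);
  }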
r205253:
Use CACHE_LINE_SIZE instead of hardcoding 128 for the lock pad.
Pointed out by: Marius Nuennerich and jhb@
r205264:
- cache line align arcs_lock array (h/t Marius Nuennerich)
- fix ARCS_LOCK_PAD to use architecture defined CACHE_LINE_SIZE
- cache line align buf_hash_table ht_locks array
r205346:
The same code is used to import and to create a pool.
The order of operations is the following:
1. Try to open the vdev by the remembered path and guid.
2. If 1 failed, try to find a vdev whose guid matches, ignoring the path.
3. If 2 failed, this means either that the vdev we're looking for is gone,
or that the pool is being created and the vdev doesn't contain the proper
guid yet. To be able to handle pool creation, we open the vdev by path anyway.
Because of 3, it is possible that we open the wrong vdev on import, which
can lead to confusion.
The solution is to check spa_load_state. On pool creation it will be equal
to SPA_LOAD_NONE, and we can immediately open the vdev by path alone; if it
is not equal to SPA_LOAD_NONE, we first open by path+guid and, when that
fails, open by guid alone. We no longer open the wrong vdev on import.
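A sketch of the resulting decision, with hypothetical helper names
standing in for the vdev_geom.c internals:
  /* Sketch only; open_by_*() are hypothetical stand-ins for vdev_geom.c code. */
  static struct g_consumer *
  sketch_vdev_geom_open(vdev_t *vd, spa_t *spa)
  {
          struct g_consumer *cp;

          if (spa->spa_load_state == SPA_LOAD_NONE)       /* pool creation */
                  return (open_by_path(vd));
          cp = open_by_path_and_guid(vd);                 /* step 1... */
          if (cp == NULL)
                  cp = open_by_guid(vd);                  /* ...then step 2 */
          return (cp);
  }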
r206051:
IOCPARM_MAX defines the maximum size of a structure that can be passed
directly to ioctl(2). Because of how an ioctl command is built using the
_IO*() macros, we have only 13 bits to encode the structure size, so the
structure can be at most 8kB-1.
Currently we define IOCPARM_MAX as PAGE_SIZE.
This is IMHO wrong for three main reasons:
1. It is confusing on archs with a page size larger than 8kB (not really
sure if we support such archs (sparc64?)): even if PAGE_SIZE is bigger
than 8kB, we can't encode anything larger in an ioctl command.
2. It is a waste. Why should the structure be limited to 4kB on most
archs when we have 13 bits dedicated to the size, not 12?
3. It shouldn't depend on the architecture and page size. Why should my
ioctl command work on one arch but not on another?
Increase IOCPARM_MAX to 8kB and make it independent of PAGE_SIZE and the
architecture it is compiled for. This lets all the size bits be used on
all archs. Note that this doesn't mean we will copy more on every
ioctl(2) call; we still copyin(9)/copyout(9) only the exact number of
bytes encoded in the ioctl command.
The practical use for this change is ZFS: the zfs_cmd_t structure used
for ZFS ioctls is larger than 4kB.
Silence on: arch@
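To see the 13-bit limit concretely, here is a small userland check
against sys/ioccom.h (IOCPARM_SHIFT is 13, so the largest encodable
length is 8191 bytes):
  #include <sys/ioccom.h>
  #include <stdio.h>

  struct big { char buf[8191]; };         /* 8kB-1: the largest encodable size */

  int
  main(void)
  {
          unsigned long cmd = _IOW('Z', 1, struct big);

          /* IOCPARM_LEN() recovers the size field from the command: 8191. */
          printf("encoded len = %lu\n", (unsigned long)IOCPARM_LEN(cmd));
          return (0);
  }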
r206667:
Fix 3-way deadlock that can happen because of ZFS and vnode lock
order reversal.
thread0 (vfs_fhtovp)      thread1 (vop_getattr)      thread2 (zfs_recv)
--------------------      ---------------------      ------------------
                          vn_lock
rrw_enter_read
                                                     rrw_enter_write (hangs)
                          rrw_enter_read (hangs)
vn_lock (hangs)
Reported by: Attila Nagy <bra at fsn.hu>
r206792:
Set ARC_L2_WRITING on L2ARC header creation.
Obtained from: OpenSolaris
r206793:
Remove racy assertion.
Obtained from: OpenSolaris
r206794:
Extend the scope of the locks to match OpenSolaris.
r206795:
Add missing list and lock destruction.
r206796:
Style fixes.
r206797:
Restore previous order.
Modified:
stable/8/sys/boot/zfs/zfs.c
stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c
stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_zfetch.c
stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c
stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c
stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c
stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c
stable/8/sys/modules/zfs/Makefile
stable/8/sys/sys/ioccom.h
Directory Properties:
stable/8/sys/ (props changed)
stable/8/sys/amd64/include/xen/ (props changed)
stable/8/sys/cddl/contrib/opensolaris/ (props changed)
stable/8/sys/contrib/dev/acpica/ (props changed)
stable/8/sys/contrib/dev/uath/ (props changed)
stable/8/sys/contrib/pf/ (props changed)
stable/8/sys/dev/xen/xenpci/ (props changed)
Modified: stable/8/sys/boot/zfs/zfs.c
==============================================================================
--- stable/8/sys/boot/zfs/zfs.c Sun Apr 18 21:29:28 2010 (r206814)
+++ stable/8/sys/boot/zfs/zfs.c Sun Apr 18 21:36:34 2010 (r206815)
@@ -397,7 +397,7 @@ zfs_dev_init(void)
/*
* Open all the disks we can find and see if we can reconstruct
* ZFS pools from them. Bogusly assumes that the disks are named
- * diskN or diskNsM.
+ * diskN, diskNpM or diskNsM.
*/
zfs_init();
for (unit = 0; unit < 32 /* XXX */; unit++) {
Modified: stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
==============================================================================
--- stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c Sun Apr 18 21:29:28 2010 (r206814)
+++ stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c Sun Apr 18 21:36:34 2010 (r206815)
@@ -186,6 +186,11 @@ SYSCTL_QUAD(_vfs_zfs, OID_AUTO, arc_min,
SYSCTL_INT(_vfs_zfs, OID_AUTO, mdcomp_disable, CTLFLAG_RDTUN,
&zfs_mdcomp_disable, 0, "Disable metadata compression");
+#ifdef ZIO_USE_UMA
+extern kmem_cache_t *zio_buf_cache[];
+extern kmem_cache_t *zio_data_buf_cache[];
+#endif
+
/*
* Note that buffers can be in one of 6 states:
* ARC_anon - anonymous (discussed below)
@@ -218,13 +223,31 @@ SYSCTL_INT(_vfs_zfs, OID_AUTO, mdcomp_di
* second level ARC benefit from these fast lookups.
*/
+#define ARCS_LOCK_PAD CACHE_LINE_SIZE
+struct arcs_lock {
+ kmutex_t arcs_lock;
+#ifdef _KERNEL
+ unsigned char pad[(ARCS_LOCK_PAD - sizeof (kmutex_t))];
+#endif
+};
+
+/*
+ * must be power of two for mask use to work
+ *
+ */
+#define ARC_BUFC_NUMDATALISTS 16
+#define ARC_BUFC_NUMMETADATALISTS 16
+#define ARC_BUFC_NUMLISTS (ARC_BUFC_NUMMETADATALISTS + ARC_BUFC_NUMDATALISTS)
+
typedef struct arc_state {
- list_t arcs_list[ARC_BUFC_NUMTYPES]; /* list of evictable buffers */
uint64_t arcs_lsize[ARC_BUFC_NUMTYPES]; /* amount of evictable data */
uint64_t arcs_size; /* total amount of data in this state */
- kmutex_t arcs_mtx;
+ list_t arcs_lists[ARC_BUFC_NUMLISTS]; /* list of evictable buffers */
+ struct arcs_lock arcs_locks[ARC_BUFC_NUMLISTS] __aligned(CACHE_LINE_SIZE);
} arc_state_t;
+#define ARCS_LOCK(s, i) (&((s)->arcs_locks[(i)].arcs_lock))
+
/* The 6 states: */
static arc_state_t ARC_anon;
static arc_state_t ARC_mru;
@@ -248,7 +271,9 @@ typedef struct arc_stats {
kstat_named_t arcstat_mru_ghost_hits;
kstat_named_t arcstat_mfu_hits;
kstat_named_t arcstat_mfu_ghost_hits;
+ kstat_named_t arcstat_allocated;
kstat_named_t arcstat_deleted;
+ kstat_named_t arcstat_stolen;
kstat_named_t arcstat_recycle_miss;
kstat_named_t arcstat_mutex_miss;
kstat_named_t arcstat_evict_skip;
@@ -280,6 +305,19 @@ typedef struct arc_stats {
kstat_named_t arcstat_l2_size;
kstat_named_t arcstat_l2_hdr_size;
kstat_named_t arcstat_memory_throttle_count;
+ kstat_named_t arcstat_l2_write_trylock_fail;
+ kstat_named_t arcstat_l2_write_passed_headroom;
+ kstat_named_t arcstat_l2_write_spa_mismatch;
+ kstat_named_t arcstat_l2_write_in_l2;
+ kstat_named_t arcstat_l2_write_hdr_io_in_progress;
+ kstat_named_t arcstat_l2_write_not_cacheable;
+ kstat_named_t arcstat_l2_write_full;
+ kstat_named_t arcstat_l2_write_buffer_iter;
+ kstat_named_t arcstat_l2_write_pios;
+ kstat_named_t arcstat_l2_write_bytes_written;
+ kstat_named_t arcstat_l2_write_buffer_bytes_scanned;
+ kstat_named_t arcstat_l2_write_buffer_list_iter;
+ kstat_named_t arcstat_l2_write_buffer_list_null_iter;
} arc_stats_t;
static arc_stats_t arc_stats = {
@@ -297,7 +335,9 @@ static arc_stats_t arc_stats = {
{ "mru_ghost_hits", KSTAT_DATA_UINT64 },
{ "mfu_hits", KSTAT_DATA_UINT64 },
{ "mfu_ghost_hits", KSTAT_DATA_UINT64 },
+ { "allocated", KSTAT_DATA_UINT64 },
{ "deleted", KSTAT_DATA_UINT64 },
+ { "stolen", KSTAT_DATA_UINT64 },
{ "recycle_miss", KSTAT_DATA_UINT64 },
{ "mutex_miss", KSTAT_DATA_UINT64 },
{ "evict_skip", KSTAT_DATA_UINT64 },
@@ -328,7 +368,20 @@ static arc_stats_t arc_stats = {
{ "l2_io_error", KSTAT_DATA_UINT64 },
{ "l2_size", KSTAT_DATA_UINT64 },
{ "l2_hdr_size", KSTAT_DATA_UINT64 },
- { "memory_throttle_count", KSTAT_DATA_UINT64 }
+ { "memory_throttle_count", KSTAT_DATA_UINT64 },
+ { "l2_write_trylock_fail", KSTAT_DATA_UINT64 },
+ { "l2_write_passed_headroom", KSTAT_DATA_UINT64 },
+ { "l2_write_spa_mismatch", KSTAT_DATA_UINT64 },
+ { "l2_write_in_l2", KSTAT_DATA_UINT64 },
+ { "l2_write_io_in_progress", KSTAT_DATA_UINT64 },
+ { "l2_write_not_cacheable", KSTAT_DATA_UINT64 },
+ { "l2_write_full", KSTAT_DATA_UINT64 },
+ { "l2_write_buffer_iter", KSTAT_DATA_UINT64 },
+ { "l2_write_pios", KSTAT_DATA_UINT64 },
+ { "l2_write_bytes_written", KSTAT_DATA_UINT64 },
+ { "l2_write_buffer_bytes_scanned", KSTAT_DATA_UINT64 },
+ { "l2_write_buffer_list_iter", KSTAT_DATA_UINT64 },
+ { "l2_write_buffer_list_null_iter", KSTAT_DATA_UINT64 }
};
#define ARCSTAT(stat) (arc_stats.stat.value.ui64)
@@ -336,7 +389,7 @@ static arc_stats_t arc_stats = {
#define ARCSTAT_INCR(stat, val) \
atomic_add_64(&arc_stats.stat.value.ui64, (val));
-#define ARCSTAT_BUMP(stat) ARCSTAT_INCR(stat, 1)
+#define ARCSTAT_BUMP(stat) ARCSTAT_INCR(stat, 1)
#define ARCSTAT_BUMPDOWN(stat) ARCSTAT_INCR(stat, -1)
#define ARCSTAT_MAX(stat, val) { \
@@ -370,7 +423,7 @@ static arc_stats_t arc_stats = {
}
kstat_t *arc_ksp;
-static arc_state_t *arc_anon;
+static arc_state_t *arc_anon;
static arc_state_t *arc_mru;
static arc_state_t *arc_mru_ghost;
static arc_state_t *arc_mfu;
@@ -514,7 +567,7 @@ static void arc_evict_ghost(arc_state_t
* Hash table routines
*/
-#define HT_LOCK_PAD 128
+#define HT_LOCK_PAD CACHE_LINE_SIZE
struct ht_lock {
kmutex_t ht_lock;
@@ -527,7 +580,7 @@ struct ht_lock {
typedef struct buf_hash_table {
uint64_t ht_mask;
arc_buf_hdr_t **ht_table;
- struct ht_lock ht_locks[BUF_LOCKS];
+ struct ht_lock ht_locks[BUF_LOCKS] __aligned(CACHE_LINE_SIZE);
} buf_hash_table_t;
static buf_hash_table_t buf_hash_table;
@@ -541,13 +594,19 @@ static buf_hash_table_t buf_hash_table;
uint64_t zfs_crc64_table[256];
+#ifdef ZIO_USE_UMA
+extern kmem_cache_t *zio_buf_cache[];
+extern kmem_cache_t *zio_data_buf_cache[];
+#endif
+
/*
* Level 2 ARC
*/
-#define L2ARC_WRITE_SIZE (8 * 1024 * 1024) /* initial write max */
-#define L2ARC_HEADROOM 4 /* num of writes */
+#define L2ARC_WRITE_SIZE (64 * 1024 * 1024) /* initial write max */
+#define L2ARC_HEADROOM 128 /* num of writes */
#define L2ARC_FEED_SECS 1 /* caching interval */
+#define L2ARC_FEED_SECS_SHIFT 1 /* caching interval shift */
#define l2arc_writes_sent ARCSTAT(arcstat_l2_writes_sent)
#define l2arc_writes_done ARCSTAT(arcstat_l2_writes_done)
@@ -559,7 +618,66 @@ uint64_t l2arc_write_max = L2ARC_WRITE_S
uint64_t l2arc_write_boost = L2ARC_WRITE_SIZE; /* extra write during warmup */
uint64_t l2arc_headroom = L2ARC_HEADROOM; /* number of dev writes */
uint64_t l2arc_feed_secs = L2ARC_FEED_SECS; /* interval seconds */
-boolean_t l2arc_noprefetch = B_TRUE; /* don't cache prefetch bufs */
+uint64_t l2arc_feed_secs_shift = L2ARC_FEED_SECS_SHIFT; /* interval seconds shift */
+boolean_t l2arc_noprefetch = B_FALSE; /* don't cache prefetch bufs */
+
+
+SYSCTL_QUAD(_vfs_zfs, OID_AUTO, l2arc_write_max, CTLFLAG_RW,
+ &l2arc_write_max, 0, "max write size");
+SYSCTL_QUAD(_vfs_zfs, OID_AUTO, l2arc_write_boost, CTLFLAG_RW,
+ &l2arc_write_boost, 0, "extra write during warmup");
+SYSCTL_QUAD(_vfs_zfs, OID_AUTO, l2arc_headroom, CTLFLAG_RW,
+ &l2arc_headroom, 0, "number of dev writes");
+SYSCTL_QUAD(_vfs_zfs, OID_AUTO, l2arc_feed_secs, CTLFLAG_RW,
+ &l2arc_feed_secs, 0, "interval seconds");
+SYSCTL_QUAD(_vfs_zfs, OID_AUTO, l2arc_feed_secs_shift, CTLFLAG_RW,
+ &l2arc_feed_secs_shift, 0, "power of 2 division of feed seconds");
+
+SYSCTL_INT(_vfs_zfs, OID_AUTO, l2arc_noprefetch, CTLFLAG_RW,
+ &l2arc_noprefetch, 0, "don't cache prefetch bufs");
+
+
+SYSCTL_QUAD(_vfs_zfs, OID_AUTO, anon_size, CTLFLAG_RD,
+ &ARC_anon.arcs_size, 0, "size of anonymous state");
+SYSCTL_QUAD(_vfs_zfs, OID_AUTO, anon_metadata_lsize, CTLFLAG_RD,
+ &ARC_anon.arcs_lsize[ARC_BUFC_METADATA], 0, "size of anonymous state");
+SYSCTL_QUAD(_vfs_zfs, OID_AUTO, anon_data_lsize, CTLFLAG_RD,
+ &ARC_anon.arcs_lsize[ARC_BUFC_DATA], 0, "size of anonymous state");
+
+SYSCTL_QUAD(_vfs_zfs, OID_AUTO, mru_size, CTLFLAG_RD,
+ &ARC_mru.arcs_size, 0, "size of mru state");
+SYSCTL_QUAD(_vfs_zfs, OID_AUTO, mru_metadata_lsize, CTLFLAG_RD,
+ &ARC_mru.arcs_lsize[ARC_BUFC_METADATA], 0, "size of metadata in mru state");
+SYSCTL_QUAD(_vfs_zfs, OID_AUTO, mru_data_lsize, CTLFLAG_RD,
+ &ARC_mru.arcs_lsize[ARC_BUFC_DATA], 0, "size of data in mru state");
+
+SYSCTL_QUAD(_vfs_zfs, OID_AUTO, mru_ghost_size, CTLFLAG_RD,
+ &ARC_mru_ghost.arcs_size, 0, "size of mru ghost state");
+SYSCTL_QUAD(_vfs_zfs, OID_AUTO, mru_ghost_metadata_lsize, CTLFLAG_RD,
+ &ARC_mru_ghost.arcs_lsize[ARC_BUFC_METADATA], 0,
+ "size of metadata in mru ghost state");
+SYSCTL_QUAD(_vfs_zfs, OID_AUTO, mru_ghost_data_lsize, CTLFLAG_RD,
+ &ARC_mru_ghost.arcs_lsize[ARC_BUFC_DATA], 0,
+ "size of data in mru ghost state");
+
+SYSCTL_QUAD(_vfs_zfs, OID_AUTO, mfu_size, CTLFLAG_RD,
+ &ARC_mfu.arcs_size, 0, "size of mfu state");
+SYSCTL_QUAD(_vfs_zfs, OID_AUTO, mfu_metadata_lsize, CTLFLAG_RD,
+ &ARC_mfu.arcs_lsize[ARC_BUFC_METADATA], 0, "size of metadata in mfu state");
+SYSCTL_QUAD(_vfs_zfs, OID_AUTO, mfu_data_lsize, CTLFLAG_RD,
+ &ARC_mfu.arcs_lsize[ARC_BUFC_DATA], 0, "size of data in mfu state");
+
+SYSCTL_QUAD(_vfs_zfs, OID_AUTO, mfu_ghost_size, CTLFLAG_RD,
+ &ARC_mfu_ghost.arcs_size, 0, "size of mfu ghost state");
+SYSCTL_QUAD(_vfs_zfs, OID_AUTO, mfu_ghost_metadata_lsize, CTLFLAG_RD,
+ &ARC_mfu_ghost.arcs_lsize[ARC_BUFC_METADATA], 0,
+ "size of metadata in mfu ghost state");
+SYSCTL_QUAD(_vfs_zfs, OID_AUTO, mfu_ghost_data_lsize, CTLFLAG_RD,
+ &ARC_mfu_ghost.arcs_lsize[ARC_BUFC_DATA], 0,
+ "size of data in mfu ghost state");
+
+SYSCTL_QUAD(_vfs_zfs, OID_AUTO, l2c_only_size, CTLFLAG_RD,
+ &ARC_l2c_only.arcs_size, 0, "size of mru state");
/*
* L2ARC Internals
@@ -953,18 +1071,38 @@ arc_buf_freeze(arc_buf_t *buf)
}
static void
+get_buf_info(arc_buf_hdr_t *ab, arc_state_t *state, list_t **list, kmutex_t **lock)
+{
+ uint64_t buf_hashid = buf_hash(ab->b_spa, &ab->b_dva, ab->b_birth);
+
+ if (ab->b_type == ARC_BUFC_METADATA)
+ buf_hashid &= (ARC_BUFC_NUMMETADATALISTS - 1);
+ else {
+ buf_hashid &= (ARC_BUFC_NUMDATALISTS - 1);
+ buf_hashid += ARC_BUFC_NUMMETADATALISTS;
+ }
+
+ *list = &state->arcs_lists[buf_hashid];
+ *lock = ARCS_LOCK(state, buf_hashid);
+}
+
+
+static void
add_reference(arc_buf_hdr_t *ab, kmutex_t *hash_lock, void *tag)
{
+
ASSERT(MUTEX_HELD(hash_lock));
if ((refcount_add(&ab->b_refcnt, tag) == 1) &&
(ab->b_state != arc_anon)) {
uint64_t delta = ab->b_size * ab->b_datacnt;
- list_t *list = &ab->b_state->arcs_list[ab->b_type];
uint64_t *size = &ab->b_state->arcs_lsize[ab->b_type];
+ list_t *list;
+ kmutex_t *lock;
- ASSERT(!MUTEX_HELD(&ab->b_state->arcs_mtx));
- mutex_enter(&ab->b_state->arcs_mtx);
+ get_buf_info(ab, ab->b_state, &list, &lock);
+ ASSERT(!MUTEX_HELD(lock));
+ mutex_enter(lock);
ASSERT(list_link_active(&ab->b_arc_node));
list_remove(list, ab);
if (GHOST_STATE(ab->b_state)) {
@@ -975,7 +1113,7 @@ add_reference(arc_buf_hdr_t *ab, kmutex_
ASSERT(delta > 0);
ASSERT3U(*size, >=, delta);
atomic_add_64(size, -delta);
- mutex_exit(&ab->b_state->arcs_mtx);
+ mutex_exit(lock);
/* remove the prefetch flag if we get a reference */
if (ab->b_flags & ARC_PREFETCH)
ab->b_flags &= ~ARC_PREFETCH;
@@ -994,14 +1132,17 @@ remove_reference(arc_buf_hdr_t *ab, kmut
if (((cnt = refcount_remove(&ab->b_refcnt, tag)) == 0) &&
(state != arc_anon)) {
uint64_t *size = &state->arcs_lsize[ab->b_type];
+ list_t *list;
+ kmutex_t *lock;
- ASSERT(!MUTEX_HELD(&state->arcs_mtx));
- mutex_enter(&state->arcs_mtx);
+ get_buf_info(ab, state, &list, &lock);
+ ASSERT(!MUTEX_HELD(lock));
+ mutex_enter(lock);
ASSERT(!list_link_active(&ab->b_arc_node));
- list_insert_head(&state->arcs_list[ab->b_type], ab);
+ list_insert_head(list, ab);
ASSERT(ab->b_datacnt > 0);
atomic_add_64(size, ab->b_size * ab->b_datacnt);
- mutex_exit(&state->arcs_mtx);
+ mutex_exit(lock);
}
return (cnt);
}
@@ -1016,6 +1157,8 @@ arc_change_state(arc_state_t *new_state,
arc_state_t *old_state = ab->b_state;
int64_t refcnt = refcount_count(&ab->b_refcnt);
uint64_t from_delta, to_delta;
+ list_t *list;
+ kmutex_t *lock;
ASSERT(MUTEX_HELD(hash_lock));
ASSERT(new_state != old_state);
@@ -1030,14 +1173,16 @@ arc_change_state(arc_state_t *new_state,
*/
if (refcnt == 0) {
if (old_state != arc_anon) {
- int use_mutex = !MUTEX_HELD(&old_state->arcs_mtx);
+ int use_mutex;
uint64_t *size = &old_state->arcs_lsize[ab->b_type];
+ get_buf_info(ab, old_state, &list, &lock);
+ use_mutex = !MUTEX_HELD(lock);
if (use_mutex)
- mutex_enter(&old_state->arcs_mtx);
+ mutex_enter(lock);
ASSERT(list_link_active(&ab->b_arc_node));
- list_remove(&old_state->arcs_list[ab->b_type], ab);
+ list_remove(list, ab);
/*
* If prefetching out of the ghost cache,
@@ -1052,16 +1197,18 @@ arc_change_state(arc_state_t *new_state,
atomic_add_64(size, -from_delta);
if (use_mutex)
- mutex_exit(&old_state->arcs_mtx);
+ mutex_exit(lock);
}
if (new_state != arc_anon) {
- int use_mutex = !MUTEX_HELD(&new_state->arcs_mtx);
+ int use_mutex;
uint64_t *size = &new_state->arcs_lsize[ab->b_type];
+ get_buf_info(ab, new_state, &list, &lock);
+ use_mutex = !MUTEX_HELD(lock);
if (use_mutex)
- mutex_enter(&new_state->arcs_mtx);
+ mutex_enter(lock);
- list_insert_head(&new_state->arcs_list[ab->b_type], ab);
+ list_insert_head(list, ab);
/* ghost elements have a ghost size */
if (GHOST_STATE(new_state)) {
@@ -1072,7 +1219,7 @@ arc_change_state(arc_state_t *new_state,
atomic_add_64(size, to_delta);
if (use_mutex)
- mutex_exit(&new_state->arcs_mtx);
+ mutex_exit(lock);
}
}
@@ -1462,21 +1609,48 @@ arc_evict(arc_state_t *state, spa_t *spa
{
arc_state_t *evicted_state;
uint64_t bytes_evicted = 0, skipped = 0, missed = 0;
+ int64_t bytes_remaining;
arc_buf_hdr_t *ab, *ab_prev = NULL;
- list_t *list = &state->arcs_list[type];
+ list_t *evicted_list, *list, *evicted_list_start, *list_start;
+ kmutex_t *lock, *evicted_lock;
kmutex_t *hash_lock;
boolean_t have_lock;
void *stolen = NULL;
+ static int evict_metadata_offset, evict_data_offset;
+ int i, idx, offset, list_count, count;
ASSERT(state == arc_mru || state == arc_mfu);
evicted_state = (state == arc_mru) ? arc_mru_ghost : arc_mfu_ghost;
- mutex_enter(&state->arcs_mtx);
- mutex_enter(&evicted_state->arcs_mtx);
+ if (type == ARC_BUFC_METADATA) {
+ offset = 0;
+ list_count = ARC_BUFC_NUMMETADATALISTS;
+ list_start = &state->arcs_lists[0];
+ evicted_list_start = &evicted_state->arcs_lists[0];
+ idx = evict_metadata_offset;
+ } else {
+ offset = ARC_BUFC_NUMMETADATALISTS;
+ list_start = &state->arcs_lists[offset];
+ evicted_list_start = &evicted_state->arcs_lists[offset];
+ list_count = ARC_BUFC_NUMDATALISTS;
+ idx = evict_data_offset;
+ }
+ bytes_remaining = evicted_state->arcs_lsize[type];
+ count = 0;
+
+evict_start:
+ list = &list_start[idx];
+ evicted_list = &evicted_list_start[idx];
+ lock = ARCS_LOCK(state, (offset + idx));
+ evicted_lock = ARCS_LOCK(evicted_state, (offset + idx));
+
+ mutex_enter(lock);
+ mutex_enter(evicted_lock);
for (ab = list_tail(list); ab; ab = ab_prev) {
ab_prev = list_prev(list, ab);
+ bytes_remaining -= (ab->b_size * ab->b_datacnt);
/* prefetch buffers have a minimum lifespan */
if (HDR_IO_IN_PROGRESS(ab) ||
(spa && ab->b_spa != spa) ||
@@ -1536,17 +1710,35 @@ arc_evict(arc_state_t *state, spa_t *spa
mutex_exit(hash_lock);
if (bytes >= 0 && bytes_evicted >= bytes)
break;
+ if (bytes_remaining > 0) {
+ mutex_exit(evicted_lock);
+ mutex_exit(lock);
+ idx = ((idx + 1) & (list_count - 1));
+ count++;
+ goto evict_start;
+ }
} else {
missed += 1;
}
}
- mutex_exit(&evicted_state->arcs_mtx);
- mutex_exit(&state->arcs_mtx);
+ mutex_exit(evicted_lock);
+ mutex_exit(lock);
+
+ idx = ((idx + 1) & (list_count - 1));
+ count++;
- if (bytes_evicted < bytes)
- dprintf("only evicted %lld bytes from %x",
- (longlong_t)bytes_evicted, state);
+ if (bytes_evicted < bytes) {
+ if (count < list_count)
+ goto evict_start;
+ else
+ dprintf("only evicted %lld bytes from %x",
+ (longlong_t)bytes_evicted, state);
+ }
+ if (type == ARC_BUFC_METADATA)
+ evict_metadata_offset = idx;
+ else
+ evict_data_offset = idx;
if (skipped)
ARCSTAT_INCR(arcstat_evict_skip, skipped);
@@ -1574,6 +1766,8 @@ arc_evict(arc_state_t *state, spa_t *spa
arc_evict_ghost(arc_mfu_ghost, NULL, todelete);
}
}
+ if (stolen)
+ ARCSTAT_BUMP(arcstat_stolen);
return (stolen);
}
@@ -1586,14 +1780,28 @@ static void
arc_evict_ghost(arc_state_t *state, spa_t *spa, int64_t bytes)
{
arc_buf_hdr_t *ab, *ab_prev;
- list_t *list = &state->arcs_list[ARC_BUFC_DATA];
- kmutex_t *hash_lock;
+ list_t *list, *list_start;
+ kmutex_t *hash_lock, *lock;
uint64_t bytes_deleted = 0;
uint64_t bufs_skipped = 0;
+ static int evict_offset;
+ int list_count, idx = evict_offset;
+ int offset, count = 0;
ASSERT(GHOST_STATE(state));
-top:
- mutex_enter(&state->arcs_mtx);
+
+ /*
+ * data lists come after metadata lists
+ */
+ list_start = &state->arcs_lists[ARC_BUFC_NUMMETADATALISTS];
+ list_count = ARC_BUFC_NUMDATALISTS;
+ offset = ARC_BUFC_NUMMETADATALISTS;
+
+evict_start:
+ list = &list_start[idx];
+ lock = ARCS_LOCK(state, idx + offset);
+
+ mutex_enter(lock);
for (ab = list_tail(list); ab; ab = ab_prev) {
ab_prev = list_prev(list, ab);
if (spa && ab->b_spa != spa)
@@ -1623,20 +1831,31 @@ top:
break;
} else {
if (bytes < 0) {
- mutex_exit(&state->arcs_mtx);
+ /*
+ * we're draining the ARC, retry
+ */
+ mutex_exit(lock);
mutex_enter(hash_lock);
mutex_exit(hash_lock);
- goto top;
+ goto evict_start;
}
bufs_skipped += 1;
}
}
- mutex_exit(&state->arcs_mtx);
+ mutex_exit(lock);
+ idx = ((idx + 1) & (ARC_BUFC_NUMDATALISTS - 1));
+ count++;
+
+ if (count < list_count)
+ goto evict_start;
- if (list == &state->arcs_list[ARC_BUFC_DATA] &&
+ evict_offset = idx;
+ if ((uintptr_t)list > (uintptr_t)&state->arcs_lists[ARC_BUFC_NUMMETADATALISTS] &&
(bytes < 0 || bytes_deleted < bytes)) {
- list = &state->arcs_list[ARC_BUFC_METADATA];
- goto top;
+ list_start = &state->arcs_lists[0];
+ list_count = ARC_BUFC_NUMMETADATALISTS;
+ offset = count = 0;
+ goto evict_start;
}
if (bufs_skipped) {
@@ -1718,7 +1937,7 @@ arc_do_user_evicts(void)
/*
* Move list over to avoid LOR
*/
-restart:
+restart:
mutex_enter(&arc_eviction_mtx);
tmp_arc_eviction_list = arc_eviction_list;
arc_eviction_list = NULL;
@@ -1750,22 +1969,22 @@ restart:
void
arc_flush(spa_t *spa)
{
- while (list_head(&arc_mru->arcs_list[ARC_BUFC_DATA])) {
+ while (arc_mru->arcs_lsize[ARC_BUFC_DATA]) {
(void) arc_evict(arc_mru, spa, -1, FALSE, ARC_BUFC_DATA);
if (spa)
break;
}
- while (list_head(&arc_mru->arcs_list[ARC_BUFC_METADATA])) {
+ while (arc_mru->arcs_lsize[ARC_BUFC_METADATA]) {
(void) arc_evict(arc_mru, spa, -1, FALSE, ARC_BUFC_METADATA);
if (spa)
break;
}
- while (list_head(&arc_mfu->arcs_list[ARC_BUFC_DATA])) {
+ while (arc_mfu->arcs_lsize[ARC_BUFC_DATA]) {
(void) arc_evict(arc_mfu, spa, -1, FALSE, ARC_BUFC_DATA);
if (spa)
break;
}
- while (list_head(&arc_mfu->arcs_list[ARC_BUFC_METADATA])) {
+ while (arc_mfu->arcs_lsize[ARC_BUFC_METADATA]) {
(void) arc_evict(arc_mfu, spa, -1, FALSE, ARC_BUFC_METADATA);
if (spa)
break;
@@ -1829,7 +2048,7 @@ arc_reclaim_needed(void)
return (0);
/*
- * If pages are needed or we're within 2048 pages
+ * If pages are needed or we're within 2048 pages
* of needing to page need to reclaim
*/
if (vm_pages_needed || (vm_paging_target() > -2048))
@@ -1896,8 +2115,6 @@ arc_kmem_reap_now(arc_reclaim_strategy_t
size_t i;
kmem_cache_t *prev_cache = NULL;
kmem_cache_t *prev_data_cache = NULL;
- extern kmem_cache_t *zio_buf_cache[];
- extern kmem_cache_t *zio_data_buf_cache[];
#endif
#ifdef _KERNEL
@@ -2203,6 +2420,7 @@ out:
arc_anon->arcs_size + arc_mru->arcs_size > arc_p)
arc_p = MIN(arc_c, arc_p + size);
}
+ ARCSTAT_BUMP(arcstat_allocated);
}
/*
@@ -2502,7 +2720,6 @@ arc_read(zio_t *pio, spa_t *spa, blkptr_
uint32_t *arc_flags, const zbookmark_t *zb)
{
int err;
- arc_buf_hdr_t *hdr = pbuf->b_hdr;
ASSERT(!refcount_is_zero(&pbuf->b_hdr->b_refcnt));
ASSERT3U((char *)bp - (char *)pbuf->b_data, <, pbuf->b_hdr->b_size);
@@ -2510,8 +2727,6 @@ arc_read(zio_t *pio, spa_t *spa, blkptr_
err = arc_read_nolock(pio, spa, bp, done, private, priority,
zio_flags, arc_flags, zb);
-
- ASSERT3P(hdr, ==, pbuf->b_hdr);
rw_exit(&pbuf->b_lock);
return (err);
}
@@ -2728,7 +2943,7 @@ top:
* released by l2arc_read_done().
*/
rzio = zio_read_phys(pio, vd, addr, size,
- buf->b_data, ZIO_CHECKSUM_OFF,
+ buf->b_data, ZIO_CHECKSUM_OFF,
l2arc_read_done, cb, priority, zio_flags |
ZIO_FLAG_DONT_CACHE | ZIO_FLAG_CANFAIL |
ZIO_FLAG_DONT_PROPAGATE |
@@ -2823,6 +3038,8 @@ arc_buf_evict(arc_buf_t *buf)
arc_buf_hdr_t *hdr;
kmutex_t *hash_lock;
arc_buf_t **bufp;
+ list_t *list, *evicted_list;
+ kmutex_t *lock, *evicted_lock;
rw_enter(&buf->b_lock, RW_WRITER);
hdr = buf->b_hdr;
@@ -2871,16 +3088,18 @@ arc_buf_evict(arc_buf_t *buf)
evicted_state =
(old_state == arc_mru) ? arc_mru_ghost : arc_mfu_ghost;
- mutex_enter(&old_state->arcs_mtx);
- mutex_enter(&evicted_state->arcs_mtx);
+ get_buf_info(hdr, old_state, &list, &lock);
+ get_buf_info(hdr, evicted_state, &evicted_list, &evicted_lock);
+ mutex_enter(lock);
+ mutex_enter(evicted_lock);
arc_change_state(evicted_state, hdr, hash_lock);
ASSERT(HDR_IN_HASH_TABLE(hdr));
hdr->b_flags |= ARC_IN_HASH_TABLE;
hdr->b_flags &= ~ARC_BUF_AVAILABLE;
- mutex_exit(&evicted_state->arcs_mtx);
- mutex_exit(&old_state->arcs_mtx);
+ mutex_exit(evicted_lock);
+ mutex_exit(lock);
}
mutex_exit(hash_lock);
rw_exit(&buf->b_lock);
@@ -3426,7 +3645,8 @@ void
arc_init(void)
{
int prefetch_tunable_set = 0;
-
+ int i;
+
mutex_init(&arc_reclaim_thr_lock, NULL, MUTEX_DEFAULT, NULL);
cv_init(&arc_reclaim_thr_cv, NULL, CV_DEFAULT, NULL);
mutex_init(&arc_lowmem_lock, NULL, MUTEX_DEFAULT, NULL);
@@ -3494,33 +3714,33 @@ arc_init(void)
arc_l2c_only = &ARC_l2c_only;
arc_size = 0;
- mutex_init(&arc_anon->arcs_mtx, NULL, MUTEX_DEFAULT, NULL);
- mutex_init(&arc_mru->arcs_mtx, NULL, MUTEX_DEFAULT, NULL);
- mutex_init(&arc_mru_ghost->arcs_mtx, NULL, MUTEX_DEFAULT, NULL);
- mutex_init(&arc_mfu->arcs_mtx, NULL, MUTEX_DEFAULT, NULL);
- mutex_init(&arc_mfu_ghost->arcs_mtx, NULL, MUTEX_DEFAULT, NULL);
- mutex_init(&arc_l2c_only->arcs_mtx, NULL, MUTEX_DEFAULT, NULL);
-
- list_create(&arc_mru->arcs_list[ARC_BUFC_METADATA],
- sizeof (arc_buf_hdr_t), offsetof(arc_buf_hdr_t, b_arc_node));
- list_create(&arc_mru->arcs_list[ARC_BUFC_DATA],
- sizeof (arc_buf_hdr_t), offsetof(arc_buf_hdr_t, b_arc_node));
- list_create(&arc_mru_ghost->arcs_list[ARC_BUFC_METADATA],
- sizeof (arc_buf_hdr_t), offsetof(arc_buf_hdr_t, b_arc_node));
- list_create(&arc_mru_ghost->arcs_list[ARC_BUFC_DATA],
- sizeof (arc_buf_hdr_t), offsetof(arc_buf_hdr_t, b_arc_node));
- list_create(&arc_mfu->arcs_list[ARC_BUFC_METADATA],
- sizeof (arc_buf_hdr_t), offsetof(arc_buf_hdr_t, b_arc_node));
- list_create(&arc_mfu->arcs_list[ARC_BUFC_DATA],
- sizeof (arc_buf_hdr_t), offsetof(arc_buf_hdr_t, b_arc_node));
- list_create(&arc_mfu_ghost->arcs_list[ARC_BUFC_METADATA],
- sizeof (arc_buf_hdr_t), offsetof(arc_buf_hdr_t, b_arc_node));
- list_create(&arc_mfu_ghost->arcs_list[ARC_BUFC_DATA],
- sizeof (arc_buf_hdr_t), offsetof(arc_buf_hdr_t, b_arc_node));
- list_create(&arc_l2c_only->arcs_list[ARC_BUFC_METADATA],
- sizeof (arc_buf_hdr_t), offsetof(arc_buf_hdr_t, b_arc_node));
- list_create(&arc_l2c_only->arcs_list[ARC_BUFC_DATA],
- sizeof (arc_buf_hdr_t), offsetof(arc_buf_hdr_t, b_arc_node));
+ for (i = 0; i < ARC_BUFC_NUMLISTS; i++) {
+ mutex_init(&arc_anon->arcs_locks[i].arcs_lock,
+ NULL, MUTEX_DEFAULT, NULL);
+ mutex_init(&arc_mru->arcs_locks[i].arcs_lock,
+ NULL, MUTEX_DEFAULT, NULL);
+ mutex_init(&arc_mru_ghost->arcs_locks[i].arcs_lock,
+ NULL, MUTEX_DEFAULT, NULL);
+ mutex_init(&arc_mfu->arcs_locks[i].arcs_lock,
+ NULL, MUTEX_DEFAULT, NULL);
+ mutex_init(&arc_mfu_ghost->arcs_locks[i].arcs_lock,
+ NULL, MUTEX_DEFAULT, NULL);
+ mutex_init(&arc_l2c_only->arcs_locks[i].arcs_lock,
+ NULL, MUTEX_DEFAULT, NULL);
+
+ list_create(&arc_mru->arcs_lists[i],
+ sizeof (arc_buf_hdr_t), offsetof(arc_buf_hdr_t, b_arc_node));
+ list_create(&arc_mru_ghost->arcs_lists[i],
+ sizeof (arc_buf_hdr_t), offsetof(arc_buf_hdr_t, b_arc_node));
+ list_create(&arc_mfu->arcs_lists[i],
+ sizeof (arc_buf_hdr_t), offsetof(arc_buf_hdr_t, b_arc_node));
+ list_create(&arc_mfu_ghost->arcs_lists[i],
+ sizeof (arc_buf_hdr_t), offsetof(arc_buf_hdr_t, b_arc_node));
+ list_create(&arc_mfu_ghost->arcs_lists[i],
+ sizeof (arc_buf_hdr_t), offsetof(arc_buf_hdr_t, b_arc_node));
+ list_create(&arc_l2c_only->arcs_lists[i],
+ sizeof (arc_buf_hdr_t), offsetof(arc_buf_hdr_t, b_arc_node));
+ }
buf_init();
@@ -3557,7 +3777,7 @@ arc_init(void)
#ifdef _KERNEL
if (TUNABLE_INT_FETCH("vfs.zfs.prefetch_disable", &zfs_prefetch_disable))
prefetch_tunable_set = 1;
-
+
#ifdef __i386__
if (prefetch_tunable_set == 0) {
printf("ZFS NOTICE: Prefetch is disabled by default on i386 "
@@ -3566,7 +3786,7 @@ arc_init(void)
"to /boot/loader.conf.\n");
zfs_prefetch_disable=1;
}
-#else
+#else
if ((((uint64_t)physmem * PAGESIZE) < (1ULL << 32)) &&
prefetch_tunable_set == 0) {
printf("ZFS NOTICE: Prefetch is disabled by default if less "
@@ -3575,7 +3795,7 @@ arc_init(void)
"to /boot/loader.conf.\n");
zfs_prefetch_disable=1;
}
-#endif
+#endif
/* Warn about ZFS memory and address space requirements. */
if (((uint64_t)physmem * PAGESIZE) < (256 + 128 + 64) * (1 << 20)) {
printf("ZFS WARNING: Recommended minimum RAM size is 512MB; "
@@ -3594,6 +3814,7 @@ arc_init(void)
void
arc_fini(void)
{
+ int i;
mutex_enter(&arc_reclaim_thr_lock);
arc_thread_exit = 1;
@@ -3615,20 +3836,20 @@ arc_fini(void)
mutex_destroy(&arc_reclaim_thr_lock);
cv_destroy(&arc_reclaim_thr_cv);
- list_destroy(&arc_mru->arcs_list[ARC_BUFC_METADATA]);
- list_destroy(&arc_mru_ghost->arcs_list[ARC_BUFC_METADATA]);
- list_destroy(&arc_mfu->arcs_list[ARC_BUFC_METADATA]);
- list_destroy(&arc_mfu_ghost->arcs_list[ARC_BUFC_METADATA]);
- list_destroy(&arc_mru->arcs_list[ARC_BUFC_DATA]);
- list_destroy(&arc_mru_ghost->arcs_list[ARC_BUFC_DATA]);
- list_destroy(&arc_mfu->arcs_list[ARC_BUFC_DATA]);
- list_destroy(&arc_mfu_ghost->arcs_list[ARC_BUFC_DATA]);
-
- mutex_destroy(&arc_anon->arcs_mtx);
- mutex_destroy(&arc_mru->arcs_mtx);
- mutex_destroy(&arc_mru_ghost->arcs_mtx);
- mutex_destroy(&arc_mfu->arcs_mtx);
- mutex_destroy(&arc_mfu_ghost->arcs_mtx);
+ for (i = 0; i < ARC_BUFC_NUMLISTS; i++) {
+ list_destroy(&arc_mru->arcs_lists[i]);
+ list_destroy(&arc_mru_ghost->arcs_lists[i]);
+ list_destroy(&arc_mfu->arcs_lists[i]);
+ list_destroy(&arc_mfu_ghost->arcs_lists[i]);
+ list_destroy(&arc_l2c_only->arcs_lists[i]);
+
+ mutex_destroy(&arc_anon->arcs_locks[i].arcs_lock);
+ mutex_destroy(&arc_mru->arcs_locks[i].arcs_lock);
+ mutex_destroy(&arc_mru_ghost->arcs_locks[i].arcs_lock);
+ mutex_destroy(&arc_mfu->arcs_locks[i].arcs_lock);
+ mutex_destroy(&arc_mfu_ghost->arcs_locks[i].arcs_lock);
+ mutex_destroy(&arc_l2c_only->arcs_locks[i].arcs_lock);
+ }
mutex_destroy(&zfs_write_limit_lock);
@@ -4024,26 +4245,27 @@ static list_t *
l2arc_list_locked(int list_num, kmutex_t **lock)
{
list_t *list;
+ int idx;
- ASSERT(list_num >= 0 && list_num <= 3);
+ ASSERT(list_num >= 0 && list_num < 2 * ARC_BUFC_NUMLISTS);
- switch (list_num) {
- case 0:
- list = &arc_mfu->arcs_list[ARC_BUFC_METADATA];
- *lock = &arc_mfu->arcs_mtx;
- break;
- case 1:
- list = &arc_mru->arcs_list[ARC_BUFC_METADATA];
- *lock = &arc_mru->arcs_mtx;
- break;
- case 2:
- list = &arc_mfu->arcs_list[ARC_BUFC_DATA];
- *lock = &arc_mfu->arcs_mtx;
- break;
- case 3:
- list = &arc_mru->arcs_list[ARC_BUFC_DATA];
- *lock = &arc_mru->arcs_mtx;
- break;
+ if (list_num < ARC_BUFC_NUMMETADATALISTS) {
+ idx = list_num;
+ list = &arc_mfu->arcs_lists[idx];
+ *lock = ARCS_LOCK(arc_mfu, idx);
+ } else if (list_num < ARC_BUFC_NUMMETADATALISTS * 2) {
+ idx = list_num - ARC_BUFC_NUMMETADATALISTS;
+ list = &arc_mru->arcs_lists[idx];
+ *lock = ARCS_LOCK(arc_mru, idx);
+ } else if (list_num < (ARC_BUFC_NUMMETADATALISTS * 2 +
+ ARC_BUFC_NUMDATALISTS)) {
+ idx = list_num - ARC_BUFC_NUMMETADATALISTS;
+ list = &arc_mfu->arcs_lists[idx];
+ *lock = ARCS_LOCK(arc_mfu, idx);
+ } else {
+ idx = list_num - ARC_BUFC_NUMLISTS;
+ list = &arc_mru->arcs_lists[idx];
+ *lock = ARCS_LOCK(arc_mru, idx);
}
ASSERT(!(MUTEX_HELD(*lock)));
@@ -4210,13 +4432,15 @@ l2arc_write_buffers(spa_t *spa, l2arc_de
head = kmem_cache_alloc(hdr_cache, KM_PUSHPAGE);
head->b_flags |= ARC_L2_WRITE_HEAD;
+ ARCSTAT_BUMP(arcstat_l2_write_buffer_iter);
/*
* Copy buffers for L2ARC writing.
*/
mutex_enter(&l2arc_buflist_mtx);
- for (try = 0; try <= 3; try++) {
+ for (try = 0; try < 2 * ARC_BUFC_NUMLISTS; try++) {
list = l2arc_list_locked(try, &list_lock);
passed_sz = 0;
+ ARCSTAT_BUMP(arcstat_l2_write_buffer_list_iter);
/*
* L2ARC fast warmup.
@@ -4229,52 +4453,65 @@ l2arc_write_buffers(spa_t *spa, l2arc_de
ab = list_head(list);
else
ab = list_tail(list);
+ if (ab == NULL)
+ ARCSTAT_BUMP(arcstat_l2_write_buffer_list_null_iter);
for (; ab; ab = ab_prev) {
if (arc_warm == B_FALSE)
ab_prev = list_next(list, ab);
else
ab_prev = list_prev(list, ab);
+ ARCSTAT_INCR(arcstat_l2_write_buffer_bytes_scanned, ab->b_size);
hash_lock = HDR_LOCK(ab);
have_lock = MUTEX_HELD(hash_lock);
if (!have_lock && !mutex_tryenter(hash_lock)) {
+ ARCSTAT_BUMP(arcstat_l2_write_trylock_fail);
/*
* Skip this buffer rather than waiting.
*/
continue;
}
+ if (ab->b_l2hdr != NULL) {
+ /*
+ * Already in L2ARC.
+ */
+ mutex_exit(hash_lock);
+ ARCSTAT_BUMP(arcstat_l2_write_in_l2);
+ continue;
+ }
+
passed_sz += ab->b_size;
if (passed_sz > headroom) {
/*
* Searched too far.
*/
mutex_exit(hash_lock);
+ ARCSTAT_BUMP(arcstat_l2_write_passed_headroom);
break;
}
if (ab->b_spa != spa) {
mutex_exit(hash_lock);
+ ARCSTAT_BUMP(arcstat_l2_write_spa_mismatch);
continue;
}
- if (ab->b_l2hdr != NULL) {
- /*
- * Already in L2ARC.
- */
+ if (HDR_IO_IN_PROGRESS(ab)) {
mutex_exit(hash_lock);
+ ARCSTAT_BUMP(arcstat_l2_write_hdr_io_in_progress);
continue;
}
-
- if (HDR_IO_IN_PROGRESS(ab) || !HDR_L2CACHE(ab)) {
+ if (!HDR_L2CACHE(ab)) {
mutex_exit(hash_lock);
+ ARCSTAT_BUMP(arcstat_l2_write_not_cacheable);
continue;
}
-
if ((write_sz + ab->b_size) > target_sz) {
full = B_TRUE;
mutex_exit(hash_lock);
+ ARCSTAT_BUMP(arcstat_l2_write_full);
break;
}
@@ -4298,8 +4535,10 @@ l2arc_write_buffers(spa_t *spa, l2arc_de
cb->l2wcb_head = head;
pio = zio_root(spa, l2arc_write_done, cb,
ZIO_FLAG_CANFAIL);
+ ARCSTAT_BUMP(arcstat_l2_write_pios);
}
+ ARCSTAT_INCR(arcstat_l2_write_bytes_written, ab->b_size);
/*
* Create and add a new L2ARC header.
*/
@@ -4395,7 +4634,7 @@ l2arc_feed_thread(void *dummy __unused)
*/
CALLB_CPR_SAFE_BEGIN(&cpr);
(void) cv_timedwait(&l2arc_feed_thr_cv, &l2arc_feed_thr_lock,
- hz * l2arc_feed_secs);
+ hz * l2arc_feed_secs >> l2arc_feed_secs_shift);
CALLB_CPR_SAFE_END(&cpr, &l2arc_feed_thr_lock);
/*
Modified: stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c
==============================================================================
--- stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c Sun Apr 18 21:29:28 2010 (r206814)
+++ stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c Sun Apr 18 21:36:34 2010 (r206815)
@@ -2210,9 +2210,6 @@ dbuf_write_ready(zio_t *zio, arc_buf_t *
for (i = db->db.db_size >> SPA_BLKPTRSHIFT; i > 0; i--, ibp++) {
if (BP_IS_HOLE(ibp))
continue;
- ASSERT3U(BP_GET_LSIZE(ibp), ==,
- db->db_level == 1 ? dn->dn_datablksz :
- (1<<dn->dn_phys->dn_indblkshift));
fill += ibp->blk_fill;
}
}
Modified: stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_zfetch.c
==============================================================================
--- stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_zfetch.c Sun Apr 18 21:29:28 2010 (r206814)
+++ stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_zfetch.c Sun Apr 18 21:36:34 2010 (r206815)
@@ -49,11 +49,11 @@ uint32_t zfetch_block_cap = 256;
uint64_t zfetch_array_rd_sz = 1024 * 1024;
SYSCTL_DECL(_vfs_zfs);
-SYSCTL_INT(_vfs_zfs, OID_AUTO, prefetch_disable, CTLFLAG_RDTUN,
+SYSCTL_INT(_vfs_zfs, OID_AUTO, prefetch_disable, CTLFLAG_RW,
&zfs_prefetch_disable, 0, "Disable prefetch");
SYSCTL_NODE(_vfs_zfs, OID_AUTO, zfetch, CTLFLAG_RW, 0, "ZFS ZFETCH");
TUNABLE_INT("vfs.zfs.zfetch.max_streams", &zfetch_max_streams);
-SYSCTL_UINT(_vfs_zfs_zfetch, OID_AUTO, max_streams, CTLFLAG_RDTUN,
+SYSCTL_UINT(_vfs_zfs_zfetch, OID_AUTO, max_streams, CTLFLAG_RW,
&zfetch_max_streams, 0, "Max # of streams per zfetch");
TUNABLE_INT("vfs.zfs.zfetch.min_sec_reap", &zfetch_min_sec_reap);
SYSCTL_UINT(_vfs_zfs_zfetch, OID_AUTO, min_sec_reap, CTLFLAG_RDTUN,
@@ -338,8 +338,10 @@ top:
*** DIFF OUTPUT TRUNCATED AT 1000 LINES ***