aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/staging
diff options
context:
space:
mode:
authorDan Magenheimer <dan.magenheimer@oracle.com>2011-07-07 07:37:19 -0700
committerSimon Shields <keepcalm444@gmail.com>2016-06-12 21:19:30 +1000
commitda55e1b8ab1b32978beaaffe9c18e367da161966 (patch)
tree6c4f9078015c071401b7cc50806f9afe2a40a85b /drivers/staging
parentfc7e733d52309daa17ffed203f3e177fcc2ea255 (diff)
downloadkernel_samsung_smdk4412-da55e1b8ab1b32978beaaffe9c18e367da161966.tar.gz
kernel_samsung_smdk4412-da55e1b8ab1b32978beaaffe9c18e367da161966.tar.bz2
kernel_samsung_smdk4412-da55e1b8ab1b32978beaaffe9c18e367da161966.zip
staging: zcache: support multiple clients, prep for KVM and RAMster
This is version 3 of an update to zcache, incorporating feedback from the list. This patch adds support to the in-kernel transcendent memory ("tmem") code and the zcache driver for multiple clients, which will be needed for both RAMster and KVM support. It also adds additional tmem callbacks to support RAMster and corresponding no-op stubs in the zcache driver. In v2, I've also taken the liberty of adding some additional sysfs variables to both surface information and allow policy control. Those experimenting with zcache should find them useful. V3 clarifies some code walking and declaring arrays. Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com> [v3: error27@gmail.com: fix array bounds/walking] [v2: konrad.wilk@oracle.com: fix bools, add check for NULL, fix a comment] [v2: sjenning@linux.vnet.ibm.com: add info/tunables for poor compression] [v2: marcusklemm@googlemail.com: add tunable for max persistent pages] Acked-by: Dan Carpenter <error27@gmail.com> Cc: Nitin Gupta <ngupta@vflare.org> Cc: linux-mm@kvack.org Cc: kvm@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> staging: fix zcache building zcache is only building tmem.c and not building zcache.c. To keep the module name, zcache.c must be renamed if symbols from tmem.c are to remain unexported. Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com> Acked-by: Dan Magenheimer <dan.magenheimer@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> staging: zcache: module is GPL This avoids tainting the kernel as if a proprietary module was loaded. The kernel will still be tainted because this is a staging driver. Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> staging: zcache: include module.h for MODULE_LICENSE The oncoming cleanup of module.h usage requires the explicit inclusion of module.h when it was otherwise being included indirectly. Otherwise, building zcache will fail. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> zcache: Use div_u64 for 64-bit division xv_get_total_size_bytes returns a u64 value and it's used in a division. This causes build failures in 32-bit architectures, as reported by Randy Dunlap. Reported-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Dan Magenheimer <dan.magenheimer@oracle.com> Cc: Nitin Gupta <ngupta@vflare.org> Acked-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> zcache: Fix build error when sysfs is not defined Signed-off-by: Nitin Gupta <ngupta@vflare.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> staging: zcache: fix possible sleep under lock zcache_new_pool() calls kmalloc() with GFP_KERNEL which has __GFP_WAIT set. However, zcache_new_pool() gets called on a stack that holds the swap_lock spinlock, leading to a possible sleep-with-lock situation. The lock is obtained in enable_swap_info(). The patch replaces GFP_KERNEL with GFP_ATOMIC. v2: replace with GFP_ATOMIC, not GFP_IOFS Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com> Acked-by: Dan Magenheimer <dan.magenheimer@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> staging: zcache: fix typos The patch fixes two typos in zcache-main.c Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com> Acked-by: Dan Magenheimer <dan.magenheimer@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> staging: zcache: fix crash on high memory swap zcache_put_page() was modified to pass page_address(page) instead of the actual page structure. In combination with the function signature changes to tmem_put() and zcache_pampd_create(), zcache_pampd_create() tries to (re)derive the page structure from the virtual address. However, if the original page is a high memory page (or any unmapped page), this virt_to_page() fails because the page_address() in zcache_put_page() returned NULL. This patch changes zcache_put_page() and zcache_get_page() to pass the page structure instead of the page's virtual address, which may or may not exist. Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com> Acked-by: Dan Magenheimer <dan.magenheimer@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Staging: zcache: signedness bug in tmem_get() "ret" needs to be signed for the error handling to work properly. Signed-off-by: Dan Carpenter <error27@gmail.com> Acked-by: Dan Magenheimer <dan.magenheimer@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> staging: zcache: fix cleancache crash After commit c5f5c4db3938 ("staging: zcache: fix crash on high memory swap") cleancache crashes on the first successful get. This was caused by a remaining virt_to_page() call in zcache_pampd_get_data_and_free() that only gets run in the cleancache path. The patch converts the virt_to_page() to struct page casting like was done for other instances in c5f5c4db3938. Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com> Tested-By: Valdis Kletnieks <valdis.kletnieks@vt.edu> Acked-by: Dan Magenheimer <dan.magenheimer@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> staging: zcache: fix crash on cpu remove In the case that a cpu is taken offline before zcache_do_preload() is ever called on the cpu, the per-cpu zcache_preloads structure will be uninitialized. In the CPU_DEAD case for zcache_cpu_notifier(), kp->obj is not checked before calling kmem_cache_free() on it. If it is NULL, a crash results. This patch ensures that both kp->obj and kp->page are not NULL before calling the respective free functions. In practice, just checking one or the other should be sufficient since they are assigned together in zcache_do_preload(), but I check both for safety. Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com> Acked-by: Dave Hansen <dave@linux.vnet.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> staging: zcache: reduce tmem bucket lock contention tmem uses hash buckets each with their own rbtree and lock to quickly lookup tmem objects. tmem has TMEM_HASH_BUCKETS (256) buckets per pool. However, because of the way the tmem_oid is generated for frontswap pages, only 16 unique tmem_oids are being generated, resulting in only 16 of the 256 buckets being used. This cause high lock contention for the per bucket locks. This patch changes SWIZ_BITS to include more bits of the offset. The result is that all 256 hash buckets are potentially used resulting in a 95% drop in hash bucket lock contention. Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com> Acked-by: Dan Magenheimer <dan.magenheimer@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> staging: zcache: remove zcache_direct_reclaim_lock zcache_do_preload() currently does a spin_trylock() on the zcache_direct_reclaim_lock. Holding this lock intends to prevent shrink_zcache_memory() from evicting zbud pages as a result of a preload. However, it also prevents two threads from executing zcache_do_preload() at the same time. The first thread will obtain the lock and the second thread's spin_trylock() will fail (an aborted preload) causing the page to be either lost (cleancache) or pushed out to the swap device (frontswap). It also doesn't ensure that the call to shrink_zcache_memory() is on the same thread as the call to zcache_do_preload(). Additional, there is no need for this mechanism because all zcache_do_preload() calls that come down from cleancache already have PF_MEMALLOC set in the process flags which prevents direct reclaim in the memory manager. If the zcache_do_preload() call is done from the frontswap path, we _want_ reclaim to be done (which it isn't right now). This patch removes the zcache_direct_reclaim_lock and related statistics in zcache. Based on v3.1-rc8 Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com> Reviewed-by: Dave Hansen <dave@linux.vnet.ibm.com> Acked-by: Dan Magenheimer <dan.magenheimer@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Staging: zcache: Fix calls to obsolete function Function "strict_strtol" replaced by "kstrtol" as suggested by the checkpatch script Signed-off-by: Bernhard Heinloth <bernhard@heinloth.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> zcache: fix deadlock condition I discovered this deadlock condition awhile ago working on RAMster but it affects zcache as well. The list spinlock must be locked prior to the page spinlock and released after. As a result, the page copy must also be done while the locks are held. Applies to 3.2. Konrad, please push (via GregKH?)... this is definitely a bug fix so need not be pushed during a -rc0 window. Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> zcache: Set SWIZ_BITS to 8 to reduce tmem bucket lock contention. SWIZ_BITS > 8 results in a much larger number of "tmem_obj" allocations, likely one per page-placed-in-frontswap. The tmem_obj is not huge (roughly 100 bytes), but it is large enough to add a not-insignificant memory overhead to zcache. The SWIZ_BITS=8 will get roughly the same lock contention without the space wastage. The effect of SWIZ_BITS can be thought of as "2^SWIZ_BITS is the number of unique oids that be generated" (This concept is limited to frontswap's use of tmem). Acked-by: Seth Jennings <sjenning@linux.vnet.ibm.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> staging: zcache: fix serialization bug in zv stats In a multithreaded workload, the zv_curr_dist_counts and zv_cumul_dist_counts statistics are being corrupted because the increments and decrements in zv_create and zv_free are not atomic. This patch converts these statistics and their corresponding increments/decrements/reads to atomic operations. Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com> Acked-by: Dan Magenheimer <dan.magenheimer@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> staging: zcache: crypto API support This patch allow zcache to use the crypto API for page compression. It replaces the direct LZO compress/decompress calls with calls into the crypto compression API. The compressor to be used is specified in the kernel boot line with the zcache parameter like: zcache=lzo or zcache=deflate. If the specified compressor can't be loaded, zcache uses lzo as the default compressor. Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com> Acked-by: Dan Magenheimer <dan.magenheimer@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> staging: zcache: avoid AB-BA deadlock condition Commit 9256a47 fixed a deadlock condition, being sure that the buddy list spinlock is always taken before the page spinlock. However in zbud_free_and_delist() locking order is the opposite (page lock -> list lock). Possible unsafe locking scenario (reported by lockdep): CPU0 CPU1 ---- ---- lock(&(&zbpg->lock)->rlock); lock(zbud_budlists_spinlock); lock(&(&zbpg->lock)->rlock); lock(zbud_budlists_spinlock); Fix by grabbing the locks in opposite order in zbud_free_and_delist(). Signed-off-by: Andrea Righi <andrea@betterlinux.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Diffstat (limited to 'drivers/staging')
-rw-r--r--drivers/staging/zcache/Kconfig7
-rw-r--r--drivers/staging/zcache/Makefile2
-rw-r--r--drivers/staging/zcache/tmem.c102
-rw-r--r--drivers/staging/zcache/tmem.h23
-rw-r--r--drivers/staging/zcache/zcache-main.c (renamed from drivers/staging/zcache/zcache.c)738
5 files changed, 682 insertions, 190 deletions
diff --git a/drivers/staging/zcache/Kconfig b/drivers/staging/zcache/Kconfig
index 7fabcb2bc80..1b7bba7b1a3 100644
--- a/drivers/staging/zcache/Kconfig
+++ b/drivers/staging/zcache/Kconfig
@@ -1,13 +1,12 @@
config ZCACHE
tristate "Dynamic compression of swap pages and clean pagecache pages"
- depends on CLEANCACHE || FRONTSWAP
+ depends on (CLEANCACHE || FRONTSWAP) && CRYPTO
select XVMALLOC
- select LZO_COMPRESS
- select LZO_DECOMPRESS
+ select CRYPTO_LZO
default n
help
Zcache doubles RAM efficiency while providing a significant
- performance boosts on many workloads. Zcache uses lzo1x
+ performance boosts on many workloads. Zcache uses
compression and an in-kernel implementation of transcendent
memory to store clean page cache pages and swap in RAM,
providing a noticeable reduction in disk I/O.
diff --git a/drivers/staging/zcache/Makefile b/drivers/staging/zcache/Makefile
index f5ec64f9447..60daa272c20 100644
--- a/drivers/staging/zcache/Makefile
+++ b/drivers/staging/zcache/Makefile
@@ -1,3 +1,3 @@
-zcache-y := tmem.o
+zcache-y := zcache-main.o tmem.o
obj-$(CONFIG_ZCACHE) += zcache.o
diff --git a/drivers/staging/zcache/tmem.c b/drivers/staging/zcache/tmem.c
index e954d405b13..1ca66ea9b28 100644
--- a/drivers/staging/zcache/tmem.c
+++ b/drivers/staging/zcache/tmem.c
@@ -142,6 +142,7 @@ static void tmem_obj_init(struct tmem_obj *obj, struct tmem_hashbucket *hb,
obj->oid = *oidp;
obj->objnode_count = 0;
obj->pampd_count = 0;
+ (*tmem_pamops.new_obj)(obj);
SET_SENTINEL(obj, OBJ);
while (*new) {
BUG_ON(RB_EMPTY_NODE(*new));
@@ -274,7 +275,7 @@ static void tmem_objnode_free(struct tmem_objnode *objnode)
/*
* lookup index in object and return associated pampd (or NULL if not found)
*/
-static void *tmem_pampd_lookup_in_obj(struct tmem_obj *obj, uint32_t index)
+static void **__tmem_pampd_lookup_in_obj(struct tmem_obj *obj, uint32_t index)
{
unsigned int height, shift;
struct tmem_objnode **slot = NULL;
@@ -303,9 +304,33 @@ static void *tmem_pampd_lookup_in_obj(struct tmem_obj *obj, uint32_t index)
height--;
}
out:
+ return slot != NULL ? (void **)slot : NULL;
+}
+
+static void *tmem_pampd_lookup_in_obj(struct tmem_obj *obj, uint32_t index)
+{
+ struct tmem_objnode **slot;
+
+ slot = (struct tmem_objnode **)__tmem_pampd_lookup_in_obj(obj, index);
return slot != NULL ? *slot : NULL;
}
+static void *tmem_pampd_replace_in_obj(struct tmem_obj *obj, uint32_t index,
+ void *new_pampd)
+{
+ struct tmem_objnode **slot;
+ void *ret = NULL;
+
+ slot = (struct tmem_objnode **)__tmem_pampd_lookup_in_obj(obj, index);
+ if ((slot != NULL) && (*slot != NULL)) {
+ void *old_pampd = *(void **)slot;
+ *(void **)slot = new_pampd;
+ (*tmem_pamops.free)(old_pampd, obj->pool, NULL, 0);
+ ret = new_pampd;
+ }
+ return ret;
+}
+
static int tmem_pampd_add_to_obj(struct tmem_obj *obj, uint32_t index,
void *pampd)
{
@@ -456,7 +481,7 @@ static void tmem_objnode_node_destroy(struct tmem_obj *obj,
if (ht == 1) {
obj->pampd_count--;
(*tmem_pamops.free)(objnode->slots[i],
- obj->pool);
+ obj->pool, NULL, 0);
objnode->slots[i] = NULL;
continue;
}
@@ -473,7 +498,7 @@ static void tmem_pampd_destroy_all_in_obj(struct tmem_obj *obj)
return;
if (obj->objnode_tree_height == 0) {
obj->pampd_count--;
- (*tmem_pamops.free)(obj->objnode_tree_root, obj->pool);
+ (*tmem_pamops.free)(obj->objnode_tree_root, obj->pool, NULL, 0);
} else {
tmem_objnode_node_destroy(obj, obj->objnode_tree_root,
obj->objnode_tree_height);
@@ -481,6 +506,7 @@ static void tmem_pampd_destroy_all_in_obj(struct tmem_obj *obj)
obj->objnode_tree_height = 0;
}
obj->objnode_tree_root = NULL;
+ (*tmem_pamops.free_obj)(obj->pool, obj);
}
/*
@@ -503,15 +529,13 @@ static void tmem_pampd_destroy_all_in_obj(struct tmem_obj *obj)
* always flushes for simplicity.
*/
int tmem_put(struct tmem_pool *pool, struct tmem_oid *oidp, uint32_t index,
- struct page *page)
+ char *data, size_t size, bool raw, bool ephemeral)
{
struct tmem_obj *obj = NULL, *objfound = NULL, *objnew = NULL;
void *pampd = NULL, *pampd_del = NULL;
int ret = -ENOMEM;
- bool ephemeral;
struct tmem_hashbucket *hb;
- ephemeral = is_ephemeral(pool);
hb = &pool->hashbucket[tmem_oid_hash(oidp)];
spin_lock(&hb->lock);
obj = objfound = tmem_obj_find(hb, oidp);
@@ -521,7 +545,7 @@ int tmem_put(struct tmem_pool *pool, struct tmem_oid *oidp, uint32_t index,
/* if found, is a dup put, flush the old one */
pampd_del = tmem_pampd_delete_from_obj(obj, index);
BUG_ON(pampd_del != pampd);
- (*tmem_pamops.free)(pampd, pool);
+ (*tmem_pamops.free)(pampd, pool, oidp, index);
if (obj->pampd_count == 0) {
objnew = obj;
objfound = NULL;
@@ -538,7 +562,8 @@ int tmem_put(struct tmem_pool *pool, struct tmem_oid *oidp, uint32_t index,
}
BUG_ON(obj == NULL);
BUG_ON(((objnew != obj) && (objfound != obj)) || (objnew == objfound));
- pampd = (*tmem_pamops.create)(obj->pool, &obj->oid, index, page);
+ pampd = (*tmem_pamops.create)(data, size, raw, ephemeral,
+ obj->pool, &obj->oid, index);
if (unlikely(pampd == NULL))
goto free;
ret = tmem_pampd_add_to_obj(obj, index, pampd);
@@ -551,7 +576,7 @@ delete_and_free:
(void)tmem_pampd_delete_from_obj(obj, index);
free:
if (pampd)
- (*tmem_pamops.free)(pampd, pool);
+ (*tmem_pamops.free)(pampd, pool, NULL, 0);
if (objnew) {
tmem_obj_free(objnew, hb);
(*tmem_hostops.obj_free)(objnew, pool);
@@ -573,41 +598,52 @@ out:
* "put" done with the same handle).
*/
-int tmem_get(struct tmem_pool *pool, struct tmem_oid *oidp,
- uint32_t index, struct page *page)
+int tmem_get(struct tmem_pool *pool, struct tmem_oid *oidp, uint32_t index,
+ char *data, size_t *size, bool raw, int get_and_free)
{
struct tmem_obj *obj;
void *pampd;
bool ephemeral = is_ephemeral(pool);
- uint32_t ret = -1;
+ int ret = -1;
struct tmem_hashbucket *hb;
+ bool free = (get_and_free == 1) || ((get_and_free == 0) && ephemeral);
+ bool lock_held = false;
hb = &pool->hashbucket[tmem_oid_hash(oidp)];
spin_lock(&hb->lock);
+ lock_held = true;
obj = tmem_obj_find(hb, oidp);
if (obj == NULL)
goto out;
- ephemeral = is_ephemeral(pool);
- if (ephemeral)
+ if (free)
pampd = tmem_pampd_delete_from_obj(obj, index);
else
pampd = tmem_pampd_lookup_in_obj(obj, index);
if (pampd == NULL)
goto out;
- ret = (*tmem_pamops.get_data)(page, pampd, pool);
- if (ret < 0)
- goto out;
- if (ephemeral) {
- (*tmem_pamops.free)(pampd, pool);
+ if (free) {
if (obj->pampd_count == 0) {
tmem_obj_free(obj, hb);
(*tmem_hostops.obj_free)(obj, pool);
obj = NULL;
}
}
+ if (tmem_pamops.is_remote(pampd)) {
+ lock_held = false;
+ spin_unlock(&hb->lock);
+ }
+ if (free)
+ ret = (*tmem_pamops.get_data_and_free)(
+ data, size, raw, pampd, pool, oidp, index);
+ else
+ ret = (*tmem_pamops.get_data)(
+ data, size, raw, pampd, pool, oidp, index);
+ if (ret < 0)
+ goto out;
ret = 0;
out:
- spin_unlock(&hb->lock);
+ if (lock_held)
+ spin_unlock(&hb->lock);
return ret;
}
@@ -632,7 +668,7 @@ int tmem_flush_page(struct tmem_pool *pool,
pampd = tmem_pampd_delete_from_obj(obj, index);
if (pampd == NULL)
goto out;
- (*tmem_pamops.free)(pampd, pool);
+ (*tmem_pamops.free)(pampd, pool, oidp, index);
if (obj->pampd_count == 0) {
tmem_obj_free(obj, hb);
(*tmem_hostops.obj_free)(obj, pool);
@@ -645,6 +681,30 @@ out:
}
/*
+ * If a page in tmem matches the handle, replace the page so that any
+ * subsequent "get" gets the new page. Returns 0 if
+ * there was a page to replace, else returns -1.
+ */
+int tmem_replace(struct tmem_pool *pool, struct tmem_oid *oidp,
+ uint32_t index, void *new_pampd)
+{
+ struct tmem_obj *obj;
+ int ret = -1;
+ struct tmem_hashbucket *hb;
+
+ hb = &pool->hashbucket[tmem_oid_hash(oidp)];
+ spin_lock(&hb->lock);
+ obj = tmem_obj_find(hb, oidp);
+ if (obj == NULL)
+ goto out;
+ new_pampd = tmem_pampd_replace_in_obj(obj, index, new_pampd);
+ ret = (*tmem_pamops.replace_in_obj)(new_pampd, obj);
+out:
+ spin_unlock(&hb->lock);
+ return ret;
+}
+
+/*
* "Flush" all pages in tmem matching this oid.
*/
int tmem_flush_object(struct tmem_pool *pool, struct tmem_oid *oidp)
diff --git a/drivers/staging/zcache/tmem.h b/drivers/staging/zcache/tmem.h
index 2e07e217d51..ed147c4b110 100644
--- a/drivers/staging/zcache/tmem.h
+++ b/drivers/staging/zcache/tmem.h
@@ -147,6 +147,7 @@ struct tmem_obj {
unsigned int objnode_tree_height;
unsigned long objnode_count;
long pampd_count;
+ void *extra; /* for private use by pampd implementation */
DECL_SENTINEL
};
@@ -166,10 +167,18 @@ struct tmem_objnode {
/* pampd abstract datatype methods provided by the PAM implementation */
struct tmem_pamops {
- void *(*create)(struct tmem_pool *, struct tmem_oid *, uint32_t,
- struct page *);
- int (*get_data)(struct page *, void *, struct tmem_pool *);
- void (*free)(void *, struct tmem_pool *);
+ void *(*create)(char *, size_t, bool, int,
+ struct tmem_pool *, struct tmem_oid *, uint32_t);
+ int (*get_data)(char *, size_t *, bool, void *, struct tmem_pool *,
+ struct tmem_oid *, uint32_t);
+ int (*get_data_and_free)(char *, size_t *, bool, void *,
+ struct tmem_pool *, struct tmem_oid *,
+ uint32_t);
+ void (*free)(void *, struct tmem_pool *, struct tmem_oid *, uint32_t);
+ void (*free_obj)(struct tmem_pool *, struct tmem_obj *);
+ bool (*is_remote)(void *);
+ void (*new_obj)(struct tmem_obj *);
+ int (*replace_in_obj)(void *, struct tmem_obj *);
};
extern void tmem_register_pamops(struct tmem_pamops *m);
@@ -184,9 +193,11 @@ extern void tmem_register_hostops(struct tmem_hostops *m);
/* core tmem accessor functions */
extern int tmem_put(struct tmem_pool *, struct tmem_oid *, uint32_t index,
- struct page *page);
+ char *, size_t, bool, bool);
extern int tmem_get(struct tmem_pool *, struct tmem_oid *, uint32_t index,
- struct page *page);
+ char *, size_t *, bool, int);
+extern int tmem_replace(struct tmem_pool *, struct tmem_oid *, uint32_t index,
+ void *);
extern int tmem_flush_page(struct tmem_pool *, struct tmem_oid *,
uint32_t index);
extern int tmem_flush_object(struct tmem_pool *, struct tmem_oid *);
diff --git a/drivers/staging/zcache/zcache.c b/drivers/staging/zcache/zcache-main.c
index 77ac2d4d3ef..276940d11ba 100644
--- a/drivers/staging/zcache/zcache.c
+++ b/drivers/staging/zcache/zcache-main.c
@@ -6,7 +6,8 @@
*
* Zcache provides an in-kernel "host implementation" for transcendent memory
* and, thus indirectly, for cleancache and frontswap. Zcache includes two
- * page-accessible memory [1] interfaces, both utilizing lzo1x compression:
+ * page-accessible memory [1] interfaces, both utilizing the crypto compression
+ * API:
* 1) "compression buddies" ("zbud") is used for ephemeral pages
* 2) xvmalloc is used for persistent pages.
* Xvmalloc (based on the TLSF allocator) has very low fragmentation
@@ -19,14 +20,17 @@
* http://marc.info/?l=linux-mm&m=127811271605009
*/
+#include <linux/module.h>
#include <linux/cpu.h>
#include <linux/highmem.h>
#include <linux/list.h>
-#include <linux/lzo.h>
#include <linux/slab.h>
#include <linux/spinlock.h>
#include <linux/types.h>
#include <linux/atomic.h>
+#include <linux/math64.h>
+#include <linux/crypto.h>
+#include <linux/string.h>
#include "tmem.h"
#include "../zram/xvmalloc.h" /* if built in drivers/staging */
@@ -49,6 +53,68 @@
(__GFP_FS | __GFP_NORETRY | __GFP_NOWARN | __GFP_NOMEMALLOC)
#endif
+#define MAX_POOLS_PER_CLIENT 16
+
+#define MAX_CLIENTS 16
+#define LOCAL_CLIENT ((uint16_t)-1)
+
+MODULE_LICENSE("GPL");
+
+struct zcache_client {
+ struct tmem_pool *tmem_pools[MAX_POOLS_PER_CLIENT];
+ struct xv_pool *xvpool;
+ bool allocated;
+ atomic_t refcount;
+};
+
+static struct zcache_client zcache_host;
+static struct zcache_client zcache_clients[MAX_CLIENTS];
+
+static inline uint16_t get_client_id_from_client(struct zcache_client *cli)
+{
+ BUG_ON(cli == NULL);
+ if (cli == &zcache_host)
+ return LOCAL_CLIENT;
+ return cli - &zcache_clients[0];
+}
+
+static inline bool is_local_client(struct zcache_client *cli)
+{
+ return cli == &zcache_host;
+}
+
+/* crypto API for zcache */
+#define ZCACHE_COMP_NAME_SZ CRYPTO_MAX_ALG_NAME
+static char zcache_comp_name[ZCACHE_COMP_NAME_SZ];
+static struct crypto_comp * __percpu *zcache_comp_pcpu_tfms;
+
+enum comp_op {
+ ZCACHE_COMPOP_COMPRESS,
+ ZCACHE_COMPOP_DECOMPRESS
+};
+
+static inline int zcache_comp_op(enum comp_op op,
+ const u8 *src, unsigned int slen,
+ u8 *dst, unsigned int *dlen)
+{
+ struct crypto_comp *tfm;
+ int ret;
+
+ BUG_ON(!zcache_comp_pcpu_tfms);
+ tfm = *per_cpu_ptr(zcache_comp_pcpu_tfms, get_cpu());
+ BUG_ON(!tfm);
+ switch (op) {
+ case ZCACHE_COMPOP_COMPRESS:
+ ret = crypto_comp_compress(tfm, src, slen, dst, dlen);
+ break;
+ case ZCACHE_COMPOP_DECOMPRESS:
+ ret = crypto_comp_decompress(tfm, src, slen, dst, dlen);
+ break;
+ }
+ put_cpu();
+ return ret;
+}
+
/**********
* Compression buddies ("zbud") provides for packing two (or, possibly
* in the future, more) compressed ephemeral pages into a single "raw"
@@ -72,7 +138,8 @@
#define ZBUD_MAX_BUDS 2
struct zbud_hdr {
- uint32_t pool_id;
+ uint16_t client_id;
+ uint16_t pool_id;
struct tmem_oid oid;
uint32_t index;
uint16_t size; /* compressed size in bytes, zero means unused */
@@ -120,6 +187,7 @@ static unsigned long zcache_zbud_curr_zbytes;
static unsigned long zcache_zbud_cumul_zpages;
static unsigned long zcache_zbud_cumul_zbytes;
static unsigned long zcache_compress_poor;
+static unsigned long zcache_mean_compress_poor;
/* forward references */
static void *zcache_get_free_page(void);
@@ -265,10 +333,12 @@ static void zbud_free_and_delist(struct zbud_hdr *zh)
struct zbud_page *zbpg =
container_of(zh, struct zbud_page, buddy[budnum]);
+ spin_lock(&zbud_budlists_spinlock);
spin_lock(&zbpg->lock);
if (list_empty(&zbpg->bud_list)) {
/* ignore zombie page... see zbud_evict_pages() */
spin_unlock(&zbpg->lock);
+ spin_unlock(&zbud_budlists_spinlock);
return;
}
size = zbud_free(zh);
@@ -276,7 +346,6 @@ static void zbud_free_and_delist(struct zbud_hdr *zh)
zh_other = &zbpg->buddy[(budnum == 0) ? 1 : 0];
if (zh_other->size == 0) { /* was unbuddied: unlist and free */
chunks = zbud_size_to_chunks(size) ;
- spin_lock(&zbud_budlists_spinlock);
BUG_ON(list_empty(&zbud_unbuddied[chunks].list));
list_del_init(&zbpg->bud_list);
zbud_unbuddied[chunks].count--;
@@ -284,7 +353,6 @@ static void zbud_free_and_delist(struct zbud_hdr *zh)
zbud_free_raw_page(zbpg);
} else { /* was buddied: move remaining buddy to unbuddied list */
chunks = zbud_size_to_chunks(zh_other->size) ;
- spin_lock(&zbud_budlists_spinlock);
list_del_init(&zbpg->bud_list);
zcache_zbud_buddied_count--;
list_add_tail(&zbpg->bud_list, &zbud_unbuddied[chunks].list);
@@ -294,7 +362,8 @@ static void zbud_free_and_delist(struct zbud_hdr *zh)
}
}
-static struct zbud_hdr *zbud_create(uint32_t pool_id, struct tmem_oid *oid,
+static struct zbud_hdr *zbud_create(uint16_t client_id, uint16_t pool_id,
+ struct tmem_oid *oid,
uint32_t index, struct page *page,
void *cdata, unsigned size)
{
@@ -323,8 +392,8 @@ static struct zbud_hdr *zbud_create(uint32_t pool_id, struct tmem_oid *oid,
if (unlikely(zbpg == NULL))
goto out;
/* ok, have a page, now compress the data before taking locks */
- spin_lock(&zbpg->lock);
spin_lock(&zbud_budlists_spinlock);
+ spin_lock(&zbpg->lock);
list_add_tail(&zbpg->bud_list, &zbud_unbuddied[nchunks].list);
zbud_unbuddied[nchunks].count++;
zh = &zbpg->buddy[0];
@@ -353,12 +422,12 @@ init_zh:
zh->index = index;
zh->oid = *oid;
zh->pool_id = pool_id;
- /* can wait to copy the data until the list locks are dropped */
- spin_unlock(&zbud_budlists_spinlock);
-
+ zh->client_id = client_id;
to = zbud_data(zh, size);
memcpy(to, cdata, size);
spin_unlock(&zbpg->lock);
+ spin_unlock(&zbud_budlists_spinlock);
+
zbud_cumul_chunk_counts[nchunks]++;
atomic_inc(&zcache_zbud_curr_zpages);
zcache_zbud_cumul_zpages++;
@@ -372,7 +441,7 @@ static int zbud_decompress(struct page *page, struct zbud_hdr *zh)
{
struct zbud_page *zbpg;
unsigned budnum = zbud_budnum(zh);
- size_t out_len = PAGE_SIZE;
+ unsigned int out_len = PAGE_SIZE;
char *to_va, *from_va;
unsigned size;
int ret = 0;
@@ -389,8 +458,9 @@ static int zbud_decompress(struct page *page, struct zbud_hdr *zh)
to_va = kmap_atomic(page, KM_USER0);
size = zh->size;
from_va = zbud_data(zh, size);
- ret = lzo1x_decompress_safe(from_va, size, to_va, &out_len);
- BUG_ON(ret != LZO_E_OK);
+ ret = zcache_comp_op(ZCACHE_COMPOP_DECOMPRESS, from_va, size,
+ to_va, &out_len);
+ BUG_ON(ret);
BUG_ON(out_len != PAGE_SIZE);
kunmap_atomic(to_va, KM_USER0);
out:
@@ -407,7 +477,8 @@ static unsigned long zcache_evicted_raw_pages;
static unsigned long zcache_evicted_buddied_pages;
static unsigned long zcache_evicted_unbuddied_pages;
-static struct tmem_pool *zcache_get_pool_by_id(uint32_t poolid);
+static struct tmem_pool *zcache_get_pool_by_id(uint16_t cli_id,
+ uint16_t poolid);
static void zcache_put_pool(struct tmem_pool *pool);
/*
@@ -417,7 +488,8 @@ static void zbud_evict_zbpg(struct zbud_page *zbpg)
{
struct zbud_hdr *zh;
int i, j;
- uint32_t pool_id[ZBUD_MAX_BUDS], index[ZBUD_MAX_BUDS];
+ uint32_t pool_id[ZBUD_MAX_BUDS], client_id[ZBUD_MAX_BUDS];
+ uint32_t index[ZBUD_MAX_BUDS];
struct tmem_oid oid[ZBUD_MAX_BUDS];
struct tmem_pool *pool;
@@ -426,6 +498,7 @@ static void zbud_evict_zbpg(struct zbud_page *zbpg)
for (i = 0, j = 0; i < ZBUD_MAX_BUDS; i++) {
zh = &zbpg->buddy[i];
if (zh->size) {
+ client_id[j] = zh->client_id;
pool_id[j] = zh->pool_id;
oid[j] = zh->oid;
index[j] = zh->index;
@@ -435,7 +508,7 @@ static void zbud_evict_zbpg(struct zbud_page *zbpg)
}
spin_unlock(&zbpg->lock);
for (i = 0; i < j; i++) {
- pool = zcache_get_pool_by_id(pool_id[i]);
+ pool = zcache_get_pool_by_id(client_id[i], pool_id[i]);
if (pool != NULL) {
tmem_flush_page(pool, &oid[i], index[i]);
zcache_put_pool(pool);
@@ -552,9 +625,8 @@ static int zbud_show_unbuddied_list_counts(char *buf)
int i;
char *p = buf;
- for (i = 0; i < NCHUNKS - 1; i++)
+ for (i = 0; i < NCHUNKS; i++)
p += sprintf(p, "%u ", zbud_unbuddied[i].count);
- p += sprintf(p, "%d\n", zbud_unbuddied[i].count);
return p - buf;
}
@@ -586,7 +658,7 @@ static int zbud_show_cumul_chunk_counts(char *buf)
/**********
* This "zv" PAM implementation combines the TLSF-based xvMalloc
- * with lzo1x compression to maximize the amount of data that can
+ * with the crypto compression API to maximize the amount of data that can
* be packed into a physical page.
*
* Zv represents a PAM page with the index and object (plus a "size" value
@@ -602,7 +674,23 @@ struct zv_hdr {
DECL_SENTINEL
};
-static const int zv_max_page_size = (PAGE_SIZE / 8) * 7;
+/* rudimentary policy limits */
+/* total number of persistent pages may not exceed this percentage */
+static unsigned int zv_page_count_policy_percent = 75;
+/*
+ * byte count defining poor compression; pages with greater zsize will be
+ * rejected
+ */
+static unsigned int zv_max_zsize = (PAGE_SIZE / 8) * 7;
+/*
+ * byte count defining poor *mean* compression; pages with greater zsize
+ * will be rejected until sufficient better-compressed pages are accepted
+ * driving the mean below this threshold
+ */
+static unsigned int zv_max_mean_zsize = (PAGE_SIZE / 8) * 5;
+
+static atomic_t zv_curr_dist_counts[NCHUNKS];
+static atomic_t zv_cumul_dist_counts[NCHUNKS];
static struct zv_hdr *zv_create(struct xv_pool *xvpool, uint32_t pool_id,
struct tmem_oid *oid, uint32_t index,
@@ -611,13 +699,18 @@ static struct zv_hdr *zv_create(struct xv_pool *xvpool, uint32_t pool_id,
struct page *page;
struct zv_hdr *zv = NULL;
uint32_t offset;
+ int alloc_size = clen + sizeof(struct zv_hdr);
+ int chunks = (alloc_size + (CHUNK_SIZE - 1)) >> CHUNK_SHIFT;
int ret;
BUG_ON(!irqs_disabled());
- ret = xv_malloc(xvpool, clen + sizeof(struct zv_hdr),
+ BUG_ON(chunks >= NCHUNKS);
+ ret = xv_malloc(xvpool, alloc_size,
&page, &offset, ZCACHE_GFP_MASK);
if (unlikely(ret))
goto out;
+ atomic_inc(&zv_curr_dist_counts[chunks]);
+ atomic_inc(&zv_cumul_dist_counts[chunks]);
zv = kmap_atomic(page, KM_USER0) + offset;
zv->index = index;
zv->oid = *oid;
@@ -634,11 +727,14 @@ static void zv_free(struct xv_pool *xvpool, struct zv_hdr *zv)
unsigned long flags;
struct page *page;
uint32_t offset;
- uint16_t size;
+ uint16_t size = xv_get_object_size(zv);
+ int chunks = (size + (CHUNK_SIZE - 1)) >> CHUNK_SHIFT;
ASSERT_SENTINEL(zv, ZVH);
- size = xv_get_object_size(zv) - sizeof(*zv);
- BUG_ON(size == 0 || size > zv_max_page_size);
+ BUG_ON(chunks >= NCHUNKS);
+ atomic_dec(&zv_curr_dist_counts[chunks]);
+ size -= sizeof(*zv);
+ BUG_ON(size == 0);
INVERT_SENTINEL(zv, ZVH);
page = virt_to_page(zv);
offset = (unsigned long)zv & ~PAGE_MASK;
@@ -649,22 +745,175 @@ static void zv_free(struct xv_pool *xvpool, struct zv_hdr *zv)
static void zv_decompress(struct page *page, struct zv_hdr *zv)
{
- size_t clen = PAGE_SIZE;
+ unsigned int clen = PAGE_SIZE;
char *to_va;
unsigned size;
int ret;
ASSERT_SENTINEL(zv, ZVH);
size = xv_get_object_size(zv) - sizeof(*zv);
- BUG_ON(size == 0 || size > zv_max_page_size);
+ BUG_ON(size == 0);
to_va = kmap_atomic(page, KM_USER0);
- ret = lzo1x_decompress_safe((char *)zv + sizeof(*zv),
- size, to_va, &clen);
+ ret = zcache_comp_op(ZCACHE_COMPOP_DECOMPRESS, (char *)zv + sizeof(*zv),
+ size, to_va, &clen);
kunmap_atomic(to_va, KM_USER0);
- BUG_ON(ret != LZO_E_OK);
+ BUG_ON(ret);
BUG_ON(clen != PAGE_SIZE);
}
+#ifdef CONFIG_SYSFS
+/*
+ * show a distribution of compression stats for zv pages.
+ */
+
+static int zv_curr_dist_counts_show(char *buf)
+{
+ unsigned long i, n, chunks = 0, sum_total_chunks = 0;
+ char *p = buf;
+
+ for (i = 0; i < NCHUNKS; i++) {
+ n = atomic_read(&zv_curr_dist_counts[i]);
+ p += sprintf(p, "%lu ", n);
+ chunks += n;
+ sum_total_chunks += i * n;
+ }
+ p += sprintf(p, "mean:%lu\n",
+ chunks == 0 ? 0 : sum_total_chunks / chunks);
+ return p - buf;
+}
+
+static int zv_cumul_dist_counts_show(char *buf)
+{
+ unsigned long i, n, chunks = 0, sum_total_chunks = 0;
+ char *p = buf;
+
+ for (i = 0; i < NCHUNKS; i++) {
+ n = atomic_read(&zv_cumul_dist_counts[i]);
+ p += sprintf(p, "%lu ", n);
+ chunks += n;
+ sum_total_chunks += i * n;
+ }
+ p += sprintf(p, "mean:%lu\n",
+ chunks == 0 ? 0 : sum_total_chunks / chunks);
+ return p - buf;
+}
+
+/*
+ * setting zv_max_zsize via sysfs causes all persistent (e.g. swap)
+ * pages that don't compress to less than this value (including metadata
+ * overhead) to be rejected. We don't allow the value to get too close
+ * to PAGE_SIZE.
+ */
+static ssize_t zv_max_zsize_show(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ char *buf)
+{
+ return sprintf(buf, "%u\n", zv_max_zsize);
+}
+
+static ssize_t zv_max_zsize_store(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ const char *buf, size_t count)
+{
+ unsigned long val;
+ int err;
+
+ if (!capable(CAP_SYS_ADMIN))
+ return -EPERM;
+
+ err = kstrtoul(buf, 10, &val);
+ if (err || (val == 0) || (val > (PAGE_SIZE / 8) * 7))
+ return -EINVAL;
+ zv_max_zsize = val;
+ return count;
+}
+
+/*
+ * setting zv_max_mean_zsize via sysfs causes all persistent (e.g. swap)
+ * pages that don't compress to less than this value (including metadata
+ * overhead) to be rejected UNLESS the mean compression is also smaller
+ * than this value. In other words, we are load-balancing-by-zsize the
+ * accepted pages. Again, we don't allow the value to get too close
+ * to PAGE_SIZE.
+ */
+static ssize_t zv_max_mean_zsize_show(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ char *buf)
+{
+ return sprintf(buf, "%u\n", zv_max_mean_zsize);
+}
+
+static ssize_t zv_max_mean_zsize_store(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ const char *buf, size_t count)
+{
+ unsigned long val;
+ int err;
+
+ if (!capable(CAP_SYS_ADMIN))
+ return -EPERM;
+
+ err = kstrtoul(buf, 10, &val);
+ if (err || (val == 0) || (val > (PAGE_SIZE / 8) * 7))
+ return -EINVAL;
+ zv_max_mean_zsize = val;
+ return count;
+}
+
+/*
+ * setting zv_page_count_policy_percent via sysfs sets an upper bound of
+ * persistent (e.g. swap) pages that will be retained according to:
+ * (zv_page_count_policy_percent * totalram_pages) / 100)
+ * when that limit is reached, further puts will be rejected (until
+ * some pages have been flushed). Note that, due to compression,
+ * this number may exceed 100; it defaults to 75 and we set an
+ * arbitary limit of 150. A poor choice will almost certainly result
+ * in OOM's, so this value should only be changed prudently.
+ */
+static ssize_t zv_page_count_policy_percent_show(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ char *buf)
+{
+ return sprintf(buf, "%u\n", zv_page_count_policy_percent);
+}
+
+static ssize_t zv_page_count_policy_percent_store(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ const char *buf, size_t count)
+{
+ unsigned long val;
+ int err;
+
+ if (!capable(CAP_SYS_ADMIN))
+ return -EPERM;
+
+ err = kstrtoul(buf, 10, &val);
+ if (err || (val == 0) || (val > 150))
+ return -EINVAL;
+ zv_page_count_policy_percent = val;
+ return count;
+}
+
+static struct kobj_attribute zcache_zv_max_zsize_attr = {
+ .attr = { .name = "zv_max_zsize", .mode = 0644 },
+ .show = zv_max_zsize_show,
+ .store = zv_max_zsize_store,
+};
+
+static struct kobj_attribute zcache_zv_max_mean_zsize_attr = {
+ .attr = { .name = "zv_max_mean_zsize", .mode = 0644 },
+ .show = zv_max_mean_zsize_show,
+ .store = zv_max_mean_zsize_store,
+};
+
+static struct kobj_attribute zcache_zv_page_count_policy_percent_attr = {
+ .attr = { .name = "zv_page_count_policy_percent",
+ .mode = 0644 },
+ .show = zv_page_count_policy_percent_show,
+ .store = zv_page_count_policy_percent_store,
+};
+#endif
+
/*
* zcache core code starts here
*/
@@ -677,51 +926,76 @@ static unsigned long zcache_flobj_found;
static unsigned long zcache_failed_eph_puts;
static unsigned long zcache_failed_pers_puts;
-#define MAX_POOLS_PER_CLIENT 16
-
-static struct {
- struct tmem_pool *tmem_pools[MAX_POOLS_PER_CLIENT];
- struct xv_pool *xvpool;
-} zcache_client;
-
/*
* Tmem operations assume the poolid implies the invoking client.
- * Zcache only has one client (the kernel itself), so translate
- * the poolid into the tmem_pool allocated for it. A KVM version
+ * Zcache only has one client (the kernel itself): LOCAL_CLIENT.
+ * RAMster has each client numbered by cluster node, and a KVM version
* of zcache would have one client per guest and each client might
* have a poolid==N.
*/
-static struct tmem_pool *zcache_get_pool_by_id(uint32_t poolid)
+static struct tmem_pool *zcache_get_pool_by_id(uint16_t cli_id, uint16_t poolid)
{
struct tmem_pool *pool = NULL;
+ struct zcache_client *cli = NULL;
- if (poolid >= 0) {
- pool = zcache_client.tmem_pools[poolid];
+ if (cli_id == LOCAL_CLIENT)
+ cli = &zcache_host;
+ else {
+ if (cli_id >= MAX_CLIENTS)
+ goto out;
+ cli = &zcache_clients[cli_id];
+ if (cli == NULL)
+ goto out;
+ atomic_inc(&cli->refcount);
+ }
+ if (poolid < MAX_POOLS_PER_CLIENT) {
+ pool = cli->tmem_pools[poolid];
if (pool != NULL)
atomic_inc(&pool->refcount);
}
+out:
return pool;
}
static void zcache_put_pool(struct tmem_pool *pool)
{
- if (pool != NULL)
- atomic_dec(&pool->refcount);
+ struct zcache_client *cli = NULL;
+
+ if (pool == NULL)
+ BUG();
+ cli = pool->client;
+ atomic_dec(&pool->refcount);
+ atomic_dec(&cli->refcount);
+}
+
+int zcache_new_client(uint16_t cli_id)
+{
+ struct zcache_client *cli = NULL;
+ int ret = -1;
+
+ if (cli_id == LOCAL_CLIENT)
+ cli = &zcache_host;
+ else if ((unsigned int)cli_id < MAX_CLIENTS)
+ cli = &zcache_clients[cli_id];
+ if (cli == NULL)
+ goto out;
+ if (cli->allocated)
+ goto out;
+ cli->allocated = 1;
+#ifdef CONFIG_FRONTSWAP
+ cli->xvpool = xv_create_pool();
+ if (cli->xvpool == NULL)
+ goto out;
+#endif
+ ret = 0;
+out:
+ return ret;
}
/* counters for debugging */
static unsigned long zcache_failed_get_free_pages;
static unsigned long zcache_failed_alloc;
static unsigned long zcache_put_to_flush;
-static unsigned long zcache_aborted_preload;
-static unsigned long zcache_aborted_shrink;
-
-/*
- * Ensure that memory allocation requests in zcache don't result
- * in direct reclaim requests via the shrinker, which would cause
- * an infinite loop. Maybe a GFP flag would be better?
- */
-static DEFINE_SPINLOCK(zcache_direct_reclaim_lock);
/*
* for now, used named slabs so can easily track usage; later can
@@ -760,10 +1034,6 @@ static int zcache_do_preload(struct tmem_pool *pool)
goto out;
if (unlikely(zcache_obj_cache == NULL))
goto out;
- if (!spin_trylock(&zcache_direct_reclaim_lock)) {
- zcache_aborted_preload++;
- goto out;
- }
preempt_disable();
kp = &__get_cpu_var(zcache_preloads);
while (kp->nr < ARRAY_SIZE(kp->objnodes)) {
@@ -772,7 +1042,7 @@ static int zcache_do_preload(struct tmem_pool *pool)
ZCACHE_GFP_MASK);
if (unlikely(objnode == NULL)) {
zcache_failed_alloc++;
- goto unlock_out;
+ goto out;
}
preempt_disable();
kp = &__get_cpu_var(zcache_preloads);
@@ -785,13 +1055,13 @@ static int zcache_do_preload(struct tmem_pool *pool)
obj = kmem_cache_alloc(zcache_obj_cache, ZCACHE_GFP_MASK);
if (unlikely(obj == NULL)) {
zcache_failed_alloc++;
- goto unlock_out;
+ goto out;
}
page = (void *)__get_free_page(ZCACHE_GFP_MASK);
if (unlikely(page == NULL)) {
zcache_failed_get_free_pages++;
kmem_cache_free(zcache_obj_cache, obj);
- goto unlock_out;
+ goto out;
}
preempt_disable();
kp = &__get_cpu_var(zcache_preloads);
@@ -804,8 +1074,6 @@ static int zcache_do_preload(struct tmem_pool *pool)
else
free_page((unsigned long)page);
ret = 0;
-unlock_out:
- spin_unlock(&zcache_direct_reclaim_lock);
out:
return ret;
}
@@ -901,48 +1169,61 @@ static unsigned long zcache_curr_pers_pampd_count_max;
/* forward reference */
static int zcache_compress(struct page *from, void **out_va, size_t *out_len);
-static void *zcache_pampd_create(struct tmem_pool *pool, struct tmem_oid *oid,
- uint32_t index, struct page *page)
+static void *zcache_pampd_create(char *data, size_t size, bool raw, int eph,
+ struct tmem_pool *pool, struct tmem_oid *oid,
+ uint32_t index)
{
void *pampd = NULL, *cdata;
size_t clen;
int ret;
- bool ephemeral = is_ephemeral(pool);
unsigned long count;
-
- if (ephemeral) {
+ struct page *page = (struct page *)(data);
+ struct zcache_client *cli = pool->client;
+ uint16_t client_id = get_client_id_from_client(cli);
+ unsigned long zv_mean_zsize;
+ unsigned long curr_pers_pampd_count;
+ u64 total_zsize;
+
+ if (eph) {
ret = zcache_compress(page, &cdata, &clen);
if (ret == 0)
-
goto out;
if (clen == 0 || clen > zbud_max_buddy_size()) {
zcache_compress_poor++;
goto out;
}
- pampd = (void *)zbud_create(pool->pool_id, oid, index,
- page, cdata, clen);
+ pampd = (void *)zbud_create(client_id, pool->pool_id, oid,
+ index, page, cdata, clen);
if (pampd != NULL) {
count = atomic_inc_return(&zcache_curr_eph_pampd_count);
if (count > zcache_curr_eph_pampd_count_max)
zcache_curr_eph_pampd_count_max = count;
}
} else {
- /*
- * FIXME: This is all the "policy" there is for now.
- * 3/4 totpages should allow ~37% of RAM to be filled with
- * compressed frontswap pages
- */
- if (atomic_read(&zcache_curr_pers_pampd_count) >
- 3 * totalram_pages / 4)
+ curr_pers_pampd_count =
+ atomic_read(&zcache_curr_pers_pampd_count);
+ if (curr_pers_pampd_count >
+ (zv_page_count_policy_percent * totalram_pages) / 100)
goto out;
ret = zcache_compress(page, &cdata, &clen);
if (ret == 0)
goto out;
- if (clen > zv_max_page_size) {
+ /* reject if compression is too poor */
+ if (clen > zv_max_zsize) {
zcache_compress_poor++;
goto out;
}
- pampd = (void *)zv_create(zcache_client.xvpool, pool->pool_id,
+ /* reject if mean compression is too poor */
+ if ((clen > zv_max_mean_zsize) && (curr_pers_pampd_count > 0)) {
+ total_zsize = xv_get_total_size_bytes(cli->xvpool);
+ zv_mean_zsize = div_u64(total_zsize,
+ curr_pers_pampd_count);
+ if (zv_mean_zsize > zv_max_mean_zsize) {
+ zcache_mean_compress_poor++;
+ goto out;
+ }
+ }
+ pampd = (void *)zv_create(cli->xvpool, pool->pool_id,
oid, index, cdata, clen);
if (pampd == NULL)
goto out;
@@ -958,15 +1239,31 @@ out:
* fill the pageframe corresponding to the struct page with the data
* from the passed pampd
*/
-static int zcache_pampd_get_data(struct page *page, void *pampd,
- struct tmem_pool *pool)
+static int zcache_pampd_get_data(char *data, size_t *bufsize, bool raw,
+ void *pampd, struct tmem_pool *pool,
+ struct tmem_oid *oid, uint32_t index)
{
int ret = 0;
- if (is_ephemeral(pool))
- ret = zbud_decompress(page, pampd);
- else
- zv_decompress(page, pampd);
+ BUG_ON(is_ephemeral(pool));
+ zv_decompress((struct page *)(data), pampd);
+ return ret;
+}
+
+/*
+ * fill the pageframe corresponding to the struct page with the data
+ * from the passed pampd
+ */
+static int zcache_pampd_get_data_and_free(char *data, size_t *bufsize, bool raw,
+ void *pampd, struct tmem_pool *pool,
+ struct tmem_oid *oid, uint32_t index)
+{
+ int ret = 0;
+
+ BUG_ON(!is_ephemeral(pool));
+ zbud_decompress((struct page *)(data), pampd);
+ zbud_free_and_delist((struct zbud_hdr *)pampd);
+ atomic_dec(&zcache_curr_eph_pampd_count);
return ret;
}
@@ -974,48 +1271,73 @@ static int zcache_pampd_get_data(struct page *page, void *pampd,
* free the pampd and remove it from any zcache lists
* pampd must no longer be pointed to from any tmem data structures!
*/
-static void zcache_pampd_free(void *pampd, struct tmem_pool *pool)
+static void zcache_pampd_free(void *pampd, struct tmem_pool *pool,
+ struct tmem_oid *oid, uint32_t index)
{
+ struct zcache_client *cli = pool->client;
+
if (is_ephemeral(pool)) {
zbud_free_and_delist((struct zbud_hdr *)pampd);
atomic_dec(&zcache_curr_eph_pampd_count);
BUG_ON(atomic_read(&zcache_curr_eph_pampd_count) < 0);
} else {
- zv_free(zcache_client.xvpool, (struct zv_hdr *)pampd);
+ zv_free(cli->xvpool, (struct zv_hdr *)pampd);
atomic_dec(&zcache_curr_pers_pampd_count);
BUG_ON(atomic_read(&zcache_curr_pers_pampd_count) < 0);
}
}
+static void zcache_pampd_free_obj(struct tmem_pool *pool, struct tmem_obj *obj)
+{
+}
+
+static void zcache_pampd_new_obj(struct tmem_obj *obj)
+{
+}
+
+static int zcache_pampd_replace_in_obj(void *pampd, struct tmem_obj *obj)
+{
+ return -1;
+}
+
+static bool zcache_pampd_is_remote(void *pampd)
+{
+ return 0;
+}
+
static struct tmem_pamops zcache_pamops = {
.create = zcache_pampd_create,
.get_data = zcache_pampd_get_data,
+ .get_data_and_free = zcache_pampd_get_data_and_free,
.free = zcache_pampd_free,
+ .free_obj = zcache_pampd_free_obj,
+ .new_obj = zcache_pampd_new_obj,
+ .replace_in_obj = zcache_pampd_replace_in_obj,
+ .is_remote = zcache_pampd_is_remote,
};
/*
* zcache compression/decompression and related per-cpu stuff
*/
-#define LZO_WORKMEM_BYTES LZO1X_1_MEM_COMPRESS
-#define LZO_DSTMEM_PAGE_ORDER 1
-static DEFINE_PER_CPU(unsigned char *, zcache_workmem);
static DEFINE_PER_CPU(unsigned char *, zcache_dstmem);
+#define ZCACHE_DSTMEM_ORDER 1
static int zcache_compress(struct page *from, void **out_va, size_t *out_len)
{
int ret = 0;
unsigned char *dmem = __get_cpu_var(zcache_dstmem);
- unsigned char *wmem = __get_cpu_var(zcache_workmem);
char *from_va;
BUG_ON(!irqs_disabled());
- if (unlikely(dmem == NULL || wmem == NULL))
- goto out; /* no buffer, so can't compress */
+ if (unlikely(dmem == NULL))
+ goto out; /* no buffer or no compressor so can't compress */
+ *out_len = PAGE_SIZE << ZCACHE_DSTMEM_ORDER;
from_va = kmap_atomic(from, KM_USER0);
mb();
- ret = lzo1x_1_compress(from_va, PAGE_SIZE, dmem, out_len, wmem);
- BUG_ON(ret != LZO_E_OK);
+ ret = zcache_comp_op(ZCACHE_COMPOP_COMPRESS, from_va, PAGE_SIZE, dmem,
+ (unsigned int *)out_len);
+ BUG_ON(ret);
*out_va = dmem;
kunmap_atomic(from_va, KM_USER0);
ret = 1;
@@ -1023,29 +1345,48 @@ out:
return ret;
}
+static int zcache_comp_cpu_up(int cpu)
+{
+ struct crypto_comp *tfm;
+
+ tfm = crypto_alloc_comp(zcache_comp_name, 0, 0);
+ if (IS_ERR(tfm))
+ return NOTIFY_BAD;
+ *per_cpu_ptr(zcache_comp_pcpu_tfms, cpu) = tfm;
+ return NOTIFY_OK;
+}
+
+static void zcache_comp_cpu_down(int cpu)
+{
+ struct crypto_comp *tfm;
+
+ tfm = *per_cpu_ptr(zcache_comp_pcpu_tfms, cpu);
+ crypto_free_comp(tfm);
+ *per_cpu_ptr(zcache_comp_pcpu_tfms, cpu) = NULL;
+}
static int zcache_cpu_notifier(struct notifier_block *nb,
unsigned long action, void *pcpu)
{
- int cpu = (long)pcpu;
+ int ret, cpu = (long)pcpu;
struct zcache_preload *kp;
switch (action) {
case CPU_UP_PREPARE:
+ ret = zcache_comp_cpu_up(cpu);
+ if (ret != NOTIFY_OK) {
+ pr_err("zcache: can't allocate compressor transform\n");
+ return ret;
+ }
per_cpu(zcache_dstmem, cpu) = (void *)__get_free_pages(
- GFP_KERNEL | __GFP_REPEAT,
- LZO_DSTMEM_PAGE_ORDER),
- per_cpu(zcache_workmem, cpu) =
- kzalloc(LZO1X_MEM_COMPRESS,
- GFP_KERNEL | __GFP_REPEAT);
+ GFP_KERNEL | __GFP_REPEAT, ZCACHE_DSTMEM_ORDER);
break;
case CPU_DEAD:
case CPU_UP_CANCELED:
+ zcache_comp_cpu_down(cpu);
free_pages((unsigned long)per_cpu(zcache_dstmem, cpu),
- LZO_DSTMEM_PAGE_ORDER);
+ ZCACHE_DSTMEM_ORDER);
per_cpu(zcache_dstmem, cpu) = NULL;
- kfree(per_cpu(zcache_workmem, cpu));
- per_cpu(zcache_workmem, cpu) = NULL;
kp = &per_cpu(zcache_preloads, cpu);
while (kp->nr) {
kmem_cache_free(zcache_objnode_cache,
@@ -1053,8 +1394,14 @@ static int zcache_cpu_notifier(struct notifier_block *nb,
kp->objnodes[kp->nr - 1] = NULL;
kp->nr--;
}
- kmem_cache_free(zcache_obj_cache, kp->obj);
- free_page((unsigned long)kp->page);
+ if (kp->obj) {
+ kmem_cache_free(zcache_obj_cache, kp->obj);
+ kp->obj = NULL;
+ }
+ if (kp->page) {
+ free_page((unsigned long)kp->page);
+ kp->page = NULL;
+ }
break;
default:
break;
@@ -1119,9 +1466,8 @@ ZCACHE_SYSFS_RO(evicted_buddied_pages);
ZCACHE_SYSFS_RO(failed_get_free_pages);
ZCACHE_SYSFS_RO(failed_alloc);
ZCACHE_SYSFS_RO(put_to_flush);
-ZCACHE_SYSFS_RO(aborted_preload);
-ZCACHE_SYSFS_RO(aborted_shrink);
ZCACHE_SYSFS_RO(compress_poor);
+ZCACHE_SYSFS_RO(mean_compress_poor);
ZCACHE_SYSFS_RO_ATOMIC(zbud_curr_raw_pages);
ZCACHE_SYSFS_RO_ATOMIC(zbud_curr_zpages);
ZCACHE_SYSFS_RO_ATOMIC(curr_obj_count);
@@ -1130,6 +1476,10 @@ ZCACHE_SYSFS_RO_CUSTOM(zbud_unbuddied_list_counts,
zbud_show_unbuddied_list_counts);
ZCACHE_SYSFS_RO_CUSTOM(zbud_cumul_chunk_counts,
zbud_show_cumul_chunk_counts);
+ZCACHE_SYSFS_RO_CUSTOM(zv_curr_dist_counts,
+ zv_curr_dist_counts_show);
+ZCACHE_SYSFS_RO_CUSTOM(zv_cumul_dist_counts,
+ zv_cumul_dist_counts_show);
static struct attribute *zcache_attrs[] = {
&zcache_curr_obj_count_attr.attr,
@@ -1143,6 +1493,7 @@ static struct attribute *zcache_attrs[] = {
&zcache_failed_eph_puts_attr.attr,
&zcache_failed_pers_puts_attr.attr,
&zcache_compress_poor_attr.attr,
+ &zcache_mean_compress_poor_attr.attr,
&zcache_zbud_curr_raw_pages_attr.attr,
&zcache_zbud_curr_zpages_attr.attr,
&zcache_zbud_curr_zbytes_attr.attr,
@@ -1156,10 +1507,13 @@ static struct attribute *zcache_attrs[] = {
&zcache_failed_get_free_pages_attr.attr,
&zcache_failed_alloc_attr.attr,
&zcache_put_to_flush_attr.attr,
- &zcache_aborted_preload_attr.attr,
- &zcache_aborted_shrink_attr.attr,
&zcache_zbud_unbuddied_list_counts_attr.attr,
&zcache_zbud_cumul_chunk_counts_attr.attr,
+ &zcache_zv_curr_dist_counts_attr.attr,
+ &zcache_zv_cumul_dist_counts_attr.attr,
+ &zcache_zv_max_zsize_attr.attr,
+ &zcache_zv_max_mean_zsize_attr.attr,
+ &zcache_zv_page_count_policy_percent_attr.attr,
NULL,
};
@@ -1192,11 +1546,7 @@ static int shrink_zcache_memory(struct shrinker *shrink,
if (!(gfp_mask & __GFP_FS))
/* does this case really need to be skipped? */
goto out;
- if (spin_trylock(&zcache_direct_reclaim_lock)) {
- zbud_evict_pages(nr);
- spin_unlock(&zcache_direct_reclaim_lock);
- } else
- zcache_aborted_shrink++;
+ zbud_evict_pages(nr);
}
ret = (int)atomic_read(&zcache_zbud_curr_raw_pages);
out:
@@ -1212,19 +1562,20 @@ static struct shrinker zcache_shrinker = {
* zcache shims between cleancache/frontswap ops and tmem
*/
-static int zcache_put_page(int pool_id, struct tmem_oid *oidp,
+static int zcache_put_page(int cli_id, int pool_id, struct tmem_oid *oidp,
uint32_t index, struct page *page)
{
struct tmem_pool *pool;
int ret = -1;
BUG_ON(!irqs_disabled());
- pool = zcache_get_pool_by_id(pool_id);
+ pool = zcache_get_pool_by_id(cli_id, pool_id);
if (unlikely(pool == NULL))
goto out;
if (!zcache_freeze && zcache_do_preload(pool) == 0) {
/* preload does preempt_disable on success */
- ret = tmem_put(pool, oidp, index, page);
+ ret = tmem_put(pool, oidp, index, (char *)(page),
+ PAGE_SIZE, 0, is_ephemeral(pool));
if (ret < 0) {
if (is_ephemeral(pool))
zcache_failed_eph_puts++;
@@ -1244,25 +1595,28 @@ out:
return ret;
}
-static int zcache_get_page(int pool_id, struct tmem_oid *oidp,
+static int zcache_get_page(int cli_id, int pool_id, struct tmem_oid *oidp,
uint32_t index, struct page *page)
{
struct tmem_pool *pool;
int ret = -1;
unsigned long flags;
+ size_t size = PAGE_SIZE;
local_irq_save(flags);
- pool = zcache_get_pool_by_id(pool_id);
+ pool = zcache_get_pool_by_id(cli_id, pool_id);
if (likely(pool != NULL)) {
if (atomic_read(&pool->obj_count) > 0)
- ret = tmem_get(pool, oidp, index, page);
+ ret = tmem_get(pool, oidp, index, (char *)(page),
+ &size, 0, is_ephemeral(pool));
zcache_put_pool(pool);
}
local_irq_restore(flags);
return ret;
}
-static int zcache_flush_page(int pool_id, struct tmem_oid *oidp, uint32_t index)
+static int zcache_flush_page(int cli_id, int pool_id,
+ struct tmem_oid *oidp, uint32_t index)
{
struct tmem_pool *pool;
int ret = -1;
@@ -1270,7 +1624,7 @@ static int zcache_flush_page(int pool_id, struct tmem_oid *oidp, uint32_t index)
local_irq_save(flags);
zcache_flush_total++;
- pool = zcache_get_pool_by_id(pool_id);
+ pool = zcache_get_pool_by_id(cli_id, pool_id);
if (likely(pool != NULL)) {
if (atomic_read(&pool->obj_count) > 0)
ret = tmem_flush_page(pool, oidp, index);
@@ -1282,7 +1636,8 @@ static int zcache_flush_page(int pool_id, struct tmem_oid *oidp, uint32_t index)
return ret;
}
-static int zcache_flush_object(int pool_id, struct tmem_oid *oidp)
+static int zcache_flush_object(int cli_id, int pool_id,
+ struct tmem_oid *oidp)
{
struct tmem_pool *pool;
int ret = -1;
@@ -1290,7 +1645,7 @@ static int zcache_flush_object(int pool_id, struct tmem_oid *oidp)
local_irq_save(flags);
zcache_flobj_total++;
- pool = zcache_get_pool_by_id(pool_id);
+ pool = zcache_get_pool_by_id(cli_id, pool_id);
if (likely(pool != NULL)) {
if (atomic_read(&pool->obj_count) > 0)
ret = tmem_flush_object(pool, oidp);
@@ -1302,42 +1657,60 @@ static int zcache_flush_object(int pool_id, struct tmem_oid *oidp)
return ret;
}
-static int zcache_destroy_pool(int pool_id)
+static int zcache_destroy_pool(int cli_id, int pool_id)
{
struct tmem_pool *pool = NULL;
+ struct zcache_client *cli = NULL;
int ret = -1;
if (pool_id < 0)
goto out;
- pool = zcache_client.tmem_pools[pool_id];
+ if (cli_id == LOCAL_CLIENT)
+ cli = &zcache_host;
+ else if ((unsigned int)cli_id < MAX_CLIENTS)
+ cli = &zcache_clients[cli_id];
+ if (cli == NULL)
+ goto out;
+ atomic_inc(&cli->refcount);
+ pool = cli->tmem_pools[pool_id];
if (pool == NULL)
goto out;
- zcache_client.tmem_pools[pool_id] = NULL;
+ cli->tmem_pools[pool_id] = NULL;
/* wait for pool activity on other cpus to quiesce */
while (atomic_read(&pool->refcount) != 0)
;
+ atomic_dec(&cli->refcount);
local_bh_disable();
ret = tmem_destroy_pool(pool);
local_bh_enable();
kfree(pool);
- pr_info("zcache: destroyed pool id=%d\n", pool_id);
+ pr_info("zcache: destroyed pool id=%d, cli_id=%d\n",
+ pool_id, cli_id);
out:
return ret;
}
-static int zcache_new_pool(uint32_t flags)
+static int zcache_new_pool(uint16_t cli_id, uint32_t flags)
{
int poolid = -1;
struct tmem_pool *pool;
+ struct zcache_client *cli = NULL;
- pool = kmalloc(sizeof(struct tmem_pool), GFP_KERNEL);
+ if (cli_id == LOCAL_CLIENT)
+ cli = &zcache_host;
+ else if ((unsigned int)cli_id < MAX_CLIENTS)
+ cli = &zcache_clients[cli_id];
+ if (cli == NULL)
+ goto out;
+ atomic_inc(&cli->refcount);
+ pool = kmalloc(sizeof(struct tmem_pool), GFP_ATOMIC);
if (pool == NULL) {
pr_info("zcache: pool creation failed: out of memory\n");
goto out;
}
for (poolid = 0; poolid < MAX_POOLS_PER_CLIENT; poolid++)
- if (zcache_client.tmem_pools[poolid] == NULL)
+ if (cli->tmem_pools[poolid] == NULL)
break;
if (poolid >= MAX_POOLS_PER_CLIENT) {
pr_info("zcache: pool creation failed: max exceeded\n");
@@ -1346,14 +1719,16 @@ static int zcache_new_pool(uint32_t flags)
goto out;
}
atomic_set(&pool->refcount, 0);
- pool->client = &zcache_client;
+ pool->client = cli;
pool->pool_id = poolid;
tmem_new_pool(pool, flags);
- zcache_client.tmem_pools[poolid] = pool;
- pr_info("zcache: created %s tmem pool, id=%d\n",
+ cli->tmem_pools[poolid] = pool;
+ pr_info("zcache: created %s tmem pool, id=%d, client=%d\n",
flags & TMEM_POOL_PERSIST ? "persistent" : "ephemeral",
- poolid);
+ poolid, cli_id);
out:
+ if (cli != NULL)
+ atomic_dec(&cli->refcount);
return poolid;
}
@@ -1374,7 +1749,7 @@ static void zcache_cleancache_put_page(int pool_id,
struct tmem_oid oid = *(struct tmem_oid *)&key;
if (likely(ind == index))
- (void)zcache_put_page(pool_id, &oid, index, page);
+ (void)zcache_put_page(LOCAL_CLIENT, pool_id, &oid, index, page);
}
static int zcache_cleancache_get_page(int pool_id,
@@ -1386,7 +1761,7 @@ static int zcache_cleancache_get_page(int pool_id,
int ret = -1;
if (likely(ind == index))
- ret = zcache_get_page(pool_id, &oid, index, page);
+ ret = zcache_get_page(LOCAL_CLIENT, pool_id, &oid, index, page);
return ret;
}
@@ -1398,7 +1773,7 @@ static void zcache_cleancache_flush_page(int pool_id,
struct tmem_oid oid = *(struct tmem_oid *)&key;
if (likely(ind == index))
- (void)zcache_flush_page(pool_id, &oid, ind);
+ (void)zcache_flush_page(LOCAL_CLIENT, pool_id, &oid, ind);
}
static void zcache_cleancache_flush_inode(int pool_id,
@@ -1406,13 +1781,13 @@ static void zcache_cleancache_flush_inode(int pool_id,
{
struct tmem_oid oid = *(struct tmem_oid *)&key;
- (void)zcache_flush_object(pool_id, &oid);
+ (void)zcache_flush_object(LOCAL_CLIENT, pool_id, &oid);
}
static void zcache_cleancache_flush_fs(int pool_id)
{
if (pool_id >= 0)
- (void)zcache_destroy_pool(pool_id);
+ (void)zcache_destroy_pool(LOCAL_CLIENT, pool_id);
}
static int zcache_cleancache_init_fs(size_t pagesize)
@@ -1420,7 +1795,7 @@ static int zcache_cleancache_init_fs(size_t pagesize)
BUG_ON(sizeof(struct cleancache_filekey) !=
sizeof(struct tmem_oid));
BUG_ON(pagesize != PAGE_SIZE);
- return zcache_new_pool(0);
+ return zcache_new_pool(LOCAL_CLIENT, 0);
}
static int zcache_cleancache_init_shared_fs(char *uuid, size_t pagesize)
@@ -1429,7 +1804,7 @@ static int zcache_cleancache_init_shared_fs(char *uuid, size_t pagesize)
BUG_ON(sizeof(struct cleancache_filekey) !=
sizeof(struct tmem_oid));
BUG_ON(pagesize != PAGE_SIZE);
- return zcache_new_pool(0);
+ return zcache_new_pool(LOCAL_CLIENT, 0);
}
static struct cleancache_ops zcache_cleancache_ops = {
@@ -1458,8 +1833,10 @@ static int zcache_frontswap_poolid = -1;
/*
* Swizzling increases objects per swaptype, increasing tmem concurrency
* for heavy swaploads. Later, larger nr_cpus -> larger SWIZ_BITS
+ * Setting SWIZ_BITS to 27 basically reconstructs the swap entry from
+ * frontswap_get_page(), but has side-effects. Hence using 8.
*/
-#define SWIZ_BITS 4
+#define SWIZ_BITS 8
#define SWIZ_MASK ((1 << SWIZ_BITS) - 1)
#define _oswiz(_type, _ind) ((_type << SWIZ_BITS) | (_ind & SWIZ_MASK))
#define iswiz(_ind) (_ind >> SWIZ_BITS)
@@ -1483,8 +1860,8 @@ static int zcache_frontswap_put_page(unsigned type, pgoff_t offset,
BUG_ON(!PageLocked(page));
if (likely(ind64 == ind)) {
local_irq_save(flags);
- ret = zcache_put_page(zcache_frontswap_poolid, &oid,
- iswiz(ind), page);
+ ret = zcache_put_page(LOCAL_CLIENT, zcache_frontswap_poolid,
+ &oid, iswiz(ind), page);
local_irq_restore(flags);
}
return ret;
@@ -1502,8 +1879,8 @@ static int zcache_frontswap_get_page(unsigned type, pgoff_t offset,
BUG_ON(!PageLocked(page));
if (likely(ind64 == ind))
- ret = zcache_get_page(zcache_frontswap_poolid, &oid,
- iswiz(ind), page);
+ ret = zcache_get_page(LOCAL_CLIENT, zcache_frontswap_poolid,
+ &oid, iswiz(ind), page);
return ret;
}
@@ -1515,8 +1892,8 @@ static void zcache_frontswap_flush_page(unsigned type, pgoff_t offset)
struct tmem_oid oid = oswiz(type, ind);
if (likely(ind64 == ind))
- (void)zcache_flush_page(zcache_frontswap_poolid, &oid,
- iswiz(ind));
+ (void)zcache_flush_page(LOCAL_CLIENT, zcache_frontswap_poolid,
+ &oid, iswiz(ind));
}
/* flush all pages from the passed swaptype */
@@ -1527,7 +1904,8 @@ static void zcache_frontswap_flush_area(unsigned type)
for (ind = SWIZ_MASK; ind >= 0; ind--) {
oid = oswiz(type, ind);
- (void)zcache_flush_object(zcache_frontswap_poolid, &oid);
+ (void)zcache_flush_object(LOCAL_CLIENT,
+ zcache_frontswap_poolid, &oid);
}
}
@@ -1535,7 +1913,8 @@ static void zcache_frontswap_init(unsigned ignored)
{
/* a single tmem poolid is used for all frontswap "types" (swapfiles) */
if (zcache_frontswap_poolid < 0)
- zcache_frontswap_poolid = zcache_new_pool(TMEM_POOL_PERSIST);
+ zcache_frontswap_poolid =
+ zcache_new_pool(LOCAL_CLIENT, TMEM_POOL_PERSIST);
}
static struct frontswap_ops zcache_frontswap_ops = {
@@ -1592,11 +1971,49 @@ static int __init no_frontswap(char *s)
__setup("nofrontswap", no_frontswap);
+static int __init enable_zcache_compressor(char *s)
+{
+ strncpy(zcache_comp_name, s, ZCACHE_COMP_NAME_SZ);
+ zcache_enabled = 1;
+ return 1;
+}
+__setup("zcache=", enable_zcache_compressor);
+
+
+static int zcache_comp_init(void)
+{
+ int ret = 0;
+
+ /* check crypto algorithm */
+ if (*zcache_comp_name != '\0') {
+ ret = crypto_has_comp(zcache_comp_name, 0, 0);
+ if (!ret)
+ pr_info("zcache: %s not supported\n",
+ zcache_comp_name);
+ }
+ if (!ret)
+ strcpy(zcache_comp_name, "lzo");
+ ret = crypto_has_comp(zcache_comp_name, 0, 0);
+ if (!ret) {
+ ret = 1;
+ goto out;
+ }
+ pr_info("zcache: using %s compressor\n", zcache_comp_name);
+
+ /* alloc percpu transforms */
+ ret = 0;
+ zcache_comp_pcpu_tfms = alloc_percpu(struct crypto_comp *);
+ if (!zcache_comp_pcpu_tfms)
+ ret = 1;
+out:
+ return ret;
+}
+
static int __init zcache_init(void)
{
-#ifdef CONFIG_SYSFS
int ret = 0;
+#ifdef CONFIG_SYSFS
ret = sysfs_create_group(mm_kobj, &zcache_attr_group);
if (ret) {
pr_err("zcache: can't create sysfs\n");
@@ -1614,6 +2031,11 @@ static int __init zcache_init(void)
pr_err("zcache: can't register cpu notifier\n");
goto out;
}
+ ret = zcache_comp_init();
+ if (ret) {
+ pr_err("zcache: compressor initialization failed\n");
+ goto out;
+ }
for_each_online_cpu(cpu) {
void *pcpu = (void *)(long)cpu;
zcache_cpu_notifier(&zcache_cpu_notifier_block,
@@ -1624,6 +2046,11 @@ static int __init zcache_init(void)
sizeof(struct tmem_objnode), 0, 0, NULL);
zcache_obj_cache = kmem_cache_create("zcache_obj",
sizeof(struct tmem_obj), 0, 0, NULL);
+ ret = zcache_new_client(LOCAL_CLIENT);
+ if (ret) {
+ pr_err("zcache: can't create client\n");
+ goto out;
+ }
#endif
#ifdef CONFIG_CLEANCACHE
if (zcache_enabled && use_cleancache) {
@@ -1642,16 +2069,11 @@ static int __init zcache_init(void)
if (zcache_enabled && use_frontswap) {
struct frontswap_ops old_ops;
- zcache_client.xvpool = xv_create_pool();
- if (zcache_client.xvpool == NULL) {
- pr_err("zcache: can't create xvpool\n");
- goto out;
- }
old_ops = zcache_frontswap_register_ops();
pr_info("zcache: frontswap enabled using kernel "
"transcendent memory and xvmalloc\n");
if (old_ops.init != NULL)
- pr_warning("ktmem: frontswap_ops overridden");
+ pr_warning("zcache: frontswap_ops overridden");
}
#endif
out: