| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
| |
I got a KASAN report of use-after-free:
==================================================================
BUG: KASAN: use-after-free in klist_iter_exit+0x61/0x70 at addr ffff8800b6581508
Read of size 8 by task trinity-c1/315
=============================================================================
BUG kmalloc-32 (Not tainted): kasan: bad access detected
-----------------------------------------------------------------------------
Disabling lock debugging due to kernel taint
INFO: Allocated in disk_seqf_start+0x66/0x110 age=144 cpu=1 pid=315
___slab_alloc+0x4f1/0x520
__slab_alloc.isra.58+0x56/0x80
kmem_cache_alloc_trace+0x260/0x2a0
disk_seqf_start+0x66/0x110
traverse+0x176/0x860
seq_read+0x7e3/0x11a0
proc_reg_read+0xbc/0x180
do_loop_readv_writev+0x134/0x210
do_readv_writev+0x565/0x660
vfs_readv+0x67/0xa0
do_preadv+0x126/0x170
SyS_preadv+0xc/0x10
do_syscall_64+0x1a1/0x460
return_from_SYSCALL_64+0x0/0x6a
INFO: Freed in disk_seqf_stop+0x42/0x50 age=160 cpu=1 pid=315
__slab_free+0x17a/0x2c0
kfree+0x20a/0x220
disk_seqf_stop+0x42/0x50
traverse+0x3b5/0x860
seq_read+0x7e3/0x11a0
proc_reg_read+0xbc/0x180
do_loop_readv_writev+0x134/0x210
do_readv_writev+0x565/0x660
vfs_readv+0x67/0xa0
do_preadv+0x126/0x170
SyS_preadv+0xc/0x10
do_syscall_64+0x1a1/0x460
return_from_SYSCALL_64+0x0/0x6a
CPU: 1 PID: 315 Comm: trinity-c1 Tainted: G B 4.7.0+ #62
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
ffffea0002d96000 ffff880119b9f918 ffffffff81d6ce81 ffff88011a804480
ffff8800b6581500 ffff880119b9f948 ffffffff8146c7bd ffff88011a804480
ffffea0002d96000 ffff8800b6581500 fffffffffffffff4 ffff880119b9f970
Call Trace:
[<ffffffff81d6ce81>] dump_stack+0x65/0x84
[<ffffffff8146c7bd>] print_trailer+0x10d/0x1a0
[<ffffffff814704ff>] object_err+0x2f/0x40
[<ffffffff814754d1>] kasan_report_error+0x221/0x520
[<ffffffff8147590e>] __asan_report_load8_noabort+0x3e/0x40
[<ffffffff83888161>] klist_iter_exit+0x61/0x70
[<ffffffff82404389>] class_dev_iter_exit+0x9/0x10
[<ffffffff81d2e8ea>] disk_seqf_stop+0x3a/0x50
[<ffffffff8151f812>] seq_read+0x4b2/0x11a0
[<ffffffff815f8fdc>] proc_reg_read+0xbc/0x180
[<ffffffff814b24e4>] do_loop_readv_writev+0x134/0x210
[<ffffffff814b4c45>] do_readv_writev+0x565/0x660
[<ffffffff814b8a17>] vfs_readv+0x67/0xa0
[<ffffffff814b8de6>] do_preadv+0x126/0x170
[<ffffffff814b92ec>] SyS_preadv+0xc/0x10
This problem can occur in the following situation:
open()
 - pread()
    - .seq_start()
       - iter = kmalloc() // succeeds
       - seqf->private = iter
    - .seq_stop()
       - kfree(seqf->private)
 - pread()
    - .seq_start()
       - iter = kmalloc() // fails
    - .seq_stop()
       - class_dev_iter_exit(seqf->private) // boom! old pointer
As the comment in disk_seqf_stop() says, stop is called even if start
failed, so we need to reinitialise the private pointer to NULL when seq
iteration stops.
An alternative would be to set the private pointer to NULL when the
kmalloc() in disk_seqf_start() fails.
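The start/stop sequence described in this message can be modelled in userspace roughly as follows. This is only a sketch under stated assumptions, not the kernel code: struct fake_seq_file, fake_start() and fake_stop() are hypothetical stand-ins for the seq_file machinery, and malloc()/free() stand in for kmalloc()/kfree().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct seq_file; only the private pointer matters. */
struct fake_seq_file {
	void *private;
};

/* Models .seq_start(): allocate an iterator unless the allocation "fails". */
static void *fake_start(struct fake_seq_file *seqf, bool alloc_fails)
{
	void *iter = alloc_fails ? NULL : malloc(32);

	if (!iter)
		return NULL;	/* seqf->private is left untouched on failure */
	seqf->private = iter;
	return iter;
}

/*
 * Models a fixed .seq_stop(): it runs even when start failed, so it must
 * tolerate a NULL pointer and must reset the pointer after freeing.
 * Returns true if an iterator was actually released.
 */
static bool fake_stop(struct fake_seq_file *seqf)
{
	if (!seqf->private)
		return false;
	free(seqf->private);
	seqf->private = NULL;	/* without this, the next stop sees a stale pointer */
	return true;
}
```

With the reset in place, a failed start followed by stop finds a NULL pointer and does nothing instead of touching freed memory.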
Change-Id: I41ee55505a213f99a92ce630885e6c31b4b60232
Cc: stable@vger.kernel.org
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
| |
Fix a short sprintf buffer in proc_keys_show(). If the gcc stack protector
is turned on, this can cause a panic due to stack corruption.
The problem is that xbuf[] is not big enough to hold a 64-bit timeout
rendered as weeks:
(gdb) p 0xffffffffffffffffULL/(60*60*24*7)
$2 = 30500568904943
That's 14 chars plus NUL, not 11 chars plus NUL.
Expand the buffer to 16 chars.
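The digit count quoted above can be double-checked with a small userspace sketch. The helper name max_week_digits() is made up for illustration; it only reproduces the gdb arithmetic and is not part of the patch.

```c
#include <stdint.h>
#include <stdio.h>

#define SECONDS_PER_WEEK (60ULL * 60 * 24 * 7)

/* Decimal digits needed for the largest 64-bit timeout expressed in weeks. */
static int max_week_digits(void)
{
	unsigned long long weeks = UINT64_MAX / SECONDS_PER_WEEK;

	/* snprintf() with a NULL buffer and size 0 only reports the length. */
	return snprintf(NULL, 0, "%llu", weeks);
}
```

The result plus the terminating NUL fits in a 16-char buffer but not in a 12-char one.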
I think the unpatched code apparently works if the stack-protector is not
enabled because on a 32-bit machine the buffer won't be overflowed and on a
64-bit machine there's a 64-bit aligned pointer at one side and an int that
isn't checked again on the other side.
The panic incurred looks something like:
Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffff81352ebe
CPU: 0 PID: 1692 Comm: reproducer Not tainted 4.7.2-201.fc24.x86_64 #1
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
0000000000000086 00000000fbbd2679 ffff8800a044bc00 ffffffff813d941f
ffffffff81a28d58 ffff8800a044bc98 ffff8800a044bc88 ffffffff811b2cb6
ffff880000000010 ffff8800a044bc98 ffff8800a044bc30 00000000fbbd2679
Call Trace:
[<ffffffff813d941f>] dump_stack+0x63/0x84
[<ffffffff811b2cb6>] panic+0xde/0x22a
[<ffffffff81352ebe>] ? proc_keys_show+0x3ce/0x3d0
[<ffffffff8109f7f9>] __stack_chk_fail+0x19/0x30
[<ffffffff81352ebe>] proc_keys_show+0x3ce/0x3d0
[<ffffffff81350410>] ? key_validate+0x50/0x50
[<ffffffff8134db30>] ? key_default_cmp+0x20/0x20
[<ffffffff8126b31c>] seq_read+0x2cc/0x390
[<ffffffff812b6b12>] proc_reg_read+0x42/0x70
[<ffffffff81244fc7>] __vfs_read+0x37/0x150
[<ffffffff81357020>] ? security_file_permission+0xa0/0xc0
[<ffffffff81246156>] vfs_read+0x96/0x130
[<ffffffff81247635>] SyS_read+0x55/0xc0
[<ffffffff817eb872>] entry_SYSCALL_64_fastpath+0x1a/0xa4
Change-Id: I0787d5a38c730ecb75d3c08f28f0ab36295d59e7
Reported-by: Ondrej Kozina <okozina@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Ondrej Kozina <okozina@redhat.com>
| |
This is an ancient bug that was actually attempted to be fixed once
(badly) by me eleven years ago in commit 4ceb5db9757a ("Fix
get_user_pages() race for write access") but that was then undone due to
problems on s390 by commit f33ea7f404e5 ("fix get_user_pages bug").
In the meantime, the s390 situation has long been fixed, and we can now
fix it by checking the pte_dirty() bit properly (and do it better). The
s390 dirty bit was implemented in abf09bed3cce ("s390/mm: implement
software dirty bits") which made it into v3.9. Earlier kernels will
have to look at the page state itself.
Also, the VM has become more scalable, and what used to be a purely
theoretical race back then has become easier to trigger.
To fix it, we introduce a new internal FOLL_COW flag to mark the "yes,
we already did a COW" rather than play racy games with FOLL_WRITE that
is very fundamental, and then use the pte dirty flag to validate that
the FOLL_COW flag is still valid.
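As a rough illustration of that check, the following userspace sketch models the decision only. The flag values, struct sketch_pte and sketch_can_follow_write() are assumptions made for this example; the real kernel helpers, flag values and any additional conditions in the patch are not reproduced here.

```c
#include <stdbool.h>

/* Illustrative flag values only; they are not the kernel's definitions. */
#define SKETCH_FOLL_WRITE 0x01u
#define SKETCH_FOLL_COW   0x4000u

/* Hypothetical, reduced view of a page table entry. */
struct sketch_pte {
	bool writable;
	bool dirty;
};

/*
 * A write through the entry is allowed if the entry is writable, or if
 * a COW was already done for this lookup (FOLL_COW) and the entry is
 * still dirty, which is what validates that the earlier COW still holds.
 */
static bool sketch_can_follow_write(struct sketch_pte pte, unsigned int flags)
{
	return pte.writable || ((flags & SKETCH_FOLL_COW) && pte.dirty);
}
```

The point of the design is that FOLL_WRITE itself is never cleared; the "COW already happened" state is carried in its own flag and re-validated against the dirty bit.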
Change-Id: Id9bec3722797dff7d0ff0d9f6097c4229e31fd62
Reported-and-tested-by: Phil "not Paul" Oester <kernel@linuxace.com>
Acked-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Michal Hocko <mhocko@suse.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[wt: s/gup.c/memory.c; s/follow_page_pte/follow_page_mask;
s/faultin_page/__get_user_page]
Signed-off-by: Willy Tarreau <w@1wt.eu>
| |
Change-Id: I0e5b9979850042d790cb89996163bdc69a4c7879
| |
When tcp_sendmsg() allocates a fresh and empty skb, it puts it at the
tail of the write queue using tcp_add_write_queue_tail()
Then it attempts to copy user data into this fresh skb.
If the copy fails, we undo the work and remove the fresh skb.
Unfortunately, this undo lacks the change done to tp->highest_sack and
we can leave a dangling pointer (to a freed skb)
Later, tcp_xmit_retransmit_queue() can dereference this pointer and
access freed memory. For regular kernels where memory is not unmapped,
this might cause SACK bugs because tcp_highest_sack_seq() is buggy,
returning garbage instead of tp->snd_nxt, but with various debug
features like CONFIG_DEBUG_PAGEALLOC, this can crash the kernel.
This bug was found by Marco Grassi thanks to syzkaller.
Change-Id: I264f97d30d0a623011d9ee811c63fa0e0c2149a2
Fixes: 6859d49475d4 ("[TCP]: Abstract tp->highest_sack accessing & point to next skb")
Reported-by: Marco Grassi <marco.gra@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
| |
The stack object “map” has a total size of 32 bytes. Its last 4
bytes are padding generated by the compiler. These padding bytes are
not initialized and are sent out via “nla_put”.
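One common remedy for this class of leak is to zero the whole object before filling in its members. The sketch below assumes a hypothetical struct sketch_map whose members add up to 28 bytes, so that typical 64-bit ABIs pad it to 32; the message does not say how the patch itself fixes the problem, and the member values are arbitrary.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: 28 bytes of members, usually padded to 32 bytes. */
struct sketch_map {
	uint64_t mem_start;
	uint64_t mem_end;
	uint64_t base_addr;
	uint16_t irq;
	uint8_t  dma;
	uint8_t  port;
};

/* Zero the whole object first so tail padding carries no stale stack data. */
static void sketch_fill_map(struct sketch_map *map)
{
	memset(map, 0, sizeof(*map));
	map->mem_start = 0x1000;
	map->mem_end = 0x2000;
	map->base_addr = 0x300;
	map->irq = 5;
	map->dma = 1;
	map->port = 2;
}

/* Count non-zero bytes located after the last member (the tail padding). */
static size_t sketch_dirty_padding(const struct sketch_map *map)
{
	const unsigned char *bytes = (const unsigned char *)map;
	size_t first_pad = offsetof(struct sketch_map, port) + sizeof(map->port);
	size_t dirty = 0;

	for (size_t i = first_pad; i < sizeof(*map); i++)
		dirty += bytes[i] != 0;
	return dirty;
}
```

Copying sizeof(*map) bytes of such an object to another context then exposes only zeros in the padding.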
Bug: 28620102
Change-Id: I13da380c6fe8abca49e3cf9f05293c02b44d2e5e
Signed-off-by: kangjie <kangjielu@gmail.com>
| |
BUG: 27532522
Change-Id: Ic0710a9a8cfc682acd88ecf3bbfeece2d798c4a4
Signed-off-by: Mohamad Ayyash <mkayyash@google.com>
| |
(cherry pick from commit 82262a46627bebb0febcc26664746c25cef08563)
There are two issues with the current implementation for replacing user
controls. The first is that the code does not check if the control is
actually a user control and neither does it check if the control is
owned by the process that tries to remove it. That allows userspace
applications to remove arbitrary controls, which can cause a use after
free if, for example, a driver does not expect a control to be removed
from under its feet.
The second issue is that on one hand when a control is replaced the
user_ctl_count limit is not checked and on the other hand the
user_ctl_count is increased (even though the number of user controls
does not change). This allows userspace, once the user_ctl_count limit
has been reached, to repeatedly replace a control until user_ctl_count
overflows. Once that happens new controls can be added effectively
bypassing the user_ctl_count limit.
Both issues can be fixed by using snd_ctl_remove_user_ctl() instead of
open-coding the removal of the control that is to be replaced. This
function does proper permission checks and also decrements
user_ctl_count after the control has been removed.
Note that by using snd_ctl_remove_user_ctl() the check which returns
-EBUSY at beginning of the function if the control already exists is
removed. This is not a problem though since the check is quite
useless, because the lock that is protecting the control list is
released between the check and before adding the new control to the
list, which means that it is possible that a different control with
the same settings is added to the list after the check. Luckily there
is another check that is done while holding the lock in snd_ctl_add(),
so we'll rely on that to make sure that the same control is not added
twice.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Acked-by: Jaroslav Kysela <perex@perex.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Change-Id: I0b183e2d52afe8e747f59e1ecca6f6fbbac2d016
Bug: 29916012
| |
Bug: 28760453
Change-Id: I019c2de559db9e4b95860ab852211b456d78c4ca
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
| |
The syzkaller fuzzer hit the following use-after-free:
Call Trace:
[<ffffffff8175ea0e>] __asan_report_load8_noabort+0x3e/0x40 mm/kasan/report.c:295
[<ffffffff851cc31a>] __sys_recvmmsg+0x6fa/0x7f0 net/socket.c:2261
[< inline >] SYSC_recvmmsg net/socket.c:2281
[<ffffffff851cc57f>] SyS_recvmmsg+0x16f/0x180 net/socket.c:2270
[<ffffffff86332bb6>] entry_SYSCALL_64_fastpath+0x16/0x7a
arch/x86/entry/entry_64.S:185
And, as Dmitry rightly assessed, that is because we can drop the
reference and then touch it when the underlying recvmsg calls return
some packets and then hit an error, which will make recvmmsg to set
sock->sk->sk_err, oops, fix it.
Reported-and-Tested-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Fixes: a2e2725541fa ("net: Introduce recvmmsg socket syscall")
http://lkml.kernel.org/r/20160122211644.GC2470@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change-Id: Ie3b6ee89ad3e8cd3a0fe8f50f74aaa4834d0b4ca
| |
The syzkaller fuzzer hit the following use-after-free:
Call Trace:
[<ffffffff8175ea0e>] __asan_report_load8_noabort+0x3e/0x40 mm/kasan/report.c:295
[<ffffffff851cc31a>] __sys_recvmmsg+0x6fa/0x7f0 net/socket.c:2261
[< inline >] SYSC_recvmmsg net/socket.c:2281
[<ffffffff851cc57f>] SyS_recvmmsg+0x16f/0x180 net/socket.c:2270
[<ffffffff86332bb6>] entry_SYSCALL_64_fastpath+0x16/0x7a
arch/x86/entry/entry_64.S:185
And, as Dmitry rightly assessed, that is because we can drop the
reference and then touch it when the underlying recvmsg calls return
some packets and then hit an error, which will make recvmmsg to set
sock->sk->sk_err, oops, fix it.
Reported-and-Tested-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Fixes: a2e2725541fa ("net: Introduce recvmmsg socket syscall")
http://lkml.kernel.org/r/20160122211644.GC2470@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change-Id: Ie3b6ee89ad3e8cd3a0fe8f50f74aaa4834d0b4ca
| |
Works in conjunction with kptr_restrict.
Bug: 30143283
Change-Id: I2b3ce22f4e206e74614d51453a1d59b7080ab05a
| |
unneeded, only causes crashes
Change-Id: I58a5121ed80c3460f20a4afce32d6925588b877e
| |
Change-Id: Iafe4ac15a4c198e3c016ab40fc6d631999c5bdaf
| |
Change-Id: Icada3ee0531466768d33785c17809ede47066bdb
| |
Userspace processes often have multiple allocators that each do
anonymous mmaps to get memory. When examining memory usage of
individual processes or systems as a whole, it is useful to be
able to break down the various heaps that were allocated by
each layer and examine their size, RSS, and physical memory
usage.
This patch adds a user pointer to the shared union in
vm_area_struct that points to a null terminated string inside
the user process containing a name for the vma. vmas that
point to the same address will be merged, but vmas that
point to equivalent strings at different addresses will
not be merged.
Userspace can set the name for a region of memory by calling
prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, start, len, (unsigned long)name);
Setting the name to NULL clears it.
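A userspace caller might look roughly like the sketch below. The numeric values of PR_SET_VMA and PR_SET_VMA_ANON_NAME are assumptions taken from how this interface is commonly defined and are only used when the libc headers do not provide them; sketch_named_mmap() and the name "example-heap" are made up for illustration. On a kernel without this feature the prctl() call is expected to fail and return -1.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/prctl.h>

/* Assumed values, used only if the libc headers lack the definitions. */
#ifndef PR_SET_VMA
#define PR_SET_VMA 0x53564d41
#endif
#ifndef PR_SET_VMA_ANON_NAME
#define PR_SET_VMA_ANON_NAME 0
#endif

/*
 * With this patch only the pointer is stored, so the string has to stay
 * valid for as long as the mapping carries the name; a static buffer is
 * the simplest way to guarantee that.
 */
static const char sketch_heap_name[] = "example-heap";

/* Map an anonymous region and try to name it; the prctl() result is reported. */
static void *sketch_named_mmap(size_t len, int *prctl_result)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return NULL;
	*prctl_result = prctl(PR_SET_VMA, (unsigned long)PR_SET_VMA_ANON_NAME,
			      (unsigned long)p, (unsigned long)len,
			      (unsigned long)sketch_heap_name);
	return p;
}
```

If the call succeeds, the region should show up as [anon:example-heap] in /proc/pid/maps as described below.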
The names of named anonymous vmas are shown in /proc/pid/maps
as [anon:<name>] and in /proc/pid/smaps in a new "Name" field
that is only present for named vmas. If the userspace pointer
is no longer valid all or part of the name will be replaced
with "<fault>".
The idea to store a userspace pointer to reduce the complexity
within mm (at the expense of the complexity of reading
/proc/pid/mem) came from Dave Hansen. This results in no
runtime overhead in the mm subsystem other than comparing
the anon_name pointers when considering vma merging. The pointer
is stored in a union with fields that are only used on file-backed
mappings, so it does not increase memory usage.
Change-Id: I53b093d98dc24f41377824f34e076edced4a6f07
| |
This is a fuelgauge driver, not an actual battery driver.
Setting its type to 'Battery' will confuse healthd,
causing healthd to pick this driver instead of the actual battery driver
for reading battery stats.
Issue-Id: NIGHTLIES-3279
Change-Id: Ia45e74599d391a90cb526aa07a2525b64c3eec96
| |
based off a similar patch for klte by Kevin Haggerty
Change-Id: If2b4f1f2c0310fc0a6c3fe49fd680973dce28ef5
| |
-Appears to fix frequently occurring, annoying random reboots with
no other observed regressions. I went from having up to 2-3
crashes an hour related to USB, to none all day
-This option is deprecated and has been removed in later kernel
versions anyway.
Special Thanks: Simon Shields, for helping me interpret the crash
dumps and pointing me in the right direction on this.
Change-Id: Ieda0eb6e0dfb3fec4cfbe89540a587eaa6de7995
| |
Otherwise this function may read data beyond the ruleset blob.
Change-Id: I22f514057d3e0403d1af61f4d9555403ab9f72ea
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| |
Change-Id: Ida19e5102b7faca17c685a261c20fbbf5c9614f9
Cc: stable@vger.kernel.org # v3.19
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
| |
This patch validates the num_values parameter from userland during the
HIDIOCGUSAGES and HIDIOCSUSAGES commands. Previously, if the report id was set
to HID_REPORT_ID_UNKNOWN, we would fail to validate the num_values parameter
leading to a heap overflow.
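The kind of bounds check involved can be sketched as below. This is not the hiddev code: sketch_usages_in_bounds() and the limit SKETCH_MAX_MULTI_USAGES are hypothetical, and the sketch only shows a range check that has to run regardless of how the report was looked up.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed upper bound on values per request, for illustration only. */
#define SKETCH_MAX_MULTI_USAGES 1024u

/*
 * Accept a request for num_values usages starting at usage_index only if
 * the whole range lies inside a field holding report_count usages.
 */
static bool sketch_usages_in_bounds(uint32_t usage_index, uint32_t num_values,
				    uint32_t report_count)
{
	if (num_values > SKETCH_MAX_MULTI_USAGES)
		return false;
	if (usage_index >= report_count)
		return false;
	/* Written as a subtraction so the sum cannot wrap around. */
	return num_values <= report_count - usage_index;
}
```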
Change-Id: I10866ee01c7ba430eab2b5cc3356c9519c7f9730
Cc: stable@vger.kernel.org
Signed-off-by: Scott Bauer <sbauer@plzdonthack.me>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
| |
If __key_link_begin() failed then "edit" would be uninitialized. I've
added a check to fix that.
Change-Id: I0e28bdba07f645437db2b08daf67ca27f16c6f5c
Fixes: f70e2e06196a ('KEYS: Do preallocation for __key_link()')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
| |
The only users of collect_mounts are in audit_tree.c
In audit_trim_trees and audit_add_tree_rule the path passed into
collect_mounts is generated from kern_path passed an audit_tree
pathname which is guaranteed to be an absolute path. In those cases
collect_mounts is obviously intended to work on mounted paths and
if a race results in paths that are unmounted when collect_mounts
is called, it is reasonable to fail early.
The paths passed into audit_tag_tree don't have the absolute path
check. But are used to play with fsnotify and otherwise interact with
the audit_trees, so again operating only on mounted paths appears
reasonable.
Avoid having to worry about what happens when we try and audit
unmounted filesystems by restricting collect_mounts to mounts
that appear in the mount tree.
Change-Id: I2edfee6d6951a2179ce8f53785b65ddb1eb95629
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
| |
Change-Id: I6bee287c05b3f5170b8d1e2c1c6738d3c8cc88e9
| |
This reverts commit 6f6a51da4fe6f5b6c11eae5e86d59d214feb538b.
Change-Id: I907a5858269642e72b93fc0ebd138660e4c79b9c
| |
This reverts commit 00cd10b11d1123bd4aee1f8845a11e7d0b662bcc.
Change-Id: I07285ec12e8a2ebf3f7925ea8ce67c51d1d903cf
| |
Change-Id: I8c5401059cdec1b0298a162ea26030ef8471671a
| |
Change-Id: I802127b600b35f03e864cc1603f7d42e144cca21
| |
Change-Id: I6c406d1c1d97ee3a3846c04c92dd625ab621a020
| |
* don't enable lz4 yet, but enable zsmalloc in order to continue using
zram.
Change-Id: I78ca1bb9a75c19750e65d76862a65f44986de6ac
| |
zcomp_create() verifies the success of zcomp_strm_{multi,single}_create()
through comp->stream, which can potentially be pointing to memory that
was freed if these functions returned an error.
While at it, replace a 'ERR_PTR(-ENOMEM)' by a more generic
'ERR_PTR(error)' as in the future zcomp_strm_{multi,single}_create()
could return other error codes. Function documentation updated
accordingly.
Change-Id: I84334ce1929c8212aa70387781ef0a6b0af50fa5
Fixes: beca3ec71fe5 ("zram: add multi stream functionality")
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| |
Change-Id: I2e26fbcd06541536258313f4f5753ca87ab46d9c
| |
Change-Id: I34c547039d02366649206395fe3fb3f363fc900e
Signed-off-by: Emanuele Scarlata <scarlataemanuele@gmail.com>
| |
Currently, we maintain the LMK rbtree with task->adj_node. However, when
handling the oom_score_adj change case, we may del/add a non-leader task
to the RB tree, which is not as expected.
With this patch, we maintain the LMK rbtree with task->signal->adj_node.
Since signal_struct is shared between the main task and its threads, we can
avoid adding non-leader threads to the tree.
Change-Id: I3ba9e740e03ab04c25497a1cc2c870f051bd5b07
Signed-off-by: Hong-Mei Li <a21834@motorola.com>
Reviewed-on: http://gerrit.mot.com/754225
SME-Granted: SME Approvals Granted
SLTApproved: Slta Waiver <sltawvr@motorola.com>
Tested-by: Jira Key <jirakey@motorola.com>
Reviewed-by: Zhi-Ming Yuan <a14194@motorola.com>
Reviewed-by: Yi-Wei Zhao <gbjc64@motorola.com>
Submit-Approved: Jira Key <jirakey@motorola.com>
(cherry picked from commit b40634023f9152c6232de9acb80108e0af7e4075)
Signed-off-by: Abdul Salam <salamab@motorola.com>
Reviewed-on: http://gerrit.mot.com/766107
Reviewed-by: Sudharsan Yettapu <sudharsan.yettapu@motorola.com>
Reviewed-by: Ravikumar Vembu <raviv@motorola.com>
(cherry picked from commit f3abd37ce3b4d36ae05cfc1c5cd10e5a3f584e7f)
Reviewed-on: http://gerrit.mot.com/768302
| |
In some races, the tsk that lmk is using may be deleted from the RB tree
by another thread, and rb_next would return NULL if we use this tsk to
get the next one. In this case, we need to skip this round of shrink and wait
for the next turn. Otherwise, tsk would trigger a NULL pointer panic.
Change-Id: I37f4bd2827f8a0a28f29192dd71532d1c252f986
Signed-off-by: Hong-Mei Li <a21834@motorola.com>
Reviewed-on: http://gerrit.mot.com/729556
SLTApproved: Slta Waiver <sltawvr@motorola.com>
SME-Granted: SME Approvals Granted
Tested-by: Jira Key <jirakey@motorola.com>
Reviewed-by: Yi-Wei Zhao <gbjc64@motorola.com>
Submit-Approved: Jira Key <jirakey@motorola.com>
| |
Someone may change a process's oom_score_adj through procfs even though the
process has exited. In that case, the task was already deleted from the rb tree,
and the redundant deletion would eventually trigger an rb_erase panic.
In this patch, we make sure to clear the node after deleting it and check
its empty status before rb_erase.
Change-Id: I7628c7d21011099e796b7d366cbc142f96bb8aab
Signed-off-by: Hong-Mei Li <a21834@motorola.com>
Reviewed-on: http://gerrit.mot.com/725306
SLTApproved: Slta Waiver <sltawvr@motorola.com>
SME-Granted: SME Approvals Granted
Tested-by: Jira Key <jirakey@motorola.com>
Reviewed-by: Sheng-Zhe Zhao <a18689@motorola.com>
Submit-Approved: Jira Key <jirakey@motorola.com>
| |
There is a race condition: after the rb tree root is read, it might be changed
by other tasks before the new node is added. This can lead to rb tree corruption.
This patch avoids this race condition.
Change-Id: Id86bfd133488ad4ee12cd83c9bf1d1c12ef5598f
Signed-off-by: Yi-wei Zhao <gbjc64@motorola.com>
Reviewed-on: http://gerrit.mot.com/715645
Tested-by: Jira Key <jirakey@motorola.com>
Reviewed-by: Sheng-Zhe Zhao <a18689@motorola.com>
SLTApproved: Christopher Fries <cfries@motorola.com>
Submit-Approved: Jira Key <jirakey@motorola.com>
| |
Under certain circumstances, a process may take time to handle a SIGKILL.
When lowmemkiller is called again shortly after, it would pick the same
process to kill over and over, so that we can't get free memory for a long time.
Solution is to check fatal_signal_pending() on the selected task, and if it's
already pending, select a new task to kill.
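The selection rule can be illustrated with the simplified userspace model below. struct sketch_task and sketch_select_victim() are hypothetical; the real lowmemorykiller walks kernel task structures and also weighs memory usage, which this sketch leaves out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced view of a task as seen by the killer. */
struct sketch_task {
	int pid;
	int oom_score_adj;
	bool fatal_signal_pending;	/* a SIGKILL was already sent */
};

/*
 * Pick the task with the highest oom_score_adj that is at or above
 * min_adj, skipping tasks that already have a fatal signal pending so
 * that the same dying process is not selected again and again.
 */
static const struct sketch_task *
sketch_select_victim(const struct sketch_task *tasks, size_t n, int min_adj)
{
	const struct sketch_task *selected = NULL;

	for (size_t i = 0; i < n; i++) {
		const struct sketch_task *t = &tasks[i];

		if (t->fatal_signal_pending)
			continue;	/* already being killed, look further */
		if (t->oom_score_adj < min_adj)
			continue;
		if (!selected || t->oom_score_adj > selected->oom_score_adj)
			selected = t;
	}
	return selected;
}
```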
Cherry-pick 5e3358093351e5d48e21250e31896b855542f22c
Reviewed-on: http://gerrit.pcs.mot.com/479831
Change-Id: I53445114451ffaba293f3c7174fb0b01ed0d34b6
Signed-off-by: Tianshui Shi <kfp634@motorola.com>
Reviewed-on: http://gerrit.pcs.mot.com/505410
Tested-by: Jira Key <JIRAKEY@motorola.com>
Reviewed-by: Yi-Wei Zhao <gbjc64@motorola.com>
Reviewed-by: Jason Hrycay <jason.hrycay@motorola.com>
Reviewed-by: Jeffrey Carlyle <jeff.carlyle@motorola.com>
(cherry picked from commit da093001caf06ed2296b4f79c84cc48ce713eac6)
| |
Based on the current LMK implementation, LMK has to scan all processes to
select the correct task to kill during low memory.
The basic idea for the optimization is to :
queue all tasks with oom_score_adj priority, and then LMK just selects the
proper task from the queue(rbtree) to kill.
performance improvement:
the current implementation: average time to find a task to kill : 1004us
the optimized implementation: average time to find a task to kill: 43us
Change-Id: I4dbbdd5673314dbbdabb71c3eff0dc229ce4ea91
Signed-off-by: Hong-Mei Li <a21834@motorola.com>
Reviewed-on: http://gerrit.pcs.mot.com/548917
SLT-Approved: Slta Waiver <sltawvr@motorola.com>
Tested-by: Jira Key <jirakey@motorola.com>
Reviewed-by: Yi-Wei Zhao <gbjc64@motorola.com>
Submit-Approved: Jira Key <jirakey@motorola.com>
Signed-off-by: D. Andrei Măceș <dmaces@nd.edu>
Conflicts:
drivers/staging/android/Kconfig
drivers/staging/android/lowmemorykiller.c
fs/proc/base.c
mm/oom_kill.c
Conflicts:
drivers/staging/android/lowmemorykiller.c
mm/oom_kill.c
Conflicts:
mm/oom_kill.c
Conflicts:
drivers/staging/android/lowmemorykiller.c
mm/oom_kill.c
| |
To maintain the task adj RB tree, we add a task to the RB tree on fork,
and delete it on exit. The place is exactly the same as for the linear
p->tasks list, that is, only when the task is thread_group_leader.
When task group_leader is changing, we make sure to add the new leader
into RB tree after its leader flag is set, task->exit_signal.
Cherry-picked from (CR): http://gerrit.mot.com/753419/
Change-Id: I8da47998510e531188feb067b491e92306be9414
Signed-off-by: Hong-Mei Li <a21834@motorola.com>
Reviewed-on: http://gerrit.mot.com/753419
SLTApproved: Slta Waiver <sltawvr@motorola.com>
SME-Granted: SME Approvals Granted
Tested-by: Jira Key <jirakey@motorola.com>
Reviewed-by: Zhi-Ming Yuan <a14194@motorola.com>
Reviewed-by: Yi-Wei Zhao <gbjc64@motorola.com>
Submit-Approved: Jira Key <jirakey@motorola.com>
Reviewed-on: http://gerrit.mot.com/766106
Reviewed-by: Sudharsan Yettapu <sudharsan.yettapu@motorola.com>
Reviewed-by: Ravikumar Vembu <raviv@motorola.com>
(cherry picked from commit e9e92d64142625981490dd5c323aa08467d349e8)
Reviewed-on: http://gerrit.mot.com/768301
Conflicts:
fs/exec.c
| |
To maintain the task adj RB tree, we add a task to the RB tree when fork,
and delete it when exit. The place is exactly the same as the linear
p->tasks list, only when the task is thread_group_leader.
But to handle the oom_score_adj change case, which did not check the
thread_group_leader, we may del/add a non-leader task to the RB tree.
Finally leave the task in the RB tree, since we would not really delete
a non-leader task from the tree. The orphan task would finally be freed,
and cause later use-after-free panic when accessing RB tree.
Solution:
Move the rbtree adj_node to signal_struct, which is shared between
task and all threads. This can make sure we only add one node for
a thread group.
Change-Id: I1e8dfe490656408863b3726c7bc9e4ee6dc5abc1
Signed-off-by: Hong-Mei Li <a21834@motorola.com>
Reviewed-on: http://gerrit.mot.com/754224
SLTApproved: Slta Waiver <sltawvr@motorola.com>
SME-Granted: SME Approvals Granted
Tested-by: Jira Key <jirakey@motorola.com>
Reviewed-by: Zhi-Ming Yuan <a14194@motorola.com>
Reviewed-by: Yi-Wei Zhao <gbjc64@motorola.com>
Submit-Approved: Jira Key <jirakey@motorola.com>
(cherry picked from commit b3f12a2465542888ec5c868c38022e0e5f7631ca)
Signed-off-by: Abdul Salam <salamab@motorola.com>
Reviewed-on: http://gerrit.mot.com/766108
Reviewed-by: Sudharsan Yettapu <sudharsan.yettapu@motorola.com>
Reviewed-by: Ravikumar Vembu <raviv@motorola.com>
(cherry picked from commit 558ef1fceae5d4c8509cb2a40d98c841525f7ea3)
Reviewed-on: http://gerrit.mot.com/768300
Conflicts:
kernel/fork.c
| |
lowmemorykiller debug messages are inscrutable and mostly useful
for debugging the lowmemorykiller, not explaining why a process
was killed. Make the messages more useful by prefixing them
with "lowmemorykiller: " and explaining in more readable terms
what was killed, who it was killed for, and why it was killed.
The messages now look like:
[ 76.997631] lowmemorykiller: Killing 'droid.gallery3d' (2172), adj 1000,
[ 76.997635] to free 27436kB on behalf of 'kswapd0' (29) because
[ 76.997638] cache 122624kB is below limit 122880kB for oom_score_adj 1000
[ 76.997641] Free memory is -53356kB above reserved
A negative number for free memory above reserved means some of the
reserved memory has been used and is being regenerated by kswapd,
which is likely what called the shrinkers.
Change-Id: I1fe983381e73e124b90aa5d91cb66e55eaca390f
Signed-off-by: Colin Cross <ccross@android.com>
Conflicts:
drivers/staging/android/lowmemorykiller.c
| |
The select...to kill messages are not very useful when not debugging
the lowmemorykiller itself. After the change to check TIF_MEMDIE
instead of using a task notifier this message can also get very
noisy.
Change-Id: Ice171c25801d6faa454b885a23b24b002423b754
Signed-off-by: Arve Hjønnevåg <arve@android.com>
| |
The amount of reserved memory varies between devices. Subtract it
here to reduce the amount of device-specific tuning needed for the
minfree values.
Change-Id: I466ae8b18f5972f6f6d8b5a7d8c4ae69660de53a
Signed-off-by: Arve Hjønnevåg <arve@android.com>
Conflicts:
drivers/staging/android/lowmemorykiller.c
| |
The conversion to use oom_score_adj instead of the deprecated oom_adj
values breaks existing user-space code. Add a config option to convert
oom_adj values written to oom_score_adj values if they appear to be
valid oom_adj values.
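The scaling between the two ranges can be sketched as follows. The constants are assumptions based on the usual oom_adj range of -17..15 and oom_score_adj range of -1000..1000, and both helper names are made up for this example; the exact detection logic used by the config option is not shown in this message.

```c
#include <stdbool.h>

/* Assumed ranges: oom_adj is -17..15, oom_score_adj is -1000..1000. */
#define SKETCH_OOM_DISABLE       (-17)
#define SKETCH_OOM_ADJUST_MAX    15
#define SKETCH_OOM_SCORE_ADJ_MAX 1000

/* A written value "appears to be" an oom_adj value if it fits the old range. */
static bool sketch_looks_like_oom_adj(int value)
{
	return value >= SKETCH_OOM_DISABLE && value <= SKETCH_OOM_ADJUST_MAX;
}

/* Scale an oom_adj value onto the oom_score_adj range. */
static int sketch_oom_adj_to_oom_score_adj(int oom_adj)
{
	if (oom_adj == SKETCH_OOM_ADJUST_MAX)
		return SKETCH_OOM_SCORE_ADJ_MAX;
	return (oom_adj * SKETCH_OOM_SCORE_ADJ_MAX) / -SKETCH_OOM_DISABLE;
}
```

The maximum is special-cased because a plain division by 17 would map 15 to 882 rather than to the top of the new range.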
Change-Id: I68308125059b802ee2991feefb07e9703bc48549
Signed-off-by: Arve Hjønnevåg <arve@android.com>
Conflicts:
drivers/staging/android/Kconfig
| |
Fix compiler warning about the type of the module parameter.
Cc: San Mehat <san@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| |
/proc/pid/oom_adj is deprecated and will be removed in August 2012
according to Documentation/feature-removal-schedule.txt. Convert its
usage in the lowmemorykiller to use the new interface, oom_score_adj,
instead.
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| |
LMK should not try killing kernel threads.
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|