From f7232154198f928fc25f420d6190468212a7632a Mon Sep 17 00:00:00 2001
From: Johannes Weiner <hannes@saeurebad.de>
Date: Fri, 23 May 2008 13:04:21 -0700
Subject: mm: don't drop a partial page in a zone's memory map size

In a zone's present pages number, account for all pages occupied by the
memory map, including a partial.

Signed-off-by: Johannes Weiner <hannes@saeurebad.de>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 mm/page_alloc.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

(limited to 'mm/page_alloc.c')
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 63835579323..035300299f9 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3378,7 +3378,8 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
 		 * is used by this zone for memmap. This affects the watermark
 		 * and per-cpu initialisations
 		 */
-		memmap_pages = (size * sizeof(struct page)) >> PAGE_SHIFT;
+		memmap_pages =
+			PAGE_ALIGN(size * sizeof(struct page)) >> PAGE_SHIFT;
 		if (realsize >= memmap_pages) {
 			realsize -= memmap_pages;
 			printk(KERN_DEBUG
-- 
cgit v1.2.3


From 7eb54824b76793dd86afb54f182ef9aa64b3a45a Mon Sep 17 00:00:00 2001
From: Andy Whitcroft <apw@shadowen.org>
Date: Fri, 23 May 2008 13:04:50 -0700
Subject: zonelists: handle a node zonelist with no applicable entries

When booting 2.6.26-rc3 on a multi-node x86_32 numa system we are seeing
panics when trying node local allocations:

 BUG: unable to handle kernel NULL pointer dereference at 0000034c
 IP: [<c1042507>] get_page_from_freelist+0x4a/0x18e
 *pdpt = 00000000013a7001 *pde = 0000000000000000
 Oops: 0000 [#1] SMP
 Modules linked in:

 Pid: 0, comm: swapper Not tainted (2.6.26-rc3-00003-g5abc28d #82)
 EIP: 0060:[<c1042507>] EFLAGS: 00010282 CPU: 0
 EIP is at get_page_from_freelist+0x4a/0x18e
 EAX: c1371ed8 EBX: 00000000 ECX: 00000000 EDX: 00000000
 ESI: f7801180 EDI: 00000000 EBP: 00000000 ESP: c1371ec0
  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
 Process swapper (pid: 0, ti=c1370000 task=c12f5b40 task.ti=c1370000)
 Stack: 00000000 00000000 00000000 00000000 000612d0 000412d0 00000000 000412d0
        f7801180 f7c0101c f7c01018 c10426e4 f7c01018 00000001 00000044 00000000
        00000001 c12f5b40 00000001 00000010 00000000 000412d0 00000286 000412d0
 Call Trace:
  [<c10426e4>] __alloc_pages_internal+0x99/0x378
  [<c10429ca>] __alloc_pages+0x7/0x9
  [<c105e0e8>] kmem_getpages+0x66/0xef
  [<c105ec55>] cache_grow+0x8f/0x123
  [<c105f117>] ____cache_alloc_node+0xb9/0xe4
  [<c105f427>] kmem_cache_alloc_node+0x92/0xd2
  [<c122118c>] setup_cpu_cache+0xaf/0x177
  [<c105e6ca>] kmem_cache_create+0x2c8/0x353
  [<c13853af>] kmem_cache_init+0x1ce/0x3ad
  [<c13755c5>] start_kernel+0x178/0x1ee

This occurs when we are scanning the zonelists looking for a ZONE_NORMAL
page.  In this system there is only ZONE_DMA and ZONE_NORMAL memory on
node 0, all other nodes are mapped above 4GB physical.  Here is a dump
of the zonelists from this system:

    zonelists pgdat=c1400000
     0: c14006c0:2 f7c006c0:2 f7e006c0:2 c1400360:1 c1400000:0
     1: c14006c0:2 c1400360:1 c1400000:0
    zonelists pgdat=f7c00000
     0: f7c006c0:2 f7e006c0:2 c14006c0:2 c1400360:1 c1400000:0
     1: f7c006c0:2
    zonelists pgdat=f7e00000
     0: f7e006c0:2 c14006c0:2 f7c006c0:2 c1400360:1 c1400000:0
     1: f7e006c0:2

When performing a node local allocation we call get_page_from_freelist()
looking for a page.  It in turn calls first_zones_zonelist() which returns
a preferred_zone.  Where there are no applicable zones this will be NULL.
However we use this unconditionally, leading to this panic.

Where there are no applicable zones there is no possibility of a successful
allocation, so simply fail the allocation.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 mm/page_alloc.c | 3 +++
 1 file changed, 3 insertions(+)

(limited to 'mm/page_alloc.c')

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 035300299f9..7f4c66ff65b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1396,6 +1396,9 @@ get_page_from_freelist(gfp_t gfp_mask, nodemask_t *nodemask, unsigned int order,
 
 	(void)first_zones_zonelist(zonelist, high_zoneidx, nodemask,
 							&preferred_zone);
+	if (!preferred_zone)
+		return NULL;
+
 	classzone_idx = zone_idx(preferred_zone);
 
 zonelist_scan:
-- 
cgit v1.2.3


From cd94b9dbfa300fc42e45f230010623fc08d59563 Mon Sep 17 00:00:00 2001
From: Heiko Carstens <heiko.carstens@de.ibm.com>
Date: Fri, 23 May 2008 13:04:52 -0700
Subject: memory hotplug: fix early allocation handling

Trying to add memory via add_memory() from within an initcall function
results in

bootmem alloc of 163840 bytes failed!
Kernel panic - not syncing: Out of memory

This is caused by zone_wait_table_init() which uses system_state to decide
if it should use the bootmem allocator or not.

When initcalls are handled the system_state is still SYSTEM_BOOTING but
the bootmem allocator doesn't work anymore.  So the allocation will fail.

To fix this use slab_is_available() instead as indicator like we do it
everywhere else.

[akpm@linux-foundation.org: coding-style fix]
Reviewed-by: Andy Whitcroft <apw@shadowen.org>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 mm/page_alloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'mm/page_alloc.c')

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7f4c66ff65b..8e83f02cd2d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2807,7 +2807,7 @@ int zone_wait_table_init(struct zone *zone, unsigned long zone_size_pages)
 	alloc_size = zone->wait_table_hash_nr_entries
 					* sizeof(wait_queue_head_t);
 
- 	if (system_state == SYSTEM_BOOTING) {
+	if (!slab_is_available()) {
 		zone->wait_table = (wait_queue_head_t *)
 			alloc_bootmem_node(pgdat, alloc_size);
 	} else {
-- 
cgit v1.2.3