aboutsummaryrefslogtreecommitdiffstats
path: root/fs
Commit message (Collapse)AuthorAgeFilesLines
* NFSv4.1: Remove a bogus BUG_ON() in nfs4_layoutreturn_doneTrond Myklebust2012-09-141-6/+2
| | | | | | | | | | | | | | | | | | | | | | commit 47fbf7976e0b7d9dcdd799e2a1baba19064d9631 upstream. Ever since commit 0a57cdac3f (NFSv4.1 send layoutreturn to fence disconnected data server) we've been sending layoutreturn calls while there is potentially still outstanding I/O to the data servers. The reason we do this is to avoid races between replayed writes to the MDS and the original writes to the DS. When this happens, the BUG_ON() in nfs4_layoutreturn_done can be triggered because it assumes that we would never call layoutreturn without knowing that all I/O to the DS is finished. The fix is to remove the BUG_ON() now that the assumptions behind the test are obsolete. Reported-by: Boaz Harrosh <bharrosh@panasas.com> Reported-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* NFSv3: Ensure that do_proc_get_root() reports errors correctlyTrond Myklebust2012-09-141-1/+1
| | | | | | | | | | | | | | commit 086600430493e04b802bee6e5b3ce0458e4eb77f upstream. If the rpc call to NFS3PROC_FSINFO fails, then we need to report that error so that the mount fails. Otherwise we can end up with a superblock with completely unusable values for block sizes, maxfilesize, etc. Reported-by: Yuanming Chen <hikvision_linux@163.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* vfs: canonicalize create mode in build_open_flags()Miklos Szeredi2012-09-141-3/+4
| | | | | | | | | | | | | | | | | commit e68726ff72cf7ba5e7d789857fcd9a75ca573f03 upstream. Userspace can pass weird create mode in open(2) that we canonicalize to "(mode & S_IALLUGO) | S_IFREG" in vfs_create(). The problem is that we use the uncanonicalized mode before calling vfs_create() with unforseen consequences. So do the canonicalization early in build_open_flags(). Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Tested-by: Richard W.M. Jones <rjones@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* vfs: missed source of ->f_pos racesAl Viro2012-09-141-2/+8
| | | | | | | | | | | | commit 0e665d5d1125f9f4ccff56a75e814f10f88861a2 upstream. compat_sys_{read,write}v() need the same "pass a copy of file->f_pos" thing as sys_{read,write}{,v}(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* ext4: avoid kmemcheck complaint from reading uninitialized memoryTheodore Ts'o2012-08-261-0/+1
| | | | | | | | | | | | | | | | | | | commit 7e731bc9a12339f344cddf82166b82633d99dd86 upstream. Commit 03179fe923 introduced a kmemcheck complaint in ext4_da_get_block_prep() because we save and restore ei->i_da_metadata_calc_last_lblock even though it is left uninitialized in the case where i_da_metadata_calc_len is zero. This doesn't hurt anything, but silencing the kmemcheck complaint makes it easier for people to find real bugs. Addresses https://bugzilla.kernel.org/show_bug.cgi?id=45631 (which is marked as a regression). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* fuse: verify all ioctl retry iov elementsZach Brown2012-08-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | commit fb6ccff667712c46b4501b920ea73a326e49626a upstream. Commit 7572777eef78ebdee1ecb7c258c0ef94d35bad16 attempted to verify that the total iovec from the client doesn't overflow iov_length() but it only checked the first element. The iovec could still overflow by starting with a small element. The obvious fix is to check all the elements. The overflow case doesn't look dangerous to the kernel as the copy is limited by the length after the overflow. This fix restores the intention of returning an error instead of successfully copying less than the iovec represented. I found this by code inspection. I built it but don't have a test case. I'm cc:ing stable because the initial commit did as well. Signed-off-by: Zach Brown <zab@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* nilfs2: fix deadlock issue between chcp and thaw ioctlsRyusuke Konishi2012-08-154-2/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 572d8b3945a31bee7c40d21556803e4807fd9141 upstream. An fs-thaw ioctl causes deadlock with a chcp or mkcp -s command: chcp D ffff88013870f3d0 0 1325 1324 0x00000004 ... Call Trace: nilfs_transaction_begin+0x11c/0x1a0 [nilfs2] wake_up_bit+0x20/0x20 copy_from_user+0x18/0x30 [nilfs2] nilfs_ioctl_change_cpmode+0x7d/0xcf [nilfs2] nilfs_ioctl+0x252/0x61a [nilfs2] do_page_fault+0x311/0x34c get_unmapped_area+0x132/0x14e do_vfs_ioctl+0x44b/0x490 __set_task_blocked+0x5a/0x61 vm_mmap_pgoff+0x76/0x87 __set_current_blocked+0x30/0x4a sys_ioctl+0x4b/0x6f system_call_fastpath+0x16/0x1b thaw D ffff88013870d890 0 1352 1351 0x00000004 ... Call Trace: rwsem_down_failed_common+0xdb/0x10f call_rwsem_down_write_failed+0x13/0x20 down_write+0x25/0x27 thaw_super+0x13/0x9e do_vfs_ioctl+0x1f5/0x490 vm_mmap_pgoff+0x76/0x87 sys_ioctl+0x4b/0x6f filp_close+0x64/0x6c system_call_fastpath+0x16/0x1b where the thaw ioctl deadlocked at thaw_super() when called while chcp was waiting at nilfs_transaction_begin() called from nilfs_ioctl_change_cpmode(). This deadlock is 100% reproducible. This is because nilfs_ioctl_change_cpmode() first locks sb->s_umount in read mode and then waits for unfreezing in nilfs_transaction_begin(), whereas thaw_super() locks sb->s_umount in write mode. The locking of sb->s_umount here was intended to make snapshot mounts and the downgrade of snapshots to checkpoints exclusive. This fixes the deadlock issue by replacing the sb->s_umount usage in nilfs_ioctl_change_cpmode() with a dedicated mutex which protects snapshot mounts. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp> Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* ext4: don't let i_reserved_meta_blocks go negativeBrian Foster2012-08-091-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | commit 97795d2a5b8d3c8dc4365d4bd3404191840453ba upstream. If we hit a condition where we have allocated metadata blocks that were not appropriately reserved, we risk underflow of ei->i_reserved_meta_blocks. In turn, this can throw sbi->s_dirtyclusters_counter significantly out of whack and undermine the nondelalloc fallback logic in ext4_nonda_switch(). Warn if this occurs and set i_allocated_meta_blocks to avoid this problem. This condition is reproduced by xfstests 270 against ext2 with delalloc enabled: Mar 28 08:58:02 localhost kernel: [ 171.526344] EXT4-fs (loop1): delayed block allocation failed for inode 14 at logical offset 64486 with max blocks 64 with error -28 Mar 28 08:58:02 localhost kernel: [ 171.526346] EXT4-fs (loop1): This should not happen!! Data will be lost 270 ultimately fails with an inconsistent filesystem and requires an fsck to repair. The cause of the error is an underflow in ext4_da_update_reserve_space() due to an unreserved meta block allocation. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* ext4: pass a char * to ext4_count_free() instead of a buffer_head ptrTheodore Ts'o2012-08-094-8/+8
| | | | | | | | | | | commit f6fb99cadcd44660c68e13f6eab28333653621e6 upstream. Make it possible for ext4_count_free to operate on buffers and not just data in buffer_heads. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* nfs: skip commit in releasepage if we're freeing memory for fs-related reasonsJeff Layton2012-08-091-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 5cf02d09b50b1ee1c2d536c9cf64af5a7d433f56 upstream. We've had some reports of a deadlock where rpciod ends up with a stack trace like this: PID: 2507 TASK: ffff88103691ab40 CPU: 14 COMMAND: "rpciod/14" #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9 #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs] #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8 #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs] #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs] #6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670 #7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271 #8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638 #9 [ffff8810343bf818] shrink_zone at ffffffff8112788f #10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e #11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f #12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad #13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942 #14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a #15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9 #16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b #17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808 #18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c #19 [ffff8810343bfce8] inet_create at ffffffff81483ba6 #20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7 #21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc] #22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc] #23 [ffff8810343bfe38] worker_thread at ffffffff810887d0 #24 [ffff8810343bfee8] kthread at ffffffff8108dd96 #25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca rpciod is trying to allocate memory for a new socket to talk to the server. The VM ends up calling ->releasepage to get more memory, and it tries to do a blocking commit. That commit can't succeed however without a connected socket, so we deadlock. Fix this by setting PF_FSTRANS on the workqueue task prior to doing the socket allocation, and having nfs_release_page check for that flag when deciding whether to do a commit call. Also, set PF_FSTRANS unconditionally in rpc_async_schedule since that function can also do allocations sometimes. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* nfsd4: our filesystems are normally case sensitiveJ. Bruce Fields2012-08-091-1/+1
| | | | | | | | | | | commit 2930d381d22b9c56f40dd4c63a8fa59719ca2c3c upstream. Actually, xfs and jfs can optionally be case insensitive; we'll handle that case in later patches. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* Btrfs: call the ordered free operation without any locks heldChris Mason2012-08-091-1/+8
| | | | | | | | | | | | | | | | commit e9fbcb42201c862fd6ab45c48ead4f47bb2dea9d upstream. Each ordered operation has a free callback, and this was called with the worker spinlock held. Josef made the free callback also call iput, which we can't do with the spinlock. This drops the spinlock for the free operation and grabs it again before moving through the rest of the list. We'll circle back around to this and find a cleaner way that doesn't bounce the lock around so much. Signed-off-by: Chris Mason <chris.mason@fusionio.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* locks: fix checking of fcntl_setlease argumentJ. Bruce Fields2012-08-091-3/+3
| | | | | | | | | | | | | | | | | | commit 0ec4f431eb56d633da3a55da67d5c4b88886ccc7 upstream. The only checks of the long argument passed to fcntl(fd,F_SETLEASE,.) are done after converting the long to an int. Thus some illegal values may be let through and cause problems in later code. [ They actually *don't* cause problems in mainline, as of Dave Jones's commit 8d657eb3b438 "Remove easily user-triggerable BUG from generic_setlease", but we should fix this anyway. And this patch will be necessary to fix real bugs on earlier kernels. ] Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* mm: compaction: introduce sync-light migration for use by compactionMel Gorman2012-08-014-6/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit a6bc32b899223a877f595ef9ddc1e89ead5072b8 upstream. Stable note: Not tracked in Buzilla. This was part of a series that reduced interactivity stalls experienced when THP was enabled. These stalls were particularly noticable when copying data to a USB stick but the experiences for users varied a lot. This patch adds a lightweight sync migrate operation MIGRATE_SYNC_LIGHT mode that avoids writing back pages to backing storage. Async compaction maps to MIGRATE_ASYNC while sync compaction maps to MIGRATE_SYNC_LIGHT. For other migrate_pages users such as memory hotplug, MIGRATE_SYNC is used. This avoids sync compaction stalling for an excessive length of time, particularly when copying files to a USB stick where there might be a large number of dirty pages backed by a filesystem that does not support ->writepages. [aarcange@redhat.com: This patch is heavily based on Andrea's work] [akpm@linux-foundation.org: fix fs/nfs/write.c build] [akpm@linux-foundation.org: fix fs/btrfs/disk-io.c build] Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Dave Jones <davej@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Andy Isaacson <adi@hexapodia.org> Cc: Nai Xia <nai.xia@gmail.com> Cc: Johannes Weiner <jweiner@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* mm: compaction: determine if dirty pages can be migrated without blocking ↵Mel Gorman2012-08-014-6/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | within ->migratepage commit b969c4ab9f182a6e1b2a0848be349f99714947b0 upstream. Stable note: Not tracked in Bugzilla. A fix aimed at preserving page aging information by reducing LRU list churning had the side-effect of reducing THP allocation success rates. This was part of a series to restore the success rates while preserving the reclaim fix. Asynchronous compaction is used when allocating transparent hugepages to avoid blocking for long periods of time. Due to reports of stalling, there was a debate on disabling synchronous compaction but this severely impacted allocation success rates. Part of the reason was that many dirty pages are skipped in asynchronous compaction by the following check; if (PageDirty(page) && !sync && mapping->a_ops->migratepage != migrate_page) rc = -EBUSY; This skips over all mapping aops using buffer_migrate_page() even though it is possible to migrate some of these pages without blocking. This patch updates the ->migratepage callback with a "sync" parameter. It is the responsibility of the callback to fail gracefully if migration would block. Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Dave Jones <davej@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Andy Isaacson <adi@hexapodia.org> Cc: Nai Xia <nai.xia@gmail.com> Cc: Johannes Weiner <jweiner@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* UBIFS: fix a bug in empty space fix-upArtem Bityutskiy2012-08-011-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit c6727932cfdb13501108b16c38463c09d5ec7a74 upstream. UBIFS has a feature called "empty space fix-up" which is a quirk to work-around limitations of dumb flasher programs. Namely, of those flashers that are unable to skip NAND pages full of 0xFFs while flashing, resulting in empty space at the end of half-filled eraseblocks to be unusable for UBIFS. This feature is relatively new (introduced in v3.0). The fix-up routine (fixup_free_space()) is executed only once at the very first mount if the superblock has the 'space_fixup' flag set (can be done with -F option of mkfs.ubifs). It basically reads all the UBIFS data and metadata and writes it back to the same LEB. The routine assumes the image is pristine and does not have anything in the journal. There was a bug in 'fixup_free_space()' where it fixed up the log incorrectly. All but one LEB of the log of a pristine file-system are empty. And one contains just a commit start node. And 'fixup_free_space()' just unmapped this LEB, which resulted in wiping the commit start node. As a result, some users were unable to mount the file-system next time with the following symptom: UBIFS error (pid 1): replay_log_leb: first log node at LEB 3:0 is not CS node UBIFS error (pid 1): replay_log_leb: log error detected while replaying the log at LEB 3:0 The root-cause of this bug was that 'fixup_free_space()' wrongly assumed that the beginning of empty space in the log head (c->lhead_offs) was known on mount. However, it is not the case - it was always 0. UBIFS does not store in it the master node and finds out by scanning the log on every mount. The fix is simple - just pass commit start node size instead of 0 to 'fixup_leb()'. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@linux.intel.com> Reported-by: Iwo Mergler <Iwo.Mergler@netcommwireless.com> Tested-by: Iwo Mergler <Iwo.Mergler@netcommwireless.com> Reported-by: James Nute <newten82@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* cifs: always update the inode cache with the results from a FIND_*Jeff Layton2012-08-011-2/+5
| | | | | | | | | | | | | | | | | | | commit cd60042cc1392e79410dc8de9e9c1abb38a29e57 upstream. When we get back a FIND_FIRST/NEXT result, we have some info about the dentry that we use to instantiate a new inode. We were ignoring and discarding that info when we had an existing dentry in the cache. Fix this by updating the inode in place when we find an existing dentry and the uniqueid is the same. Reported-and-Tested-by: Andrew Bartlett <abartlet@samba.org> Reported-by: Bill Robertson <bill_robertson@debortoli.com.au> Reported-by: Dion Edwards <dion_edwards@debortoli.com.au> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* fifo: Do not restart open() if it already found a partnerAnders Kaseorg2012-07-191-5/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 05d290d66be6ef77a0b962ebecf01911bd984a78 upstream. If a parent and child process open the two ends of a fifo, and the child immediately exits, the parent may receive a SIGCHLD before its open() returns. In that case, we need to make sure that open() will return successfully after the SIGCHLD handler returns, instead of throwing EINTR or being restarted. Otherwise, the restarted open() would incorrectly wait for a second partner on the other end. The following test demonstrates the EINTR that was wrongly thrown from the parent’s open(). Change .sa_flags = 0 to .sa_flags = SA_RESTART to see a deadlock instead, in which the restarted open() waits for a second reader that will never come. (On my systems, this happens pretty reliably within about 5 to 500 iterations. Others report that it manages to loop ~forever sometimes; YMMV.) #include <sys/stat.h> #include <sys/types.h> #include <sys/wait.h> #include <fcntl.h> #include <signal.h> #include <stdio.h> #include <stdlib.h> #include <unistd.h> #define CHECK(x) do if ((x) == -1) {perror(#x); abort();} while(0) void handler(int signum) {} int main() { struct sigaction act = {.sa_handler = handler, .sa_flags = 0}; CHECK(sigaction(SIGCHLD, &act, NULL)); CHECK(mknod("fifo", S_IFIFO | S_IRWXU, 0)); for (;;) { int fd; pid_t pid; putc('.', stderr); CHECK(pid = fork()); if (pid == 0) { CHECK(fd = open("fifo", O_RDONLY)); _exit(0); } CHECK(fd = open("fifo", O_WRONLY)); CHECK(close(fd)); CHECK(waitpid(pid, NULL, 0)); } } This is what I suspect was causing the Git test suite to fail in t9010-svn-fe.sh: http://bugs.debian.org/678852 Signed-off-by: Anders Kaseorg <andersk@mit.edu> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* block: fix infinite loop in __getblk_slowJeff Moyer2012-07-191-9/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 91f68c89d8f35fe98ea04159b9a3b42d0149478f upstream. Commit 080399aaaf35 ("block: don't mark buffers beyond end of disk as mapped") exposed a bug in __getblk_slow that causes mount to hang as it loops infinitely waiting for a buffer that lies beyond the end of the disk to become uptodate. The problem was initially reported by Torsten Hilbrich here: https://lkml.org/lkml/2012/6/18/54 and also reported independently here: http://www.sysresccd.org/forums/viewtopic.php?f=13&t=4511 and then Richard W.M. Jones and Marcos Mello noted a few separate bugzillas also associated with the same issue. This patch has been confirmed to fix: https://bugzilla.redhat.com/show_bug.cgi?id=835019 The main problem is here, in __getblk_slow: for (;;) { struct buffer_head * bh; int ret; bh = __find_get_block(bdev, block, size); if (bh) return bh; ret = grow_buffers(bdev, block, size); if (ret < 0) return NULL; if (ret == 0) free_more_memory(); } __find_get_block does not find the block, since it will not be marked as mapped, and so grow_buffers is called to fill in the buffers for the associated page. I believe the for (;;) loop is there primarily to retry in the case of memory pressure keeping grow_buffers from succeeding. However, we also continue to loop for other cases, like the block lying beond the end of the disk. So, the fix I came up with is to only loop when grow_buffers fails due to memory allocation issues (return value of 0). The attached patch was tested by myself, Torsten, and Rich, and was found to resolve the problem in call cases. Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Reported-and-Tested-by: Torsten Hilbrich <torsten.hilbrich@secunet.com> Tested-by: Richard W.M. Jones <rjones@redhat.com> Reviewed-by: Josh Boyer <jwboyer@redhat.com> [ Jens is on vacation, taking this directly - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* fs: ramfs: file-nommu: add SetPageUptodate()Bob Liu2012-07-161-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit fea9f718b3d68147f162ed2d870183ce5e0ad8d8 upstream. There is a bug in the below scenario for !CONFIG_MMU: 1. create a new file 2. mmap the file and write to it 3. read the file can't get the correct value Because sys_read() -> generic_file_aio_read() -> simple_readpage() -> clear_page() which causes the page to be zeroed. Add SetPageUptodate() to ramfs_nommu_expand_for_mapping() so that generic_file_aio_read() do not call simple_readpage(). Signed-off-by: Bob Liu <lliubbo@gmail.com> Cc: Hugh Dickins <hughd@google.com> Cc: David Howells <dhowells@redhat.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greg Ungerer <gerg@uclinux.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* eCryptfs: Properly check for O_RDONLY flag before doing privileged openTyler Hicks2012-07-161-1/+1
| | | | | | | | | | | | | | | | | | | | | | | commit 9fe79d7600497ed8a95c3981cbe5b73ab98222f0 upstream. If the first attempt at opening the lower file read/write fails, eCryptfs will retry using a privileged kthread. However, the privileged retry should not happen if the lower file's inode is read-only because a read/write open will still be unsuccessful. The check for determining if the open should be retried was intended to be based on the access mode of the lower file's open flags being O_RDONLY, but the check was incorrectly performed. This would cause the open to be retried by the privileged kthread, resulting in a second failed open of the lower file. This patch corrects the check to determine if the open request should be handled by the privileged kthread. Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* eCryptfs: Fix lockdep warning in miscdev operationsTyler Hicks2012-07-161-12/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 60d65f1f07a7d81d3eb3b91fc13fca80f2fdbb12 upstream. Don't grab the daemon mutex while holding the message context mutex. Addresses this lockdep warning: ecryptfsd/2141 is trying to acquire lock: (&ecryptfs_msg_ctx_arr[i].mux){+.+.+.}, at: [<ffffffffa029c213>] ecryptfs_miscdev_read+0x143/0x470 [ecryptfs] but task is already holding lock: (&(*daemon)->mux){+.+...}, at: [<ffffffffa029c2ec>] ecryptfs_miscdev_read+0x21c/0x470 [ecryptfs] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&(*daemon)->mux){+.+...}: [<ffffffff810a3b8d>] lock_acquire+0x9d/0x220 [<ffffffff8151c6da>] __mutex_lock_common+0x5a/0x4b0 [<ffffffff8151cc64>] mutex_lock_nested+0x44/0x50 [<ffffffffa029c5d7>] ecryptfs_send_miscdev+0x97/0x120 [ecryptfs] [<ffffffffa029b744>] ecryptfs_send_message+0x134/0x1e0 [ecryptfs] [<ffffffffa029a24e>] ecryptfs_generate_key_packet_set+0x2fe/0xa80 [ecryptfs] [<ffffffffa02960f8>] ecryptfs_write_metadata+0x108/0x250 [ecryptfs] [<ffffffffa0290f80>] ecryptfs_create+0x130/0x250 [ecryptfs] [<ffffffff811963a4>] vfs_create+0xb4/0x120 [<ffffffff81197865>] do_last+0x8c5/0xa10 [<ffffffff811998f9>] path_openat+0xd9/0x460 [<ffffffff81199da2>] do_filp_open+0x42/0xa0 [<ffffffff81187998>] do_sys_open+0xf8/0x1d0 [<ffffffff81187a91>] sys_open+0x21/0x30 [<ffffffff81527d69>] system_call_fastpath+0x16/0x1b -> #0 (&ecryptfs_msg_ctx_arr[i].mux){+.+.+.}: [<ffffffff810a3418>] __lock_acquire+0x1bf8/0x1c50 [<ffffffff810a3b8d>] lock_acquire+0x9d/0x220 [<ffffffff8151c6da>] __mutex_lock_common+0x5a/0x4b0 [<ffffffff8151cc64>] mutex_lock_nested+0x44/0x50 [<ffffffffa029c213>] ecryptfs_miscdev_read+0x143/0x470 [ecryptfs] [<ffffffff811887d3>] vfs_read+0xb3/0x180 [<ffffffff811888ed>] sys_read+0x4d/0x90 [<ffffffff81527d69>] system_call_fastpath+0x16/0x1b Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* eCryptfs: Gracefully refuse miscdev file ops on inherited/passed filesTyler Hicks2012-07-161-7/+16
| | | | | | | | | | | | | | | | | | | | | | | | | commit 8dc6780587c99286c0d3de747a2946a76989414a upstream. File operations on /dev/ecryptfs would BUG() when the operations were performed by processes other than the process that originally opened the file. This could happen with open files inherited after fork() or file descriptors passed through IPC mechanisms. Rather than calling BUG(), an error code can be safely returned in most situations. In ecryptfs_miscdev_release(), eCryptfs still needs to handle the release even if the last file reference is being held by a process that didn't originally open the file. ecryptfs_find_daemon_by_euid() will not be successful, so a pointer to the daemon is stored in the file's private_data. The private_data pointer is initialized when the miscdev file is opened and only used when the file is released. https://launchpad.net/bugs/994247 Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Reported-by: Sasha Levin <levinsasha928@gmail.com> Tested-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* vfs: make O_PATH file descriptors usable for 'fchdir()'Linus Torvalds2012-07-161-3/+3
| | | | | | | | | | | | | | | | | | commit 332a2e1244bd08b9e3ecd378028513396a004a24 upstream. We already use them for openat() and friends, but fchdir() also wants to be able to use O_PATH file descriptors. This should make it comparable to the O_SEARCH of Solaris. In particular, O_PATH allows you to access (not-quite-open) a directory you don't have read persmission to, only execute permission. Noticed during development of multithread support for ksh93. Reported-by: ольга крыжановская <olga.kryzhanovska@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* Btrfs: run delayed directory updates during log replayChris Mason2012-07-161-0/+6
| | | | | | | | | | | | | | | | | commit b6305567e7d31b0bec1b8cb9ec0cadd7f7086f5f upstream. While we are resolving directory modifications in the tree log, we are triggering delayed metadata updates to the filesystem btrees. This commit forces the delayed updates to run so the replay code can find any modifications done. It stops us from crashing because the directory deleltion replay expects items to be removed immediately from the tree. Signed-off-by: Chris Mason <chris.mason@fusionio.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* udf: Fortify loading of sparing tableJan Kara2012-07-161-33/+53
| | | | | | | | | | | commit 1df2ae31c724e57be9d7ac00d78db8a5dabdd050 upstream. Add sanity checks when loading sparing table from disk to avoid accessing unallocated memory or writing to it. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* udf: Avoid run away loop when partition table length is corruptedJan Kara2012-07-161-1/+9
| | | | | | | | | | | commit adee11b2085bee90bd8f4f52123ffb07882d6256 upstream. Check provided length of partition table so that (possibly maliciously) corrupted partition table cannot cause accessing data beyond current buffer. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* udf: Use 'ret' instead of abusing 'i' in udf_load_logicalvol()Jan Kara2012-07-161-4/+2
| | | | | | | | commit cb14d340ef1737c24125dd663eff77734a482d47 upstream. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* nilfs2: ensure proper cache clearing for gc-inodesRyusuke Konishi2012-07-162-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit fbb24a3a915f105016f1c828476be11aceac8504 upstream. A gc-inode is a pseudo inode used to buffer the blocks to be moved by garbage collection. Block caches of gc-inodes must be cleared every time a garbage collection function (nilfs_clean_segments) completes. Otherwise, stale blocks buffered in the caches may be wrongly reused in successive calls of the GC function. For user files, this is not a problem because their gc-inodes are distinguished by a checkpoint number as well as an inode number. They never buffer different blocks if either an inode number, a checkpoint number, or a block offset differs. However, gc-inodes of sufile, cpfile and DAT file can store different data for the same block offset. Thus, the nilfs_clean_segments function can move incorrect block for these meta-data files if an old block is cached. I found this is really causing meta-data corruption in nilfs. This fixes the issue by ensuring cache clear of gc-inodes and resolves reported GC problems including checkpoint file corruption, b-tree corruption, and the following warning during GC. nilfs_palloc_freev: entry number 307234 already freed. ... Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* hfsplus: fix overflow in sector calculations in hfsplus_submit_bioJanne Kalliomäki2012-06-221-1/+1
| | | | | | | | | | | | | | | commit a6dc8c04218eb752ff79cdc24a995cf51866caed upstream. The variable io_size was unsigned int, which caused the wrong sector number to be calculated after aligning it. This then caused mount to fail with big volumes, as backup volume header information was searched from a wrong sector. Signed-off-by: Janne Kalliomäki <janne@tuxera.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* fuse: fix stat call on 32 bit platformsPavel Shilovsky2012-06-173-1/+20
| | | | | | | | | | | | | | | | | | commit 45c72cd73c788dd18c8113d4a404d6b4a01decf1 upstream. Now we store attr->ino at inode->i_ino, return attr->ino at the first time and then return inode->i_ino if the attribute timeout isn't expired. That's wrong on 32 bit platforms because attr->ino is 64 bit and inode->i_ino is 32 bit in this case. Fix this by saving 64 bit ino in fuse_inode structure and returning it every time we call getattr. Also squash attr->ino into inode->i_ino explicitly. Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* ext4: don't set i_flags in EXT4_IOC_SETFLAGSTao Ma2012-06-101-1/+0
| | | | | | | | | | | | | | commit b22b1f178f6799278d3178d894f37facb2085765 upstream. Commit 7990696 uses the ext4_{set,clear}_inode_flags() functions to change the i_flags automatically but fails to remove the error setting of i_flags. So we still have the problem of trashing state flags. Fix this by removing the assignment. Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* ext4: remove mb_groups before tearing down the buddy_cacheSalman Qazi2012-06-101-2/+3
| | | | | | | | | | | | | | | | | | | | | commit 95599968d19db175829fb580baa6b68939b320fb upstream. We can't have references held on pages in the s_buddy_cache while we are trying to truncate its pages and put the inode. All the pages must be gone before we reach clear_inode. This can only be gauranteed if we can prevent new users from grabbing references to s_buddy_cache's pages. The original bug can be reproduced and the bug fix can be verified by: while true; do mount -t ext4 /dev/ram0 /export/hda3/ram0; \ umount /export/hda3/ram0; done & while true; do cat /proc/fs/ext4/ram0/mb_groups; done Signed-off-by: Salman Qazi <sqazi@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* ext4: add ext4_mb_unload_buddy in the error pathSalman Qazi2012-06-101-0/+1
| | | | | | | | | | | | commit 02b7831019ea4e7994968c84b5826fa8b248ffc8 upstream. ext4_free_blocks fails to pair an ext4_mb_load_buddy with a matching ext4_mb_unload_buddy when it fails a memory allocation. Signed-off-by: Salman Qazi <sqazi@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* ext4: don't trash state flags in EXT4_IOC_SETFLAGSTheodore Ts'o2012-06-101-3/+9
| | | | | | | | | | | | | | | | commit 79906964a187c405db72a3abc60eb9b50d804fbc upstream. In commit 353eb83c we removed i_state_flags with 64-bit longs, But when handling the EXT4_IOC_SETFLAGS ioctl, we replace i_flags directly, which trashes the state flags which are stored in the high 32-bits of i_flags on 64-bit platforms. So use the the ext4_{set,clear}_inode_flags() functions which use atomic bit manipulation functions instead. Reported-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* ext4: add missing save_error_info() to ext4_error()Theodore Ts'o2012-06-101-0/+1
| | | | | | | | | | | | | | | | commit f3fc0210c0fc91900766c995f089c39170e68305 upstream. The ext4_error() function is missing a call to save_error_info(). Since this is the function which marks the file system as containing an error, this oversight (which was introduced in 2.6.36) is quite significant, and should be backported to older stable kernels with high urgency. Reported-by: Ken Sumrall <ksumrall@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: ksumrall@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* ext4: force ro mount if ext4_setup_super() failsEric Sandeen2012-06-101-1/+2
| | | | | | | | | | | | | | | | | | | | | | | commit 7e84b6216467b84cd332c8e567bf5aa113fd2f38 upstream. If ext4_setup_super() fails i.e. due to a too-high revision, the error is logged in dmesg but the fs is not mounted RO as indicated. Tested by: # mkfs.ext4 -r 4 /dev/sdb6 # mount /dev/sdb6 /mnt/test # dmesg | grep "too high" [164919.759248] EXT4-fs (sdb6): revision level too high, forcing read-only mode # grep sdb6 /proc/mounts /dev/sdb6 /mnt/test2 ext4 rw,seclabel,relatime,data=ordered 0 0 Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* vfs: umount_tree() might be called on subtree that had never made itAl Viro2012-06-101-1/+2
| | | | | | | | | | | | | commit 63d37a84ab6004c235314ffd7a76c5eb28c2fae0 upstream. __mnt_make_shortterm() in there undoes the effect of __mnt_make_longterm() we'd done back when we set ->mnt_ns non-NULL; it should not be done to vfsmounts that had never gone through commit_tree() and friends. Kudos to lczerner for catching that one... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* NFSv4: Map NFS4ERR_SHARE_DENIED into an EACCES error instead of EIOTrond Myklebust2012-06-101-0/+2
| | | | | | | | | | | | | | | commit fb13bfa7e1bcfdcfdece47c24b62f1a1cad957e9 upstream. If a file OPEN is denied due to a share lock, the resulting NFS4ERR_SHARE_DENIED is currently mapped to the default EIO. This patch adds a more appropriate mapping, and brings Linux into line with what Solaris 10 does. See https://bugzilla.kernel.org/show_bug.cgi?id=43286 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* cifs: fix oops while traversing open file list (try #4)Shirish Pargaonkar2012-06-102-24/+34
| | | | | | | | | | | | | | | | | | | | | | commit 2c0c2a08bed7a3b791f88d09d16ace56acb3dd98 upstream. While traversing the linked list of open file handles, if the identfied file handle is invalid, a reopen is attempted and if it fails, we resume traversing where we stopped and cifs can oops while accessing invalid next element, for list might have changed. So mark the invalid file handle and attempt reopen if no valid file handle is found in rest of the list. If reopen fails, move the invalid file handle to the end of the list and start traversing the list again from the begining. Repeat this four times before giving up and returning an error if file reopen keeps failing. Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* vfs: make AIO use the proper rw_verify_area() area helpersLinus Torvalds2012-06-011-16/+14
| | | | | | | | | | | | | | | | | | | | | commit a70b52ec1aaeaf60f4739edb1b422827cb6f3893 upstream. We had for some reason overlooked the AIO interface, and it didn't use the proper rw_verify_area() helper function that checks (for example) mandatory locking on the file, and that the size of the access doesn't cause us to overflow the provided offset limits etc. Instead, AIO did just the security_file_permission() thing (that rw_verify_area() also does) directly. This fixes it to do all the proper helper functions, which not only means that now mandatory file locking works with AIO too, we can actually remove lines of code. Reported-by: Manish Honap <manish_honap_vit@yahoo.co.in> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* block: don't mark buffers beyond end of disk as mappedJeff Moyer2012-06-012-4/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 080399aaaf3531f5b8761ec0ac30ff98891e8686 upstream. Hi, We have a bug report open where a squashfs image mounted on ppc64 would exhibit errors due to trying to read beyond the end of the disk. It can easily be reproduced by doing the following: [root@ibm-p750e-02-lp3 ~]# ls -l install.img -rw-r--r-- 1 root root 142032896 Apr 30 16:46 install.img [root@ibm-p750e-02-lp3 ~]# mount -o loop ./install.img /mnt/test [root@ibm-p750e-02-lp3 ~]# dd if=/dev/loop0 of=/dev/null dd: reading `/dev/loop0': Input/output error 277376+0 records in 277376+0 records out 142016512 bytes (142 MB) copied, 0.9465 s, 150 MB/s In dmesg, you'll find the following: squashfs: version 4.0 (2009/01/31) Phillip Lougher [ 43.106012] attempt to access beyond end of device [ 43.106029] loop0: rw=0, want=277410, limit=277408 [ 43.106039] Buffer I/O error on device loop0, logical block 138704 [ 43.106053] attempt to access beyond end of device [ 43.106057] loop0: rw=0, want=277412, limit=277408 [ 43.106061] Buffer I/O error on device loop0, logical block 138705 [ 43.106066] attempt to access beyond end of device [ 43.106070] loop0: rw=0, want=277414, limit=277408 [ 43.106073] Buffer I/O error on device loop0, logical block 138706 [ 43.106078] attempt to access beyond end of device [ 43.106081] loop0: rw=0, want=277416, limit=277408 [ 43.106085] Buffer I/O error on device loop0, logical block 138707 [ 43.106089] attempt to access beyond end of device [ 43.106093] loop0: rw=0, want=277418, limit=277408 [ 43.106096] Buffer I/O error on device loop0, logical block 138708 [ 43.106101] attempt to access beyond end of device [ 43.106104] loop0: rw=0, want=277420, limit=277408 [ 43.106108] Buffer I/O error on device loop0, logical block 138709 [ 43.106112] attempt to access beyond end of device [ 43.106116] loop0: rw=0, want=277422, limit=277408 [ 43.106120] Buffer I/O error on device loop0, logical block 138710 [ 43.106124] attempt to access beyond end of device [ 43.106128] loop0: rw=0, want=277424, limit=277408 [ 43.106131] Buffer I/O error on device loop0, logical block 138711 [ 43.106135] attempt to access beyond end of device [ 43.106139] loop0: rw=0, want=277426, limit=277408 [ 43.106143] Buffer I/O error on device loop0, logical block 138712 [ 43.106147] attempt to access beyond end of device [ 43.106151] loop0: rw=0, want=277428, limit=277408 [ 43.106154] Buffer I/O error on device loop0, logical block 138713 [ 43.106158] attempt to access beyond end of device [ 43.106162] loop0: rw=0, want=277430, limit=277408 [ 43.106166] attempt to access beyond end of device [ 43.106169] loop0: rw=0, want=277432, limit=277408 ... [ 43.106307] attempt to access beyond end of device [ 43.106311] loop0: rw=0, want=277470, limit=2774 Squashfs manages to read in the end block(s) of the disk during the mount operation. Then, when dd reads the block device, it leads to block_read_full_page being called with buffers that are beyond end of disk, but are marked as mapped. Thus, it would end up submitting read I/O against them, resulting in the errors mentioned above. I fixed the problem by modifying init_page_buffers to only set the buffer mapped if it fell inside of i_size. Cheers, Jeff Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Acked-by: Nick Piggin <npiggin@kernel.dk> -- Changes from v1->v2: re-used max_block, as suggested by Nick Piggin. Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* wake up s_wait_unfrozen when ->freeze_fs failsKazuya Mio2012-05-211-0/+2
| | | | | | | | | | | | | | | | | | | | | commit e1616300a20c80396109c1cf013ba9a36055a3da upstream. dd slept infinitely when fsfeeze failed because of EIO. To fix this problem, if ->freeze_fs fails, freeze_super() wakes up the tasks waiting for the filesystem to become unfrozen. When s_frozen isn't SB_UNFROZEN in __generic_file_aio_write(), the function sleeps until FITHAW ioctl wakes up s_wait_unfrozen. However, if ->freeze_fs fails, s_frozen is set to SB_UNFROZEN and then freeze_super() returns an error number. In this case, FITHAW ioctl returns EINVAL because s_frozen is already SB_UNFROZEN. There is no way to wake up s_wait_unfrozen, so __generic_file_aio_write() sleeps infinitely. Signed-off-by: Kazuya Mio <k-mio@sx.jp.nec.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* ext4: fix error handling on inode bitmap corruptionJan Kara2012-05-211-2/+6
| | | | | | | | | | | | | | | commit acd6ad83517639e8f09a8c5525b1dccd81cd2a10 upstream. When insert_inode_locked() fails in ext4_new_inode() it most likely means inode bitmap got corrupted and we allocated again inode which is already in use. Also doing unlock_new_inode() during error recovery is wrong since the inode does not have I_NEW set. Fix the problem by jumping to fail: (instead of fail_drop:) which declares filesystem error and does not call unlock_new_inode(). Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* ext3: Fix error handling on inode bitmap corruptionJan Kara2012-05-211-2/+6
| | | | | | | | | | | | | | | | commit 1415dd8705394399d59a3df1ab48d149e1e41e77 upstream. When insert_inode_locked() fails in ext3_new_inode() it most likely means inode bitmap got corrupted and we allocated again inode which is already in use. Also doing unlock_new_inode() during error recovery is wrong since inode does not have I_NEW set. Fix the problem by jumping to fail: (instead of fail_drop:) which declares filesystem error and does not call unlock_new_inode(). Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* NFSv4: Revalidate uid/gid after openJonathan Nieder2012-05-211-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a shorter (and more appropriate for stable kernels) analog to the following upstream commit: commit 6926afd1925a54a13684ebe05987868890665e2b Author: Trond Myklebust <Trond.Myklebust@netapp.com> Date: Sat Jan 7 13:22:46 2012 -0500 NFSv4: Save the owner/group name string when doing open ...so that we can do the uid/gid mapping outside the asynchronous RPC context. This fixes a bug in the current NFSv4 atomic open code where the client isn't able to determine what the true uid/gid fields of the file are, (because the asynchronous nature of the OPEN call denies it the ability to do an upcall) and so fills them with default values, marking the inode as needing revalidation. Unfortunately, in some cases, the VFS will do some additional sanity checks on the file, and may override the server's decision to allow the open because it sees the wrong owner/group fields. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Without this patch, logging into two different machines with home directories mounted over NFS4 and then running "vim" and typing ":q" in each reliably produces the following error on the second machine: E137: Viminfo file is not writable: /users/system/rtheys/.viminfo This regression was introduced by 80e52aced138 ("NFSv4: Don't do idmapper upcalls for asynchronous RPC calls", merged during the 2.6.32 cycle) --- after the OPEN call, .viminfo has the default values for st_uid and st_gid (0xfffffffe) cached because we do not want to let rpciod wait for an idmapper upcall to fill them in. The fix used in mainline is to save the owner and group as strings and perform the upcall in _nfs4_proc_open outside the rpciod context, which takes about 600 lines. For stable, we can do something similar with a one-liner: make open check for the stale fields and make a (synchronous) GETATTR call to fill them when needed. Trond dictated the patch, I typed it in, and Rik tested it. Addresses http://bugs.debian.org/659111 and https://bugzilla.redhat.com/789298 Reported-by: Rik Theys <Rik.Theys@esat.kuleuven.be> Explained-by: David Flyn <davidf@rd.bbc.co.uk> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Tested-by: Rik Theys <Rik.Theys@esat.kuleuven.be> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* ext4: avoid deadlock on sync-mounted FS w/o journalEric Sandeen2012-05-211-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit c1bb05a657fb3d8c6179a4ef7980261fae4521d7 upstream. Processes hang forever on a sync-mounted ext2 file system that is mounted with the ext4 module (default in Fedora 16). I can reproduce this reliably by mounting an ext2 partition with "-o sync" and opening a new file an that partition with vim. vim will hang in "D" state forever. The same happens on ext4 without a journal. I am attaching a small patch here that solves this issue for me. In the sync mounted case without a journal, ext4_handle_dirty_metadata() may call sync_dirty_buffer(), which can't be called with buffer lock held. Also move mb_cache_entry_release inside lock to avoid race fixed previously by 8a2bfdcb ext[34]: EA block reference count racing fix Note too that ext2 fixed this same problem in 2006 with b2f49033 [PATCH] fix deadlock in ext2 Signed-off-by: Martin.Wilck@ts.fujitsu.com [sandeen@redhat.com: move mb_cache_entry_release before unlock, edit commit msg] Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* jffs2: Fix lock acquisition order bug in gc pathJosh Cartwright2012-05-211-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 226bb7df3d22bcf4a1c0fe8206c80cc427498eae upstream. The locking policy is such that the erase_complete_block spinlock is nested within the alloc_sem mutex. This fixes a case in which the acquisition order was erroneously reversed. This issue was caught by the following lockdep splat: ======================================================= [ INFO: possible circular locking dependency detected ] 3.0.5 #1 ------------------------------------------------------- jffs2_gcd_mtd6/299 is trying to acquire lock: (&c->alloc_sem){+.+.+.}, at: [<c01f7714>] jffs2_garbage_collect_pass+0x314/0x890 but task is already holding lock: (&(&c->erase_completion_lock)->rlock){+.+...}, at: [<c01f7708>] jffs2_garbage_collect_pass+0x308/0x890 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&(&c->erase_completion_lock)->rlock){+.+...}: [<c008bec4>] validate_chain+0xe6c/0x10bc [<c008c660>] __lock_acquire+0x54c/0xba4 [<c008d240>] lock_acquire+0xa4/0x114 [<c046780c>] _raw_spin_lock+0x3c/0x4c [<c01f744c>] jffs2_garbage_collect_pass+0x4c/0x890 [<c01f937c>] jffs2_garbage_collect_thread+0x1b4/0x1cc [<c0071a68>] kthread+0x98/0xa0 [<c000f264>] kernel_thread_exit+0x0/0x8 -> #0 (&c->alloc_sem){+.+.+.}: [<c008ad2c>] print_circular_bug+0x70/0x2c4 [<c008c08c>] validate_chain+0x1034/0x10bc [<c008c660>] __lock_acquire+0x54c/0xba4 [<c008d240>] lock_acquire+0xa4/0x114 [<c0466628>] mutex_lock_nested+0x74/0x33c [<c01f7714>] jffs2_garbage_collect_pass+0x314/0x890 [<c01f937c>] jffs2_garbage_collect_thread+0x1b4/0x1cc [<c0071a68>] kthread+0x98/0xa0 [<c000f264>] kernel_thread_exit+0x0/0x8 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&(&c->erase_completion_lock)->rlock); lock(&c->alloc_sem); lock(&(&c->erase_completion_lock)->rlock); lock(&c->alloc_sem); *** DEADLOCK *** 1 lock held by jffs2_gcd_mtd6/299: #0: (&(&c->erase_completion_lock)->rlock){+.+...}, at: [<c01f7708>] jffs2_garbage_collect_pass+0x308/0x890 stack backtrace: [<c00155dc>] (unwind_backtrace+0x0/0x100) from [<c0463dc0>] (dump_stack+0x20/0x24) [<c0463dc0>] (dump_stack+0x20/0x24) from [<c008ae84>] (print_circular_bug+0x1c8/0x2c4) [<c008ae84>] (print_circular_bug+0x1c8/0x2c4) from [<c008c08c>] (validate_chain+0x1034/0x10bc) [<c008c08c>] (validate_chain+0x1034/0x10bc) from [<c008c660>] (__lock_acquire+0x54c/0xba4) [<c008c660>] (__lock_acquire+0x54c/0xba4) from [<c008d240>] (lock_acquire+0xa4/0x114) [<c008d240>] (lock_acquire+0xa4/0x114) from [<c0466628>] (mutex_lock_nested+0x74/0x33c) [<c0466628>] (mutex_lock_nested+0x74/0x33c) from [<c01f7714>] (jffs2_garbage_collect_pass+0x314/0x890) [<c01f7714>] (jffs2_garbage_collect_pass+0x314/0x890) from [<c01f937c>] (jffs2_garbage_collect_thread+0x1b4/0x1cc) [<c01f937c>] (jffs2_garbage_collect_thread+0x1b4/0x1cc) from [<c0071a68>] (kthread+0x98/0xa0) [<c0071a68>] (kthread+0x98/0xa0) from [<c000f264>] (kernel_thread_exit+0x0/0x8) This was introduce in '81cfc9f jffs2: Fix serious write stall due to erase'. Signed-off-by: Josh Cartwright <joshc@linux.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* hfsplus: Fix potential buffer overflowsGreg Kroah-Hartman2012-05-072-0/+15
| | | | | | | | | | | | | | | | | | | | | | | | | commit 6f24f892871acc47b40dd594c63606a17c714f77 upstream. Commit ec81aecb2966 ("hfs: fix a potential buffer overflow") fixed a few potential buffer overflows in the hfs filesystem. But as Timo Warns pointed out, these changes also need to be made on the hfsplus filesystem as well. Reported-by: Timo Warns <warns@pre-sense.de> Acked-by: WANG Cong <amwang@redhat.com> Cc: Alexey Khoroshilov <khoroshilov@ispras.ru> Cc: Miklos Szeredi <mszeredi@suse.cz> Cc: Sage Weil <sage@newdream.net> Cc: Eugene Teo <eteo@redhat.com> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Dave Anderson <anderson@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* autofs: make the autofsv5 packet file descriptor use a packetized pipeLinus Torvalds2012-05-073-2/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 64f371bc3107e69efce563a3d0f0e6880de0d537 upstream. The autofs packet size has had a very unfortunate size problem on x86: because the alignment of 'u64' differs in 32-bit and 64-bit modes, and because the packet data was not 8-byte aligned, the size of the autofsv5 packet structure differed between 32-bit and 64-bit modes despite looking otherwise identical (300 vs 304 bytes respectively). We first fixed that up by making the 64-bit compat mode know about this problem in commit a32744d4abae ("autofs: work around unhappy compat problem on x86-64"), and that made a 32-bit 'systemd' work happily on a 64-bit kernel because everything then worked the same way as on a 32-bit kernel. But it turned out that 'automount' had actually known and worked around this problem in user space, so fixing the kernel to do the proper 32-bit compatibility handling actually *broke* 32-bit automount on a 64-bit kernel, because it knew that the packet sizes were wrong and expected those incorrect sizes. As a result, we ended up reverting that compatibility mode fix, and thus breaking systemd again, in commit fcbf94b9dedd. With both automount and systemd doing a single read() system call, and verifying that they get *exactly* the size they expect but using different sizes, it seemed that fixing one of them inevitably seemed to break the other. At one point, a patch I seriously considered applying from Michael Tokarev did a "strcmp()" to see if it was automount that was doing the operation. Ugly, ugly. However, a prettier solution exists now thanks to the packetized pipe mode. By marking the communication pipe as being packetized (by simply setting the O_DIRECT flag), we can always just write the bigger packet size, and if user-space does a smaller read, it will just get that partial end result and the extra alignment padding will simply be thrown away. This makes both automount and systemd happy, since they now get the size they asked for, and the kernel side of autofs simply no longer needs to care - it could pad out the packet arbitrarily. Of course, if there is some *other* user of autofs (please, please, please tell me it ain't so - and we haven't heard of any) that tries to read the packets with multiple writes, that other user will now be broken - the whole point of the packetized mode is that one system call gets exactly one packet, and you cannot read a packet in pieces. Tested-by: Michael Tokarev <mjt@tls.msk.ru> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: David Miller <davem@davemloft.net> Cc: Ian Kent <raven@themaw.net> Cc: Thomas Meyer <thomas@m3y3r.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>