aboutsummaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* Avoid warning when CPU hotplug isn't enabledLinus Torvalds2010-05-271-6/+3
| | | | | | | | | | | | | | | Commit e9fb7631ebcd ("cpu-hotplug: introduce cpu_notify(), __cpu_notify(), cpu_notify_nofail()") also introduced this annoying warning: kernel/cpu.c:157: warning: 'cpu_notify_nofail' defined but not used when CONFIG_HOTPLUG_CPU wasn't set. So move that helper inside the #ifdef CONFIG_HOTPLUG_CPU region, and simplify it while at it. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6Linus Torvalds2010-05-2710-306/+708
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: [SCSI] fix race in scsi_target_reap [SCSI] aacraid: Eliminate use after free [SCSI] arcmsr: Support HW reset for EH and polling scheme for scsi device [SCSI] bfa: fix system crash when reading sysfs fc_host statistics [SCSI] iscsi_tcp: remove sk_sleep check [SCSI] ipr: improve interrupt service routine performance [SCSI] ipr: set the data list length in the request control block [SCSI] ipr: fix a register read to use the correct address for 64 bit adapters [SCSI] ipr: include the resource path in the IOA status area structure [SCSI] ipr: implement fixes for 64 bit adapter support [SCSI] be2iscsi: correct return value in mgmt_invalidate_icds()
| * [SCSI] fix race in scsi_target_reapAlan Stern2010-05-251-4/+5
| | | | | | | | | | | | | | | | | | | | | | This patch (as1357) fixes a race in SCSI target allocation and release. Putting a target in the STARGET_DEL state isn't protected by the host lock, so an old target structure could be reused by a new device even though it's about to be deleted. The cure is to change the state while still holding the host lock. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
| * [SCSI] aacraid: Eliminate use after freeJulia Lawall2010-05-251-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The debugging code using the freed structure is moved before the kfree. A simplified version of the semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @free@ expression E; position p; @@ kfree@p(E) @@ expression free.E, subE<=free.E, E1; position free.p; @@ kfree@p(E) ... ( subE = E1 | * E ) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
| * [SCSI] arcmsr: Support HW reset for EH and polling scheme for scsi deviceNick Cheng2010-05-253-191/+525
| | | | | | | | | | | | | | | | | | | | | | | | 1. To support instantaneous report for SCSI device existing by periodic polling 2. In arcmsr_iop_xfer(), inform AP of F/W's deadlock state to prevent endless waiting 3. To block the coming SCSI command while the driver is handling bus reset 4. To support HW reset in bus reset error handler Signed-off-by: Nick Cheng <nick.cheng@areca.com.tw> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
| * [SCSI] bfa: fix system crash when reading sysfs fc_host statisticsKrishna Gudipati2010-05-251-0/+22
| | | | | | | | | | | | | | | | | | | | The port data structure related to fc_host statistics collection is not initialized. This causes system crash when reading the fc_host statistics. The fix is to initialize port structure during driver attach. Signed-off-by: Krishna Gudipati <kgudipat@brocade.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
| * [SCSI] iscsi_tcp: remove sk_sleep checkMike Christie2010-05-251-4/+2
| | | | | | | | | | | | | | | | There is no need to call sk_sleep before calling wake_up_interruptible, because the wait_queue_head is now with the socket. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
| * [SCSI] ipr: improve interrupt service routine performanceWayne Boyer2010-05-241-33/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | During performance testing on P7 machines it was observed that the interrupt service routine was doing unnecessary MMIO operations. This patch rearranges the logic of the routine and moves some of the code out of the main routine. The result is that there are now fewer MMIO operations in the performance path of the code. Signed-off-by: Wayne Boyer <wayneb@linux.vnet.ibm.com> Acked-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
| * [SCSI] ipr: set the data list length in the request control blockWayne Boyer2010-05-241-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | In bring up testing for the new 64 bit adapters, the first read command failed after loading the driver. The cause was that the command requires more than one scatter gather element and the corresponding code to set the data list length in the request control block was missing. This patch adds the correct assignment. Signed-off-by: Wayne Boyer <wayneb@linux.vnet.ibm.com> Acked-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
| * [SCSI] ipr: fix a register read to use the correct address for 64 bit adaptersWayne Boyer2010-05-242-5/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix ipr_reset_enable_ioa() to read the correct IOA to host interrupt register address for 64 bit adapters. We need to read the lower 32 bits, not the upper 32 bits. Also change the write of the 64 bit mask value to a single writeq instead of two writel calls. Finally, use the correct u8 type for the type field in the ipr_resource_entry structure. Signed-off-by: Wayne Boyer <wayneb@linux.vnet.ibm.com> Acked-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
| * [SCSI] ipr: include the resource path in the IOA status area structureWayne Boyer2010-05-242-59/+98
| | | | | | | | | | | | | | | | | | | | The IOA status area now includes the new resource path field for 64 bit adapters. This patch changes the driver to fix the ioasa structure and to use the correct structure definition based on the type of adatper. Signed-off-by: Wayne Boyer <wayneb@linux.vnet.ibm.com> Acked-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
| * [SCSI] ipr: implement fixes for 64 bit adapter supportWayne Boyer2010-05-242-5/+11
| | | | | | | | | | | | | | | | | | Implement some small fixes for 64 bit support that were preventing the adapter from becoming operational. Signed-off-by: Wayne Boyer <wayneb@linux.vnet.ibm.com> Acked-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
| * [SCSI] be2iscsi: correct return value in mgmt_invalidate_icds()Dan Carpenter2010-05-241-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This function should return 0 on error. Returning -1 would cause a crash. Also there is an extra space before the newline character and a missing space between the "for" and the "mgmt_invalidate_icds". I put the string on one line. The current version of checkpatch.pl complains that the line is too long, but it makes grepping easier. Signed-off-by: Dan Carpenter <error27@gmail.com> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
* | Merge branch 'for_linus' of ↵Linus Torvalds2010-05-2723-834/+1136
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (40 commits) ext4: Make fsync sync new parent directories in no-journal mode ext4: Drop whitespace at end of lines ext4: Fix compat EXT4_IOC_ADD_GROUP ext4: Conditionally define compat ioctl numbers tracing: Convert more ext4 events to DEFINE_EVENT ext4: Add new tracepoints to track mballoc's buddy bitmap loads ext4: Add a missing trace hook ext4: restart ext4_ext_remove_space() after transaction restart ext4: Clear the EXT4_EOFBLOCKS_FL flag only when warranted ext4: Avoid crashing on NULL ptr dereference on a filesystem error ext4: Use bitops to read/modify i_flags in struct ext4_inode_info ext4: Convert calls of ext4_error() to EXT4_ERROR_INODE() ext4: Convert callers of ext4_get_blocks() to use ext4_map_blocks() ext4: Add new abstraction ext4_map_blocks() underneath ext4_get_blocks() ext4: Use our own write_cache_pages() ext4: Show journal_checksum option ext4: Fix for ext4_mb_collect_stats() ext4: check for a good block group before loading buddy pages ext4: Prevent creation of files larger than RLIMIT_FSIZE using fallocate ext4: Remove extraneous newlines in ext4_msg() calls ... Fixed up trivial conflict in fs/ext4/fsync.c
| * | ext4: Make fsync sync new parent directories in no-journal modeFrank Mayhar2010-05-173-2/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a new ext4 state to tell us when a file has been newly created; use that state in ext4_sync_file in no-journal mode to tell us when we need to sync the parent directory as well as the inode and data itself. This fixes a problem in which a panic or power failure may lose the entire file even when using fsync, since the parent directory entry is lost. Addresses-Google-Bug: #2480057 Signed-off-by: Frank Mayhar <fmayhar@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | ext4: Drop whitespace at end of linesTheodore Ts'o2010-05-179-23/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch was generated using: #!/usr/bin/perl -i while (<>) { s/[ ]+$//; print; } Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | ext4: Fix compat EXT4_IOC_ADD_GROUPBen Hutchings2010-05-172-2/+39
| | | | | | | | | | | | | | | | | | | | | | | | struct ext4_new_group_input needs to be converted because u64 has only 32-bit alignment on some 32-bit architectures, notably i386. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | ext4: Conditionally define compat ioctl numbersBen Hutchings2010-05-171-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | It is unnecessary, and in general impossible, to define the compat ioctl numbers except when building the filesystem with CONFIG_COMPAT defined. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | tracing: Convert more ext4 events to DEFINE_EVENTLi Zefan2010-05-171-44/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use DECLARE_EVENT_CLASS, and save ~2.7K: text data bss dec hex filename 274441 7200 260 281901 44d2d fs/ext4/ext4.o.orig 271881 7040 256 279177 44289 fs/ext4/ext4.o 4 events are converted: ext4__mb_new_pa: ext4_mb_new_inode_pa, ext4_mb_new_group_pa ext4__mballoc: ext4_mballoc_discard, ext4_mballoc_free No change in functionality. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | ext4: Add new tracepoints to track mballoc's buddy bitmap loadsTheodore Ts'o2010-05-172-0/+35
| | | | | | | | | | | | Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | ext4: Add a missing trace hookLi Zefan2010-05-171-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | Commit f8ec9d6837241865cf99bed97bb99f4399fd5a03 added a trace event ext4_da_release_space, but didn't add some corresponding trace hook. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | ext4: restart ext4_ext_remove_space() after transaction restartDmitry Monakhov2010-05-171-7/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If i_data_sem was internally dropped due to transaction restart, it is necessary to restart path look-up because extents tree was possibly modified by ext4_get_block(). https://bugzilla.kernel.org/show_bug.cgi?id=15827 Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Acked-by: Jan Kara <jack@suse.cz>
| * | ext4: Clear the EXT4_EOFBLOCKS_FL flag only when warrantedTheodore Ts'o2010-05-171-3/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Dimitry Monakhov discovered an edge case where it was possible for the EXT4_EOFBLOCKS_FL flag could get cleared unnecessarily. This is true; I have a test case that can be exercised via downloading and decompressing the file: wget ftp://ftp.kernel.org/pub/linux/kernel/people/tytso/ext4-testcases/eofblocks-fl-test-case.img.bz2 bunzip2 eofblocks-fl-test-case.img dd if=/dev/zero of=eofblocks-fl-test-case.img bs=1k seek=17925 bs=1k count=1 conv=notrunc However, triggering it in real life is highly unlikely since it requires an extremely fragmented sparse file with a hole in exactly the right place in the extent tree. (It actually took quite a bit of work to generate this test case.) Still, it's nice to get even extreme corner cases to be correct, so this patch makes sure that we don't clear the EXT4_EOFBLOCKS_FL incorrectly even in this corner case. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | ext4: Avoid crashing on NULL ptr dereference on a filesystem errorTheodore Ts'o2010-05-161-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the EOFBLOCK_FL flag is set when it should not be and the inode is zero length, then eh_entries is zero, and ex is NULL, so dereferencing ex to print ex->ee_block causes a kernel OOPS in ext4_ext_map_blocks(). On top of that, the error message which is printed isn't very helpful. So we fix this by printing something more explanatory which doesn't involve trying to print ex->ee_block. Addresses-Google-Bug: #2655740 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | ext4: Use bitops to read/modify i_flags in struct ext4_inode_infoDmitry Monakhov2010-05-1613-56/+136
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | At several places we modify EXT4_I(inode)->i_flags without holding i_mutex (ext4_do_update_inode, ...). These modifications are racy and we can lose updates to i_flags. So convert handling of i_flags to use bitops which are atomic. https://bugzilla.kernel.org/show_bug.cgi?id=15792 Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | ext4: Convert calls of ext4_error() to EXT4_ERROR_INODE()Theodore Ts'o2010-05-167-103/+86
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | EXT4_ERROR_INODE() tends to provide better error information and in a more consistent format. Some errors were not even identifying the inode or directory which was corrupted, which made them not very useful. Addresses-Google-Bug: #2507977 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | ext4: Convert callers of ext4_get_blocks() to use ext4_map_blocks()Theodore Ts'o2010-05-163-229/+150
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This saves a huge amount of stack space by avoiding unnecesary struct buffer_head's from being allocated on the stack. In addition, to make the code easier to understand, collapse and refactor ext4_get_block(), ext4_get_block_write(), noalloc_get_block_write(), into a single function. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | ext4: Add new abstraction ext4_map_blocks() underneath ext4_get_blocks()Theodore Ts'o2010-05-163-170/+199
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Jack up ext4_get_blocks() and add a new function, ext4_map_blocks() which uses a much smaller structure, struct ext4_map_blocks which is 20 bytes, as opposed to a struct buffer_head, which nearly 5 times bigger on an x86_64 machine. By switching things to use ext4_map_blocks(), we can save stack space by using ext4_map_blocks() since we can avoid allocating a struct buffer_head on the stack. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | ext4: Use our own write_cache_pages()Theodore Ts'o2010-05-161-22/+119
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make a copy of write_cache_pages() for the benefit of ext4_da_writepages(). This allows us to simplify the code some, and will allow us to further customize the code in future patches. There are some nasty hacks in write_cache_pages(), which Linus has (correctly) characterized as vile. I've just copied it into write_cache_pages_da(), without trying to clean those bits up lest I break something in the ext4's delalloc implementation, which is a bit fragile right now. This will allow Dave Chinner to clean up write_cache_pages() in mm/page-writeback.c, without worrying about breaking ext4. Eventually write_cache_pages_da() will go away when I rewrite ext4's delayed allocation and create a general ext4_writepages() which is used for all of ext4's writeback. Until now this is the lowest risk way to clean up the core write_cache_pages() function. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Dave Chinner <david@fromorbit.com>
| * | ext4: Show journal_checksum optionJan Kara2010-05-161-0/+2
| | | | | | | | | | | | | | | | | | | | | We failed to show journal_checksum option in /proc/mounts. Fix it. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | ext4: Fix for ext4_mb_collect_stats()Curt Wohlgemuth2010-05-161-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | Fix ext4_mb_collect_stats() to use the correct test for s_bal_success; it should be testing "best-extent.fe_len >= orig-extent.fe_len" , not "orig-extent.fe_len >= goal-extent.fe_len" . Signed-off-by: Curt Wohlgemuth <curtw@google.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | ext4: check for a good block group before loading buddy pagesCurt Wohlgemuth2010-05-162-13/+58
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds a new field in ext4_group_info to cache the largest available block range in a block group; and don't load the buddy pages until *after* we've done a sanity check on the block group. With large allocation requests (e.g., fallocate(), 8MiB) and relatively full partitions, it's easy to have no block groups with a block extent large enough to satisfy the input request length. This currently causes the loop during cr == 0 in ext4_mb_regular_allocator() to load the buddy bitmap pages for EVERY block group. That can be a lot of pages. The patch below allows us to call ext4_mb_good_group() BEFORE we load the buddy pages (although we have check again after we lock the block group). Addresses-Google-Bug: #2578108 Addresses-Google-Bug: #2704453 Signed-off-by: Curt Wohlgemuth <curtw@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | ext4: Prevent creation of files larger than RLIMIT_FSIZE using fallocateNikanth Karthikesan2010-05-161-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | Currently using posix_fallocate one can bypass an RLIMIT_FSIZE limit and create a file larger than the limit. Add a check for that. Signed-off-by: Nikanth Karthikesan <knikanth@suse.de> Signed-off-by: Amit Arora <aarora@in.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | ext4: Remove extraneous newlines in ext4_msg() callsCurt Wohlgemuth2010-05-162-4/+4
| | | | | | | | | | | | | | | | | | | | | Addresses-Google-Bug: #2562325 Signed-off-by: Curt Wohlgemuth <curtw@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | ext4: Print mount options in when mounting and add a remount messageCurt Wohlgemuth2010-05-161-1/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | This adds a "re-mounted" message to ext4_remount(), and both it and the mount message in ext4_fill_super() now have the original mount options data string. Signed-off-by: Curt Wohlgemuth <curtw@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | ext4: don't use quota reservation for speculative metadataEric Sandeen2010-05-162-48/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Because we can badly over-reserve metadata when we calculate worst-case, it complicates things for quota, since we must reserve and then claim later, retry on EDQUOT, etc. Quota is also a generally smaller pool than fs free blocks, so this over-reservation hurts more, and more often. I'm of the opinion that it's not the worst thing to allow metadata to push a user slightly over quota. This simplifies the code and avoids the false quota rejections that result from worst-case speculation. This patch stops the speculative quota-charging for worst-case metadata requirements, and just charges quota when the blocks are allocated at writeout. It also is able to remove the try-again loop on EDQUOT. This patch has been tested indirectly by running the xfstests suite with a hack to mount & enable quota prior to the test. I also did a more specific test of fragmenting freespace and then doing a large delalloc write under quota; quota stopped me at the right amount of file IO, and then the writeout generated enough metadata (due to the fragmentation) that it put me slightly over quota, as expected. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | quota: add the option to not fail with EDQUOT in blockEric Sandeen2010-05-162-1/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To simplify metadata tracking for delalloc writes, ext4 will simply claim metadata blocks at allocation time, without first speculatively reserving the worst case and then freeing what was not used. To do this, we need a mechanism to track allocations in the quota subsystem, but potentially allow that allocation to actually go over quota. This patch adds a DQUOT_SPACE_NOFAIL flag and function variants for this purpose. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | quota: use flags interface for dquot alloc/free spaceEric Sandeen2010-05-162-14/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Switch __dquot_alloc_space and __dquot_free_space to take flags to indicate whether to warn and/or to reserve (or free reserve). This is slightly more readable at the callpoints, and makes it cleaner to add a "nofail" option in the next patch. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | ext4: init statistics after journal recoveryDmitry Monakhov2010-05-161-22/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently block/inode/dir counters initialized before journal was recovered. In fact after journal recovery this info will probably change. And freeblocks it critical for correct delalloc mode accounting. https://bugzilla.kernel.org/show_bug.cgi?id=15768 Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | ext4: clean up inode bitmaps manipulation in ext4_free_inodeDmitry Monakhov2010-05-161-46/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - Reorganize locking scheme to batch two atomic operation in to one. This also allow us to state what healthy group must obey following rule ext4_free_inodes_count(sb, gdp) == ext4_count_free(inode_bitmap, NUM); - Fix possible undefined pointer dereference. - Even if group descriptor stats aren't accessible we have to update inode bitmaps. - Move non-group members update out of group_lock. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | ext4: Do not zero out uninitialized extents beyond i_sizeDmitry Monakhov2010-05-161-16/+51
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The extents code will sometimes zero out blocks and mark them as initialized instead of splitting an extent into several smaller ones. This optimization however, causes problems if the extent is beyond i_size because fsck will complain if there are uninitialized blocks after i_size as this can not be distinguished from an inode that has an incorrect i_size field. https://bugzilla.kernel.org/show_bug.cgi?id=15742 Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | jbd2: Improve scalability by not taking j_state_lock in jbd2_journal_stop()Theodore Ts'o2010-05-161-4/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | One of the most contended locks in the jbd2 layer is j_state_lock when running dbench. This is especially true if using the real-time kernel with its "sleeping spinlocks" patch that replaces spinlocks with priority inheriting mutexes --- but it also shows up on large SMP benchmarks. Thanks to John Stultz for pointing this out. Reviewed by Mingming Cao and Jan Kara. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | ext4: don't scan/accumulate more pages than mballoc will allocateEric Sandeen2010-05-161-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There was a bug reported on RHEL5 that a 10G dd on a 12G box had a very, very slow sync after that. At issue was the loop in write_cache_pages scanning all the way to the end of the 10G file, even though the subsequent call to mpage_da_submit_io would only actually write a smallish amt; then we went back to the write_cache_pages loop ... wasting tons of time in calling __mpage_da_writepage for thousands of pages we would just revisit (many times) later. Upstream it's not such a big issue for sys_sync because we get to the loop with a much smaller nr_to_write, which limits the loop. However, talking with Aneesh he realized that fsync upstream still gets here with a very large nr_to_write and we face the same problem. This patch makes mpage_add_bh_to_extent stop the loop after we've accumulated 2048 pages, by setting mpd->io_done = 1; which ultimately causes the write_cache_pages loop to break. Repeating the test with a dirty_ratio of 80 (to leave something for fsync to do), I don't see huge IO performance gains, but the reduction in cpu usage is striking: 80% usage with stock, and 2% with the below patch. Instrumenting the loop in write_cache_pages clearly shows that we are wasting time here. Eventually we need to change mpage_da_map_pages() also submit its I/O to the block layer, subsuming mpage_da_submit_io(), and then change it call ext4_get_blocks() multiple times. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | ext4: stop issuing discards if not supported by deviceEric Sandeen2010-05-161-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Turn off issuance of discard requests if the device does not support it - similar to the action we take for barriers. This will save a little computation time if a non-discardable device is mounted with -o discard, and also makes it obvious that it's not doing what was asked at mount time ... Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | ext4: don't return to userspace after freezing the fs with a mutex heldEric Sandeen2010-05-161-10/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ext4_freeze() used jbd2_journal_lock_updates() which takes the j_barrier mutex, and then returns to userspace. The kernel does not like this: ================================================ [ BUG: lock held when returning to user space! ] ------------------------------------------------ lvcreate/1075 is leaving the kernel with locks still held! 1 lock held by lvcreate/1075: #0: (&journal->j_barrier){+.+...}, at: [<ffffffff811c6214>] jbd2_journal_lock_updates+0xe1/0xf0 Use vfs_check_frozen() added to ext4_journal_start_sb() and ext4_force_commit() instead. Addresses-Red-Hat-Bugzilla: #568503 Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | ext4: symlink must be handled via filesystem specific operationDmitry Monakhov2010-05-161-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | generic setattr implementation is no longer responsible for quota transfer so synlinks must be handled via ext4_setattr. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | ext4: check s_log_groups_per_flex in online resize codeEric Sandeen2010-05-161-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If groups_per_flex < 2, sbi->s_flex_groups[] doesn't get filled out, and every other access to this first tests s_log_groups_per_flex; same thing needs to happen in resize or we'll wander off into a null pointer when doing an online resize of the file system. Thanks to Christoph Biedl, who came up with the trivial testcase: # truncate --size 128M fsfile # mkfs.ext3 -F fsfile # tune2fs -O extents,uninit_bg,dir_index,flex_bg,huge_file,dir_nlink,extra_isize fsfile # e2fsck -yDf -C0 fsfile # truncate --size 132M fsfile # losetup /dev/loop0 fsfile # mount /dev/loop0 mnt # resize2fs -p /dev/loop0 https://bugzilla.kernel.org/show_bug.cgi?id=13549 Reported-by: Alessandro Polverini <alex@nibbles.it> Test-case-by: Christoph Biedl <bugzilla.kernel.bpeb@manchmal.in-ulm.de> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | ext4: fix quota accounting in case of fallocateDmitry Monakhov2010-05-161-1/+2
| | | | | | | | | | | | | | | | | | | | | allocated_meta_data is already included in 'used' variable. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | ext4: allow defrag (EXT4_IOC_MOVE_EXT) in 32bit compat modeChristian Borntraeger2010-05-151-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I have an x86_64 kernel with i386 userspace. e4defrag fails on the EXT4_IOC_MOVE_EXT ioctl because it is not wired up for the compat case. It seems that struct move_extent is compat save, only types with fixed widths are used: { __u32 reserved; /* should be zero */ __u32 donor_fd; /* donor file descriptor */ __u64 orig_start; /* logical start offset in block for orig */ __u64 donor_start; /* logical start offset in block for donor */ __u64 len; /* block length to be moved */ __u64 moved_len; /* moved block length */ }; Lets just wire up EXT4_IOC_MOVE_EXT for the compat case. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Eric Sandeen <sandeen@redhat.com> CC: Akira Fujita <a-fujita@rs.jp.nec.com>
| * | ext4: rename ext4_mb_release_desc() to ext4_mb_unload_buddy()Jing Zhang2010-05-141-12/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This function cleans up after ext4_mb_load_buddy(), so the renaming makes the code clearer. Signed-off-by: Jing Zhang <zj.barak@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>