aboutsummaryrefslogtreecommitdiffstats
path: root/fs
Commit message (Collapse)AuthorAgeFilesLines
* fs/buffer.c: remove duplicated assignment to b_privateNamhyung Kim2010-10-261-1/+0
| | | | | | | | | | bh->b_private is initialized within init_buffer(), thus this assignment is redundant. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fs/direct-io.c: fix truncation error in dio_complete() returnEdward Shishkin2010-10-261-1/+1
| | | | | | | | | | | Fix up truncation (ssize_t->int). This only matters with >2G reads/writes, which the kernel doesn't permit. Signed-off-by: Edward Shishkin <edward@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fuse: use clear_highpage() and KM_USER0 instead of KM_USER1Miklos Szeredi2010-10-261-7/+5
| | | | | | | | | | | Commit 7909b1c640 ("fuse: don't use atomic kmap") removed KM_USER0 usage from fuse/dev.c. Switch KM_USER1 uses to KM_USER0 for clarity. Also replace open coded clear_highpage(). Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Jan Beulich <jbeulich@novell.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* use clear_page()/copy_page() in favor of memset()/memcpy() on whole pagesJan Beulich2010-10-261-1/+1
| | | | | | | | | | | After all that's what they are intended for. Signed-off-by: Jan Beulich <jbeulich@novell.com> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* hostfs: code cleanupsRichard Weinberger2010-10-261-3/+2
| | | | | | | | | Some code cleanups for hostfs. Signed-off-by: Richard Weinberger <richard@nod.at> Cc: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fs/fs-writeback.c: restore lost commentAndrew Morton2010-10-261-0/+4
| | | | | | | | | I had to go back to a 2.6.20 tree to work out why we're adding a number-of-inodes into a number-of-pages count. Restore the lost comment. Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* writeback: remove the internal 5% low bound on dirty_ratioWu Fengguang2010-10-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | The dirty_ratio was silently limited in global_dirty_limits() to >= 5%. This is not a user expected behavior. And it's inconsistent with calc_period_shift(), which uses the plain vm_dirty_ratio value. Let's remove the internal bound. At the same time, fix balance_dirty_pages() to work with the dirty_thresh=0 case. This allows applications to proceed when dirty+writeback pages are all cleaned. And ">" fits with the name "exceeded" better than ">=" does. Neil thinks it is an aesthetic improvement as well as a functional one :) Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Jan Kara <jack@suse.cz> Proposed-by: Con Kolivas <kernel@kolivas.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reviewed-by: Rik van Riel <riel@redhat.com> Reviewed-by: Neil Brown <neilb@suse.de> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Michael Rubin <mrubin@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: add account_page_writeback()Michael Rubin2010-10-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To help developers and applications gain visibility into writeback behaviour this patch adds two counters to /proc/vmstat. # grep nr_dirtied /proc/vmstat nr_dirtied 3747 # grep nr_written /proc/vmstat nr_written 3618 These entries allow user apps to understand writeback behaviour over time and learn how it is impacting their performance. Currently there is no way to inspect dirty and writeback speed over time. It's not possible for nr_dirty/nr_writeback. These entries are necessary to give visibility into writeback behaviour. We have /proc/diskstats which lets us understand the io in the block layer. We have blktrace for more in depth understanding. We have e2fsprogs and debugsfs to give insight into the file systems behaviour, but we don't offer our users the ability understand what writeback is doing. There is no way to know how active it is over the whole system, if it's falling behind or to quantify it's efforts. With these values exported users can easily see how much data applications are sending through writeback and also at what rates writeback is processing this data. Comparing the rates of change between the two allow developers to see when writeback is not able to keep up with incoming traffic and the rate of dirty memory being sent to the IO back end. This allows folks to understand their io workloads and track kernel issues. Non kernel engineers at Google often use these counters to solve puzzling performance problems. Patch #4 adds a pernode vmstat file with nr_dirtied and nr_written Patch #5 add writeback thresholds to /proc/vmstat Currently these values are in debugfs. But they should be promoted to /proc since they are useful for developers who are writing databases and file servers and are not debugging the kernel. The output is as below: # grep threshold /proc/vmstat nr_pages_dirty_threshold 409111 nr_pages_dirty_background_threshold 818223 This patch: This allows code outside of the mm core to safely manipulate page writeback state and not worry about the other accounting. Not using these routines means that some code will lose track of the accounting and we get bugs. Modify nilfs2 to use interface. Signed-off-by: Michael Rubin <mrubin@google.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: Wu Fengguang <fengguang.wu@intel.com> Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp> Cc: Jiro SEKIBA <jir@unicus.jp> Cc: Dave Chinner <david@fromorbit.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* writeback: remove nonblocking/encountered_congestion referencesWu Fengguang2010-10-268-45/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This removes more dead code that was somehow missed by commit 0d99519efef (writeback: remove unused nonblocking and congestion checks). There are no behavior change except for the removal of two entries from one of the ext4 tracing interface. The nonblocking checks in ->writepages are no longer used because the flusher now prefer to block on get_request_wait() than to skip inodes on IO congestion. The latter will lead to more seeky IO. The nonblocking checks in ->writepage are no longer used because it's redundant with the WB_SYNC_NONE check. We no long set ->nonblocking in VM page out and page migration, because a) it's effectively redundant with WB_SYNC_NONE in current code b) it's old semantic of "Don't get stuck on request queues" is mis-behavior: that would skip some dirty inodes on congestion and page out others, which is unfair in terms of LRU age. Inspired by Christoph Hellwig. Thanks! Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: David Howells <dhowells@redhat.com> Cc: Sage Weil <sage@newdream.net> Cc: Steve French <sfrench@samba.org> Cc: Chris Mason <chris.mason@oracle.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* oom: fix locking for oom_adj and oom_score_adjDavid Rientjes2010-10-261-19/+21
| | | | | | | | | | | | | | | | | | The locking order in oom_adjust_write() and oom_score_adj_write() for task->alloc_lock and task->sighand->siglock is reversed, and lockdep notices that irqs could encounter an ABBA scenario. This fixes the locking order so that we always take task_lock(task) prior to lock_task_sighand(task). Signed-off-by: David Rientjes <rientjes@google.com> Reported-by: Andrew Morton <akpm@linux-foundation.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Cc: Ying Han <yinghan@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* oom: rewrite error handling for oom_adj and oom_score_adj tunablesDavid Rientjes2010-10-261-35/+48
| | | | | | | | | | | | | | | It's better to use proper error handling in oom_adjust_write() and oom_score_adj_write() instead of duplicating the locking order on various exit paths. Suggested-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David Rientjes <rientjes@google.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Cc: Ying Han <yinghan@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* oom: add per-mm oom disable countYing Han2010-10-262-0/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | It's pointless to kill a task if another thread sharing its mm cannot be killed to allow future memory freeing. A subsequent patch will prevent kills in such cases, but first it's necessary to have a way to flag a task that shares memory with an OOM_DISABLE task that doesn't incur an additional tasklist scan, which would make select_bad_process() an O(n^2) function. This patch adds an atomic counter to struct mm_struct that follows how many threads attached to it have an oom_score_adj of OOM_SCORE_ADJ_MIN. They cannot be killed by the kernel, so their memory cannot be freed in oom conditions. This only requires task_lock() on the task that we're operating on, it does not require mm->mmap_sem since task_lock() pins the mm and the operation is atomic. [rientjes@google.com: changelog and sys_unshare() code] [rientjes@google.com: protect oom_disable_count with task_lock in fork] [rientjes@google.com: use old_mm for oom_disable_count in exec] Signed-off-by: Ying Han <yinghan@google.com> Signed-off-by: David Rientjes <rientjes@google.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* vmcore: it is not experimental any moreWANG Cong2010-10-261-2/+2
| | | | | | | | | | We use vmcore in our production kernel for a long time, it is pretty stable now. So I don't think we need to mark it as experimental any more. Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* hostfs: fix UML crash: remove f_spare from hostfsRichard Weinberger2010-10-263-10/+4
| | | | | | | | | | | | | | | 365b1818 ("add f_flags to struct statfs(64)") resized f_spare within struct statfs which caused a UML crash. There is no need to copy f_spare. Signed-off-by: Richard Weinberger <richard@nod.at> Reported-by: Toralf Förster <toralf.foerster@gmx.de> Tested-by: Toralf Förster <toralf.foerster@gmx.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Jeff Dike <jdike@addtoit.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'hwpoison' of ↵Linus Torvalds2010-10-262-0/+25
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6 * 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6: (22 commits) Add _addr_lsb field to ia64 siginfo Fix migration.c compilation on s390 HWPOISON: Remove retry loop for try_to_unmap HWPOISON: Turn addr_valid from bitfield into char HWPOISON: Disable DEBUG by default HWPOISON: Convert pr_debugs to pr_info HWPOISON: Improve comments in memory-failure.c x86: HWPOISON: Report correct address granuality for huge hwpoison faults Encode huge page size for VM_FAULT_HWPOISON errors Fix build error with !CONFIG_MIGRATION hugepage: move is_hugepage_on_freelist inside ifdef to avoid warning Clean up __page_set_anon_rmap HWPOISON, hugetlb: fix unpoison for hugepage HWPOISON, hugetlb: soft offlining for hugepage HWPOSION, hugetlb: recover from free hugepage error when !MF_COUNT_INCREASED hugetlb: move refcounting in hugepage allocation inside hugetlb_lock HWPOISON, hugetlb: add free check to dequeue_hwpoison_huge_page() hugetlb: hugepage migration core hugetlb: redefine hugepage copy functions hugetlb: add allocate function for hugepage migration ...
| * Merge branch 'hwpoison-hugepages' into hwpoisonAndi Kleen2010-10-221-0/+15
| |\ | | | | | | | | | | | | Conflicts: mm/memory-failure.c
| | * hugetlb: hugepage migration coreNaoya Horiguchi2010-10-081-0/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch extends page migration code to support hugepage migration. One of the potential users of this feature is soft offlining which is triggered by memory corrected errors (added by the next patch.) Todo: - there are other users of page migration such as memory policy, memory hotplug and memocy compaction. They are not ready for hugepage support for now. ChangeLog since v4: - define migrate_huge_pages() - remove changes on isolation/putback_lru_page() ChangeLog since v2: - refactor isolate/putback_lru_page() to handle hugepage - add comment about race on unmap_and_move_huge_page() ChangeLog since v1: - divide migration code path for hugepage - define routine checking migration swap entry for hugetlb - replace "goto" with "if/else" in remove_migration_pte() Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andi Kleen <ak@linux.intel.com>
| * | Merge branch 'hwpoison-fixes-2.6.37' into hwpoisonAndi Kleen2010-10-221-0/+10
| |\ \
| | * | HWPOISON/signalfd: add support for addr_lsbHidetoshi Seto2010-10-081-0/+10
| | |/ | | | | | | | | | | | | | | | | | | | | | Similar change as to signal delivery: copy out the si_addr_lsb field to user space in signalfd Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
* | | Merge branch 'for-2.6.37' of git://linux-nfs.org/~bfields/linuxLinus Torvalds2010-10-2623-461/+635
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'for-2.6.37' of git://linux-nfs.org/~bfields/linux: (99 commits) svcrpc: svc_tcp_sendto XPT_DEAD check is redundant svcrpc: no need for XPT_DEAD check in svc_xprt_enqueue svcrpc: assume svc_delete_xprt() called only once svcrpc: never clear XPT_BUSY on dead xprt nfsd4: fix connection allocation in sequence() nfsd4: only require krb5 principal for NFSv4.0 callbacks nfsd4: move minorversion to client nfsd4: delay session removal till free_client nfsd4: separate callback change and callback probe nfsd4: callback program number is per-session nfsd4: track backchannel connections nfsd4: confirm only on succesful create_session nfsd4: make backchannel sequence number per-session nfsd4: use client pointer to backchannel session nfsd4: move callback setup into session init code nfsd4: don't cache seq_misordered replies SUNRPC: Properly initialize sock_xprt.srcaddr in all cases SUNRPC: Use conventional switch statement when reclassifying sockets sunrpc/xprtrdma: clean up workqueue usage sunrpc: Turn list_for_each-s into the ..._entry-s ... Fix up trivial conflicts (two different deprecation notices added in separate branches) in Documentation/feature-removal-schedule.txt
| * | | nfsd4: fix connection allocation in sequence()J. Bruce Fields2010-10-241-14/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We're doing an allocation under a spinlock, and ignoring the possibility of allocation failure. A better fix wouldn't require an unnecessary allocation in the common case, but we'll leave that for later. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | nfsd4: only require krb5 principal for NFSv4.0 callbacksJ. Bruce Fields2010-10-211-7/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the sessions backchannel case, we don't need a krb5 principal name for the client; we use the already-created forechannel credentials instead. Some cleanup, while we're there: make it clearer which code here is 4.0- or sessions- specific. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | nfsd4: move minorversion to clientJ. Bruce Fields2010-10-213-7/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The minorversion seems more a property of the client than the callback channel. Some time we should probably also enforce consistent minorversion usage from the client; for now, this is just a cosmetic change. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | nfsd4: delay session removal till free_clientJ. Bruce Fields2010-10-211-8/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Have unhash_client_locked() remove client and associated sessions from global hashes, but delay further dismantling till free_client(). (After unhash_client_locked(), the only remaining references outside the destroying thread are from any connections which have xpt_user callbacks registered.) This will simplify locking on session destruction. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | nfsd4: separate callback change and callback probeJ. Bruce Fields2010-10-213-9/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Only one of the nfsd4_callback_probe callers actually cares about changing the callback information. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | nfsd4: callback program number is per-sessionJ. Bruce Fields2010-10-213-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The callback program is allowed to depend on the session which the callback is going over. No change in behavior yet, while we still only do callbacks over a single session for the lifetime of the client. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | nfsd4: track backchannel connectionsJ. Bruce Fields2010-10-211-4/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | We need to keep track of which connections are available for use with the backchannel, which for the forechannel, and which for both. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
| * | | nfsd4: confirm only on succesful create_sessionJ. Bruce Fields2010-10-211-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Following rfc 5661, section 18.36.4: "If the session is not successfully created, then no changes are made to any client records on the server." We shouldn't be confirming or incrementing the sequence id in this case. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | nfsd4: make backchannel sequence number per-sessionJ. Bruce Fields2010-10-213-15/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently we don't deal well with a client that has multiple sessions associated with it (even simultaneously, or serially over the lifetime of the client). In particular, we don't attempt to keep the backchannel running after the original session diseappears. We will fix that soon. Once we do that, we need the slot sequence number to be per-session; otherwise, for example, we cannot correctly handle a case like this: - All session 1 connections are lost. - The client creates session 2. We use it for the backchannel (since it's the only working choice). - The client gives us a new connection to use with session 1. - The client destroys session 2. At this point our only choice is to go back to using session 1. When we do so we must use the sequence number that is next for session 1. We therefore need to maintain multiple sequence number streams. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
| * | | nfsd4: use client pointer to backchannel sessionJ. Bruce Fields2010-10-213-8/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of copying the sessionid, use the new cl_cb_session pointer, which indicates which session we're using for the backchannel. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
| * | | nfsd4: move callback setup into session init codeJ. Bruce Fields2010-10-212-15/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The backchannel should be associated with a session, it isn't really global to the client. We do, however, want a pointer global to the client which tracks which session we're currently using for client-based callbacks. This is a first step in that direction; for now, just reshuffling of code with no significant change in behavior. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
| * | | nfsd4: don't cache seq_misordered repliesJ. Bruce Fields2010-10-211-2/+1
| | | | | | | | | | | | | | | | Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | nfsd4: expire clients more promptlyJ. Bruce Fields2010-10-111-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Expire clients more promptly, at the expense of possibly running the laundromat thread more frequently. Though it's not the default, I'd like it to be feasible to run with a lease time of just a few seconds, at which point a minimum 10 second wait between laundromat runs seems a little much. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | nfsd4: return expired on unfound stateid'sJ. Bruce Fields2010-10-021-2/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 78155ed75f470710f2aecb3e75e3d97107ba8374 "nfsd4: distinguish expired from stale stateids" attempted to distinguish expired and stale stateid's using time information that may not have been completely reliable, so I reverted it. That was throwing out the baby with the bathwater; we still do want to return expired, but let's do that using the simpler approach of just assuming any stateid is expired if it looks like it was given out by the current server instance, but we can't find it any more. This may help clients that are recovering from network partitions. Reported-by: Bian Naimeng <biannm@cn.fujitsu.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | nfsd4: add new connections to sessionJ. Bruce Fields2010-10-011-2/+47
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As long as we're not implementing any session security, we should just automatically add any new connections that come along to the list of sessions associated with the session. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | nfsd4: refactor connection allocationJ. Bruce Fields2010-10-011-6/+26
| | | | | | | | | | | | | | | | Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | nfsd4: use callbacks on svc_xprt_deletionJ. Bruce Fields2010-10-012-9/+45
| | | | | | | | | | | | | | | | | | | | | | | | Remove connections from the list when they go down. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
| * | | nfsd4: keep per-session list of connectionsJ. Bruce Fields2010-10-012-15/+62
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The spec requires us in various places to keep track of the connections associated with each session. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
| * | | nfsd4: clean up session allocationJ. Bruce Fields2010-10-011-122/+89
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Changes: - make sure session memory reservation is released on failure path. - use min_t()/min() for more compact code in several places. - break alloc_init_session into smaller pieces. - miscellaneous other cleanup. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
| * | | nfsd4: fix alloc_init_session return typeJ. Bruce Fields2010-10-011-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | This returns an nfs error, not -ERRNO. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | nfsd4: fix alloc_init_session BUILD_BUG_ON()J. Bruce Fields2010-10-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | Note we're allocating an array of nfsd4_slot *'s, not nfsd4_slot's. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | nfsd4: Move callback setup to callback queueJ. Bruce Fields2010-10-013-27/+63
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of creating the new rpc client from a regular server thread, set a flag, kick off a null call, and allow the null call to do the work of setting up the client on the callback workqueue. Use a spinlock to ensure the callback work gets a consistent view of the callback parameters. This allows, for example, changing the callback from contexts where sleeping is not allowed. I hope it will also keep the locking simple as we add more session and trunking features, by serializing most of the callback-specific work. This also closes a small race where the the new cb_ident could be used with an old connection (or vice-versa). Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
| * | | nfsd4: remove separate cb_args structJ. Bruce Fields2010-10-012-30/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | I don't see the point of the separate struct. It seems to just be getting in the way. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
| * | | nfsd4: use generic callback code in null caseJ. Bruce Fields2010-10-013-15/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This will eventually allow us, for example, to kick off null callback from contexts where we can't sleep. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
| * | | nfsd4: generic callback codeJ. Bruce Fields2010-10-012-39/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Make the recall callback code more generic, so that other callbacks will be able to use it too. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
| * | | nfsd4: rename nfs4_rpc_args->nfsd4_cb_argsJ. Bruce Fields2010-10-012-9/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | With apologies for the gratuitous rename, the new name seems more helpful to me. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
| * | | nfsd4: combine nfs4_rpc_args and nfsd4_cb_sequenceJ. Bruce Fields2010-10-012-23/+18
| | | | | | | | | | | | | | | | | | | | | | | | These two structs don't really need to be distinct as far as I can tell. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
| * | | nfsd4: minor variable renaming (cb -> conn)J. Bruce Fields2010-10-012-22/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that we have both nfsd4_callback and nfsd4_cb_conn structures, I get confused if variables of both types are always named cb.... Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
| * | | sunrpc: Add net to rpc_create_argsPavel Emelyanov2010-10-015-0/+6
| | | | | | | | | | | | | | | | | | | | Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | sunrpc: Add net argument to svc_create_xprtPavel Emelyanov2010-10-014-7/+8
| | | | | | | | | | | | | | | | | | | | Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>