path: root/fs/splice.c
Commit message | Author | Age | Files | Lines
* [PATCH] Remove SUID when splicing into an inode | Jens Axboe | 2006-10-19 | 1 | -4/+15
    Originally from Mark Fasheh <mark.fasheh@oracle.com>

    generic_file_splice_write() does not remove S_ISUID or S_ISGID. This is
    inconsistent with the way we generally write to files.

    Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
    Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
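A minimal userspace sketch of the mode-bit rule this patch applies to the splice write path. The helper names `suid_bits_to_kill()` and `mode_after_write()` are hypothetical. The rule shown (always drop S_ISUID, drop S_ISGID only when group-execute is also set) is an assumption modeled on the generic write path; the kernel also skips the strip for suitably privileged writers, which is not modeled here.

```c
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: which set-id bits a write to the file should clear. */
static mode_t suid_bits_to_kill(mode_t mode)
{
    mode_t kill = 0;

    /* Set-user-ID is always dropped when the file is written. */
    if (mode & S_ISUID)
        kill |= S_ISUID;

    /*
     * Set-group-ID without group-execute only marks mandatory locking,
     * so it is left alone; with group-execute it is a real sgid bit.
     */
    if ((mode & S_ISGID) && (mode & S_IXGRP))
        kill |= S_ISGID;

    return kill;
}

/* Hypothetical helper: the mode an unprivileged write would leave behind. */
static mode_t mode_after_write(mode_t mode)
{
    return mode & ~suid_bits_to_kill(mode);
}
```

For example, a file with mode 04755 would end up as 0755, while 02644 (sgid without group-execute) is left unchanged.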
* [PATCH] Introduce generic_file_splice_write_nolock() | Mark Fasheh | 2006-10-19 | 1 | -14/+66
    This allows file systems to manage their own i_mutex locking while
    still re-using the generic_file_splice_write() logic. OCFS2 in
    particular wants this so that it can order cluster locks within i_mutex.

    Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
    Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* [PATCH] Take i_mutex in splice_from_pipe() | Mark Fasheh | 2006-10-19 | 1 | -13/+11
    The splice_actor may be calling ->prepare_write() and ->commit_write().
    We want i_mutex on the inode being written to before calling those so
    that we don't race i_size changes.

    The double locking behavior is done elsewhere in splice.c, and if we
    eventually want _nolock variants of generic_file_splice_write(), fs
    modules might have to replicate the nasty locking code. We introduce
    inode_double_lock() and inode_double_unlock() to consolidate the
    locking rules into one set of functions.

    Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
    Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
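The consolidated locking rule can be illustrated in userspace with pthread mutexes. This is only a sketch: `struct toy_inode` and the `toy_*` functions are hypothetical stand-ins, and ordering by address is an assumption about how a fixed lock order is chosen, not a copy of the kernel helpers.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an inode carrying its own mutex. */
struct toy_inode {
    pthread_mutex_t mutex;
};

/* Pick which of the two objects is locked first: the lower address wins. */
static struct toy_inode *toy_first_to_lock(struct toy_inode *a,
                                           struct toy_inode *b)
{
    if (a == NULL)
        return b;
    if (b == NULL || a == b)
        return a;
    return (uintptr_t)a < (uintptr_t)b ? a : b;
}

/*
 * Lock both objects in one global order, so two threads calling this with
 * swapped arguments cannot deadlock against each other (no ABBA pattern).
 */
static void toy_double_lock(struct toy_inode *a, struct toy_inode *b)
{
    struct toy_inode *first = toy_first_to_lock(a, b);
    struct toy_inode *second = (first == a) ? b : a;

    if (first)
        pthread_mutex_lock(&first->mutex);
    if (second && second != first)
        pthread_mutex_lock(&second->mutex);
}

/* Unlock order does not matter for correctness; skip duplicates and NULLs. */
static void toy_double_unlock(struct toy_inode *a, struct toy_inode *b)
{
    if (a)
        pthread_mutex_unlock(&a->mutex);
    if (b && b != a)
        pthread_mutex_unlock(&b->mutex);
}
```

Keeping the ordering decision in one function means callers never have to reason about which lock comes first.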
* [PATCH] splice: fix pipe_to_file() ->prepare_write() error path | Jens Axboe | 2006-10-12 | 1 | -3/+3
    Don't jump to the unlock+release path, we already did that.

    Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* [PATCH] Update axboe@suse.de email address | Jens Axboe | 2006-09-30 | 1 | -1/+1
    As people often look for the copyright in files to see who to mail,
    update the link to a neutral one.

    Signed-off-by: Jens Axboe <axboe@kernel.dk>
* [PATCH] splice: fix problems with sys_tee() | Jens Axboe | 2006-07-10 | 1 | -105/+133
    Several issues noticed/fixed:

    - We cannot reliably block in link_pipe() while holding both input and
      output mutexes. So do preparatory checks before locking down both
      mutexes and doing the link.

    - The ipipe->nrbufs vs i check was bad, because we could have dropped
      the ipipe lock in-between. This causes us to potentially look at
      unknown buffers if we were racing with someone else reading this pipe.

    Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] splice: retrieve mapping after locking the page | Jens Axboe | 2006-06-23 | 1 | -17/+29
    Otherwise we could be racing with truncate/mapping removal.

    Problem found/fixed by Nick Piggin <npiggin@suse.de>, logic rewritten
    by me.

    Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] splice: redo page lookup if add_to_page_cache() returns -EEXIST | Jens Axboe | 2006-05-04 | 1 | -0/+2
    This can happen quite easily, if several processes are trying to splice
    the same file at the same time. It's not a failure, it just means
    someone raced with us in allocating this file page. So just dump the
    allocated page and relookup the original.

    Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] splice: rename remaining info variables to pipe | Jens Axboe | 2006-05-04 | 1 | -10/+10
    Same thing was done in fs/pipe.c and most of fs/splice.c, but we had a
    few missing still.

    Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] splice: LRU fixups | Jens Axboe | 2006-05-04 | 1 | -22/+11
    Nick says that the current construct isn't safe. This goes back to the
    original, but sets PIPE_BUF_FLAG_LRU on user pages as well as they all
    seem to be on the LRU in the first place.

    Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] splice: fix unlocking of page on error ->prepare_write() | Jens Axboe | 2006-05-04 | 1 | -3/+16
    Looking at generic_file_buffered_write(), we need to unlock_page() if
    prepare write fails and it isn't due to racing with truncate().

    Also trim the size if ->prepare_write() fails, if we have to.

    Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] vmsplice: restrict stealing a little more | Jens Axboe | 2006-05-02 | 1 | -1/+1
    Apply the same rules as the anon pipe pages, only allow stealing if no
    one else is using the page.

    Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] splice: fix page LRU accounting | Jens Axboe | 2006-05-02 | 1 | -10/+21
    Currently we rely on the PIPE_BUF_FLAG_LRU flag being set correctly to
    know whether we need to fiddle with page LRU state after stealing it,
    however for some origins we just don't know if the page is on the LRU
    list or not.

    So remove PIPE_BUF_FLAG_LRU and do this check/add manually in
    pipe_to_file() instead.

    Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] vmsplice: fix badly placed end paranthesis | Jens Axboe | 2006-05-02 | 1 | -1/+1
    We need to use the minimum of {len, PAGE_SIZE-off}, not
    {len, PAGE_SIZE}-off. The latter doesn't make any sense, and could
    cause us to attempt negative length transfers...

    Signed-off-by: Jens Axboe <axboe@suse.de>
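The difference between the two expressions is easy to see with concrete numbers. The sketch below is illustrative only: `TOY_PAGE_SIZE` is an assumed 4096-byte page, and both function names are hypothetical.

```c
#include <stddef.h>

#define TOY_PAGE_SIZE 4096L /* assumption: 4 KiB pages */

/* Intended form: min(len, PAGE_SIZE - off). Requires off < TOY_PAGE_SIZE. */
static long chunk_len(long len, long off)
{
    long room = TOY_PAGE_SIZE - off; /* bytes left in this page */

    return len < room ? len : room;
}

/* Misplaced parenthesis: min(len, PAGE_SIZE) - off, which can go negative. */
static long misplaced_paren_len(long len, long off)
{
    long m = len < TOY_PAGE_SIZE ? len : TOY_PAGE_SIZE;

    return m - off;
}
```

With `len = 100` and `off = 4000`, the intended form yields 96 (the bytes remaining in the page), while the misplaced form yields -3900.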
* [PATCH] vmsplice: allow user to pass in gift pages | Jens Axboe | 2006-05-01 | 1 | -3/+25
    If SPLICE_F_GIFT is set, the user is basically giving these pages away
    to the kernel. That means we can steal them for e.g. page cache uses
    instead of copying them.

    The data must be properly page aligned and also a multiple of the page
    size in length.

    Signed-off-by: Jens Axboe <axboe@suse.de>
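The alignment requirement on gifted data can be expressed as a small predicate. This is a sketch under assumptions: `TOY_PAGE_SIZE` is an assumed 4096-byte page, `gift_eligible()` is a hypothetical name, and rejecting zero-length ranges is a choice made for this example rather than something the commit states.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096UL /* assumption: 4 KiB pages */

/*
 * Hypothetical check: a user range can only be gifted if it starts on a
 * page boundary and covers a whole number of pages.
 */
static bool gift_eligible(const void *base, size_t len)
{
    uintptr_t addr = (uintptr_t)base;

    if (len == 0) /* assumption: an empty range is not worth gifting */
        return false;

    return (addr % TOY_PAGE_SIZE) == 0 && (len % TOY_PAGE_SIZE) == 0;
}
```

A caller would typically obtain such a buffer from a page-aligned allocator (for example `posix_memalign()` with the page size) before passing it to `vmsplice()` with `SPLICE_F_GIFT`.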
* [PATCH] pipe: enable atomic copying of pipe data to/from user space | Jens Axboe | 2006-05-01 | 1 | -2/+2
    The pipe ->map() method uses kmap() to virtually map the pages, which
    is both slow and has known scalability issues on SMP. This patch
    enables atomic copying of pipe pages, by pre-faulting data and using
    kmap_atomic() instead.

    lmbench bw_pipe and lat_pipe measurements agree this is a Good Thing.
    Here are results from that on a UP machine with highmem (1.5GiB of
    RAM), running first a UP kernel, SMP kernel, and SMP kernel patched.

    Vanilla-UP:
    Pipe bandwidth: 1622.28 MB/sec
    Pipe bandwidth: 1610.59 MB/sec
    Pipe bandwidth: 1608.30 MB/sec
    Pipe latency: 7.3275 microseconds
    Pipe latency: 7.2995 microseconds
    Pipe latency: 7.3097 microseconds

    Vanilla-SMP:
    Pipe bandwidth: 1382.19 MB/sec
    Pipe bandwidth: 1317.27 MB/sec
    Pipe bandwidth: 1355.61 MB/sec
    Pipe latency: 9.6402 microseconds
    Pipe latency: 9.6696 microseconds
    Pipe latency: 9.6153 microseconds

    Patched-SMP:
    Pipe bandwidth: 1578.70 MB/sec
    Pipe bandwidth: 1579.95 MB/sec
    Pipe bandwidth: 1578.63 MB/sec
    Pipe latency: 9.1654 microseconds
    Pipe latency: 9.2266 microseconds
    Pipe latency: 9.1527 microseconds

    Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] splice: call handle_ra_miss() on failure to lookup page | Jens Axboe | 2006-05-01 | 1 | -0/+6
    Notify the readahead logic of the missing page. Suggested by Oleg
    Nesterov.

    Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] pipe: introduce ->pin() buffer operation | Jens Axboe | 2006-05-01 | 1 | -61/+30
    The ->map() function is really expensive on highmem machines right now,
    since it has to use the slower kmap() instead of kmap_atomic(). Splice
    rarely needs to access the virtual address of a page, so it's a waste
    of time doing it.

    Introduce ->pin() to take over the responsibility of making sure the
    page data is valid. ->map() is then reduced to just kmap(). That way we
    can also share most of the pipe buffer ops between pipe.c and splice.c.

    Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] splice: fix bugs in pipe_to_file() | Jens Axboe | 2006-05-01 | 1 | -18/+19
    Found by Oleg Nesterov <oleg@tv-sign.ru>, fixed by me.

    - Only allow full pages to go to the page cache.
    - Check page != buf->page instead of using PIPE_BUF_FLAG_STOLEN.
    - Remember to clear 'stolen' if add_to_page_cache() fails.

    And as a cleanup on that:

    - Make the bottom fall-through logic a little less convoluted. Also
      make the steal path hold an extra reference to the page, so we don't
      have to differentiate between stolen and non-stolen at the end.

    Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] splice: fix bugs with stealing regular pipe pages | Jens Axboe | 2006-04-30 | 1 | -1/+3
    - Check that page has suitable count for stealing in the regular pipes.
    - pipe_to_file() assumes that the page is locked on successful steal,
      so do that in the pipe steal hook.
    - Missing unlock_page() in add_to_page_cache() failure.

    Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] splice: make the read-side do batched page lookups | Jens Axboe | 2006-04-27 | 1 | -30/+65
    Use the new find_get_pages_contig() to potentially look up the entire
    splice range in one single call. This speeds up
    generic_file_splice_read() quite a bit.

    Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] splice: switch to using page_cache_readahead() | Jens Axboe | 2006-04-27 | 1 | -2/+2
    Avoids doing useless work, when the file is fully cached.

    Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] splice: rearrange moving to/from pipe helpers | Jens Axboe | 2006-04-26 | 1 | -24/+11
    We need these for people writing their own ->splice_read/write hooks.

    Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] Add support for the sys_vmsplice syscall | Jens Axboe | 2006-04-26 | 1 | -39/+253
    sys_splice() moves data to/from pipes with a file input/output.
    sys_vmsplice() moves data to a pipe, with the input being a user
    address range instead.

    This uses an approach suggested by Linus, where we can hold partial
    ranges inside the pages[] map. Hopefully this will be useful for
    network receive support as well.

    Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] splice: fix offset problems | Jens Axboe | 2006-04-26 | 1 | -19/+27
    Make the move_from_pipe() actors return number of bytes processed, then
    move_from_pipe() can decide more cleverly when to move on to the next
    buffer.

    This fixes problems with pipe offset and differing file offset.

    Signed-off-by: Jens Axboe <axboe@suse.de>
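A toy model of the idea: when the actor reports how many bytes it handled, the driving loop can advance within a buffer and only step to the next one once the current buffer is drained. Everything here (`struct toy_buf`, `toy_move_from_pipe()`, `toy_actor_100()`) is hypothetical and is not the kernel's data structure or loop.

```c
#include <stddef.h>

/* Hypothetical pipe buffer: `len` unread bytes starting at `offset`. */
struct toy_buf {
    size_t offset;
    size_t len;
};

/* An actor consumes up to `want` bytes and returns how many it processed. */
typedef size_t (*toy_actor)(const struct toy_buf *buf, size_t want);

/* Drain up to `total` bytes from the buffers, honoring partial progress. */
static size_t toy_move_from_pipe(struct toy_buf *bufs, size_t nbufs,
                                 size_t total, toy_actor actor)
{
    size_t done = 0;
    size_t i = 0;

    while (i < nbufs && done < total) {
        struct toy_buf *b = &bufs[i];
        size_t want = total - done;
        size_t n;

        if (want > b->len)
            want = b->len;

        n = actor(b, want);
        if (n == 0)
            break; /* actor made no progress; stop instead of spinning */

        b->offset += n;
        b->len -= n;
        done += n;

        if (b->len == 0)
            i++; /* only move on once this buffer is fully consumed */
    }
    return done;
}

/* Example actor that never handles more than 100 bytes per call. */
static size_t toy_actor_100(const struct toy_buf *buf, size_t want)
{
    (void)buf;
    return want < 100 ? want : 100;
}
```

Because the loop tracks `offset` and `len` per buffer, a short actor call leaves the remainder of the buffer in place for the next iteration rather than skipping it.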
* [PATCH] splice: fix min() warning | Andrew Morton | 2006-04-26 | 1 | -1/+1
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] splice: fix smaller sized splice reads | Jens Axboe | 2006-04-20 | 1 | -1/+12
    Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] splice: fixup writeout path after ->map changes | Jens Axboe | 2006-04-19 | 1 | -19/+30
    Since ->map() no longer locks the page, we need to adjust the handling
    of those pages (and stealing) a little. This now passes full
    regressions again.

    Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] splice: offset fixes | Jens Axboe | 2006-04-19 | 1 | -15/+30
    - We need to adjust *ppos for writes as well.
    - Copy back modified offset value if one was passed in, similar to
      what sendfile does.

    Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] tee: link_pipe() must be careful when dropping one of the pipe locks | Jens Axboe | 2006-04-19 | 1 | -4/+14
    We need to ensure that we only drop a lock that is ordered last, to
    avoid ABBA deadlocks with competing processes.

    Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] splice: cleanup the SPLICE_F_NONBLOCK handling | Jens Axboe | 2006-04-19 | 1 | -14/+16
    - generic_file_splice_read() more readable and correct
    - Don't bail on page allocation with NONBLOCK set, just don't allow
      direct blocking on IO (eg lock_page).

    Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] splice: close i_size truncate races on read | Jens Axboe | 2006-04-19 | 1 | -6/+37
    We need to check i_size after doing a blocking readpage.

    Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] splice: add support for sys_tee() | Jens Axboe | 2006-04-11 | 1 | -0/+186
    Basically an in-kernel implementation of tee, which uses splice and the
    pipe buffers as an intelligent way to pass data around by reference.

    Where the user space tee consumes the input and produces a stdout and
    file output, this syscall merely duplicates the data inside a pipe to
    another pipe. No data is copied, the output just grabs a reference to
    the input pipe data.

    Signed-off-by: Jens Axboe <axboe@suse.de>
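From userspace, the non-consuming behavior can be observed with the glibc `tee()` wrapper on Linux. The sketch below assumes that wrapper is available (it needs `_GNU_SOURCE`); `tee_demo()` is a hypothetical helper written for this example, and it only handles messages small enough to fit in a pipe without blocking.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Write `msg` into one pipe, tee() it into a second pipe, then read it back
 * from both. Returns the number of bytes duplicated, or -1 on any failure.
 */
static ssize_t tee_demo(const char *msg, char *from_in, char *from_out,
                        size_t cap)
{
    int in[2], out[2];
    size_t len = strlen(msg);
    ssize_t n = -1;

    if (len > cap)
        return -1;
    if (pipe(in) < 0)
        return -1;
    if (pipe(out) < 0) {
        close(in[0]);
        close(in[1]);
        return -1;
    }

    if (write(in[1], msg, len) == (ssize_t)len) {
        /* Duplicate the data; the input pipe keeps its copy. */
        n = tee(in[0], out[1], len, 0);
        if (n > 0) {
            /* The same bytes are still readable from both pipes. */
            if (read(in[0], from_in, cap) != n ||
                read(out[0], from_out, cap) != n)
                n = -1;
        }
    }

    close(in[0]);
    close(in[1]);
    close(out[0]);
    close(out[1]);
    return n;
}
```

After a successful call both output buffers hold the same bytes, which shows that the input pipe's data was referenced rather than consumed.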
* [PATCH] splice: pass offset around for ->splice_read() and ->splice_write() | Jens Axboe | 2006-04-11 | 1 | -42/+44
    We need not use ->f_pos as the offset for the file input/output. If the
    user passed an offset pointer in through sys_splice(), just use that
    and leave ->f_pos alone.

    Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] splice: comment styles | Ingo Molnar | 2006-04-11 | 1 | -11/+12
    - capitalize consistently
    - end sentences in one way or another
    - update comment text to match the implementation

    Signed-off-by: Ingo Molnar <mingo@elte.hu>
    Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] splice: add Ingo as addition copyright holder | Jens Axboe | 2006-04-11 | 1 | -4/+5
    The comment is also somewhat out of date, correct that as well.

    Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] splice: unlikely() optimizations | Jens Axboe | 2006-04-11 | 1 | -8/+7
    Also corrects a few comments. Patch mainly from Ingo, changes by me.

    Signed-off-by: Ingo Molnar <mingo@elte.hu>
    Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] splice: speedups and optimizations | Jens Axboe | 2006-04-11 | 1 | -20/+13
    - Kill the local variables that cache ->nrbufs, they just take up space.
    - Only set do_wakeup for a real pipe. This is a big win for direct
      splicing.
    - Kill i_mutex lock around ->f_pos update, regular io paths don't do
      this either.

    Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] splice: speedup __generic_file_splice_read | Jens Axboe | 2006-04-11 | 1 | -11/+63
    Using find_get_page() is a lot faster than find_or_create_page(). This
    gets splice a lot closer to sendfile() for fd -> socket transfers.

    Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] splice: add direct fd <-> fd splicing support | Jens Axboe | 2006-04-11 | 1 | -18/+130
    It's more efficient for sendfile() emulation. Basically we cache an
    internal private pipe and just use that as the intermediate area for
    pages. Direct splicing is not available from sys_splice(), it is only
    meant to be used for sendfile() emulation.

    Additional patch from Ingo Molnar to avoid the PIPE_BUFFERS loop at
    exit for the normal fast path.

    Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] splice: add optional input and output offsets | Ingo Molnar | 2006-04-10 | 1 | -13/+41
    add optional input and output offsets to sys_splice(), for seekable
    file descriptors:

        asmlinkage long sys_splice(int fd_in, loff_t __user *off_in,
                                   int fd_out, loff_t __user *off_out,
                                   size_t len, unsigned int flags);

    semantics are straightforward: f_pos will be updated with the offset
    provided by user-space, before the splice transfer is about to begin.
    Providing a NULL offset pointer means the existing f_pos will be used
    (and updated in situ). Providing an offset for a pipe results in
    -ESPIPE. Providing an invalid offset pointer results in -EFAULT.

    Signed-off-by: Ingo Molnar <mingo@elte.hu>
    Signed-off-by: Jens Axboe <axboe@suse.de>
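A userspace sketch of the offset arguments, using the glibc `splice()` wrapper on Linux (an assumption; it needs `_GNU_SOURCE`). Both helpers are hypothetical names written for this example. Whether f_pos is also moved when an explicit offset is passed has differed between kernel versions, so the sketch does not rely on it.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Splice `len` bytes starting at file offset `start` into a temporary pipe,
 * then read them back into `out`. Returns bytes read, or -1 on failure.
 */
static ssize_t splice_from_offset(int fd, off_t start, char *out, size_t len)
{
    int p[2];
    loff_t off = start; /* explicit input offset for a seekable fd */
    ssize_t n;

    if (pipe(p) < 0)
        return -1;

    /* The pipe side must pass NULL: pipes have no offset. */
    n = splice(fd, &off, p[1], NULL, len, 0);
    if (n > 0)
        n = read(p[0], out, (size_t)n);

    close(p[0]);
    close(p[1]);
    return n;
}

/* Returns the errno seen when an offset is supplied for a pipe input. */
static int splice_pipe_offset_errno(void)
{
    int in[2], out[2];
    loff_t off = 0;
    int err = 0;

    if (pipe(in) < 0)
        return errno;
    if (pipe(out) < 0) {
        err = errno;
        close(in[0]);
        close(in[1]);
        return err;
    }

    if (write(in[1], "x", 1) != 1)
        err = errno;
    else if (splice(in[0], &off, out[1], NULL, 1, 0) < 0)
        err = errno; /* an offset on a pipe is rejected */

    close(in[0]);
    close(in[1]);
    close(out[0]);
    close(out[1]);
    return err;
}
```

Given a file containing "0123456789", calling `splice_from_offset(fd, 4, buf, 3)` would be expected to place "456" in `buf`, and `splice_pipe_offset_errno()` would be expected to report `ESPIPE`.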
* [PATCH] introduce a "kernel-internal pipe object" abstraction | Ingo Molnar | 2006-04-10 | 1 | -59/+63
    separate out the 'internal pipe object' abstraction, and make it usable
    to splice. This cleans up and fixes several aspects of the internal
    splice APIs and the pipe code:

    - pipes: the allocation and freeing of pipe_inode_info is now more
      symmetric and more streamlined with existing kernel practices.

    - splice: small micro-optimization: less pointer dereferencing in
      splice methods

    Signed-off-by: Ingo Molnar <mingo@elte.hu>

    Update XFS for the ->splice_read/->splice_write changes.

    Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] splice: be smarter about calling do_page_cache_readahead() | Jens Axboe | 2006-04-10 | 1 | -2/+5
    We don't want to call into the read-ahead logic unless we are at the
    start of a page, _or_ we have multiple pages to read.

    Signed-off-by: Jens Axboe <axboe@suse.de>
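The condition described here reduces to a small predicate over the file position and length. This is a sketch only: `TOY_PAGE_SIZE` is an assumed 4096-byte page and `want_readahead()` is a hypothetical name, not the kernel's code.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096UL /* assumption: 4 KiB pages */

/*
 * Trigger read-ahead only when the request starts on a page boundary or
 * touches more than one page.
 */
static bool want_readahead(size_t pos, size_t len)
{
    size_t page_off = pos % TOY_PAGE_SIZE;
    /* Number of pages touched by [pos, pos + len), rounded up. */
    size_t nr_pages = (page_off + len + TOY_PAGE_SIZE - 1) / TOY_PAGE_SIZE;

    return page_off == 0 || nr_pages > 1;
}
```

A 100-byte read at position 10 stays within one page and skips read-ahead, while a 200-byte read at position 4000 crosses into a second page and triggers it.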
* [PATCH] splice: optimize the splice buffer mapping | Jens Axboe | 2006-04-10 | 1 | -10/+30
    We don't really need to lock down the pages, just make sure they are
    uptodate.

    Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] splice: cleanup __generic_file_splice_read() | Jens Axboe | 2006-04-10 | 1 | -49/+10
    The whole shadow/pages logic got overly complex, and this simpler
    approach is actually faster in testing.

    Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] splice: only call wake_up_interruptible() when we really have to | Jens Axboe | 2006-04-10 | 1 | -4/+12
    __wake_up_common() is pretty heavy in the kernel profiles, this brings
    it down to a more acceptable level.

    Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] splice: potential !page dereference | Dave Jones | 2006-04-10 | 1 | -1/+2
    We can get to out: with a NULL page, which we probably don't want to be
    calling page_cache_release() on.

    Signed-off-by: Dave Jones <davej@redhat.com>
    Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] splice: mark the io page as accessed | Jens Axboe | 2006-04-10 | 1 | -0/+1
    We should do that, since we do the LRU manipulation ourselves now.
    Suggested by Nick Piggin.

    Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] splice: fix page stealing LRU handling. | Jens Axboe | 2006-04-02 | 1 | -19/+11
    Originally from Nick Piggin, just adapted to the newer branch.

    You can't check PageLRU without holding zone->lru_lock. The page
    release code can get away with it only because the page refcount is 0
    at that point. Also, you can't reliably remove pages from the LRU
    unless the refcount is 0. Ever.

    Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
    Signed-off-by: Jens Axboe <axboe@suse.de>
* [PATCH] splice: page stealing needs to wait_on_page_writeback() | Jens Axboe | 2006-04-02 | 1 | -0/+9
    Thanks to Andrew for the good explanation of why this is so. akpm
    writes:

    If a page is under writeback and we remove it from pagecache, it's
    still going to get written to disk. But the VFS no longer knows about
    that page, nor that this page is about to modify disk blocks. So there
    might be scenarios in which those
    blocks-which-are-about-to-be-written-to get reused for something else.
    When writeback completes, it'll scribble on those blocks.

    This won't happen in ext2/ext3-style filesystems in normal mode because
    the page has buffers and try_to_release_page() will fail. But ext2 in
    nobh mode doesn't attach buffers at all - it just sticks the page in a
    BIO, finds some new blocks, points the BIO at those blocks and lets it
    rip.

    While that write IO's in flight, someone could truncate the file.
    Truncate won't block on the writeout because the page isn't in
    pagecache any more. So truncate will then free the blocks from the file
    under the page's feet. Then something else can reallocate those blocks.
    Then write data to them. Now, the original write completes, corrupting
    the filesystem.

    Signed-off-by: Jens Axboe <axboe@suse.de>