aboutsummaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* tcp: /proc/net/tcp rto,ato values not scaled properly (v2)Stephen Hemminger2008-06-272-6/+6
| | | | | | | | | | | I found another case where we are sending information to userspace in the wrong HZ scale. This should have been fixed back in 2.5 :-( This means an ABI change but as it stands there is no way for an application like ss to get the right value. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* include/linux/netdevice.h: don't export MAX_HEADER to userspaceAdrian Bunk2008-06-271-0/+4
| | | | | | | Due to the CONFIG_'s the value is anyway not correct in userspace. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* pkt_sched: Remove CONFIG_NET_SCH_RRAdrian Bunk2008-06-271-11/+0
| | | | | | | | | | | Commit d62733c8e437fdb58325617c4b3331769ba82d70 ([SCHED]: Qdisc changes and sch_rr added for multiqueue) added a NET_SCH_RR option that was unused since the code went unconditionally into sch_prio. Reported-by: Robert P. J. Day <rpjday@crashcourse.ca> Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* pkt_sched: ERR_PTR() ususally encodes an negative errno, not positive.WANG Cong2008-06-271-1/+1
| | | | | | | | | Note, in the following patch, 'err' is initialized as: int err = -ENOBUFS; Signed-off-by: WANG Cong <wcong@critical-links.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* netdevice: Fix typo of dev_unicast_add() commentWang Chen2008-06-271-1/+1
| | | | | Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* af_unix: fix 'poll for write'/connected DGRAM socketsRainer Weikusat2008-06-271-28/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For n:1 'datagram connections' (eg /dev/log), the unix_dgram_sendmsg routine implements a form of receiver-imposed flow control by comparing the length of the receive queue of the 'peer socket' with the max_ack_backlog value stored in the corresponding sock structure, either blocking the thread which caused the send-routine to be called or returning EAGAIN. This routine is used by both SOCK_DGRAM and SOCK_SEQPACKET sockets. The poll-implementation for these socket types is datagram_poll from core/datagram.c. A socket is deemed to be writeable by this routine when the memory presently consumed by datagrams owned by it is less than the configured socket send buffer size. This is always wrong for PF_UNIX non-stream sockets connected to server sockets dealing with (potentially) multiple clients if the abovementioned receive queue is currently considered to be full. 'poll' will then return, indicating that the socket is writeable, but a subsequent write result in EAGAIN, effectively causing an (usual) application to 'poll for writeability by repeated send request with O_NONBLOCK set' until it has consumed its time quantum. The change below uses a suitably modified variant of the datagram_poll routines for both type of PF_UNIX sockets, which tests if the recv-queue of the peer a socket is connected to is presently considered to be 'full' as part of the 'is this socket writeable'-checking code. The socket being polled is additionally put onto the peer_wait wait queue associated with its peer, because the unix_dgram_recvmsg routine does a wake up on this queue after a datagram was received and the 'other wakeup call' is done implicitly as part of skb destruction, meaning, a process blocked in poll because of a full peer receive queue could otherwise sleep forever if no datagram owned by its socket was already sitting on this queue. Among this change is a small (inline) helper routine named 'unix_recvq_full', which consolidates the actual testing code (in three different places) into a single location. Signed-off-by: Rainer Weikusat <rweikusat@mssgmbh.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tcp: fix for splice receive when used with software LROOctavian Purdila2008-06-271-5/+12
| | | | | | | | | | | | | | | | | | | If an skb has nr_frags set to zero but its frag_list is not empty (as it can happen if software LRO is enabled), and a previous tcp_read_sock has consumed the linear part of the skb, then __skb_splice_bits: (a) incorrectly reports an error and (b) forgets to update the offset to account for the linear part Any of the two problems will cause the subsequent __skb_splice_bits call (the one that handles the frag_list skbs) to either skip data, or, if the unadjusted offset is greater then the size of the next skb in the frag_list, make tcp_splice_read loop forever. Signed-off-by: Octavian Purdila <opurdila@ixiacom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tcp: calculate tcp_mem based on low memory instead of all memoryMiquel van Smoorenburg2008-06-271-3/+6
| | | | | | | | | The tcp_mem array which contains limits on the total amount of memory used by TCP sockets is calculated based on nr_all_pages. On a 32 bits x86 system, we should base this on the number of lowmem pages. Signed-off-by: Miquel van Smoorenburg <miquels@cistron.nl> Signed-off-by: David S. Miller <davem@davemloft.net>
* hamradio: remove unused variableAndre Haupt2008-06-271-2/+0
| | | | | Signed-off-by: Andre Haupt <andre@bitwigglers.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* mac80211: fix an oops in several failure paths in key allocationEmmanuel Grumbach2008-06-271-0/+9
| | | | | | | | | | | This patch fixes an oops in several failure paths in key allocation. This Oops occurs when freeing a key that has not been linked yet, so the key->sdata is not set. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
* prism: islpci_eth.c endianness fixHarvey Harrison2008-06-271-1/+1
| | | | | | | | clock is already cpu-endian (see le32_to_cpu slightly before), so le64_to_cpu doesn't make much sense. Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
* rt2x00: Fix lock dependency errrorIvo van Doorn2008-06-273-15/+28
| | | | | | | | | | | | | | | | | | | This fixes a circular locking dependency in the workqueue handling. The interface work task uses the mac80211 function ieee80211_iterate_active_interfaces() which grabs the RTNL lock. However when the interface is brough down, this happens under the RTNL lock as well, this causes problems because mac80211 will flush the workqueue during the ifdown event. This causes mac80211 to wait until the driver has completed all work which can't finish because it is waiting on the RTNL lock. This is fixed by moving rt2x00 workqueue tasks on a different workqueue, this workqueue can be flushed when the ieee80211_hw structure is removed by the driver (when the driver is unloaded) which does not happen under the RTNL lock. Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
* Merge branch 'master' of ↵David S. Miller2008-06-279-70/+115
|\ | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/linville/wireless-2.6
| * iwlwifi: improve scanning band selection managementRon Rindjunsky2008-06-252-33/+39
| | | | | | | | | | | | | | | | | | | | This patch modifies the band selection management when scanning, so bands are now scanned according to HW band support. Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: Zhu Yi <yi.zhu@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
| * rt2x00: Fix unbalanced mutex lockingIvo van Doorn2008-06-252-30/+42
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The usb_cache_mutex was not correctly released under all circumstances. Both rt73usb as rt2500usb didn't release the mutex under certain conditions when the register access failed. Obviously such failure would lead to deadlocks. In addition under similar circumstances when the bbp register couldn't be read the value must be set to 0xff to indicate that the value is wrong. This too didn't happen under all circumstances. Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
| * b43legacy: Fix possible NULL pointer dereference in DMA codeMichael Buesch2008-06-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | This fixes a possible NULL pointer dereference in an error path of the DMA allocation error checking code. This is also necessary for a future DMA API change that is on its way into the mainline kernel that adds an additional dev parameter to dma_mapping_error(). Signed-off-by: Michael Buesch <mb@bu3sch.de> Cc: stable <stable@kernel.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>
| * b43: Fix possible MMIO access while device is downMichael Buesch2008-06-251-0/+3
| | | | | | | | | | | | | | | | | | This fixes a possible MMIO access while the device is still down from a suspend cycle. MMIO accesses with the device powered down may cause crashes on certain devices. Signed-off-by: Michael Buesch <mb@bu3sch.de> Signed-off-by: John W. Linville <linville@tuxdriver.com>
| * b43legacy: Do not return TX_BUSY from op_txMichael Buesch2008-06-251-2/+4
| | | | | | | | | | | | | | | | | | Never return TX_BUSY from op_tx. It doesn't make sense to return TX_BUSY, if we can not transmit the packet. Drop the packet and return TX_OK. Signed-off-by: Michael Buesch <mb@bu3sch.de> Signed-off-by: John W. Linville <linville@tuxdriver.com>
| * b43: Do not return TX_BUSY from op_txMichael Buesch2008-06-251-4/+8
| | | | | | | | | | | | | | | | | | | | Never return TX_BUSY from op_tx. It doesn't make sense to return TX_BUSY, if we can not transmit the packet. Drop the packet and return TX_OK. This will fix the resume hang. Signed-off-by: Michael Buesch <mb@bu3sch.de> Signed-off-by: John W. Linville <linville@tuxdriver.com>
| * mac80211: implement EU regulatory domainTony Vroon2008-06-251-0/+18
| | | | | | | | | | | | | | | | | | | | | | Implement missing EU regulatory domain for mac80211. Based on the information in IEEE 802.11-2007 (specifically pages 1142, 1143 & 1148) and ETSI 301 893 (V1.4.1). With thanks to Johannes Berg. Signed-off-by: Tony Vroon <tony@linx.net> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
* | Hold RTNL while calling dev_close()Ben Hutchings2008-06-271-0/+3
| | | | | | | | | | | | | | dev_close() must be called holding the RTNL. Compile-tested only. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
* | qla3xxx: Hold RTNL while calling dev_close()Ben Hutchings2008-06-271-0/+2
| | | | | | | | | | | | | | dev_close() must be called holding the RTNL. Compile-tested only. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
* | [netdrvr] Fix IOMMU overflow checking in s2io.cAndi Kleen2008-06-272-27/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | s2io has IOMMU overflow checking, but unfortunately it is wrong. It didn't use the standard macros, which meant that it only worked on POWER and SPARC because only those define DMA_ERROR_CODE. Convert it to use the standard macros instead. I also commented two more bugs in the IOMMU handling. It assumes that 0 DMA addresses cannot happen, but that's not true in all IOMMU setups. The information if a buffer has been already mapped needs to be stored elsewhere. Didn't fix those because it needs careful checking of the buffer handling by the maintainers. Cc: ram.vepa@neterion.com Cc: santosh.rastapur@neterion.com Cc: sivakumar.subramani@neterion.com Cc: sreenivasa.honnur@neterion.com Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
* | e1000: only enable TSO6 via ethtool when using correct hardwareAndy Gospodarek2008-06-271-1/+1
| | | | | | | | | | | | | | | | | | When enabling TSO via ethool on e1000, it is possible to set NETIF_F_TSO6 on hardware that does not support it. Setting TSO via ethtool now matches the settings used when the hardware is probed. Signed-off-by: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
* | e100: Do pci_dma_sync after skb_alloc for proper operation on ixp4xxKevin Hao2008-06-271-0/+2
| | | | | | | | | | | | | | The E100 device can't work on current kernel (2.6.26-rc6) and will cause kernel corruption on intel ixdp4xx. Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
* | [netdrvr] netxen: fix netxen_pci_tbl[] breakageAl Viro2008-06-271-7/+11
| | | | | | | | | | | | | | | | | | | | PCI_DEVICE_CLASS sets .device and .vendor to PCI_ANY_DEV, which overrides the effect of preceding PCI_DEVICE() and makes all elements of netxen_pci_tbl[] identical. Introduced in the commit dcd56fdbaeae1008044687b973c4a3e852e8a726. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
* | [netdrvr] 3c59x: remove irqs_disabled warning from local_bh_enableIngo Molnar2008-06-271-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Original Author: Michael Buesch <mb@bu3sch.de> net, vortex: fix lockup Ingo Molnar reported: -tip testing found that Johannes Berg's "softirq: remove irqs_disabled warning from local_bh_enable" enhancement to lockdep triggers a new warning on an old testbox that uses 3c59x vortex and netlogging: -----> calling vortex_init+0x0/0xb0 PCI: Found IRQ 10 for device 0000:00:0b.0 PCI: Sharing IRQ 10 with 0000:00:0a.0 PCI: Sharing IRQ 10 with 0000:00:0b.1 3c59x: Donald Becker and others. 0000:00:0b.0: 3Com PCI 3c556 Laptop Tornado at e0800400. PCI: Enabling bus mastering for device 0000:00:0b.0 initcall vortex_init+0x0/0xb0 returned 0 after 47 msecs ... calling init_netconsole+0x0/0x1b0 netconsole: local port 4444 netconsole: local IP 10.0.1.9 netconsole: interface eth0 netconsole: remote port 4444 netconsole: remote IP 10.0.1.16 netconsole: remote ethernet address 00:19:xx:xx:xx:xx netconsole: device eth0 not up yet, forcing it eth0: setting half-duplex. eth0: setting full-duplex. ------------[ cut here ]------------ WARNING: at kernel/softirq.c:137 local_bh_enable_ip+0xd1/0xe0() Pid: 1, comm: swapper Not tainted 2.6.26-rc6-tip #2091 [<c0125ecf>] warn_on_slowpath+0x4f/0x70 [<c0126834>] ? release_console_sem+0x1b4/0x1d0 [<c0126d00>] ? vprintk+0x2a0/0x450 [<c012fde5>] ? __mod_timer+0xa5/0xc0 [<c046f7fd>] ? mdio_sync+0x3d/0x50 [<c0160ef6>] ? marker_probe_cb+0x46/0xa0 [<c0126ed7>] ? printk+0x27/0x50 [<c046f4c3>] ? vortex_set_duplex+0x43/0xc0 [<c046f521>] ? vortex_set_duplex+0xa1/0xc0 [<c0471b92>] ? vortex_timer+0xe2/0x3e0 [<c012b361>] local_bh_enable_ip+0xd1/0xe0 [<c08d9f9f>] _spin_unlock_bh+0x2f/0x40 [<c0471b92>] vortex_timer+0xe2/0x3e0 [<c014743b>] ? trace_hardirqs_on+0xb/0x10 [<c0147358>] ? trace_hardirqs_on_caller+0x88/0x160 [<c012f8b2>] run_timer_softirq+0x162/0x1c0 [<c0471ab0>] ? vortex_timer+0x0/0x3e0 [<c012b361>] local_bh_enable_ip+0xd1/0xe0 [<c08d9f9f>] _spin_unlock_bh+0x2f/0x40 [<c0471b92>] vortex_timer+0xe2/0x3e0 [<c014743b>] ? trace_hardirqs_on+0xb/0x10 [<c0147358>] ? trace_hardirqs_on_caller+0x88/0x160 [<c012f8b2>] run_timer_softirq+0x162/0x1c0 [<c0471ab0>] ? vortex_timer+0x0/0x3e0 [<c0471ab0>] ? vortex_timer+0x0/0x3e0 [<c012b60a>] __do_softirq+0x9a/0x160 [<c012b570>] ? __do_softirq+0x0/0x160 [<c0106775>] call_on_stack+0x15/0x30 [<c012b4f5>] ? irq_exit+0x55/0x60 [<c0106e85>] ? do_IRQ+0x85/0xd0 [<c0147391>] ? trace_hardirqs_on_caller+0xc1/0x160 [<c0104888>] ? common_interrupt+0x28/0x30 [<c08d8ac8>] ? mutex_unlock+0x8/0x10 [<c08d8180>] ? _cond_resched+0x10/0x30 [<c07a3be7>] ? netpoll_setup+0x117/0x390 [<c0cbfcfe>] ? init_netconsole+0x14e/0x1b0 [<c013d539>] ? ktime_get+0x19/0x40 [<c0c9bab2>] ? kernel_init+0x1b2/0x2c0 [<c0cbfbb0>] ? init_netconsole+0x0/0x1b0 [<c0396aa4>] ? trace_hardirqs_on_thunk+0xc/0x10 [<c0103f12>] ? restore_nocheck_notrace+0x0/0xe [<c0c9b900>] ? kernel_init+0x0/0x2c0 [<c0c9b900>] ? kernel_init+0x0/0x2c0 [<c0104aa7>] ? kernel_thread_helper+0x7/0x10 ======================= ---[ end trace 37f9c502aff112e0 ]--- console [netcon0] enabled netconsole: network logging started initcall init_netconsole+0x0/0x1b0 returned 0 after 2914 msecs looking at the driver I think the bug is real and the fix actually is trivial. vp->lock is also taken in hardware IRQ context, so we _have_ to always use irqsafe locking. As we run in a timer with IRQs disabled, we can simply use spin_lock. Cc: <stable@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
* | ipg: use NULL, not zero, for pointersPekka Enberg2008-06-271-1/+1
| | | | | | | | | | | | | | | | Fixes a sparse warning in a code block that's hidden under JUMBO_FRAME #ifdef. Tested-by: Andrew Savchenko <Bircoph@list.ru> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
* | ipg: fix jumbo frame compilationPekka Enberg2008-06-271-7/+7
| | | | | | | | | | | | | | | | | | Make jumbo frame support compile again. It was broken by the cleanup series before the merge because the code is hidden under JUMBO_FRAME #ifdef. Tested-by: Andrew Savchenko <Bircoph@list.ru> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
* | drivers/net/r6040.c: Eliminate double sizeofJulia Lawall2008-06-271-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Taking sizeof the result of sizeof is quite strange and does not seem to be what is wanted here. This was fixed using the following semantic patch. (http://www.emn.fr/x-info/coccinelle/) // <smpl> @@ expression E; @@ - sizeof ( sizeof (E) - ) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
* | pcnet_cs, axnet_cs: clear bogus interrupt before request_irqKomuro2008-06-272-0/+5
| | | | | | | | | | Signed-off-by: Komuro <komurojun-mbn@nifty.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
* | e1000e: fix EEH recovery during reset on PPCJeff Kirsher2008-06-271-1/+2
| | | | | | | | | | | | | | | | | | EEH is not recovering in a reasonable amount of time on PPC during e1000e_down(). Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
* | igb: fix EEH recovery during reset on PPCJeff Kirsher2008-06-271-1/+2
| | | | | | | | | | | | | | | | | | EEH is not recovering in a reasonable amount of time on PPC during igb_down(). Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
* | ixgbe: fix EEH recovery during reset on PPCPaul Larson2008-06-271-1/+2
| | | | | | | | | | | | | | | | | | | | EEh is not recovering in a resonable amount of time on PPC during ixgbe_down(). Signed-off-by: Paul Larson <pl@us.ibm.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
* | tc35815: Fix receiver hangup on Rx FIFO overflowAtsushi Nemoto2008-06-271-1/+1
| | | | | | | | | | | | | | | | | | | | On Rx FIFO overflow error, the controller consume a buffer descriptor but currently the driver does not give it back to the controller. This results unrecoverable 'Buffer List Exhausted' condition. This patch fix this problem by moving a "fbl_count--" line to proper place. Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
* | tc35815: Mark carrier-off before starting PHYAtsushi Nemoto2008-06-271-0/+2
| | | | | | | | | | | | | | | | Call netif_carrier_off() before starting PHY device. This is a behavior before converting to generic PHY layer. Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
* | s2io: fix documentation about intr_typeMichal Schmidt2008-06-271-3/+3
|/ | | | | | | | | | | The documentation for intr_type module parameter of the s2io driver is not consistent with the code. The comments in drivers/net/s2io.c are OK, but Documentation/networking/s2io.txt is wrong. Pointed out by Andrew Hecox. Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
* netfilter: ip6table_mangle: don't reroute in LOCAL_INPatrick McHardy2008-06-241-1/+1
| | | | | | | | | | Rerouting should only happen in LOCAL_OUT, in INPUT its useless since the packet has already chosen its final destination. Noticed by Alexey Dobriyan <adobriyan@gmail.com>. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* netns: Don't receive new packets in a dead network namespace.Eric W. Biederman2008-06-203-0/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Alexey Dobriyan <adobriyan@gmail.com> writes: > Subject: ICMP sockets destruction vs ICMP packets oops > After icmp_sk_exit() nuked ICMP sockets, we get an interrupt. > icmp_reply() wants ICMP socket. > > Steps to reproduce: > > launch shell in new netns > move real NIC to netns > setup routing > ping -i 0 > exit from shell > > BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 > IP: [<ffffffff803fce17>] icmp_sk+0x17/0x30 > PGD 17f3cd067 PUD 17f3ce067 PMD 0 > Oops: 0000 [1] PREEMPT SMP DEBUG_PAGEALLOC > CPU 0 > Modules linked in: usblp usbcore > Pid: 0, comm: swapper Not tainted 2.6.26-rc6-netns-ct #4 > RIP: 0010:[<ffffffff803fce17>] [<ffffffff803fce17>] icmp_sk+0x17/0x30 > RSP: 0018:ffffffff8057fc30 EFLAGS: 00010286 > RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff81017c7db900 > RDX: 0000000000000034 RSI: ffff81017c7db900 RDI: ffff81017dc41800 > RBP: ffffffff8057fc40 R08: 0000000000000001 R09: 000000000000a815 > R10: 0000000000000000 R11: 0000000000000001 R12: ffffffff8057fd28 > R13: ffffffff8057fd00 R14: ffff81017c7db938 R15: ffff81017dc41800 > FS: 0000000000000000(0000) GS:ffffffff80525000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b > CR2: 0000000000000000 CR3: 000000017fcda000 CR4: 00000000000006e0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > Process swapper (pid: 0, threadinfo ffffffff8053a000, task ffffffff804fa4a0) > Stack: 0000000000000000 ffff81017c7db900 ffffffff8057fcf0 ffffffff803fcfe4 > ffffffff804faa38 0000000000000246 0000000000005a40 0000000000000246 > 000000000001ffff ffff81017dd68dc0 0000000000005a40 0000000055342436 > Call Trace: > <IRQ> [<ffffffff803fcfe4>] icmp_reply+0x44/0x1e0 > [<ffffffff803d3a0a>] ? ip_route_input+0x23a/0x1360 > [<ffffffff803fd645>] icmp_echo+0x65/0x70 > [<ffffffff803fd300>] icmp_rcv+0x180/0x1b0 > [<ffffffff803d6d84>] ip_local_deliver+0xf4/0x1f0 > [<ffffffff803d71bb>] ip_rcv+0x33b/0x650 > [<ffffffff803bb16a>] netif_receive_skb+0x27a/0x340 > [<ffffffff803be57d>] process_backlog+0x9d/0x100 > [<ffffffff803bdd4d>] net_rx_action+0x18d/0x250 > [<ffffffff80237be5>] __do_softirq+0x75/0x100 > [<ffffffff8020c97c>] call_softirq+0x1c/0x30 > [<ffffffff8020f085>] do_softirq+0x65/0xa0 > [<ffffffff80237af7>] irq_exit+0x97/0xa0 > [<ffffffff8020f198>] do_IRQ+0xa8/0x130 > [<ffffffff80212ee0>] ? mwait_idle+0x0/0x60 > [<ffffffff8020bc46>] ret_from_intr+0x0/0xf > <EOI> [<ffffffff80212f2c>] ? mwait_idle+0x4c/0x60 > [<ffffffff80212f23>] ? mwait_idle+0x43/0x60 > [<ffffffff8020a217>] ? cpu_idle+0x57/0xa0 > [<ffffffff8040f380>] ? rest_init+0x70/0x80 > Code: 10 5b 41 5c 41 5d 41 5e c9 c3 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 53 > 48 83 ec 08 48 8b 9f 78 01 00 00 e8 2b c7 f1 ff 89 c0 <48> 8b 04 c3 48 83 c4 08 > 5b c9 c3 66 66 66 66 66 2e 0f 1f 84 00 > RIP [<ffffffff803fce17>] icmp_sk+0x17/0x30 > RSP <ffffffff8057fc30> > CR2: 0000000000000000 > ---[ end trace ea161157b76b33e8 ]--- > Kernel panic - not syncing: Aiee, killing interrupt handler! Receiving packets while we are cleaning up a network namespace is a racy proposition. It is possible when the packet arrives that we have removed some but not all of the state we need to fully process it. We have the choice of either playing wack-a-mole with the cleanup routines or simply dropping packets when we don't have a network namespace to handle them. Since the check looks inexpensive in netif_receive_skb let's just drop the incoming packets. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* sctp: Make sure N * sizeof(union sctp_addr) does not overflow.David S. Miller2008-06-201-1/+3
| | | | | | | | | | As noticed by Gabriel Campana, the kmalloc() length arg passed in by sctp_getsockopt_local_addrs_old() can overflow if ->addr_num is large enough. Therefore, enforce an appropriate limit. Signed-off-by: David S. Miller <davem@davemloft.net>
* pppoe: warning fixStephen Hemminger2008-06-201-1/+1
| | | | | | | | | | Fix warning: drivers/net/pppoe.c: In function 'pppoe_recvmsg': drivers/net/pppoe.c:945: warning: comparison of distinct pointer types lacks a cast because skb->len is unsigned int and total_len is size_t Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: Drop packets for loopback address from outside of the box.YOSHIFUJI Hideaki2008-06-192-0/+15
| | | | | | | | | [ Based upon original report and patch by Karsten Keil. Karsten has verified that this fixes the TAHI test case "ICMPv6 test v6LC.5.1.2 Part F". -DaveM ] Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: Remove options header when setsockopt's optlen is 0Shan Wei2008-06-191-4/+7
| | | | | | | | | | | | Remove the sticky Hop-by-Hop options header by calling setsockopt() for IPV6_HOPOPTS with a zero option length, per RFC3542. Routing header and Destination options header does the same as Hop-by-Hop options header. Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* mac80211: detect driver tx bugsJohannes Berg2008-06-181-1/+8
| | | | | | | | | | When a driver rejects a frame in it's ->tx() callback, it must also stop queues, otherwise mac80211 can go into a loop here. Detect this situation and abort the loop after five retries, warning about the driver bug. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* netlink: genl: fix circular lockingPatrick McHardy2008-06-181-9/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | genetlink has a circular locking dependency when dumping the registered families: - dump start: genl_rcv() : take genl_mutex genl_rcv_msg() : call netlink_dump_start() while holding genl_mutex netlink_dump_start(), netlink_dump() : take nlk->cb_mutex ctrl_dumpfamily() : try to detect this case and not take genl_mutex a second time - dump continuance: netlink_rcv() : call netlink_dump netlink_dump : take nlk->cb_mutex ctrl_dumpfamily() : take genl_mutex Register genl_lock as callback mutex with netlink to fix this. This slightly widens an already existing module unload race, the genl ops used during the dump might go away when the module is unloaded. Thomas Graf is working on a seperate fix for this. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* Revert "mac80211: Use skb_header_cloned() on TX path."David S. Miller2008-06-181-2/+2
| | | | | | | | | | | | | | | | | | | | This reverts commit 608961a5eca8d3c6bd07172febc27b5559408c5d. The problem is that the mac80211 stack not only needs to be able to muck with the link-level headers, it also might need to mangle all of the packet data if doing sw wireless encryption. This fixes kernel bugzilla #10903. Thanks to Didier Raboud (for the bugzilla report), Andrew Prince (for bisecting), Johannes Berg (for bringing this bisection analysis to my attention), and Ilpo (for trying to analyze this purely from the TCP side). In 2.6.27 we can take another stab at this, by using something like skb_cow_data() when the TX path of mac80211 ends up with a non-NULL tx->key. The ESP protocol code in the IPSEC stack can be used as a model for implementation. Signed-off-by: David S. Miller <davem@davemloft.net>
* af_unix: fix 'poll for write'/ connected DGRAM socketsRainer Weikusat2008-06-171-9/+70
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The unix_dgram_sendmsg routine implements a (somewhat crude) form of receiver-imposed flow control by comparing the length of the receive queue of the 'peer socket' with the max_ack_backlog value stored in the corresponding sock structure, either blocking the thread which caused the send-routine to be called or returning EAGAIN. This routine is used by both SOCK_DGRAM and SOCK_SEQPACKET sockets. The poll-implementation for these socket types is datagram_poll from core/datagram.c. A socket is deemed to be writeable by this routine when the memory presently consumed by datagrams owned by it is less than the configured socket send buffer size. This is always wrong for connected PF_UNIX non-stream sockets when the abovementioned receive queue is currently considered to be full. 'poll' will then return, indicating that the socket is writeable, but a subsequent write result in EAGAIN, effectively causing an (usual) application to 'poll for writeability by repeated send request with O_NONBLOCK set' until it has consumed its time quantum. The change below uses a suitably modified variant of the datagram_poll routines for both type of PF_UNIX sockets, which tests if the recv-queue of the peer a socket is connected to is presently considered to be 'full' as part of the 'is this socket writeable'-checking code. The socket being polled is additionally put onto the peer_wait wait queue associated with its peer, because the unix_dgram_sendmsg routine does a wake up on this queue after a datagram was received and the 'other wakeup call' is done implicitly as part of skb destruction, meaning, a process blocked in poll because of a full peer receive queue could otherwise sleep forever if no datagram owned by its socket was already sitting on this queue. Among this change is a small (inline) helper routine named 'unix_recvq_full', which consolidates the actual testing code (in three different places) into a single location. Signed-off-by: Rainer Weikusat <rweikusat@mssgmbh.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'davem-fixes' of ↵David S. Miller2008-06-1711-242/+189
|\ | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6
| * atl1: relax eeprom mac address error checkRadu Cristescu2008-06-171-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The atl1 driver tries to determine the MAC address thusly: - If an EEPROM exists, read the MAC address from EEPROM and validate it. - If an EEPROM doesn't exist, try to read a MAC address from SPI flash. - If that fails, try to read a MAC address directly from the MAC Station Address register. - If that fails, assign a random MAC address provided by the kernel. We now have a report of a system fitted with an EEPROM containing all zeros where we expect the MAC address to be, and we currently handle this as an error condition. Turns out, on this system the BIOS writes a valid MAC address to the NIC's MAC Station Address register, but we never try to read it because we return an error when we find the all- zeros address in EEPROM. This patch relaxes the error check and continues looking for a MAC address even if it finds an illegal one in EEPROM. Signed-off-by: Radu Cristescu <advantis@gmx.net> Signed-off-by: Jay Cliburn <jacliburn@bellsouth.net> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
| * net/enc28j60: low power modeDavid Brownell2008-06-171-24/+58
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Keep enc28j60 chips in low-power mode when they're not in use. At typically 120 mA, these chips run hot even when idle; this low power mode cuts that power usage by a factor of around 100. This version provides a generic routine to poll a register until its masked value equals some value ... e.g. bit set or cleared. It's basically what the previous wait_phy_ready() did, but this version is generalized to support the handshaking needed to enter and exit low power mode. Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Claudio Lanconelli <lanconelli.claudio@eptar.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>