rwatson's Network Stack TODO
Here are some of the things that have ended up on my todo list, kept here so that I don't forget them. And, in the hopefully unlikely event I am run over by a bus, perhaps someone else will find them useful.
Perforce Branches Relating to Networking
- rwatson_netisr - netisr2 implementation using per-CPU netisr threads and IP-layer demux
- rwatson_tcp - Beginnings of implementation of CONN-L netinet locking.
- rwatson_ethercons - Ethernet console support
- rwatson_resock - coalesce two socket-layer mutexes (and should also coalesce sx locks) (for performance comparison purposes)
- rwatson_resock_vertical - coalesce pcb-layer mutex and socket layer mutex (for performance comparison purposes)
- rwatson_socleanup - various socket cleanups over time
To Do
- Generally revisit if_flags, if_drvflags. Look especially hard at the IFF_LINK* flags, which have quite variable semantics and may need to be broken out themselves, as they are sometimes driver flags (and sometimes not).
- General review of struct ifnet locking and of the ifnet API for device drivers.
- TCPDEBUG appears not to have been updated for TCP timewait support, and therefore may return incorrect data. It needs to be updated.
- so_upcall needs to be broken out from a single upcall with ambiguous locking into several upcalls, each with carefully documented locking and semantics.
- Review accept filter locking following so_upcall cleanup.
- Continue exploring socket locking strategy changes and vertical protocol lock integration in rwatson_resock and rwatson_resock_vertical.
- For each protosw switch entry, decide whether thread is needed, or if ucred would be sufficient. Where possible, pass only a cred.
- Investigate moving to optimistic locking of accept mutex to avoid frequent accept mutex acquires when not needed.
- protosw(9) man page for protocol switch APIs.
- Revisit multi-mbuf allocation, multi-mbuf input.
- Investigate solutions to inpcb tear-down races involving TCP timers -- right now, timers are stopped but not drained, so a timer can race with tear-down, hence the NULL checks in the timer code.
- Revisit locking and atomicity for uipc_send(), and in particular, with respect to what happens if two threads simultaneously call sendto() on different addresses with the same datagram socket. Currently this is likely prevented by sblock() at the system call layer, but races with connect() aren't, as was shown by the recent race case along the same lines. Somehow this needs to be serialized.
- Add inpcb freeing support so that pcbs can be freed after a load spike; this requires moving away from weakly consistent sysctl monitoring.
- Continue pushing control of state changes from the socket layer to the protocol layer so as to improve inter-layer consistency and reduce races between layers on socket state transitions (connect/disconnect/connecting/disconnecting/...).
- Analysis of listen state transitions, which appear poorly defined. Consider in particular interactions with kqueue, where the order of registering kqueue events vs. calling listen affects the semantics considerably.
- Remove all remaining IFF_NEEDSGIANT network interface drivers.
- Remove IFF_NEEDSGIANT.
- Explore increasing netisr queue depth due to gallatin's report that overflows are causing TCP problems with 10 Gbps interfaces.
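The optimistic accept mutex item above could be approached in several ways. The following is only a minimal userland sketch of one reading of it, not kernel code: the names struct listenq, struct conn, listenq_enqueue() and listenq_dequeue() are hypothetical, and a pthread mutex stands in for the accept mutex. The idea shown is an unlocked check of the queue length first, with the mutex taken (and the state re-checked) only when there appears to be work.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a completed connection awaiting accept(). */
struct conn {
	struct conn *next;
};

/* Hypothetical listen queue; "lock" plays the role of the accept mutex. */
struct listenq {
	pthread_mutex_t lock;
	atomic_uint     qlen;         /* may be read without the lock */
	struct conn    *head, *tail;  /* protected by lock */
	unsigned        deq_locks;    /* lock acquisitions made by dequeue */
};

void
listenq_init(struct listenq *q)
{
	pthread_mutex_init(&q->lock, NULL);
	atomic_init(&q->qlen, 0);
	q->head = q->tail = NULL;
	q->deq_locks = 0;
}

/* Append a connection to the tail of the queue. */
void
listenq_enqueue(struct listenq *q, struct conn *c)
{
	assert(c != NULL);
	c->next = NULL;
	pthread_mutex_lock(&q->lock);
	if (q->tail != NULL)
		q->tail->next = c;
	else
		q->head = c;
	q->tail = c;
	atomic_fetch_add(&q->qlen, 1);
	pthread_mutex_unlock(&q->lock);
}

/* Remove the oldest connection, or return NULL if there is none. */
struct conn *
listenq_dequeue(struct listenq *q)
{
	struct conn *c;

	/* Optimistic check: skip the mutex when the queue looks empty. */
	if (atomic_load(&q->qlen) == 0)
		return (NULL);

	pthread_mutex_lock(&q->lock);
	q->deq_locks++;
	/* Re-check under the lock; another thread may have won the race. */
	c = q->head;
	if (c != NULL) {
		q->head = c->next;
		if (q->head == NULL)
			q->tail = NULL;
		atomic_fetch_sub(&q->qlen, 1);
		c->next = NULL;
	}
	pthread_mutex_unlock(&q->lock);
	return (c);
}
```

A caller that polls an empty queue never touches the mutex in this sketch; the cost is that a caller may briefly miss a connection that is being enqueued concurrently, which is acceptable only if something else (a wakeup, a later poll) brings it back.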
In Progress
- (IN PROGRESS) More ifaddr, protocol addr locking.
- (IN PROGRESS) Improve the netisr dispatch to allow batches of packets to be removed at a time by the netisr thread; explore adding a 'running' bit and allowing threads performing direct dispatch to first process pending queued work.
- (IN PROGRESS) Explore disabling preemption of the netisr thread, which is a property of using the ithread mechanism for swi's, and leads to excess context switching for loopback traffic.
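As a rough illustration of the batch removal idea above, here is a minimal userland sketch, assuming a simple mutex-protected linked queue. It is not the netisr implementation: struct workq, struct pkt, workq_enqueue() and workq_dequeue_batch() are hypothetical names, and the 'running' bit is not modelled. The consumer detaches the whole pending chain under a single lock acquisition and can then process it without holding the lock. The depth limit and drop counter are included only to show where queue overflows would be counted.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a queued packet (an mbuf chain in the kernel). */
struct pkt {
	struct pkt *next;
};

/* Hypothetical bounded work queue, loosely modelled on a netisr queue. */
struct workq {
	pthread_mutex_t lock;
	struct pkt *head, *tail;
	size_t      len;     /* current depth */
	size_t      limit;   /* maximum depth before drops occur */
	size_t      drops;   /* packets rejected because the queue was full */
};

void
workq_init(struct workq *q, size_t limit)
{
	pthread_mutex_init(&q->lock, NULL);
	q->head = q->tail = NULL;
	q->len = 0;
	q->limit = limit;
	q->drops = 0;
}

/* Queue one packet. Returns 0 on success, -1 if the queue overflowed. */
int
workq_enqueue(struct workq *q, struct pkt *p)
{
	int error = 0;

	assert(p != NULL);
	p->next = NULL;
	pthread_mutex_lock(&q->lock);
	if (q->len >= q->limit) {
		q->drops++;
		error = -1;
	} else {
		if (q->tail != NULL)
			q->tail->next = p;
		else
			q->head = p;
		q->tail = p;
		q->len++;
	}
	pthread_mutex_unlock(&q->lock);
	return (error);
}

/*
 * Detach every pending packet in one lock acquisition. The caller walks
 * the returned chain via ->next with the lock released.
 */
struct pkt *
workq_dequeue_batch(struct workq *q)
{
	struct pkt *batch;

	pthread_mutex_lock(&q->lock);
	batch = q->head;
	q->head = q->tail = NULL;
	q->len = 0;
	pthread_mutex_unlock(&q->lock);
	return (batch);
}
```

Compared with dequeuing one packet per lock acquisition, the consumer here pays for the lock once per batch, and producers can refill the queue while the batch is being processed.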
Done
- (DONE - rwatson) Add inpcb reference counting so that the pcbinfo lock in tcp_input can be dropped, allowing greater concurrency.
- (DONE - rwatson) Migrate TCP to using read locking where possible (and especially for tcbinfo).
- (DONE - kmacy) if_start needs to become if_startmbuf so that device drivers can either use central queue routines, or use their own queues (such as multi-queue for some ethernet devices, ppp, etc, or no queue for layering such as if_vlan). This will require adding a queue/priority field to the mbuf header, and updating current consumers of multiple queues between link and driver code, such as if_sl, 802.11, etc. Note: this is now named if_transmit.
- (DONE - rwatson, csjp) Implement and merge zero-copy BPF buffers.
- (DONE - rwatson) Remove NET_NEEDS_GIANT.
- (DONE - rwatson) Remove netatm from build due to not being MPSAFE.
- (DONE - rwatson) Enable parallel UDP input processing up to socket layer through use of read locking.
- (DONE - rwatson) Create and use datagram-specific soreceive and sosend routines for UDP avoiding overhead.
- (DONE - rwatson) Enable parallel UDP transmit on the same socket from multiple threads down to the ifq layer through use of read locking.
- (DONE - rwatson) Convert inpcb and inpcbinfo mutexes to rwlocks.
- (DONE - ups) The mbuma mbuf allocator fails to drain the mbuf+cluster cache back into component caches when resource limits are approached.
- (DONE) Revisit UNIX domain socket fine-grained locking in rwatson_proto and decide what to do about it. Proper micro-benchmarking.
- (DONE) sosend/soreceive/sopoll cleanup -- all consumers call sosend/soreceive, and protocols use sosend_generic/soreceive_generic/sopoll_generic (renamed current ones) -- sosend/soreceive/sopoll become simply pru_sosend/pru_soreceive/pru_sopoll wrappers.
- (DONE) Move socket buffer utility routines from uipc_socket2.c to uipc_sockbuf.c via repo-copy.
- (DONE) socket(9) man page for kernel socket APIs.
- (DONE) Trim aging mbuf.h types.
- (DONE) Create sbdestroy() with a simplified and cleaned up set of socket buffer cleanups, avoiding calls to locking and sorflush().
- (DONE) Review and clean up sorflush(). Investigate reducing overhead, as this is invoked by a number of apps when they shutdown() a socket for read.
- (DONE) Review Intel TCP offload patches.
- (DONE - bms) SO_NOSIGPIPE appears to be broken again.
- (DONE) Move socket buffer functions to uipc_sockbuf.c from uipc_socket2.c, and other functions to uipc_socket.c.
- (DONE) Fix sb_lock() per Isilon bug report, MFC to 6.x.
- (DONE - bz) Remove i4b as non-MPSAFE.
- (DONE - maxim) Remove ng_h4 as non-MPSAFE.
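The inpcb reference counting item above relies on a general pattern: take a reference while holding the table-wide lock, then drop that lock and keep using the object. The following is a minimal userland sketch of that pattern only, not the netinet code; struct pcb, struct pcbtable and the pcbtable_*() and pcb_rele() functions are hypothetical names, and a pthread mutex stands in for the pcbinfo lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

#define TABLE_SLOTS 8

/* Hypothetical protocol control block with a reference count. */
struct pcb {
	atomic_uint refcount;
	int         port;
};

/* Hypothetical global table; "lock" plays the role of the pcbinfo lock. */
struct pcbtable {
	pthread_mutex_t lock;
	struct pcb     *slots[TABLE_SLOTS];
};

void
pcbtable_init(struct pcbtable *t)
{
	pthread_mutex_init(&t->lock, NULL);
	for (int i = 0; i < TABLE_SLOTS; i++)
		t->slots[i] = NULL;
}

/* Drop one reference. Returns true if this was the last one (pcb freed). */
bool
pcb_rele(struct pcb *p)
{
	unsigned old = atomic_fetch_sub(&p->refcount, 1);

	assert(old > 0);
	if (old == 1) {
		free(p);
		return (true);
	}
	return (false);
}

/* Create a pcb; the table owns its initial reference. */
bool
pcbtable_insert(struct pcbtable *t, int port)
{
	struct pcb *p = malloc(sizeof(*p));
	bool stored = false;

	if (p == NULL)
		return (false);
	atomic_init(&p->refcount, 1);
	p->port = port;
	pthread_mutex_lock(&t->lock);
	for (int i = 0; i < TABLE_SLOTS && !stored; i++) {
		if (t->slots[i] == NULL) {
			t->slots[i] = p;
			stored = true;
		}
	}
	pthread_mutex_unlock(&t->lock);
	if (!stored)
		free(p);
	return (stored);
}

/* Find a pcb and return it with an extra reference, table lock dropped. */
struct pcb *
pcbtable_lookup(struct pcbtable *t, int port)
{
	struct pcb *p = NULL;

	pthread_mutex_lock(&t->lock);
	for (int i = 0; i < TABLE_SLOTS; i++) {
		if (t->slots[i] != NULL && t->slots[i]->port == port) {
			p = t->slots[i];
			atomic_fetch_add(&p->refcount, 1);
			break;
		}
	}
	pthread_mutex_unlock(&t->lock);
	return (p);
}

/* Unlink a pcb and drop the table's reference. True if freed right away. */
bool
pcbtable_remove(struct pcbtable *t, int port)
{
	struct pcb *p = NULL;

	pthread_mutex_lock(&t->lock);
	for (int i = 0; i < TABLE_SLOTS; i++) {
		if (t->slots[i] != NULL && t->slots[i]->port == port) {
			p = t->slots[i];
			t->slots[i] = NULL;
			break;
		}
	}
	pthread_mutex_unlock(&t->lock);
	return (p != NULL && pcb_rele(p));
}
```

In this sketch a thread that obtained a pcb from pcbtable_lookup() can keep using it after the table lock is released, even if another thread removes it from the table in the meantime; the memory is freed only when the last holder calls pcb_rele().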