C++ - Is it possible to use hardware demultiplexing for high-load network servers?
For example, with asynchronous I/O over TCP/IP (using POSIX poll/select or the more advanced epoll, kqueue, poll_set, IOCP), the network driver is started by interrupts on different CPU cores (a hardware demultiplexer), receives messages, and dumps them into a single buffer at kernel level (a multiplexer). Then our acceptor thread, using epoll / kqueue / poll_set / IOCP, receives from this single buffer a list of descriptors of the sockets on which messages came, and again scatters them (a demultiplexer) across threads (in a thread pool) running on different CPU cores.
In short, the scheme looks like: hardware interrupt (hardware demultiplexer) -> network driver in kernel space (multiplexer) -> user's acceptor in user space using epoll / kqueue / poll_set / IOCP (demultiplexer)
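The last link of this scheme, the user-space demultiplexer, can be sketched roughly as follows. This is only an illustration under assumptions: it is Linux-only (epoll), `eventfd` descriptors stand in for real sockets, the names `add_ready_eventfd` and `scatter_ready` are made up for this sketch, and the worker threads that would drain each queue are left out.

```cpp
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>
#include <cerrno>
#include <cstddef>
#include <vector>

// Hypothetical helper: create an eventfd that is already readable and register
// it with the epoll instance. It stands in for a socket with pending data.
int add_ready_eventfd(int epfd) {
    int fd = eventfd(1, 0);  // counter starts at 1, so the fd is readable at once
    if (fd < 0) return -1;
    epoll_event ev{};
    ev.events = EPOLLIN;
    ev.data.fd = fd;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

// One round of the acceptor: take the list of ready descriptors from the
// kernel (the "multiplexed" single list) and scatter it round-robin over
// per-worker queues (the user-space "demultiplexer").
// Returns the number of descriptors dispatched, or -1 on error.
int scatter_ready(int epfd, std::vector<std::vector<int>>& worker_queues,
                  int timeout_ms) {
    if (worker_queues.empty()) return -1;
    epoll_event ready[64];
    int n;
    do {
        n = epoll_wait(epfd, ready, 64, timeout_ms);
    } while (n < 0 && errno == EINTR);  // retry if a signal interrupted the wait
    if (n < 0) return -1;
    for (int i = 0; i < n; ++i) {
        std::size_t worker = static_cast<std::size_t>(i) % worker_queues.size();
        worker_queues[worker].push_back(ready[i].data.fd);
    }
    return n;
}
```

In a real server each queue would be drained by one thread of the pool, and the hand-off between the acceptor and those threads is exactly the extra hop being questioned here.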
Would it not be easier and faster to get rid of the last two links and use only the "hardware demultiplexer"?
An example: if a network packet arrives, the network card interrupts the CPU. On most systems today, these interrupts are distributed across cores, i.e. this works as a hardware demultiplexer. After receiving such an interrupt, we can process the network message immediately and wait for the next interrupt. All the demultiplexing work is done at the hardware level, using CPU interrupts.
In Cortex-A5 MPCore: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0434b/cchdbebe.html
Is this a feasible approach in any Linux, or in a real-time *nix such as QNX, and are there any public projects where this approach is used, maybe nginx?
UPDATE:
The simple answer to my question: yes, I can use hardware demultiplexing by using /proc/irq/<n>/smp_affinity
Details: http://www.alexonlinux.com/smp-affinity-and-proper-interrupt-handling-in-linux
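As a rough illustration of what writing to smp_affinity involves: the file takes a hexadecimal CPU bitmask in which bit k allows CPU k to service the interrupt, and masks wider than 32 bits are written as comma-separated 32-bit groups. The sketch below builds such a mask. The names `affinity_mask_hex` and `set_irq_affinity` are invented for this example, and the write itself needs root and a real IRQ number taken from /proc/interrupts.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <fstream>
#include <string>
#include <vector>

// Build the hex bitmask for a set of CPU numbers, e.g. {0,1,2,3} -> "f".
// Groups are 32 bits wide, most significant group first, separated by commas.
std::string affinity_mask_hex(const std::vector<int>& cpus) {
    std::vector<std::uint32_t> words(1, 0);
    for (int cpu : cpus) {
        assert(cpu >= 0);
        std::size_t word = static_cast<std::size_t>(cpu) / 32;
        if (word >= words.size()) words.resize(word + 1, 0);
        words[word] |= std::uint32_t{1} << (cpu % 32);
    }
    std::string out;
    char buf[16];
    for (std::size_t i = words.size(); i-- > 0;) {
        unsigned value = static_cast<unsigned>(words[i]);
        if (out.empty()) {
            std::snprintf(buf, sizeof buf, "%x", value);     // top group, unpadded
        } else {
            std::snprintf(buf, sizeof buf, ",%08x", value);  // lower groups, 8 digits
        }
        out += buf;
    }
    return out;
}

// Hypothetical helper: write the mask for one IRQ. Returns false if the file
// cannot be opened or written (for example when not running as root).
bool set_irq_affinity(int irq, const std::vector<int>& cpus) {
    std::ofstream f("/proc/irq/" + std::to_string(irq) + "/smp_affinity");
    if (!f) return false;
    f << affinity_mask_hex(cpus) << std::endl;
    return static_cast<bool>(f);
}
```

For instance, `affinity_mask_hex({2})` yields `4`, which restricts the interrupt to CPU 2 only. A daemon such as irqbalance, if it is running, may later rewrite the mask.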
But a second note: this is not such a good thing, because different parts of one packet can be handled by different cores, and the cache synchronization (L1(CoreX) -> L3 -> L1(CoreY)) needed for cache coherency can take time: http://www.alexonlinux.com/why-interrupt-affinity-with-multiple-cores-is-not-such-a-good-thing
Solutions:
- hard-bind different Ethernet adapters (their IRQs) to different single CPU cores
- use large packets and small messages, so that each packet contains a whole message
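The first solution binds each adapter's IRQ to one core. A complementary step, which is an assumption of this sketch rather than something stated above, is to pin the thread that processes that adapter's traffic to the same core, so the data stays in that core's cache. On Linux with glibc this might look like the following; `first_allowed_cpu` and `pin_current_thread` are illustrative names.

```cpp
#include <sched.h>  // sched_getaffinity, sched_setaffinity, CPU_* macros (Linux, glibc)

// Return the lowest-numbered CPU the calling thread may currently run on,
// or -1 if the affinity mask cannot be read.
int first_allowed_cpu() {
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof set, &set) != 0) return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (CPU_ISSET(cpu, &set)) return cpu;
    }
    return -1;
}

// Pin the calling thread to a single CPU core. Returns true on success.
// A pid of 0 in sched_setaffinity means "the calling thread".
bool pin_current_thread(int cpu) {
    if (cpu < 0 || cpu >= CPU_SETSIZE) return false;
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof set, &set) == 0;
}
```

A worker serving the adapter whose IRQ is bound to core 1 would call `pin_current_thread(1)` once at startup, before entering its receive loop.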
Question: might there be better solutions, for example using soft IRQs (without hardware IRQs), where we receive a batch of network packets from the network adapter manually? Are there any?
You ask a rather broad question.
... get rid of the last two links and use only the "hardware demultiplexer"?
From your description I understand that you want the hardware to provide received data directly to the user's application. Is that right? This can be achieved with RDMA.
The hardware (network card) can place received data in a pre-allocated buffer without the CPU being involved in the procedure.
I could elaborate, but I'm not sure which direction you are asking about.