Windows Packet Filter causes file transfers on shares to slow down a lot?


Viewing 10 posts - 16 through 25 (of 25 total)
  • #11579
    brad.r.hodge
    Participant

    So I tried it on multiple systems with different CPUs, bare metal and VM, and the results are the same.

    Running the x64 builds of the SNI inspector/dnstrace on Windows 10 x64 reduces the file transfer speed through shares by around 50%,

    for example 70 MB/s -> 30 MB/s.

    Is there any fix available for this, or is there no solution?

    I should mention that on some high-end CPUs such as the i7-7700K the reduction was only around 10-15%, but most customers don’t have these, so we have to assume the worst-case scenario of an average or low-end CPU.

    #11580
    brad.r.hodge
    Participant

    Also, I would be grateful if you could test this yourself on different systems as well, especially with low-end and average CPUs, and check the results.

    #11584
    Vadim Smirnov
    Moderator

    I have tested an 8-year-old Core i3-3217U (I don’t have anything slower with Windows installed) sending a file to another machine over SMB, with and without dnstrace running. Here are the results:

    [Screenshot: Core i3-3217U test results]

    You can notice some slowdown (8-9%), but it is nowhere near the 50% throughput reduction you reported. What was the bottleneck in your tests?

    #11585
    Vadim Smirnov
    Moderator

    P.S. For example, when I tested the same machine with the target file located on an HDD (in the screenshot above the file is on an SSD), I got about 3x-4x lower throughput with 100% HDD load.

    #11586
    brad.r.hodge
    Participant

    This is the result on an i7-2600 @ 3.4 GHz, Win10 x64:

    100 MB/s -> 40 MB/s

    CPU: 52%
    Memory: 30%
    Disk: 10%

    So I don’t know which one is the bottleneck, but when I turn off dnstrace the transfer goes back to 100 MB/s and CPU usage drops to 40%, but that’s it.

    Everything is the latest version; I even used the 64-bit compiled tools from the website instead of compiling them myself, to make sure nothing is wrong.

    #11588
    Vadim Smirnov
    Moderator

    This is the result on an i7-2600 @ 3.4 GHz, Win10 x64:

    100 MB/s -> 40 MB/s

    CPU: 52%
    Memory: 30%
    Disk: 10%

    The test system was a receiver, right?

    In my test above I was sending the file from the test system. When I changed the direction, I experienced more noticeable throughput degradation.

    What is important here is that in both cases this was the maximum performance achievable by the single-threaded dnstrace application (Resource Monitor showed 25% CPU load across 4 vCPUs). This is the bottleneck… Inbound packet injection is more expensive than outbound, which explains the inbound/outbound throughput difference I see on the i3-3217U. On the other hand, the Ryzen 7 4800H’s single-threaded performance is good enough to avoid any throughput degradation at all, regardless of the traffic direction.

    It is worth noting that Fast I/O won’t be of much help here; it was primarily designed for a customer who uses the driver in a trading platform and needed the fastest possible way to fetch packets from the network into the application, bypassing the Windows TCP/IP stack.

    The first idea to consider is improving dnstrace performance by splitting its operations over two threads, e.g. one thread to read packets from the driver and a second thread to re-inject them back.

    I also think some optimization is possible for packet re-injection itself, e.g. scaling re-injection over all available vCPUs in the kernel. However, it is not as easy as it sounds: breaking packet order within a TCP connection may cause re-transmits and other undesired behavior. So maybe adding Fast I/O for re-injection would be a better choice (currently packets are re-injected in the context of dnstrace; with Fast I/O they would be re-injected from a kernel thread).

    #11589
    brad.r.hodge
    Participant

    Yes, in my case I was receiving the file from a remote server.

    The first idea to consider is improving dnstrace performance by splitting its operations over two threads, e.g. one thread to read packets from the driver and a second thread to re-inject them back.

    Are you sure this will help much? In this approach, I guess we would need a separate linked list for the packets that need to be re-injected, and the reader thread would have to insert every received packet into it. I don’t see why this would improve performance much; it seems mostly the same as doing it all in one thread.

    I think we need to focus on SMB to solve this issue. I found an OSR thread here:

    https://community.osr.com/discussion/290695/wfp-callout-driver-layer2-filtering

    which has an important part in it:

    However, I am encapsulating packets and needed the ability to be able to create NBL chains in order to improve performance when dealing with large file transfers and the like (i.e. typically for every 1 packet during an SMB file transfer one needs to generate at least 2 packets per 1 original packet because of MTU issues)

    Thoughts?

    #11590
    Vadim Smirnov
    Moderator

    Here is the CPU breakdown of an SMB download:

    Function Name                     Total CPU [unit, %]   Self CPU [unit, %]   Module         Category
    CNdisApi::SendPacketsToMstcp      2858 (56.58%)         3 (0.06%)            dnstrace.exe   IO | Kernel
    CNdisApi::SendPacketsToAdapter    1495 (29.60%)         2 (0.04%)            dnstrace.exe   IO | Kernel
    CNdisApi::ReadPackets             349 (6.91%)           6 (0.12%)            dnstrace.exe   IO | Kernel

    As you may notice, splitting reading and re-injection does not make much sense, but splitting SendPacketsToMstcp and SendPacketsToAdapter over two threads definitely will have an effect.

    I can’t see how the OSR post is related; that author’s problem is about repackaging packets due to a reduced MTU.

    #11591
    Vadim Smirnov
    Moderator

    I think 3 threads are the way to go:

    1. A ReadPackets thread, which forms the re-injection lists, signals the re-inject threads, and waits for the re-injection to complete (or, even better, proceeds to read using a secondary buffer set)
    2. A SendPacketsToMstcp thread, which waits for the ReadPackets signal, re-injects, notifies the ReadPackets thread, and returns to waiting
    3. A SendPacketsToAdapter thread, which waits for the ReadPackets signal, re-injects, notifies the ReadPackets thread, and returns to waiting
    #11592
    Vadim Smirnov
    Moderator

    P.S. BTW, if you don’t need the SMB traffic to be processed in user mode, you could load a filter into the driver to pass it through without redirection.
