Windows Packet Filter causes file transfer speed on shares to drop a lot?

#11579
    brad.r.hodge
    Participant

So I tried it on multiple systems with different CPUs, bare metal and VM, and the results are the same.

Running the x64 builds of the SNI inspector/dnstrace on Windows 10 x64 reduces file transfer speed through shares by around 50%, for example 70 MB/s -> 30 MB/s.

Is there any fix available for this, or is there no solution?

I should mention that on some high-end CPUs, such as the i7-7700K, the reduction was only around 10-15%, but most customers don't have these, so we have to assume the worst case of an average or low-end CPU.

#11580
brad.r.hodge
Participant

Also, I would be grateful if you could test this yourself on different systems, especially with low-end and average CPUs, and check the results.

#11584
Vadim Smirnov
Keymaster

I have tested an 8-year-old Core i3-3217U (I don't have anything slower with Windows installed) sending a file to another machine over SMB, with and without dnstrace running. Here are the results:

[Screenshot: Core i3-3217U test results]

You can notice some slowdown (8-9%), but it is nowhere near the 50% throughput reduction you reported. What was the bottleneck in your tests?

#11585
Vadim Smirnov
Keymaster

P.S. One example: when I tested the same machine with the target file located on an HDD (in the screenshot above the file is on an SSD), I got about 3x-4x lower throughput with 100% HDD load.

#11586
brad.r.hodge
Participant

This is the result on an i7-2600 @ 3.4 GHz, Windows 10 x64:

100 MB/s -> 40 MB/s

CPU: 52%
Memory: 30%
Disk: 10%

So I don't know which one is the bottleneck, but when I turn off dnstrace the transfer goes back to 100 MB/s and CPU usage drops to 40%, but that's it.

Everything is the latest version; I even used the 64-bit precompiled tools from the website instead of compiling them myself, to make sure nothing was wrong.

#11588
Vadim Smirnov
Keymaster

> This is the result on an i7-2600 @ 3.4 GHz, Windows 10 x64:
>
> 100 MB/s -> 40 MB/s
>
> CPU: 52%
> Memory: 30%
> Disk: 10%

The test system was a receiver, right?

In my test above I was sending the file from the test system. When I changed the direction, I saw more noticeable throughput degradation.

What is important here is that in both cases this was the maximum performance achievable by the single-threaded dnstrace application (Resource Monitor showed 25% CPU load over 4 vCPUs). This is the bottleneck. Inbound packet injection is more expensive than outbound, which explains the throughput difference between inbound and outbound traffic I see on the i3-3217U. On the other hand, the Ryzen 7 4800H has good enough single-threaded performance to avoid any throughput degradation at all, regardless of traffic direction.

It is worth noting that Fast I/O won't be of much help here; it was primarily designed for a customer who uses the driver in a trading platform and needed the fastest possible way to fetch packets from the network into the application, bypassing the Windows TCP/IP stack.

The first idea to consider is improving dnstrace performance by splitting its operations over two threads, e.g. one thread to read packets from the driver and a second thread to re-inject them back.

I also think some optimization is possible for packet re-injection itself, e.g. scaling re-injection over all available vCPUs in the kernel. Though it is not as easy as it sounds: breaking packet order within a TCP connection may cause re-transmits and other undesired behavior. So maybe adding Fast I/O for re-injection would be a better choice (currently packets are re-injected in the context of dnstrace; with Fast I/O they would be re-injected from a kernel thread).
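For reference, here is a rough sketch of the kind of single-threaded read/sort/re-inject loop dnstrace runs today. It is reconstructed from the public WinpkFilter C++ interface (CNdisApi in ndisapi.h), not the actual dnstrace source; error handling, DNS parsing, and adapter setup are omitted, and BATCH, make_request and filter_loop are made-up names:

```cpp
// Sketch of a single-threaded read/sort/re-inject loop (illustrative only).
#include <windows.h>
#include <vector>
#include "ndisapi.h"

constexpr unsigned BATCH = 64; // packets read from the driver per call

// Allocate an ETH_M_REQUEST large enough for BATCH packet descriptors.
static PETH_M_REQUEST make_request(std::vector<char>& storage, HANDLE adapter)
{
    storage.resize(sizeof(ETH_M_REQUEST) +
                   sizeof(NDISRD_ETH_Packet) * (BATCH - 1));
    auto* req = reinterpret_cast<PETH_M_REQUEST>(storage.data());
    req->hAdapterHandle  = adapter;
    req->dwPacketsNumber = BATCH;
    return req;
}

void filter_loop(CNdisApi& api, HANDLE adapter, HANDLE packet_event)
{
    std::vector<char> read_mem, mstcp_mem, adapter_mem;
    PETH_M_REQUEST read_req   = make_request(read_mem, adapter);
    PETH_M_REQUEST to_mstcp   = make_request(mstcp_mem, adapter);
    PETH_M_REQUEST to_adapter = make_request(adapter_mem, adapter);

    std::vector<INTERMEDIATE_BUFFER> buffers(BATCH);
    for (unsigned i = 0; i < BATCH; ++i)
        read_req->EthPacket[i].Buffer = &buffers[i];

    for (;;)
    {
        WaitForSingleObject(packet_event, INFINITE); // driver signals packets

        while (api.ReadPackets(read_req) && read_req->dwPacketsSuccess)
        {
            // ... DNS inspection of each packet would happen here ...

            // Sort packets back into two batches by their original direction.
            to_mstcp->dwPacketsNumber = to_adapter->dwPacketsNumber = 0;
            for (unsigned i = 0; i < read_req->dwPacketsSuccess; ++i)
            {
                if (buffers[i].m_dwDeviceFlags == PACKET_FLAG_ON_SEND)
                    to_adapter->EthPacket[to_adapter->dwPacketsNumber++] =
                        read_req->EthPacket[i];
                else
                    to_mstcp->EthPacket[to_mstcp->dwPacketsNumber++] =
                        read_req->EthPacket[i];
            }

            // Read, inspect and both re-injection calls all share one thread,
            // hence the 25%-of-4-vCPU ceiling seen in Resource Monitor.
            if (to_adapter->dwPacketsNumber)
                api.SendPacketsToAdapter(to_adapter);
            if (to_mstcp->dwPacketsNumber)
                api.SendPacketsToMstcp(to_mstcp);
        }
        ResetEvent(packet_event);
    }
}
```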

#11589
brad.r.hodge
Participant

Yes, in my case I was receiving the file from a remote server.

> The first idea to consider is improving dnstrace performance by splitting its operations over two threads, e.g. one thread to read packets from the driver and a second thread to re-inject them back.

Are you sure this will help much? In that design I guess we would need a separate linked list for the packets that need to be re-injected, and the reader thread would have to insert every received packet into it. I don't see why this would improve performance much; it seems mostly the same as having it all in one thread?

I think we need to focus on SMB to solve this issue. I found an OSR thread here:

https://community.osr.com/discussion/290695/wfp-callout-driver-layer2-filtering

which has an important part in it:

> However, I am encapsulating packets and needed the ability to be able to create NBL chains in order to improve performance when dealing with large file transfers and the like (i.e. typically for every 1 packet during an SMB file transfer one needs to generate at least 2 packets per 1 original packet because of MTU issues)

Thoughts?

#11590
Vadim Smirnov
Keymaster

Here is the CPU breakdown of the SMB download:

Function Name                    Total CPU [unit, %]   Self CPU [unit, %]   Module         Category
CNdisApi::SendPacketsToMstcp     2858 (56.58%)         3 (0.06%)            dnstrace.exe   IO | Kernel
CNdisApi::SendPacketsToAdapter   1495 (29.60%)         2 (0.04%)            dnstrace.exe   IO | Kernel
CNdisApi::ReadPackets            349 (6.91%)           6 (0.12%)            dnstrace.exe   IO | Kernel

As you may notice, splitting reading and re-injection does not make much sense, but splitting SendPacketsToMstcp and SendPacketsToAdapter over two threads definitely will have an effect.

I can't see how the OSR post is related; the author's problem is about repackaging packets due to a reduced MTU.

#11591
Vadim Smirnov
Keymaster

I think three threads would work well (see the sketch after this list):

1. A ReadPackets thread that forms the re-injection lists, signals the re-inject threads and waits for the re-injection to complete, or even better proceeds to read using a secondary buffer set
2. A SendPacketsToMstcp thread that waits for the ReadPackets signal, re-injects, notifies the ReadPackets thread and returns to waiting
3. A SendPacketsToAdapter thread that waits for the ReadPackets signal, re-injects, notifies the ReadPackets thread and returns to waiting
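Below is a minimal sketch of that three-thread pipeline using standard C++ threading around the same ndisapi calls. This is an illustration, not the shipping dnstrace code: the Pipeline struct and thread function names are hypothetical, the packet-event wait and direction sorting are elided, and the secondary-buffer-set optimization from item 1 is left out for brevity.

```cpp
// Sketch of the proposed three-thread pipeline (illustrative names/structure).
#include <condition_variable>
#include <mutex>
#include <thread>
#include "ndisapi.h"

// State shared between the ReadPackets thread and the two re-inject threads.
struct Pipeline
{
    std::mutex              lock;
    std::condition_variable work_ready;  // reader -> workers
    std::condition_variable work_done;   // workers -> reader
    bool mstcp_pending = false, adapter_pending = false;
    PETH_M_REQUEST to_mstcp = nullptr, to_adapter = nullptr; // filled by reader
    CNdisApi* api = nullptr;
};

// Threads 2 and 3: wait for the reader's signal, re-inject one batch, notify.
template <bool ToMstcp>
void reinject_thread(Pipeline& p)
{
    for (;;)
    {
        std::unique_lock<std::mutex> guard(p.lock);
        p.work_ready.wait(guard, [&] {
            return ToMstcp ? p.mstcp_pending : p.adapter_pending; });
        PETH_M_REQUEST batch = ToMstcp ? p.to_mstcp : p.to_adapter;
        guard.unlock();

        if (ToMstcp) p.api->SendPacketsToMstcp(batch);   // inbound packets
        else         p.api->SendPacketsToAdapter(batch); // outbound packets

        guard.lock();
        (ToMstcp ? p.mstcp_pending : p.adapter_pending) = false;
        p.work_done.notify_one();
    }
}

// Thread 1: read and sort, hand both batches to the workers, wait for both.
// With a secondary buffer set the reader could keep reading instead of waiting.
void reader_thread(Pipeline& p, PETH_M_REQUEST read_req)
{
    for (;;)
    {
        if (!p.api->ReadPackets(read_req) || !read_req->dwPacketsSuccess)
            continue; // real code would block on the driver's packet event here

        // ... sort read_req into p.to_mstcp / p.to_adapter by direction ...

        {
            std::lock_guard<std::mutex> guard(p.lock);
            p.mstcp_pending = p.adapter_pending = true;
        }
        p.work_ready.notify_all();

        std::unique_lock<std::mutex> guard(p.lock);
        p.work_done.wait(guard, [&] {
            return !p.mstcp_pending && !p.adapter_pending; });
    }
}
```

The workers would be started as, e.g., std::thread(reinject_thread<true>, std::ref(pipeline)) and std::thread(reinject_thread<false>, std::ref(pipeline)); keeping the two Send* calls on separate threads is what lets the two dominant profiler entries above run in parallel.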
#11592
Vadim Smirnov
Keymaster

P.S. By the way, if you don't need the SMB traffic to be processed in user mode, you could load a static filter into the driver to pass it through without redirecting it to the application.
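For illustration, here is a sketch of how such a static filter table could look with the WinpkFilter API: pass SMB (TCP port 445) in-kernel in both directions and keep redirecting everything else to the application. Structure and flag names follow the ndisapi headers as I recall them (STATIC_FILTER, SetPacketFilterTable, FILTER_PACKET_PASS/REDIRECT), and pass_smb_in_kernel is a made-up helper name, so double-check field names against your SDK version:

```cpp
// Sketch: pass SMB (TCP port 445) in-kernel, redirect everything else.
#include <windows.h>
#include <vector>
#include "ndisapi.h"

bool pass_smb_in_kernel(CNdisApi& api) // hypothetical helper name
{
    const DWORD filter_count = 3;
    std::vector<char> storage(sizeof(STATIC_FILTER_TABLE) +
                              sizeof(STATIC_FILTER) * (filter_count - 1));
    auto* table = reinterpret_cast<PSTATIC_FILTER_TABLE>(storage.data());
    table->m_TableSize = filter_count;

    // Filter 1: outbound TCP to destination port 445 -> pass without redirect.
    STATIC_FILTER& out = table->m_StaticFilters[0];
    out.m_Adapter.QuadPart = 0;                  // 0 = apply to all adapters
    out.m_dwDirectionFlags = PACKET_FLAG_ON_SEND;
    out.m_FilterAction     = FILTER_PACKET_PASS;
    out.m_ValidFields      = NETWORK_LAYER_VALID | TRANSPORT_LAYER_VALID;
    out.m_NetworkFilter.m_dwUnionSelector        = IPV4;
    out.m_NetworkFilter.m_IPv4.m_ValidFields     = IP_V4_FILTER_PROTOCOL;
    out.m_NetworkFilter.m_IPv4.m_Protocol        = 6; // IPPROTO_TCP
    out.m_TransportFilter.m_dwUnionSelector      = TCPUDP;
    out.m_TransportFilter.m_TcpUdp.m_ValidFields = TCPUDP_DEST_PORT;
    out.m_TransportFilter.m_TcpUdp.m_DestPort    = { 445, 445 }; // port range

    // Filter 2: inbound TCP from source port 445 -> pass without redirect.
    STATIC_FILTER& in = table->m_StaticFilters[1];
    in = out;
    in.m_dwDirectionFlags                        = PACKET_FLAG_ON_RECEIVE;
    in.m_TransportFilter.m_TcpUdp.m_ValidFields  = TCPUDP_SRC_PORT;
    in.m_TransportFilter.m_TcpUdp.m_SourcePort   = { 445, 445 };
    in.m_TransportFilter.m_TcpUdp.m_DestPort     = { 0, 0 };

    // Filter 3: everything else still goes to user mode for inspection.
    STATIC_FILTER& rest = table->m_StaticFilters[2];
    rest.m_Adapter.QuadPart = 0;
    rest.m_dwDirectionFlags = PACKET_FLAG_ON_SEND | PACKET_FLAG_ON_RECEIVE;
    rest.m_FilterAction     = FILTER_PACKET_REDIRECT;
    rest.m_ValidFields      = 0;                 // no fields = match any packet

    return api.SetPacketFilterTable(table) != FALSE;
}
```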

#11828
Vadim Smirnov
Keymaster

P.P.S. I have done some research and significantly improved packet re-injection performance in v3.2.31. Thanks for reporting this. By the way, building the driver with jumbo frame support could improve performance over a 1 Gbps wire even further.
