Vadim Smirnov

Forum Replies Created

Viewing 15 posts - 466 through 480 (of 1,496 total)
    in reply to: Wiresock QR feature #11797
    Vadim Smirnov
    Keymaster

      Hi and thank you for the feedback!

      The currently released version of wg-quick-config can’t show the QR code for a specified configuration. However, this is easy to fix. Here are updated wg-quick-config binaries with an extra command-line parameter, qrcode.

      Example: wg-quick-config -qrcode 1 shows the QR code for the first existing configuration. Please note that this command-line parameter is not compatible with the other ones. I will add this option (perhaps in a slightly different form) to the next Wiresock update.

      in reply to: WireSock existing configuration location #11795
      Vadim Smirnov
      Keymaster

        The configuration file is named config.json and is stored, along with the server and client configurations, in the folder from which you executed wg-quick-config for the first time. In your case, if the server and client configs are in the System32 folder, just find config.json there and delete it.

        Vadim Smirnov
        Keymaster

          There are actually two ways to filter by ProcessID or ProcessName:

          • The easy one is the IP Helper API. You can find the details in process_lookup.h (a lookup sketch follows at the end of this post)
          • The more complicated one is creating a WFP callout driver to track the creation and termination of network connections (sockets)

          LSPs are deprecated and I’m not sure they are still supported on Windows 10.
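
          For the IP Helper API route, here is a minimal sketch (not the actual process_lookup.h implementation) of resolving the owning PID of a local TCP port via GetExtendedTcpTable; the function name lookup_tcp_owner_pid is purely illustrative:

          #include <winsock2.h>
          #include <iphlpapi.h>
          #include <vector>

          #pragma comment(lib, "iphlpapi.lib")
          #pragma comment(lib, "ws2_32.lib")

          // Returns the PID owning the given local IPv4 TCP port, or 0 if it is not found.
          DWORD lookup_tcp_owner_pid(const unsigned short local_port)
          {
          	DWORD size = 0;
          	// First call only queries the required buffer size
          	GetExtendedTcpTable(nullptr, &size, FALSE, AF_INET, TCP_TABLE_OWNER_PID_ALL, 0);

          	std::vector<char> storage(size);
          	auto* const table = reinterpret_cast<PMIB_TCPTABLE_OWNER_PID>(storage.data());

          	if (GetExtendedTcpTable(table, &size, FALSE, AF_INET, TCP_TABLE_OWNER_PID_ALL, 0) != NO_ERROR)
          		return 0;

          	for (DWORD i = 0; i < table->dwNumEntries; ++i)
          	{
          		// dwLocalPort stores the port in network byte order in its low word
          		if (ntohs(static_cast<unsigned short>(table->table[i].dwLocalPort)) == local_port)
          			return table->table[i].dwOwningPid;
          	}

          	return 0;
          }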

          in reply to: is MTU decrement just for outbound package #11782
          Vadim Smirnov
          Keymaster

            As for the TCP MSS option, you can check CsnatDlg::CheckMTUCorrelation in snatDlg.cpp.

            I don’t have an open-source sample using the ICMP fragmentation-needed option, but if the packet size exceeds the MTU and the DF flag is set, you can use the function below to convert the packet into ICMP type 3 code 4 (“fragmentation needed but DF set”) and forward it back to the host.

            void convert_to_icmp_unreachable(INTERMEDIATE_BUFFER& buffer) const
            {
            	auto* eth_header = reinterpret_cast<ether_header_ptr>(buffer.m_IBuffer);
            	auto* ip_header = reinterpret_cast<iphdr_ptr>(buffer.m_IBuffer + ETHER_HEADER_LENGTH);
            
            	// 1. Copy the original IP header and the first 8 bytes of its payload to the area following the new IP and ICMP headers
            	auto* const next_header = reinterpret_cast<PCHAR>(ip_header) + sizeof(DWORD) * ip_header->ip_hl;
            	const auto payload_length = static_cast<unsigned short>(next_header - reinterpret_cast<char*>(ip_header) + 8);
            	memmove(
            		reinterpret_cast<char*>(eth_header) + ETHER_HEADER_LENGTH + sizeof(iphdr) + sizeof(icmphdr),
            		ip_header,
            		payload_length
            	);
            
            	// 2. Swap MAC addresses
            	std::swap(eth_header->h_dest, eth_header->h_source);
            
            	// 3. Swap IP addresses
            	std::swap(ip_header->ip_dst, ip_header->ip_src);
            
            	// 4. Initialize IP header
            	ip_header->ip_hl = 5;
            	ip_header->ip_v = 4;
            	ip_header->ip_tos = 0;
            	ip_header->ip_len = htons(sizeof(iphdr) + sizeof(icmphdr) + payload_length);
            	ip_header->ip_off = htons(IP_DF);
            	ip_header->ip_ttl = 30;
            	ip_header->ip_p = IPPROTO_ICMP;
            
            	// 5. Initialize ICMP header (type 3 code 4: destination unreachable, fragmentation needed)
            	auto* const icmp_header = reinterpret_cast<icmphdr_ptr>(ip_header + 1);
            	icmp_header->type = 3;
            	icmp_header->code = 4;
            	icmp_header->id = 0;	// the unused field must be zero (RFC 1191)
            	icmp_header->seq = htons(config_.default_adapter->get_mtu());	// next-hop MTU
            
            	// Recalculate checksum
            	RecalculateICMPChecksum(&buffer);
            	RecalculateIPChecksum(&buffer);
            
            	buffer.m_Length = ETHER_HEADER_LENGTH + sizeof(iphdr) + sizeof(icmphdr) + payload_length;
            }
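
            For context, a rough sketch of how this function might be driven from the packet processing loop; the surrounding plumbing (the mtu value, adapter handle and enclosing scope) is illustrative and not part of the snippet above:

            // Illustrative only: when an outbound IPv4 packet with DF set exceeds the MTU,
            // rewrite it in place and bounce it back up the stack instead of forwarding it.
            // convert_to_icmp_unreachable() is assumed to be accessible from this scope
            // (it is a class member in the snippet above).
            void handle_outbound_packet(CNdisApi& api, HANDLE adapter, INTERMEDIATE_BUFFER& buffer, const unsigned short mtu)
            {
            	auto* const ip_header = reinterpret_cast<iphdr_ptr>(buffer.m_IBuffer + ETHER_HEADER_LENGTH);

            	if (buffer.m_Length - ETHER_HEADER_LENGTH > mtu &&	// IP datagram larger than the MTU
            		(ntohs(ip_header->ip_off) & IP_DF) != 0)	// and the DF bit is set
            	{
            		convert_to_icmp_unreachable(buffer);	// becomes ICMP type 3 code 4

            		ETH_REQUEST request{};
            		request.hAdapterHandle = adapter;
            		request.EthPacket.Buffer = &buffer;

            		// The original packet was outbound, so the ICMP error is indicated back to the local stack
            		api.SendPacketToMstcp(&request);
            		return;
            	}

            	// ... otherwise forward the packet to the adapter as usual
            }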
            in reply to: is MTU decrement just for outbound package #11780
            Vadim Smirnov
            Keymaster

              Yes, this option modifies the MTU of the local network adapters. You can’t change the remote system’s MTU directly, but you can use the TCP MSS option and/or ICMP fragmentation-needed messages to affect the effective MTU between the hosts.
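
              For the MSS side, here is a minimal clamping sketch, assuming the BSD-style iphdr/tcphdr definitions from the repository’s iphlp.h; after a successful clamp the TCP and IP checksums must be recalculated (e.g. with CNdisApi::RecalculateTCPChecksum):

              // Walk the TCP options of a SYN segment and lower the MSS value if it exceeds mss_limit.
              // Returns true when the option was modified (checksums must then be recalculated).
              bool clamp_tcp_mss(const iphdr_ptr ip_header, const unsigned short mss_limit)
              {
              	if (ip_header->ip_p != IPPROTO_TCP)
              		return false;

              	auto* const tcp_header = reinterpret_cast<tcphdr_ptr>(
              		reinterpret_cast<char*>(ip_header) + ip_header->ip_hl * 4);

              	if (!(tcp_header->th_flags & TH_SYN))
              		return false;

              	auto* options = reinterpret_cast<unsigned char*>(tcp_header) + sizeof(tcphdr);
              	auto* const options_end = reinterpret_cast<unsigned char*>(tcp_header) + tcp_header->th_off * 4;

              	while (options < options_end)
              	{
              		if (*options == 1)	// NOP padding
              		{
              			++options;
              			continue;
              		}

              		if (*options == 0 || options + 1 >= options_end || options[1] < 2)
              			break;	// end of options list, or truncated/malformed option

              		if (*options == 2 && options[1] == 4 && options + 4 <= options_end)	// MSS: kind 2, length 4
              		{
              			const auto mss = static_cast<unsigned short>((options[2] << 8) | options[3]);
              			if (mss > mss_limit)
              			{
              				options[2] = static_cast<unsigned char>(mss_limit >> 8);
              				options[3] = static_cast<unsigned char>(mss_limit & 0xFF);
              				return true;
              			}
              			return false;
              		}

              		options += options[1];	// skip other options by their declared length
              	}

              	return false;
              }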

              Vadim Smirnov
              Keymaster

                Hi!

                There is a sample, https://github.com/wiresock/ndisapi/tree/master/examples/cpp/socksify, which redirects a specified local application to a local TCP proxy and then to the specified SOCKS proxy. If I understood you right, this is what you are doing in your application. I have used a similar approach in a couple of commercial projects and can confirm that it works just fine.

                To figure out what is going wrong in your case, I would capture and save the traffic for analysis. Maybe the packet you modified has an incorrect checksum or length and is therefore dropped by the stack.
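
                If the trace does show bad checksums, the helper routines exported by the API can be called after every modification; a small reminder sketch, assuming the packet sits in an INTERMEDIATE_BUFFER and new_ip_total_length is the recomputed IP datagram length (the function name is illustrative):

                // After changing a redirected TCP packet, fix the length fields and recalculate
                // the checksums before re-injecting it, otherwise the stack may silently drop it.
                void finalize_modified_packet(INTERMEDIATE_BUFFER& buffer, const unsigned short new_ip_total_length)
                {
                	auto* const ip_header = reinterpret_cast<iphdr_ptr>(buffer.m_IBuffer + ETHER_HEADER_LENGTH);

                	ip_header->ip_len = htons(new_ip_total_length);	// IP total length
                	buffer.m_Length = ETHER_HEADER_LENGTH + new_ip_total_length;	// full frame length

                	CNdisApi::RecalculateTCPChecksum(&buffer);	// covers the pseudo-header and payload
                	CNdisApi::RecalculateIPChecksum(&buffer);	// IP header checksum
                }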

                in reply to: WindowsPacketFilter/Tools/ebridge not working #11774
                Vadim Smirnov
                Keymaster

                  Regretfully, I don’t have TB adapters to test with, but TB probably differs somehow from ‘normal’ Ethernet. Technically it is an emulation of 802.3 media over the TB bus, so I would not be surprised if a TB adapter simply ignores network packets carrying the MAC address of another TB adapter.

                  To some approximation this could be similar to bridging between wired Ethernet and WiFi, where I had to translate wired Ethernet MAC addresses to the WiFi adapter’s MAC and vice versa so that packets from the wired segment would not be rejected by the access point.

                  But these are just raw ideas based on my previous experience; I don’t have the relevant hardware to test with.
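
                  If the MAC-filtering theory holds, the same translation trick might be worth trying over the TB link; a rough sketch, assuming the ether_header/iphdr definitions from iphlp.h and that the bridge adapter’s own MAC is known (the table and function names are illustrative, not from ebridge):

                  #include <array>
                  #include <cstring>
                  #include <map>

                  using mac_address = std::array<unsigned char, 6>;	// assuming 6-byte hardware addresses

                  // Learned wired-side MACs, keyed by IPv4 address
                  std::map<unsigned long, mac_address> g_mac_table;

                  // Frames forwarded from the wired segment: present the bridge adapter's own MAC
                  // as the source and remember who really sent the frame.
                  void translate_outgoing(ether_header_ptr eth_header, const iphdr_ptr ip_header, const mac_address& bridge_mac)
                  {
                  	mac_address original{};
                  	memcpy(original.data(), eth_header->h_source, sizeof(eth_header->h_source));
                  	g_mac_table[ip_header->ip_src.S_un.S_addr] = original;

                  	memcpy(eth_header->h_source, bridge_mac.data(), sizeof(eth_header->h_source));
                  }

                  // Frames coming back from the TB side: restore the real wired-side destination MAC.
                  void translate_incoming(ether_header_ptr eth_header, const iphdr_ptr ip_header)
                  {
                  	const auto it = g_mac_table.find(ip_header->ip_dst.S_un.S_addr);
                  	if (it != g_mac_table.end())
                  		memcpy(eth_header->h_dest, it->second.data(), sizeof(eth_header->h_dest));
                  }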

                  Vadim Smirnov
                  Keymaster

                    P.S. By the way, if you don’t need the SMB traffic to be processed in user mode, you could load a static filter into the driver to pass it through without redirection.
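
                    For reference, a hedged sketch of such an in-driver pass rule; the structure and constant names below are quoted from memory of the static filter sample in the NDISAPI repository, so please check them against the driver’s common.h, and SMB is assumed to be plain TCP port 445:

                    #include <vector>
                    #include "ndisapi.h"	// CNdisApi, STATIC_FILTER_TABLE and related constants (assumed include path)

                    // Pass TCP port 445 (SMB) entirely in the kernel and keep redirecting everything else
                    // to the user-mode application. The last rule acts as the catch-all default.
                    bool pass_smb_in_kernel(CNdisApi& api)
                    {
                    	const DWORD size = sizeof(STATIC_FILTER_TABLE) + 2 * sizeof(STATIC_FILTER);	// 3 entries
                    	std::vector<char> storage(size, 0);
                    	auto* const table = reinterpret_cast<PSTATIC_FILTER_TABLE>(storage.data());
                    	table->m_TableSize = 3;

                    	const auto make_smb_rule = [](STATIC_FILTER& rule)
                    	{
                    		rule.m_Adapter.QuadPart = 0;	// apply to all adapters
                    		rule.m_dwDirectionFlags = PACKET_FLAG_ON_SEND | PACKET_FLAG_ON_RECEIVE;
                    		rule.m_FilterAction = FILTER_PACKET_PASS;
                    		rule.m_ValidFields = NETWORK_LAYER_VALID | TRANSPORT_LAYER_VALID;
                    		rule.m_NetworkFilter.m_dwUnionSelector = IPV4;
                    		rule.m_NetworkFilter.m_IPv4.m_ValidFields = IP_V4_FILTER_PROTOCOL;
                    		rule.m_NetworkFilter.m_IPv4.m_Protocol = IPPROTO_TCP;
                    		rule.m_TransportFilter.m_dwUnionSelector = TCPUDP;
                    	};

                    	// Rule 1: TCP segments with destination port 445
                    	make_smb_rule(table->m_StaticFilters[0]);
                    	table->m_StaticFilters[0].m_TransportFilter.m_TcpUdp.m_ValidFields = TCPUDP_DEST_PORT;
                    	table->m_StaticFilters[0].m_TransportFilter.m_TcpUdp.m_DestPort.m_StartRange = 445;
                    	table->m_StaticFilters[0].m_TransportFilter.m_TcpUdp.m_DestPort.m_EndRange = 445;

                    	// Rule 2: TCP segments with source port 445 (the reply direction)
                    	make_smb_rule(table->m_StaticFilters[1]);
                    	table->m_StaticFilters[1].m_TransportFilter.m_TcpUdp.m_ValidFields = TCPUDP_SRC_PORT;
                    	table->m_StaticFilters[1].m_TransportFilter.m_TcpUdp.m_SourcePort.m_StartRange = 445;
                    	table->m_StaticFilters[1].m_TransportFilter.m_TcpUdp.m_SourcePort.m_EndRange = 445;

                    	// Rule 3: everything else keeps being redirected to user mode as before
                    	table->m_StaticFilters[2].m_Adapter.QuadPart = 0;
                    	table->m_StaticFilters[2].m_dwDirectionFlags = PACKET_FLAG_ON_SEND | PACKET_FLAG_ON_RECEIVE;
                    	table->m_StaticFilters[2].m_FilterAction = FILTER_PACKET_REDIRECT;
                    	table->m_StaticFilters[2].m_ValidFields = 0;

                    	return api.SetPacketFilterTable(table) != FALSE;
                    }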

                    Vadim Smirnov
                    Keymaster

                      I think 3 threads are good to go (see the sketch after the list):

                      1. A ReadPackets thread which forms the re-injection lists, signals the re-inject threads and waits for re-injection to complete (or, even better, proceeds to read using a secondary buffer set)
                      2. A SendPacketsToMstcp thread which waits for the ReadPackets signal, re-injects, notifies the ReadPackets thread and returns to waiting
                      3. A SendPacketsToAdapter thread which waits for the ReadPackets signal, re-injects, notifies the ReadPackets thread and returns to waiting
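
                      Here is a condensed sketch of that layout, with illustrative names and simplified synchronization (not taken from dnstrace); a production version would add the secondary buffer set mentioned in item 1 so the reader never blocks on the workers:

                      #include <condition_variable>
                      #include <mutex>
                      #include <vector>
                      #include "ndisapi.h"	// INTERMEDIATE_BUFFER, PACKET_FLAG_* (assumed include path)

                      // One lane per re-injection direction, filled by the reader thread.
                      struct reinject_lane
                      {
                      	std::mutex lock;
                      	std::condition_variable signal;
                      	std::vector<INTERMEDIATE_BUFFER*> packets;
                      	bool stop = false;
                      };

                      // Worker thread: wait for the reader, re-inject the lane contents, then go back to sleep.
                      // send() is expected to build an ETH_M_REQUEST and call SendPacketsToMstcp or SendPacketsToAdapter.
                      template <typename SendFn>
                      void reinject_worker(reinject_lane& lane, SendFn send)
                      {
                      	std::unique_lock lock(lane.lock);
                      	while (!lane.stop)
                      	{
                      		lane.signal.wait(lock, [&] { return lane.stop || !lane.packets.empty(); });
                      		if (!lane.packets.empty())
                      		{
                      			send(lane.packets);
                      			lane.packets.clear();
                      		}
                      	}
                      }

                      // Called by the ReadPackets thread after each successful read: split the packets
                      // by direction and wake both workers, then continue reading.
                      void dispatch(reinject_lane& to_mstcp, reinject_lane& to_adapter, std::vector<INTERMEDIATE_BUFFER*>& read_packets)
                      {
                      	for (auto* packet : read_packets)
                      	{
                      		auto& lane = (packet->m_dwDeviceFlags == PACKET_FLAG_ON_RECEIVE) ? to_mstcp : to_adapter;
                      		std::lock_guard guard(lane.lock);
                      		lane.packets.push_back(packet);
                      	}

                      	to_mstcp.signal.notify_one();
                      	to_adapter.signal.notify_one();
                      }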
                      Vadim Smirnov
                      Keymaster

                        Here is the CPU breakdown of an SMB download:

                        Function Name                    Total CPU [unit, %]   Self CPU [unit, %]   Module         Category
                        CNdisApi::SendPacketsToMstcp     2858 (56.58%)         3 (0.06%)            dnstrace.exe   IO | Kernel
                        CNdisApi::SendPacketsToAdapter   1495 (29.60%)         2 (0.04%)            dnstrace.exe   IO | Kernel
                        CNdisApi::ReadPackets            349 (6.91%)           6 (0.12%)            dnstrace.exe   IO | Kernel

                        As you may notice, splitting reading and re-injection does not make much sense, but splitting SendPacketsToMstcp and SendPacketsToAdapter across two threads definitely will have an effect.

                        I can’t see how the OSR post is related; the author’s problem there is about repackaging packets due to a reduced MTU.

                        Vadim Smirnov
                        Keymaster

                          This is the result on an i7-2600 @ 3.4 GHz, Win10 x64:

                          100MB/s -> 40MB/s

                          CPU: 52%
                          Memory: 30%
                          Disk: 10%

                          The test system was a receiver, right?

                          In my test above I was sending the file from the test system. When I changed the direction, I experienced a more noticeable throughput degradation.

                          What is important here is that in both cases this was the maximum performance achievable by the single-threaded dnstrace application (Resource Monitor showed 25% CPU load over 4 vCPUs). This is the bottleneck… Inbound packet injection is more expensive than outbound, which explains the throughput difference between inbound and outbound traffic I experience on the i3-3217U. On the other hand, the Ryzen 7 4800H’s single-threaded performance is good enough to avoid any throughput degradation at all, regardless of the traffic direction.

                          It is worth noting that Fast I/O won’t be of much help here; it was primarily designed for a customer who uses the driver in a trading platform and needed the fastest possible way to fetch packets from the network into the application, bypassing the Windows TCP/IP stack.

                          The first idea to consider is improving dnstrace performance by splitting its operations over two threads, e.g. one thread to read packets from the driver and a second thread to re-inject them.

                          I also think some optimization is possible for the packet re-injection itself, e.g. scaling re-injection over all available vCPUs in the kernel. Though it is not as easy as it sounds: breaking the packet order within a TCP connection may cause re-transmits and other undesired behavior. So maybe adding Fast I/O for re-injection would be a better choice (currently packets are re-injected in the context of dnstrace; with Fast I/O they would be re-injected from a kernel thread).

                          Vadim Smirnov
                          Keymaster

                            P.S. As an example, when I tested the same machine with the target file located on the HDD (in the screenshot above the file is on the SSD), I got about 3x-4x lower throughput with 100% HDD load.

                            Vadim Smirnov
                            Keymaster

                              I have tested an 8-year-old Core i3-3217U (I don’t have anything slower with Windows installed) sending a file to another machine over SMB with and without dnstrace running. Here are the results:

                              Core i3-3217U Test results

                              You can notice some slowdown (8-9%), but it is nowhere near the 50% throughput reduction you reported. What was the bottleneck in your tests?

                              Vadim Smirnov
                              Keymaster

                                Yes, sorry, it is my fault… Saturday evening 😉… In that case the traffic passed over a virtual network. Here is the test over the cable:

                                Test over the cable

                                You can notice some bandwidth degradation (900 Mbps vs 976 Mbps without filtering) and extra CPU load.

                                Vadim Smirnov
                                Keymaster

                                  Some samples use Fast I/O, others don’t, but it is very easy to switch a sample between the fast and the ordinary model by changing one line of code:

                                  For the Fast I/O:

                                  auto ndis_api = std::make_unique<ndisapi::fastio_packet_filter>(

                                  For the ordinary I/O:

                                  auto ndis_api = std::make_unique<ndisapi::simple_packet_filter>(

                                  And yes, Fast I/O does not support WOW64…

                                  the cpu I’m testing with is core i7 5500

                                It is fast enough, and in the posts we have discussed above I have tested much older models. But you mentioned that you use a VM, while I tested on real hardware over a real 1 Gbps cable.
