Force Go HTTP client to use a proxy on EACH request


I am trying to use a single HTTP client to make multiple requests to the same host through different proxy servers. Every new request must go through the next proxy (round-robin scheme). This is my code sample:

package main

import (
    "fmt"
    "net/http"
    "net/url"
)

var client *http.Client

func main() {
    roundRobin := NewRoundRobinProxy(
        "http://myproxy1:8888",
        "http://myproxy2:8888",
        "http://myproxy3:8888")

    client = &http.Client{
        Transport: &http.Transport{
            MaxConnsPerHost:   10,
            DisableKeepAlives: false, // if it's true - it works fine, app really calls Proxy func on EACH req
            Proxy:             roundRobin.Proxy,
        },
    }

    sendReq("https://www.binance.com")
    sendReq("https://www.binance.com")
    sendReq("https://www.binance.com")
    sendReq("https://www.binance.com")
}

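func sendReq(urlStr string) {
    req, err := http.NewRequest(http.MethodGet, urlStr, nil)
    if err != nil {
        fmt.Println("bad request:", err)
        return
    }

    resp, err := client.Do(req)
    if err != nil {
        fmt.Println("request failed:", err)
        return
    }
    resp.Body.Close()

    fmt.Println("got resp from ", urlStr)
}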

type RoundRobinProxy struct {
    urls   []*url.URL
    cursor int
}

func NewRoundRobinProxy(urls ...string) *RoundRobinProxy {
    p := &RoundRobinProxy{cursor: 0}
    for _, v := range urls {
        u, _ := url.Parse(v)
        p.urls = append(p.urls, u)
    }
    return p
}

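// Proxy returns the next proxy URL in round-robin order.
// Note: cursor is not protected by a lock, so this is not safe for concurrent requests.
func (p *RoundRobinProxy) Proxy(*http.Request) (*url.URL, error) {
    fmt.Println("i'm in proxy, cursor=", p.cursor)
    u := p.urls[p.cursor]
    p.cursor = (p.cursor + 1) % len(p.urls)
    return u, nil
}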

So when I run this code, I expect the "i'm in proxy..." message to be printed as many times as there are requests (4). But in fact I see this output:

i'm in proxy, cursor= 0
got resp from  https://www.binance.com
got resp from  https://www.binance.com
got resp from  https://www.binance.com
got resp from  https://www.binance.com

So it uses the first proxy in the pool and then caches it somehow.

Yes, one solution is to set DisableKeepAlives=true. In that case it works:

i'm in proxy, cursor= 0
i'm in proxy, cursor= 1
got resp from  https://www.binance.com
i'm in proxy, cursor= 2
i'm in proxy, cursor= 0
got resp from  https://www.binance.com
i'm in proxy, cursor= 1
i'm in proxy, cursor= 2
got resp from  https://www.binance.com
i'm in proxy, cursor= 0
i'm in proxy, cursor= 1
got resp from  https://www.binance.com

There are more "i'm in proxy" messages than requests, but that doesn't matter (maybe some redirects happen under the hood).

But it's important to reuse TCP connections to avoid handshake overhead on each request.

Are there any ideas besides using a pool of clients (each with one proxy) instead of a pool of proxies? I would like to find a more straightforward and elegant solution. Thanks.

Answer by Sandy Cash

This isn't a Go issue, it's just the way keep-alive works. With HTTP keep-alive (persistent connections, which is what DisableKeepAlives controls; it is unrelated to TCP keep-alive probes), the client keeps the connection open. As you recognize, this lets you avoid some of the handshake overhead. But what you are connected to is the proxy: in this case, yes, the first one in the list.

What's happening:

  1. You set up your client with keep-alive enabled and a proxy function that returns the proxy URL to use
  2. You issue the request
  3. Your client gets a proxy address from that function and connects to proxy 0
  4. Subsequent requests to the same URL go over that existing connection to proxy 0.

Since the connection from the client terminates at the specific proxy, that is what is being kept alive.

I would follow the suggestion of 1:1 client-to-proxy - then you can load-balance across the proxies while still using keepalive.
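One way to keep a single http.Client while still following the 1:1 idea is to put the per-proxy pooling one level lower, at the transport. The sketch below is only an illustration under that assumption: proxyPool, newProxyPool and pick are invented names, and the proxy addresses are the placeholders from the question. Each proxy gets its own http.Transport (and therefore its own keep-alive connections), and a small http.RoundTripper rotates between them.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"sync"
)

// proxyPool is a hypothetical http.RoundTripper that owns one
// http.Transport per proxy and rotates between them per round trip.
type proxyPool struct {
	mu         sync.Mutex
	cursor     int
	transports []*http.Transport
}

// newProxyPool builds one transport per proxy URL.
func newProxyPool(proxies ...string) (*proxyPool, error) {
	if len(proxies) == 0 {
		return nil, fmt.Errorf("no proxies given")
	}
	p := &proxyPool{}
	for _, raw := range proxies {
		u, err := url.Parse(raw)
		if err != nil {
			return nil, fmt.Errorf("parse proxy %q: %w", raw, err)
		}
		p.transports = append(p.transports, &http.Transport{
			Proxy:           http.ProxyURL(u), // this transport always uses u
			MaxConnsPerHost: 10,
		})
	}
	return p, nil
}

// pick returns the next transport in round-robin order; the mutex makes
// it safe to call from concurrent requests.
func (p *proxyPool) pick() *http.Transport {
	p.mu.Lock()
	defer p.mu.Unlock()
	t := p.transports[p.cursor]
	p.cursor = (p.cursor + 1) % len(p.transports)
	return t
}

// RoundTrip implements http.RoundTripper by delegating to the chosen transport.
func (p *proxyPool) RoundTrip(req *http.Request) (*http.Response, error) {
	return p.pick().RoundTrip(req)
}

func main() {
	pool, err := newProxyPool(
		"http://myproxy1:8888",
		"http://myproxy2:8888",
		"http://myproxy3:8888")
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	// A single client for the whole program; use client.Do as before.
	client := &http.Client{Transport: pool}
	fmt.Printf("client ready, rotating over %d proxies via %T\n",
		len(pool.transports), client.Transport)
}
```

Note that http.Client calls RoundTrip once per redirect hop, so a request that gets redirected advances the rotation more than once. If that matters, the rotation could be keyed off something stored in the request context instead.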