How to block spam referrals in Google Analytics and at the server level using fail2ban


Recently I have seen a huge increase in referral traffic in GA coming from spammy domains like bidvertiser . com, easyhits4u . com or trafficswirl . com. This traffic badly distorts the GA data: it triggers a sudden drop in conversion rate, which makes the data unusable.

You can easily spot the bad referrals because they share a few characteristics:

  1. high bounce rate
  2. low time spent on pages (and very few pageviews per user)
  3. 0 conversions (if you measure such a thing)

Looking in the logs, I found lines like these:

52.33.56.250 - - [10/May/2017:08:39:05 +0000] "GET / HTTP/1.0" 200 18631 "http://ptp4all.com/ptp/promote.aspx?id=628" "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; WOW64; Trident/6.0; .NET4.0E; .NET4.0C; .NET CLR 3.5.30729; .NET CLR 2.0.50727; .NET CLR 3.0.30729; MALCJS)"
74.73.253.77 - - [10/May/2017:08:39:05 +0000] "GET / HTTP/1.0" 200 18631 "http://secure.bidvertiser.com/performance/bdv_rd.dbm?enparms2=7523,1871496,2463272,7474,7474,8973,7684,0,0,7478,0,1870757,475406,91376,112463629579,78645910,nlx.lwlgwre&ioa=0&ncm=1&bd_ref_v=www.bidvertiser.com&TREF=1&WIN_NAME=&Category=1000&ownid=627368&u_agnt=&skter=vgzouvw%2B462c%2B40v10h%2Bghru%2Bmlir%2Bhoveizn%2Bsxgzd&skwdb=ooz_wvvu" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"

How should I handle this?

Mike On

There are 2 main things that you need to do:

1. Server level - You must block spammy requests from the beginning.

I thought it would be best to prepare dynamic filters that block requests from the specific IPs that generate the spammy traffic.

I use fail2ban for this purpose, but there is no rule that does this out of the box. First you need to create a new jail filter (I am using Plesk, so here is how to do that in Plesk: https://docs.plesk.com/en-US/onyx/administrator-guide/server-administration/protection-against-brute-force-attacks-fail2ban/fail2ban-jails-management.73382/). If you do not use Plesk and work over ssh, have a look here: https://www.fail2ban.org/wiki/index.php/MANUAL_0_8

The definition of a jail filter looks like this:

[Definition]
failregex =
ignoreregex = 

Be sure to include ignoreregex as well, otherwise the filter will not save.

After that, search your access log for the domains you found in Google Analytics. You will find a lot of requests like the ones above.
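If the log is large, counting referrer hostnames can make the worst offenders stand out faster than searching by hand. Here is a minimal sketch in Python, assuming the log uses the combined log format seen in the question. The function name `referrer_hosts` is mine, the first two sample lines are abbreviated versions of the lines from the question, and the third is an invented request with no referrer.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Combined log format:
# ip ident user [time] "request" status size "referer" "user-agent"
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ '
    r'"(?P<referer>[^"]*)" "[^"]*"'
)

def referrer_hosts(lines):
    """Count requests per referrer hostname; lines without a referrer are skipped."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # not a combined-format line
        host = urlsplit(m.group("referer")).hostname  # None for "-"
        if host:
            counts[host] += 1
    return counts

# Abbreviated sample lines (the third one is invented for illustration).
sample = [
    '52.33.56.250 - - [10/May/2017:08:39:05 +0000] "GET / HTTP/1.0" 200 18631 '
    '"http://ptp4all.com/ptp/promote.aspx?id=628" "Mozilla/5.0 (compatible; MSIE 10.0)"',
    '74.73.253.77 - - [10/May/2017:08:39:05 +0000] "GET / HTTP/1.0" 200 18631 '
    '"http://secure.bidvertiser.com/performance/bdv_rd.dbm?ioa=0" "Mozilla/5.0 (Macintosh)"',
    '203.0.113.9 - - [10/May/2017:08:40:00 +0000] "GET /about HTTP/1.1" 200 512 '
    '"-" "Mozilla/5.0"',
]

for host, hits in referrer_hosts(sample).most_common():
    print(host, hits)
```

To run it on a real log, pass an open file instead of the list, for example `referrer_hosts(open("path/to/log/access_log"))`.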

Once you have identified the domains, you need to add rules like this:

failregex = <HOST>.+bidvertiser\.com
    <HOST>.+easyhits4u\.com
  • <HOST> - a fail2ban keyword that matches the IP address in the log line; this is the address that gets banned.
  • ".+" - matches all the text between the IP and the domain you are looking for in that line.
  • bidvertiser\.com - the domain that causes the trouble, with the "." escaped by "\" so that it matches a literal dot.
  • each additional line (new domain) should be indented (I put a TAB character before the rule), otherwise it will not save.
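To get a feel for what such a rule matches, you can try it with Python's `re` module. This is only a sketch: fail2ban expands `<HOST>` into a much larger expression of its own, so the `HOST_GROUP` below is a simplified IPv4-only stand-in that I made up, and the helper names are hypothetical. The log line is an abbreviated version of the one from the question.

```python
import re

# ASSUMPTION: a plain IPv4 capture group stands in for fail2ban's real
# <HOST> expansion, which is more elaborate.
HOST_GROUP = r"(?P<host>\d{1,3}(?:\.\d{1,3}){3})"

def compile_failregex(failregex):
    """Replace the <HOST> tag with the simplified capture group and compile."""
    return re.compile(failregex.replace("<HOST>", HOST_GROUP))

def offending_ip(failregex, line):
    """Return the captured IP if the line matches the rule, else None."""
    m = compile_failregex(failregex).search(line)
    return m.group("host") if m else None

line = (
    '74.73.253.77 - - [10/May/2017:08:39:05 +0000] "GET / HTTP/1.0" 200 18631 '
    '"http://secure.bidvertiser.com/performance/bdv_rd.dbm?ioa=0" "Mozilla/5.0"'
)

print(offending_ip(r"<HOST>.+bidvertiser\.com", line))  # 74.73.253.77
print(offending_ip(r"<HOST>.+easyhits4u\.com", line))   # None

# Why the dot is escaped: an unescaped "." matches any character.
print(bool(re.search(r"bidvertiser.com", "bidvertiserXcom")))   # True
print(bool(re.search(r"bidvertiser\.com", "bidvertiserXcom")))  # False
```

The real check should still be done with fail2ban-regex against your own log, since that uses fail2ban's actual `<HOST>` handling.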

My rule looks like this:

[Definition]
failregex = <HOST>.+bidvertiser\.com
    <HOST>.+easyhits4u\.com
    <HOST>.+sitexplosion\.com
    <HOST>.+ptp4all\.com
    <HOST>.+trafficswirl\.com
    <HOST>.+bdv_rd\.dbm
ignoreregex = 

Note the bdv_rd\.dbm entry. That is not a domain but a script they use to produce the spam. It would be easy for them to change the domain and keep using the same script, so matching the script name adds an extra layer of filtering. This works because fail2ban matches the pattern against any part of the log line, not only the domain.

Note 1: make sure your patterns do not match your own website's URLs, because that would block legitimate users and you do not want that.
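One way to check this before enabling the filter is to run the patterns against a handful of log lines you know are legitimate. Below is a rough sketch. The function `colliding_patterns` and the two sample lines (with example.com standing in for your own site) are invented for illustration; the patterns are the ones from my rule without the `<HOST>.+` prefix.

```python
import re

# The patterns from the rule above, without the "<HOST>.+" prefix.
PATTERNS = [
    r"bidvertiser\.com",
    r"easyhits4u\.com",
    r"sitexplosion\.com",
    r"ptp4all\.com",
    r"trafficswirl\.com",
    r"bdv_rd\.dbm",
]

def colliding_patterns(patterns, legit_lines):
    """Return (pattern, line) pairs where a spam pattern hits a legitimate line."""
    return [
        (pattern, line)
        for pattern in patterns
        for line in legit_lines
        if re.search(pattern, line)
    ]

# Invented examples of legitimate traffic; use real lines from your own log.
legit = [
    '198.51.100.7 - - [10/May/2017:09:00:00 +0000] "GET /pricing HTTP/1.1" '
    '200 2048 "https://www.example.com/" "Mozilla/5.0"',
    '198.51.100.8 - - [10/May/2017:09:01:00 +0000] "GET /blog/promote-your-site HTTP/1.1" '
    '200 4096 "https://www.google.com/" "Mozilla/5.0"',
]

print(colliding_patterns(PATTERNS, legit))             # [] means no collisions
print(len(colliding_patterns([r"promote"], legit)))    # a too-generic pattern collides
```

An empty list means none of the patterns would ban those visitors. A generic word such as "promote" shows the failure mode: it also appears in one of the site's own URLs.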

Note 2: you can test your regex over ssh like this:

# fail2ban-regex path/to/log/access_log "<HOST>.+bidvertiser\.com"

This should produce the following output:

 Running tests
=============

Use   failregex line : <HOST>.+bidvertiser\.com
Use         log file : access_log
Use         encoding : UTF-8


Results
=======

Failregex: 925 total
|-  #) [# of hits] regular expression
|   1) [925] <HOST>.+bidvertiser\.com
`-

Ignoreregex: 0 total

Date template hits:
|- [# of hits] date format
|  [4326] Day(?P<_sep>[-/])MON(?P=_sep)Year[ :]?24hour:Minute:Second(?:\.Microseconds)?(?: Zone offset)?
`-

Lines: 4326 lines, 0 ignored, 925 matched, 3401 missed
[processed in 3.14 sec]

Missed line(s): too many to print.  Use --print-all-missed to print all 3401 lines

This means your filter found 925 requests matching that domain (a lot, if you ask me), which translate into 925 hits from the bidvertiser.com referral in your Google Analytics.

You can verify this by downloading the log and searching it with a tool like Notepad++.

Now that your filter definition is ready, you should add a jail and a rule.

My jail uses the definition above with an action that blocks all ports for the offending IP for 24 hours.
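I set this up through the Plesk interface, so I do not have a jail file to paste. For those editing the configuration by hand, the sketch below prints what such a jail stanza might look like. Everything in it is an assumption on my part: the jail and filter name `referrer-spam`, the log path, and `maxretry = 1` (ban on the first matching request). `banaction = iptables-allports` and a `bantime` of 86400 seconds correspond to the "all ports for 24 hours" setup described above.

```python
import configparser
import io

def render_jail(name, logpath, bantime_seconds=24 * 60 * 60):
    """Render a fail2ban jail stanza as text (illustrative values only)."""
    cfg = configparser.ConfigParser(interpolation=None)
    cfg[name] = {
        "enabled": "true",
        "filter": name,                     # ASSUMPTION: filter saved under the same name
        "logpath": logpath,                 # ASSUMPTION: adjust to your access log
        "maxretry": "1",                    # ASSUMPTION: ban on the first match
        "bantime": str(bantime_seconds),    # 24 hours, in seconds
        "banaction": "iptables-allports",   # block every port, not only 80/443
    }
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()

print(render_jail("referrer-spam", "/var/log/apache2/access.log"))
```

The printed stanza would go into a local jail file such as jail.local. Check the option names against your fail2ban version before relying on it.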


Within just a few hours of installing this, I had close to 850 blocked IPs. Some of them are in the Amazon AWS network, so I filed an abuse complaint here: https://aws.amazon.com/forms/report-abuse

You can use this service to find the owner of an IP: https://ipinfo.io/

2. Google Analytics level

Here you have a few options, which I will not describe because this is not the place and there are well-written resources on the topic:

https://moz.com/blog/how-to-stop-spam-bots-from-ruining-your-analytics-referral-data
https://www.optimizesmart.com/geek-guide-removing-referrer-spam-google-analytics/


A few notes:

  1. Some of the resources above use .htaccess blocking. That is an option as well, but I did not use it here because my filters also match script names and not only domains.

  2. Fail2Ban will use iptables to block all further requests from these IPs on every port, not only on the http/https ports.

  3. The first request will always pass and create 1 hit in Analytics. If the script keeps accessing your website, you will get another hit each time the ban expires.

  4. You can use the recidive filter to ban those IPs permanently: https://wiki.meurisse.org/wiki/Fail2Ban#Recidive

  5. The Analytics filters will not filter out historical data.