Alexa Certified Site Metrics – AlexaBot 403 Error, Alexa Certify

Viewing 15 posts - 1 through 15 (of 37 total)
    #1118
    Shane Strong
    Member

    Hey guys, I was hoping I could get some help here to figure out my issue.  I have just registered my site with Alexa and have set up a Certified account.  When Alexa tries to crawl my site with the AlexaBot (User-Agent ia_archiver), it receives a 403 error.  I did get it to work, but only by disabling BulletProof Security; once I re-enabled it, the bot was unable to scan any of my site.  I am not sure if there is something I need to change in the htaccess file to allow the AlexaBot, but I do know that it will crawl the site as long as I have BulletProof turned off.  I would like to keep BulletProof on if possible.  Any help would be appreciated.
    URL is http://www.shanestrong.com

    #1144
    AITpro Admin
    Keymaster

    I am not exactly sure where you are seeing the 403 Error.  The only 403 Errors I ever see for the AlexaBot on my sites are for directories that I do not want the AlexaBot or any Bots to crawl.  I see that Alexa offers a Certified Site Metrics service.  Is that what is being blocked?  If so, then most likely you would just need to remove HEAD from this nuisance filter in your Root .htaccess file.

    [code deleted – was not relevant to solving the problem]

    #1145
    Shane Strong
    Member

    Thanks I will give that a try and yes I was talking about the Certified Site Metrics.

    #2297
    Dave Smith
    Member

    Hi,
    I’m trying to certify my site with Alexa, but it gets 403 errors when it tries to crawl the site.
    I’ve tried what you recommended in a previous post to another user but it didn’t work, and I’ve also tried removing java from the query string exploits filter but that didn’t work either.
    Could you help, please? The Alexa code that appears in my site is below:

    #2304
    AITpro Admin
    Keymaster

    Please post one of the Alexa 403 Errors from your BPS Security Log. Thanks.

    #2305
    Dave Smith
    Member

    Hi,
    sorry forgot to mention, there is no 403 error listed in the log. I’ve tried clearing the log, rerunning the Alexa verify thing, and then rechecking the log, but there’s nothing in there.

    #2306
    AITpro Admin
    Keymaster

    Go to the BPS htaccess File Editor page, open the Your Current Root htaccess tab, and edit your root .htaccess file: comment out the 3 User Agent security filters with a pound sign (#).  If you are still getting 403 Errors, comment out the remaining security filters in groups of 3, testing after each group, until you reach the sp_executesql security filter directly below.  Do not comment out this security filter or any code below it:

RewriteCond %{QUERY_STRING} (sp_executesql) [NC]

    The 3 User Agent security filters to comment out first, shown with the pound sign already added:
    # BPSQSE BPS QUERY STRING EXPLOITS
    # The libwww-perl User Agent is forbidden - Many bad bots use libwww-perl modules, but some good bots use it too.
    # Good sites such as W3C use it for their W3C-LinkChecker.
    # Add or remove user agents temporarily or permanently from the first User Agent filter below.
    # If you want a list of bad bots / User Agents to block then scroll to the end of this file.
    #RewriteCond %{HTTP_USER_AGENT} (havij|libwww-perl|wget|python|nikto|curl|scan|java|winhttp|clshttp|loader) [NC,OR]
    #RewriteCond %{HTTP_USER_AGENT} (%0A|%0D|%27|%3C|%3E|%00) [NC,OR]
    #RewriteCond %{HTTP_USER_AGENT} (;|<|>|'|"|\)|\(|%0A|%0D|%22|%27|%28|%3C|%3E|%00).*(libwww-perl|wget|python|nikto|curl|scan|java|winhttp|HTTrack|clshttp|archiver|loader|email|harvest|extract|grab|miner) [NC,OR]

    When you find the security filter or filters causing this problem, please post which ones are blocking the Alexa Bot. Thanks.

    #2328
    Dave Smith
    Member

    Hello,
    due to the odd way that Alexa works (when I signed up I could try to verify the site as many times as I wanted, but when I came back to it later I had only 5 attempts left), and the fact that I then accidentally wasted a retry, I commented out all of the lines in that section.
     
    Sure enough it then certified the site.
     
    Do you need me to work backwards and un-comment out some of the lines in a kind of half/divide method until it stops working again / we identify the line that is the problem?
     
    Thanks for your help so far, I’m sure it’ll be a big help to others who also want to go down the Alexa route.

    #2329
    AITpro Admin
    Keymaster

    Yes, that would be awesome if you would be willing to find the exact security filters that are blocking the Alexa Certify process.  Thanks.

    The most likely security filters are going to be the top 3 User Agent filters.  So what I do in cases like these, where I cannot test the actual software or service, is work backwards.  😉  So uncomment all the security filters except for the top 3 User Agent filters and then test.  If the Alexa Certify process is still blocked, comment out the next 3 security filters below the 3 you already commented out.  Continue doing this until you get to the security filter below and stop there.  If you comment out this security filter, your website will crash because at least 1 security rule needs to exist in the BPS Query String Exploits section of code.

    RewriteCond %{QUERY_STRING} (sp_executesql) [NC]

    #2332
    Dave Smith
    Member

    Hi,
    I’ve changed it so only the top 3 lines are commented out, will let you know what happens.
     
    thanks,
    D

    #2340
    Dave Smith
    Member

    Hello,
    I redid it with just the top 3 lines commented out and it still certifies okay, so I guess it’s in there – I only have 1 retry left and will then have to wait until 27th March to try again!

    #2341
    AITpro Admin
    Keymaster

    Great!  There is a 96% chance that it is the #1 security filter.  A 2% chance that it is the #2 security filter.  A 2% chance that it is the #3 security filter.  The most logical User Agents in rule #1 that are probably being used by Alexa and blocked by BPS are:  libwww-perl, wget or java.  Uncomment security filters 2 and 3 and leave security filter 1 commented out.  Thanks for testing this.

    1. RewriteCond %{HTTP_USER_AGENT} (havij|libwww-perl|wget|python|nikto|curl|scan|java|winhttp|clshttp|loader) [NC,OR]
    2. RewriteCond %{HTTP_USER_AGENT} (%0A|%0D|%27|%3C|%3E|%00) [NC,OR]
    3. RewriteCond %{HTTP_USER_AGENT} (;|<|>|'|"|\)|\(|%0A|%0D|%22|%27|%28|%3C|%3E|%00).*(libwww-perl|wget|python|nikto|curl|scan|java|winhttp|HTTrack|clshttp|archiver|loader|email|harvest|extract|grab|miner) [NC,OR]
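
One way to see which of the three User Agent filters a given User-Agent string would trip, without spending an Alexa attempt, is to run the same regular expressions locally. The following is a minimal sketch, assuming the three patterns quoted above and treating the [NC] flag as a case-insensitive, unanchored search. The function name and the sample User-Agent strings are hypothetical and chosen only for illustration; they are not what Alexa actually sends.

```python
import re

# The three User Agent filter patterns quoted above, keyed by filter number.
FILTERS = {
    1: r"""(havij|libwww-perl|wget|python|nikto|curl|scan|java|winhttp|clshttp|loader)""",
    2: r"""(%0A|%0D|%27|%3C|%3E|%00)""",
    3: r"""(;|<|>|'|"|\)|\(|%0A|%0D|%22|%27|%28|%3C|%3E|%00).*(libwww-perl|wget|python|nikto|curl|scan|java|winhttp|HTTrack|clshttp|archiver|loader|email|harvest|extract|grab|miner)""",
}


def matching_filters(user_agent):
    """Return the numbers of the filters whose pattern matches user_agent.

    re.IGNORECASE stands in for the [NC] flag; re.search mirrors the
    unanchored matching of a RewriteCond pattern.
    """
    return [
        number
        for number, pattern in FILTERS.items()
        if re.search(pattern, user_agent, re.IGNORECASE)
    ]


# Hypothetical User-Agent strings, for illustration only.
for sample in (
    "libwww-perl/6.05",
    "ia_archiver",
    "Mozilla/5.0 (compatible; Java/1.8)",
    "Foo%0ABar",
):
    print(sample, "->", matching_filters(sample))
```

Note that filter 3 only matches a keyword such as archiver when one of the listed special characters appears earlier in the string, so the outcome depends on the full User-Agent header rather than the bot name alone.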

    #2378
    Dave Smith
    Member

    Hi,
    okay so I commented out ONLY the top line and it stopped working again, so I guess that means it’s on line 2 or 3 where the problem lies.
    Fortunately Alexa have kindly given me another 5 attempts to certify the site, so what I’ll do is comment out only filters 2 & 3 and report back what happens.
    Could it be possible that it’s line 1 & 2 or 1 & 3 (i.e. more than 1 line) that’s the problem?
    Thanks.

    #2383
    AITpro Admin
    Keymaster

    Yes, it is possible that more than 1 of those User Agent filters is blocking the Alexa Certify check.  I looked around and found that Alexa was using libwww-perl in the UA, but I could not find any recent technical information, either on the Alexa site or elsewhere on the Internet, about what is sent in the Query to a website during the Alexa Certify process.  Based on what you have discovered so far, I believe ONLY rule #2 is causing the Alexa Certify process to be blocked, but it could be both #2 and #3.  So comment out #2 and #3 for your next test; if that works, the next test would be to comment out ONLY rule #2.  Thanks again for doing this testing.  Very much appreciated.

    #2451
    Dave Smith
    Member

    Hi,
    okay so here’s what I’ve discovered:
    Commenting out just filter 1: does NOT work.
    Commenting out just filters 2 & 3: does NOT work.
    Commenting out filters 1, 2 & 3: DOES work.
    So I think it must be 1 & 2, 1 & 3, or 1, 2 & 3.
    I only have 3 scans left again, I’ll wait for your thoughts before trying anything else.
    Thanks.
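
The elimination reasoning in the results above can be checked mechanically. This is a minimal sketch under one stated assumption: a certification attempt succeeds exactly when every filter that blocks Alexa is commented out, and only the three User Agent filters are involved. The names `OBSERVATIONS` and `consistent_blockers` are hypothetical.

```python
from itertools import combinations

FILTERS = (1, 2, 3)

# (filters commented out, did certification succeed?) as reported above.
OBSERVATIONS = [
    ({1}, False),
    ({2, 3}, False),
    ({1, 2, 3}, True),
]


def consistent_blockers(filters, observations):
    """Return every non-empty combination of blocking filters that fits all observations.

    Assumption: a test succeeds exactly when all blocking filters
    are among the filters that were commented out.
    """
    candidates = []
    for size in range(1, len(filters) + 1):
        for combo in combinations(filters, size):
            blockers = set(combo)
            if all((blockers <= disabled) == succeeded
                   for disabled, succeeded in observations):
                candidates.append(combo)
    return candidates


print(consistent_blockers(FILTERS, OBSERVATIONS))
```

Under that assumption the candidates are filters 1 & 2, 1 & 3, or all three, which matches the conclusion above. With few attempts left, a test with only filters 1 & 2 commented out would succeed only if the blockers are exactly 1 & 2, and a test with only 1 & 3 commented out would succeed only if they are exactly 1 & 3; if both fail, all three filters are involved.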
