Selenium Java – How to avoid bot detection by websites when using chromedriver.exe

If you’re looking for ways to make your selenium bot undetectable by websites and indistinguishable from a real human visitor, you’ve come to the right place. In this article I will show you a few different methods & tricks that have been working for me.

If you haven’t already, make sure to check out this article from piprogramming.org that covers 10 tricks to avoid bot detection. While some of what I cover today is similar, this tutorial builds on the basics provided in that article. Also, this article is meant specifically for ChromeDriver users developing their bot in Java on the Eclipse IDE. However, you can apply most of these concepts to Firefox and IE as well.

Let’s cover the must-do’s first:

#1 Clean up your navigator object

The navigator is a JavaScript object that contains information about your browser. You can see what kind of information it holds by opening Inspect Element, switching to the Console tab, and typing console.log(navigator).

By default, when you launch chromedriver.exe via Selenium, it adds a property called webdriver to the navigator object and sets it to true. This means any website can check whether your browser’s navigator has a webdriver flag. Regardless of whether webdriver is set to true or false, if the property exists, the site can assume you are a bot.

That’s why it’s important to set your navigator.webdriver property to undefined. That way, if a website queries navigator.webdriver, it gets a response of undefined, which is what you would get in a normal browser that doesn’t have a webdriver flag in the first place.

I was able to hide the webdriver flag from my navigator using this line of code:

driver.executeScript("Object.defineProperty(navigator, 'webdriver', {get: () => undefined})");

driver being a ChromeDriver object.
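Keep in mind that executeScript only affects the page that is currently loaded, so the override disappears on the next navigation, and a site’s own scripts may read navigator.webdriver before you get a chance to run it. One way around that, sketched below under the assumption that you are on Selenium 4 (where ChromeDriver exposes executeCdpCommand), is to register the same script with the Chrome DevTools Protocol command Page.addScriptToEvaluateOnNewDocument, which runs it before any page script on every new document. WebdriverFlagPatch is a hypothetical helper name; it only builds the command name and parameter map, so it compiles without Selenium on the classpath.

```java
import java.util.Map;

// Hypothetical helper (not part of Selenium): the CDP command and parameters
// that make Chrome run the navigator.webdriver override on every new document.
final class WebdriverFlagPatch {
    static final String CDP_COMMAND = "Page.addScriptToEvaluateOnNewDocument";

    static final String SCRIPT =
        "Object.defineProperty(navigator, 'webdriver', {get: () => undefined});";

    private WebdriverFlagPatch() {}

    // CDP expects the script text under the "source" key.
    static Map<String, Object> cdpParams() {
        return Map.of("source", SCRIPT);
    }
}

// Usage, assuming Selenium 4 and a ChromeDriver named driver:
// driver.executeCdpCommand(WebdriverFlagPatch.CDP_COMMAND, WebdriverFlagPatch.cdpParams());
// driver.get("https://example.com"); // override is registered before page scripts run
```

Registering the script once, before the first driver.get, means you no longer have to remember to re-run executeScript after every page load.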

#2 Remove trackers from chromedriver.exe

It seems that the developers of ChromeDriver put a “tracker” in the exe file as a sort of back-door for web servers to detect it. This culprit comes in the form of a unique variable that’s set to this exact string: $cdc_asdjflasutopfhvcZLmcfl_ 

Personally, I’m not sure whether this was done on purpose. Either way, you must replace this string with another of exactly the same length. The easiest way to do that is with a hex editor. In my case, I used HxD, searched for var key, and bingo!

**It’s important to note that this unique string might change in the future.

While this edit will cover our bases for now, this is the bare minimum you have to do to keep your bot undetected.

Another edit I suggest is removing every trace of the string “WebDriver” from chromedriver.exe. If you search for “WebDriver” in a hex editor, you will actually find it used multiple times!

Call me paranoid, but you can never be too careful. Change WebDriver to another word of the same length, and you are one step closer to going completely undetected.
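Instead of editing the bytes by hand in HxD every time you download a new chromedriver.exe, you could script the same edit. The sketch below is a hypothetical BinaryPatcher helper using only the Java standard library: it overwrites every occurrence of a marker with a replacement of exactly the same length (so no offsets inside the executable shift) and writes the result to a copy, leaving the original binary untouched.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: automates the hex-editor edit by overwriting every
// occurrence of a marker string with a same-length replacement.
final class BinaryPatcher {
    private BinaryPatcher() {}

    // Replaces all occurrences of target with replacement in place; returns the count.
    static int replaceAll(byte[] data, byte[] target, byte[] replacement) {
        if (target.length == 0 || target.length != replacement.length) {
            throw new IllegalArgumentException(
                "target and replacement must be non-empty and the same length");
        }
        int count = 0;
        int i = 0;
        while (i <= data.length - target.length) {
            if (matchesAt(data, target, i)) {
                System.arraycopy(replacement, 0, data, i, replacement.length);
                count++;
                i += target.length; // continue after the patched bytes
            } else {
                i++;
            }
        }
        return count;
    }

    private static boolean matchesAt(byte[] data, byte[] target, int offset) {
        for (int j = 0; j < target.length; j++) {
            if (data[offset + j] != target[j]) {
                return false;
            }
        }
        return true;
    }

    // Reads the driver binary, patches it in memory, and writes a separate copy.
    static int patchFile(Path in, Path out, String target, String replacement)
            throws IOException {
        byte[] data = Files.readAllBytes(in);
        int n = replaceAll(data,
            target.getBytes(StandardCharsets.US_ASCII),
            replacement.getBytes(StandardCharsets.US_ASCII));
        Files.write(out, data);
        return n;
    }
}
```

For example, BinaryPatcher.patchFile(Path.of("chromedriver.exe"), Path.of("chromedriver_patched.exe"), "$cdc_asdjflasutopfhvcZLmcfl_", replacement) works with any replacement string of the same length, and the returned count tells you whether the marker was found at all. The same call works for the “WebDriver” occurrences mentioned above, as long as the replacement is also nine characters long.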

If you want to be really thorough, it’s not a bad idea to skim through the 600,000+ lines of chromedriver.exe in a hex editor to see if you can identify any other potential “trackers”. I’m not going to lie, I haven’t done that yet, but it’s definitely on my TODO list.

#3 Automatically grabbing proxies and switching IP addresses

This is another basic capability your bot should have: it should be able to automatically grab a proxy, assign it to chromedriver.exe, and switch between proxies automatically as well.

There are many different ways of achieving that. I used pubproxy.com: I queried its API, parsed the response, and assigned the proxy to chromedriver.exe. Here’s the code:

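import java.util.List;
import org.openqa.selenium.By;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeOptions;

String proxyAPICall = "http://pubproxy.com/api/proxy?format=txt";
driver.get(proxyAPICall); // open the plain-text API response in the browser
List<WebElement> pre = driver.findElements(By.tagName("pre"));
if (!pre.isEmpty()) {
    String pubProxy = pre.get(0).getText();
    // set the proxy; it applies to the next ChromeDriver launched with these options
    options = new ChromeOptions().addArguments("--proxy-server=" + pubProxy);
    System.out.println("proxy: " + pubProxy);
}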

pubproxy generates a unique proxy on each API call, which was perfect for my use case. Keep in mind that unless you subscribe to their premium plan, you can only make 50 API calls per day.
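Since the free tier limits how many proxies you can pull per day, it can help to validate each response and keep reusing the proxies you already have in rotation. The sketch below is a hypothetical ProxyRotator class (standard library only): it assumes the format=txt response is a single host:port line, rejects anything else, and hands proxies out round-robin as Chrome --proxy-server switches. Because Chrome reads command-line switches only at launch, switching IP addresses this way means starting a new ChromeDriver for each proxy.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: validates "host:port" strings and rotates through them.
final class ProxyRotator {
    private static final Pattern HOST_PORT =
        Pattern.compile("^[A-Za-z0-9.-]+:(\\d{1,5})$");

    private final List<String> proxies = new ArrayList<>();
    private int next = 0;

    // Trims a raw API response; returns it if it looks like host:port, else null.
    static String parse(String raw) {
        if (raw == null) {
            return null;
        }
        String s = raw.trim();
        Matcher m = HOST_PORT.matcher(s);
        if (!m.matches()) {
            return null;
        }
        int port = Integer.parseInt(m.group(1));
        return (port >= 1 && port <= 65535) ? s : null;
    }

    // Adds a proxy if it parses; returns whether it was accepted.
    boolean add(String raw) {
        String p = parse(raw);
        if (p == null) {
            return false;
        }
        proxies.add(p);
        return true;
    }

    // Next Chrome switch in round-robin order, e.g. "--proxy-server=1.2.3.4:8080".
    String nextProxyArgument() {
        if (proxies.isEmpty()) {
            throw new IllegalStateException("no proxies loaded");
        }
        String p = proxies.get(next);
        next = (next + 1) % proxies.size();
        return "--proxy-server=" + p;
    }
}

// Usage with Selenium (not compiled here; assumes a filled rotator):
// ChromeDriver driver = new ChromeDriver(new ChromeOptions().addArguments(rotator.nextProxyArgument()));
```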

#4 THE FUN PART

Now comes the fun part: creating a set of “identities” for your browser. This means different user agents, browser window sizes, screen resolutions, and much more.

Essentially, you want to keep switching your browser fingerprint, but you also want each fingerprint to be believable. Imagine you are trying to artificially create a new human fingerprint. If you change the pattern so much that, on closer inspection, it no longer looks like a normal human fingerprint, anyone who takes a closer look will instantly be able to tell that it isn’t that of a real human being.

It only makes sense to use a similar approach when designing your bot browser fingerprints. The catch is, not only do you have to design a believable browser fingerprint, but you also have to create quite a few of them to keep on avoiding detection.

There are many different ways to do that, but the bare minimum is modifying your user agent and browser window size.
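One way to keep these values consistent is to bundle them into a single identity and switch the whole bundle at once. The sketch below uses a hypothetical BrowserIdentity record (Java 16+, standard library only) holding a user agent plus window and screen sizes. It refuses a window larger than its screen, which is unlikely for a real visitor, and turns an identity into the Chrome switches --user-agent and --window-size.

```java
import java.util.List;
import java.util.Random;

// Hypothetical "identity": values that should change together so the
// resulting fingerprint stays believable.
record BrowserIdentity(String userAgent, int windowWidth, int windowHeight,
                       int screenWidth, int screenHeight) {

    BrowserIdentity {
        // A browser window bigger than the screen it sits on is an easy tell.
        if (windowWidth > screenWidth || windowHeight > screenHeight) {
            throw new IllegalArgumentException("window must fit inside the screen");
        }
    }

    // Chrome switches for this identity, e.g. for options.addArguments(...).
    List<String> chromeArguments() {
        return List.of("--user-agent=" + userAgent,
                       "--window-size=" + windowWidth + "," + windowHeight);
    }

    // Picks one identity from a pool; pass a seeded Random for reproducible runs.
    static BrowserIdentity pick(List<BrowserIdentity> pool, Random rnd) {
        return pool.get(rnd.nextInt(pool.size()));
    }
}

// Usage with Selenium (not compiled here):
// options.addArguments(BrowserIdentity.pick(pool, new Random()).chromeArguments());
```

The screen size is carried along so that the same identity can also drive the screen overrides discussed later in this article.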

User Agent

When it comes to your user agent, it’s best to use common user agents for your browser. You can use whatismybrowser.com to find a list of the most common user agents used today sorted by browser type and version.

**Keep in mind that the Chrome version in your user agent should match the Chrome version your chromedriver.exe is driving; a user agent version that doesn’t match your actual browser version is a huge red flag.
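To catch a mismatch before it becomes a red flag, you can compare the Chrome major version in your chosen user agent with the version of the browser chromedriver.exe actually launched. The hypothetical UserAgentCheck helper below is a standard-library sketch that compares only the major version; with Selenium 4, the browser version can be read from driver.getCapabilities().getBrowserVersion().

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical check: does the user agent claim the same Chrome major version
// as the browser that chromedriver.exe is actually driving?
final class UserAgentCheck {
    private static final Pattern CHROME_VERSION = Pattern.compile("Chrome/(\\d+)\\.");

    private UserAgentCheck() {}

    // Major version from "... Chrome/120.0.0.0 ...", or -1 if there is none.
    static int chromeMajor(String userAgent) {
        Matcher m = CHROME_VERSION.matcher(userAgent);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }

    // browserVersion looks like "120.0.6099.109"; only the major part is compared.
    static boolean matchesBrowser(String userAgent, String browserVersion) {
        int dot = browserVersion.indexOf('.');
        String major = dot < 0 ? browserVersion : browserVersion.substring(0, dot);
        try {
            return chromeMajor(userAgent) == Integer.parseInt(major);
        } catch (NumberFormatException e) {
            return false;
        }
    }
}

// Usage with Selenium 4 (not compiled here):
// boolean ok = UserAgentCheck.matchesBrowser(ua, driver.getCapabilities().getBrowserVersion());
```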

Browser Window Size & Screen Resolution

You also want to modify your browser window size and screen resolution. After all, people use their browsers at varying sizes. It’s best to set realistic browser sizes, such as the ones you use yourself.

There’s no magic number that’s best, you have to think about it in terms of mimicking real humans. You can take a look at Screen Resolution Statistics from W3 to get an idea of some common resolutions.

It’s important to note that there’s a difference between the browser window size, and the available screen resolution on your monitor. Those are two separate values you have to pay attention to. To change your browser window size, use the following code:

options.addArguments("window-size=1920,1080");

options being a ChromeOptions object.

To change your screen resolution, you don’t need to touch your actual monitor. Instead, override the properties of the JavaScript screen object, which you can inspect in the console by typing console.log(screen).

Set availWidth, availHeight, width, and height to whichever resolution you want. Keep an eye on pixelDepth and colorDepth as well, since those should stay consistent with the display you are imitating. To change these properties, run the following code:

driver.executeScript("Object.defineProperty(screen, 'height', {value: 1080, configurable: true, writable: true});");
driver.executeScript("Object.defineProperty(screen, 'width', {value: 1920, configurable: true, writable: true});");
driver.executeScript("Object.defineProperty(screen, 'availWidth', {value: 1920, configurable: true, writable: true});");
driver.executeScript("Object.defineProperty(screen, 'availHeight', {value: 1080, configurable: true, writable: true});");
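Because these overrides only hold up if they agree with each other, it can be tidier to generate all of them from one set of numbers. The sketch below is a hypothetical ScreenOverride helper (standard library only) that builds a single script for width, height, availWidth, availHeight, colorDepth, and pixelDepth, and rejects an available area larger than the screen. It uses getters rather than value descriptors; either form shadows the browser’s built-in properties. Like the lines above, the script only affects the current page when passed to driver.executeScript, so it has to be run again after each navigation.

```java
// Hypothetical helper: one script that overrides all screen properties together
// so width/height, the avail* values, and the depths stay consistent.
final class ScreenOverride {
    private ScreenOverride() {}

    static String script(int width, int height, int availWidth, int availHeight,
                         int colorDepth) {
        if (availWidth > width || availHeight > height) {
            throw new IllegalArgumentException("available area cannot exceed the screen");
        }
        StringBuilder js = new StringBuilder();
        appendProp(js, "width", width);
        appendProp(js, "height", height);
        appendProp(js, "availWidth", availWidth);
        appendProp(js, "availHeight", availHeight);
        appendProp(js, "colorDepth", colorDepth);
        appendProp(js, "pixelDepth", colorDepth); // real browsers report the same value for both
        return js.toString();
    }

    // Appends one Object.defineProperty call with a getter returning the value.
    private static void appendProp(StringBuilder js, String name, int value) {
        js.append("Object.defineProperty(screen, '").append(name)
          .append("', {get: () => ").append(value).append(", configurable: true});");
    }
}

// Usage with Selenium (not compiled here):
// driver.executeScript(ScreenOverride.script(1920, 1080, 1920, 1040, 24));
```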

Conclusion

If you implement all the methods I talked about in this article, your bot should be undetected by most web servers. Keep in mind this is the bare minimum, meaning that these methods may not be enough for a server that actively looks for Selenium bots.

This is a game of cat and mouse that you will have to keep on playing if you want to keep your bot undetected.

If you liked this article and would like more ideas on how to make your Selenium bot undetectable, please consider sharing it on Twitter and tagging me @needforbeans.

If enough people show interest, I have plenty more methods for keeping your bot undetected that I can share with you!

Via: https://themerkle.com
