Building Bots with Mechanize and Selenium

The Sociosploit team conducts much of its research into the exploitation of social media using custom-built bots. On occasion, the team will use public APIs (Application Programming Interfaces), but more often than not, these do not offer the same exploitative capabilities that can be achieved through browser automation. For this reason, the Sociosploit team primarily uses a combination of two Python libraries to build web bots for research. Each library has its own advantages and disadvantages. These libraries are:

Mechanize

Pros:
  • Very lightweight, portable, and requires minimal resources
  • Easy to initially configure and install
  • Cuts down on superfluous requests (due to absence of JavaScript)
Cons:
  • Does not handle JavaScript or client-side functionality
  • Troubleshooting is done exclusively in text

Selenium

Pros:
  • Operations are executed in browser, making JavaScript rendering and manipulation easy
  • Visibility of browser simplifies troubleshooting
Cons:
  • Painful to initially build bot environment
  • Scripts are not portable, as they require installation of supporting web browser API drivers
  • More resource intensive (due to browser usage), and often executes superfluous requests (due to JavaScript)
To demonstrate the basic functionality of each, we will craft a quick LinkedIn login function using both Mechanize and Selenium.

Mechanize

Import libraries and configure browser object
First we need to import the mechanize library, instantiate the browser object, and then configure it.
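As a minimal sketch of this step, the setup might look like the following. The specific settings (ignoring robots.txt, following redirects, sending a Referer, spoofing a Firefox user agent) and the helper names configure_browser and new_browser are our own assumptions for illustration, not necessarily what the original bot used.

```python
# Assumed default; any realistic browser User-Agent string would do.
DEFAULT_USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"
)


def configure_browser(br, user_agent=DEFAULT_USER_AGENT):
    """Apply common settings to a mechanize.Browser object and return it."""
    br.set_handle_robots(False)    # do not fetch or obey robots.txt
    br.set_handle_redirect(True)   # follow HTTP redirects
    br.set_handle_referer(True)    # send a Referer header when navigating
    br.set_handle_equiv(True)      # honor <meta http-equiv> headers
    # Replace the default mechanize User-Agent with a browser-like one.
    br.addheaders = [("User-agent", user_agent)]
    return br


def new_browser(user_agent=DEFAULT_USER_AGENT):
    """Instantiate and configure a mechanize browser (hypothetical helper)."""
    import mechanize  # imported here so the module loads without mechanize
    return configure_browser(mechanize.Browser(), user_agent)
```

Calling br = new_browser() then gives a browser object that can be reused for the remaining steps.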
Browse to the website and examine
Next we browse to LinkedIn's homepage and examine the contents (including forms, links, and source code).
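One way to do this examination is sketched below. The helper name examine_page and the shape of the returned dictionary are assumptions for illustration; br is expected to be a mechanize.Browser object.

```python
def examine_page(br, url):
    """Open url with a mechanize.Browser and summarize what came back."""
    br.open(url)
    return {
        # str(form) lists each control, e.g. TextControl(session_key=)
        "forms": [str(form) for form in br.forms()],
        # each mechanize link exposes its anchor text and target URL
        "links": [(link.text, link.url) for link in br.links()],
        # raw response body
        "source": br.response().read(),
    }
```

For example, printing each entry of examine_page(br, "https://www.linkedin.com/")["forms"] shows every form on the page along with its controls.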
Interact with Form
Below is a snippet from the response for enumerating the forms. It is apparent that this is the login form, as it has an available TextControl for session_key (i.e. username) and session_password (i.e. password).
Unfortunately, the form does not have a name identifier, so we have to interact with it by using the index value.
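Selecting the form by index and filling it in could look roughly like this. The field names session_key and session_password come from the form listing above; the default index of 0 and the helper name mechanize_login are assumptions, so check the enumerated forms for the real position.

```python
def mechanize_login(br, username, password, form_index=0):
    """Fill in and submit the unnamed login form, selected by its index."""
    # nr= selects a form by position; needed because the form has no name.
    br.select_form(nr=form_index)
    br["session_key"] = username        # username field
    br["session_password"] = password   # password field
    return br.submit()                  # response to the form submission
```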
Test Login Success
Finally, after logging in, we can confirm that login was successful by examining the updated browser content. One easy way to do this on most sites is to review the links, to determine if there is any exclusively post-authentication functionality (such as a Log Out function, user network management, account management, etc.).
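The link check described above can be sketched as a small helper. The marker strings and the name is_logged_in are assumptions; the right markers depend on what the target site actually shows after authentication.

```python
# Assumed marker text; adjust to the post-authentication links of the site.
POST_AUTH_MARKERS = ("sign out", "log out", "logout")


def is_logged_in(link_texts, markers=POST_AUTH_MARKERS):
    """Return True if any link text contains a post-authentication marker."""
    for text in link_texts:
        lowered = (text or "").lower()  # some links have no text at all
        if any(marker in lowered for marker in markers):
            return True
    return False
```

With Mechanize this could be called as is_logged_in(link.text for link in br.links()) after submitting the form.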

Selenium

Import libraries and configure browser object
First we need to import the selenium library and instantiate the browser object.
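A minimal sketch of this step, assuming Selenium 4 with Firefox and geckodriver already installed. The helper name new_firefox_driver and the optional headless switch are our additions for illustration.

```python
def new_firefox_driver(headless=False):
    """Start Firefox under Selenium control and return the driver object."""
    # Imported here so the module can be loaded without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    if headless:
        # Run without a visible window (gives up the visual troubleshooting).
        options.add_argument("-headless")
    return webdriver.Firefox(options=options)
```

Calling driver = new_firefox_driver() opens a visible Firefox window that the following steps drive.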
Browse to the website and examine
Next we browse to the site and examine it.
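Browsing with the driver might be sketched as follows. The helper name open_site is hypothetical; driver is a Selenium WebDriver object.

```python
def open_site(driver, url):
    """Load url in the Selenium-controlled browser and report where we are."""
    driver.get(url)  # returns once the page load event has fired
    return {"title": driver.title, "url": driver.current_url}
```

For example, open_site(driver, "https://www.linkedin.com/") loads the homepage in the browser window and returns its title and final URL.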
Unlike with Mechanize, this will actually start a browser window, and we can examine the contents in the browser. In Firefox, you can right-click on any element in the browser, then click "Inspect Element".

This will bring up the element in Inspector. From here, you can right-click on the element field and then select Copy --> XPath. This XPath can then be used to interact with the element using the Selenium browser object.

Interact with the login form
We can use these XPaths to supply the username and password to the appropriate fields, and then to click the Submit button.
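A sketch of this interaction, assuming Selenium 4's find_element(by, value) call. The helper name selenium_login and the xpaths dictionary are hypothetical; the XPath values themselves must be copied from Inspector, since none are given here.

```python
# Same string value as selenium.webdriver.common.by.By.XPATH
XPATH = "xpath"


def selenium_login(driver, username, password, xpaths):
    """Type the credentials into the located fields and click Submit.

    xpaths maps "username", "password", and "submit" to XPath strings
    copied from the browser's Inspector.
    """
    driver.find_element(XPATH, xpaths["username"]).send_keys(username)
    driver.find_element(XPATH, xpaths["password"]).send_keys(password)
    driver.find_element(XPATH, xpaths["submit"]).click()
```

A call would pass something like {"username": '//input[@name="session_key"]', ...}, where each value is a placeholder for whatever XPath the Inspector reports.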

Confirm login success
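One possible way to confirm the login with Selenium is to mirror the Mechanize check and look through the page's links for post-authentication text. This is only a sketch under that assumption: the marker strings and the helper name selenium_is_logged_in are ours, and a page that loads its links slowly may need a wait before this check.

```python
# Same string value as selenium.webdriver.common.by.By.TAG_NAME
TAG_NAME = "tag name"

# Assumed marker text; adjust to the post-authentication links of the site.
POST_AUTH_MARKERS = ("sign out", "log out", "logout")


def selenium_is_logged_in(driver, markers=POST_AUTH_MARKERS):
    """Return True if any visible link text contains a post-auth marker."""
    for anchor in driver.find_elements(TAG_NAME, "a"):
        lowered = (anchor.text or "").lower()
        if any(marker in lowered for marker in markers):
            return True
    return False
```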
