Threat actors constantly evolve their campaigns to stay successful as security tools get better and well-trained employees become more vigilant. Recently we observed an emerging phishing technique that uses HTML smuggling. It was first discovered during the “Duri” phishing campaign in H1 2020, and is now coming back.
There are two types of HTML Smuggling in the wild:
- JavaScript Blob
- Data URL encoding
This article covers the first method.
The attack begins with an e-mail carrying an .HTML attachment. The message mimics the scan-to-mail feature of HP Deskjet printers:
And of course, the attachment is not a PDF, as stated in the text, but an HTML file.
The file is made up of three sections:
1. The initial set of functions from the first <script> tag
2. The big base64-encoded part, saved as the _n variable
3. The set of functions that contains a base64 decode function and a few more
You might wonder where the HTML code is. The concept behind this attack is simple. The output of the first part (a set of JS functions) is the HTML code that displays the first landing page. The second part is the base64 blob, which is decoded in the third section, where the victim's e-mail address is also appended.
There are a few ways to quickly deobfuscate the code and see what is happening behind the scenes. I chose to use the Developer Tools and the debugger built into the Chromium browser and to play with the code a bit. A quick search for a few keywords (there are only a few ways of displaying HTML code via JS) allowed me to find a document.write call:
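The same keyword search can be scripted when there are many attachments to triage. The snippet below is a minimal sketch, not the tooling used in this analysis: the pattern list and the find_sinks name are my own assumptions, and the sample string is a made-up stand-in for a real attachment.

```python
import re

# Common ways for JavaScript to write HTML into a page or unpack a payload.
# The list is an illustrative assumption, not an exhaustive catalogue.
SINK_PATTERNS = {
    "document.write": r"document\s*\.\s*write(?:ln)?\s*\(",
    "innerHTML": r"\.\s*(?:inner|outer)HTML\s*=",
    "eval": r"\beval\s*\(",
    "atob": r"\batob\s*\(",
    "createObjectURL": r"createObjectURL\s*\(",
}


def find_sinks(source: str) -> dict[str, list[int]]:
    """Map each sink name to the 1-based line numbers where it appears."""
    hits: dict[str, list[int]] = {}
    for lineno, line in enumerate(source.splitlines(), start=1):
        for name, pattern in SINK_PATTERNS.items():
            if re.search(pattern, line):
                hits.setdefault(name, []).append(lineno)
    return hits


# Made-up stand-in for an attachment; build() is a hypothetical function name.
sample = """<script>
var _n = "PGh0bWw+";
document.write(build());
</script>"""

print(find_sinks(sample))
```

The reported line numbers are good candidates for the first breakpoints in the debugger.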
It’s a good idea to first set up some breakpoints and get an understanding of the purpose of this code.
Since I had already found a document.write call, I decided to investigate its content and what is displayed. A small adjustment in the code allowed me to extract the whole output:
Let’s now copy it, do some quick formatting, and save it in a separate file. Partial results can be found below. Just by checking the first few lines (like the title tag), we can already see that it’s a phishing webpage that loads a lot of Microsoft-related content from the cloud.
Just to be sure, I fired it up in my sandbox browser and allowed it to load some resources. Here’s a preview of how the website looks at that stage when run in a browser:
It looks legitimate, and at this stage it’s not asking for any credentials. To add more authenticity, the attackers decided to show a “Stay signed in?” prompt first. Of course, the victim does not really have a choice. Both “No” and “Yes” run the same JS function, as both buttons are “submit” buttons. This is how the form tag looks:
Either option leads to running start(event), which brings us to the third section. Let’s have a look at the code behind it:
This part is straightforward and not even encoded or obfuscated. It’s possible to analyze it line by line, but to save time I decided to debug and analyze the code in action. The most important line is of course this one:
window.location.href redirects to the output of toText, which takes the base64 code (second section) and appends the ap variable, which is the victim’s e-mail address prefixed with a “#” symbol.
Long story short: clicking No/Yes leads to start(event), which calls toText, a function that decodes the second section (base64 code), creates a Blob and obtains a URL for it via the JS createObjectURL(data) call. As a last step, the browser is redirected to that URL via window.location.href.
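For analysts who prefer to unpack the second stage outside the browser, the decoding step is easy to reproduce offline. This is a minimal sketch under stated assumptions: the payload is a tiny placeholder rather than the real _n value, the blob: URL is made up, and decode_second_stage and redirect_target are hypothetical helper names.

```python
import base64


def decode_second_stage(b64_payload: str) -> str:
    """Decode the base64 blob (the _n variable) into the HTML it hides."""
    return base64.b64decode(b64_payload).decode("utf-8", errors="replace")


def redirect_target(object_url: str, victim_email: str) -> str:
    """Mimic the final redirect: object URL plus '#' and the e-mail address."""
    return f"{object_url}#{victim_email}"


# Placeholder payload: base64 of a tiny HTML document, not the real sample.
placeholder = base64.b64encode(b"<html><title>Sign in</title></html>").decode("ascii")

html = decode_second_stage(placeholder)
print(html)

# In the browser, createObjectURL returns a blob: URL; this value is made up.
print(redirect_target("blob:null/00000000-0000-0000-0000-000000000000",
                      "victim@example.com"))
```

Saving the decoded HTML to a file gives the second stage without ever clicking through the lure.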
What is a BLOB?
Before we go further, this information is needed to understand what happens in the browser in the next steps. According to the MDN Web Docs:
The Blob object represents a blob, which is a file-like object of immutable, raw data; they can be read as text or binary data, or converted into a ReadableStream so its methods can be used for processing the data.
Second stage – final phishing page
Here is a preview of the final page in the Chromium browser. Look at the address bar.
Time to analyze some code. Starting with the HTML, this part is just a form with the sendmails() function assigned to the button click, which means sendmails() is called when the victim presses the “Sign in” button:
The sendmails() function is hidden among hundreds of other functions. Thanks to CTRL+F we can easily get to it:
Debugging or following the functions finally leads us to a well-known JS API:
XMLHttpRequest() is used to create a GET request and send all the data to the attackers’ server. The request is built through many steps of obfuscation:
- Various rounds of encoding and decoding
- Switching letters
- Adding or removing characters
- Generating the “debugger” keyword to make the debugging process hell
Seeing that, I decided to place some breakpoints around _0x4054be and _0xaa615c and run the script with “secretpassword” in the password field.
Below I present partial results of the debugging process, with the output of some functions shown in the Chromium hints.
- The URL of the attacker’s web server, built through multiple rounds of decoding, encoding and character swapping across an enormous number of functions. This is the output of _0xaa615c:
The “user” variable is part of the URL and is passed to the xxx.php file.
- The second variable, which will be sent via GET as “pass”:
- The password is sent right after &pass=
A few debugger steps later, XMLHttpRequest.send() is called, which means that the request has been created and is now being sent to the attacker’s infrastructure to be stored in their database:
To sum up, the GET request finally looks like this:
Tip of the day – Interception Proxy is the fastest way to get IoCs
Debugging and code analysis are time-consuming, and it is often necessary to get Indicators of Compromise as soon as possible. In this case and similar ones, we can just fire up an interception proxy (for example, Burp Suite) and get them in a few seconds.
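Once the proxy has captured the outgoing request, turning it into IoCs takes a few lines. The sketch below assumes a request of the shape described earlier (an xxx.php endpoint with user and pass parameters). The host name attacker.example is a placeholder, and extract_iocs and defang are hypothetical helper names.

```python
from urllib.parse import parse_qs, urlsplit


def extract_iocs(request_url: str) -> dict:
    """Split a captured request URL into the pieces worth blocking or hunting."""
    parts = urlsplit(request_url)
    params = parse_qs(parts.query, keep_blank_values=True)
    return {
        "host": parts.hostname,
        "path": parts.path,
        "parameters": sorted(params),
    }


def defang(host: str) -> str:
    """Write a host in the usual report-safe form, e.g. evil[.]com."""
    return host.replace(".", "[.]")


# Placeholder URL: only the xxx.php path and parameter names follow the analysis.
captured = ("https://attacker.example/xxx.php"
            "?user=victim%40example.com&pass=secretpassword")

iocs = extract_iocs(captured)
print(defang(iocs["host"]), iocs["path"], iocs["parameters"])
```

Only parameter names are kept in the output, so submitted test credentials do not end up in the IoC list.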
- Block HTM and HTML attachments in your mail systems
- Include this technique in your security awareness programs
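How the first recommendation is implemented depends on the mail gateway in use, so the snippet below is only a sketch of the check itself, written with Python's standard email package. The blocked_attachments name, the extension list and the sample message are assumptions for illustration.

```python
from email.message import EmailMessage

# Extensions to quarantine; extend the tuple to match your own policy.
BLOCKED_EXTENSIONS = (".htm", ".html")


def blocked_attachments(message: EmailMessage) -> list[str]:
    """Return the file names of attachments that look like HTML files."""
    flagged = []
    for part in message.iter_attachments():
        filename = part.get_filename() or ""
        is_html_name = filename.lower().endswith(BLOCKED_EXTENSIONS)
        is_html_type = part.get_content_type() == "text/html"
        if is_html_name or is_html_type:
            flagged.append(filename)
    return flagged


# Hypothetical message resembling the lure described in this article.
msg = EmailMessage()
msg["Subject"] = "Scanned document"
msg.set_content("Please find the scanned document attached.")
msg.add_attachment("<html></html>", subtype="html", filename="scan.html")
msg.add_attachment(b"%PDF-1.4", maintype="application", subtype="pdf",
                   filename="report.pdf")

print(blocked_attachments(msg))
```

For real traffic, the message would be parsed from raw bytes with email.message_from_bytes and policy=email.policy.default, which returns the same EmailMessage type.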
BONUS: Get more Indicators of Compromise by doing a few quick checks in PassiveTotal
This won’t be a deep dive into gathering more information about the attackers or the campaign via various Threat Intelligence platforms and resources. I just want to show how a few simple queries can improve the IoC list.
To start, we will jump into the domain resolution history of the first domain, the numerical one. There are three IP addresses as a result:
We will start by querying the first one (185.224.196[.]94) for associated domain names. Here are the results:
Just a few clicks and we already have an additional malicious domain.
Now, let’s look at the resolutions of 23.254.167[.]187:
And another one, which looks similar but is brand new: 874387873872387[.]com.
A quick query for it shows that there’s only one known IP resolution for this domain: 23.254.167[.]187. Let’s look it up and check the results.
Amazing. A lot of new domains to block on our proxy. Most of them contain a long, randomized number. It’s hard to imagine a real-life scenario in which someone wants a domain name containing a randomized sequence of digits. If your proxy allows regular expressions for blocking, maybe it’s time to use them. We will stop here and take a step back.
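Before pushing a regular expression to the proxy, it is worth testing it against known good and bad domains. The sketch below is one possible starting point. The threshold of 10 consecutive digits is my own assumption and should be tuned against legitimate traffic, and refang and has_long_numeric_label are hypothetical helper names.

```python
import re

# Assumption: "long" means a run of 10 or more digits; tune this to your traffic.
LONG_DIGIT_RUN = re.compile(r"\d{10,}")


def refang(indicator: str) -> str:
    """Turn a report-safe indicator such as evil[.]com back into evil.com."""
    return indicator.replace("[.]", ".")


def has_long_numeric_label(domain: str) -> bool:
    """True if any label left of the TLD contains a long run of digits."""
    labels = refang(domain).lower().rstrip(".").split(".")
    # Skip the last label (the TLD) and inspect everything to the left of it.
    return any(LONG_DIGIT_RUN.search(label) for label in labels[:-1])


for candidate in ("874387873872387[.]com", "privateworkdocument[.]xyz"):
    print(candidate, has_long_numeric_label(candidate))
```

A pattern like this catches only the numeric family, so domains such as privateworkdocument[.]xyz still need to be blocked by name.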
Let’s move on to “privateworkdocument[.]xyz”, which we got from the initial analysis. RiskIQ gives us two recent resolutions to the following IPs:
The first IP, 177.221.141[.]61, does not point to any more rogue domains. It’s only associated with the privateworkdocument[.]xyz domain. We are luckier with the second one (177.221.141[.]81):
The attackers made no mistake and generated separate SSL certificates for each site. All of them were generated via Let’s Encrypt. They contain different e-mail addresses, but the common element is the mention of the VESTA Control Panel:
Shodan confirms that the attacker uses the Vesta Control Panel on their websites: