This reconnaissance phase is about both finding the information needed to break in successfully and searching for data that could help the attackers locate sensitive information faster (the name of a key employee, for example). This post will mostly focus on the technical research used to find this information, from the attacker's point of view.

It is probably a good exercise at some point, maybe at the end of this blog post, to try to use the methods we will describe on your own name or company name. You will probably be surprised by the amount of information one can get, when looking carefully for it.

So, what are the main open resources of interest for an attacker targeting a company? Where can he find good intel about the target?

Corporate websites
The target can reveal a lot about itself and its activities on its own corporate websites. Names of subsidiaries, physical locations, contact addresses, partners, etc. can be found there, amongst other useful information.

Press releases
Press releases can sometimes offer useful information to an attacker. Company X signed a new partnership with company Y? Interesting. Company X has signed with an identified anti-virus company to supply it with their new anti-virus solution? Very interesting.

Public documents / white papers / reports
Interesting information can be found in these documents. Not only do they provide good information about research or projects running in the company, but they also usually provide attackers with the names of all the researchers involved. This is gold for social engineering.

Social networks
Facebook announced in 2012 that more than a billion people were using its site each month, while LinkedIn spread the word at the beginning of 2013 that it had 200 million members. While this last number should be taken with some caution, because it represents the total number of users who ever registered on the website, the number of registered LinkedIn users is still very impressive, and will probably keep growing. Once a company has reached a certain number of employees, it is fair to assume that at least one employee will be on LinkedIn or on another professional network of the kind. Often enough, people also mention their employer in their Facebook profiles or messages. Employees often leak sensitive information without knowing it. They believe no one except their friends will read what they write. This is only partially true, depending on the security settings they have on their different social network accounts.

Job offers
Job offer websites are a target of choice for information gathering. Carefully monitoring the job offers of a company can reveal a lot about it. You could guess what the company works on, see which kinds of projects it is hiring for more or less heavily, find out that it is running a service it does not openly speak about, etc. Seeing a company try to hire experts in a particular field can also be valuable to an attacker. Imagine a company tries to recruit an expert in a very specific firewall software. You can bet the software is used as the main firewall solution in a specific area of the company, or maybe everywhere. This is interesting for an attacker, especially when you know his next move will probably be to look for exploits for that specific software.

Search engines
Google knows a lot. It provides results in seconds, and even better ones for someone who knows exactly how it works and how queries can be optimized for specific searches. Did you know you could find all PDF documents located on one website with just one advanced Google query? Keep in mind that Google is not the only search engine. It is the most used and most popular, but there are many others. Some are even IT specific. This will be covered later in this post.

As for the attacker’s methods of hunting, there are two different approaches to information gathering: passive reconnaissance and active reconnaissance.

Please note that we will focus only on the information gathering methods of interest to APT attackers. We will not go through all techniques, since our attackers won’t use most of them. Interested readers should turn to books dedicated to penetration testing to see all these methods.




PASSIVE RECONNAISSANCE

Passive reconnaissance, for us, describes the methods and tools used by the attackers to find information on their target by exclusively using open resources.

There are natural steps you can easily think of when it comes to collecting information on a person or on a company or organization. The Internet is a wonderful tool when it comes to getting information. But are we really aware of all the information one can find there?

Passive reconnaissance is of particular interest to APT attackers: it is a kind of information gathering which usually does not ring any alarm bells on the target’s side, since there is no direct interaction between the attacker and the target.

Domain names / Whois

Domain names are usually the first information one can get about a company. Large companies usually have at least one corporate domain name, with several subdomains.

The most important thing to know here is that all domain names need to be registered to exist. They are registered through a registrar, and the information provided to these registrars is public, though it can be modified over time. The registration information is also called the “Whois” information.

The goal of this blog post is not to explain in detail how everything on the Internet works but to focus on interesting information. If you want to know exactly how the Whois service works, I encourage you to read the Wikipedia page about it.

The Whois service gives access to interesting information, including the host names of the domain's name servers (DNS), the creation date of the domain, the date of the last update, and the contact information for the domain, amongst other details. The easiest way to access the Whois service is via the “whois” command on a Linux system. Other options are to install a whois client on your operating system or to use an online whois service such as CentralOps.
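
As a minimal sketch of how these fields can be pulled out of raw Whois output, the Python snippet below collects "Key: Value" lines into a dictionary. The sample record is entirely made up for illustration, the field labels vary from one registry to another, and the function name parse_whois is our own choice.

```python
from collections import defaultdict


def parse_whois(raw: str) -> dict[str, list[str]]:
    """Collect 'Key: Value' lines from raw Whois text into a dict of lists."""
    fields: dict[str, list[str]] = defaultdict(list)
    for line in raw.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines (commonly prefixed with % or #).
        if not line or line[0] in "%#":
            continue
        # Split on the first colon only, so timestamps keep their colons.
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip().lower()].append(value.strip())
    return dict(fields)


# Hypothetical record, for illustration only (not real registration data).
SAMPLE = """\
Domain Name: EXAMPLE.COM
Registrar: Example Registrar, Inc.
Creation Date: 1995-08-14T04:00:00Z
Updated Date: 2013-01-10T09:00:00Z
Name Server: NS1.EXAMPLE.COM
Name Server: NS2.EXAMPLE.COM
Registrant Email: firstname.lastname@example.com
"""

record = parse_whois(SAMPLE)
print(record["name server"])
print(record["registrant email"])
```

Running this against the output of the whois command for your own domain is a quick way to see what a stranger would learn from it.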

The most interesting pieces of Whois data, from an attacker's point of view, are the contact information, the name servers and the registrar information. If an attacker is unsuccessful in collecting e-mail addresses elsewhere, this is a place where there will always be an e-mail address, which he can try to exploit, or which can sometimes help him guess other e-mail addresses within the company. If the e-mail address here is firstname.lastname@company.com, odds are good that the company uses this format for every employee. The attacker will then only need to know names to guess e-mail addresses.
As for the registrar information, it could be used in spear phishing attacks, in which the attackers pretend to be the registrar to get one or several employees of the target company to open an attached document. The chances that an employee opens an e-mail are greater when he recognizes a company name or person. If the employee is responsible for domain registration in his company, he might well open an e-mail from a registrar.
Name servers are also very interesting for attackers. They can reveal other domain names used by the company, or provide a starting point for further information gathering. Collecting specific information from the DNS servers will be shown later in this chapter.

Important note on anonymized domains: While it was quite rare to see anonymized domains some years ago, it is becoming more and more frequent. Most registrars offer anonymization to their customers. The idea is that instead of showing the customer's registration information, anonymized information is shown. A special e-mail address is still provided to reach the owner of the domain name, but it is a forwarding address run by the registrar or an anonymizing third party, making it useless for any investigation.

Archive.org
Several online services are able to provide archives of web sites. Archive.org, also called the “Wayback Machine”, is one of these services.

The idea behind these services is to store snapshots of web pages, and provide the user with older versions of these pages.

While this information is interesting for users in different situations, it is rarely used by APT attackers. They could still use these online services to find old versions of vulnerable websites and try to locate old parts of the website against which an exploit could be run, which is why we decided to mention this resource here.
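
To give an idea of how such an archive can be queried programmatically, here is a small sketch that only builds the request URL for the snapshot closest to a given date. It assumes the Wayback Machine's public availability endpoint at archive.org/wayback/available; the function name snapshot_query is ours, and no request is actually sent.

```python
from urllib.parse import urlencode

# Availability endpoint of the Wayback Machine. This is an assumption of the
# sketch: check the Internet Archive documentation before relying on it.
WAYBACK_AVAILABLE = "https://archive.org/wayback/available"


def snapshot_query(site: str, timestamp: str) -> str:
    """Build a URL asking for the snapshot closest to timestamp (YYYYMMDD)."""
    params = urlencode({"url": site, "timestamp": timestamp})
    return f"{WAYBACK_AVAILABLE}?{params}"


print(snapshot_query("google.com", "19981202"))
```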

Here is an example of what Google.com looked like on the 2nd of December 1998:

oldgoogle2.png

DNS Servers

DNS servers are a very good target for attackers. They can provide a lot of useful information, from both a passive and an active point of view.

DNS servers are a key part of every company’s public infrastructure, since they are aware of every publicly available named entity. An attacker who manages to get full access to one of the company's DNS servers knows everything about the target’s infrastructure in terms of network mapping.

Another point of interest for the attacker lies in how critical such a server is. Just like database servers (this is just an example), DNS servers are often considered so sensitive that they “should not be touched without high necessity”. They are therefore often unpatched and vulnerable to several exploits. System administrators often ignore DNS servers as long as they work. These servers are usually extremely stable, so the system administrators do not often check what happens there.

Another consequence of system administrators neglecting their DNS servers is that these servers are often misconfigured.

DNS zone transfer (also called AXFR) is the first kind of misconfiguration that attackers usually check. DNS zone transfers exist because companies usually deploy several DNS servers, for redundancy and load balancing purposes. As a result, there is a need for a mechanism to replicate the DNS database across all these servers, and this is where the zone transfer occurs. During this process, all the domain name/IP address mappings are transferred from one DNS server to another. The misconfiguration lies in a server answering zone transfer requests from any host instead of only from the company's other DNS servers.
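
For administrators who want to check their own name servers, the following sketch wraps the dig tool and parses whatever records come back. It is only an illustration under a few assumptions: dig is installed, the sample output is invented, and the helper names (axfr_command, parse_records, check_zone_transfer) are ours. It should only be run against servers you are responsible for.

```python
import subprocess


def axfr_command(zone: str, name_server: str) -> list[str]:
    """Return the dig invocation that requests a full zone transfer."""
    return ["dig", f"@{name_server}", zone, "AXFR"]


def parse_records(dig_output: str) -> list[tuple[str, str, str]]:
    """Extract (name, type, data) triples from dig's answer lines."""
    records = []
    for line in dig_output.splitlines():
        # dig prefixes comments and status lines with ';'.
        if not line.strip() or line.lstrip().startswith(";"):
            continue
        parts = line.split(None, 4)  # name, TTL, class, type, data
        if len(parts) == 5:
            name, _ttl, _cls, rtype, data = parts
            records.append((name, rtype, data))
    return records


def check_zone_transfer(zone: str, name_server: str) -> list[tuple[str, str, str]]:
    """Run dig against your own server; an empty list means no records came back."""
    result = subprocess.run(axfr_command(zone, name_server),
                            capture_output=True, text=True, timeout=30)
    return parse_records(result.stdout)


# Hypothetical dig output, for illustration only.
SAMPLE = """\
; <<>> DiG <<>> @ns1.example.com example.com AXFR
example.com.      3600 IN SOA ns1.example.com. admin.example.com. 1 7200 900 1209600 3600
www.example.com.  3600 IN A   192.0.2.10
vpn.example.com.  3600 IN A   192.0.2.20
"""

for name, rtype, data in parse_records(SAMPLE):
    print(name, rtype, data)
```

If check_zone_transfer returns records when called from a machine outside your network, the server is handing its full mapping to anyone who asks.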

One might wonder how frequent it is to come across such a misconfigured DNS server. A study by Sergey Belov, at the beginning of 2011, estimated that out of the Top 25000 from Alexa.com, about 2000 DNS servers allowed DNS zone transfers.

One tool of the trade for looking at DNS entries is called “dig”. Once again, as with the whois command, a command line version is available directly on Linux systems. You can also use one of the online services offering this feature.

We will not go into detail about what information DNS queries provide; let us just say it is interesting enough for an attacker.

Netcraft.com

Netcraft.com is a famous Internet monitoring website which can provide information on subdomains, in case additional information could not be obtained using a DNS zone transfer. One of the most interesting features Netcraft provides is its online search tool, which allows users to query its databases for host information. Moreover, it can be queried using wildcard searches.

Just from this single view, attackers can directly get a lot of useful information on subdomains, hosting, or even IT technologies used.

Here is a partial result on facebook.com for example:

fb-netcraft.png

E-Mail addresses gathering

E-mail addresses are among the most interesting elements that attackers can gather. They enable an attacker to send spear phishing e-mails (which we will detail in our next blog post relating to APT attacks) and try to infect specific or generic users.

Most average size companies use "easy to guess" e-mail addresses. As soon as the attacker has found the “e-mail generation algorithm” for the company, the game is over: he can reach hundreds or thousands of people. The most common way for companies to build their employees’ e-mail addresses is to simply use firstname.lastname@company.com, or sometimes firstletteroffirstnamelastname@company.com.
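
To show how little effort this “algorithm” takes to reproduce, here is a minimal sketch that derives the two formats mentioned above from a first and last name. The normalization rules (lowercasing, stripping accents, dropping anything that is not a letter) are our own assumptions, and real companies handle hyphens and duplicate names in their own ways.

```python
import unicodedata


def _normalize(name: str) -> str:
    """Lowercase, strip accents, and drop characters other than letters."""
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return "".join(ch for ch in ascii_only.lower() if ch.isalpha())


def candidate_addresses(first: str, last: str, domain: str) -> list[str]:
    """Return the two common address formats for one person."""
    first, last = _normalize(first), _normalize(last)
    if not first or not last:
        raise ValueError("both names must contain at least one letter")
    return [
        f"{first}.{last}@{domain}",    # firstname.lastname
        f"{first[0]}{last}@{domain}",  # first letter of first name + last name
    ]


print(candidate_addresses("José", "Martin", "company.com"))
```

Feeding a list of employee names found on a professional social network into such a function is all it takes to build a mailing list, which is why a predictable format is worth keeping in mind when assessing exposure.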

Since some employees of the company will probably be on social networks, their names are easy to find, which gives the attacker easy-to-reach e-mail addresses within the company.

theHarvester is a very powerful tool to gather subdomains, hosts, employee names, open ports and banners from different public sources such as search engines or PGP key servers. It is particularly useful for e-mail address discovery.

Try it on your domain name; you might be surprised.

Online publications

Depending on the industry, companies usually publish white papers or studies on a regular basis. These papers are of interest to attackers for four main reasons:

• They often show parts of the work that is done in the company on some projects, providing the attackers with more information about their target’s activities;
• They sometimes provide names of people working on these activities;
• They often provide e-mail addresses;
• They are often designed using the company’s template for publications. This is useful because a skilled social engineering attack could use it to design fake documents which would look “normal” to the company’s employees. A good recommendation regarding these document templates would be to have one template for internal documents, and one template for public release, which would be very different or at least contain some recognizable patterns. This could also help the company find documents which might have leaked onto the Internet. No document with the internal template should be publicly available.

Social networks

Social networks provide a lot of useful information about employees, but also about the company. While some data probably leaks on Facebook, attackers are more interested in professional social networks like LinkedIn.com, which is probably the website attackers use the most in the passive reconnaissance phase. This website allows anyone to anonymously hunt for employees of companies and see their profiles.

APT attackers lurk on such social networks to:
• Get a more precise idea of the company in general
• Get more information on the departments and their people
• Try to find sensitive project names
• Establish who works on what project
• Locate the “important” people in the company related to data theft/strategic information: CEO, CISO, experts, etc. but also their assistants.

It might sound surprising at first, but a lot of people reveal far too much about their professional activities on LinkedIn. There are many reasons for this. Some people just want to brag about cool stuff they do at work; others are maybe a bit self-centered and want to say as much as possible about themselves. But the main reason for such data leaks is probably that people who are specialized in a field need to show what they have worked on, so that recruiters and head hunters knock at their door more often.

As an example, a LinkedIn search on combined words like “project manager missile” returns a fair number of results. Picking one person from these results, here is a part of his public profile:

--
xxxx Project
Missile launcher.
- Conception of a missile launcher.
- Functional specification and user manual writing.
- Test and validation of the missile launcher simulation.
--

Please note that these results have been anonymized. The name of the project has been replaced with “xxxx” and the description of the project has been altered to protect this person. In any case, an attacker looking for information on missile launching technologies in this precise company will probably dig into this profile, try to collect more information about this person, try to hack into his e-mail accounts, etc. The blog post on the next phase will detail these operations.

This single search could have been restricted to one company of our choice. These results highlight the fact that finding an “interesting” person within a company is often very easy for an attacker.

At the time of this writing, LinkedIn.com provides a lot of free features. An attacker can easily register on LinkedIn with a fake identity, send loads of invitations (to expand his 3rd level contacts), and start browsing through other people’s profiles anonymously.




ACTIVE RECONNAISSANCE

The next phase after passive reconnaissance is active reconnaissance. Active reconnaissance involves more preparation from the attackers, because active reconnaissance leaves traces, which might trigger alerts on the target’s side or provide information about the attackers in the case of an investigation.

Anonymity

Active reconnaissance is generally the point where attackers start deploying their anonymization infrastructure and tools. The most common methods to stay anonymous on the Internet are:

• Use “public” Internet access. Open hotspots and free Wi-Fi access can be found in a lot of places. It is an easy way to become anonymous, yet it is often very limited: poor transfer rates, restriction to HTTP only, etc.

• Use online services/resources. Some online services can easily help the attackers in their active reconnaissance phase. Hidemyass provides proxies or VPNs that the attackers can use; Anonymizer provides online privacy when browsing websites, etc.

• Compromise a third party server, and use it to “bounce” to the target. The idea is to use the compromised server as a proxy server to reach the target.

• Buy a third party server. This method has the disadvantage of leaving traces, both financial and technical. Attackers need to buy a dedicated server somewhere, from an ISP. The ISP will need a credit card number and an identity. Attackers could use a stolen credit card number and identity, but it would only last for a few days, until the ISP became aware that the credit card number was stolen. Moreover, the attacker’s IP address will be logged, so the attacker would need to anonymize this operation as well. Buying a server this way is probably not a widespread method in the APT world. It is used rather by cybercriminals interested in building phishing websites, because they have plenty of credit card information and only need servers for a few days.

• Use “bulletproof” services. A bulletproof ISP is an ISP that knowingly provides different services to cybercriminals: the ability to send billions of spam e-mails, hosting of criminal content (drug offers, child pornography, malware C&C, etc.). The ISP guarantees that it will not provide any information to law enforcement, and will not reply to any abuse complaint. Bulletproof hosting companies generally allow cybercriminals to bypass the laws of their own country, as many of these bulletproof providers are based overseas. Many of them are in China and elsewhere in Asia, in Russia, or in Russia's neighboring countries.

These are the methods most used by cybercriminals of all kinds to protect their identity and stay safe from investigations. APT attackers are professionals. While there has been no information regarding the methods they use to anonymize themselves in the active reconnaissance phase, it is highly probable that they use the same methods as during the attacks themselves. They mostly use compromised third parties located in the US or Asia. They also use bulletproof ISPs, and in some cases part of the IP address ranges they use are probably bulletproof ranges sponsored by their governments.

Search engines / Google search operators

Search engines are used during the passive reconnaissance phase, as we have seen earlier, but only to collect information on third party websites. No browsing is done on the company’s website or its subsidiaries’ websites. In active reconnaissance, the target websites themselves will be browsed.

An interesting way to search for information on the Internet is to use metasearch engines. Metasearch engines are search tools that query several search engines and return the combined results to the user.

Building a list of online metasearch engines able to query hundreds of other search engines can be a good idea, if you really don’t want to miss any information. Software like Copernic Agent can be useful too, since it is installed locally, queries hundreds of search engines and provides results which can be managed offline or exported.

That said, Google is still the most used search engine, and it provides good results if you know how to use it. “Google search operators” are especially useful to get targeted information. These operators are keywords that enable the user to get more accurate data from the Google search engine.
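
As a small illustration, the sketch below assembles a query from two well-known operators, site: and filetype:, which is how the “all PDF documents on one website” search mentioned earlier can be expressed. The function name build_query and the example domain are ours; the result is simply the text one would type into the search box.

```python
from typing import Optional


def build_query(site: str, filetype: Optional[str] = None, terms: str = "") -> str:
    """Combine search operators and free-text terms into one query string."""
    parts = [f"site:{site}"]
    if filetype:
        parts.append(f"filetype:{filetype}")
    if terms:
        parts.append(terms)
    return " ".join(parts)


print(build_query("example.com", filetype="pdf"))
print(build_query("example.com", terms='"internal use only"'))
```

Running such queries against your own domain shows which documents are already indexed and publicly reachable.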

Attackers can also find real vulnerabilities using special queries. Several online services provide lists of “dangerous” Google searches, whose results range from useless leaked data to usernames, passwords, databases, or vulnerabilities to exploit. An example is the “Google Dorks” from the Google Hacking Database (GHDB).

ghdb.png

Whether attackers use only Google or several search engines, the result is often very positive for them. Search engines provide information, sometimes too much, and that is what the attackers are looking for.

Web site copiers

Looking at the source code of web pages can be interesting for an attacker. It can reveal security holes, vulnerabilities, network structure problems, or even leak information.

Depending on the target’s website, there could be very few web pages, or several thousand, with links to third party hosting companies handling parts of the website… It can get complex quite fast, and browsing through this data “by hand” is close to impossible.

Attackers use website copiers instead. The most famous is called HTTrack, available for free for Linux and Windows operating systems. HTTrack makes “page by page” copies of complete websites, including pages, links, pictures and documents from the target website. This copy is local and can be comfortably browsed offline. HTTrack can be configured to follow links up to a given depth, which makes it possible to copy third party web pages linked from the target’s web pages.

The major problem for attackers using such a tool to copy a whole website is that it is very noisy on the network, and therefore very easy to detect, trace and investigate. It is also often considered a highly offensive action, triggering alerts on the target’s side, which can lead to being banned very quickly.

The more data the attackers copy from the website, the more likely the activity is to be discovered. One way to avoid detection with such tools would be to copy the website through multiple proxies which would each collect only a few pages, or over a long period of time.

Once attackers are in possession of such a copy, they can start looking for:

• Mistakes in the source code of the pages
• Vulnerable CMS (Content Management System) and other code vulnerabilities
• Data leaks (passwords, database access details, IP addresses, forgotten/old pages etc.)
• E-Mail addresses / IP addresses / subdomains
• Job offers
• Project names / people
• …
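
To give an idea of how quickly one item of this list can be automated, here is a minimal sketch that collects e-mail addresses from a local copy of a website. The regular expression is deliberately simple and will miss obfuscated addresses, the directory layout is an assumption, and the names extract_emails and scan_mirror are ours. Pointing it at a copy of your own website shows what is exposed.

```python
import re
from pathlib import Path

# Simple pattern: local part, "@", then at least two dot-separated labels.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def extract_emails(text: str) -> set[str]:
    """Return the distinct e-mail addresses found in a piece of text."""
    return {match.lower() for match in EMAIL_RE.findall(text)}


def scan_mirror(root: str) -> set[str]:
    """Scan every .htm/.html file under root (a local website copy)."""
    found: set[str] = set()
    for path in Path(root).rglob("*.htm*"):
        if path.is_file():
            found |= extract_emails(path.read_text(errors="ignore"))
    return found


# Hypothetical page fragment, for illustration only.
SAMPLE = '<a href="mailto:Jane.Doe@example.com">Contact</a> or press@example.com.'
print(sorted(extract_emails(SAMPLE)))
```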

There are many possibilities here. It is unfortunately very common for websites to be vulnerable because their CMS is not updated very often, sometimes never.

APT attackers rarely use web copiers to find vulnerabilities. Rather, they collect e-mail addresses, job offers describing projects and/or software, and any information regarding ongoing projects in the company.

Vulnerability scanners

Vulnerability scanners are tools used to scan a remote host for vulnerabilities. Most penetration testing books treat scanning as a complete phase in the penetration testing life cycle.

The three usual phases of vulnerability scanning are determining which systems are alive, which ports are open on these systems, and which vulnerabilities are present.

This is the moment most script-kiddies enjoy, because it gives them a great feeling of “lurking” on servers on the Internet and it is the phase just before attacking a host. They also enjoy it because some tools do scanning and exploiting at the same time. These are not APT attackers.

In fact, most APT attackers launch very few vulnerability scans, at least on their target’s networks. They might use vulnerability scanning to help compromise third party servers, which they will only use as proxies for their attacks. There are exceptions though, like Shell Crew for example, who compromise servers as a point of entry rather than using spear phishing.

Mail server information gathering

Mail servers are interesting for attackers. One common way of getting more information about a mail server is actually… to send it an e-mail. The goal here is to send any of the company’s e-mail addresses a message which will hopefully be rejected. This rejection will provide the attacker with several pieces of useful information he might exploit later. The best way to do this is to send an e-mail which contains a legitimate .exe or .bat file. Usually, files of these types are rejected because they are considered a threat.

The answer from the server will at least contain its IP address/domain name and probably the server’s brand and version. It might also include an automated message which indicates the reason for the rejection, and maybe the types of files accepted. It might also contain the name of the anti-virus and its version, which is very valuable information for an APT attacker.
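
To see what your own rejection messages disclose, the sketch below uses Python's standard email module to list the Received and X-* headers of a saved message, since these are the headers that typically name software and versions. The sample message is invented (the product names ExampleMTA and ExampleAV do not exist), and the function name disclosed_headers is ours.

```python
from email import message_from_string


def disclosed_headers(raw_message: str) -> dict[str, list[str]]:
    """Return the Received and X-* headers of a message, keyed by header name."""
    msg = message_from_string(raw_message)
    disclosed: dict[str, list[str]] = {}
    for name, value in msg.items():
        if name.lower() == "received" or name.lower().startswith("x-"):
            # Header values may be folded over several lines; collapse whitespace.
            disclosed.setdefault(name, []).append(" ".join(value.split()))
    return disclosed


# Hypothetical bounce message, for illustration only; product names are invented.
SAMPLE = """\
Received: from mx1.example.com (mx1.example.com [192.0.2.25])
\tby mail.example.net with ESMTP (ExampleMTA 4.2); Mon, 1 Jul 2013 10:00:00 +0000
X-Virus-Scanned: ExampleAV 7.1 at mx1.example.com
From: MAILER-DAEMON@example.com
Subject: Undeliverable: attachment type not allowed

The attachment report.exe was rejected by policy.
"""

for name, values in disclosed_headers(SAMPLE).items():
    print(name, values)
```

If the output names your mail server software or anti-virus product and version, anyone who triggers a rejection learns the same thing.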

Private e-mail addresses

Companies usually have a password policy which forces users to change their password once in a while. Passwords are generally valid for between one and three months, and the user often cannot reuse an old password.

Usual online e-mail providers like Google Mail or Hotmail do not force such change. The users have one password, and most of them do not ever change it.

An attacker could find the personal e-mail address of an employee and gain access to it by compromising his machine, or by phishing him and getting his password. Why? Because a lot of people use e-mail to send information from work to home and the other way round. The mailbox might therefore contain information of interest to the attacker. The attacker will also have access to all of the victim's e-mails, and see all the private relationships he has with other employees, making it easier to build a good spear phishing e-mail later on, in the initial compromise phase of the APT.

Conclusion on the reconnaissance phase

To completely understand the reconnaissance phase, it is strongly advised to read penetration testing literature, which provides more details, tool descriptions and techniques.

Several reconnaissance methods, techniques, and tools have been presented in this blog post. There are many more possibilities during this phase, but these were the most basic and comprehensive ones.

APT attackers definitely start their attacks with an effective reconnaissance phase, yet it is more about gathering intelligence on the target than finding vulnerabilities. Identifying key people in a company (those working on cutting edge technologies) is what they are after. They also need to have a good idea of the company and how it is structured.

We have seen plenty of evidence of this “good target knowledge” during incident response operations for our customers. In most cases, the attackers are not only present in the main network of the company, but also in interesting sub-networks which store sensitive information.

You might be surprised, but APT attackers often choose the simplest way, in every step of their attacks. "If a simple thing works, why make it more complex?" could be their motto.

In our next blog post, we will dig into the initial compromise phase and see how the data collected during the reconnaissance phase can be used to target the victims and penetrate their systems.