# nmap -6 -O scanme.nmap.org
OS details: Linux 2.6.32 - 3.0.0
Network inventory.
Tailoring of likely exploits.
Social engineering.
Threats to anonymity through fingerprinting.
Send probes designed to uncover differences in implementations. RFC ambiguities, implementation bugs, and changing standards can all lead to different replies.
Error responses are often not fully specified or fully tested. (What do you do when you receive a TCP packet with both SYN and FIN set?)
Example: Send a TCP SYN packet with the TCP timestamp option (RFC 1323). If the remote host supports timestamps, it will include the option in its response, otherwise it won't.
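A minimal sketch of the decision this probe enables, assuming the raw TCP option bytes of the reply have already been captured (the capture step is omitted). It walks the options field using the standard layout: kinds 0 (EOL) and 1 (NOP) are single bytes, and every other kind carries a length byte. The function names are hypothetical.

```python
from __future__ import annotations

TCP_OPT_EOL = 0        # End of Option List: a single byte
TCP_OPT_NOP = 1        # No-Operation padding: a single byte
TCP_OPT_TIMESTAMP = 8  # RFC 1323 timestamp option (kind 8, length 10)


def option_kinds(opts: bytes) -> list[int]:
    """Return the option kinds, in order, found in a raw TCP options field."""
    kinds = []
    i = 0
    while i < len(opts):
        kind = opts[i]
        kinds.append(kind)
        if kind == TCP_OPT_EOL:
            break  # nothing after EOL is parsed
        if kind == TCP_OPT_NOP:
            i += 1
            continue
        # Every other kind is followed by a length byte covering kind + length + data.
        if i + 1 >= len(opts):
            raise ValueError("truncated option")
        length = opts[i + 1]
        if length < 2 or i + length > len(opts):
            raise ValueError("bad option length")
        i += length
    return kinds


def has_timestamp(reply_opts: bytes) -> bool:
    """True if the reply carries a timestamp option, i.e. the host supports RFC 1323 timestamps."""
    return TCP_OPT_TIMESTAMP in option_kinds(reply_opts)
```

For an option sequence like the Linux SYN/ACK in this section (MSS, SACK permitted, timestamp, NOP, window scale), option_kinds returns [2, 4, 8, 1, 3] and has_timestamp is true; a reply carrying only an MSS option returns false.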
As of 2011-10-18, Nmap has 3611 OS fingerprints (in a monster 66000-line database file). 711 (20%) are Linux fingerprints. 645 (18%) are Windows. 210 (6%) are Cisco devices.
A response fingerprint is compared against every fingerprint in the database, and the one with the lowest (weighted) number of mismatches is the winner. A simple pattern language allows inexact matches, such as value ranges. 100% matches are common.
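A minimal sketch of that matching step, assuming each fingerprint line has already been split into test/value pairs (for example T=FA-104 from the PlayStation 2 entry). Only two pattern forms are handled: `|` alternatives and inclusive hexadecimal ranges such as FA-104. The real database syntax may offer more, and the per-test weights and helper names are hypothetical.

```python
from __future__ import annotations
import re

HEX_RANGE = re.compile(r"^([0-9A-F]+)-([0-9A-F]+)$")
HEX_VALUE = re.compile(r"^[0-9A-F]+$")


def value_matches(pattern: str, observed: str) -> bool:
    """Check one test value against a reference pattern such as 'FA-104', 'Y' or 'O|U'."""
    for alt in pattern.split("|"):  # '|' separates alternatives
        rng = HEX_RANGE.match(alt)
        if rng and HEX_VALUE.match(observed):
            lo, hi = int(rng.group(1), 16), int(rng.group(2), 16)
            if lo <= int(observed, 16) <= hi:  # inclusive hexadecimal range
                return True
        elif alt == observed:  # anything else must match exactly
            return True
    return False


def weighted_mismatches(reference: dict[str, str], observed: dict[str, str],
                        weights: dict[str, int]) -> int:
    """Sum the weights of tests whose observed value the reference does not allow."""
    return sum(weights.get(test, 1)
               for test, pattern in reference.items()
               if test in observed and not value_matches(pattern, observed[test]))


def best_match(db: dict[str, dict[str, str]], observed: dict[str, str],
               weights: dict[str, int]) -> tuple[str, int]:
    """Return the reference fingerprint with the fewest weighted mismatches."""
    name = min(db, key=lambda n: weighted_mismatches(db[n], observed, weights))
    return name, weighted_mismatches(db[name], observed, weights)
```

Tests missing from either side are skipped rather than counted as mismatches. That is one plausible policy among several.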
We typically get about 1500 submissions every 6 months, and about half of those are usable. Most of the usable ones result in a change to an existing entry, and a small fraction result in a new entry.
Fingerprint Sony PlayStation 2 game console
Class Sony | embedded || game console
SEQ(SP=12-40%GCD=1-6%ISR=5F-8B%TI=I%CI=I%II=RI%SS=O%TS=U)
OPS(O1=M5B4%O2=M578%O3=M280%O4=M5B4%O5=M218%O6=M109)
WIN(W1=8000%W2=8000%W3=8000%W4=8000%W5=8000%W6=8000)
ECN(R=Y%DF=Y%T=FA-104%TG=FF%W=8000%O=M5B4%CC=N%Q=)
T1(R=Y%DF=Y%T=FA-104%TG=FF%S=O%A=S+%F=AS%RD=0%Q=)
T2(R=N)
T3(R=Y%DF=Y%T=FA-104%TG=FF%W=8000%S=O%A=S+%F=AS%O=M109%RD=0%Q=)
T4(R=Y%DF=Y%T=FA-104%TG=FF%W=8000%S=A+%A=S%F=AR%O=%RD=0%Q=)
T5(R=Y%DF=Y%T=FA-104%TG=FF%W=8000%S=A%A=S+%F=AR%O=%RD=0%Q=)
T6(R=Y%DF=Y%T=FA-104%TG=FF%W=8000%S=A%A=S%F=AR%O=%RD=0%Q=)
T7(R=Y%DF=Y%T=FA-104%TG=FF%W=8000%S=A%A=S+%F=AR%O=%RD=0%Q=)
U1(DF=Y%T=FA-104%TG=FF%IPL=38%RIPL=G%RID=G%RIPCK=G%RUCK=G%RUD=G)
IE(DFI=S%T=20-3A%TG=40%CD=S)
###[ IPv6 ]###
  version = 6L
  tc = 0L
  fl = 74565L
  plen = 40
  nh = TCP
  hlim = 53
  src = 2001:470:1f04:155e::2
  dst = 2600:3c01::f03c:91ff:fe93:cd19
###[ TCP ]###
  sport = 51801
  dport = ssh
  seq = 3458888558
  ack = 4263257805
  dataofs = 10L
  reserved = 0L
  flags = FSPU
  window = 256
  chksum = 0x8d5
  urgptr = 0
  options = [('WScale', 10), ('NOP', None), ('MSS', 265), ('Timestamp', (4294967295, 0)), ('SAckOK', '')]
###[ IPv6 ]###
  version = 6L
  tc = 0L
  fl = 0L
  plen = 40
  nh = TCP
  hlim = 58
  src = ::
  dst = ::
###[ TCP ]###
  sport = ssh
  dport = 48586
  seq = 2710224669
  ack = 530265079
  dataofs = 10L
  reserved = 0L
  flags = SA
  window = 14080
  chksum = 0xf2cf
  urgptr = 0
  options = [('MSS', 1420), ('SAckOK', ''), ('Timestamp', (1351602082, 4294967295)), ('NOP', None), ('WScale', 7)]
###[ IPv6 ]###
  version = 6L
  tc = 0L
  fl = 0L
  plen = 24
  nh = TCP
  hlim = 128
  src = ::
  dst = ::
###[ TCP ]###
  sport = microsoft_ds
  dport = 45894
  seq = 1705792087
  ack = 2935989184
  dataofs = 6L
  reserved = 0L
  flags = SA
  window = 16430
  chksum = 0x34cf
  urgptr = 0
  options = [('MSS', 1440)]
###[ IPv6 ]###
  version = 6L
  tc = 0L
  fl = 0L
  plen = 20
  nh = TCP
  hlim = 64
  src = ::
  dst = ::
###[ TCP ]###
  sport = ftp
  dport = 39343
  seq = 0
  ack = 2601789566
  dataofs = 5L
  reserved = 0L
  flags = RA
  window = 0
  chksum = 0xdf69
  urgptr = 0
  options = {}
TCP options alone provide an idea of the large variety of implementations.
[]
['MSS']
['MSS', 'NOP', 'NOP', 'SAckOK', 'NOP', 'NOP', 'Timestamp']
['MSS', 'NOP', 'NOP', 'SAckOK', 'NOP', 'WScale']
['MSS', 'NOP', 'NOP', 'SAckOK', 'NOP', 'WScale', 'NOP', 'NOP', 'Timestamp']
['MSS', 'NOP', 'WScale', 'NOP', 'NOP', 'SAckOK', 'NOP', 'NOP', 'Timestamp']
['MSS', 'NOP', 'WScale', 'NOP', 'NOP', 'Timestamp', 'SAckOK', 'EOL']
['MSS', 'NOP', 'WScale', 'NOP', 'NOP', 'Timestamp', 'SAckOK', 'NOP', 'NOP']
['MSS', 'NOP', 'WScale', 'SAckOK', 'Timestamp']
['MSS', 'SAckOK', 'Timestamp', 'NOP', 'WScale']
['NOP', 'NOP', 'Timestamp', 'MSS', 'NOP', 'WScale', 'NOP', 'NOP', 'SAckOK']
['SAckOK', 'Timestamp', 'MSS', 'NOP', 'WScale']
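Nmap's IPv4 engine condenses option lists like these into the O= strings seen in the PlayStation 2 fingerprint (O1=M5B4, for instance). The sketch below follows Nmap's published letter codes for that test: L = EOL, N = NOP, M = MSS with its value, W = window scale, T = timestamp followed by one 0/1 digit each for zero/nonzero TSval and TSecr, and S = SACK permitted. Two details are assumptions: the Scapy-style input format and printing the window scale in hex. The IPv6 engine described here turns options into feature-vector elements rather than a string, so treat this only as a compact way to compare orderings.

```python
from __future__ import annotations


def nmap_o_string(options: list[tuple[str, object]]) -> str:
    """Encode a Scapy-style TCP option list in the letter code of Nmap's IPv4 O test."""
    parts = []
    for name, value in options:
        if name == "EOL":
            parts.append("L")
        elif name == "NOP":
            parts.append("N")
        elif name == "MSS":
            parts.append("M%X" % value)            # MSS value in hex
        elif name == "WScale":
            parts.append("W%X" % value)            # shift count in hex (assumption)
        elif name == "Timestamp":
            tsval, tsecr = value
            # One digit per field: 1 if nonzero, 0 if zero.
            parts.append("T%d%d" % (tsval != 0, tsecr != 0))
        elif name == "SAckOK":
            parts.append("S")
        else:
            raise ValueError(f"unhandled option {name!r}")
    return "".join(parts)
```

Applied to the Linux SYN/ACK options dumped above, this gives M58CST11NW7. The Windows reply gives just M5A0.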
Through experience we know that an OS classification engine should be able to distinguish certain Linux micro-revisions, Windows service packs, and major releases of OS X.
It is not usually possible to distinguish GNU/Linux distributions, but embedded Linux usually stands out, and iOS tends to differ from plain Mac OS X.
Some distinguishing features, such as the TCP initial window, can disappear when scanning localhost.
Linux 2.4.21
Linux 2.6.11 - 2.6.15
Linux 2.6.18 - 2.6.30
Linux 2.6.23 - 2.6.33 (embedded)
Linux 2.6.27
OpenWrt (Linux 2.6.32)
Linux 2.6.32 - 2.6.35
Linux 2.6.32 - 3.0.0
Linux 2.6.32 - 3.0.0
Linux 2.6.32 - 3.0.0
Linux 2.6.35
Linux 2.6.35 - 2.6.39 (localhost)
Vyatta Core 6.3 (Linux 2.6.37)
Linux 2.6.38 - 2.6.39
Linux 2.6.39
Send probes, collect responses.
Convert into one-dimensional feature vectors.
Classify with LIBLINEAR using a model trained offline (-s 0: L2-regularized logistic regression, primal).
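The classification step can be sketched as follows, assuming the offline model has already been reduced to one weight vector and bias per OS class. For multi-class problems LIBLINEAR trains one-vs-rest models. This sketch applies the logistic function to each class's decision value and normalizes the results, which is intended to mirror how LIBLINEAR reports multi-class probabilities. It reimplements the arithmetic in plain Python rather than calling LIBLINEAR, and the two-feature model used to test it is made up.

```python
from __future__ import annotations
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_ovr(weights: dict[str, list[float]], bias: dict[str, float],
                x: list[float]) -> list[tuple[str, float]]:
    """Score feature vector x against each class's one-vs-rest model, best first."""
    raw = {}
    for label, w in weights.items():
        z = sum(wi * xi for wi, xi in zip(w, x)) + bias.get(label, 0.0)
        raw[label] = sigmoid(z)
    total = sum(raw.values())
    # Normalize so the per-class probabilities sum to 1.
    return sorted(((label, p / total) for label, p in raw.items()),
                  key=lambda item: item[1], reverse=True)
```

Returning the whole ranked list, not just the winner, keeps the runner-up scores available for later checks.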
Started with a test program and candidate list of about 150 probes.
From about 30 complete sets of responses, narrowed the list to 18 probes (not all are sent in every case).
We send up to 18 probes:
S1–S6
IE1
IE2
NI
NS
U1
TECN
T2–T7
The IE2 probe uses a highly bogus set of extension headers.
Extension header chain, in order: hop-by-hop, destination options, routing options, hop-by-hop.
What’s interesting is that while all OSes reject this with an ICMPv6 Parameter Problem message, they differ on what exactly is wrong with it.
ICMP ParamPointer: (0, 48, 56, 64)
ICMP Param Code: (1, 2)   # unrecognized Next Header, unrecognized IPv6 option
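To see where the nonzero pointer values can fall, here is a sketch that lays out the IE2 header chain as raw bytes. It assumes every extension header is the minimum 8 bytes: hop-by-hop and destination options are each padded with one PadN option, and the routing header uses an arbitrarily chosen type 0 with Segments Left = 0. The source does not give the real header sizes or contents, so the offsets are illustrative. With these sizes the four headers start at bytes 40, 48, 56 and 64. Each header's first byte is its Next Header field, which names the header that follows it.

```python
from __future__ import annotations
import struct

IPV6_HEADER_LEN = 40
NH_HOP_BY_HOP = 0
NH_ROUTING = 43
NH_ICMPV6 = 58
NH_DEST_OPTS = 60


def options_header(next_header: int) -> bytes:
    """8-byte hop-by-hop or destination options header holding one PadN option."""
    # next header, hdr ext len 0 (= 8 bytes), PadN type 1, PadN length 4, 4 zero bytes
    return struct.pack("!BBBB4x", next_header, 0, 1, 4)


def routing_header(next_header: int) -> bytes:
    """8-byte routing header; type 0 and Segments Left 0 are arbitrary choices."""
    # next header, hdr ext len 0, routing type 0, segments left 0, 4 reserved bytes
    return struct.pack("!BBBB4x", next_header, 0, 0, 0)


def ie2_chain() -> tuple[int, bytes, list[int]]:
    """Return (IPv6 Next Header value, extension header bytes, start offset of each header)."""
    headers = [
        options_header(NH_DEST_OPTS),   # hop-by-hop -> destination options
        options_header(NH_ROUTING),     # destination options -> routing
        routing_header(NH_HOP_BY_HOP),  # routing -> a second, illegal hop-by-hop
        options_header(NH_ICMPV6),      # hop-by-hop -> ICMPv6 echo request
    ]
    offsets = []
    pos = IPV6_HEADER_LEN
    for h in headers:
        offsets.append(pos)
        pos += len(h)
    return NH_HOP_BY_HOP, b"".join(headers), offsets
```

Under these assumptions, the pointers 48, 56 and 64 land on the Next Header fields of the second, third and fourth headers. That is consistent with different stacks blaming different links in the same bogus chain.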
We use 661 features so far. Sample fingerprint and feature vector.
For each response:
For each TCP response:
IPv6 extension headers (types and lengths as with TCP_OPT_*). Some extension headers may have contents amenable to fingerprinting.
IP flow label sequence generation algorithm. (May be zero, random, sequential, or equal to the probe's flow label.)
Guessed original hop limit. (Can often determine hop distance through ICMPv6 calculation.)
TCP timestamp rate, GCD of initial sequence numbers, SEQ/ACK numbers.
ICMPv6 type, code, parameter problem pointer. ICMPv6 checksum (good, bad, zero?).
UDP features.
Could use fragment offsets, but it is too costly to induce fragmentation.
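Three of the features above can be sketched as small functions. Everything here is an assumption for illustration:

- the set of common initial hop limits (32, 64, 128, 255);
- the step threshold that separates a "sequential" flow label generator from a random one;
- taking the GCD of differences between consecutive initial sequence numbers, as Nmap's IPv4 SEQ test does, rather than of the raw values.

The function names are hypothetical.

```python
from __future__ import annotations
from functools import reduce
from math import gcd

COMMON_INITIAL_HOPLIMITS = (32, 64, 128, 255)  # assumed typical starting values


def guess_initial_hoplimit(observed: int) -> tuple[int, int]:
    """Round an observed hop limit up to a common initial value; return (initial, hops)."""
    for initial in COMMON_INITIAL_HOPLIMITS:
        if observed <= initial:
            return initial, initial - observed
    raise ValueError("hop limit out of range")


def classify_flow_labels(probe_labels: list[int], reply_labels: list[int],
                         max_step: int = 16) -> str:
    """Label a run of reply flow labels as zero, equal-to-probe, sequential or random."""
    if all(fl == 0 for fl in reply_labels):
        return "zero"
    if reply_labels == probe_labels:
        return "equal-to-probe"
    # Flow labels are 20 bits wide, so differences wrap modulo 2**20.
    steps = [(b - a) % (1 << 20) for a, b in zip(reply_labels, reply_labels[1:])]
    if steps and all(0 < s <= max_step for s in steps):
        return "sequential"
    return "random"


def isn_gcd(isns: list[int]) -> int:
    """GCD of the differences between consecutive 32-bit initial sequence numbers."""
    diffs = [(b - a) % (1 << 32) for a, b in zip(isns, isns[1:])]
    return reduce(gcd, diffs, 0)
```

For example, the Linux SYN/ACK dumped earlier has hlim = 58. guess_initial_hoplimit maps that to an initial value of 64 and a distance of 6 hops.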
Run against its own training set.
Accuracy: 141/153 = 92.16%
Leave-one-out cross validation. But 32/153 = 21% of examples are the lone members of their class, so the best that could be done is 121/153 = 79%.
Accuracy: 90/153 = 58.82%
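The ceiling on the leave-one-out result follows from a simple count. When a class has only one example, holding that example out leaves the classifier with no member of its class to learn from. Below is a minimal sketch of both the procedure and the ceiling. The 1-nearest-neighbor trainer is a stand-in for the LIBLINEAR model, chosen only to keep the example self-contained, and all names are hypothetical.

```python
from __future__ import annotations
from collections import Counter
from typing import Callable, Sequence


def loocv_accuracy(data: Sequence[tuple[list[float], str]],
                   train: Callable[[list[tuple[list[float], str]]], Callable[[list[float]], str]]) -> float:
    """Train on all examples but one, test on the held-out one, repeat for every example."""
    correct = 0
    for i, (x, label) in enumerate(data):
        model = train([ex for j, ex in enumerate(data) if j != i])
        correct += model(x) == label
    return correct / len(data)


def loocv_ceiling(labels: Sequence[str]) -> float:
    """Best possible LOOCV accuracy: held-out singleton classes can never be predicted."""
    counts = Counter(labels)
    singletons = sum(1 for c in counts.values() if c == 1)
    return (len(labels) - singletons) / len(labels)


def train_1nn(examples: list[tuple[list[float], str]]) -> Callable[[list[float]], str]:
    """Stand-in classifier: predict the label of the closest training vector."""
    def predict(x: list[float]) -> str:
        nearest = min(examples, key=lambda ex: sum((a - b) ** 2 for a, b in zip(ex[0], x)))
        return nearest[1]
    return predict
```

With 153 examples, 32 of them singletons, loocv_ceiling gives 121/153, about 79%, matching the bound stated above.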
$ sudo ./nmap -6 -O -F ipv6.google.com www.debian.org www.freebsd.org
Nmap scan report for ipv6.google.com (2001:4860:4001:803::1011)
OS details: Linux 2.6.18 - 2.6.30
Nmap scan report for www.debian.org (2607:f8f0:610:4000:211:25ff:fec4:5b28)
OS details: Linux 2.4.21
Network Distance: 10 hops
Nmap scan report for www.freebsd.org (2001:4f8:fff6::22)
OS details: Microsoft Windows 7 SP1
Hard to do raw IPv6 sockets portably. Not possible to set the flow label on some platforms.
Low use of IPv6 means few signature submissions. The current database is based on 153 submissions.
Similar OSes are separated into different classes. It is important not to split identical fingerprints across different classes.
Novelty detection. If you have a 0.9999 match, is it a really good match, or something completely different from anything the classifier has seen before? We handle this by checking for matches with nearly the same score.
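A minimal sketch of that near-tie check, assuming the classifier returns (class, score) pairs sorted best first. The margin of 0.01 is a made-up value. The source does not say what counts as "nearly the same score".

```python
from __future__ import annotations


def near_ties(ranked: list[tuple[str, float]], margin: float = 0.01) -> list[str]:
    """Return every class whose score is within `margin` of the best one (best first)."""
    if not ranked:
        return []
    best = ranked[0][1]
    return [label for label, score in ranked if best - score < margin]


def is_ambiguous(ranked: list[tuple[str, float]], margin: float = 0.01) -> bool:
    """Flag a result whose runner-up scores almost as well as the winner."""
    return len(near_ties(ranked, margin)) > 1
```

An ambiguous result can then be reported as several candidate classes, or as no confident match, instead of a single 0.9999 winner.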
Fingerprints for the “same” OS can differ simply because, for example, a closed port was missing, or because of network filtering. Ideally we would use some kind of maximum likelihood estimation to “guess” the missing feature vector elements. But would this require the entire training set to be available at classification time?
Everything has an IP stack. A staggering number of unique devices that defy categorization.
Fingerprint 3M Filtrete 3M-50 thermostat
Fingerprint RGB Spectrum MediaWall 1500 video processor
Fingerprint RSA SecurID authentication appliance
Fingerprint WIZnet W3150A TCP/IP chip
An equally staggering number of apparently different devices that are actually the same OS underneath. (Embedded Linux is easy though.)
# VxWorks? --Ed.
Fingerprint Aastra 57i IP phone; Nortel 4548GT switch; or Toshiba e-STUDIO 281c, 351c, 3510c, 520, or 850 printer
# VxWorks? --Ed.
Fingerprint WAP (Cisco Aironet 1010, D-Link DWL-2100AP or DWL-3200AP, Linksys WAP51AB or WAP55AG, Netgear WPN824, or Proxim ORiNOCO AP-4000M), Lights-Out remote server management, or ReplayTV 5500 DVR
# VxWorks? --Ed.
Fingerprint Dell 3115cn printer, Enterasys switch, HP Integrity iLO 2 remote management interface, Mitel 3300 PBX controller, or Nortel 5520 switch
Windows XP SP3 uses separate TCP implementations for IPv4 and IPv6! Different windows and options. Windows 7 uses the same implementation for both.
Mac OS X apparently misimplements the reply to an RFC 4620 Node Information query for IPv4 addresses.
###[ IPv6 ]###
  version = 6L
  tc = 0L
  fl = 0L
  plen = 44
  nh = ICMPv6
  hlim = 64
  src = ::
  dst = ::
###[ ICMPv6 Node Information Reply - IPv4 addresses ]###
  type = ICMP Node Information Response
  code = Successful Reply
  cksum = 0xf220
  qtype = IPv4 Address
  unused = 0L
  flags =
  nonce = '\x01\x02\x03\x04\x05\x06\x07\n'
  data = [ (0, 23.68.97.114), (1768893763, 105.99.99.97), (1919905381, 115.45.105.80) ]
###[ Raw ]###
  load = 'hone'
0000000: 6000 0000 002c 3a40 0000 0000 0000 0000  `....,:@........
0000010: 0000 0000 0000 0000 0000 0000 0000 0000  .#....+Q........
0000020: 0000 0000 0000 0000 8c00 f220 0004 0000  ..`..^c.... ....
0000030: 0102 0304 0506 070a 0000 0000 1744 6172  .............Dar
0000040: 696f 2d43 6963 6361 726f 6e65 732d 6950  io-Ciccarones-iP
0000050: 686f 6e65                                hone
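Reading the hex dump, the 28 bytes after the nonce appear to be a 32-bit TTL of zero followed by the DNS-encoded name `\x17Dario-Ciccarones-iPhone`. That is the shape of an RFC 4620 Node Name reply, even though the Qtype says IPv4 addresses. Scapy therefore decodes the name's bytes as (TTL, address) pairs and leaves "hone" over as raw payload. The sketch below parses the same bytes both ways to make that visible. The function names are hypothetical. The name parser also accepts a name without a terminating zero-length label, as in this reply.

```python
from __future__ import annotations
import ipaddress
import struct


def parse_ipv4_reply(data: bytes) -> tuple[list[tuple[int, str]], bytes]:
    """Parse NI reply data as IPv4 entries: 32-bit TTL then a 4-byte address."""
    entries = []
    i = 0
    while i + 8 <= len(data):
        (ttl,) = struct.unpack_from("!I", data, i)
        addr = str(ipaddress.IPv4Address(data[i + 4:i + 8]))
        entries.append((ttl, addr))
        i += 8
    return entries, data[i:]  # leftover bytes that do not form a full entry


def parse_node_name_reply(data: bytes) -> tuple[int, list[str]]:
    """Parse NI reply data as a Node Name: 32-bit TTL then DNS wire-format names."""
    (ttl,) = struct.unpack_from("!I", data, 0)
    names, labels = [], []
    i = 4
    while i < len(data):
        length = data[i]
        i += 1
        if length == 0:  # a zero-length label ends a fully qualified name
            names.append(".".join(labels))
            labels = []
            continue
        labels.append(data[i:i + length].decode("ascii"))
        i += length
    if labels:  # name that ended without a terminating zero-length label
        names.append(".".join(labels))
    return ttl, names


reply_data = b"\x00\x00\x00\x00\x17Dario-Ciccarones-iPhone"  # bytes 0x38-0x53 of the dump
```

parse_ipv4_reply(reply_data) reproduces Scapy's three bogus entries plus the leftover b"hone". parse_node_name_reply(reply_data) recovers (0, ["Dario-Ciccarones-iPhone"]).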