
DNS Covert Channel Attacks: Detecting Anomalies


Part 3 of 5

In part 1 of this series, I introduced the idea of detecting and blocking data being exfiltrated or infiltrated over the DNS protocol. Part 2 focused on the characteristics of encoding and on how packing the encoded information into a DNS query would be easily spotted by a human who happened to notice the few nefarious packets in the passing flood of traffic. The challenge lies in automating detection so that it keeps pace with the traffic flow. The final comments in part 2 alluded to expanding the basis for anomaly detection.

I previously shared examples of long consonant strings embedded in base-64 encoded data; those examples showed consonant strings 7 and 9 characters long. For effective detection with high confidence, however, we need to key on more than a single anomaly type. Fortunately, there are several string anomalies we can detect, as well as other anomaly types. In this article, we'll examine the multiple types of detectable anomalies before refining what we might consider anomalous string lengths.

Character string anomalies

There are several anomalies that humans subconsciously perceive that can be detected using simple linear mathematical tests. These tests are:

  • Length of digit-only substrings
  • Length of consonant-only substrings
  • Length of vowel-only substrings
  • Length of special character-only substrings
  • Query or overall URL length
  • Label length
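
As a rough sketch, the six linear tests above might be computed over a dotted query name as follows. This is illustrative Python, not a production detector; the character-class choices (treating y as a vowel and w as a consonant) follow the discussion later in the article and are adjustable:

```python
# Per this article's convention, y is treated as a vowel and w as a
# consonant; adjust these sets to taste.
VOWELS = set("aeiouy")
CONSONANTS = set("bcdfghjklmnpqrstvwxz")

def longest_run(s: str, charset: set) -> int:
    """Length of the longest run of characters drawn from charset."""
    best = run = 0
    for ch in s:
        run = run + 1 if ch in charset else 0
        best = max(best, run)
    return best

def string_anomaly_metrics(name: str) -> dict:
    """Compute the six linear metrics for a dotted DNS name.

    Dots are stripped before measuring runs, so a run that spans a
    label boundary is counted at its true length.
    """
    labels = name.lower().rstrip(".").split(".")
    flat = "".join(labels)  # ignore label separators
    return {
        "digit_run": longest_run(flat, set("0123456789")),
        "consonant_run": longest_run(flat, CONSONANTS),
        "vowel_run": longest_run(flat, VOWELS),
        "special_run": longest_run(flat, set("_-")),
        "query_len": len(flat),
        "max_label_len": max(len(label) for label in labels),
    }
```

Each metric is a single pass over the string, so all six tests together remain linear in the length of the query.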

Why are these particular tests significant and effective? The answer lies in two distinct areas. As previously discussed, the purpose of the DNS infrastructure is to let humans write addresses as human-readable mnemonics. Long strings of digits, consonants, or vowels are not human-readable, nor are lengthy labels or queries. The second area concerns the encoding of the data itself.

The perpetrator chooses an encoding standard which, once set, will for any sufficiently large set of data eventually produce long substrings that are easily detected by relatively simple algorithms. The DNS standard uses dots to separate the labels of a name. In the DNS queries sent out onto the network or the Internet, however, the dots are replaced by length bytes, each of which tells how many characters occur in the label that follows. If we ignore the dots (or length bytes) in the string searches, the true length of the substring becomes apparent and the degree of the anomaly is revealed.
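
Skipping the length bytes is straightforward in practice. A minimal sketch, assuming an uncompressed wire-format QNAME (length-prefixed labels terminated by a zero byte, as DNS puts names on the wire):

```python
def labels_from_wire(qname: bytes) -> list:
    """Parse an uncompressed wire-format QNAME into its labels.

    Each label is prefixed by a single length byte; a zero byte
    terminates the name. Joining the returned labels yields the flat
    string whose substring runs should be measured.
    """
    labels, i = [], 0
    while i < len(qname) and qname[i] != 0:
        n = qname[i]
        labels.append(qname[i + 1 : i + 1 + n].decode("ascii"))
        i += 1 + n
    return labels
```

Running the substring tests on `"".join(labels_from_wire(qname))` measures runs across label boundaries, which is exactly where encoded payloads would otherwise hide.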

Consonant and vowel string anomalies

The detection length for consonant and vowel strings can be approximated logically; the other lengths are more a matter of preference. Most human-readable addresses, even with run-on words, will not have more than about 8 to 10 consonants or vowels in a row. This is complicated by the fact that w and y can act as either consonants or vowels, as in cow or the archaic word twyndyllyngs. Since vowelless words such as hymn, rhythm, and pygmy rely on y, it is probably best in most cases to treat y as a vowel. Because w is generally used as a consonant, we'll adopt those classifications as our standard for designing a detector.

Some English words have no vowels at all, while others contain long strings of one character class or the other. The word euouae has six vowels in a row, and aqueous contains a run of four. If euouae were followed immediately by a vowel-initial word such as aqueous, with no delimiting character between them, it could produce a label with seven vowels in a row. Words such as wppwrmwste, though ancient, show long consonant strings with few vowels.

The upshot of these examples is that an initial value for flagging long strings should be selected within a reasonable range and then adjusted to minimize false positives. By starting on the low side of reasonable and adjusting upward, you can arrive at a value that strikes a satisfactory balance between false positives and false negatives.
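
The start-low-and-adjust-upward approach can be sketched as a simple loop over run lengths measured on a corpus of known-benign names. The function name, starting value, and acceptable false-positive rate below are illustrative assumptions, not recommendations:

```python
def tune_threshold(benign_runs, start=6, max_fp_rate=0.001):
    """Raise the run-length threshold until the fraction of benign
    names that would be flagged falls to an acceptable rate.

    benign_runs: longest-run values measured on known-good names.
    Returns the first threshold meeting the target rate (capped at 64,
    since no single DNS label can exceed 63 characters).
    """
    t = start
    while t < 64:
        fp = sum(1 for r in benign_runs if r >= t) / len(benign_runs)
        if fp <= max_fp_rate:
            return t
        t += 1
    return t
```

In practice you would re-run this whenever the benign corpus grows, since the tolerable threshold depends on the naming habits of the traffic you actually see.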

Digit, label, and query length anomalies

As for digit, label, and overall query length, experience is the best guide. Non-authoritative discussion boards suggest that 80 characters is a reasonable length for URLs. Searches turned up no recommendations for label length, but experience indicates 27 characters is a reasonable upper bound for what web developers will actually use (the DNS protocol itself permits labels of up to 63 octets). As for digits, let's guess at 8 as an upper bound on substring length, enough to accommodate information such as dates; longer strings of digits quickly become less human-readable.
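
Collected into a starting configuration, the values discussed so far might look like the following. The exact numbers are assumptions to be tuned against your own traffic, not protocol limits:

```python
# Illustrative starting thresholds from the discussion above.
THRESHOLDS = {
    "digit_run": 8,       # dates and similar fit within 8 digits
    "consonant_run": 9,   # ~8-10 is the plausible human-readable ceiling
    "vowel_run": 7,       # euouae-style runs top out around 6-7
    "special_run": 4,     # runs of '_' and '-' are rarely longer
    "query_len": 80,      # discussion-board consensus for URLs
    "max_label_len": 27,  # experiential upper bound for web developers
}

def anomalies(metrics: dict) -> list:
    """Return the names of metrics that exceed their thresholds."""
    return [k for k, v in metrics.items()
            if v > THRESHOLDS.get(k, float("inf"))]
```

A name that trips several of these at once is a far stronger signal than any single test, which is the aggregation idea the next article picks up.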

Special character string length anomalies

As mentioned earlier, the DNS query infrastructure also permits the characters "_" and "-". While these may seem inconsequential, long strings of them would definitely not be human-readable. Encoding a common file as described earlier provides an example: using the executable of Windows™ Explorer (not Internet Explorer), we find the longest string of these two special characters is 564 characters, which is certainly not a readable string in a URL. A perpetrator might hide this by using extremely small DNS queries, but if the query is small, or if all the characters in a particular label are special characters, you have two more detectable anomalies.
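
The all-special-characters case is cheap to check directly. A small helper along these lines (the function name is my own, for illustration):

```python
def all_special_labels(labels):
    """Return the labels composed entirely of '_' and '-' characters.

    Any non-empty label whose character set is a subset of {_, -} is
    unreadable by a human and worth flagging on its own.
    """
    return [label for label in labels
            if label and set(label) <= {"_", "-"}]
```

This complements the run-length test: a short query can dodge a length threshold, but a label of nothing but separators still stands out.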

Using actual anomaly occurrence in encoded data

At this point you are probably wondering if any of these long strings do occur in actual encoded data. I happen to possess an encoding application and have some test results. We’ll get to that in the next article and begin to describe the capabilities needed to detect and aggregate those anomalies for high-confidence alerting.

Stay tuned and let me know if you have anything to add in the comments section.

Image: Fotolia.com, creaktivahost