Tuesday, March 24, 2015

Math Lessons

While you might not think this really has anything to do with forensics or other security-related issues, the reality is that math is your friend. And my friend. And when you have to calculate the byte offset on a hard drive to locate the cluster where a particular file is located, you will really want to know a little about the basics of math. 

You may have guessed that the origin of this topic is all of the nonsense spreading around on social networking sites like Facebook. Based on how often variations on these math problems show up and how often I see wrong answers, it seems as though a large number of folks could really stand a brief math lesson. I am not a math instructor in real life, nor do I play one on TV, but I am going to take this one on because it will make me feel better. 

The acronym to remember here, and it’s really quite simple, is PEMDAS. Make up whatever mnemonic you want to remember it; what it really means is parentheses, exponents, multiplication, division, addition and subtraction. This is the standard order of operations. One catch: multiplication and division share the same rank, as do addition and subtraction, so within each of those pairs you work from left to right rather than always doing multiplication before division or addition before subtraction. When you see a very long chain of mathematical operations, you might think that you should just work left to right, and as a general rule, that’s not a bad instinct. However, to come up with a consistent and mathematically accurate answer, you should apply the order of operations first. Then you can move on to left to right. You will also find that it’s generally easier to do a simple replacement, swapping each result back into the equation as you go. Let’s illustrate with an equation I’ve been seeing recently on Facebook. 

7 + 7 / 7 + 7 * 7 - 7

For those of you unfamiliar with two of those symbols, the / is a division sign, used where we don’t have the horizontal line with a dot above and below it (÷), as on a computer keyboard. The * is a multiplication symbol, commonly used in place of an X or an x because those might be confusing in algebraic equations. So, let’s apply the order of operations and then rewrite the equation after substituting. 

7 + 1 + 49 - 7

7 divided by 7 is 1, so I swapped in a 1 for the division operation. 7 multiplied by 7 is 49, so I swapped that value in. That leaves us with the equation above. There are a couple of ways to proceed from here. I could certainly go left to right, adding the first three numbers and then subtracting the last, but you may have noticed that two of them cancel each other out. If I rewrite the equation above as follows, it quickly becomes a lot easier. 

7 - 7 + 1 + 49
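As a quick cross-check before finishing the arithmetic by hand, you can hand the original expression to Python, which applies the same precedence rules. This is only a sanity check, not a substitute for knowing the rules. The second expression shows the left-to-right rule for operators of equal rank.

```python
# Python applies standard precedence: * and / before + and -,
# and operators of equal rank are evaluated left to right.
result = 7 + 7 / 7 + 7 * 7 - 7
print(result)        # 50.0 (in Python 3, / always produces a float)

# Division and multiplication share a rank, so this is (8 / 4) * 2, not 8 / (4 * 2).
same_rank = 8 / 4 * 2
print(same_rank)     # 4.0
```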

This leaves me with adding 1 to 49, resulting in 50. See how easy that was? Keep in mind that the order of operations is really important. I suppose I could get into the history of why multiplication and division take precedence over addition and subtraction, but it would likely bore you to tears, and it would take far more of my time to come up with something coherent than I feel like putting in at the moment. So, let’s skip it and move on to number series. Let’s say you see the following:

10  =  50

9  =  38

8  =  27

7  =  17

5  =  ?

There are two things you should notice right away. The first is the pattern on the right-hand side. The difference between 50 and 38 is 12. From 38 to 27 is 11. From 27 to 17 is 10. Each difference is one less than the one before it, so since the last difference was 10, the next difference will be 9, and 17 - 9 is 8. This leads us to the second thing you really should notice: the value on the left skipped one. The 8 we just calculated belongs to 6, but I’m asking for the value of 5. Keep the series going. The difference on the right-hand side decreases by 1 again, so going from 6 to 5 on the left, I decrease by 8 on the right-hand side. This means that 5 = 0. When you see a series like this, there is generally a trick: they have skipped a value in the series. That doesn’t mean you just assign the next right-hand value in the series to the skipped-ahead left-hand value. It means you apply the right-hand pattern twice and assign that result to the left-hand side. 
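To see the skipped step explicitly, here is a small sketch that walks the left-hand side down one value at a time and fills in every right-hand value, including the one the puzzle leaves out. The helper name fill_series is hypothetical, and it assumes the pattern really is "each difference is one smaller than the last."

```python
def fill_series(start_n, start_value, first_step, target_n):
    """Return {n: value} for n from start_n down to target_n (inclusive).

    Each step down on the left subtracts the current difference on the right,
    and that difference shrinks by 1 every time.
    """
    values = {start_n: start_value}
    value, step = start_value, first_step
    for n in range(start_n - 1, target_n - 1, -1):
        value -= step          # right-hand side drops by the current difference
        values[n] = value
        step -= 1              # each difference is one smaller than the last
    return values


series = fill_series(10, 50, 12, 5)
for n, v in series.items():
    print(n, "=", v)           # 10 = 50, 9 = 38, 8 = 27, 7 = 17, 6 = 8, 5 = 0
```

Printing the full table makes the trick obvious: 6 = 8 is the row the puzzle skipped, and 5 = 0 only falls out after you apply the pattern for that missing row.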

A little bit of math, folks, will take you a very long way. I hope this has been a little bit of help. I know it’s made me feel better to share it with you. 


Tuesday, July 29, 2014

More Fun With Python

Since I’m in the middle of trying to get a title on Python scripting for security professionals completed for Infinite Skills, I’ve been writing a lot of little scripts that do interesting things. So, in order to dump my head and also get something somewhat recent and potentially interesting up here, I thought I’d write up one of those scripts. This could be a useful foundation for someone who wants to do a little security testing using Python scripts. It is also useful for forensics professionals, since you may want to write custom tools that parse data in a way that makes sense to you rather than relying on tools that represent data in a way that made sense to someone else. Being able to parse simple data structures, for example, is a very useful skill for both network programming and forensics programming. One case with a lot of parsing that came up both in the video training I am doing and in the next book I am writing is the information in the master boot record, including the partition table. 

So, let’s take a look at a program that I threw together to do some quick parsing of the partition table in a master boot record. 

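#!/usr/bin/python3
#  (c) 2014, WasHere Consulting, Inc
import struct

# Read the first 512-byte sector (the MBR) from the image created with dd
with open("mbr.dd", "rb") as f:
    mbr = f.read(512)

# Disk signature: 4 bytes at 0x1B8-0x1BB, little endian, unsigned
x = struct.unpack("<I", mbr[0x1B8:0x1BC])
print("Disk signature: ", x[0])

# The first partition entry begins at 0x1BE; its first byte is the boot indicator
x = mbr[0x1BE]
if x == 0x80:
    print("Active flag: Active")
else:
    print("Active flag: Not active")

# Starting LBA: 4 bytes at 0x1C6-0x1C9
lbastart = struct.unpack("<I", mbr[0x1C6:0x1CA])
print("Partition Start (LBA): ", lbastart[0])

# Sector count: 4 bytes at 0x1CA-0x1CD (the entry stores a length, not an end address)
numsectors = struct.unpack("<I", mbr[0x1CA:0x1CE])
print("Partition Size (sectors): ", numsectors[0])
print("Partition End (LBA): ", lbastart[0] + numsectors[0] - 1)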

For the purposes of this program, I grabbed an image of the master boot record so I can get to the partition table. I did this with the UNIX/Linux utility dd. You simply grab the first 512-byte block with dd if=/dev/sdb of=mbr.dd bs=512 count=1, and you end up with an image of the master boot record that you can use with various tools. The program opens the disk image called mbr.dd as a binary file and reads its 512 bytes into a byte sequence called mbr. Once I have those bytes, I can start to pull out the ones I want, as long as I know where the offsets are. 

Something to keep in mind, though, is that multi-byte values in the master boot record are stored little endian. So if I have a file with an image or copy of that master boot record, all of the multi-byte values are going to appear backwards, with the least significant byte first. We use struct.unpack to get the bytes out and put them in the correct order. We give struct.unpack a format string in which < means little endian and I means a 4-byte unsigned integer. A lowercase i would treat the same four bytes as a signed integer, which turns values of 2^31 and above negative. Then we provide the range of bytes from the byte array that struct.unpack should build that integer from. The thing to keep in mind when you are providing a range is that the top end is not inclusive. For the disk signature, I am grabbing bytes 0x1B8, 0x1B9, 0x1BA and 0x1BB. Even though the slice in the program ends at 0x1BC, we don’t get that last byte because Python’s slice syntax excludes the end index. 
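Here is a short sketch of both points on made-up bytes, not on a real MBR. It shows the same four bytes read little endian and big endian, then demonstrates that the slice [0x1B8:0x1BC] yields exactly four bytes.

```python
import struct

raw = bytes([0x78, 0x56, 0x34, 0x12])        # four bytes in on-disk order (made up)

little = struct.unpack("<I", raw)[0]          # little endian: least significant byte first
big = struct.unpack(">I", raw)[0]             # the same bytes read as big endian
print(hex(little), hex(big))                  # 0x12345678 0x78563412

# Slices exclude the end index: [0x1B8:0x1BC] covers 0x1B8, 0x1B9, 0x1BA, 0x1BB.
sector = bytes(i % 256 for i in range(512))   # dummy sector: each byte equals its offset mod 256
sig_bytes = sector[0x1B8:0x1BC]
print(len(sig_bytes), list(sig_bytes))        # 4 [184, 185, 186, 187]
```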

Once we have the basics of pulling data out of the byte array, the rest is trivial. I can grab the single byte at 0x1BE that indicates whether the first partition is active (bootable) and compare that value with what I know about that flag. If it’s 0x80, the partition is active. Otherwise, it’s not, so I can print the result based on that byte. I can also get the starting logical block address from the four bytes at 0x1C6 through 0x1C9, again using struct.unpack. The partition entry doesn’t store an ending address. Instead, the next four bytes, 0x1CA through 0x1CD, hold the number of sectors in the partition, so the ending logical block address is the starting address plus the sector count, minus one. 

This is a simple technique that can be applied to other binary data structures, whether that’s the rest of the master boot record, the BIOS parameter block or the structures associated with a GUID Partition Table disk. 
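As one way of applying the same technique to the rest of the partition table, here is a sketch that walks all four 16-byte primary partition entries using a single struct format string. The function and field names are hypothetical. The layout it assumes is the standard MBR entry: status, CHS start, type, CHS end, starting LBA and sector count, with the 0x55AA boot signature at offset 510. The demo builds a synthetic MBR with made-up values rather than reading a real disk.

```python
import struct

ENTRY_FORMAT = "<B3sB3sII"                   # status, CHS start, type, CHS end, start LBA, sector count
ENTRY_SIZE = struct.calcsize(ENTRY_FORMAT)   # 16 bytes
TABLE_OFFSET = 0x1BE                         # first of four partition entries


def parse_partition_table(mbr):
    """Return one dict per used primary partition entry in a 512-byte MBR."""
    if len(mbr) < 512 or mbr[510:512] != b"\x55\xaa":
        raise ValueError("missing 0x55AA boot signature; not an MBR")
    partitions = []
    for i in range(4):
        status, _chs_start, ptype, _chs_end, start_lba, sectors = struct.unpack_from(
            ENTRY_FORMAT, mbr, TABLE_OFFSET + i * ENTRY_SIZE)
        if ptype == 0:                       # partition type 0x00 marks an unused entry
            continue
        partitions.append({
            "entry": i,
            "active": status == 0x80,
            "type": hex(ptype),
            "start_lba": start_lba,
            "sectors": sectors,
            "end_lba": start_lba + sectors - 1,
        })
    return partitions


# Synthetic MBR (made-up values): one active partition of type 0x07, the ID
# commonly used for NTFS, starting at LBA 2048 and spanning 307200 sectors.
demo = bytearray(512)
struct.pack_into(ENTRY_FORMAT, demo, TABLE_OFFSET,
                 0x80, b"\x00\x00\x00", 0x07, b"\x00\x00\x00", 2048, 307200)
demo[510:512] = b"\x55\xaa"
for part in parse_partition_table(bytes(demo)):
    print(part)
```

To run it against the image from this post, pass it the first sector, for example by calling parse_partition_table(f.read(512)) on a file opened with open("mbr.dd", "rb").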

 

Thursday, July 10, 2014

Finding Data In A Disk Image

Back to the basics now, especially since there seems to be so much emphasis on letting the tools do all the work that we forget how to do it ourselves when we don’t have the tools. This time we’re going to walk through the process of locating a deleted file on a hard drive. I’m using a deleted file here but, of course, the same process can be used to locate non-deleted files in a disk image as well. To save a little time, I’m going to use some utilities from The Sleuth Kit (TSK), though we could also do the same thing by hand. I should also mention that one of the reasons for writing this up is that I am working through a file systems chapter in my next book on Operating Systems Forensics, due to be published early next year by Syngress/Elsevier. Understanding the structure of the file system format helps you know where to look for data on the disk and how to find it.

The first thing I need to do is create the file I want to use on an NTFS partition. I’m going to do this from a Linux system, though the Sleuth Kit utilities also work on other operating systems, like Windows. Since I want to acquire a disk image, I am using Linux so I have dd available. I am going to create the file with a word that isn’t likely to show up anywhere else on the file system, to minimize the number of hits I get when I go looking. You can see the contents of the file below.

[Screenshot: the test file’s contents, displayed with cat]

Next, I need to grab an image of the partition, and I may as well capture a cryptographic hash at the same time, since it’s just good practice. I’m using dd to capture the image, so in my case the command is dd if=/dev/sdb1 of=ntfs.dd. I want to capture the whole partition, so I am not setting a count. Without one, dd keeps going until it runs out of partition to copy, using its default 512-byte block size. Then I grab a cryptographic hash of the resulting image. When I’m all done, I get 2d59270e187217c3c222fc78851a1ebe91e3f8ec as the SHA1 hash of a disk image that is 150M in size. For the purposes of this exercise, I kept the image small.
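The post doesn’t say which tool produced the hash. A command-line utility such as sha1sum ntfs.dd does the job, but if you would rather compute it from a script, here is a minimal sketch. The function name is hypothetical. It reads the image in fixed-size chunks so a large image never has to fit in memory.

```python
import hashlib


def sha1_of_file(path, chunk_size=1024 * 1024):
    """Compute a file's SHA-1 hex digest without loading it all into memory.

    Reads fixed-size chunks until read() returns b"" at end of file.
    """
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Usage (sketch): print(sha1_of_file("ntfs.dd"))
```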

After deleting the file from the disk, I want to figure out where the data is. There are a couple of ways to do this with the Sleuth Kit tools. The first is to look it up by file name. When you delete a file, it isn’t actually gone from your drive or partition, so there are still references to the file name sitting out there on my partition. One TSK tool I can use to find the file by its name is ifind. ifind finds the metadata structure (on NTFS, the Master File Table entry) that a file name points to, or the metadata structure that has allocated a data unit you provide. In this example, I’m going to use ifind to look up the filename. I’ll then use another TSK utility, icat, to pull out the contents of the file described by the metadata address that ifind gives me. You can see what that looks like below.


[Screenshot: ifind locating wubble.txt in ntfs.dd and icat displaying the file’s contents]

That gives me the contents of the file, and you can compare the results with what we saw when I used cat to display the file. ifind looked in the image file ntfs.dd for a file called wubble.txt and returned metadata address 73, the MFT entry for that file. I then passed 73 to icat to get the contents of the file that entry describes. This assumes we know the name of the file, or that the name is still available to search on. What I may have instead is just a chunk of text that came out of the file. I can use another utility to look for the file based on just a word search. Since I’m searching a binary file, the image capture, I need to do something special with grep to find the offset of the text I’m looking for. I am going to tell grep to print only the matching text (-o), give me the byte offset of each match (-b) and treat the binary file as text (-a). So I’ll be using grep -oba to search my image file for the word wubble. You can see the results below.


[Screenshot: grep -oba output listing the byte offsets of wubble in ntfs.dd]

It looks like we have several hits in the image. The number on the left-hand side is the byte offset. Since that offset is in bytes, I need to figure out which cluster it falls in so I can use the TSK tools, which are block- or cluster-based. That takes a little math, but first I need to know my cluster size. Fortunately, I can run fsstat on my image to get the cluster size. Of course, I could also use a hex editor and do it the really old-fashioned way, but this should be adequate for our purposes. If I divide 91551 by my cluster size of 4096, I end up with 22 and some change (22 × 4096 = 90112, leaving 1439 bytes). That tells me the data is in cluster 22, counting from cluster 0 at the start of the image, so I can use the TSK tool blkcat to get the contents of block/cluster 22. You can see the results of that below.

[Screenshot: blkcat output for cluster 22 of ntfs.dd]

You can see a lot of funny-looking characters. That’s because what we have here is an entry in the Master File Table, so there is a lot of binary data. When you try to display a byte that wasn’t meant to be a character as a character, you end up with values that don’t translate into anything that looks right. You can, though, see the text of the file, and that’s pretty common for an NTFS entry. Since it’s a very small file, its contents were stored directly in the MFT entry as a resident data attribute rather than taking up a cluster somewhere else on the file system. You can also see the filename in the middle of all of that content, along with the value FILE0. This indicates that the drive was formatted by Windows XP or something newer. Since I used a Linux system to format it, the formatting utility simply followed the conventions of a more recent version of Windows and NTFS.

We used a lot of TSK utilities to do this, but we could just as easily stick with standard UNIX utilities. From the grep output and the division above, we know we need to go 22 clusters into the file system to get the data we are looking for. We can easily use dd to extract that cluster and then use xxd to view it as a hexadecimal dump. With dd, we set the block size to the cluster size of 4096, skip 22 blocks and grab only one. So, dd if=ntfs.dd of=caught.dd bs=4096 count=1 skip=22 grabs a single cluster from the image file. Once we have the 4096-byte cluster, we can run xxd caught.dd to view it as a hexadecimal dump and see that we have the MFT entry for the file wubble.txt.
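The same manual path (search for the word, divide by the cluster size, extract the cluster, dump it in hex) can also be sketched in a few lines of Python. All of the function names here are hypothetical. find_offsets is only a rough stand-in for grep -oba, and hexdump is a simplified xxd-style layout, not xxd’s exact format. The cluster size of 4096 is the value fsstat reported for this particular image.

```python
import mmap

CLUSTER_SIZE = 4096  # value reported by fsstat for this image


def find_offsets(path, needle):
    """Return the byte offset of every non-overlapping match, similar to grep -oba."""
    offsets = []
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        pos = mm.find(needle)
        while pos != -1:
            offsets.append(pos)
            pos = mm.find(needle, pos + len(needle))
    return offsets


def locate(offset, cluster_size=CLUSTER_SIZE):
    """Return (cluster number, byte offset within that cluster)."""
    return divmod(offset, cluster_size)


def read_cluster(path, cluster, cluster_size=CLUSTER_SIZE):
    """Rough equivalent of: dd if=path bs=cluster_size skip=cluster count=1"""
    with open(path, "rb") as f:
        f.seek(cluster * cluster_size)
        return f.read(cluster_size)


def hexdump(data, width=16):
    """Simplified xxd-style dump: offset, hex bytes, printable ASCII."""
    lines = []
    for off in range(0, len(data), width):
        chunk = data[off:off + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{off:08x}: {hex_part:<{width * 3 - 1}}  {text}")
    return "\n".join(lines)


# Usage against the image from this post (not run here):
# for offset in find_offsets("ntfs.dd", b"wubble"):
#     print(offset, locate(offset))       # e.g. 91551 -> (22, 1439)
# print(hexdump(read_cluster("ntfs.dd", 22)))
```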

Tuesday, May 27, 2014

Network Byte Order

There are probably perfectly legitimate reasons for the world being this way, but I don’t know what they are. In a pretty substantial chunk of the world, we write numbers from left to right, with the most significant portion of the number, the part with the largest value, on the left-hand side. So if I write the number 6785, what I mean is six thousand, seven hundred eighty-five. When we are talking about digital communications, however, everything is in the form of bytes. Rather than dealing with all of the individual bits that make up a byte, let’s use hexadecimal as shorthand. A pair of hexadecimal digits represents a single byte. The reason for that is simple. Four bits give me the values 0-15, since the maximum value of a 4-bit number is 2^0 + 2^1 + 2^2 + 2^3 = 1 + 2 + 4 + 8 = 15. A single hexadecimal digit (values 0-F, or 0-15) covers exactly 4 bits. Since a byte is two groups of 4 bits, 2 hexadecimal digits make a whole byte. Simple, right? 

Let’s move on to writing values, keeping in mind that we are talking about writing out bytes for now and that we’ll represent them in hexadecimal. We are going to write out the word hello. It doesn’t much matter where we write it, because we run into the same problem no matter what we are doing. The title of this post suggests we are talking about writing to a network interface, but we have the same problem on hard disks and in memory. No matter where we write bits and bytes, we have to decide how we are going to write them. When we write characters, we need a way of converting them to numbers, so we use a table lookup. The common table for looking up the numeric representation of characters is the ASCII table. After doing the lookup, we get the following: 68 65 6C 6C 6F. Again, without getting into the bit level, we have to decide what order to send these in. Do you send the h first or the o first, and then follow with the rest of the characters?
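A quick way to reproduce that lookup is to let Python encode the string as ASCII and print each resulting byte as two hex digits. This is just an illustration. The variable names are arbitrary.

```python
word = "hello"
codes = word.encode("ascii")                   # ASCII lookup: one byte per character
hex_text = " ".join(f"{b:02X}" for b in codes)
print(hex_text)                                # 68 65 6C 6C 6F

# One hex digit covers 4 bits (0-F, i.e. 0-15); two digits cover a full byte.
print(0b1111, 0xF, 0xFF)                       # 15 15 255
```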

Getting the order wrong is more catastrophic with numbers, so let’s take a look at a 16-bit value. The value 1348 is 0x0544 in hexadecimal. This is two bytes. If I send the 44 followed by the 05, how does the receiving party interpret that? If I send the 05 before the 44, I am sending in big-endian form, because I am sending the most significant byte first, the one that carries the largest value. If I send the 44 first, I am sending in little-endian form. If the receiving end expects the other order, I could send the value 1348 and have it arrive as 17413. That is a very big difference. The reason is that if I send 05 then 44, which is big-endian, but the other end assumes little-endian, it reads what I sent as 0x4405. 
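The 1348 example can be checked with Python’s int.to_bytes and int.from_bytes, which take the byte order as an argument. The sketch below writes the value both ways and then reproduces the misread value on a receiver that wrongly assumes little endian.

```python
value = 1348                                   # 0x0544
big = value.to_bytes(2, "big")                 # most significant byte first: 05 44
little = value.to_bytes(2, "little")           # least significant byte first: 44 05
print(big.hex(), little.hex())                 # 0544 4405

# We sent big endian, but the receiver assumes little endian:
misread = int.from_bytes(big, "little")
print(misread)                                 # 17413
```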

So, which is the right way? Neither, actually. But since little-endian systems need to talk to big-endian systems, there had to be some consensus. As a result, there are two ordering schemes. There is host-order, which is whatever order your particular system architecture uses (Intel uses little-endian, by the way) and then there is network order. Network byte order is a synonym for big-endian, since historically more hardware architectures used the big-endian form of storing data. Of course, these days, far more systems on the network use little-endian simply because of the ubiquity of systems with Intel processors. 

When you are storing data on your own system, it doesn’t much matter how it’s represented because the operating system has to take care of writing and reading so you get the real value at a programmatic layer. When you are trying to interface with values on disk at a raw level, as you might in the case of forensics, you have to be aware of multi-byte values and what architecture the data was written on. If you have a multi-byte value that was written from a little-endian system, you need to remember to reverse the order of the bytes. But only within that value. 
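To illustrate the "only within that value" point, here is a sketch with two hypothetical little-endian 16-bit fields sitting next to each other in a raw dump. Reversing each field on its own gives the right values. Reversing the whole buffer also swaps which field comes first.

```python
# Two consecutive little-endian 16-bit fields as they might appear in a raw dump
# (hypothetical values chosen for illustration).
raw = bytes([0x34, 0x12, 0x78, 0x56])

# Correct: reverse the bytes within each field.
field_a = int.from_bytes(raw[0:2], "little")    # 0x1234
field_b = int.from_bytes(raw[2:4], "little")    # 0x5678

# Wrong: reversing the whole buffer also swaps the fields' positions.
flipped = raw[::-1]                              # 56 78 12 34
wrong_a = int.from_bytes(flipped[0:2], "big")   # 0x5678, which is field_b's value
print(hex(field_a), hex(field_b), hex(wrong_a))
```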

If you are talking to another system, something has to handle the translation from host to network form. Languages that are capable of talking to the network generally have those functions available. As an example, we can see how the process works in Python, below. 

kilroy@opus:~$ python3
Python 3.3.3 (v3.3.3:c3896275c0f6, Nov 16 2013, 23:39:35) 
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import socket
>>> socket.ntohl(45)
754974720
>>> socket.htonl(45)
754974720
>>> 

The socket module has a number of conversion functions, including the two above. In the first example, I am converting from network byte order to a host long. On this little-endian Intel machine, that means converting to a 32-bit little-endian value. In the second example, I am converting from a host (little-endian) number to a network long. Again, a long in this case is 32 bits, or 4 bytes. So you take the value 45 as four bytes, 00 00 00 2D, reverse the order of the bytes (not the bits within them) to get 2D 00 00 00, and convert back to decimal. That’s why the result, 754974720, is so much larger than the value we put in. On a big-endian host, both functions would simply return 45 unchanged. 
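To confirm that htonl is just a byte swap on a little-endian host, here is a sketch that reverses the four bytes by hand with struct and compares the result against socket.htonl, taking the host’s byte order into account via sys.byteorder.

```python
import socket
import struct
import sys

value = 45                                            # 0x0000002D

# Reverse the four bytes by hand: pack big endian, unpack little endian.
swapped = struct.unpack("<I", struct.pack(">I", value))[0]
print(hex(value), hex(swapped), swapped)              # 0x2d 0x2d000000 754974720

# htonl only swaps when the host is little endian; on big-endian hosts it is a no-op.
expected = swapped if sys.byteorder == "little" else value
print(socket.htonl(value) == expected)                # True
```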

 

 

Sunday, May 11, 2014

More Net Neutrality

Our friends at the Federal Candy Company, specifically in the person of Tom Wheeler, are likely to release new guidance on a concept called Net Neutrality, sometimes called The Open Internet. The FCC’s current stance is that no traffic should be blocked unless it is illegal. However, that may well soon change. Not surprisingly, this has caused some amount of anguish on the part of Internet activists and anyone who has gotten used to the idea that their traffic flows freely (it doesn’t really, but more on that later) across the Internet. Considering that Tom Wheeler comes from a background in the industries he is now responsible for regulating, it may not be terribly surprising that under his guidance, the FCC may soon back down from its previous stance that carriers should not discriminate based on the type of traffic they carry. 

Why are we in this position? Well, in 2010, the FCC released the Open Internet Order, which is the current stance of the FCC. Make note, by the way, that the FCC, for whatever say it does have, only has say over Internet service providers in the United States. The rest of the world is free to act however they damn well please. Verizon took the FCC to court to challenge the Open Internet Order, and earlier this year, a court indicated that the FCC couldn’t make such a rule. As a result, the FCC was sent back to its room to redo its homework. It is about to turn in its homework, which is why there is such a ruckus. 

Why can’t the FCC make such rules and hold the Internet service providers in the United States to them? The problem, in part anyway, is that the FCC designated the Internet and the service providers responsible for it as an information service. The U.S. Court of Appeals for the District of Columbia Circuit ruled that, having designated Internet access as an information service, the FCC couldn’t make rules like the Open Internet Order. What’s the way out of this mess? Well, one way out is to designate all Internet service providers as common carriers. A former FCC commissioner, Michael Copps, has made that very suggestion. What is a common carrier? Telephone companies like Fairpoint, Verizon, AT&T and others are all common carriers. A common carrier is an entity that provides a service for the “public convenience and necessity,” meaning that what you are getting is a utility you rely on in your day-to-day life. Common carriers have certain obligations under Title II of the Communications Act of 1934. At the moment, Internet service providers do not fall under Title II, though the FCC could easily designate them under Title II, and life would be very different. 

One of the biggest concerns around the Net Neutrality discussion is the impact on consumers and average businesses. Why? Well, another way out of this kerfuffle is to codify what the ISPs want to be able to do and that is to charge what are being called premiums to companies to carry their traffic. We covered this previously. Let’s break this down to a simple example. Take a look at the diagram below. You can see a nice little neighborhood of Alice, Bob and me. 

[Diagram: traffic flow between the Alice, Bob and Ric clouds]

Let’s say that Alice and Bob have packages they want to exchange with one another. It would make some sense that when Bob has packages for Alice, she should come and get them. The same holds true if Alice has packages for Bob: she should let him know so he can come get them. We can assume something similar with me and Bob. This all makes sense and works out nicely. What happens if Alice suddenly has packages for me? When Bob comes to get his packages, Alice throws packages for me into the mix, meaning that Bob now has to carry packages for me as well. Maybe the same is true for packages from me to Alice. Suddenly, Bob has become something of a pack mule, shuffling packages between me and Alice. Bob entered into these neighborly arrangements in good faith, assuming he was getting something out of them. In this case, he gets to send packages to me and Alice and get packages in return. If, though, he is suddenly being asked to carry packages from me to Alice and vice versa, he gets nothing out of the deal. As a result, he may want to change his agreement with both me and Alice so we pay him to carry packages back and forth. Now it’s equitable. 

The same is true for Internet service providers. Let’s say that instead of the names Alice, Bob and Ric in those clouds, the names are YouTube (Google), Level 3 and Comcast. Picture me in the Comcast cloud, trying to get to YouTube. If Comcast doesn’t have a direct connection (peering arrangement) with YouTube (Google), it would need to carry that traffic across Level 3’s network. Level 3 has peering arrangements with both YouTube (Google) and Comcast because it makes sense for Level 3’s customers to have that peering arrangement, meaning that Level 3 expects to send roughly the same amount of packages to the others as it gets from them. This is an equitable deal. If it happens that suddenly Level 3 is receiving a lot of traffic between YouTube and Comcast without any benefit for itself, it may want to make a different arrangement with these other companies, shifting the relationship from one of peering to one of transit, meaning that the company, say Comcast, is now paying Level 3 to ship packages to other parts of the Internet on its behalf. This is not without lengthy precedent, including a highly charged and publicized case from nearly a decade ago involving Level 3 and Cogent. 

This all seems like good business practice, right? The problem we have is that the Telecommunications Act of 1996 made a lot of changes to the way the world of communications works and as a result, we have seen a lot of consolidation in the telecom space. Now we have companies like Comcast providing the vast majority of consumer broadband where at one point phone companies had a foot in the space as well. For the most part, phone companies have either pulled out or simply can’t compete when it comes to speed, though they sometimes have an advantage when it comes to reach. Why is this potentially troubling? Because Comcast sells Internet services and consumers are moving more and more to the Internet for their entertainment, which is Comcast’s biggest money maker. As its Internet customers begin moving away from entertainment services like cable television, Comcast will want to make up that money somewhere. What it may do is require that companies like Netflix, YouTube, VuDu, Hulu and so on pay for transit in order to get access to the eyeballs on the Comcast network.

What this means is that the biggest companies will end up winning because they will be the ones with the money to pay for access to the end user. One reason companies need to reach the end user is that in many cases the end user is the product. YouTube (Google) makes money by selling ads to businesses, and those ads will be viewed by you, the end user. The same is true for several other companies; they make money by selling their users in some regard. Companies like Netflix can offer you, the end user, low rates in part because they may not be paying much for their Internet connection. They would pay far more if they had to hand surcharges to a number of Internet service providers just to make sure their service is fast enough that end users stay with them.

Another risk is that Comcast, with its extensive reach into the desktop (end user) space could simply decide to choke a business off if it felt that there was too much competition coming from that new business. It would do this by slowing down the speed that packages from that business arrive at the end user, potentially making the service utterly unusable. 

Make no mistake. This is happening today in many different ways. You get the amount of bandwidth you pay for. If you can’t afford bandwidth for your business, particularly if it consumes a lot of data, you are going to be a little out of luck. Also, service providers like Comcast and Time Warner have a long history of crippling services. While their argument is commonly that the services are illegal, that’s not always the case. Certainly, Gnutella, LimeWire and various other peer-to-peer file sharing services often carried information that violated intellectual property rights, but not all of the files shared fell into that category, and yet all of that traffic was either slowed substantially or blocked outright. The same is true for BitTorrent streams. Yes, some files are shared illegally, but not all files being shared are illegal. Does a company like Comcast or Time Warner or Verizon have the right to block all traffic simply because they are concerned that some of it may be illegal? Whether they have the legal right or not, it is happening.

Is your head spinning yet? It is a very complicated issue, this whole Net Neutrality/Open Internet mess. We haven’t even touched on how all of this is handled in other countries. That’s a whole different ball of wax and one for another time. 

Wednesday, April 9, 2014

E-Mail Forensics

We all have to deal with bogus e-mail from time to time. Following on the last post referencing e-mail security, where you can rely on domain keys and the sender policy framework to ensure you are getting e-mail from the right source, this is all about tracking the bad guys through the network. Okay, maybe not that exciting. At a minimum, this will certainly help you determine whether e-mail can be trusted or not. While there are a lot of technologies that might eventually give us a world where we don’t have to worry about spam and phishing attempts from untrusted sources, the reality is that most businesses are not implementing DomainKeys Identified Mail (DKIM) or Sender Policy Framework (SPF). Until such a time as e-mail security becomes a priority around the world, we will continue to have to deal with e-mail being an open communication mechanism, meaning any system around the world can send to any mail transport agent (MTA). This means e-mail from untrusted sources, spam and various other unwanted garbage in our inbox. 

I have had my e-mail address for a lot of years, and it happens to be a pretty popular one for jokesters to use when they don’t want to use their own. Personally, I often fill in foo@foo.com or even something like nunyer@beezwax.com if I’m asked for an e-mail address that doesn’t matter and they aren’t going to send a confirmation e-mail to it. Because of these two factors, I get a lot of junk e-mail, so I have a lot of fun messages to choose from. I picked one at random, offering me a way to look as good as Martha Stewart, who is more than 20 years older than me, and now I have a set of headers to play with. You can see them below. The entire chain of Received headers is there, but I have removed my e-mail address. No offense, but I don’t trust anyone. It’s something of an occupational hazard. 

[Screenshot: full message headers, including the chain of Received headers, with the recipient address removed]
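Since the Received headers are the backbone of this kind of analysis, here is a sketch for pulling them out of a raw message in oldest-first order, along with any IPv4 addresses each one mentions. The function name is hypothetical, and the sample headers are made up using documentation address ranges, not the real headers from the screenshot. The extraction is a simple regular expression, so it ignores IPv6 and doesn’t validate octet ranges.

```python
import re
from email import message_from_string

IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def received_chain(raw_message):
    """Return (header, ipv4_addresses) pairs for each Received header, oldest first.

    Each MTA prepends its Received header, so the last one in the message
    describes the first hop.
    """
    msg = message_from_string(raw_message)
    chain = []
    for header in reversed(msg.get_all("Received", [])):
        flat = " ".join(str(header).split())          # unfold continuation lines
        chain.append((flat, IPV4.findall(flat)))
    return chain


sample = (
    "Received: from mx.example.net (mx.example.net [192.0.2.25])\n"
    "\tby mail.example.org; Wed, 9 Apr 2014 14:00:05 -0400\n"
    "Received: from host.example.com (unknown [198.51.100.7])\n"
    "\tby mx.example.net; Wed, 9 Apr 2014 14:00:01 -0400\n"
    "Subject: test\n"
    "\n"
    "body\n"
)
for header, addrs in received_chain(sample):
    print(addrs, header)
```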

It would be nice to find out where this message came from. That can be challenging because there isn’t much in the way of actual verification in e-mail systems, but we can get close, or at least find places to dig a little further. The place to start is the Received header at the bottom of the pile, which was added by the very first MTA that touched this message. When we connect to a mail server, the protocol specifies that we indicate who we are. The mail server records that information along with the Internet Protocol (IP) address the connection came from. The first part of a message dialog, speaking Simple Mail Transfer Protocol (SMTP), is as follows:

220 dallas ESMTP Postfix (Ubuntu)
EHLO blah.com
250-dallas
250-PIPELINING
250-SIZE 10240000
250-VRFY
250-ETRN
250-STARTTLS
250-ENHANCEDSTATUSCODES
250-8BITMIME
250 DSN
MAIL From:foo@foo.com
250 2.1.0 Ok
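The server’s answer to EHLO is a multiline reply: every line carries the 250 code, a hyphen right after the code means more lines follow, and a space there marks the last line (RFC 5321). Here is a minimal Python sketch of parsing the reply above, assuming the lines have already been read off the connection. It shows how a client learns what the server supports, such as STARTTLS or the 10,240,000-byte SIZE limit. The function name `parse_ehlo_reply` is just for illustration.

```python
def parse_ehlo_reply(lines):
    """Split a multiline SMTP EHLO reply into (server_name, extensions).

    Sketch only: assumes every line has already been read and decoded,
    and that the reply is a success (code 250).
    """
    server_name = None
    extensions = {}
    for index, line in enumerate(lines):
        code, separator, text = line[:3], line[3:4], line[4:]
        if code != "250":
            raise ValueError(f"unexpected reply line: {line!r}")
        if index == 0:
            # The first line names the server (here, "dallas").
            server_name = text
        else:
            # Later lines advertise an extension keyword plus optional parameters.
            keyword, _, params = text.partition(" ")
            extensions[keyword.upper()] = params
        if separator == " ":
            # "250 " (space, not hyphen) marks the final line of the reply.
            break
    return server_name, extensions


# The reply from the transcript above.
EHLO_REPLY = [
    "250-dallas",
    "250-PIPELINING",
    "250-SIZE 10240000",
    "250-VRFY",
    "250-ETRN",
    "250-STARTTLS",
    "250-ENHANCEDSTATUSCODES",
    "250-8BITMIME",
    "250 DSN",
]

if __name__ == "__main__":
    name, ext = parse_ehlo_reply(EHLO_REPLY)
    print(name, sorted(ext))
```

In everyday code you would let Python’s standard `smtplib` do this parsing; after calling `ehlo()`, its `has_extn()` method reports whether the server advertised a given extension.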


EHLO is the extended SMTP version of HELO; with it I am just saying hi to the mail server and introducing myself. The mail server keeps track of who connects, so it knows what IP address I am really coming from, regardless of whether I tell the truth about my hostname. In the transaction above, I am telling the mail server that I am coming from blah.com, which is obviously untrue. Checking the mail server logs, I can see the IP address the connection actually came from.

Apr  9 15:43:52 dallas postfix/smtpd[9931]: connect from unknown[10.0.0.13]
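That pairing of a claimed name with the real connecting address is also what ends up in the Received header the server writes. Because each MTA adds its header at the top, the last Received header in a message is the earliest hop. A minimal sketch using Python’s standard `email` package pulls out that header along with the name and bracketed address inside it. The headers in `RAW_HEADERS` are hypothetical, modeled on the values discussed in this post rather than copied from the screenshot, and the regular expressions are only a heuristic, since MTAs format Received headers differently.

```python
import re
from email import message_from_string

# Hypothetical headers for illustration only; real Received formats vary by MTA.
RAW_HEADERS = """\
Received: from mx.example.net (mx.example.net [192.0.2.10])
\tby mail.example.org (Postfix) with ESMTP id 1234;
\tWed, 9 Apr 2014 14:55:49 -0400
Received: from miho.ribefsfield.com (unknown [173.0.145.21])
\tby mx.example.net (Postfix) with ESMTP id 5678;
\tWed, 9 Apr 2014 14:55:40 -0400
Subject: example

body
"""

CLAIMED_NAME = re.compile(r"^\s*from\s+(\S+)", re.IGNORECASE)
BRACKETED_IPV4 = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def earliest_hop(raw_message):
    """Return (claimed_name, connecting_ip) from the bottom-most Received header."""
    message = message_from_string(raw_message)
    received = message.get_all("Received") or []
    if not received:
        return None
    # get_all() preserves header order, so the last entry is the oldest hop.
    oldest = received[-1]
    name = CLAIMED_NAME.search(oldest)
    ip = BRACKETED_IPV4.search(oldest)
    return (name.group(1) if name else None, ip.group(1) if ip else None)


if __name__ == "__main__":
    print(earliest_hop(RAW_HEADERS))
```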


The example above is from my own internal network. In the case of the e-mail we are using as an example, the headers give us the address 173.0.145.21. The first thing I want to do is see whether that IP address has a hostname associated with it, which means looking up its PTR record in the domain name system (DNS). Best e-mail practice is to have the reverse DNS match the forward: if miho.ribefsfield.com resolves to 173.0.145.21, then 173.0.145.21 should resolve to miho.ribefsfield.com. We want to check whether that’s the case and whether either name actually resolves to anything.

kilroy@dallas /var/log $ host 173.0.145.21
;; connection timed out; no servers could be reached
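Under the hood, a reverse lookup is an ordinary DNS query for a PTR record. The name being queried is the address with its octets reversed, under in-addr.arpa. Python’s standard `ipaddress` module builds that name for you, which is handy if you want to query it directly. The helper name `ptr_name` is my own.

```python
import ipaddress


def ptr_name(address):
    """Return the DNS name a reverse (PTR) lookup queries for this address."""
    # reverse_pointer handles both IPv4 (in-addr.arpa) and IPv6 (ip6.arpa).
    return ipaddress.ip_address(address).reverse_pointer


if __name__ == "__main__":
    print(ptr_name("173.0.145.21"))
```

So `host -t PTR 21.145.0.173.in-addr.arpa` asks the same question as `host 173.0.145.21`.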


The lookup timed out, which suggests we couldn’t reach the name server responsible for the IP block this address came out of. Without it, we can’t do a reverse DNS lookup on the IP, and that doesn’t bode well for verifying the source of this e-mail. Let’s see what the hostname that was offered up resolves to.

kilroy@dallas /var/log $ host miho.ribefsfield.com
miho.ribefsfield.com has address 66.78.32.6
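Putting the two lookups together gives the check usually called forward-confirmed reverse DNS. The PTR for the IP should give back the claimed name, and that name should resolve back to the same IP. Below is a Python sketch that uses the standard `socket` resolver functions by default. The resolvers are passed in as parameters so the logic can be tried without network access. The function and parameter names are illustrative, not taken from any particular tool.

```python
import socket


def _system_reverse(address):
    # gethostbyaddr returns (hostname, aliases, addresses).
    return socket.gethostbyaddr(address)[0]


def _system_forward(hostname):
    # gethostbyname_ex returns (hostname, aliases, ipv4_addresses).
    return socket.gethostbyname_ex(hostname)[2]


def check_claimed_name(address, claimed_name,
                       reverse=_system_reverse, forward=_system_forward):
    """Compare a claimed hostname with the connecting IP in both directions."""
    ptr = None
    forward_ips = []
    try:
        ptr = reverse(address)
    except OSError:  # socket.herror, socket.gaierror and timeouts are all OSErrors
        pass
    try:
        forward_ips = list(forward(claimed_name))
    except OSError:
        pass
    wanted = claimed_name.rstrip(".").lower()
    return {
        "ptr": ptr,
        "ptr_matches": ptr is not None and ptr.rstrip(".").lower() == wanted,
        "forward_ips": forward_ips,
        "forward_matches": address in forward_ips,
    }
```

With stand-in resolvers that reproduce the two results above (no PTR answer, and a forward answer of 66.78.32.6), both checks come back false.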


Well, 66.78.32.6 is a different IP address altogether. At this point, we should check who owns the domain name. Skipping a lot of the preamble from the whois lookup, we get the following registration information for the domain.

Registrant Email: WEBMASTER@NIZMEDIAGROUP.NET
Registry Admin ID:
Admin Name: WEB MASTER
Admin Organization: -
Admin Street: 37 TOWER LANE
Admin City: WILLISTON
Admin State/Province: VT
Admin Postal Code: 05495
Admin Country: US
Admin Phone: +1.88888888
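When the same registrant address keeps turning up across spam domains, comparing whois records is easier once the key: value lines are turned into a dictionary. Here is a minimal sketch that assumes output shaped like the excerpt above. Registrars label their fields differently, so the `Admin ...` field names used in the hypothetical `admin_address` helper come from this one lookup and are not a standard.

```python
def parse_whois(text):
    """Turn 'Key: value' whois lines into a dict, keeping the first value per key."""
    fields = {}
    for line in text.splitlines():
        key, separator, value = line.partition(":")
        if not separator:
            continue  # not a key/value line
        key, value = key.strip(), value.strip()
        if key and value and key not in fields:
            fields[key] = value
    return fields


def admin_address(fields):
    """Join the administrative contact's postal fields into one comparable string."""
    parts = ("Street", "City", "State/Province", "Postal Code", "Country")
    return ", ".join(fields[f"Admin {p}"] for p in parts if f"Admin {p}" in fields)


SAMPLE = """\
Registrant Email: WEBMASTER@NIZMEDIAGROUP.NET
Registry Admin ID:
Admin Name: WEB MASTER
Admin Organization: -
Admin Street: 37 TOWER LANE
Admin City: WILLISTON
Admin State/Province: VT
Admin Postal Code: 05495
Admin Country: US
Admin Phone: +1.88888888
"""

if __name__ == "__main__":
    print(admin_address(parse_whois(SAMPLE)))
```

Keying a collection of domains on `admin_address(...)` groups together the ones that share a registrant address.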


The funny thing about this is that I’ve been digging around in e-mail a lot over the last few weeks for a variety of reasons, and it seems like every spam message I look at ends up resolving to a domain registered to this physical address. The peculiar thing is that this address is a town over from where I am sitting as I write this. Check any map service you like and you’ll find a blue house; Google Maps shows a Saturn in the driveway. At some point it may be entertaining to do a little drive-by and see what else is going on, but before that, let’s keep going with this e-mail. The hostname offered in the message actually exists, and it resolves to 66.78.32.6. We should check who owns that IP block and see whether it matches anything we have seen so far. When I run a whois on that IP address, the regional Internet registry (ARIN, in this case) shows that it is part of a pretty big block of addresses (66.78.0.0/18, or 16,384 addresses) belonging to Virtual Development Inc.

OrgName:        Virtual Development INC
OrgId:          VDI
Address:        590 Bloomfield Ave
Address:        Suite 317
City:           Bloomfield
StateProv:      NJ
PostalCode:     07003
Country:        US
RegDate:        1999-10-21
Updated:        2013-03-07
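Block ranges in CIDR notation are easy to mistype or misread, and a /18 is a big range: 16,384 addresses. A short containment check with Python’s standard `ipaddress` module confirms whether the address from the DNS lookup really belongs to the block. For example, 66.78.32.6 sits inside 66.78.0.0/18 but not inside 63.78.0.0/18. The helper names `in_block` and `describe_block` are illustrative.

```python
import ipaddress


def in_block(address, cidr):
    """True if the address falls inside the CIDR block."""
    return ipaddress.ip_address(address) in ipaddress.ip_network(cidr)


def describe_block(cidr):
    """Return (first address, last address, size) for a CIDR block."""
    network = ipaddress.ip_network(cidr)
    return str(network.network_address), str(network.broadcast_address), network.num_addresses


if __name__ == "__main__":
    print(in_block("66.78.32.6", "66.78.0.0/18"))
    print(describe_block("66.78.0.0/18"))
```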

According to Manta, VDI is a two-person shop, but there doesn’t appear to be a Web site associated with it, in spite of the enormous block of IP addresses registered to it. So far, we have locations in Williston, VT and Bloomfield, NJ. Let’s see if we can add any more locations to this tangled Web we have uncovered. We can run a traceroute to the original IP address from the e-mail headers; below is the last part of that traceroute. The suspense is likely killing you at this point.

12  hurricane-ic-138359-sjo-bb1.c.telia.net (213.248.67.106)  103.452 ms  104.842 ms  167.425 ms
13  10ge1-4.core1.sjc1.he.net (72.52.92.117)  158.437 ms  170.045 ms  170.470 ms
14  evernet-hosting.gigabitethernet2-2.core1.sjc1.he.net (216.218.196.6)  161.826 ms  162.256 ms  162.262 ms
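When a traceroute dead-ends, the last hop that answers is usually the most interesting line. Its reverse DNS name often hints at whose equipment sits just beyond it. Here, the final responding hop is a Hurricane Electric (he.net) interface whose name begins with evernet-hosting. The minimal sketch below parses lines in the format shown above: hop number, name, address in parentheses, then round-trip times. It returns the last hop that responded and treats lines made up only of `* * *` as silent hops. It assumes the common Linux/BSD traceroute output and will not match every variant.

```python
import re

HOP_LINE = re.compile(r"^\s*(\d+)\s+(\S+)\s+\(([0-9a-fA-F.:]+)\)\s*(.*)$")
RTT = re.compile(r"([\d.]+)\s*ms")


def parse_hop(line):
    """Parse one traceroute line; return None for silent or unrecognized hops."""
    match = HOP_LINE.match(line)
    if match is None:
        return None
    hop, name, address, rest = match.groups()
    return {
        "hop": int(hop),
        "name": name,
        "address": address,
        "rtt_ms": [float(value) for value in RTT.findall(rest)],
    }


def last_responding_hop(lines):
    """Return the parsed hop for the last line that produced an answer."""
    hops = [hop for hop in map(parse_hop, lines) if hop is not None]
    return hops[-1] if hops else None


TRACE = [
    "12  hurricane-ic-138359-sjo-bb1.c.telia.net (213.248.67.106)  103.452 ms  104.842 ms  167.425 ms",
    "13  10ge1-4.core1.sjc1.he.net (72.52.92.117)  158.437 ms  170.045 ms  170.470 ms",
    "14  evernet-hosting.gigabitethernet2-2.core1.sjc1.he.net (216.218.196.6)  161.826 ms  162.256 ms  162.262 ms",
    "15  * * *",  # hypothetical silent hop, as traceroute prints once nothing answers
]

if __name__ == "__main__":
    print(last_responding_hop(TRACE)["name"])
```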


It dead-ends at this point: the IP address itself cannot be reached. The last hop that does answer is a Hurricane Electric (he.net) router interface named for Evernet Hosting, which adds another player, and not one that surprises me. As I said, I’ve been grubbing around in e-mail headers for the last couple of weeks, and our good friends at Evernet Hosting are quite familiar to me. Either they are a breeding ground for spammers or their systems are so badly secured that they are easily compromised. Either way, there is a pretty solid connection between Evernet Hosting in San Jose and a small house in Williston, VT, since many of the spam messages whose domains are registered to the Williston address actually originate in San Jose, CA, from an IP address somewhere behind Evernet Hosting. A geolocation lookup on the IP address also places it in San Jose, CA, just as the traceroute indicates.

While we don’t have anything specific in terms of a name or even a clear-cut company to point to, we certainly have a lot of clues, and from a legal standpoint we have some places to look further, provided we had some legal support. Evernet Hosting or Virtual Development Inc. would be a starting point. While not immediately satisfying, it’s a step along the path.

Monday, April 7, 2014

We Now Pause For This Commercial Advertisement

In the spirit of full disclosure, with apologies to RFP, this is as much about propaganda for an upcoming book about cloud computing done securely as it is about much of anything technical. I will say, however, in the process of writing the book, I learned a lot about really cool things that can be done with cloud computing providers. At the risk of giving away the contents of the book, let me pass along a few things that you might think about when it comes to moving your sensitive infrastructure off to become someone else’s problem. I often find someone else’s problems to be very good things and if you plan well, you can get a lot of benefit without a lot of risk. 

The infrastructure for my own business domain is hosted with Microsoft on its Office 365 plan, which has a lot of benefits, not the least of which is a subscription to the Office software for up to 5 computers. It also includes SkyDrive, now called OneDrive, so I can store documents with a storage provider while still being able to edit them through a Web interface. On top of that, I can get to my documents from wherever I am and share them with other people: all of the benefits of cloud storage that we know and love. In addition to storage, of course, I get Web hosting and e-mail. As with many other e-mail providers, Microsoft takes care of spam for you, but it also gives organizations settings to fine-tune how spam is detected. You can see some of those settings below.

[Screenshot: Office 365 spam filter settings]

One thing they don’t handle, however, is support for DomainKeys Identified Mail (DKIM) or the Sender Policy Framework (SPF). DKIM allows an organization to take ownership of its e-mail messages. The sending server adds a DKIM-Signature header containing a cryptographic signature over selected headers and the body, and the receiving server checks it against a public key the domain publishes in DNS. If the signature doesn’t verify, the message was altered in transit or didn’t come from a server holding the domain’s private key. With SPF, a mail administrator publishes a record in the domain name system (DNS) listing the hosts allowed to send mail for the domain. If a mail transport agent (MTA) receives a message whose envelope sender (the MAIL FROM address) uses that domain but which comes from a host that doesn’t match the SPF record, the message is likely forged. The MTA can then safely drop it, or at least place it in a junk folder so the user can decide whether they really want to look at it.
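To make the SPF idea concrete, here is a deliberately partial Python sketch of evaluating an SPF record against the connecting IP address. It handles only the `ip4:`, `ip6:` and `all` mechanisms and the `+ - ~ ?` qualifiers from RFC 7208. Real records commonly rely on `include:`, `a`, `mx` and `redirect=`, which this sketch skips, so it will give wrong answers for those records. The example record uses documentation address ranges and is hypothetical.

```python
import ipaddress

QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def spf_check_ip(record, client_ip):
    """Evaluate only the ip4:, ip6: and all mechanisms of an SPF record.

    Returns pass, fail, softfail, neutral, or none (not an SPF record).
    Other mechanisms and modifiers are ignored, unlike a real SPF checker.
    """
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return "none"
    address = ipaddress.ip_address(client_ip)
    for term in terms[1:]:
        qualifier = "+"  # the default qualifier is pass
        if term[0] in QUALIFIERS:
            qualifier, term = term[0], term[1:]
        mechanism, _, value = term.partition(":")
        mechanism = mechanism.lower()
        if mechanism in ("ip4", "ip6"):
            # Malformed values raise ValueError here; RFC 7208 calls that a permerror.
            network = ipaddress.ip_network(value, strict=False)
            if address.version == network.version and address in network:
                return QUALIFIERS[qualifier]
        elif mechanism == "all":
            return QUALIFIERS[qualifier]
    # No mechanism matched: RFC 7208 treats that as neutral.
    return "neutral"


if __name__ == "__main__":
    record = "v=spf1 ip4:192.0.2.0/24 -all"  # hypothetical record
    print(spf_check_ip(record, "192.0.2.55"), spf_check_ip(record, "198.51.100.7"))
```

In practice the receiving MTA fetches the record as a DNS TXT record for the envelope sender’s domain and runs the full evaluation, including the mechanisms skipped here.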

Microsoft doesn’t give me specific settings for either DKIM or SPF, so I have no control over whether they are used or how they are configured. While researching the book, though, I looked into Google’s offerings for businesses and discovered some interesting things. You can read about this in more detail in the book, but if you get Google Apps for Business, you get some additional control over your e-mail settings, including the ability to create a key for DKIM. You can see a portion of those settings below; the page doesn’t render correctly in the window and is cut off on the right-hand side.

[Screenshot: a portion of the Google Apps for Business DKIM settings, cut off on the right-hand side]


Once the matching record is published in my domain’s DNS, recipients can verify that messages appearing to come from me actually did come from me. Google also uses SPF to help protect recipients. The one thing I don’t get with Google that I had with Microsoft is fine-grained control over how spam is filtered.
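The DNS record in question is a TXT record holding the DKIM public key. A receiving server finds it by reading the `d=` (signing domain) and `s=` (selector) tags from the DKIM-Signature header and querying `<selector>._domainkey.<domain>` (RFC 6376). Below is a small Python sketch of that first step. The header value is hypothetical (example.com, selector `sel1`, placeholder hashes). Actually verifying the signature requires canonicalization and RSA, which is a job for a library such as dkimpy rather than a few lines of code.

```python
def dkim_tags(header_value):
    """Split a DKIM-Signature header value into its tag=value pairs."""
    tags = {}
    for part in header_value.split(";"):
        name, separator, value = part.partition("=")
        if separator:
            # Strip folding whitespace; none of the tags used here can contain spaces.
            tags[name.strip()] = "".join(value.split())
    return tags


def dkim_key_name(header_value):
    """Return the DNS name whose TXT record should hold the signer's public key."""
    tags = dkim_tags(header_value)
    return f"{tags['s']}._domainkey.{tags['d']}"


# Hypothetical header value for illustration; bh= and b= are placeholders.
SIGNATURE = (
    "v=1; a=rsa-sha256; c=relaxed/relaxed; d=example.com; s=sel1;\r\n"
    "\th=from:to:subject:date; bh=AbC123=; b=XyZ789=="
)

if __name__ == "__main__":
    print(dkim_key_name(SIGNATURE))
```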

In the process of writing the book, I put together a whole domain with Web site and e-mail just to walk through how it would work and also have a Web site related to the book when it was all over. The domain I created is cloudroy.com and it has additional information about the book. There is also a link back to this blog so now I have linked the two completely together.