Tuesday, July 29, 2014

More Fun With Python

Since I’m in the middle of trying to finish a title on Python scripting for security professionals for Infinite Skills, I’ve been writing a lot of little scripts that do interesting things. So, in order to clear my head and also get something reasonably recent and potentially interesting up here, I thought I’d write one of those scripts up. It could be a useful foundation for anyone who wants to do a little security testing using Python scripts. It is also useful for forensics professionals, since you may want to write custom tools that parse data in a way that makes sense to you rather than relying on tools that present data in a way that made sense to someone else. Being able to parse simple data structures is a very useful skill for both network programming and forensics programming. One case with a lot of parsing to do, which came up both in the video training I am doing and in the next book I am writing, is dealing with the information in the master boot record, including the partition table. 

So, let’s take a look at a program that I threw together to do some quick parsing of the partition table in a master boot record. 

#!/usr/bin/python3

#  (c) 2014, WasHere Consulting, Inc

import struct

# read the 512-byte MBR image created with dd
f = open("mbr.dd", "rb")

mbr = bytearray()

try:
    mbr = f.read(512)
finally:
    f.close()

# disk signature: 4 bytes at offset 0x1B8, little-endian, unsigned
x = struct.unpack("<I", mbr[0x1B8:0x1BC])
print("Disk signature: ", x[0])

# the first partition table entry starts at 0x1BE; byte 0 is the boot flag
x = mbr[0x1BE]
if x == 0x80:
    print("Active flag: Active")
else:
    print("Active flag: Not active")

# starting LBA: 4 bytes at offset 8 into the entry (0x1C6)
lbastart = struct.unpack("<I", mbr[0x1C6:0x1CA])
print("Partition Start (LBA): ", lbastart[0])

# the entry stores a sector count at offset 12 (0x1CA), so the ending LBA
# is the starting LBA plus the sector count, less one
sectors = struct.unpack("<I", mbr[0x1CA:0x1CE])
print("Partition End (LBA): ", lbastart[0] + sectors[0] - 1)

For the purposes of this program, I have grabbed an image of the master boot record so I can get to the partition table. I did this by using the UNIX/Linux utility dd. You simply grab the first 512 byte block with dd if=/dev/sdb of=mbr.dd bs=512 count=1 and you end up with an image of the master boot record you can use various tools with. So, the program opens up the disk image called mbr.dd as a binary file then creates a byte array to store all of the bytes from that disk image into. Once I have a byte array, called mbr, I can start to pull the bytes out that I want as long as I know where the offsets are. 

Something to keep in mind, though, is that values in the master boot record are stored little endian, so if I have a file with an image or copy of that master boot record, all of the multi-byte values are going to be byte-reversed compared with the way we normally write numbers. We need to use struct.unpack to get the bytes out and interpret them in the correct order. So, we tell struct.unpack that we have a little-endian unsigned integer with the format string "<I" and then we provide the range of bytes out of the byte array that struct.unpack should turn into that integer. The thing to keep in mind when you are providing a range is that the top end is not inclusive. For the disk signature, I am grabbing bytes 1B8, 1B9, 1BA and 1BB. Even though the last byte in the range indicated in the program is 1BC, we don’t get that byte because it’s not included under Python’s slicing rules. 

Once we have the basics of pulling data out of the byte array, the rest is trivial. I can grab the single byte indicating whether a partition is active (bootable) or not and then compare that value with what I know about that flag. If it’s 0x80, the partition is active; if it’s not, it isn’t, so I can print the result based on that one byte. I can also get the starting logical block address and, using the sector count stored in the entry, the ending logical block address by grabbing the bytes from my byte array and converting them into integers, again using struct.unpack. 

This is a simple technique that can then be applied to other binary data structures, whether that data structure is the rest of the master boot record, the BIOS parameter block or the structures associated with a GUID Partition Table disk. 
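
As an example of extending the idea, here is a minimal sketch that walks all four 16-byte entries in the partition table rather than just the first one. It assumes the same mbr.dd image as above and the standard layout of an MBR partition entry (boot flag at offset 0, type byte at offset 4, starting LBA at offset 8, sector count at offset 12).

#!/usr/bin/python3

import struct

with open("mbr.dd", "rb") as f:
    mbr = f.read(512)

# the partition table is four 16-byte entries starting at offset 0x1BE
for i in range(4):
    entry = mbr[0x1BE + i * 16:0x1BE + (i + 1) * 16]
    part_type = entry[4]                             # partition type byte, e.g. 0x07 for NTFS
    if part_type == 0:
        continue                                     # empty slot in the table
    boot_flag = entry[0]                             # 0x80 means active/bootable
    lbastart = struct.unpack("<I", entry[8:12])[0]   # starting sector, little endian
    sectors = struct.unpack("<I", entry[12:16])[0]   # number of sectors in the partition
    print("Entry {}: type 0x{:02X}, {}, start {}, sectors {}".format(
        i, part_type, "active" if boot_flag == 0x80 else "not active",
        lbastart, sectors))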

 

Thursday, July 10, 2014

Finding Data In A Disk Image

Back to the basics now, especially since it seems as though there is so much emphasis on letting the tools do all the work for us that we forget how to do it ourselves when we are lacking the tools. This time we’re going to walk through the process of locating a deleted file on a hard drive. In this case, I’m going to be using a deleted file but, of course, the same process can be used for locating information about non-deleted files in a disk image as well. To save a little time, I’m going to be using some utilities from The Sleuth Kit, though we could also do the same thing by hand. I should also mention at this point that one of the reasons for writing this up is that I am working through a file systems chapter in my next book on Operating Systems Forensics, due to be published early next year by Syngress/Elsevier. It helps to understand the structure of the file system format so you can find data on the disk and know where to look for it.

The first thing I need to do is create a file I want to use on an NTFS partition. I’m going to be doing this from a Linux system, though the Sleuth Kit utilities work on other operating systems as well like Windows. Since I want to acquire a disk image, I am going to use Linux so I get dd. I am going to create the file with a word that isn’t likely to show up anywhere else on the file system in order to minimize the number of hits I get when I go looking. You can see the contents of the file below.

Screen Shot 2014 07 10 at 11 46 55 AM

I need to grab an image of the partition, and I may as well also capture a cryptographic hash at the same time since it’s just good practice. I’m going to use dd to capture the image. So, in my case, I’m using dd if=/dev/sdb1 of=ntfs.dd. I want to capture the whole partition, so I am not setting a block size or a count. I want it to run until it runs out of disk to copy. Then I’m going to grab a cryptographic hash of the resulting image. When I’m all done, I get 2d59270e187217c3c222fc78851a1ebe91e3f8ec for my SHA1 hash on a disk image that is 150M in size. For the purposes of this exercise, I left the image small.

After deleting the file from the disk, I want to figure out where the data is. There are a couple of ways of accomplishing this using the Sleuth Kit tools. The first is to look it up by the file name. In reality, when you delete a file, it’s not gone from your drive or partition. As a result, there are still references to the file name sitting out there on my partition. One TSK tool I can use to find the file by the filename is ifind. ifind will search your file system for a reference to the filename or you can look up the metadata for a file based on the data unit you provide. In this example, I’m going to use ifind to look for the filename and then I’m going to use another TSK utility to pull the data out of the address on the disk that ifind has provided for me. You can see what that looks like below.


Screen Shot 2014 07 10 at 1 39 10 PM
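
The two commands take roughly this form; ifind is given the filename to look for and the image, and the metadata address it reports (73 in this case) is what gets handed to icat. Treat this as a sketch rather than a transcript, since the exact output depends on your image.

ifind -n wubble.txt ntfs.dd

icat ntfs.dd 73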

That provides me with the contents of the file, and you can compare the results with what we saw when I used cat to display the file. ifind looked in the image file ntfs.dd for a filename called wubble.txt and returned the metadata address 73, which I then passed into icat to get the contents of the file at that address. This assumes we know the name of the file or that the name of the file is still available to be searched on. What I may have instead is just a chunk of text that came out of the file. I can use another utility to go looking for the file based on just a word search. Since I’m looking through a binary file with the image capture, I need to do something special with grep in order to figure out the offset where I can find the text I’m looking for. I am going to tell grep to treat the binary file as text, print only the matches and give me the byte offset of each one, so I’ll be using grep -oba to search my image file for the word wubble. You can see the results below.


Screen Shot 2014 07 10 at 1 47 07 PM

It looks like we have several hits in the image. The number on the left-hand side is the byte offset. Since I have that in bytes, I need to figure out what cluster it is going to be in so I can use the TSK tools, which are block or cluster based. As a result, I need to do a little math, but I need to know what my cluster size is first. Fortunately, I can use fsstat on my image and get my cluster size. Of course, I could also use a hex editor and do it the really old-fashioned way, but this should be adequate for our purposes. If I divide 91551 by my cluster size of 4096, I end up with 22 and some change. That tells me that the data is going to be in cluster 22, so I can use the TSK tool blkcat to get the contents of block/cluster 22. You can see the results of that below.

Screen Shot 2014 07 10 at 1 51 11 PM

You can see a lot of funny looking characters. That’s because what we have here is an entry in the Master File Table, so there is a lot of binary data, and when you try to convert a byte that wasn’t meant to be a character into a character, you end up with values that don’t translate into something that looks right. You can, though, see the text of the file, and that’s pretty common for an NTFS entry. Since it’s a very small file, the contents of the file were simply stored as an attribute of the file rather than taking up a data block somewhere else on the file system. You can also see the filename in the middle of all of that content. You can also see the value FILE0. This indicates that the drive was formatted with Windows XP or something newer. Since I used a Linux system to format it, the formatting utility just used conventions from a more recent version of Windows and NTFS.

We used a lot of TSK utilities to do this, but we could just as easily stick with standard UNIX utilities to perform the work. Using the output from grep, we can see that we need to be 22 blocks into the file system in order to get at the data we are looking for. We can easily use dd to extract that information and then use xxd to view it in a hexadecimal dump. Using dd, we would set the block size to the block/cluster size of 4096 and then skip 22 blocks or clusters, grabbing only one. So, we could use dd if=ntfs.dd of=caught.dd bs=4096 count=1 skip=22 to grab a single block from the image file we have. Once we have the 4096 byte cluster, we can just use xxd caught.dd to view the results in a hexadecimal dump and see that we have the MFT entry for the file wubble.txt.
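
The same offset math and extraction is easy to script as well. This is a minimal sketch in Python, assuming the ntfs.dd image, the 91551 byte offset that grep reported and the 4096-byte cluster size from fsstat; substitute your own values.

#!/usr/bin/python3

offset = 91551            # byte offset reported by grep -oba
cluster_size = 4096       # cluster size reported by fsstat

cluster = offset // cluster_size    # integer division gives cluster 22
print("Offset", offset, "falls in cluster", cluster)

with open("ntfs.dd", "rb") as f:
    f.seek(cluster * cluster_size)  # jump to the start of that cluster
    data = f.read(cluster_size)     # grab one full cluster

# write the cluster out so it can be examined with xxd, a hex editor and so on
with open("caught.dd", "wb") as out:
    out.write(data)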

Tuesday, May 27, 2014

Network Byte Order

There are probably perfectly legitimate reasons for the world being this way, but I don’t know what they are. In a pretty substantial chunk of the world, when we write numbers, we write them from left to right, meaning the portion of the number with the largest value is on the left-hand side. What this means is that if I write the number 6785, what I mean is six thousand, seven hundred eighty-five. When we are talking about digital communications, however, everything is in the form of a byte. Rather than dealing with all of the individual bits that a byte would normally be represented as, let’s shorthand it to hexadecimal. One pair of hexadecimal digits is how we would represent a single byte. The reason for that is simple. Four bits gives me the values 0-15, since 2^0 + 2^1 + 2^2 + 2^3 = 1 + 2 + 4 + 8 = 15 is the maximum value of a 4-bit number. Since a byte is two groups of 4 bits and a single hexadecimal digit (values 0-F, or 0-15) is 4 bits, 2 hexadecimal digits is a whole byte. Simple, right? 

Let’s move on to writing values, knowing that we are going to be talking about writing out bytes for now and we are going to represent them as hexadecimal. We are going to write out the word hello, and it doesn’t much matter where we write it because we can run into the same problem no matter what we are doing. The title of this suggests we are talking about writing out to a network interface, but we have the same problem on hard disks and in memory. No matter where we have to write bits and bytes, we have to decide how we are going to write them. When we write character values, we have to have a way of converting them to a number. As a result, we use a table lookup. The common table to look up characters and get a numeric representation is the ASCII table. After doing the lookup, we get the following: 68 65 6C 6C 6F. Again, without getting into the bit level, we have to decide what order we are going to send these in. Do you send the h first or the o first and then follow with the rest of the characters?
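
You can do that lookup quickly from a Python prompt; binascii.hexlify just spells out the ASCII bytes of the string in hexadecimal.

>>> import binascii
>>> binascii.hexlify("hello".encode("ascii"))
b'68656c6c6f'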

Thinking about numbers, where the result is more catastrophic if you get it wrong, let’s take a look at a 16-bit value. The value 1348 is 0x0544 in hexadecimal. This is two bytes. If I send the 44 followed by the 05, how does the receiving party interpret that? If I send the 05 before I send the 44, I am sending in big-endian form. The reason for that is that I am sending the most significant data first, the data that has the largest value. If I send the 44 first, I am sending in little-endian form. If the receiving end is used to doing things a different way, I could send the value 1348 and have the receiving end read it as 17413. This is a very big difference. The reason is that if I send 05 then 44, which is big-endian, but the other end assumes little-endian, it would view what I sent as 44 05. 
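
Here is the same 1348/17413 example worked in Python with the struct module; >H means a big-endian 16-bit value and <H a little-endian one.

>>> import struct
>>> struct.pack(">H", 1348)            # the bytes 05 44; 0x44 prints as the character D
b'\x05D'
>>> struct.unpack("<H", b'\x05\x44')   # the same two bytes read as little-endian
(17413,)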

So, which is the right way? Neither, actually. But since little-endian systems need to talk to big-endian systems, there had to be some consensus. As a result, there are two ordering schemes. There is host-order, which is whatever order your particular system architecture uses (Intel uses little-endian, by the way) and then there is network order. Network byte order is a synonym for big-endian, since historically more hardware architectures used the big-endian form of storing data. Of course, these days, far more systems on the network use little-endian simply because of the ubiquity of systems with Intel processors. 

When you are storing data on your own system, it doesn’t much matter how it’s represented because the operating system has to take care of writing and reading so you get the real value at a programmatic layer. When you are trying to interface with values on disk at a raw level, as you might in the case of forensics, you have to be aware of multi-byte values and what architecture the data was written on. If you have a multi-byte value that was written from a little-endian system, you need to remember to reverse the order of the bytes. But only within that value. 

If you are talking to another system, something has to handle the translation from host to network form. Languages that are capable of talking to the network generally have those functions available. As an example, we can see how the process works in Python, below. 

kilroy@opus:~$ python3

Python 3.3.3 (v3.3.3:c3896275c0f6, Nov 16 2013, 23:39:35) 

[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin

Type "help", "copyright", "credits" or "license" for more information.

>>> import socket

>>> socket.ntohl(45)

754974720

>>> socket.htonl(45)

754974720

>>> 

 

The socket module has a number of conversion functions, including the two above. In the first example, I am converting from network byte order to a host long, which in this case means converting to a 32-bit value stored little-endian. In the second example, I am converting from a little-endian number to a network long. Again, a long data type is 32 bits in this case. As a result, you take the four bytes that make up the value 45, reverse their order and recalculate back to decimal. You can see the result we get is significantly larger than the value we put in. 
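
You can reproduce the same number without the socket module by packing 45 as a big-endian 32-bit value and reading the same four bytes back as little-endian, which is all the byte swap amounts to.

>>> import struct
>>> struct.unpack("<I", struct.pack(">I", 45))[0]
754974720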

 

 

Sunday, May 11, 2014

More Net Neutrality

Our friends at the Federal Candy Company, specifically in the person of Tom Wheeler, are likely to release new guidance on a concept called Net Neutrality, sometimes called The Open Internet. The FCC’s current stance is that no traffic should be blocked unless it is illegal. However, that may well soon change. Not surprisingly, this has caused some amount of anguish on the part of Internet activists and anyone who has gotten used to the idea that their traffic flows freely (it doesn’t really, but more on that later) across the Internet. Considering that Tom Wheeler comes from a background in the industry he is now responsible for regulating, it may not be terribly surprising that under his guidance, the FCC may soon back down from its previous stance that carriers should not discriminate regarding the type of traffic they carry. 

Why are we in this position? Well, in 2010, the FCC released the Open Internet Order, which is the current stance of the FCC. Make note, by the way, that the FCC, for what say it does have, only has say over Internet service providers in the United States. The rest of the world is free to act however they damn well please. Verizon took the FCC to court to challenge the Open Internet Order and earlier this year, a court indicated that the FCC couldn’t make such a rule. As a result, the FCC was sent back to its room to redo its homework. It is about to turn in its homework, which is why there is such a ruckus. 

Why can’t the FCC make such rules and hold the Internet service providers in the United States to them? The problem, in part anyway, is that the FCC designated the Internet and the service providers responsible for it as an information service. The District of Columbia Court of Appeals ruled that because the FCC had designated the Internet as an information service, it couldn’t make rules like the Open Internet Order. What’s the way out of this mess? Well, one of the ways out is to designate all Internet service providers as common carriers. A former FCC commissioner, Michael Copps, has made that very suggestion. What is a common carrier? The telephone companies like Fairpoint, Verizon, AT&T and others are all common carriers. A common carrier is an entity that provides a service for the “public convenience and necessity,” meaning that what you are getting is a utility that you rely on in your day to day life. Common carriers have certain obligations that fall under Title II of the Communications Act of 1934. At the moment, Internet service providers do not fall under Title II, though the FCC could easily designate them under Title II and life would be very different. 

One of the biggest concerns around the Net Neutrality discussion is the impact on consumers and average businesses. Why? Well, another way out of this kerfuffle is to codify what the ISPs want to be able to do and that is to charge what are being called premiums to companies to carry their traffic. We covered this previously. Let’s break this down to a simple example. Take a look at the diagram below. You can see a nice little neighborhood of Alice, Bob and me. 

Trafficflow

 

Let’s say that Alice and Bob have packages they want to exchange with one another. It would make some sense that when Bob has packages for Alice, she should come and get them. The same holds true if Alice has packages for Bob. She should let him know so he can come get them. We can assume something similar with me and Bob. This all makes sense and works out nicely. What happens if Alice suddenly has packages for me? When Bob comes to get his packages, Alice is throwing packages for me into the mix, meaning that Bob now has to come and get packages for me. Maybe the same is true in the case of packages to Alice from me. Suddenly, Bob has become something of a pack mule shuffling packages between me and Alice. Bob entered into these neighborly arrangements in good faith, assuming that he was getting something out of it. In this case, he gets to send packages to me and Alice and get packages in return. If suddenly, though, he is being asked to carry packages from me to Alice and vice versa, he gets nothing out of the deal. As a result, he may want to change his agreement with both me and Alice so we pay him to carry packages back and forth. Now it’s equitable. 

The same is true for Internet service providers. Let’s say that instead of the names Alice, Bob and Ric in those clouds, the names are YouTube (Google), Level 3 and Comcast. Picture me in the Comcast cloud, trying to get to YouTube. If Comcast doesn’t have a direct connection (peering arrangement) with YouTube (Google), it would need to carry that traffic across Level 3’s network. Level 3 has peering arrangements with both YouTube (Google) and Comcast because it makes sense for Level 3’s customers to have that peering arrangement, meaning that Level 3 expects to send roughly the same amount of packages to the others as it gets from them. This is an equitable deal. If it happens that suddenly Level 3 is carrying a lot of traffic between YouTube and Comcast without any benefit for itself, it may want to make a different arrangement with these other companies, shifting the relationship from one of peering to one of transit, meaning that the company, say Comcast, is now paying Level 3 to ship packages to other parts of the Internet on its behalf. This is not without lengthy precedent, including a highly charged and publicized case from nearly a decade ago involving Level 3 and Cogent. 

This all seems like good business practice, right? The problem we have is that the Telecommunications Act of 1996 made a lot of changes to the way the world of communications works and as a result, we have seen a lot of consolidation in the telecom space. Now we have companies like Comcast providing the vast majority of consumer broadband where at one point phone companies had a foot in the space as well. For the most part, phone companies have either pulled out or simply can’t compete when it comes to speed, though they sometimes have an advantage when it comes to reach. Why is this potentially troubling? Because Comcast sells Internet services and consumers are moving more and more to the Internet for their entertainment, which is Comcast’s biggest money maker. As its Internet customers begin moving away from entertainment services like cable television, Comcast will want to make up that money somewhere. What it may do is require that companies like Netflix, YouTube, VuDu, Hulu and so on pay for transit in order to get access to the eyeballs on the Comcast network.

What this means is that the biggest companies will end up winning because they will be the ones with the money to pay for access to the end user. One reason for this need to get access to the end user is that in many cases the end user is the product. YouTube (Google) makes money by selling ads to businesses that will be viewed by you, the end user. The same is true for several other companies. They make money by selling their users in some regard. Companies like Netflix can offer low rates to you, the end user, in part because they are not paying surcharges to a number of Internet service providers just to make sure their service is fast enough that end users will continue to stay with them.

Another risk is that Comcast, with its extensive reach into the desktop (end user) space, could simply decide to choke a business off if it felt that there was too much competition coming from that new business. It would do this by slowing down the speed at which packages from that business arrive at the end user, potentially making the service utterly unusable. 

Make no mistake. This is happening today in many different ways. You get the amount of bandwidth you pay for. If you can’t afford bandwidth for your business, particularly if it consumes a lot of data, you are going to be a little out of luck. Also, service providers like Comcast and Time Warner have a long history of crippling services. While their argument is commonly that the services are illegal, that’s not always the case. Certainly, Gnutella, LimeWire and various other peer-to-peer file sharing services often carried information that violated intellectual property rights, but not all of the files shared fell into that category, and yet all of it was either slowed substantially or outright blocked. The same is true for BitTorrent streams. Yes, there are files that are shared illegally, but not all files being shared are illegal. Does a company like Comcast or Time Warner or Verizon have the right to block all traffic simply because they are concerned that some of it may be illegal? Whether they have the legal right or not, it is happening.

Is your head spinning yet? It is a very complicated issue, this whole Net Neutrality/Open Internet mess. We haven’t even touched on how all of this is handled in other countries. That’s a whole different ball of wax and one for another time. 

Wednesday, April 9, 2014

E-Mail Forensics

We all have to deal with bogus e-mail from time to time. Following on the last post referencing e-mail security, where you can rely on domain keys and the sender policy framework to ensure you are getting e-mail from the right source, this is all about tracking the bad guys through the network. Okay, maybe not that exciting. At a minimum, this will certainly help you determine whether e-mail can be trusted or not. While there are a lot of technologies that might eventually give us a world where we don’t have to worry about spam and phishing attempts from untrusted sources, the reality is that most businesses are not implementing DomainKeys Identified Mail (DKIM) or Sender Policy Framework (SPF). Until such a time as e-mail security becomes a priority around the world, we will continue to have to deal with e-mail being an open communication mechanism, meaning any system around the world can send to any mail transport agent (MTA). This means e-mail from untrusted sources, spam and various other unwanted garbage in our inbox. 

I have had my e-mail address for a lot of years, and it happens to be a pretty popular one for jokesters to use when they don’t want to use their own. Personally, I often fill in foo@foo.com or even something like nunyer@beezwax.com if I’m asked for an e-mail address that doesn’t matter and isn’t going to be used for a confirmation message. Because of these two factors, I get a lot of junk e-mail, so I have a lot of fun messages to choose from. Picking one at random that is offering me a way to look as good as Martha Stewart, who is more than 20 years older than me, I have a set of headers to play with. You can see them below; while you can see the entire chain of Received headers, I have removed my e-mail address. No offense, but I don’t trust anyone. It’s something of an occupational hazard. 

Screen Shot 2014 04 09 at 2 55 49 PM

It would be nice to find where this message came from. While it can be challenging because there isn’t much in the way of actual verification done with e-mail systems, we can get close, or at least find places where we can dig a little further. The first place to start is the Received header at the bottom of the pile. This is the very first MTA that touched this message. When we connect to a mail server, the protocol specifies that we indicate who we are. The mail server will track that information as well as the Internet Protocol (IP) address that the connection is received from. The first part of a message dialog, speaking Simple Mail Transfer Protocol (SMTP), is as follows:

220 dallas ESMTP Postfix (Ubuntu)

EHLO blah.com

250-dallas

250-PIPELINING

250-SIZE 10240000

250-VRFY

250-ETRN

250-STARTTLS

250-ENHANCEDSTATUSCODES

250-8BITMIME

250 DSN

MAIL From:foo@foo.com

250 2.1.0 Ok

 

When I use EHLO, which is HELO for extended SMTP, I am just saying hi to the mail server and introducing myself. The mail server keeps track of who connects, so it knows what IP address I am really coming from, regardless of whether I tell the truth about the hostname I claim to be. In the transaction above, you can see that I am telling the mail server that I am coming from blah.com, which is obviously untrue. Checking the mail server logs, I can see the IP address that the connection actually came in from. 

Apr  9 15:43:52 dallas postfix/smtpd[9931]: connect from unknown[10.0.0.13]

 

The example above is from my own internal network. In the case of the headers from the e-mail we are using as an example, we are looking for information about the address 173.0.145.21. The first thing I want to do is see whether the IP address has a hostname associated with it. We want to look up the PTR record in the domain name system (DNS). Best e-mail practice is to have the reverse DNS match the forward, so if miho.ribefsfield.com resolves to 173.0.145.21, then 173.0.145.21 should resolve to miho.ribefsfield.com. We want to check whether that’s the case and whether either of them actually resolves to anything. 
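
That forward and reverse check is easy to script as well. Here is a minimal sketch using only Python's standard library; the hostname and address are the ones from the headers above, and both calls raise an exception when, as in this case, the lookup fails.

#!/usr/bin/python3

import socket

addr = "173.0.145.21"
name = "miho.ribefsfield.com"

# reverse lookup: PTR record for the IP address
try:
    print("PTR for", addr, "->", socket.gethostbyaddr(addr)[0])
except (socket.herror, socket.gaierror) as e:
    print("No reverse DNS for", addr, ":", e)

# forward lookup: A record for the hostname offered in the headers
try:
    print("A for", name, "->", socket.gethostbyname(name))
except socket.gaierror as e:
    print("No forward DNS for", name, ":", e)

Done by hand with host, the reverse lookup looks like this.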

kilroy@dallas /var/log $ host 173.0.145.21

;; connection timed out; no servers could be reached

 

Turns out we couldn’t find the name server that was supposed to own the IP block this address came out of. Since that’s the case, we can’t do a reverse DNS lookup on the IP. This doesn’t exactly bode well for verifying the source of this e-mail. Let’s see what the hostname that was offered up resolves to. 

kilroy@dallas /var/log $ host miho.ribefsfield.com

miho.ribefsfield.com has address 66.78.32.6

 

Well, 66.78.32.6 is an entirely different IP address. At this point, we should probably check to see who owns the domain name. Skipping a lot of the preamble from the whois lookup, we get the following information from the domain's registration record. 

Registrant Email: WEBMASTER@NIZMEDIAGROUP.NET

Registry Admin ID: 

Admin Name: WEB MASTER

Admin Organization: -

Admin Street: 37 TOWER LANE

Admin City: WILLISTON

Admin State/Province: VT

Admin Postal Code: 05495

Admin Country: US

Admin Phone: +1.88888888

 

The funny thing about this is that I’ve been digging around a lot in e-mail over the last few weeks for a variety of reasons. It seems like every spam message I look at ends up resolving to a domain that is registered to this physical address. The peculiar thing is that this address is a town over from where I am sitting as I write this. You can check any map server you like and you’ll find that it’s a blue house. Google Maps shows that there is a Saturn in the driveway. At some point, it may be entertaining to do a little drive by and see what else is going on but before that, let’s keep going with this e-mail. The hostname referred to in the message actually exists and it resolves to 66.78.32.6. We should check to see who actually owns that IP block and see if it matches anything that we have seen so far. When I run a whois on that IP address, I find that it is part of a pretty big block of addresses (66.78.0.0/18) belonging to Virtual Development Inc. 

OrgName:        Virtual Development INC

OrgId:          VDI

Address:        590 Bloomfield Ave 

Address:        Suite 317

City:           Bloomfield

StateProv:      NJ

PostalCode:     07003

Country:        US

RegDate:        1999-10-21

Updated:        2013-03-07

According to Manta, VDI is a two person shop but there doesn’t appear to be a Web site associated with it, in spite of the enormous block of IP addresses that is registered to it. So far, we have locations in Williston, VT and Bloomfield, NJ. Let’s see if we can add any additional locations to this little tangled Web we have uncovered. We can perform a traceroute to the original IP address from the e-mail headers. Below you will see the last part of the traceroute to the IP address. The suspense is likely killing you at this point. 

12  hurricane-ic-138359-sjo-bb1.c.telia.net (213.248.67.106)  103.452 ms  104.842 ms  167.425 ms

13  10ge1-4.core1.sjc1.he.net (72.52.92.117)  158.437 ms  170.045 ms  170.470 ms

14  evernet-hosting.gigabitethernet2-2.core1.sjc1.he.net (216.218.196.6)  161.826 ms  162.256 ms  162.262 ms

 

It dead ends at this point. The IP address cannot be reached. Interesting that we have added in another player and not one that surprises me. As I said, I’ve been grubbing around through e-mail headers for the last couple of weeks and our good friends at Evernet Hosting are quite familiar to me. Either they are a breeding ground for spammers or else their systems are so badly secured that they are easily compromised. Either way, there is a pretty solid connection between Evernet Hosting in San Jose and a small house in Williston, VT, since many of the spam messages whose domains are registered to the Williston address actually originate in San Jose, CA, with an IP address somewhere behind Evernet Hosting. Doing a geographic lookup on the IP address reveals that the IP is located in San Jose, CA, just as the traceroute indicates. 

While we don’t have anything specific in terms of a name or even a clearcut company we could point to, we certainly have a lot of clues and from a legal standpoint, we have some places we could look further, as long as we had some legal support. We could check with Evernet Hosting or Virtual Development, Inc just as a starting point. While not immediately satisfying, it’s a step along the path. 

Monday, April 7, 2014

We Now Pause For This Commercial Advertisement

In the spirit of full disclosure, with apologies to RFP, this is as much about propaganda for an upcoming book about cloud computing done securely as it is about much of anything technical. I will say, however, in the process of writing the book, I learned a lot about really cool things that can be done with cloud computing providers. At the risk of giving away the contents of the book, let me pass along a few things that you might think about when it comes to moving your sensitive infrastructure off to become someone else’s problem. I often find someone else’s problems to be very good things and if you plan well, you can get a lot of benefit without a lot of risk. 

The infrastructure for my own business domain is hosted with Microsoft, using their Office 365 plan, which has a lot of benefits, not the least of which is a subscription to the Office software for up to 5 computers. This also includes SkyDrive, now called OneDrive, so I can store documents with a storage provider while also being able to edit them through a Web interface. On top of that, I can access documents from wherever I am and share them with other people. All of the benefits of cloud storage that we all know and love so well. In addition to the storage, of course, I get Web hosting and e-mail. As with many other e-mail providers, Microsoft takes care of spam for you, but they also provide organizations with settings where you can fine tune how they detect spam. You can see some of those settings below. 

Screen Shot 2014 04 07 at 7 42 01 PM

One thing they don’t handle, however, is the ability to support DomainKeys Identified Mail (DKIM) or the Sender Policy Framework (SPF). DKIM allows organizations to take ownership of e-mail messages. It uses a signature header in the e-mail message along with a public key published in the domain name system (DNS) for the sending domain. If the signature doesn’t check out against the domain's key, the message didn’t come from the right place. With SPF, a mail administrator can create a record in the DNS entries for the domain, and if a mail transport agent (MTA) receives a message from a host that doesn’t match up with the domain's SPF record, it’s likely spam. If it’s spam, the MTA can safely drop it or at least place it into a junk folder for the user to determine whether they really want to look at it or not. 
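
For reference, this is roughly the shape of those DNS records. The domain, the selector and the truncated key here are placeholders for illustration, not values from any real deployment.

example.com.                      IN TXT "v=spf1 mx include:_spf.example.com -all"

selector1._domainkey.example.com. IN TXT "v=DKIM1; k=rsa; p=MIGfMA0GCSq...AQAB"

The SPF record lists the hosts allowed to send mail for the domain, while the DKIM record publishes the public key that receiving servers use to verify the signature header.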

Microsoft’s settings don’t actually give me specific controls for either DKIM or SPF, so I don’t have any control over whether they use them or what I might use for settings for either of those features. In the course of researching for the book, though, I did some investigation into Google’s offerings for businesses and discovered some interesting things. Again, you can read about this in more detail in the book, but if you get Google Apps for Business, you will get some additional control over your e-mail settings. You can create a key that can be used for DKIM. You can see the settings, or at least a portion of the settings since the page doesn’t render in the window correctly and is cut off on the right-hand side, below. 

Screen Shot 2014 04 07 at 8 24 40 PM

 

 

Once I have the correct record in my domain name server, recipients can verify that messages that appear to be from me are actually from me. Google will also use SPF to help protect recipients. The one thing I don’t get with Google that I had with Microsoft is fine-grained control over spam and how it’s filtered. 

In the process of writing the book, I put together a whole domain with Web site and e-mail just to walk through how it would work and also have a Web site related to the book when it was all over. The domain I created is cloudroy.com and it has additional information about the book. There is also a link back to this blog so now I have linked the two completely together. 

Monday, March 31, 2014

Analyzing a Linux Memory Dump

Earlier, we talked about different techniques that could be used to extract memory from a Linux system. At this point, we have a memory dump, though we are still missing something when it comes to being able to analyze it. Volatility requires some awareness of how the memory is laid out. With Windows, this is well documented and Volatility includes profiles for different types of Windows installs across processor architectures and versions of Windows. The thing about any type of memory forensics is that you need to know where everything is. When it comes to Linux, different versions of the Linux kernel and different kernel parameters may lead to a different layout of the memory space. The only way to determine the memory layout is to take a look at the memory map. 

The map is a symbol table indicating where in memory to locate a particular function or variable. The memory map includes the address, the type of entry it is and the name of the entry. You can see below an example of a memory map from a Linux Mint installation on a 64-bit system running a 3.11 kernel. 

c1a3d180 d cpu_worker_pools

c1a3d600 D runqueues

c1a3dc00 d sched_clock_data

c1a3dc40 d csd_data

c1a3dc80 d call_single_queue

c1a3dcc0 d cfd_data

c1a3dd00 D softnet_data

c1a3ddc0 D __per_cpu_end

c1a3e000 D __init_end

c1a3e000 R __smp_locks

c1a45000 B __bss_start

c1a45000 R __smp_locks_end

c1a45000 b initial_pg_pmd

c1a46000 b initial_pg_fixmap

c1a47000 B empty_zero_page

c1a48000 B swapper_pg_dir

c1a49000 b dummy_mapping

c1a4a000 B idt_table

c1a4b000 B trace_idt_table

c1a4c000 b bm_pte

c1a4d000 B initcall_debug

c1a4d004 B reset_devices

c1a4d008 B saved_command_line

c1a4d00c b panic_param

 

You can see three columns in this memory map. The first is the address in memory at which the entry can be located. The second is the type of entry it is. While there are a number of different types of memory segments, some of the types you will run into in a system.map file on a Linux system are as follows:

A for an absolute location
B or b for an uninitialized data segment, referred to as BSS
D or d for an initialized data segment
G or g for an initialized data segment for small objects
R or r for read-only data segments
T or t for text segments, which is where executable code is stored
U for undefined
W or w for weak symbols that have not been specifically tagged as weak objects

You can see from the listing above that we have some uninitialized (BSS) segments and some initialized data segments in the short sample of the system.map file from this one system. This system.map file is required to create a profile that we can use with Volatility to analyze the memory dump we have created. We also need additional data, though. We need some debugging information, which leads to another package that needs to be installed. The package deals with DWARF, a debugging data format. On Ubuntu and systems that are based on it, the package is called dwarfdump. This will provide us with additional information Volatility needs to be able to extract the relevant pieces from the memory dump. Once dwarfdump is installed, we can build the module. In tools/linux inside the Volatility source tree, you run make and that will create a module.dwarf. You can see the process below as well as the module.dwarf that we will need. 

kilroy@quiche:~/volatility-2.3.1/tools/linux$ make 

make -C //lib/modules/3.11.0-12-generic/build CONFIG_DEBUG_INFO=y M=/home/kilroy/volatility-2.3.1/tools/linux modules

make[1]: Entering directory `/usr/src/linux-headers-3.11.0-12-generic'

  CC [M]  /home/kilroy/volatility-2.3.1/tools/linux/module.o

  Building modules, stage 2.

  MODPOST 1 modules

  CC      /home/kilroy/volatility-2.3.1/tools/linux/module.mod.o

  LD [M]  /home/kilroy/volatility-2.3.1/tools/linux/module.ko

make[1]: Leaving directory `/usr/src/linux-headers-3.11.0-12-generic'

dwarfdump -di module.ko > module.dwarf

make -C //lib/modules/3.11.0-12-generic/build M=/home/kilroy/volatility-2.3.1/tools/linux clean

make[1]: Entering directory `/usr/src/linux-headers-3.11.0-12-generic'

  CLEAN   /home/kilroy/volatility-2.3.1/tools/linux/.tmp_versions

  CLEAN   /home/kilroy/volatility-2.3.1/tools/linux/Module.symvers

make[1]: Leaving directory `/usr/src/linux-headers-3.11.0-12-generic'

kilroy@quiche:~/volatility-2.3.1/tools/linux$ head module.dwarf 

 

.debug_info

 

<0><0x0+0xb><DW_TAG_compile_unit> DW_AT_producer<"GNU C 4.8.1 -m32 -msoft-float -mregparm=3 -mpreferred-stack-boundary=2 -march=i686 -mtune=generic -maccumulate-outgoing-args -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx -g -O2 -p -fno-strict-aliasing -fno-common -fno-delete-null-pointer-checks -freg-struct-return -fno-pic -ffreestanding -fstack-protector -fno-asynchronous-unwind-tables -fno-omit-frame-pointer -fno-optimize-sibling-calls -fno-strict-overflow -fconserve-stack"> DW_AT_language<DW_LANG_C89> DW_AT_name<"/home/kilroy/volatility-2.3.1/tools/linux/module.c"> DW_AT_comp_dir<"/usr/src/linux-headers-3.11.0-12-generic"> DW_AT_stmt_list<0x00000000>

<1><0x1d><DW_TAG_typedef> DW_AT_name<"__s8"> DW_AT_decl_file<0x00000001 include/uapi/asm-generic/int-ll64.h> DW_AT_decl_line<0x00000013> DW_AT_type<<0x00000028>>

 

We now need to create the profile we are going to use before we can actually spin Volatility up. We need to zip up the DWARF module and the System.map for the system that we gathered the memory dump from. In my case, I am going to use the command zip LinuxMint.zip tools/linux/module.dwarf /boot/System.map-3.11.0-12-generic and that will result in the profile I need. In order to get Volatility to find it, though, we need to put it in the right place. If I were running Volatility from the directory that was created when I extracted the tarball, I would put the resulting .zip file into volatility/plugins/overlays/linux. However, if I have gone through the process of installing Volatility after making it, I need to put the resulting zip file into /usr/local/lib/python2.7/dist-packages/volatility-2.3.1-py2.7.egg/volatility/plugins/overlays/linux. This is where Volatility looks to find the profiles for Linux systems. Once we have it in place, we can verify that Volatility has found it by running vol.py --info and looking for Linux, as seen below. 

kilroy@quiche:~$ vol.py --info | grep Linux

Volatility Foundation Volatility Framework 2.3.1

LinuxLinuxMintx86 - A Profile for Linux LinuxMint x86

linux_banner            - Prints the Linux banner information

linux_yarascan          - A shell in the Linux memory image

 
Based on the name LinuxMint.zip that I gave to the file, Volatility has prepended Linux onto it and appended the architecture, because the profile is in the linux folder, so the profile name turns into LinuxLinuxMintx86. Now we have a profile and a memory dump file, so we can do a little digging into the memory. The first thing we need to do is figure out what commands we can use. Running vol.py yields a list of commands that are targeted at Windows profiles. In order to get the list of Linux-related commands, we have to run vol.py --info, which shows up as an option if you run vol.py -h. Now that we have a list of things we can do, let’s dig into the memory dump we previously obtained. The capture below, showing a couple of commands, is taken from a 64-bit system running Kali Linux. 
 

root@quiche:~# vol.py linux_banner --profile=LinuxSystemx64 -f linux.dd 

Volatility Foundation Volatility Framework 2.3.1

Linux version 3.7-trunk-amd64 (debian-kernel@lists.debian.org) (gcc version 4.7.2 (Debian 4.7.2-5) ) #1 SMP Debian 3.7.2-0+kali8

root@quiche:~# vol.py linux_ifconfig --profile=LinuxSystemx64 -f linux.dd

Volatility Foundation Volatility Framework 2.3.1

Interface        IP Address           MAC Address        Promiscous Mode

---------------- -------------------- ------------------ ---------------

lo               127.0.0.1            00:00:00:00:00:00  False          

eth0             10.0.0.182           00:00:00:00:00:00  False          

 
While these commands are not all that impressive in terms of useful data, they do show that we can extract information from the memory dump. There are a lot of other commands we can run against the memory dump that could be used to extract data from running memory. Perhaps this next example will be more intriguing to you. This is a list of shell commands that have been run, extracted from the memory dump. There is absolutely no question that this is a list of commands that I have run on the system where the memory was captured. The Command Time column is a little more suspect; it seems to reflect the date and time that the capture was created rather than the time and date each command was run. 
 

root@quiche:~# vol.py linux_bash --profile=LinuxSystemx64 -f linux.dd

Volatility Foundation Volatility Framework 2.3.1

Pid      Name                 Command Time                   Command

-------- -------------------- ------------------------------ -------

    3577 bash                 2014-04-01 00:11:43 UTC+0000   cd /media/cdrom/

    3577 bash                 2014-04-01 00:11:43 UTC+0000   ls /etc/init.d

    3577 bash                 2014-04-01 00:11:43 UTC+0000   ifconfig -a

    3577 bash                 2014-04-01 00:11:43 UTC+0000   cd

    3577 bash                 2014-04-01 00:11:43 UTC+0000   rm -Rf installer

    3577 bash                 2014-04-01 00:11:43 UTC+0000   ls

    3577 bash                 2014-04-01 00:11:43 UTC+0000   cd 

    3577 bash                 2014-04-01 00:11:43 UTC+0000   netdiscover

    3577 bash                 2014-04-01 00:11:43 UTC+0000   rm install

    3577 bash                 2014-04-01 00:11:43 UTC+0000   rm -Rf kmods

    3577 bash                 2014-04-01 00:11:43 UTC+0000   sh install

    3577 bash                 2014-04-01 00:11:43 UTC+0000   cp -Rf * ~

    3577 bash                 2014-04-01 00:11:43 UTC+0000   shutdown -r now

    3577 bash                 2014-04-01 00:11:43 UTC+0000   rm -Rf tools

    3577 bash                 2014-04-01 00:11:43 UTC+0000   ls

    3577 bash                 2014-04-01 00:11:43 UTC+0000   shutdown -h now

    3577 bash                 2014-04-01 00:11:43 UTC+0000   ifconfig -a

    3577 bash                 2014-04-01 00:11:43 UTC+0000   ./install

    3577 bash                 2014-04-01 00:11:43 UTC+0000   rm install-gui 

    3577 bash                 2014-04-01 00:11:43 UTC+0000   ./install

    3577 bash                 2014-04-01 00:11:43 UTC+0000   ls

    3577 bash                 2014-04-01 00:11:43 UTC+0000   shutdown -h now

    3577 bash                 2014-04-01 00:11:43 UTC+0000   ls

    3577 bash                 2014-04-01 00:11:43 UTC+0000   rm version

    3577 bash                 2014-04-01 00:11:43 UTC+0000   ls

    3577 bash                 2014-04-01 00:11:43 UTC+0000   clear

    3577 bash                 2014-04-01 00:11:43 UTC+0000   sudo nmap -sS -O -T 4 172.30.42.41

 
The interesting thing about this is that the process id shown in this list appears to be the same across all of the commands. Checking the system itself, there is a process with that particular pid and it is a bash process. The commands were not all run from that bash session, though, but clearly the shell loads all of the history into memory for use. I can also check the memory dump to see if there was a process with the pid 3577. I would use the linux_pslist command to look for that process. That command will show us the list of processes that were running at the time the capture was created. Searching through the list of processes can be a very time consuming task. Running linux_pstree is much faster. You can see the output below showing the process tree that includes the bash process with a pid of 3577 that had the history loaded up into it. 
 

.gdm3                2468            0              

..gdm-simple-slav    2474            0              

...Xorg              2482            0              

...gdm-session-wor   3219            0              

....x-session-manag  3256            0              

.....ssh-agent       3317            0              

.....gnome-settings- 3326            0              

.....metacity        3361            0              

.....gnome-panel     3375            0              

......gnome-terminal 3570            0              

.......gnome-pty-helpe 3576            0              

.......bash          3577            0              

 
When you run vol.py --info | grep linux, you can see the list of commands you can run, as noted above. You can see that the list is shorter than the list of commands available for Windows. One of the most interesting commands available for Windows that, sadly, isn’t available for Linux is the one that extracts files from memory. This difference in the commands available has to do with the way memory is laid out and managed by the operating system. On top of that, Windows is used far more often and there is more call for forensics of Windows systems, so it could simply be that there has been more development done on the Windows commands and plugins. Whatever the case, Linux has fewer commands available, but the ones that are available are still very powerful and useful. 

Memory Forensics the Linux Way

Now that we have grabbed a memory dump from a Windows system and used Volatility to extract critical information that would otherwise have been lost if the system had simply been powered down, we should take a look at how to do the same thing in Linux. If you are familiar with Linux, you may recognize that we have one utility already in hand that can do an extraction for us. However, if you have a modern version of the Linux kernel (operating system), we have an issue. Before we get into the problem, we should talk about how we might get access to memory. 

In Linux, every device is easily accessible through the /dev filesystem, which is a pseudo-filesystem populated by the operating system while it’s running. The operating system creates an entry in the /dev tree as a result of device drivers that are either built into the kernel or built as modules that can be loaded and unloaded in the running kernel. Getting access to memory is no different from accessing other devices. In Linux, the memory device is found at /dev/mem and we could use dd to extract data from that device, just as we would with a disk device like /dev/sda. However, in order to protect the memory space, the Linux developers have restricted what we can do with the mem device. One of the reasons is that it’s very easy to really mess up your system. All you’d have to do would be to execute dd if=/dev/urandom of=/dev/mem and you’d quickly have a system that was no longer operational. You can see what happens when we try to get a dump of /dev/mem using dd. 

kilroy@quiche:~$ sudo dd if=/dev/mem of=mem.dump

dd: reading ‘/dev/mem’: Operation not permitted

2048+0 records in

2048+0 records out

1048576 bytes (1.0 MB) copied, 0.0109432 s, 95.8 MB/s

 

Linux says the operation is not permitted and all we get is 1M out of a system that has 2G of memory in it. We need another approach. Fortunately, there are a couple of kernel modules we can get that will give us access to the memory we want. The first one is a simple replacement for mem, without the restrictions that the mem device has. fmem is a kernel module that works well for forensic purposes and it works just like you’d expect it to. Before we can use it, though, we have to get it installed. Below, having already built the source code, I run make install as a superuser to get the module loaded into the kernel. I could have also used insmod directly. 

kilroy@quiche:~/Downloads/fmem_1.6-1$ sudo make install

./run.sh

Module: insmod fmem.ko a1=0xc1058800 : OK

Device: /dev/fmem

----Memory areas: -----

reg00: base=0x000000000 (    0MB), size= 2048MB, count=1: write-back

reg01: base=0x0b0000000 ( 2816MB), size=    2MB, count=1: write-combining

-----------------------

!!! Don't forget add "count=" to dd !!!

 

Now that we have the module loaded, we can use dd to extract memory by just pointing it at the /dev/fmem device. I could set a count on it if I wanted to restrict the amount of memory I get, but since I want all of memory, I don’t use a count. I do, though, set a block size higher than the default. This should help dd go faster since it’s reading larger chunks and, as a result, does fewer reads. Below, you can see what the process looks like. Extracting 2G of memory took a little over 6 seconds, as you can see from the output. 

kilroy@quiche:~$ sudo dd if=/dev/fmem of=linux.dd bs=1M

dd: reading ‘/dev/fmem’: Bad address

2047+0 records in

2047+0 records out

2146435072 bytes (2.1 GB) copied, 6.39649 s, 336 MB/s

 

fmem is just one possibility. Another possibility is Lime, which can be used for memory extraction from Linux. One of the interesting capabilities of Lime is the ability to capture memory over a network. Lime will start up a server on the target host and when you connect to that server, you will get a stream of the contents of memory. Before we get there, though, we have to build the source tree. While I haven’t mentioned this before, building a Linux kernel module requires a lot of extra packages, including the Linux headers. This isn’t something you probably want to do on a system that you are trying to get a forensic image of, since you’ll be adding a lot of files and packages just to get this one kernel module installed. Linux kernel modules are not always easy to port from one system to another. For instance, after creating the Lime module, I copied it over to another system and tried to install it without success. Building a portable kernel module is a topic for another day, however. Once we have the module built, we have to install it. Lime installation, unlike fmem, requires parameters so you can indicate how you want to acquire the image. In this case, we’re just going to write it out to disk. 

kilroy@quiche:~/lime/src$ sudo insmod lime-3.11.0-12-generic.ko "path=lime.dump format=lime"

kilroy@quiche:~/lime/src$ ls -la lime.dump

-r--r--r-- 1 root root 2147019840 Mar 31 08:52 lime.dump

 

If I wanted to open up a network listener so I could use netcat to get the image, I would change the parameter to path=tcp:4500. Using netcat, I could then use nc 10.0.0.8 4500 > lime.dump to get the image out. This can be very convenient if you don’t have enough disk space on the target or if you want to leave the disk as it is as much as possible. Once we have the image, we can use Volatility on it, right? Well, not necessarily. We have to create a profile for Volatility. I’ll take that up as a separate topic along with using Volatility on a Linux memory dump. Stay tuned for that. 

Thursday, March 27, 2014

Memory Forensics (the first in a series?)

One of the most interesting places to find operating system artifacts is in memory. We can learn a lot about a user’s behavior by just observing what they have in memory at the time of a system capture. However, acquiring memory isn’t always a walk in the park. This is another area where Windows continues to have an advantage. Sort of. One of the problems Windows has is, in spite of its command line origins (DOS), it doesn’t handle the command line well. One of the areas this shows is its organization of program content. The Program Files directory works well for graphical programs that are and should be self-contained. However, when we are working with the command line, we rely on the PATH environment variable and if I have to add in a long path in the Program Files directory for every program I want to get access to, I have a very long PATH variable. Or, worse, if I have a number of utilities that are single file executables, I have to have the files I work on in the same directory as the executable or else I need to deal with long path names. 

One solution is to create a single directory to store a lot of small utilities in. The Windows Sysinternals tools are a great example of small utilities you may want to have access to everywhere, and the two utilities we will be looking at here are two more. What I did to solve the PATH problem, and to make sure I could act on a file no matter where it was without having to move executables around, was to create a C:\Tools folder and then add it to the PATH variable on the system. You can get to the PATH variable from Advanced System Settings in Computer Properties.

The first tool we need to make use of is DumpIt. DumpIt overcomes a basic challenge: we shouldn't be able to get direct access to memory. Memory is managed by the operating system, and we aren't supposed to touch it directly. That is especially true for writing, because if you somehow manage to sidestep the operating system and write directly to memory, you could overwrite critical operating system components or other programs that are executing and cause application or system crashes. It would be nice to keep our problems to a minimum, so rather than diving into a lecture on the usefulness of separating functions into different rings, where ring 0 has the highest level of privileges, we'll just stick with getting our hands on some memory. We will use DumpIt to do that. Fortunately, it's incredibly easy to use, as you can see below.

[Screenshot: running DumpIt from the command prompt]

 

It's really as simple as running DumpIt and then confirming that you really want to continue when it asks you. You will then get a large file, one that could be really, really large if you happen to have the kind of system that supports a LOT of memory. I'd say that means the type of person who has a lot of money and can afford a lot of memory, except that the price of RAM is supposed to be dropping, right? At the very least, it's a lot easier and less expensive to get a metric buttload of memory now than it was in, say, the early 80s. Be aware that you need to have the disk space available to store the contents of your memory. If you have 4 gigabytes of memory on your system, you need at least 4 gigabytes of free space on your disk to write to. 16G, 32G, whatever you have, make sure you have the disk space to store it, because it's all coming down.

Once we have the memory capture from our running system, keep one thing in mind: whatever else you get from the capture, you will definitely get artifacts from your use of DumpIt. In order to capture memory, DumpIt has to execute, and in order to execute, it has to be in memory, and if it's in memory, it's going to show up in the dump you are getting. When you run a process list, you'll see DumpIt right there in it.

Now that we have a dump of memory, we need a good way of taking a look at it. As much fun as hex editors are, I find it difficult to locate much of anything in a very large block of memory with one. We need a utility to automate the process. A utility that understands how memory is structured. A utility that knows where all the skeletons are buried, so to speak. We need Volatility. The Volatility Framework is a way of grabbing a lot of artifacts from a memory capture. In order to do that, though, it needs to know what type of memory capture it has. It uses a set of profiles to know what the memory layout looks like and where the important structures are. We can determine the right profile by using the command volatility imageinfo -f imagename.raw. You can see the output of that below, though I have substituted imagename.raw for the actual name of the image I captured. I'm doing the analysis on a 64-bit Windows 7 Pro system, though the memory was captured from a Windows XP Pro system.

[Screenshot: volatility imageinfo output]

 

Once we have determined the profile, we can make any additional work much faster by providing that profile to Volatility. As an example, we can get a list of process privileges by running the privs command against our captured memory. This presents a list of the processes that were running when the capture took place and the privileges associated with each of them. To run that command and provide the profile, we use volatility privs --profile=WinXPSP3x86 -f imagename.raw, and Volatility will skim through the memory dump, gathering all of that information. One of the problems we run into is a limitation of the Windows command prompt: by default, Microsoft only allows 80 characters for the width of the window. You can increase the width, but it requires going into the window properties; you can't just drag the window open in Windows 7 and have it adapt to the new size. Volatility, however, outputs tables wider than 80 characters, so the output gets wrapped, making it harder to read. We can write the output of any command to a text file by adding --output-file=filename. You'll get a text file with all of the relevant details. An example is the data below.

[Screenshot: privs output written to a text file]

 

This output has been truncated in order to better fit in this page. There is, in actuality, a whole other column with a description field that has a text description of the privilege.
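If you find yourself running the same couple of commands against every image, a small wrapper script can save some typing and send the wide tables straight to files. This is just a rough sketch, assuming the volatility command is on your PATH; the image name, profile and output filename are the placeholders used above, not real artifacts.

#!/usr/bin/python3
#  Rough sketch: run the Volatility commands used above from a script and write the
#  wide table output to a text file, sidestepping the 80-column console problem.
#  The image name, profile and output filename below are placeholders.
import subprocess

image = "imagename.raw"
profile = "WinXPSP3x86"

#  identify the profile first, same as volatility imageinfo -f imagename.raw
subprocess.call(["volatility", "imageinfo", "-f", image])

#  then gather process privileges, writing the output to privs.txt
subprocess.call(["volatility", "privs", "--profile=" + profile,
                 "-f", image, "--output-file=privs.txt"])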

While we have done a couple of things to gather information, there is much more we could do, but there isn't much point in walking through every single command available in Volatility when you can just go get yourself a copy and play. Volatility ships with a number of Windows profiles, and the default set of commands is targeted at Windows. However, there are ways to analyze Linux memory captures, and capturing Linux memory has its own unique set of challenges. That, however, is another task for another day.

 

Tuesday, March 11, 2014

Anti-Forensics Part 1 (Hiding Files in the Registry)

This semester, I've been teaching a class on Anti-Forensics, which covers a variety of techniques designed to make life difficult for a forensic investigator. The Windows registry, as it turns out, is a great place to hide data. While it's stored in plain sight, the registry is such an enormous, convoluted mess that finding a value stored in an arbitrary place would be like looking for a needle in a haystack. You could store notes to other people, account numbers, passwords or any number of other pieces of data. You can see the registry editor below, with the New menu showing Key; a key is like a folder where you collect a number of values. Those values can be strings, numbers as words, double words or quad words, string collections, or simply binary data.

[Screenshot: the registry editor's New menu]

 

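To give a sense of what those value types look like when they are set through the API rather than the registry editor, here is a quick sketch using the same _winreg module the script further down relies on. The key and value names here are made up for illustration.

#  A quick sketch of the different registry value types, written through the API.
#  The key name "Demo" and the values below are made up for illustration.
import _winreg

key = _winreg.CreateKey(_winreg.HKEY_CURRENT_USER, "Software\\Demo")

_winreg.SetValueEx(key, "ANote", 0, _winreg.REG_SZ, "meet at the usual place")
_winreg.SetValueEx(key, "ANumber", 0, _winreg.REG_DWORD, 4500)
_winreg.SetValueEx(key, "SomeStrings", 0, _winreg.REG_MULTI_SZ, ["one", "two", "three"])
_winreg.SetValueEx(key, "SomeBytes", 0, _winreg.REG_BINARY, "\x4d\x5a\x90\x00")

_winreg.CloseKey(key)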
We started down the road of talking about different things you could store in the registry during one class period. Once you start thinking about binary data, it's nearly irresistible to think about stuffing files, particularly executable program files, into a registry key somewhere. There are challenges with using the registry editor to do this, however. You can't just open an executable file in a hexadecimal editor, copy the contents of the file and then paste the data into a binary value. When you create a binary value and go to plug data into it, you get a dialog box where you can start entering hexadecimal, but you can't paste; none of the typical pasting techniques (Ctrl-V, right-click and select Paste, and so forth) work. However, there are application programming interfaces (APIs) we can use to get access to the registry. The challenge, then, is to write a program that will take any file as input and stuff it into the registry. I took up the challenge in two programming languages. The first was C#; I mean, why not use Microsoft's own language to get access to a Microsoft feature? I ran into creeping featurism, however, and though I currently have a working version, I don't consider it complete at this point. The second language was Python. You can see the proof of concept script below. It can be used to store any file in a registry key.

 

#  File: reghide.py
#  Author: Ric Messier
#  Description: This program could be used to hide files inside a registry key. 
#  While we assume that the key created will be in HKEY_CURRENT_USER\Software,
#  it could be anywhere; this script could be edited to reflect that, or
#  extended to make the location flexible as well.
#  Copyright:  2014, WasHere Consulting, Inc.
 
import _winreg
import sys, os
import argparse
 
# get arguments
argParser = argparse.ArgumentParser()
argParser.add_argument('-f', type=str, help='the file you want to store', required=True)
argParser.add_argument('-v', type=str, help='the name of the value to use', required=True)
argParser.add_argument('-k', type=str, help='the name of the key to use', required=True)
 
passedArgs = vars(argParser.parse_args())
 
keyName = passedArgs['k']
baseName = passedArgs['v']
fileName = passedArgs['f']
 
key = _winreg.CreateKey(_winreg.HKEY_CURRENT_USER, "Software\\" + keyName)
 
#  start the number appended to the base value name at 1. This will increase
#  based on the number of chunks read in
currValue = 1
 
#  open the file specified, in binary mode, with a bunch of exception handling
try:
    with open(fileName, "rb") as fileHandle:
        #  going to read in 1024 byte chunks
        dataChunk = fileHandle.read(1024)
        while dataChunk:
            #  create a value name from the base name and then a zero filled number
            #  appended to it to create unique value names
            valName = baseName + str(currValue).zfill(6)
            #  set the value in the registry
            _winreg.SetValueEx(key, valName, 0, _winreg.REG_BINARY, dataChunk)
            #  read another chunk in
            dataChunk = fileHandle.read(1024)
            currValue = currValue + 1
except IOError as err:
    print("I/O error: {0}".format(err))
except:
    print("Unexpected error:", sys.exc_info()[0])

Fair warning: stuffing very large files into your registry may have unexpected consequences. The Python script breaks the data up into 1024-byte chunks, which means you end up with a series of numbered values, each holding a piece of the file. You can see some of what that looks like below.

[Screenshot: the chunked values in the registry editor]

 

The one thing missing from this scenario, of course, is a way to extract the file once it's been stuffed into the registry. That would be a project for another day, and it may require a little more overhead in the hiding program. It would be useful to store the original filename so that when you extract the data you don't have to prompt for a new one: the filename would already be sitting in the registry alongside all of the bytes from the file. Just get the name back and associate it with the data.
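As a rough sketch of what the extraction could look like, here is a small companion script that walks the values under the key, puts the chunks back in order and writes them out to a file. It assumes the layout that reghide.py creates above; the script itself and its argument names are mine, not something from the class.

#  File: regextract.py
#  Description: a rough companion sketch to reghide.py that pulls the chunks back
#  out of HKEY_CURRENT_USER\Software\<key> and reassembles them into a file
import _winreg
import argparse

# get arguments
argParser = argparse.ArgumentParser()
argParser.add_argument('-k', type=str, help='the key the data was stored under', required=True)
argParser.add_argument('-v', type=str, help='the base value name that was used', required=True)
argParser.add_argument('-o', type=str, help='the file to write the data back out to', required=True)

passedArgs = vars(argParser.parse_args())

key = _winreg.OpenKey(_winreg.HKEY_CURRENT_USER, "Software\\" + passedArgs['k'])

#  walk every value under the key, keeping the ones that start with the base name
chunks = []
index = 0
while True:
    try:
        valName, valData, valType = _winreg.EnumValue(key, index)
    except WindowsError:
        break
    if valName.startswith(passedArgs['v']):
        chunks.append((valName, valData))
    index = index + 1
_winreg.CloseKey(key)

#  the zero-filled numbers mean a plain sort puts the chunks back in file order
chunks.sort()
with open(passedArgs['o'], "wb") as outFile:
    for valName, valData in chunks:
        outFile.write(valData)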