Security Kilroy

Saturday, March 26, 2016

Analyzing Virtual Images

The Sleuth Kit can be used to investigate disks and disk images, but the images don’t actually have to be copied from a real, physical disk. You can analyze a virtual image that you have created just as easily. The Sleuth Kit includes a number of very useful command line utilities that can be run on Windows, Linux and Mac OS X systems. For our purposes, this is being done on a Mac OS X system where the program was built using the Xcode Command Line Tools, which need to be installed before you can build The Sleuth Kit. First, let’s take a look at an image that was created on a Windows system using diskpart to create a virtual disk image and then format it. Before we do anything, we need to take a look at the partition table in the image to see where the partitions are. Without that information, we can’t go much further.

You can see from the screen capture above that we have used the mmls utility from The Sleuth Kit to get the partition table. mmls tells us that this is a DOS partition table. We aren’t restricted to the type of partition table that’s on the disk, though. Let’s take a look at another virtual disk image that was created using the Mac OS X Disk Utility program. Using mmls on that, we can see a GUID partition table.
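
For the DOS partition table case, the kind of information mmls reports can be approximated in a few lines of Python. This is only a minimal sketch and not how The Sleuth Kit itself works. It assumes 512-byte sectors, ignores extended and GUID partition tables, and runs against a made-up boot sector: build_demo_sector and the values in it (type 0x07, start sector 128, 200000 sectors) are illustrative, not taken from the image above.

```python
import struct

SECTOR_SIZE = 512    # assumption: 512-byte sectors
TABLE_OFFSET = 446   # the four partition entries start here in sector 0
ENTRY_SIZE = 16      # each entry is 16 bytes
ENTRY_FORMAT = "<B3sB3sII"  # status, CHS start, type, CHS end, LBA start, sector count


def parse_mbr(sector):
    """Return (type, start_sector, sector_count) for each used slot."""
    if len(sector) < SECTOR_SIZE or sector[510:512] != b"\x55\xaa":
        raise ValueError("no DOS partition table signature found")
    partitions = []
    for slot in range(4):
        begin = TABLE_OFFSET + slot * ENTRY_SIZE
        entry = sector[begin:begin + ENTRY_SIZE]
        _status, _chs_start, ptype, _chs_end, start, count = struct.unpack(ENTRY_FORMAT, entry)
        if ptype != 0:  # type 0 marks an unused slot
            partitions.append((ptype, start, count))
    return partitions


def build_demo_sector():
    """Hypothetical boot sector with one partition, for demonstration only."""
    sector = bytearray(SECTOR_SIZE)
    entry = struct.pack(ENTRY_FORMAT, 0x00, b"\x00" * 3, 0x07, b"\x00" * 3, 128, 200000)
    sector[TABLE_OFFSET:TABLE_OFFSET + ENTRY_SIZE] = entry
    sector[510:512] = b"\x55\xaa"  # boot signature
    return bytes(sector)


for ptype, start, count in parse_mbr(build_demo_sector()):
    print(f"type=0x{ptype:02x} start={start} sectors={count}")
```

To try this on a real image, you would read the first 512 bytes of the image file and hand them to parse_mbr instead of the demo sector.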

In either case, you need to locate the partition that you want to investigate. In order to get file system statistics, we can use fsstat but we need to point fsstat at the actual partition using -o to indicate the offset within the image. In our case, we are looking at the fifth slot from mmls, which has a starting offset of 40 so that’s what we tell fsstat.

From this, we can see that it’s an HFS+ file system that was last mounted by Mac OS X and it was journaled, meaning that the operating system was keeping track of changes to the filesystem in case anything bad happened so the changes could be redone to reconstruct a clean copy of the data, including the metadata indicating where all of the files were located. While this is all very interesting, what we probably want to get at is the actual files within the image. For that, we can use fls. This will give us a file listing of the partition. Let’s go back to the Windows image from earlier, since it had a different partition type, file system and offset. Looking at the mmls output above, the third slot is the only one that actually carries a filesystem, so that’s the one we will use. Again, we need to provide the offset to get to the actual filesystem and in this case, the offset is 128.

Once we have the list of files that were stored in the file table, which in this case is the Master File Table (MFT) from the NT File System (NTFS), we can do a bit more digging into files if we choose to. What you see here are the entries within the file table only and with the MFT, there is a lot more information to be gathered. First, we need to know where to look. Find the entry for Diskpart1.png above. We can see that this is a regular file. There are two r’s there indicating that the filename and the metadata for the file agree. These would normally be identical, though if a file were deleted you may see a difference between them. Keep in mind that if a file has been deleted, it still remains on the disk, both the data and, in some cases, the metadata within the filesystem. There is then a chain of three numbers. The first indicates which entry in the file table we want to look at. The 128-1 identifies the attribute within that entry (type 128 is the NTFS $DATA attribute and 1 is its attribute ID) and we can ignore that for now. Where we want to look next is the entry in the MFT and we can get to that using istat.

The istat utility extracts and decodes all of the information from the MFT entry for that file. You can see the filename and then the other attributes associated with it, including the $DATA attribute at the bottom. This attribute includes a list of blocks where we should be looking for the file data. The metadata (filename, permissions, access dates and times, etc) is kept entirely separate in most cases from the actual data that’s contained in the file. If all you did was to gather the contents of the file, you wouldn’t have the filename. If all you did was look at the metadata, you wouldn’t have any idea what was in the file. The two are separate but both necessary. Our starting point to gather the data for the file is in block 8346. We can use blkcat to extract the data from that block. According to fsstat for this virtual disk, we have a cluster/block size of 4096 bytes. blkcat will take care of that for us and only grab a single cluster.
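
The arithmetic that blkcat is doing for us can be sketched in Python. This is an illustration under stated assumptions rather than a replacement for the tool: it assumes the offset of 128 from mmls is counted in 512-byte sectors, and the function names cluster_byte_offset and read_cluster are made up for this example.

```python
SECTOR_SIZE = 512      # assumption: mmls offsets are in 512-byte sectors
PARTITION_START = 128  # sector offset of the NTFS partition (the -o value)
CLUSTER_SIZE = 4096    # cluster size reported by fsstat
FIRST_CLUSTER = 8346   # starting block listed by istat for the file


def cluster_byte_offset(cluster, partition_start=PARTITION_START,
                        sector_size=SECTOR_SIZE, cluster_size=CLUSTER_SIZE):
    """Byte position of a cluster, counted from the start of the image file."""
    return partition_start * sector_size + cluster * cluster_size


def read_cluster(image_path, cluster):
    """Read one cluster from the image, roughly what blkcat returns."""
    with open(image_path, "rb") as image:
        image.seek(cluster_byte_offset(cluster))
        return image.read(CLUSTER_SIZE)


print(cluster_byte_offset(FIRST_CLUSTER))
```

The partition starts 128 * 512 = 65536 bytes into the image, and cluster 8346 sits 8346 * 4096 bytes past that point.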

Just as with the other tools, you have to tell blkcat where the actual partition starts by providing an offset within the file. This tells blkcat where the filesystem itself is, meaning the BIOS Parameter Block from which it can locate the file table. When you look at the output here, which has been piped into xxd to do the ASCII decoding for us, you can see that this is a PNG file. We knew that from the filename but filenames can lie. You are not required to use .png as a file extension for a PNG file. Windows systems maintain a list of file associations so they know what programs to launch when you want to just open the file from the Windows Explorer. That’s simply a convenience. As a result, it’s always good to verify that what you have in terms of data is what the filename and file extension tell you that you have.
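
Checking the data against what the filename claims can also be done programmatically. The sketch below compares the first bytes of a block against the 8-byte PNG signature. The function name looks_like_png and the sample byte strings are made up for illustration; in practice the input would be the cluster extracted with blkcat.

```python
# Every PNG file begins with these eight bytes.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def looks_like_png(data):
    """True if the data begins with the PNG file signature."""
    return data.startswith(PNG_SIGNATURE)


# Illustrative inputs only: a PNG-style header and something that is not a PNG.
png_like = PNG_SIGNATURE + b"\x00\x00\x00\rIHDR"
not_png = b"GIF89a"

print(looks_like_png(png_like))
print(looks_like_png(not_png))
```

A signature match only tells you how the data starts. It does not prove that the rest of the file is a well-formed image.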

One thing we didn’t look at here is the case where you may have deleted files. Typically, if a file is deleted, you would see * between the r/r and the file table entry. If you see that, it doesn’t mean the data is gone. It just means that the file has been flagged as deleted and so the entries can be recycled at some point.

Creating Disk Images

Let’s say you want to play around with disk analysis but you really want something small to use. If you just want to tinker around with some forensics tools, why would you want to play around with even a multi-gigabyte USB stick? It’s much easier to just create a small disk image to use, though you won’t be able to put many files on it. If you don’t need that, there are easy ways to create disk images on each of the three primary operating systems: Windows, Linux and Mac OS X. Of course, the quickest way is to use a virtual machine. Using a virtual machine, you can add a second hard disk that you can make use of from inside a guest operating system that you are running your forensics tools on. Using Parallels, virtualization software for Mac OS X, you can add an additional hard drive by customizing the virtual machine, as you can see below.

If you don’t want to use virtualization but just use the operating system you came with, you can use tools that are already built in. On the Windows side, you would use DiskPart. DiskPart is a command line program, so you start by launching the Command Prompt program, which is found in various places in the menus depending on the version of Windows you are running. DiskPart can be used to create a virtual image. DiskPart uses an interactive shell to issue commands. As a result, you start up DiskPart and it dumps you into the shell. Once you are there, you tell DiskPart to create a virtual disk image, as you can see below.

Once the virtual disk is created, we have to attach it to the system. Once it’s attached, you create a partition, assign it a drive letter and format it. Once you have done all of that, as you can see in the capture below, you have a working disk that is attached to your Windows system with a drive letter and it will show up in Windows Explorer. Once you have created the partition and assigned the letter, Windows will pop up a message saying there is an uninitialized disk and would you like to initialize it. You can initialize it using the dialog box or just type format in DiskPart and you have a formatted drive that is really just a file.

Using Linux requires multiple utilities as opposed to the single utility that Windows provides. Using Linux, we can create an empty file using dd. In the screen capture below, you can see dd creating a file using /dev/zero as the input source. This is a logical device that just generates 0s. We set the block size to be 512 which is mostly meaningless other than it tells us the size in conjunction with the count. 512 bytes * 200000 gives us a file that’s roughly 100M. Once we have the file, we can partition it just as you would a regular disk device.
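
If you would rather not use dd, the same zero-filled file can be produced with a short Python function. This is a sketch of the idea, not a description of what dd does internally. The function name create_blank_image and the demo path are made up, and the demo run uses a count of 100 so it only writes a small file.

```python
import os
import tempfile


def create_blank_image(path, block_size=512, count=200000):
    """Write count blocks of zeros, like dd reading from /dev/zero."""
    zeros = bytes(block_size)
    with open(path, "wb") as image:
        for _ in range(count):
            image.write(zeros)
    return os.path.getsize(path)


# Small demonstration in a temporary directory (hypothetical file name).
demo_path = os.path.join(tempfile.mkdtemp(), "demo.img")
demo_size = create_blank_image(demo_path, block_size=512, count=100)
print(demo_size)

# With the values from the text, the file would be 512 * 200000 bytes.
print(512 * 200000)
```

With the block size and count used in the text, the result is 102,400,000 bytes, which is where the rough 100M figure comes from.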

As soon as we have partitioned it, we need to format it. Before we do that, we need to create a device file. We do that using losetup. Since there was already a loop device, the first thing is to delete the existing one using losetup -d as you can see in the screen capture below. We need to skip by the master boot record and the reserved sectors, which we do using --offset. Then you provide losetup with a device file, which we are calling /dev/loop0, since devices belong in the /dev directory. Once you have the device set up, you can format and mount it. You can format it with any filesystem that you would like but in the screen capture below, you can see that it is formatted using the ext4 filesystem. As soon as we have formatted it, it’s ready for use but we need to mount it to a mountpoint within the filesystem. In the example below, we’ve mounted it to /mnt. As soon as it’s mounted, you can use it just as you would any other directory and start copying files to it, though keep in mind that in our example we are limited to 100M.

Mac OS X has Disk Utility, which is a graphical program that can create virtual disk images which you can then mount. You can see the creation of a new disk image in the screen capture below.

Once you have selected new image, you will be prompted for the size, format, encryption, read/write properties and the name. You can also specify whether you want to use a GUID partition map or master boot record partition table. The moment you have created the disk image on any of the operating systems you can start writing to the image as though it were a regular disk and you can also start to perform a forensic analysis using a variety of forensics tools. However, that’s another write-up so stay tuned.

Tuesday, November 3, 2015

Policy and Compliance Are Not Enough

The information security business seems to have strayed a bit from its roots. The roots of digital information security go back decades to the people who built and maintained systems. They may have wanted to either protect information or keep people out. While our nature as humans is more sharing and collaborative than it is secretive and isolationist, the reality is that we all have times when we need our secrets and spaces where we can store information that no one else can get to unless we specifically allow it. The problem comes when you start allowing a lot of users into the system or if you start connecting a lot of systems together into networks. Then we need additional protections in place to make sure everyone’s little corral of horses stays their own little corral of horses, unless they choose to set up a petting zoo.

Ultimately, there are competing priorities when it comes to information security. On one end, there is the pure security-focused priority: no one gets in unless they are specifically allowed in. On the other end, there is the perspective that the business owns everything and so it gets to set the rules. The problem that arises here is that rather than these two ends working together to find a middle, the money for the security end is tied up in the focus on the business priorities.

This is where we find a conundrum. Sure, you can say that without the business, there are no systems to protect. As a result, the business should always set the priorities. This has the potential to work well if the business truly owns its resources and has a stake in protecting them. The problem arises when the business has no stake in the resources that are under the control of the information technology and information security people. I’m losing you, right? Okay, let’s talk about case studies.

Let’s use a very simple scenario that can be extrapolated to much higher levels. Let’s say you are a company that wants to start up a loyalty card for your customers. This will allow you to learn a lot about the people who spend money with you and you can feed a little back in discounts or other goodies. Without the goodies, what is to entice people to sign up for your loyalty card so you can gather all of that data? Suddenly, though, your business has a resource that it has no stake in. You are storing names, addresses and phone numbers of a large number of your customers. What happens to your business if that information is stolen? It could be that absolutely nothing happens. If you aren’t storing credit cards or other financial data with those records, it could be you don’t even need to let anyone know, depending on the breach notification laws where you do business.

Even if you do have to notify someone, what has actually been lost? Some names and addresses. No big deal, right? If there is no downside to the business if that information is lost, what is the incentive to do everything possible to protect that information? This is where the problem of business-driven security comes in. The information that has been stolen doesn’t actually belong to the business. It belongs to the customers of the business. Since it doesn’t belong to the business, there is no actual impact to the business from its loss. Maybe you have to shut down your loyalty card program, which doesn’t lose sales. It just means less marketing information that you can make use of to be more effective.

Business-driven security starts with the security policy. This is a very high-level statement of expected outcomes. There is nothing at all about implementation in the policy. That comes in a set of standards that fall out of the policy. An acceptable use policy, which is common, may simply say that anyone making use of a company resource like a computer and the enterprise network will do so in a business-appropriate manner. That’s it. That’s the policy. There are countless ways to implement that policy. The standards that are defined underneath that policy will get into more detail but still won’t get into specific technologies and implementations. Instead, you get a more fine-grained set of requirements for what meeting that policy should look like.

The notion of making sure that you are achieving your policy goals is called compliance. This is also a word used in relation to meeting regulatory requirements. Some businesses may need to meet requirements set down by the Payment Card Industry (PCI) if they deal with credit or debit cards. Others may have requirements set down by the Federal Deposit Insurance Corporation (FDIC). This would be common with banks and other similar financial companies. Making sure that you are meeting these requirements is also called compliance. As a result, compliance is big business. Meaning, there is a lot of money in auditors coming in to make sure you are following the appropriate rules.

However, meeting a set of rules that are very high-level statements of expected outcomes may not necessarily be the right thing to be paying attention to. Here’s an example. A business has a security awareness training program for all its users. Every user has to take this training. An auditor may come in and determine whether the business is really getting all of its users through security training. If the business has a goal of getting, say, 97% of users through training in a month and they hit 98%, they have achieved their objective.

Is this the right objective, though? Are the right topics being covered in the training? How is retention of the training being measured? Does this training actually help improve the overall security posture of an organization?

These are all questions that are not answered in this scenario but they are potentially far more important than the question of the percentage of users who have successfully made it through training. What it comes down to is clearly defining the problem. If you haven’t identified the problem well enough, your measurements are likely to be meaningless.

Large businesses are often driven by this compliance mentality and auditors and security professionals are often driven by meeting objectives that bear no relation to improving the security posture of an organization. A business can meet all of its compliance objectives and still be breached. This happens all of the time. The large companies you have read about all have robust security policies and compliance programs in place. The problem is that the security policies are all around protecting the business.

So, back to the scenario from above. If the business is about protecting the business but the business is storing information about a third party (its customers), where does the third party get a say in protecting its information? Once the information is stolen, it’s too late to walk away from the business and research shows that in most cases, businesses are not impacted financially by these breaches. Certainly, their stock prices are not impacted over the long term. Where is the place at the table for the stakeholders who have the most to lose from a security breach?

Tuesday, June 30, 2015

Hiding Between Partitions

One of the challenges of digital forensics is the number of places that someone who knows what they are doing could hide data they didn’t want to be found. Fortunately, the vast majority of cases don’t fall into that category. Most of the time, this will be a case of files sitting in the Documents folder where they belong or maybe in the Recycle Bin or Trash, depending on the system you are looking at. This requires some experience with the different operating systems to know where to look for documents.

Of course, if someone wanted to be a little sneaky, they may create a folder somewhere else on the drive and hide their illicit documents, like their collection of Justin Bieber MP3s, in that folder. This would, of course, be outside of the standard document repositories for users. You might, for example, stuff a folder into the Windows directory structure. This would require administrative rights to the drive but on most personal desktop systems, the user would have those rights. However, you are not restricted to the filesystem itself. If you know a little bit about what you are doing, you can make use of the entire drive to store data.

Let’s say that you were to create a little slack space on your drive where you didn’t have a partition defined. Remember that a partition is a collection of consecutive blocks or sectors on a drive. A partition can then be formatted with a filesystem so it can then be used by the operating system to store files on. If you have a blank space before a partition or after a partition, that’s space that the operating system can’t use within a filesystem. That makes it fair game to stuff data into. You can see in the figure below a drive that has two partitions defined on it with a large gap between the two partitions.

The end sector for the first partition is 1000000 but the beginning of the next partition is 1500000. This leaves about 500000 sectors unused. Each sector is 512 bytes. That gives us something around 250M to store information into. This is a small drive and that value is close to a quarter of the drive. On larger drives, it may be a lot easier to carve a decent chunk out of the middle or the end of the drive and not have it be noticed.
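
The size of the gap is simple arithmetic, sketched here in Python. The function name gap_between is made up for illustration, and the calculation assumes the end sector of the first partition is inclusive, which makes the count one sector shy of the round figure used above.

```python
SECTOR_SIZE = 512  # bytes per sector, as stated in the text


def gap_between(prev_end_sector, next_start_sector, sector_size=SECTOR_SIZE):
    """Return (unused sectors, unused bytes) between two partitions."""
    sectors = next_start_sector - prev_end_sector - 1  # end sector is in use
    return sectors, sectors * sector_size


gap_sectors, gap_bytes = gap_between(1000000, 1500000)
print(gap_sectors, gap_bytes, round(gap_bytes / 2**20, 1))
```

That works out to a little over 244 MiB, in line with the rough quarter of a gigabyte mentioned above.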

The problem with this space is that it’s unorganized. I can’t just copy files to it and expect those files to be placed nicely so they can be retrieved easily. If I have a few files that I want to put up there, I could manually place them one at a time and keep track of where I positioned them within that space. That requires figuring out how many blocks the files take up and remembering where they are. Instead, another way to do it is to simply concatenate the files together. There are a number of utilities you can use to do that. Under Linux, which is where I did this work, you can just use the cat utility. Perhaps one thing you want to do is to put a separator file between the files you want to store. That will allow you to extract the original files later on.

Once we have the concatenated file with all of the data we want to store in it, we can just dump it up to the slack space on the drive. In the figure below, you can see that I took my file as my input to the dd (disk dump) command and then wrote out to a sector in the slack space between the two partitions.

You will notice that when I was writing out to the slack space, I used the seek parameter. That’s because I needed to seek into the output file. If I want to go to a particular location in the input file, I would use skip, as you can see in the next invocation of dd where I extract the data. You will also notice in the output, I get 56+1 when I write the data out. That’s because I am writing 56 complete blocks plus 1 partial block. I didn’t fill the last 512 byte sector when I wrote the file out. When I read in, then, I need to read in all 57 blocks to get the complete file.
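
The same seek-then-write and skip-then-read pattern can be sketched in Python. Everything here is illustrative: a temporary file stands in for the drive, sector 100 stands in for a sector in the gap, and the payload is made-up data sized to give the same 56 full blocks plus 1 partial block. The function names hide and recover are hypothetical.

```python
import os
import tempfile

SECTOR_SIZE = 512


def hide(device_path, payload, start_sector):
    """Write payload at a sector offset, like dd with seek=."""
    with open(device_path, "r+b") as device:
        device.seek(start_sector * SECTOR_SIZE)
        device.write(payload)
    full, leftover = divmod(len(payload), SECTOR_SIZE)
    return full, (1 if leftover else 0)  # complete blocks, partial blocks


def recover(device_path, start_sector, sector_count):
    """Read whole sectors back from an offset, like dd with skip= and count=."""
    with open(device_path, "rb") as device:
        device.seek(start_sector * SECTOR_SIZE)
        return device.read(sector_count * SECTOR_SIZE)


# A zero-filled temporary file stands in for the drive.
disk_path = os.path.join(tempfile.mkdtemp(), "disk.img")
with open(disk_path, "wb") as disk:
    disk.write(bytes(200 * SECTOR_SIZE))

payload = b"secret\n" * 4100  # 28700 bytes: 56 full sectors plus 28 bytes
full, partial = hide(disk_path, payload, start_sector=100)
recovered = recover(disk_path, start_sector=100, sector_count=full + partial)

print(f"{full}+{partial} blocks written")
print(recovered[:len(payload)] == payload)
```

Because whole sectors are read back, the recovered data carries whatever was already in the rest of the final sector, so it has to be trimmed to the original length.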

Since I have the original file as well as the recovered file, I can compare one against the other. diff shows an insignificant difference regarding a newline at the end of the file. When I check the line count, I get the same line count between the two. These two comparisons tell me that the file I retrieved is substantially the same as the file that I put up into the disk.

Of course, I could also check to see if there is a specific piece of content in the entire file system using grep but on large disks, that can take a while. There are other tools that can be used to ferret out such things, especially if you can afford the commercial forensics tools.

While this is all very manual, this could be performed programmatically as well. Perhaps another time.

Saturday, April 25, 2015

To Frag or Not To Frag, That's Hardly A Question

In the beginning, for there certainly was once a beginning, the gods created the Arpanet and saw that it was good. Skip ahead several years and they further begat TCP, then IP, then UDP and so on and so forth. Considering that they always had in mind the notion that data would be sent in small packages of a size that could be easily determined, they decided to call these packets. Because, sure, why not? However, not everything was called a packet. It all depended on where the chunking up took place as to what you called the end results. In the case of chunking at the Internet Protocol (IP) layer, the result was indeed called a packet. IP then needed to be able to handle this idea of packets and be able to put them back together again. This meant knowing the size and where they needed to be placed, much like putting a puzzle back together. If you try to simply jam all the little dangly bits into holes on other pieces, you run the risk of having gaps in the resulting puzzle where the dangly bits didn’t cleanly go into holes. Not to mention, the resulting image probably just won’t look right.

Let’s say, for example, that you have this chunk of data that’s 100 bytes, as you can see in the figure below. Maybe it gets broken up into chunks of 20, 30 and 50 bytes each. For ease of reference, let’s call each A, B and C. Say that what you are sending starts with abcdefghijklmnopqrstuvwxyz, which is 26 bytes. Only 20 of them would fit into the first chunk of data, or packet. If I were to send the chunks to my friend Allan in separate packets (maybe think of them as envelopes), he would need to know which to open first and how to arrange them. Because of that, it’s helpful to have some sort of identifier associated with them. This puts us back to A, B and C.

If I were to receive C first, followed by A then B, I would know, because I know my alphabet, that A comes first, then B, then C. One of the problems with fragmentation is that I can’t tell what’s really going on by looking at a single fragment. Maybe I catch the one that says uvwxyz. I don’t know what that means. I suppose it could be the tail end of the alphabet, but that’s just guessing. What really comes before or after? It’s this uncertainty that opens the door to some bad behaviors. The utility fragroute, written by Dug Song, allows us to establish some rules that can actually cause fragmentation of messages. In the IP world, we have an expression for the names A, B and C that we used above. There is a field in the IP header called the IP identification field that associates a lot of related messages together. From there, we have a second field called an offset that tells the receiving end where that particular part of the message slots in. You can see a sample IP header below showing the IP identification field (IPID) and the Fragment offset field. The receiving end needs both of these in order to pull the entire puzzle together with all the pieces in the right order.
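
To make the role of the two fields concrete, here is a simplified Python sketch of what the receiving end has to do, using the 20, 30 and 50 byte chunks from the example. It is not a real IP stack. The offsets are plain byte offsets to match the A, B and C example, whereas the real fragment offset field counts in units of 8 bytes, and the identifier value 7 and the function name reassemble are made up.

```python
def reassemble(fragments):
    """Group fragments by identifier and stitch each group back together.

    fragments is an iterable of (identifier, byte_offset, data) tuples.
    """
    by_id = {}
    for ident, offset, data in fragments:
        by_id.setdefault(ident, []).append((offset, data))

    messages = {}
    for ident, parts in by_id.items():
        parts.sort()  # order the pieces by offset
        expected = 0
        message = bytearray()
        for offset, data in parts:
            if offset != expected:
                raise ValueError(f"gap or overlap at offset {offset}")
            message += data
            expected += len(data)
        messages[ident] = bytes(message)
    return messages


original = bytes(range(100))  # a 100-byte message
a, b, c = original[:20], original[20:50], original[50:]

# C arrives first, then A, then B, all sharing the identifier 7.
arrived = [(7, 50, c), (7, 0, a), (7, 20, b)]
rebuilt = reassemble(arrived)[7]
print(rebuilt == original)
```

The identifier says which puzzle a piece belongs to and the offset says where in that puzzle it goes, which is why neither field is enough on its own.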

Let’s say, though, that you were sitting in the middle looking at all of the puzzles that were going through and you needed to determine whether they were bad puzzles or not. Maybe, in the case of me mailing letters to my friend Allan, I have put a single sheet of paper into each envelope and sent them to him. Maybe out of order, maybe one a day. The complete thing is The Anarchist Cookbook with bomb making recipes and so forth in it. Once he has it, he can do bad things. You might want to stop him, and if you knew Allan, that might be a really wise idea. You’d have to know that what he is getting is really bad. You’d need the entire collection of papers, potentially, before you could make a determination. If we are talking about a network device, though, you need all of the fragments before you can determine whether to send it on. If fragments come in out of order or delayed, that means the receiving application is going to be delayed and that’s often unacceptable. So, maybe if it’s fragmented, you just send the fragments along because you don’t have enough information to make a decision from each individual fragment. Rather than holding up the train, which may cause users to be upset, you just push everything through.

We can take advantage of this with fragroute. Using fragroute, we can grab messages from applications and fragment and otherwise mangle them before they get sent on their merry way. Let’s try this as a ruleset.

delay random 10
dup random 20
ip_frag 48
ip_ttl 3
print

You may be able to figure most of this out on your own but let’s step through just to be sure. The first line says to delay random messages by 10 milliseconds. The next line says to duplicate random messages with a 20 percent chance to perform the duplication. The next line is the one that we’ve been talking about. Fragment each message at 48 bytes. Finally, set the IP time to live field to be 3 and then print out what has happened.

Below is one of the frames (another way of talking about messages that have been broken up but this is each chunk of data that is seen on the wire) that results from running fragroute. You can see that the total length of the frame is 68 bytes. Out of those 68 bytes, 20 of them are just the IP header. The other 48 bytes are the actual data. Based on looking at the fragment offset, it looks as though each fragment was 48 bytes, just as we had set in the fragroute rules. The offset is a multiple of 48, indicating that this is the third fragment (the first two covered bytes 0-47 and 48-95).
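
The numbers in that frame can be reproduced with a little Python. This is a sketch of the arithmetic only and does not touch the network. It assumes a 20-byte IP header with no options, the 200-byte payload length is made up for the demonstration, and plan_fragments is a hypothetical helper name.

```python
IP_HEADER_LEN = 20  # assumption: IP header without options


def plan_fragments(payload_len, frag_size=48):
    """List the offset and total length of each fragment of a payload."""
    if frag_size % 8:
        raise ValueError("fragment size must be a multiple of 8")
    plan = []
    for offset in range(0, payload_len, frag_size):
        data_len = min(frag_size, payload_len - offset)
        plan.append({
            "offset_bytes": offset,
            "offset_field": offset // 8,  # the header stores 8-byte units
            "total_length": IP_HEADER_LEN + data_len,
            "more_fragments": offset + data_len < payload_len,
        })
    return plan


plan = plan_fragments(200)  # hypothetical 200-byte payload
for fragment in plan:
    print(fragment)
```

The third entry starts at byte 96 with a total length of 68, matching the 20 bytes of header plus 48 bytes of data described above.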

You may notice that there is no indication here what the data indicates. Normally, you would have some indication of what protocol was being used. The problem is that we don’t have the TCP header in this fragment. You can see from the IP header that the next layer protocol is TCP but we don’t know what the protocol above that is. If you look at the very bottom of the Wireshark window above, you can see the actual data. This suggests a user agent from an HTTP request. But without the actual TCP header, we can’t determine for sure that it’s an HTTP request. Other requests can use a user agent and there is really nothing else to suggest that this is HTTP. We certainly don’t have a port number, because of the lack of TCP header.

You can see from this why fragmented packets are such a problem. We can pull it all together, of course. You can see the message in Wireshark that the entire message is reassembled in frame 3765. I can also follow the conversation by looking at frame 3765 and I find the following.

This is clearly an HTTP request. The section highlighted in red is the request that we were looking at a fragment from. The section highlighted in blue is the response. This doesn’t look at all like anything worth getting worried about. It’s just a standard HTTP request and response. But just from the fragment we saw, it was hard to say. In this case, the entire conversation was fractions of a second. We could easily have delayed the frames by quite a bit, which would have required a lot of hold up. This is why fragmentation attacks can be challenging when it comes to detection and certainly when it comes to prevention.

Tuesday, March 24, 2015

Math Lessons

While you might not think this really has anything to do with forensics or other security-related issues, the reality is that math is your friend. And my friend. And when you have to calculate the byte offset on a hard drive to locate the cluster where a particular file is located, you will really want to know a little about the basics of math.

You may have guessed that the origination of this topic is all of the nonsense spreading around on social networking sites like Facebook. Based on the number of times variations on these math problems show up and the number of times I see wrong answers, it seems as though a large number of folks really could stand a brief math lesson and while I am neither a math instructor in real life, nor do I play one on TV, I am going to take this one on because it will make me feel better.

The acronym to remember here, and it’s really quite simple, is PEMDAS. Make up whatever mnemonic you want to remember it. What it really means is parentheses, exponents, multiplication, division, addition and subtraction. This is the officially approved order of operations, with multiplication and division sharing one level and addition and subtraction sharing another. When you see a very long chain of mathematical operations, you might think that you should just work left to right and as a general rule, that’s not a bad instinct. However, in order to come up with a consistent and mathematically accurate answer, you should apply the order of operations first. Then you can move on to left to right. You will also find that it’s generally easier to do a simple replacement. Let’s illustrate with an equation I’ve been seeing recently on Facebook.

7 + 7 / 7 + 7 * 7 - 7

For those of you unfamiliar with two of those symbols, the / is a division sign for cases where we don’t have the horizontal line with a dot above and below, as on a computer keyboard. The * is a multiplication symbol, which is commonly used in place of an X or an x because those might be confusing in algebraic equations. So, let’s apply the order of operations and then re-write the equation after substituting.

7 + 1 + 49 - 7

7 divided by 7 is 1, so I swapped in a 1 for the division operation I did. 7 multiplied by 7 is 49 so I swapped that value in. That leaves us with the equation above. There are a couple of ways to do this at this point. I could certainly go left to right and add the first three numbers then subtract the last but you may have noticed that two of them cancel each other out. If I were to re-write the equation above as follows, it quickly becomes a lot easier.

7 - 7 + 1 + 49

This leaves me with adding 1 to 49 resulting in 50. See how easy that was? Keep in mind that the order of operations is really important. I suppose I could get into the history of why someone determined that multiplication and division were more important than addition and subtraction but it would likely bore you to tears. It would take far more of my time to come up with something coherent than I feel like putting in at the moment, so let’s skip it and move on to series. Let’s say you see the following:
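
If you want to check the work, Python follows the same order of operations. The sketch below evaluates the expression as written and then, for contrast, forces a strict left-to-right reading with parentheses. The variable names are just for illustration.

```python
# Python applies multiplication and division before addition and subtraction.
by_the_rules = 7 + 7 / 7 + 7 * 7 - 7

# The substitution from the text: 7 / 7 becomes 1 and 7 * 7 becomes 49.
substituted = 7 + 1 + 49 - 7

# Ignoring precedence and grinding strictly left to right gives a different answer.
left_to_right = ((((7 + 7) / 7) + 7) * 7) - 7

print(by_the_rules, substituted, left_to_right)
```

The first two agree on 50, while the strict left-to-right version lands on 56, one of the wrong answers this kind of problem tends to attract.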

10 = 50

9 = 38

8 = 27

7 = 17

5 = ?

There are two things you should notice right away. The difference between 50 and 38 is 12. 38 to 27 is 11. 27 to 17 is 10. So, the next difference in the series should be 9 because we were decreasing the right hand side by one less each time. 17 - 9 is 8. This leads us to the next thing you really should notice. The value on the left skipped one. The value of 6 should be 8. I’m asking for the value of 5. Keep the series going. I decrease the difference on the right hand side by 1 again, meaning that as I decrease by one more on the left, I will be decreasing by 8 on the right hand side. 8 - 8 is 0. This means that 5 = 0. When you see a series like this, there is generally a trick. They have skipped a value out of the series. This doesn’t mean that you just assign the correct right hand value (the next one in the series) to the wrong left hand value. It means you apply the right hand series twice and assign that value to the left hand side.
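The same reasoning can be sketched in a few lines of Python. The function name extend_series and its parameters are made up for this illustration; the only thing it assumes is the rule described above, that the difference shrinks by one at every step, including the step for the skipped value of 6.

```python
def extend_series(start_value, first_difference, steps):
    """Return the values produced by subtracting a difference that
    shrinks by one at every step."""
    values = [start_value]
    difference = first_difference
    for _ in range(steps):
        values.append(values[-1] - difference)
        difference -= 1
    return values

# Left-hand labels 10 down to 5, including the skipped 6.
labels = list(range(10, 4, -1))
values = extend_series(50, 12, len(labels) - 1)
series = dict(zip(labels, values))
print(series)
```

Looking up series[6] gives 8 and series[5] gives 0, which matches working it out by hand.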

A little bit of math, folks, will take you a very long way. I hope this has been a little bit of help. I know it’s made me feel better to share it with you.

Tuesday, July 29, 2014

More Fun With Python

Since I’m in the middle of trying to get a title completed on Python scripting for security professionals for Infinite Skills, I’ve been writing a lot of little scripts that do interesting things. So, in order to dump my head and also get something somewhat recent and potentially interesting up here, I thought I’d write up one of those scripts. This could be a useful foundation for someone who wanted to do a little security testing using Python scripts. It is also useful for forensics professionals since you may want to write custom tools that are capable of parsing data in a way that makes sense to you rather than relying on tools that represent data in a way that made sense to someone else. Being able to parse simple data structures, for example, is a very useful skill for both network programming and forensics programming. One case with a lot of parsing to do, which came up both in the video training I am doing and in the next book I am writing, is dealing with the information in the master boot record, including the partition table.

So, let’s take a look at a program that I threw together to do some quick parsing of the partition table in a master boot record.

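#!/usr/bin/python3

import struct

# Read the 512-byte master boot record image into a byte array.
with open("mbr.dd", "rb") as f:
    mbr = bytearray(f.read(512))

# Disk signature: 4 bytes at 0x1B8, little endian, unsigned.
x = struct.unpack("<I", mbr[0x1B8:0x1BC])
print("Disk signature: ", x[0])

# The first partition entry starts at 0x1BE; its first byte is the active flag.
x = mbr[0x1BE]
if x == 0x80:
    print("Active flag: Active")
else:
    print("Active flag: Not active")

# Starting LBA: 4 bytes at offset 8 within the partition entry.
lbastart = struct.unpack("<I", mbr[0x1C6:0x1CA])
print("Partition Start (LBA): ", lbastart[0])

# The entry stores a sector count at 0x1CA, not an ending LBA,
# so the end is the start plus the count minus one.
sectors = struct.unpack("<I", mbr[0x1CA:0x1CE])
lbaend = lbastart[0] + sectors[0] - 1
print("Partition End (LBA): ", lbaend)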

For the purposes of this program, I have grabbed an image of the master boot record so I can get to the partition table. I did this by using the UNIX/Linux utility dd. You simply grab the first 512 byte block with dd if=/dev/sdb of=mbr.dd bs=512 count=1 and you end up with an image of the master boot record you can use various tools with. So, the program opens up the disk image called mbr.dd as a binary file then creates a byte array to store all of the bytes from that disk image into. Once I have a byte array, called mbr, I can start to pull the bytes out that I want as long as I know where the offsets are.

Something to keep in mind, though, is that the master boot record is stored as little endian so if I have a file with an image or copy of that master boot record, all of the multi-byte values are going to look backwards, with the least significant byte first. We need to use struct.unpack to get the bytes out and in the correct order. So, we tell struct.unpack that we have a little endian 32-bit integer with a format string such as <I (unsigned) or <i (signed) and then we have to provide a range of bytes out of the byte array that struct.unpack should use to create that integer. The unsigned form is the safer choice for values like the disk signature and block addresses, which are never negative. The thing to keep in mind when you are providing a range is that the top end is not inclusive. For the disk signature, I am grabbing bytes 1B8, 1B9, 1BA, 1BB. Even though the last byte in the range indicated in the program is 1BC, we don’t get that last byte because Python slices exclude the upper bound.
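To see both points in isolation, here is a small sketch that uses eight made-up bytes rather than a real disk image. The byte values and variable names are assumptions chosen for illustration; struct.unpack and the format characters are standard Python.

```python
import struct

# Eight made-up bytes standing in for a slice of a disk image.
data = bytes([0x00, 0x08, 0x00, 0x00, 0xEF, 0xBE, 0xAD, 0xDE])

# data[0:4] covers indexes 0, 1, 2 and 3; index 4 is not included.
chunk = data[0:4]
little = struct.unpack("<I", chunk)[0]   # least significant byte first
big = struct.unpack(">I", chunk)[0]      # same bytes read the other way

# Signed versus unsigned matters once the top bit is set.
unsigned = struct.unpack("<I", data[4:8])[0]
signed = struct.unpack("<i", data[4:8])[0]

print(len(chunk), little, big)
print(hex(unsigned), signed)
```

The slice is four bytes long, the little endian reading gives 2048 where the big endian reading gives 524288, and the last four bytes come out as 0xdeadbeef when read unsigned but as a negative number when read signed.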

Once we have the basics of pulling data out of the byte array, the rest is trivial. I can grab the single byte indicating whether a partition is active (bootable) or not and then compare that value with what I know about that flag. If it’s 0x80, I know the partition is active. If it’s not, then it’s not active so I can print out the results based on that byte. I can also get the starting logical block address from bytes 1C6 through 1C9, again using the struct.unpack method. Note that the partition entry does not store an ending address. The next four bytes, 1CA through 1CD, hold the number of sectors in the partition, so the ending logical block address is the start plus the sector count minus one.

This is a simple technique that can then be applied to other binary data structures, whether that data structure is the rest of the master boot record, the BIOS parameter block or the structures associated with a GUID Partition Table disk.
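As one example of carrying the technique further, here is a minimal sketch that walks all four primary partition entries in one pass. The function name parse_partition_table, the dictionary keys and the synthetic sample are all made up for illustration; the 16-byte entry layout (status, CHS start, type, CHS end, starting LBA, sector count) is the standard one for a DOS partition table, and the sample is built in memory so nothing here depends on a real disk.

```python
import struct

def parse_partition_table(mbr):
    """Return a list of dicts, one per non-empty primary partition entry."""
    if len(mbr) < 512 or mbr[0x1FE:0x200] != b"\x55\xaa":
        raise ValueError("not a 512-byte MBR with the 0x55AA boot signature")
    partitions = []
    for index in range(4):
        offset = 0x1BE + index * 16
        # status, CHS start, type, CHS end, starting LBA, sector count
        status, _chs_start, ptype, _chs_end, first_lba, sectors = struct.unpack(
            "<B3sB3sII", mbr[offset:offset + 16])
        if ptype == 0:
            continue  # a type of 0 marks an unused slot
        partitions.append({
            "slot": index,
            "active": status == 0x80,
            "type": ptype,
            "first_lba": first_lba,
            "last_lba": first_lba + sectors - 1,
        })
    return partitions

# Synthetic MBR: one active NTFS-type (0x07) entry at LBA 128, 2048 sectors.
sample = bytearray(512)
sample[0x1BE:0x1CE] = struct.pack(
    "<B3sB3sII", 0x80, b"\x00" * 3, 0x07, b"\x00" * 3, 128, 2048)
sample[0x1FE:0x200] = b"\x55\xaa"

table = parse_partition_table(bytes(sample))
print(table)
```

Unpacking the whole entry with a single format string keeps the offsets in one place, which makes it harder to end up with overlapping byte ranges than when each field is sliced by hand.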