<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The Wibble &#187; Research</title>
	<atom:link href="http://www.thewibble.com/category/research/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.thewibble.com</link>
	<description></description>
	<lastBuildDate>Fri, 06 Nov 2009 22:45:49 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=abc</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Binary Level Metadata in Microsoft Word</title>
		<link>http://www.thewibble.com/2009/10/06/binary-level-metadata-in-microsoft-word/</link>
		<comments>http://www.thewibble.com/2009/10/06/binary-level-metadata-in-microsoft-word/#comments</comments>
		<pubDate>Tue, 06 Oct 2009 21:09:40 +0000</pubDate>
		<dc:creator>Jen</dc:creator>
				<category><![CDATA[Research]]></category>
		<category><![CDATA[anonymity]]></category>
		<category><![CDATA[metadata]]></category>
		<category><![CDATA[Microsoft Word]]></category>
		<category><![CDATA[privacy]]></category>

		<guid isPermaLink="false">http://www.thewibble.com/?p=79</guid>
		<description><![CDATA[The breakdown is this: give me a relatively recent Microsoft Word Document (.doc) and I can tell you what word processor last edited it.

I studied information leakage. There are a plethora of examples of cases where information that wasn’t supposed to be revealed, was. One could argue that it’s the user’s fault for not correctly [...]]]></description>
			<content:encoded><![CDATA[<p>The breakdown is this: give me a relatively recent Microsoft Word Document (.doc) and I can tell you what word processor last edited it.</p>
<p>
I studied information leakage. There are a <a href="http://www.casi.org.uk/discuss/2003/msg00457.html">plethora of</a> <a href="http://news.bbc.co.uk/2/hi/europe/4506517.stm">examples</a> <a href="http://www.sfgate.com/cgi-bin/article.cgi?f=/n/a/2009/02/10/state/n230703S73.DTL">of cases</a> where information that wasn’t supposed to be revealed, was. One could argue that it’s the user’s fault for not correctly sanitizing their documents, but I blame WYSIWYG editors. Back in the day of pen and paper, if someone wanted to redact information from a document she was releasing, all she had to do was take a black marker and cross it out. For extra security, she could make a photocopy of the original and only release the photocopy. WYSIWYG editors try to imitate paper in that the document being edited is in theory the one being published, but especially with redacting information, there’s a failure to communicate to the user what’s actually going on. In a WYSIWYG editor, one can’t just put a black box over information to redact it. The same goes for putting a black background on text. The problem: the information is still there.</p>
<p>
The stories about information being incorrectly redacted are more high profile and glamorous, but metadata leakage can also be embarrassing. Metadata can be thought of as data about data. When you create a file, the program that created it stores some identifying information–for example title, author, date of creation. It stores data about the data you just made. I talked earlier about how technology can be seen as magic that just works. Again, the problem is that if one thinks of technology this way, privacy and security are never questioned. In this project I examined Microsoft Word Documents–one of the most common file formats for editing and publishing text documents. Word stores metadata, and in a world increasingly worried about metadata, Microsoft offers advice on how to sanitize documents of metadata. While clicking around Microsoft’s help pages, I came across the following <a href="http://support.microsoft.com/kb/223396">snippet</a>:</p>
<blockquote>
    “Some metadata is readily accessible through the user interface of each Office program. Other metadata is only accessible through extraordinary means, such as opening a document in a low-level, binary file editor.”
</blockquote><br />
<p>
Extraordinary means? Thus I set out to determine whether this Computer Science undergraduate could find the metadata Microsoft referred to using “extraordinary means” (also known as the Unix tools <a href="http://linux.die.net/man/1/strings">strings</a> and <a href="http://linux.die.net/man/1/od">octal dump</a>).</p>
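<p>
For readers who have not used <code>strings</code>: the tool scans a binary file for runs of printable characters. Below is a minimal Python sketch of that idea. It is an illustration only, not the exact procedure used in this project. The function name <code>printable_strings</code>, the four-character minimum, and the sample bytes are all assumptions made up for the example. It looks for both single-byte runs and UTF-16 little-endian runs, since text inside a binary file can be stored either way.</p>

```python
import re


def printable_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable characters, similar in spirit to Unix `strings`."""
    # Runs of printable ASCII bytes (space through tilde).
    ascii_runs = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    # Runs of printable ASCII stored as UTF-16 little-endian (each byte followed by NUL).
    utf16_runs = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)

    found = [m.group().decode("ascii") for m in ascii_runs.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in utf16_runs.finditer(data)]
    return found


# Made-up bytes standing in for a fragment of a binary document.
sample = (
    b"\x00\x01Microsoft Word\x00\xff"
    + "Normal.dot".encode("utf-16-le")
    + b"\x02\x03ab\x04"
)
print(printable_strings(sample))
```

<p>
On a real file, one would read the bytes with <code>open(path, "rb").read()</code> and pass them to the function. Which strings actually distinguish one word processor from another is not shown here.</p>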
<p>
What I found was quite fun. Microsoft Word Documents (of the .doc variety–.docx is an entirely different beast) differ enough on the binary/octal level that I can identify Word files created by Microsoft Office 2003, 2004, 2007, 2008, OpenOffice, and Google Docs. A quick tip on identifying Office version: Microsoft always releases the Windows version the year before the Mac version. Thus Office 2003 and 2007 are the Windows versions and Office 2004 and 2008 are the Mac versions. There are major differences in structure between Windows and Mac Office-produced Word documents and definitely differences between each version. Microsoft Office is a minor nightmare from a backwards compatibility standpoint, so I don’t blame Microsoft for having convoluted file formats (fun fact: Word documents alternate between UTF-8 and UTF-16 encoding). It turns out that when one version of Office (say 2004) opens and saves a Word file created by another version of Office (say 2003), the file structure will be converted from 2003 to 2004. It is possible to create an operating-system-neutral word processor though: I couldn’t tell the difference between OpenOffice Word files created on Windows computers or Macs. It goes without saying that OpenOffice and Google Docs produced Word files that look very different on a binary level from the Microsoft ones.</p>
<p>
I recognize that looking at Word documents this closely is beyond most Word users’ abilities or desires, but I’m also surprised at how easy it was to find differences in the file formats. Microsoft Word stores unintended metadata about what word processor you used to last edit a document. This is troubling since Microsoft has tools that are supposed to strip metadata from documents, but this just goes to show that metadata is embedded deep into documents. I’m guessing that one of the reasons Word moved to a .docx format was because .doc was becoming too cumbersome to deal with. It’s very possible that .docx is operating system and Office version neutral.  I definitely don&#8217;t think that Microsoft was sloppy in creating the .doc format; I just believe that in most moderately complicated file formats constructed in an environment where privacy isn&#8217;t paramount, there will be traces of hidden metadata.</p>
<p>
This was one of the two projects I did at Princeton.  The other, on RFID security, can be found <a href="http://www.thewibble.com/2009/09/30/rfid-and-smart-card-privacy-and-security-concerns/">here</a>.]]></content:encoded>
			<wfw:commentRss>http://www.thewibble.com/2009/10/06/binary-level-metadata-in-microsoft-word/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>RFID and Smart Card Privacy and Security Concerns</title>
		<link>http://www.thewibble.com/2009/09/30/rfid-and-smart-card-privacy-and-security-concerns/</link>
		<comments>http://www.thewibble.com/2009/09/30/rfid-and-smart-card-privacy-and-security-concerns/#comments</comments>
		<pubDate>Wed, 30 Sep 2009 21:21:33 +0000</pubDate>
		<dc:creator>Jen</dc:creator>
				<category><![CDATA[Research]]></category>
		<category><![CDATA[princeton]]></category>
		<category><![CDATA[privacy]]></category>
		<category><![CDATA[rfid]]></category>
		<category><![CDATA[security]]></category>

		<guid isPermaLink="false">http://www.thewibble.com/?p=52</guid>
		<description><![CDATA[

http://www.flickr.com/photos/midnightcomm/ / CC BY 2.0


Arthur Clarke once proclaimed that &#8220;any sufficiently advanced technology is indistinguishable from magic.&#8221;  Even as a Computer Science student, I find myself identifying with this idea.  Because I&#8217;ve studied more on the software side, I tend to think of hardware as vaguely magical black boxes.  When dealing with magic, [...]]]></description>
			<content:encoded><![CDATA[<div style="text-align: center;"><a title="Blue and Purple RFID tag by midnightcomm, on Flickr" href="http://www.flickr.com/photos/midnightcomm/171587228/"><img src="http://farm1.static.flickr.com/49/171587228_f78f978bd8.jpg" alt="Blue and Purple RFID tag" width="500" height="333" /></a>
<span style="font-size:60%">
<div><a rel="cc:attributionURL" href="http://www.flickr.com/photos/midnightcomm/">http://www.flickr.com/photos/midnightcomm/</a> / <a rel="license" href="http://creativecommons.org/licenses/by/2.0/">CC BY 2.0</a></div>
</span></div><br />
<p>
Arthur Clarke once proclaimed that &#8220;any sufficiently advanced technology is indistinguishable from magic.&#8221;  Even as a Computer Science student, I find myself identifying with this idea.  Because I&#8217;ve studied more on the software side, I tend to think of hardware as vaguely magical black boxes.  When dealing with magic, things are supposed to &#8220;just work&#8221; and we don&#8217;t question why because it&#8217;s all mysterious.  The problem with this thinking is that even if a technology works, it might not work well or have been implemented correctly, especially in terms of security.</p>
<p>
RFID is a magical technology&#8211;it&#8217;s commonly used enough so that people will know what it is, but not well-known enough for people to understand how it works.  If you&#8217;re unfamiliar with RFID, it&#8217;s the chip that can be found inside of some credit cards that forms the basis of &#8220;tap and go&#8221; payment.  RFID tags can also be found in many transportation system cards, like the CharlieCard (Boston) or the SmarTrip (D.C.).  RFID tags can store information (like how much money is on your card) and they communicate through radio frequency waves.  The radio waves are why RFID can probably work through your wallet but doesn&#8217;t if you wrap it in aluminum foil.  At Princeton, our student IDs (&#8220;Prox&#8221; cards) have RFID tags inside them and students can use them to access buildings.  They add an extra layer of building security.</p>
<p>
Princeton&#8217;s security is based on our Prox cards, so I wanted to know how secure they were.  I used an off-the-shelf RFID reader (an Omnikey CardMan 5321, around $100) and open source software (RFIDIOt, free) to see what I could get out of the RFID cards I had, including a Princeton Prox card, a CharlieCard, and a Princeton Public Library card.  Luckily (or unluckily for me), the Princeton Prox card was an HID iCLASS card, which I found in my literature study to be one of the more secure cards on the market.  HID claims that it built physical anti-cloning measures (protections against copying a card) into the card.</p>
<p>
However, I discovered that hotlisting attacks were very possible with all three cards I had.  Hotlisting is an attack that involves tracking an individual through a unique identifier (UID), a number that is unique to each card.  Each of the cards had a UID that I could read with my unauthorized reader, and since it was a unique number, I could link it directly to that card.  Because each card is linked strongly with one individual, I could then track individuals if I had a point of reference where I could confirm their identity and read the UID off their card.  Reading a card&#8217;s RFID tag is very unobtrusive, especially when the cards are commonly used.  All it would take is brushing up against an individual&#8217;s wallet, and I would have the number.  This means that if I wanted to track an individual&#8217;s movements, all I would have to do is place a number of RFID readers in key locations, and obtain someone&#8217;s UID.  Since I could read the UID of all the cards I tested and considering the ubiquity of cards with RFID tags, I believe that most people are trackable.  RFID tags are also being found in items other than cards, such as library books and EZ Pass or related electronic toll payment systems.  As more cards add RFID tags, this will become a bigger issue.  Whenever you carry your card, you are followable.</p>
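<p>
To make the tracking step concrete, here is a small Python sketch of how sightings from several readers could be combined once a UID is known. Everything in it is hypothetical: the function name <code>build_trails</code>, the record layout, and the sample UIDs, times, and locations are invented for illustration and do not come from the readers or cards tested above.</p>

```python
from collections import defaultdict


def build_trails(sightings):
    """Group (timestamp, location, uid) sightings into a time-ordered trail per UID."""
    trails = defaultdict(list)
    # Sorting the tuples orders them by timestamp first.
    for timestamp, location, uid in sorted(sightings):
        trails[uid].append((timestamp, location))
    return dict(trails)


# Hypothetical log collected from readers placed in three locations.
sightings = [
    ("09:05", "library", "uid-A"),
    ("08:30", "dorm entrance", "uid-A"),
    ("08:45", "dining hall", "uid-B"),
    ("12:10", "dining hall", "uid-A"),
]

trails = build_trails(sightings)
for uid, trail in trails.items():
    print(uid, trail)
```

<p>
The point of the sketch is that no secret data on the card is needed: a stable, readable UID plus one confirmed identification is enough to turn reader logs into a movement history.</p>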
<p>
This was one of two research projects I completed during my junior year at Princeton.  <a href="http://www.thewibble.com/2009/10/06/binary-level-metadata-in-microsoft-word/">Here is my other project</a> on hidden metadata in Microsoft Word Documents.</p>]]></content:encoded>
			<wfw:commentRss>http://www.thewibble.com/2009/09/30/rfid-and-smart-card-privacy-and-security-concerns/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>
