Helpful Navigation Toolbar

Friday, August 8, 2014

Parsing Windows Live Messenger data from iOS devices


Good afternoon readers, the past couple of weeks have been pretty busy with case work, but thankfully I finally had some time to dig in to some messaging data that I extracted from an iOS device that never seems to have been addressed previously and does not appear to be recognized by any mobile device forensic tools that I have used.

The application I will be covering in this blog post is Windows Live Messenger for iOS devices, which seems to have been discontinued some time in 2013. Unsurprisingly, the application is VERY poorly written and stores data on the device itself in a variety of different ways, which probably explains part of the reason that no one has really dug into this data. As the below image shows, there does not appear to be specific time stamps or even a definitive structure as to how data appears within the application itself, which makes extracting data MUCH more difficult.


Screenshot of Windows Live Messenger on an iOS device. Retrieved 7 August 2014 from http://news.softpedia.com/news/Microsoft-Discontinues-Windows-Live-Messenger-for-iOS-324028.shtml#


I would like to note though, that with all of the research, time, and effort that I put into this post and parsing the data from, I only have one device that has the application on it, so the data stored on a device you encounter may be slightly different. If you have a case where you have access to this data and are willing/able to share it I would much appreciate it in an effort to make the small Perl script that accompanies this blog post. Likewise, if you encounter issues with the script please reach out to me and I will try my best to help! So, now that all of the formalities are out of the way, shall we begin?



The Messenger data itself is stored in a standard location, under the "/private/var/mobile/Applications/com.microsoft.wlx" folder (or the applicable SHA value, if you have not used a tool/method to reconstruct file paths). 


Messenger application folder from iOS device when viewed in X-Ways

There is quite a bit of data in here, but for now we are most interested in files that are stored under the "Documents/cache/Messenger" and under the "Library/Caches/cache/Messenger" folders. It should look something like this:


Contents of "Messenger" folder. The files stored in "Documents/cache/Messenger" and "Library/Caches/cache/Messenger" following the same naming conventions, but contain different data.

The first thing that I want to point out here is that all of the files in these folders have a ".cache" file extension. But of course Microsoft does not follow a standard format for exactly what a ".cache" file should be, so the file header and footer for each of these files are different. The files seem to follow the naming convention of "MSN User ID_filename.cache". The data in the Message History of the files also appear to bring up the possibility that 7-bit encoding is in use, so that opens up an entirely new can of worms for parsing the data. Rather than reverse engineer the entire file structure format, I decided to focus on areas from which I could extract data and have relative confidence in the results.



No discernible, repeatable, standard file header. Foiled again! 



This seems to suggest 7-bit encoding might be in use, at least a little bit. Maybe. I don't know to be 100% honest. 

Thus far I have identified three files that contain chat and/or chat associated data of interest. The files contain "ChatConversations", "MessageHistory", and "Status" in the filename. The "ChatConversations" file seems to contain a listing of the most recent chat sessions, the "MessageHistory" file seems to contain most of the message data, and the "Status" file seems to contain email addresses and usernames of individuals involved in chat sessions as well as the username and email address of the individual who used the Windows Live Messenger application on the device itself. I will cover each of the files individually, but I would also like to note that the script can be given the "-folder" option and it will attempt to recursively (meaning all of the subfolders as well) search to find these files for you.


"Message History"

The message history files seem to follow the format of storing the data in the following format


  • User "Friendly Name" (if present)
  • Hex character 00 (can occur 1 or 2 times (usually if friendly name is present, but not always))
  • Number 1 
  • Colon
  • Hex character 00 (can occur 1 through 4 times)
  • Email Address
  • -- sometimes additional printed and non-printed characters --
  • Message
  • Variety of characters
  • Hex character 00 five times in a row

Since I have not found a reliable way to account for if the "Friendly Name" is present or not, I am going to focus on starting the pattern match with the number 1 and the colon, and then greedily match everything up to where the hex character 00 occurs at least five times in a row. The Perl regular expression that the script uses for this matching is:

"\x31\x3A([\w\W]+?)\x00\x00\x00\x00\x00"

Once the script matches that pattern, it then attempts to format the data structure by replacing the "1:" with the term "Email Address", and tries to find the beginning of the message by matching the various patterns that occur after the ".com"  that I have seen in my data. It then attempts to clean up non-printable characters that occur in the message itself as well and then prints the chunks of data out one by one. This method is not 100% foolproof, however if should be more than sufficient in order to allow you to at least get an understanding of the message conversations that have occurred on the device. 


The contents of a modified "Message History" file that match our regular expression that we search for, since the data structure is currently unknown.


The parsed contents of the above file. Different programs recognize different encoding schemes, in this case Notepad translates \x84\x00 as ",,".



"Chat Conversations"

Initially I thought that this file would contain the same data as found in Message History, but interestingly enough, it did not. This seems to contain a list of what appears to be (guessing here, based on the limited amount of data I have to work with) the most recent chat message that was received from an individual that is listed in the Message History file(s). What this means is that if there is a Message History file associated with the username "peter_quill_88@hotmail.com", there will be an entry in the chat conversation folder with that user name and the message as well. So in order to try to be as complete as possible, the script can also handle this file. Fortunately, the data in this file seems to follow a very rough structure, and the data format of this file appears to be:


  • Hex characters \xE8\xFF\xFF\xFF
  • Variety of characters
  • Number 1
  • Colon
  • Email Address
  • Message
  • Variety of characters
  • Hex characters \x00\x00\x00\xBC


Using this, our regular expression to extract the data is going to be

"\xE8\xFF\xFF\xFF([\w\W]+?)\x00\x00\x00\xBC"

As stated above, the script matches that pattern, it then attempts to format the data structure by replacing the "1:" with the term "Email Address", and tries to find the beginning of the message by matching the various patterns that occur after the ".com"  that I have seen in my data. It then attempts to clean up non-printable characters that occur in the message itself as well and then prints the chunks of data out one by one. I think it bears repeating again that this method is not 100% foolproof, however if should be more than sufficient in order to allow you to at least get an understanding of the message conversation that is stored in this file. 


"Status"

The file containing the term "Status" seems to be the most well-structured file within the data. The data itself seems to be stored in a similar structure as the Chat Conversations file, although there are differences in the data itself.


  • Hex characters \xE8\xFF\xFF\xFF
  • Variety of characters
  • Number 1
  • Colon
  • Email Address
  • Username (if present)
  • Microsoft Object (if present)
  • Variety of characters
  • Hex characters \x00\x00\x00\x01

Just like the above examples, the regular expression that we use to extract data is:

"\xE8\xFF\xFF\xFF([\w\W]+?)\x00\x00\x00\x01"


As with the above examples, the script then takes this data and attempts to clean it up in a user readable fashion. The script replaces the "1:" with the term "Email Address", it attempts to identify is a Username is present and if it is, label it accordingly. It also attempts to digest the embedded Microsoft Object, if present, and format that data into a more easily readable format.


The Script

The script, which is written in Perl (insert obligatory coding language debate comment here), runs by either specifying a "file" or a "folder", which should be used after identifying the Windows Live Messenger application on your iOS device or iOS backup file(s). The application attempts to digest the file based on the fully restored filename (i.e. having "Status", "MessageHistory", or "ChatConversations" in that file). In other words, pointing the script at a raw iOS backup folder will not yield any results since it is looking for specific patterns in the file name, rather than opening every single file and trying to find the data structure(s) the script is looking for.

The output is just regular text, so you can copy/paste from the command prompt or you can save it to a file of your choosing. Be advised that the formatting is not 100% perfect, so it might require some cleanup before presenting the "final" version of the output but it should allow you to get a much better idea of any messages that are stored in the application data since no other mobile forensic tool on the market currently seems able to handle this data.

If you have any questions or issues please do not hesitate to let me know. You can download the script here.


Filename: "wlm_ios_parser.pl"
MD5: c6f3f7d09bea79d69dc3cd60da1fc17a
SHA256: 99380423520b16385dde1493bb46ad439459a221346b65e62451b9a2d7c17bac











No comments:

Post a Comment