San Fran Systems


MailXplorer Version 2009.0808 Manual



Introduction

Do you have tons of email, from years of correspondence, and in several different email packages?

We do, so San Fran Systems wrote MailXplorer to gather all our old email in one place and run complex searches across every email we've ever sent or received.

MailXplorer will:

In this manual, we'll show you how to create an archive, update it, and search it.

MailXplorer consists of two programs, FindMail and SearchMail, which are described below.


Chapter 1 - FindMail

1.1. FindMail from Scratch

The first thing you do in FindMail is click Options and set your preferences.

In this list, "Jeremy" is your Windows username.

Don't worry if that doesn't make much sense right now. At the least, use the existing options, and just set the index folder (with 'Browse for Index Folder').

Now you will tell FindMail to look for your email files. You can choose from:

Afterwards, click 'Scan for Email'. FindMail will read in the emails, check that they are valid, skip over corrupt emails, write them to archives as set in Options, and create a Body/Header index so you can search your email quickly.

Then you can run SearchMail and test out a few simple searches.

A MailXplorer email archive folder is named 'MyMail'.

1.2. Finding Email Files in a File

Click 'Browse for File', and select a file, then 'Scan for Email'. If the file is an email file, FindMail will read out all messages and output them to an archive.

1.3. Finding Email Files in a Folder

Click 'Browse for Folder', and select a drive or directory, then 'Scan for Email'. FindMail will go in there and look at every file it finds, see if it's an email file, then try to figure out which type. Finally, it archives all emails found.
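To give a feel for what a folder scan involves, here is a minimal sketch in Python. It is not FindMail's actual code: it assumes only one mail format (mbox, whose files start with the text "From "), and the function names looks_like_mbox and find_mail_files are made up for this illustration.

```python
import os

def looks_like_mbox(path):
    """Return True if the file starts with the mbox 'From ' separator line."""
    try:
        with open(path, "rb") as f:
            return f.read(5) == b"From "
    except OSError:
        return False  # unreadable files are simply skipped

def find_mail_files(root):
    """Walk every folder under root and return the files that look like mbox."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if looks_like_mbox(path):
                found.append(path)
    return sorted(found)
```

A real scanner would need one such check per supported mail format; this sketch only shows the "look at every file, then decide" loop.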

1.4. Finding Email Files by the Mail Client

Click 'Find Email Folder', and select your email client. FindMail will select the most common file or folder for that email client. Then click 'OK' and 'Scan for Email'.

1.5. Indexing Email

Before we look at indexes, this is how FindMail works:

What is an index? Well, for most non-fiction books, it's a small number of pages in the back that let you quickly find all references to given topics. It's much faster to scan those few pages than to read the entire book.

Therefore, searching all your emails is quicker by looking in an index than manually searching all your email files. For instance, when doing a full-text search in Mozilla Thunderbird, it looks through megabytes of text, and this can take minutes. Using FindMail takes just seconds.
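The book-index idea can be sketched in a few lines of Python. This is a toy word index for illustration only, not the format FindMail writes to disk; the names build_index and lookup are hypothetical.

```python
from collections import defaultdict

def build_index(messages):
    """Map each lowercase word to the set of message ids that contain it."""
    index = defaultdict(set)
    for msg_id, body in messages.items():
        for word in body.lower().split():
            index[word].add(msg_id)
    return index

def lookup(index, word):
    """Return the ids of messages containing the word, without rereading them."""
    return sorted(index.get(word.lower(), set()))
```

Once the index is built, looking up a word is a single dictionary access, however many messages there are. That is why an indexed search takes seconds where a full-text scan takes minutes.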

An index is typically a few percent of the size of your email archive. It is stored in MyMail/index and MyMail/sqlite.ind.

Now, say you've created an archive of 100 emails. They are indexed for searching. But what about new emails? What happens to the index then?

Simple: If the MyMail location hasn't changed, and the archive of 100 emails is still in that location, but the mail file has grown to 110 messages, all you have to do is select the updated mail file or folder and click 'Scan for Email'. FindMail will spend a couple of minutes processing the 10 new emails, adding them to the archive and indexing their contents, fully automatically.
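The update step can be pictured with the sketch below. How FindMail actually recognises a message it has already archived is not described in this manual, so the sketch assumes each message carries a unique "id" field; that field and the function name add_new_messages are assumptions made for the example.

```python
def add_new_messages(archive, incoming):
    """Append messages whose id is not yet in the archive; return those added."""
    known = {msg["id"] for msg in archive}  # assumed unique per message
    added = []
    for msg in incoming:
        if msg["id"] not in known:
            archive.append(msg)
            known.add(msg["id"])
            added.append(msg)
    return added
```

With an archive of 100 messages and a rescanned mail file of 110, only the 10 unseen messages are added, and the existing 100 are left alone.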

Bottom line: You don't have to worry about the index, as it's all done automatically.

1.6. Archiving Discovered Email

FindMail can store all the emails it finds in archives, so you can back up all your email, whatever its format and wherever it is on your drive, into a simple generic format that can be archived and viewed in any mail program.

ALL emails go into the selected MyMail folder.

FindMail can output (in this list, "Jeremy" is your Windows username):

1.7. What the Buttons Do

Browse for File

Prompts you for the email file (any format) that you want to convert and index.

Browse for Folder or Drive

Lets you pick a folder or a drive (e.g., C:). Everything beneath it is then scanned, and any mail found is indexed.

Find Email Folder

Click this button, select the mail client you use and click OK.

'Scan' Button

Either one of the following has been selected:

Whichever was selected, FindMail will read the mail from that source and update the index with any new messages.

Quitting Indexing

If you hit the 'Abort' button, indexing stops after the next email message found, or the next file to be scanned. FindMail then loads in the SQL data and CLucene data and exits (which can take up to 5 minutes).

Pausing Indexing

You might want to halt scanning of your folder or drive, so you can, for instance, unplug a hard drive and plug it back in later.

When clicked, the captions on these 2 buttons change to 'Resume File' and 'Resume Msg'. Before resuming, ensure the drive you were scanning is plugged back in, otherwise the resulting index will be out of sync.

1.8. Progress Sliders

A progress slider shows how far through scanning your emails you are. There's just one catch: it's quick and easy to find out how big a disk drive is, but finding out how big a folder is can be very slow, taking many minutes.

Therefore, the "Progress (Drive root only, not Folder)" bar is only used when you are scanning a disk drive (called root, or C:/ or F:/ etc), and it's blank while searching subfolders.

However, the "Mail File Progress" bar tells you how far into a mail file we are. So if you have a 50MB mail file being scanned, and it's done 25MB, the bar will be halfway along.
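The arithmetic behind the "Mail File Progress" bar is simply bytes scanned divided by file size. A minimal sketch, with a hypothetical function name:

```python
def mail_file_progress(bytes_done, file_size):
    """Return the fraction of the mail file scanned so far, from 0.0 to 1.0."""
    if file_size <= 0:
        return 0.0  # empty or unknown size: show an empty bar
    return min(bytes_done / file_size, 1.0)
```

For the 50MB file in the example above, 25MB scanned gives 0.5, which is the bar at its halfway point.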

1.9. Message Scanning Status

This text box tells you what's going on in FindMail right now.

1.10. Finishing Beep

FindMail will make a 'ping' sound when the scanning, and index updates, are complete.


Chapter 2 - SearchMail

2.1. MyMail Location

Before searching, you need to make sure the index you're searching is the one you specified when you created your archive in Chapter 1.

It's easy. You click on 'Options', then click on 'Sync' and then click OK.

2.2. Search Basics

Here we'll see how to do a search query on the indexed email headers, and message text.

A 'search field' is a graphical item which lets us specify what we want to find.

You basically fill in the text boxes you want to search by and hit the 'Search' button.

Empty text boxes are not searched.

The results are displayed in a grid, or via your email client.

2.3. Extra Columns and Wildcards

Wildcards

In all boxes except 'Body', a ? represents any single character (?eremy matches Jeremy, Zeremy, 1eremy), while a * represents any sequence of characters (Jerem* matches Jeremy, Jeremi, Jeremsomething).
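If you want to try out a wildcard pattern before searching, Python's standard fnmatch module uses the same ? and * rules. This only illustrates the matching rules and is not how SearchMail is implemented. Note that fnmatch also treats [ and ] specially, which this manual does not mention, and whether SearchMail ignores upper and lower case is not stated here, so the sketch matches case exactly.

```python
from fnmatch import fnmatchcase

def matches(value, pattern):
    """Return True if value fits the pattern, where ? is one character and * is any run."""
    return fnmatchcase(value, pattern)

names = ["Jeremy", "Zeremy", "1eremy", "Jeremi", "Jeremsomething", "Bob"]
single = [n for n in names if matches(n, "?eremy")]
run = [n for n in names if matches(n, "Jerem*")]
```

Here single holds Jeremy, Zeremy and 1eremy, while run holds Jeremy, Jeremi and Jeremsomething.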

Extra Columns

If you want ANY entry, such as any email Attachment, put '*' in the 'Attachment' field. This will create an extra column in the results, 1 row for each unique Attachment filename. This works with From, To, Attachment Filename and Email Archive - no other fields require an extra column.

The thing you have to realise is that if an email has 2 destinations (say, bob@bob.com and foo@bar.com), 2 rows will be created in the grid view instead of 1. Each row has a different email destination (row 1 = bob@bob.com and row 2 = foo@bar.com). This is just a stopgap until I put little '+' symbols to reveal the extra data.
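The row-per-destination behaviour can be sketched as follows. The dictionary layout and the name expand_rows are invented for the example and say nothing about how the results grid is really built.

```python
def expand_rows(emails):
    """Produce one grid row per destination address of each email."""
    rows = []
    for email in emails:
        for recipient in email["to"]:
            rows.append({"subject": email["subject"], "to": recipient})
    return rows

rows = expand_rows([{"subject": "Hello", "to": ["bob@bob.com", "foo@bar.com"]}])
```

One email with 2 destinations becomes 2 rows with the same subject: the first with bob@bob.com and the second with foo@bar.com.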

Where From = *

Extra columns: Flag, Name

Where To = *

Extra columns: Flag2, Name2

Where Filename = *

Extra column: Path

Where Attachment = *

Extra column: Filename

2.4. The Search Fields and What They Mean

1. From

This field lets you search for emails 'From' someone.

The name goes in the text box, and the 'radio' buttons on the left specify whether this is an email address (e.g., jeremy@), or just a name (Jeremy).

2. To

This field lets you search for emails 'To' someone.

The name goes in the text box, and the 'radio' buttons on the left specify whether this is an email address (e.g., jeremy@), or just a name (Jeremy).

3. Email Subject

e.g.: '*Recipes*'

Finds emails with a certain Subject field in the email header.

The above search will find all subjects containing the text 'Recipes':

4. Earliest Date of Arrival

First off, you need to specify a date in the 'Date time picker'. Say, June 5th, 2007. When filled in, it will find emails from after 00:00 on the 6th.

5. Latest Date of Arrival

Specifying 5th May 2005 will find all emails up to 00:00 on the 5th May 2005.

6. Flags

Every email created in an email client has flags (or attributes, or properties) which say something about the email.

For instance, an email can be:

FindMail keeps track of these flags. In a search, you can specify if an email has one of these flags or not.

You select it thus:

Here are the meanings of the flags:

7. Email Body

There are a few kinds of body search:

  1. A series of vanilla words that must all be in the email body text (+amazon +book).
  2. A series of words, some with wildcards - ? for a single letter, * for any sequence of letters (+ama?on +bo*).
  3. Search for a quotation ("amazon book") - But you can't use wildcards as in search No.2.
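The three kinds of body search can be approximated in Python as shown below. This is a rough sketch and not the search engine SearchMail uses: it assumes matching ignores upper and lower case, it splits the body on whitespace only, and it treats a quotation as a plain substring. The name body_matches is hypothetical.

```python
from fnmatch import fnmatchcase

def body_matches(query, body):
    """Return True if the body text satisfies a +word, wildcard, or quoted query."""
    text = body.lower()
    query = query.strip().lower()
    # Kind 3: a quotation must appear exactly, and wildcards are not expanded.
    if len(query) >= 2 and query.startswith('"') and query.endswith('"'):
        return query[1:-1] in text
    # Kinds 1 and 2: every +term must match at least one word in the body.
    words = text.split()
    for term in query.split():
        pattern = term.lstrip("+")
        if not any(fnmatchcase(word, pattern) for word in words):
            return False
    return True
```

So body_matches("+amazon +book", ...) needs both words somewhere in the body in any order, while body_matches('"amazon book"', ...) needs them side by side.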

8. Email Attachment Filename

Find all emails with attachments matching this filename.

The search query "*.zip" finds all attachments where the filename ends '.zip'.

9. Mbox Filename

Find all emails which were originally found in files with filenames matching this.

The search query "Inbox*" finds all emails from a file called 'Inbox', 'Inbox2', etc.

10. Warning about Sizes of Search

The order of search execution is:

  1. Email Subject
  2. Earliest Date of Arrival
  3. Latest Date of Arrival
  4. Flags
  5. Email Body
  6. From
  7. To
  8. Email Attachment Filename
  9. Email Folder

If you want to search for, say, a Body query (e.g., "amazon book"), it's good to have a Subject or Date to make the query quicker.
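Why a Subject or Date helps can be seen in this small sketch, which counts how many bodies have to be examined. It is a simplified picture of "cheap test first, expensive test second", not the real query plan; the name search and the dictionary fields are assumptions.

```python
from fnmatch import fnmatchcase

def search(emails, subject_pattern, body_text):
    """Filter by Subject first, then Body; also return how many bodies were read."""
    body_checks = 0
    results = []
    for email in emails:
        # An empty subject box is not searched, so every email passes this test.
        if subject_pattern and not fnmatchcase(email["subject"], subject_pattern):
            continue
        body_checks += 1  # the expensive part: looking inside the body
        if body_text in email["body"]:
            results.append(email)
    return results, body_checks
```

Each email rejected by its Subject is one body that never has to be read.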

The only modification that might be needed in a later version is to put Email Body last.

11. Columns to Sort By

The Primary sort is the column (e.g., Subject, Composeddate are columns) that the rows from your search are sorted by.

When rows in the Primary sort have the same value (e.g. same Subject or Date), the column in Secondary is what the same entries are further sorted by.
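Primary and Secondary sorting works like sorting on a pair of values. A minimal Python sketch, using the column names Subject and Composeddate from the example above; the row layout and the name sort_rows are assumptions for the illustration.

```python
def sort_rows(rows, primary, secondary):
    """Sort rows by the primary column, breaking ties with the secondary column."""
    return sorted(rows, key=lambda row: (row[primary], row[secondary]))

rows = [
    {"Subject": "B", "Composeddate": "2002-01-05"},
    {"Subject": "A", "Composeddate": "2002-03-01"},
    {"Subject": "A", "Composeddate": "1998-07-09"},
]
ordered = sort_rows(rows, "Subject", "Composeddate")
```

The two rows with Subject "A" come first, and between them the 1998 email precedes the 2002 one.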

Here's what you can sort by:

A future version will be able to order by email addresses and email names.

2.5. What the Buttons Do

Options

Brings up the Options dialog:

Here is what each button on the Options page does:

'Search' Button

Hit this once all search data is entered. A processing dialog will pop up (with the option to cancel and go back to the search page), and after a few seconds the results page will appear:

If you are searching for a quoted string, it shows you how long the search will take:

Exit Button

Exit the search client.


Chapter 3 - SearchMail Results Page

3.1. Windowed Output

Results Grid

After you hit Search and wait a few seconds, the results page will appear.

The window is titled "FindMail Client: 9 search results found for query '+alan +partridge' - Press Escape to exit"

As we can see, it's found some eBay items from 2002, and 2 email newsletters from 1998.

The window is a grid, and the headers go along the top, describing each column, while each email found by the search, takes up a row.

Columns

A description of each column (* means the column has been added by From, To, Attachment, or Email searches):

Viewing a Result Row

If you double-click on a row, it will load that email into a web browser, whether the email is plaintext or HTML.

Now see where the phrase '+alan +partridge' appears:

Now see an HTML email:

Saving Results to a File

If you select a few emails (using shift-click or holding down shift and using the cursor keys), and hit the Space bar, those emails will be exported to a file "selected.mbx".
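If you want to produce or inspect a file like this yourself, Python's standard mailbox module can write and read mbox files. The sketch below assumes "selected.mbx" is in the common mbox format, which this manual does not confirm; export_selected and make_message are hypothetical helper names.

```python
import mailbox
from email.message import EmailMessage

def make_message(to, subject, body="Hello"):
    """Build a small plaintext email for the example."""
    msg = EmailMessage()
    msg["To"] = to
    msg["Subject"] = subject
    msg.set_content(body)
    return msg

def export_selected(messages, path):
    """Append the given messages to an mbox file at path, creating it if needed."""
    box = mailbox.mbox(path)
    try:
        for msg in messages:
            box.add(msg)
        box.flush()  # write pending changes to disk
    finally:
        box.close()
    return path
```

Opening the resulting file with mailbox.mbox(path) and looping over it gives the exported messages back in the order they were added.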

Exiting Results Page

When you close the results window, it will go back to the Search dialog, where you can edit your query and search again.


Appendix A - Behind the Scenes

a.1. MyMail folder

First, searching thousands of emails is made easier by creating an index, which is much smaller than the emails themselves and much faster to search than brute force.

An email goes into 2 indexes:

The email is also stored in lots of little files (singles), monthly collections of email (bymonth), and one large file (findmailcomplete.mbx) with all emails residing in it.

Now, because you might not want all your email index data in 'C:/Program Files/FindMail' - because you might want it on a USB flash stick, or your C: drive is full - we let you specify where it is to go.

a.2. MyMail & findmailcomplete.mbx - Exemption from Scanning

Any folder ending in 'MyMail' is ignored by the indexer. This means you can scan a drive and keep all previous MyMail indexes out of the new index.

Because findmailcomplete.mbx might be outside of a MyMail folder, this filename is also ignored by the indexer.
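The two exemption rules can be written down as a pair of checks. This is a sketch of the rules as stated, not the indexer's code: the function names are invented, and comparing the filename without regard to upper and lower case is an assumption.

```python
def skip_folder(folder_path):
    """Return True if the folder's name ends in 'MyMail'."""
    normalized = folder_path.replace("\\", "/").rstrip("/")
    return normalized.split("/")[-1].endswith("MyMail")

def skip_file(file_path):
    """Return True if the file is named findmailcomplete.mbx, wherever it lives."""
    name = file_path.replace("\\", "/").split("/")[-1]
    return name.lower() == "findmailcomplete.mbx"  # case-insensitive: an assumption
```

So C:/MyMail and D:/Backups/OldMyMail are both skipped as folders, and C:/Temp/findmailcomplete.mbx is skipped as a file even though it sits outside any MyMail folder.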

a.3. Changing MyMail location for a Different Index

The files 'findmail.ini' and 'findmailclient.ini' contain the key "MyMail=". You can edit these files by hand (e.g., MyMail=C:/MyMail), or:

This means you can have multiple MyMail folders, and change their location in SearchMail to search different email archives.
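If you switch between archives often, the hand edit can be scripted. The sketch below works on the text of an ini file and changes only the "MyMail=" line. The rest of the file layout is not documented here, so the other key in the usage example is made up, and set_mymail is a hypothetical name.

```python
def set_mymail(ini_text, new_location):
    """Return ini_text with the MyMail= line pointing at new_location."""
    lines = ini_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        if line.strip().lower().startswith("mymail="):
            lines[i] = "MyMail=" + new_location
            replaced = True
    if not replaced:
        lines.append("MyMail=" + new_location)  # key was missing: add it
    return "\n".join(lines) + "\n"

# "Other=1" is a made-up key standing in for whatever else the file contains.
updated = set_mymail("Other=1\nMyMail=C:/MyMail\n", "E:/MyMail")
```

Make the same change in both 'findmail.ini' and 'findmailclient.ini' so that FindMail and SearchMail point at the same archive.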

a.4. Description of Files in MyMail Index

a.5. Description of Files in the Search Client

All client files go in the Windows Temp folder, are undocumented, and are deleted after use.

a.6. Index Space Requirements

First, all email goes to findmailcomplete.mbx. There needs to be enough space for this and all the index files:

With a 2.33GB findmailcomplete file, 460MB (20% of findmailcomplete) is needed for the above index files, plus 500MB of temporary space for CLucene (temp/toindex).

You should have free space of at least 4 times the size of all your email, on the drive you put your MyMail index on. So email amounting to 1 Gigabyte needs 4 Gigabytes.
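The rule of thumb is easy to check before you start a scan. A minimal sketch, with hypothetical function names:

```python
def required_free_bytes(total_email_bytes, factor=4):
    """Free space recommended on the MyMail drive: 4 times the email size."""
    return total_email_bytes * factor

def has_enough_space(total_email_bytes, free_bytes):
    """Return True if the drive has at least the recommended free space."""
    return free_bytes >= required_free_bytes(total_email_bytes)

GB = 1024 ** 3
needed = required_free_bytes(1 * GB)
```

For 1 Gigabyte of email, needed comes to 4 Gigabytes. In Python, the free space on a drive can be read with shutil.disk_usage(path).free and passed in as free_bytes.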

a.7. Deleting an Index

You might want to get rid of an old index, for whatever reason, like saving disk space or discarding the old index before building a new index.

  1. First, figure out which MyMail index you want to delete.
  2. Next, shut down the FindMail Indexer or the FindMail client.
  3. Find the MyMail folder.
  4. Right-click on the MyMail folder icon and click 'Delete'.
  5. It should ask you if you're sure; say Yes. The index then goes into the Recycle Bin for later emptying.


Notes


Written by Jeremy Smith 8th August 2009 and (c) San Fran Systems