Managing Compliance in Microsoft Exchange Server 2010

  • 11/24/2010

Discovery searches

Discovery searches are performed through ECP by users who are members of the Discovery Management role group. ECP reveals the options to initiate mailbox searches under the Reporting node (Figure 15-23) only to those users. This feature is a good example of functionality that is available through ECP but doesn’t appear in EMC.
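If you are unsure who will see these options, you can list the members of the role group from EMS. This is a minimal sketch that assumes the role group still has its default name:

```powershell
# List the current members of the Discovery Management role group.
Get-RoleGroupMember -Identity "Discovery Management" |
    Format-Table Name, RecipientType -AutoSize
```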

Figure 15-23 Viewing the ECP options for mailbox searches.

It is a quirk of Microsoft licensing policy that you need an enterprise CAL to be able to use the Discovery options included in ECP, but a standard CAL suffices if you conduct searches using the Search-Mailbox cmdlet as described later on in this section. You can therefore save a few dollars by executing all searches through EMS, which seems to be a strange situation!

Exchange is able to search across items stored in primary mailboxes, archive mailboxes, and the dumpster. It cannot search deleted mailboxes, even when their content is still in the database because the deleted mailbox retention period has not yet expired.

The content index catalogs that are maintained for mailbox databases are critical to Exchange’s ability to perform searches: If the catalogs are unhealthy or not fully populated, then search results will be unpredictable or incomplete. Exchange uses the same content indexes for searches by clients, including Outlook Web App and Outlook. However, Outlook only uses the Exchange content indexes when it is configured to work in online mode. Most Outlook clients are now configured in cached Exchange mode, in which case they use local search indexes created with Windows Desktop Search to be able to conduct searches even when they are not connected to a server.
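Because search results depend on healthy catalogs, it can be worth checking the state of the content indexes before you start a discovery search. The following sketch uses the Get-MailboxDatabaseCopyStatus cmdlet; the server name ExServer1 is a placeholder, not a name taken from this chapter.

```powershell
# Show the content index state for each database copy on a mailbox server.
# "ExServer1" is a placeholder server name.
Get-MailboxDatabaseCopyStatus -Server "ExServer1" |
    Format-Table Name, Status, ContentIndexState -AutoSize
```

A ContentIndexState other than Healthy suggests that searches against that database might return incomplete results.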

Microsoft introduced the current system of content indexing in Exchange 2007 and improved the performance and throughput of the content indexing component in Exchange 2010 in the following areas:

  • Content indexing uses fewer system resources such as CPU, memory, I/O, and disk space.

  • Items are typically indexed within 10 seconds of their creation on a server. Query results are much faster.

  • You don’t need to configure Exchange Search. It is automatically installed and configured on all mailbox servers.

  • Attachments are indexed (see the “Unsearchable items” discussion next).

  • Indexing throttles back automatically in periods when mailbox servers experience heavy load. Again, administrators don’t have to take any action for this to happen.

Administrators tend to forget about content indexing because it hums away in the background and doesn’t make their lives difficult.

Unsearchable items

As shown in the left screen in Figure 15-24, all item types are discoverable, including voice messages, drafts, attached documents of various formats, and IM conversations (if stored in mailboxes). Before Exchange can include a document, usually an attachment to a message, in its content indexes, it must be able to extract the content. Exchange includes a set of content filters for this purpose. Unlike the RTM version of Exchange 2010, Exchange 2010 SP1 registers the IFilters for Office 2010 with Exchange Search. However, if you want to use other IFilters, such as the one for Adobe PDF, you have to install them separately.

You can see the list of default filters installed on mailbox servers by looking in the system registry at HKLM\SOFTWARE\Microsoft\ExchangeServer\v14\MSSearch\Filters. The list includes formats that you would expect, such as Microsoft Office, text attachments, HTML files, and so on.
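You can also read the same registry key from Windows PowerShell on the mailbox server rather than opening the registry editor. This sketch assumes that each subkey under the Filters key is named for a file extension, which is worth confirming on your own server.

```powershell
# List the subkeys under the Exchange Search filter registry key.
# Assumption: each subkey name corresponds to a registered file extension.
Get-ChildItem -Path 'HKLM:\SOFTWARE\Microsoft\ExchangeServer\v14\MSSearch\Filters' |
    Select-Object -ExpandProperty PSChildName |
    Sort-Object
```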

Figure 15-24 Determining the options for a multimailbox search.

If Exchange meets content that it doesn’t understand, it marks the item as unsearchable. For example, if you use an application that generates files of a type that is understood only when the application is installed on a client workstation, the content indexing agent running on a mailbox server won’t be able to open and index the files. Other items that Exchange deems unsearchable include items encrypted with Secure Multipurpose Internet Mail Extensions (S/MIME). However, messages protected with Active Directory Rights Management Services remain searchable for discovery purposes.

You can see a list of unsearchable items with the Get-FailedContentIndexDocuments cmdlet. When you run the cmdlet, you can pass it the name of a server to see all items on a server, or just a mailbox database to see the unsearchable items in the content index for that database. For example, here’s how to run the cmdlet followed by an extract of the information returned for an unsearchable item (pipe the results to the Format-List cmdlet to see this information). As you can see, the item shown couldn’t be indexed because a filter wasn’t found for the attachment type.

Get-FailedContentIndexDocuments -MailboxDatabase DB3 | Format-List
RunspaceId       : 5de022fc-bd60-4b22-8f5e-e983550a4f8a
DocID            : 21847
Database         : DB3
MailboxGuid      : ab83c57b-d51c-4527-8f99-5609e0ee96c8
Mailbox          : Ruth, Andy
SmtpAddress      : Andy.Ruth@contoso.com
EntryID          : 000000002BA4E1B5193C7441BCD9110F91902C5A0700A0EAF17663EEB9429D934943C3A2409300000000002000005036EA2334225B46B46EA1623061B2A40000017803030000
Subject          : Designing Secure Multi-Tenancy into Virtualized data center (Secure Cloud Architecture)
ErrorCode        : 2147749142
Description      : Filter not found
IsPartialIndexed : True
Identity         :
IsValid          : True

Should you be worried if many unsearchable items exist for your database? The answer is, “It depends.” First, it depends on the percentage of unsearchable items. If 0.0002 percent of items are unsearchable, then it’s probably acceptable because any search has a very high chance of discovering information that’s required. Second, it depends on the items that are failing to be indexed. If they are all of the same type and a filter is available, you can install that filter to solve the problem. However, if the items are of a type for which a filter is not available or known to be unsearchable (such as S/MIME encrypted items), then you might have to live with the situation.
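Both questions can be answered with a few lines of EMS. The sketch below groups the failed items by the reason reported and then estimates the percentage of unsearchable items in the database. The total comes from summing the ItemCount values that Get-MailboxStatistics reports, so treat the figure as an approximation rather than an exact count.

```powershell
$db = "DB3"   # database name used in the example output shown earlier

# Collect the items that content indexing could not process.
$failed = @(Get-FailedContentIndexDocuments -MailboxDatabase $db)

# Break the failures down by reason (for example, "Filter not found").
$failed | Group-Object -Property Description |
    Sort-Object -Property Count -Descending |
    Format-Table Count, Name -AutoSize

# Approximate the total number of items from mailbox statistics.
$total = (Get-MailboxStatistics -Database $db |
    Measure-Object -Property ItemCount -Sum).Sum

if ($total -gt 0) {
    "{0:N4} percent of items in {1} are unsearchable" -f (100 * $failed.Count / $total), $db
}
```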

Normally, a relatively small number of items turn out to be unsearchable. In addition, you should remember that item properties (sender, recipients, subject, and so on) and message bodies are always indexed and searchable, so the fact that a small percentage of attachments can’t be searched is probably not going to be of great concern in a legal search. After all, if people are doing something that they shouldn’t, it’s likely that they will leave some trace of their activity in a searchable property that will be discovered. After this happens, the next step is often for investigators to take a complete copy of the suspect’s mailbox to conduct a detailed search to discover what it contains, and any lurking unsearchable items can be reviewed at that time.

Creating and executing a multimailbox search

A mailbox search can cover every mailbox in the organization (rightmost screen in Figure 15-24) or a select set formed of individual mailboxes or the members of specific distribution groups (but not dynamic distribution groups). You can include other search criteria such as date ranges and specific words or phrases in message bodies and subjects. You can update the search criteria multiple times; each time you do, Exchange will restart the search and discard any items found using the previous criteria. However, if you want to change the criteria for a search while it is being processed, you have to stop the search before you can make any changes.

After you input all the search criteria and click Save, Exchange stores the criteria as search metadata in the default discovery mailbox. At this point, the original version of Exchange 2010 proceeds to execute a full search and will copy any results that it finds into the selected discovery mailbox. Exchange 2010 SP1 offers you the option of running either an estimate or the search. A search estimate is a scan of the content indexes to determine how many hits are likely if you initiate a search with the criteria as provided. You can see these options revealed under the Search Name And Storage Location section of the rightmost screen in Figure 15-24.

An estimate does not actually retrieve the located items and copy them into the discovery mailbox, so it runs much faster than a “copy” search. After Exchange completes its scan to determine the estimate, the number of hits and the mailboxes that contain items are shown in the results pane (Figure 15-25). Because estimates run faster, you can afford to run a number of estimates to refine the search criteria to meet your exact needs. You don’t have to run an estimate before you conduct a full search. If you want to search and copy items without an estimate, just select the Copy Results To The Selected Mailbox option and save the search. However, it makes sense to run an estimate to see just how much data might be found and test the efficiency of the search criteria.

Figure 15-25 Viewing a search estimate.

As you refine a search, you’ll probably experiment with the query that lies at the heart of the search. The query is passed in Advanced Query Syntax (AQS) format. Table 15-6 lists the most important query terms that you are likely to use in discovery searches.

Table 15-6. AQS terms that can be used in search queries

| Property | Example | Search results |
|---|---|---|
| Attachments | Attachment:BadReport.ppt | Returns any items that have an attachment called BadReport.ppt. The second example shows how to use wildcards to conduct a less specific search. |
| | Attachment:Bad*.pp* | |
| CC | CC: Joe Healy | Returns any message with Joe Healy listed as a CC recipient in the message header. The second and third examples show how to specify a search for an alias or an SMTP address. |
| | CC: JoeH | |
| | CC: JoeHealy@contoso.com | |
| From | From: Tony Redmond | Returns any message sent by Tony Redmond using different forms of his address. |
| | From: Tony.Redmond@contoso.com | |
| Keywords | RetentionPolicy:Critical | Returns any item that has the Critical retention tag applied to it. |
| Expiration | Expires: 10/10/2010 | Returns any item that expires on October 10, 2010. |
| Search message recipients | Person: Tony Redmond | Returns any item that has the recipient included as a To:, CC:, or BCC: in the message header. You can pass the display name, alias, or SMTP address. |
| | Person: TR@contoso.com | |
| Sent | Sent: yesterday | Returns messages sent yesterday. |
| Subject | Subject: "Trading Tip" | Returns all items that include the words “Trading Tip” in the subject. |
| To | To: Tony Redmond | Returns any message that has Tony Redmond listed as a To: recipient. You can use the display name, alias, or SMTP address. |
| | To: TR@contoso.com | |
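The same AQS terms work when you create a search from EMS. The sketch below combines two terms from Table 15-6 with AND and runs the search as an estimate only. The search name, distribution group, and discovery mailbox address are illustrative values, not settings that Exchange requires.

```powershell
# Combine two AQS terms from Table 15-6 with AND in one query string.
$query = 'Subject:"Trading Tip" AND From:Tony.Redmond@contoso.com'

# Run the query as an estimate so that no items are copied yet.
New-MailboxSearch -Name "Trading tip estimate" `
    -SourceMailboxes 'Company Officers' `
    -TargetMailbox 'DiscoveryMailbox@contoso.com' `
    -SearchQuery $query `
    -EstimateOnly
```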

You can run as many search estimates as you want. To change the search criteria, click the Details icon and amend details such as the mailboxes to search or the phrases for which you are looking. It’s entirely possible that your first attempt to create a search could result in an estimate of thousands of hits across hundreds of mailboxes when you expect to find just a few items. This might force you to narrow (or sometimes widen) the search criteria. For example, you might exclude some mailboxes from the search or include some new terms that you think will help to locate just the right information. On the other hand, you might be happy that you’ve found so much data. From Exchange 2010 SP1 onward, ECP and EMS both default to deduplicated searches.

Once you’re happy that the search will find the data that you’re interested in, you can click the Details icon to change the type of search and instruct Exchange to copy the matching items. You also need to decide whether Exchange should save a copy of each matching item in the discovery mailbox or if it should reduce the number of items that it copies by only capturing the first copy found. After you save the updated search parameters, Exchange will then conduct the full search and copy any items that it locates into the discovery mailbox.
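If you prefer EMS, the equivalent change could be sketched as follows. The search name is a hypothetical one, and the sketch assumes that Set-MailboxSearch in SP1 accepts Boolean values for -EstimateOnly and -ExcludeDuplicateMessages. Exchange might restart the search as soon as you change it, in which case the Start-MailboxSearch call is unnecessary.

```powershell
# Switch an existing estimate to a full copy search and keep only one
# instance of each message. "Trading tip estimate" is a hypothetical name.
Set-MailboxSearch -Identity "Trading tip estimate" `
    -EstimateOnly $false `
    -ExcludeDuplicateMessages $true

# Start (or restart) the search so that it runs with the new settings.
Start-MailboxSearch -Identity "Trading tip estimate"
```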

Mailbox searches are performed in the background. You can wait for the search to be complete or have Exchange notify you with an email (Figure 15-26). The next step is then to access the discovery mailbox where Exchange has copied the items found by the search.

Figure 15-26 Notification message after a successful search.

Accessing search results

By default, there is a single discovery mailbox in an Exchange organization. As described in Chapter 6, you can create additional discovery mailboxes to hold search results, and you have to select a discovery mailbox to use when you create a new search. The results of the search, including copies of all items that match the search criteria, will be placed in the selected discovery mailbox. If you have the necessary permission to access the discovery mailbox, you can enter its name in the Switch Mailbox control (Figure 15-27). The name of the default discovery mailbox is long, but you can enter the first few characters and then press Ctrl+K to have Outlook Web App validate the mailbox name. Thereafter, Outlook Web App will remember the mailbox and you can select it easily from a drop-down list of mailboxes if you need to access the discovery mailbox again. Of course, you can also use Outlook to open the discovery mailbox by configuring a suitable profile.

Figure 15-27 Switching to the discovery mailbox.

Within the discovery mailbox, Exchange inserts the items located by a search into a set of folders named after the search. For example, if you call the search “Illegal stock trading investigation,” Exchange will create a root folder of this name in the discovery mailbox and then create a child folder underneath for each mailbox where a matching item was found. The date and time of the search (the date and time of the server rather than the client workstation that starts the search) is appended to the mailbox name to clearly identify different searches that have occurred and to provide a solid timeline for when evidence is gathered for an investigation. If you open the folder for a mailbox (Figure 15-28), you see all of the folders from which items have been copied in both the primary mailbox and the personal archive (if the mailbox has one). You can then click on the items to review their content and decide whether they are of real interest to your investigation. Incriminating evidence can be retained and any useless thoughts of idle minds discarded.

Figure 15-28 Viewing search results in a discovery mailbox.

If you use Active Directory Rights Management Services (see the section “Protecting content” later in this chapter), searches might uncover items that are protected because a user has applied an Information Rights Management (IRM) template to them. When an item is protected, its content can only be read by the sender, the intended recipients, and members of the Active Directory Rights Management Services (AD RMS) Super Users group; the team that is reviewing the contents copied into the discovery mailbox won’t be able to see anything but the message header data (Figure 15-29). This information might be enough to eliminate an item from the list of those that an investigator wants to see, but more often it’s an indication that makes an item even more interesting to an investigator.

Figure 15-29 Viewing a protected message uncovered by a mailbox search.

Typically, the AD RMS Super Users group only contains the federated system mailbox, as its membership allows Exchange to decrypt protected messages as they pass through the transport system and apply transport and journal rules as required. In the RTM version of Exchange 2010, to allow investigators to view protected content, we therefore have to make the discovery mailbox a member of the AD RMS Super Users group for as long as the investigators need to review items uncovered by the search. The discovery mailbox uses a disabled account, and this also has to be enabled. These actions will allow the AD RMS server to provide the necessary credentials to the discovery mailbox to reveal the hidden content to the investigators. It seems strange to insist that the discovery mailbox account must be enabled to allow access to protected content, but AD RMS can only provide credentials to enabled accounts. The act of enabling the discovery mailbox should be approved and audited by some authority within the company because enabling the account creates a higher risk that someone could have unauthorized access to its contents.

Enabling accounts that should remain disabled is clearly an unacceptable workaround to a problem that should be fixed in software. Microsoft addressed the issue in Exchange 2010 SP1 by introducing a new parameter for the IRM configuration cmdlet to instruct Rights Management to allow access to protected content for legal investigators. To make everything work, you have to run the Set-IRMConfiguration cmdlet as follows:

Set-IRMConfiguration -EdiscoverySuperUserEnabled $True
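To confirm that the change has taken effect, you can read the setting back. This is a one-line check rather than a full audit of the IRM configuration.

```powershell
# Display the current value of the eDiscovery super user setting.
Get-IRMConfiguration | Format-List EDiscoverySuperUserEnabled
```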

Deduplication of search results

An item that you are searching for might exist in multiple mailboxes. You don’t necessarily want to copy every single occurrence of the message from every mailbox in which Exchange finds it. Apart from the system overhead that is incurred to copy and store every instance of a message found in the searched mailboxes, providing extra copies of messages will drive up the cost of responding to legal discovery actions if the lawyers or other individuals who review the search results are paid on a per-item basis. Deduplication is therefore a very useful feature, with the only drawback being that storing the first discovered copy of an item sent to a distribution group does not prove that an individual received the item. You’d need to find the item in their mailbox to prove this.

You instruct Exchange to deduplicate search results by selecting the Copy Only One Instance Of The Message option under the Search Name And Storage Location section. When Exchange copies items for a deduplicated search, it places a single copy of each unique item in a single folder, named “Results” followed by the date and time of the search, under a root folder named after the search (Figure 15-31). The message identifier, a unique value established when an item is first created, is used as the basis of deduplication.

Figure 15-31 The folder structure created by a deduplicated search.

The users who perform discovery searches are not necessarily those who can access the results of the searches that are placed in discovery mailboxes. As discussed in Chapter 6, you need to assign full access permission to the discovery mailbox to a user before that user can open it to access the search results. By default, members of the Discovery Management role group should be able to access the default discovery mailbox, but you have to explicitly grant full access to any other discovery mailboxes that you create for use in mailbox searches.

A clear separation therefore exists between the following:

  • Membership of the Discovery Management role group, which is required to be able to create and execute mailbox searches.

  • Full access to the discovery mailbox used for a mailbox search, which is required to be able to open the discovery mailbox and review the items copied there by the mailbox search.

The separation in the two requirements allows for a division of responsibilities between those who are responsible for responding to requests for information (often the IT department) and those who will review the retrieved information forensically to look for evidence or other information that is important to an investigation (often the legal department). You might therefore create discovery mailboxes to hold information retrieved for different types of searches so that you can restrict access to those mailboxes to ensure that confidential material is always treated in a correct and legally defendable manner. Some discovery mailboxes might be used for straightforward legal discovery actions and be under the control of the legal department, whereas others might be used for the pursuit of internal complaints against an employee for something like sexual harassment and be restricted to selected members of the HR department.
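A sketch of how this division might be set up in EMS follows. All of the mailbox, user, and group names are hypothetical; substitute the names used in your organization.

```powershell
# Create a dedicated discovery mailbox for HR investigations (hypothetical names).
New-Mailbox -Name "HR Discovery Mailbox" -Discovery `
    -UserPrincipalName "HRDiscovery@contoso.com"

# Allow an IT group to create and run mailbox searches.
Add-RoleGroupMember -Identity "Discovery Management" -Member "IT Search Operators"

# Allow only the HR investigators to open the mailbox and review the results.
Add-MailboxPermission -Identity "HR Discovery Mailbox" `
    -User "HR Investigators" -AccessRights FullAccess
```

Keeping the two assignments in different groups means that the people who run a search cannot read what it finds unless you also grant them access to the mailbox.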

Search logging

Exchange generates a log for every mailbox search unless you suppress it by setting the -LogLevel parameter for the search to Suppress. By default, the search log captures basic information about the parameters used for the search as well as the results of the search. You can also increase the logging level to Full, in which case Exchange records information about each item found by the search in an attachment. The search report and any attachment are stored in the top-level folder created for the search in the discovery mailbox. Figure 15-32 shows a typical search report. It has an attachment, which indicates that the logging level was set to Full (you can also see this in the search parameters).

You can conduct a search and copy results multiple times. However, if you do this without creating a new search, Exchange removes the previous search results from the discovery mailbox before it copies items as a result of the new search. Therefore, you have to create and execute a new search if you want to keep the results of a previous search.

Figure 15-32 Viewing a search log report.

Search annotation

The ability to annotate search results is a new feature in Exchange 2010 SP1. Basically, the idea is that the people who look through search results should be able to mark the items that are of interest. Exchange accomplishes this through some special user interface that Outlook Web App exposes whenever a user logs into a discovery mailbox.

Figure 15-33 shows how annotation works. The Open Message Annotation option is exposed in the shortcut menu. This opens a simple text box to allow users to input whatever text they deem fit to mark the item. For example, they might mark items with a case reference or other indicator. Later on, they can search the mailbox for the marked items to see the collection of items of interest. There’s no feature provided to export annotated items from the discovery mailbox if you need to provide copies for use elsewhere, but it’s easy to copy the items to a folder and then use the New-MailboxExportRequest cmdlet to export the folder to a PST. Alternatively, you can open the discovery mailbox with Outlook and drag and drop the copied items into a PST.
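The export step might look like the following sketch. The folder name, file server, and share are hypothetical. The account that runs the command needs the Mailbox Import Export role, and the target must be a UNC path that Exchange can write to.

```powershell
# Export one folder (and its subfolders) from the discovery mailbox to a PST.
# "Case 1234" and the UNC path are illustrative values.
New-MailboxExportRequest -Mailbox "DiscoveryMailbox@contoso.com" `
    -IncludeFolders "Case 1234/*" `
    -FilePath "\\FileServer1\PSTExports\Case1234.pst"

# Check the progress of the export request.
Get-MailboxExportRequest | Get-MailboxExportRequestStatistics
```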

The annotation is only visible through Outlook Web App and can’t be accessed with other clients.

Figure 15-33 Annotation of search results.

Executing searches with EMS

ECP is a very convenient interface to create and initiate searches, but you can also do the same through EMS using a set of cmdlets that are only exposed if you are a member of the Discovery Management role group. These cmdlets are as follows:

  • New-MailboxSearch: Creates and initiates a new mailbox search.

  • Get-MailboxSearch: Retrieves details of a mailbox search.

  • Set-MailboxSearch: Changes the search criteria for a search that has already been created.

  • Start-MailboxSearch: Restarts a mailbox search.

  • Remove-MailboxSearch: Removes a mailbox search. This action also removes all of the items found by a search from the discovery mailbox.

For example, a new search to look for information about potential illegal stock trading by company officers could be initiated with this command:

# A backtick at the end of a line continues the command on the next line.
New-MailboxSearch -Name "Stock Trading Discovery 2" `
  -SourceMailboxes 'Company Officers' `
  -TargetMailbox 'DiscoveryMailbox@contoso.com' `
  -StartDate '10/01/2010' -EndDate '11/30/2010' `
  -SearchQuery "XXE Stock tip" `
  -StatusMailRecipients 'LegalSearch@contoso.com' `
  -SearchDumpster -DoNotIncludeArchive -EstimateOnly `
  -IncludeUnsearchableItems -ExcludeDuplicateMessages:$False -LogLevel Full

Table 15-7 lists the most important parameters that you are likely to use with the New-MailboxSearch cmdlet and their meaning.

Table 15-7. Important parameters for the New-MailboxSearch cmdlet

| Parameter | Meaning |
|---|---|
| Name | A unique identifier for the search that should be something meaningful, such as “Illegal stock trading review.” |
| SourceMailboxes | Specifies the mailboxes that Exchange will search. If you have more than a few mailboxes to search, it is more convenient (and probably more accurate) to create a distribution group to identify the mailboxes to include in the search. If you don’t specify the -SourceMailboxes parameter, Exchange searches all mailboxes. |
| TargetMailbox | Specifies the SMTP email address of the discovery mailbox where you want to store the search results. The default discovery mailbox has a rather long and complicated email address so I usually assign a new and shorter secondary email address to the mailbox to make it easier to type. In fact, this mailbox doesn’t have to be a discovery mailbox, as Exchange is happy to place search results in any mailbox that you select. |
| SearchQuery | An AQS-format query that Exchange will execute to locate items in the source mailboxes. In the example shown, Exchange will match any of the words in the search query. This search query is a very simple one and some trial and error is probably required to arrive at the best query. If you omit the search query, Exchange will find every item in every mailbox that you include in the search and store copies of all those items in the discovery mailbox. This kind of search can swamp a server with work. |
| StatusMailRecipients | Tells Exchange the recipients who should be notified by email after the search is complete. No message is sent if you don’t provide a value for this parameter. You can provide one or more recipient SMTP addresses to receive notifications, separating each address with a comma. It’s often more convenient to use a distribution group for this purpose. |
| SearchDumpster | Forces Exchange to include the contents of the dumpster in the search. All searches executed through ECP include this parameter. As shown in Figure 15-30, any items from the dumpster that are found by a search are placed in the Recoverable Items folder in the discovery mailbox. |
| DoNotIncludeArchive | Instructs Exchange to ignore items stored in any personal archives that are assigned to mailboxes. |
| EstimateOnly | Tells Exchange that it is to run a search estimate only rather than to copy items that match the search criteria to the discovery mailbox. |
| ExcludeDuplicateMessages | Tells Exchange how to deal with duplicate items that it encounters in mailboxes. Set the parameter to $True to force Exchange to deduplicate (only copy a single instance of an item) or $False to copy every copy of an item that it finds. |
| LogLevel | Dictates the level of logging that Exchange performs for the search. Valid options are Suppress, Basic (default), and Full. If Basic or Full is chosen, Exchange creates a search report in the root folder for the search in the discovery mailbox. |

The Get-MailboxSearch cmdlet tells us what happened to a search. When you run it without naming a search, it returns all known searches. For example:

Get-MailboxSearch | Format-Table Name, Status, PercentComplete, ResultSize, ResultNumber -AutoSize

Name                                 Status      PercentComplete  ResultSize                      ResultNumber
----                                 ------      ---------------  ----------                      ------------
Review Dumpster content              InProgress  39               112.8 MB (118,262,783 bytes)    395
Deduplicated search                  Failed      9                3.061 MB (3,209,944 bytes)      20
XXE Investigation March 2010         InProgress  87               1.132 GB (1,214,974,519 bytes)  730
CEO Discovery                        Succeeded   100              136.1 MB (142,687,323 bytes)    161
XXE Investigation Feb 2010           Succeeded   100              134.8 MB (141,344,252 bytes)    156
Stock Trading Discovery 3            Succeeded   100              20.42 MB (21,413,083 bytes)     536
Stock Trading Discovery 2            Succeeded   100              5.269 KB (5,395 bytes)          2
Illegal stock trading investigation  Succeeded   100              9.008 KB (9,224 bytes)          2

The information we are interested in here is the status (this will be EstimateSucceeded, Succeeded, InProgress, or Failed) and the number of items found by the search. The size of the items is interesting if we expect to find a large attachment. As you can see from the search called “XXE Investigation March 2010,” a search can generate a lot of information. In this case, the search located a number of very large objects (730 objects for 1.132 GB at 87 percent complete), so it will be interesting to check the contents of the discovery mailbox to find out just what these objects are.
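When the list of searches grows long, it can help to filter the output so that only the searches needing attention are shown. The following sketch lists failed searches; it assumes that the Status property compares cleanly against the text value shown in the table.

```powershell
# Show only the searches that ended in failure.
Get-MailboxSearch |
    Where-Object { $_.Status -eq 'Failed' } |
    Format-Table Name, Status, PercentComplete -AutoSize
```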