It’s really about defining what is sensitive in your organization, and sensitive information types (SITs) let us identify that data. Microsoft provides many built-in SITs, you can build your own in several ways, and you can also use trainable classifiers, which rely on machine learning. We can then use all of these in sensitivity labels, DLP policies, retention labels, and more.
It is worth reading about sensitive information types to understand in depth how they are used to classify sensitive content in your organization.
First, run the following commands in PowerShell to install and import the Exchange Online PowerShell module:
Install-Module ExchangeOnlineManagement
Import-Module ExchangeOnlineManagement
This example connects to Security & Compliance PowerShell in a Microsoft 365 organization (you will be prompted to sign in):
Connect-IPPSSession
In Microsoft Purview, under Data Classification, you will find Sensitive Information Types, where you can browse the Microsoft-created SITs or create a custom one. I have created a very simple SIT that finds combinations of the “Project Hodor” keywords. This project is top secret, so we could use the SIT to detect when people mention it in a document, a chat, or a channel, and then add protection. With Data Loss Prevention (DLP), we can go further, for example by restricting sharing.
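To build intuition for what a keyword-based SIT does, here is a simplified model in Python. It is only a sketch under assumptions, not how the Purview classification engine works: the term list HODOR_TERMS, the whole-word regex matching, and the min_count threshold are hypothetical stand-ins for the keyword list, match style, and confidence settings you define in a real SIT.

```python
import re
from dataclasses import dataclass

# Hypothetical keyword list; a real SIT keeps its terms in the rule package XML.
HODOR_TERMS = ["Project Hodor", "Hodor"]


@dataclass
class Match:
    term: str
    start: int


def find_keyword_matches(text: str, terms: list[str]) -> list[Match]:
    """Return case-insensitive whole-word matches of each term, ordered by position."""
    matches = []
    for term in terms:
        # \b approximates a whole-word match; re.escape guards special characters.
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        matches.extend(Match(term, m.start()) for m in pattern.finditer(text))
    return sorted(matches, key=lambda match: match.start)


def is_sensitive(text: str, terms: list[str] = HODOR_TERMS, min_count: int = 1) -> bool:
    """Flag text with at least min_count keyword matches (hypothetical threshold)."""
    return len(find_keyword_matches(text, terms)) >= min_count
```

For example, find_keyword_matches("Kickoff for project hodor is Monday", HODOR_TERMS) returns two matches, because the shorter term “Hodor” also matches inside “Project Hodor”.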
We can use Get-DlpSensitiveInformationType to list all SITs or retrieve a specific one:
Get-DlpSensitiveInformationType -Identity "Project Hodor Information" | Format-List
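All custom SITs you create in the Purview portal are added to the rule package named Microsoft.SCCManaged.CustomRulePack. To export it, look up the RulePackId of our SIT and write that package’s XML to a file: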
$rulePackId = (Get-DlpSensitiveInformationType -Identity "Project Hodor Information").RulePackId
(Get-DlpSensitiveInformationTypeRulePackage -Identity $rulePackId).ClassificationRuleCollectionXml | Out-File -Encoding bigendianunicode -FilePath "C:\users\simon\desktop\new.xml"
I opened the file in Visual Studio Code and formatted it for clarity.
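If you want to inspect the exported package outside an editor, the sketch below decodes a file written with Out-File -Encoding bigendianunicode (UTF-16 big-endian), pretty-prints it, and lists the rule pack id, entity ids, and localized SIT names. It assumes the package uses the default namespace http://schemas.microsoft.com/office/2011/mce and a RulePack / Rules / Entity / LocalizedStrings / Resource layout; check your own export before relying on it.

```python
import codecs
import re
import xml.etree.ElementTree as ET

# Default namespace of SIT rule package XML (assumption: verify in your export).
MCE_NS = "http://schemas.microsoft.com/office/2011/mce"
NS = {"m": MCE_NS}


def decode_rule_package(data: bytes) -> str:
    """Decode bytes written by Out-File -Encoding bigendianunicode (UTF-16 BE)."""
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    return data.decode("utf-16-be")  # no BOM: assume big-endian


def load_rule_package(data: bytes) -> ET.Element:
    """Parse the decoded XML, dropping the declaration that names UTF-16."""
    text = re.sub(r"^\s*<\?xml[^>]*\?>", "", decode_rule_package(data)).strip()
    return ET.fromstring(text)


def summarize(root: ET.Element) -> dict:
    """Collect the rule pack id, entity ids, and localized SIT names."""
    rule_pack = root.find("m:RulePack", NS)
    return {
        "rule_pack_id": None if rule_pack is None else rule_pack.get("id"),
        "entity_ids": [e.get("id") for e in root.iterfind("m:Rules/m:Entity", NS)],
        "names": [r.findtext("m:Name", namespaces=NS)
                  for r in root.iterfind("m:Rules/m:LocalizedStrings/m:Resource", NS)],
    }


def pretty_print(root: ET.Element) -> str:
    """Indent the tree; registers MCE_NS as the default namespace (global side effect)."""
    ET.register_namespace("", MCE_NS)  # emit xmlns="..." instead of ns0: prefixes
    ET.indent(root, space="  ")
    return ET.tostring(root, encoding="unicode")
```

Running summarize before and after your edits is a quick way to confirm the ids and names changed the way you intended.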
We will cheat and create a new SIT from our existing XML. To learn what you can customize, read Microsoft’s article “Customize a built-in sensitive information type”.
Because rule packages and rules are identified by unique GUIDs, you must generate two new GUIDs (for example with New-Guid in PowerShell). Use the first to replace the RulePack id value. Use the second to replace both the Entity id and the Resource idRef value; these two must stay identical.
Then change the localized names so the custom SIT gets a new name.
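If you would rather script these edits than make them by hand, here is a sketch using Python’s ElementTree, where uuid.uuid4() plays the role of New-Guid. It assumes the same default namespace as the export, exactly one Entity per package, and names stored in RulePack/Details/LocalizedDetails/Name and LocalizedStrings/Resource/Name; the package naming scheme is hypothetical.

```python
import uuid
import xml.etree.ElementTree as ET

# Default namespace of SIT rule package XML (assumption: verify in your export).
MCE_NS = "http://schemas.microsoft.com/office/2011/mce"
NS = {"m": MCE_NS}


def clone_rule_package(root: ET.Element, new_name: str) -> tuple[str, str]:
    """Give the package and its single Entity fresh GUIDs and a new name.

    First GUID -> RulePack id; second GUID -> Entity id and the matching
    Resource idRef. Assumes exactly one Entity, as in this example.
    """
    pack_id = str(uuid.uuid4())    # equivalent of the first New-Guid
    entity_id = str(uuid.uuid4())  # equivalent of the second New-Guid

    root.find("m:RulePack", NS).set("id", pack_id)

    entity = root.find("m:Rules/m:Entity", NS)
    old_entity_id = entity.get("id")
    entity.set("id", entity_id)

    # Re-point the Resource that localizes this entity and rename the SIT.
    for resource in root.iterfind("m:Rules/m:LocalizedStrings/m:Resource", NS):
        if resource.get("idRef") == old_entity_id:
            resource.set("idRef", entity_id)
            for name in resource.iterfind("m:Name", NS):
                name.text = new_name

    # Rename the rule package too, so it is distinguishable from the original.
    for name in root.iterfind("m:RulePack/m:Details/m:LocalizedDetails/m:Name", NS):
        name.text = f"{new_name} Rule Package"  # hypothetical naming scheme

    return pack_id, entity_id
```

Looking up the Resource by the old Entity id, rather than taking the first Resource, keeps the Entity id and Resource idRef in sync.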
Save the file and upload it as a new rule package:
New-DlpSensitiveInformationTypeRulePackage -FileData ([System.IO.File]::ReadAllBytes("C:\users\simon\desktop\new.xml"))
Now we can find the new SIT in the UI.
As an example, let’s add a keyword to the keyword list in the XML and save the file.
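The same edit can be scripted. This Python sketch appends a Term to a keyword list, assuming the list is a Keyword element under Rules containing a Group of Term children, and using the hypothetical id Keyword_project_hodor; use the id from your own XML.

```python
import xml.etree.ElementTree as ET

# Default namespace of SIT rule package XML (assumption: verify in your export).
MCE_NS = "http://schemas.microsoft.com/office/2011/mce"
NS = {"m": MCE_NS}


def add_keyword_term(root: ET.Element, keyword_id: str, term: str) -> bool:
    """Append a <Term> to the first <Group> of the <Keyword> with the given id.

    Returns False, leaving the XML unchanged, if the keyword list is missing
    or already contains the term (compared case-insensitively).
    """
    for keyword in root.iterfind("m:Rules/m:Keyword", NS):
        if keyword.get("id") != keyword_id:
            continue
        group = keyword.find("m:Group", NS)
        if group is None:
            return False
        existing = {t.text.strip().lower() for t in group.iterfind("m:Term", NS) if t.text}
        if term.lower() in existing:
            return False
        new_term = ET.SubElement(group, f"{{{MCE_NS}}}Term")
        new_term.text = term
        return True
    return False
```

Skipping duplicates keeps the keyword list clean if the script is run more than once.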
Then run the following command and confirm the update with Y:
Set-DlpSensitiveInformationTypeRulePackage -FileData ([System.IO.File]::ReadAllBytes("C:\users\simon\desktop\new.xml"))
Now we have an updated keyword list, and the new keyword also shows up in the UI.
We will look further at Microsoft Purview Information Protection and PowerShell in upcoming posts. Now that we have a custom SIT, we can also look at how to use it in labels and DLP policies.
Thank you for reading.