Automatic classification of files and emails in Microsoft 365 using Microsoft Purview or SharePoint Premium

Atle Skjekkeland

January 22, 2025

Employees are usually too busy to care about compliance, so we need to automate security and governance as far as possible. In this blog post, I outline ways to automate data classification in Microsoft 365. This covers auto-applying sensitivity labels to protect sensitive files and emails against unauthorized access and data loss, and auto-applying record labels so that important files and emails are governed for integrity, authenticity, reliability, and availability. It also covers auto-applying retention labels so that personal data is deleted once it has served its purpose, e.g., job applications or resumes stored in non-approved locations.

Important: Sensitivity labels only work for modern Office files and PDFs in SharePoint and OneDrive, and for emails in Exchange. Sensitivity labels follow the files when they are sent out of your Microsoft 365 tenant. Record labels work for all file types in SharePoint and OneDrive for Business, and for emails in Exchange, but they do not follow your files and emails when these are sent out of your Microsoft 365 tenant.

Important: Apply sensitivity labels to files before record labels, because a record label locks the file against changes. Microsoft is expected to resolve this interoperability issue at a later date.

Options for auto-applying sensitivity and/or record labels:

Option 1: Automatic classification based on storage location – set a default sensitivity label for a library that files automatically inherit (e.g., Teams and SharePoint sites classified as confidential can have a default Confidential item-level sensitivity label), and/or a default record label for a folder (e.g., a Final documentation folder with a default record label that files automatically inherit).

Pros: Ensures files are automatically classified based on storage location.
Cons: Users need a common understanding of where files should be stored, e.g., placing final documents in the folder that carries the correct default record label.
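
To make the inheritance idea in Option 1 concrete, here is a minimal Python sketch. It is a toy model only and does not call Microsoft 365 or describe how the service implements this. The `Library` and `File` classes, the `effective_label` function, and the label names are all hypothetical, and the rule that a file's own label wins over the library default is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Library:
    """Hypothetical stand-in for a document library with a default label."""
    name: str
    default_sensitivity_label: Optional[str] = None


@dataclass
class File:
    """Hypothetical stand-in for a file that may already carry a label."""
    name: str
    sensitivity_label: Optional[str] = None


def effective_label(file: File, library: Library) -> Optional[str]:
    # Assumption: a label already on the file is kept; otherwise the
    # file inherits the library default (which may itself be unset).
    return file.sensitivity_label or library.default_sensitivity_label


contracts = Library("Contracts", default_sensitivity_label="Confidential")
print(effective_label(File("memo.docx"), contracts))
print(effective_label(File("plan.docx", "Highly Confidential"), contracts))
```

The sketch also shows the weakness named in the cons: the result depends entirely on the file landing in the right library.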

Option 2: Automatic classification based on metadata – use Content Types to set mandatory metadata on SharePoint sites, and then auto-apply a record label (but not a sensitivity label) based on that metadata, e.g., the record category combined with a document status that has changed from draft to final.

Pros: Users only need to change the document status to declare records.
Cons: Requires managed metadata to be implemented. Requires users to change metadata before record labels are auto-applied. Does not work for auto-applying sensitivity labels.
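
The metadata rule in Option 2 can be sketched in a few lines of Python. This is an illustration of the decision logic only, not a Purview configuration. The field names `DocumentStatus` and `RecordCategory`, the `RECORD_LABELS` mapping, and the label names are hypothetical.

```python
from typing import Optional

# Hypothetical mapping from record category to record label name.
RECORD_LABELS = {
    "Contract": "Contract Record",
    "Policy": "Policy Record",
}


def record_label_for(metadata: dict) -> Optional[str]:
    """Return a record label only once the document is marked Final."""
    # Drafts are never declared as records, so they stay editable.
    if metadata.get("DocumentStatus") != "Final":
        return None
    # Unknown or missing categories get no label rather than a guess.
    return RECORD_LABELS.get(metadata.get("RecordCategory"))


print(record_label_for({"DocumentStatus": "Draft", "RecordCategory": "Contract"}))
print(record_label_for({"DocumentStatus": "Final", "RecordCategory": "Contract"}))
```

Nothing happens until a user changes the status, which is why this option depends on users keeping metadata up to date.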

Option 3: Automatic classification of files and emails based on content using out-of-the-box (OOTB) or custom Sensitive Information Types (SITs) – use the 200+ OOTB SITs or custom SITs to automate labelling based on keywords, dictionaries, or word patterns. See "Learn about sensitive information types" on Microsoft Learn for more information.

Pros: Automatic classification based on content.
Cons: Requires testing and tuning to ensure correct classification. For auto-applying record labels, first apply a retention label so that work-in-progress files are not locked before they are final, and then automatically change the retention label to a record label after 2 months.
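
To illustrate the kind of matching a pattern-based SIT performs, the Python sketch below combines a word pattern with supporting keywords found nearby. It is a simplified illustration and not how Purview evaluates SITs. The `EMP-` identifier format, the keyword list, the 300-character window, and the two confidence values are all assumptions chosen for the example.

```python
import re

# Hypothetical identifier format: "EMP-" followed by six digits.
PATTERN = re.compile(r"\bEMP-\d{6}\b")
# Hypothetical supporting keywords that raise confidence when nearby.
KEYWORDS = ("employee id", "personnel number")
# Assumed proximity window, in characters, on each side of a match.
PROXIMITY = 300


def detect(text: str) -> list[tuple[str, str]]:
    """Return (match, confidence) pairs for each pattern hit in text."""
    lowered = text.lower()
    results = []
    for match in PATTERN.finditer(text):
        start = max(0, match.start() - PROXIMITY)
        window = lowered[start:match.end() + PROXIMITY]
        # A nearby keyword turns a bare pattern hit into a confident one.
        nearby = any(keyword in window for keyword in KEYWORDS)
        results.append((match.group(), "high" if nearby else "low"))
    return results


print(detect("Employee ID: EMP-123456"))
print(detect("Reference EMP-654321 attached"))
```

The testing and tuning mentioned in the cons largely means adjusting the pattern, the keywords, and the thresholds until false positives and misses are acceptable.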

Option 4: Automatic classification of files and emails based on content using pre-trained or custom Trainable Classifiers (machine learning) – use the 10+ pre-trained classifiers or create new ones based on seed and training files. See "Learn about trainable classifiers" on Microsoft Learn for more information.

Pros: Automatic classification based on content.
Cons: Requires testing and tuning with many positive and negative sample files to ensure correct classification. For auto-applying record labels, first apply a retention label so that work-in-progress files are not locked before they are final, and then automatically change the retention label to a record label after 2 months.
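
The two-stage approach described in the cons for Options 3 and 4 (retention label first, record label after 2 months) can be sketched as follows. This is a toy model of the timing logic only. The function name, the label names, and the use of 60 days as an approximation of 2 months are assumptions.

```python
from datetime import date, timedelta

# Assumption: "2 months" is approximated as 60 days.
GRACE_PERIOD = timedelta(days=60)


def current_label(labelled_on: date, today: date) -> str:
    """Return which label applies, given when the file was first labelled."""
    # During the grace period the file stays editable under a retention label.
    if today - labelled_on < GRACE_PERIOD:
        return "retention"
    # After the grace period the file is treated as final and locked.
    return "record"


print(current_label(date(2025, 1, 1), date(2025, 1, 31)))
print(current_label(date(2025, 1, 1), date(2025, 3, 15)))
```

The grace period gives authors time to finish work-in-progress files before the record label locks them.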

Option 5: Automatic classification of files based on content using SharePoint Premium (formerly known as Syntex) – train an AI model (e.g., document understanding or form processing) in SharePoint Premium to classify content.

Pros: Automatic classification based on content. Empowers super users to set up auto-classification.
Cons: More complex to manage auto-classification rules across a tenant.

Important: All options require an E5 license, an E5 Compliance add-on license, or an E5 Information Protection and Governance (IPG) add-on license for E3. In addition, SharePoint Premium uses consumption-based pricing.

Tip for auto-classifying files based on content: Using standard document templates across the business (e.g., a Decision Memo template) for key documents helps to ensure accurate classification.

Please contact us at contact@infotechtion.com to schedule a meeting to discuss any of this in more detail.

