Document Autoclassification Technologies

Brian Tuemmler

January 11, 2021

A MasterClass-ification for unstructured content

Post 1

This is an exciting time to be in the world of AI, machine learning, and autoclassification for unstructured information. Because this is a broad topic, I want to focus on the specific elements you can search on to auto-apply classifications, and on where each of them works well. In general, these techniques work best for classifying large volumes of unstructured content (many documents). The key takeaway is that different tools and techniques suit different purposes, and expecting a one-size-fits-all approach will disappoint.

Organizations face this problem whenever they try to clean up shared drives, search for privacy information, or decommission legacy ECM systems, and both before and after they migrate content to cloud-governed locations such as M365. Some of these capabilities are available for shared drives, some for M365, and many for both. If you are starting with shared drives, you can either find or repurpose tools (they may already exist in your legal or investigations function), or move the content to a Microsoft environment and take advantage of the capabilities there. Within M365, only a limited number of file formats (Microsoft formats) can be classified based on their content. I would recommend the first option, so that you do not waste time moving content that should stay where it is.

I have split this material into short, separate blog posts so that it can be read in easily digestible parts. Before we look at classification techniques, a few concepts need to be defined so that you can pick the right technique.

Content Auto-Classification – Source and Purpose

The different classification techniques can themselves be classified as either context-based or content-based (see what I did there?). The posts in this series are:

Auto-Classification using Context: File Extensions, Metadata and Properties

Auto-Classification using Content: Keywords, Number and Word Patterns

Auto-Classification using Content: Similarity and Topic Comparisons

Finally, we will look at some strategies for classifying large numbers of documents:

Auto-Classification Strategies
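To make the context-versus-content distinction concrete, here is a minimal Python sketch. It is not how any particular product (M365 or otherwise) implements classification; the labels, file extensions, "department" metadata field, keyword list, and SSN-style number pattern are all hypothetical examples chosen for illustration. Context rules look only at the file's extension and metadata, while content rules read the text for keywords and number patterns. Similarity and topic comparisons are not shown here.

```python
import re
from dataclasses import dataclass, field
from pathlib import PurePath


@dataclass
class Document:
    path: str
    metadata: dict = field(default_factory=dict)
    text: str = ""


# Hypothetical context rules: file extension -> label (illustration only).
EXTENSION_LABELS = {
    ".dwg": "Engineering Drawing",
    ".pst": "Email Archive",
}

# Hypothetical content rules: an SSN-style number pattern and contract keywords.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CONTRACT_KEYWORDS = ("agreement", "party", "termination")


def classify_by_context(doc):
    """Use only the file extension and metadata; never open the body."""
    ext = PurePath(doc.path).suffix.lower()
    if ext in EXTENSION_LABELS:
        return EXTENSION_LABELS[ext]
    if doc.metadata.get("department") == "HR":  # hypothetical metadata field
        return "HR Record"
    return None


def classify_by_content(doc):
    """Read the text and look for number patterns and keywords."""
    if SSN_PATTERN.search(doc.text):
        return "Contains Personal Data"
    lowered = doc.text.lower()
    hits = sum(1 for kw in CONTRACT_KEYWORDS if kw in lowered)
    if hits >= 2:  # require at least two keywords to reduce false positives
        return "Contract"
    return None


def classify(doc):
    # Try the cheap context rules first, then fall back to reading content.
    return classify_by_context(doc) or classify_by_content(doc) or "Unclassified"


if __name__ == "__main__":
    samples = [
        Document("plans/site_a.dwg"),
        Document("hr/notes.txt", metadata={"department": "HR"}),
        Document("misc/scan.txt", text="Employee SSN: 123-45-6789"),
        Document("legal/msa.txt", text="This Agreement between each Party covers termination."),
    ]
    for d in samples:
        print(d.path, "->", classify(d))
```

Checking context before content reflects the trade-off discussed in this series: extension and metadata checks are cheap and work without opening files, while content checks cost more but can catch documents whose names and properties say nothing useful.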

Document autoclassification is a broad topic, and there is a lot of detail and depth I cannot cover here. My goal is to get you off the ground and help you understand your options. If you would like me to include additional details in this series, please reach out; I am also happy to answer questions. Autoclassification techniques and taxonomies should be considered in the context of a larger M365 content management paradigm. We are here to help.

