A MasterClass – ification for unstructured content
Post 1
This is an exciting time to be in the world of AI, machine learning, and autoclassification for unstructured information. It is a broad topic, so I want to focus on the specific elements you can search on to auto-apply classifications, and on where each one works well. In general, these techniques work well for classifying large volumes of unstructured content (many documents). The key takeaway is that different tools and techniques are good for different purposes, and expecting a one-size-fits-all approach will disappoint.
Organizations face this problem whenever they clean up shared drives, search for privacy-related information, decommission legacy ECM systems, or migrate content to governed cloud locations like M365 (whether before or after the move). Some of these capabilities are available for shared drives, some for M365, and many for both. If you are starting with shared drives, you have two options: find or repurpose some tools (they may already exist in your legal or investigations function), or move the content to a Microsoft environment and take advantage of the capabilities there. Keep in mind that within M365, only a limited number of file formats (the Microsoft ones) can be classified based on their content. I would recommend the first option, so as not to waste time moving content that should remain as is.
I have split this series into short posts so that it can be read in easily digestible bits. Before we start looking at classification techniques, a few concepts need to be defined so that you know how to pick the right technique.
Content Auto-Classification – Source and Purpose
The different classification techniques are classified as either context-based or content-based. (See what I did there?) They are:
Auto-Classification using Context: File Extensions, Metadata and Properties
Auto-Classification using Content: Keywords, Number and Word Patterns
Auto-Classification using Content: Similarity and Topic Comparisons
Finally, we will look at some strategies for classifying large numbers of documents:
Auto-Classification Strategies
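To make the difference between context and content concrete, here is a minimal sketch in Python. It is only an illustration, not how any particular product works: the extension-to-label mapping, the label names, the keyword, and the number pattern are all hypothetical and chosen just for this example.

```python
import re
from pathlib import PurePath

# Hypothetical mapping from file extension to a label (assumption for this sketch).
EXTENSION_LABELS = {".xlsx": "spreadsheet", ".msg": "email", ".pdf": "document"}

# Illustrative number pattern: 3 digits, 2 digits, 4 digits, e.g. 123-45-6789.
ID_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def classify_by_context(filename: str) -> str:
    """Context-based: uses only the file extension, never the text inside."""
    extension = PurePath(filename).suffix.lower()
    return EXTENSION_LABELS.get(extension, "unknown")


def classify_by_content(text: str) -> str:
    """Content-based: searches the text for a number pattern or a keyword."""
    if ID_PATTERN.search(text):
        return "possible-personal-information"
    if "invoice" in text.lower():
        return "finance"
    return "unclassified"
```

The first function never opens the file, which is why context-based approaches are quick to run across many documents. The second has to read the text, which takes more work but can find things a file name or extension will never reveal.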
Document autoclassification is a broad topic, and there is a lot of detail and depth that I cannot cover. My goal is to get you off the ground and help you understand your options. If you would like me to include additional details in this series, please reach out. I am also happy to answer questions. Autoclassification techniques and taxonomies should be considered in the context of a larger M365 content management paradigm. We are here to help.