Document Autoclassification Technologies

Brian Tuemmler


A MasterClass – ification for unstructured content

Post 1

This is an exciting time to be in the world of AI, machine learning, and autoclassification for unstructured information. This is a broad topic, so I want to focus on the specific elements you can search on to auto-apply classifications, and on where each one works well. In general, these techniques work well for classifying large volumes of unstructured content (many documents). The key takeaway is that different tools and techniques are good for different purposes, and expecting a one-size-fits-all approach will disappoint.

Organizations face this problem whenever they clean up shared drives, search for privacy information, or decommission legacy ECM systems, and both before and after they migrate content to cloud-governed locations like M365. Some of these capabilities are available for shared drives, some for M365, and many for both. If you are starting with shared drives, you can either find or repurpose existing tools (they may already exist in your legal or investigations function), or move the content to a Microsoft environment and take advantage of the capabilities there. Within M365, only a limited number of file formats (Microsoft ones) can be classified based on their content. I recommend the first option, so that you do not waste time moving content that should remain as is.

I have separated the short segments into different blog posts so that they can be read in easily digestible bits. Before we start looking at classification techniques, a few concepts need to be defined so that you can better pick the right technique.

Content Auto-Classification – Source and Purpose

The different classification techniques are classified as either context-based or content-based. (See what I did there?) They are:

Auto-Classification using Context: File Extensions, Metadata and Properties

Auto-Classification using Content: Keywords, Number and Word Patterns

Auto-Classification using Content: Similarity and Topic Comparisons
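To make the context versus content distinction concrete, here is a minimal sketch in Python. It is not taken from any particular product: the rule tables, labels, and function names are hypothetical, and real tools use far richer rules. The context-based function looks only at the file name, while the content-based function reads the text for a number pattern and a few keywords. Similarity and topic comparison are left out because they need more machinery than fits in a short example.

```python
import re
from pathlib import PurePath
from typing import List, Optional

# Hypothetical rule tables, for illustration only.
EXTENSION_LABELS = {".xlsx": "spreadsheet", ".msg": "email", ".dwg": "drawing"}
KEYWORD_LABELS = {"invoice": "finance", "non-disclosure": "legal"}
# Number pattern shaped like a US Social Security number (assumed example).
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def classify_by_context(filename: str) -> Optional[str]:
    """Context-based: uses only the file extension and never opens the file."""
    return EXTENSION_LABELS.get(PurePath(filename).suffix.lower())


def classify_by_content(text: str) -> List[str]:
    """Content-based: looks for number patterns and keywords inside the text."""
    labels = []
    if SSN_PATTERN.search(text):
        labels.append("privacy")
    lowered = text.lower()
    labels += [label for word, label in KEYWORD_LABELS.items() if word in lowered]
    return labels
```

The context check is cheap and works on any file format, but it knows nothing about what the document says. The content check can find privacy information, but only in files whose text can be extracted.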

Finally, we will look at some strategies for classifying large numbers of documents:

Auto-Classification Strategies

Document autoclassification is a broad topic, and there is a lot of detail and depth I cannot cover. My goal is to get you off the ground so that you understand your options. If you would like me to include additional details in this series, please reach out. I am also happy to answer questions. Autoclassification techniques and taxonomies should be considered in the context of a larger M365 content management paradigm. We are here to help.


 © 2024 Infotechtion. All rights reserved 

