Document Autoclassification using Content: Similarity and Topic Comparisons
Similarity and Topics: Similarity classifies documents by determining how close one document is to another. This is one area where AI is being leveraged. There are a number of variants of this capability, but generally, if you know that one thing is a true representation of what you are looking for (an “exemplar”), other things that […]
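As a rough illustration of comparing documents to an exemplar, here is a minimal sketch that scores candidates by cosine similarity over simple word counts. The function names, the tokenizer, and the choice of cosine similarity are assumptions made for this example, not the method the post describes; production tools typically use richer representations.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts (0.0 to 1.0)."""
    counts_a, counts_b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    # Only words present in both texts contribute to the dot product.
    dot = sum(counts_a[w] * counts_b[w] for w in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is similar to nothing
    return dot / (norm_a * norm_b)


def rank_by_similarity(exemplar, candidates):
    """Return (name, score) pairs, most similar to the exemplar first.

    `candidates` is a hypothetical mapping of document name to document text.
    """
    scored = [(name, cosine_similarity(exemplar, text)) for name, text in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

A caller would pass the text of a known-good exemplar plus a dictionary of candidate documents, then treat anything above a chosen score threshold as belonging to the same class. The threshold itself has to be tuned against real content.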
Document Autoclassification using Content: Keywords, Number and Word Patterns
Keywords: A word (or set of words) can be associated with a type, metadata or security. Using a keyword to find a type of content (like using the word “contract” to find a contract) is usually problematic for a number of reasons and is not very useful. You will find contracts, but you will also find […]
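To make the keyword approach concrete, here is a minimal sketch that maps keywords to labels using whole-word, case-insensitive matching. The rule table and label names are hypothetical and exist only for illustration.

```python
import re

# Hypothetical rules: keyword -> label to apply (a type or a security marking).
KEYWORD_RULES = {
    "contract": "type:contract",
    "confidential": "security:restricted",
}


def classify_by_keywords(text, rules=KEYWORD_RULES):
    """Return the sorted labels whose keyword appears as a whole word in the text."""
    labels = set()
    for keyword, label in rules.items():
        # \b anchors keep "contract" from matching inside "contractor".
        if re.search(rf"\b{re.escape(keyword)}\b", text, flags=re.IGNORECASE):
            labels.add(label)
    return sorted(labels)
```

Note that a memo saying "please review the contract draft" would receive the same type:contract label as an actual contract, which is one way a single keyword over-matches.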
Document Autoclassification using Context: File Extensions, Metadata and Properties
File extensions: This is the easiest entry point into classifying content because it can be done with a good >DIR command and a spreadsheet. It does not require accessing or opening the content, which makes it fast. It can also be done very efficiently from the cloud. The file extension mostly tells […]
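The directory-listing-plus-spreadsheet step can also be scripted. Below is a minimal sketch that counts files by extension using only their names, so no file content is opened. The function names and the "(none)" bucket for files without an extension are assumptions for this example.

```python
from collections import Counter
from pathlib import Path


def extension_counts(paths):
    """Count files per lowercased extension, looking only at the path names."""
    counts = Counter()
    for path in paths:
        suffix = Path(path).suffix.lower()
        counts[suffix or "(none)"] += 1  # files such as README have no suffix
    return counts


def scan_folder(root):
    """Walk a folder tree and count file extensions without opening any file."""
    files = (p for p in Path(root).rglob("*") if p.is_file())
    return extension_counts(files)
```

Calling scan_folder on a share root and printing counts.most_common() gives the same kind of summary you would otherwise build by pasting a directory listing into a spreadsheet.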
Document Autoclassification: Source and Purpose
Before we start looking at classification techniques, a few concepts need to be defined. Source data: The information we rely on to classify content automatically comes from three sources within a single document. Format – Format often includes the coding that allows a specific application to work with it. It can also include any structure […]
Document Autoclassification Technologies
A MasterClass-ification for unstructured content, Post 1: This is an exciting time to be in the world of AI, machine learning, and autoclassification for unstructured information. This is a broad topic, so I want to focus on very specific elements that you can search on to auto-apply classifications and where they work well. […]