Document Autoclassification using Content: Similarity and Topic Comparisons
![](https://infotechtion.com/wp-content/uploads/2023/07/146bdc_26af936e39f84918afcc6469745ab0c7mv2.webp)
Similarity and Topics: Similarity classifies by determining how close one document is to another. This is one area where AI is being leveraged. There are a number of variants to this capability, but generally, if you know that one thing is a true representation of what you are looking for (an “exemplar”), other things that […]
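To make the exemplar idea concrete, here is a minimal sketch that scores candidate documents by how close they are to a known exemplar. The excerpt does not say which similarity measure is meant, so cosine similarity over simple word counts is an assumption, and the function names and sample texts are hypothetical.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Lowercase the text and count each alphabetic word."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (0.0 to 1.0)."""
    shared = set(a) & set(b)
    dot = sum(a[w] * b[w] for w in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical exemplar and candidates, invented for illustration.
exemplar = "This services agreement is made between the supplier and the customer"
candidates = {
    "doc_a": "Agreement between the supplier and the customer for consulting services",
    "doc_b": "Minutes of the weekly team meeting and action items",
}

exemplar_counts = word_counts(exemplar)
scores = {
    name: cosine_similarity(exemplar_counts, word_counts(text))
    for name, text in candidates.items()
}
closest = max(scores, key=scores.get)  # the candidate most like the exemplar
```

A real system would likely use richer representations than raw word counts, but the shape is the same: represent the exemplar, represent each candidate, and rank by closeness.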
Document Autoclassification using Content: Keywords, Number and Word Patterns
![](https://infotechtion.com/wp-content/uploads/2023/07/a94034c1ea2a436da066828d0ac997e9.webp)
Keywords: A word (or set of words) can be associated with a type, metadata or security. Using a keyword to find a type of content (like using the word “contract” to find a contract) is usually problematic for a number of reasons and not very useful. You will find contracts, but you will also find […]
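A small sketch of why a bare keyword is a weak classifier. The excerpt is cut off, so the sample documents below are invented for illustration and are not taken from the post; `matches_keyword` is a hypothetical helper.

```python
import re


def matches_keyword(text, keyword):
    """Return True if the keyword appears as a whole word, ignoring case."""
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


# Invented sample documents: only the first one is actually a contract.
documents = {
    "agreement.txt": "This contract is entered into by the parties named below.",
    "email.txt": "Can you send me the contract template before Friday?",
    "notes.txt": "Lunch menu for the offsite.",
}

hits = [name for name, text in documents.items() if matches_keyword(text, "contract")]
```

Here `hits` contains both `agreement.txt` and `email.txt`: the keyword finds the contract, but it also flags a document that merely mentions one.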
Document Autoclassification using Context: File Extensions, Metadata and Properties
![](https://infotechtion.com/wp-content/uploads/2023/07/146bdc_bc89de934fcb41ed86d0125a8c50387bmv2.webp)
File extensions: This is the easiest entry point into classifying content because it can be done with a good DIR command and a spreadsheet. It does not require accessing or opening the content, which makes it fast. It can also be done very efficiently from the cloud. The file extension mostly tells […]
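The "directory listing plus spreadsheet" approach can be sketched in a few lines. This is an illustrative assumption about how such an inventory might be scripted, not the author's tooling; the function names and sample paths are hypothetical. Note that it only looks at file names and never opens the files.

```python
import csv
import io
from collections import Counter
from pathlib import PurePath


def count_extensions(paths):
    """Tally file extensions (case-insensitive) from a list of path strings."""
    counts = Counter()
    for p in paths:
        ext = PurePath(p).suffix.lower() or "(none)"
        counts[ext] += 1
    return counts


def to_csv(counts):
    """Render the tally as CSV text that a spreadsheet can open."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["extension", "count"])
    for ext, n in counts.most_common():
        writer.writerow([ext, n])
    return buffer.getvalue()


# Hypothetical listing; in practice the paths could come from os.walk or Path.rglob.
sample_paths = ["reports/q1.DOCX", "reports/q2.docx", "scans/invoice.pdf", "README"]
extension_counts = count_extensions(sample_paths)
report = to_csv(extension_counts)
```

Because only names are read, the same tally works on a listing exported from a file share or a cloud store without downloading any content.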
Document Autoclassification: Source and Purpose
![](https://infotechtion.com/wp-content/uploads/2023/07/f4ef069707a041f8884add9497584f48.webp)
Before we start looking at classification techniques, a few concepts need to be defined. Source data: The information we rely on to classify content automatically comes from three sources within a single document. Format – Format often includes the coding that allows a specific application to work with it. It can also include any structure […]
Document Autoclassification Technologies
![](https://infotechtion.com/wp-content/uploads/2023/07/11062b_05abb5e4783d4fe1b5a72408d2ee2f85mv2.webp)
A MasterClass-ification for unstructured content, Post 1: This is an exciting time to be in the world of AI, machine learning, and autoclassification for unstructured information. This is a broad topic, so I want to focus on very specific elements that you can search on to auto-apply classifications and where they work well. […]