Data mapping is a key activity when determining information architecture and information flow, and when managing data privacy and regulatory risks. Once you know the risks associated with information, a data map can be an effective tool for prioritizing the transition of information to a compliant destination or for making disposition decisions.
What is the Interactive Information Data Map?
Interactive Data Mapping Process
- Identify and inventory the types of information that reside in different repositories
- Understand available destination locations based on functional criteria that map to source data use cases
- Classify content to build a high-level taxonomy and risk category (High, Med, Low value/risk)
- Act to build a transition plan based on priority, cost, availability, and risk. Map content to the destination.
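The four steps above can be sketched in a few lines of Python. This is only an illustrative sketch: the repository names, content types, and destinations are hypothetical, and a real data map would carry far more attributes per item.

```python
from dataclasses import dataclass

# Risk levels follow the High / Med / Low scale used above.
RISK_ORDER = {"High": 0, "Med": 1, "Low": 2}


@dataclass
class ContentItem:
    repository: str    # where the content lives today (Identify)
    content_type: str  # what kind of information it is (Identify)
    risk: str          # "High", "Med", or "Low" (Classify)


def build_transition_plan(items, destinations):
    """Order items by risk and attach a destination to each one (Act).

    `destinations` maps a content type to a target location (Understand).
    Items without a known destination are flagged for review.
    """
    plan = []
    for item in sorted(items, key=lambda i: RISK_ORDER[i.risk]):
        target = destinations.get(item.content_type, "NEEDS REVIEW")
        plan.append((item.repository, item.content_type, item.risk, target))
    return plan


# Hypothetical inventory and destinations, for illustration only.
inventory = [
    ContentItem("shared-drive", "contracts", "High"),
    ContentItem("email", "newsletters", "Low"),
    ContentItem("legacy-ecm", "hr-records", "Med"),
]
destinations = {"contracts": "records-library", "hr-records": "hr-site"}

for row in build_transition_plan(inventory, destinations):
    print(row)
```

Sorting by risk first means the highest-risk content is planned for transition before anything else, and anything without a mapped destination surfaces as an explicit gap rather than being silently skipped.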
How do you create a Data Map?
Firstly, identify and understand your top challenges, current practices, and behaviors.
- Is there a single solution to finding content for multiple IG use cases?
- Are you able to ensure compliance with changing regulations, e.g. GDPR, CCPA?
- Do users access information across multiple ‘content silos’ to complete a single activity?
- Are data volumes and the adoption of new technologies growing too quickly to keep up with?
- Are you still relying on users to demonstrate compliance instead of automated policies?
- Are you able to discover relevant information in a timely manner to support your litigation, RM, and compliance processes?
Also, consider the following aspects when looking to optimize the value proposition of creating an interactive data map:
- Business mandates
- Litigation response
- IT support
- Privacy risk
- Security protection
- RIM compliance
Develop a high-level taxonomy classification for mapping your ungoverned information and for making high-level decisions in the information lifecycle. The following diagram shows a typical data map created for ungoverned information stored in multiple content platforms. In the absence of a data map, most organizations end up moving data into long-term archives, further increasing compliance and knowledge management risks.
A sample data map for information in multiple content repositories
Data Mapping Decision Options
Not all content is created equal, so it is essential to understand the value proposition of each content category and then use that information to make informed data mapping decisions for its transition and management on an appropriate platform.
- Normal Content: Content can be classified, organized, or migrated with minimal complications and is valuable enough to benefit from record-level management
- Complex Content: Content is grouped or linked logically or physically in a way that mandates either particular archive functionality or particular migration tactics
- Problematic Content: Content is impacted by functional and environmental variables that make the choice of archive location narrow and specific
- Operational Content: Value and nature of the content is low enough that complex or advanced management is not necessary
- Low-value Content: Minimal effort should be expended to keep or maintain it. Focus on deletion.
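One way to make these decision options operational is a simple lookup from content category to default handling. The sketch below is illustrative only: the action strings are hypothetical paraphrases of the category descriptions above, not a prescribed policy.

```python
from enum import Enum


class Category(Enum):
    NORMAL = "Normal"
    COMPLEX = "Complex"
    PROBLEMATIC = "Problematic"
    OPERATIONAL = "Operational"
    LOW_VALUE = "Low-value"


# Hypothetical default actions that paraphrase the category descriptions.
DEFAULT_ACTION = {
    Category.NORMAL: "migrate with record-level management",
    Category.COMPLEX: "migrate as a group to an archive with the required functionality",
    Category.PROBLEMATIC: "route to the specific archive location that fits its constraints",
    Category.OPERATIONAL: "keep under simple management",
    Category.LOW_VALUE: "delete",
}


def decide(category_name: str) -> str:
    """Return the default handling for a category name such as "Complex"."""
    return DEFAULT_ACTION[Category(category_name)]


print(decide("Low-value"))
```

Keeping the decisions in one table makes them easy to review with legal, records, and IT stakeholders before any content is moved.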
Once you understand the content, use the following guidelines as a reference to develop an intelligent mapping, which is much more than mapping sources to destinations.
Reference design: Intelligent information mapping
Data Mapping – Sample Technologies
Developing a data map without specialist expertise and automation tooling can be a time-consuming and difficult task. Consider the following technology options when developing your data mapping and information transition strategy.
Machine Content Learning
- Bulk keyword, regex, proximity
- Fuzzy search
- Topic and wordlist
- Google Deep learning – photographs
- High volume, repeatable, analytics classification based on subject, words, and content
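As a rough illustration of the keyword, regex, and fuzzy-search techniques listed above, the following sketch uses only the Python standard library. The rules and the wordlist are hypothetical; a production tool would use far richer patterns and indexing.

```python
import re
from difflib import get_close_matches

# Hypothetical rules: a label plus a regular expression that signals it.
RULES = {
    "privacy": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),  # US SSN-like pattern
    "contract": re.compile(r"\b(agreement|contract)\b", re.IGNORECASE),
}

# Hypothetical wordlist used for fuzzy matching of misspelled terms.
WORDLIST = ["invoice", "payroll", "litigation"]


def classify(text):
    """Return the set of labels whose regex or fuzzy wordlist term matches."""
    labels = {label for label, pattern in RULES.items() if pattern.search(text)}
    for token in re.findall(r"[a-z]+", text.lower()):
        # cutoff=0.8 tolerates small typos such as "litigaton"
        match = get_close_matches(token, WORDLIST, n=1, cutoff=0.8)
        if match:
            labels.add("term:" + match[0])
    return labels


print(classify("Signed Agreement, SSN 123-45-6789"))
print(classify("pending litigaton hold"))
```

Exact patterns catch well-structured identifiers, while the fuzzy pass picks up misspelled terms that a strict keyword search would miss.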
Machine Context Learning
- Attributes, ownership, timelines
- Geospatial
- Quality metrics
- High volume, repeatable, multifaceted classification based on function, attributes, and properties
Artificial Intelligence
- Exemplar hunting
- Exemplar near-duplicate cluster matching
- Automatic classifiers – Bayesian
- Clustering and learning based on content similarity and differences
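To give a feel for exemplar near-duplicate matching, here is a minimal sketch that compares documents to an exemplar using Jaccard similarity over word shingles. This is one common similarity measure chosen for illustration, not necessarily what any particular product uses; the document names, texts, and threshold are hypothetical.

```python
def shingles(text, k=3):
    """Split text into overlapping k-word sequences (shingles)."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def near_duplicates(exemplar, documents, threshold=0.5):
    """Return names of documents whose similarity to the exemplar meets the threshold."""
    target = shingles(exemplar)
    return [
        name
        for name, text in documents.items()
        if jaccard(target, shingles(text)) >= threshold
    ]


# Hypothetical documents, for illustration only.
docs = {
    "report-draft": "the quarterly safety report for the gulf platform draft",
    "menu": "lunch menu for friday",
}
print(near_duplicates("the quarterly safety report for the gulf platform", docs))
```

Starting from a known exemplar lets reviewers pull in close variants such as drafts and copies in one pass, instead of classifying each one individually.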
What Should I Do Now?
The benefits of intelligent information governance supported by comprehensive data mapping are significant: it can deliver not only defensible compliance but also a guaranteed return on investment by creating value out of information. Infotechtion can support your organization’s objectives and accelerate your journey to transition your legacy information into invaluable knowledge.
Here is a high-level overview of the Infotechtion methodology to support/advise your IT and business teams through the information governance journey.
Partner with Infotechtion to empower your information governance initiatives and maximize value from your ongoing migration/transformation programs. Some of our customers have realized the following benefits by partnering with Infotechtion experts.
Strategic
- Visibility into the highest value corporate assets
- Optimize litigation, investigation and mitigation efforts
- Alignment with strategic goals, IG stakeholders and external regulators
Operational
- Focus on real tasks rather than manual migration
- Ability to focus on real threats and issues arising from attacks
- Consistency and repeatability
Tactical
- 55% reduction in content needing review
- 10x speed increase in the migration process
- Compressed response time
Below is a representative solution blueprint for cases where the target platform is based on the Microsoft 365 information governance suite.
A comprehensive architecture for Microsoft Information Governance
A CASE STUDY
The legal department of a major oil and gas company in the US wanted to transition to a more stable compliance footing for content management. Emails and shared drives proved to be more troublesome and difficult for users, which also made their transition more difficult. We ran advanced index and classification tools to discover, classify, and organize data to facilitate the transition process to the ECM. Examining topics, terms, threads, dates, formats, and users, we developed a priority and triage process based on content value and risk.