Increasing Importance of Data Quality
Enterprises today face several challenges in resolving data issues and conflicts that result from mismanaged information. A typical enterprise is a collection of companies or divisions, each using its own systems and applications to manage its data. The impact falls on operational efficiency, in both cost and compliance. For instance, two divisions could be buying from the same vendor at different prices, and the enterprise would miss the opportunity to negotiate as one organization. Enterprise analytics suffers as well. If you were to analyze all customers based in the US to find out how much revenue they contribute, and any one division did not maintain proper customer location details, the results would be incorrect.
The bigger challenge for large companies is that data quality projects are difficult to implement. They often get bogged down in aspects such as data governance, which involve organizational politics because responsibility is spread across multiple departments and organizations. Such companies should build teams and set policies for a data quality governance structure, and assign roles, responsibilities, and accountability to reach the desired levels of data quality.
Enterprises also have to deal with duplicated data, because they are organized as multiple independent divisions with their own systems and applications. The best way to handle this is to define “Data Quality” standards and share them across all divisions; introduce a Data Quality index, which assigns weights to the individual data quality calculations; and measure, monitor, and improve Data Quality at every touch point.
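One way to picture a Data Quality index is as a weighted average of per-dimension scores. The sketch below is only an illustration under assumptions: the dimension names (completeness, uniqueness, validity), the weights, the scores, and the function `data_quality_index` are all hypothetical and would need to be defined by each enterprise's own standards.

```python
from typing import Dict


def data_quality_index(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Return the weighted average of per-dimension quality scores.

    Each score is assumed to be a fraction between 0 and 1.
    """
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same dimensions")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(scores[d] * weights[d] for d in scores) / total_weight


# Hypothetical dimensions, weights, and scores for one division.
weights = {"completeness": 0.4, "uniqueness": 0.3, "validity": 0.3}
division_scores = {"completeness": 0.9, "uniqueness": 0.8, "validity": 1.0}

index = data_quality_index(division_scores, weights)
print(f"Data Quality index: {index:.2f}")
```

Computing the same index for every division, with the same shared weights, is what makes the numbers comparable and lets the index be tracked over time.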
Further, I see an opportunity to consolidate the data in a new paradigm: the Enterprise Data Lake, built on top of Apache Hadoop. It uses Extract and Load only, unlike ETL, which Extracts, Transforms, and Loads data into a Data Warehouse. Having the raw data in one place helps in filtering out duplicates and providing accurate analytics.
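As a rough illustration of filtering duplicates out of raw, consolidated data, the sketch below keeps the first record seen for each normalized key. It is plain Python rather than a Hadoop job, and the record fields, the sample customers, and the helpers `normalize` and `deduplicate` are hypothetical. Real matching rules are usually more involved than lowercasing and trimming whitespace.

```python
from typing import Dict, Iterable, List, Tuple


def normalize(value: str) -> str:
    """Lowercase the value and collapse runs of whitespace."""
    return " ".join(value.lower().split())


def deduplicate(
    records: Iterable[Dict[str, str]], key_fields: Tuple[str, ...]
) -> List[Dict[str, str]]:
    """Keep the first record seen for each normalized key."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(normalize(record.get(field, "")) for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Hypothetical raw customer records loaded from two divisions.
raw_customers = [
    {"division": "A", "name": "Acme Corp", "country": "US"},
    {"division": "B", "name": "  acme   corp ", "country": "us"},
    {"division": "B", "name": "Globex", "country": "DE"},
]

unique_customers = deduplicate(raw_customers, key_fields=("name", "country"))
print(len(unique_customers), "unique customers")
```

Because the data is loaded untransformed, the normalization rules can be revised and rerun over the same raw records later.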
Another challenge is uncontrolled data distribution, which can result in legal and regulatory non-compliance. To prevent it, enterprises should collect operational reports in the Enterprise Data Lake and distribute or share the reports from there. They should assign data quality staff to run analytics on the Enterprise Data Lake to detect multiple versions and duplications, and to monitor the impact of data quality on the business. This is now possible with Big Data solutions, which provide scalable computing and storage on low-cost CPUs.
When it comes to improving performance, Data Mining is going to play an important role for enterprises. The challenge is that many enterprises have not addressed it effectively, because it costs money and the return on the investment is often long term, so it attracts less attention in budgeting. However, enterprises will increasingly compete on analytics, and Data Quality will be key to that. As enterprises face global competition, customer analytics will become a competitive differentiator, and because it relies on highly correlated data, data quality will play an important role.