Why Data-Driven Organizations Need to Drive Out Dirty Data
Carissa Zukowski, Engineering Manager
Businesses should be data-driven. That sounds like common sense, right? Today, we have technology that can process huge amounts of data and deliver business value through machine learning and artificial intelligence. Big Data and AI are not as mystical as people think – we’re not in The Matrix yet! – but machine learning and optimization algorithms will only yield results if the information you put into the equation maintains a standard of quality. There are rules that must be followed to ensure data quality and proper data management, and that level of consistency is often overlooked as companies seek to leverage the power of AI.
At Wise Systems, our business is focused on last-mile logistics and providing customers with a perfect delivery experience. Our software continuously learns from fleet data, helping optimize and improve fleet performance over time. To do that, we need to work with clean and robust data sets. That’s why we’re sticklers about dirty data. How do we categorize dirty data? Here are some of the issues we encounter as we work to improve fleet efficiency:
Missing data – Important information, such as an address or a customer’s delivery time windows, may be missing.
Stale data – Perhaps someone has left the company and the contact information for a customer is no longer relevant.
Inaccurate data – A vehicle capacity may be 42,000 pounds, but someone typed in 4,200 instead – this would lead to a lot of nearly empty trucks!
Incompatible data – Inventory is measured by weight in the OMS, but truck capacity for routing is in CEs or units.
Changing rules for data input – Suppose you have one field in your ERP to track delivery windows. Previously, this field was used to track open and close times of businesses. Now, some sales representatives use it to capture the actual delivery time window. There is no way for you to determine which time windows reflect which truth.
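To make the first three categories concrete, here is a minimal Python sketch of per-record checks. It is an illustration only, not a description of how Wise Systems cleans data: the `StopRecord` fields, the one-year staleness cutoff, and the plausible capacity range are all assumptions. Incompatible units and changing input rules are harder to catch this way, because the values look valid in isolation.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional

# Hypothetical thresholds, chosen only for illustration.
STALE_AFTER = timedelta(days=365)          # assumed: contact unverified for a year is stale
PLAUSIBLE_CAPACITY_LBS = (10_000, 80_000)  # assumed plausible range for this fleet


@dataclass
class StopRecord:
    """Hypothetical record combining customer and vehicle fields."""
    address: Optional[str]
    window_start: Optional[str]            # e.g. "09:00"
    window_end: Optional[str]
    contact_verified_on: Optional[date]
    vehicle_capacity_lbs: Optional[float]


def find_issues(record: StopRecord, today: date) -> List[str]:
    """Return human-readable data quality issues found in one record."""
    issues = []

    # Missing data: address or delivery time window not filled in.
    if not record.address or not record.address.strip():
        issues.append("missing: address")
    if not record.window_start or not record.window_end:
        issues.append("missing: delivery time window")

    # Stale data: contact never verified, or verified too long ago.
    verified = record.contact_verified_on
    if verified is None or today - verified > STALE_AFTER:
        issues.append("stale: contact not verified recently")

    # Inaccurate data: a range check flags likely typos such as 4,200 for 42,000.
    low, high = PLAUSIBLE_CAPACITY_LBS
    capacity = record.vehicle_capacity_lbs
    if capacity is not None and not (low <= capacity <= high):
        issues.append(
            f"suspect: capacity {capacity:,.0f} lbs outside {low:,} to {high:,}"
        )

    return issues
```

A range check cannot prove that a value is wrong; it only flags a record for a person to review, which is usually the right outcome for a capacity that is off by a factor of ten.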
When you start by creating a holistic data model to collect and catalog your data, you’re heading in the right direction, but you need to answer the big questions first: “What will this data be used for?” and “How do we plan on solving problems with the data?” Starting at that very high level is important because data can tell a story, and you need to decide what story you want to tell.
A lot of people put too much faith in the data itself, but they don’t have the right processes in place to maintain the data, clean it, and make sure everyone is on board. The biggest issue is consistency. So, at Wise Systems, we’re asking our clients to give us their source of truth – and that’s their data. If they’re not confident in that, it’s very difficult for us to give them back the most efficient, optimized routes.
Documentation is a place where every company can start. Each company should create an internal data dictionary for whichever databases are responsible for giving us data. The dictionary lays out consistent rules on how fields should be used, what the data represents, and, if the data comes from another system, which system that is. Ideally, companies will have someone who is responsible for master data management and is certified in this area.
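A data dictionary does not need special tooling to be useful. The sketch below shows one possible shape for it in Python, covering the same three things: how a field should be used, what it represents, and which system it comes from. The field names, the entries, and the two helper functions are hypothetical examples, not a standard format.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FieldDefinition:
    description: str      # what the data represents
    usage_rule: str       # how the field should be used
    source_system: str    # which system the value comes from
    unit: str = ""        # unit of measure, if any
    required: bool = True


# Hypothetical entries for illustration only.
DATA_DICTIONARY: Dict[str, FieldDefinition] = {
    "delivery_window": FieldDefinition(
        description="Time window in which the customer accepts deliveries",
        usage_rule="Delivery window only; opening hours go in business_hours",
        source_system="ERP",
    ),
    "business_hours": FieldDefinition(
        description="Open and close times of the business",
        usage_rule="Do not use for delivery windows",
        source_system="ERP",
        required=False,
    ),
    "order_weight": FieldDefinition(
        description="Total weight of the order",
        usage_rule="Always recorded in pounds",
        source_system="OMS",
        unit="lb",
    ),
}


def undocumented_fields(record: dict) -> List[str]:
    """Fields present in a record that the dictionary does not describe."""
    return sorted(name for name in record if name not in DATA_DICTIONARY)


def missing_required_fields(record: dict) -> List[str]:
    """Required dictionary fields that are absent or empty in a record."""
    return sorted(
        name
        for name, definition in DATA_DICTIONARY.items()
        if definition.required and record.get(name) in (None, "")
    )
```

Giving the delivery window and the business hours separate, documented fields is one way to avoid a single field quietly taking on two meanings.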
We take a lot of precautions and perform several types of data cleansing and manipulation once we receive data. But when clients make clean data a priority, it has a significant positive impact on results. Building highly optimized, precise, cost-effective routes is predicated on having really great data. Conceptually, people think it’s simple: “Of course, I’ll just keep everything up to date. If I’m a driver or a sales rep and I know that there’s a new piece of information, I know I need to add it to a profile.” But over time, gaps appear. We know these gaps aren’t created maliciously; they just build up.
Dirty data is a problem that a lot of businesses are experiencing, and it’s not trivial. But we can’t boil the ocean all in one go – you’ve just got to start, and you’ve got to make this best practice known and get more people to see the impact. Why should businesses care? Why should anyone feel incentivized to update customer records, to be accountable? It all comes down to dollars and cents: clean data is how you save money. When you’re able to utilize these tools to their fullest potential, you become more efficient and more cost-effective – and that’s the key to success in this increasingly dynamic world.