Solving data issues in the last mile - Tips from a Data Scientist

April 17, 2024

Alex, an experienced Data Scientist and Mily Technologies' CTO, will guide you through the basics of forming a data strategy for last-mile companies and solving related issues.

“The Big Data problem”

Logistics is definitely an industry where we can see the application of Big Data, which also means it has a ‘Big Data problem.’

The defining properties of Big Data, dubbed the three Vs, are volume, velocity, and variety. Last-mile delivery checks all of the boxes - an immense amount of diverse data collected every day at a rapid speed. 

If a proper structure isn't in place, the health and quality of data will plummet. Instead of creating opportunities, data will unfortunately become a major pain for last-mile carriers and posts. 

What are the common data issues in the last mile?

Since we started Mily Tech, we’ve handled data from companies of different sizes and from different countries. It’s interesting to note that there isn’t a unified way of collecting and storing data in the industry. Also, even for advanced companies, there is still room for improvement.

So, from my experience, these are the common data issues last-mile companies face.

Data accuracy

Data accuracy, or better said, inaccuracy, is often a struggle. A wrong delivery address, incorrect contact details, and similar errors can lead to quite a few problems down the line, such as failed or delayed deliveries.

This can happen when data comes from third parties, e.g., shoppers filling out details for their order on an e-commerce site. A validation mechanism should be in place to check the address and stop users from entering an email address without an ‘@’ sign or a phone number that doesn’t start with a digit or a ‘+’ sign.
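As a rough illustration, a validation step for contact details could look like the Python sketch below. The function name, the patterns, and the 6 to 15 digit range are assumptions made for this example, and the patterns are deliberately loose: they catch obvious typos, not every invalid value.

```python
import re

# Loose, illustrative patterns (assumptions, not a complete validator).
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
PHONE_RE = re.compile(r"^\+?\d{6,15}$")


def validate_contact(email: str, phone: str) -> list[str]:
    """Return a list of problems; an empty list means the input passed."""
    problems = []
    if not EMAIL_RE.match(email.strip()):
        problems.append("email looks invalid")
    # Drop common separators (spaces, dashes, brackets, dots, slashes) first.
    compact_phone = re.sub(r"[\s\-()./]", "", phone)
    if not PHONE_RE.match(compact_phone):
        problems.append("phone looks invalid")
    return problems


print(validate_contact("ana@example.com", "+381 64 123 4567"))
print(validate_contact("ana.example.com", "abc123"))
```

Returning a list of problems instead of a single True/False makes it easier to show the shopper exactly which field to correct.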

Apart from validation tools, you need to check how the data flows through your systems and have a 360-degree overview. Unfortunately, it’s easy to overlook this. 

Data fragmentation

Usually, many systems are involved in the daily operations of last-mile carriers and posts. But if the integration between them isn’t good (and most of the time it isn’t), gaps will appear and these systems will start to live independently.

It’s very difficult to create a predictive model that’s supposed to analyze a large chunk of data when the data is stuck in different places. 

Completeness and consistency 

It’s good to check if you capture all fields necessary to complete a delivery or to analyze the shipment lifecycle. This means asking if we have data for the sender and the recipient, billing information, etc. If yes, is it always collected, or are some fields empty? If something is empty, have we realized it too late, and what can we do about it? 
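A completeness check like this can be sketched in a few lines of Python. The field names below are hypothetical; the idea is simply to count how often each required field is missing or blank, so gaps are noticed early rather than too late.

```python
from collections import Counter

# Hypothetical set of fields a shipment record is expected to carry.
REQUIRED_FIELDS = (
    "sender_name",
    "recipient_name",
    "recipient_address",
    "billing_reference",
)


def missing_field_counts(shipments: list[dict]) -> Counter:
    """Count, per required field, how many records have it absent or blank."""
    counts = Counter()
    for record in shipments:
        for field in REQUIRED_FIELDS:
            value = record.get(field)
            if value is None or str(value).strip() == "":
                counts[field] += 1
    return counts


shipments = [
    {"sender_name": "Shop A", "recipient_name": "Ana",
     "recipient_address": "12 Main St", "billing_reference": "INV-1"},
    {"sender_name": "Shop B", "recipient_name": "",
     "recipient_address": "5 Oak Ave"},
]
print(missing_field_counts(shipments))
```

Running a report like this on a schedule gives a simple trend line for data completeness per field.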

You also have to ensure consistency. This mostly involves using a third-party tool to check that every instance of a record is the same across all systems and that the data is up to date. Addresses are a good example here as well.
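To make the idea concrete, here is a minimal Python sketch of a consistency check on addresses. It is not a replacement for a dedicated tool: the two dictionaries stand in for exports from two hypothetical systems, and the normalization is intentionally simple.

```python
import re


def normalize_address(address: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    cleaned = re.sub(r"[^\w\s]", " ", address.lower())
    return " ".join(cleaned.split())


def inconsistent_shipments(system_a: dict[str, str], system_b: dict[str, str]) -> list[str]:
    """Return IDs found in both systems whose addresses differ after normalization."""
    shared_ids = system_a.keys() & system_b.keys()
    return sorted(
        sid for sid in shared_ids
        if normalize_address(system_a[sid]) != normalize_address(system_b[sid])
    )


# Hypothetical exports: shipment ID -> delivery address.
order_system = {"S1": "12 Main St., Springfield", "S2": "5 Oak Ave"}
routing_system = {"S1": "12 main st springfield", "S2": "7 Oak Ave", "S3": "1 Elm Rd"}
print(inconsistent_shipments(order_system, routing_system))
```

Normalizing before comparing keeps harmless differences in case or punctuation from being reported as inconsistencies.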


Timeliness

Timeliness basically means all the data is there at the moment you need it. Sometimes, an event is registered late, or the data is only available when a certain process is completed. But the operations team needs it immediately, and that can be very troublesome.
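One simple way to monitor this is to measure the lag between when an event happened and when it was registered. The Python sketch below assumes each event carries two timestamps, here called occurred_at and registered_at; both names and the 15-minute threshold are hypothetical.

```python
from datetime import datetime, timedelta


def late_events(events: list[dict], max_lag: timedelta) -> list[dict]:
    """Return events whose registration lag exceeds max_lag, with the lag attached."""
    late = []
    for event in events:
        lag = event["registered_at"] - event["occurred_at"]
        if lag > max_lag:
            late.append({**event, "lag": lag})
    return late


events = [
    {"id": "E1", "occurred_at": datetime(2024, 4, 17, 9, 0),
     "registered_at": datetime(2024, 4, 17, 9, 2)},
    {"id": "E2", "occurred_at": datetime(2024, 4, 17, 9, 0),
     "registered_at": datetime(2024, 4, 17, 12, 0)},
]
for event in late_events(events, max_lag=timedelta(minutes=15)):
    print(event["id"], event["lag"])
```

The acceptable lag depends on the process, so the threshold is better kept as a parameter than hard-coded.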

Is there a cure?

There is no silver bullet. I think it’s more a case of effort on multiple fronts. 

Let’s take data fragmentation as an example. To eliminate this problem, you need to ensure different systems communicate with each other. If you plan on bringing in a new solution, you need to be able to define what it should look like, what is expected, and how it will communicate with existing tools. This setup is very important and needs to be thought through. 
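As a small illustration of what "communicating" can mean in practice, the Python sketch below merges records from several systems into one view per shipment and lists which systems have no record of a given shipment. The system names and fields are hypothetical, and a shared shipment ID across systems is an assumption.

```python
def consolidate(sources: dict[str, dict[str, dict]]) -> tuple[dict, dict]:
    """Merge per-system records into one view per shipment ID.

    sources maps a system name to {shipment_id: record}. Returns the merged
    records and, per shipment, the systems that have no record of it.
    """
    all_ids = set()
    for records in sources.values():
        all_ids.update(records)

    merged, gaps = {}, {}
    for sid in sorted(all_ids):
        merged[sid] = {}
        for system, records in sources.items():
            if sid in records:
                # Prefix keys with the system name so conflicting values stay visible.
                for key, value in records[sid].items():
                    merged[sid][f"{system}.{key}"] = value
            else:
                gaps.setdefault(sid, []).append(system)
    return merged, gaps


sources = {
    "orders": {"S1": {"address": "12 Main St"}, "S2": {"address": "5 Oak Ave"}},
    "tracking": {"S1": {"status": "delivered"}},
}
merged, gaps = consolidate(sources)
print(merged["S1"])
print(gaps)
```

The gaps output is often the more useful half: it shows exactly where systems have drifted apart.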

Apart from validation mechanisms, it’s good to rely on benchmarks or reliable data sources for accuracy. For example, you can cross-check demographic data with other sources providing census data. Or you can check if your GPS data is accurate by looking at some open-source geodata. 
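For the GPS example, a cross-check can be as simple as measuring the distance between a recorded position and a reference position for the same address. The Python sketch below uses the haversine formula; the coordinates and the 100-metre tolerance are made-up values, and the reference point is assumed to come from whatever geodata source you trust.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def gps_deviates(recorded: tuple, reference: tuple, tolerance_m: float = 100.0) -> bool:
    """True if the recorded point is farther than tolerance_m from the reference."""
    return haversine_m(*recorded, *reference) > tolerance_m


recorded = (44.8125, 20.4612)   # hypothetical courier scan position
reference = (44.8225, 20.4612)  # hypothetical reference position for the address
print(round(haversine_m(*recorded, *reference)), gps_deviates(recorded, reference))
```

Points flagged this way are candidates for review, not proof of an error, since the reference data can be wrong too.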

You need to have different tests in place to make sure the data you depend on every day is actually reliable.

How to move the needle - The basics of creating a data strategy

Everything I’ve mentioned is just setting the foundation for the data strategy. From there, companies can take slightly different paths depending on the use case.

If you’re leaning toward predictive and real-time scenarios, you need to ensure everything flows through your systems in real time. That means the data is collected quickly, is always accessible, and so on.

But if you need help forecasting demand, planning workload distribution, or other scenarios that require historical data, you can focus more on data accuracy rather than its availability and timeliness. 

Working with data also requires investment, not only of money but also of time and training. Data literacy is a skill that should be present in every role, and it’s only going to become more important. In any strategy or plan, you need to know how to move the needle and in what direction. So, you must prepare your team to take this next step.

In short, apart from the right tools, a strong foundation, and a clear goal, you also need the right people. 

How to prepare for AI 

AI is a pretty big topic. We see many companies rushing to implement it somehow in their daily operations. 

I suggest having a very clear picture of what you’re trying to accomplish. That will be your North Star. Then, you can reverse engineer to see what issues are preventing you from achieving that goal. Is it data health, data volume, people, or something else?

You have to answer questions such as whether this solution will actually help, what data you need and of what quality, and which departments should be involved in this decision. That is the only way to ensure you experience benefits rather than another headache.

Bridging the gap with Mily Tech

Navigating the complexities of data can be daunting. The ability to efficiently store and process data is just the beginning. The real challenge lies in using it to get meaningful insights.

Logistics companies understandably focus more on their core business than technical aspects. We at Mily Tech saw an opportunity to bridge this gap. 

Our solution is designed to integrate with existing systems, enabling faster and more effective data utilization. This sets the ground for establishing a data culture in companies, enabling people in all departments to make data-driven decisions.

About Aleksandar: He started his professional career as a Data Scientist at KDDI Laboratories in Tokyo. After completing his M.Sc. in Computer Science, Alex began working at Allianz SE headquarters in Munich as a Senior Data Scientist, where he spent almost five years. Today, he leads Mily Tech’s engineering team with a true passion for technology and expertise in Machine Learning and Big Data technologies.