Data you need to predict freight cost and where to find it

July 19, 2022 Michael Janiak

Part 2 of the Predicting Freight Cost Series

by Michael Janiak, Synapsum's Head of Data and Analytics

In Part 1 of the Predicting Freight Cost Series, we reviewed challenges of using carrier freight APIs and how predictive modeling can overcome them.

After reading Part 2 of the Predicting Freight Cost Series, you will understand the data needed to predict freight.

It is worth stating up front that the data needs to be available:

Historically - for training predictive models.
In real time - for making predictions when needed. If this is not possible in the short term, you can start with the training step and build a business case to capture the benefits of improved predictions and control of freight cost recovery.

Since a large proportion of shippers use static freight tables, there is also an opportunity to use this data to retrain those rates while building up the business case for 'real-time'.

Where to start?

Carrier invoice data will provide the target variable (dependent variable) in the predictive model.

The invoice will contain:

Carrier and service information.
Pickup and delivery addresses.
Cost information, including accessorial fees.
Date information, pick up and delivery dates.
And importantly, a reference number.

The invoice will seldom have:

Customer or item level information that is specific and consistent enough to be used for analysis. Receiver fields can be difficult to parse and map to organizations in your customer master data.
Cross border packages have rudimentary information for calculating border crossing charges but unless all your business is cross border, this will only cover some of your item level transportation cost.

When shippers use multiple carriers, they will often use freight audit and pay companies that aggregate carrier invoices and send them back to the shipper as a data feed or file drop. As the name suggests, the audit and pay company will audit carrier invoices, contest specific charges, and support the shipper’s processes that remit payment. Returning data will have both billed/ pre-audit and paid cost columns.

Typical Freight Invoice

Where to find the data inputs required for the prediction?

The independent variables or features for the predictive model will live in the shipper's Enterprise Resource Planning (ERP) system, including the Transportation Management System (TMS) module.

Once the shipment is ready, the carrier will provide a reference for the shipment and will later invoice against it. The official legal document between carrier and shipper is the Bill of Lading but there are other references like the Tracking Number, Waybill and Air Waybill. The tracking numbers encapsulate a package in Parcel and Less than Truckload (LTL) and a truck or container in Full Truck Load (FTL) and Ocean. For each shipment, the shipper (or designated logistics provider) will typically create an internal id in its ERP and will capture the carrier’s reference number against that internally generated id.

Synapsum starts by matching the external reference number retained by the shipper with the reference in the invoice data, thus connecting the shipper’s ERP with carrier cost data. This connection can then be used to attach other information that is pertinent to prediction from the shipper’s systems via ids linked to the delivery, especially: order id, item id, ship from and ship to id, bill to id.

Below is a simple example of an order that was split across three deliveries.

For delivery #1 and #2, there is a direct match between the tracking number provided by the carrier at point of shipment and retained by the shipper and the Bill of Lading field from audit and pay carrier billing data (yellow).
For delivery #3, the match is between the tracking number and Probill fields (green).
Some shippers will send their carriers a PO number or Order Number on receipt and attempt to track against that, in this example the shipper's Order Number is reflected back in a Reference field (blue). We find this approach to be less reliable, especially with LTL carriers. In this example, the Order Number is not populated in the reference field for delivery #3 (pink), it is captured in the Bill of Lading field instead.

Typical match between ERP (TMS) and Carrier invoice data for Deliveries in an Order

The matching process is straightforward if the shipper uses one or two carriers. For shippers with multiple carriers, matching becomes more complex and requires a more sophisticated approach.

Matching accuracy depends on mode and time elapsed from shipment.

Matches improve over time as invoices arrive at the audit and pay company. As seen below, the match rate picks up a month after shipment and reaches a max match by month four. The primary reasons for this pattern are delays in carrier billing invoice issuance/shipper receipt and payment terms causing delays in data release.

Indicative match rate for Parcel over time

Generally, Full-Truckload and Ocean will match better than Parcel and LTL, which have higher volumes.

Indicative match rate by mode

I linked my shipping costs to ERP data, now what?

The predictive model will require additional information that a business would know at the time an order is booked:

Mode (FTL, LTL, Parcel ground, etc.) - Modes have very different economic models and feature weights.
Service level e.g. Next day air, 2nd day air, etc. Rates increase by service level.
Order value - used to model border crossing costs.
Package weight - important for Parcel, less important for LTL and FTL.
Distance - important for LTL and FTL, less important for Parcel.
Product dimensions - often important but may be worked around.

Most of the above data is typically captured in the ERP, which becomes the source for training data for the model.

What is often missing from the ERP is distance (determined by ship from and to addresses), which can be acquired through third party data solutions such as PC Miler. This data can be accessed in bulk for model training and in real-time via API.

What about Package dimensions? Package dimensions are used to calculate billable weight, especially for Parcel. They can be useful data points but are often unavailable at the point of order because either the master data is unavailable or it is unknown how the order will be split into packages and the sizes of those packages.

Predictive modeling can succeed without product dimensions because package sizes tend to be consistent per an organization’s assortment. Seeing as parcel pricing is non-linear, if detailed dimensions are not available, one shortcut is to tag oversize or overweight products and use the binary feature (oversize/not oversize) for training.

Two final data-points to consider are ship date and promised ship date. These are used for modeling peak surcharges, Black Friday, Super Saturday, etc. especially in Parcel and LTL. They are challenging to use however, because to model peaks you need a few years of data and peak surcharges are not particularly consistent peak on peak or year on year.

What you can do next

Having put in the hard yards of acquiring the above data, the next post in this series will explore how you can apply this information to up-level your transportation operation and bring tangible benefits to your business.

If this stuff excites you, Synapsum offers a proof of value that makes this a reality in a fraction of the time and cost it would take internally. Email us at team@synapsum.com to schedule time to connect.