SQL · Kimball · DuckDB · 10 business questions

96,497 orders.
One quiet finding that breaks the customer-retention story.

Two years of Brazilian e-commerce, 9 raw tables joined into a Kimball star schema, ten business questions. The one that landed: it isn't the seller's fault.

of orders that arrived 8+ days late
got rated 1 or 2 stars
Orders
96,497
delivered, 2 years
Revenue
R$ 14.7M
across 92k customers
Top 5 cats
53%
of total revenue
Revenue at risk
R$ 560k
late + 1-2 star orders
The story in three lines

Late deliveries don't just annoy customers. They collapse the review.

When an order arrived 8+ days past estimate, the average review score collapsed from 3.98 to 1.64. Once a review goes below 2 stars, the probability that customer ever returns drops to almost zero. So I traced the lateness back to its source.

The bottleneck wasn't the seller. Sellers shipped fast — an 8-day median, even on late orders. The carriers held the packages for 25.6 days on average before delivering. Almost 2× the on-time number.

That single insight, surfaced from a 9-table SQL warehouse, redirects the customer-retention conversation away from "improve seller onboarding" and toward the real lever: carrier accountability.

The headline insight

"The bottleneck isn't the seller. The carrier holds the package twice as long as everyone else combined."

The lateness, broken down

Where the days go on a late order.

A typical late delivery, decomposed. The accent segment is where the time actually disappears.

Day 0 · ordered Day 8 · seller shipped Day ~34 · delivered
Seller dispatch
8 d

Median time from purchase to seller handoff. The same on late orders as on on-time orders.

Carrier transit
25.6 d

Median carrier hold on late orders. The on-time baseline is 13.1 days — already long, but here it's ~2× worse.

Under the hood

What's inside the repo.

The Olist quirks worth knowing

What I learned the hard way.

~3% of orders are flagged canceled — exclude from revenue, keep for conversion math.

freight_value lives on the item table, not the order. Easy to triple-count if you SUM after joining items to orders without aggregating first.

Reviews can arrive months after delivery. Filter by review timestamp, not order timestamp, when measuring fresh sentiment.

November 2017 is a strong Black Friday peak. Treat it as a normal month and your cohort retention baselines lie to you.

The stack

Built with.

SQL DuckDB Python 3.12 pandas matplotlib seaborn Kimball star schema Power BI spec