Data Mining

Data mining is the practice of analysing large data sets to find patterns, correlations, and relationships that aren't obvious from inspection. In ecommerce, data mining surfaces things like which product combinations sell together, which customer segments churn earliest, and which acquisition channels produce the highest-LTV customers.

What data mining actually does

  • Classification: assigning data points to predefined categories, using labelled examples to learn the rules. "These customers will likely churn" or "these orders are likely fraudulent."
  • Clustering: finding natural groupings without predefined labels. Used heavily in customer segmentation.
  • Association: identifying which items appear together. The classic "people who bought X also bought Y" recommendation.
  • Regression: modelling how a numeric outcome depends on other variables. For example, predicting LTV from acquisition channel, first-purchase product, and demographics.
  • Anomaly detection: flagging outliers — fraudulent orders, broken integrations, sudden traffic spikes from bots.
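To make the association idea concrete, here is a minimal sketch that scores product pairs by support, confidence, and lift. The baskets, the product names, the `pair_rules` helper, and the 0.3 support threshold are all hypothetical, chosen only for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy baskets: each order is the set of products it contains.
orders = [
    {"tent", "sleeping bag", "stove"},
    {"tent", "sleeping bag"},
    {"stove", "fuel"},
    {"tent", "sleeping bag", "fuel"},
    {"stove", "fuel"},
    {"sleeping bag"},
]

def pair_rules(orders, min_support=0.3):
    """Return (a, b, support, confidence, lift) for pairs meeting min_support."""
    n = len(orders)
    item_counts = Counter(item for order in orders for item in order)
    pair_counts = Counter(
        pair for order in orders for pair in combinations(sorted(order), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n                       # share of orders containing both
        if support < min_support:
            continue
        confidence = count / item_counts[a]       # P(b in order | a in order)
        lift = confidence / (item_counts[b] / n)  # above 1: together more often than chance
        rules.append((a, b, support, confidence, lift))
    # Strongest associations first
    return sorted(rules, key=lambda r: r[4], reverse=True)

rules = pair_rules(orders)
for a, b, support, confidence, lift in rules:
    print(f"{a} -> {b}: support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

Lift is the number to watch: a pair can have high confidence simply because one product is in most orders, while lift corrects for that baseline popularity.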

Where data mining shows up in ecommerce

  • Product recommendations: which products to show on the cart page, in post-purchase emails, in browse-abandonment flows.
  • Customer segmentation: finding behavioural segments that don't follow obvious demographic lines.
  • Churn prediction: identifying customers likely to lapse before they do, so retention can intervene.
  • Pricing and promotion: finding the price elasticity of specific SKUs and segments.
  • Fraud detection: flagging orders with patterns matching known fraud cases.
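As a rough illustration of the flagging step behind fraud and anomaly detection, the sketch below marks order values that sit far from the median, using a modified z-score based on the median absolute deviation (MAD). The order values, the `flag_outliers` helper, and the 3.5 cutoff are assumptions for illustration; production fraud tools combine many more signals than order value.

```python
from statistics import median

def flag_outliers(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread to measure against, so nothing can be flagged
    # 0.6745 rescales MAD so the score is comparable to a standard z-score
    return [
        i for i, v in enumerate(values)
        if abs(0.6745 * (v - med) / mad) > threshold
    ]

# Hypothetical order values in one currency; one order is far larger than the rest.
order_values = [42.0, 38.5, 51.0, 47.25, 40.0, 44.9, 39.0, 980.0, 45.5, 43.0]

print(flag_outliers(order_values))  # → [7]
```

The median and MAD are used instead of the mean and standard deviation because a single extreme order would otherwise inflate the very baseline it is being compared against.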

How data mining works in practice

For most Shopify brands, data mining doesn't mean building custom models from scratch. It usually means using tools that have data mining baked in: email and CRM platforms with churn prediction such as Klaviyo's predictive analytics, recommendation engines like Rebuy, fraud detection like Signifyd, and marketing analytics tools like Triple Whale that surface attribution patterns automatically.

Custom data mining (building models in Python or R against a data warehouse) typically becomes worth the investment at $20M+ in annual revenue, or for brands with non-standard data needs.
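For a sense of what "building models in Python" means at its very simplest, here is a sketch of a one-variable least-squares regression that predicts 12-month LTV from first-order value. The figures, the variable names, and the `fit_line` helper are hypothetical; a real model would pull data from the warehouse, use more features, and be validated on held-out customers.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical training data: one row per customer.
first_order_value = [20.0, 40.0, 60.0, 80.0, 100.0]
twelve_month_ltv = [50.0, 95.0, 130.0, 185.0, 215.0]

intercept, slope = fit_line(first_order_value, twelve_month_ltv)
predicted = intercept + slope * 70.0  # predicted LTV for a 70.00 first order
print(f"LTV = {intercept:.2f} + {slope:.2f} * first_order_value")
print(f"Predicted LTV for a 70.00 first order: {predicted:.2f}")
```

The modelling itself is rarely the expensive part. Most of the cost of going custom sits in getting clean, joined data into the warehouse and keeping the model maintained.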

Data mining vs. analytics vs. AI/ML

  • Analytics: the umbrella term — anything that turns data into insight, from dashboards to predictive models.
  • Data mining: the specific practice of finding patterns in data, often using statistical and ML methods.
  • AI/ML: overlaps heavily with data mining. Modern ML methods (classification, clustering, regression) are the same methods data mining uses; the distinction is more about scale and automation than any fundamental difference.

Common data mining mistakes

  • Mining first, asking questions second. Without a clear business question, data mining produces interesting-looking patterns that don't change any decisions.
  • Spurious correlation: patterns that exist in the data but don't reflect causal relationships. Especially common in small-sample ecommerce data, where checking many variables against a few hundred orders makes chance patterns likely.
  • Dirty data: garbage in, garbage out. Data mining on inconsistent UTM parameters or duplicate customer records produces unreliable results.
  • Building before buying: most ecommerce data mining needs are addressed by SaaS tools. Building custom is expensive and rarely produces a meaningful edge over the off-the-shelf option.
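The dirty-data problem above is often cheap to reduce before any mining starts. The sketch below normalises UTM source values and merges customer records that differ only by email casing or stray whitespace. The alias table, the field names, and both helper functions are hypothetical; real deduplication usually needs fuzzier matching than an exact email key.

```python
def normalise_utm(value):
    """Trim and lowercase a UTM value, then map known aliases to one label."""
    aliases = {"fb": "facebook", "ig": "instagram"}  # hypothetical alias table
    cleaned = value.strip().lower()
    return aliases.get(cleaned, cleaned)

def merge_customers(records):
    """Sum order counts per customer, keyed on the normalised email address."""
    merged = {}
    for record in records:
        key = record["email"].strip().lower()
        merged[key] = merged.get(key, 0) + record["orders"]
    return merged

# Five raw spellings that describe only two real sources.
raw_sources = ["Facebook", "facebook ", "fb", "FB", "google"]
sources = sorted({normalise_utm(s) for s in raw_sources})
print(sources)  # → ['facebook', 'google']

# Two of these three records belong to the same customer.
customers = [
    {"email": "Ana@Example.com", "orders": 2},
    {"email": "ana@example.com ", "orders": 1},
    {"email": "li@example.com", "orders": 4},
]
order_counts = merge_customers(customers)
print(order_counts)  # → {'ana@example.com': 3, 'li@example.com': 4}
```

Without this step, the same channel is counted as several smaller channels and one repeat customer looks like two one-time buyers, which skews both attribution and churn figures.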