Why Machine Learning Beats Rule-Based Filtering for UK Off-Market Property Sourcing

Rule-based filtering is intuitive, transparent, and easy to build. It is also limited in ways that compound the more you use it. This post is about why GalimAI replaced rule-based filtering with machine learning prediction across the core sourcing workflow, and what changed when we did.

What rule-based filtering does well

To be fair to the approach: rule-based filtering has real strengths.

Transparency. A user knows exactly why an owner is on the list ("they match these four filters").
Speed to build. A new filter combination can be configured and tested in minutes.
Auditability. Regulators and compliance teams can read the rules and confirm what is and is not in scope.
Stability. The same input always produces the same output. No model drift, no quarterly recalibration.

For a small portion of the UK property sourcing workflow (regulatory exclusion lists, hard customer constraints), rule-based filtering is still the right tool. We have not removed it.

Where rule-based filtering breaks down

There are five places where rule-based filtering systematically underperforms once you have enough training data to do better.

1. Interaction effects

The rule "director over 65 + held 15+ years" captures a known sell-signal stack. The rule does not know that, within that universe, the subset with a specific recent charge satisfaction pattern is 2 to 3x more likely to dispose than the subset without it. Rules are flat; the underlying reality is hierarchical and interactive.
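To make the interaction point concrete, here is a minimal sketch, not the production model. The field name recent_charge_satisfied, the base rate, and the lift value are all assumptions chosen for illustration; the lift is simply picked from inside the "2 to 3x" range mentioned above.

```python
from dataclasses import dataclass

# Illustrative numbers only; they are not measured rates.
BASE_RATE = 0.04        # assumed disposal rate inside the rule's universe
INTERACTION_LIFT = 2.5  # assumed lift, picked from the "2 to 3x" range


@dataclass
class Owner:
    director_age: int
    years_held: int
    recent_charge_satisfied: bool  # hypothetical feature name


def rule_match(owner: Owner) -> bool:
    """The flat rule: director over 65 and held 15+ years."""
    return owner.director_age > 65 and owner.years_held >= 15


def interaction_aware_rate(owner: Owner) -> float:
    """Disposal rate for an owner inside the rule's universe.

    The charge flag only changes the rate in combination with the
    other two signals, which is what a flat rule cannot express.
    """
    if not rule_match(owner):
        raise ValueError("owner is outside the rule's universe")
    lift = INTERACTION_LIFT if owner.recent_charge_satisfied else 1.0
    return BASE_RATE * lift


with_charge = Owner(director_age=72, years_held=18, recent_charge_satisfied=True)
without_charge = Owner(director_age=72, years_held=18, recent_charge_satisfied=False)

# The rule sees two identical matches; the interaction term separates them.
print(rule_match(with_charge), rule_match(without_charge))
print(interaction_aware_rate(with_charge), interaction_aware_rate(without_charge))
```

A real model would learn the lift from data instead of having it hard-coded. The sketch only shows the shape of what a flat rule leaves out.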

2. Non-linear thresholds

The rule "director age > 65" treats a 66-year-old and an 85-year-old identically. The underlying disposal probability is not flat above 65. Rules force step-changes; the data has gradients.
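The difference between a step and a gradient can be sketched in a few lines. The logistic curve below is an assumed shape with made-up midpoint and steepness parameters, used only to contrast with the rule. The true relationship between age and disposal probability has to be learned from data and need not look like this.

```python
import math


def rule_flag(age: float) -> int:
    """The rule: 1 for any director over 65, 0 otherwise."""
    return 1 if age > 65 else 0


def smooth_age_score(age: float, midpoint: float = 75.0, steepness: float = 0.15) -> float:
    """An assumed logistic curve in age; parameters are illustrative."""
    return 1.0 / (1.0 + math.exp(-steepness * (age - midpoint)))


for age in (60, 66, 75, 85):
    # The flag is identical for 66 and 85; the smooth score is not.
    print(age, rule_flag(age), round(smooth_age_score(age), 3))
```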

3. Trajectory vs state

Rules look at point-in-time state ("does this owner currently have a late filing?"). The signal that actually matters is often the trajectory ("has this owner's late-filing frequency increased over the last 18 months?"). Rules cannot express trajectory; time-series features in a model can.
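A trajectory feature of this kind is straightforward to compute once filing dates are available. The sketch below is one possible construction, not the feature definition used in the model: it approximates 18 months as 548 days and compares the count of late filings in the most recent window with the count in the window before it. The function names and sample dates are hypothetical.

```python
from datetime import date, timedelta

WINDOW = timedelta(days=548)  # roughly 18 months (an approximation)


def has_recent_late_filing(late_dates, as_of):
    """Point-in-time state: any late filing inside the latest window?"""
    return any(as_of - WINDOW < d <= as_of for d in late_dates)


def late_filing_trend(late_dates, as_of):
    """Trajectory: late filings in the latest window minus the prior window."""
    recent = sum(1 for d in late_dates if as_of - WINDOW < d <= as_of)
    prior = sum(1 for d in late_dates if as_of - 2 * WINDOW < d <= as_of - WINDOW)
    return recent - prior


as_of = date(2024, 6, 30)
# Made-up filing histories for two owners.
worsening = [date(2022, 3, 1), date(2023, 9, 1), date(2024, 1, 15), date(2024, 5, 1)]
improving = [date(2021, 9, 1), date(2022, 2, 1), date(2022, 10, 1), date(2023, 2, 1)]

for name, history in (("worsening", worsening), ("improving", improving)):
    print(name, has_recent_late_filing(history, as_of), late_filing_trend(history, as_of))
```

Both owners look the same to the state check, since each has a recent late filing. Only the trend separates the owner whose filings are deteriorating from the one whose filings are recovering.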

4. Regional and sector calibration

The same signal stack does not predict the same probability in the South East as in Wales, or for a residential SPV as for a mixed-use commercial holding. Rules apply globally; the data calibrates by region and sector. A model that is given region and sector as inputs learns those calibrations from the data rather than having them hand-coded.
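One simple way to see the calibration problem is to compute the observed disposal rate per segment for owners who all match the same rule. The records below are invented purely to exercise the function, and segment_rates is a hypothetical helper, not part of any GalimAI API.

```python
from collections import defaultdict


def segment_rates(records):
    """Observed disposal rate per (region, sector).

    records: iterable of (region, sector, disposed) tuples, where every
    owner is assumed to match the same signal stack.
    """
    counts = defaultdict(lambda: [0, 0])  # segment -> [disposals, total]
    for region, sector, disposed in records:
        bucket = counts[(region, sector)]
        bucket[0] += int(disposed)
        bucket[1] += 1
    return {segment: d / n for segment, (d, n) in counts.items()}


# Toy records, not real outcomes.
toy_records = [
    ("South East", "residential", True),
    ("South East", "residential", False),
    ("South East", "residential", False),
    ("South East", "residential", False),
    ("Wales", "residential", True),
    ("Wales", "residential", False),
]

for segment, rate in segment_rates(toy_records).items():
    print(segment, rate)
```

A global rule assigns every one of these owners the same status. A per-segment rate, or a model with region and sector as inputs, does not.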

5. Feedback loops

A rule does not improve when a campaign succeeds or fails. A model does, provided the outcomes flow back into training. Every disposal outcome captured in the data becomes a training example for the next quarter's predictions. Rule-based systems are static; ML systems compound.

What changes in practice

Three operational differences show up when sourcing runs on ML predictions instead of rule-based filtering.

Conversion rate at the top of the list

Both approaches return a ranked list. The top of the rule-based list is the set of owners who match the most filter conditions. The top of the ML list is the set of owners the model assigns the highest probability of disposal in the prediction window.
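The two orderings can be sketched side by side. Everything here is illustrative: the owner records are made up, and the p_dispose field stands in for whatever probability a trained model would output.

```python
def rank_by_rule_matches(owners, rules):
    """Rule-based ordering: most matched filter conditions first."""
    return sorted(owners, key=lambda o: sum(rule(o) for rule in rules), reverse=True)


def rank_by_probability(owners, predict):
    """ML ordering: highest predicted disposal probability first."""
    return sorted(owners, key=predict, reverse=True)


# Three filters in the style discussed earlier in this post.
rules = [
    lambda o: o["director_age"] > 65,
    lambda o: o["years_held"] >= 15,
    lambda o: o["late_filing"],
]

# Made-up owners; p_dispose is a placeholder for a model score.
owners = [
    {"id": "A", "director_age": 70, "years_held": 20, "late_filing": True, "p_dispose": 0.10},
    {"id": "B", "director_age": 58, "years_held": 9, "late_filing": False, "p_dispose": 0.30},
    {"id": "C", "director_age": 67, "years_held": 16, "late_filing": False, "p_dispose": 0.05},
]

rule_order = [o["id"] for o in rank_by_rule_matches(owners, rules)]
ml_order = [o["id"] for o in rank_by_probability(owners, lambda o: o["p_dispose"])]
print(rule_order, ml_order)
```

Owner B matches none of the filters, so it sits at the bottom of the rule-based list, yet it still receives a score and can rank first when the model rates it highly.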

Empirically, on the same underlying universe, the top of the ML list converts at a multiple of the rate of the top of the rule-based list. The lift is largest in segments where rule-based filtering performs worst (multi-director ownership, middle-aged directors, mid-sized portfolios, less-obvious distress patterns).

Surface area of non-obvious prospects

Rule-based filtering identifies prospects whose signals match a small set of pre-coded patterns. Owners whose signals do not match any rule are invisible to the system, even if they are in fact likely sellers. ML scoring is universe-wide: every owner gets a score, including the ones who do not match any rule.

A meaningful minority of high-converting prospects come from this surface-area expansion. They are the owners no rule would have flagged but the model identifies because their combination of weaker signals adds up to a strong prediction.

Continuous improvement

The model retrains every quarter. Each retrain incorporates the most recent transaction data plus the campaign outcomes from the prior period, so the system gets better at the same task without extra manual effort. Rule-based systems require manual rule revision to incorporate new signals; ML systems incorporate them automatically through retraining.

What ML does not replace

ML is not a substitute for everything. Three places we still rely on rules or human judgement:

Hard exclusions. Regulatory exclusions, customer-defined no-contact lists, geographic carve-outs. These are rules, not predictions. Get them wrong and you have a compliance problem.
Sensitive segments. Owners flagged with formal Gazette distress signals are scored by the model but also reviewed by a human before outreach. The model is good but not perfect, and the reputational cost of contacting the wrong person in a sensitive moment is high.
Letter copy. The response-prediction model selects between pre-written templates but it does not write copy. Writing the templates is a human creative problem.

The hybrid that actually works

The GalimAI workflow is hybrid in practice. Rules handle exclusions and hard constraints at the front of the pipeline. ML handles ranking, prioritisation, and template selection in the middle. Humans handle copy, sensitive-segment review, and customer judgement at the end.
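The ordering of those three stages can be sketched as a single function. This is a simplified illustration under stated assumptions, not the actual pipeline: run_pipeline, the gazette_distress flag, and the score callable are hypothetical names, and template selection is left out.

```python
def run_pipeline(owners, excluded_ids, score, top_n):
    """Rules first, model second, human-review flag last."""
    # 1. Hard exclusions are rules and are applied before any scoring.
    eligible = [o for o in owners if o["id"] not in excluded_ids]
    # 2. The model ranks whatever survives the exclusions.
    ranked = sorted(eligible, key=score, reverse=True)[:top_n]
    # 3. Sensitive segments are flagged for a person to review before outreach.
    return [
        {
            "id": o["id"],
            "score": score(o),
            "needs_human_review": bool(o.get("gazette_distress", False)),
        }
        for o in ranked
    ]


# Made-up owners; "p" stands in for a model score.
sample_owners = [
    {"id": "a", "p": 0.20},
    {"id": "b", "p": 0.90, "gazette_distress": True},
    {"id": "c", "p": 0.50},
    {"id": "d", "p": 0.95},
]

shortlist = run_pipeline(sample_owners, excluded_ids={"d"}, score=lambda o: o["p"], top_n=2)
print(shortlist)
```

Putting the exclusion step first means an excluded owner never reaches the ranked list, however high the model would have scored them.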

The hybrid is more accurate than rules alone and safer than ML alone. That is the architecture that ships.

Why most off-market platforms have not made the move

Three reasons most UK off-market platforms still run on rule-based filtering:

Training data. Building an ML model for sell-timing requires owner-centric historical transaction data with quarterly snapshots going back years. Most platforms do not have that infrastructure.
Feedback loop. Without owning the outreach pipeline, the platform never sees outcomes. Without outcomes, the model cannot learn. Most platforms sell lists and never close the loop.
Domain knowledge. Even with the data and the loop, the feature engineering that turns raw Companies House and HM Land Registry filings into model-ready features requires several years of accumulated knowledge about which UK property signals actually matter. That knowledge is the moat.

GalimAI built the infrastructure, owns the outreach pipeline, and has the domain knowledge. That is why the ML approach is operationally viable here and not at most competitors.

See ML-scored prospects in the portal

GalimAI exposes the model's sell-probability and response-probability scores on every owner list. Filter by score, layer your own constraints, and run outreach to the prospects the model scores as most likely to convert.

FAQ

Is ML always better than rules?

No. For hard constraints (regulatory exclusions, customer carve-outs) rules are the right tool. For prediction and prioritisation, ML wins once you have enough training data and a feedback loop. The right architecture is hybrid.

Can I see the rules and the ML separately?

Yes. The portal exposes rule-based filters and ML scores as separate columns. You can sort, filter, and combine as you like.

Does the ML model explain its predictions?

Yes. Per-prediction feature attributions are exposed for any selected owner. You can see which signals drove a high or low score.
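As an illustration of what a per-prediction attribution looks like, here is a minimal sketch for a logistic model, where each feature's contribution to the log-odds is its weight times its value. The weights, bias, and feature names are invented, and the attribution method used in the portal may differ.

```python
import math

# Hypothetical weights and bias for a toy logistic model.
WEIGHTS = {"years_held": 0.05, "late_filing_trend": 0.4, "director_over_65": 0.6}
BIAS = -3.0


def predict_with_attributions(features):
    """Return (probability, per-feature contribution to the log-odds)."""
    contributions = {name: w * features.get(name, 0.0) for name, w in WEIGHTS.items()}
    log_odds = BIAS + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-log_odds))
    return probability, contributions


owner_features = {"years_held": 20, "late_filing_trend": 2, "director_over_65": 1}
probability, contributions = predict_with_attributions(owner_features)

# Show the strongest drivers of the score first.
for name, value in sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True):
    print(name, round(value, 3))
print(round(probability, 3))
```

For a linear model the contributions plus the bias sum exactly to the log-odds, which makes the explanation easy to check. Non-linear models need a dedicated attribution method to get the same property.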

How often does the model retrain?

Quarterly, with continuous drift monitoring between retrains.
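One widely used drift check is the population stability index (PSI), which compares the distribution of scores at training time with the distribution seen now. The sketch below is a generic version and an assumption about what monitoring could look like, not a description of the monitoring actually run between retrains. It assumes non-empty lists of scores in the range 0 to 1.

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population stability index between two lists of scores in [0, 1]."""

    def shares(scores):
        counts = [0] * bins
        for s in scores:
            # Clamp so a score of exactly 1.0 falls in the last bin.
            counts[min(int(s * bins), bins - 1)] += 1
        # Floor empty bins at eps to avoid log(0) and division by zero.
        return [max(c / len(scores), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Synthetic score samples, for illustration only.
baseline_scores = [i / 100 for i in range(100)]
shifted_scores = [min(s + 0.2, 0.999) for s in baseline_scores]

print(psi(baseline_scores, baseline_scores))
print(psi(baseline_scores, shifted_scores))
```

Identical distributions give a PSI of zero, and the value grows as the two diverge. The alert threshold that should trigger an early retrain is a judgement call for whoever operates the model.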

What if a competitor copies your features?

The feature list is only partly defensible. What makes the model accurate is the training data and the closed-loop outreach feedback. A competitor with the same features but without the training data and the feedback loop would not get the same accuracy.