Off-market property sourcing has historically been a rules game. Filter for "director over 65 + held the property 15+ years + no recent refinance" and you get a list. Send letters. Wait for responses. That motion works, but it is statistical noise dressed up as targeting.
GalimAI replaced that motion with a machine learning model in 2024. The model takes every UK property-owning company in the database and outputs a probability score: how likely is this entity to dispose of a freehold in the next 6 to 12 months? The score is rebuilt every quarter on new transaction data, and it materially outperforms rule-based filtering at every threshold we have tested.
What the model actually does
The model is a supervised classifier. The target variable is binary: did this property-owning company complete a freehold disposal within a defined forward window (6, 12, or 18 months) of the observation date?
The training data is the historical record of UK property transactions over the last several years, cross-referenced against the state of each owning entity at the time of the observation. For every owner-month in the training window, the model sees:
- The owner's full state at that point (director profile, hold period, charge stack, family ownership signals, late-filing history, regional context, property type).
- Whether or not a disposal followed within the prediction window.
From millions of these owner-month observations, the model learns which combinations of signals correlate with disposal. The output for each currently-active owner is a probability between 0 and 1, plus the top contributing features for that score so the prediction is explainable.
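The labelling step described above can be sketched in pandas. This is an illustrative sketch, not the production pipeline: the column names (`owner_id`, `obs_date`, `disposal_date`), the function name, and the toy rows are all assumptions made for the example.

```python
import pandas as pd


def label_owner_months(observations, disposals, window_months=12):
    """Attach a binary target to each owner-month (hypothetical schema).

    disposed = 1 if the owner completed a disposal after obs_date and no
    later than obs_date + window_months, else 0.
    Assumes (owner_id, obs_date) is unique in `observations`.
    """
    # Pair every observation with every disposal by the same owner.
    pairs = observations.merge(disposals, on="owner_id", how="left")
    window_end = pairs["obs_date"] + pd.DateOffset(months=window_months)
    # Owners with no disposals get NaT, which compares as False.
    in_window = (pairs["disposal_date"] > pairs["obs_date"]) & (
        pairs["disposal_date"] <= window_end
    )
    pairs["disposed"] = in_window.astype(int)
    # An owner-month is positive if any disposal falls inside its window.
    labels = pairs.groupby(["owner_id", "obs_date"], as_index=False)["disposed"].max()
    return observations.merge(labels, on=["owner_id", "obs_date"], how="left")


# Toy data: owner A is observed twice and disposes once; owner B never disposes.
observations = pd.DataFrame({
    "owner_id": ["A", "A", "B"],
    "obs_date": pd.to_datetime(["2023-01-31", "2023-06-30", "2023-01-31"]),
})
disposals = pd.DataFrame({
    "owner_id": ["A"],
    "disposal_date": pd.to_datetime(["2023-09-15"]),
})

labelled = label_owner_months(observations, disposals, window_months=6)
print(labelled)
```

With a 6-month window, only owner A's June observation is labelled positive, because the September disposal falls outside the window that starts in January. In a real training set, owner-months whose window runs past the last available transaction date would also need to be excluded, since their outcome is not yet known.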
The technology stack
The pipeline runs on:
- BigQuery as the data warehouse. Companies House, HM Land Registry, and Gazette feeds are normalised into a single owner-centric schema, with a snapshot table per quarter going back to the start of the training window.
- Python for feature engineering and model training. Standard scientific stack (pandas for data work, scikit-learn for baselines, gradient-boosted trees for the production model).
- Gradient boosting for the production model. Tree-based models handle the heterogeneous, mixed-type, partially-missing feature set the UK property data presents better than linear or deep models do at this data volume.
- Quarterly retraining, with the most recent transaction data added each cycle. Drift monitoring runs continuously between retrains.
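As one illustration of what a drift check between retrains might look like, the sketch below computes the Population Stability Index (PSI) for a single numeric feature. PSI is our choice for the example; the article does not say which drift metric the pipeline uses. The function name and the synthetic "hold period" data are hypothetical, and the sketch assumes the feature has no missing values.

```python
import numpy as np


def population_stability_index(reference, current, n_bins=10):
    """PSI between a reference sample and a current sample of one feature."""
    reference = np.asarray(reference, dtype=float)
    current = np.asarray(current, dtype=float)
    # Interior bin edges come from reference quantiles, so each bin holds
    # roughly the same share of the reference sample.
    inner_edges = np.quantile(reference, np.linspace(0, 1, n_bins + 1))[1:-1]
    ref_bins = np.searchsorted(inner_edges, reference, side="right")
    cur_bins = np.searchsorted(inner_edges, current, side="right")
    ref_counts = np.bincount(ref_bins, minlength=n_bins)
    cur_counts = np.bincount(cur_bins, minlength=n_bins)
    # Floor the shares so an empty bin does not produce log(0).
    eps = 1e-6
    ref_share = np.clip(ref_counts / reference.size, eps, None)
    cur_share = np.clip(cur_counts / current.size, eps, None)
    return float(np.sum((cur_share - ref_share) * np.log(cur_share / ref_share)))


rng = np.random.default_rng(0)
last_quarter = rng.normal(20.0, 5.0, 5000)  # synthetic hold periods, in years

psi_unchanged = population_stability_index(last_quarter, last_quarter)
psi_shifted = population_stability_index(last_quarter, last_quarter + 5.0)
print(psi_unchanged, psi_shifted)
```

An unchanged distribution scores zero and a shifted one scores well above it. A commonly quoted rule of thumb treats PSI above roughly 0.25 as a material shift, but any alert threshold would need tuning per feature.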
The choice of gradient boosting over a neural network is deliberate. The training set is large but the feature space is small, the signal-to-noise ratio is high, and explainability matters. Tree-based models also let us extract per-prediction feature attributions, so when the model says "this owner is in the top 1% probability for the next 6 months," we can show which signals drove that score.
What the model learns that humans miss
Rule-based filtering captures the obvious signals. Director age, hold period, portfolio size, basic distress flags. The model captures those too, but it also captures combinations and non-linearities that rules cannot.
Three patterns the model surfaces that rule-based filtering does not:
- Interaction effects. A 70-year-old director who has held a property for 20 years is a well-known sell-signal stack. The model finds that a 55-year-old director with the same hold period and one specific charge satisfaction in the last 12 months has comparable disposal probability. No hand-written rule set would have encoded that.
- Sequence patterns. The model uses time-series features (charge activity in the last 90 days, late-filing trend over the last 2 years, refinance cadence). Rules look at point-in-time state; the model looks at trajectory.
- Regional and SIC-specific calibration. The same signal stack predicts disposal at different rates in different regions and different SIC codes. The model learns these calibrations automatically; humans would not encode them.
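To make the "trajectory" idea concrete, here is a sketch of how a trailing-window feature and a simple trend could be derived in pandas. The column names, the function name, and the 90/180-day windows are assumptions for the example, not the model's actual feature definitions.

```python
import pandas as pd


def add_lookback_count(observations, events, start_days, end_days, name):
    """Count each owner's events aged [start_days, end_days) days at obs_date.

    Events dated after obs_date have a negative age and are never counted,
    so the feature only uses information available at observation time.
    Assumes (owner_id, obs_date) is unique in `observations`.
    """
    pairs = observations[["owner_id", "obs_date"]].merge(events, on="owner_id", how="left")
    age_days = (pairs["obs_date"] - pairs["event_date"]).dt.days
    # Owners with no events get NaN ages, which compare as False.
    pairs[name] = ((age_days >= start_days) & (age_days < end_days)).astype(int)
    counts = pairs.groupby(["owner_id", "obs_date"], as_index=False)[name].sum()
    return observations.merge(counts, on=["owner_id", "obs_date"], how="left")


# Toy data: owner A has three charge events; owner B has none.
observations = pd.DataFrame({
    "owner_id": ["A", "A", "B"],
    "obs_date": pd.to_datetime(["2024-03-31", "2024-06-30", "2024-06-30"]),
})
charges = pd.DataFrame({
    "owner_id": ["A", "A", "A"],
    "event_date": pd.to_datetime(["2024-01-15", "2024-03-01", "2024-06-01"]),
})

features = add_lookback_count(observations, charges, 0, 90, "charges_0_90d")
features = add_lookback_count(features, charges, 90, 180, "charges_90_180d")
# Positive trend: more charge activity recently than in the prior window.
features["charge_trend"] = features["charges_0_90d"] - features["charges_90_180d"]
print(features)
```

A point-in-time rule would see owner A as "has charge activity" on both dates. The trend column distinguishes activity that is picking up (March) from activity that is tailing off (June).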
How the prediction is used in practice
The portal exposes the model's output as a sortable column on every owner list. Buyers and sourcers filter their universe to (say) the top 10% by predicted sell-probability, then add their own constraints on region, property type, and portfolio size.
The lift over rule-based filtering is material. We do not publish specific accuracy numbers (overfitting risk, plus the right metric varies by use case), but directionally:
- The top decile of predicted sellers transacts at multiples of the base rate observed in the underlying universe.
- The lift is largest in the segments where rule-based filtering performs worst (multi-director ownership, mid-aged directors, mid-sized portfolios).
- The model surfaces a meaningful minority of high-conviction sellers that no rule-based filter would have identified.
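"Multiples of the base rate" is a lift figure. The sketch below shows how top-decile lift is computed from scores and observed outcomes; the function name and the numbers are made up for illustration and are not GalimAI results.

```python
def top_fraction_lift(scores, outcomes, fraction=0.10):
    """Disposal rate in the top `fraction` of scores divided by the base rate."""
    if len(scores) != len(outcomes) or not scores:
        raise ValueError("scores and outcomes must be non-empty and equal length")
    base_rate = sum(outcomes) / len(outcomes)
    if base_rate == 0:
        raise ValueError("lift is undefined when there are no positive outcomes")
    # Rank owners from highest to lowest predicted probability.
    ranked = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)
    k = max(1, round(len(ranked) * fraction))
    top_rate = sum(outcome for _, outcome in ranked[:k]) / k
    return top_rate / base_rate


# Toy universe of 20 owners with scores 0.00, 0.05, ..., 0.95.
scores = [i / 20 for i in range(20)]
outcomes = [0] * 20
for sold in (3, 10, 18, 19):  # four owners dispose; two have the top scores
    outcomes[sold] = 1

lift = top_fraction_lift(scores, outcomes, fraction=0.10)
print(lift)
```

Here the base rate is 4 in 20 and both owners in the top decile dispose, so the top decile transacts at five times the base rate. Computing the same ratio per segment gives the per-segment lift.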
Limitations and honest caveats
The model produces probabilities, not certainties. Three real limitations a thoughtful user should keep in mind:
- Macro shifts. The 2024 to 2026 transaction window is unusual (rate normalisation, regulatory changes, the bridge-debt overhang). The model is recalibrated to current conditions but cannot perfectly predict the next macro shift.
- Idiosyncratic causes. An owner sells because a family member died, or because they got an unsolicited offer they could not refuse. Events like these leave no trace in the data the model sees, so it cannot anticipate them.
- England and Wales coverage. The property linkage layer is England and Wales only. The model can score Scottish and Northern Irish companies on their Companies House signals but cannot incorporate property-level features for those jurisdictions.
The right way to use the score is as a prioritisation layer, not a guarantee. A top-decile owner is a much better outreach prospect than a randomly-selected one; that is what the model promises and what it delivers.
What makes this defensible
Three things make the GalimAI sell-timing model hard to replicate:
- The training data. Aggregating Companies House, HM Land Registry, and Gazette into a single owner-centric schema with quarterly snapshots going back years is non-trivial infrastructure. Most off-market platforms work off rule-based filters because they do not have a sufficiently complete or temporally indexed dataset to train a model on.
- The feedback loop. Letters sent through our pipeline, and the outcomes we track from them, feed back into the model. Each campaign cycle improves calibration.
- Domain-specific feature engineering. The 30+ features encode several years of accumulated knowledge about which UK property-owning signal stacks matter. That knowledge is the moat.
See sell-probability scores for any UK property owner
The GalimAI portal exposes the model output on every owner list. Filter to the top decile by sell-probability, layer your own constraints, and prioritise outreach where the data says it will land.
FAQ
How accurate is the sell-timing model?
We publish directional lift over baseline rather than headline accuracy numbers, because the right metric depends on the use case. For most prospect-list use cases, the relevant question is "what is the conversion rate of the top decile of predictions vs. a random sample?", and the answer is multiples of baseline. We are happy to walk through the per-segment lift on a customer call.
Is the model deterministic or probabilistic?
Probabilistic. Every owner gets a score between 0 and 1 representing the model's estimated probability of disposal within the prediction window. Scores are rebuilt every quarter; ranking within a quarter is stable.
What is the prediction window?
We expose 6-month and 12-month windows in the portal. The 6-month is sharper for outreach prioritisation; the 12-month is better for pipeline planning.
How is the model retrained?
Quarterly. Each cycle adds the latest transaction data and refreshes features. Drift monitoring between retrains alerts the team if input distributions shift materially.
Does the model work in Scotland?
Partially. We can score Scottish companies on their Companies House signals but cannot incorporate property-level features, because Scottish titles are held in a separate register (Registers of Scotland) that our property linkage layer does not cover. The model's strongest predictions are in England and Wales.
Can I see why a specific owner has a high score?
Yes. The portal exposes per-prediction feature attributions for the top contributing signals on any selected owner.