Main points
- Anti-corruption authorities (ACAs) are increasingly using big data proactively, not just reactively. E-procurement systems, asset and interest declaration platforms and beneficial ownership (BO) registers now generate large volumes of machine-readable information. ACAs can use this information to identify risks, prioritise cases and demonstrate results, but their ability to do so varies greatly across country contexts.
- ACAs typically rely on a small set of core data families used across different functions. The main sources are: (i) public procurement and contracting data; (ii) company and BO registers; (iii) income, interest and asset declarations; and (iv) financial intelligence and related datasets. These support corruption prevention (risk mapping, red-flagging, ex ante conflict of interest screening), investigations (network and pattern analysis) and prosecution (using evidence to open and build cases).
- Concrete country experiences show the promise of big data approaches. Examples include AI-supported procurement analytics, integrity screening in procurement based on asset-declaration data, large-scale e-declaration risk-scoring and BO-based network analysis. These demonstrate how linking datasets can prevent conflicts of interest, guide investigations and support asset recovery.
- Institutional arrangements for cross-government data sharing largely determine what ACAs can do with data. The most relevant datasets are usually held by other institutions (tax, customs, registries, FIUs, courts, procurement bodies). Effective use of big data depends on legal gateways, technical interoperability and governance arrangements that enable secure, purpose-limited sharing. Emerging practices include memoranda of understanding, joint analytical units and shared data hubs, but these must navigate legal, technical, organisational and political obstacles.
- In practice, the kind of big data ACAs would need to fully exploit advanced analytics is still rare, even in relatively digitalised contexts such as the EU. The main constraint is often the broader integrity-data ecosystem: procurement data may be available in bulk, but the additional linkable datasets needed for integrity analytics are frequently missing or unusable.



