Structure From Motion — Technical Glossary

Structure-from-motion (SfM) is the canonical photogrammetry pipeline for converting a set of overlapping 2D images into a 3D scene model. The pipeline has three stages. First, distinctive feature points are detected in each image and matched across image pairs (typically using SIFT, SURF or learned descriptors). Second, the relative camera poses (rotation and translation) are estimated from these matches by solving for the two-view geometry that explains them: the “essential matrix” when the camera intrinsics are known, the “fundamental matrix” when they are not. Third, bundle adjustment refines the camera poses and the 3D point positions jointly, minimising the reprojection error (the distance between each observed feature and the projection of its estimated 3D point) summed over all observations.
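The reprojection error that bundle adjustment minimises can be illustrated with a minimal sketch. It assumes an ideal pinhole camera with no lens distortion. The function names (project, reprojection_rmse) and all numeric values are hypothetical and chosen only for illustration; a real pipeline would hand these residuals to a nonlinear least-squares solver rather than just evaluating them.

```python
import numpy as np

def project(K, R, t, X):
    """Project Nx3 world points X to pixel coordinates with a pinhole camera."""
    Xc = X @ R.T + t                  # world frame -> camera frame
    uvw = Xc @ K.T                    # camera frame -> homogeneous pixel coordinates
    return uvw[:, :2] / uvw[:, 2:3]   # perspective divide

def reprojection_rmse(K, R, t, X, observed):
    """Root-mean-square pixel distance between projected and observed points."""
    residuals = project(K, R, t, X) - observed
    return float(np.sqrt(np.mean(np.sum(residuals ** 2, axis=1))))

# Hypothetical intrinsics: 800 px focal length, principal point at (320, 240).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)        # camera aligned with the world axes
t = np.zeros(3)      # camera at the world origin
X = np.array([[0.0, 0.0, 4.0],
              [1.0, -0.5, 5.0],
              [-0.5, 0.25, 2.0]])

# Synthetic "observations" generated from the true pose.
observed = project(K, R, t, X)

perfect = reprojection_rmse(K, R, t, X, observed)
# Perturb the camera translation by 5 cm along x; the error becomes non-zero.
shifted = reprojection_rmse(K, R, t + np.array([0.05, 0.0, 0.0]), X, observed)
```

With the true pose the error is zero; with the perturbed pose it grows, and it grows most for the points closest to the camera. Bundle adjustment searches for the poses and 3D points that drive this quantity down across every image at once.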

The output of SfM is a sparse point cloud plus a set of camera poses. The sparse cloud is useful for some applications directly (localisation, navigation, scene topology), and it is also the standard initialisation for denser downstream pipelines: multi-view stereo to build a dense mesh, or modern Gaussian-splat training to build a photoreal rendering. In the posemesh stack, Auki’s reconstruction servers run a custom SfM pipeline in Python, Rust and C++ that consumes scan jobs from the Domain Management Tool app and produces either point clouds or Gaussian splats depending on the use case.

The hardware bar for SfM is moderate. The pipeline is compute-intensive, but much of it (feature extraction and matching in particular, and increasingly bundle adjustment) parallelises well on GPUs, which is why decentralised reconstruction networks specify mid-tier NVIDIA GPUs (Auki’s published minimum is 8 CPU cores, 12 GiB of RAM, 8+ GiB of VRAM and CUDA 12.8). The bottleneck is usually not raw FLOPs but memory bandwidth and the size of the optimisation problem, which grows with the number of cameras and points as scenes get large; partitioning a building or a city into local SfM jobs and stitching the results together is the dominant practical pattern.
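One way to picture the partitioning pattern is a grid split with overlap, sketched below. This is an illustrative assumption, not a description of how Auki or any particular tool divides scans: it presumes that approximate 2D capture positions are available for each image, and the function name partition_with_overlap and its parameters are hypothetical.

```python
import math
from collections import defaultdict

def partition_with_overlap(positions, cell_size, overlap):
    """Group image indices into grid cells, padding each cell by `overlap`.

    An image that falls inside the padded bounds of several cells is assigned
    to all of them, so neighbouring jobs share images to stitch on.
    """
    jobs = defaultdict(list)
    for idx, (x, y) in enumerate(positions):
        # Range of cell indices whose padded bounds contain this position.
        x_lo = math.floor((x - overlap) / cell_size)
        x_hi = math.floor((x + overlap) / cell_size)
        y_lo = math.floor((y - overlap) / cell_size)
        y_hi = math.floor((y + overlap) / cell_size)
        for cx in range(x_lo, x_hi + 1):
            for cy in range(y_lo, y_hi + 1):
                jobs[(cx, cy)].append(idx)
    return dict(jobs)

# Four hypothetical capture positions (metres) along a corridor.
positions = [(1.0, 5.0), (9.5, 5.0), (10.5, 5.0), (18.0, 5.0)]
jobs = partition_with_overlap(positions, cell_size=10.0, overlap=1.0)
# Images 1 and 2 sit near the cell boundary and land in both jobs.
```

Each job can then be reconstructed independently, and the images shared between neighbouring jobs provide the common observations needed to align the local reconstructions into one model.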

For decentralised AI (DeAI), SfM matters because it is the most commoditised pre-processing step in the spatial pipeline. The algorithms are textbook and the open-source implementations (COLMAP, OpenMVG, AliceVision) are mature, which means a decentralised network of operators can run identical pipelines and produce comparable output. The competitive moat in a posemesh-style network is therefore not the SfM algorithm itself but the operator distribution, the data sovereignty model and the downstream rendering quality (which is increasingly defined by Gaussian splatting, not by classical multi-view stereo).

Related terms