Keywords: Filters, Kalman, particle, smoother, augmented states, forward-backward, factor graph, pose graph, least squares, optimization
An Engineering Question
A question that often arises in the world of signal processing, estimation theory, and data fusion is: which is the "best" engineering solution among filters, forward-backward smoothers, and graph optimization designs?
The NavAbility Conclusion
Factor graphs are the best modeling language, and we continue to develop (and make freely available) our open-source solver. We strive to provide the leading hybrid (non-)parametric factor graph solver, capable of unimodal, multimodal, and non-Gaussian solutions.
NavAbility Products and technology-enabled services simplify the real-world process of bringing mapping, localization, calibration, and synthesizing capabilities to robotic systems.
TL;DR: Why
Because the listed methods (filtering, smoothing, optimization) are algebraically equivalent, and graph-based approaches provide two significant benefits:
Factor graphs are a more capable modeling language (e.g. including non-linear and non-Gaussian aspects) than traditional linear-algebra modeling approaches. This allows for greater reuse of solver features across more applications, including more efficient computation than, say, a Kalman filter design. See here for more information on how non-Gaussian information originates.
Factor graphs are the engineer's friend. Factor graphs are built by combining factors, and each measurement type has its own factor. A "factor" (or observation model) is therefore only ever implemented once, is readily reused in any application, and can be mixed with any other factor types in near-infinite combinations. Modifying the estimation system or dealing with sensor failures becomes much, much simpler.
Motivation and Background
Filtering (a.k.a. Hidden Markov Models)
Filters usually have two main computational steps, namely prediction and correction. Prediction and correction can occur in any order, or at almost any rate. Predictions use a model of the system dynamics to predict where the system states will be at different (or future) time steps.
Corrections incorporate measurements to adjust the predicted estimates towards what is being measured. While this basic framework works well under ideal circumstances, a few problems can quickly arise, e.g.:
Corrections must follow a rigid structure with regard to the estimated states, which can create difficulties when tracking states through erroneous measurement data.
Filters generally only estimate the most recent time epoch of states. In order to have more information and more fluid estimation of states, designers often resort to augmented-state filters. This dramatically increases computational load beyond the theoretically required compute cost.
Although filter designs are conceptually similar across applications, implementations are difficult to migrate from one application to another. Any system change often requires major rework of the data fusion software.
See more information about the HMM standard form here.
Kalman Filtering
The Kalman filter was developed and first used during the Apollo lunar program (1960s) and has been a popular method for data fusion ever since; it remains a popular blueprint for many state estimation systems being developed today. Kalman filtering assumes all inputs and measurements have Gaussian distributions and that the noise is uncorrelated. Any resilience to non-unimodal or "outlier" measurements must be envisioned and catered for by the system developers.
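To make the prediction and correction steps concrete, here is a minimal linear Kalman filter sketch in plain NumPy (the matrices F, Q, H, R stand in for a particular application's dynamics and measurement models; this is an illustrative sketch, not NavAbility solver code):

```python
import numpy as np

def kf_predict(x, P, F, Q):
    """Propagate the state mean x and covariance P through linear dynamics F with process noise Q."""
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    return x_pred, P_pred

def kf_correct(x_pred, P_pred, y, H, R):
    """Correct the prediction with measurement y, observation model H, and measurement noise R."""
    S = H @ P_pred @ H.T + R                          # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)               # Kalman gain
    x = x_pred + K @ (y - H @ x_pred)                 # corrected mean
    P = (np.eye(P_pred.shape[0]) - K @ H) @ P_pred    # corrected covariance
    return x, P
```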
In particular, note that augmented-state filters can be corrected via iterated Gauss-Newton update steps, allowing the linearization points and variable state estimates to better approach the least-squares equivalent estimate (a.k.a. iterated filters).
A major advantage of Kalman filter systems is that their predictable computation times make it possible to design and develop real-time data fusion or signal tracking systems. This predictable nature of Kalman filtering has had a major influence on adoption of the filter, but it has also prevented designers from adopting different kinds of inference systems.
The complexity of inferring the posterior belief over higher-dimensional state variables is a consequence of the hidden Markov model approach. All historic information is captured in the state and dense covariance estimate; however, there is inherent sparse structure within the problem that can significantly reduce computational cost, and this structure is not apparent in the oversimplified hidden Markov model view.
Particle Filtering
A popular and relatively easy alternative filtering approach is to use particles to approximate the distribution over the estimated states. Prediction steps with particles can be similar to other filters, while corrections are usually done by importance sampling. This allows the computation to track non-Gaussian beliefs, but also introduces a few drawbacks.
For example, it is assumed that particles are available in all parts of the domain (alphabet), since the influence of a measurement can only be expressed through importance weights on existing particles. This so-called particle depletion problem is exponentially exacerbated as the dimension of the state vector increases.
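As a rough sketch of one predict/correct cycle (the propagate and likelihood callbacks are hypothetical, user-supplied models, and the resampling threshold is arbitrary):

```python
import numpy as np

def particle_step(particles, weights, propagate, likelihood, y, rng=None):
    """One predict/correct cycle of a bootstrap particle filter.

    particles  : (N, d) array of state samples
    propagate  : user-supplied sampler for the dynamics model
    likelihood : user-supplied p(y | particle), evaluated per particle
    """
    rng = rng or np.random.default_rng()
    particles = propagate(particles)                 # prediction by sampling the dynamics
    weights = weights * likelihood(y, particles)     # correction by importance weighting
    weights = weights / weights.sum()
    # Resample when the effective sample size collapses (particle depletion).
    n_eff = 1.0 / np.sum(weights ** 2)
    if n_eff < 0.5 * len(weights):
        idx = rng.choice(len(weights), size=len(weights), p=weights)
        particles = particles[idx]
        weights = np.full(len(weights), 1.0 / len(weights))
    return particles, weights
```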
Forward-Backward Filtering (the other “smoothing”)
Some literature or marketing refers to forward-backward smoothing. This is usually the same filter design, run forwards and then backwards in time over the same data. Our comment here is that a properly designed filter (including iterated filters) should already have arrived at the optimal solution in a single forward pass.
Note, our use of the term Smoothing (as in the title) refers instead to graph based inference techniques.
Graphical Representation of Filtering
A frequent discussion point is the correspondence between Kalman, particle, or log-flow filtering strategies and factor graph formulations. This section aims to shed light on that relationship, and to show that factor graph interpretations are a powerful generalization of existing filtering techniques.
Figure 1 illustrates a graphical model representation, at time k, of the state Xk and covariance estimate Pk, which are propagated forward in time and updated by measurements Yk. This is commonly referred to as a hidden Markov model (HMM): Markov, since each next state depends only on the previous state, and hidden, since the state vector X may contain latent variables that are not directly observable through the measurement model producing Y.
At each time step, the posterior belief is marginalized down to the current state and used as a prior. The state prior represents all previous information that has been marginalized out (gray X variables), while the current state variable (green X) and measurement prediction (blue Y) are still fluid.
Figure 1: The previous posterior marginal state belief (gray X) and the predicted next state (green Xk+1). Measurement observations (blue Y) are related to the current state (green Xk) through the prediction and measurement models, as shown.
Solving over all variables produces a smoothing result, while solving and marginalizing forward along the HMM produces exactly the same set of equations used for the Kalman filter.
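Written out in standard HMM notation (consistent with Figure 1; a sketch rather than a full derivation), the joint posterior factorizes into dynamics and measurement factors. Inferring all of X0:k from this product is smoothing, while recursively marginalizing out past states gives the filtering recursion, which under linear-Gaussian models reduces to the familiar Kalman prediction and correction equations:

```latex
% Joint posterior of the HMM in Figure 1: dynamics and measurement factors
p(X_{0:k} \mid Y_{1:k}) \;\propto\; p(X_0) \prod_{i=1}^{k} p(X_i \mid X_{i-1})\, p(Y_i \mid X_i)

% Recursive forward marginalization recovers the filtering recursion
p(X_k \mid Y_{1:k}) \;\propto\; p(Y_k \mid X_k) \int p(X_k \mid X_{k-1})\, p(X_{k-1} \mid Y_{1:k-1})\, \mathrm{d}X_{k-1}
```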
Why Graph Representations are Better
Compared to an optimal Kalman filter design, one major realization is that the inverted state covariance matrix P is sparse! That is, the inverse of P has structure that can be exploited for faster computation. To be more specific, the square root of the inverse covariance matrix encodes the bi-adjacency matrix of the underlying factor graph. To illustrate, the next section looks at a few basic factor graph examples.
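A quick toy illustration of that sparsity (plain NumPy, not tied to any particular solver): for a short Markov chain with a prior and unary measurements, the information matrix built from the stacked factor Jacobians is tridiagonal, while its inverse, the covariance, is fully dense:

```python
import numpy as np

# Toy 1-D Markov chain x_{i+1} = x_i + w_i, with a prior on x_0 and a unary
# measurement of every state (all noises whitened to unit variance for brevity).
# Each factor contributes one row to A, whose sparsity pattern is the
# factor-to-variable bi-adjacency structure; the information matrix is Lambda = A^T A.
n = 6
rows = []
row = np.zeros(n); row[0] = 1.0; rows.append(row)               # prior factor on x_0
for i in range(n - 1):                                          # relative (dynamics) factors
    row = np.zeros(n); row[i], row[i + 1] = -1.0, 1.0; rows.append(row)
for i in range(n):                                              # unary measurement factors
    row = np.zeros(n); row[i] = 1.0; rows.append(row)
A = np.vstack(rows)

Lambda = A.T @ A                  # tridiagonal: sparse, mirrors the graph structure
Sigma = np.linalg.inv(Lambda)     # the covariance, by contrast, is fully dense
print(np.count_nonzero(Lambda), "nonzeros in Lambda vs", np.count_nonzero(Sigma), "in Sigma")
```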
To recap: factor graphs provide a tractable language (a unifying perspective) for large-scale, non-linear, belief-space interactions of many variables and factors (likelihood models). The interaction of information from various sensors is naturally described by adding the associated measurement likelihood models (algebraic functions) between variable nodes in a graph. Factor graphs can then be used to develop the associated inference algorithms that produce the desired state estimates.
Basic Factor Graph Examples
The simplicity of filtering approaches also poses some problems, namely the complexity of estimating the posterior distribution over high-dimensional state variables (especially non-Gaussian estimates), and a rigid design philosophy that limits the flexibility of combining various measurements of opportunity. Consider introducing wheel odometry or leg kinematic measurements into an existing filtering solution: this requires major re-engineering to develop a new Kalman filter solution.
To start, consider a system evolving through time, with estimates made at discretized times {t0, t1, t2, ...}, as in Figure 2 below.
Figure 2: Basic factor graph example with green variable states at times {t0, t1, t2, ...}, the system dynamics model through time as blue relative factors, and direct absolute observations as red prior factors.
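A minimal sketch of the Figure 2 chain posed as (Gaussian, for brevity) nonlinear least squares, using generic SciPy rather than any particular factor graph library; the measurement values are illustrative only:

```python
import numpy as np
from scipy.optimize import least_squares

# Illustrative measurement values only (four states at times t0..t3).
odometry = [1.0, 1.0, 1.0]        # blue relative factors between consecutive states
priors = {0: 0.0, 3: 3.2}         # red absolute prior factors on selected states

def residuals(x):
    res = [(x[i + 1] - x[i]) - dz for i, dz in enumerate(odometry)]   # relative factors
    res += [x[i] - z for i, z in priors.items()]                      # prior factors
    return np.array(res)

x0 = np.zeros(len(odometry) + 1)              # initial guess for {t0, t1, t2, t3}
sol = least_squares(residuals, x0)            # smooths over all states simultaneously
print(sol.x)
```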
Now consider a multi-sensor system along with data transmission delays, variable sampling rates, etc.; when designing a filtering system to track one or multiple targets, it quickly becomes difficult to augment state vectors with the required state and measurement histories. In contrast, the factor graph as a language allows heterogeneous data streams to be combined in a common inference framework.
Figure 3: Basic factor graph model with green state variables (X) and dynamics relative factors in blue. Direct and absolute prior information is included via red prior factors. Note, the top-left prior indicates the fully marginalized history up to variable (X0). Observations (Y) are related to the states indirectly via observation models, i.e. the orange relative factors. In other words, the states X may include latent quantities not directly observed in (Y).
Factor graphs are constructed along with the evolution of time, which allows the inference algorithm to resolve variable marginal estimates both forwards and backwards in time; conventional filtering only allows forward-backward "smoothing" as two separate processes. When inferring over a factor graph, all variables and factors are considered simultaneously, according to the topological connectivity, irrespective of when and where measurements were made or communicated, as long as the factor graph (probabilistic model) captures the stochastic qualities of the situation with sufficient accuracy.
Comparison Algorithms
EKF-SLAM
Early simultaneous localization and mapping (SLAM) methods used Kalman filters as the inference (i.e. data fusion) algorithm. These filters use augmented state vectors that contain both the pose history and the landmarks of interest. Note that the associated covariance matrix estimate becomes large and dense. We now know this approach is overly expensive to compute.
Equivalent graph approaches require less compute: filter operations require inversion of the covariance estimate, which dominates the computational load and is cubic in the state dimension. The most efficient factor graph solutions can achieve computational cost as low as order n log n in the state dimension for the same problem.
MSC-KF
Major contributions to visual-inertial odometry came in 2007 with the introduction of the Multi-State Constrained Kalman Filter (MSC-KF) by [Mourikis et al.]. The MSC-KF employs tight coupling between inertial measurements and visual feature tracking, in that feature position estimates depend directly on the state estimates from the visual-inertial Kalman filter.
The MSC-KF approach uses both an augmented-state Kalman filter and Rao-Blackwellized batch smoothing updates for landmark estimates. That is, the camera pose locations are estimated with the Kalman filter, assuming the landmark positions are fixed; once new pose estimates are available, the positions of the landmarks are estimated with a least-squares approach, this time keeping the pose positions fixed. Filter correction updates from landmarks first use a null-space projection, via the Schur complement, to avoid double counting of information.
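As a hedged sketch of that projection step (following the standard MSC-KF formulation rather than any particular implementation): linearize the visual residual about the current pose and feature estimates, then project onto the left null space of the feature Jacobian so the corrected residual depends only on the pose states:

```latex
r \;\approx\; H_x \tilde{x} + H_f \tilde{f} + n,
\qquad A^{\top} H_f = 0
\;\;\Rightarrow\;\;
r_o = A^{\top} r \;\approx\; A^{\top} H_x \tilde{x} + A^{\top} n
```

The projected residual r_o can then be applied as a standard EKF correction without double counting the landmark information.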
The augmented-state approach used by the MSC-KF is very similar to EKF-SLAM. Furthermore, incorporating loop closures from repeat sightings of places in the world requires additional workarounds in the MSC-KF design.
FastSLAM
In FastSLAM, a set of particles is used to approximate the belief over all possible trajectories, where each particle represents an entire trajectory.
All trajectory poses and landmark positions are stacked in one vector, and each such vector represents one particle. Each vector is then augmented with new pose information, using the odometry model, to update the entire system. Multi-hypothesis approaches therefore require exponentially many particles to explore the entire space of possible solutions.
FastSLAM offloads the problem of hypothesis selection to the user, and internally loses hypotheses at the resampling step. Lost modes cannot be recovered, which represents a fundamental difference in the method for multi-modal inference. Furthermore, the augmented state variables can be Rao-Blackwellized, such that the landmarks are separated out and estimated in an iterative fashion alongside the pose state variables.
For example, under correct data association, a single particle can be used to recover the full SLAM solution. However, FastSLAM does not exploit structure within the joint probability distribution.
While the FastSLAM approach is able to track non-Gaussian beliefs, each variation across the entire trajectory and landmark positions requires a different particle.
In turn, exponentially many particles are needed to cover all the variable dimensions for an accurate posterior estimate, which is prohibitively inefficient.
Since belief complexity scales exponentially in the dimension of a particle, one would rather use more particles that each represent a much smaller state vector.
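As a structural sketch only (hypothetical field names, no particular library), a Rao-Blackwellized FastSLAM particle carries one sampled trajectory plus small independent per-landmark Gaussians, rather than one giant stacked state vector:

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class FastSLAMParticle:
    """One trajectory hypothesis with conditionally independent landmark EKFs."""
    poses: list = field(default_factory=lambda: [np.zeros(3)])   # sampled trajectory x_{0:k}
    landmarks: dict = field(default_factory=dict)                # landmark id -> (mean, covariance)
    weight: float = 1.0

def propagate(particle: FastSLAMParticle, odometry_sample: np.ndarray) -> FastSLAMParticle:
    """Sample the next pose from the odometry model; each landmark keeps its own small
    EKF, updated only when that landmark is observed (the Rao-Blackwellized part)."""
    particle.poses.append(particle.poses[-1] + odometry_sample)
    return particle
```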
Next Time: Bayes Tree
In the meantime, see our paper on Bayes tree marginalization and recycling computations:
Fourie, D., Espinoza, A.T., Kaess, M. and Leonard, J., 2021. Characterizing marginalization and incremental operations on the Bayes tree. In Algorithmic Foundations of Robotics XIV: Proceedings of the Fourteenth Workshop on the Algorithmic Foundations of Robotics 14 (pp. 227-242). Springer International Publishing.
Classically, factor graph and Bayes tree structures have focused on parametric optimization. We will discuss non-Gaussian aspects in more detail in other posts.
Dig Deeper
See the following links for more details:
For a list of related literature with citations and deeper discussions, see our Documentation page here.
See one of our Application Examples that uses multimodal solutions to overcome navigation-affordance duality here.
See one of our Application Examples on non-Gaussian odometry using RADAR data.