Lu Zhang, Bob Carpenter, Aki Vehtari and I write:

We introduce Pathfinder, a variational method for approximately sampling from differentiable log densities. Starting from a random initialization, Pathfinder locates normal approximations to the target density along a quasi-Newton optimization path, with local covariance estimated using the inverse Hessian estimates produced by the optimizer. Pathfinder returns draws from the approximation with the lowest estimated Kullback-Leibler (KL) divergence to the true posterior. We evaluate Pathfinder on a wide range of posterior distributions, demonstrating that its approximate draws are better than those from automatic differentiation variational inference (ADVI) and comparable to those produced by short chains of dynamic Hamiltonian Monte Carlo (HMC), as measured by 1-Wasserstein distance. Compared to ADVI and short dynamic HMC runs, Pathfinder requires one to two orders of magnitude fewer log density and gradient evaluations, with greater reductions for more challenging posteriors. Importance resampling over multiple runs of Pathfinder improves the diversity of approximate draws, reducing 1-Wasserstein distance further and providing a measure of robustness to optimization failures on plateaus, saddle points, or in minor modes. The Monte Carlo KL-divergence estimates are embarrassingly parallelizable in the core Pathfinder algorithm, as are multiple runs in the resampling version, further increasing Pathfinder's speed advantage with multiple cores.

We didn't say it in the above abstract, but our original motivation for Pathfinder was to get better starting points for Hamiltonian Monte Carlo. At some point, though, it became clear that the way to do this is to get a variational approximation to the target distribution. The current title of the paper is actually "Pathfinder: Parallel quasi-Newton variational inference." I gave the above title to the blog post to draw attention to the idea that Pathfinder finds where to go.

In the world of Stan, we see three roles for Pathfinder: (1) getting a fast approximation (which should be faster and more reliable than ADVI), (2) serving as part of a revamped adaptation (warmup) phase, which again we hope will be faster and more reliable than what we have now, and (3) more effectively getting rid of minor modes.

This is a cool paper (and I'm the least important of the four authors), and I'm really excited about the idea of having this in Stan and other probabilistic programming languages.
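To make the core idea a bit more concrete, here is a minimal Python sketch of the "pick the best normal approximation along the optimization path" step. This is not the paper's algorithm: Pathfinder builds its local covariances from the L-BFGS inverse-Hessian factors and uses a more careful ELBO estimator, whereas the target density, isotropic covariance, and sample sizes below are purely illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import multivariate_normal

# Stand-in target: a correlated bivariate normal "posterior" (illustrative only).
target = multivariate_normal(mean=np.zeros(2),
                             cov=np.array([[1.0, 0.8], [0.8, 1.0]]))

def neg_log_density(x):
    return -target.logpdf(x)

# Run a quasi-Newton optimizer and record the iterates along its path.
x0 = np.array([5.0, -5.0])          # a deliberately poor initialization
path = [x0.copy()]
minimize(neg_log_density, x0, method="L-BFGS-B",
         callback=lambda xk: path.append(xk.copy()))

# Build a candidate normal approximation at each iterate and score it with a
# Monte Carlo ELBO estimate, E_q[log p(x) - log q(x)]; the highest ELBO is the
# lowest estimated KL divergence to the target.  (Pathfinder uses the
# optimizer's inverse-Hessian estimate as the covariance; the isotropic
# covariance here is a simplifying assumption.)
rng = np.random.default_rng(0)
best_elbo, best_q = -np.inf, None
for mu in path:
    q = multivariate_normal(mean=mu, cov=np.eye(2))
    draws = q.rvs(size=100, random_state=rng)
    elbo = np.mean(target.logpdf(draws) - q.logpdf(draws))
    if elbo > best_elbo:
        best_elbo, best_q = elbo, q

# Approximate posterior draws come from the best approximation found.
approx_draws = best_q.rvs(size=1000, random_state=rng)
print("best ELBO estimate:", best_elbo)
```

In the multi-path version described in the abstract, several independent runs like this would be pooled and the draws importance-resampled with weights proportional to p/q, which is what improves diversity and provides robustness to runs that stall on plateaus, saddle points, or minor modes.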