Really cool, I am also working on a port of gaussian-splatting [0] but to WebGPU.
Like all the other implementations I have seen so far, this one makes the same mistake when projecting the ellipsoids under perspective: first you calculate the covariance in 3D and then project that to 2D [1]. This approach only works with parallel / orthographic projections; applying it to perspective projections leads to incorrect results. That is because perspective projections have three additional effects:
- Parallax movements (that is, the view plane moves parallel to the ellipsoids) change the shape of the projected ellipse. E.g. a sphere only appears circular when it is in the center of the view; once it moves toward the edges it becomes stretched into an ellipse. I believe this effect is manually counterbalanced by this matrix [2].
- Rotating an ellipsoid can change the position it appears at, or in other words creates additional translation. This effect is zero if the ellipsoid has one of its three axes pointing straight at the view (parallel to the normal of the view plane). But if it is rotated 45°, then the tip of the ellipsoid that is closer to the view plane becomes larger through the perspective while the other end becomes smaller. Put together, this slightly shifts the center of the projected shape away from the projected center of the ellipsoid.
- Conic sections can result not only in ellipses but also in parabolas and hyperbolas. This, however, is an edge case that only happens when the ellipsoid intersects the view plane, and it can probably be ignored since one would clip away such ellipsoids anyway.
The last two effects are not accounted for in these calculations in any of the implementations I have seen so far. What would be correct to do instead? Do not calculate the 3D covariance. Instead, calculate the bounding cone around the ellipsoid that has its vertex at the camera position (perspective origin). Then intersect that cone with the view plane; the resulting conic section is guaranteed to be the correct contour of the perspective projection of the ellipsoid.
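To make the bounding cone idea concrete, here is a minimal sketch in plain Python. It is not taken from any of the linked implementations; the function names, the camera at the origin looking down +z, and the view plane at z = 1 are all assumptions for illustration. The ellipsoid is written as (p - c)^T A (p - c) = 1, and a ray t*d from the camera touches it exactly when the quadratic in t has a double root, which gives the cone d^T M d = 0 with M = (A c)(A c)^T - (c^T A c - 1) A.

```python
import math

def inverse_covariance(radii, angle_y):
    """A = R diag(1/r^2) R^T, with R a rotation about the y axis (assumed setup)."""
    c, s = math.cos(angle_y), math.sin(angle_y)
    R = [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]
    d = [1.0 / (r * r) for r in radii]
    return [[sum(R[i][k] * d[k] * R[j][k] for k in range(3))
             for j in range(3)] for i in range(3)]

def tangent_cone(A, center):
    """Symmetric 3x3 M such that d^T M d = 0 for every ray direction d
    (from a camera at the origin) tangent to (p - center)^T A (p - center) = 1."""
    Ac = [sum(A[i][j] * center[j] for j in range(3)) for i in range(3)]
    k = sum(center[i] * Ac[i] for i in range(3)) - 1.0
    return [[Ac[i] * Ac[j] - k * A[i][j] for j in range(3)] for i in range(3)]

def conic_center(M):
    """Center of the conic [x y 1] M [x y 1]^T = 0 on the plane z = 1."""
    a, b, d = M[0][0], M[0][1], M[1][1]
    e, f = M[0][2], M[1][2]
    det = a * d - b * b
    return (-(d * e - b * f) / det, -(a * f - b * e) / det)

# Ellipsoid dead center in view, 5 units away, long axis of radius 2.
position = [0.0, 0.0, 5.0]
radii = [2.0, 0.5, 0.5]

# Axis-aligned: contour is centered on the projected center (0, 0).
straight = conic_center(tangent_cone(inverse_covariance(radii, 0.0), position))

# Rotated 45 degrees about y: the contour center moves off (0, 0),
# toward the tip that is closer to the camera.
rotated = conic_center(tangent_cone(inverse_covariance(radii, math.pi / 4), position))
```

With these example numbers the rotated ellipsoid's contour center lands at roughly x = 0.08 on the z = 1 plane even though its 3D center projects to x = 0, which is the shift described in the second bullet above. The sketch only recovers the contour; it says nothing about how the density inside it should be distributed.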
In general, a Gaussian is no longer a true Gaussian after camera projection, since the pinhole camera projection function is nonlinear (due to dividing by z). However, if the Gaussian is small relative to the size of the image, you can approximate it by linearizing the projection function. That is why the Gaussian splatting paper uses the Jacobian of the projection function, as described in equation 5 of the paper [0]. In practice, this approximation is extremely good. This Jacobian is the matrix you mentioned in the third link, and it is mathematically sound, not "manually counter balanced". For a derivation, see [1].
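For reference, the linearization can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the mean and covariance are already in camera space, the focal length is 1, and the helper names are made up here rather than taken from the paper's code. The 2D covariance is J Σ J^T, where J is the Jacobian of (x, y, z) -> (x/z, y/z) evaluated at the mean.

```python
import math

def covariance(radii, angle_y):
    """Sigma = R diag(r^2) R^T, with R a rotation about the y axis (assumed setup)."""
    c, s = math.cos(angle_y), math.sin(angle_y)
    R = [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]
    v = [r * r for r in radii]
    return [[sum(R[i][k] * v[k] * R[j][k] for k in range(3))
             for j in range(3)] for i in range(3)]

def projection_jacobian(p):
    """Jacobian of the pinhole projection (x, y, z) -> (x/z, y/z) at point p."""
    x, y, z = p
    return [[1.0 / z, 0.0, -x / (z * z)],
            [0.0, 1.0 / z, -y / (z * z)]]

def project_gaussian(mean, cov):
    """Linearized projection: 2D mean is the projected 3D mean,
    2D covariance is J cov J^T."""
    J = projection_jacobian(mean)
    JS = [[sum(J[i][k] * cov[k][j] for k in range(3)) for j in range(3)]
          for i in range(2)]
    cov2d = [[sum(JS[i][k] * J[j][k] for k in range(3)) for j in range(2)]
             for i in range(2)]
    mean2d = (mean[0] / mean[2], mean[1] / mean[2])
    return mean2d, cov2d

# Same kind of test case as discussed in the thread: an elongated Gaussian
# dead center in view, rotated 45 degrees about the y axis.
mean2d, cov2d = project_gaussian([0.0, 0.0, 5.0],
                                 covariance([2.0, 0.5, 0.5], math.pi / 4))
```

Note that under this approximation the 2D mean is always the projection of the 3D mean, so for the rotated example it stays at (0, 0); only the 2D covariance responds to the rotation.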
I read the paper and I am aware that the Gaussian projection is an approximation anyway (hence I spoke about ellipsoids, not Gaussians). Still, one could at least aim to get the iso-contour right, and yes, using the Jacobian matrix is not unsound, just incomplete. As I said, this approach cannot produce the distinctive "wiggle" that you get from rotating an ellipsoid while staring dead center at it.
Yeah, I think you're right: they're pretending the projection is a linear transformation (in Cartesian coordinates) and using it to transform the Gaussian.
Or, viewed alternatively, they're approximating the projection by assuming all of the Gaussian is at a fixed depth, which I suppose works if it is far enough away.
A projective transformation of a Gaussian seems somewhat annoying, though I assume someone will have done it before. It seems like it should be possible to do it with projective coordinates, but the final projection to Cartesian coordinates is tricky.
For what it's worth, projecting a contour is also wrong: the whole density changes, which also affects the contours.
Hi. I'm not very familiar with the Gaussian splat technique, but aren't they essentially quads with some intrinsic data in the vertices? I thought projecting quads was already a solved problem. Could you elaborate on how this differs from a simple array of quads? Thank you.
If you can implement the intersecting bounding cone idea without impacting frame rates, that's going to be even smoother on WebGPU, but it would be interesting to see an apples-to-apples comparison with this type of implementation.
[0]: https://github.com/graphdeco-inria/gaussian-splatting
[1]: https://github.com/antimatter15/splat/blob/3695c57e8828fedc2...
[2]: https://github.com/antimatter15/splat/blob/3695c57e8828fedc2...