LSA: A Solution to Plato's Problem

urlwolf · on March 21, 2009

I can comment on this, since I did my PhD work with Landauer and Kintsch and was the webmaster of lsa.colorado.edu for a while.

Statistical semantics is a very active field right now. I'm not sure that the SVD is easier to implement than the probabilistic versions (LDA, topic models). The SVD code LSA uses requires sparse matrices and runs the Lanczos algorithm; it also needs to hold the entire matrix in memory at some point, which limits the size of the corpus you can handle. The probabilistic versions are iterative, and while they may take more CPU time, they are not memory-bound.
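
To make the SVD side concrete, here is a minimal sketch of LSA on a toy corpus with SciPy. It is not the code that ran on lsa.colorado.edu: the corpus, the variable names and the choice of k=2 are made up for illustration, and `scipy.sparse.linalg.svds` (an iterative Lanczos-type solver via ARPACK by default, as far as I know) stands in for whatever sparse SVD routine you have. Note that the whole sparse matrix `A` sits in memory, which is the scaling limit mentioned above.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds

# Hypothetical toy corpus: two "pet" documents, two "finance" documents.
docs = [
    "cat dog pet",
    "dog pet cat cat",
    "stock market trade",
    "market trade stock bank bank",
]

vocab = sorted({w for d in docs for w in d.split()})
index = {w: i for i, w in enumerate(vocab)}

# Sparse term-by-document count matrix; duplicate (row, col) entries are summed.
rows, cols, vals = [], [], []
for j, d in enumerate(docs):
    for w in d.split():
        rows.append(index[w])
        cols.append(j)
        vals.append(1.0)
A = csr_matrix((vals, (rows, cols)), shape=(len(vocab), len(docs)))

# Truncated SVD: keep only the k largest singular triplets.
k = 2  # assumption: must be smaller than min(A.shape)
u, s, vt = svds(A, k=k)

# Do not rely on the order svds returns; sort by descending singular value.
order = np.argsort(s)[::-1]
u, s, vt = u[:, order], s[order], vt[order, :]

term_vecs = u * s                # one k-dimensional row per term
doc_vecs = (vt * s[:, None]).T   # one k-dimensional row per document


def cosine(a, b):
    """Cosine similarity between two vectors in the reduced space."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


print("pet doc vs pet doc:    ", cosine(doc_vecs[0], doc_vecs[1]))
print("pet doc vs finance doc:", cosine(doc_vecs[0], doc_vecs[2]))
```

Real LSA setups usually apply a weighting such as log-entropy to the counts before the SVD and keep a few hundred dimensions; both are left out here to keep the sketch short.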

thomaspaine · on March 21, 2009

I believe that PLSA is generally preferred over LSA, and LDA is preferred over PLSA. PLSA is equivalent to LDA fit by maximum a posteriori estimation when you assume the prior is a uniform Dirichlet distribution. I guess LSA is probably the easiest to implement, since it just involves a singular value decomposition, but I don't know how often it's actually used in practice anymore.

http://en.wikipedia.org/wiki/PLSA

http://en.wikipedia.org/wiki/Latent_Dirichlet_allocation
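
For comparison with the SVD approach, the iterative flavor of the probabilistic models can be sketched with plain EM for PLSA. This is only a toy under stated assumptions: the function name `plsa`, its parameters and the count matrix are hypothetical, and the dense (documents x topics x words) array is fine for a handful of documents but is not how you would do it at scale, where you would loop over documents or mini-batches instead. LDA adds Dirichlet priors on top of this and is usually fit with variational inference or Gibbs sampling, which is not shown.

```python
import numpy as np


def plsa(counts, n_topics, n_iter=100, seed=0):
    """Fit PLSA with EM.

    counts: (n_docs, n_words) array of term counts.
    Returns P(z|d), P(w|z) and the log-likelihood before each update.
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    eps = 1e-12  # guards against division by zero and log(0)

    # Random starting point, each row normalized to a probability distribution.
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)

    log_liks = []
    for _ in range(n_iter):
        # joint[d, z, w] = P(z|d) * P(w|z); summing over z gives P(w|d).
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]
        mix = joint.sum(axis=1)
        log_liks.append(float(np.sum(counts * np.log(mix + eps))))

        # E-step: responsibilities P(z|d,w).
        resp = joint / (mix[:, None, :] + eps)

        # M-step: re-estimate both distributions from expected counts.
        weighted = counts[:, None, :] * resp
        p_w_z = weighted.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + eps
        p_z_d = weighted.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + eps

    return p_z_d, p_w_z, log_liks


# Hypothetical counts: rows are documents, columns are words.
# The first two documents use only the first three words, the last two only the rest.
counts = np.array([
    [1, 1, 1, 0, 0, 0, 0],
    [2, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1, 0],
    [0, 0, 0, 1, 1, 1, 2],
])

p_z_d, p_w_z, log_liks = plsa(counts, n_topics=2)
print(np.round(p_z_d, 3))
```

EM only finds a local optimum, so different seeds can give different topic assignments; the log-likelihood should not decrease from one iteration to the next, which makes it a handy sanity check.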