Volume 3, Issue 3, July 2013, Pages 701–713
Niladri Chatterjee1 and Pramod K. Sahoo2
1 Department of Mathematics, Indian Institute of Technology Delhi, Hauz Khas, New Delhi 110016, India
2 Institute for Systems Studies and Analyses, Defence Research and Development Organisation, Metcalfe House Complex, Delhi 110054, India
Original language: English
Copyright © 2013 ISSR Journals. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract
The application of Random Indexing (RI) to extractive text summarization has already been proposed in the literature. RI is an approximation technique for dealing with the high-dimensionality problem of Word Space Models (WSMs). The feature that distinguishes RI from other WSMs, such as Latent Semantic Analysis (LSA), is the near-orthogonality of its word vectors (index vectors), which helps in reducing the dimension of the underlying Word Space. The present work studies in detail the near-orthogonality property of random index vectors and its effect on extractive text summarization. A probabilistic definition of the near-orthogonality of an RI-based Word Space is presented and discussed thoroughly. Our experiments on DUC 2002 data show that while the quality of summaries produced by RI with the Euclidean distance measure is almost invariant to the near-orthogonality of the underlying Word Space, the quality of summaries produced by RI with the cosine dissimilarity measure is strongly affected by it. It is also found that RI with the Euclidean distance measure performs much better than many LSA-based summarization techniques. This improved performance of the RI-based summarizer over LSA-based summarizers is significant because RI is computationally inexpensive compared with LSA, which relies on Singular Value Decomposition (SVD), a computationally complex algebraic technique for reducing the dimension of the underlying Word Space.
Author Keywords: Word Space, Random Indexing, Index vector, Context vector, Near-orthogonal, PageRank.
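To make the near-orthogonality property concrete, the following is a minimal sketch in Python with NumPy. It builds sparse ternary index vectors in the style of standard Random Indexing (a few random +1/-1 entries, zeros elsewhere) and checks that their pairwise cosine similarities cluster near zero. The dimensionality, sparsity level, and function names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def make_index_vector(dim=2000, num_nonzero=10, rng=None):
    """Sparse ternary index vector: num_nonzero random entries
    set to +1 or -1, all other entries zero (standard RI style)."""
    rng = rng or np.random.default_rng()
    v = np.zeros(dim)
    positions = rng.choice(dim, size=num_nonzero, replace=False)
    v[positions] = rng.choice([1.0, -1.0], size=num_nonzero)
    return v

rng = np.random.default_rng(42)
vectors = [make_index_vector(rng=rng) for _ in range(100)]

# Pairwise cosine similarities: for high-dimensional sparse random
# vectors these are close to zero, i.e. the vectors are nearly orthogonal.
sims = []
for i in range(len(vectors)):
    for j in range(i + 1, len(vectors)):
        a, b = vectors[i], vectors[j]
        sims.append(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print("mean |cosine| over all pairs:", np.mean(np.abs(sims)))
```

In an RI-based summarizer, such index vectors would be accumulated into context vectors for sentences, which can then be compared with either the cosine dissimilarity or the Euclidean distance measure discussed in the abstract.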
How to Cite this Article
Niladri Chatterjee and Pramod K. Sahoo, “Effect of Near-orthogonality on Random Indexing Based Extractive Text Summarization,” International Journal of Innovation and Applied Studies, vol. 3, no. 3, pp. 701–713, July 2013.