Volume 3, Issue 3, July 2013, Pages 701–713
Niladri Chatterjee1 and Pramod K. Sahoo2
1 Department of Mathematics, Indian Institute of Technology Delhi, Hauz Khas, New Delhi 110016, India
2 Institute for Systems Studies and Analyses, Defence Research and Development Organisation, Metcalfe House Complex, Delhi 110054, India
Original language: English
Copyright © 2013 ISSR Journals. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract
The application of Random Indexing (RI) to extractive text summarization has already been proposed in the literature. RI is an approximation technique for dealing with the high-dimensionality problem of Word Space Models (WSMs). The feature that distinguishes RI from other WSMs, such as Latent Semantic Analysis (LSA), is the near-orthogonality of its word vectors (index vectors), which helps in reducing the dimension of the underlying Word Space. The present work studies in detail the near-orthogonality property of random index vectors and its effect on extractive text summarization. A probabilistic definition of the near-orthogonality of an RI-based Word Space is presented and discussed thoroughly. Our experiments on DUC 2002 data show that while the quality of summaries produced by RI with the Euclidean distance measure is almost invariant to the near-orthogonality of the underlying Word Space, the quality of summaries produced by RI with the cosine dissimilarity measure is strongly affected by it. It is also found that RI with the Euclidean distance measure performs much better than many LSA-based summarization techniques. This improved performance of the RI-based summarizer over LSA-based summarizers is significant because RI is computationally inexpensive compared with LSA, which relies on Singular Value Decomposition (SVD), a computationally complex algebraic technique for reducing the dimension of the underlying Word Space.
Author Keywords: Word Space, Random Indexing, Index vector, Context vector, Near-orthogonal, PageRank.
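To make the near-orthogonality property concrete, the following is a minimal sketch in Python with NumPy. It builds sparse ternary index vectors in the style of standard Random Indexing (a few random +1/-1 entries, zeros elsewhere) and checks that their pairwise cosine similarities cluster near zero. The dimensionality, sparsity level, and function names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def make_index_vector(dim=2000, num_nonzero=10, rng=None):
    """Sparse ternary index vector: num_nonzero random entries
    set to +1 or -1, all other entries zero (standard RI style)."""
    rng = rng or np.random.default_rng()
    v = np.zeros(dim)
    positions = rng.choice(dim, size=num_nonzero, replace=False)
    v[positions] = rng.choice([1.0, -1.0], size=num_nonzero)
    return v

rng = np.random.default_rng(42)
vectors = [make_index_vector(rng=rng) for _ in range(100)]

# Pairwise cosine similarities: for high-dimensional sparse random
# vectors these are close to zero, i.e. the vectors are nearly orthogonal.
sims = []
for i in range(len(vectors)):
    for j in range(i + 1, len(vectors)):
        a, b = vectors[i], vectors[j]
        sims.append(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print("mean |cosine| over all pairs:", np.mean(np.abs(sims)))
```

In an RI-based summarizer, such index vectors would be accumulated into context vectors for sentences, which can then be compared with either the cosine dissimilarity or the Euclidean distance measure discussed in the abstract.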
How to Cite this Article
Niladri Chatterjee and Pramod K. Sahoo, “Effect of Near-orthogonality on Random Indexing Based Extractive Text Summarization,” International Journal of Innovation and Applied Studies, vol. 3, no. 3, pp. 701–713, July 2013.