Volume 7, Issue 3, August 2014, Pages 1034–1037
Swati Vashisht1, Shubhi Gupta2, and Atul Mani3
1 Computer Science, Amity Group of Institutions, U.P., India
2 Computer Science, Amity Group of Institutions, U.P., India
3 Mechanical Engineering, RKGEC, U.P., India
Original language: English
Copyright © 2014 ISSR Journals. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract
Outlier mining is concerned with data objects that do not comply with the general behavior or model of the data, that is, objects that are either different from or inconsistent with the remaining data. Studying the extraordinary behavior of outliers helps uncover the knowledge hidden behind them and gives decision makers a way to increase profit or improve service quality. Mining for outliers is therefore an important area of data mining research with numerous applications, including credit card fraud detection, detection of criminal activity in e-commerce, unusual usage of telecommunication services, and weather forecasting. It is also useful in digital and customized marketing for identifying the spending behavior of customers with extremely low or extremely high incomes, and in medical diagnosis for finding unusual responses to various medical treatments.
Some data mining techniques discard outliers as noise or exceptions. In some applications, however, these exceptions are more interesting than regularly occurring events, for example in the detection of terrorist attacks. Challenges in outlier detection include finding appropriate data models, the dependence of outlier detection systems on the application involved, finding techniques to distinguish outliers from errors or exceptions, and justifying why an object is identified as an outlier. Outliers can be detected with the N-gram technique, but this technique requires a large amount of storage space for metadata and the data dictionary. A number of compression models, e.g. the context tree weighting method, LZ77, LZ78, and LZW, are used for compressing text and images. Burrows
Author Keywords: Outliers, Compression, N-gram technique, weighting methods, storage space.
How to Cite this Article
Swati Vashisht, Shubhi Gupta, and Atul Mani, “A nascent approach to mine outliers using compression,” International Journal of Innovation and Applied Studies, vol. 7, no. 3, pp. 1034–1037, August 2014.