Preliminary Investigation of Collecting Malaysia Web Pages Using Automated Traversing Tool

Zaitun Abu Bakar, and Yin, Tai Sock (2004) Preliminary Investigation of Collecting Malaysia Web Pages Using Automated Traversing Tool. Malaysian Journal of Computer Science, 17 (1). pp. 52-64. ISSN 0127-9084

Full text not available from this repository.

Official URL: http://mjcs.fsktm.um.edu.my/detail.asp?AID=288

Affiliations

University of Malaya, Faculty of Computer Science & Information Technology

Abstract

Over the past few years, Malaysia web pages have gained popularity on the Internet because of increasingly strong demand for localized content. Such huge document sets have introduced many new challenges for searching efficiently among all other web pages. This study investigates an automated traversing prototype that implements breadth-first and depth-first approaches to gather Malaysia web pages from the web. In the introduction, we describe the structure of the web and the traversing approaches. We then briefly discuss the experimental set-up, which investigates automating web traversal to gather Malaysia web pages and compares the quality of the information found by the two traversing approaches, breadth-first and depth-first. This is followed by a presentation and analysis of the results. Finally, the paper describes how these traversal approaches can produce different results.
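The difference between the two traversing approaches named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' prototype: the link graph `LINKS`, the page names, the `traverse` function, and the `limit` page budget are all hypothetical, and the graph is held in memory so that no network access is needed. The only change between the two strategies is which end of the frontier the next page is taken from.

```python
from collections import deque

# Hypothetical link graph (an assumption for illustration only):
# each page maps to the pages it links to, in on-page order.
LINKS = {
    "home.my": ["news.my", "shop.my"],
    "news.my": ["sports.my", "politics.my"],
    "shop.my": ["cart.my"],
    "sports.my": [],
    "politics.my": [],
    "cart.my": [],
}


def traverse(links, seed, strategy="breadth", limit=None):
    """Return the pages in the order they are visited, starting from seed.

    strategy is "breadth" (queue) or "depth" (stack); limit is an optional
    cap on the number of pages visited, standing in for a crawl budget.
    """
    frontier = deque([seed])
    seen = {seed}  # pages already queued, so no page is visited twice
    order = []
    while frontier and (limit is None or len(order) < limit):
        # Breadth-first takes the oldest frontier entry,
        # depth-first takes the newest one.
        if strategy == "breadth":
            page = frontier.popleft()
        else:
            page = frontier.pop()
        order.append(page)
        out_links = links.get(page, [])
        if strategy == "depth":
            # Reverse so the first link on a page is followed first.
            out_links = list(reversed(out_links))
        for target in out_links:
            if target not in seen:
                seen.add(target)
                frontier.append(target)
    return order


bfs_pages = traverse(LINKS, "home.my", "breadth", limit=3)
dfs_pages = traverse(LINKS, "home.my", "depth", limit=3)
print(bfs_pages)  # ['home.my', 'news.my', 'shop.my']
print(dfs_pages)  # ['home.my', 'news.my', 'sports.my']
```

With the same seed and the same three-page budget, the breadth-first run collects both pages linked from the seed, while the depth-first run follows one branch downward, so the two strategies gather different page sets from the same graph.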

Item Type: Journal
Keywords: Web navigation, Traversal approaches, Search engines, Information retrieval, World Wide Web
Subjects: Q Science
ID Code: 417

