Tuesday, June 23, 2009

Exercise 23: Searching mechanisms

  1. How do search engines such as Alta Vista differ from information directories?
  2. I must admit that I was very surprised to see that AltaVista still had a presence on the web (http://www.altavista.com/), although it is debatable whether having a website constitutes a presence.
    I seem to remember that years ago it was one of the few big interfaces to the web, but now it just looks like a very poor man's Google.

    Having said all that, and having had to search for the connection between AltaVista, information directories, and this topic, (Wikipedia) provided the enlightenment. It turns out that AltaVista used a fast, multi-threaded crawler (Scooter) to trawl the net.

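    I am going to take a stab that an information directory is meant to represent a static information source, a catalogue of sites organised into categories and maintained by people, whereas AltaVista, which used a crawler to build its index automatically, could be thought of as a dynamic data source.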

  3. What is a spider? What does it do?
  4. A spider is an automated software program used to locate and collect data from web pages for inclusion in a search engine's database, and to follow links to find new pages on the World Wide Web (Eustace, p2).
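To make the definition concrete, here is a minimal sketch of a spider in Python. It is only an illustration under stated assumptions: the "web" is a small in-memory dictionary called FAKE_WEB (hypothetical, as are the example.com URLs), and fetch() stands in for a real HTTP request. It is not how Scooter or any real search engine crawler is implemented.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical stand-in for the web: URL -> HTML content.
FAKE_WEB = {
    "http://example.com/": '<a href="/a.html">A</a> <a href="/b.html">B</a>',
    "http://example.com/a.html": '<a href="/b.html">B</a> a page about spiders',
    "http://example.com/b.html": '<a href="/">home</a> <a href="/missing.html">x</a>',
}


def fetch(url):
    """Assumed helper: returns the page HTML, or None if the page does not exist."""
    return FAKE_WEB.get(url)


def crawl(seed, max_pages=100):
    """Breadth-first crawl: collect each page, then follow its links to new pages."""
    index = {}            # the "search engine database": URL -> page content
    queue = deque([seed])
    seen = {seed}         # URLs already queued, so no page is visited twice
    while queue and len(index) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue      # dead link, nothing to collect
        index[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)   # resolve relative links against the page URL
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index


index = crawl("http://example.com/")
```

Starting from the seed page, the sketch collects the home page, then a.html and b.html, and skips the dead link to missing.html. A real spider would also need to respect robots.txt, limit its request rate, and cope with malformed pages.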
  5. Describe a search situation where the requirement for recall is high?
  6. I had trouble pinning down exactly what this question is asking. In information retrieval, recall is the proportion of all the relevant documents in a collection that a search actually returns, so the requirement for recall is high when missing a relevant item matters more than wading through irrelevant ones. (Cole) describes one such situation in "Visualizing a high recall search strategy output for undergraduates in an exploration stage of researching a term paper". The abstract states that when accessing an information retrieval system, it has long been said that undergraduates who are in an exploratory stage of researching their essay topic should use a high recall search strategy; what prevents them from doing so is the information overload factor associated with showing the undergraduate a long list of citations. One method of overcoming information overload is summarizing and visualizing the citation list.
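As a small worked example, recall is usually computed as the share of all relevant documents that a search returns, and precision as the share of returned documents that are relevant. The Python sketch below compares a narrow query with a broad one; the document names and the two result lists are made up purely for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant   # relevant documents that were actually found
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical collection: four documents are relevant to the essay topic.
relevant_docs = {"d1", "d2", "d3", "d4"}

# A narrow query finds only relevant documents, but misses half of them.
narrow = precision_recall(["d1", "d2"], relevant_docs)

# A broad (high recall) query finds all of them, plus four irrelevant ones.
broad = precision_recall(
    ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"], relevant_docs
)
```

Here the narrow query scores a precision of 1.0 and a recall of 0.5, while the broad query scores a precision of 0.5 and a recall of 1.0. The broad query is the high recall strategy, and its longer result list is where the information overload comes from.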
  7. What is a meta-search engine? Provide some examples.
  8. According to (Wikipedia), a meta-search engine is a search tool that sends user requests to several other search engines and/or databases and aggregates the results into a single list or displays them according to their source. Metasearch engines enable users to enter search criteria once and access several search engines simultaneously. Metasearch engines operate on the premise that the Web is too large for any one search engine to index it all and that more comprehensive search results can be obtained by combining the results from several search engines. This also may save the user from having to use multiple search engines separately.

    Brainboost, ChunkIt!, Clusty, Dogpile, Excite, Harvester42, HotBot, Info.com, Ixquick, Kayak, LeapFish, Mamma, Metacrawler, MetaLib, Mobissimo, Myriad Search, SideStep, Turbo10, WebCrawler, DeeperWeb are all meta-search engines.
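The aggregation step can be sketched in a few lines of Python. This is only one possible merging strategy (summing reciprocal ranks), chosen for illustration; it is not claimed to be how any of the engines listed above work. The two engine functions and their results are hypothetical stand-ins for real search engines.

```python
from collections import defaultdict


def engine_a(query):
    """Hypothetical search engine: returns URLs in ranked order."""
    return ["http://a.example/1", "http://shared.example/x", "http://a.example/2"]


def engine_b(query):
    """A second hypothetical engine with a partly overlapping result list."""
    return ["http://shared.example/x", "http://b.example/1"]


def metasearch(query, engines):
    """Send one query to every engine and merge the results into a single list."""
    scores = defaultdict(float)
    for engine in engines:
        for rank, url in enumerate(engine(query), start=1):
            # A URL earns more for a high rank, and more again for each
            # additional engine that returns it.
            scores[url] += 1.0 / rank
    # Best score first; ties are broken alphabetically by URL.
    return sorted(scores, key=lambda url: (-scores[url], url))


merged = metasearch("spiders", [engine_a, engine_b])
```

The page returned by both engines rises to the top of the merged list, and duplicates appear only once, which is the "enter search criteria once" benefit described above.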

  9. What is spamming?
  10. According to (Yahoo), spam is any message or posting, regardless of its content, that is sent to multiple recipients who have not specifically requested the message. Spam can also be multiple postings of the same message to newsgroups or list servers that aren't related to the topic of the message. Other common terms for spam include UCE (unsolicited commercial email) and UBE (unsolicited bulk email).
  11. How can you get your site listed at major search sites; and how could you improve your site ranking?
  12. According to (Yahoo), to get your site listed and improve your ranking, do the following:
    • Submit your site
    • Add meta tags
    • Cultivate links to your site
    • Build a quality site
    • Keep your site fresh
    • Use a sitemap
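As an illustration of the sitemap step, the following Python sketch builds a minimal XML sitemap in the sitemaps.org format (a urlset element containing one url entry per page, each with a loc and an optional lastmod). The page URLs and dates are hypothetical, and build_sitemap is a name chosen for this example rather than part of any library.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML from (url, last-modified date) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc          # address of the page
        ET.SubElement(url, "lastmod").text = lastmod  # date in YYYY-MM-DD form
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical site with two pages.
sitemap_xml = build_sitemap([
    ("http://www.example.com/", "2009-06-23"),
    ("http://www.example.com/about.html", "2009-06-01"),
])
```

The resulting string would typically be saved as sitemap.xml in the root of the site and then submitted to the search engines, which ties in with the "submit your site" and "keep your site fresh" steps.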

References
Cole, C., Mandelblatt, B., & Stevenson, J. (2002). Visualizing a high recall search strategy output for undergraduates in an exploration stage of researching a term paper. Information Processing and Management, 38(1), pp. 37-54.
Eustace, K. Bots, agents, spiders and mobile computing. ITC382/ITC594/ITC565 Topic 10 lecture notes.
Wikipedia. AltaVista. Retrieved 13 July 2009, from http://en.wikipedia.org/wiki/AltaVista
Wikipedia. Metasearch engine. Retrieved 13 July 2009, from http://en.wikipedia.org/wiki/Metasearch_engine
Yahoo! Help. How can I get my site listed in search engines? Retrieved 13 July 2009, from http://help.yahoo.com/l/us/yahoo/smallbusiness/webhosting/promote/promote-05.html
Yahoo! Help. What is spam? Retrieved 13 July 2009, from http://help.yahoo.com/l/us/yahoo/smallbusiness/bizmail/spam/spam-21.html
