Comm Corner
Internet Search Basics: Finding Things
How to Get Started
by John Woody

From PC Alamode Magazine (Alamo PC Organization), Comm Corner column

Have you ever gotten so much information that you did not know what to do with it? Or have you ever been frustrated at not finding what you were looking for on the Internet? Have you ever gotten any useful information from your Internet connection? Do you know how to use the browser search tools? What are browsers? What are search tools? Are we asking the right questions? Have you ever visited the <Search the World Wide Web> page on the Alamo PC Web site?

 Browser technology seems to be advancing faster than we can keep up with it. Microsoft Internet Explorer is in some advanced version, Netscape Navigator is at version 4.02 or higher, and I just upgraded from Netscape beta version .08 to version 2.0-something. So how do we use all this Internet technology?

 Browsers, as we all know, are client applications that reach out to other computers for the information we would like to access. Their roots go back a long way, to the directory search tools that were used to find files to download with File Transfer Protocol (FTP).

 These early search tools were Unix-based and not graphical. They were developed because one of the biggest problems has always been finding what you want on the Internet. Anonymous FTP was great for moving files from one place to another, but it gave no insight into what was in a file or where the file might be located.

 One of the first search tools was archie, which was developed to search the indexes or directories of files on public FTP servers; servers were set up to handle archie inquiries. Gopher was developed later along similar lines, complete with its own Gopher servers. Each search tool became a little more sophisticated, to the point where we can enter a query into our client application and find information without knowing where it is located. These are "stateless" search tools: each request is handled on its own, without the server keeping track of earlier ones. The World Wide Web (WWW), with its graphical interface, has made that capability even more accessible.

 Today's Web browsers can reach nearly any public server worldwide in the search for our information. Because the protocols the Web browsers are built on are layered onto the basic Transmission Control Protocol/Internet Protocol (TCP/IP), the same foundation used by FTP and Gopher, all of these earlier applications and servers are accessible from the browsers.

 Nearly all browsers have a basic search capability of their own: we can use the <Location> dialog box or the <Open> button, enter a new Uniform Resource Locator (URL) address in the dialog box, and click <send> to reach a remote Web site.
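Every URL begins with a scheme that tells the browser which protocol to use, which is how one browser can reach Web, FTP, and Gopher servers alike. The following minimal Python sketch uses the standard library's `urllib.parse.urlparse` to split a few URLs into protocol, host, and path. Only the Alamo PC address comes from this article; the other host names are hypothetical examples.

```python
from urllib.parse import urlparse


def describe(url):
    """Split a URL into (protocol, host, path) using the standard library."""
    parts = urlparse(url)
    return parts.scheme, parts.netloc, parts.path


if __name__ == "__main__":
    # Only www.alamopc.org comes from the article; the others are hypothetical.
    urls = [
        "http://www.alamopc.org/",
        "ftp://ftp.example.com/pub/readme.txt",
        "gopher://gopher.example.edu/",
    ]
    for url in urls:
        protocol, host, path = describe(url)
        print(f"{protocol:7} host={host:20} path={path}")
```

The scheme at the front ("http", "ftp", "gopher") is the part that selects the protocol. The rest tells the browser which server to contact and what to ask it for.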

The first URLs we seem to remember are those for the search tools. Search tools are something like a book's table of contents or index: a means of making it easy to find information within the book. Internet search tools, whether they are called search engines or robot spiders, work much like the tables of contents and indexes we are used to.

 Combined with today's browsers, the "search engine" robot spiders have become easy to use. They are available either from within the browser or directly on the Internet, which also makes it very easy to be overwhelmed with too much information. If you have visited the Alamo PC Web site (<www.alamopc.org>, then <Knowledge Icon>, then the <Hop> hypertext link, then the <Search the World Wide Web> hypertext link), you have discovered a real gem in the categorized search tools listed there. That <Search the World Wide Web> page groups the search tools into two columns, which we call Catalogs and Engines.
 
 

Catalog Services 

The Catalog services are generally organized by subject, usually in hierarchical order. This makes searching easy: you log onto the site, select your subject category, and keep moving deeper into the subject matter until you find what you are looking for. Catalog services are also called subject directories. Their hierarchical order provides two advantages. First, they are great for browsing. Second, they reduce the odds of irrelevant sites turning up in the search results. There is no set method of keeping the Catalog service categories current; most of these services rely on others to send them information for inclusion or update.
 
 

Search Engine Services

The search engine services are large databases of document listings or word listings that the robot spider has collected as it automatically cruises its path among the public servers.

These databases are really keyword indexes. They contain billions of words and must be maintained by the robot spider and indexing computers. The Alta Vista Web site, for example, is one of these keyword index databases. The Alta Vista database contains more than 30 million pages and 10 billion words, and the index takes up more than 40 GB of disk space. Maintaining the database takes six DEC Alpha computers with more than 11.5 GB of RAM and 463 GB of hard disk space.
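A keyword index of this kind is usually called an inverted index. Instead of listing the words on each page, it lists, for each word, the pages that contain it. The Python sketch below builds a toy one from three hypothetical pages. The URLs, the page text, and the simple lowercase-word tokenizer are assumptions for illustration only. A real service such as Alta Vista would typically store more information per entry, at vastly larger scale.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase words (a simplification of real indexers)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build a keyword (inverted) index: word -> set of page URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


if __name__ == "__main__":
    # Hypothetical pages; a real spider would fetch these from public servers.
    pages = {
        "http://example.com/a": "Alamo PC user group meeting notes",
        "http://example.com/b": "Search engine tips for PC users",
        "http://example.com/c": "Travel notes from San Antonio",
    }
    index = build_index(pages)
    print(sorted(index["pc"]))     # pages containing "pc"
    print(sorted(index["notes"]))  # pages containing "notes"
```

Once the index is built, looking up a word is a single dictionary access rather than a scan of every page. That is what makes searching billions of words practical.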

 Keyword index databases must constantly be rebuilt to keep them current, and this is the job of the robot spiders. The Alta Vista spider, named Scooter, crawls through the Web at a rate of three million pages per day. It constantly sends the updated data to the Alta Vista indexing computer, which can index 1 GB of document text per hour.
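A rough calculation from the figures above shows why the rebuilding never stops. For simplicity, assume that Scooter revisits each indexed page once per pass at a steady three million pages per day. One complete pass over 30 million pages would then take about ten days:

```latex
\frac{30 \times 10^{6}\ \text{pages}}{3 \times 10^{6}\ \text{pages/day}} = 10\ \text{days}
```

Because the database holds more than 30 million pages, a real pass would take somewhat longer under these assumptions. Some entries would therefore always be days out of date.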

 All of the keyword index database services work this way, and this is where their real benefit comes into play. The combination of a large database and a fast, effective search engine provides us with more information than we really know what to do with. I should note here that these robot spiders, or search engines, cannot get behind firewalls or through gateways. They can look at and index only what is publicly available.
 
 

How to Use These Services 

The first step in using either of these methods of Internet searching is to log on to the Internet and open the browser. Then use the <Location> or <Open> dialog box to enter the URL of the search tool of your choice and begin the search. Remember to bookmark the URL on your first visit so that you can easily return to the search tool's Web site next time.

 During catalog searches, go to the top category heading nearest your subject, then keep moving to lower (more specific) levels until you find the information you are looking for. You are browsing from within the catalog service. The advantage of catalog services is that the hierarchical nature of their subject directories tends to reduce the number of irrelevant documents. Their major disadvantage is that many of the subject directories may list only a small number of document sources.
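Browsing a catalog from the top down amounts to walking down a tree of categories. The following minimal Python sketch assumes that the catalog can be modeled as nested dictionaries whose innermost entries are lists of sites. Every category and site name is a hypothetical example, not taken from Yahoo or any real directory.

```python
# Hypothetical catalog: categories nest inside categories, and the innermost
# entries are lists of site names.
CATALOG = {
    "Computers and Internet": {
        "Internet": {
            "Searching the Web": ["Search tips page", "Search engine tutorial"],
        },
        "Software": {
            "Browsers": ["Browser news page"],
        },
    },
    "Recreation": {
        "Travel": ["Texas travel guide"],
    },
}


def browse(catalog, path):
    """Walk from the top-level heading down through each named subcategory.

    Raises KeyError if a requested category does not exist at that level.
    """
    node = catalog
    for category in path:
        if not isinstance(node, dict) or category not in node:
            raise KeyError(f"no subcategory named {category!r}")
        node = node[category]
    return node


if __name__ == "__main__":
    print(list(browse(CATALOG, ["Computers and Internet"])))  # next level of headings
    print(browse(CATALOG, ["Computers and Internet", "Internet", "Searching the Web"]))
```

Each name in the path narrows the subject by one level. A missing category stops the walk, much as a reader would back up and choose a different heading.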

 There is no standard way of collecting the subject information in the various catalog services. Many, such as Yahoo, wait for the data source to contact the catalog service and ask to have the subject information included. Catalog services do not use robot spiders to seek out information.

 During engine searches, load the search engine program, then compose the search query, also called the search string, in the dialog box provided. The search string is the text we want information about. All of the graphical search engine programs provide a dialog box for entering it. Once that is done, click the <send> button to submit the query.

The result is not easy to sort out, because we have uncovered 1,300,450 items containing the search string text. This makes the advantage of keyword engine searches very apparent: more information than we will ever want. It also shows the disadvantage: we need to know more about how the search engine indexes information and how its query mechanics work.

 A little reading of a search engine service's HELP section reveals the modifiers that service uses to refine searches.

Most of these programs use some form of Boolean logic to help filter the search string. The standard Boolean terms AND, OR, and NOT are supplemented by the plus (+) sign, quotation marks (" "), and other modifiers. It is a good idea to print the HELP section of one or two search engine services and keep it handy for future searches.
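Underneath the different syntaxes, the basic Boolean terms map onto simple set operations on a keyword index. AND keeps pages that contain every word (intersection), OR keeps pages that contain any of them (union), and NOT removes pages that contain a word (difference). The Python sketch below shows only that underlying logic on a tiny hypothetical index. It does not reproduce the query syntax of Alta Vista or any other real service, which is exactly what each HELP section explains.

```python
# Tiny hypothetical keyword index: word -> set of page names containing it.
INDEX = {
    "alamo": {"page1", "page2"},
    "pc": {"page1", "page3"},
    "travel": {"page2", "page4"},
}


def search_and(index, *words):
    """AND: pages that contain every word (set intersection)."""
    sets = [index.get(word.lower(), set()) for word in words]
    return set.intersection(*sets) if sets else set()


def search_or(index, *words):
    """OR: pages that contain at least one of the words (set union)."""
    return set().union(*(index.get(word.lower(), set()) for word in words))


def search_not(index, include, exclude):
    """NOT: pages that contain `include` but not `exclude` (set difference)."""
    return index.get(include.lower(), set()) - index.get(exclude.lower(), set())


if __name__ == "__main__":
    print(sorted(search_and(INDEX, "Alamo", "PC")))      # ['page1']
    print(sorted(search_or(INDEX, "PC", "travel")))      # ['page1', 'page2', 'page3', 'page4']
    print(sorted(search_not(INDEX, "alamo", "travel")))  # ['page1']
```

Where a service supports the plus (+) sign, it typically marks a word that must appear in every result, which is the same idea as AND.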
 
 

Which Service to Use

Now go back to the top of this article for the Alamo PC Web site URL, <www.alamopc.org>. From there, reach the Web search training page by clicking <Knowledge Icon>, then the <Hop> hypertext link, then the <Search the World Wide Web> hypertext link. There we find a linked page listing some of the more popular catalog and engine search services.

 The two best-known services are at the top of the lists: Yahoo is first in the catalog list, and Alta Vista is first in the engine list. The Search Hints and Tips on that Web page remind us to frame the search as a broad topic and use the catalog services to work down the hierarchical ladder to the refined topic we seek. We are browsing within the service's subject headings.

 The Search Hints and Tips section tells us to use one of the engine services if we want a focused search. For that, we will need to actually read the HELP section to learn our engine service's form of Boolean logic. Once we have applied the logic and entered the query into the dialog box as a search string, we can reap the returned information.
 
 

Conclusion

Try out the Alamo PC Web site and its Search the World Wide Web page. Do a little homework in the HELP sections of one or two search engine services to learn the logic each one uses, since each service applies the logic techniques a little differently. Use more than one service of each of the two types. The URLs for Alta Vista and Yahoo are:  Good searches!

 John Woody is a telecommunications consultant specializing in small business communications, networks, and Internet business training.