Tuesday, March 10, 2020

Search Engine Structure essays

The Internet is a vast and overwhelming collection of information on any subject that can be imagined. To provide structure to this huge amount of information, search engines allow users to search for specific pieces of information. Search engines such as Google and Yahoo are technically known as information retrieval (IR) systems (Liddy, 2001). They work from indexes, which are built from the words in documents and from pointers within documents, and which are matched against the queries that users enter. The IR system that creates this index is structured around four elements: a document processor, a query processor, a search and matching function, and a ranking capability (Liddy, 2001).

The document processor prepares, processes, and inputs the documents that searches are run against (Liddy, 2001). Several functions are inherent in this process, including normalizing the document stream, breaking it into retrievable units, metatagging subdocument pieces, identifying indexable elements, etc. The first three functions are known as pre-processing, and their main aim is to standardize multiple formats. The nature and quality of search results are determined by the index identification stage.

Quality is also affected by the elimination of stop words. These are words that carry little meaning for the content of a query, such as "and", "but", "of", etc. Deleting them reduces search time and index size. Closely related is term stemming, which removes word suffixes. This reduces the number of unique words in an index, and again saves storage space. A disadvantage is that the precision and accuracy of search results may suffer. There is, however, the option of a strong or a weak stemming algorithm in order to regulate precision. Finally, the document processor extracts i...
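The document-processing steps described above (stop-word removal, stemming, and building an index that queries are matched against) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Google, Yahoo, or the system Liddy describes actually works: the stop-word list is a tiny sample, the "weak" and "strong" stemmers are toy suffix strippers rather than a real algorithm such as Porter's, and every name here (`STOP_WORDS`, `stem`, `process`, `build_index`, `search`) is hypothetical. Ranking is left out entirely.

```python
import re
from collections import defaultdict

# Illustrative stop-word list (an assumption); real lists are far longer.
STOP_WORDS = {"and", "but", "of", "the", "a", "an", "to", "in", "is", "for"}

# Toy suffix lists (assumptions). The strong stemmer strips more suffixes,
# so it merges more word forms at some cost in precision.
WEAK_SUFFIXES = ("es", "s")
STRONG_SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def stem(word, strong=False):
    """Strip the first matching suffix, keeping at least three characters."""
    suffixes = STRONG_SUFFIXES if strong else WEAK_SUFFIXES
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def process(text, strong=False):
    """Tokenize, drop stop words, and stem what remains."""
    return [stem(t, strong) for t in tokenize(text) if t not in STOP_WORDS]


def build_index(documents, strong=False):
    """Build an inverted index: stemmed term -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in process(text, strong):
            index[term].add(doc_id)
    return index


def search(index, query, strong=False):
    """Return the ids of documents that contain every processed query term."""
    terms = process(query, strong)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results
```

Note that the query has to go through the same `process` step as the documents, with the same `strong` setting used to build the index; otherwise the stemmed index terms and the query terms would not line up. Choosing the strong stemmer makes "indexing" and "index" match the same entry, which finds more documents but can also conflate unrelated words, which is the precision trade-off mentioned above.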
