Understanding search engine word segmentation algorithms and full-text retrieval technology

Segmentation principles are constantly changing and being updated, so we should keep learning and focus on grasping the essence.


Love Shanghai often splits a query according to the weight each word carries in its lexicon, and the calculation of those weights is complex and takes many factors into account. What a search engine has to do is return the results the user most wants. When webmasters consider an issue from the user's point of view, they are in effect considering it from the search engine's point of view. Whether you are choosing target keywords or long-tail keywords, you can select them according to the principles of Chinese word segmentation, which keeps wasted effort to a minimum.

Full-text retrieval technology

The rapid growth of the Internet has made people's lives more convenient, but the sheer volume of information can be overwhelming; search engines let us quickly find the answers we want. Understanding how a search engine's word segmentation algorithm works can give a website a better chance of being shown in search results. Before explaining Chinese word segmentation technology, we first need to understand full-text retrieval technology.

Since most readers are familiar with Love Shanghai, consider its own Chinese word segmentation technology. Segmentation methods include forward maximum matching, reverse maximum matching, optimal matching, and expert-system methods. Maximum matching is the most commonly used solution: it is a mechanical algorithm that segments Chinese text by building a dictionary and matching the longest dictionary entry at each step. A simple example: a search for "where is Peking University" returns results that contain the term "Peking University", because the search engine judges the best match and treats "Peking University" as a single word when indexing and retrieving. Of course, maximum matching also has limitations. With long words, search engines sometimes segment inaccurately, or cannot cleanly separate a word from the characters before and after it. For example, a search for "synthetic molecules" may return results for "union", "composition", and "midnight", when the keyword we actually want is "molecule".
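The two maximum matching variants named above can be sketched in a few lines. This is only a minimal illustration under stated assumptions, not any search engine's actual implementation: the function names, the tiny dictionaries, and the Chinese strings (including the guessed original form of the "where is Peking University" query) are all hypothetical.

```python
def forward_max_match(text, dictionary):
    """Scan left to right, taking the longest dictionary word at each step."""
    max_len = max((len(w) for w in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a word matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted as a fallback token.
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


def reverse_max_match(text, dictionary):
    """Scan right to left, taking the longest dictionary word at each step."""
    max_len = max((len(w) for w in dictionary), default=1)
    tokens = []
    j = len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            candidate = text[j - size:j]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                j -= size
                break
    tokens.reverse()  # tokens were collected from the end of the text
    return tokens


# Hypothetical dictionary and query, assumed for illustration only.
university_dict = {"北京", "大学", "北京大学", "哪里"}
print(forward_max_match("北京大学在哪里", university_dict))
# ['北京大学', '在', '哪里']

# A well-known ambiguous phrase where the two directions disagree.
ambiguous_dict = {"研究", "研究生", "生命", "起源"}
print(forward_max_match("研究生命起源", ambiguous_dict))
# ['研究生', '命', '起源']
print(reverse_max_match("研究生命起源", ambiguous_dict))
# ['研究', '生命', '起源']
```

The second example shows the kind of boundary error described above: the forward scan greedily takes the longer word and leaves a stray character behind, while the reverse scan happens to produce the intended split.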

Full-text retrieval means that an indexing program scans every word in a document and builds an index entry for it, recording where the word appears and how often. When a user runs a query, the search program looks it up in the recorded index and returns the results to the user. Full-text retrieval is divided into character-based full-text indexing and word-based full-text indexing. Character-based indexing builds and records an index entry for every character in the content. This method has high recall but low precision, especially for Chinese: a search for "Mark" will sometimes also list results for "Marx". Word-based indexing uses the word as the unit of each index record and can handle synonyms. Search engines maintain their own lexicons; when a user searches, the engine selects terms from the lexicon as index keys, which can greatly improve retrieval accuracy.
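The difference between the two indexing styles can be shown with a small positional index. This is a hedged sketch under stated assumptions rather than a real engine's design: the helper names, the two sample documents, the lexicon, and the choice of 马克 / 马克思 as the Chinese forms of "Mark" / "Marx" are all assumed here for illustration.

```python
from collections import defaultdict


def build_index(docs, tokenize):
    """Map each token to {doc_id: [positions]}; len(positions) is its count."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, token in enumerate(tokenize(text)):
            index[token][doc_id].append(pos)
    return index


def segment(text, lexicon):
    """Tiny forward maximum matching segmenter used for the word index."""
    max_len = max((len(w) for w in lexicon), default=1)
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            if size == 1 or text[i:i + size] in lexicon:
                tokens.append(text[i:i + size])
                i += size
                break
    return tokens


def char_search(index, query):
    """Return docs where the query's characters occur consecutively."""
    hits = set()
    if not query:
        return hits
    for doc_id, starts in index.get(query[0], {}).items():
        for start in starts:
            if all(start + k in index.get(ch, {}).get(doc_id, [])
                   for k, ch in enumerate(query)):
                hits.add(doc_id)
                break
    return hits


def word_search(index, word):
    """Return docs whose segmented text contains the word as a whole token."""
    return set(index.get(word, {}))


# Hypothetical documents and lexicon, assumed for illustration only.
docs = {1: "马克思主义", 2: "马克来了"}
lexicon = {"马克思", "马克", "主义", "来了"}

char_index = build_index(docs, list)
word_index = build_index(docs, lambda text: segment(text, lexicon))

print(char_search(char_index, "马克"))  # {1, 2}: the "Marx" document matches too
print(word_search(word_index, "马克"))  # {2}: only the document about "Mark"
```

The character index returns both documents because the two characters of the query also appear inside the longer name, while the word index treats the longer name as a single token and returns only the intended document.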

The rapid development of the Internet in the twenty-first century

Chinese word segmentation technology

