Saturday, June 9, 2012

Spider Webs, Bow Ties, Scale-Free Networks, And The Deep Web

The name World Wide Web conjures up images of a giant spider web in which everything is connected to everything else in a random pattern, and you can get from one edge of the web to another just by following the right links. In theory, this is what makes the Web different from a typical index system: you can follow hyperlinks from one page to another. In the "small world" theory of the Web, every web page is thought to be separated from any other web page by an average of about 19 clicks. In 1968, sociologist Stanley Milgram founded small-world theory for social networks by noting that every human being is separated from any other by only six degrees of separation. On the Web, the small-world theory was supported by early studies of a small sample of websites. But a study conducted jointly by scientists at IBM, Compaq, and AltaVista found something completely different. The scientists used a web crawler to identify 200 million web pages and follow the 1.5 billion links on those pages.
The researchers found that the Web is not like a spider web at all, but rather like a bow tie. The bow-tie Web has a "strongly connected component" (SCC) made up of some 56 million web pages. On the right side of the bow tie is a set of 44 million OUT pages, which you can reach from the center but from which you cannot return to the center. OUT pages tend to be corporate intranets and other website pages designed to trap you on the site once you land there. On the left side of the bow tie is a set of 44 million IN pages, from which you can reach the center but which you cannot reach from the center. These are recently created pages that have not yet been linked to by many pages in the center. In addition, 43 million pages are classified as "tendrils": pages that do not link to the center and cannot be reached from the center. Tendril pages are, however, sometimes linked to IN and/or OUT pages. Occasionally tendrils link to one another without passing through the center (these are called "tubes"). Finally, there are 16 million pages completely disconnected from everything.
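To make the bow-tie regions concrete, here is a minimal sketch that sorts the pages of a tiny, made-up link graph into SCC, IN, OUT, and everything else. It is not the researchers' method: the page names, the `bow_tie` helper, and the assumption that we already know one page inside the core are all invented for illustration.

```python
from collections import defaultdict, deque

def reachable(start, adjacency):
    """Return every page reachable from `start` by following `adjacency`."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for nxt in adjacency.get(page, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def bow_tie(links, core_page):
    """Split pages into bow-tie regions, given one page known to be in the core."""
    forward = defaultdict(set)   # page -> pages it links to
    backward = defaultdict(set)  # page -> pages that link to it
    pages = set()
    for src, dst in links:
        forward[src].add(dst)
        backward[dst].add(src)
        pages.update((src, dst))
    downstream = reachable(core_page, forward)   # core can get here
    upstream = reachable(core_page, backward)    # these can get to the core
    scc = downstream & upstream
    return {
        "SCC": scc,
        "IN": upstream - scc,
        "OUT": downstream - scc,
        "OTHER": pages - downstream - upstream,  # tendrils and disconnected pages
    }

# Hypothetical toy web: a three-page core plus one page of each other kind.
links = [
    ("a", "b"), ("b", "c"), ("c", "a"),  # core pages link in a cycle
    ("new", "a"),                        # new page links into the core
    ("c", "intranet"),                   # core links out to a dead end
    ("new", "tendril"),                  # tendril hanging off an IN page
    ("x", "y"),                          # disconnected pair
]
parts = bow_tie(links, "a")
for region in ("SCC", "IN", "OUT", "OTHER"):
    print(region, sorted(parts[region]))
```

Two breadth-first searches from a core page, one along the links and one against them, are enough: pages found by both are in the SCC, pages found only by the backward search are IN, and pages found only by the forward search are OUT.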
Further evidence of the non-random, structured nature of the Web comes from a study conducted by Albert-Laszlo Barabasi at the University of Notre Dame. Barabasi's team found that, far from being a random, exponentially exploding network of 50 billion web pages, activity on the Web is in fact highly concentrated in "very connected super nodes" that provide connectivity to less well-connected nodes. Barabasi called this type of network a "scale-free" network and found parallels in the growth of cancers, the transmission of diseases, and the spread of computer viruses. As it turns out, scale-free networks are highly vulnerable to destruction: destroy their super nodes and the transmission of messages breaks down quickly. On the other hand, if you are a marketer trying to "spread the message" about a product, place your product on one of the super nodes and watch the news spread. Or build a super node yourself and attract a wide audience.
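One common way to see how super nodes can arise is preferential attachment, where each new page links to an existing page with probability proportional to the links that page already has. The article does not say how the Barabasi team modelled growth, so treat the sketch below as an illustrative assumption; the `grow_network` helper, the network size, and the seed are all made up.

```python
import random

def grow_network(n_pages, seed=0):
    """Grow a toy network one page at a time using preferential attachment."""
    rng = random.Random(seed)
    degree = {0: 1, 1: 1}   # start with two pages linked to each other
    endpoints = [0, 1]      # each page appears once per link it touches
    for new_page in range(2, n_pages):
        # Picking uniformly from `endpoints` favors pages with many links.
        target = rng.choice(endpoints)
        degree[new_page] = 1
        degree[target] += 1
        endpoints.extend((new_page, target))
    return degree

degree = grow_network(2000)
ranked = sorted(degree.values(), reverse=True)
top_share = sum(ranked[:20]) / sum(ranked)  # share of link ends held by the top 1% of pages

print("links on the best-connected page:", ranked[0])
print("links on a typical (median) page:", ranked[len(ranked) // 2])
print("share of link ends held by the top 1% of pages:", round(top_share, 3))
```

In a run like this, a handful of early pages end up with many links while most pages keep just one, which is the kind of concentration the paragraph above describes.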
Thus the picture of the Web that emerges from this research is very different from earlier reports. The idea that most pairs of web pages are separated by a handful of links, almost always fewer than 20, and that the number of connections grows exponentially with the size of the Web, is not supported. In fact, there is a 75% chance that there is no path at all from one randomly selected page to another. With this knowledge, it becomes clear why even the most advanced search engines index only a very small percentage of all web pages, and only about 2% of the total population of Internet hosts (about 400 million). Search engines cannot find most websites because their pages are not well connected or linked to the central core of the Web. Another important finding was the identification of a "deep web" of over 900 billion web pages that are not easily accessible to the web crawlers most search engine companies use. Instead, these pages are either proprietary (not available to crawlers and non-subscribers), such as the pages of the Wall Street Journal, or not easily reachable from other web pages. In recent years, newer search engines (such as the medical search engine Mammaheath) and older ones such as Yahoo have been revised to search the deep web. Because e-commerce revenue depends in part on customers being able to find a website through search engines, website managers must take steps to ensure that their pages are part of the connected central core, or "super nodes," of the Web. One way to do this is to make sure the site has as many links as possible to and from other relevant sites, especially other sites within the SCC.
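The 75% figure quoted above can be roughly reproduced from the bow-tie page counts given earlier. The following back-of-the-envelope sketch makes simplifying assumptions that are not from the study: it only counts paths that start in IN or the SCC and end in the SCC or OUT, ignores the occasional path through tendrils or within IN or OUT, and picks the two pages independently.

```python
# Page counts in millions, as quoted in the article.
counts = {"SCC": 56, "IN": 44, "OUT": 44, "tendrils": 43, "disconnected": 16}
total = sum(counts.values())

# A page can reach the core only if it sits in IN or in the SCC itself.
can_reach_core = counts["IN"] + counts["SCC"]
# A page can be reached from the core only if it sits in the SCC or in OUT.
reachable_from_core = counts["SCC"] + counts["OUT"]

# Chance that a random start page can reach the core AND
# a random destination page is reachable from the core.
p_path = (can_reach_core / total) * (reachable_from_core / total)
p_no_path = 1 - p_path

print(f"total pages: {total} million")
print(f"chance of a path:  {p_path:.2f}")
print(f"chance of no path: {p_no_path:.2f}")
```

Under these assumptions only about a quarter of random page pairs are connected, which is in line with the roughly 75% "no path" figure.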