This article is about a search-results ranking algorithm.
Google Panda is a change to Google's search results
ranking algorithm that was first released in February 2011. The
change aimed to lower the rank of "low-quality sites" or "thin
sites", and return higher-quality sites near the top of the search results.
CNET reported a surge in the rankings of news websites and social networking sites, and a drop in rankings
for sites containing large amounts of advertising. This change reportedly
affected the rankings of almost 12 percent of all search results. Soon
after the Panda rollout, many websites, including Google's webmaster forum,
became filled with complaints of scrapers/copyright infringers getting better
rankings than sites with original content. At one point, Google publicly asked
for data points to help detect scrapers better. Google's
Panda has received several updates since the original rollout in February 2011,
and the effect went global in April 2011. To help affected publishers, Google
published an advisory on its blog, thus giving some direction for
self-evaluation of a website's quality. Google has provided a list of 23 bullet
points on its blog answering the question of "What counts as a
high-quality site?" that is supposed to help webmasters "step into
Google's mindset".
The Panda process
Google Panda was built through an algorithm
update that used artificial intelligence in a more sophisticated and scalable
way than previously possible. Human quality testers rated thousands of websites
based on measures of quality, including design, trustworthiness, speed and
whether or not they would return to the website. Google's new Panda
machine-learning algorithm, made possible by and named after engineer Navneet
Panda, was then used to
look for similarities between websites people found to be high quality and low
quality.
Many new ranking factors have been introduced
to the Google algorithm as a result, while older ranking factors like PageRank have been downgraded in importance. Google
Panda is updated from time to time and the algorithm is run by Google on a
regular basis. On April 24, 2012, the Google Penguin update was released, which
affected a further 3.1% of all English-language search queries, highlighting
the ongoing volatility of search rankings.
The latest Panda version was confirmed by the
company on its official Twitter page, where it announced, "New data refresh of
Panda starts rolling out this week. 1% of search results change enough to
notice." (July 28, 2012)
Significant differences between Panda and previous algorithms
Google Panda affects the ranking of an entire site or a specific section of it,
rather than just the individual pages on a site.
In March 2012, Google updated Panda and stated that it was deploying an
"over-optimization penalty" in order to level the playing field.
PageRank is a link analysis algorithm, named after Larry Page and used by the Google Internet search engine, that assigns a numerical weighting
to each element of a hyperlinked set of documents, such as the World Wide Web, with the purpose of
"measuring" its relative importance within the set. The algorithm may be applied to any collection of
entities with reciprocal quotations and references. The
numerical weight that it assigns to any given element E is referred to
as the PageRank of E.
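The weighting described above can be illustrated with a small power-iteration sketch. The function name `pagerank`, the damping factor of 0.85, the iteration count, and the three-page toy graph are all assumptions chosen for illustration; this is not a description of Google's production system.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank for a dict mapping page -> list of pages it links to.

    Assumes every link target also appears as a key in `links`.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start from a uniform distribution

    for _ in range(iterations):
        # Every page receives a baseline share from random jumps.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if outgoing:
                # A page splits its current rank evenly among its outgoing links.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: A and B both link to C, and C links back to A.
toy_web = {"A": ["C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(toy_web)
```

In this toy graph, C ends up with the largest weight because two pages link to it, A comes next because it is linked from the highly weighted C, and B, which has no inbound links, keeps only the baseline share.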
The name "PageRank" is a trademark of Google, and the PageRank process
has been patented (U.S. Patent 6,285,999). However, the patent is assigned to Stanford
University and not to
Google. Google has exclusive license rights on the patent from Stanford
University. The university received 1.8 million shares of Google in exchange
for use of the patent; the shares were sold in 2005 for $336 million.
Google Toolbar
The Google Toolbar's PageRank
feature displays a visited page's PageRank as a whole number between 0 and 10.
The most popular websites have a PageRank of 10. The least popular have a PageRank of
0. Google has not disclosed the specific method for determining a Toolbar
PageRank value, which is to be considered only a rough indication of the value
of a website.
PageRank measures the number of sites that link to a particular page. The PageRank of a
particular page is roughly based upon the quantity of inbound links as well as
the PageRank of the pages providing the links. The algorithm also includes
other factors, such as the size of a page, the number of changes, the time
since the page was updated, the text in headlines and the text in hyperlinked
anchor texts. The Google Toolbar's PageRank is updated infrequently, so the
values it shows are often out of date.
SERP Rank
The search engine results page (SERP) is the actual result returned by a search engine
in response to a keyword query. The SERP consists of a list of links to web
pages with associated text snippets. The SERP rank of a web page refers to the
placement of the corresponding link on the SERP, where higher placement means
higher SERP rank. The SERP rank of a web page is a function not only of its PageRank,
but of a relatively large and continuously adjusted set of factors (over 200),
commonly referred to by internet marketers as "Google Love". Search engine optimization (SEO) is aimed at achieving the highest possible SERP
rank for a website or a set of web pages.
After the introduction of Google Places into the
mainstream organic SERP, PageRank played little to no role in ranking a
business in the Local Business Results. While the theory of citations still
plays a role in the algorithm, PageRank is not a factor since business
listings, rather than web pages, are ranked.
Google Directory PageRank
The Google Directory PageRank is an 8-unit measurement. Unlike the Google Toolbar, which shows a numeric
PageRank value upon mouseover of the green bar, the Google Directory only
displays the bar, never the numeric values.
False or spoofed PageRank
In the past, the PageRank shown in the Toolbar was easily manipulated. Redirection
from one page to another, either via an HTTP 302 response or a "Refresh" meta tag, caused the source page to acquire
the PageRank of the destination page. Hence, a new page with PR 0 and no
incoming links could have acquired PR 10 by redirecting to the Google home
page. This spoofing technique, also known as 302 Google Jacking, was a known vulnerability. Spoofing
can generally be detected by performing a Google search for a source URL; if
the URL of an entirely different site is displayed in the results, the latter
URL may represent the destination of a redirection.
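The two redirection mechanisms mentioned above can also be checked directly on a fetched response. The following is a minimal sketch using only Python's standard library; the names `MetaRefreshFinder` and `find_redirect_target` are hypothetical, and the response is assumed to be already available as a status code, a header dictionary with a `Location` key, and an HTML string.

```python
from html.parser import HTMLParser


class MetaRefreshFinder(HTMLParser):
    """Collects the target URL of a <meta http-equiv="refresh"> tag, if any."""

    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.target is not None:
            return
        attributes = dict(attrs)
        if (attributes.get("http-equiv") or "").lower() != "refresh":
            return
        # The content attribute typically looks like "0; url=http://example.com/".
        content = attributes.get("content") or ""
        for part in content.split(";"):
            name, _, value = part.strip().partition("=")
            if name.strip().lower() == "url" and value:
                self.target = value.strip().strip("'\"")


def find_redirect_target(status_code, headers, html):
    """Return the URL a response redirects to, or None if it does not redirect."""
    # Assumption: headers is a plain dict that uses the exact key "Location".
    if status_code == 302 and "Location" in headers:
        return headers["Location"]
    finder = MetaRefreshFinder()
    finder.feed(html)
    return finder.target


# Hypothetical page that redirects through a "Refresh" meta tag.
page = '<html><head><meta http-equiv="Refresh" content="0; url=http://www.example.com/"></head></html>'
meta_target = find_redirect_target(200, {}, page)
header_target = find_redirect_target(302, {"Location": "http://www.example.org/"}, "")
```

This simple parser splits the `content` attribute on semicolons, so a target URL that itself contains a semicolon would be truncated; a production check would need more careful parsing.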
Manipulating PageRank
For search engine optimization purposes, some companies offer to sell high PageRank
links to webmasters. As links from higher-PR pages are believed to be more
valuable, they tend to be more expensive. It can be an effective and viable
marketing strategy to buy link advertisements on content pages of quality and
relevant sites to drive traffic and increase a webmaster's link popularity.
However, Google has publicly warned webmasters that if they are or were
discovered to be selling links for the purpose of conferring PageRank and
reputation, their links will be devalued (ignored in the calculation of other
pages' PageRanks). The practice of buying and selling links is intensely
debated across the webmaster community. Google advises webmasters to use the nofollow HTML attribute value on sponsored links. According
to Matt Cutts, Google is concerned about webmasters who try to game the
system, and thereby reduce the quality and relevancy of Google search results.
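As an illustration of the nofollow advice above, the sketch below separates the links on a page into those marked `rel="nofollow"` and those that are not. The class name `LinkClassifier` and the sample markup are hypothetical; the code only shows how the attribute value can be read, not how any search engine processes it.

```python
from html.parser import HTMLParser


class LinkClassifier(HTMLParser):
    """Sorts the links on a page into followed and nofollow lists."""

    def __init__(self):
        super().__init__()
        self.followed = []
        self.nofollow = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if not href:
            return
        # rel may hold several space-separated values, e.g. "nofollow noopener".
        rel_values = (attributes.get("rel") or "").lower().split()
        if "nofollow" in rel_values:
            self.nofollow.append(href)
        else:
            self.followed.append(href)


# Hypothetical page with one editorial link and one sponsored link.
page = """
<p>Read our <a href="/review">independent review</a>.</p>
<p>Sponsored: <a href="http://shop.example.com/" rel="nofollow">Buy now</a></p>
"""
classifier = LinkClassifier()
classifier.feed(page)
```

After parsing, `classifier.followed` holds the editorial link and `classifier.nofollow` holds the sponsored one.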
The intentional surfer model
The original PageRank algorithm reflects the so-called random surfer model, meaning
that the PageRank of a particular page is derived from the theoretical
probability of visiting that page when clicking on links at random. However,
real users do not randomly surf the web, but follow links according to their
interest and intention. A page ranking model that reflects the importance of a
particular page as a function of how many actual visits it receives by real
users is called the intentional surfer model. The Google Toolbar sends
information to Google for every page visited, and thereby provides a basis for
computing PageRank based on the intentional surfer model. The introduction of
the nofollow attribute by Google to combat spamdexing has the side effect that webmasters
commonly use it on outgoing links to increase their own PageRank. This causes a
loss of actual links for the Web crawlers to follow, thereby making the
original PageRank algorithm based on the random surfer model potentially
unreliable. Using information about users' browsing habits provided by the
Google Toolbar partly compensates for the loss of information caused by the nofollow attribute. The SERP rank of a
page, which determines a page's actual placement in the search results, is
based on a combination of the random surfer model (PageRank) and the
intentional surfer model (browsing habits) in addition to other factors.
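The difference between the two models can be made concrete with a short sketch. Here the intentional surfer score of a page is taken to be its share of recorded visits, and the two scores are blended with a single weight. The function names, the linear blend, and the default weight of 0.5 are assumptions for illustration only; how the signals are actually combined has not been disclosed.

```python
from collections import Counter


def visit_share(visit_log):
    """Return each page's fraction of all recorded visits."""
    counts = Counter(visit_log)
    total = sum(counts.values())
    return {page: count / total for page, count in counts.items()}


def combined_score(random_surfer, intentional_surfer, weight=0.5):
    """Blend two per-page scores; `weight` is the share given to the random surfer model."""
    pages = set(random_surfer) | set(intentional_surfer)
    return {
        page: weight * random_surfer.get(page, 0.0)
        + (1.0 - weight) * intentional_surfer.get(page, 0.0)
        for page in pages
    }


# Hypothetical data: link-based scores and a small log of visited pages.
link_scores = {"a": 0.2, "b": 0.8}
shares = visit_share(["a", "a", "b", "c"])
blended = combined_score(link_scores, shares)
```

Page "c" has no link-based score in this example but still receives a non-zero blended score from its visits, which is the kind of compensation for missing link information that the paragraph above describes.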
Other uses
A version of PageRank has recently been proposed as a replacement for the
traditional Institute for Scientific Information (ISI) impact factor, and implemented at eigenfactor.org. Instead of merely counting total
citations to a journal, the "importance" of each citation is
determined in a PageRank fashion.
A similar new use of PageRank is to rank academic doctoral programs based on
their records of placing their graduates in faculty positions. In PageRank
terms, academic departments link to each other by hiring their faculty from
each other (and from themselves). PageRank has been used to rank spaces or
streets to predict how many people (pedestrians or vehicles) come to the
individual spaces or streets. In lexical semantics it has been used to perform
word sense disambiguation and
to automatically rank WordNet synsets according to how strongly they
possess a given semantic property, such as positivity or negativity.
A dynamic weighting method similar to PageRank has been used to generate customized
reading lists based on the link structure of Wikipedia.
A Web crawler may use PageRank as one of a number
of importance metrics it uses to determine which URL to visit during a crawl of
the web. One of the early working papers
that were used in the creation of Google is "Efficient crawling
through URL ordering", which discusses the use of a number of different
importance metrics to determine how deeply, and how much of, a site Google will
crawl. PageRank is presented as one of a number of these importance metrics,
though there are others listed such as the number of inbound and outbound links
for a URL, and the distance from the root directory on a site to the URL.
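The idea of ordering a crawl by an importance metric can be sketched with a priority queue. The function name `crawl_order`, the scores, and the URLs below are hypothetical; the score could be a PageRank estimate or any of the other metrics mentioned above, and a real crawler would also add newly discovered URLs to the queue as it goes.

```python
import heapq


def crawl_order(importance, limit=None):
    """Yield URLs from most to least important.

    `importance` maps URL -> score (for example a PageRank estimate).
    `limit` optionally caps how many URLs are visited.
    """
    # heapq is a min-heap, so scores are negated to pop the largest first.
    heap = [(-score, url) for url, score in importance.items()]
    heapq.heapify(heap)
    visited = 0
    while heap and (limit is None or visited < limit):
        _, url = heapq.heappop(heap)
        visited += 1
        yield url


# Hypothetical importance scores for three URLs on one site.
scores = {"/": 0.5, "/about": 0.2, "/blog/post": 0.3}
full_order = list(crawl_order(scores))
partial_order = list(crawl_order(scores, limit=2))
```

With a `limit`, the crawler stops after the most important URLs, which mirrors the point that importance metrics decide how much of a site gets crawled.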
PageRank may also be used as a methodology to measure the apparent impact of a community like the blogosphere on the overall Web itself. This
approach therefore uses PageRank to measure the distribution of attention,
reflecting the scale-free network paradigm.
In any ecosystem, a modified version of PageRank may be used to determine species
that are essential to the continuing health of the environment. An application
of PageRank to the analysis of protein networks in biology has also been reported
recently.