A major leak of internal Google documents offers a rare look at how Google may decide which websites to show first in search results. The documents describe what Google tracks when ranking websites, such as clicks, links, content, and data from its Chrome browser.
What Happened:
- On March 13, 2024, an automated account named yoshi-code-bot published thousands of internal Google Search API documents to a public GitHub repository. The documents were later shared with Rand Fishkin, co-founder of SparkToro, and Michael King, CEO of iPullRank.
Why It Matters:
- Understanding Google’s ranking algorithm is valuable for anyone who works in SEO (search engine optimization). This leak is one of the biggest stories in the history of SEO.
What’s Inside the Documents:
- Ranking Features: The documents describe 2,596 modules with 14,014 attributes (features). They do not reveal how much weight each feature carries, only that it exists.
- Twiddlers: Re-ranking functions that run after the main ranking pass and can adjust a document's score or change its position in the results.
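- Reasons for Demotion: Content can be ranked lower because of:
  - Anchor mismatch: a link doesn’t match the target site it points to.
  - SERP demotion: signals from the search results suggest users are not satisfied with the page.
  - Product reviews: a product review demotion is listed, but the documents give no details.
  - Location: the documents suggest "global" and "super global" pages can be demoted.
  - Exact-match domains: the domain name exactly matches the search query.
  - Inappropriate content, such as pornography.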
- Change History: Google keeps a copy of every version of every page it has ever indexed, but it only uses the last 20 changes to a URL when analyzing links.
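Twiddlers and demotions can be pictured as a second pass over results that already have a score. The sketch below is a minimal illustration under stated assumptions: each demotion is a hypothetical flag with a made-up multiplier, and the names `Result`, `demotion_twiddler`, and `rerank` are invented for this example. The leaked documents do not say how Google's twiddlers are actually implemented or weighted.

```python
from dataclasses import dataclass, replace
from typing import Callable, FrozenSet, List


@dataclass(frozen=True)
class Result:
    url: str
    score: float                         # score from the primary ranking pass
    flags: FrozenSet[str] = frozenset()  # hypothetical demotion signals for this result


# Hypothetical twiddler shape: takes the current score and the result's flags,
# returns an adjusted score.
Twiddler = Callable[[float, FrozenSet[str]], float]


def demotion_twiddler(flag: str, factor: float) -> Twiddler:
    """Return a twiddler that multiplies the score by `factor` when `flag` is set."""
    def twiddle(score: float, flags: FrozenSet[str]) -> float:
        return score * factor if flag in flags else score
    return twiddle


def rerank(results: List[Result], twiddlers: List[Twiddler]) -> List[Result]:
    """Run every twiddler over every result, then sort by adjusted score (highest first)."""
    adjusted = []
    for result in results:
        score = result.score
        for twiddle in twiddlers:
            score = twiddle(score, result.flags)
        # Build a new Result so the original primary scores stay untouched.
        adjusted.append(replace(result, score=score))
    return sorted(adjusted, key=lambda r: r.score, reverse=True)


# Example: two results carry (made-up) demotion flags, one does not.
results = [
    Result("https://example.com/a", 0.90, frozenset({"exact_match_domain"})),
    Result("https://example.com/b", 0.80),
    Result("https://example.com/c", 0.85, frozenset({"anchor_mismatch"})),
]
twiddlers = [
    demotion_twiddler("exact_match_domain", 0.5),
    demotion_twiddler("anchor_mismatch", 0.7),
]
ranked = rerank(results, twiddlers)
for r in ranked:
    print(r.url, round(r.score, 3))
```

With these placeholder factors, the exact-match-domain result drops from first place to last. This shows how a post-ranking pass can reorder results without changing the primary scores.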
Key Findings:
- Links Matter: A diverse set of relevant links is still important. Google associates the PageRank of a site’s homepage with every document on that site.
- Successful Clicks Matter: Good content and a good user experience help pages rank higher. Google measures different types of clicks (for example, "good" clicks, "bad" clicks, and last-longest clicks) to judge whether users find the content useful.
- Brand Matters: Building a well-known brand helps improve your search rankings and traffic.
- Entities Matter: Google keeps track of authors and tries to identify who wrote the content.
- Chrome Data: Google appears to use data from its Chrome browser to help rank websites.
- Whitelists: Some domains, like those related to elections and COVID, might be whitelisted.
- Small Sites: The documents include a signal for small personal sites, which Google might use to boost or demote them, but it’s unclear how much weight it carries.
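The leak indicates that a homepage PageRank value is attached to each document, but it does not describe how Google computes PageRank today. As a rough illustration, the sketch below uses textbook power-iteration PageRank (with the commonly cited damping factor of 0.85) and then looks up the homepage score for any page. The link graph, the `homepage_of` rule, and the function names are hypothetical.

```python
from urllib.parse import urlsplit


def pagerank(links, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank over a dict mapping page -> list of outlinks."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page (no outlinks): spread its rank across all pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank


def homepage_of(url):
    """Hypothetical helper: treat scheme + host + '/' as the site's homepage."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}/"


def homepage_pagerank(url, rank):
    """Look up the PageRank of the homepage of the site that `url` belongs to."""
    return rank.get(homepage_of(url), 0.0)


# Toy link graph with two sites.
links = {
    "https://a.example/": ["https://a.example/post", "https://b.example/"],
    "https://a.example/post": ["https://a.example/"],
    "https://b.example/": ["https://a.example/"],
    "https://b.example/page": ["https://a.example/", "https://b.example/"],
}
rank = pagerank(links)
for url in ["https://a.example/post", "https://b.example/page"]:
    print(url, round(homepage_pagerank(url, rank), 3))
```

In this toy graph, every page on a.example inherits the higher homepage score, because more links point to that homepage.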
Other Interesting Findings:
- Freshness: Google looks at dates in bylines, URLs, and on-page content to judge how current the information is.
- Core Topics: Google creates embeddings (vector representations) of pages and sites and compares them to measure how far a page strays from the site’s core topic.
- Domain Registration: Google stores domain registration information.
- Page Titles: Titles still matter, and Google measures how well they match search queries.
- Font Size: Google measures the average font size of terms in documents and anchor text.
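Comparing a page with its site can be sketched with vector embeddings and cosine similarity. This is a minimal illustration that assumes each page already has an embedding and that the site is represented by the mean of its page vectors. The function names, the toy vectors, and the "off-topic distance" measure are hypothetical and do not come from the leaked documents.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def site_embedding(page_vectors: Dict[str, Vector]) -> Vector:
    """Assumption: represent the whole site as the mean of its page embeddings."""
    vectors = list(page_vectors.values())
    dims = len(vectors[0])
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dims)]


def off_topic_distance(page: str, page_vectors: Dict[str, Vector]) -> float:
    """Hypothetical measure: 1 - cosine similarity to the site centroid (0 = on topic)."""
    centroid = site_embedding(page_vectors)
    return 1.0 - cosine_similarity(page_vectors[page], centroid)


# Toy 3-dimensional embeddings; real embeddings are much larger.
pages = {
    "/seo-basics": [0.9, 0.1, 0.0],
    "/link-building": [0.8, 0.2, 0.0],
    "/cake-recipe": [0.0, 0.1, 0.9],
}
for url in pages:
    print(url, round(off_topic_distance(url, pages), 3))
```

The cake recipe ends up farthest from the site centroid, so on an SEO-focused site it would stand out as off-topic.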
Articles for More Info:
- “Secrets from the Algorithm: Google Search’s Internal Engineering Documentation Has Leaked” by Michael King on iPullRank.
- “An Anonymous Source Shared Thousands of Leaked Google Search API Documents with Me; Everyone in SEO Should See Them” by Rand Fishkin on SparkToro.
There is some debate over whether these documents were “leaked” or “discovered.” It’s likely they were accidentally made public during a code review.