Message Lab

The Google leak: What it is and why it matters

Here’s what you can do to improve your webpage performance, based on the Great Google Algorithm Leak of 2024

By Jeff Jensen

Earlier this year, Google’s ranking data was leaked — around 14,000 factors for how the company’s ubiquitous search engine evaluates and ranks pages. It’s the first-ever, definitive evidence of the elements in Google’s algorithm, straight from the source. For us at Message Lab who spend significant time thinking about how people discover content online, the leak was a goldmine of information around one of the most used — but opaque — online distribution channels.

Based on the leak and our own SEO secret sauce, here’s some important context about key elements of a web page — and specific actions you can take to improve the ranking of your content.  

The leaked Google code 

The Google algorithm is often framed as a monolithic entity. In reality, it’s a collection of modular scoring algorithms that combine to create the search results. 

The leak revealed the endpoints of Google’s Content Warehouse API, an element of just one of those algorithms. That doesn’t make it any less important. But it does mean you should take it with a proverbial grain of salt because while we do have the endpoints, we don’t have the weighting of those inputs. For example, we know that Google uses dates in its algorithm, but we don’t know if they are more important than who the author is. 
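To see why missing weights matter, here is a minimal sketch in Python. Everything in it is hypothetical: the feature names (`date_freshness`, `author_signal`), the values, and the simple weighted sum are illustrations only, not anything found in the leak. It shows that the same two pages can swap positions depending on which input carries more weight.

```python
# Hypothetical illustration: the leak names inputs but not their weights,
# so the same inputs can produce different rankings.

def score(features, weights):
    """Weighted sum of feature values (an assumed, simplified scoring model)."""
    return sum(weights[name] * value for name, value in features.items())


def rank(pages, weights):
    """Return page names ordered from highest to lowest score."""
    return sorted(pages, key=lambda name: score(pages[name], weights), reverse=True)


# Made-up pages and feature values, for illustration only.
pages = {
    "fresh_page": {"date_freshness": 0.9, "author_signal": 0.2},
    "expert_page": {"date_freshness": 0.3, "author_signal": 0.9},
}

# Two made-up weightings: one favors dates, the other favors authors.
date_heavy = {"date_freshness": 0.8, "author_signal": 0.2}
author_heavy = {"date_freshness": 0.2, "author_signal": 0.8}

print(rank(pages, date_heavy))
print(rank(pages, author_heavy))
```

Until the weighting is known, any single factor from the leak is best treated as a hypothesis to test rather than a rule.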

We also have no guarantee that Google is using all these attributes, especially after this leak. Common consensus is that these ranking factors are still being used. But some factors carry notes marking them as deprecated, meaning the attribute still appears in the documentation but may be phased out or carry less weight than it once did. In its response to the leak, Google has cautioned against making assumptions based on potentially outdated or incomplete information. That hasn’t stopped the leak from sparking intense discussion in the SEO community. 

Actions you can take based on the Google leak 

Based on our newfound understanding of Google’s ranking factors, SEOs can refine their strategies to better align with these insights. Here are some actionable tactics to consider to optimize your webpages more effectively. 

1. Google is attempting to understand authors, so clarify yours 

Years ago, Google attempted to make authorship a ranking factor, but then told us they’d removed it. A few months ago, I predicted that Google would bring authorship back in the age of AI. It turns out that Google doesn’t have to bring authorship back, because it never went away! 

One of the nodes in the leaked documents is “author,” which means Google is attempting to understand the author of a page. The key word there is “attempting.”  Google is a complex algorithm, but can be surprisingly simple at times. When articles reference multiple people without clear signals about the author, for example, Google is left to guess which of the many names cited is the actual author. And Google sometimes guesses incorrectly. 

Nevertheless, we now know — thanks to the leak — that author credibility matters. And thus, so does providing clarity on the article’s author. 

Action for SEOs: Put “author” schema on every article where there is an author. 
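As a minimal sketch of what that markup can look like, the Python below builds schema.org Article markup with an explicit `author` of type Person and wraps it in a JSON-LD script tag. The helper names (`author_jsonld`, `to_script_tag`) and the example URL are our own placeholders; the schema.org types and properties (`Article`, `Person`, `headline`, `author`, `datePublished`) are real. Whether Google reads the markup exactly this way is an assumption on our part.

```python
import json


def author_jsonld(headline, author_name, author_url=None, date_published=None):
    """Build schema.org Article markup that names the author explicitly."""
    author = {"@type": "Person", "name": author_name}
    if author_url:
        author["url"] = author_url  # e.g. a bio page for the author

    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": author,
    }
    if date_published:
        data["datePublished"] = date_published
    return data


def to_script_tag(data):
    """Wrap the markup in the script tag used for JSON-LD on a page."""
    body = json.dumps(data, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Placeholder URL for illustration only.
markup = author_jsonld(
    "The Google leak: What it is and why it matters",
    "Jeff Jensen",
    author_url="https://example.com/authors/jeff-jensen",
)
print(to_script_tag(markup))
```

Generating the markup from the same byline field your CMS displays keeps the visible author and the structured author from drifting apart.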

2. Google only uses the last 20 changes on a given page when analyzing links, so optimize yours 

Two revelations here, both part of the same factor, according to the leak. First, Google uses the last 20 edits of your article to evaluate whether a page should rank. Pre-leak, it was believed that only the most recently published version of your content impacted your rankings. But the real key, and second revelation, comes with analyzing links. A link from another site to yours has always been one of the strongest ranking factors in SEO. The text of a link is an extremely powerful signal for what a page should rank for. And when the text of a link includes a keyword, the page that the link points to will rank better for that keyword. 

Two-part action for SEO:

a. Optimize and reoptimize backlog content more frequently. That means optimizing the same article multiple times, rather than once. 

b. Optimize around phrases used in links to an article.
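One way to act on part b is to tally the anchor text of links pointing at an article and check which phrases the article itself never uses. The sketch below assumes you have already exported anchor texts from a backlink tool of your choice; the function names and sample data are hypothetical.

```python
from collections import Counter


def top_anchor_phrases(anchor_texts, limit=5):
    """Count anchor texts (case- and whitespace-normalized), most common first."""
    normalized = (" ".join(text.lower().split()) for text in anchor_texts)
    counts = Counter(text for text in normalized if text)
    return counts.most_common(limit)


def missing_phrases(article_text, phrases):
    """Return the phrases that never appear in the article body."""
    body = " ".join(article_text.lower().split())
    return [phrase for phrase in phrases if phrase not in body]


# Made-up export of anchor texts, for illustration only.
anchors = ["Google leak", "google  leak", "SEO ranking factors", "click here"]
article = "Everything we learned from the Google leak."

phrases = [phrase for phrase, _ in top_anchor_phrases(anchors)]
print(missing_phrases(article, phrases))
```

Generic anchors such as "click here" will show up in the output too; filter those out by hand before briefing a writer.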

3. Font size of terms and links matters, so vary yours  

As with much of the leak, the interpretation of this factor is up to the reader. The word-for-word description reads: “the average weighted font size of a term in the doc body.” 

My interpretation puts significant emphasis on the word "weighted." None of the analyses I’ve seen mention heading tags (the H1 and H2 HTML tags). It’s entirely possible that the tags themselves don’t matter, and that what made headings impactful in the first place is that they are usually set in larger type than the body text. So as far as the Google algorithm is concerned, bigger text is more important. 

Action for design: Consider greater variance in font sizes for headers when designing content pages.

Action for SEO: Include a bolded version of the keyword in articles when optimizing for SEO.
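To make the idea concrete, here is a rough sketch of one possible reading of "average weighted font size of a term." It assumes the page has already been reduced to (text, font size) runs and simply averages the font size across every occurrence of a single-word term. How Google actually weights the sizes is not stated in the leak, so treat the function and its inputs as hypothetical.

```python
import re


def average_font_size(runs, term):
    """Mean font size across every occurrence of a single-word term.

    runs: list of (text, font_size) pairs, e.g. a heading and a paragraph.
    Returns None when the term never appears.
    """
    term = term.lower()
    total = 0.0
    count = 0
    for text, size in runs:
        words = re.findall(r"[a-z0-9']+", text.lower())
        hits = words.count(term)
        total += hits * size
        count += hits
    return total / count if count else None


# Made-up page: one 32px heading and one 16px paragraph.
runs = [
    ("The Google leak explained", 32),
    ("The leak revealed endpoints. Another leak detail.", 16),
]
print(average_font_size(runs, "leak"))
```

Under this reading, a keyword that appears once in a large heading and many times in body text still averages out higher than one that appears only in body text.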


4. Documents get truncated, so pay attention to your word counts

This factor reads:
“The number of tokens, tags, and punctuations in the tokenized contents. This is an approximation of the number of tokens, tags and punctuations we end up with in mustang, but is inexact since we drop some tokens in mustang and also truncate docs at a max cap.” (Mustang is an internal name for a process that utilizes this attribute.)

There’s no clear definition of what constitutes a token. But the outcome is clear. Documents beyond a certain length are truncated for SEO purposes. Beyond a certain, unspecified content length, there’s no SEO benefit to making content longer. 

We at Message Lab have the benefit of both our analytics investigations into engagement with long-form content and prior SEO experimentation to set our own guardrails. The bottom line: diminishing returns in terms of engagement for content beyond 2,500 words. 

Action for editorial: From an SEO perspective, there’s no benefit to creating content longer than ~2,500 words. With anything longer, consider whether a topic is better suited for multiple articles or a hub-and-spoke model.
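A simple editorial check can flag drafts that run past that guardrail. The sketch below counts whitespace-separated words, which is our own stand-in because the leak does not define a token, and uses the ~2,500-word figure from our own analysis rather than any limit published by Google. The function names are hypothetical.

```python
import math

# Editorial guardrail from our own analysis, not a documented Google cap.
MAX_WORDS = 2500


def word_count(text):
    """Count whitespace-separated words (a rough proxy for tokens)."""
    return len(text.split())


def length_report(text, max_words=MAX_WORDS):
    """Flag over-long drafts and suggest how many articles to split into."""
    words = word_count(text)
    return {
        "words": words,
        "over_limit": words > max_words,
        "suggested_articles": max(1, math.ceil(words / max_words)),
    }


draft = "word " * 6000  # stand-in for a 6,000-word draft
print(length_report(draft))
```

A draft flagged this way is a candidate for the hub-and-spoke treatment rather than an automatic cut.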

5. Site embeddings are used to measure how on-topic a page is, so reference and integrate them 

We’re getting even wonkier now. A site embedding is an algorithm of its own that maps the relationship between words in a document on an X,Y graph. Here’s the most accessible Google description of an embedding

In practice and pre-leak, SEOs have observed which sections top-ranking pages include in their content and mimicked them. That’s now a confirmed ranking factor. I’ve predicted this would become a dated practice. We’ll see.  

Two-part action for SEO in partnership with editorial:

a. SEOs: Provide information to the writer in your brief about embeddings.

b. Editorial: Incorporate that information/embeddings when creating the content as you would a target keyword. 

Action for SEO: Integrate embeddings into your brief/optimization process.
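For a feel of how an "on-topic" score can work, here is a deliberately simplified sketch. It swaps real embeddings for plain word counts and compares a page to the rest of the site with cosine similarity. That substitution is ours, made to keep the example self-contained; Google's site embeddings are not described in enough detail in the leak to reproduce, and the function names are hypothetical.

```python
import math
from collections import Counter


def bow_vector(text):
    """Word-count vector: a crude stand-in for a real embedding."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def on_topic_score(page_text, site_texts):
    """Compare one page against the combined vocabulary of the site."""
    site = Counter()
    for text in site_texts:
        site.update(bow_vector(text))
    return cosine(bow_vector(page_text), site)


# Made-up site content, for illustration only.
site = ["seo ranking factors", "google search seo tips"]
print(on_topic_score("seo ranking tips", site))
print(on_topic_score("chocolate cake recipe", site))
```

A page that shares vocabulary with the rest of the site scores above zero, while the unrelated recipe scores zero. In a brief, the equivalent is listing the topics and terms the site already covers so the writer can stay close to them.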

6. Chrome data is being used to inform Google Search, so be aware

This one surprised me. Anyone who’s spent time analyzing data in Google Analytics knows that the behavior of iOS users is very different from that of Android, Windows, or any other OS users. And yet Google is using Chrome data alone. One consideration is that Google lacks access to behavior data for other operating systems. 

Action for design: I can’t recommend designing for just one browser. But keep in mind that Chrome performance will have a direct impact on organic search performance and may thus deserve additional attention.

7. Engaged time matters, so keep your users engaged

That’s a bit of an exaggeration, but an accessible approach to the concept. From the documentation (and Google’s recent antitrust testimony), “successful clicks” is a ranking factor — perhaps the most impactful ranking factor. Getting a click from a user and then keeping that user engaged is one of the most effective ways to improve and maintain your ranking position.

Engaged time isn’t a ranking factor, but it is a measure of satisfaction and a good user experience. To this end, all elements of the user experience matter, including load times, responsive design, and usability. 

Action for everyone: Optimize for a good user experience.

The reality behind SEO: less digital puppeteer and more scientific experimenter

For many, the word SEO conjures up images of a hacker with a hoodie pulled over their head, hunched over a keyboard. They sit in a dark room furiously typing away, making the internet dance to the strokes of their keyboard like a digital puppeteer, watching as hapless algorithms become their marionettes. 

In reality, SEO is perhaps more akin to science, where exhaustive experimentation is required to approach what might be considered a fact. This is the closest we’ve ever come to having something that is definitely true. And yet, only time — and more experimentation — will tell how this leak will change SEO.


About the author
Jeff Jensen

Jeff is a digital marketer with a broad set of skills. Working at various agencies, he’s helped all sizes of companies, from mom-and-pop shops to Fortune 500 corporations, drive growth in traffic. With experience in SEO, analytics, content marketing, and social media, he uses the best tools at hand to help his clients succeed. When he’s not running marketing campaigns, Jeff enjoys spending time with his wife and three children. He is based in Portland, Oregon.
