Huge Google Search Document Leak Reveals Inner Workings of Ranking Algorithm

Danny Goodwin reports via Search Engine Land: A trove of leaked Google documents has given us an unprecedented look inside Google Search and revealed some of the most important elements Google uses to rank content. Thousands of documents, which appear to come from Google’s internal Content API Warehouse, were released March 13 on GitHub by an automated bot called yoshi-code-bot. These documents were shared with Rand Fishkin, SparkToro co-founder, earlier this month.

What’s inside. Here’s what we know about the internal documents, thanks to Fishkin and [Michael King, iPullRank CEO]:

Current: The documentation indicates this information is accurate as of March.
Ranking features: 2,596 modules are represented in the API documentation with 14,014 attributes.
Weighting: The documents did not specify how any of the ranking features are weighted — just that they exist.
Twiddlers: These are re-ranking functions that “can adjust the information retrieval score of a document or change the ranking of a document,” according to King.
Demotions: Content can be demoted for a variety of reasons, such as: a link doesn’t match the target site; SERP signals indicate user dissatisfaction; product reviews; location; exact match domains; and/or porn.
Change history: Google apparently keeps a copy of every version of every page it has ever indexed. Meaning, Google can “remember” every change ever made to a page. However, Google only uses the last 20 changes of a URL when analyzing links.
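
To make the "twiddler" idea above concrete, here is a minimal Python sketch of a re-ranking step that adjusts a document's score and then re-sorts the results. It is a toy illustration only, not Google's implementation: the `Result` type, the `make_demotion` and `rerank` helpers, and the 0.5 factor are all hypothetical, and the leaked documents do not specify how any feature is weighted.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass
class Result:
    url: str
    score: float  # initial information-retrieval score (hypothetical scale)


# In this sketch a "twiddler" is any function that takes results and
# returns results with adjusted scores.
Twiddler = Callable[[List[Result]], List[Result]]


def make_demotion(flagged: Set[str], factor: float) -> Twiddler:
    """Build a twiddler that scales the score of flagged URLs by `factor`."""
    def twiddle(results: List[Result]) -> List[Result]:
        return [
            Result(r.url, r.score * factor if r.url in flagged else r.score)
            for r in results
        ]
    return twiddle


def rerank(results: List[Result], twiddlers: List[Twiddler]) -> List[Result]:
    """Apply each twiddler in order, then sort by the adjusted score."""
    for twiddle in twiddlers:
        results = twiddle(results)
    return sorted(results, key=lambda r: r.score, reverse=True)


initial = [
    Result("a.example", 0.9),
    Result("b.example", 0.8),
    Result("c.example", 0.5),
]
# The 0.5 factor is an arbitrary assumption chosen for the example.
ranked = rerank(initial, [make_demotion({"a.example"}, 0.5)])
print([r.url for r in ranked])
```

Keeping each adjustment as a separate function means demotions can be added or removed without touching the initial scoring step.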

Other interesting findings. According to Google’s internal documents:
Freshness matters — Google looks at dates in the byline (bylineDate), URL (syntacticDate) and on-page content (semanticDate).
To determine whether a document is or isn’t a core topic of the website, Google vectorizes pages and sites, then compares the page embeddings (siteRadius) to the site embeddings (siteFocusScore).
Google stores domain registration information (RegistrationInfo).
Page titles still matter. Google has a feature called titlematchScore that is believed to measure how well a page title matches a query.
Google measures the average weighted font size of terms in documents (avgTermWeight) and anchor text.
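
The comparison of page embeddings to site embeddings mentioned above can be sketched in a few lines of Python. This is an assumption-heavy illustration: the leaked documentation does not say how the comparison is computed, so the use of a mean vector as the "site embedding", the use of cosine distance, and the function names below are all hypothetical.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def site_centroid(page_embeddings: List[Sequence[float]]) -> List[float]:
    """Assumed 'site embedding': the element-wise mean of the page embeddings."""
    n = len(page_embeddings)
    dim = len(page_embeddings[0])
    return [sum(p[i] for p in page_embeddings) / n for i in range(dim)]


def distance_from_site(page: Sequence[float], centroid: Sequence[float]) -> float:
    """Cosine distance of one page from the site centroid (0 means same direction)."""
    return 1.0 - cosine_similarity(page, centroid)


# Toy 2-D embeddings: two pages about the site's main topic, one off-topic page.
pages = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
centroid = site_centroid(pages)
on_topic = distance_from_site(pages[0], centroid)
off_topic = distance_from_site(pages[2], centroid)
print(on_topic < off_topic)
```

Under these assumptions, a page that sits far from the centroid is less likely to be a core topic of the site.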

What does it all mean? According to King: “[Y]ou need to drive more successful clicks using a broader set of queries and earn more link diversity if you want to continue to rank. Conceptually, it makes sense because a very strong piece of content will do that. A focus on driving more qualified traffic to a better user experience will send signals to Google that your page deserves to rank.” […] Fishkin added: “If there was one universal piece of advice I had for marketers seeking to broadly improve their organic search rankings and traffic, it would be: ‘Build a notable, popular, well-recognized brand in your space, outside of Google search.’”

Read more of this story at Slashdot.

Instead of ‘Auth,’ We Should Say ‘Permissions’ and ‘Login’

The term “auth” is ambiguous, often meaning either authentication (authn) or authorization (authz), which leads to confusion and poor system design. Instead, Nicole Tietz-Sokolskaya, a software engineer at AI market research platform Remesh, argues that the industry should adopt the terms “login” for authentication and “permissions” for authorization, as these are clearer and help maintain distinct, appropriate abstractions for each concept. From their blog post: We should always use the most clear terms we have. Sometimes there’s not a great option, but here, we have wonderfully clear terms. Those are “login” for authentication and “permissions” for authorization. Both are terms that will make sense with little explanation (in contrast to “authn” and “authz”, which are confusing on first encounter) since almost everyone has logged into a system and has run into permissions issues. There are two ways to use “login” here: the noun and the verb form. The noun form is “login”, which refers to the information you enter to gain access to the system. And the verb form is “log in”, which refers to the action of entering your login to use the system. “Permissions” is just the noun form. To use a verb, you would use “check permissions.” While this is long, it’s also just… fine? It hasn’t been an issue in my experience.

Both of these are abundantly clear even to our peers in disciplines outside software engineering. This to me makes it worth using them from a clarity perspective alone. But then we have the big benefit to abstractions, as well. When we call both by the same word, there’s often an urge to combine them into a single module just by dint of the terminology. This isn’t necessarily wrong — there is certainly some merit to put them together, since permissions typically require a login. But it’s not necessary, either, and our designs will be stronger if we don’t make that assumption and instead make a reasoned choice.
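
As a rough illustration of keeping the two concepts in separate abstractions, here is a minimal Python sketch with one class for login and another for permissions. It is not taken from the blog post: the class names, method names, and in-memory storage are hypothetical, and a real system would need persistent storage, sessions, and rate limiting.

```python
import hashlib
import hmac
import os
from typing import Dict, Optional, Set, Tuple


class Login:
    """Authentication only: answers "who is this?"."""

    def __init__(self) -> None:
        # username -> (salt, password hash); in-memory for the example only
        self._users: Dict[str, Tuple[bytes, bytes]] = {}

    @staticmethod
    def _hash(password: str, salt: bytes) -> bytes:
        return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)

    def register(self, username: str, password: str) -> None:
        salt = os.urandom(16)
        self._users[username] = (salt, self._hash(password, salt))

    def log_in(self, username: str, password: str) -> Optional[str]:
        """Return the username if the login is valid, otherwise None."""
        record = self._users.get(username)
        if record is None:
            return None
        salt, expected = record
        if hmac.compare_digest(self._hash(password, salt), expected):
            return username
        return None


class Permissions:
    """Authorization only: answers "may this user do that?"."""

    def __init__(self) -> None:
        self._grants: Dict[str, Set[str]] = {}

    def grant(self, user: str, action: str) -> None:
        self._grants.setdefault(user, set()).add(action)

    def check_permissions(self, user: str, action: str) -> bool:
        return action in self._grants.get(user, set())


login = Login()
login.register("alice", "s3cret")
permissions = Permissions()
permissions.grant("alice", "read")

user = login.log_in("alice", "s3cret")
if user is not None and permissions.check_permissions(user, "read"):
    print("alice may read")
```

The only thing the two classes share is the user identifier, so either one can be replaced or tested without the other.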


Ditch Brightly Colored Plastic, Anti-Waste Researchers Tell Firms

Retailers are being urged to stop making everyday products such as drinks bottles, outdoor furniture and toys out of brightly coloured plastic after researchers found it degrades into microplastics faster than plainer colours. From a report: Red, blue and green plastic became “very brittle and fragmented,” while black, white and silver samples were “largely unaffected” over a three-year period, according to the findings of the University of Leicester-led project. The scale of environmental pollution caused by plastic waste means that microplastics, or tiny plastic particles, are everywhere. Indeed, they were recently found in human testicles, with scientists suggesting a possible link to declining sperm counts in men.

In this case, scientists from the UK and the University of Cape Town in South Africa used complementary studies to show that plastics of the same composition degrade at different rates depending on the colour. The UK researchers put bottle lids of various colours on the roof of a university building to be exposed to the sun and the elements for three years. The South African study used plastic items found on a remote beach. “It’s amazing that samples left to weather on a rooftop in Leicester and those collected on a windswept beach at the southern tip of the African continent show similar results,” said Dr Sarah Key, who led the project. “What the experiments showed is that even in a relatively cool and cloudy environment for only three years, huge differences can be seen in the formation of microplastics.” This field study, published in the journal Environmental Pollution, is the first such proof of this effect. It suggests that retailers and manufacturers should give more consideration to the colour of short-lived plastics.


Rivers of Lava on Venus Reveal a More Volcanically Active Planet

Witnessing the blood-red fires of a volcanic eruption on Earth is memorable. But to see molten rock bleed out of a volcano on a different planet would be extraordinary. That is close to what scientists have spotted on Venus: two vast, sinuous lava flows oozing from two different corners of Earth’s planetary neighbor. From a report: “After you see something like this, the first reaction is ‘wow,’” said Davide Sulcanese, a doctoral student at the Universita d’Annunzio in Pescara, Italy, and an author of a study reporting the discovery in the journal Nature Astronomy, published on Monday. Earth and Venus were forged at the same time. Both are made of the same primeval matter, and both are the same age and size. So why is Earth a paradise overflowing with water and life, while Venus is a scorched hellscape with acidic skies?

Volcanic eruptions tinker with planetary atmospheres. One theory holds that, eons ago, several apocalyptic eruptions set off a runaway greenhouse effect on Venus, turning it from a temperate, waterlogged world into an arid desert of burned glass. To better understand its volcanism, scientists hoped to catch a Venusian eruption in the act. But although the planet is known to be smothered in volcanoes, an opaque atmosphere has prevented anyone from seeing an eruption the way spacecraft have spotted them on Io, the hypervolcanic moon of Jupiter. In the 1990s, NASA’s spacecraft Magellan used cloud-penetrating radar to survey most of the planet. But back then, the relatively low-resolution images made spotting fresh molten rock a troublesome task.
