Recently Dave Davies wrote about Google's BERT vs SMITH and how they work & work together. With so much coming out in recent years around BERT, SMITH, and NLP in general, we thought this was a great time to get Dave on to dig into these topics a bit more.
To say he was happy to "geek" out over this topic was an understatement!
Dave Davies covers a ton during the episode, and rather than trying to summarize all of his points, much of what you see in this write-up comes from notes Dave Rohrer took before and during the episode. To get what Dave Davies actually said, scroll down to the Transcript, as it will be your best bet this time.
Update on SMITH/Passages
Barely an hour after we finished recording this podcast, Danny Sullivan (as Google SearchLiaison) posted that Passages had gone live - the tweet is here.
What is NLP and Why Does an SEO/Marketer care?
This was one of the questions that Dave Rohrer wanted to dig into during the conversation, and we believe we did touch on it. The short answer: structure your content as you should have all along. The long answer is that you need to listen to the episode or read the full transcript, because along the way Dave gives some ways and reasons to optimize for and think about NLP, BERT, and SMITH.
What is BERT?
Search algorithm patent expert Bill Slawski (@bill_slawski of @GoFishDigital) described BERT like this:
“Bert is a natural language processing pre-training approach that can be used on a large body of text. It handles tasks such as entity recognition, part of speech tagging, and question-answering among other natural language processes. Bert helps Google understand natural language text from the Web.
Google has open sourced this technology, and others have created variations of BERT.” - Bill Slawski
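Because Google open sourced BERT, you can try its core pre-training task yourself. The sketch below is our illustration, not anything from Google's search stack: it uses the public bert-base-uncased checkpoint through the Hugging Face transformers library to fill in a masked word using context from both sides of the blank, which is the "bidirectional" part of the name.

```python
# A minimal sketch using the open-source bert-base-uncased checkpoint via
# the Hugging Face `transformers` library (pip install transformers torch).
from transformers import pipeline

# BERT was pre-trained to predict masked words from context on BOTH sides
# of the blank -- the "Bidirectional" in its name.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

for prediction in fill_mask("Google open sourced BERT so [MASK] could build on it."):
    print(prediction["token_str"], round(prediction["score"], 3))
```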
What is SMITH?
SMITH stands for Siamese Multi-depth Transformer-based Hierarchical (SMITH) Encoder. In a very simplified description, the SMITH model is trained to understand passages within the context of the entire document. SMITH and BERT are "similar" at a very high level, but where SMITH gets involved is in understanding long and complex documents, and long and complex queries. BERT, as you will learn from Dave Davies, is much better suited to shorter pieces of content.
At its core, SMITH takes a document through the following process (paraphrased mostly from Dave Davies' article):
1. It breaks the document into grouping sizes it can handle, favoring sentences (i.e., if the document would allocate 4.5 sentences to a block based on length, it would truncate that to four).
2. It then processes each sentence block individually.
3. A transformer then learns the contextual representations of each block and turns them into a document representation.
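To make those three steps concrete, here is a rough sketch of the two-level flow. This is not Google's SMITH code: the four-sentence block size, the use of bert-base-uncased as the block-level encoder, and the mean-pooling combiner are all simplifying assumptions on our part (the real model trains a second, document-level transformer over the block representations).

```python
# A simplified two-level sketch of the SMITH idea, NOT Google's implementation.
# Assumptions: bert-base-uncased as the block encoder, 4-sentence blocks,
# and mean-pooling standing in for SMITH's document-level transformer.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

def sentence_blocks(sentences, block_size=4):
    # Step 1: break the document into blocks of whole sentences
    # (a 4.5-sentence allocation gets truncated to 4 whole sentences).
    return [sentences[i:i + block_size] for i in range(0, len(sentences), block_size)]

def document_representation(sentences):
    block_vectors = []
    for block in sentence_blocks(sentences):
        # Step 2: process each sentence block individually, like a short text.
        inputs = tokenizer(" ".join(block), return_tensors="pt",
                           truncation=True, max_length=256)
        with torch.no_grad():
            output = encoder(**inputs)
        block_vectors.append(output.last_hidden_state[:, 0])  # [CLS] vector per block
    # Step 3: combine the block vectors into one document representation.
    return torch.cat(block_vectors).mean(dim=0)

doc = [
    "SMITH is built for long documents.",
    "It splits them into sentence blocks.",
    "Each block is encoded on its own.",
    "Block vectors are then combined.",
    "The result is one document vector.",
]
print(document_representation(doc).shape)  # torch.Size([768])
```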
BERT vs. SMITH
- BERT taps out at 256 tokens per document. After that, the computing cost gets too high for it to be functional, and often it just isn't.
- SMITH, on the other hand, can handle 2,048 tokens, so the documents can be 8x larger.
- SMITH is the bazooka. It will paint the understanding of how things are. It is more costly in resources because it's doing a bigger job, but it is far less costly than BERT would be at that job.
- BERT will help SMITH do that, and will assist in understanding short queries and content chunks.
- That is, until both are replaced, at which time we'll move another leap forward, and I'm going to bet that the next algorithm will be: Bidirectional Object-agnostic Regression-based transformer Gateways. (So Dave, you are saying Google is going to create a BORG algo? Nice!!!)
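If you want a feel for how quickly an ordinary long-form page blows past those budgets, you can count WordPiece tokens with the open-source BERT tokenizer. This is illustrative only; the 256 and 2,048 cutoffs below are the figures from the comparison above, not limits enforced by the tokenizer itself.

```python
# Illustrative token counting with the open-source BERT tokenizer.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
long_page = "Your long-form article text goes here. " * 200  # stand-in for a real page

n_tokens = len(tokenizer.encode(long_page))
print(n_tokens)           # roughly 1,800 tokens for this stand-in page
print(n_tokens <= 256)    # False: past the per-document budget cited for BERT
print(n_tokens <= 2048)   # True: still inside the budget cited for SMITH
```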
Passage Ranking
The following passage about passages is from Google Passage Ranking Now Live in U.S.: 16 Key Facts:
Bartosz then asks whether tightening up the use of heading elements to better communicate what the different sections of a page are about will help, or whether Google will understand the content regardless of the markup.
“It’s pretty much that. With any kind of content som