January 03, 2024

Chapter 8: Brief of Top Other NLP Models

7 minutes


Introduction

The following table summarizes several other prominent NLP models, covering their architecture, reported benchmark performance, and capabilities.

Table 8.1: Various NLP models

Name

Details

BERT

Model:

BERT is a Transformer encoder pre-trained bidirectionally: input tokens are randomly masked and predicted from the context on both sides.

The large variant has 24 Transformer blocks, a hidden size of 1024, and about 340M parameters, and it was trained on a corpus of about 3.3 billion words.

Performance:

GLUE benchmark score of about 80.4%, a 7.6% absolute improvement over the previous best result.

F1 score of 93.2 on the SQuAD 1.1 benchmark, about 2 points above the reported human performance.

Capabilities:

BERT provides a stronger foundation for building sentiment analysis tools and for more effective chatbots that improve the customer experience.
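The random masking that BERT uses during pre-training can be sketched in a few lines of Python. This is a minimal illustration, not BERT's actual preprocessing code: the function name `mask_tokens`, the toy vocabulary, and the independent per-token sampling are assumptions made here, and the 15% selection rate with the 80/10/10 replacement split comes from the original BERT paper rather than from this chapter.

```python
import random

MASK = "[MASK]"  # placeholder token used for masked positions


def mask_tokens(tokens, vocab, mask_prob=0.15, seed=None):
    """Return (masked_tokens, labels) for a masked-language-model example.

    labels[i] holds the original token where a prediction is required,
    and None everywhere else. Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    masked = list(tokens)
    labels = [None] * len(tokens)
    for i, token in enumerate(tokens):
        # Select each position independently with probability mask_prob
        # (an approximation of choosing 15% of the positions).
        if rng.random() >= mask_prob:
            continue
        labels[i] = token
        roll = rng.random()
        if roll < 0.8:
            masked[i] = MASK               # 80%: replace with [MASK]
        elif roll < 0.9:
            masked[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: keep the original token unchanged
    return masked, labels


if __name__ == "__main__":
    sentence = "the model predicts the hidden words from context".split()
    toy_vocab = ["cat", "tree", "runs", "blue"]  # assumed toy vocabulary
    print(mask_tokens(sentence, toy_vocab, seed=0))
```

Because the model must predict each selected token from the tokens on both sides of it, this objective is what lets BERT train bidirectionally.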

XLNet

Model:

XLNet combines core ideas from Transformer-XL and BERT: it keeps the autoregressive formulation used by Transformer-XL while capturing bidirectional context as BERT does, with the aim of addressing the limitations of both.

Performance:

XLNet achieved state-of-the-art results on 18 NLP tasks and outperformed BERT on 20 tasks.

Capabilities:

XLNet performs well at question answering, sentiment analysis, and ranking tasks, and it can serve similar conversational business applications.

RoBERTa

Model:

RoBERTa is trained on roughly 10x more data than the original BERT and for more training iterations.

The training batch size is increased to as many as 8,000 sequences.

It uses a byte-pair-encoding (BPE) vocabulary of more than 50k subword units.

Performance:

As expected, it outperformed BERT in almost every respect.

Capabilities:

RoBERTa can be applied to the same use cases as BERT and XLNet, with the expectation of better performance.
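The byte-pair-encoding vocabulary mentioned for RoBERTa is built by repeatedly merging the most frequent adjacent pair of symbols. The sketch below shows that merge loop on a toy word-frequency table. It is a simplified, character-level illustration under stated assumptions: the function name `learn_bpe` and the toy corpus are made up here, and RoBERTa itself uses a byte-level variant with a far larger corpus and vocabulary.

```python
from collections import Counter


def learn_bpe(word_freqs, num_merges):
    """Learn up to num_merges BPE merge rules from a {word: frequency} table."""
    # Start with every word split into single characters.
    splits = {word: tuple(word) for word in word_freqs}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pair_counts = Counter()
        for word, freq in word_freqs.items():
            symbols = splits[word]
            for left, right in zip(symbols, symbols[1:]):
                pair_counts[(left, right)] += freq
        if not pair_counts:
            break  # every word is already a single symbol
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        # Apply the new merge rule to every word.
        for word, symbols in splits.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            splits[word] = tuple(merged)
    return merges


if __name__ == "__main__":
    # Assumed toy corpus: word -> number of occurrences.
    corpus = {"hug": 10, "pug": 5, "pun": 12, "bun": 4, "hugs": 5}
    print(learn_bpe(corpus, 3))  # [('u', 'g'), ('u', 'n'), ('h', 'ug')]
```

Running the loop for tens of thousands of merges on a large corpus yields a subword vocabulary of the size quoted above, in which frequent words become single units and rare words are split into smaller pieces.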

ALBERT

Model:

ALBERT was introduced to cut the unnecessary parameter count of large NLP models and to break the trend of ever-larger models (a kind of Moore's law for NLP model size). It uses two parameter-reduction mechanisms: factorized embedding parameterization and cross-layer parameter sharing.

Performance:

Without a significant drop in performance, ALBERT addressed the bulkiness of these models with 18x fewer parameters and about 1.7x faster training.

It achieved an F1 score of 92.2 on the SQuAD 2.0 benchmark and a GLUE score of 89.4.

Capabilities:

ALBERT can be applied to the same use cases as BERT and XLNet, with the expectation of better performance.
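The two parameter-reduction mechanisms named for ALBERT can be made concrete with a little arithmetic. The sketch below is an approximation under stated assumptions: the helper names are hypothetical, the vocabulary size of 30,000 and embedding size of 128 are illustrative values not given in this chapter, and the per-layer count uses the common rough estimate of 12 x hidden_size squared, which ignores biases and layer normalization.

```python
def embedding_params(vocab_size, hidden_size, embedding_size=None):
    """Parameters in the token-embedding table.

    Without factorization: one V x H matrix.
    With factorization: a V x E matrix followed by an E x H projection.
    """
    if embedding_size is None:
        return vocab_size * hidden_size
    return vocab_size * embedding_size + embedding_size * hidden_size


def encoder_params(num_layers, hidden_size, shared=False):
    """Rough encoder size, using about 12 * H^2 weights per Transformer layer.

    With cross-layer sharing, all layers reuse one set of weights.
    """
    per_layer = 12 * hidden_size ** 2
    return per_layer if shared else per_layer * num_layers


if __name__ == "__main__":
    V, H, E, L = 30_000, 1024, 128, 24  # assumed illustrative sizes
    print(embedding_params(V, H))        # 30720000
    print(embedding_params(V, H, E))     # 3971072
    print(encoder_params(L, H))          # 301989888
    print(encoder_params(L, H, True))    # 12582912
```

Under these assumptions the factorized embedding is roughly 8x smaller, and sharing weights across 24 layers shrinks the encoder by a factor of 24, which is how reductions of the order quoted above become possible.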

PaLM

Model:

PaLM has about 540B parameters. To accommodate a model of this size, training used data parallelism across two Cloud TPU v4 Pods and reached a hardware utilization of 57.8%.

Performance:

It outperformed other large models on 28 of 29 major NLP tasks, and it led other models by a significant margin on benchmarks such as SuperGLUE and BIG-bench.

PaLM performs on par with the fine-tuned Codex 12B while using 50 times less Python code for training, which suggests that large language models transfer knowledge from other programming languages and from natural language data more efficiently.

Capabilities:

Like other recently announced pre-trained language models, PaLM may be used for a variety of downstream tasks, including conversational AI, question answering, machine translation, document classification, ad copy generation, code bug fixing, and more.

Megatron-Turing NLG (MT-NLG)

Model:

This model has 530 billion parameters, with 105 layers, a hidden dimension of 20480, and 128 attention heads. Training used 8-way tensor parallelism and 35-way pipeline parallelism, with a sequence length of 2048 and a batch size of 1920. The training data comprised 15 datasets totaling 339 billion tokens. During training, the datasets were blended into heterogeneous batches according to the variable sampling weights given in Figure 2, with an emphasis on higher-quality datasets. The model was trained on 270 billion tokens.

Performance:

It performed well on established benchmarks such as LAMBADA, RACE-h, BoolQ, PiQA, HellaSwag, WinoGrande, ANLI-R2, HANS, and WiC in zero-shot, one-shot, and few-shot settings. Its results were especially strong on LAMBADA, PiQA, and HellaSwag, which demonstrates its ability in last-word prediction, question answering, and logical reasoning.

Capabilities:

Like other recently announced pre-trained language models, Megatron-Turing NLG (MT-NLG) can be used for a variety of downstream tasks, including conversational AI, question answering, machine translation, document classification, ad copy generation, code bug fixing, and more. It also performed well on mathematical inference tasks.
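The figures quoted for this model can be cross-checked with some back-of-the-envelope arithmetic. The sketch below is a rough estimate under stated assumptions, not the authors' own accounting: the helper name is hypothetical, the parameter count uses the common approximation of 12 x layers x hidden_size squared (embeddings, biases, and layer normalization are ignored), and the step count assumes the batch size of 1920 was held constant for the whole run.

```python
def approx_transformer_params(num_layers, hidden_size):
    """Rough weight count: about 12 * H^2 per layer (attention + feed-forward)."""
    return 12 * num_layers * hidden_size ** 2


# Values quoted in the table entry above.
NUM_LAYERS = 105
HIDDEN_SIZE = 20480
TENSOR_PARALLEL = 8
PIPELINE_PARALLEL = 35
SEQ_LENGTH = 2048
BATCH_SIZE = 1920
TRAINING_TOKENS = 270_000_000_000

params = approx_transformer_params(NUM_LAYERS, HIDDEN_SIZE)
devices_per_replica = TENSOR_PARALLEL * PIPELINE_PARALLEL
tokens_per_batch = BATCH_SIZE * SEQ_LENGTH
approx_steps = TRAINING_TOKENS // tokens_per_batch  # assumes a constant batch size

if __name__ == "__main__":
    print(params)               # 528482304000, close to the quoted 530 billion
    print(devices_per_replica)  # 280 devices hold one copy of the model
    print(tokens_per_batch)     # 3932160 tokens per batch
    print(approx_steps)         # 68664 optimizer steps, as a rough estimate
```

The estimate of about 528 billion weights agrees well with the quoted 530 billion, and the product of the two parallelism degrees shows why a single copy of the model has to be spread over hundreds of accelerators.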

Join Our Book's Discord Space

Join the book's Discord workspace for the latest updates, offers, tech happenings around the world, new releases, and sessions with the authors:

https://discord.bpbonline.com


By AhbarjietMalta
