AhbarjietMalta

Chapter 8: Brief of Top Other NLP Models


Introduction
Table 8.1 summarizes several other prominent NLP models, covering their structure, their capabilities, and their reported benchmark performance.
Table 8.1: Various NLP models
Name
Details
BERT
Model:
The model uses bidirectional training with random masking of input tokens in a Transformer
The large configuration has 24 Transformer blocks, a hidden size of 1024, and about 340M parameters, and is trained on a corpus of 3.3 billion words
Performance:
GLUE benchmark score of about 80.4%, an improvement of 7.6 percentage points over the previous best result
F1 score of 93.2 on the SQuAD 1.1 benchmark, outperforming human performance by 2 points
Capabilities:
BERT provides a stronger foundation for building sentiment analysis tools and for improving customer experience through chatbots
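The random masking of input tokens mentioned for BERT can be sketched as follows. This is only a minimal illustration, not BERT's actual preprocessing: the function name mask_tokens, the 15% masking rate, and the "[MASK]" placeholder are assumptions made for the example, and it works on whitespace-split words rather than subword tokens.

```python
import random

MASK = "[MASK]"  # placeholder string, assumed for illustration


def mask_tokens(tokens, mask_rate=0.15, seed=0):
    """Hide a random subset of tokens.

    Returns (masked, labels): `masked` is the input with some tokens
    replaced by MASK; `labels` holds the original token at each masked
    position and None everywhere else.
    """
    rng = random.Random(seed)
    # Mask at least one token when the input is non-empty.
    n_mask = min(len(tokens), max(1, round(len(tokens) * mask_rate)))
    positions = set(rng.sample(range(len(tokens)), n_mask))
    masked = [MASK if i in positions else tok for i, tok in enumerate(tokens)]
    labels = [tok if i in positions else None for i, tok in enumerate(tokens)]
    return masked, labels


tokens = "the model reads context on both sides of each hidden word".split()
masked, labels = mask_tokens(tokens)
print(masked)
print(labels)
```

During training, the model would be asked to predict the entries of labels from the masked sequence, using the words on both sides of each hidden position.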
XLNet
Model:
This model combines core concepts of Transformer-XL and BERT, pairing the autoregressive approach of Transformer-XL with the bidirectional nature of BERT to tackle the limitations of both
Performance:
XLNet achieved state-of-the-art results on 18 different NLP tasks and outperformed BERT on 20 tasks
Capabilities:
XLNet performs well at question answering, sentiment analysis, and priority ranking, and can support similar conversational business applications
RoBERTa
Model:
This model is trained on almost 10x more data than the original BERT, and for more training iterations
The training batch size is also increased, up to 8,000 sequences
It uses a byte-pair-encoding vocabulary of more than 50k subword units
Performance:
Outperformed BERT in almost every respect, as expected
Capabilities:
RoBERTa can be applied to the same use cases as BERT and XLNet, with better expected performance
ALBERT
Model:
ALBERT was introduced to reduce the parameter count of large NLP models and to break the Moore's-law-style growth in model size. It uses two parameter-reduction mechanisms: factorized embedding parameterization and cross-layer parameter sharing
Performance:
Without a significant drop in performance, ALBERT addresses the bulkiness of these models: a configuration comparable to BERT-large has 18x fewer parameters and trains about 1.7x faster
Achieved an F1 score of 92.2 on the SQuAD 2.0 benchmark and a GLUE benchmark score of 89.4
Capabilities:
ALBERT can be applied to the same use cases as BERT and XLNet, with better expected performance
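The effect of ALBERT's two parameter-reduction mechanisms can be illustrated with simple counting. This is a rough sketch under stated assumptions, not ALBERT's exact accounting: the vocabulary size V = 30,000 and embedding size E = 128 are assumed values, the hidden size H = 1024 and the 24 layers follow the BERT figures quoted earlier, and the per-layer cost of 12 * H^2 is a common approximation that ignores biases and layer normalization.

```python
def embedding_params(vocab_size, hidden_size, embed_size=None):
    """Count embedding parameters.

    Without embed_size: one V x H matrix.
    With embed_size: the factorized form, a V x E matrix followed by
    an E x H projection.
    """
    if embed_size is None:
        return vocab_size * hidden_size
    return vocab_size * embed_size + embed_size * hidden_size


def encoder_params(num_layers, hidden_size, shared=False):
    """Approximate Transformer encoder parameters (12 * H^2 per layer).

    With cross-layer sharing, every layer reuses the same weights, so
    only one layer's worth of parameters is stored.
    """
    per_layer = 12 * hidden_size ** 2
    return per_layer if shared else num_layers * per_layer


V, H, E, LAYERS = 30_000, 1024, 128, 24  # V and E are assumed values

full = embedding_params(V, H)
factorized = embedding_params(V, H, E)
unshared = encoder_params(LAYERS, H)
shared = encoder_params(LAYERS, H, shared=True)

print("embedding:", full, "->", factorized)
print("encoder:  ", unshared, "->", shared)
```

Under these assumptions, most of the saving comes from cross-layer sharing, since the encoder stack dominates the parameter count in a deep model.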
PaLM
Model:
This model has about 540B parameters. To accommodate that scale during training, data parallelism was used across two Cloud TPU v4 Pods, achieving a hardware utilization of 57.8%.
Performance:
It outperformed many large models on 28 out of 29 major NLP tasks. It also performed strongly on benchmarks such as SuperGLUE and BIG-bench, with significant margins over other models
With few-shot prompting, PaLM performs on par with the fine-tuned Codex 12B while using 50 times less Python code for training, suggesting that large language models are more efficient at transferring knowledge from other programming languages and natural language data.
Capabilities:
PaLM may be used for a variety of downstream tasks, including conversational AI, question answering, machine translation, document classification, ad copy generation, code bug fixing, and more, as with other recently announced pre-trained language models.
Megatron-Turing NLG (MT-NLG)
Model:
This model has 530 billion parameters, with 105 layers, a hidden dimension of 20480, and 128 attention heads. It is trained with 8-way tensor parallelism and 35-way pipeline parallelism, a sequence length of 2048, and a batch size of 1920. The training data comes from 15 datasets totalling 339 billion tokens. During training, the datasets were blended into heterogeneous batches according to the variable sampling weights given in Figure 2, with an emphasis on higher-quality datasets. The model was trained on 270 billion tokens.
Performance:
It performed well on established benchmarks such as LAMBADA, RACE-h, BoolQ, PiQA, HellaSwag, WinoGrande, ANLI-R2, HANS, and WiC in zero-shot, one-shot, and few-shot settings. Its results on LAMBADA, PiQA, and HellaSwag were especially strong, demonstrating its ability in last-word prediction, question answering, and logical reasoning
Capabilities:
MT-NLG can be used for a variety of downstream tasks, including conversational AI, question answering, machine translation, document classification, ad copy generation, code bug fixing, and more, as with other recently announced pre-trained language models. It also performed well on mathematical inference tasks
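The training figures quoted for this model can be cross-checked with a little arithmetic. This is a back-of-the-envelope sketch, not the authors' own accounting: it assumes each tensor-parallel and pipeline-parallel rank maps to one device, that layers are split evenly across pipeline stages, and that a Transformer layer holds roughly 12 * hidden^2 parameters (embeddings, biases, and layer normalization ignored).

```python
# Figures taken from the model description.
layers, hidden = 105, 20480
tensor_parallel, pipeline_parallel = 8, 35
seq_len, batch_size = 2048, 1920
trained_tokens = 270_000_000_000

# Devices needed to hold one copy of the model (assumes one rank per device).
devices_per_replica = tensor_parallel * pipeline_parallel

# Layers assigned to each pipeline stage (assumes an even split).
layers_per_stage = layers // pipeline_parallel

# Tokens consumed by one training batch, and batches needed for the token budget.
tokens_per_batch = seq_len * batch_size
approx_steps = trained_tokens / tokens_per_batch

# Rough parameter estimate from depth and width alone.
approx_params = 12 * layers * hidden ** 2

print("devices per model replica:", devices_per_replica)
print("layers per pipeline stage:", layers_per_stage)
print("tokens per batch:", tokens_per_batch)
print("approximate training steps:", round(approx_steps))
print("approximate parameters (billions):", round(approx_params / 1e9, 1))
```

The rough parameter estimate comes out close to the stated 530 billion, which suggests the layer count and hidden dimension are consistent with the model size.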