How Nvidia’s Megatron is Boosting Transformer Performance

In 2022, I presented at Deci AI’s Deep Learning Hacktoberfest.

My presentation was about Megatron: a single "top to bottom" stack transformer with L layers, where each layer consists of a self-attention block with multiple attention heads, followed by a two-layer multi-layer perceptron (MLP) that expands the hidden size from h to 4h before reducing it back to h.
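To make that structure concrete, here is a minimal PyTorch sketch of one such layer. This is illustrative only, not Nvidia's actual code; the class name `MegatronLayer`, the dimensions, and the layer-norm placement are my assumptions.

```python
import torch
import torch.nn as nn

class MegatronLayer(nn.Module):
    """One transformer layer: self-attention followed by a two-layer MLP
    that expands the hidden size h to 4h and reduces it back to h.
    Illustrative sketch only -- not Nvidia's actual implementation."""

    def __init__(self, h: int, num_heads: int):
        super().__init__()
        self.ln1 = nn.LayerNorm(h)
        self.attn = nn.MultiheadAttention(h, num_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(h)
        self.mlp = nn.Sequential(
            nn.Linear(h, 4 * h),   # expand h -> 4h
            nn.GELU(),
            nn.Linear(4 * h, h),   # reduce 4h -> h
        )

    def forward(self, x):
        y = self.ln1(x)
        a, _ = self.attn(y, y, y, need_weights=False)
        x = x + a                       # residual around attention
        x = x + self.mlp(self.ln2(x))   # residual around MLP
        return x

# The full model stacks L of these layers "top to bottom".
model = nn.Sequential(*[MegatronLayer(h=1024, num_heads=16) for _ in range(24)])
```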

Its architecture allows for near-linear scaling up to a trillion parameters, making it possible to build larger language models with less compute by sharding the model across multiple GPUs during training.
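The key idea behind that scaling is tensor (model) parallelism: the MLP's first weight matrix is split column-wise across GPUs and the second row-wise, so each GPU computes its slice of the 4h intermediate activations independently, and a single all-reduce recombines the outputs. Below is a rough single-process sketch of just the math, simulating two "GPUs" with tensor slices; the shard count and variable names are illustrative, and the real implementation uses distributed collectives (the GELU is omitted here for brevity, which is safe because an elementwise nonlinearity commutes with the column split):

```python
import torch

h = 8
x = torch.randn(1, h)          # one token's activations
W1 = torch.randn(h, 4 * h)     # expands h -> 4h
W2 = torch.randn(4 * h, h)     # reduces 4h -> h

# Baseline: the full MLP computed on one device.
full = (x @ W1) @ W2

# Tensor parallelism across two "GPUs":
# W1 split column-wise, W2 split row-wise.
W1_a, W1_b = W1[:, :2 * h], W1[:, 2 * h:]
W2_a, W2_b = W2[:2 * h, :], W2[2 * h:, :]

# Each device computes its shard independently...
partial_a = (x @ W1_a) @ W2_a
partial_b = (x @ W1_b) @ W2_b

# ...and one all-reduce (here, a plain sum) recombines them.
combined = partial_a + partial_b

assert torch.allclose(full, combined, atol=1e-5)
```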

The recording of the presentation is here.

Find the presentation slides here.
