How Nvidia’s Megatron is Boosting Transformer Performance

In 2022, I presented at Deci AI’s Deep Learning Hacktoberfest.

My presentation was about Megatron: a single "top to bottom" stack transformer with L layers, where each layer consists of a self-attention block with multiple attention heads, followed by a two-layer multi-layer perceptron (MLP) that expands the hidden size from h to 4h before reducing it back to h.
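To make that structure concrete, here is a minimal PyTorch sketch of one such layer. This is illustrative only, not Nvidia's actual code; the class name `MegatronLayer`, the dimensions, and the layer-norm placement are my assumptions.

```python
import torch
import torch.nn as nn

class MegatronLayer(nn.Module):
    """One transformer layer: self-attention followed by a two-layer MLP
    that expands the hidden size h to 4h and reduces it back to h.
    Illustrative sketch only -- not Nvidia's actual implementation."""

    def __init__(self, h: int, num_heads: int):
        super().__init__()
        self.ln1 = nn.LayerNorm(h)
        self.attn = nn.MultiheadAttention(h, num_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(h)
        self.mlp = nn.Sequential(
            nn.Linear(h, 4 * h),   # expand h -> 4h
            nn.GELU(),
            nn.Linear(4 * h, h),   # reduce 4h -> h
        )

    def forward(self, x):
        y = self.ln1(x)
        a, _ = self.attn(y, y, y, need_weights=False)
        x = x + a                       # residual around attention
        x = x + self.mlp(self.ln2(x))   # residual around MLP
        return x

# The full model stacks L of these layers "top to bottom".
model = nn.Sequential(*[MegatronLayer(h=1024, num_heads=16) for _ in range(24)])
```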

Its architecture allows for near-linear scaling up to a trillion parameters, making it possible to build larger language models with less compute by sharding the model across multiple GPUs during training.
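The key idea behind that scaling is tensor (model) parallelism: the MLP's first weight matrix is split column-wise across GPUs and the second row-wise, so each GPU computes its slice of the 4h intermediate activations independently, and a single all-reduce recombines the outputs. Below is a rough single-process sketch of just the math, simulating two "GPUs" with tensor slices; the shard count and variable names are illustrative, and the real implementation uses distributed collectives (the GELU is omitted here for brevity, which is safe because an elementwise nonlinearity commutes with the column split):

```python
import torch

h = 8
x = torch.randn(1, h)          # one token's activations
W1 = torch.randn(h, 4 * h)     # expands h -> 4h
W2 = torch.randn(4 * h, h)     # reduces 4h -> h

# Baseline: the full MLP computed on one device.
full = (x @ W1) @ W2

# Tensor parallelism across two "GPUs":
# W1 split column-wise, W2 split row-wise.
W1_a, W1_b = W1[:, :2 * h], W1[:, 2 * h:]
W2_a, W2_b = W2[:2 * h, :], W2[2 * h:, :]

# Each device computes its shard independently...
partial_a = (x @ W1_a) @ W2_a
partial_b = (x @ W1_b) @ W2_b

# ...and one all-reduce (here, a plain sum) recombines them.
combined = partial_a + partial_b

assert torch.allclose(full, combined, atol=1e-5)
```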

The recording of the presentation is here.

Find the presentation slides here.
