Karpathy implements a transformer decoder, based on the "Attention Is All You Need" paper, here: "Let's build GPT: from scratch, in code, spelled out." I just wanted to draw the transformer diagram. That's it.