I am interested in running models on the client side / locally (web browser, desktop, mobile, etc.). For now I am just collecting resources; I plan to explore them soon.
Olive (https://github.com/microsoft/Olive) is an easy-to-use, hardware-aware model optimization tool that composes industry-leading techniques across model compression, optimization, and compilation.
Per https://poloclub.github.io/transformer-explainer/,
"Transformer Explainer features a live GPT-2 (small) model running directly in the browser. This model is derived from the PyTorch implementation of GPT by Andrej Karpathy's nanoGPT project and has been converted to ONNX Runtime for seamless in-browser execution. The interface is built using JavaScript, with Svelte as a front-end framework and D3.js for creating dynamic visualizations. Numerical values are updated live following the user input."
GitHub - https://github.com/poloclub/transformer-explainer
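To get a feel for what "converted to ONNX Runtime for seamless in-browser execution" looks like in practice, here is a minimal sketch using the onnxruntime-web package. The model file name (`model.onnx`), the input name (`input_ids`), and the output name (`logits`) are assumptions for illustration only, not the actual names used by Transformer Explainer.

```ts
import * as ort from "onnxruntime-web";

// Minimal sketch: one forward pass of an ONNX-exported GPT-2-style model in the browser.
// "model.onnx", "input_ids", and "logits" are illustrative names, not the
// actual file/tensor names used by Transformer Explainer.
async function runOnce(tokenIds: number[]): Promise<void> {
  // Create a session backed by the WebAssembly execution provider.
  const session = await ort.InferenceSession.create("model.onnx", {
    executionProviders: ["wasm"],
  });

  // GPT-2-style models typically take int64 token ids with shape [batch, seq_len].
  const inputIds = new ort.Tensor(
    "int64",
    BigInt64Array.from(tokenIds.map((t) => BigInt(t))),
    [1, tokenIds.length]
  );

  // Run inference; feeds are keyed by the model's input names.
  const results = await session.run({ input_ids: inputIds });

  // Logits come back with shape [batch, seq_len, vocab_size].
  const logits = results["logits"];
  console.log("logits dims:", logits.dims);
}

// Example call with placeholder token ids.
runOnce([15496, 11, 995]);
```

In a real page the session would be created once and reused across inputs, and onnxruntime-web can also target the WebGPU execution provider where the browser supports it.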