Using Llama2

Download the Llama2 model

Execute the following commands on a Mac M1 machine (download.sh uses wget and md5sum, hence the two brew installs):

git clone https://github.com/facebookresearch/llama
cd llama

brew install wget
brew install md5sha1sum
sh download.sh
Enter the URL from email: https://download.llamameta.net/...
Enter the list of models to download without spaces (7B,13B,70B,7B-chat,13B-chat,70B-chat), or press Enter for all: 7B
Downloading LICENSE and Acceptable Usage Policy

...
--2023-08-02 22:10:34--  https://download.llamameta.net/LICENSE?Policy=&Key-Pair-Id=K15QR
Resolving download.llamameta.net (download.llamameta.net)... 18.67.111.89, 18.67.111.127, 18.67.111.46, ...
Connecting to download.llamameta.net (download.llamameta.net)|18.67.111.89|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 7020 (6.9K) [binary/octet-stream]
Saving to: ‘./LICENSE’

./LICENSE                                                       100%[====================================================================================================================================================>]   6.86K  --.-KB/s    in 0s

2023-08-02 22:10:34 (209 MB/s) - ‘./LICENSE’ saved [7020/7020]
...
./llama-2-7b/checklist.chk                                      100%[====================================================================================================================================================>]     100  --.-KB/s    in 0s

2023-08-02 22:49:46 (1.64 MB/s) - ‘./llama-2-7b/checklist.chk’ saved [100/100]

Checking checksums
consolidated.00.pth: OK
params.json: OK
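
download.sh runs these checks itself via md5sum, but to re-verify a download later, a small Python sketch like the following does the same job. It assumes checklist.chk uses the standard md5sum format, "<hex digest>  <filename>":

import hashlib
from pathlib import Path

model_dir = Path("llama-2-7b")
for line in (model_dir / "checklist.chk").read_text().splitlines():
    expected, name = line.split()
    h = hashlib.md5()
    with open(model_dir / name, "rb") as f:
        # Hash in 1 MiB chunks; consolidated.00.pth is ~13.5 GB
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    print(name, "OK" if h.hexdigest() == expected else "FAILED")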

The download places the model in the llama-2-7b directory; at ~13.5 GB, consolidated.00.pth is consistent with roughly 7 billion parameters stored as 16-bit weights.

ls -l ./llama-2-7b
-rw-r--r--  1 rk  staff          102 14 Jul 09:00 params.json
-rw-r--r--  1 rk  staff          100 14 Jul 09:00 checklist.chk
-rw-r--r--  1 rk  staff  13476925163 14 Jul 09:00 consolidated.00.pth

Inference

Different models require different model-parallel (MP) values:

Model   MP
7B       1
13B      2
70B      8

All models support sequence lengths up to 4096 tokens, but the cache is pre-allocated according to the max_seq_len and max_batch_size values, so set those to match your hardware.
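
To see why those two values matter on a memory-constrained machine, here is a back-of-envelope estimate of the pre-allocated KV cache for the 7B model. It is a rough sketch using the public 7B model-card values (32 layers, hidden dimension 4096), not the exact allocation the reference code performs:

n_layers, dim, bytes_per_val = 32, 4096, 2      # 16-bit cache entries
max_seq_len, max_batch_size = 4096, 4           # worst case: full context length

per_token = 2 * n_layers * dim * bytes_per_val  # one K and one V vector per layer
cache_bytes = max_batch_size * max_seq_len * per_token
print(f"KV cache: {cache_bytes / 2**30:.1f} GiB")   # 8.0 GiB, on top of ~13.5 GB of weights

At max_seq_len 128, as used in the run below, the same cache shrinks to about 256 MiB.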

cd llama
ls -l 
-rw-r--r--  1 rk  staff       50 14 Jul 08:27 tokenizer_checklist.chk
-rw-r--r--  1 rk  staff   499723 14 Jul 08:27 tokenizer.model
-rw-r--r--  1 rk  staff     7020 15 Jul 10:06 LICENSE
-rw-r--r--  1 rk  staff     4766 15 Jul 10:06 USE_POLICY.md
-rwxr-xr-x  1 rk  staff      426  2 Aug 22:04 setup.py
-rwxr-xr-x  1 rk  staff       35  2 Aug 22:04 requirements.txt
-rwxr-xr-x  1 rk  staff     1552  2 Aug 22:04 example_text_completion.py
-rw-r--r--  1 rk  staff     2774  2 Aug 22:04 example_chat_completion.py
-rw-r--r--  1 rk  staff     2171  2 Aug 22:04 download.sh
-rw-r--r--  1 rk  staff  1253223  2 Aug 22:04 Responsible-Use-Guide.pdf
-rwxr-xr-x  1 rk  staff     6285  2 Aug 22:04 README.md
-rw-r--r--  1 rk  staff     7445  2 Aug 22:04 MODEL_CARD.md
-rw-r--r--  1 rk  staff     1236  2 Aug 22:04 CONTRIBUTING.md
-rw-r--r--  1 rk  staff     3536  2 Aug 22:04 CODE_OF_CONDUCT.md
drwxr-xr-x  5 rk  staff      170  2 Aug 22:49 llama-2-7b
drwxr-xr-x  7 rk  staff      238  3 Aug 07:01 llama.egg-info
drwxr-xr-x  7 rk  staff      238  3 Aug 07:04 llama
pip install -e .
Obtaining file:///Volumes/samsung-2tb/rk/llama
  Preparing metadata (setup.py) ... done
Requirement already satisfied: torch in /Volumes/samsung-2tb/rk/llm_venv/lib/python3.11/site-packages (from llama==0.0.1) (2.0.1)
Requirement already satisfied: fairscale in /Volumes/samsung-2tb/rk/llm_venv/lib/python3.11/site-packages (from llama==0.0.1) (0.4.13)
Requirement already satisfied: fire in /Volumes/samsung-2tb/rk/llm_venv/lib/python3.11/site-packages (from llama==0.0.1) (0.5.0)
Requirement already satisfied: sentencepiece in /Volumes/samsung-2tb/rk/llm_venv/lib/python3.11/site-packages (from llama==0.0.1) (0.1.99)
Requirement already satisfied: numpy>=1.22.0 in /Volumes/samsung-2tb/rk/llm_venv/lib/python3.11/site-packages (from fairscale->llama==0.0.1) (1.25.2)
Requirement already satisfied: filelock in /Volumes/samsung-2tb/rk/llm_venv/lib/python3.11/site-packages (from torch->llama==0.0.1) (3.12.2)
Requirement already satisfied: typing-extensions in /Volumes/samsung-2tb/rk/llm_venv/lib/python3.11/site-packages (from torch->llama==0.0.1) (4.7.1)
Requirement already satisfied: sympy in /Volumes/samsung-2tb/rk/llm_venv/lib/python3.11/site-packages (from torch->llama==0.0.1) (1.12)
Requirement already satisfied: networkx in /Volumes/samsung-2tb/rk/llm_venv/lib/python3.11/site-packages (from torch->llama==0.0.1) (3.1)
Requirement already satisfied: jinja2 in /Volumes/samsung-2tb/rk/llm_venv/lib/python3.11/site-packages (from torch->llama==0.0.1) (3.1.2)
Requirement already satisfied: six in /Volumes/samsung-2tb/rk/llm_venv/lib/python3.11/site-packages (from fire->llama==0.0.1) (1.16.0)
Requirement already satisfied: termcolor in /Volumes/samsung-2tb/rk/llm_venv/lib/python3.11/site-packages (from fire->llama==0.0.1) (2.3.0)
Requirement already satisfied: MarkupSafe>=2.0 in /Volumes/samsung-2tb/rk/llm_venv/lib/python3.11/site-packages (from jinja2->torch->llama==0.0.1) (2.1.3)
Requirement already satisfied: mpmath>=0.19 in /Volumes/samsung-2tb/rk/llm_venv/lib/python3.11/site-packages (from sympy->torch->llama==0.0.1) (1.3.0)
Installing collected packages: llama
  Attempting uninstall: llama
    Found existing installation: llama 0.0.1
    Uninstalling llama-0.0.1:
      Successfully uninstalled llama-0.0.1
  Running setup.py develop for llama
Successfully installed llama-0.0.1
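
As a quick sanity check that the editable install worked, the package should now import from the virtual environment:

# Should print the build helper without raising ImportError
from llama import Llama
print(Llama.build)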

Pretrained Models

torchrun --nproc_per_node 1 example_text_completion.py \
    --ckpt_dir llama-2-7b/ \
    --tokenizer_path tokenizer.model \
    --max_seq_len 128 --max_batch_size 4
NOTE: Redirects are currently not supported in Windows or MacOs.
Traceback (most recent call last):
  File "/Volumes/samsung-2tb/rk/llama/example_text_completion.py", line 55, in <module>
    fire.Fire(main)
  File "/Volumes/samsung-2tb/rk/llm_venv/lib/python3.11/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Volumes/samsung-2tb/rk/llm_venv/lib/python3.11/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
                                ^^^^^^^^^^^^^^^^^^^^
  File "/Volumes/samsung-2tb/rk/llm_venv/lib/python3.11/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^
  File "/Volumes/samsung-2tb/rk/llama/example_text_completion.py", line 18, in main
    generator = Llama.build(
                ^^^^^^^^^^^^
  File "/Volumes/samsung-2tb/rk/llama/llama/generation.py", line 62, in build
    torch.distributed.init_process_group("nccl")
  File "/Volumes/samsung-2tb/rk/llm_venv/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py", line 907, in init_process_group
    default_pg = _new_process_group_helper(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Volumes/samsung-2tb/rk/llm_venv/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py", line 1013, in _new_process_group_helper
    raise RuntimeError("Distributed package doesn't have NCCL " "built in")
RuntimeError: Distributed package doesn't have NCCL built in
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 4931) of binary: /Volumes/samsung-2tb/rk/llm_venv/bin/python
Traceback (most recent call last):
  File "/Volumes/samsung-2tb/rk/llm_venv/bin/torchrun", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/Volumes/samsung-2tb/rk/llm_venv/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/Volumes/samsung-2tb/rk/llm_venv/lib/python3.11/site-packages/torch/distributed/run.py", line 794, in main
    run(args)
  File "/Volumes/samsung-2tb/rk/llm_venv/lib/python3.11/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/Volumes/samsung-2tb/rk/llm_venv/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Volumes/samsung-2tb/rk/llm_venv/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
example_text_completion.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-08-03_07:04:38
  host      : 1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.ip6.arpa
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 4931)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================

The run fails with:

RuntimeError: Distributed package doesn't have NCCL built in

What is NCCL?

As per https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/overview.html:

The NVIDIA Collective Communications Library (NCCL, pronounced “Nickel”) is a library providing inter-GPU communication primitives that are topology-aware and can be easily integrated into applications. NCCL implements both collective communication and point-to-point send/receive primitives. It is not a full-blown parallel programming framework; rather, it is a library focused on accelerating inter-GPU communication.
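
NCCL ships only with CUDA builds of PyTorch for NVIDIA GPUs. You can confirm which distributed backends your local build supports:

import torch.distributed as dist

print(dist.is_available())        # True: the distributed package itself is present
print(dist.is_nccl_available())   # False on macOS wheels: NCCL is NVIDIA/CUDA-only
print(dist.is_gloo_available())   # True: gloo is the CPU backend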

So, can't I run Llama2 on a Mac M1?
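
Not with the reference code as it stands: the traceback shows that llama/generation.py hard-codes torch.distributed.init_process_group("nccl"), and there is no NCCL on Apple silicon. One commonly suggested experiment, untested here and not part of the official instructions, is to switch the process group to the gloo backend, which runs on CPU:

# llama/generation.py, inside Llama.build() -- hypothetical one-line change:
torch.distributed.init_process_group("gloo")   # was: init_process_group("nccl")

Even then, the reference implementation makes further CUDA assumptions (for example, half-precision CUDA tensors), so CPU inference would need more changes. The routes below are usually the more practical path on a Mac.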


Llama2 on Hugging Face


Llama recipes