- Created - 2023/08/02
- Last updated - 2023/08/02
Using Llama2
Download Llama2 model
- Provide details and accept the policy at https://ai.meta.com/llama/
- An email with download details is sent to the address you provided.
- Clone the repo https://github.com/facebookresearch/llama
Execute the following commands on a Mac M1 machine:
git clone https://github.com/facebookresearch/llama
cd llama
brew install wget
brew install md5sha1sum
sh download.sh
Enter the URL from email: https://download.llamameta.net/...
Enter the list of models to download without spaces (7B,13B,70B,7B-chat,13B-chat,70B-chat), or press Enter for all: 7B
Downloading LICENSE and Acceptable Usage Policy
...
--2023-08-02 22:10:34--  https://download.llamameta.net/LICENSE?Policy=&Key-Pair-Id=K15QR
Resolving download.llamameta.net (download.llamameta.net)... 18.67.111.89, 18.67.111.127, 18.67.111.46, ...
Connecting to download.llamameta.net (download.llamameta.net)|18.67.111.89|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 7020 (6.9K) [binary/octet-stream]
Saving to: ‘./LICENSE’
./LICENSE 100%[====================================================================================================================================================>] 6.86K --.-KB/s in 0s
2023-08-02 22:10:34 (209 MB/s) - ‘./LICENSE’ saved [7020/7020]
...
./llama-2-7b/checklist.chk 100%[====================================================================================================================================================>] 100 --.-KB/s in 0s
2023-08-02 22:49:46 (1.64 MB/s) - ‘./llama-2-7b/checklist.chk’ saved [100/100]
Checking checksums
consolidated.00.pth: OK
params.json: OK
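download.sh verifies each file against checklist.chk with md5sum (which is why md5sha1sum was installed above). To re-verify later, here is a small sketch using Python's standard hashlib; it assumes checklist.chk uses the md5sum format of one "&lt;md5-hex&gt;  &lt;filename&gt;" entry per line, and streams in chunks so the 13 GB weights file isn't read into memory at once:

```python
import hashlib
from pathlib import Path

# Re-verify downloaded files against checklist.chk (assumed md5sum format:
# "<md5-hex>  <filename>" per line), streaming in 1 MiB chunks.
model_dir = Path("llama-2-7b")
for line in (model_dir / "checklist.chk").read_text().splitlines():
    expected, name = line.split()
    md5 = hashlib.md5()
    with open(model_dir / name, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            md5.update(chunk)
    status = "OK" if md5.hexdigest() == expected else "FAILED"
    print(f"{name}: {status}")
```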
The download placed the model in the llama-2-7b directory.
ls -l ./llama-2-7b
-rw-r--r-- 1 rk staff 102 14 Jul 09:00 params.json
-rw-r--r-- 1 rk staff 100 14 Jul 09:00 checklist.chk
-rw-r--r-- 1 rk staff 13476925163 14 Jul 09:00 consolidated.00.pth
- The 7B model is about 13GB on disk (consolidated.00.pth is 13,476,925,163 bytes, consistent with roughly 7 billion parameters at 2 bytes each in half precision).
Inference
Different models require different model-parallel (MP) values:
| Model | MP |
|-------|----|
| 7B    | 1  |
| 13B   | 2  |
| 70B   | 8  |
All models support a sequence length of up to 4096 tokens, but the cache is pre-allocated according to the max_seq_len and max_batch_size values, so set those according to your hardware. The MP value is also what gets passed to torchrun as --nproc_per_node (1 for the 7B run below).
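To see why those two knobs matter, here is a rough back-of-the-envelope sketch of the pre-allocated KV-cache size. The dimensions (32 layers, 32 heads, head dim 128 for 7B) are assumptions taken from the published model architecture, and the formula is the generic KV-cache calculation, not code from this repo:

```python
# Rough KV-cache size for Llama2-7B. Assumed dims: 32 layers, 32 heads,
# head_dim 128, fp16 (2 bytes). The cache stores one key and one value
# vector per head, per layer, per token, for batch x seq_len tokens.
n_layers, n_heads, head_dim, bytes_per_value = 32, 32, 128, 2

def kv_cache_bytes(max_batch_size: int, max_seq_len: int) -> int:
    per_token = 2 * n_layers * n_heads * head_dim * bytes_per_value  # 2 = K and V
    return max_batch_size * max_seq_len * per_token

print(kv_cache_bytes(4, 128) // 2**20, "MiB")   # 256 MiB at the example settings
print(kv_cache_bytes(4, 4096) // 2**30, "GiB")  # 8 GiB at the full 4096 context
```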
cd llama
ls -l
-rw-r--r-- 1 rk staff 50 14 Jul 08:27 tokenizer_checklist.chk
-rw-r--r-- 1 rk staff 499723 14 Jul 08:27 tokenizer.model
-rw-r--r-- 1 rk staff 7020 15 Jul 10:06 LICENSE
-rw-r--r-- 1 rk staff 4766 15 Jul 10:06 USE_POLICY.md
-rwxr-xr-x 1 rk staff 426 2 Aug 22:04 setup.py
-rwxr-xr-x 1 rk staff 35 2 Aug 22:04 requirements.txt
-rwxr-xr-x 1 rk staff 1552 2 Aug 22:04 example_text_completion.py
-rw-r--r-- 1 rk staff 2774 2 Aug 22:04 example_chat_completion.py
-rw-r--r-- 1 rk staff 2171 2 Aug 22:04 download.sh
-rw-r--r-- 1 rk staff 1253223 2 Aug 22:04 Responsible-Use-Guide.pdf
-rwxr-xr-x 1 rk staff 6285 2 Aug 22:04 README.md
-rw-r--r-- 1 rk staff 7445 2 Aug 22:04 MODEL_CARD.md
-rw-r--r-- 1 rk staff 1236 2 Aug 22:04 CONTRIBUTING.md
-rw-r--r-- 1 rk staff 3536 2 Aug 22:04 CODE_OF_CONDUCT.md
drwxr-xr-x 5 rk staff 170 2 Aug 22:49 llama-2-7b
drwxr-xr-x 7 rk staff 238 3 Aug 07:01 llama.egg-info
drwxr-xr-x 7 rk staff 238 3 Aug 07:04 llama
pip install -e .
Obtaining file:///Volumes/samsung-2tb/rk/llama
Preparing metadata (setup.py) ... done
Requirement already satisfied: torch in /Volumes/samsung-2tb/rk/llm_venv/lib/python3.11/site-packages (from llama==0.0.1) (2.0.1)
Requirement already satisfied: fairscale in /Volumes/samsung-2tb/rk/llm_venv/lib/python3.11/site-packages (from llama==0.0.1) (0.4.13)
Requirement already satisfied: fire in /Volumes/samsung-2tb/rk/llm_venv/lib/python3.11/site-packages (from llama==0.0.1) (0.5.0)
Requirement already satisfied: sentencepiece in /Volumes/samsung-2tb/rk/llm_venv/lib/python3.11/site-packages (from llama==0.0.1) (0.1.99)
Requirement already satisfied: numpy>=1.22.0 in /Volumes/samsung-2tb/rk/llm_venv/lib/python3.11/site-packages (from fairscale->llama==0.0.1) (1.25.2)
Requirement already satisfied: filelock in /Volumes/samsung-2tb/rk/llm_venv/lib/python3.11/site-packages (from torch->llama==0.0.1) (3.12.2)
Requirement already satisfied: typing-extensions in /Volumes/samsung-2tb/rk/llm_venv/lib/python3.11/site-packages (from torch->llama==0.0.1) (4.7.1)
Requirement already satisfied: sympy in /Volumes/samsung-2tb/rk/llm_venv/lib/python3.11/site-packages (from torch->llama==0.0.1) (1.12)
Requirement already satisfied: networkx in /Volumes/samsung-2tb/rk/llm_venv/lib/python3.11/site-packages (from torch->llama==0.0.1) (3.1)
Requirement already satisfied: jinja2 in /Volumes/samsung-2tb/rk/llm_venv/lib/python3.11/site-packages (from torch->llama==0.0.1) (3.1.2)
Requirement already satisfied: six in /Volumes/samsung-2tb/rk/llm_venv/lib/python3.11/site-packages (from fire->llama==0.0.1) (1.16.0)
Requirement already satisfied: termcolor in /Volumes/samsung-2tb/rk/llm_venv/lib/python3.11/site-packages (from fire->llama==0.0.1) (2.3.0)
Requirement already satisfied: MarkupSafe>=2.0 in /Volumes/samsung-2tb/rk/llm_venv/lib/python3.11/site-packages (from jinja2->torch->llama==0.0.1) (2.1.3)
Requirement already satisfied: mpmath>=0.19 in /Volumes/samsung-2tb/rk/llm_venv/lib/python3.11/site-packages (from sympy->torch->llama==0.0.1) (1.3.0)
Installing collected packages: llama
Attempting uninstall: llama
Found existing installation: llama 0.0.1
Uninstalling llama-0.0.1:
Successfully uninstalled llama-0.0.1
Running setup.py develop for llama
Successfully installed llama-0.0.1
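As a quick sanity check that the editable install worked, the package should import from the checkout. A minimal check; Llama is the entry point the example scripts call via Llama.build:

```python
# Confirm the editable install resolves to the cloned repo.
import llama
print(llama.__file__)  # should point into the llama/ checkout, not site-packages
print(llama.Llama)     # entry point used by example_text_completion.py
```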
Pretrained Models
torchrun --nproc_per_node 1 example_text_completion.py \
--ckpt_dir llama-2-7b/ \
--tokenizer_path tokenizer.model \
--max_seq_len 128 --max_batch_size 4
NOTE: Redirects are currently not supported in Windows or MacOs.
Traceback (most recent call last):
File "/Volumes/samsung-2tb/rk/llama/example_text_completion.py", line 55, in <module>
fire.Fire(main)
File "/Volumes/samsung-2tb/rk/llm_venv/lib/python3.11/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Volumes/samsung-2tb/rk/llm_venv/lib/python3.11/site-packages/fire/core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
^^^^^^^^^^^^^^^^^^^^
File "/Volumes/samsung-2tb/rk/llm_venv/lib/python3.11/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^
File "/Volumes/samsung-2tb/rk/llama/example_text_completion.py", line 18, in main
generator = Llama.build(
^^^^^^^^^^^^
File "/Volumes/samsung-2tb/rk/llama/llama/generation.py", line 62, in build
torch.distributed.init_process_group("nccl")
File "/Volumes/samsung-2tb/rk/llm_venv/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py", line 907, in init_process_group
default_pg = _new_process_group_helper(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Volumes/samsung-2tb/rk/llm_venv/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py", line 1013, in _new_process_group_helper
raise RuntimeError("Distributed package doesn't have NCCL " "built in")
RuntimeError: Distributed package doesn't have NCCL built in
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 4931) of binary: /Volumes/samsung-2tb/rk/llm_venv/bin/python
Traceback (most recent call last):
File "/Volumes/samsung-2tb/rk/llm_venv/bin/torchrun", line 8, in <module>
sys.exit(main())
^^^^^^
File "/Volumes/samsung-2tb/rk/llm_venv/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
return f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "/Volumes/samsung-2tb/rk/llm_venv/lib/python3.11/site-packages/torch/distributed/run.py", line 794, in main
run(args)
File "/Volumes/samsung-2tb/rk/llm_venv/lib/python3.11/site-packages/torch/distributed/run.py", line 785, in run
elastic_launch(
File "/Volumes/samsung-2tb/rk/llm_venv/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Volumes/samsung-2tb/rk/llm_venv/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
example_text_completion.py FAILED
------------------------------------------------------------
Failures:
<NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2023-08-03_07:04:38
host : 1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.ip6.arpa
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 4931)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
It fails with:
RuntimeError: Distributed package doesn't have NCCL built in
What is NCCL?
As per https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/overview.html:
The NVIDIA Collective Communications Library (NCCL, pronounced “Nickel”) is a library providing inter-GPU communication primitives that are topology-aware and can be easily integrated into applications. NCCL implements both collective communication and point-to-point send/receive primitives. It is not a full-blown parallel programming framework; rather, it is a library focused on accelerating inter-GPU communication.
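A quick way to confirm what the local PyTorch build supports (standard torch APIs; the values in the comments are what I'd expect on an M1, not captured from a run):

```python
import torch
import torch.distributed as dist

# NCCL ships only with CUDA builds of PyTorch, so it is absent on Apple
# Silicon; Gloo is the CPU-friendly backend that is normally present.
print("NCCL:", dist.is_nccl_available())           # expected False on an M1
print("Gloo:", dist.is_gloo_available())           # expected True
print("MPS :", torch.backends.mps.is_available())  # Apple GPU via Metal
```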
So, can't I run Llama2 on a Mac M1?
- Mac M1s don't have NVIDIA GPUs, so NCCL is unavailable. Does that mean Llama2 can't run on this machine?
- This issue is still open: https://github.com/facebookresearch/llama/issues/112.
- People are using https://github.com/ggerganov/llama.cpp, a port of Facebook's LLaMA model in C/C++.
- Found this gist: https://gist.github.com/cedrickchee/e8d4cb0c4b1df6cc47ce8b18457ebde0. Will try this; a sketch of the kind of change it involves is below.
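The workaround discussed in that issue thread and the gist is to bypass NCCL: initialize the process group with the Gloo backend and keep tensors on the CPU. Below is a sketch of the kind of edits involved in llama/generation.py; it is unverified here, the exact call sites may differ in your checkout, and CPU inference will be slow. Some half-precision ops also lack CPU kernels, so float32 may be required, roughly doubling the 13GB footprint:

```python
# Sketch of the Gloo/CPU workaround in llama/generation.py (unverified;
# exact call sites may differ in your checkout).

# Use the Gloo backend, which works without CUDA:
torch.distributed.init_process_group("gloo")      # was: "nccl"

# Keep default tensors on the CPU; float32 because some fp16 ops
# have no CPU kernels:
torch.set_default_tensor_type(torch.FloatTensor)  # was: torch.cuda.HalfTensor

# Any remaining .cuda() calls (e.g. on token buffers) need to be dropped
# or replaced with .to("cpu").
```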