- Created - 2023/08/04
- Last updated - 2023/08/04
Using Llama 2 & Spark
Overview
This exercise uses Llama 2, a large language model (LLM) from Meta AI, to summarise many documents at once. Spark is used to take advantage of parallel processing.
Note: This is not my work. I followed this guide: https://towardsdatascience.com/distributed-llama-2-on-cpus-via-llama-cpp-pyspark-65736e9f466d
1. Create a virtual env
python -m venv llama_spark_venv
source llama_spark_venv/bin/activate
2. Download the model
This step downloads the Llama 2 7B Chat model converted to GGML
format. Other quantised variants are available on the same page: https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML.
mkdir llama_spark
cd llama_spark/
mkdir models
cd models/
wget https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML/resolve/main/llama-2-7b-chat.ggmlv3.q8_0.bin
ls -lrt
-rw-r--r-- 1 rk staff 7160799872 19 Jul 03:50 llama-2-7b-chat.ggmlv3.q8_0.bin
3. Install llama-cpp Python bindings
pip install llama-cpp-python
Collecting llama-cpp-python
Using cached llama_cpp_python-0.1.77.tar.gz (1.6 MB)
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... done
Collecting typing-extensions>=4.5.0
Using cached typing_extensions-4.7.1-py3-none-any.whl (33 kB)
Collecting numpy>=1.20.0
Using cached numpy-1.25.2-cp311-cp311-macosx_11_0_arm64.whl (14.0 MB)
Collecting diskcache>=5.6.1
Using cached diskcache-5.6.1-py3-none-any.whl (45 kB)
Building wheels for collected packages: llama-cpp-python
Building wheel for llama-cpp-python (pyproject.toml) ... done
Created wheel for llama-cpp-python: filename=llama_cpp_python-0.1.77-cp311-cp311-macosx_13_0_arm64.whl size=236114 sha256=3054fe6a05eecdae80e979077f8e4578ec2bf6102d089cd1eb81f503f0239e33
Stored in directory: /Users/rk/Library/Caches/pip/wheels/a2/ea/0a/19ffc6aaf5c35243864ffca3f6bb4c971bdaad17fb863f9b9a
Successfully built llama-cpp-python
Installing collected packages: typing-extensions, numpy, diskcache, llama-cpp-python
Successfully installed diskcache-5.6.1 llama-cpp-python-0.1.77 numpy-1.25.2 typing-extensions-4.7.1
Testing
from llama_cpp import Llama
llm = Llama(model_path="./llama-2-7b-chat.ggmlv3.q8_0.bin")
output = llm("Q: Name the planets in the solar system? A: ", max_tokens=400, stop=["Q:", "\n"], echo=True)
print(output)
{'id': 'cmpl-1fd69252-bc63-483e-b1ff-75897054d72d', 'object': 'text_completion', 'created': 1691060842, 'model': './llama-2-7b-chat.ggmlv3.q8_0.bin', 'choices': [{'text': 'Q: Name the planets in the solar system? A: 1. Pluto is no longer considered a planet, but it is still listed as a dwarf planet. 2. Mercury - closest planet to', 'index': 0, 'logprobs': None, 'finish_reason': 'length'}], 'usage': {'prompt_tokens': 15, 'completion_tokens': 32, 'total_tokens': 47}}
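Side note: the constructor above uses the library defaults, which include a fairly small context window. For longer prompts, parameters such as n_ctx and n_threads can be passed explicitly; a minimal sketch (the same parameters appear in process.py further below, and exact defaults depend on the llama-cpp-python version):
from llama_cpp import Llama

# raise the context window for longer prompts; n_ctx/n_threads are the same
# parameters used by process.py below (defaults vary by library version)
llm = Llama(
    model_path="./llama-2-7b-chat.ggmlv3.q8_0.bin",
    n_ctx=2048,   # context window, in tokens
    n_threads=8,  # CPU threads used for inference
)
output = llm("Q: Name the planets in the solar system? A: ", max_tokens=400, stop=["Q:"])
print(output["choices"][0]["text"])  # just the generated text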
4. Download the text that will be summarised
mkdir data
cd data/
curl "https://gutenberg.org/cache/epub/2600/pg2600.txt" -o war_and_peace.txt
ls -l war_and_peace.txt
-rw-r--r-- 1 rk staff 3359834 3 Aug 21:12 war_and_peace.txt
# print lines, words, characters
echo "$(cat ./war_and_peace.txt | wc -l) lines"
echo "$(cat ./war_and_peace.txt | wc -w) words"
echo "$(cat ./war_and_peace.txt | wc -c) characters"
66081 lines
566325 words
3359834 characters
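The same stats can be sanity-checked from Python. A quick sketch (counts may differ slightly from wc, e.g. wc -c counts bytes while len() counts characters):
# rough Python equivalent of the wc commands above
with open("./war_and_peace.txt") as f:
    text = f.read()
print(len(text.splitlines()), "lines")
print(len(text.split()), "words")
print(len(text), "characters")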
5. Install Pyspark
pip install pyspark
pip install pandas
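Before wiring in the model, a quick smoke test (hypothetical, not from the original article) confirms PySpark can start a local session:
from pyspark.sql import SparkSession

# start and stop a throwaway local session to verify the install
spark = SparkSession.builder.master("local[*]").appName("smoke-test").getOrCreate()
print(spark.version)
spark.stop()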
tree
.
├── data
│ └── war_and_peace.txt
├── models
│ └── llama-2-7b-chat.ggmlv3.q8_0.bin
└── process.py
2 directories, 4 files
process.py
import re
import pandas as pd
from pyspark.sql import SparkSession
# this is the function applied per-group by Spark
# the df passed is a *Pandas* dataframe!
def llama2_summarize(df):
    # import/load the model inside the worker, not at module level
    from llama_cpp import Llama

    # template for this model version, see:
    # https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML#prompt-template-llama-2-chat
    template = """
[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.
Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content.
Please ensure that your responses are socially unbiased and positive in nature.
If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct.
If you don't know the answer to a question, please don't share false information.
<</SYS>>
{INSERT_PROMPT_HERE} [/INST]
"""

    # create prompt
    chapter_text = df.iloc[0]["text"]
    chapter_num = df.iloc[0]["chapter"]
    prompt = (
        "Summarize the following novel chapter in a single sentence (less than 100 words): "
        + chapter_text
    )
    # replace the placeholder (including its braces) with the actual prompt
    prompt = template.replace("{INSERT_PROMPT_HERE}", prompt)

    print("Going to invoke llm()")
    llm = Llama(
        model_path="./models/llama-2-7b-chat.ggmlv3.q8_0.bin",
        n_ctx=4096,
        n_batch=512,
        n_threads=8,
        verbose=True,
    )
    output = llm(prompt, max_tokens=-1, echo=True, temperature=0.2, top_p=0.1)
    print(output)
    return pd.DataFrame(
        {"summary": [output["choices"][0]["text"]], "chapter": [int(chapter_num)]}
    )
spark = SparkSession.builder.appName("my-spark-app").getOrCreate()

# read book, remove header/footer
with open("./data/war_and_peace.txt", "r") as f:
    text = f.read()
text = text.split("PROJECT GUTENBERG EBOOK WAR AND PEACE")[1]

# get list of chapter strings
chapter_list = [x for x in re.split("CHAPTER .+", text) if len(x) > 100]

# print stats
print("number of chapters = " + str(len(chapter_list)))
print("max words per chapter = " + str(max([len(c.split(" ")) for c in chapter_list])))

# create Spark dataframe, show it
df = spark.createDataFrame(
    pd.DataFrame({"text": chapter_list, "chapter": range(1, len(chapter_list) + 1)})
)
df.show(10, 60)

# Test with 1 row
pandas_df = df.limit(1).toPandas()
resp = llama2_summarize(pandas_df)
print(resp)

# create summaries via Spark (show() only prints, so no point assigning its result)
(
    df.groupby("chapter")
      .applyInPandas(llama2_summarize, schema="summary string, chapter int")
      .show(vertical=True, truncate=False)
)
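Because groupby("chapter") puts exactly one row in each group, llama2_summarize is invoked once per chapter. If the run completes, the summaries could also be persisted rather than only printed; a minimal sketch (the toPandas()/to_csv step is a standard PySpark/pandas pattern, not part of the original article):
# persist the summaries instead of only showing them
summaries_df = (
    df.groupby("chapter")
      .applyInPandas(llama2_summarize, schema="summary string, chapter int")
)
summaries_df.toPandas().to_csv("summaries.csv", index=False)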
python process.py
23/08/04 19:43:26 WARN Utils: Your hostname, RKs-Mac-mini.local resolves to a loopback address: 127.0.0.1; using 192.168.0.20 instead (on interface en1)
23/08/04 19:43:26 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
23/08/04 19:43:27 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
number of chapters = 365
max words per chapter = 3636
+------------------------------------------------------------+-------+
| text|chapter|
+------------------------------------------------------------+-------+
|\n\n“Well, Prince, so Genoa and Lucca are now just family...| 1|
|\n\nAnna Pávlovna’s drawing room was gradually filling. T...| 2|
|\n\nAnna Pávlovna’s reception was in full swing. The spin...| 3|
|\n\nJust then another visitor entered the drawing room: P...| 4|
|\n\n“And what do you think of this latest comedy, the cor...| 5|
|\n\nHaving thanked Anna Pávlovna for her charming soiree,...| 6|
|\n\nThe rustle of a woman’s dress was heard in the next r...| 7|
|\n\nThe friends were silent. Neither cared to begin talki...| 8|
|\n\nIt was past one o’clock when Pierre left his friend. ...| 9|
|\n\nPrince Vasíli kept the promise he had given to Prince...| 10|
+------------------------------------------------------------+-------+
only showing top 10 rows
Going to invoke llm()
llama.cpp: loading model from /Volumes/samsung-2tb/rk/llama.cpp/models/llama-2-7b/ggml-model-q4_0.bin
llama_model_load_internal: format = ggjt v3 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 4096
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_head_kv = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: n_gqa = 1
llama_model_load_internal: rnorm_eps = 1.0e-06
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: freq_base = 10000.0
llama_model_load_internal: freq_scale = 1
llama_model_load_internal: ftype = 2 (mostly Q4_0)
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 0.08 MB
llama_model_load_internal: mem required = 4173.96 MB (+ 2048.00 MB per state)
llama_new_context_with_model: kv self size = 2048.00 MB
AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 |
- The script hangs at this point.
- I could not figure out what was happening; I'll revisit this. One thing worth checking: the log above shows llama.cpp loading a different model (ggml-model-q4_0.bin from the llama.cpp directory) than the q8_0 chat model that process.py specifies.
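One hypothetical way to narrow this down (not from the article): push only a couple of chapters through the Spark path, which separates model-loading issues from data-volume issues:
# debugging aid: run only the first two chapters through Spark
(
    df.filter("chapter <= 2")
      .groupby("chapter")
      .applyInPandas(llama2_summarize, schema="summary string, chapter int")
      .show(vertical=True, truncate=False)
)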