How I use LLMs to code on the command line
Prompting
Humble beginnings
I start by just asking lm something directly, piping the prompt in on stdin:
echo "Add a function that will compute the fibonacci sequence: $(cat -n math_functions.py)" | ./lm
The simplest script
If I catch myself doing this more than a couple of times, I will make a bash
script, which I usually call prompt.sh. All prompt.sh ever does is generate
a prompt for lm to consume. That’s it.
Most of these prompt.sh scripts start simple enough:
#!/bin/bash
# I set this on all my scripts.
# See http://redsymbol.net/articles/unofficial-bash-strict-mode/ for why.
set -euo pipefail
main() {
cat <<EOF
I have this script: $(cat -n math_functions.py)
(I insert the rest of the prompt here)
EOF
}
main
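Hard-coding math_functions.py works for a one-off, but the same template can take the file and the request as arguments. This is a minimal sketch, not part of the original script: build_prompt is a hypothetical helper name, and the argument handling is an assumption about how one might want to call it.

```shell
#!/bin/bash
set -euo pipefail

# build_prompt FILE INSTRUCTION...
# Hypothetical helper: prints the numbered contents of FILE, followed by
# the instruction text, as a single prompt.
build_prompt() {
    local file="$1"
    shift
    cat <<EOF
I have this script:
$(cat -n "$file")

$*
EOF
}

# Only build a prompt when a file and an instruction were both given.
if [[ $# -ge 2 ]]; then
    build_prompt "$@"
fi
```

Saved as prompt.sh, this could be called as ./prompt.sh math_functions.py "Add a function that computes the Fibonacci sequence" | ./lm.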
Add an about section
On larger projects, this gets annoying. So I’ll create a helper function that gives me context about the project:
#!/bin/bash
set -euo pipefail
# files-to-prompt is a slick Simon Willison invention
# https://github.com/simonw/files-to-prompt
about() {
cat <<EOF
Vault is a CLI tool for performing embedding and vector search locally.
Here is the current directory structure:
\`\`\`bash
$(tree)
\`\`\`
And here are the current contents of the project (ignoring prompt.sh,
which is used to define this prompt):
$(files-to-prompt . --ignore "prompt.sh")
EOF
}
main() {
cat <<EOF
Here is information about my project:
$(about)
Generate a README
EOF
}
main
Add a references section using lynx
One of my common use cases for prompt.sh is building out features. It turns
out LLMs are really bad at this in a vacuum: the documentation they were trained
on is outdated, they don’t always think everything through, etc. I like to
compensate by sending them web pages containing useful information.
Normally this is really annoying (I love cURL but hate reading raw HTML instead of plaintext).
Luckily lynx comes to the rescue. The relevant command is:
lynx -dump -nolist https://www.hackernews.com
Initially I wrote it like this:
#!/bin/bash
set -euo pipefail
references() {
cat <<EOF
Article about performing similarity search with DuckDB:
*******************************************************
$(lynx -dump -nolist https://blog.brunk.io/posts/similarity-search-with-duckdb/)
*******************************************************
Vector similarity search in DuckDB (vss extension):
***************************************************
$(lynx -dump -nolist https://duckdb.org/2024/05/03/vector-similarity-search-vss.html)
***************************************************
EOF
}
about() {
cat <<EOF
Vault is a CLI tool for performing embedding and vector search locally.
Here is the current directory structure:
\`\`\`bash
$(tree)
\`\`\`
And here are the current contents of the project (ignoring prompt.sh,
which is used to define this prompt):
$(files-to-prompt . --ignore "prompt.sh")
EOF
}
main() {
cat <<EOF
Here is information about my project:
$(about)
Implement a DuckDB client that does vector search.
Make sure to use the following references.
References:
$(references)
EOF
}
main
Once I got to 3+ links I decided to make things a little clearer:
#!/bin/bash
set -euo pipefail
reference_links=(
"https://blog.brunk.io/posts/similarity-search-with-duckdb/"
"https://duckdb.org/2024/05/03/vector-similarity-search-vss.html"
"https://motherduck.com/blog/search-using-duckdb-part-1/"
"https://duckdb.org/docs/stable/sql/data_types/array"
"https://duckdb.org/docs/stable/sql/functions/array.html"
"https://click.palletsprojects.com/en/stable/"
"https://docs.astral.sh/uv/concepts/projects/init/"
"https://docs.astral.sh/uv/guides/projects/"
)
# Function to display references in a readable manner
references() {
echo "# Reference Index"
for reference_link in "${reference_links[@]}"; do
# Print a header with Markdown style
echo -e "\n## Reference: $reference_link\n"
lynx -dump -nolist "$reference_link"
echo -e "\n"
done
}
about() {
cat <<EOF
Vault is a CLI tool for performing embedding and vector search locally.
Here is the current directory structure:
\`\`\`bash
$(tree)
\`\`\`
And here are the current contents of the project (ignoring prompt.sh,
which is used to define this prompt):
$(files-to-prompt . --ignore "prompt.sh")
EOF
}
main() {
cat <<EOF
Here is information about my project:
$(about)
Implement a DuckDB client that does vector search.
Make sure to use the following references.
References:
$(references)
EOF
}
main
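With this many links, every run of prompt.sh downloads every page again. One possible refinement is to cache each page on disk. The following is a sketch under assumptions: cached_reference, fetch_page, the PROMPT_CACHE_DIR variable, and the .prompt-cache directory are all hypothetical names, and keying the cache on a cksum of the URL is just one simple choice.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical cache location; override with PROMPT_CACHE_DIR.
cache_dir="${PROMPT_CACHE_DIR:-.prompt-cache}"

# The download step, kept separate so it is easy to swap out.
fetch_page() {
    lynx -dump -nolist "$1"
}

# cached_reference URL
# Prints the text of URL, downloading it only if no cached copy exists.
cached_reference() {
    local url="$1"
    local key cache_file
    # cksum gives a short, filename-safe key for the URL.
    key=$(printf '%s' "$url" | cksum | cut -d' ' -f1)
    cache_file="$cache_dir/$key.txt"
    mkdir -p "$cache_dir"

    if [[ ! -s "$cache_file" ]]; then
        # Write to a temp file first so a failed download is never cached.
        if fetch_page "$url" > "$cache_file.tmp"; then
            mv "$cache_file.tmp" "$cache_file"
        else
            rm -f "$cache_file.tmp"
            echo "failed to fetch $url" >&2
            return 1
        fi
    fi
    cat "$cache_file"
}
```

Inside the references loop, cached_reference "$reference_link" would stand in for the direct lynx call, and deleting the cache directory forces a refresh. A dot-directory is used because tree skips hidden entries by default; whether files-to-prompt also skips it is worth verifying before relying on it.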
Add error handling
This workflow already gets me surprisingly far! Providing reference docs seems to guide the model toward better decisions. That said, sometimes I get annoying failures:
$ ./vault.py
Traceback (most recent call last):
File "/Users/paul.wendt/vault/./vault.py", line 169, in <module>
db_client.search_similar_embeddings("test-model", [0.0, 1.0, 0.0], top_k=1)
File "/Users/paul.wendt/vault/./vault.py", line 90, in search_similar_embeddings
result = self.connection.execute(query, (query_embedding, top_k)).fetchall()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
duckdb.duckdb.BinderException: Binder Error: No function matches the given name and argument types 'array_inner_product(DOUBLE[3], DOUBLE[])'. You might need to add explicit type casts.
Candidate functions:
array_inner_product(FLOAT[ANY], FLOAT[ANY]) -> FLOAT
array_inner_product(DOUBLE[ANY], DOUBLE[ANY]) -> DOUBLE
Naturally, I have to let the LLM know.
#!/bin/bash
set -euo pipefail
references() {
cat <<EOF
Article about performing similarity search with DuckDB:
*******************************************************
$(lynx -dump -nolist https://blog.brunk.io/posts/similarity-search-with-duckdb/)
*******************************************************
Vector similarity search in DuckDB (vss extension):
***************************************************
$(lynx -dump -nolist https://duckdb.org/2024/05/03/vector-similarity-search-vss.html)
***************************************************
EOF
}
run() {
# remove the embeddings DB file if it already exists
if [ -f "embeddings.db" ]; then
rm "embeddings.db"
fi
cat <<EOF
\`\`\`bash
\$ ./vault.py
$(./vault.py 2>&1)
\`\`\`
EOF
}
about() {
cat <<EOF
Vault is a CLI tool for performing embedding and vector search locally.
Here is the current directory structure:
\`\`\`bash
$(tree)
\`\`\`
And here are the current contents of the project (ignoring prompt.sh,
which is used to define this prompt):
$(files-to-prompt . --ignore "prompt.sh")
EOF
}
main() {
cat <<EOF
Here is information about my project:
$(about)
The script is failing with the following error.
$(run)
Explain the error and suggest a fix, or steps to debug further.
Make sure to use the following references:
$(references)
EOF
}
main
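The run section can be generalized so the prompt also records which command was run and how it exited. This is a minimal sketch rather than the original helper: capture is a hypothetical name, and reporting the exit status is an added assumption about what is useful to the model.

```shell
#!/bin/bash
set -euo pipefail

# capture CMD [ARGS...]
# Hypothetical helper: runs CMD, then prints the command line, its combined
# stdout and stderr, and its exit status in a form that is easy to paste
# into a prompt.
capture() {
    local output status=0
    # "|| status=$?" records a failure instead of letting set -e abort.
    output=$("$@" 2>&1) || status=$?
    cat <<EOF
\`\`\`bash
\$ $*
$output
\`\`\`
Exit status: $status
EOF
}

# Only run when a command was passed on the command line.
if [[ $# -gt 0 ]]; then
    capture "$@"
fi
```

Within prompt.sh, $(capture ./vault.py) could then replace the hand-written run block.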
Musings
Bash as a templating language
I have a love-hate relationship with bash:
Love:
- Bash scripts are easy to spin up and can connect with everything that has a CLI
- Bash is everywhere, so the prompt.sh scripts I write are portable-ish
- Everything is text! So there’s no need to worry about image output
- set -euo pipefail gets reasonable-ish behavior
Hate:
- HEREDOCS are ugly
- Bash generally, and HEREDOCS specifically, have weird syntax quirks
- As far as I know there’s no way to build up prompts async
  For instance, I’d love to have the references section call out to lynx while the run section runs the script, but I don’t think it’s possible (or at the very least it’s not easy) to do these at the same time
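For what it is worth, a rough approximation of async sections is possible with background jobs and temp files. The following is only a sketch: the references and run functions here are stand-ins that sleep and echo (the real ones would call lynx and ./vault.py), and the temp-file layout is an assumption.

```shell
#!/bin/bash
set -euo pipefail

# Stand-ins for the slow sections (hypothetical bodies).
references() { sleep 1; echo "reference text"; }
run()        { sleep 1; echo "script output"; }

main() {
    local tmp
    tmp=$(mktemp -d)

    # Start both sections in the background, each writing to its own file.
    references > "$tmp/references" &
    local references_pid=$!
    run > "$tmp/run" 2>&1 &
    local run_pid=$!

    # Wait for both; "|| true" keeps a failing section from aborting
    # the whole prompt under set -e.
    wait "$references_pid" || true
    wait "$run_pid" || true

    cat <<EOF
The script is failing with the following error.
$(cat "$tmp/run")

Make sure to use the following references:
$(cat "$tmp/references")
EOF
    rm -rf "$tmp"
}

main
```

Both sections run concurrently, so the prompt takes about as long as the slower one. It works, but the pid and temp-file bookkeeping is far from pretty.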
What I really want is a new programming language for prompt templating. Templates should be pretty and sub-prompts should be able to run async. The language will have to be powerful too - it’ll probably resemble a souped-up shell more than anything.
