Lance Martin

Context

Q+A assistants over various types of content are a good application for LLMs. I’ve released a few apps and have been cataloging some of the common failure cases and/or user requests:

1/ Split Size

Split size and k (the number of documents returned via similarity search) have a strong influence on performance. I have some evaluations looking at the effect of split size here. Below I also show how the number of documents returned affects latency.

[Figure: latency vs. number of documents returned]
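
To make the two knobs concrete, here is a minimal sketch assuming a LangChain-style stack (FAISS and OpenAI embeddings are illustrative choices, not necessarily what the apps above use): chunk_size sets the split size at indexing time, and k sets how many documents similarity search returns.

```python
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS

# Split size: controlled at indexing time via chunk_size.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
splits = splitter.split_documents(documents)  # `documents` loaded elsewhere

# k: the number of documents returned by similarity search at query time.
db = FAISS.from_documents(splits, OpenAIEmbeddings())
retriever = db.as_retriever(search_kwargs={"k": 4})
docs = retriever.get_relevant_documents("What are transformers?")
```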

2/ Latency

Latency increases with the size of the context window (and, currently, with higher-tier models). Streaming can improve the UX, but latency remains a concern for apps that demand rapid responses.

[Figure: latency vs. context window size]
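
As a sketch of the streaming mitigation, assuming LangChain's ChatOpenAI wrapper: tokens are surfaced as they arrive rather than after the full completion, which improves perceived latency even though total generation time is unchanged.

```python
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.chat_models import ChatOpenAI

# Stream tokens to stdout as they are generated (a sketch; a real app
# would stream to the client instead of stdout).
llm = ChatOpenAI(
    streaming=True,
    callbacks=[StreamingStdOutCallbackHandler()],
    temperature=0,
)
llm.predict("Summarize how context window size affects latency.")
```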

3/ Sensitivity to query

Answer sensitivity to query phrasing remains a problem. For example, see below from the app here. There are many tools to address this (it is a general problem in search), but it would be very convenient to package such tools into popular LLM libraries.
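
One common mitigation is to have the LLM rewrite the incoming question into a cleaner search query before retrieval. A hedged sketch, reusing the retriever from the split-size example (the rewrite prompt is illustrative, not the app's actual code):

```python
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(temperature=0)

question = "what abt transformers?"
# Ask the model for a cleaner, more specific phrasing of the user's question.
rewritten = llm.predict(
    f"Rewrite this question to be clear and specific for document search: {question}"
)
# Retrieve with the rewritten query; `retriever` is from the earlier sketch.
docs = retriever.get_relevant_documents(rewritten)
```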

4/ Metadata filtering / hybrid search

Based on some helpful Twitter discussion, metadata filtering is missing: if I ask "what does Karpathy think abt transformers" or "summarize episode 118", retrieval does not focus only on the relevant episode (e.g., via metadata filtering). Metadata filtering gives more granular control over the data that we run a semantic search on.
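
A minimal sketch of metadata filtering, assuming a Chroma vector store and a hypothetical "episode" metadata field attached to each split at indexing time: the filter restricts the semantic search to a single episode.

```python
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

# `splits` are document chunks whose .metadata includes an "episode" field
# (the field name is hypothetical).
db = Chroma.from_documents(splits, OpenAIEmbeddings())

# Only chunks from episode 118 are candidates for the semantic search.
docs = db.similarity_search(
    "summarize this episode",
    k=4,
    filter={"episode": 118},
)
```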

5/ Suggestions