Context
Q&A assistants over various types of content are a good application for LLMs. I’ve released a few apps and have been cataloging the common failure cases and user requests:
1/ Split Size
Split size and k (the number of documents returned via similarity search) have a strong influence on performance. I have some evaluations looking at the effect of split size here. Below I also show how the number of documents affects latency.
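A minimal sketch of how these two knobs can be varied, assuming LangChain with FAISS and an OpenAI key; the chunk sizes, k values, file name, and query are illustrative:

```python
# Vary split size and k, then compare answer quality / latency downstream.
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS

text = open("transcripts.txt").read()  # hypothetical corpus

for chunk_size in (500, 1000, 2000):  # split size
    splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=100)
    db = FAISS.from_texts(splitter.split_text(text), OpenAIEmbeddings())
    for k in (2, 4, 8):  # number of documents returned via similarity search
        docs = db.similarity_search("what does Karpathy think about transformers?", k=k)
        # ...pass docs to the QA chain, then score and time the answer...
```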
2/ Latency
Latency increases with context window size (and, currently, with higher-tier models). Streaming in apps can improve the perceived UX, but latency remains a concern for apps that demand rapid responses.
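Streaming is straightforward to wire up; here is a minimal sketch using LangChain’s stdout callback (the model name is illustrative). Time-to-first-token drops even though total generation time does not:

```python
from langchain.chat_models import ChatOpenAI
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

# Tokens print as they arrive instead of after the full completion.
llm = ChatOpenAI(
    model_name="gpt-3.5-turbo",
    streaming=True,
    callbacks=[StreamingStdOutCallbackHandler()],
)
llm.predict("Summarize what Karpathy says about transformers.")
```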
3/ Sensitivity to query
Answer sensitivity to the exact phrasing of the query remains a problem. For example, see the two responses below from the app here: the same question about Karpathy’s views on transformers yields a substantive answer in one case and a retrieval failure in the other. There are many tools to address this (it is a general problem in search), but it would be very convenient to package such tools into popular LLM libraries.
> According to Andrej Karpathy, the transformer architecture is good for text modeling and some aspects of video modeling, but it is still limited in its ability to capture feedback mechanisms, causality, and counterfactual reasoning. He believes that AGI will require models that can do experiments and come up with hypotheses based on evidence, rather than just solving simple puzzles or generating HTML buttons. While transformers may become better with more data and bigger models, they may not be able to capture the full range of capabilities needed for AGI.

> There is no information provided about what Andrej Karpathy says about transformers in the given passages.
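One packaged tool for this is LangChain’s MultiQueryRetriever, which has an LLM generate several rephrasings of the question and takes the union of the retrieved documents. A minimal sketch follows; the placeholder corpus and query are illustrative:

```python
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain.vectorstores import FAISS

texts = ["...pre-split episode transcripts..."]  # placeholder corpus
db = FAISS.from_texts(texts, OpenAIEmbeddings())

# Retrieve with several LLM-generated paraphrases, not just the raw query,
# so minor rewordings of the question converge on the same documents.
retriever = MultiQueryRetriever.from_llm(
    retriever=db.as_retriever(),
    llm=ChatOpenAI(temperature=0),
)
docs = retriever.get_relevant_documents("what does Karpathy say about transformers?")
```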
4/ Metadata filtering / hybrid search
Based on some helpful Twitter discussion, metadata filtering is missing: if I ask “what does Karpathy think abt transformers” or “summarize episode 118”, the query does not focus only on that episode (e.g., using metadata filtering). Metadata filtering gives more granular control over the data that we want to run a semantic search on; a sketch of this is below.
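A minimal sketch of what this could look like, using Chroma for illustration; the texts and metadata schema are hypothetical:

```python
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

texts = ["chunk from episode 118 ...", "chunk from episode 119 ..."]
metadatas = [{"episode": 118}, {"episode": 119}]  # attached at ingest time

db = Chroma.from_texts(texts, OpenAIEmbeddings(), metadatas=metadatas)

# "summarize episode 118" -> restrict the semantic search to that episode only
docs = db.similarity_search("summarize this episode", k=4, filter={"episode": 118})
```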
5/ Suggestions