CS294 - LLM Agents - Notes
LLM Agents: a free UC Berkeley MOOC — CS 294/194–196
Intermediate reasoning steps improve LLM performance. (Does the LLM not actually "know" reasoning as such?)
Chain-of-thought prompting is very effective. An example problem involves teaching:
Bill Gates --> ls; Elon Musk --> nk; then Barack Obama --> ?? (answer is ka)
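The mapping in the example is the "last-letter concatenation" task (take the last letter of each word in the name and join them); a minimal sketch of the ground-truth rule that the chain-of-thought prompt is meant to teach:

```python
def last_letter_concat(name: str) -> str:
    """Concatenate the last letter of each word in `name`."""
    return "".join(word[-1] for word in name.split())

print(last_letter_concat("Bill Gates"))    # -> ls
print(last_letter_concat("Elon Musk"))     # -> nk
print(last_letter_concat("Barack Obama"))  # -> ka
```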
These exact examples are presumably not in the training data, so how does it work? Does the LLM have basic reasoning ability grounded in natural language that prompting leverages?
Reflection questions are effective only when the answer is wrong. If the answer is already right, asking the model to reflect pushes it toward some other, wrong answer.
Should we generate multiple LLM answers and then have the LLM compare them and select the best one? Probably not, because each answer is itself generated by picking the highest-probability next token; it is not clear whether whole-answer evaluation is even reliable. Alternatively, multiple LLMs could generate answers and another LLM could compare them and select one — not sure if such an approach makes sense. Maybe an agent could do this.
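A simpler variant of "generate many answers and pick one" is self-consistency: sample several answers and take a majority vote over the final answers, with no judging model at all. A minimal sketch, where `sample_answer` is a stand-in for any LLM call with temperature > 0:

```python
import collections
import random

def self_consistency(sample_answer, n: int = 5) -> str:
    """Sample n answers and return the most common one (majority vote),
    instead of asking a model to judge its own outputs."""
    answers = [sample_answer() for _ in range(n)]
    return collections.Counter(answers).most_common(1)[0][0]

# Hypothetical stand-in for a sampled LLM call: usually right, sometimes not.
def fake_llm_sample() -> str:
    return random.choice(["ka", "ka", "ka", "kka"])

print(self_consistency(fake_llm_sample, n=11))
```

This sidesteps the "who evaluates the answers?" problem for tasks with a short final answer, though it does not help when every sample shares the same systematic error.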
LLMs are easily distracted by irrelevant info, but when we explicitly remind them to ignore irrelevant information, they perform better.
LLMs are good at picking up a task from a few examples. This is basically a prompting technique: we explicitly explain the task to the LLM and it does it. Depending on the number of examples given, we call it zero-shot, one-shot, or few-shot learning.
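The zero-/few-shot distinction is just how the prompt string is assembled; a hypothetical sketch (the `build_prompt` helper and its format are assumptions, not from the course):

```python
def build_prompt(task: str, examples=None, query: str = "") -> str:
    """Build a prompt: task description, optional worked examples, then the query."""
    parts = [task]
    for x, y in (examples or []):        # few-shot: include worked examples
        parts.append(f"Q: {x}\nA: {y}")
    parts.append(f"Q: {query}\nA:")      # the instance we want answered
    return "\n\n".join(parts)

zero_shot = build_prompt("Concatenate the last letters of each word.",
                         query="Barack Obama")
few_shot = build_prompt("Concatenate the last letters of each word.",
                        examples=[("Bill Gates", "ls"), ("Elon Musk", "nk")],
                        query="Barack Obama")
print(few_shot)
```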
Beyond prompting, there is also Training, Fine-Tuning, Reinforcement Learning, RAG, etc.