LangChain chains and agents: a great piece of engineering work that facilitates prompt chaining


According to the official site, LangChain is a framework for developing applications powered by language models. The project's view is that the most powerful and differentiated applications will not only call out to a language model, but will also be:

  1. Data-aware: connect a language model to other sources of data
  2. Agentic: allow a language model to interact with its environment

The LangChain framework is designed around these principles, with the large language model (LLM) as the engine; the framework must assume the LLM behaves as expected.

The two most important concepts in LangChain are chains and agents.

Chains

Using an LLM in isolation is fine for some simple applications, but many more complex ones require chaining LLMs - either with each other or with other components. LangChain provides a standard interface for Chains, as well as some common implementations of chains for ease of use.

Why do we need chains?

Chains allow us to combine multiple components together to create a single, coherent application. For example, we can create a chain that takes user input, formats it with a PromptTemplate, and then passes the formatted prompt to an LLM. We can build more complex chains by combining multiple chains together, or by combining chains with other components (a sketch of composing two chains appears at the end of this section).

Quick start: Using LLMChain

The LLMChain is a simple chain that takes in a prompt template, formats it with the user input and returns the response from an LLM.

To use the LLMChain, first create a prompt template.

from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI

llm = OpenAI(temperature=0.9)
prompt = PromptTemplate(
    input_variables=["product"],
    template="What is a good name for a company that makes {product}?",
)

We can now create a very simple chain that will take user input, format the prompt with it, and then send it to the LLM.

from langchain.chains import LLMChain
chain = LLMChain(llm=llm, prompt=prompt)

# Run the chain only specifying the input variable.
print(chain.run("colorful socks"))
Colorful Toes Co.

If there are multiple variables, you can input them all at once using a dictionary.

prompt = PromptTemplate(
    input_variables=["company", "product"],
    template="What is a good name for {company} that makes {product}?",
)
chain = LLMChain(llm=llm, prompt=prompt)
print(chain.run({
    'company': "ABC Startup",
    'product': "colorful socks"
}))
Socktopia Colourful Creations.

You can use a chat model in an LLMChain as well:

from langchain.chat_models import ChatOpenAI
from langchain.prompts.chat import (
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
)
human_message_prompt = HumanMessagePromptTemplate(
    prompt=PromptTemplate(
        template="What is a good name for a company that makes {product}?",
        input_variables=["product"],
    )
)
chat_prompt_template = ChatPromptTemplate.from_messages([human_message_prompt])
chat = ChatOpenAI(temperature=0.9)
chain = LLMChain(llm=chat, prompt=chat_prompt_template)
print(chain.run("colorful socks"))
Rainbow Socks Co.
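
Chains can also be composed with one another. Below is a minimal sketch using SimpleSequentialChain, where the output of a company-name chain feeds a second, illustrative slogan chain (the slogan prompt is my own example, not from the original quick start):

from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain, SimpleSequentialChain

llm = OpenAI(temperature=0.9)

# First chain: company name (same prompt as the quick start above)
name_prompt = PromptTemplate(
    input_variables=["product"],
    template="What is a good name for a company that makes {product}?",
)
# Second chain: an illustrative follow-up that turns the name into a slogan
slogan_prompt = PromptTemplate(
    input_variables=["company_name"],
    template="Write a one-line slogan for the company {company_name}.",
)

name_chain = LLMChain(llm=llm, prompt=name_prompt)
slogan_chain = LLMChain(llm=llm, prompt=slogan_prompt)

# Each chain's single output becomes the next chain's single input.
overall_chain = SimpleSequentialChain(chains=[name_chain, slogan_chain], verbose=True)
print(overall_chain.run("colorful socks"))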

Agents

Some applications will require not just a predetermined chain of calls to LLMs/other tools, but potentially an unknown chain that depends on the user’s input. In these types of chains, there is an “agent” which has access to a suite of tools. Depending on the user input, the agent can then decide which, if any, of these tools to call.

At the moment, there are two main types of agents in LangChain:

  1. “Action Agents”: these agents decide on an action to take and execute it, one step at a time

  2. “Plan-and-Execute Agents”: these agents first decide a plan of actions to take, and then execute those actions one at a time.

When should you use each one? Action Agents are more conventional and good for small tasks. For more complex or long-running tasks, the initial planning step helps to maintain long-term objectives and focus. However, that comes at the expense of generally more calls and higher latency. These two agents are also not mutually exclusive - in fact, it is often best to have an Action Agent be in charge of the execution for the Plan-and-Execute Agent.

Action Agents

High-level pseudocode of an Action Agent looks something like this (a rough code sketch follows the list):

  • Some user input is received

  • The agent decides which tool - if any - to use, and what the input to that tool should be

  • That tool is then called with that tool input, and an observation is recorded (this is just the output of calling that tool with that tool input)

  • That history of tool, tool input, and observation is passed back into the agent, and it decides what step to take next

  • This is repeated until the agent decides it no longer needs to use a tool, and then it responds directly to the user.
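
The same loop, expressed as a rough Python sketch. Note that agent.decide, is_finish, and the tools mapping here are hypothetical placeholders for readability, not actual LangChain APIs:

def run_action_agent(agent, tools, user_input):
    # Illustrative pseudocode of the loop above; not the real LangChain internals.
    steps = []  # history of (action, observation) pairs
    while True:
        decision = agent.decide(user_input, steps)  # an "action" or a "finish" decision
        if decision.is_finish:
            return decision.output                  # respond directly to the user
        observation = tools[decision.tool](decision.tool_input)
        steps.append((decision, observation))       # fed back into the next decision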

The different abstractions involved in agents are as follows:

  • Agent: this is where the logic of the application lives. Agents expose an interface that takes in user input along with a list of previous steps the agent has taken, and returns either an AgentAction or an AgentFinish:
    • AgentAction corresponds to the tool to use and the input to that tool

    • AgentFinish means the agent is done, and carries information about what to return to the user

  • Tools: these are the actions an agent can take. What tools you give an agent depends heavily on what you want the agent to do

  • Toolkits: these are groups of tools designed for a specific use case. For example, in order for an agent to interact with a SQL database in the best way it may need access to one tool to execute queries and another tool to inspect tables.

  • Agent Executor: this wraps an agent and a list of tools. It is responsible for the loop of running the agent iteratively until the stopping criterion is met.

The most important abstraction of the four above to understand is that of the agent. Although an agent can be defined in whatever way one chooses, the typical way to construct an agent is with:

  • PromptTemplate: this is responsible for taking the user input and previous steps and constructing a prompt to send to the language model

  • Language Model: this takes the prompt constructed by the PromptTemplate and returns some output

  • Output Parser: this takes the output of the Language Model and parses it into an AgentAction or AgentFinish object.

Plan-and-Execute Agents

High-level pseudocode of a Plan-and-Execute Agent looks something like this:

  • Some user input is received

  • The planner lists out the steps to take

  • The executor goes through the list of steps, executing them

The most typical implementation is to have the planner be a language model, and the executor be an action agent.
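
As a hedged sketch of that setup, assuming the experimental plan-and-execute module that shipped with LangChain around the time of writing (the import path may differ by version), and where tools is a list of Tool objects like the one defined later in this post:

from langchain.chat_models import ChatOpenAI
from langchain.experimental.plan_and_execute import PlanAndExecute, load_agent_executor, load_chat_planner

model = ChatOpenAI(temperature=0)
planner = load_chat_planner(model)                          # the planner is a language model
executor = load_agent_executor(model, tools, verbose=True)  # the executor wraps an action agent over `tools`
agent = PlanAndExecute(planner=planner, executor=executor, verbose=True)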

Reveal the mystery behind agents

It might sound like agents are very smart, but the power actually comes from the large language model itself.
With some clever prompt design, we hand the whole workflow to the LLM through prompting and let the LLM tell us what to do next.

Here we take a custom LLM agent as an example.
An LLM chat agent consists of four parts:

  • PromptTemplate: This is the prompt template that can be used to instruct the language model on what to do

  • ChatModel: This is the language model that powers the agent

  • stop sequence: Instructs the LLM to stop generating as soon as this string is found

  • OutputParser: This determines how to parse the LLMOutput into an AgentAction or AgentFinish object

The LLMAgent is used in an AgentExecutor. This AgentExecutor can largely be thought of as a loop that:

  1. Passes user input and any previous steps to the Agent (in this case, the LLMAgent)

  2. If the Agent returns an AgentFinish, then return that directly to the user

  3. If the Agent returns an AgentAction, then use that to call a tool and get an Observation

  4. Repeat, passing the AgentAction and Observation back to the Agent until an AgentFinish is emitted.

AgentAction is a response that specifies which tool to use (tool) and the input to that tool (tool_input). A log can also be provided as extra context (useful for logging, tracing, etc.).

AgentFinish is a response that contains the final message to be sent back to the user. This should be used to end an agent run.
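
Concretely, both are simple schema objects. Here is a small sketch of what an output parser might hand back; the values just mirror the search example later in this post:

from langchain.schema import AgentAction, AgentFinish

action = AgentAction(
    tool="Search",                         # which tool to use
    tool_input="Leo DiCaprio girlfriend",  # the input to that tool
    log="Thought: I should look this up\nAction: Search\n...",  # raw LLM text, kept for tracing
)
finish = AgentFinish(
    return_values={"output": "Leo DiCaprio's current girlfriend is Camila Morrone."},
    log="Final Answer: Leo DiCaprio's current girlfriend is Camila Morrone.",
)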

Set up environment

Do necessary imports, etc.

!pip install langchain
!pip install google-search-results
!pip install openai
from langchain.agents import Tool, AgentExecutor, LLMSingleActionAgent, AgentOutputParser
from langchain.prompts import BaseChatPromptTemplate
from langchain import SerpAPIWrapper, LLMChain
from langchain.chat_models import ChatOpenAI
from typing import List, Union
from langchain.schema import AgentAction, AgentFinish, HumanMessage
import re
from getpass import getpass

Set up tool

Set up any tools the agent may want to use. These need to be described in the prompt so that the agent knows it can use them.

SERPAPI_API_KEY = getpass()
# Define which tools the agent can use to answer user queries
search = SerpAPIWrapper(serpapi_api_key=SERPAPI_API_KEY)
tools = [
    Tool(
        name="Search",
        func=search.run,
        description="useful for when you need to answer questions about current events"
    )
]

Prompt Template

This instructs the agent on what to do. Generally, the template should incorporate:

  • tools: which tools the agent has access to, and how and when to call them.

  • intermediate_steps: These are tuples of previous (AgentAction, Observation) pairs. These are generally not passed directly to the model, but the prompt template formats them in a specific way.

  • input: generic user input

    # Set up the base template
    template = """Complete the objective as best you can. You have access to the following tools:

    {tools}

    Use the following format:

    Question: the input question you must answer
    Thought: you should always think about what to do
    Action: the action to take, should be one of [{tool_names}]
    Action Input: the input to the action
    Observation: the result of the action
    ... (this Thought/Action/Action Input/Observation can repeat N times)
    Thought: I now know the final answer
    Final Answer: the final answer to the original input question

    These were previous tasks you completed:



    Begin!

    Question: {input}
    {agent_scratchpad}"""

    # Set up a prompt template
    class CustomPromptTemplate(BaseChatPromptTemplate):
        # The template to use
        template: str
        # The list of tools available
        tools: List[Tool]

        def format_messages(self, **kwargs) -> List[HumanMessage]:
            # Get the intermediate steps (AgentAction, Observation tuples)
            # and format them in a particular way
            intermediate_steps = kwargs.pop("intermediate_steps")
            thoughts = ""
            for action, observation in intermediate_steps:
                thoughts += action.log
                thoughts += f"\nObservation: {observation}\nThought: "
            # Set the agent_scratchpad variable to that value
            kwargs["agent_scratchpad"] = thoughts
            # Create a tools variable from the list of tools provided
            kwargs["tools"] = "\n".join([f"{tool.name}: {tool.description}" for tool in self.tools])
            # Create a list of tool names for the tools provided
            kwargs["tool_names"] = ", ".join([tool.name for tool in self.tools])
            formatted = self.template.format(**kwargs)
            return [HumanMessage(content=formatted)]

    prompt = CustomPromptTemplate(
        template=template,
        tools=tools,
        # This omits the `agent_scratchpad`, `tools`, and `tool_names` variables because those are generated dynamically
        # This includes the `intermediate_steps` variable because that is needed
        input_variables=["input", "intermediate_steps"]
    )

Output Parser

The output parser is responsible for parsing the LLM output into AgentAction and AgentFinish. This usually depends heavily on the prompt used.

This is where you can change the parsing to do retries, handle whitespace, etc.


class CustomOutputParser(AgentOutputParser):

    def parse(self, llm_output: str) -> Union[AgentAction, AgentFinish]:
        # Check if agent should finish
        if "Final Answer:" in llm_output:
            return AgentFinish(
                # Return values is generally always a dictionary with a single `output` key
                # It is not recommended to try anything else at the moment :)
                return_values={"output": llm_output.split("Final Answer:")[-1].strip()},
                log=llm_output,
            )
        # Parse out the action and action input
        regex = r"Action\s*\d*\s*:(.*?)\nAction\s*\d*\s*Input\s*\d*\s*:[\s]*(.*)"
        match = re.search(regex, llm_output, re.DOTALL)
        if not match:
            raise ValueError(f"Could not parse LLM output: `{llm_output}`")
        action = match.group(1).strip()
        action_input = match.group(2)
        # Return the action and action input
        return AgentAction(tool=action, tool_input=action_input.strip(" ").strip('"'), log=llm_output)

output_parser = CustomOutputParser()

Set up LLM

Choose the LLM you want to use!

OPENAI_API_KEY = getpass()
llm = ChatOpenAI(openai_api_key=OPENAI_API_KEY, temperature=0)

Define the stop sequence

This is important because it tells the LLM when to stop generation.

This depends heavily on the prompt and model you are using. Generally, you want this to be whatever token you use in the prompt to denote the start of an Observation (otherwise, the LLM may hallucinate an observation for you).
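
For the ReAct-style template above, that means stopping whenever the model starts to write an Observation line; the same list is passed to the agent in the next step:

# Halt generation before the model fabricates its own Observation.
stop_sequence = ["\nObservation:"]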

Set up the Agent

We can now combine everything to set up our agent

# LLM chain consisting of the LLM and a prompt
llm_chain = LLMChain(llm=llm, prompt=prompt)
tool_names = [tool.name for tool in tools]
agent = LLMSingleActionAgent(
    llm_chain=llm_chain,
    output_parser=output_parser,
    stop=["\nObservation:"],
    allowed_tools=tool_names
)

Use the Agent

Now we can use it!

agent_executor = AgentExecutor.from_agent_and_tools(agent=agent, tools=tools, verbose=True)
agent_executor.run("Search for Leo DiCaprio's girlfriend on the internet.")
> Entering new AgentExecutor chain...
Thought: I should use a reliable search engine to get accurate information.
Action: Search
Action Input: "Leo DiCaprio girlfriend"

Observation: He went on to date Gisele Bündchen, Bar Refaeli, Blake Lively, Toni Garrn and Nina Agdal, among others, before finally settling down with current girlfriend Camila Morrone, who is 23 years his junior.
I have found the answer to the question.
Final Answer: Leo DiCaprio's current girlfriend is Camila Morrone.

> Finished chain.
"Leo DiCaprio's current girlfriend is Camila Morrone."
