Using ChatGPT to Generate SQL Queries


When working with databases, querying for specific data is essential. Instead of manually writing SQL queries, what if you could simply ask a natural language question and get the desired data? With ChatGPT’s function calling capability, we can achieve just that!

Previoulsy, we have covered function calling in:
how to set an function calling example
how to choose from multiple functions

Step by Step Guide

Let’s see how we can set up a system that uses ChatGPT to generate SQL queries for us, using the Chinook sample database as an example.

1. Connecting to the Database

First, we need to establish a connection to the SQLite database.

import sqlite3

conn = sqlite3.connect("data/Chinook.db")
print("Opened database successfully")

2. Extracting Database Schema

To make well-informed queries, ChatGPT needs to understand the structure of our database. We can achieve this by creating utility functions to extract table names, column names, and overall database information.

def get_table_names(conn):
table_names = []
tables = conn.execute("SELECT name FROM sqlite_master WHERE type='table';")
for table in tables.fetchall():
table_names.append(table[0])
return table_names

def get_column_names(conn, table_name):
column_names = []
columns = conn.execute(f"PRAGMA table_info('{table_name}');").fetchall()
for col in columns:
column_names.append(col[1])
return column_names

def get_database_info(conn):
table_dicts = []
for table_name in get_table_names(conn):
columns_names = get_column_names(conn, table_name)
table_dicts.append({"table_name": table_name, "column_names": columns_names})
return table_dicts

With these functions, you can now generate a schema representation for the database.

3. Function Specifications for ChatGPT

With the database schema in hand, we can define a function specification for ChatGPT. This will provide context for the model about the structure of our database and inform it how to generate the SQL queries.

database_schema_dict = get_database_info(conn)
database_schema_string = "\n".join(
[
f"Table: {table['table_name']}\nColumns: {', '.join(table['column_names'])}"
for table in database_schema_dict
]
)

functions = [
{
"name": "ask_database",
"description": "Use this function to answer user questions about music. Input should be a fully formed SQL query.",
"parameters": {
"type": "object",
"properties": {
"query": {
"type": "string",
"description": f"SQL query extracting info using this database schema: {database_schema_string}. The query should be returned in plain text, not in JSON."
}
},
"required": ["query"],
},
}
]

4. Execute SQL Queries

Next, we need a function to execute the generated SQL queries against our database.

def ask_database(conn, query):
try:
results = str(conn.execute(query).fetchall())
except Exception as e:
results = f"query failed with error: {e}"
return results

5. ChatGPT Interaction and SQL Query Execution

Now, using the ChatGPT API, we can interact with the model. When we get a function call response from the model, we can execute the SQL query and return the results.

messages = []
messages.append({"role": "system", "content": "Answer user questions by generating SQL queries against the Chinook Music Database."})
messages.append({"role": "user", "content": "Hi, who are the top 5 artists by number of tracks?"})

chat_response = chat_completion_request(messages, functions)
assistant_message = chat_response.json()["choices"][0]["message"]
messages.append(assistant_message)

if assistant_message.get("function_call"):
results = execute_function_call(assistant_message)
messages.append({"role": "function", "name": assistant_message["function_call"]["name"], "content": results})

Conclusion

By combining ChatGPT with database interactions, we’ve demonstrated how we can generate SQL queries using natural language questions. This approach allows for more intuitive data retrieval, especially for users who might not be familiar with SQL syntax. However, always ensure that you validate and sanitize the generated queries, especially if used in a production environment, to maintain data integrity and security.

github code link


Author: robot learner
Reprint policy: All articles in this blog are used except for special statements CC BY 4.0 reprint policy. If reproduced, please indicate source robot learner !
  TOC