A Little Bit of Context
Chatbots are becoming increasingly popular and their numbers are growing. These tools can simplify tasks for us humans in many areas. As a developer, I use these tools as development accelerators that allow me to quickly lay the foundations of the code, which I am then responsible for testing and consolidating.
Recently, these tools have allowed me to interact more efficiently with a SQL database and it is this experience that I wish to share. Although I am an experienced developer, I am far from mastering the SQL language, on which I have trained occasionally online and which I practice only from time to time when a specific need arises in the context of a development.
I believe that my case is not isolated and that many developers have only basic knowledge of SQL, a language most often used indirectly, via layers of abstraction. It seems to me that it is precisely for this type of profile, of developers who are initiated without being truly expert, where the use of a chatbot can prove useful. This is what we are going to try to see together through some concrete cases.
It is important to specify what SQL and a SQL database are. SQL stands for Structured Query Language. It is a programming language used to manage and manipulate relational databases. And a SQL database, which is relational, corresponds to a set of data stored in a particular format, in which this data is related to each other. This can add some complexity to queries, especially in the case of numerous joins, i.e., operations that combine the records of two tables (or more) based on a specified condition.
The Tools Used for the Investigation
Conversational Agents
In the context of this research work, I used three chatbots or conversational agents among those available on the web for free, i.e., free of charge:
ChatGPT 3.5 from OpenAI
The most well-known and pioneering product produced by the US firm OpenAI. Version 3.5 is the version made available for free, with the more powerful paid version being version 4 to date.
Mistral AI's Chat
The tool pushed by French entrepreneurs who have made their classes in large firms based in the United States. Mistral AI made headlines in early 2023, first through an open-source language model accessible to developers. A conversational agent similar to ChatGPT was put online in February 2024. The performance of its language model, close to GPT-4, was notably highlighted.
Google's Gemini
Google's Gemini was made available in early 2024 after being presented with a lot of communication. It is a conversational agent, just like ChatGPT, which has the advantage of being integrated into the Google ecosystem and whose language model is set to play an increasing role in the search engine.
The Database Engine
I based myself on a MySQL 5.7 database, older than its replacement MySQL 8 but still widely used. Feel free to share your feedback on usage with other database engines.
As for the dataset used for the tests, I simply asked a Chatbot to suggest some TABLE creation queries and some queries to insert a few dozen entries, which I then edited manually to my liking. And I must say that it worked rather well, in two or three prompts.
Practical Application
Before we begin, a little digression in the form of an existential question: do we really need to interact directly in SQL with a database?
Indeed, as a developer, it is becoming increasingly rare that we have to write SQL queries directly in our code. The frameworks, CMS, and libraries made available in most modern programming languages add an intermediate language layer that facilitates the writing of queries and makes them agnostic, i.e., allows them to be grafted onto different types of database engines (SQLite, MySQL/MariaDB, PostgreSQL to name the most common).
In PHP, for example, we will most often use at least the PDO library, which facilitates and secures the writing of queries. For Java, it will be JDBC, and for Python, the integrated DB-API associated with a module specific to the type of database.
However, there may be times when we need instant access to the data in a SQL database, to quickly obtain information on the contents of our application, especially since this information is very specific and is not directly available on the site.
Let's take a simple and fairly typical example in the context of this article: an event site. A relational database can be used to store a list of events, a list of locations, and a list of thematic categories that we want to display to the user. Each event can retain a reference to the information of the location where it takes place as well as to one or more categories.
This structure of information allows displaying within a website, depending on the user's choices, a list of events for a given location or for a specific category.
Let's now see concretely what the different chatbots available for free on the web have to offer us to write a SQL query.
Request No. 1: A Simple Request Resolved Efficiently
Let's start with a simple query. We would like to know how many events in the reggae category have taken place in Paris, France.
Here is the type of quick prompt to write that will result in a satisfying query.
Act as a SQL expert.
Write a query that allows me to retrieve the number of events in the reggae category taking place in Paris, France.
The events table contains the events and has id, city, and country columns. The category table contains the categories and has an id and name column.
Let's take the time to break down the instructions.
The first step, classic in the use of a chatbot, is to give a role to our virtual interlocutor, to ensure the relevance of their response.
Then, two essential elements to describe to obtain a query that corresponds to the need:
- the expected result of the query
- the data source, i.e., the tables and columns we want to target and how they are related.
Of course, the more complex the need, the more precise the description will have to be.
ChatGPT's Response (Open AI)
Here is a SQL query to retrieve the number of events in the "reggae" category taking place in Paris, France:
SELECT COUNT(*) AS event_count
FROM events
JOIN category ON events.category_id = category.id
WHERE category.name = 'reggae'
AND events.city = 'Paris'
AND events.country = 'France';This query uses a join between the "events" and "category" tables on the "category_id" column of the "events" table and the "id" column of the "category" table to retrieve the number of events in the "reggae" category. Then, it applies conditions to filter events taking place in Paris, France. Finally, it counts the number of corresponding events.
Analysis: ChatGPT's response is concise and effective. The query is correct. The OpenAI chatbot assumes that the link between an event and a category is located in the EVENTS table and not in an independent table. It must be said that I had not specified the nature of this link and that the proposed solution appears to be the simplest. But one could have hoped for a little more commentary on this assumption.
Mistral AI Chat's Response
I assume there is also a linking table between the "events" and "category" tables, which I will call "event_category" having the columns "event_id" and "category_id". Here is the SQL query:
SELECT COUNT(e.id) as event_count
FROM events e
JOIN event_category ec ON e.id = ec.event_id
JOIN category c ON ec.category_id = c.id
WHERE c.name = 'reggae' AND e.city = 'Paris' AND e.country = 'France';This query retrieves the number of events in the "reggae" category taking place in Paris, France.
Analysis: Also a correct response from the Mistral AI Chat. We notice slight nuances in the syntax, not necessarily significant for our use case.
But we mainly note an additional join, i.e., the call to an additional table, storing the relationship between an event and a category. Isolating the relationship between two entities of a relational database is a fairly classic architectural choice.
The Mistral AI Chat provides a slightly more succinct and less technical description.
Unlike ChatGPT, it assumes that it has deduced information that I had not provided.
Google Gemini's Response
SQL query to count reggae events in Paris
SELECT COUNT(*) AS event_count
FROM events e
INNER JOIN category c ON e.category_id = c.id
WHERE c.name = 'reggae'
AND e.city = 'Paris'
AND e.country = 'France';Use this code with caution.
Explanation of the query:
- SELECT COUNT(*) AS number_events: This line selects the number of records from the events table and assigns it to the alias number_events.
- FROM events e: This line indicates that the query selects data from the events table and gives it the alias e.
- INNER JOIN category c ON e.category_id = c.id: This line joins the events table to the category table on the category_id column. This allows filtering events by category.
- WHERE c.name = 'reggae': This line filters events whose category is reggae.
- AND e.city = 'Paris': This line filters events taking place in Paris.
- AND e.country = 'France': This line filters events taking place in France.
Result:
The result of the query will be a single record with a single column number_events containing the number of reggae events in Paris, France.
Analysis: Also a functional query from Google Gemini. A syntax again slightly nuanced compared to the other two, with marginal differences.
And the same assumption as ChatGPT for the event/category link: the addition of a category_id column in the EVENTS table.
Furthermore, Gemini is generous in its description and explains line by line what the query does.
Request No. 2: Watch out for duplicates
Here is another example of a query. This time specifying the nature of the link between an event and a category.
Act as an SQL expert.
Write a query that allows me to retrieve events taking place in France whose category is neither jazz nor classical and whose title contains the word festival.
The events table contains the events and has the columns id, title, city, country.
The category table contains the categories and has an id column and a name column.
The event_category table contains the links between an event and one or more categories and has the columns event_id and category_id
ChatGPT (Open AI) Response
You can use an SQL query with a join between the events, event_category, and category tables, while using WHERE clauses to filter the events according to your criteria. Here's how you can do it:
SELECT e.id, e.title, e.city, e.country
FROM events e
INNER JOIN event_category ec ON e.id = ec.event_id
INNER JOIN category c ON ec.category_id = c.id
WHERE e.country = 'France'
AND c.name NOT IN ('jazz', 'classical')
AND e.title LIKE '%festival%';This query selects events from the events table in France (e.country = 'France'), whose category is neither jazz nor classical (c.name NOT IN ('jazz', 'classical')), and whose title contains the word "festival" (e.title LIKE '%festival%').
Analysis: Simple and effective explanation of the query. However, the results returned include duplicates, which is not what was expected given the targeted database, which only contains unique entries. By pushing ChatGPT a little, we can get to our goal.
How to avoid duplicates?
To avoid duplicates in the query results, you can use DISTINCT in the SELECT clause. Here's how to modify the query to include DISTINCT:
SELECT DISTINCT e.id, e.title, e.city, e.country
FROM events e
JOIN event_category ec ON e.id = ec.event_id
JOIN category c ON ec.category_id = c.id
WHERE e.country = 'France'
AND c.name NOT IN ('jazz', 'classical')
AND e.title LIKE '%festival%';By adding DISTINCT, the query will return only one occurrence of each unique combination of e.id, e.title, e.city, and e.country, thus avoiding duplicates in the results.
Analysis: The query is good, although I wouldn't have minded being offered a solution with GROUP BY as well, which would also lead to the desired result.
Mistral AI Chat Response
Here is an SQL query that should answer your request:
SELECT e.*
FROM events e
JOIN event_category ec ON e.id = ec.event_id
JOIN category c ON ec.category_id = c.id
WHERE e.country = 'France'
AND c.name NOT IN ('jazz', 'classical')
AND e.title LIKE '%festival%';This query will select events in France whose category is neither jazz nor classical and whose title contains the word 'festival'. It joins the 'events', 'event_category' and 'category' tables on the appropriate keys to obtain the necessary information.
Analysis: Query very close to that of ChatGPT. Same concise style as for the 1st query made with this chatbot and same observation as for ChatGPT concerning the quality of the query, which returns duplicates. Following my prompt, Mistral AI Chat also corrected the shot with a very similar answer to ChatGPT.
Google Gemini Response
SQL query to retrieve events in France
SELECT e.id, e.title, e.city, e.country
FROM events e
INNER JOIN event_category ec ON e.id = ec.event_id
INNER JOIN category c ON ec.category_id = c.id
WHERE e.country = 'France'
AND c.name NOT IN ('jazz', 'classical')
AND e.title LIKE '%festival%';Use this code with caution.
Explanation of the query:
- SELECT e.id, e.title, e.city, e.country: We select the id, title, city and country columns from the events table.
- INNER JOIN event_category ec ON e.id = ec.event_id: We join the events table to the event_category table on the id column.
- INNER JOIN category c ON ec.category_id = c.id: We join the event_category table to the category table on the category_id column.
- WHERE e.country = 'France': We filter the events that take place in France.
- AND c.name NOT IN ('jazz', 'classical'): We exclude events whose category is "jazz" or "classical".
- AND e.title LIKE '%festival%': We filter the events whose title contains the word "festival".
Result:
The query returns a list of events that take place in France, whose category is neither jazz nor classical and whose title contains the word "festival". The list contains the following information for each event:
- ID
- Title
- City
- Country
Analysis: Similar query to those of its competitors, with duplicates as a result. Regarding style, Google Gemini's seems decidedly more verbose and guides us step by step through the specificities of the query. Its response to remove duplicates, below, presents the different solutions, including GROUP BY as an alternative, and is therefore more complete than that of GPT or Mistral.
To avoid duplicates in your query result, you can use the DISTINCT keyword after the SELECT statement.
Query with DISTINCT:
SELECT DISTINCT e.id, e.title, e.city, e.country
FROM events e
INNER JOIN event_category ec ON e.id = ec.event_id
INNER JOIN category c ON c.id = ec.category_id
WHERE e.country = 'France'
AND c.name NOT IN ('jazz', 'classical')
AND e.title LIKE '%festival%'Use this code with caution.
Explanation:
- The DISTINCT keyword ensures that each row of the result is unique.
- In this query, it is used to ensure that each event appears only once in the result.
Other options to avoid duplicates:
- You can use the GROUP BY clause to group events by one or more columns and display only one row for each group.
- You can use the ORDER BY clause to sort events by one or more columns and display unique events first.
Choosing the best option:
The best option to avoid duplicates depends on your specific needs.
- If you only want to display unique events, the DISTINCT clause is the simplest solution.
- If you want to display aggregated information about events, the GROUP BY clause is a better option.
- If you want to display unique events in a specific order, the ORDER BY clause can be used in conjunction with DISTINCT or GROUP BY.
Request No. 3: Pushing chatbots to their limits
Act as an SQL expert. Write a query that allows me to retrieve events taking place in Lyon or Saint-Etienne whose category is pop or jazz, whose title contains the word concer without t and replace these entries with the word concert with a t. The events table contains the events and has the columns id, title, city, country. The category table contains the categories and has an id column and a name column. The event_category table contains the links between an event and one or more categories and has the columns event_id and category_id
ChatGPT (Open AI) Response
Here is the SQL query corresponding to your criteria:
SELECT e.id, REPLACE(e.title, 'concer', 'concert') AS title, e.city, e.country
FROM events e
JOIN event_category ec ON e.id = ec.event_id
JOIN category c ON ec.category_id = c.id
WHERE (e.city = 'Lyon' OR e.city = 'Saint-Etienne')
AND (c.name = 'pop' OR c.name = 'jazz')
AND e.title LIKE '%concer%'This query retrieves events taking place in Lyon or Saint-Etienne, of category "pop" or "jazz", whose title contains the word "concer". The REPLACE function is used to replace "concer" with "concert".
Analysis: The provided query does not meet the need, and here we see the limits of the chatbot. The chatbot made a strictly logical analysis of my prompt without deducing that I wanted to correct a spelling mistake.
The proposed solution will replace "concer" in the word "concert" and will not simply correct the word "concer" which is missing a t, but will also transform concert into "concertt", which is not what I wanted.
Furthermore, it is a SELECT query, which means that the entries in the database will not be modified, since an UPDATE type query would be needed for that.
Mistral AI Chat Response
Here is an SQL query that should answer your request. It updates the title of the events in the "events" table according to the specified criteria. Note that SQL is not case-sensitive, unless you use specific collations.
UPDATE events
SET title = REPLACE(title, 'concer', 'concert')
WHERE city IN ('Lyon', 'Saint-Etienne')
AND id IN (
SELECT event_id
FROM event_category
JOIN category ON event_category.category_id = category.id
WHERE category.name IN ('pop', 'jazz')
)
AND title LIKE '%concer%';This query works in two parts. The first part is the UPDATE clause, which replaces 'concer' with 'concert' in the event title. The second part is the WHERE clause, which filters the events based on the city and category. The subquery is used to retrieve the IDs of the events that belong to the categories 'pop' or 'jazz'.
Analysis: The proposed solution is closer to my expectations since this time we get an UPDATE query right away, which will actually modify the entries in the database. However, Mistral AI Chat still makes an error of interpretation by remaining strictly logical.
Google Gemini Response
SQL query to retrieve events in Lyon or Saint-Etienne with pop or jazz categories and the word "concer" without "t"
SELECT e.id,
REPLACE(e.title, 'concer', 'concert') AS title,
e.city,
e.country
FROM events e
INNER JOIN event_category ec ON e.id = ec.event_id
INNER JOIN categories c ON ec.category_id = c.id
WHERE (e.city = 'Lyon' OR e.city = 'Saint-Etienne')
AND (c.name = 'pop' OR c.name = 'jazz')
AND e.title LIKE '%concer%';Use this code with caution.
Explanation of the query:
- SELECT: We select the id, title, city, and country columns from the events table.
- REPLACE: We replace the word "concer" with "concert" in the title column.
- INNER JOIN: We join the events table to the event_category table on the id column.
- INNER JOIN: We join the event_category table to the categories table on the category_id column.
- WHERE: We filter the events whose city is 'Lyon' or 'Saint-Etienne' and whose category is 'pop' or 'jazz'.
- LIKE: We filter the events whose title contains the word "concer".
Result:
The query returns a table containing the events that match the defined criteria. The title column will have the word "concert" with a "t".
Analysis: Same communication problem as with ChatGPT; I receive a SELECT query and not an UPDATE query. Additionally, the first query proposed had syntax errors (category table named categories) and was not usable as is. I had to make a new prompt to get the result above. I would have liked the chatbot to at least propose both solutions, in case of doubt, so that I could explore the one that suits me.
Second round of discussions for request No. 3
Faced with the failure of the first attempt with the three conversational agents, I thought that I might have been a little too vague and logical in my statement. After some trial and error, I arrived at the following prompt, which delimits my need more:
Write one or more queries that allow me to retrieve events taking place in Lyon or Saint-Etienne whose category is pop or jazz, whose title contains the word concer without t, and then replace and update these entries with the word concert with a t. The events table contains the events and has the columns id, title, city, country. The category table contains the categories and has an id column and a name column. The event_category table contains the links between an event and one or more categories and has the columns event_id and category_id.
In the searched character strings, concer must be followed and preceded by a space. Except if it is at the beginning of the string: in this case, it is not preceded by any character. And except if it is at the end of the string: in this case, it is not followed by any character. The query must be compatible with different versions of MySQL (5 and 8) and PostgreSQL.
ChatGPT (Open AI) Response
Query extracted from ChatGPT's response:
-- Mise à jour des titles des événements
UPDATE events
SET title = REPLACE(title, ' concer', ' concert')
WHERE (city = 'Lyon' OR city = 'Saint-Etienne')
AND (title LIKE '% concer%' OR title LIKE 'concer%');Analysis: ChatGPT proposed two queries: a SELECT and an UPDATE. The UPDATE query, which interests me, correctly replaces the occurrences of "concer" with "concert", but only when the word is preceded and followed by a space, not when it is at the beginning or end of the string. I will not get a more convincing solution from ChatGPT after communicating the error.
However, this query can be used several times by adjusting the strings to search for and replace, so it remains a resource to rely on if I do not wish to strain myself to find the perfect query that does everything in one go.
Mistral AI Chat Response
Query extracted from Mistral AI Chat's response:
UPDATE events e
JOIN event_category ec ON e.id = ec.event_id
JOIN category c ON ec.category_id = c.id
SET e.title = REPLACE(e.title, 'concer', 'concert')
WHERE (e.city = 'Lyon' OR e.city = 'Saint-Etienne')
AND c.name IN ('pop', 'jazz')
AND (e.title LIKE 'concer%' OR e.title LIKE '% concer%' OR e.title LIKE '% concer %');Two queries were also proposed. The UPDATE query correctly replaces the occurrences of "concer" with "concert", but it also replaces "concer" in "concert", so we end up with "concertt" with two t's, and we remain stuck at the same stage as with the first prompt we proposed.
However, when I pointed out the error, the chatbot caught up and gave me the correct query:
I understand the problem now. To avoid replacing 'concer' in 'concert', you can use the CASE function to check if 'concer' is a whole word or not before replacing it. Here is the updated query:
UPDATE events e
JOIN event_category ec ON e.id = ec.event_id
JOIN category c ON ec.category_id = c.id
SET e.title = CASE
WHEN e.title LIKE 'concer%' AND e.title NOT LIKE 'concert%' THEN REPLACE(e.title, 'concer', 'concert')
WHEN e.title LIKE '% concer%' AND e.title NOT LIKE '% concert%' THEN REPLACE(e.title, ' concer', ' concert')
WHEN e.title LIKE '% concer %' AND e.title NOT LIKE '% concert %' THEN REPLACE(e.title, ' concer ', ' concert ')
ELSE e.title
END
WHERE (e.city = 'Lyon' OR e.city = 'Saint-Etienne')
AND c.name IN ('pop', 'jazz')
AND (e.title LIKE 'concer%' OR e.title LIKE '% concer%' OR e.title LIKE '% concer %');This query should now correctly replace 'concer' with 'concert' without affecting the occurrences of 'concert'.
Bingo! Although it required some patience.
Google Gemini Response
Excerpt from Gemini's response:
-- MySQL
UPDATE events e
SET e.title = REPLACE(e.title, ' concer ', ' concert ')
WHERE e.id IN (
SELECT e.id
FROM events e
JOIN event_category ec ON e.id = ec.event_id
JOIN categories c ON ec.category_id = c.id
WHERE e.city IN ('Lyon', 'Saint-Etienne')
AND c.name IN ('pop', 'jazz')
AND e.title LIKE '% concer %'
);
-- PostgreSQL
UPDATE events e
SET e.title = REPLACE(e.title, ' concer ', ' concert ')
WHERE e.id IN (
SELECT e.id
FROM events e
JOIN event_category ec ON e.id = ec.event_id
JOIN categories c ON ec.category_id = c.id
WHERE e.city IN ('Lyon', 'Saint-Etienne')
AND c.name IN ('pop', 'jazz')
AND e.title LIKE '% concer %'
);Analysis: The UPDATE query proposed for MySQL is not good, and I get the following error: You can't specify target table 'e' for update in FROM clause. No better solution was proposed by Gemini after reporting the error. I am stuck. Of course, with imagination and framing, one should be able to achieve one's goals, but at the cost of a lot of linguistic contortions and time spent refining and iterating.
Conclusion
Conversational agents can help us
The three conversational agents used allowed me to write a rather simple query very quickly, which I would have taken longer to write due to not remembering the basic syntax in SQL, which I do not practice daily.
Conversational agents are far from perfect
For more complex queries, the results are mixed. Without a minimum of precaution and knowledge in SQL, it seems difficult to compose an SQL query with a certain complexity. To put all the chances on your side, it is ideal to:
- have at least some knowledge of SQL
- write a logical prompt leaving a minimum of room for interpretation
- break down problems as much as possible to obtain a targeted solution.
In the context of the last query, the more variants you add to the string to replace, the greater the risk that the chatbot will get confused. Decomposing a complex query into several small simple queries can sometimes be a solution when there are no constraints in terms of the number of queries.
The quality of the response varies from one agent to another
Mistral AI Chat seems to have best understood the most advanced prompt in the context of a more complex query. But ChatGPT also provided an exploitable response. On the other hand, Gemini failed, but was very educational in the case of other somewhat simpler queries.
It can be concluded that each of the conversational agents has strengths to offer. However, if we observe nuances with the few prompts presented in this study, it is still difficult to make a definitive diagnosis with such a small sample, especially since these tools are constantly evolving.
In any case, it can be noted that nothing replaces an expert eye, which is all the more important as the need becomes more complex and includes a critical component (particularly in the case of a request that will edit or delete entries in a database). This is one of the more general conclusions that can be drawn from this study (and which have already been made elsewhere). Just as knowing that there are several alternatives in the world of chatbots, which complement each other, allow iteration, and comparison of several solutions.