Drizzle QueryCache plugin [Design notes]
I’ll start writing a series of articles concerning my GSoC project “Drizzle Query Cache plugin”. At the current stage me and Siddharth and our respective mentors Toru Maesaka, Padraig O’Sullivan and the help of Jay Pipes, will all try to come up with the design ideas for the plug-in. As agreed I’ll take care of the caching part and Siddharth will manage the invalidation/replication, our common objective is to reduce the number of invalidation.
The basic capabilities of the Query Cache Plugin are:
- Cache Select queries result sets into Memcached. (Not insert, update ,delete, embedded select, non-cacheable select)
- Return the result set from the cache if an exact similar query (to one cached) is executed.
- Invalidate the cache entry if a table referenced by the query is altered. (with some enhancements)
What is still is to be discussed (but may be at a latter stage):
- Is it possible to handle complex queries: multiple joins, nested queries etc. For sure we can cache them, but can we apply our smart invalidation on them?
I have to mention that MySQL has its own implementation of query cache, and it’s not a bad idea to be “inspired” by some of its concepts, and leave or at least be aware of the unsuccessful ones. It’s also important to mention that MySQL handles the cache in memory only, using Memcached will remove the segmentation problem but also mean communicate over the network if one decides to scale out.
The overall idea of our Query Cache would require the following:
- A query cache mode: MySQL has a deactivated, permanent or an on-demand mode (with SELECT SQL_CACHE directive). I would rather go for just an on demand one, but might be cumbersome for a developer.
- Metadata: And here is the most critical part, in fact to be able to act smart during the invalidation process the plugin have to have some knowledge about the content of the cached queries. Here is some points stating my ideas concerning the importance of each item:
Tables and their aliases: This information is mandatory if a change is made to a table then all cached queries referring to it should be checked for invalidation. A first step would be to work on that.
Selection fields: the selection field can be an important factor later on. imagine the following scenario
cached query: “select name from employee”
update query “update employee set salary=salary+100″.
Here modifying the salary has no impact on the name and should therefore not invalidate our cache!
Condition fields: Our approach is to identify overlapping range selections, and decide if it is necessary to invalidate a cached query based on that
Cache Query: “Select * from employee e where e.salary>10000″
Update Query: “Update employee e set e.salary=e.salary*rate where e.salary”
Obviously the range of both queries is not overlapping but one must be careful, if rate=4 then the cached query will be deprecated and thus must be invalidated! The range query method can be used with Delete statements but for simple queries. Finally I hope to come up with a sufficient technique to make use of the condition fields adequately!
Now that the basic functions and ideas has been exposed, we need the think about how to store and process these information. My initial proposition was to add a qcache table with following fields:
+Query+ list of tables in the query+ list of selection field+ list of condition fields and their ranges+
With this method the query cache adds the necessary information to that table whenever some query is added to memcached, whereas the Checker will simply look at that table and know exactly what is in the cache and decide how to behave (invalide or not). It turns out that using one table is not realistic, because the list of tables can grow bigger and will involve long parsing, for example to decide if the table employee is used by some cache entry we need to sweep through all the qcache table and parse the table list.
I think that we need a more elegant (complex) design with either a relational schema for the query cache system or add a “cached” column to TABLES, COLUMNS ..ect in the information schema.
Your comments/ideas are welcome!