<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>Djellel Eddine Difallah</title>
	<atom:link href="http://dedcode.wordpress.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://dedcode.wordpress.com</link>
	<description>Lessons learned.</description>
	<lastBuildDate>Tue, 07 May 2013 12:58:58 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='dedcode.wordpress.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://0.gravatar.com/blavatar/4ffc90949e7f399e5bab42f6c3c76a6a?s=96&#038;d=http%3A%2F%2Fs2.wp.com%2Fi%2Fbuttonw-com.png</url>
		<title>Djellel Eddine Difallah</title>
		<link>http://dedcode.wordpress.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://dedcode.wordpress.com/osd.xml" title="Djellel Eddine Difallah" />
	<atom:link rel='hub' href='http://dedcode.wordpress.com/?pushpress=hub'/>
		<item>
		<title>Release of OLTP-Bench</title>
		<link>http://dedcode.wordpress.com/2012/03/06/initial-release-of-oltp-bench/</link>
		<comments>http://dedcode.wordpress.com/2012/03/06/initial-release-of-oltp-bench/#comments</comments>
		<pubDate>Tue, 06 Mar 2012 17:08:43 +0000</pubDate>
		<dc:creator>Djellel</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Benchmark]]></category>
		<category><![CDATA[Database]]></category>
		<category><![CDATA[JDBC]]></category>
		<category><![CDATA[OLTP]]></category>

		<guid isPermaLink="false">http://dedcode.wordpress.com/?p=82</guid>
		<description><![CDATA[After several months of development, I am happy to announce the official release of OLTP-Bench, an extensible “batteries included” DBMS benchmarking testbed! This project is meant to be an aggregator of popular and research-oriented OLTP benchmarks. It provides a portable framework for workload generation and an API for integrating benchmark queries. Besides, it uses the JDBC API, which [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>After several months of development, I am happy to announce the official release of <a title="OLTP-Bench" href="http://www.oltpbenchmark.com" target="_blank">OLTP-Bench</a>, an extensible “batteries included” DBMS benchmarking testbed! This project is meant to be an aggregator of popular and research-oriented OLTP benchmarks. It provides a portable framework for workload generation and an API for integrating benchmark queries. Besides, it uses the JDBC API, which allows you to connect to any DBMS with a proper driver.</p>
<p>OLTP-Bench has a modular architecture for hooking in new benchmarks. Hopefully I&#8217;ll write a detailed how-to guide, but for now you can already look at the implemented benchmarks to get an idea of how to write your queries and use the workload generator. We ported several popular and interesting benchmarks of varying complexity and application domain, including: TPCC-like, TATP, SEATS, AuctionMark, YCSB, Wikipedia, Twitter, JPAB, Epinions and Resource Stresser. More information on each benchmark is available <a href="http://oltpbenchmark.com/wiki/index.php?title=Workloads" target="_blank">here</a>.</p>
<p style="text-align:center;"><a href="http://dedcode.files.wordpress.com/2012/03/architecture.png"><img class="size-medium wp-image-87 aligncenter" title="architecture" src="http://dedcode.files.wordpress.com/2012/03/architecture.png?w=300&#038;h=122" alt="" width="300" height="122" /></a></p>
<p style="text-align:left;">The workload generator is driven by an XML configuration file. Users define phases of execution, each composed of a target rate (expressed in transactions per second), the duration for which the rate applies, and the weight of each procedure (or query) of the benchmark. By combining phases, one can simulate very complex situations to stress and test the database system. Doing so, we have conducted hundreds of experiments on different systems and configurations; more details are available here.</p>
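<p>As a rough illustration of the phase idea above, here is a minimal Java sketch. It is not OLTP-Bench&#8217;s actual code: the class <code>PhaseSketch</code> and its fields are hypothetical names. It assumes a phase is just a target rate, a duration and one weight per procedure, and it picks the next procedure to run with probability proportional to its weight.</p>

```java
import java.util.Random;

// Hypothetical sketch of one workload phase; not an OLTP-Bench class.
class PhaseSketch {
    final int ratePerSecond;    // target rate, in transactions per second
    final int durationSeconds;  // how long the rate is applied
    final double[] weights;     // one weight per procedure (e.g. percentages)

    PhaseSketch(int ratePerSecond, int durationSeconds, double[] weights) {
        this.ratePerSecond = ratePerSecond;
        this.durationSeconds = durationSeconds;
        this.weights = weights.clone();
    }

    // Number of transactions the phase aims to issue in total.
    long targetTransactions() {
        return (long) ratePerSecond * durationSeconds;
    }

    // Pick a procedure index with probability proportional to its weight.
    int pickProcedure(Random rng) {
        double total = 0;
        for (double w : weights) {
            total += w;
        }
        double r = rng.nextDouble() * total;
        double cumulative = 0;
        for (int i = 0; i < weights.length; i++) {
            cumulative += weights[i];
            if (r < cumulative) {
                return i;
            }
        }
        return weights.length - 1; // guard against floating-point rounding
    }

    public static void main(String[] args) {
        PhaseSketch phase = new PhaseSketch(100, 60, new double[] {45, 43, 4, 4, 4});
        Random rng = new Random();
        System.out.println("target transactions: " + phase.targetTransactions());
        System.out.println("next procedure index: " + phase.pickProcedure(rng));
    }
}
```

<p>For instance, a phase with a rate of 100 transactions per second applied for 60 seconds aims at 6000 transactions in total. Several such phases run back to back would give the kind of combined scenario described above.</p>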
<p style="text-align:left;">Hopefully this will get the database community excited, as our goal was not to write &#8220;yet another benchmark&#8221; but rather to encourage everyone to share their configurations and results.</p>
]]></content:encoded>
			<wfw:commentRss>http://dedcode.wordpress.com/2012/03/06/initial-release-of-oltp-bench/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/b2d0211633636be6fa588876b8f63f02?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">dedcode</media:title>
		</media:content>

		<media:content url="http://dedcode.files.wordpress.com/2012/03/architecture.png?w=300" medium="image">
			<media:title type="html">architecture</media:title>
		</media:content>
	</item>
		<item>
		<title>Memory leaks in Java</title>
		<link>http://dedcode.wordpress.com/2011/12/19/memory-leaks-in-java/</link>
		<comments>http://dedcode.wordpress.com/2011/12/19/memory-leaks-in-java/#comments</comments>
		<pubDate>Mon, 19 Dec 2011 10:02:37 +0000</pubDate>
		<dc:creator>Djellel</dc:creator>
				<category><![CDATA[Debug]]></category>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[OutOfMemoryError]]></category>

		<guid isPermaLink="false">http://dedcode.wordpress.com/?p=70</guid>
		<description><![CDATA[Programming in Java relieves you of much of the burden of managing memory allocation and de-allocation yourself. However, that doesn&#8217;t mean you can completely overlook this aspect, at the risk of creating memory ogres instead of programs. The JVM will try to allocate memory up to the maximum heap size you specify; this pool of memory is reclaimed by the Garbage [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>Programming in Java relieves you of much of the burden of managing memory allocation and de-allocation yourself. However, that doesn&#8217;t mean you can completely overlook this aspect, at the risk of creating memory ogres instead of programs.<br />
The JVM will try to allocate memory up to the maximum heap size you specify; this pool of memory is reclaimed by the Garbage Collector once your instances are no longer referenced. Obviously, if you keep creating objects that stay referenced, you&#8217;ll fill up the heap and the JVM will throw an error:<br />
<code>java.lang.OutOfMemoryError: Java heap space</code><br />
Such errors are quite easy to spot with a debugger. The real fun starts when you&#8217;ve got a memory leak in your program (or a bug in a third-party library you use) that only appears under heavy load and long runs. Finding the cause can be tedious because it involves first trying to reproduce the problem, then analyzing memory maps and sorting out what is normal and what is not. Fortunately, there are plenty of tools out there to help.<br />
Monitor your application with JConsole: this monitoring utility can be very useful for spotting patterns of memory growth in your application. First, pass a couple of options to your JVM to activate JMX remote monitoring. For example, here I open port 7777 and disable all security (though you might want to enable it) :)<br />
<code>-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=7777 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false</code><br />
Then, from a remote machine, launch the JConsole client using <code>jconsole host:port</code><br />
Produce heap dumps by passing the following parameter to the JVM, which writes a dump when an OutOfMemoryError is thrown:<br />
<code>-XX:+HeapDumpOnOutOfMemoryError</code><br />
Use jmap to create a dump on the fly, i.e. while the application is running.<br />
Look at the content of the heap: in order to examine the heap dump files you produce, you need to read them with a specific tool. jhat is a small utility that reads the heap dump and starts a web server that you can point your browser at to browse its content. There you&#8217;ll find the current objects, their references, sizes, hierarchy and so on. A histogram summary might be the first thing you want to look at, since it gives the list of classes ordered by number of instances and memory usage.<br />
So far, all of these tools ship with recent versions of the JDK. If you want more detailed insight or an automatic leak suspect report, there are some free tools such as the Eclipse Memory Analyzer tool and the NetBeans profiler.<br />
The tools listed here were usually enough for me to find memory leaks. Although it is a bit tedious, my strategy is to create heap dump files on the fly at different moments and then examine the differences between them.</p>
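<p>To make the &#8220;pattern of memory growth&#8221; concrete, here is a small Java sketch of a typical leak, under the assumption that the culprit is a static collection that is filled but never cleared. The class <code>LeakSketch</code> and its method names are made up for illustration; the heap reading uses the standard <code>MemoryMXBean</code>, the same kind of figure JConsole plots over JMX.</p>

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical example of a leak: objects stay reachable through a static list.
class LeakSketch {
    // Never cleared, so the garbage collector can never reclaim its content.
    static final List<byte[]> RETAINED = new ArrayList<>();

    // Simulates a unit of work that forgets to release what it allocated.
    static void handleRequest() {
        RETAINED.add(new byte[1024]);
    }

    // Current heap usage in bytes, as reported by the JVM.
    static long usedHeapBytes() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        return memory.getHeapMemoryUsage().getUsed();
    }

    public static void main(String[] args) {
        for (int round = 1; round <= 5; round++) {
            for (int i = 0; i < 10_000; i++) {
                handleRequest();
            }
            System.out.println("round " + round
                    + ": retained objects=" + RETAINED.size()
                    + ", used heap bytes=" + usedHeapBytes());
        }
    }
}
```

<p>In a heap histogram taken at two different moments of such a run, the instance count of <code>byte[]</code> would keep growing between dumps, which is the kind of difference worth looking for.</p>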
]]></content:encoded>
			<wfw:commentRss>http://dedcode.wordpress.com/2011/12/19/memory-leaks-in-java/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/b2d0211633636be6fa588876b8f63f02?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">dedcode</media:title>
		</media:content>
	</item>
		<item>
		<title>This week we&#8217;ve released two of our current&#8230;</title>
		<link>http://dedcode.wordpress.com/2011/08/10/this-week-weve-released-two-of-our-current/</link>
		<comments>http://dedcode.wordpress.com/2011/08/10/this-week-weve-released-two-of-our-current/#comments</comments>
		<pubDate>Wed, 10 Aug 2011 08:23:25 +0000</pubDate>
		<dc:creator>Djellel</dc:creator>
				<category><![CDATA[status]]></category>

		<guid isPermaLink="false">http://dedcode.wordpress.com/2011/08/10/this-week-weve-released-two-of-our-current/</guid>
		<description><![CDATA[This week we&#8217;ve released two of our current projects as open source, yay! The first is TSQL Parser, an extension of JSQLParser that supports temporal SQL extensions and hence lets you query temporal databases. The parser also supports query rewriting, i.e. translating TSQL into standard SQL with added valid transaction timestamps [tstart-tend]. The other related project is [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>This week we&#8217;ve released two of our current projects as open source, yay! The first is <a href="http://code.google.com/p/tsqlparser/">TSQL Parser</a>, an extension of JSQLParser that supports temporal SQL extensions and hence lets you query temporal databases. The parser also supports query rewriting, i.e. translating TSQL into standard SQL with added valid transaction timestamps [tstart-tend]. The other related project is <a href="http://code.google.com/p/temporal-jdbc-proxy/">Temporal JDBC Proxy</a>, which basically wraps your driver&#8217;s connection and seamlessly makes it support TSQL. The currently supported DBMSs are MySQL and Postgres!</p>
]]></content:encoded>
			<wfw:commentRss>http://dedcode.wordpress.com/2011/08/10/this-week-weve-released-two-of-our-current/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/b2d0211633636be6fa588876b8f63f02?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">dedcode</media:title>
		</media:content>
	</item>
		<item>
		<title>KMeans on MAHOUT</title>
		<link>http://dedcode.wordpress.com/2010/11/20/k-means-clustering-with-hadoop-and-mahout/</link>
		<comments>http://dedcode.wordpress.com/2010/11/20/k-means-clustering-with-hadoop-and-mahout/#comments</comments>
		<pubDate>Sat, 20 Nov 2010 19:57:36 +0000</pubDate>
		<dc:creator>Djellel</dc:creator>
				<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[kmeans]]></category>
		<category><![CDATA[Mahout]]></category>

		<guid isPermaLink="false">http://dedcode.wordpress.com/?p=48</guid>
		<description><![CDATA[Lately I&#8217;ve been working mainly on parallel computing, especially since I have access to the powerful LONI supercomputers, with hundreds of nodes and lots of memory. Specifically, I wanted to compare MPI, OpenMP and Hadoop, both in terms of efficiency and ease of use. K-means being my favorite easy-to-implement clustering algorithm, I first [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>Lately I&#8217;ve been working mainly on parallel computing, especially since I have access to the powerful <a href="http://www.loni.org">LONI</a> supercomputers, with hundreds of nodes and lots of memory. Specifically, I wanted to compare MPI, OpenMP and Hadoop, both in terms of efficiency and ease of use.</p>
<p>K-means being my favorite easy-to-implement clustering algorithm, I first tried to modify my C++ code and use the Hadoop Pipes library to implement a MapReduce program. I confess it was my first time writing Hadoop code, and when you read about it the word &#8220;easy&#8221; pops up a lot. If your program differs from word count, don&#8217;t be fooled. Rethinking an algorithm in M/R is essentially the first thing to do on paper; however, those little chunks of data and their movement quickly become a nightmare to implement, especially since the only documentation is the word count example. What about having multiple types of input files? A specific format? An algorithm that iterates, taking the output of the previous iteration as input?<br />
In the end I dropped the idea of using C++ Pipes and jumped to <a href="http://mahout.apache.org/">Mahout</a>, a machine learning library that implements a number of clustering algorithms, including K-means. I cracked open Mahout&#8217;s code and started to discover what is really needed to implement M/R code on Hadoop. While I am not going to cover that here, a word of advice: if you are a beginner, stick to Java and get yourself a book.<br />
Using Mahout saves you the time of writing the desired algorithm in M/R, but it still requires that you write some code to transform your input file into something it can understand, namely a SequenceFile; this is the input preparation phase. I cannot elaborate more on that for now because in my case I found a workaround! Basically, if you use the command line and have valid SequenceFiles as input, both for your data points and for your own initial clusters (unless you specify k), you can run:<br />
<code>$  bin/mahout kmeans -i input -c clusters  -o output -k num_clusters -dm measure -x maxIter -cd convergence_delta </code></p>
<p>Mahout provides some <a href="https://cwiki.apache.org/confluence/display/MAHOUT/Quickstart">examples</a> that run the corresponding implementations on predefined raw datasets. For K-means, there is the <a href="https://cwiki.apache.org/confluence/display/MAHOUT/Clustering+of+synthetic+control+data">syntheticcontrol</a> data example, where the input is space delimited. Depending on what you want to run, you can use the provided examples, but make sure the data format they expect corresponds to yours!<br />
In the K-means example, your input should look like this (space-delimited vectors):<br />
35.5351 	41.7067 	39.1705 	48.3964 	.. 	38.6103<br />
24.2104 	41.7679 	45.2228 	43.7762 	.. 	48.8175<br />
You still need to push this data to HDFS, under an input directory for example, before running the job:<br />
<code>$ $MAHOUT_HOME/bin/mahout org.apache.mahout.clustering.syntheticcontrol.kmeans.Job -i input -o output -t1 .. -t2 .. -cd .. -x 10</code><br />
Let&#8217;s examine the options: -cd is the convergence delta and -x is the maximum number of iterations. Like me, you&#8217;d legitimately ask &#8220;but where is k-means&#8217; k?&#8221; Well, it seems that this example runs the Canopy algorithm, using the t1 and t2 distances, to generate the initial clusters &#8230; I don&#8217;t want that! What I need is to provide k (an integer) as input and randomly select k points from my data as the initial clusters.<br />
Remember that in command-line mode you have the k parameter, so there should be random generator code somewhere! After reading the corresponding piece of code, I finally understood where it is and how to use it :) . I figured I could push my code to Mahout so that others can use it directly. Basically, I created my own Job class based on the syntheticcontrol one and called it generic, then made the following modification to the source code:<br />
$MAHOUT_HOME/examples/src/main/java/org/apache/mahout/clustering/generic/kmeans/Job.java<br />
<code><br />
    Path directoryContainingConvertedInput = new Path(output, DIRECTORY_CONTAINING_CONVERTED_INPUT);<br />
    log.info("Preparing Input");<br />
    InputDriver.runJob(input, directoryContainingConvertedInput, "org.apache.mahout.math.RandomAccessSparseVector");<br />
    // log.info("Running Canopy to get initial clusters");  // Old code using CanopyDriver<br />
    // CanopyDriver.run(conf, directoryContainingConvertedInput, output, measure, t1, t2, false, false); // Old Code calling the Canopy Driver<br />
    log.info("Running random seed to get initial clusters");<br />
    Path clusters= new Path(output, Cluster.INITIAL_CLUSTERS_DIR);<br />
    clusters = RandomSeedGenerator.buildRandom(directoryContainingConvertedInput, clusters, k, measure);<br />
    log.info("Running KMeans");<br />
    KMeansDriver.run(conf,<br />
                     directoryContainingConvertedInput,<br />
                     clusters,<br />
                     output,<br />
                     measure,<br />
                     convergenceDelta,<br />
                     maxIterations,<br />
                     true,<br />
                     false);<br />
</code><br />
If you are interested, you can check the code of InputDriver, which converts the space-separated file into a SequenceFile, and of RandomSeedGenerator, which takes the converted input, extracts k random vectors and puts them into the clusters directory.<br />
Finally, if you want to run K-means using Mahout and you have a space-separated input file, you can use my example code (while waiting for the merge, here is the <a href="https://issues.apache.org/jira/browse/MAHOUT-551">patch</a>):<br />
<code>$MAHOUT_HOME/bin/mahout org.apache.mahout.clustering.generic.kmeans.Job -i input -o output -k .. -cd .. -x 10</code><br />
Bottom line: to write useful Hadoop M/R code you really need to read some first, in order to grasp the idea of SequenceFiles, which interfaces your classes should implement and so on. A good place to start is the example packages and Mahout.</p>
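<p>For readers who want the gist of the seeding step without opening Mahout, here is a plain Java sketch with no Hadoop or Mahout dependency. It is not Mahout&#8217;s implementation: <code>RandomSeedSketch</code>, <code>parseVector</code> and <code>pickSeeds</code> are hypothetical names, and the choice of reservoir sampling is my own assumption about one reasonable way to pick k random vectors in a single pass over space-delimited lines.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical, Mahout-free sketch of "pick k random input vectors as seeds".
class RandomSeedSketch {

    // Parse one whitespace-delimited line into a dense vector.
    static double[] parseVector(String line) {
        String[] tokens = line.trim().split("\\s+");
        double[] vector = new double[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            vector[i] = Double.parseDouble(tokens[i]);
        }
        return vector;
    }

    // Reservoir sampling: one pass, each vector has the same chance to be kept.
    static List<double[]> pickSeeds(Iterable<String> lines, int k, Random rng) {
        List<double[]> seeds = new ArrayList<>();
        int seen = 0;
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            double[] vector = parseVector(line);
            seen++;
            if (seeds.size() < k) {
                seeds.add(vector);            // fill the reservoir first
            } else {
                int slot = rng.nextInt(seen); // uniform in [0, seen)
                if (slot < k) {
                    seeds.set(slot, vector);  // replace an earlier pick
                }
            }
        }
        return seeds;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        lines.add("35.5351 41.7067 39.1705 48.3964");
        lines.add("24.2104 41.7679 45.2228 43.7762");
        lines.add("30.0000 40.0000 42.0000 44.0000");
        List<double[]> seeds = pickSeeds(lines, 2, new Random());
        System.out.println("picked " + seeds.size() + " initial centers");
    }
}
```

<p>If the input holds fewer than k vectors, the sketch simply returns all of them; a real job would probably want to reject that case.</p>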
]]></content:encoded>
			<wfw:commentRss>http://dedcode.wordpress.com/2010/11/20/k-means-clustering-with-hadoop-and-mahout/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/b2d0211633636be6fa588876b8f63f02?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">dedcode</media:title>
		</media:content>
	</item>
		<item>
		<title>The last weekend I have received my vers&#8230;</title>
		<link>http://dedcode.wordpress.com/2010/09/01/the-last-weekend-i-have-received-my-vers/</link>
		<comments>http://dedcode.wordpress.com/2010/09/01/the-last-weekend-i-have-received-my-vers/#comments</comments>
		<pubDate>Wed, 01 Sep 2010 11:26:22 +0000</pubDate>
		<dc:creator>Djellel</dc:creator>
				<category><![CDATA[status]]></category>
		<category><![CDATA[C++ book]]></category>

		<guid isPermaLink="false">http://dedcode.wordpress.com/2010/09/01/the-last-weekend-i-have-received-my-vers/</guid>
		<description><![CDATA[Last weekend I received my copy of &#8220;Effective C++&#8221;, 3rd edition, by Scott Meyers. The book was recommended to me by Jay and Toru. It offers a number of best practices in C++ programming (55 items). So far it&#8217;s very informative, and the author really addresses each point from all perspectives and explains [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>Last weekend I received my copy of &#8220;Effective C++&#8221;, 3rd edition, by Scott Meyers. The book was recommended to me by <a href="http://www.jaypipes.com">Jay</a> and <a href="http://torum.net">Toru</a>. It offers a number of best practices in C++ programming (55 items). So far it&#8217;s very informative, and the author really addresses each point from all perspectives and explains why things should be the way he says.<br />
One regret, however: I did not find THE section I was looking for, on the dark side of C++, &#8220;pointers&#8221;. The author talks a lot about smart pointers as the best way to avoid memory leaks. To be honest, I&#8217;ve never used them, but my real concern was to learn about allocating memory from the heap, when to do it and what the best practices are, not about pointers as a &#8220;variable&#8221;, so to speak.<br />
In general I am satisfied with the book, and in fact I need to stop reading and start refactoring some of my code :-)</p>
]]></content:encoded>
			<wfw:commentRss>http://dedcode.wordpress.com/2010/09/01/the-last-weekend-i-have-received-my-vers/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/b2d0211633636be6fa588876b8f63f02?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">dedcode</media:title>
		</media:content>
	</item>
		<item>
		<title>I just finalized a first version of the &#8230;</title>
		<link>http://dedcode.wordpress.com/2010/08/16/memcached-query-cache-plugin-in-action/</link>
		<comments>http://dedcode.wordpress.com/2010/08/16/memcached-query-cache-plugin-in-action/#comments</comments>
		<pubDate>Mon, 16 Aug 2010 08:38:31 +0000</pubDate>
		<dc:creator>Djellel</dc:creator>
				<category><![CDATA[Drizzle]]></category>
		<category><![CDATA[GSoC 2010]]></category>

		<guid isPermaLink="false">http://dedcode.wordpress.com/?p=27</guid>
		<description><![CDATA[I just finalized a first version of the query cache plugin for Drizzle. The good news is that this version is functional (YES, you can try it out), but I guess it&#8217;s just a starting point for finding bugs and adding new stuff, in short &#8230; making it really functional. In a [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>I just finalized a first version of the query cache plugin for Drizzle. The good news is that this version is functional (YES, you can try it out), but I guess it&#8217;s just a starting point for finding bugs and adding new stuff, in short &#8230; making it really functional.</p>
<p>In a previous article I presented the query cache plugin interface and its hooks in the Drizzle kernel. A few modifications were made to that part; I will talk about them and all the technical details in a future article. For now I am just going to give a &#8220;how-to&#8221; to get you started using the plugin.</p>
<p>First we need to get the branch where the latest code is pushed, from my Launchpad directory, with the command:</p>
<pre><span style="color:rgb(153,51,102);">$ bzr branch lp:~dedzone/drizzle/query-cache-gsoc</span></pre>
<p>The plugin has two dependencies, Memcached and libmemcached, so make sure you have both installed before going further, otherwise the compilation will fail. Then compile the branch and enable the query cache plugin (it is not built by default):</p>
<pre><span style="color:rgb(153,51,102);">$ ./config/autorun.sh
$ ./configure --with-memcached-query-cache-plugin
$ make -j2 &amp;&amp; make install</span></pre>
<p>If you wish, you can run the plugin&#8217;s test suite (if the tests cause problems, try having a memcached instance running on localhost:11211):</p>
<pre><span style="color:rgb(153,51,102);">$ ./tests/dtr --suite=memcached_query_cache</span></pre>
<p>After that you will have an operational Drizzle server. Since we are going to use the memcached query cache plugin, we also need a running instance of memcached (it can be local or remote). Note that the --add-plugin option causes our plugin to be loaded (again, it is not loaded by default), and don&#8217;t forget to create a directory for your data, e.g. /home/user/db:</p>
<pre><span style="color:rgb(153,51,102);">$ memcached -l 127.0.0.1 -p 11211 -d
$ drizzled --datadir=/home/user/db/ --add-plugin=memcached_query_cache</span></pre>
<p>Now that we are all set, let&#8217;s try out the memcached query cache plugin by starting a new client (the drizzle C++ client or boots) and connecting to Drizzle. At this point the plugin is loaded and pointing to the default memcached server, localhost:11211. If you have a different configuration, you can change it dynamically with:</p>
<pre><span style="color:rgb(153,51,102);">drizzle&gt; set global query_cache_servers="localhost:11311 remotehost:11211"</span></pre>
<p>You can also set the expiration time of a memcached entry (the default is 0, meaning never). It is a global parameter that applies to everybody, but it can also be changed dynamically:</p>
<pre><span style="color:rgb(153,51,102);">drizzle&gt; set global query_cache_expiry= 1000;</span></pre>
<p>So, where is the fun part? I launched a discussion on the Drizzle mailing list to decide on the syntax for choosing whether or not to cache a query. My first inclination was to extend the SQL syntax with SQL_CACHE and SQL_NO_CACHE hints, but (thanks to all the contributors) I switched to a server command that switches the cache on or off on demand, per session:</p>
<pre><span style="color:rgb(153,51,102);">drizzle&gt;set query_cache_enable= on;</span></pre>
<p>Now your next SELECT statement will be cached, and if you execute the exact same command (including spacing), the results will come from memcached \o/. Well, that is not totally true: to be cached, a SELECT statement has to be cacheable. Time functions, random, user, database and the like are not deterministic and thus must not be cached. User-defined functions are not supported either (we just don&#8217;t know what you&#8217;ve done in there), nor are queries referring to the data_dictionary schema, since caching them would leave the server stuck at a point in time &#8230; not good :)</p>
<p>But how would you know whether your query is really cached? There is in fact a local cache structure that stores the currently cached queries, and you can freely query it:</p>
<pre><span style="color:rgb(153,51,102);">drizzle&gt; select * from emp;
+----------+------+
| name     | age  |
+----------+------+
| ded      |   28 |
| ded2     |   27 |
| vjsamuel |   19 |
+----------+------+
3 rows in set (0 sec)

drizzle&gt; select * from emp where name like "ded";
+------+------+
| name | age  |
+------+------+
| ded  |   28 |
+------+------+
1 row in set (0 sec)

drizzle&gt; select * from emp where age &lt; 25;
+----------+------+
| name     | age  |
+----------+------+
| vjsamuel |   19 |
+----------+------+
1 row in set (0 sec)

drizzle&gt; select sysdate() from emp where age &lt; 25;
+---------------------+
| sysdate()           |
+---------------------+
| 2010-08-16 03:19:52 |
+---------------------+
1 row in set (0 sec)

drizzle&gt; select * from data_dictionary.query_cache_entries;
+----------------------------------+--------+-----------------------------------------+
| key                              | schema | sql                                     |
+----------------------------------+--------+-----------------------------------------+
| 376555ba10dae3d09ca6df52a0839be6 | cool   | select * from emp where name like "ded" |
| b1a551e4aa3f717798380e41dcee0960 | cool   | select * from emp                       |
| e2e2402c2b29ac0c2fad1769a31d2d0a | cool   | select * from emp where age &lt; 25        |
+----------------------------------+--------+-----------------------------------------+
3 rows in set (0.01 sec)</span></pre>
<p>Here we issued five SELECT statements, and three of them are in the cache; the exceptions are the ones calling sysdate() or querying the data_dictionary. The hash key that you see is an MD5 of your query; it is the sole identifier of your cached query, both in memcached and locally. You can play with and look at the local cache content (it is a lovely protobuf message) with the following function:</p>
<pre><span style="color:rgb(153,51,102);">drizzle&gt; select print_query_cache_meta("e2e2402c2b29ac0c2fad1769a31d2d0a");

key: "e2e2402c2b29ac0c2fad1769a31d2d0a"
schema: "cool"
sql: "select * from emp where age &lt; 25"
select_header {
table_meta {
schema_name: "cool"
table_name: "emp"
}
field_meta {
field_name: "name"
field_alias: "name"
table_name: "emp"
table_alias: "emp"
schema_name: "cool"
}
field_meta {
field_name: "age"
field_alias: "age"
table_name: "emp"
table_alias: "emp"
schema_name: "cool"
}
}</span></pre>
<p>That&#8217;s nice, but what happens if the cached data is tampered with (DML, DDL)? Don&#8217;t worry, invalidation takes care of that. The plugin keeps track of all the tables referenced in a query and builds a table similar to query_cache_entries that holds the reverse information: each table, together with all the cache entries referencing it. This avoids the burden of iterating over the whole cache.</p>
<pre><span style="color:rgb(153,51,102);">drizzle&gt; select * from data_dictionary.query_cached_tables;
+---------+--------------------------------------------------------------------------------------------------------+
| Table   | Cache_Keys                                                                                             |
+---------+--------------------------------------------------------------------------------------------------------+
| coolemp | ::b1a551e4aa3f717798380e41dcee0960::376555ba10dae3d09ca6df52a0839be6::e2e2402c2b29ac0c2fad1769a31d2d0a |
+---------+--------------------------------------------------------------------------------------------------------+
1 row in set (0 sec)</span></pre>
<p>Next, if the content of the table emp in the schema cool is changed (or the table is dropped), the replication system triggers the invalidation of all the cache entries that reference it:</p>
<pre><span style="color:rgb(153,51,102);">drizzle&gt; delete from emp where name like "ded1";
Query OK, 1 row affected (0 sec)
drizzle&gt; select * from data_dictionary.query_cached_tables;
Empty set (0 sec)
drizzle&gt; select * from data_dictionary.query_cache_entries;
Empty set (0 sec)</span></pre>
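<p>The reverse lookup described above can be sketched in a few lines of C++. This is only an illustration under stated assumptions: LocalQueryCache and its methods are hypothetical names, not Drizzle code, and the real plugin also has to delete the entries from memcached.</p>

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for the plugin's local bookkeeping; not Drizzle code.
class LocalQueryCache {
public:
  // Register a cached query under its key and remember every table it reads.
  void add(const std::string& key, const std::string& sql,
           const std::vector<std::string>& tables) {
    entries_[key] = sql;
    for (const std::string& table : tables) {
      tables_[table].insert(key);
    }
  }

  // Drop every entry that references the modified table.
  // Returns the number of cache entries removed.
  std::size_t invalidate(const std::string& table) {
    auto it = tables_.find(table);
    if (it == tables_.end()) return 0;
    const std::set<std::string> keys = it->second;  // copy: tables_ is mutated below
    for (const std::string& key : keys) {
      entries_.erase(key);
      // A query may read several tables, so remove the key everywhere.
      for (auto t = tables_.begin(); t != tables_.end();) {
        t->second.erase(key);
        if (t->second.empty()) {
          t = tables_.erase(t);
        } else {
          ++t;
        }
      }
    }
    return keys.size();
  }

  std::size_t entryCount() const { return entries_.size(); }
  std::size_t tableCount() const { return tables_.size(); }

private:
  std::map<std::string, std::string> entries_;           // key -> sql
  std::map<std::string, std::set<std::string>> tables_;  // table -> keys
};
```

<p>With this layout a write to one table costs one lookup plus the number of affected keys, instead of a scan over every cached query.</p>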
<p>The final capability is the query_cache_flush() function. It wipes out the whole content of the local cache and of the registered memcached instances (make sure your instances are not used for another purpose):</p>
<pre><span style="color:rgb(153,51,102);">drizzle&gt; select query_cache_flush();
+---------------------+
| query_cache_flush() |
+---------------------+
|                   1 |
+---------------------+
1 row in set (0 sec)
drizzle&gt; select * from data_dictionary.query_cached_tables;
Empty set (0 sec)
drizzle&gt; select * from data_dictionary.query_cache_entries;
Empty set (0 sec)</span></pre>
<p><strong>Conclusion:</strong></p>
<p>That was a short manual to get you started with the query cache plugin. I hope to extend it to support fine-grained invalidation in the near future, but for now there is still a lot of work to be done to get rid of the outstanding problems.</p>
]]></content:encoded>
			<wfw:commentRss>http://dedcode.wordpress.com/2010/08/16/memcached-query-cache-plugin-in-action/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/b2d0211633636be6fa588876b8f63f02?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">dedcode</media:title>
		</media:content>
	</item>
		<item>
		<title>Drizzle QueryCache plugin [Plugin Interface]</title>
		<link>http://dedcode.wordpress.com/2010/07/11/20/</link>
		<comments>http://dedcode.wordpress.com/2010/07/11/20/#comments</comments>
		<pubDate>Sun, 11 Jul 2010 06:54:41 +0000</pubDate>
		<dc:creator>Djellel</dc:creator>
				<category><![CDATA[GSoC 2010]]></category>

		<guid isPermaLink="false">http://dedcode.wordpress.com/?p=20</guid>
		<description><![CDATA[It has been a while since I last wrote about my GSoC project &#8220;Query Cache&#8221;, though a lot of progress has been made. So far I can say that I have a much clearer idea of the query processing flow and the different data structures used. I have been able to write a first plugin that can [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>It has been a while since I last wrote about my GSoC project &#8220;Query Cache&#8221;, though a lot of progress has been made. So far I can say that I have a much clearer idea of the query processing flow and the different data structures used. I have been able to write a first plugin that can actually cache a query into memcached and keep some metadata in a local map object. Another cool option is to use data dictionary views to check the cache content in real time (thanks to the very easy interface). The next step is to write the retrieval functionality. This is probably the most difficult part for me, because I still need to understand how the client interface works.<br />
Before going any further, here is the write-up I posted on Drizzle&#8217;s wiki. It explains the query cache plugin interface and its hooks. These changes have yet to be approved and merged into the Drizzle trunk.</p>
<h2>Principle of the Query Cache</h2>
<p>The Query Cache returns the results of a query from a cache repository if the query has already been cached; otherwise, it adds the results to the cache repository. Therefore, the simplest capability that a Query Cache plugin interface needs to provide is the ability to say whether the received query is cached or not.<br />
Then:<br />
If the query is cached:<br />
- Retrieve the resultset from the cache and skip the query parsing/processing<br />
If the query is not cached:<br />
- Initiate the Query Cache plugin, where the specific plugin will:<br />
- Check if the query is cacheable<br />
- Reserve a temp area and return a pointer to the session<br />
- Build the hash key<br />
- etc<br />
- Add the results to the current temp area.<br />
- Push the temp variable and the respective cache key to the cache.</p>
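<p>The flow above can be sketched as a small lookup-or-execute wrapper. This is a minimal sketch, not the plugin itself: ToyQueryCache, Resultset and run() are hypothetical names, the query text is used directly as the key instead of a hash, and a prefix check stands in for the real cacheability test.</p>

```cpp
#include <functional>
#include <map>
#include <string>
#include <vector>

// One string per row, purely for illustration.
using Resultset = std::vector<std::string>;

class ToyQueryCache {
public:
  bool isCached(const std::string& sql) const {
    return cache_.count(sql) != 0;
  }

  // Serve the rows from the cache when possible. Otherwise run `execute`,
  // collect its rows in a temporary area and push them to the cache.
  Resultset run(const std::string& sql,
                const std::function<Resultset(const std::string&)>& execute) {
    auto hit = cache_.find(sql);
    if (hit != cache_.end()) {
      return hit->second;  // cached: skip parsing and processing
    }
    Resultset temp = execute(sql);  // not cached: normal processing
    if (isCacheable(sql)) {
      cache_[sql] = temp;
    }
    return temp;
  }

private:
  // Assumption: only statements starting with "select" are cacheable.
  static bool isCacheable(const std::string& sql) {
    return sql.compare(0, 6, "select") == 0 || sql.compare(0, 6, "SELECT") == 0;
  }

  std::map<std::string, Resultset> cache_;
};
```

<p>Calling run() twice with the same SELECT invokes the execute callback only once; the second call is answered from the map.</p>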
<p><strong>The Query Cache plugin interface design</strong><br />
In the current design we have introduced the following functions to the interface (drizzled/plugin/query_cache.h):<br />
<strong>static bool isCached(Session *session);</strong><br />
This method probes the cache system in use for an entry corresponding to the received query.<br />
<em>Note: The plugin is responsible for checking that the query is a SELECT.</em><br />
<strong>static bool sendCachedResultset(Session *session);</strong><br />
This method is called directly if the query is cached, and skips the lexical and query processing.<br />
<strong>static bool prepareResultset(Session *session);</strong><br />
This method is called before the query handling (see the handle_select() method) in order to prepare the resultset that will receive the generated data. Each open session has its own resultset variable and current query key; the plugin has to initialize these two variables, which were added to the Session class.<br />
<strong>static bool insertRecord(Session *session, List&lt;Item&gt; &amp;item);</strong><br />
The generated result rows ultimately go through the client interface; the query cache system intercepts these calls and populates the resultset with the data being sent to the client (see the select_send() method).<br />
<strong>static bool setResultset(Session *session);</strong><br />
Once the query processing is finished, this method sends the resultset variable to the cache repository and finalizes the environment variables.</p>
<p>The query resultset stored and populated in the session is a Google Protocol Buffers message, defined in:<br />
drizzled/message/resultset.proto</p>
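<p>To make the order of the calls concrete, here is a toy in-memory version of the three write-side hooks. It is a sketch under stated assumptions: ToySession and ToyCachePlugin are hypothetical stand-ins (the real interface takes a drizzled Session and a List&lt;Item&gt;), rows are plain strings instead of a protobuf message, and every method is assumed to return false on success, mirroring the convention described for sendCachedResultset.</p>

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for drizzled's Session.
struct ToySession {
  std::string query;
  std::string query_cache_key;         // empty means "not being cached"
  std::vector<std::string> resultset;  // rows collected for the cache
};

class ToyCachePlugin {
public:
  bool isCached(const ToySession& s) const {
    return store_.count(s.query) != 0;
  }

  // Called before the query is executed. Returns false on success.
  bool prepareResultset(ToySession& s) {
    if (s.query.compare(0, 6, "select") != 0) return true;  // not cacheable
    s.query_cache_key = s.query;  // assumption: query text used as the key
    s.resultset.clear();
    return false;
  }

  // Called for every row sent to the client. Returns false on success.
  bool insertRecord(ToySession& s, const std::string& row) {
    if (s.query_cache_key.empty()) return true;
    s.resultset.push_back(row);
    return false;
  }

  // Called after the query is executed: store the rows, reset the session.
  bool setResultset(ToySession& s) {
    if (s.query_cache_key.empty()) return true;
    store_[s.query_cache_key] = s.resultset;
    s.query_cache_key.clear();
    s.resultset.clear();
    return false;
  }

private:
  std::map<std::string, std::vector<std::string>> store_;
};
```

<p>The session carries the key and the pending rows between calls, which is why the hooks need nothing but the session pointer.</p>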
<h2>Plugin Hooks</h2>
<p>The following branch contains the plugin interface hooks in the Drizzle kernel:<br />
[lp:~dedzone/drizzle/query-cache-hook/changes]</p>
<p><strong>The query interception</strong><br />
Upon reception of a query, in the mysql_parse method (sql_parse.cc:717), and before even proceeding to the lexical and query processing, a call to <em>isCached(session)</em> is made. A session pointer is passed to give access to the query text, the schema, and possibly other parameters (for future use).<br />
If isCached returns <strong>true</strong>, we try to retrieve the results from the cache with <em>sendCachedResultset</em>. If the retrieval succeeds (returns false), we skip the remaining steps.<br />
Otherwise, that is:<br />
- If fetching the results fails, or<br />
- If the query is not cached<br />
Then =&gt; proceed normally with the query parsing and processing<br />
<code><br />
void mysql_parse(Session *session, const char *inBuf, uint32_t length)<br />
{<br />
lex_start(session);<br />
session-&gt;reset_for_next_command();<br />
/* Check whether the query is cached; isCached() returns true if it is.<br />
* TODO the plugin has to make sure that the query is cacheable<br />
* by setting the query_safe_cache param to TRUE<br />
*/<br />
bool res= true;<br />
if (plugin::QueryCache::isCached(session))<br />
{<br />
res= plugin::QueryCache::sendCachedResultset(session);<br />
}<br />
if (not res)<br />
{<br />
#if defined(DEBUG)<br />
errmsg_printf(ERRMSG_LVL_DBUG,_("Results retrieved from cache"));<br />
#endif /* DEBUG */<br />
return;<br />
}<br />
LEX *lex= session-&gt;lex;<br />
Lex_input_stream lip(session, inBuf, length);<br />
bool err= parse_sql(session, &amp;lip);<br />
........<br />
........<br />
lex-&gt;unit.cleanup();<br />
session-&gt;set_proc_info("freeing items");<br />
session-&gt;end_statement();<br />
session-&gt;cleanup_after_query();<br />
}<br />
</code></p>
<p><strong>The cache preparation and reception of the resultset</strong><br />
While executing a query, if it is a SELECT we land in the execute_sqlcom_select method (sql_parse.cc:494). This is basically the entry point of the query processing, because it prepares to launch the handle_select method. Before making this call, we insert our prepareResultset hook, which instructs the plugin implementation to:<br />
- Prepare a cache to receive the resultset, and link it to the session::resultset pointer.<br />
- Create a hash key for our query, and link it to the session::query_cache_key member.<br />
At this point the Session is &#8216;aware&#8217; that there is a plugin trying to cache the results of the query.<br />
Immediately after executing handle_select, where all the result rows are generated, sent to the client and added to the current resultset (as we shall see in the next section), we make a setResultset call, which will:<br />
- Push the resultset to the cache area initiated by the plugin<br />
- Reset the session members, resultset and query_cache_key.<br />
<code><br />
bool execute_sqlcom_select(Session *session, TableList *all_tables)<br />
{<br />
LEX    *lex= session-&gt;lex;<br />
select_result *result=lex-&gt;result;<br />
bool res= false;<br />
.....<br />
.....<br />
if (!result &amp;&amp; !(result= new select_send()))<br />
return true;<br />
/* Init the Query Cache plugin */<br />
plugin::QueryCache::prepareResultset(session);<br />
res= handle_select(session, lex, result, 0);<br />
/* Send the Resultset to the cache */<br />
plugin::QueryCache::setResultset(session);<br />
if (result != lex-&gt;result)<br />
delete result;<br />
}<br />
}<br />
return res;<br />
}<br />
</code></p>
<p><strong>Populating the resultset</strong><br />
The idea behind adding data to the resultset is to intercept the send_data() method (select_send.h:87) when a row is sent to the client interface. The hook just passes the row (items) to the query cache plugin using insertRecord(session, items).<br />
<code><br />
/* Send data to client. Returns 0 if ok */<br />
bool send_data(List&lt;Item&gt; &amp;items)<br />
{<br />
.......<br />
.......<br />
while ((item=li++))<br />
{<br />
if (item-&gt;send(session-&gt;client, &amp;buffer))<br />
{<br />
my_message(ER_OUT_OF_RESOURCES, ER(ER_OUT_OF_RESOURCES), MYF(0));<br />
break;<br />
}<br />
}<br />
/* Insert this record to the Resultset into the cache */<br />
if (session-&gt;query_cache_key != "" &amp;&amp; session-&gt;getResultsetMessage() != NULL)<br />
plugin::QueryCache::insertRecord(session, items);<br />
session-&gt;sent_row_count++;<br />
......<br />
}</code></p>
]]></content:encoded>
			<wfw:commentRss>http://dedcode.wordpress.com/2010/07/11/20/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/b2d0211633636be6fa588876b8f63f02?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">dedcode</media:title>
		</media:content>
	</item>
		<item>
		<title>Drizzle QueryCache plugin [Design notes]</title>
		<link>http://dedcode.wordpress.com/2010/05/21/drizzle-querycache-plugin-design-notes/</link>
		<comments>http://dedcode.wordpress.com/2010/05/21/drizzle-querycache-plugin-design-notes/#comments</comments>
		<pubDate>Fri, 21 May 2010 14:51:11 +0000</pubDate>
		<dc:creator>Djellel</dc:creator>
				<category><![CDATA[GSoC 2010]]></category>

		<guid isPermaLink="false">http://dedcode.wordpress.com/?p=9</guid>
		<description><![CDATA[I&#8217;ll start writing a series of articles concerning my GSoC project &#8220;Drizzle Query Cache plugin&#8221;. At the current stage Siddharth and I, together with our respective mentors Toru Maesaka and Padraig O&#8217;Sullivan and with the help of Jay Pipes, will try to come up with the design ideas for the plug-in. As agreed, I&#8217;ll take care of [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>I&#8217;ll start writing a series of articles concerning my GSoC project &#8220;Drizzle Query Cache plugin&#8221;. At the current stage Siddharth and I, together with our respective mentors Toru Maesaka and Padraig O&#8217;Sullivan and with the help of Jay Pipes, will try to come up with the design ideas for the plug-in. As agreed, I&#8217;ll take care of the caching part and Siddharth will manage the invalidation/replication. Our common objective is to reduce the number of invalidations.</p>
<p>The basic capabilities of the Query Cache Plugin are:</p>
<ul>
<li>Cache the result sets of SELECT queries in Memcached (not INSERT, UPDATE, DELETE, embedded SELECT, or non-cacheable SELECT).</li>
<li>Return the result set from the cache if a query identical to a cached one is executed.</li>
<li>Invalidate the cache entry if a table referenced by the query is altered (with some enhancements).</li>
</ul>
<p>What is still to be discussed (but maybe at a later stage):</p>
<ul>
<li>Is it possible to handle complex queries: multiple joins, nested queries, etc.? We can certainly cache them, but can we apply our smart invalidation to them?</li>
</ul>
<p>I have to mention that MySQL has its own implementation of a query cache, and it&#8217;s not a bad idea to be &#8220;inspired&#8221; by some of its concepts, and to leave out, or at least be aware of, the unsuccessful ones. It&#8217;s also important to mention that MySQL handles the cache in memory only; using Memcached will remove the segmentation problem, but it also means communicating over the network if one decides to scale out.</p>
<p>The overall idea of our Query Cache would require the following:</p>
<ul>
<li>A query cache mode: MySQL has a deactivated mode, a permanent mode and an on-demand mode (with the SELECT SQL_CACHE directive). I would rather go for just an on-demand one, though it might be cumbersome for a developer.</li>
<li>Metadata: This is the most critical part. To be able to act smart during the invalidation process, the plugin has to have some knowledge about the content of the cached queries. Here are some points stating my ideas concerning the importance of each item:</li>
</ul>
<blockquote><p><strong>Tables and their aliases:</strong> This information is mandatory: if a change is made to a table, then all cached queries referring to it should be checked for invalidation. A first step would be to work on that.</p>
<p><strong>Selection fields:</strong> The selection fields can be an important factor later on. Imagine the following scenario:</p>
<p>Cached query: &#8220;select name from employee&#8221;</p>
<p>Update query: &#8220;update employee set salary=salary+100&#8221;.</p>
<p>Here modifying the salary has no impact on the name and should therefore not invalidate our cache!</p>
<p><strong>Condition fields:</strong> Our approach is to identify overlapping range selections, and decide whether it is necessary to invalidate a cached query based on that.</p>
<p>Example:</p>
<p>Cached query: &#8220;Select * from employee e where e.salary&gt;10000&#8221;</p>
<p>Update Query: &#8220;Update employee e set e.salary=e.salary*rate where e.salary&#8221;</p>
<p>Obviously the ranges of the two queries do not overlap, but one must be careful: if rate=4, updated rows can move into the cached range, so the cached query becomes stale and must be invalidated! The range method can also be used with DELETE statements, but only for simple queries. Finally, I hope to come up with a good technique to make adequate use of the condition fields!</p></blockquote>
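<p>The range check discussed above can be sketched as follows. This is an illustration only: Range, overlaps and mustInvalidate are hypothetical names, intervals are treated as closed (which is conservative for a strict bound such as salary&gt;10000), the rate is assumed to be positive, and the update range used in the usage note is an assumed example.</p>

```cpp
#include <limits>

// Closed interval [low, high] over the values of one condition column.
struct Range {
  double low;
  double high;
};

// Two closed intervals overlap if each one starts before the other ends.
bool overlaps(const Range& a, const Range& b) {
  return a.low <= b.high && b.low <= a.high;
}

// Should a cached "SELECT ... WHERE col in `cached`" be invalidated by
// "UPDATE ... SET col = col * rate WHERE col in `updated`"?
// Assumption: rate > 0, so the scaling keeps the interval bounds in order.
bool mustInvalidate(const Range& cached, const Range& updated, double rate) {
  // Rows already inside the cached range are modified.
  if (overlaps(cached, updated)) return true;
  // Rows outside the cached range may be moved into it by the update.
  const Range after = {updated.low * rate, updated.high * rate};
  return overlaps(cached, after);
}
```

<p>For a cached range of [10000, infinity) and an assumed update range of [0, 5000], a rate of 4 maps the updated rows to [0, 20000], which reaches into the cached range and forces an invalidation, while a rate of 1.5 maps them to [0, 7500] and the cache entry can stay.</p>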
<p>Now that the basic functions and ideas have been laid out, we need to think about how to store and process this information. My initial proposal was to add a qcache table with the following fields:</p>
<p>+Query+ list of tables in the query+ list of selection fields+ list of condition fields and their ranges+</p>
<p>With this method the query cache adds the necessary information to that table whenever a query is added to memcached, whereas the Checker simply looks at that table, knows exactly what is in the cache and decides how to behave (invalidate or not). It turns out that using one table is not realistic, because the list of tables can grow large and will involve long parsing: for example, to decide whether the table employee is used by some cache entry, we need to sweep through the whole qcache table and parse each table list.</p>
<p>I think that we need a more elegant (complex) design, with either a relational schema for the query cache system or a &#8220;cached&#8221; column added to TABLES, COLUMNS, etc. in the information schema.</p>
<p>Your comments/ideas are welcome!</p>
]]></content:encoded>
			<wfw:commentRss>http://dedcode.wordpress.com/2010/05/21/drizzle-querycache-plugin-design-notes/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/b2d0211633636be6fa588876b8f63f02?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">dedcode</media:title>
		</media:content>
	</item>
		<item>
		<title>Starting up and running Memcached</title>
		<link>http://dedcode.wordpress.com/2010/05/13/starting-up-and-running-memcached/</link>
		<comments>http://dedcode.wordpress.com/2010/05/13/starting-up-and-running-memcached/#comments</comments>
		<pubDate>Thu, 13 May 2010 09:28:00 +0000</pubDate>
		<dc:creator>Djellel</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://dedcode.wordpress.com/2010/05/13/starting-up-and-running-memcached</guid>
		<description><![CDATA[Memcached is a key-value object cache used by many websites to relieve the load on their databases and provide faster answers to clients. It&#8217;s very interesting since it needs only commodity PCs and scales out by adding instances. The distribution of keys is handled solely by your application, usually with a hash function, thus the [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>Memcached is a key-value object cache used by many websites to relieve the load on their databases and provide faster answers to clients. It&#8217;s very interesting since it needs only commodity PCs and scales out by adding instances. The distribution of keys is handled solely by your application, usually with a hash function, thus the different instances share nothing and have no knowledge of each other!<br />I installed memcached on my Ubuntu machine using the PPA. You can start an instance right away; in fact, you can start multiple instances of memcached on different ports on the same machine. With the following commands I started two instances, on ports 11211 and 11311:</p>
<blockquote><p>ded@ubuntu:~$ memcached -p 11211 -d<br />ded@ubuntu:~$ memcached -p 11311 -d</p></blockquote>
<p>-p specifies the listening port (11211 is the default); -d tells Memcached to start as a daemon.</p>
<p>You can test your running memcached instance with Telnet. The following example connects to the instance, sets the key &#8220;key1&#8221; to &#8220;hello&#8221;, then gets that key.</p>
<p>Let&#8217;s connect to the first instance:</p>
<blockquote><p>ded@ubuntu:~$ telnet localhost 11211<br />Trying 127.0.0.1&#8230;<br />Connected to localhost.<br />Escape character is &#8216;^]&#8217;.</p></blockquote>
<p>Now we want to store the word &#8220;hello&#8221; under the key &#8220;key1&#8221;. The first number is the flags field, the second is the expiry time (0 means the item never expires), and the last is the number of bytes expected (5 for the word hello). If you get the byte count wrong, you will receive a &#8220;bad data chunk&#8221; error message!</p>
<blockquote><p>set key1 0 0 5<br />hello<br />STORED</p></blockquote>
<p>Let&#8217;s now retrieve that value:</p>
<blockquote><p>get key1<br />VALUE key1 0 5<br />hello<br />END</p></blockquote>
<p>We close the connection using:</p>
<blockquote><p>quit<br />Connection closed by foreign host.</p></blockquote>
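<p>Since the byte count is the easiest part of a set command to get wrong by hand, it can be derived from the value itself. The helper below is a small sketch: buildSetCommand is a hypothetical name, and it only formats the request text that was typed into Telnet above; it does not open a connection.</p>

```cpp
#include <string>

// Format a memcached text-protocol request:
//   set <key> <flags> <exptime> <bytes>\r\n<data>\r\n
// The byte count is taken from the value, so it can never be out of sync.
std::string buildSetCommand(const std::string& key, const std::string& value,
                            unsigned flags = 0, unsigned exptime = 0) {
  return "set " + key + " " + std::to_string(flags) + " " +
         std::to_string(exptime) + " " + std::to_string(value.size()) +
         "\r\n" + value + "\r\n";
}
```

<p>For example, buildSetCommand("key1", "hello") produces the same two lines that were entered in the Telnet session, each terminated by \r\n.</p>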
<p>Next I used libmemcached, a C/C++ client library, to access memcached instances. Here I relied on MyCache, a class that <a href="http://posulliv.github.com/2009/09/19/using-memcached-with-c.html">Padraig O&#8217;Sullivan wrote about on his blog</a>. Here is a simple main that makes use of that class to cache a string:</p>
<blockquote><p>// also include the header that defines MyCache (see the linked post)<br />#include &lt;cstring&gt;<br />#include &lt;iostream&gt;<br />#include &lt;string&gt;<br />#include &lt;vector&gt;<br /><br />int main()<br />{<br />std::string text="Hello world"; // our string to cache<br />std::vector&lt;char&gt; raw(text.size()); // cached objects need to be in vector format<br />memcpy(&amp;raw[0],text.c_str(),text.size());<br />MyCache::singleton().set("key1",raw); // cache the vector with the key "key1"<br />std::vector&lt;char&gt; data=MyCache::singleton().get("key1"); // retrieve the object in a vector<br />std::cout&lt;&lt;std::string(data.begin(),data.end())&lt;&lt;std::endl; // data is not null-terminated<br />return 0;<br />}</p></blockquote>
<p>To compile a program that uses libmemcached, don&#8217;t forget the -lmemcached linker flag, placed after the source file so the linker can resolve its symbols:</p>
<blockquote><p>g++ -o test test.cc -lmemcached</p></blockquote>
<p>Note: It&#8217;s very easy to write your own MyCache class, since it&#8217;s just a wrapper around memcache::Memcache. If you use Padraig&#8217;s, you need to set num_of_clients in the MyCache class to the number of instances you have, and replace the GetCache() method, which selects a client at random, with a deterministic one, since you need to retrieve your cached object from exactly the instance where you put it!<br />I will write a MyCache version that takes care of that shortly .. stay tuned</p>
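<p>One possible way to make the client selection deterministic is to hash the key and take the result modulo the number of instances. The sketch below is not the MyCache code: fnv1a and pickInstance are hypothetical helpers, and FNV-1a is just one convenient hash that gives the same value in every process, so every client maps a given key to the same instance.</p>

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// 32-bit FNV-1a hash: same result on every run and on every machine.
std::uint32_t fnv1a(const std::string& key) {
  std::uint32_t hash = 2166136261u;  // FNV offset basis
  for (unsigned char c : key) {
    hash ^= c;
    hash *= 16777619u;               // FNV prime
  }
  return hash;
}

// Map a key to an instance index in [0, num_of_clients).
// num_of_clients must be greater than zero.
std::size_t pickInstance(const std::string& key, std::size_t num_of_clients) {
  return fnv1a(key) % num_of_clients;
}
```

<p>With two instances, pickInstance("key1", 2) always returns the same index, so a get goes to the instance that received the set. Note that with a plain modulo, changing the number of instances remaps most keys.</p>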
]]></content:encoded>
			<wfw:commentRss>http://dedcode.wordpress.com/2010/05/13/starting-up-and-running-memcached/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/b2d0211633636be6fa588876b8f63f02?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">dedcode</media:title>
		</media:content>
	</item>
		<item>
		<title>Memcpy with String, char and vector</title>
		<link>http://dedcode.wordpress.com/2010/05/12/memcpy-with-string-char-and-vector/</link>
		<comments>http://dedcode.wordpress.com/2010/05/12/memcpy-with-string-char-and-vector/#comments</comments>
		<pubDate>Wed, 12 May 2010 17:54:00 +0000</pubDate>
		<dc:creator>Djellel</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://dedcode.wordpress.com/2010/05/12/memcpy-with-string-char-and-vector</guid>
		<description><![CDATA[Figured I&#8217;d put down some of my new insights about memcpy when dealing with strings, chars and vectors in C++. First, when initializing a char array, it&#8217;s important to specify the size of the array! char c[]="hello"; // sizes the array from the literal, including the terminating '\0' [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>Figured I&#8217;d put down some of my new insights about memcpy when dealing with strings, chars and vectors in C++.</p>
<p>First, when initializing a char array, it&#8217;s important to specify the size of the array!<br />char c[]="hello"; // sizes the array from the literal, including the terminating '\0'<br />char *c= new char[text.size()+1]; // allocates an array of the given size (+1 for the terminating '\0')</p>
<p>To transform a string into a char array:<br />// by using c_str()<br />string text="Hello";<br />const char* c=text.c_str(); // c_str() returns a const char*, valid as long as text is not modified<br />// by using memcpy<br />char* buf=new char[text.size()+1];<br />memcpy(buf,text.c_str(),text.size()+1); // +1 copies the terminating '\0'; release with delete[] buf</p>
<p>Dealing with a vector of char:<br />vector&lt;char&gt; vec(text.size()); // need to specify the size of the vector<br />memcpy(&amp;vec[0],text.c_str(),text.size());</p>
<p>Although a vector can be expanded dynamically using push_back, when using memcpy we have to make sure its size already matches the source, because memcpy writes straight into the existing storage. Furthermore, memcpy takes void* arguments; any object pointer, such as &amp;vec[0] or the const char* returned by c_str(), converts to void* implicitly, which is why they can be passed directly.</p>
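<p>Putting the pieces together, here is a small round trip between a string and a vector of char. The helper names toBytes and toString are made up for this sketch; the empty-string check is there because &amp;bytes[0] must not be used on an empty vector.</p>

```cpp
#include <cstring>
#include <string>
#include <vector>

// Copy the characters of a string into a vector sized to match.
std::vector<char> toBytes(const std::string& text) {
  std::vector<char> bytes(text.size());  // size first, then memcpy
  if (!text.empty()) {
    std::memcpy(&bytes[0], text.data(), text.size());
  }
  return bytes;
}

// Rebuild a string from the vector. The vector holds no terminating '\0',
// so the length comes from the iterator range, not from strlen.
std::string toString(const std::vector<char>& bytes) {
  return std::string(bytes.begin(), bytes.end());
}
```

<p>toString(toBytes("Hello")) gives back "Hello", and the intermediate vector holds exactly 5 bytes.</p>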
]]></content:encoded>
			<wfw:commentRss>http://dedcode.wordpress.com/2010/05/12/memcpy-with-string-char-and-vector/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/b2d0211633636be6fa588876b8f63f02?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">dedcode</media:title>
		</media:content>
	</item>
	</channel>
</rss>
