Update 3: Woohoo! Sql Server 2016 has really delivered a lot of improved features for in-memory tables. New post is coming with details….
UPDATE 2: Because Sql Server 2016 improves many in-memory table features, I’m postponing some of this work until that version is released. I’ve heard it addresses the row-size limit.
UPDATE: Due to the promising results below, I’ve started an open-source project to do a proper implementation.
So I finally got around to looking at Sql Server’s awesome new feature: in-memory tables. I’ve written a lot of code that moves real-time data through Sql Server to reporting systems, so this is of great interest to me. I really could have used this feature about 10 years ago. After delving into the details, I concluded there are definitely some limitations, but lots of exciting possibilities as well. (E.g. we’ll be migrating our ASP session database to persisted in-memory tables soon.) I won’t cover how in-memory tables work; many others have already done a good job of that. The focus of this post is to explore the idea of using non-persisted in-memory tables as a replacement for memcached. Yes, you read that right. I’m thinking of replacing memcached with Sql Server.
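To make the idea concrete, here’s roughly what a memcached-style key/value store looks like as a non-durable in-memory table. This is a sketch; the table name, column sizes, collation, and bucket count are my own assumptions, not a recommendation:

```sql
-- DURABILITY = SCHEMA_ONLY: the table definition survives a restart,
-- but the data does not, which is the same contract memcached gives you.
-- Sql Server 2014 requires a BIN2 collation on indexed string columns.
CREATE TABLE dbo.Cache
(
    CacheKey   NVARCHAR(100) COLLATE Latin1_General_100_BIN2 NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1000000),
    CacheValue NVARCHAR(3900) NOT NULL,  -- whole row must fit in 8060 bytes
    ExpiresAt  DATETIME2 NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_ONLY);
```

The hash index is the part that makes key lookups behave like memcached’s hash; the bucket count should be sized to roughly the expected number of cache entries.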
At Rackspace, we use memcached to store a couple different things that don’t need to be persisted to disk (e.g. throttling data). We also use it as a fast caching layer for our main operational database via NHibernate. Those are two very common use-cases, so let’s define them for reference:
1. Canonical data source for non-critical data that doesn’t need to be persisted.
2. Secondary low-latency data source for critical high-use data that is persisted to a DB.
We started using memcached quite a while ago, before other alternatives became available, and due to some of its limitations we’ve been mulling over a replacement. When I started reviewing Sql Server’s in-memory tables it got me thinking: If it’s all in memory… I wonder if it’s as fast as memcached? And if so, would it be a viable replacement?
Let’s start with that first question, because the results will help better answer the second question with respect to the use cases above. All the code for this test is available on my GitHub repo. Note: you have to set up Sql Server with the correct file structure for in-memory tables. Also, the build of memcached for Windows that I use can be downloaded from this Stack Overflow discussion.
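For reference, the “correct file structure” amounts to adding a memory-optimized filegroup and a container to the database; this is required even when every in-memory table is SCHEMA_ONLY. A minimal sketch (the database name and file path are placeholders):

```sql
-- Add the memory-optimized filegroup and a container for it.
ALTER DATABASE CacheDb
    ADD FILEGROUP CacheDb_mod CONTAINS MEMORY_OPTIMIZED_DATA;

ALTER DATABASE CacheDb
    ADD FILE (NAME = 'CacheDb_mod_file', FILENAME = 'C:\Data\CacheDb_mod')
    TO FILEGROUP CacheDb_mod;
```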
Notes on this performance test:
- The test app, Sql Server, and memcached are all running on my laptop, so there’s no network latency.
- Before the stopwatch starts, I create a client and make an initial write to establish a connection.
- All reads/writes are done sequentially, not asynchronously.
- The Sql Server tables are in-memory only. (In-memory tables can optionally be persisted, but persistence is not enabled for this test.)
- I also included writing to regular ol’ disk-based tables for reference.
All times are in microseconds.
| # rows | memcached (writes) | Sql in-memory (writes) | Sql on-disk (writes) | memcached (reads) | Sql in-memory (reads) | Sql on-disk (reads) |
|---|---|---|---|---|---|---|
I would probably get more consistent, real-world metrics with the test app running on one server and memcached and Sql Server each on their own, but I think this is an adequate back-of-the-envelope kind of test.

Here’s the interesting part: over multiple tests there was a good amount of overlap between the best and worst times of each platform, but on each individual test Sql Server was always a little slower than memcached. This makes sense when you think about it. Memcached has one job: implement a hash in memory. Sql Server is infinitely more complicated, but that’s not a bad thing in this case, because Sql Server gives you an incredibly flexible framework to customize your cache. You could implement different caches in a number of ways involving multiple tables or various columns to better fit your caching needs, and you get to manipulate that cache using the incredibly flexible T-SQL. You’d have to be careful not to make it too complex, however: in some additional testing, inserting into two tables took about 1.5 times as long as inserting into just one. Also, in-memory tables don’t support foreign keys or left outer joins.

My point is this: if you try to do anything custom with memcached, you’ll end up having to do it in the client code, which is technical debt that’s hard to migrate or upgrade. Implementing something custom in natively compiled stored procedures is centrally located and will be easy to modify (or fix!) later on.
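As an illustration of that last point, here’s a sketch of a natively compiled “get” procedure against a hypothetical dbo.Cache key/value table; every name here is a placeholder of my own:

```sql
-- Natively compiled procedures are compiled to machine code at creation
-- time, which is a big part of the in-memory performance story.
CREATE PROCEDURE dbo.CacheGet
    @CacheKey NVARCHAR(100)
WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNER
AS
BEGIN ATOMIC WITH
    (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')
    SELECT CacheValue
    FROM dbo.Cache
    WHERE CacheKey = @CacheKey
      AND ExpiresAt > SYSUTCDATETIME();  -- treat expired entries as misses
END;
```

A matching “put” procedure would follow the same pattern; note that natively compiled procedures require SCHEMABINDING and an ATOMIC block, and only support a restricted T-SQL surface area.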
There are of course other considerations than just speed. Storage is a big one. Memcached can store a value up to 1 MB by default. (More if you change the config, but that’s not recommended.) The new in-memory tables are limited to a row size of 8060 bytes. If you’re storing Unicode text as NVARCHAR (as I did), that halves your capacity, so you may want to consider a binary serializer and possibly some compression.
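A quick way to see the halving effect (the literal here is just an illustration):

```sql
-- NVARCHAR stores 2 bytes per character, so the 8060-byte row limit leaves
-- roughly 4000 characters; a binary serializer keeps all ~8000 bytes usable.
SELECT DATALENGTH(N'cached value') AS UnicodeBytes,  -- 24 (2 bytes per char)
       DATALENGTH('cached value')  AS AsciiBytes;    -- 12
```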
So what does Sql Server have going for it? Security comes to mind; the complete lack of it in memcached has particularly been on my mind. I really like the fact that using Sql Server as a cache gives you built-in security, and that I can control that security in a very granular way. Memcached and most other memory stores can’t even come close to Sql Server’s authentication and authorization capabilities.

Sql Server is also good at things like clustering and fail-over models, which simply don’t exist in the memcached world. Memcached relies on the clients to handle redundancy, which is almost worse than having no redundancy at all, since different webheads could end up talking to different memcached servers, delivering stale data or causing stale data to be written back to the persistence layer. Fail-over redundancy should always be implemented at either the server application level or the network level. Another thing I like: I already have a Sql Server Always-On cluster in my environment, so if I want to use Sql Server for my caching, I don’t have to worry about standing up another cluster of servers to run a cache. As long as I have the memory to spare, I get caching for free.

There are also a lot of possibilities for how you can segment your data in Sql Server. Right now we have multiple disparate data sets all intermingled in memcached, sharing the same key space. Using Sql Server I could set up different tables for different purposes, each of which may also have different security requirements. The client and stored procedures could also be modified to execute multiple reads/writes per call, which could significantly reduce time-per-call and possibly (on average) beat memcached response times.
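That granular control can be as simple as letting the web tier execute the cache procedures and nothing else. A sketch, where the role, user, and procedure names are all placeholders of my own:

```sql
-- The web tier gets execute rights on the cache procedures only:
-- no direct table access, no ad-hoc SQL.
CREATE ROLE CacheClients;
GRANT EXECUTE ON OBJECT::dbo.CacheGet TO CacheClients;
GRANT EXECUTE ON OBJECT::dbo.CachePut TO CacheClients;
ALTER ROLE CacheClients ADD MEMBER WebAppUser;
```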
Last thing: I haven’t explored it yet, but there’s a chance you could more tightly integrate your caching layer with your persistence layer by having it all in Sql Server. E.g. you could serve a dashboard from an in-memory cache that’s recompiled every 30 seconds from multiple underlying persisted tables. This is an actual use-case I’ve had to implement many times in the past. One caveat though: natively compiled stored procedures are vital to realizing the performance of an in-memory table, but they can’t query regular disk-based tables. (Interpreted stored procedures, on the other hand, can access both.) This means the natively compiled procedures that can quickly read the in-memory cache can’t jump over to the tables on disk when the data isn’t cached. That’s a shame, because that would be a great feature for an L2 cache.
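Here’s a sketch of that recompile-every-30-seconds pattern: an interpreted (non-natively-compiled) procedure, which is allowed to touch both kinds of tables, rebuilds the cache from disk-based tables on a schedule. All table and column names here are hypothetical:

```sql
-- Interpreted T-SQL can read disk-based tables AND write memory-optimized
-- ones, so a job running every 30 seconds could rebuild the cache.
CREATE PROCEDURE dbo.RefreshDashboardCache
AS
BEGIN
    -- DashboardCache is a SCHEMA_ONLY memory-optimized table.
    DELETE FROM dbo.DashboardCache;

    INSERT INTO dbo.DashboardCache (CustomerId, OpenTickets, RefreshedAt)
    SELECT c.CustomerId, COUNT(t.TicketId), SYSUTCDATETIME()
    FROM dbo.Customers c                    -- regular disk-based tables
    LEFT JOIN dbo.Tickets t ON t.CustomerId = c.CustomerId
    GROUP BY c.CustomerId;
END;
```

The dashboard’s read path would then hit only the in-memory table, while the refresh job absorbs the cost of the disk-based joins.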
Let’s summarize in a table:
| Feature | memcached | Sql Server in-memory tables |
|---|---|---|
| Microseconds per write (heavily rounded) | ~100-500 | ~150-650 |
| Microseconds per read (heavily rounded) | ~100-500 | ~150-650 |
| Max value size (uncompressed) | 1 MB | ~8 kB |
| Clients available | Many, for every language. | None. Although you’re just calling stored procs, so kinda. (I’m probably going to write a custom NHibernate L2 cache adapter for it soon.) |
| Customizable | No, unless you feel like forking the project and working with C++ code. | Yes! And it’s as easy as creating tables and writing SQL. |
| Support for SQL | None. | Of course! |
So should you consider this as a viable option? I certainly will, although I have a lot more testing to complete. I really like the security and the flexibility. If I want to change something, I just change my table definition and update (or create new) stored procedures. The size limit on values is a significant concern for me, so I may do some tests with compression, although that could mean a considerable CPU hit on the client, and with such small payloads the compression header plus body may end up larger than the original payload. Again, maybe Sql Server 2016 will improve the size limit. I will post updates as I explore this option more.