
A Condemnation of Invoke-SqlCmd

Recently Mike Fal penned a post defending the Invoke-SqlCmd cmdlet. This is a response to that post. (Update: Apparently Mike Fal’s post was in response to a post by Drew Furgiuele that was also critical of Invoke-SqlCmd.) I want to give Mike proper credit: he states multiple times that Invoke-SqlCmd is not the right tool for every SQL-related job. Kudos for that. However, in this post I’m going to make the argument that Invoke-SqlCmd is the wrong tool for any job, period.

I believe Invoke-SqlCmd is not only deficient, but also potentially dangerous to use in production. Let me immediately clarify: I’m not saying the cmdlet is buggy. I’m saying that the way it’s intended to be used encourages bad coding practices that, at the very least, can produce unintended results, and at worst can introduce serious security issues.

The root cause of all the issues with the Invoke-SqlCmd cmdlet is that it’s the classic sqlcmd.exe utility with some Powershell hammered onto it. As a result, most of the patterns and conventions used for cmdlets are thrown out the window in order to maintain backwards compatibility with the base executable. Here’s a great example:

PS c:\> Invoke-SqlCmd -ServerInstance "MyServer\MyInstance" -Database "db1" -Username "ctigeek" -Password "whyAmIPassingInAPassword???" -Query "select * from sometable1"

Now I challenge you to find one other cmdlet, published by Microsoft, that accepts a clear-text password on the command line. Any other cmdlet that needs this information takes in a Credential object. Another example is how it uses the AbortOnError switch instead of the standard ErrorAction parameter. All of these proprietary parameters are intended to maintain parity with the underlying executable. The MSDN documentation even itemizes which options from sqlcmd.exe are included and how they map to the Powershell cmdlet. I find all of this completely intolerable. The standards by which Powershell commands are built should not be bent (or in this case, flat-out broken) in order to conform to an underlying executable. If DBAs want to use sqlcmd.exe, they can use it directly, without this layer of obfuscation that doesn’t appear to add any real value.
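For contrast, here’s the pattern every other Microsoft cmdlet follows. (A quick illustration using Invoke-Command; the server name is made up.)

PS c:\> $cred = Get-Credential
PS c:\> Invoke-Command -ComputerName "MyServer" -Credential $cred -ScriptBlock { hostname }

Get-Credential prompts for the password and returns a PSCredential wrapping a SecureString, so the password never touches the command line, your console history, or a transcript.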

Sqlcmd.exe was originally designed for DBAs to execute DDL and other scripts related to the administration of the database. Yes, it can also return data, but it was built to generate the “flat”, fixed-width-font reports you see printed out in any 80s TV show that had a computer in it. That’s how most database reports were generated up until relatively recently. My point is this: sqlcmd was never intended to query data in the kind of environment in which it is now used with Powershell. In other words, it is completely lacking the features necessary to be a useful and safe tool in a modern production application. Specifically:

  1. It doesn’t support ANY standard cmdlet parameters. I would expect any cmdlet published by Microsoft, especially one that interacts with a database, to have ErrorAction, Verbose, WhatIf, and Confirm. If it can consume user IDs and/or passwords, I expect it to accept a Credential object.
  2. It doesn’t support non-queries or scalar queries. If I execute an update or delete using Invoke-SqlCmd, I have no idea how many rows I just affected. If confirmation is a requirement, I then have to execute a second, possibly expensive, table-scanning query to confirm.
  3. It doesn’t support transactions. I can’t overemphasize how big a deal this is. It’s one of those things people don’t think about because it’s not even available. The only way to do transactions with Invoke-SqlCmd is to put the entire transaction inside the SQL command being sent. That usually isn’t practical, sometimes forcing the script to dynamically generate the SQL with all the correct commits, rollbacks, and the checks around them. That’s insane. And it completely prevents the Powershell script from doing other things within the scope of the transaction. Did you know that Powershell supports distributed transactions? Yup. You can start a transaction, make changes on multiple database servers, change a file and a registry key, and put a message into an MSMQ queue, all as part of one transaction. But Invoke-SqlCmd doesn’t support that.
  4. Support for SQL parameters is awful. This is the dangerous part I was alluding to earlier. How bad? The variable and the value are defined in a single string, e.g. "var1='val1'", and you have to escape single quotes in the value. This means that if the values are already in proper Powershell variables, you have to escape each value, build strings, put those into an array, pass that into …. never mind. It’s an unusable mess. And that’s the point. Anyone with five minutes’ experience with Powershell will just create the SQL using string concatenation based on the function’s input parameters, and of course that’s begging for a whole litany of security issues. If proper SQL parameters were supported, people would use them; the sketch after this list shows what that looks like.
  5. It only supports Sql Server. Do you really only EVER work with Sql Server? Ever? If so, good for you, but in 20 years that’s never been the case for me. The great thing about SQL is that it’s (mostly) universal. I want a Powershell command that works the same way, with the same options, across any relational database.
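To make points 2 through 4 concrete, here’s the sketch I promised: roughly what plain ADO.NET already gives you from Powershell, namely real SQL parameters, a real transaction, and the affected-row count. (This is a minimal sketch; the server, database, table, and column names are all placeholders.)

$conn = New-Object System.Data.SqlClient.SqlConnection("Server=MyServer\MyInstance;Database=db1;Integrated Security=true")
$conn.Open()
$tran = $conn.BeginTransaction()
try {
    $cmd = $conn.CreateCommand()
    $cmd.Transaction = $tran
    $cmd.CommandText = "UPDATE sometable1 SET status = @status WHERE region = @region"
    [void]$cmd.Parameters.AddWithValue("@status", "archived")   # a real parameter; no quote-escaping games
    [void]$cmd.Parameters.AddWithValue("@region", "us-south")
    $rowsAffected = $cmd.ExecuteNonQuery()                      # the row count Invoke-SqlCmd never gives you
    $tran.Commit()
    Write-Output "$rowsAffected rows updated."
}
catch {
    $tran.Rollback()
    throw
}
finally {
    $conn.Dispose()
}

And yes, that’s more typing than Invoke-SqlCmd, which is exactly why this plumbing belongs inside a proper cmdlet instead of being copy-pasted into every script.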

So what’s the harm if Invoke-SqlCmd supports your current needs? The problem is that your needs tomorrow will be different. Your code is going to change, and sooner or later you’re going to hit the much-closer-than-you-think wall of usefulness. So what’s the alternative? Lots of .NET coding?

Two years ago I realized all this and started an OSS project, InvokeQueryPowershellModule. It’s written in C#, based on ADO.NET, and supports all of the functionality above and more. There are lots of examples in the Readme. It’s available from the Powershell Gallery now. I’ve been using it in a production environment for over a year with no issues. I strongly encourage you to consider it as a superior alternative.

Cache Comparison: Memcached vs. Sql Server ……… (wait… what?)

Update 3: Woohoo! Sql Server 2016 has really delivered a lot of improved features for in-memory tables. New post is coming with details….

UPDATE 2: Due to Sql Server 2016 improving many of the features of in-memory tables, I’m postponing some of this work until that version is released. I’ve heard it addresses the row-size limit.

UPDATE: Due to the promising results below, I’ve started an open-source project to do a proper implementation.

So I finally got around to looking at Sql Server’s awesome new feature: in-memory tables. I’ve written a lot of code that moves real-time data through Sql Server to reporting systems, so this is of great interest to me; I really could have used this feature about 10 years ago. After delving into the details, I concluded there are definitely some limitations, but lots of exciting possibilities as well. (E.g. we’ll be migrating our ASP session database to persisted in-memory tables soon.) I won’t delve into how in-memory tables work; many others have already done a good job of that. The focus of this post is to explore the idea of using non-persisted in-memory tables as a replacement for memcached. Yes, you read that right: I’m thinking of replacing memcached with Sql Server.

At Rackspace, we use memcached to store a couple different things that don’t need to be persisted to disk (e.g. throttling data). We also use it as a fast caching layer for our main operational database via NHibernate. Those are two very common use-cases, so let’s define them for reference:
1. Canonical data source for non-critical data that doesn’t need to be persisted.
2. Secondary low-latency data source for critical high-use data that is persisted to a DB.

We started using memcached quite a while ago, before other alternatives became available, and due to some of its limitations we’ve been mulling over a replacement. When I started reviewing Sql Server’s in-memory tables it got me thinking: If it’s all in memory… I wonder if it’s as fast as memcached? And if so, would it be a viable replacement?

Let’s start with that first question, because the results will help better answer the second question with respect to the use cases above. All the code for this test is available in my GitHub repo. Note: you have to set up Sql Server with the correct file structure for in-memory tables. Also, the build of memcached for Windows that I use can be downloaded from this Stack Overflow discussion.
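If you haven’t set that up before, the filegroup part is roughly this (a hedged sketch; the database name and file path are made up):

# One-time setup; run this batch once against the test database with any client.
$setupSql = @"
ALTER DATABASE CacheTest ADD FILEGROUP CacheTest_mem CONTAINS MEMORY_OPTIMIZED_DATA;
ALTER DATABASE CacheTest
    ADD FILE (NAME = 'CacheTest_mem_file', FILENAME = 'C:\Data\CacheTest_mem')
    TO FILEGROUP CacheTest_mem;
"@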

Notes on this performance test:

  • The test app, Sql Server, and memcached are all running on my laptop, so there’s no network latency.
  • Before the stopwatch starts, I create a client and make an initial write to establish a connection.
  • All read/writes are done sequentially, not asynchronously.
  • The Sql Server tables are in-memory only.
    (In-memory tables can optionally be persisted, but that is not enabled for this test.)
  • I also included writing to regular ol’ disk-based tables for reference.

All times are in microseconds.

          ~~~~~~~~~~~~~~ Writes ~~~~~~~~~~~~~~   ~~~~~~~~~~~~~~ Reads ~~~~~~~~~~~~~~~
# rows    memcached   Sql in-memory  Sql on-disk  memcached   Sql in-memory  Sql on-disk
1000            460             624         1521        448             578          584
5000            660             716         1762        358             616          670
10000           975            1729         4370        933            1590         1772

I would probably get more consistent, real-world metrics with the test app running on one server and memcached and Sql Server each on their own, but I think this is an adequate back-of-the-envelope kind of test. Here’s the interesting part: over multiple tests there was a good amount of overlap between the best and worst times of each platform; however, on each individual test, Sql Server was always a little slower than memcached. This makes sense when you think about it. Memcached has one job: implement a hash in memory. Sql Server is infinitely more complicated, but that’s not a bad thing in this case, because Sql Server gives you an incredibly flexible framework to customize your cache. You could implement different caches in a number of ways that involve multiple tables, or various columns, to better fit your caching needs. And you get to manipulate that cache using the incredibly flexible T-SQL.

You’d have to be careful not to make it too complex, however: in some additional testing I did, inserting into two tables took about 1.5 times as long as inserting into just one table. Also, in-memory tables don’t support foreign keys or left outer joins. My point is this: if you try to do anything custom with memcached, you’ll end up having to do it in the client code, which is bad technical debt that’s hard to migrate or upgrade. Implementing something custom in natively compiled stored procedures is centrally located and will be easy to modify (or fix!) later on.
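To make “customize your cache” concrete, here’s roughly the entire schema side of a memcached-style cache, sent as two batches over ADO.NET. (A minimal sketch; the names, sizes, and bucket count are made up, and I haven’t benchmarked this exact shape.)

$tableSql = @"
CREATE TABLE dbo.CacheEntries (
    CacheKey   NVARCHAR(200) COLLATE Latin1_General_100_BIN2 NOT NULL
               PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1000000),
    CacheValue VARBINARY(7000) NOT NULL,
    ExpiresAt  DATETIME2 NOT NULL
) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_ONLY)  -- SCHEMA_ONLY = never persisted, i.e. a pure cache
"@

$procSql = @"
CREATE PROCEDURE dbo.GetCacheEntry @CacheKey NVARCHAR(200)
WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNER
AS BEGIN ATOMIC WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')
    SELECT CacheValue FROM dbo.CacheEntries
    WHERE CacheKey = @CacheKey AND ExpiresAt > SYSUTCDATETIME()
END
"@

Want per-entry flags, a second lookup key, or a different eviction rule? That’s just another column or another proc, not a fork of someone’s C++ codebase.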

There are of course other considerations than just speed. Storage is a big one. Memcached can store a value up to 1 MB by default. (More if you change the config, but it’s not recommended.) The new in-memory tables can only store a max of 8060 bytes per row. If you’re storing Unicode text as NVARCHAR (as I did), that halves it, so you may want to consider a binary serializer and possibly some compression.
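A quick, hedged back-of-the-envelope you can paste into a console (the payload is made up) shows both effects: UTF-8 bytes roughly halve an ASCII-heavy value compared to NVARCHAR, while gzip actually inflates payloads this small:

$text  = '{"customerId":42,"status":"active","region":"us-south"}'
$bytes = [System.Text.Encoding]::UTF8.GetBytes($text)        # NVARCHAR would cost 2 bytes per char
$ms = New-Object System.IO.MemoryStream
$gz = New-Object System.IO.Compression.GZipStream($ms, [System.IO.Compression.CompressionMode]::Compress)
$gz.Write($bytes, 0, $bytes.Length)
$gz.Dispose()                                                # flushes the stream; ToArray still works after
"{0} chars -> {1} UTF-8 bytes -> {2} gzipped bytes" -f $text.Length, $bytes.Length, $ms.ToArray().Length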

So what does Sql Server have going for it? Security comes to mind; the complete lack of it in memcached has particularly been on my mind. I really like the fact that using Sql Server as a cache gives you built-in security, and I can control that security in a very granular way. Memcached and most other memory stores can’t even come close to Sql Server’s authentication and authorization capabilities. Sql Server is also good at things like clustering and fail-over models, which simply don’t exist in the memcached world. Memcached relies on the clients to handle redundancy, which is almost worse than having no redundancy at all, since you could potentially have different webheads talking to different memcached servers, delivering stale data or causing stale data to be written back to the persistence layer. Fail-over redundancy should always be implemented at either the server application level or the network level.

Another thing I like: I already have a Sql Server Always-On cluster in my environment, so if I want to use Sql Server for my caching, I don’t have to worry about standing up another cluster of servers to run a cache. As long as I have the memory to spare, I get caching for free. There are also a lot of possibilities for how you can segment your data in Sql Server. Right now we have multiple disparate data sets all intermingled in memcached, sharing the same key space. Using Sql Server I could set up different tables for different purposes, which may also have different security requirements. The client and stored procedures could also be modified to execute multiple reads/writes per call, which could significantly reduce time-per-call and possibly (on average) beat memcached response times.
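On the granular-security point: it’s just ordinary SQL permissions, e.g. per-proc and per-table grants. (The role and object names here are the made-up ones from the sketch above.)

$grantSql = @"
CREATE ROLE CacheReaders;
GRANT EXECUTE ON OBJECT::dbo.GetCacheEntry TO CacheReaders;
DENY SELECT ON OBJECT::dbo.CacheEntries TO CacheReaders;  -- callers get the proc, never the raw table
"@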

Last thing: I haven’t explored it yet, but there’s a chance you could more tightly integrate your caching layer with your persistence layer by having it all in Sql Server. E.g. you could serve a dashboard from an in-memory cache that’s rebuilt every 30 seconds from multiple underlying persistent tables. This is an actual use-case I’ve had to implement many times in the past. One caveat though: natively compiled stored procedures are vital to realizing the performance of an in-memory table, but they can’t query regular disk-based tables. Stored procedures that aren’t natively compiled can still access in-memory tables (a rough sketch follows). This means the natively compiled procedures that can quickly access in-memory tables can’t jump over to the tables on disk if the data isn’t cached. That’s a shame, because that would be a great feature for an L2 cache.
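Here’s the kind of interpreted (non-native) refresh proc I mean; it can legally touch both worlds. (A rough sketch with made-up tables; it also assumes your database’s snapshot-isolation settings allow cross-container access.)

$refreshSql = @"
CREATE PROCEDURE dbo.RefreshDashboardCache AS
BEGIN
    DELETE FROM dbo.DashboardCache;              -- in-memory, SCHEMA_ONLY
    INSERT INTO dbo.DashboardCache (Region, Total)
    SELECT Region, SUM(Amount)                   -- regular on-disk table
    FROM dbo.Orders
    GROUP BY Region;
END
"@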

Let’s summarize in a table:

Feature | memcached | Sql Server in-memory tables
Microseconds per write (heavily rounded) | ~100-500 | ~150-650
Microseconds per read (heavily rounded) | ~100-500 | ~150-650
Max value size (uncompressed) | 1 MB | ~8 kB
Security | None. | Lots.
Clients available | Many, for every language. | None, although you’re just calling stored procs, so kinda. (I’m probably going to write a custom NHibernate L2 cache adapter for it soon.)
Customizable | None, unless you feel like forking the project and working with C++ code. | Yes! And it’s as easy as creating tables and writing SQL.
Support for SQL | None. | Of course!

So should you consider this as a viable option? I certainly will, although I have a lot more testing to do. I really like the security and the flexibility: if I want to change something, I just change my table definition and update (or create new) stored procedures. The value-size limit is a significant concern for me, so I may run some tests with compression, although that could mean a considerable CPU hit on the client, and with such small payloads the compression header plus body may end up being larger than the original payload. Again, maybe Sql Server 2016 will improve the size limit. I will post updates as I explore this option more.
-Steve