Category Archives: Code

A Condemnation of Invoke-SqlCmd

Recently Mike Fal penned a post defending the Invoke-SqlCmd cmdlet. This is a response to that post. (Update: Apparently Mike Fal’s post was in response to a post by Drew Furgiuele that was also critical of Invoke-SqlCmd.) I want to give Mike proper credit: he states multiple times that Invoke-SqlCmd is not the right tool for every SQL-related job. Kudos for that. However, in this post I’m going to make the argument that Invoke-SqlCmd is the wrong tool for any job, period.

I believe that not only is Invoke-SqlCmd deficient, but that it is also potentially dangerous to use in production. Let me immediately clarify: I’m not saying the cmdlet is buggy. I’m simply saying that the way it’s intended to be used encourages bad coding practices that, at the very least, can produce unintended results and, at worst, can introduce serious security issues.

The root cause of all the issues with the Invoke-SqlCmd cmdlet is that it’s the classic sqlcmd.exe utility with some PowerShell hammered onto it. As a result, most of the patterns and conventions used for cmdlets are thrown out the window in order to maintain backwards compatibility with the base executable. Here’s a great example:

PS c:\> Invoke-SqlCmd -ServerInstance "MyServer\MyInstance" -Database "db1" -Username "ctigeek" -Password "whyAmIPassingInAPassword???" -Query "select * from sometable1"

Now I challenge you to find one other cmdlet, published by Microsoft, that accepts a clear-text password on the command line. Any other cmdlet that needs this information takes in a Credential object. Another example is how it uses the AbortOnError switch instead of the standard ErrorAction parameter. All of these proprietary parameters are intended to maintain parity with the underlying executable. The MSDN documentation even itemizes which options from sqlcmd.exe are included and how they map to the PowerShell cmdlet. I find all this completely intolerable. The standards by which PowerShell commands are built should not be bent (or in this case, flat-out broken) in order to conform to the underlying executable. If DBAs want to use sqlcmd.exe, they can use it directly, without this layer of obfuscation that doesn’t appear to add any real value.

Sqlcmd.exe was originally designed for DBAs to execute DDL and other scripts related to the administration of the database. Yes, it can also return data, but it was built to generate the kind of “flat”, fixed-width-font reports you see printed out in any 80s TV show that had a computer in it. This is how most DB reports were generated up until relatively recently. My point is this: sqlcmd was never intended to query data in the kind of environment in which it is now used with PowerShell. In other words, it is completely lacking the features necessary to be a useful and safe tool in a modern production application.

  1. It doesn’t support ANY standard cmdlet parameters. I would expect any cmdlet published by Microsoft, especially one that interacts with a database, to have ErrorAction, Verbose, WhatIf, and Confirm. If it can consume user IDs and/or passwords, I expect it to accept a Credential object.
  2. It doesn’t support non-queries or scalar queries. If I execute an update or delete using Invoke-SqlCmd, I will have no idea how many rows I just impacted. If confirmation is a requirement, I then have to execute a second, possibly expensive, table-scanning query to confirm.
  3. It doesn’t support transactions. I can’t overemphasize how big a deal this is. It’s one of those things people don’t think about because it’s not even available. The only way to do transactions with Invoke-SqlCmd is to have the entire transaction inside the SQL command being sent. This usually isn’t practical, sometimes necessitating that the script dynamically generate the SQL with all the correct commits, rollbacks, and the checks around them. That’s insane. And it completely prevents the PowerShell script from doing other things within the scope of the transaction. Did you know that PowerShell supports distributed transactions? Yup. You can start a transaction, make changes on multiple database servers, change a file and a registry key, and put a message into an MSMQ queue, all as part of one transaction. But Invoke-SqlCmd doesn’t support that.
  4. Support for SQL parameters is awful. This is the dangerous part I was alluding to earlier. The support for defining and passing in parameters is awful. How bad? The variable and the value are defined in a single string, e.g. "var1='val1'", and you have to escape single quotes in the value. This means that if the values are already in proper PowerShell variables, you have to escape the value, build strings, put that into an array, pass that into …. never mind. It’s an unusable mess. And that’s the point. Anyone with five minutes’ experience with PowerShell will just create the SQL using string concatenation based on the function’s input parameters. And of course that’s begging for a whole litany of security issues. If proper SQL parameters were supported (see the ADO.NET sketch after this list), people would use them.
  5. It only supports SQL Server. Do you really only EVER work with SQL Server? Ever? If so, good for you, but in 20 years that’s never been the case for me. The great thing about SQL is that it’s (mostly) universal. I want a PowerShell command that works the same way, with the same options, across any relational database.
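
For contrast with points 2 and 4: here’s roughly what proper parameter support and row-count feedback look like in plain ADO.NET, which PowerShell can call directly (and which the module I mention below is built on). This is a minimal sketch; the server, instance, database, and table names are the made-up ones from the example above, and the column names are invented for illustration:

    using System;
    using System.Data.SqlClient;

    class ParameterExample
    {
        static void Main()
        {
            var connectionString = "Server=MyServer\\MyInstance;Database=db1;Integrated Security=true;";
            using (var connection = new SqlConnection(connectionString))
            using (var command = new SqlCommand(
                "update sometable1 set col1 = @newValue where col2 = @filter", connection))
            {
                // The values are never concatenated into the SQL text,
                // so quoting and escaping are a non-issue.
                command.Parameters.AddWithValue("@newValue", "O'Brien");
                command.Parameters.AddWithValue("@filter", 42);

                connection.Open();
                // ExecuteNonQuery tells you exactly how many rows you touched.
                int rowsAffected = command.ExecuteNonQuery();
                Console.WriteLine(rowsAffected + " rows updated.");
            }
        }
    }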

So what’s the harm if Invoke-SqlCmd supports your current needs? The problem is that your needs tomorrow will be different. Your code is going to change, and sooner or later you’re going to hit the much-closer-than-you-think wall of usefulness. So what’s the alternative? Lots of .NET coding?

Two years ago I realized all this and started the OSS project InvokeQueryPowershellModule. It’s written in C#, based on ADO.NET, and supports all of the functionality above and more. There are lots of examples in the README. It’s available from the PowerShell Gallery now. I’ve been using it in a production environment for over a year with no issues. I strongly encourage you to consider it as a superior alternative.

Using Tasks with IObservable

This is a follow-up to my post on using Async/Await with IEnumerable and Yield-Return.

Jump to the code on Github.

In my previous post we looked at using tasks with IEnumerable. But what if you are returning tasks from an IObservable instead of an IEnumerable? How would you run them, and how would you limit the concurrency? Better yet, can we take in an IObservable<Task<T>>, run the tasks automatically, and then produce an IObservable<T>?

The short answer is yes, but it’s not trivial due to the concurrency requirement.  Let’s solve it without that requirement first. I’ll be using a fictional Account type as my “T” just like in my previous post.

   var repo = new AccountRepository();
   var tasks = repo.GetAccountWithSubs("123123123"); //this returns IEnumerable<Task<Account>>
   var tasksSource = tasks.ToObservable();           //this converts it to IObservable<Task<Account>>

   var accountSource = tasksSource.Select(ts => ts.Result); //this returns an IObservable<Account>
   var subscription = accountSource.Subscribe(
        acc => {
                  Console.WriteLine(acc.AccountNumber);
               },
        ex =>  { },
        () => { }
     );

Thanks to the LINQ extensions for IObservable it’s pretty easy to “run” the task and return the result as a new IObservable. Examples like this really show the beautiful parity between IEnumerable and IObservable.
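
To underline that parity: the pull-based version of the same “run the task” step has exactly the same shape. A quick sketch, assuming the same tasks variable from the snippet above:

    // Enumerable.Select does the same job for IEnumerable that
    // Observable.Select did above, and it is just as lazy.
    IEnumerable<Account> accounts = tasks.Select(t => t.Result);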

Unfortunately, this example runs all the tasks synchronously. What if we want to run multiple tasks in parallel? To accomplish this, we need an observer that receives calls from the first IObservable (i.e. the parent), waits for the tasks to complete, and then calls the observer for the second IObservable with the account object. On top of that, we need to propagate any exceptions swallowed by the tasks to the subscriber. We also have to support the user canceling the IObservable before the collection is done, via the Dispose method. In order to implement all this functionality, we’ll have to do it “the long way”, which is to say, we’ll have to actually implement our own IObservable for the job. That is, we’ll be given an IObservable<Task<T>> and we need to create and return an IObservable<T>. To get the ball rolling, let’s re-implement the same functionality from the code above, but with our own made-from-scratch IObservable. (Side note: I tried to do this using the Observable.Create method, but it became too complex for delegates to handle.)

    internal class TObservable<T> : IObservable<T>, IDisposable
    {
        private readonly IObservable<Task<T>> taskObservable;
        private IDisposable taskObservableSubscriber;
        private IObserver<T> observer;

        public TObservable(IObservable<Task<T>> taskObservable)
        {
            this.taskObservable = taskObservable;
        }

        public IDisposable Subscribe(IObserver<T> observer)
        {
            this.observer = observer;
            this.taskObservableSubscriber =
                             taskObservable.Subscribe(OnNext, OnError, OnCompleted);
            return this;
        }

        private void OnNext(Task<T> task)
        {
            this.observer.OnNext(task.Result);
        }

        private void OnError(Exception exception)
        {
            this.observer.OnError(exception);
        }

        private void OnCompleted()
        {
            this.observer.OnCompleted();
        }

        public void Dispose()
        {
            if (this.taskObservableSubscriber != null)
            {
                this.taskObservableSubscriber.Dispose();
                this.taskObservableSubscriber = null;
            }
        }
    }

Most of what’s happening here is just listening for events from the “parent” IObservable, and then echoing those events back out to the registered observer. The only exception is the OnNext method, which receives a task, synchronously waits for it to complete (by referencing the Result property), and sends the value to the observer. Again, this mirrors the functionality we achieved using the Select statement in the first code snippet, but now we have a framework we can expand with additional functionality.
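
To see that it behaves the same as the Select version, here’s a usage sketch mirroring the first snippet (same fictional repository and Account type as before):

    // Wrap the task stream in our TObservable instead of using Select.
    var tasksSource = repo.GetAccountWithSubs("123123123").ToObservable();
    var accountSource = new TObservable<Account>(tasksSource);
    var subscription = accountSource.Subscribe(
         acc => Console.WriteLine(acc.AccountNumber),
         ex => Console.WriteLine(ex),
         () => Console.WriteLine("completed"));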

Before we move on, I want to point out that our class also implements IDisposable. This saves us from having to create a new object that implements IDisposable just to be returned to the subscriber. Another option is to use the Disposable.Create method in the Subscribe method:

return Disposable.Create(() =>
{
   if (this.taskObservableSubscriber != null)
   {
      this.taskObservableSubscriber.Dispose();
      this.taskObservableSubscriber = null;
   }
});

This seems like a nice terse way to code it, and the code that’s executed is identical, but having our own dispose method to call will turn out to be more convenient as we add additional functionality.

The first change is to support raising any exception contained in the task. As a reminder, a task will swallow its exceptions until it is awaited. Since we aren’t using async/await in our code, we need to check each task to see if it’s faulted. If there’s an exception, we should call the subscriber’s OnError and end processing of the IObservable. Here are the changes to OnNext and OnError to support this:

        private void OnNext(Task<T> task)
        {
            if (task.IsFaulted)
            {
                this.OnError(task.Exception);
            }
            else
            {
                this.observer.OnNext(task.Result);
            }
        }

        private void OnError(Exception exception)
        {
            this.Dispose();
            this.observer.OnError(exception);
        }

We check the task for an error condition and then raise the error to the subscriber. Before raising the error, however, we call our own Dispose method, which disposes the subscription to the “parent” IObservable and prevents OnNext from being called again with additional tasks. This is important because once OnError (or OnCompleted) is called, the IObservable is not allowed to make any further calls to the observer. Also, in response to the error the observer may take some kind of drastic measure (e.g. kill the process), so we want to give the parent IObservable an opportunity to clean up (e.g. close files, write logs, etc.).

Next we need to support the scenario of a task being cancelled. This requires some discretion on the part of the coder: how do we raise that as an IObservable event? You may want to change this to suit your needs. I’ll be raising it as an exception, but you may want to call OnCompleted instead.

        private void OnNext(Task<T> task)
        {
            if (task.IsFaulted)
            {
                this.OnError(task.Exception);
            }
            else if (task.IsCanceled)
            {
                // you may want to call OnCompleted instead depending on if the cancellation is expected or not.
                this.OnError(new OperationCanceledException("The task was cancelled."));
            }
            else
            {
                this.observer.OnNext(task.Result);
            }
        }

Now we can move on to the core functionality: running multiple tasks in parallel. This gets a little trickier because we can’t just parrot events like we’ve been doing. We have to track the tasks that are in progress, and then call the subscriber’s OnNext with each task’s result as it completes. Most of this code is directly converted from my extension methods that do the same thing with IEnumerable. First, let’s add some fields for tracking tasks and initialize them in the constructor.

        ....
        private readonly int maxConcurrency;
        private Task<T>[] currentTasks;
        private int taskCount = 0;
        private int nextIndex = 0;

        public TObservable(IObservable<Task<T>> taskObservable, int maxConcurrency)
        {
            this.taskObservable = taskObservable;
            this.maxConcurrency = maxConcurrency;
            this.currentTasks = new Task<T>[maxConcurrency];
        }

Next, let’s take the code that calls the subscriber and put it in its own function. We’ll need to call it from multiple places soon.

        private void CallSubscriber(Task<T> task)
        {
            if (task.IsFaulted)
            {
                this.OnError(task.Exception);
            }
            else if (task.IsCanceled)
            {
                // you may want to call OnCompleted instead based on how you're using the cancellation token.
                this.OnError(new OperationCanceledException("The task was cancelled."));
            }
            else
            {
                this.observer.OnNext(task.Result);
            }
        }

Now we can use the OnNext method to track our tasks in an array and wait for them to finish:

        private void OnNext(Task<T> task)
        {
            currentTasks[nextIndex] = task;
            taskCount++;
            if (taskCount == maxConcurrency)
            {
                nextIndex = Task.WaitAny(currentTasks);
                CallSubscriber(currentTasks[nextIndex]);
                currentTasks[nextIndex] = null;
                taskCount--;
            }
            else
            {
                nextIndex++;
            }
        }

The task is added to the array. If the array is full, we wait for one of the tasks to finish (via WaitAny). Once a task is done, we signal the subscriber (via CallSubscriber) and then remove the task from the array. The next time OnNext is called, that empty spot in the array will be taken by the new task. This continues until OnError or OnCompleted is called. Let’s run down the happy path and look at OnCompleted first.

        private void OnCompleted()
        {
            while (taskCount > 0)
            {
                currentTasks = currentTasks.Where(t => t != null).ToArray();
                nextIndex = Task.WaitAny(currentTasks);
                CallSubscriber(currentTasks[nextIndex]);
                currentTasks[nextIndex] = null;
                taskCount--;
            }
            this.Dispose();
            this.observer.OnCompleted();
        }

The WaitAny method doesn’t like null values, so we have to rebuild our array to remove them. Then we wait for a task to complete, notify the subscriber, and remove it from the array, same as above. Once we are done running the tasks, we dispose the parent subscription and then call OnCompleted on our subscriber to let them know we’re done.

The OnError case is straightforward. We just dispose the parent subscription and then echo the exception to our subscriber like we did before:

        private void OnError(Exception exception)
        {
            this.Dispose();
            this.observer.OnError(exception);
        }

One problem: what if we have tasks still in progress? Do we orphan them? We could possibly add in a cancellation token and then take care of it in the Dispose method:

        public void Dispose()
        {
            if (taskCount > 0)
            {
                //cancel running tasks?
            }
        }

I’ll leave it up to you how to handle that one.
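
For what it’s worth, here’s a sketch of one option. It assumes the constructor is handed a CancellationTokenSource (a field I’m inventing for illustration) that the code producing the tasks also observes:

    // Assumed field, set in the constructor; the task producer must build
    // its tasks from the same token for this to have any effect.
    private readonly CancellationTokenSource cancellation;

    public void Dispose()
    {
        if (this.taskObservableSubscriber != null)
        {
            this.taskObservableSubscriber.Dispose();
            this.taskObservableSubscriber = null;
        }
        if (taskCount > 0)
        {
            // Signal any in-flight tasks to stop early instead of orphaning them.
            this.cancellation.Cancel();
        }
    }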

 

Here’s the full code of the gist:

Using Async/Await with IEnumerable and Yield-Return

Have you tried doing a yield return inside an async method? It doesn’t work. In fact, it won’t even compile. This isn’t a bug or missing feature; the two are just fundamentally incompatible. When you use yield return, a lot of code is generated by the compiler. The async/await pattern also generates code, but is even more complicated at run time. Regardless of how much code is generated, the compiler can’t combine the two rewrites: a method can be an iterator or it can be async, but not both. This has caused me more than a little grief over the past few months, and I’m hoping to share some thoughts (and some code) that came out of it.
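
To make the incompatibility concrete, here’s the kind of method you might be tempted to write (using the fictional account repository from later in this post). The compiler rejects it outright:

    // This will not compile. A method can be an iterator or async, not both;
    // the compiler reports CS1624 because Task<IEnumerable<Account>> is not
    // an iterator interface type.
    public async Task<IEnumerable<Account>> GetAccountsAsync(IEnumerable<string> accountNumbers)
    {
        foreach (var number in accountNumbers)
        {
            yield return await accountRepository.GetAccount(number);
        }
    }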

Jump to the code on GitHub.

Background

A lot of what I do is “back end” programming, usually running as a Windows service or in response to a queued message. I’ve recently been working on a process that has to sift through tens of thousands of accounts and crunch some numbers. Sometimes information is fetched by an API call, or by hitting one or more databases. All these request-and-wait operations are a prime target for async/await; that’s what it was designed for. In fact, by using async/await I can process multiple accounts in parallel and increase my throughput anywhere from 4 to 10 times. Hurrah! But we are getting ahead of ourselves…

Of course there’s a catch: we have quite a few large enterprise customers with thousands of sub-accounts that must be processed as part of the main account, and many of those sub-accounts have thousands of services assigned to them. When we initially ran a non-optimized version of our code, one of the larger accounts actually generated a stack overflow. That’s the first stack overflow I’ve seen in years that wasn’t caused by an infinite loop. The overflow was caused by loading every service for every sub-account, along with all the services for the main account. This obviously wasn’t going to work for our current customers, and we had a mandate to handle even bigger customers in the future. (I’m obliged to point out that these memory issues happened on a small development server. The code may have run fine on our massive prod boxes, but anytime your software runs out of memory, it’s time to go back and do some refactoring.)

To solve the memory consumption issue, we modified our code to process the services one sub-account at a time. To do this, we now passed in an IEnumerable<Account>. (Previously it was a List<Service>.) The repository class that delivers accounts to our system is actually quite complex (part API and part database), and it would have been very difficult to implement any kind of pagination in it. Thankfully that wasn’t an issue, since we were able to refactor it to use a yield-return, which returns sub-accounts as they are fetched. Each sub-account is processed, and then the next sub-account is fetched. This allows the garbage collector to clean up account objects after they have been processed. This met our scaling requirements, since only one account object needed to be in memory at any time. We could now process our large customers. Great. However, we were still limited by the number of accounts we could process in parallel. We could do some parallelization, but the most important I/O, fetching an account, didn’t use async/await, and it wasn’t clear how to introduce it since we were heavily reliant on yield-return. We had solved our memory issues, but now we had a performance issue, because we were always synchronously waiting on I/O.

So, here’s where we are with our code. Notice we are yield-returning each account as we fetch it. Because of the scope within the foreach, sub-account objects become eligible for garbage collection once they are processed. Nothing too difficult.


public void ProcessAccount(string accountNumber) {
   foreach (var account in GetAccountWithSubAccounts(accountNumber)) {
       // ... process the accounts....
   }
}
public IEnumerable<Account> GetAccountWithSubAccounts(string accountNumber) {
    //this is an async repository, so we have to call .Result.
    var parentAccount = accountRepository.GetAccount(accountNumber).Result;
    yield return parentAccount; //this returns an account object.

    foreach (var childAccountNumber in parentAccount.ChildAccountNumbers) {
        var childAccount = accountRepository.GetAccount(childAccountNumber).Result;
        yield return childAccount;
    }
}

Problem 1: How to combine Async/Await with IEnumerable

The bottom line is you can’t. So we’ll do the next best thing: instead of returning an IEnumerable of Account, we’ll return an IEnumerable of Task<Account>.  Let’s look at the updated code and then we’ll discuss:


public void ProcessAccount(string accountNumber) {
   foreach (var accountTask in GetTasksForAccountWithSubAccounts(accountNumber)) {
       var account = accountTask.Result;
       // ... process the accounts....
   }
}

//notice, no async keyword...
public IEnumerable<Task<Account>> GetTasksForAccountWithSubAccounts(string accountNumber) {
    //get the parent account from the repo....
    var parentAccountTask = accountRepository.GetAccount(accountNumber);
    yield return parentAccountTask;

    var parentAccount = parentAccountTask.Result;
    foreach (var childAccountNumber in parentAccount.ChildAccountNumbers) {
        //notice there is no await. we want to return the task, not the account.
        var childAccountTask = accountRepository.GetAccount(childAccountNumber);
        yield return childAccountTask;
    }
}

The first thing we notice is that GetTasksForAccountWithSubAccounts doesn’t have the async or await keywords. As we’ve already discussed, it uses yield-return, so it can’t be marked async. Instead, we get the task from the repository and use yield-return to send it back to the caller. Something else that’s important to note: because we are returning a true IEnumerable (vs. a List that’s impersonating IEnumerable), each task doesn’t get created until the foreach statement calls MoveNext() on the enumerator. This means the scope of each task is one iteration of the foreach loop. This strategy will scale well and meet our growth requirements.
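
A toy example makes the deferred creation easy to see (hypothetical code, not the account repository):

    // Each "creating" line prints only when the foreach calls MoveNext(),
    // so the tasks come into existence one iteration at a time.
    static IEnumerable<Task<int>> MakeTasks()
    {
        for (int i = 0; i < 3; i++)
        {
            Console.WriteLine("creating task " + i);
            yield return Task.FromResult(i);
        }
    }

    // Output interleaves: "creating task 0", "got 0", "creating task 1", ...
    foreach (var task in MakeTasks())
    {
        Console.WriteLine("got " + task.Result);
    }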

One odd thing you may have noticed: we have to “run” the parentAccountTask (via the Result property) to get the child account numbers. Because we’ve already performed a yield-return on that task, it will have already been processed by the caller. This means referencing the Result property won’t block.

We are headed in the right direction, but this doesn’t solve our core problem: we aren’t running anything in parallel.

Problem 2: How to run X number of tasks from an IEnumerable<Task> without reifying the enumerable.

We need to be able to start multiple tasks at the same time and wait for them to finish.  Let’s see if one of the static helper methods in the Task class can help us:

public static Task WhenAll( IEnumerable<Task> tasks )

Hey cool! It takes IEnumerable. Maybe we can use this method and rely on the task scheduler to throttle our requests. Let’s take a look at what WhenAll is doing under the hood:

//This code block Copyright (c) Microsoft Corporation.  All rights reserved.
public static Task<TResult[]> WhenAll<TResult>(IEnumerable<Task<TResult>> tasks)
{
    .....
    List<Task<TResult>> taskList = new List<Task<TResult>>();
    foreach (Task<TResult> task in tasks) {
        ....
        taskList.Add(task);
    }
    // Delegate the rest to InternalWhenAll<TResult>().
    return InternalWhenAll<TResult>(taskList.ToArray());
}

What! It reifies the enumerable to a list, then converts it to an array. (An array! What is this, .NET 2.0?)  That does us no good. It will create all the tasks at once and then run them. The WhenAny method performs the same operations. The WaitAll and WaitAny methods don’t even take IEnumerable, only an array.

The Tasks.Parallel class has a lot of static methods that look promising, but you have to wrap each task in an Action, which, when you look under the hood, is wrapped in a new task. Ugh. I already have tasks. Why not just run the tasks I already have? Also, Tasks.Parallel was not made with async operations in mind; it was designed for CPU-bound parallel processing.

Okay, enough of this. Time to crank out some code.

So the extension methods I wrote are a little dense.  Instead of dissecting those I’m going to build it out like I did initially (in class form) for easier discussion, then we’ll look at how it was refactored into extension methods.


    public class ParallelTaskRunner<T>
    {
        private int taskCount;
        private Task<T>[] currentTasks;
        private int maxConcurrency;

        public ParallelTaskRunner(int maxConcurrency)
        {
            this.maxConcurrency = maxConcurrency;
            currentTasks = new Task<T>[maxConcurrency];
        }

        public T AddTask(Task<T> task)
        {
            var returnValue = default(T);
            AddTaskToArray(task);
            taskCount++;
            if (taskCount == maxConcurrency)
            {
                var taskindex = Task.WaitAny(currentTasks);
                returnValue = currentTasks[taskindex].Result;
                currentTasks[taskindex] = null;
                taskCount--;
            }
            return returnValue;
        }

        public IEnumerable<T> WaitRemainingTasks()
        {
            var runningTasks = currentTasks.Where(t => t != null).ToArray();
            if (taskCount > 0)
            {
                Task.WaitAll(runningTasks);
            }
            return runningTasks.Select(t => t.Result);
        }

        private void AddTaskToArray(Task<T> task)
        {
            for (int i = 0; i < currentTasks.Length; i++)
            {
                if (currentTasks[i] == null)
                {
                    currentTasks[i] = task;
                    break;
                }
            }
        }
    }

And here’s how we use it…

   var accountTasks = GetTasksForAccountWithSubAccounts(someAccountNumber);
   var taskRunner = new ParallelTaskRunner<Account>(5);
   foreach (var task in accountTasks) {
      var account = taskRunner.AddTask(task);
      if (account != null) {
         // do stuff with account....
      }
   }
   foreach (var account in taskRunner.WaitRemainingTasks()) {
      // do stuff with account....
   }

The constructor just initializes a few things. The AddTask method is where all the magic happens. It adds a task to the array, and if the array is full it calls Task.WaitAny, which blocks until one of the tasks completes. The value from that task is then returned, at which point the caller can process it and add the next task by calling AddTask again. This continues until all the tasks have been added. Then the remaining tasks are processed and returned by calling WaitRemainingTasks.

The extension methods are a little more efficient, especially when adding new tasks to the array.  There are two almost-identical methods: one for Task and another for Task<T>.  I tried to combine them but had issues with the type of the Action parameter.  If anyone has suggestions for combining those let me know.

One of the big differences between my original code above and the extension methods (shown in full below) is the return type (or, for the extension methods, the delegate type): the class above returns T, while the extension methods hand your delegate the Task or Task<T> itself. This is important because, as written, exceptions stay swallowed inside the task. You should always check the status of the task to see if it’s faulted.

Let’s see the same example using the generic extension method:

   //this will return IEnumerable<Task<Account>>
   var accountTasks = GetTasksForAccountWithSubAccounts(someAccountNumber);
   accountTasks.RunTasks(5, task =>
      {
         if (task.Status == TaskStatus.Faulted)
         {
            Console.WriteLine(task.Exception);
            // ....deal with exception....
         }
         else
         {
            var account = task.Result;
            //.... do stuff....
         }
      });

Possibilities for changes & improvements:

  • I’ll probably create a variation that uses WhenAny instead of WaitAny. WhenAny returns the completed Task<T> instead of an index, and the action parameter would likely become a Func so the callback itself could be awaited. This would allow the entire extension method to be async, which could certainly be desirable. (A rough sketch follows this list.)
  • The delegate is currently an Action.  It could be changed to a Func to allow a bool to be returned to determine if the entire process should be cancelled. The other option is to have an overload that includes a cancellation token, which would be consistent with the async/await pattern.
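
Here’s a rough sketch of what that WhenAny-based variation might look like. This is my guess at the shape, not the published extension method; RunTasksAsync is a hypothetical name, and I’ve kept the callback as a plain Action for brevity:

    using System;
    using System.Collections.Generic;
    using System.Threading.Tasks;

    public static class TaskRunnerExtensions
    {
        // Hypothetical WhenAny-based variation: the whole method is awaitable,
        // and the callback receives the completed Task<T> (check IsFaulted!).
        public static async Task RunTasksAsync<T>(
            this IEnumerable<Task<T>> tasks, int maxConcurrency, Action<Task<T>> action)
        {
            var running = new List<Task<T>>(maxConcurrency);
            foreach (var task in tasks)
            {
                running.Add(task);
                if (running.Count == maxConcurrency)
                {
                    // WhenAny hands back the completed task itself, not an index.
                    var completed = await Task.WhenAny(running);
                    running.Remove(completed);
                    action(completed);
                }
            }
            while (running.Count > 0)
            {
                var completed = await Task.WhenAny(running);
                running.Remove(completed);
                action(completed);
            }
        }
    }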


Also check out Concurrency in C# Cookbook, by Stephen Cleary. I just finished the pre-release edition and it was extremely helpful.

Here’s the full source of the gist:

Unit test verification using StringBuilderStream

UPDATE!
Thanks to reddit user Porges, I now know that you *can* access the underlying buffer of a MemoryStream after it’s disposed. Use the ToArray or GetBuffer methods.
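
For example (a quick sketch, assuming UTF-8 text):

    using System;
    using System.IO;
    using System.Text;

    class DisposedStreamDemo
    {
        static void Main()
        {
            var stream = new MemoryStream();
            var bytes = Encoding.UTF8.GetBytes("hello");
            stream.Write(bytes, 0, bytes.Length);
            stream.Dispose();

            // ToArray is documented to work even after the stream is closed,
            // so the contents can still be verified.
            Console.WriteLine(Encoding.UTF8.GetString(stream.ToArray())); // hello
        }
    }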

I recently had to write a unit test for a method that called the following private function:


private async Task WriteStreamToFileAsync(Stream stream, string filePath)
{
  // FileSystem.OpenFile is a thin wrapper around System.IO.File.Open...
  using (var fileStream = FileSystem.OpenFile(filePath, FileMode.CreateNew))
  {
    await stream.CopyToAsync(fileStream);
    await fileStream.FlushAsync();
  }
}

The FileSystem object is a very thin wrapper around the corresponding System.IO.* methods. This allows a Mock to be injected for the unit tests. During the Setup() for the unit test, I new’d up a class-level MemoryStream object and had it returned by the Mock call to OpenFile when the FileMode passed in is set to CreateNew. The goal here is to verify what was written to the MemoryStream after each unit test.

     private MemoryStream writableStream;
     private Mock<FileSystem> fileSystem;
     [SetUp]
     public void Setup() {
         writableStream = new MemoryStream();
         fileSystem = new Mock<FileSystem>();
         fileSystem.Setup(f => f.OpenFile(It.IsAny<string>(), FileMode.CreateNew))
                .Returns(writableStream);
         //... other setups omitted.
     }

Then I wrote some unit tests that called the method being tested (which subsequently called the WriteStreamToFileAsync function that writes to the stream), followed by the verification. The verification consisted of seeking back to the start of the stream, and then reading the stream to verify its contents.

However, when I actually ran the test, it threw an exception stating that the stream was closed. This makes sense after inspecting the method that uses the stream: it’s wrapped in a using statement, so of course the stream would be closed, since it’s being disposed. Quite inconveniently, Microsoft decided to stay true to the philosophy of the Dispose method, and a disposed stream cannot be reopened. I haven’t looked under the hood, but I’m guessing any memory allocated for storage is released upon disposal. Bummer.

We could modify the WriteStreamToFileAsync function, but changing production code to conform to unit tests is bad for your production code (in this case, a potential memory leak) and creates brittle tests. Another option is to feed the unit test a stream that writes to a file on disk, then the verification could read the file to assert its contents. That will work, but it would really slow down the unit tests and introduce an unnecessary external dependency. What’s really needed is a stream that stores data in memory, but still exposes its underlying data storage mechanism after being disposed.

With those requirements in mind, I wrote StringBuilderStream. As the name implies, it uses a StringBuilder object as the underlying storage mechanism. The data I was trying to verify was text, so using a StringBuilder for storage has the added benefit of giving me a string that I could immediately verify without pulling it out of a stream. Also, a StringBuilder will automatically grow as needed, just like a MemoryStream object. (The full gist for StringBuilderStream is at the bottom of this post.)

Now using the StringBuilderStream, here’s my working unit test:

     private const string testString = "ipsum lorem blah blah blah...";
     private ObjectToBeTested objectToBeTested;
     private StringBuilderStream writableStream;
     private Mock<FileSystem> fileSystem;
     [SetUp]
     public void Setup() {
         writableStream = new StringBuilderStream();
         fileSystem = new Mock<FileSystem>();
         fileSystem.Setup(f => f.OpenFile(It.IsAny<string>(), FileMode.CreateNew))
                .Returns(writableStream);
         //... other setups omitted.
         objectToBeTested = new ObjectToBeTested(fileSystem.Object);
     }
     [Test]
     public void WriteToFile() {
        objectToBeTested.WriteToFile(testString);
        // no need to seek back to the beginning; just use the overridden ToString method....
        Assert.That(writableStream.ToString(), Is.EqualTo(testString));
     }

A few notes:

  • You should still dispose of a StringBuilderStream object; I didn’t include the teardown for brevity.
  • The StringBuilderStream does support seeking and reading via the standard Stream methods.
  • If you seek to a spot other than the end and start writing, everything from that point to the end of the StringBuilder is erased. I know that’s not typical Stream behavior but it was a shortcut to save time. I may change that later but I don’t see a lot of use cases for seeking inside a StringBuilderStream.
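
The full implementation is in the gist below. As a condensed sketch of the core idea (write-side only; the real class also supports the reading and seeking mentioned above):

    using System;
    using System.IO;
    using System.Text;

    // Condensed sketch: a Stream backed by a StringBuilder, whose contents
    // remain readable via ToString() even after the stream is disposed.
    public class StringBuilderStream : Stream
    {
        private readonly StringBuilder builder = new StringBuilder();

        public override bool CanRead { get { return false; } }  // the real class supports reading
        public override bool CanSeek { get { return false; } }  // ...and seeking
        public override bool CanWrite { get { return true; } }
        public override long Length { get { return builder.Length; } }
        public override long Position { get; set; }

        public override void Write(byte[] buffer, int offset, int count)
        {
            // Assumes UTF-8 text; a chunk boundary could split a multi-byte
            // character, which a full implementation would handle with a Decoder.
            builder.Append(Encoding.UTF8.GetString(buffer, offset, count));
        }

        // The StringBuilder survives disposal, so this works after Dispose().
        public override string ToString()
        {
            return builder.ToString();
        }

        public override void Flush() { }
        public override int Read(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
        public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
        public override void SetLength(long value) { throw new NotSupportedException(); }
    }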

Here’s the gist for StringBuilderStream: