Andrew's Adventures in Technology

Streaming UUEncoder in .NET.

2015-04-24T17:09:00.001-07:00

Flashback

The last time I used Unix-to-Unix format (AKA UUEncoding) was when USENET was still the big thing and Mosaic web browser was just coming out. That was until recently, when I had a requirement to encode and decode this file type.

Searching for an Implementation

Since Base64 has largely replaced this older format, it was hard to find a current implementation for the .NET platform. I did run across a port of KDE's kcodecs. Although the port wasn't a streaming solution in the context of implementing the Stream class. Also, it allocated a lot of one item byte arrays using the ReadByte call for each character.

Creating an Implementation

Originally I tried to create my own solution by implementing the .NET Encoder class but the interface didn't fit the requirements of UUEncoding. For example, the GetBytes call works on a per character basis whereas UUEncoding takes 3 characters at a time. Also, a header and footer needs to be written, and the encoded payload is segmented into lines prefixed by encoded line lengths.

I ended up creating my own encoder class that was scoped to only handle data line by line.

public static class UUEncoder
{
  // Assumes the current position is at the start of a new line.
  public static byte[] DecodeLine(Stream buffer)
  {
    // ...
  }  
  public static byte[] EncodeLine(byte[] buffer)
  {
    // ...
  }
}

I then created encode and decode Stream classes that depended on the encoder. Having the encoding and decoding happening in a Stream based way was critical for my requirements since I was lazily evaluating the data and wouldn't just read it all up front. This was important since some of the files tended to be Gigabytes in size and an in-memory solution would have created an unacceptable memory footprint. Along with the nastiness that potentially comes with it like thrashing.

Using the Code

You can find my implementation, with tests, on Github here.

To decode any stream:

using (Stream encodedStream = /* Any readable stream. */)
using (Stream decodedStream = /* Any writeable stream. */)
using (var decodeStream = new UUDecodeStream(encodedStream))
{ 
  decodeStream.CopyTo(decodedStream);
  // Decoded contents are now in decodedStream.
}

To encode any stream:

bool unixLineEnding = // True if encoding with Unix line endings, otherwise false.
using (Stream encodedStream = /* Any readable stream. */)
using (Stream decodedStream = /* Any writeable stream. */)
using (var encodeStream = new UUEncodeStream(encodedStream, unixLineEnding))
{
  decodedStream.CopyTo(encodeStream);
  // Encoded contents are now in encodedStream.
}

Note on Licensing

I published the code under version 2 (not 2.1) of the LGPL since I took the bit twiddling and encoder maps from KDE's implementation.

More Resources & Reading

Get Windows Service name from executable in PowerShell.

2015-04-20T22:50:00.002-07:00

I was recently putting some PowerShell scripts together for deployment and maintenance of software to our machine instances. One of the requirements was to be able to discover the service name from a Windows Service executable that uses ServiceInstaller. I needed to be able to extract this value in a generic way in order to query the service and stop it if running. Here is what I was able to put together.

Function Get-WindowsServiceName([string]$exePath)
{
    $assembly = [System.Reflection.Assembly]::LoadFrom($exePath)
    $type = $assembly.GetTypes() | Where { $_.GetCustomAttributes([System.ComponentModel.RunInstallerAttribute], 0).Length -gt 0 } | Select -First 1
    $installer = [System.Configuration.Install.Installer][Activator]::CreateInstance($type)
    $serviceInstaller = [System.ServiceProcess.ServiceInstaller]$installer.Installers[0]
    $serviceInstaller.ServiceName
}

Note that this doesn't support multiple installers in a single executable.

Log per class pattern.

2015-03-29T14:12:00.002-07:00

Rookie Moves

Awhile ago, I had originally created a single logger for each service and shared it statically across the application.

public static class Log
{
  private readonly static Lazy<Log> instance = new Lazy<Log>(() => new Log(), true);
  public static Log Instance
  {
    get { return instance.Value; }
  }
}

It always had felt strange doing this since it violated encapsulation. I was referencing a static instance inside my objects without ensuring the existence of the log instance itself, while also making assumptions about the instance's state. Essentially reaching outside the class every time to the singleton, which is a global state instance in my case.

Best Practices

As I researched more on best practices with logging, logger per class appeared to be the best pattern since it offered the most fine grain control with respect to filtering and configuration.

When using logging in your class, you should separate the concern of how the log gets created from the use of the log. This can be achieved by having a factory accessor.

public class Log
{
  // Set your factory property to the actual log implementation you wish to use.
  public static Func<Type, Log> Factory { get; set; }

  // Instance based properties and methods.
}

You can also use the abstract factory pattern if you have to wrap your logging implementations.

public abstract class LogBase
{
  // Set your factory property to the actual log implementation you wish to use.
  public static Func<Type, LogBase> Factory { get; set; }

  // Abstract properties and methods.
}

Then just call the factory by passing in the class type it is for.

public class SomeObject
{
  private static readonly LogBase log = Log.Factory(typeof(SomeObject));
}

The Difference

It may not seem like much of a change but subtle differences are happening:

Calling class can now communicate it's state to the log's creation.
Log creation is no longer the responsibility, implicitly or otherwise, of the class.

In my case, this was very liberating since we had several log implementations in our large codebase. I no longer needed to worry about which log implementation was being used by messing with the singleton construction, and could leverage filtering when I need to isolate a single component or service that was causing trouble. Something that I couldn't do before.

Creating XML based templates in log4net.

2015-03-28T23:51:00.002-07:00

Motivation

I decided to use log4net for a recent project I had been working on. There is a concept of loggers and appenders. Loggers are composed of appenders and a log threshold. Appenders are consumers of logging information and provide specific implementations (eg. file, email, event log, database). You can configure the loggers and appenders either in the application configuration or at runtime.

Configuring logging in the application configuration provides the most flexibility. It is great being able to change settings on the fly, especially when it is running in a production environment and redeploying the build is out of the question. Although this approach comes at the expense of having a lot of information in your application configuration for the loggers and appenders. No big deal though if you just have to configure once.

Why Templates?

I had quite a bit of projects with a lot of redundant logging configuration information in each one's application configuration file. Much of the information had a standard form that we wanted uniform across our different projects too (eg. log file name conventions, event log setup, email format). Also, if we updated the logging appender configuration for a new standard, we would need to do it to every project's application configuration file. This is where templates came into play.

Writing the Code

To cut down the amount of configuration information needed to start a new project with logging and make the configuration more uniform where needed, we offloaded it into the code and left the rest in the application configuration file like so.

<log4net xsi:noNamespaceSchemaLocation="log4net.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

<logger name="LoggerTemplate">
  <appender name="SmtpAppenderTemplate" type="log4net.Appender.SmtpAppender">
    <to value="peter@initech.com" />
    <from value="system@initech.com" />
    <smtpHost value="mail.initech.com" />
    <username value="peter" />
    <password value="abc123" />
  </appender>
</logger>

<root>
  <level value="INFO" />
</root>

</log4net>

We don't use the logger directly but rather as a template for our root logger. Now we just need to craft a method to consume the template and create the root appenders at runtime.

/// <summary>
/// Get appenders matching the logger template name and use them to populate the root appenders at runtime.
/// </summary>
/// <param name="loggerTemplateName">The logger template name found in the application configuration.</param>
public static void ConfigureRootFromTemplate(string loggerTemplateName)
{
  ILog logTemplate = LogManager.GetLogger(loggerTemplateName);

  if (logTemplate == null)
  {
    throw new ArgumentException(
      String.Format(
        "Logger template {0} not found in log4net configuration. Make sure there is an " +
        "logger in the log4net configuration with the name {0}.",
        loggerTemplateName),
        "loggerTemplateName");
  }

  IAppender[] appenderTemplates = logTemplate.Logger.Repository.GetAppenders();
  var smtpAppenderTemplate = appenderTemplates.FirstOrDefault(a => a is SmtpAppender) as SmtpAppender;

  if (smtpAppenderTemplate == null)
  {
    throw new ArgumentException(
      String.Format(
        "SmtpAppender template not found in log4net configuration. Make sure there is an " +
        "SmtpAppender in the log4net {0} logger.",
        loggerTemplateName),
        "loggerTemplateName");
  }

  // Can repeat the above pattern with other appenders as well.
  // Create appenders using the template information from above.
  
  AddAppendersToRootAndConfigure(
    new AppenderCollection 
 {
      // Put your created appenders here.
    });
}

private static void AddAppendersToRootAndConfigure(AppenderCollection appenders)
{
  // Get the log repository.
  var hierarchy = (Hierarchy)log4net.LogManager.GetRepository();
  // Get the root logger.
  Logger rootLogger = hierarchy.Root;
  foreach (var appender in appenders)
  {
    // Add all the appenders and activate.
    rootLogger.AddAppender(appender);
    ((AppenderSkeleton)appender).ActivateOptions();
  }
  // Flag root logger as configured with new appender information.
  rootLogger.Hierarchy.Configured = true;
}

Then just call the configuration method at the application's startup.

class Program
{
  /// 
  /// The main entry point for the application.
  /// 
  static void Main()
  {
    // Other startup configuration code.

    log4net.Config.XmlConfigurator.Configure(); // Load the application configuration information.
    Log.ConfigureRootFromTemplate("LoggerTemplate");

    // More startup configuration code.
  }

Considerations

I don't recommend using this approach in all cases. It definitely cuts down the amount of application configuration needed but at the cost of information hiding, since it has been moved to code. Also, it may not be obvious to an uninitiated developer what your application configuration is doing, especially since this template approach is not encoded into the structure of log4net's XML. Although, if you have many projects and need to effect changes to logging across them, this may be a good solution for you.

Using type converters for JSON.NET.

2015-03-28T22:10:00.000-07:00

Motivation

I had these flyweights that added a lot of overhead to the serialization process. They weren't really needed in the serialized payload either. In fact, I could recreate the flyweight in memory from just a single property on the object.

public class FlyweightObject
{
   public string Key { get; private set; }
   public string AProperty { get; private set; }
   public string AnotherProperty { get; private set; }
   public string YetAnotherProperty { get; private set; }
   // ... Lots of properties.

   // Overridden GetHashCode and Equals methods to make equality by Key.
}

public class FlyweightObjectFactory
{
   public static FlyweightObjectFactory Instance { get; private set; }

   // Singleton initialization code.

   public FlyweightObject GetObject(string key)
   {
      // Get from dictionary, or create object and add to dictionary.
      // Return object from dictionary.
   }
}

Nothing out of the ordinary here. Although, these flyweights were also used as keys in a dictionary I was serializing. This was a problem too because only scalars can be used when serializing dictionaries with JSON.NET. It does mention that I can use a type converter though.

Serializing

Let focus on serializing this object to a scalar first. The JSON.NET serialization guide mentions that I can override the Object.ToString method. So let's do that.

public class FlyweightObject
{
   public string Key { get; private set; }
   public string AProperty { get; private set; }
   public string AnotherProperty { get; private set; }
   public string YetAnotherProperty { get; private set; }
   // ... Lots of properties.

   // Overridden GetHashCode and Equals methods to make equality by Key.

   public override string ToString()
   {
      return this.Key;
   }
}

I'm done right? Who needs type converters? Well this doesn't help us when I need to deserialize. This is where type converters come into play.

Implementing a Type Converter

First we have to provide the concrete implementation of our type converter class for the flyweight.

internal class FlyweightObjectConverter
   : TypeConverter
{      
   /// <summary>
   /// Returns whether this converter can convert an object of the given type to the type of this converter, using the specified context.
   /// </summary>
   /// <returns>
   /// true if this converter can perform the conversion; otherwise, false.
   /// </returns>
   /// <param name="context">An  that provides a format context. </param>
   /// <param name="sourceType">A <see cref="T:System.Type"/> that represents the type you want to convert from. </param>
   public override bool CanConvertFrom(ITypeDescriptorContext context, Type sourceType)
   {
      return sourceType == typeof(string) || base.CanConvertFrom(context, sourceType);
   }

   /// <summary>
   /// Converts the given object to the type of this converter, using the specified context and culture information.
   /// </summary>
   /// <returns>
   /// An <see cref="T:System.Object"/> that represents the converted value.
   /// </returns>
   /// <param name="context">An <see cref="T:System.ComponentModel.ITypeDescriptorContext"/> that provides a format context. </param>
   /// <param name="culture">The <see cref="T:System.Globalization.CultureInfo"/> to use as the current culture. </param>
   /// <param name="value">The <see cref="T:System.Object"/> to convert. </param><exception cref="T:System.NotSupportedException">The conversion cannot be performed. </exception>
   public override object ConvertFrom(ITypeDescriptorContext context, CultureInfo culture, object value)
   {
      return FlyweightObjectFactory.Instance.GetObject((string)value);
   }
}

Then attribute the convertible class with the implementation.

[TypeConverter(typeof(FlyweightObjectConverter))]
public class FlyweightObject
{
   public string Key { get; private set; }
   public string AProperty { get; private set; }
   public string AnotherProperty { get; private set; }
   public string YetAnotherProperty { get; private set; }
   // ... Lots of properties.

   // Overridden GetHashCode and Equals methods to make equality by Key.
}

Good to go. Now I can deserialize the flyweights. The JSON.NET secret sauce can be found in the static method Newtonsoft.Json.Utilities.ConvertUtils.TryConvertInternal. This method is used by the internal reader class for its own deserialization method and consequently for the final Newtonsoft.Json.JsonSerializer class, which is the lynchpin for all the serialization helper methods in the JSON.NET ecosystem.

Considerations

Instead of overriding the Object.ToString method, I could implement the TypeConverter.CanConvertTo and TypeConverter.ConvertTo methods. It is really up to you. In my case, I already had the Object.ToString method overridden since the key property uniquely identifies the object without having to create an implicitly structured string (ie. comma delimited properties as a single string; "Prop1|Prop2|Prop3").

If you only need a type conversion for JSON, you can also use JSON.NET's own custom type converters. Although this doesn't work in the case of dictionary keys.

Lastly, if you have a complex type as your dictionary key, you may want to consider just serializing it as a collection and then deserializing it using a JSON.NET custom converter.

Using CC.NET & Gallio for priority based smoke testing.

2014-10-17T11:07:00.002-07:00

Pitfalls in Production

Being able to monitor production services for potential errors is critical. Especially if the services are dependant on external resources which may become unavailable or exhibit unexpected behavior. Even if you follow good software development discipline, this is always a source of concern. Think network outages, unannounced third party API changes, hosted services becoming unavailable, etc.

For large software projects, creating a testing strategy that involves unit and integration is helpful when managing complexity of the commit to deployment workflow. Functional/smoke tests are also good to ensure critical functionality works as expected. Although, in an environment where the running of your software is dependent on external resources, you need a system of continuous monitoring that run these smoke tests.

Monitoring Confusion

At Xignite, we use Gallio for our testing framework and CC.NET for our continuous integration. I used these tools for my production smoke tests but soon realized that not all tests were equal. Getting paged at 2am for a test failure that is not mission critical sucks. Even worse, these lower priority failures can mask very high priority ones since they cause the entire test fixture to fail, and require scrutiny on behalf of the dev-ops person to ensure that a high priority failure doesn't fall through the cracks.

Consider the following test fixture and how it lumps all the different tests together.

namespace MyServices.Tests.Smoke
{
   using Gallio.Framework;
   using Gallio.Model;
   using MbUnit.Framework;

   [TestFixture]
   public class OneOfMyServicesTests
   {
      [SetUp]
      public void Setup()
      {
         // Setup an individual test to run.
      }

      [FixtureSetUp]
      public void TestFixtureSetUp()
      {
         // Configure fixture for testing.
      }

      [FixtureTearDown]
      public void TestFixtureTearDown()
      {
         if (TestContext.CurrentContext.Outcome == TestOutcome.Failed)
         {
            // Send signal to monitoring system that test fixture has failed.
         }
         else if (TestContext.CurrentContext.Outcome == TestOutcome.Passed)
         {
            // Send signal to monitoring system that test fixture has succeeded.
         }
         else
         {
            // Handle some other outcome.
         }
      }     

      [Test]
      public void MissionCritical()
      {
         // ...
      }

      [Test]
      public void Important()
      {
         // ...
      }

      [Test]
      public void MinorFunctionality()
      {
         // ...
      }     
   }
}

Any test failure will make the entire context outcome to fail whether it was the mission critical test or the test that affects minor functionality. I tried looking through the Gallio/MbUnit API, event its source, but couldn't find a way to find out which tests failed within a fixture. If anyone knows how to determine this, please let me know.

Prioritized Testing

What I do know though is that you can inherit the TestAttribute class and override its Execute method. I created a required parameter to specify the priority of the test and then used a LoggedTestingContext class to store all the results.

namespace MyServices.Tests
{
   using Gallio.Framework.Pattern;
   using MbUnit.Framework;

   public class LoggedTestAttribute
      : TestAttribute
   {
      public const int MinPriority = 1;
      public const int MaxPriority = 3;

      private readonly int priority;
      public int Priority { get { return this.priority; } }

      public LoggedTestAttribute(int priority)
      {
         if (priority < MinPriority || priority > MaxPriority)
         {
            throw new ArgumentException("Priority must be 1, 2, or 3.", "priority");
         }
         this.priority = priority;
      }

      protected override void Execute(PatternTestInstanceState state)
      {
         try
         {
            base.Execute(state);
            LoggedTestingContext.AddTest(this, state.Test, true);
         }
         catch (Exception)
         {            
            LoggedTestingContext.AddTest(this, state.Test, false);
            throw;
         }
      }
   }
}

namespace MyServices.Tests
{
   using System;
   using System.Collections.Generic;
   using System.Linq;
   using Gallio.Framework;
   using Gallio.Model.Tree;

   public static class LoggedTestingContext
   {
      private class TestFailure
      {
         public string FullName { get; private set; }

         public LoggedTestAttribute TestAttribute { get; private set; }

         public TestFailure(LoggedTestAttribute testAttribute, Test test)
         {
            this.FullName = test.FullName;
            this.TestAttribute = testAttribute;
         }
      }

      private const int PriorityCount = LoggedTestAttribute.MaxPriority - LoggedTestAttribute.MinPriority + 1;
      
      private static readonly Dictionary nameToFailure = new Dictionary();

      internal static void AddTest(LoggedTestAttribute testAttribute, Test test, bool passed)
      {         
         if (passed)
         {
            return;
         }
         var failure = new TestFailure(testAttribute, test);
         if (!nameToFailure.ContainsKey(failure.FullName))
         {
            nameToFailure.Add(failure.FullName, failure);
         }
      }

      private static bool HasFailed(Test fixtureTest, int priority)
      {
         return fixtureTest.Children
            .Any(c =>
               nameToFailure.ContainsKey(c.FullName) &&
               nameToFailure[c.FullName].TestAttribute.Priority == priority);
      }

      public static void LogSmokeTests(Test fixtureTest, string serviceName)
      {     
         foreach (var priority in Enumerable.Range(LoggedTestAttribute.MinPriority, PriorityCount))
         {
            if (HasFailed(fixtureTest, priority))
            {
               // Send signal to monitoring system that test fixture has failed for priority # tests.
            }
            else
            {
               // Send signal to monitoring system that test fixture has succeeded for priority # tests.
            }
         }
      }
   }
}

Finally, putting it together, I replaced the TestAttribute with the new LoggedTestAttribute and then process the results in the test fixture teardown.

namespace MyServices.Tests.Smoke
{
   using Gallio.Framework;
   using Gallio.Model;
   using MbUnit.Framework;

   [TestFixture]
   public class OneOfMyServicesTests
   {
      [SetUp]
      public void Setup()
      {
         // Setup an individual test to run.
      }

      [FixtureSetUp]
      public void TestFixtureSetUp()
      {
         // Configure fixture for testing.
      }

      [FixtureTearDown]
      public void TestFixtureTearDown()
      {
         LoggedTestingContext.LogSmokeTests(TestContext.CurrentContext.Test, "OneOfMyServices");
      }     

      [LoggedTest(1)]
      public void MissionCritical()
      {
         // ...
      }

      [LoggedTest(2)]
      public void Important()
      {
         // ...
      }

      [LoggedTest(3)]
      public void AffectsFunctionalityByDoesntRequireImmediateAttention()
      {
         // ...
      }     
   }
}

State behavioral pattern to the rescue.

2014-03-17T22:27:00.000-07:00

The Creeping Problem

I recently found myself developing a request-response style system, where the lifetime of a request could be interrupted at any moment. For most same process execution, like your average desktop application, this is a concern, but arises more often when dealing with multiple coordinated processes. My case ended up being the latter.

One of the ways to ensure redundancy is to isolate the steps of the request-response workflow into isolated atomic units or states. This way, if it fails, it can always be re-executed without having to perform the work that came before it. It is especially helpful when the total resources required are large and there is a higher probability of failure. We can just divvy up the work into states that act like idempotent functions. Below is a great simplification of the actual project I worked on but I wanted to boil it down to its simplest form, eliminating excessive states that I have collapsed to the CreateResponse state.

In my original implementation, I modeled the requests as queued items (UserRequest) that I would dequeue and start work on.

foreach (var request in requests) // as IEnumerable
{
   switch (request.State)
   {
      case State.ReceiveRequest:
         if (TryReceiveRequest(request)) request.State = State.CreateResponse;
         break;
         
      case State.CreateResponse:
         if (TryCreateResponse(request)) request.State = State.SendResponse;
         break;

      case State.SendResponse:
         if (TrySendResponse(request)) request.State = State.ResponseSent;
         break;

      case State.ResponseSent:
         break;

      case State.Faulted:

      default:
         throw new ArgumentOutOfRangeException("request.State");
   }
   if (request.State != State.ResponseSent && request.State != State.Faulted)
      requests.Enqueue(request);
}

Seems simple enough, but in my case the CreateResponse state ended being fairly computationally intensive and could take anywhere from a few seconds to several minutes. These long delays could be due to the workload of remote processes it was waiting on, temporal failure points like the network or even the system the process was running on. Another added complexity was that these requests were being serviced in parallel, by multiple processes that could be on the same system or not. Lastly, actual production level code never ends up being this simple; you quickly find yourself adding a lot of instrumentation and covering of edge cases.

foreach (var request in requests)
{
   log.LogDebug("request.Id = {0}: Request dequeued in state {1}.", request.Id, request.State);
   switch (request.State)
   {
      case State.ReceiveRequest:
         logger.LogDebug("request.Id = {0}: Trying to receive request.", request.Id);
         if (TryReceiveRequest(request)) request.State = State.CreateResponsePart1;
         break;
         
      case State.CreateResponsePart1:
         logger.LogDebug("request.Id = {0}: Trying to create response for part 1.", request.Id);
         if (TryCreateResponsePart1(request)) request.State = State.CreateResponsePart2;
         break;

      case State.CreateResponsePart2:
         logger.LogDebug("request.Id = {0}: Trying to create response for part 2.", request.Id);
         if (TryCreateResponsePart2(request))
         {
            request.State = State.CreateResponsePart3;
         }
         else
         {
            request.State = State.CreateResponsePart1;
            ExecuteCreateResponsePart2Cleanup();
            logger.LogError("request.Id = {0}: Unexpected failure while evaluation create response part 2.", request.Id);
         }
         break;

      case State.CreateResponsePart3:
         logger.LogDebug("request.Id = {0}: Trying to create response for part 3.", request.Id);
         bool unrecoverable;
         if (TryCreateResponsePart3(request, out unrecoverable))
         {
            request.State = State.SendResponse;
         }
         else
         {
           if (unrecoverable)
           {
               logger.LogError("request.Id = {0}: Failure is unrecoverable, faulting request.", request.Id);
               request.State = State.Faulted;
           }
           else
           {
               request.State = State.CreateResponse2;
           }
         }
         break;

      case State.SendResponse:
         logger.LogDebug("request.Id = {0}: Trying to send response.", request.Id);
         if (TrySendResponse(request)) request.State = State.ResponseSent;
         break;

      case State.ResponseSent:
         break;

      case State.Faulted:
         logger.LogCritical("request.Id = {0}: Request faulted.", request.Id);
         break;

      default:
         throw new ArgumentOutOfRangeException("request.State");
   }
   log.LogDebug("request.Id = {0}: Request transitioned to state {1}.", request.Id, request.State);
   if (request.State != State.ResponseSent && request.State != State.Faulted)
   {
      logger.LogDebug("request.Id = {0}: Re-enqueuing request for further evaluation.", request.Id);
      requests.Enqueue(request);
   }
   else
   {
      logger.LogDebug("request.Id = {0}: Request evaluation is complete, not re-enqueuing.", request.Id);
   }
}

What a mess! This code quickly starts getting bloated. In addition, not every state evaluation will be successful and be considered exceptional. Maybe it is polling another process and can't transition until that process is ready. As the result of each state evaluation changes beyond a simple yes/no (true/false), we end up with a state machine that could have multiple transitions. This makes for ugly code and too much coupling. All the state evaluation logic is in the same class and you have this huge switch statement. We could get around the extensive logging by using dependency injection but what do we inject? There is no consistent call site signature to inject to. The ever growing case statements could be extracted into their own method, but then readability suffers. This sucks.

You may be saying, "Well you obviously could solve it by ..." and I would agree with you. This code ugliness could be solved many different ways, and is intentionally crap for the purpose of this post. The major problems I faced was:

A large state machine object and large code blocks.
Lack of symmetry in state handling.
Multiple method postconditions that couldn't be expressed by the boolean return result alone.
Coupling of state transition logic, business logic and diagnostics.

I knew something was wrong but I wasn't quite sure how to solve it without adding more complexity to the system and allowing readability to suffer. As someone who has had to spend hours reading other people's unreadable code, I didn't want to commit the same sin.

Looking For A Solution

In university, they teach you how to be a good Computer Scientist; you learn complexity analysis, synthetic languages and the theoretical underpinning of computation. Although, none of this really prepares you to be a software engineer. I could concoct my own system, by why do this when I can stand on the shoulders of giants.

I always read or heard references to the Gang of Four book, even listened to talks by the original authors and became familiar with some of the more famous patterns (Factory and Singleton come to mind). Maybe there was a solution in there. I can't be the first one to come across this simple design problem. So there I found it in the State design pattern.

The design is pretty simple. You have a context that is used by the end user, and the states themselves wrapped by the context. The context can have a range of methods that behave differently based on the concrete state type being used at that moment (eg. behavior of a cursor click in a graphics editor). I modified this design, using a single method to abstract workflow and act as a procedural agent for processing multiple state machines.

The Code

The first step was to construct a state object that will be the super type to all of my concrete states.

public abstract class StateBase   
{
   // Let the concrete type decide what the next transition state will be.
   protected abstract StateBase OnExecute();  

   public StateBase Execute()
   {
      // Can add diagnostic information here.
      return this.OnExecute();   
   }   
}

Next I need a context class that can create and run the state machine.

public abstract class StateContextBase
{
   private StateBase state;   

   protected abstract StateBase OnCreate();
   protected abstract StateBase OnExecuted(StateBase nextState);
   protected abstract bool OnIsRunning(StateBase state);

   public StateContextBase(StateBase state)
   {
      this.state = state;
   }

   public StateContextBase Execute()
   {
      // Need to create the state machine from something.
      if (this.state == null)
      {
         // We will get to this later.
         this.state = this.OnCreate();
      }
      // Let the concrete context decide what to do after a state transition.
      this.state = this.OnExecuted(state.Execute());
      return this;
   } 
 
   public bool IsRunning()
   {
      // Have the concrete type tell us when it is in the final state.
      return this.OnIsRunning(this.state);
   }
}

While glossing over the details, what will this look like at the application's entry point.

class Program
{
   static void Main(string[] args)
   {
      // Will need to get it from somewhere but won't worry about this for now.
      var requests = Enumerable.Empty<StateContextBase>();

      // Can be changed to false on an exit call.
      var running = true;
      while (running)
      {    
         requests = requests
            .Where(r => r.IsRunning())
            .Select(r => r.Execute());
      }
   }
}

That is beautiful! All I see is the state machine decider logic and I don't even need to be concerned with what type of state machines are running.

So let's dive into the details. First, there is the creation of the state into memory. We have to get this from somewhere, so let's add another abstraction on top of our StateBase super type. Something that can be persisted in case the process crashes and can be accessed across many systems (eg. database).

In my case, I used the Entity Framework ORM, which is based off of the unit of work and repository design patterns. There is a context (DataContext) that I will get my model object (UserRequest) from to figure out the current state. A unique key (UserRequest.Id : Guid) will be used to identify the persisted object. We won't concern ourselves as to why this is just unique and not an identity key (that could be in another post) but it basically comes down to the object's initial creation at runtime not relying on any persistence store for uniqueness.

public class DataContext
   : System.Data.Entity.DbContext
{
   public DbSet UserRequests { get; set; }

   public DataContext()
      : base("name=DataContext")
   {
   }
}

public abstract class PersistedStateBase<TEntity>
   : StateBase
   where TEntity : class
{
   private Guid id;

   protected abstract StateBase OnExecuteCommit(DataContext context, Guid id, TEntity entity);
   protected abstract TEntity OnExecuteCreate(DataContext context, Guid id);
   protected abstract StateBase OnExecuteRollback(DataContext context, Guid id, TEntity entity);

   public PersistedStateBase(Guid id)
   {
      this.id = id;
   }

   protected override StateBase OnExecute()
   {
      // Also consider exceptions thrown by DataContext.
      StateBase nextState = this;
      using (var context = new DataContext())
      {
         TEntity entity = null;
         try
         {
            entity = this.OnExecuteCreate(context, this.id);
            nextState = this.OnExecuteCommit(context, this.id, entity);
            context.SaveChanges();
         }
         catch (Exception ex)
         {
            // Handle exception.
            nextState = this.OnExecuteRollback(context, this.id, entity);
         }
      }
      return nextState;
   }
}

The model object (UserRequest, our entity type) will hold the state as an enumeration (UserRequest.State) and contain all the data needed for processing through the state machine.

public enum UserRequestState
{
   None = 0,  
   Receive = 1,  
   CreateResponse = 3,
   SendResponse = 4,
   ResponseSent = 5,
   Faulted = -1,  
}

[DataContract]
public class UserRequest
{
   [DataMember]
   public Guid Id { get; private set; }
   [DataMember]
   public UserRequestState State { get; private  set; }

   // Other properties here like the location of the user request and other metadata.

   private UserRequest()  // Required by EF to create the POCO proxy.
   {}

   public UserRequest(Guid id, UserRequestState state)
   {
      this.Id = id;
      this.State = state;
   }
}

Now let's implement our first state using the types we have created.

public class ReceiveState
   : PersistedStateBase<UserRequest>
{
   public ReceiveState(Guid id)
      : base(id)
   {}
  
   protected override StateBase OnExecuteCommit(DataContext context, Guid id, UserRequest entity)
   {      
      var successful = false;
      var faulted = false;
      // Receive user request and decide whether successful, unsuccessful with retry or
      // unrecoverable/faulted. 
      if (successful)
      {
         return new CreateResponseState(id);
      }
      else
      {
         return faulted ? new FaultedState(id) : this;
      }
   }

   protected override UserRequest OnExecuteCreate(DataContext context, Guid id)
   {
      // Get model object
      return context.UserRequests.Find(id);
   }

   protected override StateBase OnExecuteRollback(DataContext context, Guid id, UserRequest entity)
   {
      // Rollback any changes possibly made in the OnExecuteCommit method and attempt recovery,
      // if possible, in this method. For this example, we will just return the current state.
      return this;
   }
}

We need to also make our state context concrete with the type below. This tends to have more wiring since type per state doesn't really translate well in an ORM. This class could be greater simplified with attributes on the state types, designating the enumeration value they map to.

public class UserRequestContext
   : StateContextBase
{
   private static Dictionary<Type, UserRequestState> typeToDbState;
   private static bool databaseRead = false;

   public Guid Id { get; private set; }

   static UserRequestContext()
   {
      databaseRead = false;
      typeToDbState = new Dictionary<Type, UserRequestState>()
      {
         { typeof(ReceiveState), UserRequestState.Receive },
         { typeof(CreateResponseState), UserRequestState.CreateResponse},
         { typeof(SendResponse), UserRequestState.SendResponse},
         { typeof(ResponseSent), UserRequestState.ResponseSent},
         { typeof(FaultedState), UserRequestState.Faulted },
      };
   }

   public UserRequestContext(Guid id)
      : base(null)
   {
      this.Id = id;
   }

   public static IEnumerable<Guid> GetRunningIds()
   {
      if (UserRequestContext.databaseRead)
      {
         var ids = Enumerable.Empty<Guid>(); // Get from message queue.    
         return ids;
      }
      else
      {
         using (var dataContext = new DataContext())
         {
            var ids = dataContext.UserRequests
               .Where(u => 
                  u.State != UserRequestState.ResponseSent &&
                  u.State != UserRequestState.Faulted)
               .Select(u => u.Id)
               .ToArray(); // Force evaluation.

            UserRequestContext.databaseRead = true;

            return ids;
         }
      }
   }

   protected override bool OnIsRunning(StateBase state)
   {
      return !(state is CompleteState);
   }

   protected override StateBase OnCreate()
   {
      using (var dataContext = new DataContext())
      {
         // Maps persisted state enumeration to runtime types.
         var entity = dataContext.UserRequests.Find(this.Id);
         switch (entity.State)
         {
            case UserRequestState.Receive:
               return new ReceiveState(this.Id);

            case UserRequestState.CreateResponse:
               return new CreateResponseState(this.Id);

            case UserRequestState.SendResponse:
               return new SendResponseState(this.Id);

            case UserRequestState.ResponseSent:
               return new ResponseSentState(this.Id);

            case UserRequestState.Faulted:
               return new FaultedState(this.Id);

            default:
               throw new ArgumentOutOfRangeException();
         }
      }
   }

   protected override StateBase OnExecuted(StateBase nextState)
   {
      // Run any other deciding logic in here that is independent 
      // of the states themselves (eg. logging, perf counters).

      return nextState;      
   }
}

Finally, let's come full circle and show what the application entry point will look like once all is said and done.

class Program
{
   static void Main(string[] args)
   {
      var requests = Enumerable.Empty<StateContextBase>();

      var running = true;
      while (running)
      {         
         requests = requests
            .Where(r => r.IsRunning())
            // Append new user requests found.
            .Concat((IEnumerable)UserRequestContext
               .GetRunningIds()
               .Select(i => new UserRequestContext(i)));
            .Select(r => r.Execute());
      }
   }
}

Ahhhh Yeah...

After fleshing it all out, I really got that satisfying feeling you get as a software engineer when you know that you made the right design decisions. I isolated my business logic into their own types (eg. ReceiveRequestState), separated it from the state machine transition logic, added symmetrical handling of persistence logic by layering it on top of the state type (PersistedStateBase) and contained the persistence-runtime bridge (from UserRequest to PersistedStateBase subtypes) into its own type (UserRequestContext). If I want to add more states, I can simply add to the model's state enumeration (UserRequest.State) and update the state context (UserRequestContext). If I want to change the transition logic, all I need to do is go to the concrete state type itself (eg. ReceiveRequestState) and feel confident that my variables are all scoped correctly. No coupling, no excessive mutations and no excessive side effects.

Using The Right Tool

This design pattern isn't for every state machine problem. In simple cases, it can definitely be overkill; you can see a bit of starter code is needed. Although, if you find yourself designing a state machine with multiple outbound transitions and final states, this could be the right modified pattern for you.

Understanding type theory (Part one: Lambda Calculus).

2013-04-14T20:57:00.002-07:00

Series Introduction

I initially was going to put this together as a single post but quickly realized that it would be a bit much to try and put all of the information into one post. To make it more digestable, I decided to break it up into separate parts. This is by no means a comprehensive study of lambda calculus or type theory, but rather a quick introductory series so you can can get started. My goal is that, by the end, you will be able to understand enough to start reading papers on your own and have some cursory knowledge of type systems. Also, I am targeting programmers that want to get a better understanding of the theory behind type systems in compiler design. If you are comfortable programming in functional languages, then you will start to see many parallels quickly. Although, I will give examples using an imperative approach to cover programmers who may not immediately appreciate these parallels.

Mathematical Formalisms

The mathematical underpinnings for describing computability is mostly famously attributed to Kurt Gödel, Alan Turing and Alonzo Church. Church and Turing took different routes to the same ending conclusion (ie. universal Turing machine, -calculus). What is now known as the Church-Turing thesis. In this post we will focus strictly on Church's work, more specifically his notation. By understanding the lambda calculus syntax, we will start to grasp an idealized model of a programming language and what it means later for learning type systems.

Notation & Basic Structure

Let's look at a function that returns the square of its input:

int f(int x) { return x * x; }

In lambda calculus, we can represent it as:

Let's break this down into its parts and start defining those pieces.

Term: Since x is a variable, it can be considered a term and the whole expression itself is also a term (specifically a lambda abstraction).
Bound Variable: Any variable in the term that is a parameter of the function. In this case it is x.
Free Variable: The term above has none since any argument to the term would be applied to all variables in it. If the term were to be instead, then y would be our free variable. We will give another example that demonstrates free variables later.
Abstraction Symbol: The symbol indicates the beginning of the function, followed by the parameter.
Function Body: Everything following the dot is part of the function body (ie. ).

Binding & Applying Terms

Now the switch from our named function (f) to the lambda term was not entirely without loss. The symbol is not the function name but rather indicates that the following symbol (x) is a function parameter, just like the period represents the beginning of the function body. This is where the term anonymous function comes from. We took a named function (f) and anonymized it. Now let's try defining the function without loss of information.

It can sometimes be taken for granted in programming that a function name is just another type of variable. The above example makes that pretty clear. Much like a program function name directly references the code it needs to execute, we bound the function to a variable (f) that directly references the term. Although the lowercase letter name isn't considered good convention in lambda calculus. Terms are normally bound to a capitalized letter in these situations; it would be better form to call it F instead.

That's better. What about function pointers? How would they be expressed in the lambda calculus? Let's define some function pointer (G), our previous function (F) and bind them in the C language:

int (*G)(int);
int F(int x) { return x * x; }

int main(void)
{
   G = &F;
   (*G)(2);
}

This is generally equivalent to:

I could have written our term as but this is implicit in the lambda calculus. Terms are always applied from left to right; first F is applied to G and then 2 is applied to that. Having these semantics eliminates the need for excessive parentheses. Now let's break down the application of the terms in steps.

The first line is like the declares in our C program (line 1 and 2). The second line is akin to when we assign the address of the function F to G (line 6) and dereference G below it (line 7). The third line is when we apply the value 2 to the function (line 7).

Multiple Parameters

You may have noticed that all of our terms only accept a single parameter. This is called the curried form or Schönfinkeled form if you want to get all crazy about it. You may have heard this mentioned before when people talk about closures. Since we can't demonstrate currying in C (nested functions aren't supported), let's do it in Python:

# uncurried form
def f(x, y):
    return x

# curried form using named functions
def g(x):
    def h(y):
        return x
    return h

# curried form using lambda functions
h = lambda x: lambda y: x

f(2,1)
g(2)(1)
h(2)(1)

I took a lot of liberties comparing the lambda terms earlier against the C program and is why I wrote that they were generally equivalent. You may have noticed how line 12 looks almost exactly like the actual lambda calculus notation form of . This is why programmers who know functional programming have a much easier time learning lambda calculus and type systems. In Python, the type system is dynamic in the sense that a single variable can be rebound to a different type during its lifetime. Also, lambda functions are explicitly available, where they were not in C. These kind of features are what differentiates different languages in terms of expressibility with respect to the lambda calculus.

Beta Reductions

You might not have known it, but by applying different terms, you were performing what is called a beta-reduction. This is the process of normalizing to its beta-redex term form. So in the case of FG, the beta-redex term is . Let's try something with more terms and reduce it.

That normalized nicely. Let's do another one.

This is a case where we could keep applying the terms ad-nauseum but that isn't needed because the repeating term is , which is also our beta-redex term. So there is no need to go any further. Now let's do another example using free variables.

I mentioned at the beginning I would give an example using free variables and here it is (ie. t). So what happened to 5? Well it was bound to w and w is not in the function body of the last lambda term (), so it didn't survive. Although t did and that is because t is not bound to anything, thus making it a free variable.

Beta reductions are a lot like refactoring some complex code down to its simpler and functionally equivalent form.

Untyped Lambda Calculus

So far we haven't really considered types. In the examples where multiplication was required, we went beyond the simple untyped system to demonstrate the similarities of lambda calculus with other more familiar systems of computation. In the untyped system, the only type is the function type. Our previous example could still be considered in the untyped system if we just interpret 5 as a free variable; attaching no meaning to the variable itself.

As developers, our building blocks are binary logical operations. Let's start by defining some of these essentials in the lambda calculus.

Let T be true and F be false:

Now let's define some binary operations:

To convince ourselves that these will give the desired results, let's start with AND:

Then for good measure, let's do OR as well:

Learning the untyped variant of lambda calculus is a good starting place, because we don't place any domain restrictions on our functions and it is fairly simple to get started defining particular behaviors.

Coming up next

Part two will cover type systems and get you familiar with the syntax. If you want to learn lambda calculus in more depth or go through some exercises, check out the links below. Also, please feel free to post comments with questions, suggestions or corrections.

PCAP-NG reader for .NET.

2013-04-10T18:46:00.000-07:00

Introduction & Scope

The PCAP-NG file format is used to store network captured data used by great programs like Wireshark. There are several API's available for both Linux and Windows but none that are ported to .NET. I decided to make one for my own use in building a network traffic replay agent.

This code demonstrates an approach to reading in PCAP-NG files. Most of the time I was concerned with the enhanced packet block type since it stored the TCP payloads I wanted to replay.

Design Considerations

All block types derive from a common abstract class (BlockBase). This is useful when you need a collection of blocks with different subtypes, as in the second code segments below. Internally, all classes populate themselves from the underlying binary reader.

The reader class can be used to iterate over a single block type.

var reader = new PcapngFile.Reader("test.pcapng");   
foreach (var packet in reader.EnhancedPackets)
{
   byte[] payload = packet.Data;
}

... or all of them, cast into their parent type.

var blocks = new List<PcapngFile.BlockBase>();
while (reader.PeekStoreType() != PcapngFile.BlockType.None)
{
   blocks.Add(reader.ReadBlock());
}

The Code and NuGet Package

Get access to the complete project here on my GitHub account. If you have any suggestions or comments, feel free to leave a comment. I also have the assembly available via NuGet here.

Automated combinatorial and spot testing of web service operations.

2013-03-26T21:21:00.003-07:00

Introduction & Scope

When developing a new web service, you may need to do an initial spot test walk through the operations or try to break the service based on the permissible data type values. You could hand craft all the operations, with their respective arguments, into a handful of URLs but that would seem pretty inefficient. I found myself in a similar situation and wanted a programmatic way to walk through the service operations with a specific input data set. To be clear, I am not attempting to recreate a full web service testing application but rather a simple script to spot check responses and catch any bugs in a web service endpoint's handler.

Investigating the WSDL

We can automatically flesh out our operations and the message data types by leveraging the endpoint's WSDL. In my case, I was specifically interested in the request/response operations available under the HTTP GET verb. We can get the parameter names and argument types by looking at the types section and then the actual operations under a WSDL binding that includes an HTTP binding element with the verb attribute value of GET.

Fortunately, I am dealing with input messages using just the built in data type schema provided by the WSDL specification. This means that no complex types are involved and the testing input domain is somewhat simplified. Also, most of the web service operations use a standard set of parameter names such that I am able to create a smaller data set of inputs based around them.

The Code

Your modifications would be to the test_args dictionary. You can specify a wide range of different arguments to feed your operations by parameter names. The service I was testing against used many of the same parameter names and corresponding domain.

Conclusion

Thrown together, the script provides an easy way to spot test a web service by automatically processing the WSDL for a particular binding and running those bound operations against a provided set of input messages for it.

Text generation using Markov chains.

2013-01-04T19:11:00.002-08:00

Introduction

If you are already familiar with finite automata, then understanding a Markov chain will not be as hard. If you are not familiar with one, you can think of finite automata as a flowchart, with states and paths leading from one state to another based on some decision being made. An example could be a room with a light switch; there are the two states of having the light on or off. Using the light switch will transition you between the two states.

The difference between standard finite automata and Markov chains is that instead of having some definite outcome determining each path (eg. yes or no), they are probabilities. Take for example a coin flip, there is a 50% (0.5) chance that it could either be heads or tails.

An Example

Let's explore Markov chains a little deeper with respect to natural languages and take a look at some sample text that will form the corpus of our analysis.

I like turtles. I like rabbits. I don't like snails.

This would render a Markov chain as follows:

The above chain means that, among the sentences:

100% (1.0) will start with I.
This will be followed by like for 66% (0.66) of the time and don't for 33% (0.33) of the time.
The word don't will always (100% / 1.0) be followed by like.
Then lastly, turtles, rabbits and snails will all follow like 33% (0.33) of the time.

The nice thing about Markov chains, with respect to natural language modelling, is that they form word dependencies without any knowledge of the language's syntax or semantics. The chain gets created purely based on statistical knowledge extracted from the corpus.

Text Generation

Random Selection

Using the Markov chain example from before and taking the weight of probabilities into account, we can randomly traverse the chain to generate new sentences like:

I don't like turtles. I like snails.

Although, our small corpus does not make for very interesting or new text. We could make it more interesting by adding to it or switching to a much larger corpus altogether. In my case, I decided to take all of Shakespear's sonnets and generate the text from that. You can see some example code that uses it as the corpus below.

The Code

The Markov chains will be built up for each new sentence it comes across. The critical parts to the MarkovChain object are its:

Cardinalities: The cardinality of the sample space for a given state. In our original example, the cardinality of like is 3 since snails, rabbits and turtles follow out of the like state. The cardinality of I would be 2 since don't and like follow.
Occurrences: The edge valued occurrences between the different states. For example, a transition from the I to like state occurred twice. The probability can then be easily calculated by taking the number of occurrences on a transition and divide it by the cardinality of the state's sample space. Note, there is also a special start state transition that can be used at the beginning of each new sentence generation.

Using these two parts, a random walk can be made across a chain path, starting from the special start state. Each walk will generate a new sentence based on probabilities of the transitions along the path.

Additional Resources

Reading and writing large binary data in T-SQL and ADO.NET.

2012-10-03T13:40:00.000-07:00

Introduction

In some cases, you may want to read/write binary data from/to your relational database. This can be a problem when the data set size exceeds the memory address size (eg. 32-bit processes) or the managed large object heap cannot load it all without throwing an OutOfMemoryException. In those cases, you will need to load the data into memory in chunks. Fortunately, the T-SQL syntax supports this operation.

Latest Standard

The old standard for reading and writing raw text data was by using the UPDATETEXT, READTEXT and TEXTPTR commands. Although, these are being obsoleted for the newer command SUBSTRING and the addition of the .WRITE argument to the UPDATE command. I found the newer syntax much easier to use and requiring less parameters. Below is an example schema and implementation.

Example Schema

For this example, we will use a very simple table setup with a primary key and binary data column.

CREATE TABLE [dbo].[DataTable] (
 [Id] [int] IDENTITY(1,1) NOT NULL,
 [Data] [varbinary](max) NOT NULL
)

Example Code

The Read method simply provides an offset into the data and the size of the chunk it wishes to read. The same applies for the Write method, except that a data argument is also supplied.

private const int ReadChunkSize = 1 << 21;   // 2MB Chunks
private const int WriteChunkSize = 1 << 21;  // 2MB Chunks

...

public static void Read(string connectionString, string readPath)
{
    using (var connection = new SqlConnection(connectionString))
    {
        int offsetIndex = 0;
        var buffer = new byte[1];
        var offset = new SqlParameter("@Offset", SqlDbType.Int, 32);
        var command = new SqlCommand(@"SELECT SUBSTRING([Data], @Offset, " + ReadChunkSize + ") FROM [dbo].[DataTable] WHERE [Id] = 1", connection);

        command.Parameters.Add(offset);

        connection.Open();

        using (var stream = File.Create(readPath))
        {
            using (var writer = new BinaryWriter(stream))
            {
                while (buffer.Length > 0)
                {
                    offset.Value = offsetIndex;
                    buffer = (byte[])command.ExecuteScalar();
                    if (buffer.Length > 0)
                    {
                        writer.Write(buffer);
                        offsetIndex += ReadChunkSize;
                    }
                }
            }
        }
    }
}

public static void Write(string connectionString, string writePath)
{
    using (var connection = new SqlConnection(connectionString))
    {
        int offsetIndex = 0;
        var buffer = new byte[1];
        var offset = new SqlParameter("@Offset", SqlDbType.Int, 32);
        var data = new SqlParameter("@Data", SqlDbType.Binary, WriteChunkSize);
        var command = new SqlCommand(@"UPDATE [dbo].[DataTable] SET [Data] .WRITE(@Data, @Offset, " + WriteChunkSize + ") WHERE [Id] = 1", connection);

        command.Parameters.Add(offset);
        command.Parameters.Add(data);

        connection.Open();

        using (var stream = File.OpenRead(writePath))
        {
            using (var reader = new BinaryReader(stream))
            {
                while (buffer.Length > 0)
                {
                    buffer = reader.ReadBytes(WriteChunkSize);
                    if (buffer.Length > 0)
                    {
                        offset.Value = offsetIndex;
                        data.Value = buffer;
                        command.ExecuteNonQuery();
                        offsetIndex += WriteChunkSize;
                    }
                }
            }
        }
    }
}

Performance Considerations

When benchmarking my code against the simple database schema, execution time was drastically different if the data was already read into memory. When a read was performed after a write, it took 2 seconds. A read without a write beforehand took about 10 seconds. This is also the same time that the write took before a read. The chunk size did not seems to have any significant impact when adjusted between 512K and 4MB sizes. Your results may vary though depending on your system's memory layout.

These time benchmarks are very informal and highly dependent on system architecture, network latency and hardware specifications. Although, the relative performance gain between the two benchmarks can be gleaned from these tests.

A further performance gain would be to front load the data into memory on a separate thread and process the loaded data into and out of the database on another. This would require some tweaking and knowledge of the system's memory constraints to ensure it doesn't become a performance bottleneck. If properly done, a lot can be gained from not having to block on I/O between every chunk read and write.

Understanding compression and Huffman Coding.

2012-09-20T00:29:00.000-07:00

Definitions

In a lot of texts regarding compression you will run across various conventions of syntax to describe the coding of data. Much of them are used in fields like automata theory, group theory and statistics. Although, the field of coding theory itself has a long and rich history in the 20st century; much of the terminology is borrowed from fields that gave rise to it.

I find the best way to demonstrate the following definitions is by example. Let us take the text "abbcc" as a sample.

Alphabet: The set of symbols that we will be encoding.
- Form:
- Example: where
String: A permutation of symbols from our alphabet. From our example alphabet, strings like "a", "aba", "abc" or "cccb" could all be possible. Using regular expression syntax, we can describe this in our example.
- Form:
- Example:
Probabilities: The set of probabilities for the alphabet. This is also known as their weights and can sometimes be denoted as w instead of p.
- Form:
- Example: where since 'a' occurs once, 'b' occurs twice and 'c' occurs twice out of 5 symbols all together.
Codewords: The set of codewords that we will create for every symbol in our alphabet.
- Form:
- Example: where for symbols respectively. Another way to look at it is that we are providing a bijective function: .

Code Width

Standard coding of information for most computers today is byte aligned. For example, in character codes there is ASCII, which codes the entire English alpha-numeric system in a single byte. While UTF-8 has proven to be a much better standard due to better localization support and variable width codes, ASCII serves our purposes for this discussion. ASCII is considered a fixed width coding scheme:

When performance is at stake, it is best to code information atomically aligned to the computer architecture that it is being used for. This means allocating and defining information in terms of bytes (defined as 8-bits) for the majority of systems today. Although, if space is the priority, performance can take a hit to shift around bits along the byte boundaries and create variable width codes that do not byte align.

Note: The ellipses in the alphabet denote the control characters (eg. CR, LB) and grammatical structure symbols (eg. #, $, %) contained in the ASCII code page. The alphabets will reflect this in the ellipses.

Data Compression

Variable Width Savings

Coming back to the example of ASCII, if we wish to encode a string, w, like "aaabb", it will take up 5 bytes. Although we only really need one bit to encode our information if we assign 0 to 'a' and 1 to 'b'.

We just took a 5 byte ASCII encoded string and compressed it down to under a byte. Now let us take another string, w, and give it "aaabbcc". Three alphabet symbols need to be coded; it will take at least 2 bits. At first glance, this assignment would make sense:

Although this gives us an ambiguous string since it could either be "aaabbcc", "aaabbabab", "aacbcc" or any other permutation of our codeword set when we attempt to decompress it. The reason for this ambiguity is due to the fact that the codeword for 'a' (0), is used to start the codeword for 'c' (01).

Prefix Codes

The way to get around this ambiguity is to ensure that all of your codes are prefix type. Something like this would work:

This ensures that we maintain the one-to-one mapping between our alphabet and codewords; we can translate alphabet symbols to codewords and codewords back to alphabet symbols without confusion.

Codeword Assignment

So how do you decide which symbols, get which codewords? Also, since the size of the codeword is critical to the compression ratio (average codeword length to average symbol length), designing an algorithm to weigh the more frequently used symbols is important. In our previous example, it would be poor design to give 'a' the longest codeword because it is the most used symbol in the string. This is where the Huffman Coding algorithm comes into play.

Huffman Coding Algorithm

Count the number of times each symbol occurs in the data source.
Put [symbol:count] pairing as nodes in a minimum priority queue based on the symbol's count.
Evaluate number of queue nodes.

If queue has at least two nodes:
- Dequeue the two nodes.
- Create a new node with a null symbol and set the count equal to the sum of the two dequeued node's counts.
- Attach the two nodes as children to the new node.
- Put the newly created node into the queue.
- Go to step 3.
If queue has only one node, dequeue and set as the root of the tree.

Assign each left branch a 0 and each right branch a 1, or vice versa if you choose.
The symbol nodes will be the children of the tree and their codeword will be the bits that are traversed to get there.

Algorithm Example

Given our previous example string of "aaaabbcc", let us perform the algorithm on it:

Initial queue state with only symbol nodes in it.
New node created from first two symbol nodes and requeued.
New node created from non-symbol and symbol node, becoming the tree's root node.
The final alphabet to codeword mapping is:

If you need further examples, you can look here, here and here. Once the you get a grasp of the algorithm, you will see the simplicity and beauty of it.

Working Code Example

If you want to see a naive implementation of the Huffman Coding algorithm, I posted some source code on my GitHub account. Below is the central code to the algorithm execution.

IMAPI version 2 managed wrapper.

2012-09-18T10:03:00.000-07:00

Introduction

Microsoft introduced native image mastering support in Vista, going forward. The library itself is not too complex but the latest version of the image mastering API (IMAPI) has some quirks. This is well documented by Eric Haddan in his article describing how to build an interop file and use it from .NET.

Implementation

I wanted to take it a step further and provide a library wrapper that was CLI compliant. The end result is a class that provides 3 methods, called in order, to burn an image. The project can be found on my GitHub account.

Properties

Media: A read only property that populates after the LoadMedia call and reports what type of physical media is connected (eg. CD-R, DVD-R, DVD+RW, etc.).
MediaStates: A read only collection of states in which the media is under, that populates after the LoadMedia call (eg. is blank, overwriteable, unsupported).
MediaCapacity: A read only property that populates after the LoadMedia call and reports the size capacity of the media.
Nodes: A root list of file and directory nodes that are to be written to the media.
Recorders: A read only selectable collection. A recorder must be selected for the LoadRecorder call.
VolumeLabel: A property that sets the volume label of the media to be written to.

Methods

LoadRecorder: Verifies a recorder is selected, loads the recording drive resources and verifies that it is working.
LoadMedia: Loads the media resource and verifies that it is valid for the recorder. An exception will occur if the media isn't supported (eg. closed session disc, no disc present).
FormatMedia: Formats the media of any previous data.
WriteImage: The final step to write your files to disc. This is a blocking call.

Events

FormatEraseUpdate: Reports progress of FormatMedia call.
FormatWriteUpdate: Reports progress of files being written to the media.
ImageUpdate: Reports progress of files being added to the media image.

Caveats

I use this library for the simple single purpose of writing finalized images to a blank CD. It has not been fully tested to accomodate all media and recorder state scenarios; I haven't tried it yet with DVD's or multi-session discs. If you would like to extend the library, feel free to push some changes to the GitHub project and I will wrap them in.

I decided to leave the method calls as blocking (synchronous) since there could be many different synchronization models that may use this library. For my purposes, I used this in an WPF application that would call the WriteImage method from a Task and then post back the progress events of the write using the Dispatcher from my WPF window. Below is a code snippet demonstrating this.

private ImageMaster _burner;

...

private void _CreateCd()
{
 // files added to _burner and _burner.WriteImage called
}

private void _burner_FormatWriteUpdate(IMAPI2.ImageMaster sender, IMAPI2.FormatWriteUpdateEventArgs e)
{
 this.Dispatcher.Invoke(() => this.statusProgressBar.Value == (e.ElapsedTime / e.TotalTime) * 100);
}

private void CreateCdWindow_Loaded(object sender, System.Windows.RoutedEventArgs e)
{
 System.Threading.Tasks.Task.Factory.StartNew(_CreateCd);
}

Implementing a bit reader/writer in C.

2012-09-13T16:30:00.000-07:00

Introduction

I was working on implementing a simple compression tool based on naive Huffman coding and initially assumed all of my prefix codes would be at or under a byte long. Although I failed to account for the worst case of Huffman coding, which can yield codes up to 255 bits in length.

This required a bit reading and writing function that would allow for variable length codes. I could have used other methods to balance worst case Huffman trees, but this issue presented a unique challenge. How to write a bit reader/writer in C.

Bit Reader/Writer Specifications

Read and write using reader and writer ADT's.
Handle any possible bit length given.
Remember bits read from file but are outside the requested bit length (eg. 2 bytes read in but only 12 bits requested, leaving 4 bits in the buffer).
Flush a partially written block to disk on a free of the bit writer ADT.
Return number of bits read and written on the read and write functions.

Design

The underlying structure to the ADT will contain fields for the: file pointer, the byte long buffer and the buffer bit length. The examples given for every case are in steps; what is originally in the buffer, what bits are requested, what bits are read from or written to the file, and finally what bits are left over and buffered.

Reader

There are two main cases to consider when designing the read function.

Empty Buffer: The is simple since the file read bits will already be byte aligned to your output bits. You just need to simply mask out the unneeded bits and buffer them for the next read (eg. no bits buffered, 12 bits requested, 16 bits read in, 4 bits buffered).
Populated Buffer: The can get a little trickier, especially in the last two cases of this.

Requested bits already buffered. This is very easy since it doesn't require any file read and the buffer just needs to shift out the read bits (eg. 7 bits in the buffer and 3 are requested; buffer serves out 3 and truncates to 4).
Not all requested bits buffered; end of read bits not in same byte block as end of output bits. The last output byte block must mask out the unneeded bits, put them in the buffer and then append the additional unneeded read bits from the next byte block (eg. 4 bits buffered, 13 bits requested, 16 bits read in, 8 bits buffered).
Not all requested bits buffered; end of read bits in same byte block as end of output bits. The last output byte block must mask out the unneeded bits and put them in the buffer (eg. 4 bits buffered, 10 bits requested, 8 bits read in, 2 bits buffered).

Writer

There are only two cases to consider when designing the write function.

Full Buffer: Shift in write bits, commit to disk and repeat until all bits are processed. If the last bit to write is not on the buffer byte boundary, it will wait until the next write (eg. 6 bits buffered, 12 bits to write, 16 bits written out, 2 bits buffered).
Partial Buffer: Shift in write bits (eg. 2 bits buffered, 4 bits to write, no bits written out, 6 bits buffered).

Caveats

The code below is purely a demonstration of how a bit reader/writer works in a generic sense. There is room for plenty of optimization if you know what type of architecture your targeting and the domain of your reader/write arguments.

The Code

Header File

Source File

Licensing using XML digital signing.

2012-08-14T09:00:00.000-07:00

Motivation

One of the main concerns in any licensing implementation is that authored license files are coming from the trusted authority (ie. your company) and not some third party. This can be accomplished in software by encrypting and decrypting the license with a single key, but symmetric key storage can be cumbersome and have added complexity.

Assymetric cryptography, on the other hand, allows for two keys: a private one to author files with and a public one to verify its authenticity. This is well suited for licensing, where we want to ensure that the authored license has not been altered in anyway and the content can be relied on.

Implementation

I ran across a great article that details the use of XML digital signatures in the .NET framework. It describes the type of digital signatures that can be implemented and lays out the classes and methods needed to author and verify license files.

This specific implementation allows for an: authentication key to compare against some unique value tying the license to the system, expiration date to set a time span on the software use and set of available features the user is authorized for.

Base Class

This forms the runtime object representation of the license and will be inherited by the license reader and writer.

using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Text;
using System.Security;
using System.Security.Cryptography;
using System.Security.Cryptography.Xml;
using System.Xml;

namespace Licensing
{

public abstract class LicenseBase
{ 
 private byte[] _authenticationKey; 
 private DateTime _expiration; 
 private List _features;
    private int _id;

 public byte[] AuthenticationKey {
  get { return _authenticationKey; }
  protected set { _authenticationKey = value; }
 }
 public DateTime Expiration {
  get { return _expiration; }
  protected set { _expiration = value; }
 }
 public int Id {
  get { return _id; }
        protected set { _id = value; }
 }
 protected List Features {
        get { return _features; }
 } 

 public LicenseBase()
 {
  _features = new List();
 }

    public void Clear()
    {
        _id = 0;
        _authenticationKey = null;
        _features.Clear();
    }
}

}

Writer Class

This class will be used internally to author the license files that will be distributed with the released software. Note that the private key is added as an embedded resource in the project and should not be distributed since the public key can be derived from it.

using System;
using System.IO;
using System.Security;
using System.Security.Cryptography;
using System.Security.Cryptography.Xml;
using System.Text;
using System.Xml;

namespace Licensing
{
    class License : LicenseBase
    {

        License(int id, byte[] authenticationKey, DateTime expiration, string[] features)
        {
            this.AuthenticationKey = authenticationKey;            
            this.Expiration = expiration;
            this.Id = id;
            this.Features.AddRange(features);
        }

        /// 
        /// Generates a key pair for digital signing and verification.
        /// 
        /// 
        /// 
        static internal void GenerateAssymetricKeys()
        {
            string datestamp = null;
            StreamWriter output = null;
            RSA key = RSA.Create();
            key.KeySize = 1024;

            datestamp = DateTime.UtcNow.ToString("yyyyMMdd");

            // Generate private key to only be used internally (DO NOT DISTRIBUTE).
            output = File.CreateText("private-" + datestamp + ".key");
            output.Write(key.ToXmlString(true));
            output.Close();

            // Generate public key to be used by customers (distribute).
            output = File.CreateText("public-" + datestamp + ".key");
            output.Write(key.ToXmlString(false));
            output.Close();
        }

        /// 
        /// Digitally sign an XML document.
        /// 
        /// The XML document to sign.
        /// The private key to sign it with.    
        /// 
        private static void _SignXmlDocument(System.Xml.XmlDocument document, RSA privateKey)
        {
            SignedXml signedDocument = new SignedXml(document);
            signedDocument.SigningKey = privateKey;
            signedDocument.SignedInfo.CanonicalizationMethod = SignedXml.XmlDsigCanonicalizationUrl;

            // Add reference to XML data
            Reference @ref = new Reference("");
            @ref.AddTransform(new XmlDsigEnvelopedSignatureTransform(false));
            signedDocument.AddReference(@ref);

            // Build the signature.
            signedDocument.ComputeSignature();

            // Attach the signature to the XML document.
            XmlElement signature = signedDocument.GetXml();
            document.DocumentElement.AppendChild(signature);
        }

        /// 
        /// Write the contents and digitally sign the document.
        /// 
        /// The file path to the digitally signed document.
        public void WriteDocument(string filepath)
        {
            XmlDocument document = new XmlDocument();
            document.AppendChild(document.CreateXmlDeclaration("1.0", "UTF-8", null));

            XmlElement root = document.CreateElement("License");
            document.AppendChild(root);

            XmlElement id = document.CreateElement("Id");
            id.InnerText = this.Id.ToString();
            root.AppendChild(id);

            XmlElement authenticationKey = document.CreateElement("AuthenticationKey");            
            authenticationKey.InnerText = Convert.ToBase64String(this.AuthenticationKey);
            root.AppendChild(authenticationKey);

            XmlElement expiration = document.CreateElement("Expiration");
            expiration.InnerText = this.Expiration.ToString();
            root.AppendChild(expiration);

            XmlElement options = document.CreateElement("Features");
            XmlElement featureItem = null;
            foreach (string feature in this.Features)
            {
                featureItem = document.CreateElement("Feature");
                featureItem.InnerText = feature;
                options.AppendChild(featureItem);
            }
            root.AppendChild(options);

            XmlElement publickey = document.CreateElement("Certificate");
            publickey.SetAttribute("DateCode", KeyCreationDateCode);
            publickey.InnerXml = File.ReadAllText(PublicKeyFilename);
            root.AppendChild(publickey);

            RSA privateKey = RSA.Create();
            privateKey.FromXmlString(Licensing.Properties.Resources.privatekey);
            _SignXmlDocument(document, privateKey);

            File.WriteAllText(filepath, document.InnerXml);            
        }

    }
}

Reader Class

This class will be used in the field to verify the license files that will be distributed with the released software. It is best to wrap the license read and authenticate methods in try-catch blocks since they represent exceptional behavior that fall outside the post-conditions of the method. Handling the exceptions will allow for better error reporting in the field so you can decide what information you will share with the end user (eg. bad authentication key, malformed/tampered licenses). Note again that the public key is an embedded resource for the project. It should never be distributed with the license since an attacker could generate their own keypair and envelope their own public key, with signature, in the license.

using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Text;
using System.Security;
using System.Security.Cryptography;
using System.Security.Cryptography.Xml;
using System.Xml;

namespace Licensing
{
    class LicenseReader : LicenseBase
    {
        LicenseReader()
        {
        }

        public void Authenticate()
        {
            // Check the hardware key value against the AuthenticationKey        
            if (!_AuthenticationKeyMatches(hardwareKey))
                throw new LicenseAuthenticationException("The license failed to authenticate the hardware key.");

            // ... or check some node locking ID (eg. MAC ID, hard drive serial number) against the AuthenticationKey.
            if (!_AuthenticationKeyMatches(nodeLockedId))
                throw new LicenseAuthenticationException("The license failed to authenticate the node locked ID.");

            if (this.Expiration > DateTime.UtcNow)
                throw new LicenseAuthenticationException("The license has expired.");
        }

        /// 
        /// Compare byte array against authentication key byte array.
        /// 
        /// Byte array to compare against.
        /// True if a match, else false.
        private bool _AuthenticationKeyMatches(byte[] compare)
        {
            if (this.AuthenticationKey == null)
                return false;

            int upperBound = Math.Min(this.AuthenticationKey.Length, compare.Length);
            for (int i = 0; i < upperBound; i++)
            {
                if (this.AuthenticationKey[i] != compare[i])
                    return false;
            }
            return true;
        }

        public bool IsFeature(string featureName)
        {
            return this.Features.Contains(featureName);
        }

        /// 
        /// Read the digitally signed document and load its contents.
        /// 
        /// The file path to the digitally signed document.
        protected virtual void ReadDocument(string filepath)
        {
            this.Clear();

            XmlDocument document = new XmlDocument();
            document.Load(filepath);

            RSA publicKey = RSA.Create();
            publicKey.FromXmlString(Licensing.Properties.Resources.publickey);

            if (_VerifyXmlDocument(document, publicKey))
            {
                this.Id = int.Parse(document.SelectSingleNode("//License/Id").InnerText);
                this.AuthenticationKey = Convert.FromBase64String(document.SelectSingleNode("//License/AuthenticationKey").InnerText);
                this.Expiration = DateTime.Parse(document.SelectSingleNode("//License/Expiration").InnerText);

                XmlNodeList features = document.SelectNodes("//License/Features/Feature");
                foreach (XmlNode feature in features)
                {
                    this.Features.Add(feature.InnerText);
                }
            }

        }

        /// 
        /// Verify the digital signature of an XML document.
        /// 
        /// The XML document containing the signature.
        /// The public key to verify signature authenticity.
        /// True if the signature is authentic, else false.
        /// 
        private bool _VerifyXmlDocument(XmlDocument document, RSA publicKey)
        {
            SignedXml signedDocument = new SignedXml(document);
            try
            {
                XmlNode signature = document.GetElementsByTagName("Signature", SignedXml.XmlDsigNamespaceUrl)[0];
                signedDocument.LoadXml((XmlElement)(signature));
            }
            catch
            {
                return false;
            }
            return signedDocument.CheckSignature(publicKey);
        }

    }
}

Additions & Caveats

Symmetric encryption could also be used to hinder casual viewing of the license file. The issue of symmetric key storage must be revisited but will no longer be critical to maintaining the integrity of the license files.

Strong naming of the license reading assembly is encouraged to prevent dissasembly and re-embedding of an attacker's own public key. Although, this is by no means an attack-proof method since strong naming can be removed by changing information in the CLI header of the portable executable (PE). Even if the application was able to secure the public key outside the application in a PKI, there is still the issue of code injection or recompilation once strong naming is stripped. Indirection of the public key or obfuscation of the assembly can also help but is not a reliable security measure.

Automate IIS enable on Windows 7.

2012-07-30T11:33:00.000-07:00

Background

There are plenty of sites that will tell you how to enable IIS through the Control Panel. Although, if you are tasked with enabling IIS automatically (eg. for an installer), it can be a nightmare. This is only really needed though in special cases where the installer has more control over the hardware specifications. Determining an upgrade or uninstall scenario with IIS can be very tough since you are not installing your own software but rather enabling a feature of Windows. With that in mind, we can propose a solution for automated enabling of IIS.

Requirements

Windows Editions

The edition of Windows 7 for IIS must be either Ultimate or Professional if you plan on using ASP.NET. You can try this on other editions, or even Vista, but your results may differ since this is only scenario I have tested against. More information can be found at the IIS TechNet site.

DISM Tool

To enable IIS features through the command line, DISM can be used. This can be reliably found on Windows 7 but is inconsistent on Vista, where the older Package Manager is used instead. This was one of the reasons I decided to constrain the specification to Windows 7.

The first step in using DISM, is to find out which features need to be enabled, in order for IIS to work completely. I ran across this MSDN blog entry that lists all the needed features. Next task is to script the DISM commands.

Scripting

Caveats

Iterating over each feature and enabling it with DISM can be very time consuming. The faster approach is to evaluate the enabled state of each feature and turn them all on in one command. PowerShell can be used but the presence of it and its needed modules can be inconsistent on Windows 7. I had to fall back on batch scripting, which is the part that can cause nightmares due to its ugly syntax.

Arrays are need to store the enabled states of each feature and the feature names themselves. In my case, I needed an array of pairs. I was able to fake it using a suggestion on Stack Overflow. Also, special syntax is needed to reference variables inside a loop that are scoped outside the loop.

Last, but not least, remember to register the ASP.NET framework version being used with IIS. This will need to be done once IIS is enabled (reboot is required). Adjust the framework bitness and version below according to the one you are using.

[WindowsFolder]\Microsoft.NET\Framework64\v4.0.30319\aspnet_regiis.exe -i

The Code

The following batch script will discover which features are disabled, concatenate them into the enable command and then execute. You may need to replace SysNative with system32 or the Windows system folder being used by your target system if this is being ran outside the installer.

setlocal enableextensions enabledelayedexpansion
set features=IIS-ApplicationDevelopment IIS-ASPNET IIS-CommonHttpFeatures IIS-HostableWebCore IIS-HttpCompressionDynamic IIS-HttpCompressionStatic IIS-HttpErrors IIS-HttpLogging IIS-HttpRedirect IIS-HttpTracing IIS-Metabase IIS-NetFxExtensibility IIS-Performance IIS-StaticContent IIS-URLAuthorization IIS-WebServerRole WAS-ConfigurationAPI WAS-NetFxEnvironment WAS-ProcessModel WAS-WindowsActivationService
set dism_params=

set n=0
for %%f in (%features%) do (
 set /a n+=1
 set dism_params=!dism_params! /FeatureName:%%f
 set features[!n!]=%%f
)
set n=0
for /f "tokens=3*" %%i in ('%WINDIR%\SysNative\Dism.exe /Online /Get-FeatureInfo %dism_params% ^| findstr /b /r "^State : "') do (
 set /a n+=1
 set states[!n!]=%%i
)

set dism_params=
for /l %%i in (1,1,%n%) do (
 if "!states[%%i]!" == "Disabled" set dism_params=!dism_params! /FeatureName:!features[%%i]!
)

if "%dism_params%" neq "" %WINDIR%\SysNative\Dism.exe /Online /Enable-Feature %dism_params%

Converting thermocouple voltages to temperature.

2012-07-25T13:23:00.000-07:00

Background

Unlike most sensors used in data acquisition, thermocouples do not have a linear mapping between their voltage and celcius values. This means that instead of factoring some scalar value against the data, in order to scale it to engineering units (eg. volts to acceleration), an experimentally derived equation must be created to approximate the temperature at specific voltages. This non-linearity is primarily due to the materials being used to discover the temperature difference.

The science and engineering behind thermocouples goes beyond the scope of this post but more information can be found below:

K-Type Thermocouples

One of the most general purpose thermocouple types used in data acquisition are K types since their are relatively inexpensive to make. This will also be the thermocouple that I will focus on.

Luckily, NIST provides a table of discrete values, correlating voltages to celcius and vice-versa. In addition, they also provide the exact polynomial equation to use underneath the given tables, depending on the voltage range the value falls in. Following is the exact functions to do the conversion for K-type thermocouples.

Voltage To Celcius

Given some ordered set of coefficients, D, and a voltages value, v, let the voltage to celcius conversion be defined as a function of v as follows.

D is dependent on the voltage value being evaluated:

values greater than 206.44 mV will have D = {-131.8058, 48.30222, -1.646031, 0.05464731, -0.0009650715, 0.000008802193, -0.0000000311081, 0.0, 0.0, 0.0 };
values geater than 0.0 V will have D = { 0.0, 25.08355, 0.07860106, -0.2503131, 0.0831527, -0.01228034, 0.0009804036, -0.0000441303, 0.000001057734,-0.00000001052755};
and values less than 0.0 V will have D = { 0.0, 25.173462, -1.1662878, -1.0833638, -0.8977354, -0.37342377, -0.086632643, -0.010450598, -0.00051920577, 0.0 }.

Celcius To Voltage

Given some ordered set of coefficients, C, an ordered set of exponentials, A, and a celcius value, x, let the celcius to voltage conversion be defined as a function of x as follows.

C is dependent on the celius value being evaluated:

positive and zero values will have C = { -0.017600413686, 0.038921204975, 0.000018558770032, -0.000000099457592874, 0.00000000031840945719, -0.00000000000056072844889, 0.00000000000000056075059059, -3.2020720003E-19, 9.7151147152E-23, -1.2104721275E-26};
negative values will have C = { 0.0, 0.039450128025, 0.000023622373598, -0.00000032858906784, -0.0000000049904828777, -0.000000000067509059173, -0.00000000000057410327428, -0.0000000000000031088872894, -1.0451609365E-17, -1.9889266878E-20, -1.6322697486E-23};

A is a fixed ordered set of exponentials; A = {0.1185976, -0.0001183432, 126.9686}.

Performance

Since the polynomial equation's number of terms are bounded by a constant (ie. |D| or |C|) the performance hit will be a constant factor to the number of voltage or celcius values to calculate. Also, temperature is sampled at a very low rate, yielding small data sets that usually have redundant values in them. Further optimization can be made by caching previous computations and reusing their results if the same voltage or celcius value is encountered again.

The Code

private static double[] _thermocoupleCoefficientsTypeKneg = {
 0.0,
 0.039450128025,
 2.3622373598E-05,
 -3.2858906784E-07,
 -4.9904828777E-09,
 -6.7509059173E-11,
 -5.7410327428E-13,
 -3.1088872894E-15,
 -1.0451609365E-17,
 -1.9889266878E-20,
 -1.6322697486E-23
};
private static double[] _thermocoupleCoefficientsTypeKpos = {
 -0.017600413686,
 0.038921204975,
 1.8558770032E-05,
 -9.9457592874E-08,
 3.1840945719E-10,
 -5.6072844889E-13,
 5.6075059059E-16,
 -3.2020720003E-19,
 9.7151147152E-23,
 -1.2104721275E-26
};
private static double[] _thermocoupleExponentialsTypeK = {
 0.1185976,
 -0.0001183432,
 126.9686
};
/// 
/// Type K thermocouple inverse coefficient values for voltage to celcius conversions.
/// 
/// Valid values for -200C - 0C / -5.891mV - 0mV.
private static double[] _thermocoupleInverseCoefficientsTypeK0 = {
 0.0,
 25.173462,
 -1.1662878,
 -1.0833638,
 -0.8977354,
 -0.37342377,
 -0.086632643,
 -0.010450598,
 -0.00051920577,
 0.0
};
/// 
/// Type K thermocouple inverse coefficient values for voltage to celcius conversions.
/// 
/// Valid values for 0C - 500C / 0mV - 20.644mV.
private static double[] _thermocoupleInverseCoefficientsTypeK1 = {
 0.0,
 25.08355,
 0.07860106,
 -0.2503131,
 0.0831527,
 -0.01228034,
 0.0009804036,
 -4.41303E-05,
 1.057734E-06,
 -1.052755E-08
};
/// 
/// Type K thermocouple inverse coefficient values for voltage to celcius conversions.
/// 
/// Valid values for 500C - 1372C / 20.644mV - 54.886mV.
private static double[] _thermocoupleInverseCoefficientsTypeK2 = {
 -131.8058,
 48.30222,
 -1.646031,
 0.05464731,
 -0.0009650715,
 8.802193E-06,
 -3.11081E-08,
 0.0,
 0.0,
 0.0
};


public static double CelciusToVoltageTypeK(double value)
{
 double[] a = _thermocoupleExponentialsTypeK;
 double[] c = null;
 double result = 0.0;
 if ((value >= 0.0)) {
  c = _thermocoupleCoefficientsTypeKpos;
 } else {
  c = _thermocoupleCoefficientsTypeKneg;
 }
 for (int index = 0; index <= c.Length - 1; index++) {
  result += (c[index] * Math.Pow(value, index)) + (a[0] * Math.Exp(a[1] * Math.Pow(value - a[2], 2)));
 }
 return result;
}

public static double VoltageToCelciusTypeK(double value)
{
 double[] d = null;
 double result = 0.0;
 if ((value > 0.020644)) {
  d = _thermocoupleInverseCoefficientsTypeK2;
 } else if ((value >= 0.0)) {
  d = _thermocoupleInverseCoefficientsTypeK1;
 } else {
  d = _thermocoupleInverseCoefficientsTypeK0;
 }
 for (int index = 0; index <= d.Length - 1; index++) {
  result += d[index] * Math.Pow(value, index);
 }
 return result;
}

Bit specific builds in Visual Studio 2010.

2012-04-23T11:41:00.001-07:00

Using Conditioned Tags

The Visual Studio 2010 UI does not have options to handle conditional references based on what kind of configuration and platform you are running. For example, you may have a project reference that needs to change depending on whether you are doing a 32 or 64 bit build.

Fortunately, the project file syntax does support conditional references. I was able to find this out on stack overflow.

In our case, we have our DLL's in source control under the project's bin directory. For some reason I kept getting the 64-bit version of the DLL and then found out it was because the Content tags also needed to be conditioned.

Example

Lets say you have a project (MyProject) with two DLL's, 32bit.dll and 64bit.dll, that are referenced by the project and kept under the bin directory for the project:

MyProject\bin\x86\32bit.dll
MyProject\bin\x64\64bit.dll

You will need to hand edit the MyProject\MyProject.vbproj file and add these entries to it (or change it if they are already in there). The first two entries are for the project references and the last two are for their inclusion as a content item in the project.

   <Reference Include="32bit" Condition="'$(Platform)' == 'x86'">
      <HintPath>bin\x86\32bit.dll</HintPath>
   </Reference>  
   <Reference Include="64bit" Condition="'$(Platform)' == 'x64'">
      <HintPath>bin\x64\64bit.dll</HintPath>
   </Reference>
   ...
   <ItemGroup>    
      <Content Include="bin\x86\32bit.dll" Condition="'$(Platform)' == 'x86'"/>
      <Content Include="bin\x64\64bit.dll" Condition="'$(Platform)' == 'x64'"/>
   </ItemGroup>

Now, whenever the platform is changed to x86, it will only use the x86 conditioned tags and likewise when changed to x64. This can also be applied to other conditions too, check the links below for a complete breakdown of the MSBuild schema and conditions.

WPF and data multithreading deadlocks.

2011-12-21T10:38:00.000-08:00

WPF & Multithreading

In designing applications, it may be advantageous to run data processing methods on a separate thread from the UI. This allows greater responsiveness in the application and takes advantage of concurrent computation.

The cornerstone interface to any binding in WPF is INotifyPropertyChanged, which is implemented on a data structure's different properties. Although this can be the source of deadlocks if the data structure is being processed on a separate thread where locks are explicitly implemented by the developer.

A deadlock could occur if some UI event was being triggered on a data structure (eg. expanding a view, changing a text box entry) while the data thread was processing it with an acquired lock. Internally, the PropertyChanged event will call a Dispatcher.Invoke on the UI. Although the UI could be waiting for the lock to release on that same property that the data thread already has, creating a cyclic dependency or deadlock.

The call stack for the UI thread. Note the wait call.

The call stack for the data thread.

Deadlock Scenario

As an example, suppose you have your standard WPF application UI thread and a separate data thread. The data thread polls a time stamp from a remote database and updates a class property called DataObject.Timestamp, which raises the PropertyChanged event. The property is also bound to a text box in our WPF application window, which is made visible in the window based on the user's preference (ie. collapsing and expanding a view).

In our deadlock example, the data thread puts a lock on the DataObject.Timestamp property while it updates it and other data structures. While this is happening, the view is expanded, causing the text box to start binding to the DataObject.Timestamp property and wait for an acquire of the lock that the data thread is holding. The data thread then sets the DataObject.Timestamp property inside the critical region code and the PropertyChanged event is raised. This causes the new time stamp value to be marshaled back onto the Dispatcher with a blocking/synchronous Dispatcher.Invoke call. Although the data thread now deadlocks in the critical region, waiting for the UI Dispatcher.Invoke to return, which is waiting on the data thread to release the lock.

Solution

To prevent the data thread's deadlock ~~critical region from blocking on the Dispatcher.Invoke~~, override the OnPropertyChanged method to marshal the value asynchronously to the dispatcher.

protected override void OnPropertyChanged(string propertyName)
{
 Application.Current.Dispatcher.BeginInvoke(new Action(() => { base.OnPropertyChanged(propertyName); }));
}

This will ensure that the critical region doesn't block and the property changes are still serviced by the dispatcher.

Tips for Java performance on Android.

2011-12-16T11:32:00.000-08:00

While Java may be a higher level language that makes code writeability easier, conventions must be broken when developing on Android to reap performance gains. This is especially true for games where logic and rendering must be done on a tight loop.

The Android developer page provides helpful tips for optimizing your Java code. Below are the optimizations that I have found to improve performance.

Front load memory allocation at startup instead of dynamically.

We saw the largest performance gain here. The GC can kill performance in a CPU intensive application or game. If you pool your instances in an array instead of dynamically allocating it, you can save the GC calls from happening. This will result in a slightly larger memory footprint if you tweak it right.

Below is code that provides a fixed-size generic list implementation; the internal array never grows/shrinks. The executePendingRemoves routine is needed since object removal must sometimes be put into a critical region with a synchronized block. You may want to queue up removals but still make them available to other threads up until the moment they are removed in bulk.

public class List<T> {

 private int mCount;
 private T mItems[];
 private int mRemoveCount;
 private int[] mRemoveIndices; 
 
 @SuppressWarnings("unchecked")
 public List(int capacity) {
  mItems = (T[]) new Object[capacity];
  mRemoveIndices = new int[capacity];
  mCount = 0;
 }
 
 public void add(T item){
  mItems[mCount] = item;
  mCount++;
 }
 
 public void clear(){
  for(int index=0; index < mCount; index++)
  {
   mItems[index] = null;
  }
  mCount = 0;
 }
 
 public void executePendingRemoves() {
  for (int i=0;i < mRemoveCount; i++) {
   mItems[mRemoveIndices[i]] = null;
  }
  int newCount=0;
  for (int i=0;i < mCount; i++) {
   if (mItems[i] != null) {
    mItems[newCount] = mItems[i];
    newCount++;
   }
  }
  mCount = newCount;
  mRemoveCount = 0;
 }
 
 public T get(int index) { return mItems[index];}
 public int getCount() { return mCount;}
 public int getIndex(T item) {
  for (int i=0; i < mCount; i++) {
   if (mItems[i] == item) return i;
  }
  return -1;
 }
 public int getPendingRemoveCount() { return mRemoveCount; }
 public int[] getPendingRemoves() { return mRemoveIndices; }
 public Object[] getItems() { return mItems;}
 
 public void pendingRemoveAt(int index) {
  for (int i=0; i< mRemoveCount; i++) {
   if (index == mRemoveIndices[i])
    return;
   else if  (index < mRemoveIndices[i]) {
    int swapIndex = mRemoveIndices[i];
    mRemoveIndices[i] = index;
    index = swapIndex;
   }
  }
  mRemoveIndices[mRemoveCount] = index;
  mRemoveCount++;
 }
 
 public void removeAt(int index) {
  if (index < mCount) {
            for (int i = index; i < mCount; i++) {
                if (i + 1 < mItems.length && i + 1 < mCount) {
                 mItems[i] = mItems[i + 1];
                } else {
                 mItems[i]  = null;
                }
            }
            mCount--;
        }
 }
 
 public T removeLast() {
  mCount--;
  
  final T item = mItems[mCount];
  mItems[mCount] = null;
  return item;
 }
 
}

Lastly is the pooling object that can be inherited by any application level instance (eg. FlyingWizardsPool extends ObjectPool). When you want to instantiate a flying wizard object, just type in FlyingWizard wizard = wizardPool.checkOut();.

public abstract class ObjectPool<T> {

 private int mCapacity;
 private List<T> mPool;
 
 @SuppressWarnings({ "unchecked", "rawtypes" })
 public ObjectPool(int capacity) {
  mPool = new List(capacity);
  mCapacity = capacity;
  this.fill();
 }
 
 public void checkIn(T poolItem) {
  mPool.add(poolItem);
 }
 
 public T checkOut() {
  return mPool.removeLast();
 }
 
 public abstract void fill();
 
 public int getCount() {
  return mPool.getCount();
 }
 
 public int getCapacity() {
  return mCapacity;
 }

}

Below is an example implementation of the ObjectPool abstract class. Note how the fill routine is implemented. Also, front loading memory like this means that you will need an initialize on your instances so that the class fields can be put back to a "just instantiated" state. Normally we would use a constructor but to keep the allocation static, an initialize call is necessary.

public static class FlyingWizardPool extends ObjectPool {
 public FlyingWizardPool(int capacity) {
  super(capacity);
 }

 public FlyingWizard checkOut(float x, float y, float destinationX, float destinationY) {
  FlyingWizard wizard = this.checkOut();
  wizard.initialize(x, y);
  return wizard;
 }
  
 @Override
 public void fill() {
  for (int i=0; i < this.getCapacity(); i++) {
   this.checkIn(new FlyingWizard());
  }
 }
}

Choose larger block routines instead of calling out to helper subroutines.
Since the Dalvik is a register based architecture, every time a routine is called from another, the register values must be stored to memory before the call and loaded back in after the routine has returned. Avoiding helper routine inside commonly called routines will reduce this load/store overhead.
Remove accessors and allow direct access to class fields.
Accessors are usually good form and observe the ADT. Although, to avoid the overhead involved in a routine call and additional allocation of memory for the copied out value, it is best to allow direct access to the class field. This makes it more difficult to protect the conditions for the field but is a worthy sacrifice if it is called often.
Make routines static if they don't access class fields.
The Android team recommends using this for an approximate 15-20% performance gain. This means less work for the Dalvik during run-time optimization and the routine can be inlined for faster execution.
Use constants instead of enums.
This is more of a negligible optimization; it was delisted from Android's performance page. Instead of having an enum, you can keep the enumeration as an integer constant.

It is important to follow the ADT for good code writeability/readability and bug minimization. Use these optimizations only on code that really needs it. This means it should apply to routines and fields that are called frequently by your application.

POCO not transforming into a POCO proxy.

2011-11-07T14:19:00.000-08:00

A POCO (Plain Old CLR Object) must be transformed into a POCO proxy
MyNamespace.MyClass ->{System.Data.Entity.DynamicProxies.MyClass_0A943C2FC37D33304CEB497A9210C754E3F553B50315F8E6F0F7A6FAF43777F2})
in order for features like lazy loading and change tracking to work.

MSDN has a great article that explains how to prepare a POCO to be transformed into a proxy. If you find that an object retrieved or attached to the context does not show itself in the DynamicProxies namespace (eg. MyNamespace.MyClass instead of ...DynamicProxies.MyClass), then you probably don't have a compliant class. You may have noticed this already when hovering your mouse cursor over an instance and seeing the runtime information on it.

The distilled points are that your POCO must:

be public, not abstract and not sealed;
have a public or protected parameterless constructor;
have public overridable properties that are read/write if they are navigation properties;
and implement ICollection if they are a one-to-many navigation property.

Detecting cycles in decimal digits and lists.

2011-10-16T02:25:00.000-07:00

I was recently working on solving problem 64 for project Euler and one of the needed algorithms was a detection of cycles in a list. This routine works for detecting cycles in lists of cardinality greater than 2.

def get_cycle_len(N):
  i = k = 0
  for j in range(2, len(N)):
    if N[i] == N[j]:
      if k == 0:
        k = j
        i = 1
      else:
        if i == k:
          return k
        i += 1
    else:
      i = 0
      k = 0
  return -1

Entity Framework June 2011 CTP Bug

2011-10-08T17:22:00.000-07:00

I recently installed the Entity Framework June 2011 CTP to preview the enum support but ran into a bug along the way that reported an error while processing the template.

I encountered this error while trying to run an Entity based project using the "Database & Model First" paradigm. Uninstalling the preview removed the problem. Not sure why this error occured since the preview doesn't modify the assembly it is complaining about not finding an entry point in.

EF 4.1 Code First: Many one-to-one properties for same class type on an object.

2011-10-05T15:00:00.000-07:00

I just started using the Entity Framework recently to employ persistence for an already established code base. Initially I was using the "Model & Database First" paradigm where a designer is used to layout the data model. Although, this ended up being tedious since my POCO instance had to exactly match the POCO proxy instance in my model designer view. After upgrading to EF 4.1, I realized the "Code First" paradigm was the way to go since I had to adapt the persistence model to my already existing code.

One of the earliest problems that I ran into was having more than one property on a class that referenced the same class type. This example was taken from a great site that explored the different EF properties.

public class Customer
{
 public int Id { get; set; }
 public int Age { get; set; }
 public Address BillingAddress { get; set; }
 public Address DeliveryAddress { get; set; }
}

Notice how both the BillingAddress and DeliveryAddress properties references the same class type. I kept trying to establish two 1-to-1 relationships on the class but this caused a problem with cascading deletes and foreign key constraints. I found a solution but it still required exceptional behavior on the part of my context and the database schema.

The most elegant solution was to just treat both of the properties as complex types. The only thing I had to do was remove the Id property from the properties' class and the context took care of the rest. So instead of two tables in the database (one for Customer and the other for Address), there would be only one. Below is a layout of what the table would look like.

Customer
{
 Id
 Age
 BillingAddress_City
 BillingAddress_Street
 BillingAddress_Zipcode
 DeliveryAddress_City
 DeliveryAddress_Street
 DeliveryAddress_Zipcode
}

Andrew's Adventures in Technology

Streaming UUEncoder in .NET.

Flashback

Searching for an Implementation

Creating an Implementation

Using the Code

Note on Licensing

More Resources & Reading

Get Windows Service name from executable in PowerShell.

Log per class pattern.

Rookie Moves

Best Practices

The Difference

More Reading & Resources

Creating XML based templates in log4net.

Motivation

Why Templates?

Writing the Code

Considerations

More Reading & Resources

Using type converters for JSON.NET.

Motivation

Serializing

Implementing a Type Converter

Considerations

More Reading & Resources

Using CC.NET & Gallio for priority based smoke testing.

Pitfalls in Production

Monitoring Confusion

Prioritized Testing

More Reading & Resources

State behavioral pattern to the rescue.

The Creeping Problem

Looking For A Solution

The Code

Ahhhh Yeah...

Using The Right Tool

More Reading & Resources

Understanding type theory (Part one: Lambda Calculus).

Series Introduction

Mathematical Formalisms

Notation & Basic Structure

Binding & Applying Terms

Multiple Parameters

Beta Reductions

Untyped Lambda Calculus

Coming up next

More Reading & Resources

PCAP-NG reader for .NET.

Introduction & Scope

Design Considerations

The Code and NuGet Package

More Reading & Resources

Automated combinatorial and spot testing of web service operations.

Introduction & Scope

Investigating the WSDL

The Code

Conclusion

More Reading & Resources

Text generation using Markov chains.

Introduction

An Example

Text Generation

Random Selection

The Code

Additional Resources

Reading and writing large binary data in T-SQL and ADO.NET.

Introduction

Latest Standard

Example Schema

Example Code

Performance Considerations

Understanding compression and Huffman Coding.

Definitions

Code Width

Data Compression

Variable Width Savings

Prefix Codes

Codeword Assignment

Huffman Coding Algorithm