<h1>Andrew's Adventures in Technology</h1>
<p>Personal discovery of solutions to problems in systems and software engineering. By <a href="http://www.blogger.com/profile/05157295791336552159">awalsh128</a>.</p>
<h2>Streaming UUEncoder in .NET</h2>
<h3>Flashback</h3>
<p>
The last time I used Unix-to-Unix format (AKA <a href="http://en.wikipedia.org/wiki/Uuencoding">UUEncoding</a>) was when <a href="http://en.wikipedia.org/wiki/Usenet">USENET</a> was still the big thing and the <a href="http://en.wikipedia.org/wiki/Mosaic_%28web_browser%29">Mosaic web browser</a> was just coming out. That changed recently, when I had a requirement to encode and decode this format.
</p>
<h3>Searching for an Implementation</h3>
<p>
Since <a href="http://en.wikipedia.org/wiki/Base64">Base64</a> has largely replaced this older format, it was hard to find a current implementation for the .NET platform. I did run across <a href="http://geekswithblogs.net/kobush/archive/2005/12/18/63486.aspx">a port of</a> <a href="http://websvn.kde.org/trunk/KDE/kdelibs/kdecore/kcodecs.cpp?view=markup&pathrev=486059">KDE's kcodecs</a>. However, the port wasn't a streaming solution in the sense of implementing the Stream class, and it allocated a lot of one-item byte arrays by making a <a href="https://msdn.microsoft.com/en-us/library/system.io.stream.readbyte%28v=vs.110%29.aspx">ReadByte</a> call for each character.
</p>
<h3>Creating an Implementation</h3>
<p>
Originally I tried to create my own solution by implementing the .NET <a href="https://msdn.microsoft.com/en-us/library/system.text.encoder%28v=vs.110%29.aspx">Encoder</a> class, but the interface didn't fit the requirements of UUEncoding. For example, the GetBytes call works on a per-character basis, whereas UUEncoding consumes 3 bytes at a time. Also, a header and footer need to be written, and the encoded payload is segmented into lines prefixed by their encoded lengths.
</p>
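<p>
To make the line format concrete, here is a quick sketch using Python's binascii module, which happens to implement the same uuencode line scheme (illustrative only; it is not part of the .NET library discussed here):
</p>

```python
import binascii

# One uuencoded line: a length prefix character (byte count + 32),
# then 4 printable characters for every 3 input bytes.
line = binascii.b2a_uu(b"Cat")
print(line)  # b'#0V%T\n' -- '#' is chr(3 + 32)

# Decoding reverses the transformation using the length prefix.
print(binascii.a2b_uu(line))  # b'Cat'
```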
<p>
I ended up creating my own encoder class, scoped to handle data only line by line.
</p>
<pre class="brush: csharp">
public static class UUEncoder
{
    // Assumes the current position is at the start of a new line.
    public static byte[] DecodeLine(Stream buffer)
    {
        // ...
    }

    public static byte[] EncodeLine(byte[] buffer)
    {
        // ...
    }
}
</pre>
<p>
I then created encode and decode Stream classes that depend on the encoder. Having the encoding and decoding happen in a Stream-based way was critical for my requirements, since I was lazily evaluating the data rather than reading it all up front. This was important because some of the files tended to be gigabytes in size, and an in-memory solution would have created an unacceptable memory footprint, along with the nastiness that potentially comes with it, like <a href="http://en.wikipedia.org/wiki/Thrashing_%28computer_science%29">thrashing</a>.
</p>
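<p>
The line-at-a-time streaming idea can be sketched in Python as a generator (a hypothetical illustration, not the actual .NET implementation): each encoded line is decoded as it is pulled, so only one line's worth of data is ever held in memory.
</p>

```python
import binascii
import io

def decode_lines(encoded):
    """Lazily decode a uuencoded payload one line at a time."""
    for raw in encoded:
        line = raw.rstrip(b"\r\n")
        # Skip the header, blank lines, the zero-length line and the footer.
        if not line or line.startswith(b"begin ") or line in (b"`", b"end"):
            continue
        yield binascii.a2b_uu(line)

encoded = io.BytesIO(b"begin 644 cat.txt\n#0V%T\n`\nend\n")
print(b"".join(decode_lines(encoded)))  # b'Cat'
```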
<h3>Using the Code</h3>
<p>
You can find my implementation, with tests, on GitHub <a href="https://github.com/awalsh128/Awalsh128.Text">here</a>.
</p>
<p>
To decode any stream:
</p>
<pre class="brush: csharp">
using (Stream encodedStream = /* Any readable stream. */)
using (Stream decodedStream = /* Any writeable stream. */)
using (var decodeStream = new UUDecodeStream(encodedStream))
{
    decodeStream.CopyTo(decodedStream);
    // Decoded contents are now in decodedStream.
}
</pre>
<p>
To encode any stream:
</p>
<pre class="brush: csharp">
bool unixLineEnding = true; // True if encoding with Unix line endings, otherwise false.
using (Stream encodedStream = /* Any readable stream. */)
using (Stream decodedStream = /* Any writeable stream. */)
using (var encodeStream = new UUEncodeStream(encodedStream, unixLineEnding))
{
    decodedStream.CopyTo(encodeStream);
    // Encoded contents are now in encodedStream.
}
</pre>
<h3>Note on Licensing</h3>
<p>
I published the code under version 2 (not 2.1) of the <a href="http://en.wikipedia.org/wiki/GNU_Lesser_General_Public_License">LGPL</a> since I took the bit twiddling and encoder maps from KDE's implementation.
</p>
<h3>More Resources & Reading</h3>
<ul>
<li><a href="https://github.com/awalsh128/Awalsh128.Text">Github: Awalsh128.Text Library (contains UUEncoder implementation)</a></li>
<li><a href="https://www.gnu.org/licenses/lgpl-2.1.html">GNU Lesser General Public License, version 2.1</a></li>
</ul>
<h2>Get Windows Service name from executable in PowerShell</h2>
<p>
I was recently putting some PowerShell scripts together for deployment and maintenance of software to our machine instances. One of the requirements was to be able to discover the service name from a Windows Service executable that uses <a href="https://msdn.microsoft.com/en-us/library/system.serviceprocess.serviceinstaller%28v=vs.110%29.aspx">ServiceInstaller</a>. I needed to be able to extract this value in a generic way in order to query the service and stop it if running. Here is what I was able to put together.
</p>
<pre class="brush: powershell">
Function Get-WindowsServiceName([string]$exePath)
{
    $assembly = [System.Reflection.Assembly]::LoadFrom($exePath)
    $type = $assembly.GetTypes() |
        Where { $_.GetCustomAttributes([System.ComponentModel.RunInstallerAttribute], 0).Length -gt 0 } |
        Select -First 1
    $installer = [System.Configuration.Install.Installer][Activator]::CreateInstance($type)
    $serviceInstaller = [System.ServiceProcess.ServiceInstaller]$installer.Installers[0]
    $serviceInstaller.ServiceName
}
</pre>
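<p>
The discovery pattern itself (load the binary, find the type carrying an installer marker, instantiate it, read the configured name) is language agnostic. Here is a Python sketch of the same idea, with a hypothetical run_installer marker standing in for RunInstallerAttribute:
</p>

```python
import inspect
import sys

class MyServiceInstaller:
    run_installer = True  # Hypothetical stand-in for [RunInstaller(true)].
    service_name = "MyWindowsService"

class SomethingElse:
    pass

def get_service_name(module):
    """Return the service name from the first class flagged as an installer."""
    for _, cls in inspect.getmembers(module, inspect.isclass):
        if getattr(cls, "run_installer", False):
            # Instantiate, as [Activator]::CreateInstance does in the script above.
            return cls().service_name
    raise LookupError("No installer type found.")

print(get_service_name(sys.modules[__name__]))  # MyWindowsService
```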
<p>
Note that this doesn't support multiple installers in a single executable.
</p>
<h2>Log per class pattern</h2>
<h3>Rookie Moves</h3>
<p>
A while ago, I had created a single logger for each service and shared it statically across the application.
</p>
<pre class="brush: csharp">
public class Log
{
    private static readonly Lazy<Log> instance = new Lazy<Log>(() => new Log(), true);

    public static Log Instance
    {
        get { return instance.Value; }
    }
}
</pre>
<p>
It always felt strange doing this, since it violated <a href="http://en.wikipedia.org/wiki/Encapsulation_%28object-oriented_programming%29">encapsulation</a>. I was referencing a static instance inside my objects without ensuring the existence of the log instance itself, while also making assumptions about the instance's state. Essentially, I was reaching outside the class every time to the <a href="http://en.wikipedia.org/wiki/Singleton_pattern">singleton</a>, which in my case was an instance of global state.
</p>
<h3>Best Practices</h3>
<p>
As I researched best practices for logging, logger-per-class appeared to be the best pattern, since it offered the most fine-grained control with respect to filtering and configuration.
</p>
<p>
When using logging in your class, you should separate the concern of how the log gets created from the use of the log. This can be achieved by having a factory accessor.
</p>
<pre class="brush: csharp">
public class Log
{
    // Set your factory property to the actual log implementation you wish to use.
    public static Func<Type, Log> Factory { get; set; }

    // Instance based properties and methods.
}
</pre>
<p>
You can also use the <a href="http://en.wikipedia.org/wiki/Abstract_factory_pattern">abstract factory pattern</a> if you have to wrap your logging implementations.
</p>
<pre class="brush: csharp">
public abstract class LogBase
{
    // Set your factory property to the actual log implementation you wish to use.
    public static Func<Type, LogBase> Factory { get; set; }

    // Abstract properties and methods.
}
</pre>
<p>
Then just call the factory by passing in the class type it is for.
</p>
<pre class="brush: csharp">
public class SomeObject
{
    private static readonly LogBase log = LogBase.Factory(typeof(SomeObject));
}
</pre>
<h3>The Difference</h3>
<p>
It may not seem like much of a change but subtle differences are happening:
</p>
<ul>
<li>The calling class can now communicate its state to the log's creation.</li>
<li>Log creation is no longer the responsibility, implicitly or otherwise, of the class.</li>
</ul>
<p>
In my case, this was very liberating since we had several log implementations in our large codebase. I no longer needed to worry about which log implementation was being used or mess with the singleton construction, and I could leverage filtering when I needed to isolate a single component or service that was causing trouble. That was something I couldn't do before.
</p>
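<p>
For comparison, Python's standard logging module has this pattern built in: every class or module requests its logger from the factory under its own name, and filtering can then target a single component without touching the rest (a minimal sketch, independent of any .NET logging library):
</p>

```python
import logging

class SomeObject:
    # Logger per class, obtained from the factory and keyed by the class name.
    log = logging.getLogger("app.SomeObject")

class OtherObject:
    log = logging.getLogger("app.OtherObject")

# Quiet one noisy component; the other is unaffected.
logging.getLogger("app.SomeObject").setLevel(logging.CRITICAL)

print(SomeObject.log.isEnabledFor(logging.WARNING))   # False
print(OtherObject.log.isEnabledFor(logging.WARNING))  # True
```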
<h3>More Reading & Resources</h3>
<ul>
<li><a href="http://c2.com/cgi/wiki?LoggingBestPractices">C&C Inc: Logging Best Practices</a></li>
<li><a href="http://www.beefycode.com/post/Log4Net-Recommended-Practices-pt-1-Your-Code.aspx">beefycode: Log4Net Recommended Practices pt 1: Your Code</a></li>
<li><a href="http://www.jayway.com/2011/06/13/a-nice-basic-log4net-setup/">JayWay: A nice, basic log4net setup</a></li>
<li><a href="http://stackoverflow.com/questions/3143929/why-do-loggers-recommend-using-a-logger-per-class">Stack Overflow: Why do loggers recommend using a logger per class?</a></li>
<li><a href="http://stackoverflow.com/questions/5504148/log4net-configure-to-ignore-messages-from-a-specific-class">Stack Overflow: log4net: Configure to ignore messages from a specific class</a></li>
<li><a href="http://rolf-engelhard.de/2013/03/logging-anti-patterns-part-i/">Rolf Engelhard: Logging Anti-Patterns, Part I</a></li>
</ul>
<h2>Creating XML based templates in log4net</h2>
<h3>Motivation</h3>
<p>
I decided to use <a href="https://logging.apache.org/log4net/">log4net</a> for a recent project I had been working on. It has the concepts of <a href="https://logging.apache.org/log4net/release/manual/configuration.html#loggers">loggers</a> and <a href="https://logging.apache.org/log4net/release/manual/configuration.html#appenders">appenders</a>. Loggers are composed of appenders and a log threshold. Appenders are consumers of logging information and provide specific implementations (e.g. file, email, event log, database). You can configure the loggers and appenders either in the <a href="https://logging.apache.org/log4net/release/manual/configuration.html">application configuration</a> or <a href="http://www.roelvanlisdonk.nl/?p=723">at runtime</a>.
</p>
<p>
Configuring logging in the application configuration provides the most flexibility. It is great being able to change settings on the fly, especially when the application is running in a production environment and redeploying the build is out of the question. This approach does come at the expense of having a lot of logger and appender information in your application configuration, though. No big deal if you only have to configure it once.
</p>
<h3>Why Templates?</h3>
<p>
I had quite a few projects with a lot of redundant logging configuration in each one's application configuration file. Much of the information had a standard form that we wanted uniform across our different projects (e.g. log file name conventions, event log setup, email format). Also, if we updated the logging appender configuration to a new standard, we would need to do it in every project's application configuration file. This is where templates came into play.
</p>
<h3>Writing the Code</h3>
<p>
To cut down the amount of configuration information needed to start a new project with logging and make the configuration more uniform where needed, we offloaded it into the code and left the rest in the application configuration file like so.
</p>
<pre class="brush: xml">
<log4net xsi:noNamespaceSchemaLocation="log4net.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <logger name="LoggerTemplate">
        <appender name="SmtpAppenderTemplate" type="log4net.Appender.SmtpAppender">
            <to value="peter@initech.com" />
            <from value="system@initech.com" />
            <smtpHost value="mail.initech.com" />
            <username value="peter" />
            <password value="abc123" />
        </appender>
    </logger>
    <root>
        <level value="INFO" />
    </root>
</log4net>
</pre>
<p>
We don't use the logger directly but rather as a template for our <a href="https://logging.apache.org/log4net/release/manual/configuration.html#root">root logger</a>. Now we just need to craft a method to consume the template and create the root appenders at runtime.
</p>
<pre class="brush: csharp">
/// <summary>
/// Get appenders matching the logger template name and use them to populate the root appenders at runtime.
/// </summary>
/// <param name="loggerTemplateName">The logger template name found in the application configuration.</param>
public static void ConfigureRootFromTemplate(string loggerTemplateName)
{
    ILog logTemplate = LogManager.GetLogger(loggerTemplateName);
    if (logTemplate == null)
    {
        throw new ArgumentException(
            String.Format(
                "Logger template {0} not found in log4net configuration. Make sure there is a " +
                "logger in the log4net configuration with the name {0}.",
                loggerTemplateName),
            "loggerTemplateName");
    }

    IAppender[] appenderTemplates = logTemplate.Logger.Repository.GetAppenders();
    var smtpAppenderTemplate = appenderTemplates.FirstOrDefault(a => a is SmtpAppender) as SmtpAppender;
    if (smtpAppenderTemplate == null)
    {
        throw new ArgumentException(
            String.Format(
                "SmtpAppender template not found in log4net configuration. Make sure there is an " +
                "SmtpAppender in the log4net {0} logger.",
                loggerTemplateName),
            "loggerTemplateName");
    }
    // Can repeat the above pattern with other appenders as well.

    // Create appenders using the template information from above.
    AddAppendersToRootAndConfigure(
        new AppenderCollection
        {
            // Put your created appenders here.
        });
}

private static void AddAppendersToRootAndConfigure(AppenderCollection appenders)
{
    // Get the log repository.
    var hierarchy = (Hierarchy)log4net.LogManager.GetRepository();
    // Get the root logger.
    Logger rootLogger = hierarchy.Root;
    foreach (var appender in appenders)
    {
        // Add all the appenders and activate.
        rootLogger.AddAppender(appender);
        ((AppenderSkeleton)appender).ActivateOptions();
    }
    // Flag the root logger as configured with the new appender information.
    rootLogger.Hierarchy.Configured = true;
}
</pre>
<p>
Then just call the configuration method at the application's startup.
</p>
<pre class="brush: csharp">
class Program
{
    /// <summary>
    /// The main entry point for the application.
    /// </summary>
    static void Main()
    {
        // Other startup configuration code.
        log4net.Config.XmlConfigurator.Configure(); // Load the application configuration information.
        Log.ConfigureRootFromTemplate("LoggerTemplate");
        // More startup configuration code.
    }
}
</pre>
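<p>
The overall shape, one shared template definition consumed at startup to configure the root, can be sketched with Python's logging module (purely illustrative; the template name mirrors the configuration above, and a NullHandler stands in for a real SMTP appender):
</p>

```python
import logging

# Shared template settings, analogous to the LoggerTemplate section above.
TEMPLATES = {
    "SmtpAppenderTemplate": {"to": "peter@initech.com", "host": "mail.initech.com"},
}

def configure_root_from_template(name):
    """Build a handler from the named template and attach it to the root logger."""
    if name not in TEMPLATES:
        raise KeyError("Template %s not found in configuration." % name)
    # A real setup would build an SMTPHandler from the template settings.
    handler = logging.NullHandler()
    handler.set_name(name)
    logging.getLogger().addHandler(handler)

configure_root_from_template("SmtpAppenderTemplate")
print([h.name for h in logging.getLogger().handlers])
```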
<h3>Considerations</h3>
<p>
I don't recommend using this approach in all cases. It definitely cuts down the amount of application configuration needed, but at the cost of hiding information, since it has been moved into code. Also, it may not be obvious to an uninitiated developer what your application configuration is doing, especially since this template approach is not encoded into the structure of log4net's XML. Still, if you have many projects and need to effect changes to logging across all of them, this may be a good solution for you.
</p>
<h3>More Reading & Resources</h3>
<ul>
<li><a href="http://c2.com/cgi/wiki?LoggingBestPractices">C&C Inc: Logging Best Practices</a></li>
<li><a href="http://www.beefycode.com/post/Log4Net-Recommended-Practices-pt-1-Your-Code.aspx">beefycode: Log4Net Recommended Practices pt 1: Your Code</a></li>
<li><a href="http://www.jayway.com/2011/06/13/a-nice-basic-log4net-setup/">JayWay: A nice, basic log4net setup</a></li>
<li><a href="http://stackoverflow.com/questions/3143929/why-do-loggers-recommend-using-a-logger-per-class">Stack Overflow: Why do loggers recommend using a logger per class?</a></li>
<li><a href="http://stackoverflow.com/questions/5504148/log4net-configure-to-ignore-messages-from-a-specific-class">Stack Overflow: log4net: Configure to ignore messages from a specific class</a></li>
<li><a href="http://stackoverflow.com/questions/571876/best-way-to-dynamically-set-an-appender-file-path">Stack Overflow: Best way to dynamically set an appender file path</a></li>
</ul>
<h2>Using type converters for JSON.NET</h2>
<h3>Motivation</h3>
<p>
I had these <a href="http://en.wikipedia.org/wiki/Flyweight_pattern">flyweights</a> that added a lot of overhead to the <a href="http://en.wikipedia.org/wiki/Serialization">serialization</a> process. They weren't really needed in the serialized payload either. In fact, I could recreate the flyweight in memory from just a single property on the object.
</p>
<pre class="brush: csharp">
public class FlyweightObject
{
    public string Key { get; private set; }
    public string AProperty { get; private set; }
    public string AnotherProperty { get; private set; }
    public string YetAnotherProperty { get; private set; }
    // ... Lots of properties.

    // Overridden GetHashCode and Equals methods to make equality by Key.
}
</pre>
<pre class="brush: csharp">
public class FlyweightObjectFactory
{
    public static FlyweightObjectFactory Instance { get; private set; }

    // Singleton initialization code.

    public FlyweightObject GetObject(string key)
    {
        // Get from dictionary, or create object and add to dictionary.
        // Return object from dictionary.
    }
}
</pre>
<p>
Nothing out of the ordinary here. However, these flyweights were also used as keys in a dictionary I was serializing. This was a problem because <a href="http://www.newtonsoft.com/json/help/html/SerializationGuide.htm#Dictionarys">only scalars can be used when serializing dictionaries with JSON.NET</a>. The documentation does mention that I can use a <a href="https://msdn.microsoft.com/en-us/library/system.componentmodel.typeconverter%28v=vs.110%29.aspx">type converter</a>, though.
</p>
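<p>
The round trip can be sketched with Python's json module, which has the same restriction that dictionary keys must be scalars (illustrative; the Flyweight class here is a hypothetical analogue): collapse each key to its scalar form when serializing, and rebuild it through the factory when deserializing.
</p>

```python
import json

class Flyweight:
    _cache = {}

    def __init__(self, key):
        self.key = key

    @classmethod
    def get(cls, key):
        # Factory: one shared instance per key.
        if key not in cls._cache:
            cls._cache[key] = cls(key)
        return cls._cache[key]

# Serialize: collapse flyweight keys to their scalar form.
table = {Flyweight.get("AAPL"): 1, Flyweight.get("MSFT"): 2}
payload = json.dumps({fw.key: count for fw, count in table.items()})

# Deserialize: rebuild the flyweight keys through the factory.
restored = {Flyweight.get(k): v for k, v in json.loads(payload).items()}
print(restored[Flyweight.get("AAPL")])  # 1
```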
<h3>Serializing</h3>
<p>
Let's focus on serializing this object to a <a href="http://en.wikipedia.org/wiki/Primitive_data_type">scalar</a> first. The <a href="http://www.newtonsoft.com/json/help/html/SerializationGuide.htm">JSON.NET serialization guide</a> mentions that I can override the <a href="https://msdn.microsoft.com/en-us/library/system.object.tostring%28v=vs.110%29.aspx"><code>Object.ToString</code></a> method. So let's do that.
</p>
<pre class="brush: csharp">
public class FlyweightObject
{
    public string Key { get; private set; }
    public string AProperty { get; private set; }
    public string AnotherProperty { get; private set; }
    public string YetAnotherProperty { get; private set; }
    // ... Lots of properties.

    // Overridden GetHashCode and Equals methods to make equality by Key.

    public override string ToString()
    {
        return this.Key;
    }
}
</pre>
<p>
I'm done, right? Who needs type converters? Well, this doesn't help when I need to deserialize. That is where type converters come into play.
</p>
<h3>Implementing a Type Converter</h3>
<p>
First we have to provide the concrete implementation of our type converter class for the flyweight.
</p>
<pre class="brush: csharp">
internal class FlyweightObjectConverter
    : TypeConverter
{
    /// <summary>
    /// Returns whether this converter can convert an object of the given type to the type of this converter, using the specified context.
    /// </summary>
    /// <returns>
    /// true if this converter can perform the conversion; otherwise, false.
    /// </returns>
    /// <param name="context">An <see cref="T:System.ComponentModel.ITypeDescriptorContext"/> that provides a format context. </param>
    /// <param name="sourceType">A <see cref="T:System.Type"/> that represents the type you want to convert from. </param>
    public override bool CanConvertFrom(ITypeDescriptorContext context, Type sourceType)
    {
        return sourceType == typeof(string) || base.CanConvertFrom(context, sourceType);
    }

    /// <summary>
    /// Converts the given object to the type of this converter, using the specified context and culture information.
    /// </summary>
    /// <returns>
    /// An <see cref="T:System.Object"/> that represents the converted value.
    /// </returns>
    /// <param name="context">An <see cref="T:System.ComponentModel.ITypeDescriptorContext"/> that provides a format context. </param>
    /// <param name="culture">The <see cref="T:System.Globalization.CultureInfo"/> to use as the current culture. </param>
    /// <param name="value">The <see cref="T:System.Object"/> to convert. </param><exception cref="T:System.NotSupportedException">The conversion cannot be performed. </exception>
    public override object ConvertFrom(ITypeDescriptorContext context, CultureInfo culture, object value)
    {
        return FlyweightObjectFactory.Instance.GetObject((string)value);
    }
}
</pre>
<p>
Then attribute the convertible class with the implementation.
</p>
<pre class="brush: csharp">
[TypeConverter(typeof(FlyweightObjectConverter))]
public class FlyweightObject
{
    public string Key { get; private set; }
    public string AProperty { get; private set; }
    public string AnotherProperty { get; private set; }
    public string YetAnotherProperty { get; private set; }
    // ... Lots of properties.

    // Overridden GetHashCode and Equals methods to make equality by Key.
}
</pre>
<p>
Good to go. Now I can deserialize the flyweights. The JSON.NET secret sauce can be found in <a href="https://github.com/JamesNK/Newtonsoft.Json/blob/master/Src/Newtonsoft.Json/Utilities/ConvertUtils.cs">the static method <code>Newtonsoft.Json.Utilities.ConvertUtils.TryConvertInternal</code></a>. This method is used by the internal reader class for its own deserialization method and consequently for the final Newtonsoft.Json.JsonSerializer class, which is the lynchpin for all the serialization helper methods in the JSON.NET ecosystem.
</p>
<h3>Considerations</h3>
<p>
Instead of overriding the <code>Object.ToString</code> method, I could implement the <code>TypeConverter.CanConvertTo</code> and <code>TypeConverter.ConvertTo</code> methods. It is really up to you. In my case, I already had the <code>Object.ToString</code> method overridden, since the key property uniquely identifies the object without having to create an implicitly structured string (i.e. delimited properties in a single string, such as "Prop1|Prop2|Prop3").
</p>
<p>
If you only need a type conversion for JSON, you can also use JSON.NET's own <a href="http://www.newtonsoft.com/json/help/html/CustomJsonConverter.htm">custom type converters</a>, although these don't work in the case of dictionary keys.
</p>
<p>
Lastly, if you have a <a href="http://en.wikipedia.org/wiki/Composite_data_type">complex type</a> as your dictionary key, you may want to consider just serializing it as a collection and then deserializing it using a JSON.NET custom converter.
</p>
<h3>More Reading & Resources</h3>
<ul>
<li><a href="https://msdn.microsoft.com/en-us/library/98bbex99%28v=vs.110%29.aspx">MSDN: Type Conversion in the .NET Framework</a></li>
<li><a href="https://msdn.microsoft.com/en-us/library/ayybcxe5.aspx">MSDN: How to: Implement a Type Converter</a></li>
<li><a href="https://json.codeplex.com/workitem/23872">JSON.NET: Add support for IReadOnlyList<T> and IReadOnlyDictionary<TKey, TValue></a> - Provides <code>JsonConverter</code> implementations for read only collections.</li>
</ul>
<h2>Using CC.NET & Gallio for priority based smoke testing</h2>
<h3>Pitfalls in Production</h3>
<p>
Being able to monitor <a href="http://en.wikipedia.org/wiki/Development_environment_(software_development_process)">production</a> services for potential errors is critical, especially if the services are dependent on external resources which may become unavailable or exhibit unexpected behavior. Even if you follow good software development discipline, this is always a source of concern. Think network outages, unannounced third-party API changes, hosted services becoming unavailable, etc.
</p>
<p>
For large software projects, creating a <a href="http://en.wikipedia.org/wiki/Software_testing">testing strategy</a> that involves unit and integration tests helps manage the complexity of the commit-to-deployment workflow. Functional/smoke tests are also good for ensuring critical functionality works as expected. In an environment where your software depends on external resources, though, you need a system of continuous monitoring that runs these smoke tests.
</p>
<h3>Monitoring Confusion</h3>
<p>
At <a href="https://www.xignite.com/">Xignite</a>, we use <a href="https://code.google.com/p/mb-unit/">Gallio</a> for our testing framework and <a href="http://www.cruisecontrolnet.org">CC.NET</a> for our continuous integration. I used these tools for my production smoke tests but soon realized that not all tests were equal. Getting paged at 2am for a test failure that is not mission critical sucks. Even worse, these lower priority failures can mask high priority ones, since they cause the entire <a href="http://en.wikipedia.org/wiki/Test_fixture">test fixture</a> to fail and require scrutiny from the dev-ops person to ensure that a high priority failure doesn't fall through the cracks.
</p>
<p>
Consider the following test fixture and how it lumps all the different tests together.
</p>
<pre class="brush: csharp">
namespace MyServices.Tests.Smoke
{
    using Gallio.Framework;
    using Gallio.Model;
    using MbUnit.Framework;

    [TestFixture]
    public class OneOfMyServicesTests
    {
        [SetUp]
        public void Setup()
        {
            // Setup an individual test to run.
        }

        [FixtureSetUp]
        public void TestFixtureSetUp()
        {
            // Configure fixture for testing.
        }

        [FixtureTearDown]
        public void TestFixtureTearDown()
        {
            if (TestContext.CurrentContext.Outcome == TestOutcome.Failed)
            {
                // Send signal to monitoring system that test fixture has failed.
            }
            else if (TestContext.CurrentContext.Outcome == TestOutcome.Passed)
            {
                // Send signal to monitoring system that test fixture has succeeded.
            }
            else
            {
                // Handle some other outcome.
            }
        }

        [Test]
        public void MissionCritical()
        {
            // ...
        }

        [Test]
        public void Important()
        {
            // ...
        }

        [Test]
        public void MinorFunctionality()
        {
            // ...
        }
    }
}
</pre>
<p>
Any test failure makes the entire context outcome fail, whether it came from the mission critical test or the test that affects only minor functionality. I tried looking through the Gallio/MbUnit API, even its <a href="https://github.com/Gallio/mbunit-v3">source</a>, but couldn't find a way to determine which tests failed within a fixture. If anyone knows how to determine this, please let me know.
</p>
<h3>Prioritized Testing</h3>
<p>
What I do know though is that you can inherit the <code><a href="https://github.com/Gallio/mbunit-v3/blob/master/src/MbUnit/MbUnit/Framework/TestAttribute.cs">TestAttribute</a></code> class and override its <code>Execute</code> method. I created a required parameter to specify the priority of the test and then used a <code>LoggedTestingContext</code> class to store all the results.
</p>
<pre class="brush: csharp">
namespace MyServices.Tests
{
    using System;
    using Gallio.Framework.Pattern;
    using MbUnit.Framework;

    public class LoggedTestAttribute
        : TestAttribute
    {
        public const int MinPriority = 1;
        public const int MaxPriority = 3;

        private readonly int priority;

        public int Priority { get { return this.priority; } }

        public LoggedTestAttribute(int priority)
        {
            if (priority < MinPriority || priority > MaxPriority)
            {
                throw new ArgumentException("Priority must be 1, 2, or 3.", "priority");
            }
            this.priority = priority;
        }

        protected override void Execute(PatternTestInstanceState state)
        {
            try
            {
                base.Execute(state);
                LoggedTestingContext.AddTest(this, state.Test, true);
            }
            catch (Exception)
            {
                LoggedTestingContext.AddTest(this, state.Test, false);
                throw;
            }
        }
    }
}
</pre>
<pre class="brush: csharp">
namespace MyServices.Tests
{
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using Gallio.Framework;
    using Gallio.Model.Tree;

    public static class LoggedTestingContext
    {
        private class TestFailure
        {
            public string FullName { get; private set; }
            public LoggedTestAttribute TestAttribute { get; private set; }

            public TestFailure(LoggedTestAttribute testAttribute, Test test)
            {
                this.FullName = test.FullName;
                this.TestAttribute = testAttribute;
            }
        }

        private const int PriorityCount = LoggedTestAttribute.MaxPriority - LoggedTestAttribute.MinPriority + 1;

        private static readonly Dictionary<string, TestFailure> nameToFailure = new Dictionary<string, TestFailure>();

        internal static void AddTest(LoggedTestAttribute testAttribute, Test test, bool passed)
        {
            if (passed)
            {
                return;
            }
            var failure = new TestFailure(testAttribute, test);
            if (!nameToFailure.ContainsKey(failure.FullName))
            {
                nameToFailure.Add(failure.FullName, failure);
            }
        }

        private static bool HasFailed(Test fixtureTest, int priority)
        {
            return fixtureTest.Children
                .Any(c =>
                    nameToFailure.ContainsKey(c.FullName) &&
                    nameToFailure[c.FullName].TestAttribute.Priority == priority);
        }

        public static void LogSmokeTests(Test fixtureTest, string serviceName)
        {
            foreach (var priority in Enumerable.Range(LoggedTestAttribute.MinPriority, PriorityCount))
            {
                if (HasFailed(fixtureTest, priority))
                {
                    // Send signal to monitoring system that test fixture has failed for priority # tests.
                }
                else
                {
                    // Send signal to monitoring system that test fixture has succeeded for priority # tests.
                }
            }
        }
    }
}
</pre>
<p>
Finally, putting it together, I replaced the <code>TestAttribute</code> with the new <code>LoggedTestAttribute</code> and then process the results in the test fixture teardown.
</p>
<pre class="brush: csharp">
namespace MyServices.Tests.Smoke
{
    using Gallio.Framework;
    using Gallio.Model;
    using MbUnit.Framework;

    [TestFixture]
    public class OneOfMyServicesTests
    {
        [SetUp]
        public void Setup()
        {
            // Setup an individual test to run.
        }

        [FixtureSetUp]
        public void TestFixtureSetUp()
        {
            // Configure fixture for testing.
        }

        [FixtureTearDown]
        public void TestFixtureTearDown()
        {
            LoggedTestingContext.LogSmokeTests(TestContext.CurrentContext.Test, "OneOfMyServices");
        }

        [LoggedTest(1)]
        public void MissionCritical()
        {
            // ...
        }

        [LoggedTest(2)]
        public void Important()
        {
            // ...
        }

        [LoggedTest(3)]
        public void AffectsFunctionalityButDoesntRequireImmediateAttention()
        {
            // ...
        }
    }
}
</pre>
<h3>More Reading & Resources</h3>
<ul>
<li><a href="https://code.google.com/p/mb-unit/downloads/list">mb-unit: Downloads</a></li>
<li><a href="http://www.cruisecontrolnet.org/">CruiseControl.NET</a></li>
</ul>
<h2>State behavioral pattern to the rescue</h2>
<h3>The Creeping Problem</h3>
<p>
I recently found myself developing a request-response style system where the lifetime of a request could be interrupted at any moment. For most same-process execution, like your average desktop application, this is less of a concern, but it arises often when dealing with multiple coordinated processes. My case ended up being the latter.
</p>
<p>
One of the ways to make such a workflow resilient is to isolate the steps of the request-response workflow into atomic units, or states. This way, if a step fails, it can be re-executed without having to redo the work that came before it. This is especially helpful when the total resources required are large and there is a higher probability of failure. We can just divvy up the work into states that act like <a href="http://stackoverflow.com/questions/1077412/what-is-an-idempotent-operation">idempotent functions</a>. Below is a great simplification of the actual project I worked on, but I wanted to boil it down to its simplest form, eliminating excessive states, which I have collapsed into the CreateResponse state.
</p>
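<p>As a hypothetical illustration (the function and file path below are invented for this post, not from the actual project), here is what an idempotent state step might look like in Python — running it once or many times leaves the system in the same end state:</p>

```python
import os
import tempfile

def ensure_response_written(request_id, payload):
    # Idempotent step: if a previous (possibly crashed) run already
    # produced the artifact, skip the work and return the same result.
    path = os.path.join(tempfile.gettempdir(), "response-%s.txt" % request_id)
    if os.path.exists(path):
        return path
    with open(path, "w") as f:
        f.write(payload)
    return path
```

<p>Re-executing the step after a failure costs nothing and never duplicates work, which is exactly the property that makes re-enqueuing a failed state safe.</p>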
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhK7jKmfmX6PR-Y90l-g3gy4IMosakHQObBTXGxnppsE1-NCjoYHyCbzBTIzhVw3Pydx_Q3XvnuRaDC0LT4VAv34W58BGjT6FVAAxnFLzWKm2DQRxvVZMbfMb8l7zCdwBauSkLCT0n1haEb/s1600/requestresponse-states.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhK7jKmfmX6PR-Y90l-g3gy4IMosakHQObBTXGxnppsE1-NCjoYHyCbzBTIzhVw3Pydx_Q3XvnuRaDC0LT4VAv34W58BGjT6FVAAxnFLzWKm2DQRxvVZMbfMb8l7zCdwBauSkLCT0n1haEb/s1600/requestresponse-states.png" /></a></div>
<p>
In my original implementation, I modeled the requests as queued items (UserRequest) that I would dequeue and start work on.
</p>
<pre class="brush: csharp">
while (requests.Count > 0) // as Queue<UserRequest>
{
    var request = requests.Dequeue();
    switch (request.State)
    {
        case State.ReceiveRequest:
            if (TryReceiveRequest(request)) request.State = State.CreateResponse;
            break;
        case State.CreateResponse:
            if (TryCreateResponse(request)) request.State = State.SendResponse;
            break;
        case State.SendResponse:
            if (TrySendResponse(request)) request.State = State.ResponseSent;
            break;
        case State.ResponseSent:
            break;
        case State.Faulted:
        default:
            throw new ArgumentOutOfRangeException("request.State");
    }
    if (request.State != State.ResponseSent && request.State != State.Faulted)
        requests.Enqueue(request);
}
</pre>
<p>
Seems simple enough, but in my case the CreateResponse state ended up being fairly computationally intensive and could take anywhere from a few seconds to several minutes. These long delays could be due to the workload of remote processes it was waiting on, temporal failure points like the network, or even the system the process was running on. Another added complexity was that these requests were being serviced in parallel by multiple processes, which might or might not be on the same system. Lastly, actual production-level code never ends up being this simple; you quickly find yourself adding a lot of instrumentation and covering edge cases.
</p>
<pre class="brush: csharp">
while (requests.Count > 0)
{
    var request = requests.Dequeue();
    logger.LogDebug("request.Id = {0}: Request dequeued in state {1}.", request.Id, request.State);
    switch (request.State)
    {
        case State.ReceiveRequest:
            logger.LogDebug("request.Id = {0}: Trying to receive request.", request.Id);
            if (TryReceiveRequest(request)) request.State = State.CreateResponsePart1;
            break;
        case State.CreateResponsePart1:
            logger.LogDebug("request.Id = {0}: Trying to create response for part 1.", request.Id);
            if (TryCreateResponsePart1(request)) request.State = State.CreateResponsePart2;
            break;
        case State.CreateResponsePart2:
            logger.LogDebug("request.Id = {0}: Trying to create response for part 2.", request.Id);
            if (TryCreateResponsePart2(request))
            {
                request.State = State.CreateResponsePart3;
            }
            else
            {
                request.State = State.CreateResponsePart1;
                ExecuteCreateResponsePart2Cleanup();
                logger.LogError("request.Id = {0}: Unexpected failure while evaluating create response part 2.", request.Id);
            }
            break;
        case State.CreateResponsePart3:
            logger.LogDebug("request.Id = {0}: Trying to create response for part 3.", request.Id);
            bool unrecoverable;
            if (TryCreateResponsePart3(request, out unrecoverable))
            {
                request.State = State.SendResponse;
            }
            else
            {
                if (unrecoverable)
                {
                    logger.LogError("request.Id = {0}: Failure is unrecoverable, faulting request.", request.Id);
                    request.State = State.Faulted;
                }
                else
                {
                    request.State = State.CreateResponsePart2;
                }
            }
            break;
        case State.SendResponse:
            logger.LogDebug("request.Id = {0}: Trying to send response.", request.Id);
            if (TrySendResponse(request)) request.State = State.ResponseSent;
            break;
        case State.ResponseSent:
            break;
        case State.Faulted:
            logger.LogCritical("request.Id = {0}: Request faulted.", request.Id);
            break;
        default:
            throw new ArgumentOutOfRangeException("request.State");
    }
    logger.LogDebug("request.Id = {0}: Request transitioned to state {1}.", request.Id, request.State);
    if (request.State != State.ResponseSent && request.State != State.Faulted)
    {
        logger.LogDebug("request.Id = {0}: Re-enqueuing request for further evaluation.", request.Id);
        requests.Enqueue(request);
    }
    else
    {
        logger.LogDebug("request.Id = {0}: Request evaluation is complete, not re-enqueuing.", request.Id);
    }
}
</pre>
<p>
What a mess! This code quickly starts getting bloated. In addition, not every unsuccessful state evaluation should be considered exceptional. Maybe the state is polling another process and can't transition until that process is ready. As the result of each state evaluation grows beyond a simple yes/no (true/false), we end up with a state machine where each state can have multiple outbound transitions. This makes for ugly code and too much coupling. All the state evaluation logic is in the same class, inside one huge switch statement. We could get around the extensive logging by using dependency injection, but what do we inject? There is no consistent call site signature to inject into. The ever-growing case statements could be <a href="http://refactoring.com/catalog/extractMethod.html">extracted into their own methods</a>, but then <a href="http://programmers.stackexchange.com/questions/162923/what-defines-code-readability">readability</a> suffers. This sucks.
</p>
<p>
You may be saying, "Well you obviously could solve it by ..." and I would agree with you. This code ugliness could be solved many different ways, and it is intentionally crap for the purpose of this post. The major problems I faced were:
</p>
<ul>
<li>A large state machine object and large code blocks.</li>
<li>Lack of symmetry in state handling.</li>
<li>Multiple method <a href="http://en.wikipedia.org/wiki/Postcondition">postconditions</a> that couldn't be expressed by the boolean return result alone.</li>
<li>Coupling of state transition logic, business logic and diagnostics.</li>
</ul>
<p>
I knew something was wrong but I wasn't quite sure how to solve it without adding more complexity to the system and allowing readability to suffer. As someone who has had to spend hours reading other people's unreadable code, I didn't want to commit the same sin.
</p>
<h3>Looking For A Solution</h3>
<p>
In university, they teach you how to be a good Computer Scientist; you learn complexity analysis, formal languages and the theoretical underpinnings of computation. However, none of this really prepares you to be a software engineer. I could have concocted my own system, but why do that when I can stand on the shoulders of giants?
</p>
<p>
I always read or heard references to the <a href="http://en.wikipedia.org/wiki/Design_Patterns">Gang of Four book</a>, even listened to talks by the original authors and became familiar with some of the more famous patterns (<a href="http://en.wikipedia.org/wiki/Factory_method_pattern">Factory</a> and <a href="http://en.wikipedia.org/wiki/Singleton_pattern">Singleton</a> come to mind). Maybe there was a solution in there. I can't be the first one to come across this simple design problem. So there I found it in the <a href="http://en.wikipedia.org/wiki/State_pattern">State design pattern</a>.
</p>
<div class="separator" style="clear: both; text-align: center;"><a href="http://upload.wikimedia.org/wikipedia/commons/e/e8/State_Design_Pattern_UML_Class_Diagram.svg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://upload.wikimedia.org/wikipedia/commons/e/e8/State_Design_Pattern_UML_Class_Diagram.svg" /></a></div>
<p>
The design is pretty simple. You have a context that is used by the end user, and the states themselves wrapped by the context. The context can have a range of methods that behave differently based on the concrete state type being used at that moment (eg. behavior of a cursor click in a graphics editor). I modified this design, using a single method to abstract workflow and act as a procedural agent for processing multiple state machines.
</p>
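<p>Before diving into my modified version, here is a minimal sketch of the classic pattern's shape (in Python, with invented state names unrelated to the project): the context drives transitions while each concrete state owns its behavior.</p>

```python
class State:
    def execute(self):
        """Return the next state to transition to."""
        raise NotImplementedError

class DoneState(State):
    def execute(self):
        return self  # Final state transitions to itself.

class WorkingState(State):
    def execute(self):
        # Business logic for this state lives here, decoupled from
        # every other state and from the transition driver.
        return DoneState()

class Context:
    def __init__(self, state):
        self.state = state

    def execute(self):
        # The context only drives transitions; it doesn't care which
        # concrete state machine it is running.
        self.state = self.state.execute()
        return self
```

<p>The single execute-style method is the modification: instead of many behavior methods on the context, one method advances whatever state machine the context wraps.</p>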
<h3>The Code</h3>
<p>
The first step was to construct a state object that will be the super type to all of my concrete states.
</p>
<pre class="brush: csharp">
public abstract class StateBase
{
    // Let the concrete type decide what the next transition state will be.
    protected abstract StateBase OnExecute();

    public StateBase Execute()
    {
        // Can add diagnostic information here.
        return this.OnExecute();
    }
}
</pre>
<p>
Next I need a context class that can create and run the state machine.
</p>
<pre class="brush: csharp">
public abstract class StateContextBase
{
    private StateBase state;

    protected abstract StateBase OnCreate();
    protected abstract StateBase OnExecuted(StateBase nextState);
    protected abstract bool OnIsRunning(StateBase state);

    public StateContextBase(StateBase state)
    {
        this.state = state;
    }

    public StateContextBase Execute()
    {
        // Need to create the state machine from something.
        if (this.state == null)
        {
            // We will get to this later.
            this.state = this.OnCreate();
        }
        // Let the concrete context decide what to do after a state transition.
        this.state = this.OnExecuted(state.Execute());
        return this;
    }

    public bool IsRunning()
    {
        // Have the concrete type tell us when it is in the final state.
        return this.OnIsRunning(this.state);
    }
}
</pre>
<p>
Glossing over the details for now, what will this look like at the application's entry point?
</p>
<pre class="brush: csharp">
class Program
{
    static void Main(string[] args)
    {
        // Will need to get it from somewhere but won't worry about this for now.
        var requests = Enumerable.Empty<StateContextBase>();
        // Can be changed to false on an exit call.
        var running = true;
        while (running)
        {
            requests = requests
                .Where(r => r.IsRunning())
                .Select(r => r.Execute())
                .ToArray(); // Force evaluation; a deferred query would never run.
        }
    }
}
</pre>
<p>
That is beautiful! All I see is the state machine decider logic and I don't even need to be concerned with what type of state machines are running.
</p>
<p>
So let's dive into the details. First, there is the creation of the state in memory. We have to get this from somewhere, so let's add another abstraction on top of our StateBase super type: something that can be persisted in case the process crashes and can be accessed across many systems (eg. a database).
</p>
<p>
In my case, I used the <a href="http://en.wikipedia.org/wiki/Entity_Framework">Entity Framework</a> <a href="http://en.wikipedia.org/wiki/Object-relational_mapping">ORM</a>, which is based off of the <a href="http://martinfowler.com/eaaCatalog/unitOfWork.html">unit of work</a> and <a href="http://martinfowler.com/eaaCatalog/repository.html">repository</a> design patterns. There is a context (DataContext) that I will get my model object (UserRequest) from to figure out the current state. A unique key (UserRequest.Id : Guid) will be used to identify the persisted object. We won't concern ourselves as to why this is just unique and not an identity key (that could be in another post) but it basically comes down to the object's initial creation at runtime not relying on any persistence store for uniqueness.
</p>
<pre class="brush: csharp">
public class DataContext
    : System.Data.Entity.DbContext
{
    public DbSet<UserRequest> UserRequests { get; set; }

    public DataContext()
        : base("name=DataContext")
    {
    }
}
</pre>
<pre class="brush: csharp">
public abstract class PersistedStateBase<TEntity>
    : StateBase
    where TEntity : class
{
    private Guid id;

    protected abstract StateBase OnExecuteCommit(DataContext context, Guid id, TEntity entity);
    protected abstract TEntity OnExecuteCreate(DataContext context, Guid id);
    protected abstract StateBase OnExecuteRollback(DataContext context, Guid id, TEntity entity);

    public PersistedStateBase(Guid id)
    {
        this.id = id;
    }

    protected override StateBase OnExecute()
    {
        // Also consider exceptions thrown by DataContext.
        StateBase nextState = this;
        using (var context = new DataContext())
        {
            TEntity entity = null;
            try
            {
                entity = this.OnExecuteCreate(context, this.id);
                nextState = this.OnExecuteCommit(context, this.id, entity);
                context.SaveChanges();
            }
            catch (Exception ex)
            {
                // Handle exception.
                nextState = this.OnExecuteRollback(context, this.id, entity);
            }
        }
        return nextState;
    }
}
</pre>
<p>
The model object (UserRequest, our entity type) will hold the state as an enumeration (UserRequest.State) and contain all the data needed for processing through the state machine.
</p>
<pre class="brush: csharp">
public enum UserRequestState
{
    None = 0,
    Receive = 1,
    CreateResponse = 3,
    SendResponse = 4,
    ResponseSent = 5,
    Faulted = -1,
}
</pre>
<pre class="brush: csharp">
[DataContract]
public class UserRequest
{
    [DataMember]
    public Guid Id { get; private set; }

    [DataMember]
    public UserRequestState State { get; private set; }

    // Other properties here like the location of the user request and other metadata.

    protected UserRequest() // EF requires a non-private parameterless constructor to create the POCO proxy.
    {}

    public UserRequest(Guid id, UserRequestState state)
    {
        this.Id = id;
        this.State = state;
    }
}
</pre>
<p>
Now let's implement our first state using the types we have created.
</p>
<pre class="brush: csharp">
public class ReceiveState
    : PersistedStateBase<UserRequest>
{
    public ReceiveState(Guid id)
        : base(id)
    {}

    protected override StateBase OnExecuteCommit(DataContext context, Guid id, UserRequest entity)
    {
        var successful = false;
        var faulted = false;
        // Receive user request and decide whether successful, unsuccessful with retry or
        // unrecoverable/faulted.
        if (successful)
        {
            return new CreateResponseState(id);
        }
        else
        {
            return faulted ? new FaultedState(id) : this;
        }
    }

    protected override UserRequest OnExecuteCreate(DataContext context, Guid id)
    {
        // Get model object
        return context.UserRequests.Find(id);
    }

    protected override StateBase OnExecuteRollback(DataContext context, Guid id, UserRequest entity)
    {
        // Rollback any changes possibly made in the OnExecuteCommit method and attempt recovery,
        // if possible, in this method. For this example, we will just return the current state.
        return this;
    }
}
</pre>
<p>
We also need to make our state context concrete with the type below. This tends to require more wiring, since type-per-state doesn't really translate well in an ORM. This class could be further simplified with attributes on the state types, designating the enumeration value they map to.
</p>
<pre class="brush: csharp">
public class UserRequestContext
    : StateContextBase
{
    private static Dictionary<Type, UserRequestState> typeToDbState;
    private static bool databaseRead;

    public Guid Id { get; private set; }

    static UserRequestContext()
    {
        databaseRead = false;
        typeToDbState = new Dictionary<Type, UserRequestState>()
        {
            { typeof(ReceiveState), UserRequestState.Receive },
            { typeof(CreateResponseState), UserRequestState.CreateResponse },
            { typeof(SendResponseState), UserRequestState.SendResponse },
            { typeof(ResponseSentState), UserRequestState.ResponseSent },
            { typeof(FaultedState), UserRequestState.Faulted },
        };
    }

    public UserRequestContext(Guid id)
        : base(null)
    {
        this.Id = id;
    }

    public static IEnumerable<Guid> GetRunningIds()
    {
        if (UserRequestContext.databaseRead)
        {
            var ids = Enumerable.Empty<Guid>(); // Get from message queue.
            return ids;
        }
        else
        {
            using (var dataContext = new DataContext())
            {
                var ids = dataContext.UserRequests
                    .Where(u =>
                        u.State != UserRequestState.ResponseSent &&
                        u.State != UserRequestState.Faulted)
                    .Select(u => u.Id)
                    .ToArray(); // Force evaluation.
                UserRequestContext.databaseRead = true;
                return ids;
            }
        }
    }

    protected override bool OnIsRunning(StateBase state)
    {
        // ResponseSent and Faulted are the final states.
        return !(state is ResponseSentState || state is FaultedState);
    }

    protected override StateBase OnCreate()
    {
        using (var dataContext = new DataContext())
        {
            // Maps persisted state enumeration to runtime types.
            var entity = dataContext.UserRequests.Find(this.Id);
            switch (entity.State)
            {
                case UserRequestState.Receive:
                    return new ReceiveState(this.Id);
                case UserRequestState.CreateResponse:
                    return new CreateResponseState(this.Id);
                case UserRequestState.SendResponse:
                    return new SendResponseState(this.Id);
                case UserRequestState.ResponseSent:
                    return new ResponseSentState(this.Id);
                case UserRequestState.Faulted:
                    return new FaultedState(this.Id);
                default:
                    throw new ArgumentOutOfRangeException();
            }
        }
    }

    protected override StateBase OnExecuted(StateBase nextState)
    {
        // Run any other deciding logic in here that is independent
        // of the states themselves (eg. logging, perf counters).
        return nextState;
    }
}
</pre>
<p>
Finally, let's come full circle and show what the application entry point will look like once all is said and done.
</p>
<pre class="brush: csharp">
class Program
{
    static void Main(string[] args)
    {
        var requests = Enumerable.Empty<StateContextBase>();
        var running = true;
        while (running)
        {
            requests = requests
                .Where(r => r.IsRunning())
                // Append new user requests found.
                .Concat((IEnumerable<StateContextBase>)UserRequestContext
                    .GetRunningIds()
                    .Select(i => new UserRequestContext(i)))
                .Select(r => r.Execute())
                .ToArray(); // Force evaluation; a deferred query would never run.
        }
    }
}
</pre>
<h3>Ahhhh Yeah...</h3>
<p>
After fleshing it all out, I really got that satisfying feeling you get as a software engineer when you know that you made the right design decisions. I isolated my business logic into its own types (eg. ReceiveState), separated it from the state machine transition logic, added symmetrical handling of persistence logic by layering it on top of the state type (PersistedStateBase) and contained the persistence-runtime bridge (from UserRequest to PersistedStateBase subtypes) in its own type (UserRequestContext). If I want to add more states, I can simply add to the model's state enumeration (UserRequestState) and update the state context (UserRequestContext). If I want to change the transition logic, all I need to do is go to the concrete state type itself (eg. ReceiveState) and feel confident that my variables are all scoped correctly. No coupling, no excessive mutations and no excessive side effects.
</p>
<h3>Using The Right Tool</h3>
<p>
This design pattern isn't for every state machine problem. In simple cases it can definitely be overkill; you can see that a bit of starter code is needed. However, if you find yourself designing a state machine with multiple outbound transitions and final states, this modified pattern could be the right tool for you.
</p>
<h3>More Reading & Resources</h3>
<ul>
<li><a href="http://www.amazon.com/Design-Patterns-Object-Oriented-Addison-Wesley-Professional/dp/0201633612">Amazon: Design Patterns Book</a></li>
<li><a href="http://en.wikipedia.org/wiki/Design_Patterns">Wikipedia: Design Patterns (AKA Gang of Four Book)</a></li>
<li><a href="http://en.wikipedia.org/wiki/State_pattern">Wikipedia: State Pattern</a></li>
<li><a href="http://www.se-radio.net/2007/12/episode-81-interview-erich-gamma/">Software Engineering Radio - Episode 81: Interview with one of the Gang of Four.</a></li>
<li><a href="http://java.dzone.com/articles/design-patterns-state">Design Patterns Uncovered: The State Pattern</a></li>
<li><a href="http://sourcemaking.com/design_patterns/state">State Design Pattern: Code examples in other languages.</a></li>
<li><a href="http://www.tutorialspoint.com/design_pattern/state_pattern.htm">State Pattern: Really simple implementation in Java.</a></li>
</ul>awalsh128http://www.blogger.com/profile/05157295791336552159noreply@blogger.com0tag:blogger.com,1999:blog-6363087137667886940.post-39498938945378930822013-04-14T20:57:00.002-07:002013-04-14T20:57:11.382-07:00Understanding type theory (Part one: Lambda Calculus).<h3>Series Introduction</h3>
<p>I initially was going to put this together as a single post but quickly realized that it would be a bit much to try and put all of the information into one post. To make it more digestible, I decided to break it up into separate parts. This is by no means a comprehensive study of lambda calculus or type theory, but rather a quick introductory series so you can get started. My goal is that, by the end, you will be able to understand enough to start reading papers on your own and have some cursory knowledge of type systems. I am targeting programmers who want a better understanding of the theory behind type systems in compiler design. If you are comfortable programming in functional languages, you will start to see many parallels quickly; however, I will also give examples using an imperative approach to cover programmers who may not immediately appreciate these parallels.</p>
<h3>Mathematical Formalisms</h3>
<p>The mathematical underpinnings for describing computability are most famously attributed to <a href="http://en.wikipedia.org/wiki/Kurt_G%C3%B6del">Kurt Gödel</a>, <a href="http://en.wikipedia.org/wiki/Alan_Turing">Alan Turing</a> and <a href="http://en.wikipedia.org/wiki/Alonzo_Church">Alonzo Church</a>. Church and Turing took different routes to <a href="http://en.wikipedia.org/wiki/Church%E2%80%93Turing_thesis">the same ending conclusion</a> (ie. the <a href="http://plato.stanford.edu/entries/turing-machine/">universal Turing machine</a> and the <a href="http://plato.stanford.edu/entries/lambda-calculus/"><img src="http://latex.codecogs.com/gif.latex?\inline \lambda"/>-calculus</a>), in what is now known as the <a href="http://plato.stanford.edu/entries/church-turing/">Church-Turing thesis</a>. In this post we will focus strictly on Church's work, more specifically his notation. By understanding the lambda calculus syntax, we will start to grasp an idealized model of a programming language and what that means later for learning type systems.</p>
<h3>Notation & Basic Structure</h3>
<p>Let's look at a function that returns the square of its input:</p>
<div align="center"><img src="http://latex.codecogs.com/gif.latex?f(x) = x^2" title="Squared Function" /></div>
<div align="center"><code>int f(int x) { return x * x; }</code></div>
<p>In lambda calculus, we can represent it as:</p>
<div align="center"><img src="http://latex.codecogs.com/gif.latex?\lambda x.*x\;x" title="Squared Function" /></div>
<p>Let's break this down into its parts and start defining those pieces.</p>
<ul>
<li><b>Term</b>: Since x is a variable, it can be considered a term and the whole expression itself is also a term (specifically a lambda abstraction).</li>
<li><b>Bound Variable</b>: Any variable in the term that is a parameter of the function. In this case it is x.</li>
<li><b>Free Variable</b>: The term above has none since any argument to the term would be applied to all variables in it. If the term were to be <img src="http://latex.codecogs.com/gif.latex?\inline \lambda x.*x\;y"/> instead, then y would be our free variable. We will give another example that demonstrates free variables later.</li>
<li><b>Abstraction Symbol</b>: The <img src="http://latex.codecogs.com/gif.latex?\inline \lambda"/> symbol indicates the beginning of the function, followed by the parameter.</li>
<li><b>Function Body</b>: Everything following the dot is part of the function body (ie. <img src="http://latex.codecogs.com/gif.latex?\inline *\;x\;x"/>).</li>
</ul>
<br/>
<h3>Binding & Applying Terms</h3>
<p>Now the switch from our named function (f) to the lambda term was not entirely without loss. The <img src="http://latex.codecogs.com/gif.latex?\inline \lambda"/> symbol is not the function name but rather indicates that the following symbol (x) is a function parameter, just like the period represents the beginning of the function body. This is where the term <a href="http://en.wikipedia.org/wiki/Anonymous_function">anonymous function</a> comes from. We took a named function (f) and anonymized it. Now let's try defining the function without loss of information.</p>
<div align="center"><img src="http://latex.codecogs.com/gif.latex?f \equiv \lambda x.*x\;x" title="Squared Function" /></div>
<p>It can sometimes be taken for granted in programming that a function name is just another type of variable. The above example makes that pretty clear. Much like a program function name directly references the code it needs to execute, we bound the function to a variable (f) that directly references the term. However, a lowercase letter name isn't considered good convention in lambda calculus; terms are normally bound to a capitalized letter in these situations, so it would be better form to call it F instead.</p>
<div align="center"><img src="http://latex.codecogs.com/gif.latex?F \equiv \lambda x.*x\;x" title="Squared Function" /></div>
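<p>Languages with first-class functions make this binding directly expressible. A quick Python sketch of the same idea — the lambda itself is anonymous, and giving it the name F is a separate step:</p>

```python
# F is bound to an anonymous squaring function, mirroring F ≡ λx.*x x.
F = lambda x: x * x
```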
<p>That's better. What about function pointers? How would they be expressed in the lambda calculus? Let's define some function pointer (G), our previous function (F) and bind them in the C language:</p>
<pre class="brush:c">
int (*G)(int);
int F(int x) { return x * x; }

int main(void)
{
    G = &F;
    (*G)(2);
}
</pre>
<p>This is generally equivalent to:</p>
<div align="center"><img src="http://latex.codecogs.com/gif.latex?G \equiv \lambda g.g\;\land\;F \equiv \lambda x.*x\;x\;:\;GF2 = 4" /></div>
<p>I could have written our <img src="http://latex.codecogs.com/gif.latex?GF2" /> term as <img src="http://latex.codecogs.com/gif.latex?\inline G(F(2))" /> but this is implicit in the lambda calculus. Terms are always <a href="http://en.wikipedia.org/wiki/Associativity">applied from left to right</a>; first F is applied to G and then 2 is applied to that. Having these semantics eliminates the need for excessive parentheses. Now let's break down the application of the terms in steps.</p>
<div align="center"><img src="http://latex.codecogs.com/gif.latex?
\begin{align*}
GF2&\;\;\;\text{Given}\\
\lambda g.g(\lambda x.*x\;x)2&\;\;\;\text{Resolve F and G}\\
(\lambda x.*x\;x)2&\;\;\;\text{Apply F to G}\\
4&\;\;\;\text{Apply 2 to GF}
\end{align*}" /></div>
<p>The first line is like the declarations in our C program (lines 1 and 2). The second line is akin to when we assign the address of the function F to G (line 6) and dereference G below it (line 7). The third line is when we apply the value 2 to the function (line 7).</p>
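<p>The left-to-right application can also be checked directly in Python, since both terms are just functions:</p>

```python
G = lambda g: g      # G ≡ λg.g, playing the role of the function pointer.
F = lambda x: x * x  # F ≡ λx.*x x

# GF2: apply F to G, then apply 2 to the result.
result = G(F)(2)
```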
<h3>Multiple Parameters</h3>
<p>You may have noticed that all of our terms only accept a single parameter. This is called the <a href="http://en.wikipedia.org/wiki/Currying">curried form</a> or <a href="http://en.wikipedia.org/wiki/Moses_Sch%C3%B6nfinkel">Schönfinkeled</a> form if you want to get all crazy about it. You may have heard this mentioned before when people talk about <a href="http://en.wikipedia.org/wiki/Closure_(computer_science)">closures</a>. Since we can't demonstrate currying in C (nested functions aren't supported), let's do it in Python:</p>
<pre class="brush:python">
# uncurried form
def f(x, y):
    return x

# curried form using named functions
def g(x):
    def h(y):
        return x
    return h

# curried form using lambda functions
h = lambda x: lambda y: x

f(2,1)
g(2)(1)
h(2)(1)
</pre>
<p>I took a lot of liberties comparing the lambda terms earlier against the C program, which is why I wrote that they were generally equivalent. You may have noticed how line 12 looks almost exactly like the actual lambda calculus notation form of <img src="http://latex.codecogs.com/gif.latex?\inline \lambda x.\lambda y.x"/>. This is why programmers who know functional programming have a much easier time learning lambda calculus and type systems. In Python, the <a href="http://pythonconquerstheuniverse.wordpress.com/2009/10/03/static-vs-dynamic-typing-of-programming-languages/">type system is dynamic</a> in the sense that a single variable can be rebound to a different type during its lifetime. Also, lambda functions are explicitly available, where they were not in C. These kinds of features are what differentiate languages in terms of expressibility with respect to the lambda calculus.</p>
<h3>Beta Reductions</h3>
<p>You might not have known it, but by applying different terms, you were performing what is called a beta reduction. This is the process of rewriting reducible expressions (beta-redexes) until the term reaches its normal form. So in the case of GF, the normal form is <img src="http://latex.codecogs.com/gif.latex?\inline \lambda x.*x\;x"/>. Let's try something with more terms and reduce it.</p>
<div align="center"><img src="http://latex.codecogs.com/gif.latex?
\begin{align*}
&(\lambda w.w)((\lambda x.xx)(\lambda y.y))\\
&(\lambda w.w)((\lambda x.xx)[x := \lambda y.y])\\
&(\lambda w.w)((\lambda y.y)(\lambda y.y))\\
&(\lambda w.w)((\lambda y.y)[y := \lambda y.y])\\
&(\lambda w.w)\lambda y.y\\
&(\lambda w.w)[w := \lambda y.y]\\
&\lambda y.y\\
\end{align*}" /></div>
<p>That normalized nicely. Let's do another one.</p>
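<p>Because every term in that reduction is just a function, we can replay it with Python lambdas and confirm the result is the identity <img src="http://latex.codecogs.com/gif.latex?\inline \lambda y.y"/>:</p>

```python
identity_w = lambda w: w     # λw.w
self_apply = lambda x: x(x)  # λx.xx
identity_y = lambda y: y     # λy.y

# (λw.w)((λx.xx)(λy.y)) evaluates to λy.y itself.
result = identity_w(self_apply(identity_y))
```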
<div align="center"><img src="http://latex.codecogs.com/gif.latex?
\begin{align*}
&(\lambda x.xx)(\lambda y.yy)\\
&(\lambda x.xx)[x := \lambda y.yy]\\
&(\lambda y.yy)(\lambda y.yy)\\
&(\lambda y.yy)[y := \lambda y.yy]\\
&(\lambda y.yy)(\lambda y.yy)\\
&(\lambda y.yy)[y := \lambda y.yy]\\
&(\lambda y.yy)(\lambda y.yy)\\
&\ldots\\
\end{align*}" /></div>
<p>This is a case where we could keep applying the terms ad nauseam, but there is no need: the repeating term <img src="http://latex.codecogs.com/gif.latex?\inline (\lambda y.yy)(\lambda y.yy)"/> reduces to itself, so it will never reach a normal form and there is no point in going any further. Now let's do another example using free variables.</p>
<div align="center"><img src="http://latex.codecogs.com/gif.latex?
\begin{align*}
&(\lambda x.x)(\lambda z.z)(\lambda w.t)5\\
&(\lambda x.x)[x := \lambda z.z](\lambda w.t)5\\
&(\lambda z.z)(\lambda w.t)5\\
&(\lambda z.z)[z := \lambda w.t]5\\
&(\lambda w.t)5\\
&(\lambda w.t)[w := 5]\\
&t
\end{align*}" /></div>
<p>I mentioned at the beginning I would give an example using free variables and here it is (ie. t). So what happened to 5? Well, it was bound to w and w does not appear in the function body of the last lambda term (<img src="http://latex.codecogs.com/gif.latex?\inline \lambda w.t"/>), so it didn't survive. But t did survive, because t is not bound to anything, making it a free variable.</p>
<p>Beta reductions are a lot like refactoring some complex code down to its simpler and functionally equivalent form.</p>
<h3>Untyped Lambda Calculus</h3>
<p>So far we haven't really considered types. In the examples where multiplication was required, we went beyond the simple untyped system to demonstrate the similarities of lambda calculus with other more familiar systems of computation. In the untyped system, the only type is the <a href="http://en.wikipedia.org/wiki/Function_type">function type</a>. Our previous example could still be considered in the untyped system if we just interpret 5 as a free variable; attaching no meaning to the variable itself.</p>
<p>As developers, our building blocks are binary logical operations. Let's start by defining some of these essentials in the lambda calculus.</p>
<p>Let T be true and F be false:</p>
<div align="center"><img src="http://latex.codecogs.com/gif.latex?
\begin{align*}
&T \equiv \lambda x.\lambda y.x\\
&F \equiv \lambda x.\lambda y.y\\
\end{align*}" /></div>
<p>Now let's define some binary operations:</p>
<div align="center"><img src="http://latex.codecogs.com/gif.latex?
\begin{align*}
&AND \equiv \lambda x.\lambda y.xyF\\
&OR \equiv \lambda x.\lambda y.xTy\\
&NOT \equiv \lambda x.xFT\\
\end{align*}" /></div>
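<p>These encodings can be sanity-checked in Python, where each Church boolean and operator is an ordinary curried function (the to_bool helper is just for reading results back out):</p>

```python
T = lambda x: lambda y: x           # T ≡ λx.λy.x
F = lambda x: lambda y: y           # F ≡ λx.λy.y

AND = lambda x: lambda y: x(y)(F)   # AND ≡ λx.λy.xyF
OR = lambda x: lambda y: x(T)(y)    # OR ≡ λx.λy.xTy
NOT = lambda x: x(F)(T)             # NOT ≡ λx.xFT

to_bool = lambda b: b(True)(False)  # Convert a Church boolean to a Python bool.
```

<p>For example, to_bool(AND(T)(F)) evaluates to False and to_bool(OR(T)(F)) evaluates to True, matching the reductions worked out below.</p>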
<p>To convince ourselves that these will give the desired results, let's start with AND:</p>
<div align="center"><img src="http://latex.codecogs.com/gif.latex?
\begin{align*}
&AND\;T\;F\\
&(\lambda x.\lambda y.xyF)TF\\
&(\lambda x.\lambda y.xyF)[x := T]F\\
&(\lambda y.TyF)F\\
&(\lambda y.TyF)[y := F]\\
&TFF\\
&(\lambda x.\lambda y.x)FF\\
&(\lambda x.\lambda y.x)[x := F]F\\
&(\lambda y.F)F\\
&(\lambda y.F)[y := F]\\
&F
\end{align*}" /></div>
<p>Then for good measure, let's do OR as well:</p>
<div align="center"><img src="http://latex.codecogs.com/gif.latex?
\begin{align*}
&OR\;T\;F\\
&(\lambda x.\lambda y.xTy)TF\\
&(\lambda x.\lambda y.xTy)[x := T]F\\
&(\lambda y.TTy)F\\
&(\lambda y.TTy)[y := F]\\
&TTF\\
&(\lambda x.\lambda y.x)TF\\
&(\lambda x.\lambda y.x)[x := T]F\\
&(\lambda y.T)F\\
&(\lambda y.T)[y := F]\\
&T\\
\end{align*}" /></div>
<p>Learning the untyped variant of lambda calculus is a good starting place, because we don't place any domain restrictions on our functions and it is fairly simple to get started defining particular behaviors.</p>
<h3>Coming up next</h3>
<p>Part two will cover type systems and get you familiar with the syntax. If you want to learn lambda calculus in more depth or go through some exercises, check out the links below. Also, please feel free to post comments with questions, suggestions or corrections.</p>
<h3>More Reading & Resources</h3>
<ul>
<li><a href="http://www.cse.iitb.ac.in/~rkj/lambda.pdf">Notes on Untyped Lambda Calculus</a></li>
<li><a href="http://www.cs.man.ac.uk/~hsimmons/BOOKS/lcalculus.pdf">An Introduction to Lambda Calculi and Arithmetic</a></li>
<li><a href="http://www.cs.bham.ac.uk/~axj/pub/papers/lambda-calculus.pdf">A short introduction to the Lambda Calculus</a></li>
<li><a href="http://www.mscs.dal.ca/~selinger/papers/papers/lambdanotes.pdf">Lectures Notes on the Lambda Calculus</a></li>
<li><a href="http://www.inf.fu-berlin.de/lehre/WS03/alpi/lambda.pdf">A Tutorial Introduction to the Lambda Calculus</a></li>
<li><a href="http://classes.soe.ucsc.edu/cmps112/Spring03/readings/lambdacalculus/project3.html">Lambda Calculus Tutorial</a></li>
<li><a href="http://www.cs.ru.nl/~herman/PUBS/IntroTT-improved.pdf">An Introduction to Type Theory</a></li>
<li><a href="http://events.math.unipd.it/3wftop/pdf/coquandintro.pdf">A Short Introduction to Type Theory</a></li>
</ul>awalsh128http://www.blogger.com/profile/05157295791336552159noreply@blogger.com0tag:blogger.com,1999:blog-6363087137667886940.post-85602263308208453552013-04-10T18:46:00.000-07:002016-03-30T23:52:15.452-07:00PCAP-NG reader for .NET.<h3>Introduction & Scope</h3>
<p>
The <a href="http://www.winpcap.org/ntar/draft/PCAP-DumpFileFormat.html">PCAP-NG</a> file format is used to store network capture data consumed by great programs like <a href="http://www.wireshark.org/download.html">Wireshark</a>. There are several APIs available for both <a href="http://www.tcpdump.org/">Linux</a> and <a href="http://www.winpcap.org/">Windows</a>, but none ported to .NET. I decided to <a href="https://github.com/awalsh128/PcapngFile/">make one</a> for my own use in building a network traffic replay agent.
</p>
<p>
This code demonstrates an approach to reading in PCAP-NG files. Most of the time I was concerned with the <a href="http://www.winpcap.org/ntar/draft/PCAP-DumpFileFormat.html#sectionepb">enhanced packet block type</a> since it stored the TCP payloads I wanted to replay.
</p>
<h3>Design Considerations</h3>
<p>
All block types derive from a common abstract class (BlockBase). This is useful when you need a collection of blocks with different subtypes, as in the second code segment below. Internally, all classes populate themselves from the underlying binary reader.
</p>
<p>
The reader class can be used to iterate over a single block type.
</p>
<pre class="brush: csharp">
var reader = new PcapngFile.Reader("test.pcapng");
foreach (var packet in reader.EnhancedPackets)
{
   byte[] payload = packet.Data;
}
</pre>
<p>
... or all of them, cast into their parent type.
</p>
<pre class="brush: csharp">
var blocks = new List&lt;PcapngFile.BlockBase&gt;();
while (reader.PeekStoreType() != PcapngFile.BlockType.None)
{
   blocks.Add(reader.ReadBlock());
}
</pre>
<h3>The Code and NuGet Package</h3>
<p>
Get access to the complete project <a href="https://github.com/awalsh128/PcapngFile/">here</a> on my GitHub account. If you have any suggestions or corrections, feel free to leave a comment. The assembly is also available via NuGet <a href="https://www.nuget.org/packages/PcapngFile/1.0.1">here</a>.
</p>
<h3>More Reading & Resources</h3>
<ul>
<li><a href="http://www.winpcap.org/ntar/draft/PCAP-DumpFileFormat.html">WinPcap: PCAP-NG File Format</a></li>
<li><a href="http://wiki.wireshark.org/Development/PcapNg">Wireshark: Development - PcapNg</a></li>
<li><a href="http://pcapdotnet.codeplex.com">Pcap.Net: Useful for reading transport payloads.</a></li>
</ul>awalsh128http://www.blogger.com/profile/05157295791336552159noreply@blogger.com21tag:blogger.com,1999:blog-6363087137667886940.post-25414590595816637812013-03-26T21:21:00.003-07:002013-03-26T21:21:59.669-07:00Automated combinatorial and spot testing of web service operations.<h3>Introduction & Scope</h3>
<p>When developing a new web service, you may need to do an initial spot test walk through the operations or try to break the service based on the permissible data type values. You could hand craft all the operations, with their respective arguments, into a handful of URLs but that would seem pretty inefficient. I found myself in a similar situation and wanted a programmatic way to walk through the service operations with a specific input data set. To be clear, I am not attempting to recreate a <a href="http://www.soapui.org/">full web service testing application</a> but rather a simple script to spot check responses and catch any bugs in a web service endpoint's handler.</p>
<h3>Investigating the WSDL</h3>
<p>We can automatically flesh out our operations and the message data types by leveraging the endpoint's <a href="http://www.w3.org/TR/wsdl">WSDL</a>. In my case, I was specifically interested in the <a href="http://www.w3.org/TR/wsdl#_request-response">request/response operations</a> available under the <a href="http://www.w3.org/TR/wsdl#_http">HTTP GET verb</a>. We can get the parameter names and argument types by looking at the <a href="http://www.w3.org/TR/wsdl#_types">types section</a> and then the actual operations under a WSDL binding that includes an HTTP binding element with the verb attribute value of GET.</p>
<p>Fortunately, I am dealing with input messages that use just the <a href="http://www.w3.org/TR/xmlschema-2/#built-in-datatypes">built-in data type schema</a> provided by the WSDL specification. This means that no complex types are involved and the testing input domain is somewhat simplified. Also, most of the web service operations use a standard set of parameter names, so I was able to create a smaller input data set based around them.</p>
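<p>As a rough sketch of that WSDL walk, the fragment below picks out operations from any binding whose HTTP binding element has a verb attribute of GET. The namespaces come from the WSDL 1.1 specification; the sample document is hypothetical and heavily trimmed:</p>

```python
# Sketch: list operations bound to HTTP GET in a WSDL 1.1 document.
# The namespace URIs are defined by the WSDL 1.1 specification; the
# sample WSDL below is a hypothetical, trimmed-down document.
import xml.etree.ElementTree as ET

WSDL = "http://schemas.xmlsoap.org/wsdl/"
HTTP = "http://schemas.xmlsoap.org/wsdl/http/"

def get_operations(wsdl_text):
    root = ET.fromstring(wsdl_text)
    names = []
    for binding in root.iter("{%s}binding" % WSDL):
        http = binding.find("{%s}binding" % HTTP)
        if http is None or http.get("verb") != "GET":
            continue  # skip SOAP and HTTP POST bindings
        for op in binding.iter("{%s}operation" % WSDL):
            names.append(op.get("name"))
    return names

sample = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
                         xmlns:http="http://schemas.xmlsoap.org/wsdl/http/">
  <binding name="ServiceHttpGet">
    <http:binding verb="GET"/>
    <operation name="GetStatus"><http:operation location="/GetStatus"/></operation>
  </binding>
</definitions>"""

print(get_operations(sample))  # ['GetStatus']
```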
<h3>The Code</h3>
<p>
Your modifications would be to the <code>test_args</code> dictionary. You can specify a wide range of arguments to feed your operations, keyed by parameter name. The service I was testing against used many of the same parameter names, with the same corresponding domain, across operations.
</p>
<script src="http://gist-it.appspot.com/github/awalsh128/webservices/raw/master/wsdltester.py"></script>
<h3>Conclusion</h3>
<p>Thrown together, the script provides an easy way to spot test a web service by automatically processing the WSDL for a particular binding and running those bound operations against a provided set of input messages for it.</p>
<h3>More Reading & Resources</h3>
<ul>
<li><a href="http://en.wikipedia.org/wiki/Web_Services_Description_Language">Wikipedia: Web Services Description Language</a></li>
<li><a href="http://msdn.microsoft.com/en-us/library/ms996486.aspx">MSDN: Understanding WSDL</a></li>
</ul>awalsh128http://www.blogger.com/profile/05157295791336552159noreply@blogger.com0tag:blogger.com,1999:blog-6363087137667886940.post-40389236177504480402013-01-04T19:11:00.002-08:002013-01-04T19:11:30.599-08:00Text generation using Markov chains.<h3>Introduction</h3>
<p>
If you are already familiar with <a href="http://en.wikipedia.org/wiki/Finite-state_machine">finite automata</a>, then understanding a <a href="http://en.wikipedia.org/wiki/Markov_chain">Markov chain</a> will not be as hard. If you are not familiar with one, you can think of finite automata as a <a href="http://en.wikipedia.org/wiki/Flowchart">flowchart</a>, with states and paths leading from one state to another <a href="http://xkcd.com/627/">based on some decision being made</a>. An example could be a room with a light switch; there are the two states of having the light on or off. Using the light switch will transition you between the two states.
</p>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjob7cJLXQUSu6SEsviEvGDNGNh1wLlvOSJSQvs1knojszaZ9zoInAo8s12PO9ijRBeTiKMqGZdfun9kvLDhQbIHtNpzPZSBvr0eHf4B0hdTEQjtisaKYPHGvik1DWESb-3tUX6nUn4IDzc/s1600/simple-automata.png" imageanchor="1" style="margin-left:1em; margin-right:1em"><img border="0" height="183" width="400" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjob7cJLXQUSu6SEsviEvGDNGNh1wLlvOSJSQvs1knojszaZ9zoInAo8s12PO9ijRBeTiKMqGZdfun9kvLDhQbIHtNpzPZSBvr0eHf4B0hdTEQjtisaKYPHGvik1DWESb-3tUX6nUn4IDzc/s400/simple-automata.png" /></a></div>
<p>
The difference between standard finite automata and Markov chains is that instead of some definite outcome determining each path (eg. yes or no), the transitions are probabilities. Take for example a coin flip: there is a 50% (0.5) chance that it could be either heads or tails.
</p>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEghVPoQLrPaQMzAN0VeE0DAdDiv50UmruDtORrUY5gqc7qrNuzKXUlJEPLMfVwuvTSHM7GrME5vf-NmAk9gm_FdNFcN-NRcsCKQMulbZPi2ZVPJJuhiA4UX8tbfTKTCRqbdhuBIREBJMLRN/s1600/simple-markov.png" imageanchor="1" style="margin-left:1em; margin-right:1em"><img border="0" height="259" width="400" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEghVPoQLrPaQMzAN0VeE0DAdDiv50UmruDtORrUY5gqc7qrNuzKXUlJEPLMfVwuvTSHM7GrME5vf-NmAk9gm_FdNFcN-NRcsCKQMulbZPi2ZVPJJuhiA4UX8tbfTKTCRqbdhuBIREBJMLRN/s400/simple-markov.png" /></a></div>
<h3>An Example</h3>
<p>
Let's explore Markov chains a little deeper with respect to natural languages and take a look at some sample text that will form the <a href="http://en.wikipedia.org/wiki/Text_corpus">corpus</a> of our analysis.
</p>
<q>I like turtles. I like rabbits. I don't like snails.</q>
<p>
This would render a Markov chain as follows:
</p>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiKwthnMrAM9Ij34-X5ObSWq3b6kwuJH_EN7rw90poZTLp3NtN3GcyYE_rC2hQbZcSp_BRiaBltMvAPhRbfcJebqOXmOtkhSiP_bDDdXBSV3Y9Xu5T113fYJFxoU_HUXN9v-MRoF-s90ZJz/s1600/text-markov.png" imageanchor="1" style="margin-left:1em; margin-right:1em"><img border="0" height="345" width="400" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiKwthnMrAM9Ij34-X5ObSWq3b6kwuJH_EN7rw90poZTLp3NtN3GcyYE_rC2hQbZcSp_BRiaBltMvAPhRbfcJebqOXmOtkhSiP_bDDdXBSV3Y9Xu5T113fYJFxoU_HUXN9v-MRoF-s90ZJz/s400/text-markov.png" /></a></div>
<p>The above chain means that, among the sentences:</p>
<ul>
<li>100% (1.0) will start with <i>I</i>.</li>
<li>This will be followed by <i>like</i> for 66% (0.66) of the time and <i>don't</i> for 33% (0.33) of the time.</li>
<li>The word <i>don't</i> will always (100% / 1.0) be followed by <i>like</i>.</li>
<li>Then lastly, <i>turtles</i>, <i>rabbits</i> and <i>snails</i> will all follow <i>like</i> 33% (0.33) of the time.</li>
</ul>
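<p>The probabilities in the list above can be recovered mechanically by counting word-to-word transitions. Below is a minimal sketch, separate from the repository code embedded later in this post; the <code>START</code> sentinel is my own stand-in for the special start state:</p>

```python
# Count word-to-word transitions in a tiny corpus and derive the
# transition probabilities listed above. START is a synthetic state
# marking the beginning of each sentence.
from collections import Counter, defaultdict

START = "^"

def transition_probabilities(corpus):
    counts = defaultdict(Counter)
    for sentence in corpus.split("."):
        words = sentence.split()
        if not words:
            continue
        for prev, word in zip([START] + words, words):
            counts[prev][word] += 1
    # Normalize each state's counts by its total outgoing transitions.
    return {
        state: {word: n / sum(c.values()) for word, n in c.items()}
        for state, c in counts.items()
    }

probs = transition_probabilities("I like turtles. I like rabbits. I don't like snails.")
print(probs[START])   # {'I': 1.0}
print(probs["like"])  # 'turtles', 'rabbits' and 'snails' each 1/3
```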
<p>
The nice thing about Markov chains, with respect to natural language modelling, is that they form <a href="http://en.wikipedia.org/wiki/Dependency_grammar">word dependencies</a> without any knowledge of the language's syntax or semantics. The chain gets created purely based on statistical knowledge extracted from the corpus.
</p>
<h3>Text Generation</h3>
<h2>Random Selection</h2>
<p>
Using the Markov chain example from before and taking the weight of probabilities into account, we can randomly traverse the chain to generate new sentences like:
</p>
<q>I don't like turtles. I like snails.</q>
<p>
Our small corpus does not make for very interesting or new text, though. We could make it more interesting by adding to it or switching to a much larger corpus altogether. In my case, I decided to take all of <a href="https://raw.github.com/awalsh128/nlp/master/sonnets.txt">Shakespeare's sonnets</a> and generate the text from that. You can see some example code that uses it as the corpus below.
</p>
<h2>The Code</h2>
<p>
The Markov chains will be built up for each new sentence it comes across. The critical parts to the <code>MarkovChain</code> object are its:
</p>
<ul>
<li><b>Cardinalities</b>: The cardinality of the sample space for a given state. In our original example, the cardinality of <i>like</i> is 3 since snails, rabbits and turtles follow out of the <i>like</i> state. The cardinality of <i>I</i> would be 2 since <i>don't</i> and <i>like</i> follow.</li>
<li><b>Occurrences</b>: The edge valued occurrences between the different states. For example, a transition from the <i>I</i> to <i>like</i> state occurred twice. The probability can then be easily calculated by taking the number of occurrences on a transition and divide it by the cardinality of the state's sample space. Note, there is also a special start state transition that can be used at the beginning of each new sentence generation.</li>
</ul>
<p>
Using these two parts, a random walk can be made across a chain path, starting from the special start state. Each walk will generate a new sentence based on probabilities of the transitions along the path.
</p>
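<p>That random walk can be sketched compactly. This is a simplified, assumed model (each state maps to a counter of successor occurrences), not the <code>MarkovChain</code> object from the gists below:</p>

```python
# Sketch of sentence generation by a weighted random walk over
# transition occurrence counts. `chain` maps a state to a Counter of
# successors; the walk ends at a state with no outgoing transitions.
import random
from collections import Counter

def generate(chain, start="^", rng=random):
    words = []
    state = start
    while state in chain:
        successors = chain[state]
        # Weighted choice: occurrences over the state's total transitions.
        state = rng.choices(list(successors), weights=list(successors.values()))[0]
        words.append(state)
    return " ".join(words)

chain = {
    "^": Counter({"I": 3}),
    "I": Counter({"like": 2, "don't": 1}),
    "don't": Counter({"like": 1}),
    "like": Counter({"turtles": 1, "rabbits": 1, "snails": 1}),
}
print(generate(chain))  # e.g. "I don't like rabbits"
```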
<br/>
<script src="http://gist-it.appspot.com/github/awalsh128/nlp/raw/master/generation.py"></script>
<script src="http://gist-it.appspot.com/github/awalsh128/nlp/raw/master/test.py"></script>
<br/>
<h3>Additional Resources</h3>
<ul>
<li><a href="https://en.wikipedia.org/wiki/Stochastic_grammar">Wikipedia: Stochastic Grammar</a></li>
<li><a href="http://en.wikipedia.org/wiki/Natural_language_generation">Wikipedia: Natural language generation</a></li>
<li><a href="http://www.cs.princeton.edu/courses/archive/spr05/cos126/assignments/markov.html">Princeton CS206: Markov Model of Natural Language</a></li>
<li><a href="http://www.owlnet.rice.edu/~cz1/prog/markov/markov.html">Markov-chain Text Generator</a></li>
<li><a href="http://www.siggen.org/">ACL Special Interest Group on Natural Language Generation (SIGGEN)</a></li>
</ul>awalsh128http://www.blogger.com/profile/05157295791336552159noreply@blogger.com0tag:blogger.com,1999:blog-6363087137667886940.post-45417423112867595772012-10-03T13:40:00.000-07:002012-10-03T13:42:20.332-07:00Reading and writing large binary data in T-SQL and ADO.NET.<h3>Introduction</h3>
<p>
In some cases, you may want to read/write binary data from/to your relational database. This can be a problem when the data set size exceeds the process's addressable memory (eg. 32-bit processes) or the <a href="http://msdn.microsoft.com/en-us/magazine/cc534993.aspx">managed large object heap</a> cannot load it all without throwing an <a href="http://msdn.microsoft.com/en-us/library/system.outofmemoryexception.aspx">OutOfMemoryException</a>. In those cases, you will need to load the data into memory in chunks. Fortunately, the <a href="http://en.wikipedia.org/wiki/Transact-SQL">T-SQL</a> syntax supports this operation.
</p>
<h3>Latest Standard</h3>
<p>
The old standard for reading and writing raw text data was the <a href="http://msdn.microsoft.com/en-us/library/ms189466.aspx">UPDATETEXT</a>,
<a href="http://msdn.microsoft.com/en-us/library/ms187365.aspx">READTEXT</a> and
<a href="http://msdn.microsoft.com/en-us/library/ms176068.aspx">TEXTPTR</a> commands. These are being obsoleted in favor of the newer <a href="http://msdn.microsoft.com/en-us/library/ms187748.aspx">SUBSTRING</a> command and the addition of the <code>.WRITE</code> argument to the <a href="http://msdn.microsoft.com/en-us/library/ms177523.aspx">UPDATE</a> command. I found the newer syntax much easier to use, requiring fewer parameters. Below is an example schema and implementation.
</p>
<h3>Example Schema</h3>
<p>
For this example, we will use a very simple table setup with a primary key and binary data column.
</p>
<pre class="brush: sql">
CREATE TABLE [dbo].[DataTable] (
[Id] [int] IDENTITY(1,1) NOT NULL,
[Data] [varbinary](max) NOT NULL
)
</pre>
<h3>Example Code</h3>
<p>
The <code>Read</code> method simply provides an offset into the data and the size of the chunk it wishes to read. The same applies for the <code>Write</code> method, except that a data argument is also supplied.
</p>
<pre class="brush: csharp">
private const int ReadChunkSize = 1 << 21; // 2MB Chunks
private const int WriteChunkSize = 1 << 21; // 2MB Chunks
...
public static void Read(string connectionString, string readPath)
{
using (var connection = new SqlConnection(connectionString))
{
int offsetIndex = 0;
var buffer = new byte[1];
var offset = new SqlParameter("@Offset", SqlDbType.Int, 32);
var command = new SqlCommand(@"SELECT SUBSTRING([Data], @Offset, " + ReadChunkSize + ") FROM [dbo].[DataTable] WHERE [Id] = 1", connection);
command.Parameters.Add(offset);
connection.Open();
using (var stream = File.Create(readPath))
{
using (var writer = new BinaryWriter(stream))
{
while (buffer.Length > 0)
{
offset.Value = offsetIndex;
buffer = (byte[])command.ExecuteScalar();
if (buffer.Length > 0)
{
writer.Write(buffer);
offsetIndex += ReadChunkSize;
}
}
}
}
}
}
public static void Write(string connectionString, string writePath)
{
using (var connection = new SqlConnection(connectionString))
{
int offsetIndex = 0;
var buffer = new byte[1];
var offset = new SqlParameter("@Offset", SqlDbType.Int, 32);
var data = new SqlParameter("@Data", SqlDbType.Binary, WriteChunkSize);
var command = new SqlCommand(@"UPDATE [dbo].[DataTable] SET [Data] .WRITE(@Data, @Offset, " + WriteChunkSize + ") WHERE [Id] = 1", connection);
command.Parameters.Add(offset);
command.Parameters.Add(data);
connection.Open();
using (var stream = File.OpenRead(writePath))
{
using (var reader = new BinaryReader(stream))
{
while (buffer.Length > 0)
{
buffer = reader.ReadBytes(WriteChunkSize);
if (buffer.Length > 0)
{
offset.Value = offsetIndex;
data.Value = buffer;
command.ExecuteNonQuery();
offsetIndex += WriteChunkSize;
}
}
}
}
}
}
</pre>
<h3>Performance Considerations</h3>
<p>
When benchmarking my code against the simple database schema, execution time was drastically different if the data was already read into memory. When a read was performed after a write, it took 2 seconds. A read without a write beforehand took about 10 seconds, which is also the time the write took before a read. The chunk size did not seem to have any significant impact when adjusted between 512K and 4MB sizes. Your results may vary though, depending on your system's memory layout.
</p>
<p>
These time benchmarks are very informal and highly dependent on system architecture, network latency and hardware specifications. Although, the relative performance gain between the two benchmarks can be gleaned from these tests.
</p>
<p>
A further performance gain would be to front load the data into memory on a separate thread and process the loaded data into and out of the database on another. This would require some tweaking and knowledge of the system's memory constraints to ensure it doesn't become a performance bottleneck. If properly done, a lot can be gained from not having to block on I/O between every chunk read and write.
</p>awalsh128http://www.blogger.com/profile/05157295791336552159noreply@blogger.com0tag:blogger.com,1999:blog-6363087137667886940.post-65818605485018597952012-09-20T00:29:00.000-07:002012-10-03T13:45:24.158-07:00Understanding compression and Huffman Coding.<h3>Definitions</h3>
<p>
In a lot of texts regarding compression you will run across various conventions of syntax to describe the coding of data. Many of them are used in fields like <a href="http://en.wikipedia.org/wiki/Automata_theory">automata theory</a>, group theory and statistics. Although the field of <a href="http://en.wikipedia.org/wiki/Coding_theory">coding theory</a> itself has <a href="http://en.wikipedia.org/wiki/Timeline_of_information_theory">a long and rich history</a> in the 20th century, much of the terminology is borrowed from the fields that gave rise to it.
</p>
<p>
I find the best way to demonstrate the following definitions is by example. Let us take the text "abbcc" as a sample.
</p>
<ul>
<li style="margin-bottom:10px;">
<b>Alphabet</b>: The set of symbols that we will be encoding.
<ul>
<li>Form: <img src="http://latex.codecogs.com/gif.latex?\small \inline \sum = \{ a_1, a_2, ..., a_n\}" title="Sigma" /></li>
<li>Example: <img src="http://latex.codecogs.com/gif.latex?\small \inline \sum = \{a_1, a_2, a_3\}" title="Alphabet Example" /> where <img src="http://latex.codecogs.com/gif.latex?\small \inline a_1 = \mathrm{a}, a_2 = \mathrm{b}, a_3 = \mathrm{c}" title="Alphabet Example" /></li>
</ul>
</li>
<li style="margin-bottom:10px;">
<b>String</b>: A sequence of symbols from our alphabet. From our example alphabet, strings like "a", "aba", "abc" or "cccb" would all be possible. Using regular expression syntax, we can describe this in our example.
<ul>
<li>Form: <img src="http://latex.codecogs.com/gif.latex?\small \inline w" title="String" /></li>
<li>Example: <img src="http://latex.codecogs.com/gif.latex?\small \inline w = \{a, b, c\}^*" title="String Example" /></li>
</ul>
</li>
<li style="margin-bottom:10px;">
<b>Probabilities</b>: The set of probabilities for the alphabet. This is also known as their weights and can sometimes be denoted as w instead of p.
<ul>
<li>Form: <img src="http://latex.codecogs.com/gif.latex?\small \inline P = \{ p_1, p_2, ..., p_n\}" title="Probability" /></li>
<li>Example: <img src="http://latex.codecogs.com/gif.latex?\small \inline P = \{p_1, p_2, p_3\}" title="Weights Example" /> where <img src="http://latex.codecogs.com/gif.latex?\small \inline p_1 = \frac{1}{5}, p_2 = \frac{2}{5}, p_3 = \frac{2}{5}" title="Probabilities Example" /> since 'a' occurs once, 'b' occurs twice and 'c' occurs twice out of 5 symbols all together.</li>
</ul>
</li>
<li style="margin-bottom:10px;">
<b>Codewords</b>: The set of codewords that we will create for every symbol in our alphabet. <br/>
<ul>
<li>Form: <img src="http://latex.codecogs.com/gif.latex?\small \inline C = \{ c_1, c_2, ..., c_n\}" title="Codeword" /></li>
<li>Example: <img src="http://latex.codecogs.com/gif.latex?\small \inline C = \{ c_1, c_2, c_3\}" title="Codeword Example" /> where <img src="http://latex.codecogs.com/gif.latex?\small \inline c_1 = 00, c_2 = 01, c_3 = 1" title="Codeword Example" /> for symbols <img src="http://latex.codecogs.com/gif.latex?\small \inline a_1, a_2 a_3" title="Codeword Example" /> respectively. Another way to look at it is that we are providing a <a href="http://en.wikipedia.org/wiki/Bijection">bijective</a> function: <img src="http://latex.codecogs.com/gif.latex?\small \inline f: \Sigma \rightarrow C" title="Symbol to Codeword Mapping" />.</li>
</ul>
</li>
</ul>
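<p>As a quick check of these definitions, the alphabet and probabilities for the sample text "abbcc" can be computed directly. A minimal sketch:</p>

```python
# Derive the alphabet and symbol probabilities of the sample string
# "abbcc", matching the definitions above.
from collections import Counter
from fractions import Fraction

w = "abbcc"
counts = Counter(w)

alphabet = set(counts)                   # Sigma = {a, b, c}
probabilities = {s: Fraction(n, len(w))  # P = {1/5, 2/5, 2/5}
                 for s, n in counts.items()}

print(sorted(alphabet))     # ['a', 'b', 'c']
print(probabilities["a"])   # 1/5
```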
<h3>Code Width</h3>
<p>
Standard coding of information for most computers today is byte aligned. For example, in character codes there is <a href="http://en.wikipedia.org/wiki/ASCII">ASCII</a>, which codes the entire English alpha-numeric system in a single byte. While UTF-8 has proven to be a much better standard due to better localization support and variable width codes, ASCII serves our purposes for this discussion. ASCII is considered a fixed width coding scheme:
</p>
<div align="center">
<img src="http://latex.codecogs.com/gif.latex?\inline \large \Sigma = \{a-z,A-Z,0-9,...\},\;\; C = 0\{0,1\}^7,\;\;|C| = 2^7 = 128" title="ASCII Codeword" /></div>
<p>
When performance is at stake, it is best to code information atomically aligned to the computer architecture that it is being used for. This means allocating and defining information in terms of bytes (defined as 8-bits) for the majority of systems today. Although, if space is the priority, performance can take a hit <a href="http://awalsh128.blogspot.com/2012/09/implementing-bit-readerwriter-in-c.html">to shift around bits</a> along the byte boundaries and create variable width codes that do not byte align.
</p>
<div align="center">
<img src="http://latex.codecogs.com/gif.latex?\inline \large \Sigma = \{a-z,A-Z,0-9,...\},\;\; C = \{0,1\}^+,\;\;|C| \leq n" title="Variable Width Codewords" /></div>
<p>
<i>Note</i>: The ellipses in the alphabet denote the control characters (eg. CR, LB) and grammatical structure symbols (eg. #, $, %) contained in the ASCII code page. The alphabets that follow reflect this in their ellipses.
</p>
<h3>Data Compression</h3>
<h2>Variable Width Savings</h2>
<p>
Coming back to the example of ASCII, if we wish to encode a string, w, like "aaabb", it will take up 5 bytes. However, we only really need one bit per symbol to encode our information if we assign 0 to 'a' and 1 to 'b'.
</p>
<div align="center">
<img src="http://latex.codecogs.com/gif.latex?\inline \large \Sigma = \{a,b\} \rightarrow C = \{0,1\},\;\; w = aaabb,\;\; f(C,w) = 00011" title="String Example 1" /></div>
<p>
We just took a 5 byte ASCII encoded string and compressed it down to under a byte. Now let us take another string, w, and give it "aaabbcc". Three alphabet symbols need to be coded; it will take at least 2 bits. At first glance, this assignment would make sense:
</p>
<div align="center">
<img src="http://latex.codecogs.com/gif.latex?\inline \large \Sigma = \{a,b,c\} \rightarrow C = \{0,1,01\},\;\; w = aaabbcc,\;\; w(C) = 000110101" title="String Example 2" /></div>
<p>
This, however, gives us an ambiguous string: it could decompress to "aaabbcc", "aaabbabab", "aacbcc" or any other permutation of our codeword set. The ambiguity arises because the codeword for 'a' (0) also starts the codeword for 'c' (01).
</p>
<h2>Prefix Codes</h2>
<p>
The way to get around this ambiguity is to ensure that all of your codes are <a href="http://en.wikipedia.org/wiki/Prefix_code">prefix type</a>. Something like this would work:
</p>
<div align="center">
<img src="http://latex.codecogs.com/gif.latex?\inline \large \Sigma = \{a,b,c\} \rightarrow C = \{0,10,11\},\;\; w = aaabbcc,\;\; w(C) = 00010101111" title="String Example 2" /></div>
<p>
This ensures that we maintain the one-to-one mapping between our alphabet and codewords; we can translate alphabet symbols to codewords and codewords back to alphabet symbols without confusion.
</p>
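<p>Because no codeword begins another, a decoder can scan the bits left to right and emit a symbol as soon as a codeword completes, with no lookahead or backtracking. A minimal sketch using the prefix code above:</p>

```python
# Decode a bit string with the prefix code C = {a:0, b:10, c:11}.
# Since the code is prefix-free, a greedy left-to-right scan is
# unambiguous.
code = {"a": "0", "b": "10", "c": "11"}
decode_table = {v: k for k, v in code.items()}

def decode(bits):
    symbols, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in decode_table:  # a codeword completed; emit its symbol
            symbols.append(decode_table[buf])
            buf = ""
    if buf:
        raise ValueError("trailing bits do not form a codeword")
    return "".join(symbols)

print(decode("00010101111"))  # aaabbcc
```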
<h3>Codeword Assignment</h3>
<p>
So how do you decide which symbols get which codewords? Since the size of the codeword is critical to the compression ratio (average codeword length to average symbol length), an algorithm should weight the more frequently used symbols. In our previous example, it would be poor design to give 'a' the longest codeword, because it is the most used symbol in the string. This is where the Huffman Coding algorithm comes into play.
</p>
<h2>Huffman Coding Algorithm</h2>
<ol>
<li>Count the number of times each symbol occurs in the data source.</li>
<li>Put [symbol:count] pairing as nodes in a minimum priority queue based on the symbol's count.</li>
<li>Evaluate number of queue nodes.</li>
<ul>
<li>If queue has at least two nodes:
<ul>
<li>Dequeue the two nodes.</li>
<li>Create a new node with a null symbol and set the count equal to the sum of the two dequeued node's counts.</li>
<li>Attach the two nodes as children to the new node.</li>
<li>Put the newly created node into the queue.</li>
<li>Go to step 3.</li>
</ul>
</li>
<li>If queue has only one node, dequeue and set as the root of the tree.</li>
</ul>
<li>Assign each left branch a 0 and each right branch a 1, or vice versa if you choose.</li>
<li>The symbol nodes will be the children of the tree and their codeword will be the bits that are traversed to get there.</li>
</ol>
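<p>The steps above can be sketched compactly with a heap serving as the minimum priority queue. This Python sketch is separate from the C implementation linked at the end of the post; the 0/1 branch assignment is arbitrary, so only the codeword lengths are guaranteed to match a given worked example:</p>

```python
# Sketch of the Huffman Coding algorithm above, using a heap as the
# minimum priority queue. Internal (null symbol) nodes are tuples of
# their two children.
import heapq
from collections import Counter

def huffman_codes(text):
    counts = Counter(text)
    # (count, tiebreaker, node) triples; the tiebreaker keeps tuples
    # comparable when two counts are equal.
    heap = [(n, i, sym) for i, (sym, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        n1, _, left = heapq.heappop(heap)   # dequeue the two smallest
        n2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (n1 + n2, next_id, (left, right)))
        next_id += 1
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):       # internal node: recurse
            walk(node[0], prefix + "0")   # left branch gets a 0
            walk(node[1], prefix + "1")   # right branch gets a 1
        else:                             # leaf: a symbol of the alphabet
            codes[node] = prefix or "0"
    walk(heap[0][2], "")
    return codes

codes = huffman_codes("aaaabbcc")
print(codes)  # 'a' gets a 1-bit code; 'b' and 'c' get 2-bit codes
```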
<h2>Algorithm Example</h2>
<p>
Given the example string "aaaabbcc", let us perform the algorithm on it:
</p>
<ol>
<li>Initial queue state with only symbol nodes in it.<br/><br/>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh7tpOs8bcvqeQykEMvXtJZceBayJaOUjIZS5FhHJZxAqQtkav9vS4XwgG3ok13wYoZE03EwifNngDIuObF3pJqXUS2E_0DE8S4OEP_Jh632TixwANdmqEaNsg5RO1J_8Y9uGFShtrkJOSf/s1600/step1.png" imageanchor="1" style=""><img border="0" height="53" width="267" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh7tpOs8bcvqeQykEMvXtJZceBayJaOUjIZS5FhHJZxAqQtkav9vS4XwgG3ok13wYoZE03EwifNngDIuObF3pJqXUS2E_0DE8S4OEP_Jh632TixwANdmqEaNsg5RO1J_8Y9uGFShtrkJOSf/s400/step1.png" /></a><br/><br/>
</li>
<li>New node created from first two symbol nodes and requeued.<br/><br/>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgIZYIknfhnSeMg5XQp-fkSKOdfRlLC2gFT9zcpvYC0nKD9CBhSEbNqsokQbh8_FU9lL9euIaGZYFQAzeDn5_HfkEozcikYe9ES8xQccy1WPiAcSFFrRRgG6AMD_UIXuSUDpPHFpbfWyJiY/s1600/step2.png" imageanchor="1" style=""><img border="0" height="127" width="237" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgIZYIknfhnSeMg5XQp-fkSKOdfRlLC2gFT9zcpvYC0nKD9CBhSEbNqsokQbh8_FU9lL9euIaGZYFQAzeDn5_HfkEozcikYe9ES8xQccy1WPiAcSFFrRRgG6AMD_UIXuSUDpPHFpbfWyJiY/s400/step2.png" /></a><br/><br/>
</li>
<li>New node created from non-symbol and symbol node, becoming the tree's root node.<br/><br/>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgjzll7ZrvzCJB6aB0tttyhp8DSjBuNeUxgAU5z8nifEmyBTCbxY_YeziQTXH5Ftsy0Kedt_Qweubgm5FaN5lr-eNStFqpnQNcbVLOCQOLDugoid9ftNjZI0MqgWjFGDl4qnyedWxBKg32I/s1600/step3.png" imageanchor="1" style=""><img border="0" height="204" width="280" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgjzll7ZrvzCJB6aB0tttyhp8DSjBuNeUxgAU5z8nifEmyBTCbxY_YeziQTXH5Ftsy0Kedt_Qweubgm5FaN5lr-eNStFqpnQNcbVLOCQOLDugoid9ftNjZI0MqgWjFGDl4qnyedWxBKg32I/s400/step3.png" /></a><br/><br/>
</li>
<li>The final alphabet to codeword mapping is:<br/><br/>
<img src="http://latex.codecogs.com/gif.latex?\inline \large \Sigma = \{a,b,c\} \rightarrow C = \{1,00,01\}" title="Algorithm Results" />
</li>
</ol>
<p>
If you need further examples, you can look <a href="http://en.wikipedia.org/wiki/File:Huffman_huff_demo.gif">here</a>, <a href="http://www.binaryessence.com/dct/en000080.htm">here</a> and <a href="http://en.nerdaholyc.com/huffman-coding-on-a-string/">here</a>. Once you get a grasp of the algorithm, you will see the simplicity and beauty of it.
</p>
<h3>Working Code Example</h3>
<p>
If you want to see a naive implementation of the Huffman Coding algorithm, I posted some source code on <a href="https://github.com/awalsh128/huffmancoding">my GitHub account</a>. Below is the central code to the algorithm execution.
</p>
<script src="http://gist-it.appspot.com/github/awalsh128/huffmancoding/raw/master/huffmancoding.c"></script>
<h3>More Reading & Resources</h3>
<p>
If you are interested in coding algorithms, these links might be of interest to you.
</p>
<ul>
<li><a href="http://en.wikipedia.org/wiki/Arithmetic_coding">Wikipedia: Arithmetic Coding</a></li>
<li><a href="http://www.codeproject.com/Articles/45320/Arithmetic-Compression-With-C">Code Project: Arithmetic Compression With C#</a></li>
<li><a href="http://en.wikipedia.org/wiki/Lempel%E2%80%93Ziv%E2%80%93Welch">LZW Compression</a></li>
<li><a href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.93.9447&rep=rep1&type=pdf">Mohamed F. Mansour: Efficient Huffman Decoding With Table Lookup</a></li>
<li><a href="http://en.wikipedia.org/wiki/Golomb_coding">Wikipedia: Golomb-Rice Coding</a></li>
<li><a href="http://www.zlib.net">Zlib Homepage</a></li>
</ul>awalsh128http://www.blogger.com/profile/05157295791336552159noreply@blogger.com0tag:blogger.com,1999:blog-6363087137667886940.post-71631675506479331562012-09-18T10:03:00.000-07:002012-10-03T13:42:35.290-07:00IMAPI version 2 managed wrapper.<h3>Introduction</h3>
<p>
Microsoft <a href="http://msdn.microsoft.com/en-us/library/windows/desktop/aa366450(v=vs.85).aspx">introduced native image mastering</a> support in Vista, going forward. The library itself is not too complex but the latest version of the <a href="http://en.wikipedia.org/wiki/Image_Mastering_API">image mastering API (IMAPI)</a> has some quirks. This is well documented by <a href="http://www.codeproject.com/script/Membership/View.aspx?mid=208197">Eric Haddan</a> in his <a href="http://www.codeproject.com/Articles/24544/Burning-and-Erasing-CD-DVD-Blu-ray-Media-with-C-an">article</a> describing how to build an interop file and use it from .NET.
</p>
<h3>Implementation</h3>
<p>
I wanted to take it a step further and provide a library wrapper that was CLI compliant. The end result is a class that provides a small set of methods, called in order, to burn an image. The project can be found on <a href="https://github.com/awalsh128/IMAPI2">my GitHub account</a>.
</p>
<h2>Properties</h2>
<ul>
<li><b>Media</b>: A read only property that populates after the LoadMedia call and reports what type of physical media is connected (eg. CD-R, DVD-R, DVD+RW, etc.).</li>
<li><b>MediaStates</b>: A read only collection of states that the media is in, populated after the LoadMedia call (eg. is blank, overwriteable, unsupported).</li>
<li><b>MediaCapacity</b>: A read only property that populates after the LoadMedia call and reports the size capacity of the media.</li>
<li><b>Nodes</b>: A root list of file and directory nodes that are to be written to the media.</li>
<li><b>Recorders</b>: A read only selectable collection. A recorder must be selected for the LoadRecorder call.</li>
<li><b>VolumeLabel</b>: A property that sets the volume label of the media to be written to.</li>
</ul>
<h2>Methods</h2>
<ul>
<li><b>LoadRecorder</b>: Verifies a recorder is selected, loads the recording drive resources and verifies that it is working.</li>
<li><b>LoadMedia</b>: Loads the media resource and verifies that it is valid for the recorder. An exception will occur if the media isn't supported (eg. closed session disc, no disc present).</li>
<li><b>FormatMedia</b>: Formats the media, erasing any previous data.</li>
<li><b>WriteImage</b>: The final step to write your files to disc. This is a blocking call.</li>
</ul>
<h2>Events</h2>
<ul>
<li><b>FormatEraseUpdate</b>: Reports progress of FormatMedia call.</li>
<li><b>FormatWriteUpdate</b>: Reports progress of files being written to the media.</li>
<li><b>ImageUpdate</b>: Reports progress of files being added to the media image.</li>
</ul>
<h3>Caveats</h3>
<p>
I use this library for the simple single purpose of writing finalized images to a blank CD. It has not been fully tested to accommodate all media and recorder state scenarios; I haven't tried it yet with DVDs or multi-session discs. If you would like to extend the library, feel free to push some changes to <a href="https://github.com/awalsh128/IMAPI2">the GitHub project</a> and I will merge them in.
</p>
<p>
I decided to leave the method calls as blocking (synchronous) since there could be many different synchronization models that may use this library. For my purposes, I used this in a WPF application that would call the WriteImage method from a <a href="http://msdn.microsoft.com/en-us/library/system.threading.tasks.task.aspx">Task</a> and then post back the progress events of the write using the <a href="http://msdn.microsoft.com/en-us/library/system.windows.threading.dispatcher.aspx">Dispatcher</a> from my WPF window. Below is a code snippet demonstrating this.
</p>
<pre class="brush:csharp">
private ImageMaster _burner;
...
private void _CreateCd()
{
// files added to _burner and _burner.WriteImage called
}
private void _burner_FormatWriteUpdate(IMAPI2.ImageMaster sender, IMAPI2.FormatWriteUpdateEventArgs e)
{
// Assign (not compare) the progress value; the cast avoids integer division.
this.Dispatcher.Invoke(() => this.statusProgressBar.Value = (e.ElapsedTime / (double)e.TotalTime) * 100);
}
private void CreateCdWindow_Loaded(object sender, System.Windows.RoutedEventArgs e)
{
System.Threading.Tasks.Task.Factory.StartNew(_CreateCd);
}
</pre>awalsh128http://www.blogger.com/profile/05157295791336552159noreply@blogger.com0tag:blogger.com,1999:blog-6363087137667886940.post-59006682523239380382012-09-13T16:30:00.000-07:002012-10-03T13:42:52.301-07:00Implementing a bit reader/writer in C.<h3>Introduction</h3>
<p>
I was working on implementing a simple compression tool based on naive <a href="http://en.wikipedia.org/wiki/Huffman_coding">Huffman coding</a> and initially assumed all of my <a href="http://en.wikipedia.org/wiki/Prefix_code">prefix codes</a> would be at or under a byte long. However, I failed to account for the <a href="http://www.arturocampos.com/ac_fib_huffman.html#Fibbonaci and Huffman">worst case of Huffman coding</a>, which can yield codes up to 255 bits in length.
</p>
<p>
This required bit reading and writing functions that would allow for variable length codes. I could have used other methods to balance worst case Huffman trees, but this issue presented a unique challenge: how to write a bit reader/writer in C.
</p>
<h3>Bit Reader/Writer Specifications</h3>
<ul>
<li>Read and write using reader and writer <a href="http://en.wikipedia.org/wiki/Abstract_data_type">ADT's</a>.</li>
<li>Handle any possible bit length given.</li>
<li>Remember bits read from the file that are outside the requested bit length (eg. 2 bytes read in but only 12 bits requested, leaving 4 bits in the buffer).</li>
<li>Flush a partially written block to disk on a free of the bit writer ADT.</li>
<li>Return number of bits read and written on the read and write functions.</li>
</ul>
<h3>Design</h3>
<p>
The underlying structure of the ADT will contain fields for the file pointer, a byte-long bit buffer and the buffer's bit length. The examples given for each case are in steps: what is originally in the buffer, what bits are requested, what bits are read from or written to the file, and finally what bits are left over and buffered.
</p>
<h2>Reader</h2>
There are two main cases to consider when designing the read function.
<ol>
<li><b>Empty Buffer</b>: This case is simple since the bits read from the file will already be byte aligned to your output bits. Simply mask out the unneeded bits and buffer them for the next read (eg. no bits buffered, 12 bits requested, 16 bits read in, 4 bits buffered).
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgv3g9-8hNjqc-GEhap-CNJ8Fl29X9buTfN6o9vleeDhR2rPRJ5U5M95slM7po_GdfZzgvEruG2JnvC-NU0dexwMLwmj1i3atHcCzh-1kGHq543Qn_vwWt-p7okRcmkG5vhEA5kG43j_RmL/s1600/bits1.png" imageanchor="1" style="margin-left:1em; margin-right:1em"><img border="0" height="42" width="400" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgv3g9-8hNjqc-GEhap-CNJ8Fl29X9buTfN6o9vleeDhR2rPRJ5U5M95slM7po_GdfZzgvEruG2JnvC-NU0dexwMLwmj1i3atHcCzh-1kGHq543Qn_vwWt-p7okRcmkG5vhEA5kG43j_RmL/s400/bits1.png" /></a></div>
</li>
<li><b>Populated Buffer</b>: This can get a little trickier, especially in its last two cases.</li>
<ol>
<li><b>Requested bits already buffered.</b> This is very easy since it doesn't require any file read and the buffer just needs to shift out the read bits (eg. 7 bits in the buffer and 3 are requested; buffer serves out 3 and truncates to 4).</li>
<li><b>Not all requested bits buffered; end of read bits not in same byte block as end of output bits.</b> The last output byte block must mask out the unneeded bits, put them in the buffer and then append the additional unneeded read bits from the next byte block (eg. 4 bits buffered, 13 bits requested, 16 bits read in, 7 bits buffered).
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiIQQEjjcfX8qflXw9cW-QmA2_QlzdAkrJlLTDJpwvsCgwlKgF9GYVju33kw2wAxbG0MBfB5I-IUsySyOg3lee3cVm-UhB4GDbrcYQArkL-jp1hrh28gzTiBIne5LvxZondTODYucueD7KC/s400/bits2-2.png" imageanchor="1" style="margin-left:1em; margin-right:1em">
<img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiIQQEjjcfX8qflXw9cW-QmA2_QlzdAkrJlLTDJpwvsCgwlKgF9GYVju33kw2wAxbG0MBfB5I-IUsySyOg3lee3cVm-UhB4GDbrcYQArkL-jp1hrh28gzTiBIne5LvxZondTODYucueD7KC/s400/bits2-2.png" /></a></div></li>
<li><b>Not all requested bits buffered; end of read bits in same byte block as end of output bits.</b> The last output byte block must mask out the unneeded bits and put them in the buffer (eg. 4 bits buffered, 10 bits requested, 8 bits read in, 2 bits buffered).
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEigkWjqzokFnyjfsPJgBh4twgE3_X8a_vnJNEV98E2fjWhHHMec-Van0vjQXYJu4UzITMDoFEyaLTX87Omay3XX1VmKFH8rhe_dwlh-jyvyQ4HALvqkauCAaOpxA2BJ5ITbSUDZx-SYoPMg/s1600/bits2-3.png" imageanchor="1" style="margin-left:1em; margin-right:1em"><img border="0" height="42" width="400" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEigkWjqzokFnyjfsPJgBh4twgE3_X8a_vnJNEV98E2fjWhHHMec-Van0vjQXYJu4UzITMDoFEyaLTX87Omay3XX1VmKFH8rhe_dwlh-jyvyQ4HALvqkauCAaOpxA2BJ5ITbSUDZx-SYoPMg/s400/bits2-3.png" /></a></div>
</li>
</ol>
</ol>
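<p>
To make the buffering behavior concrete, here is a minimal in-memory bit reader sketch that handles the empty and populated buffer cases one bit at a time. It is deliberately simplified (the names and array-backed input are illustrative); the bitio.c source below reads from a file and masks whole byte blocks as described above.
</p>

```c
#include <stdint.h>
#include <stddef.h>

/* Minimal in-memory bit reader (illustration only); the full bitio.c
   reads from a FILE* and works on byte blocks rather than single bits. */
typedef struct {
    const uint8_t *in;  /* source bytes */
    size_t pos;         /* index of the next byte to consume */
    uint8_t buffer;     /* leftover bits from the last byte read */
    int buffered;       /* number of valid bits left in the buffer */
} bitreader_t;

/* Read 'length' bits (at most 32), most significant bit first. */
uint32_t bitreader_read(bitreader_t *r, int length)
{
    uint32_t result = 0;
    for (int i = 0; i < length; i++) {
        if (r->buffered == 0) {          /* empty buffer: pull in a byte */
            r->buffer = r->in[r->pos++];
            r->buffered = 8;
        }
        /* Serve the highest buffered bit; the rest stay for later reads. */
        result = (result << 1) | ((uint32_t)(r->buffer >> (r->buffered - 1)) & 1u);
        r->buffered--;
    }
    return result;
}
```

<p>
For example, reading 12 bits from the bytes {0xAB, 0xCD} yields 0xABC and leaves the low 4 bits of 0xCD buffered for the next call, matching the empty buffer example above.
</p>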
<h2>Writer</h2>
There are only two cases to consider when designing the write function.
<ol>
<li><b>Full Buffer</b>: Shift in write bits, commit to disk and repeat until all bits are processed. If the last bit to write is not on the buffer byte boundary, it will wait until the next write (eg. 6 bits buffered, 12 bits to write, 16 bits written out, 2 bits buffered).</li>
<li><b>Partial Buffer</b>: Shift in the write bits (eg. 2 bits buffered, 4 bits to write, no bits written out, 6 bits buffered).</li>
</ol>
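<p>
The writer's two cases can be sketched the same way. Again this is a simplified in-memory illustration, not the bitio.c implementation below, which commits full bytes to a file and flushes any partial buffer when the writer is freed.
</p>

```c
#include <stdint.h>
#include <stddef.h>

/* Minimal in-memory bit writer (illustration only). */
typedef struct {
    uint8_t *out;     /* destination bytes */
    size_t count;     /* number of full bytes committed so far */
    uint8_t buffer;   /* partially filled byte awaiting commit */
    int buffered;     /* number of valid bits in the buffer */
} bitwriter_t;

/* Write the low 'length' bits of 'bits', most significant bit first. */
void bitwriter_write(bitwriter_t *w, uint32_t bits, int length)
{
    for (int i = length - 1; i >= 0; i--) {
        w->buffer = (uint8_t)((w->buffer << 1) | ((bits >> i) & 1u));
        if (++w->buffered == 8) {        /* full buffer: commit a byte */
            w->out[w->count++] = w->buffer;
            w->buffer = 0;
            w->buffered = 0;
        }
    }
}
```

<p>
Writing the 12 bits 0xABC commits the byte 0xAB and leaves 4 bits (0xC) buffered, waiting for the next write or a final flush, as in the full buffer case above.
</p>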
<h3>Caveats</h3>
<p>
The code below is purely a demonstration of how a bit reader/writer works in a generic sense. There is room for plenty of optimization if you know what type of architecture you're targeting and the domain of your reader/writer arguments.
</p>
<h3>The Code</h3>
<h2>Header File</h2>
<script src="http://gist-it.appspot.com/github/awalsh128/huffmancoding/raw/master/bitio.h"></script>
<h2>Source File</h2>
<script src="http://gist-it.appspot.com/github/awalsh128/huffmancoding/raw/master/bitio.c"></script>awalsh128http://www.blogger.com/profile/05157295791336552159noreply@blogger.com4tag:blogger.com,1999:blog-6363087137667886940.post-77332181710396937682012-08-14T09:00:00.000-07:002012-10-03T13:43:30.647-07:00Licensing using XML digital signing.<h3>Motivation</h3>
<p>
One of the main concerns in any licensing implementation is that authored license files are coming from the trusted authority (ie. your company) and not some third party. This can be accomplished in software by encrypting and decrypting the license with a <a href="http://en.wikipedia.org/wiki/Symmetric-key_algorithm">single key</a>, but symmetric <a href="http://en.wikipedia.org/wiki/Key_management">key storage</a> can be cumbersome and have added complexity.
</p>
<p>
<a href="http://en.wikipedia.org/wiki/Public-key_cryptography">Asymmetric cryptography</a>, on the other hand, allows for two keys: a private one to author files with and a public one to verify their authenticity. This is well suited for licensing, where we want to ensure that the authored license has not been altered in any way and the content can be relied on.
</p>
<h3>Implementation</h3>
<p>
I ran across <a href="http://www.codeproject.com/Articles/4940/Using-XML-Digital-Signatures-for-Application-Licen">a great article</a> that details the use of XML digital signatures in the .NET framework. It describes the type of digital signatures that can be implemented and lays out the classes and methods needed to author and verify license files.
</p>
<p>
This specific implementation allows for: an authentication key, to compare against some unique value tying the license to the system; an expiration date, to set a time span on the software use; and a set of available features the user is authorized for.
</p>
<h2>Base Class</h2>
<p>
This forms the runtime object representation of the license and will be inherited by the license reader and writer.
</p>
<pre class="brush:csharp">
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Text;
using System.Security;
using System.Security.Cryptography;
using System.Security.Cryptography.Xml;
using System.Xml;
namespace Licensing
{
public abstract class LicenseBase
{
private byte[] _authenticationKey;
private DateTime _expiration;
private List<string> _features;
private int _id;
public byte[] AuthenticationKey {
get { return _authenticationKey; }
protected set { _authenticationKey = value; }
}
public DateTime Expiration {
get { return _expiration; }
protected set { _expiration = value; }
}
public int Id {
get { return _id; }
protected set { _id = value; }
}
protected List<string> Features {
get { return _features; }
}
public LicenseBase()
{
_features = new List<string>();
}
public void Clear()
{
_id = 0;
_authenticationKey = null;
_features.Clear();
}
}
}
</pre>
<h2>Writer Class</h2>
<p>
This class will be used internally to author the license files that will be distributed with the released software. Note that the private key is added as an embedded resource in the project and should not be distributed since the public key can be derived from it.
</p>
<pre class="brush:csharp">
using System;
using System.IO;
using System.Security;
using System.Security.Cryptography;
using System.Security.Cryptography.Xml;
using System.Text;
using System.Xml;
namespace Licensing
{
class License : LicenseBase
{
License(int id, byte[] authenticationKey, DateTime expiration, string[] features)
{
this.AuthenticationKey = authenticationKey;
this.Expiration = expiration;
this.Id = id;
this.Features.AddRange(features);
}
/// <summary>
/// Generates a key pair for digital signing and verification.
/// </summary>
/// <remarks>
/// </remarks>
static internal void GenerateAssymetricKeys()
{
string datestamp = null;
StreamWriter output = null;
RSA key = RSA.Create();
key.KeySize = 1024;
datestamp = DateTime.UtcNow.ToString("yyyyMMdd");
// Generate private key to only be used internally (DO NOT DISTRIBUTE).
output = File.CreateText("private-" + datestamp + ".key");
output.Write(key.ToXmlString(true));
output.Close();
// Generate public key to be used by customers (distribute).
output = File.CreateText("public-" + datestamp + ".key");
output.Write(key.ToXmlString(false));
output.Close();
}
/// <summary>
/// Digitally sign an XML document.
/// </summary>
/// <param name="document">The XML document to sign.</param>
/// <param name="privateKey">The private key to sign it with.</param>
/// <remarks></remarks>
private static void _SignXmlDocument(System.Xml.XmlDocument document, RSA privateKey)
{
SignedXml signedDocument = new SignedXml(document);
signedDocument.SigningKey = privateKey;
signedDocument.SignedInfo.CanonicalizationMethod = SignedXml.XmlDsigCanonicalizationUrl;
// Add reference to XML data
Reference @ref = new Reference("");
@ref.AddTransform(new XmlDsigEnvelopedSignatureTransform(false));
signedDocument.AddReference(@ref);
// Build the signature.
signedDocument.ComputeSignature();
// Attach the signature to the XML document.
XmlElement signature = signedDocument.GetXml();
document.DocumentElement.AppendChild(signature);
}
/// <summary>
/// Write the contents and digitally sign the document.
/// </summary>
/// <param name="filepath">The file path to the digitally signed document.</param>
public void WriteDocument(string filepath)
{
XmlDocument document = new XmlDocument();
document.AppendChild(document.CreateXmlDeclaration("1.0", "UTF-8", null));
XmlElement root = document.CreateElement("License");
document.AppendChild(root);
XmlElement id = document.CreateElement("Id");
id.InnerText = this.Id.ToString();
root.AppendChild(id);
XmlElement authenticationKey = document.CreateElement("AuthenticationKey");
authenticationKey.InnerText = Convert.ToBase64String(this.AuthenticationKey);
root.AppendChild(authenticationKey);
XmlElement expiration = document.CreateElement("Expiration");
expiration.InnerText = this.Expiration.ToString();
root.AppendChild(expiration);
XmlElement options = document.CreateElement("Features");
XmlElement featureItem = null;
foreach (string feature in this.Features)
{
featureItem = document.CreateElement("Feature");
featureItem.InnerText = feature;
options.AppendChild(featureItem);
}
root.AppendChild(options);
XmlElement publickey = document.CreateElement("Certificate");
publickey.SetAttribute("DateCode", KeyCreationDateCode);
publickey.InnerXml = File.ReadAllText(PublicKeyFilename);
root.AppendChild(publickey);
RSA privateKey = RSA.Create();
privateKey.FromXmlString(Licensing.Properties.Resources.privatekey);
_SignXmlDocument(document, privateKey);
File.WriteAllText(filepath, document.InnerXml);
}
}
}
</pre>
<h2>Reader Class</h2>
<p>
This class will be used in the field to verify the license files that will be distributed with the released software. It is best to wrap the license read and authenticate methods in try-catch blocks since they represent exceptional behavior that fall outside the post-conditions of the method. Handling the exceptions will allow for better error reporting in the field so you can decide what information you will share with the end user (eg. bad authentication key, malformed/tampered licenses). Note again that the public key is an embedded resource for the project. It should never be distributed with the license since an attacker could generate their own keypair and envelope their own public key, with signature, in the license.
</p>
<pre class="brush:csharp">
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Text;
using System.Security;
using System.Security.Cryptography;
using System.Security.Cryptography.Xml;
using System.Xml;
namespace Licensing
{
class LicenseReader : LicenseBase
{
LicenseReader()
{
}
public void Authenticate()
{
// Check the hardware key value against the AuthenticationKey
if (!_AuthenticationKeyMatches(hardwareKey))
throw new LicenseAuthenticationException("The license failed to authenticate the hardware key.");
// ... or check some node locking ID (eg. MAC ID, hard drive serial number) against the AuthenticationKey.
if (!_AuthenticationKeyMatches(nodeLockedId))
throw new LicenseAuthenticationException("The license failed to authenticate the node locked ID.");
if (this.Expiration < DateTime.UtcNow) // expired when the expiration date is in the past
throw new LicenseAuthenticationException("The license has expired.");
}
/// <summary>
/// Compare byte array against authentication key byte array.
/// </summary>
/// <param name="compare">Byte array to compare against.</param>
/// <returns>True if a match, else false.</returns>
private bool _AuthenticationKeyMatches(byte[] compare)
{
if (this.AuthenticationKey == null)
return false;
int upperBound = Math.Min(this.AuthenticationKey.Length, compare.Length);
for (int i = 0; i < upperBound; i++)
{
if (this.AuthenticationKey[i] != compare[i])
return false;
}
return true;
}
public bool IsFeature(string featureName)
{
return this.Features.Contains(featureName);
}
/// <summary>
/// Read the digitally signed document and load its contents.
/// </summary>
/// <param name="filepath">The file path to the digitally signed document.</param>
protected virtual void ReadDocument(string filepath)
{
this.Clear();
XmlDocument document = new XmlDocument();
document.Load(filepath);
RSA publicKey = RSA.Create();
publicKey.FromXmlString(Licensing.Properties.Resources.publickey);
if (_VerifyXmlDocument(document, publicKey))
{
this.Id = int.Parse(document.SelectSingleNode("//License/Id").InnerText);
this.AuthenticationKey = Convert.FromBase64String(document.SelectSingleNode("//License/AuthenticationKey").InnerText);
this.Expiration = DateTime.Parse(document.SelectSingleNode("//License/Expiration").InnerText);
XmlNodeList features = document.SelectNodes("//License/Features/Feature");
foreach (XmlNode feature in features)
{
this.Features.Add(feature.InnerText);
}
}
}
/// <summary>
/// Verify the digital signature of an XML document.
/// </summary>
/// <param name="document">The XML document containing the signature.</param>
/// <param name="publicKey">The public key to verify signature authenticity.</param>
/// <returns>True if the signature is authentic, else false.</returns>
/// <remarks></remarks>
private bool _VerifyXmlDocument(XmlDocument document, RSA publicKey)
{
SignedXml signedDocument = new SignedXml(document);
try
{
XmlNode signature = document.GetElementsByTagName("Signature", SignedXml.XmlDsigNamespaceUrl)[0];
signedDocument.LoadXml((XmlElement)(signature));
}
catch
{
return false;
}
return signedDocument.CheckSignature(publicKey);
}
}
}
</pre>
<h3>Additions & Caveats</h3>
<p>
Symmetric encryption could also be used to hinder casual viewing of the license file. The issue of symmetric key storage must be revisited but will no longer be critical to maintaining the integrity of the license files.
</p>
<p>
Strong naming of the license reading assembly is encouraged to prevent disassembly and re-embedding of an attacker's own public key. Although, this is by no means an attack-proof method since <a href="http://www.codeproject.com/Articles/15374/Removing-Strong-Signing-from-assemblies-at-file-le">strong naming can be removed</a> by changing information in the <a href="http://www.ntcore.com/Files/dotnetformat.htm">CLI header</a> of the portable executable (PE). Even if the application were able to secure the public key outside the application in a <a href="http://en.wikipedia.org/wiki/Public_key_infrastructure">PKI</a>, there is still the issue of code injection or recompilation once strong naming is stripped. Indirection of the public key or obfuscation of the assembly can also help but is <a href="http://csrc.nist.gov/publications/secpubs/faq-sec.txt">not a reliable security measure</a>.
</p>awalsh128http://www.blogger.com/profile/05157295791336552159noreply@blogger.com0tag:blogger.com,1999:blog-6363087137667886940.post-60629673778481792772012-07-30T11:33:00.000-07:002012-07-30T11:33:50.594-07:00Automate IIS enable on Windows 7.<h3>Background</h3>
<p>
There are plenty of sites that will tell you how to enable IIS through the Control Panel, but if you are tasked with enabling IIS automatically (eg. for an installer), it can be a nightmare. This is only really needed in special cases where the installer has more control over the hardware specifications. Determining an upgrade or uninstall scenario with IIS can be very tough, since you are not installing your own software but rather enabling a feature of Windows. With that in mind, we can propose a solution for automated enabling of IIS.
</p>
<h3>Requirements</h3>
<h2>Windows Editions</h2>
<p>
The <a href="http://technet.microsoft.com/en-us/library/cc753473.aspx">edition of Windows 7 for IIS</a> must be either Ultimate or Professional if you plan on using ASP.NET. You can try this on other editions, or even Vista, but your results may differ since this is the only scenario I have tested against. More information can be found at the <a href="http://technet.microsoft.com/en-us/library/ee692294(v=ws.10).aspx">IIS TechNet site</a>.
</p>
<h2>DISM Tool</h2>
<p>
To enable IIS features through the command line, <a href="http://technet.microsoft.com/en-us/library/dd744256(v=ws.10).aspx">DISM</a> can be used. This can be reliably found on Windows 7 but is inconsistent on Vista, where the older <a href="http://technet.microsoft.com/en-us/library/cc749465(v=ws.10).aspx">Package Manager</a> is used instead. This was one of the reasons I decided to constrain the specification to Windows 7.
</p>
<p>
The first step in using DISM is to find out which features need to be enabled in order for IIS to work completely. I ran across this <a href="http://blogs.msdn.com/b/habibh/archive/2009/08/14/how-to-install-iis-7-5-on-windows-7-using-the-command-line.aspx?Redirected=true">MSDN blog entry</a> that lists all the needed features. The next task is to script the DISM commands.
</p>
<h3>Scripting</h3>
<h2>Caveats</h2>
<p>
Iterating over each feature and enabling it with DISM can be very time consuming. The faster approach is to evaluate the enabled state of each feature and turn them all on in one command. PowerShell could be used, but its presence, and that of its needed modules, can be inconsistent on Windows 7. I had to fall back on batch scripting, which is the part that can cause nightmares due to its ugly syntax.
</p>
<p>
Arrays are needed to store the enabled states of each feature and the feature names themselves. In my case, I needed an array of pairs. I was able to fake it using a <a href="http://stackoverflow.com/questions/8039128/batch-script-in-dos-traverse-multiple-lists-pairwise">suggestion on Stack Overflow</a>. Also, special syntax is needed to reference variables inside a loop that are scoped outside the loop.
</p>
<p>
Last, but not least, remember to register the ASP.NET framework version being used with IIS. This will need to be done once IIS is enabled (a reboot is required). Adjust the framework bitness and version below according to the one you are using.
</p>
<pre class="brush:shell">
[WindowsFolder]\Microsoft.NET\Framework64\v4.0.30319\aspnet_regiis.exe -i
</pre>
<h2>The Code</h2>
<p>
The following batch script will discover which features are disabled, concatenate them into the enable command and then execute. You may need to replace <a href="http://msdn.microsoft.com/en-us/library/windows/desktop/aa384187(v=vs.85).aspx">SysNative</a> with system32 or the Windows system folder being used by your target system if this is being ran outside the installer.
</p>
<pre class="brush:bash">
setlocal enableextensions enabledelayedexpansion
set features=IIS-ApplicationDevelopment IIS-ASPNET IIS-CommonHttpFeatures IIS-HostableWebCore IIS-HttpCompressionDynamic IIS-HttpCompressionStatic IIS-HttpErrors IIS-HttpLogging IIS-HttpRedirect IIS-HttpTracing IIS-Metabase IIS-NetFxExtensibility IIS-Performance IIS-StaticContent IIS-URLAuthorization IIS-WebServerRole WAS-ConfigurationAPI WAS-NetFxEnvironment WAS-ProcessModel WAS-WindowsActivationService
set dism_params=
set n=0
for %%f in (%features%) do (
set /a n+=1
set dism_params=!dism_params! /FeatureName:%%f
set features[!n!]=%%f
)
set n=0
for /f "tokens=3*" %%i in ('%WINDIR%\SysNative\Dism.exe /Online /Get-FeatureInfo %dism_params% ^| findstr /b /r "^State : "') do (
set /a n+=1
set states[!n!]=%%i
)
set dism_params=
for /l %%i in (1,1,%n%) do (
if "!states[%%i]!" == "Disabled" set dism_params=!dism_params! /FeatureName:!features[%%i]!
)
if "%dism_params%" neq "" %WINDIR%\SysNative\Dism.exe /Online /Enable-Feature %dism_params%
</pre>awalsh128http://www.blogger.com/profile/05157295791336552159noreply@blogger.com0tag:blogger.com,1999:blog-6363087137667886940.post-26337549000891926722012-07-25T13:23:00.000-07:002012-08-01T16:20:36.425-07:00Converting thermocouple voltages to temperature.<h3>Background</h3>
<p>
Unlike most sensors used in data acquisition, thermocouples do not have a linear mapping between their voltage and Celsius values. This means that instead of factoring some scalar value against the data in order to scale it to engineering units (eg. volts to acceleration), an <a href="http://en.wikipedia.org/wiki/Thermocouple#Voltage.E2.80.93temperature_relationship">experimentally derived equation</a> must be used to approximate the temperature at specific voltages. This non-linearity is primarily due to the materials being used to discover the temperature difference.
</p>
<p>
The science and engineering behind thermocouples goes beyond the scope of this post but more information can be found below:
<ul>
<li><a href="http://en.wikipedia.org/wiki/Thermoelectric_effect">Wikipedia: Thermoelectric Effect</a>
<li><a href="http://en.wikipedia.org/wiki/Thermocouple">Wikipedia: Thermocouple</a></li>
<li><a href="http://www.omega.com/temperature/z/pdf/z021-032.pdf">Reference Temperatures</a></li>
<li><a href="http://www.scienceeducationreview.com/open_access/molki-seebeck.pdf">Simple Demonstration of the Seebeck Effect </a></li>
</ul>
</p>
<h3>K-Type Thermocouples</h3>
<p>
One of the most general purpose thermocouple types used in data acquisition is the <a href="http://en.wikipedia.org/wiki/Thermocouple#K">K type</a>, since they are relatively inexpensive to make. This is also the thermocouple type that I will focus on.
</p>
<p>
Luckily, <a href="http://www.nist.gov">NIST</a> provides a <a href="http://srdata.nist.gov/its90/main/">table of discrete values</a>, correlating voltages to Celsius and vice-versa. In addition, they also provide the exact polynomial equations to use <a href="http://srdata.nist.gov/its90/download/type_k.tab">underneath the given tables</a>, depending on the voltage range the value falls in. Following are the exact functions to do the conversion for K-type thermocouples.
</p>
<h2>Voltage To Celsius</h2>
<p>
Given some ordered set of coefficients, <i>D</i>, and a voltage value, <i>v</i>, let the voltage to Celsius conversion be defined as a function of <i>v</i> as follows.
</p>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEikaQYzk_hsKj4A6o3jmn1A9LJfAlATR9uslpUWyHi09Ot1-RyyaI-h-7j1vQr6ATOiqDr78Le7wXVY9VYRyCTUIc43mKOP6oc9ghztEVLg_8FpATdN-O8lyRH6zq9YeF50nG9EFNbKVn7T/s1600/ktype.gif" imageanchor="1" style="margin-left:1em; margin-right:1em"><img border="0" height="56" width="56" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEikaQYzk_hsKj4A6o3jmn1A9LJfAlATR9uslpUWyHi09Ot1-RyyaI-h-7j1vQr6ATOiqDr78Le7wXVY9VYRyCTUIc43mKOP6oc9ghztEVLg_8FpATdN-O8lyRH6zq9YeF50nG9EFNbKVn7T/s400/ktype.gif" /></a></div>
<p>
<i>D</i> is dependent on the voltage value being evaluated:
<ul>
<li>values greater than 20.644 mV will have <i>D = {-131.8058, 48.30222, -1.646031, 0.05464731, -0.0009650715, 0.000008802193, -0.0000000311081, 0.0, 0.0, 0.0 }</i>;</li>
<li>values greater than 0.0 mV will have <i>D = { 0.0, 25.08355, 0.07860106, -0.2503131, 0.0831527, -0.01228034, 0.0009804036, -0.0000441303, 0.000001057734,-0.00000001052755}</i>;</li>
<li>and values less than 0.0 mV will have <i>D = { 0.0, 25.173462, -1.1662878, -1.0833638, -0.8977354, -0.37342377, -0.086632643, -0.010450598, -0.00051920577, 0.0 }</i>.</li>
</ul>
</p>
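<p>
As a concrete sketch of the voltage to Celsius conversion, the following C function evaluates the middle-range polynomial (0 mV to 20.644 mV) using Horner's method. The function name and layout are illustrative; the coefficients are the NIST inverse values listed above.
</p>

```c
/* NIST ITS-90 inverse coefficients for type K, 0 mV - 20.644 mV (0C - 500C). */
static const double kTypeKInverseMid[10] = {
    0.0, 25.08355, 0.07860106, -0.2503131, 0.0831527,
    -0.01228034, 0.0009804036, -4.41303e-05, 1.057734e-06, -1.052755e-08
};

/* Evaluate sum(D[i] * v^i) with Horner's method; v is in millivolts,
   the result is in degrees Celsius. */
double voltage_to_celsius_typek_mid(double v)
{
    double t = 0.0;
    for (int i = 9; i >= 0; i--)
        t = t * v + kTypeKInverseMid[i];
    return t;
}
```

<p>
For example, 4.096 mV is the NIST table value for 100 C, and the polynomial lands within a few hundredths of a degree of that.
</p>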
<h2>Celsius To Voltage</h2>
<p>
Given some ordered set of coefficients, <i>C</i>, an ordered set of exponential constants, <i>A</i>, and a Celsius value, <i>x</i>, let the Celsius to voltage conversion be defined as a function of <i>x</i> as follows.
</p>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgNfS212F1xO6uY8DnlxwMxaVeFHcz-ypLxVMGzwhIukmxkrWYKeS0jkZnc7ehXZdZ6EoJP_DZqByyrTrZhCt5iiX4-LKj5bU4spUanWRBtV6mBgtQ5Uf5lrM09azpXoB3Qc12a195d4JIT/s1600/ktype-ctov.gif" imageanchor="1" style="margin-left:1em; margin-right:1em"><img border="0" height="56" width="182" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgNfS212F1xO6uY8DnlxwMxaVeFHcz-ypLxVMGzwhIukmxkrWYKeS0jkZnc7ehXZdZ6EoJP_DZqByyrTrZhCt5iiX4-LKj5bU4spUanWRBtV6mBgtQ5Uf5lrM09azpXoB3Qc12a195d4JIT/s400/ktype-ctov.gif" /></a></div>
<p>
<i>C</i> is dependent on the Celsius value being evaluated:
<ul>
<li>positive and zero values will have <i>C = { -0.017600413686,
0.038921204975,
0.000018558770032,
-0.000000099457592874,
0.00000000031840945719,
-0.00000000000056072844889,
0.00000000000000056075059059,
-3.2020720003E-19,
9.7151147152E-23,
-1.2104721275E-26}</i>;</li>
<li>negative values will have <i>C = { 0.0,
0.039450128025,
0.000023622373598,
-0.00000032858906784,
-0.0000000049904828777,
-0.000000000067509059173,
-0.00000000000057410327428,
-0.0000000000000031088872894,
-1.0451609365E-17,
-1.9889266878E-20,
-1.6322697486E-23}</i>;</li>
</ul>
<i>A</i> is a fixed ordered set of exponentials; <i>A = {0.1185976, -0.0001183432, 126.9686}</i>.
</p>
<h3>Performance</h3>
<p>
Since the number of terms in the polynomial equation is bounded by a constant (ie. <i>|D|</i> or <i>|C|</i>), the performance hit will be a constant factor of the number of voltage or Celsius values to calculate. Also, temperature is sampled at a very low rate, yielding small data sets that usually have redundant values in them. Further optimization can be made by caching previous computations and reusing their results if the same voltage or Celsius value is encountered again.
</p>
<h3>The Code</h3>
<pre class="brush:c-sharp">
private static double[] _thermocoupleCoefficientsTypeKneg = {
0.0,
0.039450128025,
2.3622373598E-05,
-3.2858906784E-07,
-4.9904828777E-09,
-6.7509059173E-11,
-5.7410327428E-13,
-3.1088872894E-15,
-1.0451609365E-17,
-1.9889266878E-20,
-1.6322697486E-23
};
private static double[] _thermocoupleCoefficientsTypeKpos = {
-0.017600413686,
0.038921204975,
1.8558770032E-05,
-9.9457592874E-08,
3.1840945719E-10,
-5.6072844889E-13,
5.6075059059E-16,
-3.2020720003E-19,
9.7151147152E-23,
-1.2104721275E-26
};
private static double[] _thermocoupleExponentialsTypeK = {
0.1185976,
-0.0001183432,
126.9686
};
/// <summary>
/// Type K thermocouple inverse coefficient values for voltage to Celsius conversions.
/// </summary>
/// <remarks>Valid values for -200C - 0C / -5.891mV - 0mV.</remarks>
private static double[] _thermocoupleInverseCoefficientsTypeK0 = {
0.0,
25.173462,
-1.1662878,
-1.0833638,
-0.8977354,
-0.37342377,
-0.086632643,
-0.010450598,
-0.00051920577,
0.0
};
/// <summary>
/// Type K thermocouple inverse coefficient values for voltage to Celsius conversions.
/// </summary>
/// <remarks>Valid values for 0C - 500C / 0mV - 20.644mV.</remarks>
private static double[] _thermocoupleInverseCoefficientsTypeK1 = {
0.0,
25.08355,
0.07860106,
-0.2503131,
0.0831527,
-0.01228034,
0.0009804036,
-4.41303E-05,
1.057734E-06,
-1.052755E-08
};
/// <summary>
/// Type K thermocouple inverse coefficient values for voltage to Celsius conversions.
/// </summary>
/// <remarks>Valid values for 500C - 1372C / 20.644mV - 54.886mV.</remarks>
private static double[] _thermocoupleInverseCoefficientsTypeK2 = {
-131.8058,
48.30222,
-1.646031,
0.05464731,
-0.0009650715,
8.802193E-06,
-3.11081E-08,
0.0,
0.0,
0.0
};
public static double CelciusToVoltageTypeK(double value)
{
double[] a = _thermocoupleExponentialsTypeK;
double[] c;
double result = 0.0;
if (value >= 0.0) {
c = _thermocoupleCoefficientsTypeKpos;
// The exponential correction term is added once, and only for the non-negative range.
result = a[0] * Math.Exp(a[1] * Math.Pow(value - a[2], 2));
} else {
c = _thermocoupleCoefficientsTypeKneg;
}
for (int index = 0; index < c.Length; index++) {
result += c[index] * Math.Pow(value, index);
}
return result;
}
public static double VoltageToCelciusTypeK(double value)
{
double[] d;
double result = 0.0;
if (value > 0.020644) {
d = _thermocoupleInverseCoefficientsTypeK2;
} else if (value >= 0.0) {
d = _thermocoupleInverseCoefficientsTypeK1;
} else {
d = _thermocoupleInverseCoefficientsTypeK0;
}
for (int index = 0; index < d.Length; index++) {
result += d[index] * Math.Pow(value, index);
}
return result;
}
</pre>awalsh128http://www.blogger.com/profile/05157295791336552159noreply@blogger.com0tag:blogger.com,1999:blog-6363087137667886940.post-53395347035023089812012-04-23T11:41:00.001-07:002012-08-01T16:21:00.286-07:00Bit specific builds in Visual Studio 2010.<h2>Using Conditioned Tags</h2>
<p>
The Visual Studio 2010 UI does not provide options for conditional references based on the configuration and platform you are building. For example, you may have a project reference that needs to change depending on whether you are doing a 32 or 64 bit build.
</p>
<p>
Fortunately, the project file syntax does support conditional references, which I found out on <a href="http://stackoverflow.com/questions/1997268/how-to-reference-different-version-of-dll-with-msbuild">Stack Overflow</a>.
</p>
<p>
In our case, we keep our DLLs in source control under the project's bin directory. I kept getting the 64-bit version of the DLL regardless of platform, and then found out it was because the Content tags also needed to be conditioned.
</p>
<h2>Example</h2>
<p>
Let's say you have a project (MyProject) with two DLLs, 32bit.dll and 64bit.dll, that are referenced by the project and kept under its bin directory:
</p>
<ul>
<li>MyProject\bin\x86\32bit.dll</li>
<li>MyProject\bin\x64\64bit.dll</li>
</ul>
<p>
You will need to hand edit the MyProject\MyProject.vbproj file and add these entries (or change them if they are already there). The first two entries are for the project references and the last two are for their inclusion as content items in the project.
</p>
<pre class="prettyprint">
<Reference Include="32bit" Condition="'$(Platform)' == 'x86'">
<HintPath>bin\x86\32bit.dll</HintPath>
</Reference>
<Reference Include="64bit" Condition="'$(Platform)' == 'x64'">
<HintPath>bin\x64\64bit.dll</HintPath>
</Reference>
...
<ItemGroup>
<Content Include="bin\x86\32bit.dll" Condition="'$(Platform)' == 'x86'"/>
<Content Include="bin\x64\64bit.dll" Condition="'$(Platform)' == 'x64'"/>
</ItemGroup>
</pre>
<p>
Now, whenever the platform is changed to x86, only the x86-conditioned tags will be used, and likewise when changed to x64. Other conditions can be applied too; check the links below for a complete breakdown of the MSBuild schema and conditions.
</p>
<ul>
<li><a href="http://msdn.microsoft.com/en-us/library/5dy88c2e.aspx">MSBuild Project File Schema Reference</a></li>
<li><a href="http://msdn.microsoft.com/en-us/library/7szfhaft.aspx">MSBuild Conditions</a></li>
</ul>awalsh128http://www.blogger.com/profile/05157295791336552159noreply@blogger.com0tag:blogger.com,1999:blog-6363087137667886940.post-46640144937224318442011-12-21T10:38:00.000-08:002014-10-25T18:18:59.954-07:00WPF and data multithreading deadlocks.<h2>WPF & Multithreading</h2>
<p>In designing applications, it may be advantageous to run data processing methods on a separate thread from the UI. This allows greater responsiveness in the application and takes advantage of concurrent computation.</p>
<p>The cornerstone interface for any binding in WPF is <a href="http://msdn.microsoft.com/en-us/library/system.componentmodel.inotifypropertychanged.aspx">INotifyPropertyChanged</a>, which is implemented on a data structure's different properties. However, this can be a source of deadlocks if the data structure is processed on a separate thread where locks are explicitly taken by the developer.</p>
<p>A deadlock could occur if some UI event was being triggered on a data structure (eg. expanding a view, changing a text box entry) while the data thread was processing it with an acquired lock. <strike>Internally, the <a href="http://msdn.microsoft.com/en-us/library/system.componentmodel.inotifypropertychanged.propertychanged.aspx">PropertyChanged</a> event will call a <a href="http://msdn.microsoft.com/en-us/library/system.windows.threading.dispatcher.invoke.aspx">Dispatcher.Invoke</a> on the UI. Although the UI could be waiting for the lock to release on that same property that the data thread already has, creating a cyclic dependency or deadlock.</strike></p>
<p>The call stack for the UI thread. Note the wait call.</p>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi-4hjeQKzzLQf_2co6yhEl-sgFiBHhd8-nQRpvoUl8nwGYEkd2T_o9RVD6pNV5nnj0xAWH4E8OqjZzm1XRvaOokMrIPCXZmFCdK5ao5GlwsRD36KSpgYjTQHVBaxmHJS0hFjFhhjR1nzxA/s1600/uithread.png" imageanchor="1" style="margin-left:1em; margin-right:1em"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi-4hjeQKzzLQf_2co6yhEl-sgFiBHhd8-nQRpvoUl8nwGYEkd2T_o9RVD6pNV5nnj0xAWH4E8OqjZzm1XRvaOokMrIPCXZmFCdK5ao5GlwsRD36KSpgYjTQHVBaxmHJS0hFjFhhjR1nzxA/s400/uithread.png" /></a></div>
<p>The call stack for the data thread.</p>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjFnTM1-CaVwfpXtW40gFgQtPyBNo8sJsZ6KPVEUyGCzh1SBItVC4nFUm7lQyByKOBmdnz2pWDvW_R36L7skAPhG47Nbg62uub8FR67XFKssRS9V5DUejaHJnYc6UdG4Z3QP7P4qUkBgXW6/s1600/workerthread.png" imageanchor="1" style="margin-left:1em; margin-right:1em"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjFnTM1-CaVwfpXtW40gFgQtPyBNo8sJsZ6KPVEUyGCzh1SBItVC4nFUm7lQyByKOBmdnz2pWDvW_R36L7skAPhG47Nbg62uub8FR67XFKssRS9V5DUejaHJnYc6UdG4Z3QP7P4qUkBgXW6/s400/workerthread.png" /></a></div>
<h2>Deadlock Scenario</h2>
<p>As an example, suppose you have your standard WPF application UI thread and a separate data thread. The data thread polls a time stamp from a remote database and updates a class property called <code class="inlinecode">DataObject.Timestamp</code>, which raises the <code class="inlinecode">PropertyChanged</code> event. The property is also bound to a text box in our WPF application window, which is made visible in the window based on the user's preference (ie. collapsing and expanding a view).</p>
<p>In our deadlock example, the data thread puts a lock on the <code class="inlinecode">DataObject.Timestamp</code> property while it updates it and other data structures. While this is happening, the view is expanded, causing the text box to start binding to the <code class="inlinecode">DataObject.Timestamp</code> property and wait for an acquire of the lock that the data thread is holding. The data thread then sets the <code class="inlinecode">DataObject.Timestamp</code> property inside the critical region code and the <code class="inlinecode">PropertyChanged</code> event is raised. <strike>This causes the new time stamp value to be marshaled back onto the <code class="inlinecode">Dispatcher</code> with a blocking/synchronous <code class="inlinecode">Dispatcher.Invoke</code> call. Although the data thread now deadlocks in the critical region, waiting for the UI <code class="inlinecode">Dispatcher.Invoke</code> to return, which is waiting on the data thread to release the lock.</strike></p>
<h2>Solution</h2>
<p>To prevent the data thread's deadlock <strike>critical region from blocking on the <code class="inlinecode">Dispatcher.Invoke</code></strike>, override the <code class="inlinecode">OnPropertyChanged</code> method to marshal the value asynchronously to the dispatcher.</p>
<pre class="brush:csharp">
protected override void OnPropertyChanged(string propertyName)
{
Application.Current.Dispatcher.BeginInvoke(new Action(() => { base.OnPropertyChanged(propertyName); }));
}
</pre>
<p>This will ensure that the critical region doesn't block and the property changes are still serviced by the dispatcher.</p>awalsh128http://www.blogger.com/profile/05157295791336552159noreply@blogger.com2tag:blogger.com,1999:blog-6363087137667886940.post-61501318990276388152011-12-16T11:32:00.000-08:002012-10-03T13:45:08.174-07:00Tips for Java performance on Android.<p>While Java may be a higher level language that makes code writeability easier, conventions must be broken when developing on Android to reap performance gains. This is especially true for games where logic and rendering must be done on a tight loop.</p>
<p>The Android developer page provides <a href="http://developer.android.com/guide/practices/design/performance.html">helpful tips</a> for optimizing your Java code. Below are the optimizations that I have found to improve performance.</p>
<ol>
<li>
<strong>Front load memory allocation at startup instead of dynamically.</strong>
<p>We saw the largest performance gain here. The <a href="http://en.wikipedia.org/wiki/Garbage_collection_(computer_science)">GC</a> can kill performance in a CPU-intensive application or game. If you pool your instances in an array instead of allocating them dynamically, you can prevent GC passes from happening. This results in a slightly larger memory footprint if you tweak it right.</p>
<p>
Below is code that provides a fixed-size generic list implementation; the internal array never grows/shrinks. The <em>executePendingRemoves</em> routine is needed since object removal must sometimes be put into a <a href="http://en.wikipedia.org/wiki/Critical_section">critical region</a> with a <a href="http://docs.oracle.com/javase/tutorial/essential/concurrency/locksync.html"><em>synchronized</em> block</a>. You may want to queue up removals but still make them available to other threads up until the moment they are removed in bulk.</p>
<pre class="brush:java">public class List<T> {
private int mCount;
private T mItems[];
private int mRemoveCount;
private int[] mRemoveIndices;
@SuppressWarnings("unchecked")
public List(int capacity) {
mItems = (T[]) new Object[capacity];
mRemoveIndices = new int[capacity];
mCount = 0;
}
public void add(T item){
mItems[mCount] = item;
mCount++;
}
public void clear(){
for(int index=0; index < mCount; index++)
{
mItems[index] = null;
}
mCount = 0;
}
public void executePendingRemoves() {
for (int i=0;i < mRemoveCount; i++) {
mItems[mRemoveIndices[i]] = null;
}
int newCount=0;
for (int i=0;i < mCount; i++) {
if (mItems[i] != null) {
mItems[newCount] = mItems[i];
newCount++;
}
}
mCount = newCount;
mRemoveCount = 0;
}
public T get(int index) { return mItems[index];}
public int getCount() { return mCount;}
public int getIndex(T item) {
for (int i=0; i < mCount; i++) {
if (mItems[i] == item) return i;
}
return -1;
}
public int getPendingRemoveCount() { return mRemoveCount; }
public int[] getPendingRemoves() { return mRemoveIndices; }
public Object[] getItems() { return mItems;}
public void pendingRemoveAt(int index) {
for (int i=0; i< mRemoveCount; i++) {
if (index == mRemoveIndices[i])
return;
else if (index < mRemoveIndices[i]) {
int swapIndex = mRemoveIndices[i];
mRemoveIndices[i] = index;
index = swapIndex;
}
}
mRemoveIndices[mRemoveCount] = index;
mRemoveCount++;
}
public void removeAt(int index) {
if (index < mCount) {
for (int i = index; i < mCount; i++) {
if (i + 1 < mItems.length && i + 1 < mCount) {
mItems[i] = mItems[i + 1];
} else {
mItems[i] = null;
}
}
mCount--;
}
}
public T removeLast() {
mCount--;
final T item = mItems[mCount];
mItems[mCount] = null;
return item;
}
}
</pre>
<p>Lastly is the pooling class that can be subclassed for any application-level pool (e.g. <em>FlyingWizardPool extends ObjectPool</em>). When you want to instantiate a flying wizard object, just type in <em>FlyingWizard wizard = wizardPool.checkOut();</em>.</p>
<pre class="brush:java">public abstract class ObjectPool<T> {
private int mCapacity;
private List<T> mPool;
@SuppressWarnings({ "unchecked", "rawtypes" })
public ObjectPool(int capacity) {
mPool = new List(capacity);
mCapacity = capacity;
this.fill();
}
public void checkIn(T poolItem) {
mPool.add(poolItem);
}
public T checkOut() {
return mPool.removeLast();
}
public abstract void fill();
public int getCount() {
return mPool.getCount();
}
public int getCapacity() {
return mCapacity;
}
}
</pre>
<p>Below is an example implementation of the <em>ObjectPool</em> abstract class. Note how the <em>fill</em> routine is implemented. Also, front-loading memory like this means that you will need an <em>initialize</em> method on your instances so that the class fields can be put back to a "just instantiated" state. Normally a constructor would do this, but to keep the allocation static, an <em>initialize</em> call is necessary.</p>
<pre class="brush:java">public static class FlyingWizardPool extends ObjectPool<FlyingWizard> {
public FlyingWizardPool(int capacity) {
super(capacity);
}
public FlyingWizard checkOut(float x, float y, float destinationX, float destinationY) {
FlyingWizard wizard = this.checkOut();
wizard.initialize(x, y, destinationX, destinationY);
return wizard;
}
@Override
public void fill() {
for (int i=0; i < this.getCapacity(); i++) {
this.checkIn(new FlyingWizard());
}
}
}
</pre>
</li>
<li>
<strong>Choose larger block routines instead of calling out to helper subroutines.</strong>
<p>Since the Dalvik VM is a register-based architecture, every time one routine calls another, register values must be stored to memory before the call and loaded back in after the routine returns. Avoiding helper routines inside commonly called routines reduces this load/store overhead.</p>
</li>
<li>
<strong>Remove <a href="http://en.wikipedia.org/wiki/Mutator_method">accessors</a> and allow direct access to class fields.</strong>
<p>Accessors are usually good form and observe the <a href="http://en.wikipedia.org/wiki/Abstract_data_type">ADT</a>. However, to avoid the overhead of a routine call and the additional copy of the returned value, it is best to allow direct access to the class field. This makes it more difficult to protect the field's <a href="http://en.wikipedia.org/wiki/Precondition">conditions</a>, but is a worthy sacrifice if the field is accessed often.</p>
</li>
<li>
<strong>Make routines static if they don't access class fields.</strong>
<p>The Android team recommends using this for an approximate 15-20% performance gain. This means less work for the Dalvik during run-time optimization and the routine can be inlined for faster execution.</p>
</li>
<li>
<strong>Use constants instead of enums.</strong>
<p>This is more of a negligible optimization; it was delisted from Android's performance page. Instead of having an enum, you can keep the enumeration as an integer constant.</p>
</li>
</ol>
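<p>As a sketch of the last two tips, a hypothetical enum can be replaced with a holder of integer constants, and routines that touch no class fields can be made static:</p>

```java
// Hypothetical example: instead of "enum SpriteState { ALIVE, DYING, DEAD }",
// use integer constants. Referencing SpriteState.DEAD is then a plain static
// field lookup with no enum object overhead.
class SpriteState {
    public static final int ALIVE = 0;
    public static final int DYING = 1;
    public static final int DEAD = 2;

    private SpriteState() {} // Constants holder; not instantiable.

    // A static routine (no class fields touched) that the runtime can inline.
    public static boolean isGone(int state) {
        return state == DYING || state == DEAD;
    }
}
```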
It is important to follow the ADT for good code writeability/readability and bug minimization. Use these optimizations only on code that really needs them, meaning routines and fields that are called frequently by your application.awalsh128http://www.blogger.com/profile/05157295791336552159noreply@blogger.com1tag:blogger.com,1999:blog-6363087137667886940.post-29356944022210436272011-11-07T14:19:00.000-08:002012-10-03T13:44:50.869-07:00POCO not transforming into a POCO proxy.A POCO (Plain Old CLR Object) must be transformed into a POCO proxy<br />
(e.g. <span class="Apple-style-span" style="font-size: x-small;">MyNamespace.MyClass -> System.Data.Entity.DynamicProxies.MyClass_0A943C2FC37D33304CEB497A9210C754E3F553B50315F8E6F0F7A6FAF43777F2</span>)<br />
in order for features like lazy loading and change tracking to work.<br />
<br />
<a href="http://msdn.microsoft.com/en-us/library/dd468057.aspx">MSDN has a great article</a> that explains how to prepare a POCO to be transformed into a proxy. If you find that an object retrieved or attached to the context does not show itself in the DynamicProxies namespace (eg. MyNamespace.MyClass instead of ...DynamicProxies.MyClass), then you probably don't have a compliant class. You may have noticed this already when hovering your mouse cursor over an instance and seeing the runtime information on it.<br />
<br />
The distilled points are that your POCO must:
<ul>
<li>be public, not abstract and not sealed;</li>
<li>have a public or protected parameterless constructor;</li>
<li>have public overridable properties that are read/write if they are navigation properties;</li>
<li>and implement ICollection if they are a one-to-many navigation property.</li>
</ul>awalsh128http://www.blogger.com/profile/05157295791336552159noreply@blogger.com0tag:blogger.com,1999:blog-6363087137667886940.post-31926093598157795172011-10-16T02:25:00.000-07:002012-08-01T16:22:25.939-07:00Detecting cycles in decimal digits and lists.I was recently working on solving <a href="http://projecteuler.net/problem=64">problem 64 for project Euler</a> and one of the needed algorithms was a detection of cycles in a list. This routine works for detecting cycles in lists of cardinality greater than 2.<br />
<br />
<pre class="brush:python">def get_cycle_len(N):
    # i walks the list for comparison; k holds the candidate cycle length.
    i = k = 0
    for j in range(2, len(N)):
        if N[i] == N[j]:
            if k == 0:
                # First match: the offset j is the candidate cycle length.
                k = j
                i = 1
            else:
                if i == k:
                    # A full candidate period matched; cycle confirmed.
                    return k
                i += 1
        else:
            # Mismatch: discard the candidate and start over.
            i = 0
            k = 0
    return -1
</pre>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg6eEg3jjARyLhyphenhyphenWG9x7RIEBqJZu6PkEteADltAS8WMhKHyhm_CeLcvcsoYruvcEpZzt0JjtLFvvj7BjAP9yJ7a6iBS2UrO96Uhlk8LPsU3Di5rDncarJ6TZbcUmpFRJRvPNpJXCNGpYkTq/s1600/entityfail.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="390" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg6eEg3jjARyLhyphenhyphenWG9x7RIEBqJZu6PkEteADltAS8WMhKHyhm_CeLcvcsoYruvcEpZzt0JjtLFvvj7BjAP9yJ7a6iBS2UrO96Uhlk8LPsU3Di5rDncarJ6TZbcUmpFRJRvPNpJXCNGpYkTq/s400/entityfail.png" width="400" /></a></div>
<br />
I encountered this error while trying to run an Entity-based project using the "Database & Model First" paradigm. Uninstalling the preview removed the problem. I am not sure why this error occurred, since the preview doesn't modify the <a href="http://msdn.microsoft.com/en-us/library/92t2ye13.aspx">assembly</a> in which it claims not to find an entry point.awalsh128http://www.blogger.com/profile/05157295791336552159noreply@blogger.com0tag:blogger.com,1999:blog-6363087137667886940.post-35386057156837316142011-10-05T15:00:00.000-07:002012-08-01T16:27:10.131-07:00EF 4.1 Code First: Many one-to-one properties for same class type on an object.I recently started using the Entity Framework to add persistence to an already established code base. Initially I used the "Model & Database First" paradigm, where a designer is used to lay out the data model. However, this ended up being tedious, since my <a href="http://en.wikipedia.org/wiki/Plain_Old_CLR_Object">POCO</a> instance had to exactly match the <a href="http://msdn.microsoft.com/en-us/library/dd456853.aspx">POCO proxy</a> instance in my model designer view. After upgrading to EF 4.1, I realized the <a href="http://blogs.msdn.com/b/adonet/archive/2011/09/13/ef-code-first-fluent-api-with-vb-net.aspx">"Code First" paradigm</a> was the way to go, since I had to adapt the persistence model to my existing code.<br />
<br />
One of the earliest problems that I ran into was having more than one property on a class that referenced the same class type. This example was taken from <a href="http://weblogs.asp.net/manavi/archive/2011/03/28/associations-in-ef-4-1-code-first-part-2-complex-types.aspx">a great site</a> that explored the different EF properties.<br />
<span class="Apple-style-span" style="background-color: #f8f8f8; color: #2e2e2e; font-family: 'Courier New', Courier, mono; font-size: 12px;"></span><br />
<pre>public class Customer
{
public int Id { get; set; }
public int Age { get; set; }
public Address BillingAddress { get; set; }
public Address DeliveryAddress { get; set; }
}</pre>
<br />
Notice how both the <span class="Apple-style-span" style="background-color: #f8f8f8; color: #2e2e2e; font-family: monospace; font-size: 12px; white-space: pre;">BillingAddress </span>and <span class="Apple-style-span" style="background-color: #f8f8f8; color: #2e2e2e; font-family: monospace; font-size: 12px; white-space: pre;">DeliveryAddress </span>properties reference the same class type. I kept trying to establish two one-to-one relationships on the class, but this caused problems with cascading deletes and foreign key constraints. I found <a href="http://weblogs.asp.net/manavi/archive/2011/05/01/associations-in-ef-4-1-code-first-part-5-one-to-one-foreign-key-associations.aspx">a solution</a>, but it still required exceptional behavior on the part of my context and the database schema.<br />
<br />
The most elegant solution was to treat both of the properties as <a href="http://weblogs.asp.net/manavi/archive/2010/12/11/entity-association-mapping-with-code-first-part-1-one-to-one-associations.aspx">complex types</a>. The only thing I had to do was remove the <span class="Apple-style-span" style="background-color: #f8f8f8; color: #2e2e2e; font-family: monospace; font-size: 12px; white-space: pre;">Id </span>property from the <span class="Apple-style-span" style="font-family: monospace; font-size: 12px; white-space: pre;">Address </span>class, and the context took care of the rest. So instead of two tables in the database (one for <span class="Apple-style-span" style="font-family: monospace; font-size: 12px; white-space: pre;">Customer </span>and the other for <span class="Apple-style-span" style="font-family: monospace; font-size: 12px; white-space: pre;">Address</span>), there would be only one. Below is a layout of what the resulting table would look like.<br />
<br />
<pre class="brush:vbnet" id="code-result" style="font-size: 12px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;">Customer
{
Id
Age
BillingAddress_City
BillingAddress_Street
BillingAddress_Zipcode
DeliveryAddress_City
DeliveryAddress_Street
DeliveryAddress_Zipcode
}
</pre>
<br />
<br />awalsh128http://www.blogger.com/profile/05157295791336552159noreply@blogger.com0