Wednesday, December 21, 2011

WPF and data multithreading deadlocks.

WPF & Multithreading

In designing applications, it may be advantageous to run data processing methods on a separate thread from the UI. This allows greater responsiveness in the application and takes advantage of concurrent computation.

The cornerstone of any data binding in WPF is the INotifyPropertyChanged interface, which a data structure implements to notify the UI when its properties change. However, this can become a source of deadlocks if the data structure is also being processed on a separate thread where the developer explicitly takes locks.

A deadlock can occur if a UI event touches a data structure (e.g. expanding a view, changing a text box entry) while the data thread is processing it under an acquired lock. Internally, the PropertyChanged event results in a Dispatcher.Invoke on the UI thread. But the UI thread may itself be waiting to acquire the lock on that same property that the data thread already holds, creating a cyclic dependency: a deadlock.

(Screenshot: the call stack for the UI thread; note the wait call.)

(Screenshot: the call stack for the data thread.)

Deadlock Scenario

As an example, suppose you have your standard WPF application UI thread and a separate data thread. The data thread polls a time stamp from a remote database and updates a class property called DataObject.Timestamp, raising the PropertyChanged event. The property is also bound to a text box in the application window, which is shown or hidden based on the user's preference (i.e. collapsing and expanding a view).

In our deadlock example, the data thread takes a lock on the DataObject.Timestamp property while it updates it and other data structures. While this is happening, the view is expanded, causing the text box to start binding to DataObject.Timestamp and block waiting to acquire the lock that the data thread is holding. The data thread then sets DataObject.Timestamp inside the critical region and the PropertyChanged event is raised. This marshals the new time stamp value back onto the Dispatcher with a blocking/synchronous Dispatcher.Invoke call. The data thread now deadlocks in the critical region, waiting for the UI's Dispatcher.Invoke to return, while the UI thread waits for the data thread to release the lock.

Solution

To prevent the data thread from blocking on Dispatcher.Invoke inside its critical region, override the OnPropertyChanged method to marshal the value to the dispatcher asynchronously:

protected override void OnPropertyChanged(string propertyName)
{
    Application.Current.Dispatcher.BeginInvoke(
        new Action(() => base.OnPropertyChanged(propertyName)));
}

This ensures that the critical region never blocks, while property changes are still serviced by the dispatcher.
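To see why posting asynchronously breaks the cycle, here is a hedged, language-agnostic sketch in Java (all names hypothetical): a single-thread executor stands in for the WPF dispatcher, `execute` plays the role of the fire-and-forget BeginInvoke, and the data thread raises its notification without waiting for the "UI" to handle it, so it can leave the critical region even if the UI thread is blocked on the same lock.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class AsyncNotifyDemo {
    static final Object lock = new Object();
    // Single-thread executor standing in for the WPF dispatcher.
    static final ExecutorService uiThread = Executors.newSingleThreadExecutor();
    static volatile String lastValue;

    public static void main(String[] args) throws Exception {
        Thread dataThread = new Thread(() -> {
            synchronized (lock) { // data thread's critical region
                final String newValue = "2011-12-21 09:00:00";
                // Async post (BeginInvoke analogue): returns immediately, so
                // holding `lock` here can never deadlock against the UI side.
                // A blocking submit(...).get() here would reproduce the
                // Dispatcher.Invoke deadlock from the scenario above.
                uiThread.execute(() -> {
                    synchronized (lock) { // binding re-reads under the lock
                        lastValue = newValue;
                    }
                });
            }
        });
        dataThread.start();
        dataThread.join(); // completes even if the UI task hasn't run yet
        uiThread.shutdown();
        uiThread.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```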

Friday, December 16, 2011

Tips for Java performance on Android.

While Java may be a higher-level language that makes code easier to write, some conventions must be broken when developing on Android to reap performance gains. This is especially true for games, where logic and rendering must run in a tight loop.

The Android developer page provides helpful tips for optimizing your Java code. Below are the optimizations that I have found to improve performance.

  1. Front load memory allocation at startup instead of dynamically.

    We saw the largest performance gain here. The GC can kill performance in a CPU-intensive application or game. If you pool your instances in a preallocated array instead of allocating them dynamically, you can prevent those GC calls from happening at all. Tuned correctly, this costs only a slightly larger memory footprint.

    Below is code that provides a fixed-size generic list implementation; the internal array never grows or shrinks. The executePendingRemoves routine exists because object removal must sometimes happen in a critical region within a synchronized block: you may want to queue up removals but still keep the objects available to other threads up until the moment they are removed in bulk.

    public class List<T> {

        private int mCount;
        private T mItems[];
        private int mRemoveCount;
        private int[] mRemoveIndices;

        @SuppressWarnings("unchecked")
        public List(int capacity) {
            // Allocate once up front; the backing array never grows or shrinks.
            mItems = (T[]) new Object[capacity];
            mRemoveIndices = new int[capacity];
            mCount = 0;
        }

        public void add(T item) {
            // No bounds check: the caller must respect the fixed capacity.
            mItems[mCount] = item;
            mCount++;
        }

        public void clear() {
            for (int index = 0; index < mCount; index++) {
                mItems[index] = null;
            }
            mCount = 0;
        }

        // Applies all queued removals in one pass, compacting the array.
        public void executePendingRemoves() {
            for (int i = 0; i < mRemoveCount; i++) {
                mItems[mRemoveIndices[i]] = null;
            }
            int newCount = 0;
            for (int i = 0; i < mCount; i++) {
                if (mItems[i] != null) {
                    mItems[newCount] = mItems[i];
                    newCount++;
                }
            }
            mCount = newCount;
            mRemoveCount = 0;
        }

        public T get(int index) { return mItems[index]; }
        public int getCount() { return mCount; }

        // Reference equality is intentional: pooled instances are unique.
        public int getIndex(T item) {
            for (int i = 0; i < mCount; i++) {
                if (mItems[i] == item) return i;
            }
            return -1;
        }

        public int getPendingRemoveCount() { return mRemoveCount; }
        public int[] getPendingRemoves() { return mRemoveIndices; }
        public Object[] getItems() { return mItems; }

        // Queues a removal, keeping mRemoveIndices sorted and duplicate-free.
        public void pendingRemoveAt(int index) {
            for (int i = 0; i < mRemoveCount; i++) {
                if (index == mRemoveIndices[i])
                    return;
                else if (index < mRemoveIndices[i]) {
                    int swapIndex = mRemoveIndices[i];
                    mRemoveIndices[i] = index;
                    index = swapIndex;
                }
            }
            mRemoveIndices[mRemoveCount] = index;
            mRemoveCount++;
        }

        // Immediate removal: shifts the tail down one slot.
        public void removeAt(int index) {
            if (index < mCount) {
                for (int i = index; i < mCount; i++) {
                    if (i + 1 < mItems.length && i + 1 < mCount) {
                        mItems[i] = mItems[i + 1];
                    } else {
                        mItems[i] = null;
                    }
                }
                mCount--;
            }
        }

        public T removeLast() {
            mCount--;
            final T item = mItems[mCount];
            mItems[mCount] = null;
            return item;
        }
    }
    

    Lastly, here is the pooling object that any application-level pool can inherit from (e.g. FlyingWizardPool extends ObjectPool). When you want to instantiate a flying wizard object, just type FlyingWizard wizard = wizardPool.checkOut();.

    public abstract class ObjectPool<T> {

        private int mCapacity;
        private List<T> mPool;

        public ObjectPool(int capacity) {
            // Typed rather than raw construction, so no unchecked warnings.
            mPool = new List<T>(capacity);
            mCapacity = capacity;
            this.fill();
        }

        public void checkIn(T poolItem) {
            mPool.add(poolItem);
        }

        public T checkOut() {
            return mPool.removeLast();
        }

        public abstract void fill();

        public int getCount() {
            return mPool.getCount();
        }

        public int getCapacity() {
            return mCapacity;
        }
    }
    

    Below is an example implementation of the ObjectPool abstract class. Note how the fill routine is implemented. Front-loading memory like this also means your instances need an initialize routine that resets class fields back to a "just instantiated" state. Normally a constructor would do this, but to keep allocation static, an initialize call is necessary.

    public class FlyingWizardPool extends ObjectPool<FlyingWizard> {
        public FlyingWizardPool(int capacity) {
            super(capacity);
        }

        public FlyingWizard checkOut(float x, float y, float destinationX, float destinationY) {
            // Reuse a pooled instance and reset it to a fresh state.
            FlyingWizard wizard = this.checkOut();
            wizard.initialize(x, y);
            return wizard;
        }

        @Override
        public void fill() {
            for (int i = 0; i < this.getCapacity(); i++) {
                this.checkIn(new FlyingWizard());
            }
        }
    }
    
  2. Choose larger block routines instead of calling out to helper subroutines.

    Since Dalvik is a register-based architecture, every time one routine calls another, register values must be stored to memory before the call and loaded back after the routine returns. Avoiding helper routines inside commonly called routines reduces this load/store overhead.
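As a hedged sketch of what hand-inlining looks like (hypothetical names, modelling a per-frame position update), the two routines below compute identical results; the second simply folds the helper's arithmetic into the caller's body:

```java
// Hypothetical per-frame position update for a game entity.
public class InlineDemo {
    public float x, y, vx, vy;

    // Conventional style: one helper call per component, each call
    // paying the load/store overhead described above.
    private float step(float position, float velocity, float dt) {
        return position + velocity * dt;
    }

    public void updateWithHelpers(float dt) {
        x = step(x, vx, dt);
        y = step(y, vy, dt);
    }

    // Hand-inlined: one larger routine, no helper-call overhead.
    public void updateInlined(float dt) {
        x = x + vx * dt;
        y = y + vy * dt;
    }
}
```

Both routines produce the same positions; the trade is readability for fewer call frames in a hot loop.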

  3. Remove accessors and allow direct access to class fields.

    Accessors are usually good form and observe the ADT. However, to avoid the overhead of a routine call plus the copy of the returned value, it is best to allow direct access to the class field. This makes it harder to protect the field's invariants, but that is a worthy sacrifice for fields that are accessed often.
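A minimal sketch of the trade-off (hypothetical class; the accessor pair is kept only for contrast):

```java
// Hypothetical game entity illustrating direct field access.
public class Sprite {
    // Hot-path state: public fields, read and written directly
    // (sprite.x += dx;) with no accessor-call overhead.
    public float x;
    public float y;

    // Rarely touched state can keep conventional accessors.
    private float health;
    public float getHealth() { return health; }
    public void setHealth(float health) { this.health = health; }
}
```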

  4. Make routines static if they don't access class fields.

    The Android team recommends this for an approximate 15-20% performance gain. A static routine means less work for Dalvik during run-time optimization, and the routine can be inlined for faster execution.
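For instance, a utility routine that touches no instance state (hypothetical example) should be declared static:

```java
public class MathUtil {
    // Reads no instance fields, so it is static: Dalvik need not pass
    // `this`, and the call is a cheaper candidate for inlining.
    public static float lerp(float a, float b, float t) {
        return a + (b - a) * t;
    }
}
```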

  5. Use constants instead of enums.

    This is more of a negligible optimization; it has since been delisted from Android's performance page. Instead of an enum, you can keep the enumeration as integer constants.
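A sketch of the substitution (hypothetical state names, replacing an `enum WizardState { IDLE, FLYING, CASTING }`):

```java
// Integer constants standing in for an enum: no per-value object
// allocation, and comparisons compile down to plain int compares.
public class WizardState {
    public static final int IDLE = 0;
    public static final int FLYING = 1;
    public static final int CASTING = 2;
}
```

The cost is losing the type safety an enum provides, which is why this one is negligible and optional.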

It is still important to follow the ADT for good code writability, readability, and bug minimization. Use these optimizations only on code that really needs them: the routines and fields that your application calls most frequently.