The .NET framework, and especially version 2.0, has made life a lot easier for programmers. In the past it wasn’t especially easy to create multi-threaded programs, but now it’s really pretty simple.
I have a program that downloads data from a large number of web sites (currently about 120), and in many cases the program had to sit doing nothing while waiting to receive its files. If I could have more than one operation executing at a time – asynchronously – the program would finish much more quickly. Here’s how I did it. (All examples in Visual Basic.)
First of all, create a function or subroutine that will go out and do the work. Test this by calling it in the normal, single-threaded way. Don’t try to run it on multiple threads until you’re sure it will run correctly in just one.
Here’s a simplified version of the function definition that I created:
Private Shared Function GetData(ByVal DataItem As WebData) As WebResult
This function accepts an object that contains data about what to download (the ‘DataItem’) and returns that data (of type ‘WebResult’).
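For anyone who wants something to experiment with, here is a minimal sketch of what such a function might look like. The WebData and WebResult members (Url, Content, ErrorMessage, ThreadHandle) and the Downloader class are hypothetical stand-ins, not the real classes. Doing the download with System.Net.WebClient is also an assumption. GetData is Public here only so that other code can call it.

```vb
Imports System.Net

' Hypothetical input type: describes what to download.
Public Class WebData
    Public Url As String
    ' Filled in later with the IAsyncResult returned by BeginInvoke.
    Public ThreadHandle As IAsyncResult
End Class

' Hypothetical output type: what came back.
Public Class WebResult
    Public Url As String
    Public Content As String
    Public ErrorMessage As String
End Class

Public Class Downloader
    ' Single-threaded version: get this right before going asynchronous.
    Public Shared Function GetData(ByVal DataItem As WebData) As WebResult
        Dim result As New WebResult()
        result.Url = DataItem.Url
        Try
            Using client As New WebClient()
                result.Content = client.DownloadString(DataItem.Url)
            End Using
        Catch ex As WebException
            ' Record the failure so one bad site doesn't stop the whole run.
            result.ErrorMessage = ex.Message
        End Try
        Return result
    End Function
End Class
```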
Once I had this working, I next created my delegate, which looks very similar to the function definition:
Private Delegate Function GetDataAsync(ByVal DataItem As WebData) As WebResult
As you can see, this looks very much like the function definition above. The signature of this delegate (the number and type of arguments, plus the return type) must be identical to that of the original function. But in fact it’s not a function definition at all. Instead it’s more like a class definition: in the next step, we’ll create a variable of type GetDataAsync.
Now you’re ready to call your function. As I just mentioned, first you create a variable of type GetDataAsync.
Dim GetDataCaller As New GetDataAsync(AddressOf GetData)
So what are we doing here? We’re creating a new variable using the delegate we defined above, and we’re passing it the address of the function that we want to call. You can pass any function here as long as the signatures of the function and the delegate are the same. Many event handlers work this way.
Now let’s kick off the flotilla of function calls:
For Each DataItem In DataItemList
DataItem.ThreadHandle = GetDataCaller.BeginInvoke(DataItem, Nothing, Nothing)
Next
So what’s going on here? We’re calling the BeginInvoke method of GetDataCaller, which is an instance of GetDataAsync (defined as a delegate by us, turned into a class by the compiler). The first argument to BeginInvoke is a DataItem, the same as for the underlying procedure GetData.
The other two parameters appear at the end of the argument list of every BeginInvoke call. The first is an AsyncCallback: the address of a procedure to be called when our GetData function is complete, so that it can postprocess the results. As for the second parameter, according to Microsoft, “You can also pass an object containing information to be used by the callback method.” A common choice is the delegate that was used to start the call. In this case we’d pass GetDataCaller as the final parameter, and the callback could get it back from the IAsyncResult’s AsyncState property and call EndInvoke on it.
I’m not using a callback subroutine, so I am passing Nothing to these parameters, and since I haven’t fooled around with it I’m not going to embarrass myself by displaying my ignorance (at least, not any more than I already have).
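For readers who do want a callback, here is a minimal sketch of the pattern just described. A hypothetical Square function stands in for GetData. The sketch passes the delegate itself as the last BeginInvoke argument, recovers it from AsyncState inside the callback, and calls EndInvoke there. The ManualResetEvent that lets the main thread wait for the last callback is my own choice, not something the framework requires.

```vb
Imports System.Threading

Public Module CallbackDemo
    ' Hypothetical worker delegate, playing the role of GetDataAsync.
    Private Delegate Function SquareAsync(ByVal n As Integer) As Integer

    Private total As Integer
    Private pending As Integer
    Private ReadOnly allDone As New ManualResetEvent(False)

    Private Function Square(ByVal n As Integer) As Integer
        Thread.Sleep(50) ' simulate waiting on a slow server
        Return n * n
    End Function

    ' Runs on a thread-pool thread as each call finishes.
    Private Sub OnSquareDone(ByVal ar As IAsyncResult)
        ' AsyncState is the object passed as BeginInvoke's last argument: the delegate.
        Dim caller As SquareAsync = CType(ar.AsyncState, SquareAsync)
        Interlocked.Add(total, caller.EndInvoke(ar))
        ' The last callback to finish releases the waiting thread.
        If Interlocked.Decrement(pending) = 0 Then allDone.Set()
    End Sub

    ' Squares 1..count asynchronously and returns the sum of the squares.
    Public Function SumOfSquares(ByVal count As Integer) As Integer
        If count <= 0 Then Return 0
        total = 0
        pending = count
        allDone.Reset()
        Dim caller As New SquareAsync(AddressOf Square)
        For i As Integer = 1 To count
            caller.BeginInvoke(i, AddressOf OnSquareDone, caller)
        Next
        allDone.WaitOne()
        Return total
    End Function
End Module
```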
The return value of the BeginInvoke function is an IAsyncResult object, which contains information about the asynchronous call. For example, you can poll its IsCompleted property to see if the call is done.
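A sketch of polling, using a hypothetical SlowAdd function in place of GetData. The loop checks IsCompleted and sleeps briefly between checks; that is where other useful work could go.

```vb
Imports System.Threading

Public Module PollingDemo
    Private Delegate Function SlowAddAsync(ByVal a As Integer, ByVal b As Integer) As Integer

    ' Hypothetical slow operation standing in for GetData.
    Private Function SlowAdd(ByVal a As Integer, ByVal b As Integer) As Integer
        Thread.Sleep(200)
        Return a + b
    End Function

    ' Starts SlowAdd on a pool thread, polls IsCompleted until it finishes,
    ' and reports how many times it polled.
    Public Function AddWithPolling(ByVal a As Integer, ByVal b As Integer, ByRef polls As Integer) As Integer
        Dim caller As New SlowAddAsync(AddressOf SlowAdd)
        Dim handle As IAsyncResult = caller.BeginInvoke(a, b, Nothing, Nothing)
        polls = 0
        While Not handle.IsCompleted
            polls += 1
            Thread.Sleep(20) ' other useful work could go here
        End While
        ' The call is already finished, so EndInvoke returns immediately.
        Return caller.EndInvoke(handle)
    End Function
End Module
```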
My code needs to stop at this point and wait until all of the worker threads are complete, so instead of using a callback, I’ll just loop through the list of objects and use EndInvoke.
For Each DataItem In DataItemList
WebResultItem = GetDataCaller.EndInvoke(DataItem.ThreadHandle)
' Code to process WebResultItem goes here.
Next
Now we call GetDataCaller’s EndInvoke method, passing the IAsyncResult that was returned by BeginInvoke, and capture the output of GetData in a variable of type WebResult. If a particular call has not completed when you call EndInvoke on it, the program will wait until it has, and then execution will continue. We don’t care about the wait, because we have to wait until all these calls are complete before we can proceed anyway. Also note that if GetData threw an exception on the worker thread, EndInvoke rethrows it here.
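Putting the steps together, here is a self-contained sketch of the whole fan-out/fan-in pattern. Integers stand in for the WebData and WebResult types, and the hypothetical GetData just sleeps to simulate a slow download. Note that delegate BeginInvoke/EndInvoke is supported only on the .NET Framework. On .NET Core and .NET 5 and later it throws PlatformNotSupportedException.

```vb
Imports System.Collections.Generic
Imports System.Threading

Public Module FanOutDemo
    ' Same shape as the article's delegate, with Integer in place of WebData/WebResult.
    Private Delegate Function GetDataAsync(ByVal DataItem As Integer) As Integer

    ' Hypothetical stand-in for GetData: pretends to wait on a web server.
    Private Function GetData(ByVal DataItem As Integer) As Integer
        Thread.Sleep(100)
        Return DataItem * 10
    End Function

    Public Function RunAll(ByVal items As List(Of Integer)) As List(Of Integer)
        Dim GetDataCaller As New GetDataAsync(AddressOf GetData)
        Dim pending As New List(Of IAsyncResult)

        ' Fan out: each BeginInvoke queues a call on the thread pool and returns at once.
        For Each DataItem As Integer In items
            pending.Add(GetDataCaller.BeginInvoke(DataItem, Nothing, Nothing))
        Next

        ' Fan in: EndInvoke blocks until that call finishes (and rethrows its exception, if any).
        Dim results As New List(Of Integer)
        For Each ar As IAsyncResult In pending
            results.Add(GetDataCaller.EndInvoke(ar))
        Next
        Return results
    End Function
End Module
```

Because the results are collected by walking the IAsyncResult list in order, they come back in the same order as the input, whatever order the calls actually finish in.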
Another thing to notice is that I don’t have to worry about how many threads are available. These calls run on the .NET thread pool, which in .NET 2.0 is limited by default to 25 worker threads per processor. I’m firing off over a hundred calls at once, but .NET doesn’t return an error. It simply queues up the requests until a thread is available to process them.
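To see the actual limits on a given machine, you can ask the ThreadPool class. Here is a small sketch; the module and function names are mine. The numbers depend on the framework version and the number of processors, so no particular output is assumed.

```vb
Imports System.Threading

Public Module PoolInfo
    ' Upper limit on thread-pool worker threads for this process.
    Public Function MaxWorkerThreads() As Integer
        Dim workers As Integer
        Dim ioThreads As Integer
        ThreadPool.GetMaxThreads(workers, ioThreads)
        Return workers
    End Function

    ' Worker threads not currently busy; BeginInvoke calls beyond this simply queue.
    Public Function AvailableWorkerThreads() As Integer
        Dim workers As Integer
        Dim ioThreads As Integer
        ThreadPool.GetAvailableThreads(workers, ioThreads)
        Return workers
    End Function
End Module
```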
The results? When I ran the program using only a single thread, it completed in around 2:15, or 135 seconds. I then ran the multi-threaded version, and it completed in only 45 seconds – just a third as long. This is pretty significant, since a job that would have taken half an hour can now be completed in ten minutes.
So to recap:
1) Create and debug the procedure you want to run on multiple threads. This is the most important part!
2) Define a delegate with the same signature as your procedure.
3) Create an instance of the delegate, passing it the address of your procedure (AddressOf).
4) Call BeginInvoke to start execution on a new worker thread.
5) Call EndInvoke to complete execution.
There’s more to this, including setting up callback procedures, waiting for a specified time (instead of forever), etc., but this should give you a start.
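Here is a sketch of waiting for a specified time using the IAsyncResult's AsyncWaitHandle. It assumes a hypothetical Work function that sleeps for a given number of milliseconds. I used the WaitOne(Integer, Boolean) overload because it is available in .NET 2.0.

```vb
Imports System.Threading

Public Module TimeoutDemo
    Private Delegate Function WorkAsync(ByVal ms As Integer) As String

    ' Hypothetical job that takes workMs milliseconds.
    Private Function Work(ByVal ms As Integer) As String
        Thread.Sleep(ms)
        Return "done"
    End Function

    ' Returns the worker's result, or Nothing if it did not finish within timeoutMs.
    Public Function RunWithTimeout(ByVal workMs As Integer, ByVal timeoutMs As Integer) As String
        Dim caller As New WorkAsync(AddressOf Work)
        Dim handle As IAsyncResult = caller.BeginInvoke(workMs, Nothing, Nothing)
        If handle.AsyncWaitHandle.WaitOne(timeoutMs, False) Then
            Return caller.EndInvoke(handle)
        End If
        ' Timed out: the call is still running; we have only stopped waiting for it.
        Return Nothing
    End Function
End Module
```

A call that times out keeps running on its pool thread. If EndInvoke is never called for it, any exception it raises is never seen.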
Handling date and time is tough, and the .NET Framework has more than a few quirks in this area. I have just run into one of the more painful ones, and I think I have found a simple way around it. (Note: this article refers to the .NET Framework 2.0. Other versions may give different results.)
I have a program that retrieves and stores RSS or Atom files from a number of blogs (the output is at CrimeSpot.net). Each provider, such as Typepad, WordPress, or Blogger, has its own quirks, and many of these revolve around date and time. As a result I have had to spend a lot of time writing a routine that calculates the time a particular post was published in Coordinated Universal Time (UTC; the acronym must be French) and in local time. Local time is for display, while UTC is used to calculate how long ago posts were published and in what order.
I have recently been testing some modifications to this program that use web services to store the data on a remote server instead of in a local database. Much to my surprise the dates displayed below the posts were off by an hour. WTF?
So I dug a little deeper. As part of the program I take a Visual Basic class I have created to hold and process the posts and serialize it into XML. I then load the resulting XML into a dataset. So I dumped the XML into a file, and discovered that .NET was taking my carefully calculated UTC dates and times and – silent but deadly – converting them to local time!
Let’s take an example: If I publish a post here in Fort Worth (UTC offset: -0600) at 7:15pm, the UTC timestamp would be 1:15am the following morning. That’s the current time in Greenwich, England, upon which UTC is based. When I serialize this value, .NET just takes 1:15am and sticks -0600 on the end of it.
Since the web server I’m saving this to is in a time zone an hour away, voilà: the resulting time is an hour off.
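The arithmetic in this example can be checked in code. This sketch hard-codes the Fort Worth offset (Central Standard Time, UTC-06:00, in January) instead of relying on the machine's time zone, and marks the result as DateTimeKind.Utc. It then writes the value with XmlConvert.ToString and XmlDateTimeSerializationMode.Utc, which emits a trailing "Z" instead of a local offset. The date is an arbitrary example, and the module name is mine.

```vb
Imports System.Xml

Public Module UtcDemo
    ' 7:15pm on an arbitrary January evening in Fort Worth.
    ' The -06:00 offset is hard-coded so the result does not depend on this machine's zone.
    Public Function PostTimeUtc() As DateTime
        Dim fortWorthTime As New DateTime(2007, 1, 15, 19, 15, 0)
        ' Local time minus the offset: 19:15 - (-6:00) = 01:15 the next day.
        Dim utcTime As DateTime = fortWorthTime.Subtract(TimeSpan.FromHours(-6))
        Return DateTime.SpecifyKind(utcTime, DateTimeKind.Utc)
    End Function

    ' Writes the value as UTC with a trailing "Z" rather than a local offset.
    Public Function ToXmlUtc(ByVal value As DateTime) As String
        Return XmlConvert.ToString(value, XmlDateTimeSerializationMode.Utc)
    End Function
End Module
```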
The solution, of course, is to convert the value to a correctly computed local time before serialization, and I guess that’s what I’ll have to do. But in .NET 2.0 the DateTime structure includes a Kind property, which is getting properly set to Utc. Would it have killed them to check this before mashing my data?
Yeah, I guess it would have.
Update: Got it. It was a little more work than I had anticipated, but I got it working. What I ended up having to do was add another Web Services function that returns the server’s offset from UTC. Here’s the code for that:
Return TimeZone.CurrentTimeZone.GetUtcOffset(DateTime.Now).ToString()
GetUtcOffset returns a TimeSpan containing the offset from UTC. The .ToString on the end is required because, although TimeSpan is marked as serializable, the XML serializer used by Web Services doesn’t carry its value across. TimeSpan has no public read/write properties, so it comes through as an empty element.
Once you have parsed the remote UTC offset, get the offset for the zone on the local computer using the same syntax (without .ToString). Then just do LocalOffset – RemoteOffset, which will give you yet another time span, and add that value to all of your times.
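Here is a worked example of the correction, assuming the workstation is at UTC-06:00 and the Web Services server at UTC-05:00. LocalOffset - RemoteOffset is -1 hour, so a UTC time of 1:15am is sent as 12:15am. The serializer labels that value -06:00, and the server converts it to its own zone, getting back 1:15am, the intended UTC value. The OffsetFix name is hypothetical. The remote string is assumed to be the TimeSpan.ToString output returned by the web method above.

```vb
Public Module OffsetFix
    ' localOffset: this machine's offset, e.g. TimeZone.CurrentTimeZone.GetUtcOffset(DateTime.Now).
    ' remoteOffsetText: the string returned by the web method, e.g. "-05:00:00".
    Public Function Correction(ByVal localOffset As TimeSpan, ByVal remoteOffsetText As String) As TimeSpan
        Dim remoteOffset As TimeSpan = TimeSpan.Parse(remoteOffsetText)
        ' Add this to each UTC time before serializing it.
        Return localOffset - remoteOffset
    End Function
End Module
```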
I didn’t go into any detail as to why I need to use UTC. My website actually uses three computers: a workstation to gather and publish data, a Web Services server to store this data in a database, and a web server to display it. All of these servers are in different time zones, so I need a value that will be consistent across all sites.
I think I see one reason why the DateTime structure is serialized as a local value instead of UTC. While fooling around with reading and writing dates as strings of various formats, I discovered that a DataSet would recognize the SortableDateTimePattern format as a valid date, but would not recognize UniversalSortableDateTimePattern (both are properties of the DateTimeFormatInfo class). So based on how deserializing strings to dates currently works, it may be impossible to pass these values as Universal time.
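The two patterns correspond to the standard "s" and "u" format strings. This sketch shows both, plus a hypothetical IsXmlDateTime check built on XmlConvert.ToDateTime. The "u" form puts a space where XML Schema's dateTime type requires a "T". That may be why the DataSet rejected it, but the explanation is my guess, not something documented for the DataSet.

```vb
Imports System.Globalization
Imports System.Xml

Public Module PatternDemo
    ' "s" = SortableDateTimePattern, e.g. 2007-01-16T01:15:00
    Public Function Sortable(ByVal value As DateTime) As String
        Return value.ToString("s", CultureInfo.InvariantCulture)
    End Function

    ' "u" = UniversalSortableDateTimePattern, e.g. 2007-01-16 01:15:00Z
    ' (the value is formatted as-is; no time zone conversion is done)
    Public Function UniversalSortable(ByVal value As DateTime) As String
        Return value.ToString("u", CultureInfo.InvariantCulture)
    End Function

    ' True if the text parses as an XML Schema dateTime.
    Public Function IsXmlDateTime(ByVal text As String) As Boolean
        Try
            XmlConvert.ToDateTime(text, XmlDateTimeSerializationMode.RoundtripKind)
            Return True
        Catch ex As FormatException
            Return False
        End Try
    End Function
End Module
```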
When I posted a while back about importing XML documents as objects using serialization, one of the purposes I wanted to put this to was creating a list of objects that could be selected by a unique key value. For example, if you had a list of books, you could pick out the one you wanted by specifying its ISBN.
In .NET, this kind of lookup is handled by a type of collection called a dictionary. You give it a key and a value, and Presto! You can look up specific items by key, enumerate them, etc.
One problem: dictionaries don’t support XML serialization.
I spent a while banging my head against this brick wall before I found a convenient way around it: a .NET class called KeyedCollection. KeyedCollection derives from Collection(Of T), which implements IList rather than IDictionary, so it can be XML-serialized, but it also allows you to specify a key. This class is abstract, so you must derive your own custom class, but as we’ll see in a second, that’s a snap.
KeyedCollection must be inherited because it has to be a list of a specific object type. Then, instead of specifying your own key value for each item in the list, you indicate which of the object’s fields you want to be used as the key. Here is a class I created this morning:
Public Class SourceTypeList
Inherits System.Collections.ObjectModel.KeyedCollection(Of Long, SourceType)
Protected Overrides Function GetKeyForItem(ByVal item As SourceType) As Long
Return item.SourceTypeID
End Function
Sub New()
MyBase.New()
End Sub
End Class
What does this code do? It tells Visual Basic to create a new collection type derived from KeyedCollection, where the key is a long integer and the value is an object of type SourceType (SourceType represents information about a type of syndication file, such as Atom 0.3 or RSS 2.0). You then override the function GetKeyForItem and tell VB which field you want to use as the key.
And it works beautifully. I had tested deserialization using a generic List(Of T) and I was able to swap out the code in maybe 5 minutes.
So if you need a keyed list and your objects include unique values, you can use KeyedCollection and get the benefits of serialization as well.
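Here is a minimal sketch of the ISBN example from the start of this post. It uses a hypothetical Book class and a KeyedCollection keyed on its Isbn field. A round trip through XmlSerializer shows that the collection serializes and still supports lookup by key afterwards.

```vb
Imports System.Collections.ObjectModel
Imports System.IO
Imports System.Xml.Serialization

' Hypothetical item type for the ISBN example.
Public Class Book
    Public Isbn As String
    Public Title As String
End Class

' Keyed on Isbn; XmlSerializer treats it as an ordinary list.
Public Class BookList
    Inherits KeyedCollection(Of String, Book)

    Protected Overrides Function GetKeyForItem(ByVal item As Book) As String
        Return item.Isbn
    End Function
End Class

Public Module BookListDemo
    ' Serializes the list to XML and reads it back, which a Dictionary cannot do.
    Public Function RoundTrip(ByVal books As BookList) As BookList
        Dim serializer As New XmlSerializer(GetType(BookList))
        Using writer As New StringWriter()
            serializer.Serialize(writer, books)
            Using reader As New StringReader(writer.ToString())
                Return CType(serializer.Deserialize(reader), BookList)
            End Using
        End Using
    End Function
End Module
```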
When I first started writing the back-end code for CrimeSpot.net, I was confronted with a dilemma: I had to import two different versions of Atom and three of RSS, all of which had slightly different formats. I had two options. I could create a separate routine within the program to import each of these formats, or I could transform each of them to a single format using XSL templates.
I decided to use templates and import a single, common XML format. Originally, I chose to do this because it made it very simple to separate program and data. Combining the two is one of my biggest pet peeves. By doing it this way, I could just create an entry in the database for each input type and include an XSL file to change it to the common form.
Subsequent events have shown this to be a wise decision.
Why? Well, I have been fooling around with one of the features of the .NET framework – the ability to take objects within programs and “serialize” them to XML files. Normally this is used so that you can retain the object’s value between instances of the program. If you need that object back at a later time, you can “deserialize” that XML file back into an object.
But when you’re deserializing, there’s no reason that the XML must come from an object that was previously serialized. You can use any XML file that matches the object’s format. With a little work, you can even import a collection of objects.
This helps me tremendously because the objects I will be importing need some processing before they can be saved. In particular, I need to inspect a date/time field and capture the offset from Universal time (UTC, aka GMT). This information is lost once the text is parsed into a DateTime, so I need to get it while the date is still just text.
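One way to capture the offset while the date is still text is a regular expression on the end of the string. This sketch handles the numeric offsets and the "Z"/"GMT"/"UT" designators used in Atom (RFC 3339) and RSS (RFC 822) timestamps. It reports failure for anything else, such as named zones like "CST". The OffsetCapture name and the exact pattern are my assumptions.

```vb
Imports System.Text.RegularExpressions

Public Module OffsetCapture
    ' Zone designator at the end of the text: "Z", "GMT", "UT", "+hh:mm" or "+hhmm".
    Private ReadOnly ZonePattern As New Regex("(Z|GMT|UT|[+-]\d{2}:?\d{2})$")

    ' Reads the UTC offset from the raw text, before it is parsed into a DateTime.
    ' Returns False if no recognized designator is present.
    Public Function TryGetOffset(ByVal timestamp As String, ByRef offset As TimeSpan) As Boolean
        Dim m As Match = ZonePattern.Match(timestamp.Trim())
        If Not m.Success Then Return False
        If Not (m.Value.StartsWith("+") OrElse m.Value.StartsWith("-")) Then
            offset = TimeSpan.Zero ' Z, GMT and UT all mean UTC
            Return True
        End If
        Dim digits As String = m.Value.Substring(1).Replace(":", "")
        offset = New TimeSpan(Integer.Parse(digits.Substring(0, 2)), Integer.Parse(digits.Substring(2, 2)), 0)
        If m.Value.StartsWith("-") Then offset = offset.Negate()
        Return True
    End Function
End Module
```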
And .NET supports saving XML directly into a database (via the DataSet object), so when I’m done, I can just serialize the object and save the resulting XML. This approach may have performance issues, but it’s simple and elegant, and I can always buy a faster computer.
UPDATE: Here’s a little source code to show how this works. This code will read XML from a DataSet and import it into a collection of objects. First, the object classes:
Public Class SourceTypes
Private TypeList As New SourceTypeList
<System.Xml.Serialization.XmlElementAttribute("SourceType", Form:=System.Xml.Schema.XmlSchemaForm.Unqualified)> _
Public Property Types() As SourceTypeList
Get
Return Me.TypeList
End Get
Set(ByVal TypeList As SourceTypeList)
Me.TypeList = TypeList
End Set
End Property
Public Sub New()
End Sub
End Class
This class is a serialization wrapper. It exists only to provide a convenient XML representation of the collection of SourceType objects. For information on the SourceTypeList class, please see this post. Incidentally, the XmlElementAttribute causes the list not to have an XML element of its own; instead it presents the list items directly below the root element.
Here is the SourceType class:
Imports System.Xml.Xsl

Public Class SourceType
Public SourceTypeID As Long
Public Name As String
Public Description As String
Public ItemField As String
Public UpdateField As String
Public UpdateCheckRX As String
Public UpdateSelectRX As String
Public UpdateReplaceRX As String
Private TemplateString As String
<System.Xml.Serialization.XmlIgnore()> _
Public TemplateTransform As New XslCompiledTransform
.
.
.
End Class
I have simplified the class a bit. The full version had a property that accepted an XSL string and used it to initialize the TemplateTransform field. I can’t emphasize enough how helpful properties are when using serialization: they make it easy to do some processing without having to explicitly invoke any methods. Here, the XmlIgnore attribute prevents the TemplateTransform field from participating in serialization.
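Here is a sketch of the kind of property described above, on a cut-down hypothetical TemplateHolder class. Setting the serialized Template string compiles it into the XmlIgnore'd XslCompiledTransform, so deserializing the object also prepares the transform. The Transform helper is only there for trying it out.

```vb
Imports System.IO
Imports System.Xml
Imports System.Xml.Serialization
Imports System.Xml.Xsl

Public Class TemplateHolder
    Private TemplateString As String

    <XmlIgnore()> _
    Public TemplateTransform As New XslCompiledTransform

    ' Serialized as <Template>; setting it also compiles the stylesheet.
    Public Property Template() As String
        Get
            Return TemplateString
        End Get
        Set(ByVal value As String)
            TemplateString = value
            Using reader As XmlReader = XmlReader.Create(New StringReader(value))
                TemplateTransform.Load(reader)
            End Using
        End Set
    End Property

    ' Runs the compiled template over an XML string and returns the output.
    Public Function Transform(ByVal xml As String) As String
        Using input As XmlReader = XmlReader.Create(New StringReader(xml))
            Using output As New StringWriter()
                TemplateTransform.Transform(input, Nothing, output)
                Return output.ToString()
            End Using
        End Using
    End Function
End Class
```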
Now here’s the guts of the program, where we instantiate the class from the DataSet (which we will assume has already been filled):
' Requires Imports System.IO and Imports System.Xml.Serialization at the top of the file.
Dim TypeList As SourceTypes
Dim TypeSerializer As New XmlSerializer(GetType(SourceTypes))
Dim TypeReader As StringReader
TypeReader = New StringReader(TypeSet.GetXml)
TypeList = CType(TypeSerializer.Deserialize(TypeReader), SourceTypes)
It may be more efficient to use DataSet.WriteXml and an XML reader here; I haven’t tested it. The result is an object that contains a collection of SourceType objects.
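To try the deserialization step without a database, this sketch builds a small DataSet by hand whose XML has the same shape (a SourceTypes root containing SourceType rows). It then reads that XML into cut-down stand-in classes. The TypeListWrapper, TypeRow, and DataSetImport names and the two sample rows are illustrative assumptions.

```vb
Imports System.Collections.Generic
Imports System.Data
Imports System.IO
Imports System.Xml.Serialization

' Hypothetical, cut-down stand-ins for SourceTypes and SourceType.
<XmlRoot("SourceTypes")> _
Public Class TypeListWrapper
    <XmlElement("SourceType")> _
    Public Types As New List(Of TypeRow)
End Class

Public Class TypeRow
    Public SourceTypeID As Long
    Public Name As String
End Class

Public Module DataSetImport
    ' Builds a DataSet whose GetXml output matches the classes above.
    Public Function BuildSample() As DataSet
        Dim ds As New DataSet("SourceTypes")
        Dim table As DataTable = ds.Tables.Add("SourceType")
        table.Columns.Add("SourceTypeID", GetType(Long))
        table.Columns.Add("Name", GetType(String))
        table.Rows.Add(1L, "Atom 0.3")
        table.Rows.Add(2L, "RSS 2.0")
        Return ds
    End Function

    ' Same technique as above: DataSet XML in, objects out.
    Public Function Import(ByVal ds As DataSet) As TypeListWrapper
        Dim serializer As New XmlSerializer(GetType(TypeListWrapper))
        Using reader As New StringReader(ds.GetXml())
            Return CType(serializer.Deserialize(reader), TypeListWrapper)
        End Using
    End Function
End Module
```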
As always, please drop a note in the comments if this helps.
Today’s .NET tip is a quickie (previous entries here and here): When you are using a database to fill the rows of a data-bound drop-down list control, you will inevitably run into an error when the bound field is null or an empty string, because that value doesn’t match any item in the list. You don’t want to add a null row to the list source, so what do you do?
This one’s simple: there’s a property of drop-down lists called AppendDataBoundItems. Set this to true, and you can provide one or more static entries, with rows from the list data source appended beneath them. Sample code:
<asp:DropDownList ID="BoundList"
runat="server"
DataSourceID="ListDataSource"
DataTextField="EntryDescription"
DataValueField="EntryID"
SelectedValue='<%# Bind("BoundItemID") %>'
AppendDataBoundItems="true">
<asp:ListItem Value=""></asp:ListItem>
</asp:DropDownList>
So, what we have here is a drop-down list called “BoundList”. This list gets its rows from a datasource called “ListDataSource”, displaying the value of “EntryDescription” while binding “EntryID” to the underlying field – which is called “BoundItemID”.
We then provide an empty list item to handle cases where no entry has been selected.
As always, please drop a note in the comments if this code helps you out.