Charlotte's Semantic Web - CalaisDotNet v2.0

Well, after much teasing it's about time I posted something concrete about the work I've been doing using OpenCalais. As mentioned previously, I was working on my own .NET plugin / helper class but, as it turns out, an ex-colleague of mine (who is now hiding away in sunny Australia), Chris Fulstow, pipped me to the post and released his .NET Open Calais project on CodePlex. Rather than have two separate projects doing exactly the same thing, I began to merge my work with his, and the result can be found here: CalaisDotNet

New Version

While I was busy merging code, OpenCalais released version 2 of their web service, adding more entities and relationships as well as new 'Simple' and 'Microformats' output types. I was attending a museum data mashup day as part of the UK Museums on the Web Conference 2008 last week and wanted to run some museum data into OpenCalais, so I branched the code and began rewriting it to support the new output types. After a busy weekend finishing off the code and adding relationship (Events/Facts) support, I am pleased to say this new version is almost ready for release. I wanted to show some examples of how easy it is to process any type of data you submit to it, and to use the power of LINQ to then manipulate and query the result set you get back.

Requirements

CalaisDotNet is written using the wonderful-ness of C# 3.0 and relies on the .NET Framework v3.5 being installed. Remember .NET v3.5 is essentially just a new set of libraries - the CLR is still the same 2.0 version .. which means if you're running .NET 2.0 at the moment then it isn't too much of a big deal to install v3.5, as all your old stuff will still work exactly the same way.

You will also need an OpenCalais API Key which can be freely requested after you register at the OpenCalais website. 

Here we go ..

CalaisDotNet can be broken down into two parts. The first is the call to the web service, handled by the CalaisDotNet object, and the second is the processed response data contained within one of three Calais*Document types that represent the three different types of output from the OpenCalais web service. Calling each is trivial.

var calais = new CalaisDotNet(_apiKey, _content);
var document = calais.Call<CalaisRdfDocument>();

 
.. where:
_apiKey = (string) Your 24 digit API key
_content = (string) Your content to be processed. OpenCalais can accept input in 3 different formats - Plain Text, HTML and XML. CalaisDotNet has support for all three. CalaisDotNet will take a guess at the format of your text, or you can specify the input type with an extra parameter in the constructor.
 

var calais = new CalaisDotNet(_apiKey, _content, CalaisInputFormat.Text);
var document = calais.Call<CalaisRdfDocument>();

The document returned represents the processed output from the web service and gives you access to various collections of data such as Entities or Relationships.

Simple Format

Documentation: HERE

This is a new format introduced in the latest version which has a reduced set of properties and entities. It is ideal for things such as tag clouds, as it only exposes a simple list of basic entities, their frequency in the document and the value of each. Its document description information is also much reduced, having only five properties.

You can still build up a LINQ query to filter or order the data in any way that you choose.

Use the CalaisSimpleEntityType enum to filter by entity type.

An example query would look something like this, filtering for results where the entity type is 'Country':

var calais = new CalaisDotNet(_apiKey, _content, CalaisInputFormat.Text);
var document = calais.Call<CalaisSimpleDocument>();

var results = from item in document.Entities
                 where item.Type == CalaisSimpleEntityType.Country
                 select item;

foreach (var result in results)
{
    Console.WriteLine(result.Value);
}
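To make the tag cloud idea a bit more concrete, here is a minimal sketch that orders entities by frequency and keeps the top few. Note that SimpleEntity and TopTags are hypothetical stand-ins I've made up so the snippet compiles on its own .. they are not the real CalaisDotNet types, and the property names (Value, Type, Frequency) are assumptions based on the description above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a Simple-format entity; NOT the real CalaisDotNet type.
public class SimpleEntity
{
    public string Value { get; set; }
    public string Type { get; set; }
    public int Frequency { get; set; }
}

public static class TagCloudSketch
{
    // Returns "value (frequency)" labels, most frequent first,
    // keeping only the top 'count' entities.
    public static List<string> TopTags(IEnumerable<SimpleEntity> entities, int count)
    {
        return entities
            .OrderByDescending(e => e.Frequency)
            .Take(count)
            .Select(e => e.Value + " (" + e.Frequency + ")")
            .ToList();
    }

    public static void Main()
    {
        // Made-up sample data purely for illustration.
        var entities = new List<SimpleEntity>
        {
            new SimpleEntity { Value = "France", Type = "Country", Frequency = 3 },
            new SimpleEntity { Value = "London", Type = "City", Frequency = 5 },
            new SimpleEntity { Value = "Science Museum", Type = "Organization", Frequency = 1 }
        };

        foreach (var tag in TopTags(entities, 2))
        {
            Console.WriteLine(tag);   // London (5), then France (3)
        }
    }
}
```

With the real library you would presumably feed document.Entities into the same kind of query instead of the sample list.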

Microformats

Documentation: HERE 

This is a very basic implementation as, frankly, I don't know how we can add any value to it, as a lot of the work is done by the web service to format the data into hCalendars and hCards. Most of my time was spent making the RDF stuff work, so suggestions are welcome on how best to process this into something useful :)

To grab the unprocessed output, use the RawOutput property available on all the Calais*Documents to see the original response.

var calais = new CalaisDotNet(_apiKey, _content, CalaisInputFormat.Text);
var document = calais.Call<CalaisMicroFormatsDocument>();

Console.WriteLine(document.RawOutput);

RDF Magic

The meat of the semantic data is (of course) contained within the CalaisRdfDocument class. It has a much richer set of document description metadata than the 'Simple' format.

The document also contains an IEnumerable list of Entities and an IEnumerable list of Relationships (Events/Facts). These can, for example, then be filtered by entity type (CalaisRdfEntityType) or relationship type (CalaisRdfRelationshipType).

Each entity/relationship also contains a list of all instances of that entity/relationship in the submitted document. Some examples:

Filtering where CalaisRdfEntityType is 'Company' and printing their location offsets.

var calais = new CalaisDotNet(_apiKey, _content);
var document = calais.Call<CalaisRdfDocument>();

var results = from item in document.Entities
              where item.EntityType == CalaisRdfEntityType.Company
              select item;

foreach (var result in results)
{
    Console.WriteLine(result);

    foreach (var instance in result.Instances)
    {
        Console.WriteLine(
            " - Found at offset: " +
            instance.Offset + "(" +
            instance.Length + " chars)"
        );
    }
}

Returns only 'PersonPolitical' relationships  

var results = from item in document.Relationships
              where item.RelationshipType == CalaisRdfRelationshipType.PersonPolitical
              select item;

foreach (var result in results)
{
    Console.WriteLine(result);

    foreach (var instance in result.Instances)
    {
        Console.WriteLine(
            " - Found at offset: " +
            instance.Offset + "(" +
            instance.Length + " chars)"
        );
    }
}

Slightly more complicated ..

Filters results by country and then looks up any relationships that are related to that country.

var calais = new CalaisDotNet(_apiKey, _content);
var document = calais.Call<CalaisRdfDocument>();

var results = from item in document.Entities
              where item.EntityType == CalaisRdfEntityType.Country
              select item;

foreach (var result in results)
{
    Console.WriteLine(result);

    foreach (var instance in result.Instances)
    {
        Console.WriteLine(
            " - Found at offset: " +
            instance.Offset + "(" +
            instance.Length + " chars)"
        );
    }

    var rels = from item in document.Relationships
               where item.RelationshipDetails.Values.Contains(result.Value)
               select item;

    foreach (var rel in rels)
    {
        Console.WriteLine(" - Relationship: " + rel);
    }
}

Download

Currently this version is still in a branch, so you will have to compile using the solution in the 'CalaisDotNet-NewFeatures_200805' folder .. you can download it from the source tab of the CodePlex project site (HERE). I'm hoping to make this a release soon, once it's been QA'd and also when I work out how to do it hehe :P

TO DO

  • MicroFormats - As mentioned earlier, we need to look at the Microformats output and work out how to present it usefully.
  • RDFa - It would be really nice to be able to output the submitted document as RDFa .. we have the entities and we know where they are in the text, so this shouldn't be too hard a jump .. personally I just need to understand RDFa better first.

 


Posted by: [mRg]
Posted on: 6/24/2008 at 2:13 PM
Categories: Guides

Grand Func Railroad - Functional .. erm. functions


A while ago I read an excellent post by Andrew Matthews (on his excellent Wandering Glitch blog) about employing functional programming techniques in C# that are enabled by the new features added in C# 3.0. Now, I did wonder about posting about this as it covers the same subject matter as Andrew's blog (and Andrew describes the hows and whys a lot better than me), but I wanted to post for two separate reasons.

  1. I use these two functions (below) ~all~ the time now. They are fantastic and (thanks to Andrew) have opened my code up to this powerful programming technique, and I wanted to share them here in case they help anyone else.
  2. While I really liked Andrew's examples, it took me a while to grok them completely thanks to one problem that plagues me as a personal bug-bear .. one-letter variables as arguments.

I hate, hate, hate one-letter variables when used as arguments. I know the problem here is a personal one .. mainly that I am not (and have never been) a mathematician. I didn't do A-Level maths; I was a graphics, pixel art, 3D artist kind of guy who found his way into programming by accident, and while now I would never consider myself anything other than a programmer, math-type syntax makes my brain run in the opposite direction .. i.e. the original On (which becomes my "Apply"):

Func<T, T> On<T>(this Func<T, T> f, Func<T, T> g)
{
    return t => g(f(t));
}

 

In learning about these functional techniques (including my learning with F# as well) I come across, time and time again, people's examples that are clearly for the mathematically minded (i.e. not me hehe!), so I wanted to try and re-present these two functions in the syntax that finally got me to understand them.

In doing this I have to strongly emphasise that I am not "having a go" at anyone, especially not Andrew, as I wouldn't be here if people didn't post such great articles showing new ways of doing stuff. This simply exists to provide a level of clarification for the thickies :D Enough of my jibber-jabber ..

ApplyToSequence

This extension method takes a function as its argument and then applies that function to every element of the IEnumerable sequence. While simple, it is a very powerful tool. Something similar already exists in the BCL: if you create a List<T> you can use the .ForEach() method, but this function allows you to apply a function to ~anything~ IEnumerable (which makes it very handy! Although I don't know why this isn't a standard method for IEnumerable already). An example follows these descriptions.

static IEnumerable<TResult> ApplyToSequence<T, TResult>(this IEnumerable<T> sourceSequence, Func<T, TResult> functionToApply)
{
    foreach (var element in sourceSequence)
    {
        yield return functionToApply(element);
    }
}
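One thing worth knowing about ApplyToSequence: because it uses yield return, nothing actually runs until you enumerate the result. It also behaves like LINQ's Enumerable.Select when given the same function. The sketch below tries to show both points .. the SequenceSketch class, the calls counter and the sample numbers are just made up for the demonstration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SequenceSketch
{
    // Same shape as the ApplyToSequence method above.
    public static IEnumerable<TResult> ApplyToSequence<T, TResult>(
        this IEnumerable<T> sourceSequence, Func<T, TResult> functionToApply)
    {
        foreach (var element in sourceSequence)
        {
            yield return functionToApply(element);
        }
    }

    public static void Main()
    {
        var calls = 0;
        // Counts how many times it has been invoked, then adds 1.
        Func<int, int> addOne = n => { calls++; return n + 1; };
        var numbers = new[] { 1, 2, 3 };

        // Building the sequence does not invoke addOne yet (deferred execution).
        var lazy = numbers.ApplyToSequence(addOne);
        Console.WriteLine(calls);   // 0

        // Enumerating (here via ToList) is what triggers the calls.
        var viaApply = lazy.ToList();
        Console.WriteLine(calls);   // 3

        // LINQ's Select gives the same values for the same projection.
        var viaSelect = numbers.Select(n => n + 1).ToList();
        Console.WriteLine(viaApply.SequenceEqual(viaSelect));   // True
    }
}
```

The practical upshot is that every foreach over the result re-runs the function, so call ToList() if you want to compute the values once and keep them.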

Apply

This extension method takes a function as an argument and then applies that function to the result of the original one. The great thing about this is that it enables you to "chain" functions together to make concise, powerful code.

static Func<T, T> Apply<T>(this Func<T, T> sourceFunction, Func<T, T> functionToApply)
{
    return t => functionToApply(sourceFunction(t));
}
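The order of the chain matters: the function on the left runs first and its result is fed into the one passed to Apply. Here is a small sketch to show that .. the ComposeSketch class and the timesTwo function are hypothetical, added only to make the ordering visible.

```csharp
using System;

public static class ComposeSketch
{
    // Same shape as the Apply method above: run sourceFunction first,
    // then pass its result to functionToApply.
    public static Func<T, T> Apply<T>(this Func<T, T> sourceFunction, Func<T, T> functionToApply)
    {
        return t => functionToApply(sourceFunction(t));
    }

    public static void Main()
    {
        Func<int, int> addOne = n => n + 1;
        Func<int, int> timesTwo = n => n * 2;   // hypothetical helper for the demo

        var addThenDouble = addOne.Apply(timesTwo);   // (n + 1) * 2
        var doubleThenAdd = timesTwo.Apply(addOne);   // (n * 2) + 1

        Console.WriteLine(addThenDouble(5));   // 12
        Console.WriteLine(doubleThenAdd(5));   // 11
    }
}
```

So a chain reads left to right in the order the steps happen, which is the bit that finally made it click for me.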

Examples

Here we go .. putting these two simple functions together means we can start doing quite neat things.

    // Two functions: one which takes an int and adds 1 to it ..
    var addOne = ((Func<int, int>)(a => a + 1));

    // .. and another that takes an int and subtracts 1
    var subOne = ((Func<int, int>)(a => a - 1));

    // Two IEnumerables (one int and one string)
    IEnumerable<int> test = new [] { 22, 44, 553, 345, 23, 32 };
    IEnumerable<string> test2 = new [] { "<head>", "</head>" };

    // Print originals ..
    foreach (var i in test)
    {
        Console.WriteLine(i);
    }

    Console.WriteLine("-----------------------");

    // Add 1 to each value .. using ApplyToSequence to apply the addOne
    // function to each element.
    test = test.ApplyToSequence(addOne);

    foreach (var i in test)
    {
        Console.WriteLine(i);
    }

    Console.WriteLine("-----------------------");

    // By chaining the functions together we can add 3 then
    // subtract 1 from each element .. in one line too ..
    test = test.ApplyToSequence(addOne.Apply(addOne).Apply(addOne).Apply(subOne));

    foreach (var i in test)
    {
        Console.WriteLine(i);
    }

    Console.WriteLine("-----------------------");

    foreach (var i in test2)
    {
        Console.WriteLine(i);
    }

    Console.WriteLine("-----------------------");

    // Using ApplyToSequence we can also apply another function to
    // a string to do useful things like escaping characters
    test2 = test2.ApplyToSequence(i => EscapeString(i));

    foreach (var i in test2)
    {
        Console.WriteLine(i);
    }

    Console.WriteLine("-----------------------");
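The example above calls EscapeString without showing it. The post doesn't say how it was written, so the following is only a guess at one possible version, assuming it HTML-encodes its input. It uses System.Net.WebUtility.HtmlEncode, which needs .NET 4.0 or later; on .NET 3.5 you would likely reach for HttpUtility.HtmlEncode in System.Web instead.

```csharp
using System;
using System.Net;

public static class EscapeSketch
{
    // Hypothetical EscapeString: an assumption, not the original helper.
    // HTML-encodes characters such as '<' and '>' so markup prints as text.
    public static string EscapeString(string input)
    {
        return WebUtility.HtmlEncode(input);
    }

    public static void Main()
    {
        Console.WriteLine(EscapeString("<head>"));    // &lt;head&gt;
        Console.WriteLine(EscapeString("</head>"));   // &lt;/head&gt;
    }
}
```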

The power of these should (hopefully) speak for itself, and I have found myself using them a hell of a lot in recent code. How and why these functions work is described brilliantly in Andrew's original article; I hope my lazy renaming simply helps shed some light on these for the less mathematically minded out there :)

 


Posted by: [mRg]
Posted on: 6/4/2008 at 10:55 AM
Categories: Guides

Tonight We Raid Calais - OpenCalais Tools .. Coming Soon

I have been quiet of late as I have been working on an Open Calais parser tool to enable people to work effectively with Open Calais output - the output is fairly complex RDF, and my little tool should allow quick extraction of terms and values so you can plug Open Calais into your .NET apps etc. It's been a steep learning curve in terms of how RDF works (and how to parse it), but the results are now working rather nicely.

I felt some of the other taggers / parsers let a lot of great data slip by as they were just concentrating on the tags (which is fine and is probably what you want), but there can be a lot of rich relationships stored in the documents (such is the joy of RDF) and I wanted to enable people to get at that data if they wanted (which is why it took so bloody long hehe)

Quick sneak preview as you can see .. all entities resolve to real .NET objects populated with all the data and values.

It is at the sanity check stage and I will post some info about it very soon (I am tempted to write a Live Writer plugin, but I think that's me getting distracted and I just need to get the core thing finalised).

This may all be pointless work and I've spent too much time getting it to extract data no one wants, but it turned into a personal challenge so I had to finish it hehe :P

 


Posted by: [mRg]
Posted on: 3/31/2008 at 11:58 AM
Categories: General

Ladies wot Launchball - Launchball wins big at SXSW

Well, I nearly fell out of my seat when I heard that Launchball had won best game at SXSW, and was even more surprised when I scrolled down and found it had won 'Best In Show' (almost like Crufts but with more gadgets). I was smiling all the way home knowing that something I had been so directly involved with makes so many people pleased.

As I said before, the main kudos has to go to the amazing hard work of Henry @ Preloaded for an outstanding job creating the game engine. My work involved plumbing the underbelly: creating the web services that communicate with the Flash, the Sitecore CMS and the underlying databases, so as to give the Science Museum the freedom to create the type of system they wanted for the users.

I had a quick poke in the database and current data is as follows:

91501 Registered Users
57927 Custom Levels have been created so far

.. eek! Poor server :)


Posted by: [mRg]
Posted on: 3/11/2008 at 4:22 PM
Categories: General

No Country for Old Browsers - IE8 Beta1

Well.. here I am .. running in IE8, freshly released, along with a host of other goodies, from MIX08 happening right now somewhere in Las Vegas. I've never really got on with Firefox .. I know, I know .. shoot me .. but I find the temptation to install all those plugins a bit too much and soon poor Firefox is crawling to a halt. So I was quite excited to get my hands on IE8.

Developer Tools 

At first I thought they had just included the regular developer toolbar into the main program but on further investigation the developer tools they've added have some very powerful features and should make many developers very happy (me included!) 

You will now find a lot of the debugging and CSS analysis features found in the new Visual Studio 2008 sitting in the 'Developer Tools' console.

Open it up and you find you can now set breakpoints in your JavaScript code, then step through line by line.

You can create watches, see autos and affect variables directly with the immediate console. All great stuff! Also locating problems in CSS got a whole bunch easier (now I don't have to switch back to Firefox just to use Firebug hehe)

Web Slices

One thing I quite like is the addition of Web Slices, and I just wanted to quickly note how to get them up and running :) As far as I can see they are a way to wrap your normal feed in some custom HTML so it looks like it came from your site (so mine would have bad spelling and lots of Transformers icons everywhere)

It's essentially creating a small piece of HTML within a page with the correct classes applied. IE8 then picks up these and allows you to subscribe to the 'slice' (jazzed up RSS)
(The important classes are in bold)  

<div id="myblogupdates" class="hslice" >
 <div style="display:none;">
 <a href="#" class="entry-title">Ultramagnus Blog Updates</a> ~
 <a href="#" class="entry-content">&nbsp;</a>
 <a href="http://www.ultramagnus.org/syndication.axd" class="feedurl" rel="feedurl">&nbsp;</a>
 </div>
</div>

 

If all is working, when you view the page you will see a new little swirly purple icon (where your RSS feeds normally appear) with the name of your WebSlice in it...

... clicking on the link will bring up a dialog asking if you want to subscribe to this webslice ..

.. once done the 'slice' will appear as a link in your toolbar.

You do need to tweak the code in your RSS feed to include a new namespace ..

<rss version="2.0" xmlns:mon="http://www.microsoft.com/schemas/rss/monitoring/2007">

 .. it will then use the HTML you defined originally to display the slice in the browser drop-down .. so you can now add a bit of personalisation to your RSS feed :)

 


Posted by: [mRg]
Posted on: 3/5/2008 at 10:08 PM
Categories: Guides

My Dina with Andre - Lurve those fonts ..

While I get stick in the office for my Visual Studio theme (Yes, I like a dark background, I'm old, my eyes hurt) one thing I must recommend is the Dina monospace font .. I started using it mainly for the output windows (black with a green font looks lovely!) but now it's replaced Consolas as my IDE font of choice. I often remote desktop into work and Consolas, while being very nice, only looks very nice when using ClearType .. without? .. freaking awful! I switched over to Dina for my main font on a whim, loved it and haven't looked back. I'm sure another font will trump it one day, but for now I'll stop rambling!

Dina Programming Font - by Jørgen Ibsen.

FYI - All my code snippets here use this font, so if you want to see them how I intended please go, grab and install this right away (hey, it's only 1 file!) :)

( Some folks like Liberation, which is a nice font too and worth checking out if you're looking for some new fontage :P )


Posted by: [mRg]
Posted on: 2/7/2008 at 8:58 PM
Categories: General

Mass Code Auto - Team City

Over the past week or so I have been trialing Team City from JetBrains (makers of Resharper et al). There has been a lot of buzz around this product and I wanted to include all the bits I had found and my thoughts on it as a product.

First, a bit of background: as well as our development team we also have an application support team. It's imperative that they can get access to the latest working builds of all our company's code. After our move to TFS we set up a series of team builds that people could run to produce DLLs / websites etc. on demand. This worked to a certain extent but wasn't perfect. A while ago we changed to running CI builds for our major projects using CruiseControl.NET and haven't looked back.

CC.NET is a fantastic tool. We were able to get all the things we wanted: NUnit tests, code coverage (NCover), documentation generation (Sandcastle), code metrics etc. But the bus-factor risk is quite high (i.e. if the guy who maintains it gets run over by a bus then the rest of the team is screwed because no one else knows how it works). The mass of XML and supporting files (BATs or MSBuild tasks) that had to be edited to get a new project up and running was quite frightening, and most of the time you are copying the same config code over and over. I was starting to write a large document on how to get CI up and running for new projects when I realised it was really quite stupidly complicated, and began hearing good things about TeamCity.

The test I set for myself with TeamCity was - could I get all the things I had running with CC.NET running in TeamCity in less time and with less fiddling with config files? The answer was, surprisingly .. Yes!

The installation was very straight forward and I was up and running with zero issues. I chose a simple project to start off with, I created a source control root to monitor in TFS. The integration with TFS was seamless and posed no problems.

Next I began to set up the build. This takes the form of a 7 stage wizard ... nothing frightening here .. just common sense settings such as when you want a build to be triggered (scheduling is very nice), dependencies and runners (the things that will do the actual build). It comes with a mass of runners for Java and .NET.

For this simple setup I used the 'SLN2008' runner to build from a VS2008 solution file in the root of my project and set it to build when I checked something in.

NCover

I used the NCover MSBuild tasks that come with the program .. pretty standard stuff .. one for a summary and one for a full report. The only thing to note is that the NCover help file doesn't list the correct enum value for generating a full HTML report, so I had to get busy with Reflector to find that one! (It's "FullCoverageReport" btw!)

<NCover ToolPath="$(NCoverPath)"
    CommandLineExe="$(NUnitPath)\nunit-console.exe" 
    CommandLineArgs="$(TestDll)" 
    CoverageFile="@(CoverageFile)" />

<!-- Summary Page -->
<NCoverExplorer ToolPath="$(NCoverPath)" 
    ProjectName="$(ProjectName)" 
    OutputDir="..\Docs" 
    CoverageFiles="@(CoverageFile)" 
    SatisfactoryCoverage="80" 
    ReportType="ModuleClassSummary" 
    HtmlReportName="CoverageSummary.html" 
    Exclusions="$(CoverageExclusions)" />

<!-- Full HTML Report -->
<NCoverExplorer ToolPath="$(NCoverPath)" 
    ProjectName="$(ProjectName)" 
    OutputDir="..\Docs\Coverage" 
    CoverageFiles="@(CoverageFile)" 
    SatisfactoryCoverage="80" 
    ReportType="FullCoverageReport" 
    HtmlReportName="Coverage.html" 
    Exclusions="$(CoverageExclusions)" />

The next thing to do was to declare the coverage as artifacts of the build in the configuration section of the build.

Docs/Coverage/* => Coverage,
Docs/Coverage/files/* => Coverage/files,
Docs/CoverageSummary.html

The final part (and this only has to be done once!) is to update the main config so it adds two tabs for the summary and the full report (Don't worry .. if the files aren't there the tabs aren't shown)

Locate the TeamCity config file (\.BuildServer\config\main-config.xml) and add two entries.

<report-tab title="Code Coverage Summary" basePath="" startPage="CoverageSummary.html"