Charlotte's Semantic Web - CalaisDotNet v2.0

Well, after much teasing it's about time I posted something concrete about the work I've been doing using OpenCalais. As mentioned previously I was working on my own .NET plugin / helper class but, as it turns out, an ex-colleague of mine (who is now hiding away in sunny Australia), Chris Fulstow, pipped me to the post and released his .NET Open Calais project on CodePlex. Rather than have two separate projects doing exactly the same thing, I began to merge my work with his and the result can be found here: CalaisDotNet

New Version

While I was busy merging code, OpenCalais released version 2 of their web service, adding more entities and relationships as well as new 'Simple' and 'Microformats' output types. I was attending a museum data mashup day as part of the UK Museums on the Web Conference 2008 last week and wanted to run some museum data through OpenCalais, so I branched the code and began rewriting it to support the new output types. After a busy weekend finishing off the code and adding relationship (Events/Facts) support, I am pleased to say this new version is almost ready for release. I wanted to show some examples of how easy it is to process any type of data you submit, and how to use the power of LINQ to manipulate and query the result set you get back.

Requirements

CalaisDotNet is written using the wonderful-ness of C# 3.0 and relies on the .NET Framework v3.5 being installed. Remember, .NET v3.5 is essentially just a new set of libraries - the CLR is still the same 2.0 version .. which means if you're running .NET 2.0 at the moment then it isn't too much of a big deal to install v3.5, as all your old stuff will still work exactly the same way.

You will also need an OpenCalais API Key which can be freely requested after you register at the OpenCalais website. 

Here we go ..

CalaisDotNet can be broken down into two parts. The first is the call to the web service, handled by the CalaisDotNet object, and the second is the processed response data, contained within one of three Calais*Document types that represent the three different types of output from the OpenCalais web service. Calling each is trivial.

var calais = new CalaisDotNet(_apiKey, _content);
var document = calais.Call<CalaisRdfDocument>();

 
.. where:
_apiKey = (string) Your 24 digit API key
_content = (string) Your content to be processed. OpenCalais can accept input in 3 different formats - Plain Text, HTML and XML. CalaisDotNet has support for all three. CalaisDotNet will take a guess at the format of your text, or you can specify the input type with an extra parameter in the constructor.
 

var calais = new CalaisDotNet(_apiKey, _content, CalaisInputFormat.Text);
var document = calais.Call<CalaisRdfDocument>();

The document returned represents the processed output from the web service and gives you access to various collections of data, such as Entities or Relationships.

Simple Format

Documentation: HERE

This is a new format introduced in the latest version which has a reduced set of properties and entities. It is ideal for things such as tag clouds, as it only exposes a simple list of basic entities, their frequency in the document and the value of each. Its document description information is also much reduced, having only five properties.

You can still build up a LINQ query to filter or order the data in any way that you choose.

Use the CalaisSimpleEntityType enum to filter by entity type.

An example query would look something like this, filtering for results where the entity type is 'Country':

var calais = new CalaisDotNet(_apiKey, _content, CalaisInputFormat.Text);
var document = calais.Call<CalaisSimpleDocument>();

var results = from item in document.Entities
                 where item.Type == CalaisSimpleEntityType.Country
                 select item;

foreach (var result in results)
{
    Console.WriteLine(result.Value);
}
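The tag cloud idea mentioned above comes down to ordering entities by frequency. Here is a minimal sketch of that, run against in-memory data rather than the web service. Note that SimpleEntity and its Type, Value and Frequency members are hypothetical stand-ins made up for this example; the real CalaisDotNet class and property names may well differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for an entity from the 'Simple' output;
// the real CalaisDotNet type and member names may differ.
public class SimpleEntity
{
    public string Type { get; set; }
    public string Value { get; set; }
    public int Frequency { get; set; }
}

public static class TagCloud
{
    // Orders entities by how often they occur (ties broken by value)
    // and keeps only the first 'count' of them.
    public static List<SimpleEntity> TopTags(IEnumerable<SimpleEntity> entities, int count)
    {
        var query = from item in entities
                    orderby item.Frequency descending, item.Value
                    select item;

        return query.Take(count).ToList();
    }

    public static void Main()
    {
        var entities = new List<SimpleEntity>
        {
            new SimpleEntity { Type = "Country", Value = "France", Frequency = 3 },
            new SimpleEntity { Type = "City", Value = "London", Frequency = 7 },
            new SimpleEntity { Type = "Company", Value = "Reuters", Frequency = 5 }
        };

        foreach (var tag in TopTags(entities, 2))
        {
            Console.WriteLine(tag.Value + " (" + tag.Frequency + ")");
        }
    }
}
```

With the sample data this prints London (7) followed by Reuters (5). The frequency could then drive something like a font size for each tag.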

Microformats

Documentation: HERE 

This is a very basic implementation as, frankly, I don't know how we can add any value to it, since a lot of the work is done by the web service to format the data into hCalendars and hCards. Most of my time was spent making the RDF stuff work, so suggestions are welcome on how best to process this into something useful :)

To grab the unprocessed output, use the RawOutput property available on all the Calais*Documents to see the original response.

var calais = new CalaisDotNet(_apiKey, _content, CalaisInputFormat.Text);
var document = calais.Call<CalaisMicroFormatsDocument>();

Console.WriteLine(document.RawOutput);

RDF Magic

The meat of the semantic data is (of course) contained within the CalaisRdfDocument class. It has a much richer set of document description metadata than the 'Simple' format.

The document also contains an IEnumerable list of Entities and an IEnumerable list of Relationships (Events/Facts). These can, for example, then be filtered by entity type (CalaisRdfEntityType) or relationship type (CalaisRdfRelationshipType).

Each entity/relationship also contains a list of all instances of that entity/relationship in the submitted document. Some examples:

Filtering where CalaisRdfEntityType is 'Company' and printing their location offsets.

var calais = new CalaisDotNet(_apiKey, _content);
var document = calais.Call<CalaisRdfDocument>();

var results = from item in document.Entities
              where item.EntityType == CalaisRdfEntityType.Company
              select item;

foreach (var result in results)
{
    Console.WriteLine(result);

    foreach (var instance in result.Instances)
    {
        Console.WriteLine(
            " - Found at offset: " +
            instance.Offset + "(" +
            instance.Length + " chars)"
        );
    }
}

Returning only 'PersonPolitical' relationships and printing their location offsets.

var results = from item in document.Relationships
              where item.RelationshipType == CalaisRdfRelationshipType.PersonPolitical
              select item;

foreach (var result in results)
{
    Console.WriteLine(result);

    foreach (var instance in result.Instances)
    {
        Console.WriteLine(
            " - Found at offset: " +
            instance.Offset + "(" +
            instance.Length + " chars)"
        );
    }
}

Slightly more complicated ..

Filters results by country and then looks up any relationships that are related to that country.

var calais = new CalaisDotNet(_apiKey, _content);
var document = calais.Call<CalaisRdfDocument>();

var results = from item in document.Entities
              where item.EntityType == CalaisRdfEntityType.Country
              select item;

foreach (var result in results)
{
    Console.WriteLine(result);

    foreach (var instance in result.Instances)
    {
        Console.WriteLine(
            " - Found at offset: " +
            instance.Offset + "(" +
            instance.Length + " chars)"
        );
    }

    var rels = from item in document.Relationships
               where item.RelationshipDetails.Values.Contains(result.Value)
               select item;

    foreach (var rel in rels)
    {
        Console.WriteLine(" - Relationship: " + rel);
    }
}

Download

Currently this version is still in a branch, so you will have to compile using the solution in the 'CalaisDotNet-NewFeatures_200805' folder .. you can download it from the source tab of the Codeplex project site (HERE). I'm hoping to make this a release soon, once it's been QA'd and also when I work out how to do it hehe :P

TO DO

  • MicroFormats - As mentioned earlier we need to look at the Microformats output and work out how to present it usefully.
  • RDFa - It would be really nice to be able to output the submitted document as RDFa .. we have the entities and we know where they are in the text so this shouldn't be too hard a jump .. personally I just need to understand RDFa better first.
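
As a very rough sketch of the RDFa idea, the offset and length of each instance are enough to wrap the matching text in a span. Everything here is an assumption for illustration: EntityMention is a hypothetical stand-in for an entity instance, the input is treated as plain text with no HTML escaping, and the typeof values are bare type names that would still need mapping to terms from a real vocabulary before the result counted as proper RDFa.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical stand-in for one occurrence of an entity in the text;
// not the actual CalaisDotNet instance type.
public class EntityMention
{
    public string EntityType { get; set; }
    public int Offset { get; set; }
    public int Length { get; set; }
}

public static class RdfaSketch
{
    // Wraps each mention in a <span typeof="..."> element, working
    // through the text from left to right.
    public static string Annotate(string text, IEnumerable<EntityMention> mentions)
    {
        var output = new StringBuilder();
        int position = 0;

        foreach (var mention in mentions.OrderBy(m => m.Offset))
        {
            // Skip mentions that are empty, overlap an earlier one,
            // or run past the end of the text.
            if (mention.Length <= 0 ||
                mention.Offset < position ||
                mention.Offset + mention.Length > text.Length)
            {
                continue;
            }

            // Copy the untouched text before the mention, then the wrapped mention.
            output.Append(text.Substring(position, mention.Offset - position));
            output.Append("<span typeof=\"" + mention.EntityType + "\">");
            output.Append(text.Substring(mention.Offset, mention.Length));
            output.Append("</span>");
            position = mention.Offset + mention.Length;
        }

        // Copy whatever is left after the last mention.
        output.Append(text.Substring(position));
        return output.ToString();
    }

    public static void Main()
    {
        var text = "Reuters opened an office in Paris.";
        var mentions = new List<EntityMention>
        {
            new EntityMention { EntityType = "City", Offset = 28, Length = 5 },
            new EntityMention { EntityType = "Company", Offset = 0, Length = 7 }
        };

        Console.WriteLine(Annotate(text, mentions));
    }
}
```

Sorting by offset first means the mentions can arrive in any order, and skipping overlaps keeps the output well formed at the cost of dropping the later of two overlapping entities.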

 


Posted by: [mRg]
Posted on: 6/24/2008 at 2:13 PM
Categories: Guides

Grand Func Railroad - Functional .. erm. functions


A while ago I read an excellent post by Andrew Matthews (on his excellent Wandering Glitch blog) about employing functional programming techniques in C# that are enabled by the new features added in C# 3.0. Now, I did wonder about posting this as it covers the same subject matter as Andrew's blog (and Andrew describes the hows and whys a lot better than me), but I wanted to post for two separate reasons.

  1. I use these two functions (below) ~all~ the time now. They are fantastic and (thanks to Andrew) have opened my code up to this powerful programming technique, so I wanted to share them here in case they help anyone else.
  2. While I really liked Andrew's examples, it took me a while to grok them completely thanks to one problem that plagues me as a personal bugbear .. one letter variables as arguments.

I hate, hate, hate one letter variables when used as arguments. I know the problem here is a personal one .. mainly that I am not (nor have I ever been) a mathematician. I didn't do A-Level maths; I was a graphics, pixel art, 3D artist kind of guy who found his way into programming by accident, and while I would now never consider myself anything other than a programmer, math-type syntax makes my brain run in the opposite direction .. e.g. the original On (which becomes my "Apply"):

Func<T, T> On<T>(this Func<T, T> f, Func<T, T> g)
{
    return t => g(f(t));
}

 

In learning about these functional techniques (including my learning with F# as well) I come across, time and time again, people's examples that are clearly for the mathematically minded (i.e. not me hehe!), so I wanted to try and re-present these two functions in the syntax that finally got me to understand them.

In doing this I have to strongly emphasise that I am not "having a go" at anyone, especially not Andrew, as I wouldn't be here if people didn't post such great articles showing new ways of doing stuff. This simply exists to provide a level of clarification for the thickies :D Enough of my jibber-jabber ..

ApplyToSequence

This extension method takes a function as its argument and applies that function to every element of the IEnumerable list. While simple, it is a very powerful tool. Something similar already exists to a degree in the BCL: if you create a List<T> you can use the .ForEach() method, but this function allows you to apply a function to ~anything~ IEnumerable (which makes it very handy! Although I don't know why this isn't a standard method for IEnumerable already). An example follows these descriptions.

static IEnumerable<TResult> ApplyToSequence<T, TResult>(this IEnumerable<T> sourceSequence, Func<T, TResult> functionToApply)
{
    foreach (var element in sourceSequence)
    {
        yield return functionToApply(element);
    }
}
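
One thing worth knowing about a method written with yield return is that it is lazy: nothing runs until the result is enumerated. The small sketch below shows that, and also checks the results against LINQ's own Enumerable.Select, which performs the same kind of projection. The class names SequenceExtensions and LazyDemo are just names picked for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SequenceExtensions
{
    // Same method as above, placed in a static class so it compiles on its own.
    public static IEnumerable<TResult> ApplyToSequence<T, TResult>(
        this IEnumerable<T> sourceSequence, Func<T, TResult> functionToApply)
    {
        foreach (var element in sourceSequence)
        {
            yield return functionToApply(element);
        }
    }
}

public static class LazyDemo
{
    public static void Main()
    {
        int calls = 0;
        var numbers = new[] { 1, 2, 3 };

        // Building the sequence does not call the function yet.
        var doubled = numbers.ApplyToSequence(n => { calls++; return n * 2; });
        Console.WriteLine(calls); // 0

        // Enumerating it (here via ToList) calls the function once per element.
        var results = doubled.ToList();
        Console.WriteLine(calls); // 3

        // The values match what Enumerable.Select produces.
        Console.WriteLine(results.SequenceEqual(numbers.Select(n => n * 2))); // True
    }
}
```

The practical consequence is that looping over the same lazy sequence twice runs the function twice per element, so call ToList() or ToArray() if the function is expensive or has side effects.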

Apply

This extension method takes a function as an argument and returns a new function that runs the original function first and then applies the argument to its result. The great thing about this is that it enables you to "chain" functions together to make concise, powerful code.

static Func<T, T> Apply<T>(this Func<T, T> sourceFunction, Func<T, T> functionToApply)
{
    return t => functionToApply(sourceFunction(t));
}
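
Because the source function always runs first, the order in which functions are chained matters. A minimal sketch showing that, where timesTwo and the class names FunctionExtensions and CompositionDemo are made up for the example:

```csharp
using System;

public static class FunctionExtensions
{
    // Same method as above, placed in a static class so it compiles on its own.
    public static Func<T, T> Apply<T>(this Func<T, T> sourceFunction, Func<T, T> functionToApply)
    {
        return t => functionToApply(sourceFunction(t));
    }
}

public static class CompositionDemo
{
    public static void Main()
    {
        Func<int, int> addOne = a => a + 1;
        Func<int, int> timesTwo = a => a * 2;

        // addOne runs first, then timesTwo: (5 + 1) * 2
        var addThenDouble = addOne.Apply(timesTwo);

        // timesTwo runs first, then addOne: (5 * 2) + 1
        var doubleThenAdd = timesTwo.Apply(addOne);

        Console.WriteLine(addThenDouble(5)); // 12
        Console.WriteLine(doubleThenAdd(5)); // 11
    }
}
```

So a chain of Apply calls reads left to right in the order the functions are executed.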

Examples

Here we go .. putting these two simple functions together means we can start doing quite neat things.

    // Two functions: one which takes an int and adds 1 to it ..
    var addOne = ((Func<int, int>)(a => a + 1));

    // .. and another that takes an int and subtracts 1
    var subOne = ((Func<int, int>)(a => a - 1));

    // Two IEnumerables (one int and one string)
    IEnumerable<int> test = new [] { 22, 44, 553, 345, 23, 32 };
    IEnumerable<string> test2 = new [] { "<head>", "</head>" };

    // Print originals ..
    foreach (var i in test)
    {
        Console.WriteLine(i);
    }

    Console.WriteLine("-----------------------");

    // Add 1 to each value .. using ApplyToSequence to apply the addOne
    // function to each element.
    test = test.ApplyToSequence(addOne);

    foreach (var i in test)
    {
        Console.WriteLine(i);
    }

    Console.WriteLine("-----------------------");

    // By chaining the functions together we can add 3 then
    // subtract 1 from each element .. in one line too ..
    test = test.ApplyToSequence(addOne.Apply(addOne).Apply(addOne).Apply(subOne));

    foreach (var i in test)
    {
        Console.WriteLine(i);
    }

    Console.WriteLine("-----------------------");

    foreach (var i in test2)
    {
        Console.WriteLine(i);
    }

    Console.WriteLine("-----------------------");

    // Using ApplyToSequence we can also apply another function to
    // a string to do useful things like escaping characters
    test2 = test2.ApplyToSequence(i => EscapeString(i));

    foreach (var i in test2)
    {
        Console.WriteLine(i);
    }

    Console.WriteLine("-----------------------");
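
EscapeString is not shown in the listing above, so here is one guess at what such a helper could look like: a hypothetical version that escapes the three characters that matter most when printing markup as text. To keep the sketch self-contained it uses LINQ's Select for the projection in place of ApplyToSequence.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EscapeDemo
{
    // Hypothetical implementation; the original helper is not shown in the post.
    // The ampersand is replaced first so the entities added by the later
    // replacements are not escaped a second time.
    public static string EscapeString(string input)
    {
        return input
            .Replace("&", "&amp;")
            .Replace("<", "&lt;")
            .Replace(">", "&gt;");
    }

    public static void Main()
    {
        IEnumerable<string> tags = new[] { "<head>", "</head>" };

        // Select is used here as a stand-in for ApplyToSequence.
        foreach (var tag in tags.Select(t => EscapeString(t)))
        {
            Console.WriteLine(tag);
        }
    }
}
```

This prints &lt;head&gt; and then &lt;/head&gt;. For real HTML output a framework encoder is a safer choice than hand-rolled replacements.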

The power of these should (hopefully) speak for itself, and I have found myself using them a hell of a lot in recent code. How and why these functions work is described brilliantly in Andrew's original article; I hope my lazy renaming simply helps shed some light on them for the less mathematically minded out there :)

 


Posted by: [mRg]
Posted on: 6/4/2008 at 10:55 AM
Categories: Guides