Let’s imagine a hypothetical scenario: you want to know when one of your favorite bloggers posts something. Maybe that blogger posts updates to Twitter and you can follow them there. But sometimes they don’t. Or maybe you are building a system to aggregate their content.
So you start looking for plugins that do it properly, and you don’t find much.
Let me show you how to do it without plugins.
Prerequisite
- .NET 3.5 SP1 or later (SyndicationFeed ships with it; the HttpClient used in Step 1 requires .NET 4.5)
Step 1: Retrieve the data
First, let’s make our code compatible with multiple feeds. We’ll collect a list of URLs from a few very popular blogs.
    // Requires: using System; using System.Collections.Generic; using System.Net.Http;
    var rssFeeds = new List<Uri>
    {
        new Uri("http://weblogs.asp.net/scottgu/rss.aspx"),
        new Uri("http://feeds.hanselman.com/ScottHanselman"),
        new Uri("http://blogs.msdn.com/b/dotnet/rss.aspx"),
    };
    var client = new HttpClient();
    foreach (var rssFeed in rssFeeds)
    {
        // Blocking on .Result keeps the sample short; acceptable in a console app.
        var result = client.GetStreamAsync(rssFeed).Result;
        // todo: implement the rest
    }
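If you would rather not block on .Result, the same loop can be written with await. This is only a sketch: the `RunAsync` method name and the error handling are assumptions added for illustration, and the feed URLs are the ones listed above (they may no longer respond).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

public static class Program
{
    public static void Main()
    {
        // Before C# 7.1 a console app cannot declare an async Main,
        // so we block exactly once at the entry point.
        RunAsync().GetAwaiter().GetResult();
    }

    // Hypothetical helper name; not part of the original post.
    private static async Task RunAsync()
    {
        var rssFeeds = new List<Uri>
        {
            new Uri("http://weblogs.asp.net/scottgu/rss.aspx"),
            new Uri("http://feeds.hanselman.com/ScottHanselman"),
            new Uri("http://blogs.msdn.com/b/dotnet/rss.aspx"),
        };

        using (var client = new HttpClient())
        {
            foreach (var rssFeed in rssFeeds)
            {
                try
                {
                    // GetStreamAsync completes once the response headers are read.
                    using (Stream result = await client.GetStreamAsync(rssFeed))
                    {
                        Console.WriteLine("Connected: " + rssFeed);
                        // 'result' is the stream to hand to XmlReader.Create(...) in Step 2.
                    }
                }
                catch (HttpRequestException ex)
                {
                    // A dead or moved feed should not stop the remaining ones.
                    Console.WriteLine("Failed: " + rssFeed + " (" + ex.Message + ")");
                }
            }
        }
    }
}
```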
Step 2: Parse the data to retrieve the proper value
Okay, now we have the raw XML from any RSS or Atom feed. We’ll need to parse that data somehow. Let me introduce you to the SyndicationFeed class.
    using (var xmlReader = XmlReader.Create(result))
The SyndicationFeed is a built-in RSS and Atom parser found in the .NET Framework. No need for plugins here.
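To make the parsing step concrete, here is a minimal sketch that loads a feed through `SyndicationFeed.Load` and lists its items. The inline `SampleRss` document is a hypothetical stand-in for the downloaded stream so the example runs without network access; with a real feed you would pass the stream from Step 1 to `XmlReader.Create` instead.

```csharp
using System;
using System.IO;
// Lives in System.ServiceModel.Web.dll on .NET 3.5 and System.ServiceModel.dll on .NET 4+.
using System.ServiceModel.Syndication;
using System.Xml;

public static class Program
{
    // Hypothetical sample feed, inlined so the sketch runs offline.
    private const string SampleRss = @"<?xml version=""1.0"" encoding=""utf-8""?>
<rss version=""2.0"">
  <channel>
    <title>Sample blog</title>
    <link>http://example.com/</link>
    <description>An illustrative feed</description>
    <item>
      <title>First post</title>
      <link>http://example.com/first-post</link>
    </item>
  </channel>
</rss>";

    public static void Main()
    {
        using (var xmlReader = XmlReader.Create(new StringReader(SampleRss)))
        {
            // Load detects whether the document is RSS 2.0 or Atom 1.0.
            SyndicationFeed feed = SyndicationFeed.Load(xmlReader);
            Console.WriteLine("Feed: " + feed.Title.Text);

            foreach (SyndicationItem item in feed.Items)
            {
                Console.WriteLine("Post: " + item.Title.Text);
            }
        }
    }
}
```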
In my scenario, however, I needed the author name. With those three feeds, you will not find it on any of them if you inspect the parsed result. Here’s the fun part, though: the format is extensible.
Let’s retrieve an extension value.
Step 3: Retrieve Extension Values like “dc:creator” or “dc:publisher”
Those extensions are described by the Dublin Core; creator and publisher are two examples. They are used in pretty much all blog engines, like WordPress and BlogEngine.NET.
You will find those extensions on each blog post, which is represented in C# by a SyndicationItem (in the feed.Items collection from earlier). Here are the extension methods I built to retrieve them:
    public static class SyndicationItemExtensions
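Since only the class declaration is visible above, here is one way such extension methods could look. This is a sketch and not the original implementation: the method names `GetCreator`, `GetPublisher`, and `GetDublinCoreValue` are assumptions. It relies on `ReadElementExtensions<string>`, which reads every extension element matching a given name and namespace, and on the Dublin Core element namespace `http://purl.org/dc/elements/1.1/`.

```csharp
using System.Linq;
using System.ServiceModel.Syndication;

public static class SyndicationItemExtensions
{
    // Namespace of the Dublin Core element set (dc:creator, dc:publisher, ...).
    private const string DublinCoreNamespace = "http://purl.org/dc/elements/1.1/";

    // Hypothetical name: returns the first dc:creator value, or null when absent.
    public static string GetCreator(this SyndicationItem item)
    {
        return item.GetDublinCoreValue("creator");
    }

    // Hypothetical name: returns the first dc:publisher value, or null when absent.
    public static string GetPublisher(this SyndicationItem item)
    {
        return item.GetDublinCoreValue("publisher");
    }

    // Shared lookup: reads all extension elements with the given local name
    // in the Dublin Core namespace and keeps the first one.
    private static string GetDublinCoreValue(this SyndicationItem item, string elementName)
    {
        return item.ElementExtensions
            .ReadElementExtensions<string>(elementName, DublinCoreNamespace)
            .FirstOrDefault();
    }
}
```

With this in place, `item.GetCreator()` can be called on any SyndicationItem and returns null when the feed does not carry the element.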
Now we should be able to retrieve the author quite easily.
Step 4: Access the values
So let’s rewrite Step 2, this time trying to find the author of each RSS feed.
    using (var xmlReader = XmlReader.Create(result))
Here is the output from the console:
Chosen author: ScottGu
Chosen author: Scott Hanselman
Chosen author: The .NET Fundamentals Team
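A selection like the one printed above could be sketched as follows. The fallback order (standard author entries, then dc:creator, then dc:publisher) and the `ChooseAuthor` helper are assumptions, not the original code, and the inline `SampleRss` feed is hypothetical so the example runs offline rather than against the three blogs. For RSS 2.0 feeds the parser tends to put the author text in `Email` rather than `Name`, which is why both are checked.

```csharp
using System;
using System.IO;
using System.Linq;
using System.ServiceModel.Syndication;
using System.Xml;

public static class Program
{
    private const string DublinCoreNamespace = "http://purl.org/dc/elements/1.1/";

    // Hypothetical sample feed whose only author information is a dc:creator element.
    private const string SampleRss = @"<?xml version=""1.0"" encoding=""utf-8""?>
<rss version=""2.0"" xmlns:dc=""http://purl.org/dc/elements/1.1/"">
  <channel>
    <title>Sample blog</title>
    <link>http://example.com/</link>
    <description>An illustrative feed</description>
    <item>
      <title>First post</title>
      <link>http://example.com/first-post</link>
      <dc:creator>Jane Doe</dc:creator>
    </item>
  </channel>
</rss>";

    // Assumed order: standard authors first, then dc:creator, then dc:publisher.
    private static string ChooseAuthor(SyndicationItem item)
    {
        var candidates = item.Authors
            .Select(author => author.Name ?? author.Email)
            .Concat(item.ElementExtensions.ReadElementExtensions<string>("creator", DublinCoreNamespace))
            .Concat(item.ElementExtensions.ReadElementExtensions<string>("publisher", DublinCoreNamespace));

        // First non-empty candidate wins; null when nothing is available.
        return candidates.FirstOrDefault(value => !string.IsNullOrWhiteSpace(value));
    }

    public static void Main()
    {
        using (var xmlReader = XmlReader.Create(new StringReader(SampleRss)))
        {
            var feed = SyndicationFeed.Load(xmlReader);
            foreach (var item in feed.Items)
            {
                Console.WriteLine("Chosen author: " + (ChooseAuthor(item) ?? "(unknown)"));
            }
        }
    }
}
```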
Conclusion
So without any plugins, I was able to parse RSS feeds. If you have a blog, you should try it on your own feed and see how that works. On my end, I found that my blog was misconfigured: it showed “My Name” in one of those entries.
If you need more info, don’t hesitate to reach out to me on Twitter at @MaximRouiller.
Enjoy!