SBS Latest Online Video RSS Feed
28th June 2009
[An updated (but more complex) script can be found in this post]
I needed an excuse to practice some Perl. So this was my first try.
The Perl script below will convert http://www.sbs.com.au/shows/ajax/getplaylist/playlistId/94/ to an RSS feed. Playlist 94 is a list of recent episodes from the TV broadcaster SBS that are available online. The script depends on the current structure of that source file, so it may stop working if SBS changes it.
#!/usr/bin/perl
# This script will download the ajax xml file containing the latest full episode videos added to the SBS.com.au site.
#Adapted from the code at http://www.perl.com/pub/a/2001/11/15/creatingrss.html by Chris Ball.
# I declare this code to be in the public domain.
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
# THE SOFTWARE.
use strict;
use warnings;
use LWP::Simple;
use HTML::TokeParser;
use XML::RSS;
use Date::Format;
# Constants
my $playlisturl = "http://www.sbs.com.au/shows/ajax/getplaylist/playlistId/94/"; # Latest Full Ep
#my $playlisturl = "http://www.sbs.com.au/shows/ajax/getplaylist/playlistId/95/"; # Latest Sneak Peek
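# Download the XML file using get() from LWP::Simple.
# get() returns undef on failure and does not set $!, so report the URL instead.
my $content = get( $playlisturl );
die "Could not download $playlisturl\n" unless defined $content;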
# Create a TokeParser object, using our downloaded HTML.
my $stream = HTML::TokeParser->new( \$content ) or die $!;
# Create the RSS object.
my $rss = XML::RSS->new( version => '2.0' );
# Prep the RSS.
$rss->channel(
title => "SBS Latest Full Episodes",
link => $playlisturl,
language => 'en',
lastBuildDate => time2str("%a, %d %b %Y %T GMT", time, "GMT"),
description => "Gives the most recent full episodes available from SBS.com.au"
);
$rss->image(
title => "sbs.com.au Latest Full Episodes",
url => "http://www.sbs.com.au/web/images/sbslogo_footer.jpg",
link => $playlisturl
);
# Declare variables.
my ($tag);
# vars from sbs xml
my ($eptitle, $epthumb, $eptime, $baseurl, $img, $url128, $url300, $url1000, $code1char, $code2char, $code1);
#get_tag skips forward in the HTML from our current position to the tag specified, and
#get_trimmed_text will grab plaintext from the current position to the end position specified.
# Find an <a> tag.
while ( $tag = $stream->get_tag("a") ) {
# Inside this loop, $tag is at an <a> tag.
# But does it have a "title" attribute, too?
if ($tag->[1]{title}) {
# We do!
$eptitle = $tag->[1]{title};
#print $eptitle."\n";
# The next step is an <img></img> set.
$tag = $stream->get_tag('img');
$epthumb = $tag->[1]{src};
# Derive the FLV URLs from the thumbnail URL,
# e.g. http://videocdn.sbs.com.au/u/thumbnails/SRS_FE_Global_Village_Ep_19_44_48467.jpg
# The base name sits after the 40-character thumbnails prefix and before ".jpg".
#print $epthumb."\n";
$baseurl = substr($epthumb, 40, length($epthumb)-40-4);
$url128 = "http://videocdn.sbs.com.au/u/video/".$baseurl."_128K.flv";
$url300 = "http://videocdn.sbs.com.au/u/video/".$baseurl."_300K.flv";
$url1000 = "http://videocdn.sbs.com.au/u/video/".$baseurl."_1000K.flv";
#SRS|DOC|MOV
$code1char = substr($baseurl,0,3);
#SP|FE
$code2char = substr($baseurl,4,2);
my %epcode_hash = (
'DOC' => 'Documentary',
'MOV' => 'Movie',
'SRS' => 'Series',
);
# Fall back to 'Unknown' so an unrecognised prefix does not leave $code1 undefined.
$code1 = $epcode_hash{$code1char} || 'Unknown';
$stream->get_tag('a');
$tag = $stream->get_tag('p');
# Now we can grab $eptime, by using get_trimmed_text
# up to the close of the <p> tag.
$eptime = $stream->get_trimmed_text('/p');
# We need to escape ampersands, as they start entity references in XML.
$eptime =~ s/&/&amp;/g;
# Add the item to the RSS feed.
$rss->add_item(
title => $eptitle,
permaLink => $url1000,
enclosure => { url=>$url1000, type=>"video/x-flv"},
description => "<![CDATA[<img src=\"$epthumb\" width=\"100\" height=\"56\" /><br>
$eptitle<br>
$eptime<br>
Links: <a href=\"$url128\">128k</a>, <a href=\"$url300\">300k</a>, <a href=\"$url1000\">1000k</a><br>
Type: $code1<br>]]>");
}
}
print "Content-Type: application/xml; charset=ISO-8859-1\n\n"; # CGI header so your browser treats the output as XML; the blank line ends the headers.
#$rss->save("sbslatestfullep.rss"); #this will save the RSS XML feed to a file when you run the script.
print $rss->as_string; #this will send the RSS XML feed to stdout when you run the script.
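The script above finds the video base name with fixed substr() offsets, which only works while the thumbnail URL keeps the same 40-character prefix. As a rough sketch of an alternative, the same values could be pulled out with regular expressions. The helper names base_from_thumb and codes_from_base are hypothetical and not part of the script above, and the URL pattern is assumed from the single example thumbnail in the comments.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): derive the video base
# name from a thumbnail URL by pattern instead of by fixed offsets.
sub base_from_thumb {
    my ($thumb) = @_;
    # Capture everything after the last "/" up to a trailing ".jpg".
    return $thumb =~ m{/([^/]+)\.jpg$}i ? $1 : undef;
}

# Hypothetical helper: split the base name into its two leading codes,
# e.g. "SRS_FE_..." gives ("SRS", "FE"). Returns an empty list on no match.
sub codes_from_base {
    my ($base) = @_;
    return $base =~ /^([A-Z]{3})_([A-Z]{2})_/ ? ($1, $2) : ();
}

# Example thumbnail URL taken from the comments in the script above.
my $thumb = "http://videocdn.sbs.com.au/u/thumbnails/SRS_FE_Global_Village_Ep_19_44_48467.jpg";
my $base  = base_from_thumb($thumb);

if (defined $base) {
    my ($type, $kind) = codes_from_base($base);
    print "base:  $base\n";
    print "1000k: http://videocdn.sbs.com.au/u/video/${base}_1000K.flv\n";
    print "codes: $type $kind\n" if defined $type;
}
```

If the thumbnail URL does not end in .jpg, base_from_thumb returns undef, so the caller can skip that item rather than build a broken video URL.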