tianjara.net | Andrew Harvey's Blog

SBS Latest Online Video RSS Feed
28th June 2009

[An updated (but more complex) script can be found in this post]

I needed an excuse to practice some Perl. So this was my first try.

The Perl script below converts http://www.sbs.com.au/shows/ajax/getplaylist/playlistId/94/ to an RSS feed. Playlist 94 is a list of recent episodes from the TV broadcaster SBS that are available online. The script relies on the current layout of that file, so it may stop working if the source file's structure changes.

#!/usr/bin/perl

# This script will download the ajax xml file containing the latest full episode videos added to the SBS.com.au site.

#Adapted from the code at http://www.perl.com/pub/a/2001/11/15/creatingrss.html by Chris Ball.

# I declare this code to be in the public domain.

# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
# THE SOFTWARE.

use strict;
use warnings;

use LWP::Simple;
use HTML::TokeParser;
use XML::RSS;
use Date::Format;

# Constants
my $playlisturl = "http://www.sbs.com.au/shows/ajax/getplaylist/playlistId/94/"; # Latest Full Ep
#my $playlisturl = "http://www.sbs.com.au/shows/ajax/getplaylist/playlistId/95/"; # Latest Sneak Peek

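# Download the XML file with LWP::Simple's get(), which returns undef on failure.
# $! is not set by get(), so report the URL instead.
my $content = get( $playlisturl );
defined $content or die "Could not fetch $playlisturl\n";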

# Create a TokeParser object, using our downloaded HTML.
my $stream = HTML::TokeParser->new( \$content ) or die $!;

# Create the RSS object.
my $rss = XML::RSS->new( version => '2.0' );

# Prep the RSS.
$rss->channel(
 title            => "SBS Latest Full Episodes",
 link             => $playlisturl,
 language         => 'en',
 lastBuildDate    => time2str("%a, %d %b %Y %T GMT", time, "GMT"),
 description      => "Gives the most recent full episodes available from SBS.com.au"
 );

$rss->image(
 title    => "sbs.com.au Latest Full Episodes",
 url    => "http://www.sbs.com.au/web/images/sbslogo_footer.jpg",
 link    => $playlisturl
 );

# Declare variables.
my ($tag);

# vars from sbs xml
my ($eptitle, $epthumb, $eptime, $baseurl, $img, $url128, $url300, $url1000, $code1char, $code2char, $code1);

#get_tag skips forward in the HTML from our current position to the tag specified, and
#get_trimmed_text  will grab plaintext from the current position to the end position specified. 

# Find an <a> tag.
while ( $tag = $stream->get_tag("a") ) {
 # Inside this loop, $tag is at an <a> tag.
 # But does it have a "title" attribute, too?
 if ($tag->[1]{title}) {
 # We do!
 $eptitle = $tag->[1]{title};
 #print $eptitle."\n";

 # The next tag we need is the <img> inside the link.
 $tag = $stream->get_tag('img');
 $epthumb = $tag->[1]{src};

 #get the flv urls from the img url
 #eg. http://videocdn.sbs.com.au/u/thumbnails/SRS_FE_Global_Village_Ep_19_44_48467.jpg
 #print $epthumb."\n";
 $baseurl = substr($epthumb, 40, length($epthumb)-40-4);
 $url128 = "http://videocdn.sbs.com.au/u/video/".$baseurl."_128K.flv";
 $url300 = "http://videocdn.sbs.com.au/u/video/".$baseurl."_300K.flv";
 $url1000 = "http://videocdn.sbs.com.au/u/video/".$baseurl."_1000K.flv";

 #SRS|DOC|MOV
 $code1char = substr($baseurl,0,3);
 #SP|FE
 $code2char = substr($baseurl,4,2);

 my %epcode_hash = (
 'DOC'    => 'Documentary',
 'MOV'    => 'Movie',
 'SRS'    => 'Series',
 );
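 # Fall back to 'Unknown' so an unrecognised code does not leave $code1 undefined.
 $code1 = exists $epcode_hash{$code1char} ? $epcode_hash{$code1char} : 'Unknown';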

 $stream->get_tag('a');
 $tag = $stream->get_tag('p');

 # Now we can grab $eptime, by using get_trimmed_text
 # up to the close of the <p> tag.
 $eptime = $stream->get_trimmed_text('/p');

 # We need to escape ampersands, as they start entity references in XML.
 $eptime =~ s/&/&amp;/g;

 # Add the item to the RSS feed.
 $rss->add_item(
 title         => $eptitle,
 permaLink     => $url1000,
 enclosure    => { url=>$url1000, type=>"video/x-flv"},
 description     => "<![CDATA[<img src=\"$epthumb\" width=\"100\" height=\"56\" /><br>
 $eptitle<br>
 $eptime<br>
 Links: <a href=\"$url128\">128k</a>, <a href=\"$url300\">300k</a>, <a href=\"$url1000\">1000k</a><br>
 Type: $code1<br>]]>");

 }
}
print "Content-Type: application/xml; charset=ISO-8859-1\n\n"; # CGI header so the browser treats the output as XML; the blank line after it ends the headers.
#$rss->save("sbslatestfullep.rss"); #this will save the RSS XML feed to a file when you run the script.
print $rss->as_string; #this will send the RSS XML feed to stdout when you run the script.
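
The step in the script that turns a thumbnail URL into the three video URLs can be tried on its own. Below is a minimal sketch of that step, not the script's exact method: it uses a regular expression instead of the fixed 40-character offset, assuming thumbnails always end in ".jpg". The helper name video_urls_from_thumb and the 'Unknown' fallback label are made up for illustration; the sample URL is the one quoted in the script's comments.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): derive the video
# base name from a thumbnail URL by pattern rather than by fixed offset.
sub video_urls_from_thumb {
    my ($thumb) = @_;

    # Capture everything between the last "/" and the ".jpg" extension.
    my ($base) = $thumb =~ m{/([^/]+)\.jpg$}
        or return;

    # Same three-letter codes as the script: SRS, DOC, MOV.
    my %type_for = (DOC => 'Documentary', MOV => 'Movie', SRS => 'Series');
    my ($code) = $base =~ /^([A-Z]{3})_/;
    my $type = (defined $code && exists $type_for{$code})
        ? $type_for{$code}
        : 'Unknown';    # assumed fallback label

    # Build one URL per bitrate, following the script's naming scheme.
    my $video_root = 'http://videocdn.sbs.com.au/u/video/';
    my %urls;
    for my $rate (128, 300, 1000) {
        $urls{$rate} = $video_root . $base . "_${rate}K.flv";
    }

    return { base => $base, type => $type, urls => \%urls };
}

my $info = video_urls_from_thumb(
    'http://videocdn.sbs.com.au/u/thumbnails/SRS_FE_Global_Village_Ep_19_44_48467.jpg'
);
print "$info->{type}: $info->{urls}{1000}\n";
# prints "Series: http://videocdn.sbs.com.au/u/video/SRS_FE_Global_Village_Ep_19_44_48467_1000K.flv"
```

Matching on the file name means the sketch would keep working if the host or directory part of the thumbnail URL changed length, which the substr() offset would not. It still depends on the video files following the same naming scheme as the thumbnails.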
 
Tags: rss.