tianjara.net | Andrew Harvey's Blog

Entries tagged "rss".

Channel TEN's Online Video
14th August 2009

After someone commented on one of my other posts I had another look around the Channel TEN Video site; this is what I gathered.

So it appears that the whole Channel 10 Video section has been outsourced to http://www.kit-digital.com/ using their vx.roo.com hosting, so the vx.roo.com host has to allow for other clients, not just Ten. vx.roo.com appears to have assigned Ten's "SiteIdGuid" as "666b8363-97e9-4c40-b665-53846db95ad0". My first port of call was http://publish.flashapi.vx.roo.com/666b8363-97e9-4c40-b665-53846db95ad0-4883/PlaylistInfoService.asmx. From there, http://publish.flashapi.vx.roo.com/666b8363-97e9-4c40-b665-53846db95ad0-4883/PlaylistInfoService.asmx?op=GetPlaylistXML is a good place to go; it tells you how to make SOAP, HTTP GET and HTTP POST requests. I'll use HTTP GET for now because it's the easiest for you to follow along with.

As I mentioned, the first field is SiteIdGuid, which for Channel TEN is 666b8363-97e9-4c40-b665-53846db95ad0. The second field is Channel. Some channel codes are listed on Ten's video page, but we want the vxChannel. For example, the URL http://www.australianidol.com.au/video.htm?vxSiteId=666b8363-97e9-4c40-b665-53846db95ad0&vxChannel=S7CUTV%3AAuditions has a vxChannel of S7CUTV%3AAuditions (URL-encoded S7CUTV:Auditions). You can usually find it in the URL somewhere.

Using the vxSiteId and vxChannel you can formulate the RSS feed URL for that channel:

http://publish.flashapi.vx.roo.com/xmlgenerators/video/$vxSiteId/RSSGenerator.aspx?siteId=$vxSiteId&channel=$vxChannel
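
If you want to grab that feed from the command line, something like this should work (it just fills in the template above with the Australian Idol values; wget is assumed to be installed):

#!/bin/sh
# Fetch the channel RSS feed. The vxSiteId and vxChannel values are the
# Australian Idol ones from the example above; substitute your own channel.
vxSiteId="666b8363-97e9-4c40-b665-53846db95ad0"
vxChannel="S7CUTV%3AAuditions"
wget -q -O - "http://publish.flashapi.vx.roo.com/xmlgenerators/video/$vxSiteId/RSSGenerator.aspx?siteId=$vxSiteId&channel=$vxChannel"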

From that RSS feed you will usually get a list of clips. For example, in the RSS feed http://publish.flashapi.vx.roo.com/xmlgenerators/video/666b8363-97e9-4c40-b665-53846db95ad0-4882/RSSGenerator.aspx?siteId=666b8363-97e9-4c40-b665-53846db95ad0&channel=S7CUTV:Auditions, the first item is a link to http://www.australianidol.com.au/video.htm?channel=S7CUTV:Auditions&clipid=2692_030TT070809&bitrate=300&format=flash. This gives you the clipid (and the bitrate and format, but those are usually 300 or 700, and flash). Using this you can fill out the fields on this page, http://publish.flashapi.vx.roo.com/666b8363-97e9-4c40-b665-53846db95ad0-4883/PlaylistInfoService.asmx?op=GetPlaylistXML. But that invoke button doesn't work, so just make your own HTTP GET request, something like http://publish.flashapi.vx.roo.com/PlaylistInfoService.asmx/GetPlaylistXML?SiteIdGuid=666b8363-97e9-4c40-b665-53846db95ad0&Channel=S7CUTV%3AAuditions&Bitrate=300&Format=flash&ThumbnailTypeCode=square_large&RowCount=1&StartPosition=0&ClipId=2692_030TT070809&Artist=&Album=&Criteria=&RelatedLinksKeyName=

It appears you need all those other arguments, but they mostly stay the same. Anyway, this gives you an XML file with details/metadata for that clip, and it also gives you the URL of the FLV.
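
As a rough sketch, you can script that step too. This fires the same GET request as the example above and then greps out anything that looks like an FLV URL, assuming the URL appears literally in the returned XML (the clip id is the Australian Idol one from above):

#!/bin/sh
# Sketch: request the clip metadata XML via HTTP GET, then pull out any FLV URLs.
# Assumes GNU grep for the -o option, and that the FLV URL appears as plain text in the XML.
clipId="2692_030TT070809"
wget -q -O - "http://publish.flashapi.vx.roo.com/PlaylistInfoService.asmx/GetPlaylistXML?SiteIdGuid=666b8363-97e9-4c40-b665-53846db95ad0&Channel=S7CUTV%3AAuditions&Bitrate=300&Format=flash&ThumbnailTypeCode=square_large&RowCount=1&StartPosition=0&ClipId=$clipId&Artist=&Album=&Criteria=&RelatedLinksKeyName=" | grep -o 'http://[^"< ]*\.flv'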

Also, I just noticed some things going on over at the forums on whirlpool.net.au, http://forums.whirlpool.net.au/forum-replies.cfm?t=1212283, and this stuff is actually useful. The other day one of my lecturers mentioned a show that had been shown on the ABC (but originally aired in the UK) about the history of law in England, which is what we looked at in one of my gen-ed classes at uni.

Anyhow, I hope this is the last of my posts on this kind of thing, so I can try to do more unswcourse posts.

Tags: rss.
Facebook Video Updates as an RSS Feed (Using a Shell Script)
30th July 2009

We are finally learning common Unix tools at uni. Gosh, I wish we had done these earlier because they are so useful! (Yes, I could have learnt them myself, and I did a bit, but I ended up just learning the parts needed to get the job done. That didn't always work, because I had very little understanding of why things worked (and why they didn't), so it turned into trial and error.)

So anyway, I wanted an RSS feed for videos uploaded to public Facebook pages (for example http://www.facebook.com/video/?id=20916311640), so I put my newly learnt skills to good use and wrote a shell script.

#!/bin/sh
wget http://www.facebook.com/video/?id=$1 -q -O - -U 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv: 1.8.0.3) Gecko/20060523 Ubuntu/dapper Firefox/1.5.0.3' | grep 'http://www.facebook.com/video/video.php?v=' | sed -e 's/http:\/\/www.facebook.com\/video\/video.php?v=[0-9]*/\n&\n/g' | grep 'http://www.facebook.com/video/video.php?v=' | uniq | sed -e 's/.*/<item><title>&<\/title><link>&<\/link><\/item>/' | sed "1 s/^/<?xml version=\"1.0\"?><rss version=\"2.0\"><channel><title>Facebook Video Feed<\/title><link>http:\/\/www.facebook.com\/video\/?id=$1<\/link><description>Facebook Videos for ID $1<\/description><language>en-us<\/language>/" | sed '$ s/$/<\/channel><\/rss>/'

UPDATED: (the links on the page from Facebook no longer include the domain etc., so the script below has been updated accordingly)

(the line below gets cut off, but you can select it and copy paste...)

#!/bin/sh
wget http://www.facebook.com/video/?id=$1 -q -O - -U 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv: 1.8.0.3) Gecko/20060523 Ubuntu/dapper Firefox/1.5.0.3' | grep '/video/video.php?v=' | sed -e 's/\/video\/video.php?v=[0-9]*/\n&\n/g' | grep '/video/video.php?v=' | uniq | sed -e 's/.*/<item><title>http:\/\/www.facebook.com&<\/title><link>http:\/\/www.facebook.com&<\/link><\/item>/' | sed "1 s/^/<?xml version=\"1.0\"?><rss version=\"2.0\"><channel><title>Facebook Video Feed<\/title><link>http:\/\/www.facebook.com\/video\/?id=$1<\/link><description>Facebook Videos for ID $1<\/description><language>en-us<\/language>/" | sed '$ s/$/<\/channel><\/rss>/'

Facebook will actually check the user agent and refuse to serve users it doesn't like, so I had to spoof it. So anyway, the pipeline grabs the HTML page, finds all the links to individual videos and feeds them out, one line for each (this is everything up to just after the uniq). Next I add some text to turn this list into a basic RSS file. I don't worry about making it fancy with the video title, thumbnail etc., because honestly I don't care about that for my use.

To actually use it I can use cron (actually I think it's easiest to make another shell script and put it in /etc/cron.daily/ or /etc/cron.hourly/) to run the command ./fbvidrss.sh 20916311640 > /var/www/fbvid_20916311640.xml
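
For example, a wrapper like this (the paths are just examples, adjust them to wherever the script actually lives) dropped into /etc/cron.daily/ would regenerate the feed once a day:

#!/bin/sh
# Hypothetical cron wrapper: regenerate the Facebook video feed once a day.
# /home/andrew/scripts/ is just an example location for fbvidrss.sh.
/home/andrew/scripts/fbvidrss.sh 20916311640 > /var/www/fbvid_20916311640.xml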

Tags: rss, sh.
SBS Playlist to RSS Feed Perl Script v2
10th July 2009

I have made some changes to my original script. This new Perl script will scrape info from sbs.com.au and give an RSS feed of the items in the specified playlist. I only know of two playlists (94 = Latest Full Episodes, 95 = Preview Clips), and only one line needs to be changed to make the script give the RSS feed of a different playlist. The major improvement is that items which are only available over RTMP now have the correct URL (previously it was wrong), although the script now runs slower as it has to grab more pages from the web. I use the URL http://player.sbs.com.au/video/smil/index/standalone/$item_code/ to find out the URL details.
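
Just to illustrate that lookup, here is a quick command-line sketch (the Perl script does the real work; this simply assumes the rtmp details appear as plain text somewhere in that page):

#!/bin/sh
# Sketch: fetch the standalone SMIL page for a clip and show anything rtmp-related.
# $1 is the clip's item code; this assumes the rtmp details appear literally in the page.
wget -q -O - "http://player.sbs.com.au/video/smil/index/standalone/$1/" | grep -o 'rtmp[^"]*'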

FLVStreamer appears to do a good job of downloading media over the RTMP protocol. Just use ./flvstreamer -r <rtmp url> > file.flv. Mozilla has an article on how to add protocols to Firefox here. But I didn't bother with that as the command is simple as it is, and building an app with a save-as dialogue is beyond me for now, but I hope to learn that soon.

[Update: It seems that you also need to have the --swfUrl argument set ('http://player.sbs.com.au/web/flash/standalone_video_player_application.swf' works). Also, the Perl script doesn't get the file name correctly (it uses the thumbnail image URL; it should instead use the URLs given on the /video/smil/index pages).]
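
So the full download command ends up looking something like the following (the rtmp URL is a placeholder for whatever the SMIL page gives you):

#!/bin/sh
# Sketch: download an SBS clip over RTMP with flvstreamer, passing the player SWF URL.
# Replace the rtmp URL placeholder with the real stream location taken from the SMIL page.
./flvstreamer -r "rtmp://<stream url from the smil page>" --swfUrl "http://player.sbs.com.au/web/flash/standalone_video_player_application.swf" > file.flv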

For local use the current format will probably be what you want, but in a production environment you would probably want the script to save the RSS file to disk and have people hit that RSS file with their requests; just set the Perl script to run every now and then. Unfortunately I can't seem to upload .pl files to WordPress (I've put up a link, but that will expire eventually)... I really need to get my own site... There is so much customisation I would like to do and many experiments to try out on a live server, but the $$$'s are too much...

On another note I tried out EPIC (Eclipse Perl Integration), which was fairly simple to install. It seems much nicer than using a plain text editor and command line, especially the debugging abilities that it adds.

SBSPlaylistToRSSv0.2.1.pl

Tags: computing, rss.
SBS Latest Online Video RSS Feed
28th June 2009

[An updated (but more complex) script can be found in this post]

I needed an excuse to practice some Perl. So this was my first try.

The Perl script below will convert http://www.sbs.com.au/shows/ajax/getplaylist/playlistId/94/ to an RSS feed. That 94 playlist is a list of recent episodes from the TV broadcaster SBS that are available online. This may not work if the source file's structure changes.

#!/usr/bin/perl

# This script will download the ajax xml file containing the latest full episode videos added to the SBS.com.au site.

#Adapted from the code at http://www.perl.com/pub/a/2001/11/15/creatingrss.html by Chris Ball.

# I declare this code to be in the public domain.

# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
# THE SOFTWARE.

use strict;
use warnings;

use LWP::Simple;
use HTML::TokeParser;
use XML::RSS;
use Date::Format;

# Constants
my $playlisturl = "http://www.sbs.com.au/shows/ajax/getplaylist/playlistId/94/"; # Latest Full Ep
#my $playlisturl = "http://www.sbs.com.au/shows/ajax/getplaylist/playlistId/95/"; # Latest Sneak Peek

# LWP::Simple Download the xml file using get();.
my $content = get( $playlisturl ) or die "Could not fetch $playlisturl"; # LWP::Simple's get() returns undef on failure (it doesn't set $!).

# Create a TokeParser object, using our downloaded HTML.
my $stream = HTML::TokeParser->new( \$content ) or die $!;

# Create the RSS object.
my $rss = XML::RSS->new( version => '2.0' );

# Prep the RSS.
$rss->channel(
 title            => "SBS Latest Full Episodes",
 link             => $playlisturl,
 language         => 'en',
 lastBuildDate    => time2str("%a, %d %b %Y %T GMT", time),
 description      => "Gives the most recent full episodes available from SBS.com.au"
 );

$rss->image(
 title    => "sbs.com.au Latest Full Episodes",
 url    => "http://www.sbs.com.au/web/images/sbslogo_footer.jpg",
 link    => $playlisturl
 );

# Declare variables.
my ($tag);

# vars from sbs xml
my ($eptitle, $epthumb, $eptime, $baseurl, $img, $url128, $url300, $url1000, $code1char, $code2char, $code1);

#get_tag skips forward in the HTML from our current position to the tag specified, and
#get_trimmed_text  will grab plaintext from the current position to the end position specified. 

# Find an <a> tag.
while ( $tag = $stream->get_tag("a") ) {
 # Inside this loop, $tag is at a <a> tag.
 # But do we have a "title" token, too?
 if ($tag->[1]{title}) {
 # We do!
 $eptitle = $tag->[1]{title};
 #print $eptitle."\n";

 # The next step is an <img></img> set.
 $tag = $stream->get_tag('img');
 $epthumb = $tag->[1]{src};

 #get the flv urls from the img url
 #eg. http://videocdn.sbs.com.au/u/thumbnails/SRS_FE_Global_Village_Ep_19_44_48467.jpg
 #print $epthumb."\n";
 $baseurl = substr($epthumb, 40, length($epthumb)-40-4);
 $url128 = "http://videocdn.sbs.com.au/u/video/".$baseurl."_128K.flv";
 $url300 = "http://videocdn.sbs.com.au/u/video/".$baseurl."_300K.flv";
 $url1000 = "http://videocdn.sbs.com.au/u/video/".$baseurl."_1000K.flv";

 #SRS|DOC|MOV
 $code1char = substr($baseurl,0,3);
 #SP|FE
 $code2char = substr($baseurl,4,2);

 my %epcode_hash = (
 'DOC'    => 'Documentary',
 'MOV'    => 'Movie',
 'SRS'    => 'Series',
 );
 $code1 = $epcode_hash{$code1char};

 $stream->get_tag('a');
 $tag = $stream->get_tag('p');

 # Now we can grab $eptime, by using get_trimmed_text
 # up to the close of the <p> tag.
 $eptime = $stream->get_trimmed_text('/p');

 # We need to escape ampersands, as they start entity references in XML.
 $eptime =~ s/&/&amp;/g;

 # Add the item to the RSS feed.
 $rss->add_item(
 title         => $eptitle,
 permaLink     => $url1000,
 enclosure    => { url=>$url1000, type=>"video/x-flv"},
 description     => "<![CDATA[<img src=\"$epthumb\" width=\"100\" height=\"56\" /><br>
 $eptitle<br>
 $eptime<br>
 Links: <a href=\"$url128\">128k</a>, <a href=\"$url300\">300k</a>, <a href=\"$url1000\">1000k</a><br>
 Type: $code1<br>]]>");

 }
}
print "Content-Type: application/xml; charset=ISO-8859-1"; # To help your browser display the feed better in your browser.
#$rss->save("sbslatestfullep.rss"); #this will save the RSS XML feed to a file when you run the script.
print $rss->as_string; #this will send the RSS XML feed to stdout when you run the script.
 
Tags: rss.
