<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Playing with pipes</title>
	<atom:link href="http://discontents.com.au/shed/playing-with-pipes/feed" rel="self" type="application/rss+xml" />
	<link>http://discontents.com.au/shed/playing-with-pipes</link>
	<description>working for the triumph of content over form, ideas over control, people over systems</description>
	<lastBuildDate>Wed, 11 Jan 2012 16:12:01 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3</generator>
	<item>
		<title>By: asa letourneau</title>
		<link>http://discontents.com.au/shed/playing-with-pipes/comment-page-1#comment-1908</link>
		<dc:creator>asa letourneau</dc:creator>
		<pubDate>Mon, 01 Mar 2010 10:34:19 +0000</pubDate>
		<guid isPermaLink="false">http://discontents.com.au/?p=699#comment-1908</guid>
		<description>Amanda,

If you are out there and you get this...knew nothing about screen scraping a few days ago. Came across this http://newprosoft.com/web-content-extractor.htm Not bad for someone like me who has ZERO scripting/coding background!</description>
		<content:encoded><![CDATA[<p>Amanda,</p>
<p>If you are out there and you get this&#8230;knew nothing about screen scraping a few days ago. Came across this <a href="http://newprosoft.com/web-content-extractor.htm" rel="nofollow">http://newprosoft.com/web-content-extractor.htm</a> Not bad for someone like me who has ZERO scripting/coding background!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: asa letourneau</title>
		<link>http://discontents.com.au/shed/playing-with-pipes/comment-page-1#comment-1864</link>
		<dc:creator>asa letourneau</dc:creator>
		<pubDate>Sun, 17 Jan 2010 09:08:11 +0000</pubDate>
		<guid isPermaLink="false">http://discontents.com.au/?p=699#comment-1864</guid>
		<description>Hi Tim,

Saved a pipe in which I added the PROV Flickr stream to the RSS feeds. The title of the pipe is the same as yours. Not sure what the pipe file name is...couldn&#039;t see any &#039;save as&#039; option. When I run the pipe the PROV stream shows up fine. When I embed the badge code on the provcommunity Ning...the PROV images don&#039;t show up...even the title of the pipe isn&#039;t the same...will do it again now so you can see what I mean. Not sure what I&#039;m doing wrong. Cheers, Asa.</description>
		<content:encoded><![CDATA[<p>Hi Tim,</p>
<p>Saved a pipe in which I added the PROV Flickr stream to the RSS feeds. The title of the pipe is the same as yours. Not sure what the pipe file name is&#8230;couldn&#8217;t see any &#8216;save as&#8217; option. When I run the pipe the PROV stream shows up fine. When I embed the badge code on the provcommunity Ning&#8230;the PROV images don&#8217;t show up&#8230;even the title of the pipe isn&#8217;t the same&#8230;will do it again now so you can see what I mean. Not sure what I&#8217;m doing wrong. Cheers, Asa.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: tim</title>
		<link>http://discontents.com.au/shed/playing-with-pipes/comment-page-1#comment-1863</link>
		<dc:creator>tim</dc:creator>
		<pubDate>Sun, 17 Jan 2010 06:29:17 +0000</pubDate>
		<guid isPermaLink="false">http://discontents.com.au/?p=699#comment-1863</guid>
		<description>Asa - Yep, that&#039;s basically it. Let me know if you strike any probs, and yes of course you&#039;re welcome to use, embed, tinker, whatever - that&#039;s why I did it.

Actually, I need to do some tinkering myself to fix the NAA Find of the Month link. Also, annoyingly, the NAA has started putting non-collection photos in its photostream. Bah.</description>
		<content:encoded><![CDATA[<p>Asa &#8211; Yep, that&#8217;s basically it. Let me know if you strike any probs, and yes of course you&#8217;re welcome to use, embed, tinker, whatever &#8211; that&#8217;s why I did it.</p>
<p>Actually, I need to do some tinkering myself to fix the NAA Find of the Month link. Also, annoyingly, the NAA has started putting non-collection photos in its photostream. Bah.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: asa letourneau</title>
		<link>http://discontents.com.au/shed/playing-with-pipes/comment-page-1#comment-1862</link>
		<dc:creator>asa letourneau</dc:creator>
		<pubDate>Sun, 17 Jan 2010 03:41:21 +0000</pubDate>
		<guid isPermaLink="false">http://discontents.com.au/?p=699#comment-1862</guid>
		<description>Hi Tim,

Just had a go at adding your Pipes RSS badge to the provcommunity website and it worked beautifully. Have taken it off because I wanted to ask you if this is okay to do?...and because I would love to tinker with the script, save it, then embed it again...I have no programming skills but think I might be able to do it. Do I just follow the link to the Pipes URL you give, click on clone, and then tinker, save and embed? Sounds too easy?

cheers
asa</description>
		<content:encoded><![CDATA[<p>Hi Tim,</p>
<p>Just had a go at adding your Pipes RSS badge to the provcommunity website and it worked beautifully. Have taken it off because I wanted to ask you if this is okay to do?&#8230;and because I would love to tinker with the script, save it, then embed it again&#8230;I have no programming skills but think I might be able to do it. Do I just follow the link to the Pipes URL you give, click on clone, and then tinker, save and embed? Sounds too easy?</p>
<p>cheers<br />
asa</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Basil Dewhurst</title>
		<link>http://discontents.com.au/shed/playing-with-pipes/comment-page-1#comment-1785</link>
		<dc:creator>Basil Dewhurst</dc:creator>
		<pubDate>Mon, 21 Sep 2009 11:41:02 +0000</pubDate>
		<guid isPermaLink="false">http://discontents.com.au/?p=699#comment-1785</guid>
		<description>Amanda - Tim is right: screen scraping is frustrating and painful. I liken screen scraping HTML and extracting structured data to trying to turn a hamburger into a cow! For a couple of data sources that contribute to the National Library of Australia&#039;s People Australia program I&#039;ve used two Linux apps, wget and tidy, to get HTML documents and turn them into well-formed &#039;xhtml-like&#039; documents. I&#039;ve then created (often complex) XSL Transformations to extract the data and output EAC records in XML format. I&#039;ve then used PHP to apply the transformations to thousands of files, which can then be harvested. Painful, worth it, but ... wouldn&#039;t it be nice if people used standard record formats and standard protocols for exchanging information? If you&#039;d like further info please let me know: bdewhurs at nla dot gov dot au</description>
		<content:encoded><![CDATA[<p>Amanda &#8211; Tim is right: screen scraping is frustrating and painful. I liken screen scraping HTML and extracting structured data to trying to turn a hamburger into a cow! For a couple of data sources that contribute to the National Library of Australia&#8217;s People Australia program I&#8217;ve used two Linux apps, wget and tidy, to get HTML documents and turn them into well-formed &#8216;xhtml-like&#8217; documents. I&#8217;ve then created (often complex) XSL Transformations to extract the data and output EAC records in XML format. I&#8217;ve then used PHP to apply the transformations to thousands of files, which can then be harvested. Painful, worth it, but &#8230; wouldn&#8217;t it be nice if people used standard record formats and standard protocols for exchanging information? If you&#8217;d like further info please let me know: bdewhurs at nla dot gov dot au</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: tim</title>
		<link>http://discontents.com.au/shed/playing-with-pipes/comment-page-1#comment-1756</link>
		<dc:creator>tim</dc:creator>
		<pubDate>Fri, 11 Sep 2009 00:02:24 +0000</pubDate>
		<guid isPermaLink="false">http://discontents.com.au/?p=699#comment-1756</guid>
		<description>Amanda - Screen scraping is the often frustrating process of trying to extract structured data from a web page. In this case, Yahoo Pipes returns a web page as a long string of text, which you can then cut up into useful pieces using &lt;a href=&quot;http://www.regular-expressions.info/&quot; rel=&quot;nofollow&quot;&gt;regular expressions&lt;/a&gt;. In other cases you might be working with the page&#039;s DOM and using &lt;a href=&quot;http://www.w3schools.com/XPath/default.asp&quot; rel=&quot;nofollow&quot;&gt;XPath&lt;/a&gt; expressions to find the elements you want -- or a combination of XPaths and regexp. 

&lt;a href=&quot;http://discontents.com.au/shed/hacks/adb-diy-rss&quot; rel=&quot;nofollow&quot;&gt;Here&#039;s an example&lt;/a&gt; where I used PHP and XPath to build an RSS feed. &lt;a href=&quot;http://discontents.com.au/shed/experiments/cloudy-biographies-and-portrait-walls&quot; rel=&quot;nofollow&quot;&gt;Here&#039;s another&lt;/a&gt; using Python. 

The wonderful &lt;a href=&quot;http://niche-canada.org/programming-historian/&quot; rel=&quot;nofollow&quot;&gt;Programming Historian&lt;/a&gt; site has lots of useful information for any aspiring screen scraper. Of course, screen scraping is also what powers many Zotero translators and there&#039;s plenty of useful info in &lt;a href=&quot;http://niche-canada.org/member-projects/zotero-guide/chapter1.html&quot; rel=&quot;nofollow&quot;&gt;this tutorial&lt;/a&gt;.</description>
		<content:encoded><![CDATA[<p>Amanda &#8211; Screen scraping is the often frustrating process of trying to extract structured data from a web page. In this case, Yahoo Pipes returns a web page as a long string of text, which you can then cut up into useful pieces using <a href="http://www.regular-expressions.info/" rel="nofollow">regular expressions</a>. In other cases you might be working with the page&#8217;s DOM and using <a href="http://www.w3schools.com/XPath/default.asp" rel="nofollow">XPath</a> expressions to find the elements you want &#8212; or a combination of XPaths and regexp. </p>
<p><a href="http://discontents.com.au/shed/hacks/adb-diy-rss" rel="nofollow">Here&#8217;s an example</a> where I used PHP and XPath to build an RSS feed. <a href="http://discontents.com.au/shed/experiments/cloudy-biographies-and-portrait-walls" rel="nofollow">Here&#8217;s another</a> using Python. </p>
<p>The wonderful <a href="http://niche-canada.org/programming-historian/" rel="nofollow">Programming Historian</a> site has lots of useful information for any aspiring screen scraper. Of course, screen scraping is also what powers many Zotero translators and there&#8217;s plenty of useful info in <a href="http://niche-canada.org/member-projects/zotero-guide/chapter1.html" rel="nofollow">this tutorial</a>.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Amanda French</title>
		<link>http://discontents.com.au/shed/playing-with-pipes/comment-page-1#comment-1753</link>
		<dc:creator>Amanda French</dc:creator>
		<pubDate>Thu, 10 Sep 2009 14:47:30 +0000</pubDate>
		<guid isPermaLink="false">http://discontents.com.au/?p=699#comment-1753</guid>
		<description>I&#039;m curious: how does one &quot;screen scrape&quot;? I always hear about it but never know how it&#039;s done. Nice work, btw!</description>
		<content:encoded><![CDATA[<p>I&#8217;m curious: how does one &#8220;screen scrape&#8221;? I always hear about it but never know how it&#8217;s done. Nice work, btw!</p>
]]></content:encoded>
	</item>
</channel>
</rss>

