<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>ActionScript 3.0 Design Patterns &#187; Iterator</title>
	<atom:link href="http://www.as3dp.com/category/design-patterns/iterator/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.as3dp.com</link>
	<description>OOP Techniques for Flash and Flex Developers</description>
	<lastBuildDate>Sun, 29 Jan 2012 17:00:46 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Iterator Pattern Example: Developing a Webpage Scraper</title>
		<link>http://www.as3dp.com/2009/01/iterator-pattern-example-developing-a-webpage-scraper/</link>
		<comments>http://www.as3dp.com/2009/01/iterator-pattern-example-developing-a-webpage-scraper/#comments</comments>
		<pubDate>Thu, 29 Jan 2009 00:28:03 +0000</pubDate>
		<dc:creator>Chandima Cumaranatunge</dc:creator>
				<category><![CDATA[Iterator]]></category>
		<category><![CDATA[Recursion]]></category>

		<guid isPermaLink="false">http://www.as3dp.com/?p=186</guid>
			<content:encoded><![CDATA[<p>A couple of readers suggested that a full-fledged example might be a good follow-up to my <a href="http://www.as3dp.com/2008/09/04/the-iterator-pattern-flexible-implementation-of-collections/">previous post</a> introducing the iterator pattern. This is a good suggestion, as there are few meaningful examples of the iterator pattern that demonstrate its intent and usefulness. This is probably because most programming languages provide iterators as a built-in construct. However, built-in iterators are designed to traverse native collections such as <em>Arrays</em> and <em>Dictionaries</em>. To traverse a custom data structure, we need to develop an iterator from the ground up. In this example, we will develop a webpage scraper, like <a href="http://en.wikipedia.org/wiki/Googlebot" target="_blank">Googlebot</a>, that recursively harvests information from web pages.</p>
<p>Why is the iterator pattern a good candidate for developing a webpage scraper? As described in my <a href="http://www.as3dp.com/2008/09/04/the-iterator-pattern-flexible-implementation-of-collections/">previous post</a>, the iterator pattern provides a <strong><em>uniform way to traverse and access elements in a collection</em></strong>. A web page is a <em>collection</em> of <em>elements</em>. To harvest the elements (tags), we need to traverse and access the elements in the collection (the HTML). The iterator pattern light bulb should go off at this point.</p>
<p>Having a uniform interface to access different elements in the web page is very desirable. Why? Because there are multiple ways to traverse a page, depending on which elements we are after, and we can develop a separate concrete iterator for each kind of tag. In this example, we will develop two concrete iterators: one to access hyperlinks and the other to access images.</p>
<p>The example will be developed in two parts. My initial novice attempt (I&#8217;ll call this version 1) is described first. I initially treated a web page as an XML document so that E4X could be used to traverse and identify elements. However, this introduced a major limitation: only <em>well-formed</em> web pages could be scraped. This didn&#8217;t mean that the web pages had to be declared as <a href="http://www.w3.org/TR/xhtml1/#strict" target="_blank">XHTML Strict</a> per se, but each page had to be structured according to the rules defined in <a href="http://www.w3.org/TR/REC-xml/#sec-well-formed" target="_blank">Section 2.1</a> of the XML 1.0 Recommendation. So any malformed web page, with missing closing tags or funky characters such as an unescaped ampersand, would fail the test. In my second attempt (version 2), I treated web pages as text documents and used regular expressions to identify elements. This introduced another, more serious limitation: my knowledge of regular expression pattern matching was minimal. So version 2 was more of an adventure in slaying the regular expression dragon than anything else. However, the utility of the iterator was amply demonstrated, as I could extend the scraper app to meet the new requirement without changing any existing code &#8211; the ultimate test of reusability. Here is the initial class diagram.</p>
<h4>Webpage scraper &#8211; version 1</h4>
<div id="attachment_613" class="wp-caption alignnone" style="width: 454px"><img class="size-full wp-image-613" title="fig02" src="http://www.as3dp.com/wp-content/uploads/2009/01/fig02.jpg" alt="Class diagram of web scraper example" width="444" height="202" /><p class="wp-caption-text">Class diagram of version 1</p></div>
<p><span id="more-186"></span><br />
Two concrete iterators called <code>HyperlinkIterator</code> and <code>ImageIterator</code> traverse and access elements in the <code>XHTMLPage</code> concrete aggregate.</p>
<p><strong>Class Hierarchy </strong></p>
<p>The <em>com.as3dp.patterns.iterator.scraper</em> package contains the concrete aggregate and concrete iterators.  The package facilitates encapsulation by hiding implementation details from the client. This will be apparent when we look at some code later on. Here is a screenshot of the project panel in Flash CS3 showing the class files and hierarchy.</p>
<div id="attachment_452" class="wp-caption alignnone" style="width: 303px"><img class="size-full wp-image-452 " title="Project window in Flash CS3" src="http://www.as3dp.com/wp-content/uploads/2009/01/project_window_part11.gif" alt="Project window in Flash CS3" width="293" height="285" /><p class="wp-caption-text">Project panel in Flash CS3 showing the class hierarchy for version 1</p></div>
<p><strong>The Interfaces</strong></p>
<p>The public interfaces for the <em>aggregate</em> and <em>iterator</em> are quite straightforward. The method signature in the <code>IIterableAggregate</code> interface is different from the <a href="http://www.as3dp.com/2008/09/04/the-iterator-pattern-flexible-implementation-of-collections/">previous post</a>. The <code>createIterator</code> method now takes an argument specifying the <code>type</code> of iterator. This allows the client flexibility to get <strong>different types of iterators</strong> to traverse the same collection.</p>
<pre title="IIterableAggregate.as" lang="actionscript">package com.as3dp.interfaces.iterator
{
  public interface IIterableAggregate
  {
    function createIterator( type : String = null ) : IIterator
  }
}</pre>
<pre title="IIterator.as" lang="actionscript">package com.as3dp.interfaces.iterator
{
  public interface IIterator
  {
    function reset() : void
    function next() : *
    function hasNext() : Boolean
  }
}</pre>
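<p>To make the <code>IIterator</code> contract concrete, here is a minimal TypeScript sketch (TypeScript stands in for ActionScript 3 here, since both descend from ECMAScript and the shapes carry over almost one to one). <code>ArrayIterator</code> and <code>collect</code> are hypothetical names used only for illustration. The sketch follows the convention the concrete iterators below rely on: <code>index</code> starts at -1, <code>next()</code> pre-increments it, and <code>reset()</code> rewinds the traversal.</p>

```typescript
// TypeScript analogue of the IIterator interface (illustrative sketch only).
interface IIterator {
  reset(): void;
  next(): unknown;
  hasNext(): boolean;
}

// Hypothetical iterator over a plain array. It follows the same convention as
// the iterators in this post: index starts at -1 and next() pre-increments it.
class ArrayIterator implements IIterator {
  private index = -1;

  constructor(private readonly items: readonly string[]) {}

  reset(): void {
    this.index = -1;
  }

  next(): unknown {
    return this.items[++this.index];
  }

  hasNext(): boolean {
    // true while at least one element remains after the current position
    return this.index < this.items.length - 1;
  }
}

// Client code depends only on the IIterator interface, never on the array.
function collect(itr: IIterator): unknown[] {
  const out: unknown[] = [];
  while (itr.hasNext()) {
    out.push(itr.next());
  }
  return out;
}

const pages = new ArrayIterator(["About%20Me.html", "Photos.html", "Movie.html"]);
const firstPass = collect(pages);
pages.reset(); // rewind so the same iterator can traverse the collection again
const secondPass = collect(pages);
```

<p>Because <code>collect</code> sees only the interface, the same loop works unchanged for any concrete iterator, which is exactly what the client code later in this post depends on.</p>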
<p><strong>The Concrete Aggregate: XHTMLPage</strong></p>
<p>The <code>XHTMLPage</code> class (the <em>concrete aggregate</em>) is the most complex, so let&#8217;s deal with it first. Web pages are loaded asynchronously in AS3. The client not only has to specify the URL of the web page to load, but also has to wait until it is loaded. The client can specify a callback function to be called when the page is loaded, so <code>XHTMLPage</code> needs the ability to dispatch an event that triggers that callback. To meet all these requirements without reinventing the wheel, we can implement <code>XHTMLPage</code> by extending the <code>URLLoader</code> class (this choice will eventually break encapsulation &#8211; can you see why? More on that later).</p>
<pre title="XHTMLPage.as" lang="actionscript">package com.as3dp.patterns.iterator.scraper
{
  import flash.errors.*;
  import flash.events.*;
  import flash.net.*;

  import com.as3dp.interfaces.iterator.*;
  import com.as3dp.patterns.iterator.scraper.*;

  public class XHTMLPage extends URLLoader implements IIterableAggregate
  {
    public static const HYPERLINKS  :String =  'Iterate over hyperlinks';
    public static const IMAGES    :String =  'Iterate over images';

    public static const LOADED    :String =  'page loaded';

    internal var xml  : XML = null;
    internal var url  : String;

    public function XHTMLPage( pageURL:String, callbackFn:Function )
    {
      url = pageURL;
      var urlRequest:URLRequest = new URLRequest( url );
      addEventListener( IOErrorEvent.IO_ERROR, loadError );
      addEventListener( Event.COMPLETE, loadComplete );
      addEventListener( LOADED, callbackFn ); // register client as listener
      load( urlRequest );
    }

    private function loadComplete( evt:Event ) : void
    {
      try // validate XML
      {
        xml = new XML( evt.target.data );
      } catch( e:Error ) {
        trace( e.message )
      }
      dispatchEvent( new Event( LOADED ) ); // dispatch event to client
    }

    private function loadError( evt:Event ) : void
    {
      trace( evt );
    }

    public function createIterator( type : String = null ) : IIterator
    {
      if ( xml )
      {
        switch( type )
        {
          case null:
          case HYPERLINKS:
            return new HyperlinkIterator( this );
            break;
          case IMAGES:
            return new ImageIterator( this );
            break;
          default:
            throw new ArgumentError('Invalid iterator type specified');
        }
      } else {
        throw new IOError('Cannot create an iterator for a page that is not loaded');
      }
    }
  }
}</pre>
<p>The constructor takes two parameters: the URL of the page and the callback function. Passing the callback function, as opposed to requiring the client to register a listener, simplifies things, as the constructor itself registers the client as a listener for the <code>LOADED</code> event. The <code>loadComplete</code> method is interesting in that it first assigns the web page source to a variable of type <code>XML</code>. If the page source is well formed, the <code>xml</code> property holds the parsed page by the time the <code>LOADED</code> event is dispatched. If not, the <code>XML</code> constructor throws an exception; <code>loadComplete</code> catches it and traces the message, so <code>xml</code> stays <code>null</code>. The <code>LOADED</code> event is still dispatched in that case, but <code>createIterator</code> will then refuse to return an iterator.</p>
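<p>The load-then-notify sequence can be sketched in TypeScript as follows. <code>SimpleDispatcher</code>, <code>fakeLoad</code>, and <code>LoadingPage</code> are hypothetical stand-ins for <code>EventDispatcher</code>, <code>URLLoader</code>, and <code>XHTMLPage</code>, and <code>setTimeout</code> merely simulates network latency. What the sketch shows is the ordering: the client&#8217;s callback is registered in the constructor, the data is stored when the load completes, and only then is <code>LOADED</code> dispatched.</p>

```typescript
type Listener = (eventType: string) => void;

// Minimal stand-in for flash.events.EventDispatcher (hypothetical helper).
class SimpleDispatcher {
  private listeners: { [type: string]: Listener[] } = {};

  addEventListener(type: string, fn: Listener): void {
    if (!this.listeners[type]) this.listeners[type] = [];
    this.listeners[type].push(fn);
  }

  dispatchEvent(type: string): void {
    const fns = this.listeners[type] || [];
    for (let i = 0; i < fns.length; i++) fns[i](type);
  }
}

// Hypothetical asynchronous "download": hands `body` back on a later tick,
// the way a network request completes after the constructor has returned.
function fakeLoad(body: string, onComplete: (data: string) => void): void {
  setTimeout(() => onComplete(body), 0);
}

class LoadingPage extends SimpleDispatcher {
  static readonly LOADED = "page loaded";
  source: string | null = null;

  constructor(readonly url: string, body: string, callbackFn: Listener) {
    super();
    // Register the client's callback first, then start loading.
    this.addEventListener(LoadingPage.LOADED, callbackFn);
    fakeLoad(body, (data) => {
      this.source = data; // store the collection before notifying the client
      this.dispatchEvent(LoadingPage.LOADED);
    });
  }
}
```

<p>Right after construction, <code>source</code> is still <code>null</code>; the client can only rely on the data inside its callback, which is why <code>createIterator</code> guards against an unloaded page.</p>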
<p>Design pattern aficionados will notice that the iterator pattern is closely related to the <a href="http://www.adobe.com/devnet/actionscript/articles/ora_as3_design_patterns.html" target="_blank">Factory Method Pattern</a>. In fact, the <code>createIterator</code> method is a <em>parameterized factory method</em> that returns a concrete iterator. It takes a <code>type</code> parameter that indicates the type of iterator to return. The factory method prevents the client from instantiating an iterator directly (using the <code>new</code> keyword) and reduces the coupling between the client and the iterator. Herein lies the advantage of the factory method: iterators can be extended, swapped out, and otherwise changed without the client&#8217;s knowledge. There is also an additional check (the <code>if ( xml )</code> test) that prevents an iterator from being returned unless the page has loaded.</p>
<p>Also note that the aggregate passes itself as an argument when constructing each iterator (<code>new HyperlinkIterator( this )</code> and <code>new ImageIterator( this )</code>), a form of <em>dependency injection</em>. This is why the <code>xml</code> and <code>url</code> properties are declared <em>internal</em>: they are visible to the iterators, which are in the same package, but invisible to the client, which is outside the package. This sums up the unique relationship between aggregates and iterators: they know and depend on each other, as signified by the arrows going both ways in the class diagram.</p>
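<p>The parameterized factory method and the self-injection can be sketched together in TypeScript. <code>Page</code>, <code>PageElement</code>, and <code>TagIterator</code> are hypothetical stand-ins for <code>XHTMLPage</code> and its iterators: the page content is a hard-coded list rather than a loaded document, a single iterator class parameterized by tag name replaces the two classes for brevity, and since TypeScript has no <code>internal</code> modifier the shared fields are simply public and read-only.</p>

```typescript
interface IIterator {
  reset(): void;
  next(): unknown;
  hasNext(): boolean;
}

interface IIterableAggregate {
  createIterator(type?: string | null): IIterator;
}

// Hypothetical stand-in for one element of a parsed page.
type PageElement = { tag: string; value: string };

class Page implements IIterableAggregate {
  static readonly HYPERLINKS = "Iterate over hyperlinks";
  static readonly IMAGES = "Iterate over images";

  // elements === null models a page that has not finished loading.
  constructor(readonly url: string, readonly elements: PageElement[] | null) {}

  // Parameterized factory method: `type` selects the concrete iterator.
  createIterator(type: string | null = null): IIterator {
    if (this.elements === null) {
      throw new Error("Cannot create an iterator for a page that is not loaded");
    }
    switch (type) {
      case null:
      case Page.HYPERLINKS:
        return new TagIterator(this, "a"); // the aggregate passes itself in
      case Page.IMAGES:
        return new TagIterator(this, "img");
      default:
        throw new Error("Invalid iterator type specified");
    }
  }
}

// One iterator class parameterized by tag name keeps the sketch short.
class TagIterator implements IIterator {
  private index = -1;
  private readonly matches: PageElement[];

  constructor(page: Page, tag: string) {
    // The iterator reads the aggregate's data directly: the coupling that
    // `internal` visibility permits in the ActionScript version.
    this.matches = (page.elements || []).filter((e) => e.tag === tag);
  }

  reset(): void {
    this.index = -1;
  }

  next(): unknown {
    return this.matches[++this.index].value;
  }

  hasNext(): boolean {
    return this.index < this.matches.length - 1;
  }
}
```

<p>The client only ever names a constant such as <code>Page.IMAGES</code>; which class actually comes back is the aggregate&#8217;s business.</p>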
<p><strong>The Concrete Iterators: <code>HyperlinkIterator</code> and <code>ImageIterator</code></strong></p>
<p>We can now develop the two concrete iterators that traverse the anchor &lt;a&gt; and image &lt;img&gt; elements. The advantage of injecting the concrete aggregate into the iterator through its constructor is evident in the <code>next()</code> method: the iterator has access to all the properties of the concrete aggregate, not only the collection. The <code>next()</code> method does some simple cleanup before returning each URL: if the link begins with a slash, it is prefixed with the page&#8217;s <code>url</code>. Having access to the <code>url</code> property declared in the concrete aggregate is essential in this case.</p>
<pre title="HyperlinkIterator.as" lang="actionscript">package com.as3dp.patterns.iterator.scraper
{
  import com.as3dp.interfaces.iterator.*;

  public class HyperlinkIterator implements IIterator
  {
    private var col   : XHTMLPage;
    private var index : Number;

    public function HyperlinkIterator( aCollection : XHTMLPage )
    {
      col = aCollection;
      if ( col.xml.namespace('') != undefined )
      {
        // set the default XML namespace in current scope
        default xml namespace = col.xml.namespace('');
      }
      reset();
    }

    public function reset() : void
    {
      index = -1;
    }

    public function next() : *
    {
      var url:String = col.xml..a[ ++index ].@href;
      // clean up URL
      if ( url.charAt( 0 ) == '/' )
      {
        return col.url + url;
      } else {
        return url;
      }
    }

    public function hasNext() : Boolean
    {
      return ( index &lt; col.xml..a.length() - 1 );
    }
  }
}</pre>
<pre title="ImageIterator.as" lang="actionscript">package com.as3dp.patterns.iterator.scraper
{
  import com.as3dp.interfaces.iterator.*;

  public class ImageIterator implements IIterator
  {
    private var col   : XHTMLPage;
    private var index : Number;

    public function ImageIterator( aCollection : XHTMLPage )
    {
      col = aCollection;
      if ( col.xml.namespace('') != undefined )
      {
        // set the default XML namespace in current scope
        default xml namespace = col.xml.namespace('');
      }
      reset();
    }

    public function reset() : void
    {
      index = -1;
    }

    public function next() : *
    {
      var url:String = col.xml..img[ ++index ].@src;
      // clean up URL
      if ( url.charAt( 0 ) == '/' )
      {
        return col.url + url;
      } else {
        return url;
      }
    }

    public function hasNext() : Boolean
    {
      return ( index &lt; col.xml..img.length() - 1 );
    }
  }
}</pre>
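<p>E4X&#8217;s descendant accessor is what keeps both iterators so short: <code>col.xml..a</code> returns every <code>a</code> element at any depth, in document order. For readers without E4X, here is a rough TypeScript equivalent. <code>XNode</code>, <code>descendants</code>, and <code>el</code> are hypothetical names, the tree is built by hand rather than parsed, and namespaces (which the <code>default xml namespace</code> lines above deal with) are ignored.</p>

```typescript
// Hypothetical minimal element tree: element names and attributes only.
interface XNode {
  name: string;
  attrs: { [key: string]: string };
  children: XNode[];
}

// Depth-first walk in document order, collecting every descendant with the
// given name: a rough analogue of E4X's `xml..a` (the root itself is excluded).
function descendants(node: XNode, name: string, out: XNode[] = []): XNode[] {
  for (let i = 0; i < node.children.length; i++) {
    const child = node.children[i];
    if (child.name === name) out.push(child);
    descendants(child, name, out); // nested matches are found at any depth
  }
  return out;
}

// Small helper for building test trees.
function el(name: string, attrs: { [key: string]: string }, ...children: XNode[]): XNode {
  return { name, attrs, children };
}

const doc = el("html", {},
  el("body", {},
    el("div", {}, el("a", { href: "About%20Me.html" }), el("img", { src: "Welcome_files/pic.jpg" })),
    el("p", {}, el("a", { href: "Photos.html" })),
  ),
);

// The values HyperlinkIterator would read via col.xml..a[ i ].@href.
const hrefs = descendants(doc, "a").map((a) => a.attrs["href"]);
```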
<p><strong>The Client</strong></p>
<p>First, I needed a dummy website to test our little scraper. I never had any use for Apple&#8217;s iWeb application before this, but I must say that it creates a nice dummy website in no time flat!</p>
<p><a title="Dummy website for testing" href="http://tester.as3dp.com" target="_blank">http://tester.as3dp.com</a></p>
<p>Now we can develop some client code to scrape one of its pages for hyperlinks and image links. <code>Main</code> is the <em>document class</em> for the Flash CS3 document.</p>
<pre title="Main.as" lang="actionscript">package
{
  import flash.display.Sprite;
  import flash.events.*;
  import flash.net.*;

  import com.as3dp.interfaces.iterator.*;
  import com.as3dp.patterns.iterator.scraper.*;

  /**
  * Main Class
  * @ purpose:    Document class for movie
  */
  public class Main extends Sprite
  {
    private var webPage : IIterableAggregate;

    public function Main()
    {
      var url:String = 'http://tester.as3dp.com/Welcome.html';
      // create a concrete aggregate - an XHTML page
      webPage = new XHTMLPage( url, pageLoaded );
    }

    private function pageLoaded( evt:Event ) : void
    {
      trace( "\r***URLs of all hyperlinks: \r");

      // get an iterator for hyperlinks
      var itr:IIterator = webPage.createIterator( XHTMLPage.HYPERLINKS );

      // iterate over hyperlinks
      while ( itr.hasNext() )
      {
        trace ( itr.next() );
      }

      trace( "\r*** URLs of all images: \r");

      // get an iterator for images
      itr = webPage.createIterator( XHTMLPage.IMAGES );

      // iterate over image links
      while ( itr.hasNext() )
      {
        trace ( itr.next() );
      }
    }
  }
}</pre>
<p>The client side of web scraping is straightforward. Create an XHTMLPage aggregate by passing the page URL and callback function. Within the callback function create iterators to access hyperlinks and image links. The power of the iterator is exemplified by the simple interface used to traverse and access elements in what could be a very complex collection.</p>
<p><strong>The Output</strong></p>

<pre title="Output" lang="text">***URLs of all hyperlinks:

About%20Me.html
Photos.html
Movie.html

*** URLs of all images:

Welcome_files/44143A3_a.jpg</pre>

<p>Source file <a href="#source">webscraper-ver1.zip</a></p>
<p><strong>Limitations! limitations!</strong></p>
<p>More often than not, you will encounter a malformed web page: one that is not structured according to the rules defined in <a href="http://www.w3.org/TR/REC-xml/#sec-well-formed" target="_blank">Section 2.1</a> of the XML 1.0 Recommendation. The usual culprits are missing closing tags and incorrectly nested elements. When <code>XHTMLPage</code> tries to parse such a page, the <code>XML</code> constructor throws an exception like the following:</p>
<p><code>Error #1090: XML parser failure: element is malformed.</code></p>
<p>The scraper also doesn&#8217;t handle redirects. For example, <a href="http://tester.as3dp.com" target="_blank">http://tester.as3dp.com</a> redirects to <a href="http://tester.as3dp.com/Welcome.html" target="_blank">http://tester.as3dp.com/Welcome.html</a>. However, unlike a web browser, our scraper does not check for this. It will happily load the initial page and proclaim that it doesn&#8217;t have any images or hyperlinks.</p>
<p><strong>Design decision broke encapsulation!</strong></p>
<p>Remember my original decision to implement <code>XHTMLPage</code> by subclassing <code>URLLoader</code>? Well, that was a bad choice, because <code>URLLoader</code> has a public property called <code>data</code> that exposes the loaded web page (the collection, in this case) to the client, and inherited public methods such as <code>load()</code> are exposed too. This defeats the original intent of the iterator pattern, which is to hide the implementation details of the concrete aggregate. The danger here is that a meddlesome client can mess with the collection while traversing it and cause all sorts of havoc. Encapsulation is a good thing.</p>
<p>The larger lesson here is a principle that is bandied about quite freely, though you don&#8217;t see many examples of why it should be the case:</p>
<blockquote><p>Favor object composition over class inheritance</p></blockquote>
<p>This is a great example of why you should consider object composition even when subclassing seems to be the easy solution. When you inherit, you are at the mercy of the parent class, as all its public <em>properties</em> and <em>methods</em> will be exposed. Kind of like standing in the middle of the street in your skivvies. Of course you can override the public methods, but the simpler solution is composition, especially if the parent class is a beast with many public methods.</p>
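<p>A small TypeScript sketch makes the difference visible. <code>RawLoader</code>, <code>InheritedPage</code>, and <code>ComposedPage</code> are hypothetical classes: <code>RawLoader</code> only mimics <code>URLLoader</code>&#8217;s public <code>data</code> property, and loading is synchronous to keep the example short. Note that TypeScript&#8217;s <code>private</code> is enforced by the compiler, not at run time.</p>

```typescript
// Hypothetical loader with a public `data` field, standing in for URLLoader.
class RawLoader {
  data = "";
  load(body: string): void {
    this.data = body; // a real loader would fetch this asynchronously
  }
}

// Inheritance: every public member of RawLoader becomes part of this API.
class InheritedPage extends RawLoader {
  linkCount(): number {
    return (this.data.match(/href=/g) || []).length;
  }
}

// Composition: the loader is a private implementation detail.
class ComposedPage {
  private readonly loader = new RawLoader();

  constructor(body: string) {
    this.loader.load(body);
  }

  linkCount(): number {
    return (this.loader.data.match(/href=/g) || []).length;
  }
}

const html = '<a href="a.html">A</a> <a href="b.html">B</a>';

const inherited = new InheritedPage();
inherited.load(html);
inherited.data = ""; // a meddlesome client can replace the collection

const composed = new ComposedPage(html);
// composed.loader.data = ""; // compile error: 'loader' is private
```

<p>The composed version exposes only what it chooses to (<code>linkCount</code> here), so the collection can no longer be swapped out from under a running iterator.</p>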
<p>So, let&#8217;s address some of these issues and build a better scraper. We will not change existing code, but will extend the application to add new concrete aggregates and iterators.</p>
<h4>Building a better web scraper &#8211; version 2</h4>
<div id="attachment_621" class="wp-caption alignnone" style="width: 550px"><img class="size-full wp-image-621" title="Class diagram of the web scraper rev 2" src="http://www.as3dp.com/wp-content/uploads/2009/01/fig04.jpg" alt="Class diagram of the web scraper rev 2" width="540" height="291" /><p class="wp-caption-text">Web scraper class diagram for version 2</p></div>
<p>We will now treat the web page source as a text string instead of an XML object. It is still a collection, but we can no longer use E4X to traverse and access its elements, so we will use regular expressions instead. We now have a concrete aggregate called <code>HTMLPage</code> and two concrete iterators, <code>HyperlinkIteratorRegex</code> and <code>ImageIteratorRegex</code>.</p>
<p>The similarity to the Factory Method pattern should be quite evident here, as you will see two <em>parallel class hierarchies</em> that depend on each other to make things work: an aggregate and iterator set that uses E4X, and another set that uses regular expressions to do its magic.</p>
<p><strong>Slaying the Regular Expression Dragon</strong></p>
<p>Version 1 helped me realize very quickly that <a href="http://labs.apache.org/webarch/uri/rfc/rfc3986.html" target="_blank">URLs can take many forms</a>, all of which are quite excruciating to deal with. Much time was spent trying to figure out how to parse URLs and decompose them into bare components. Without turning this already lengthy post into a discussion on regular expressions, I&#8217;m going to list the resources that were most helpful.</p>
<p>The tools:</p>
<ul>
<li><a href="http://www.gskinner.com/RegExr/" target="_blank">RegExr</a>, the online regular expression testing tool (and its Adobe AIR variant <a href="http://www.gskinner.com/RegExr/desktop/" target="_blank">RegExr Desktop</a>), developed by Grant Skinner. This tool is cool in so many ways, the primary one being that it explains <strong>in plain English</strong> what each minuscule part of the regular expression is doing and what it matches.</li>
<li><a href="http://homepage.mac.com/roger_jolly/software/#regexhibit" target="_blank">RegExhibit</a> ( OSX only ) is very useful to test multiple matches and groups of tokens.</li>
</ul>
<p>Tutorials:</p>
<ul>
<li><a href="http://gnosis.cx/publish/programming/regular_expressions.html" target="_blank">Learning to use Regular Expressions,</a> first published by IBM DeveloperWorks, updated by David Mertz. This taught me one of the most important lessons in pattern matching: reformulating the problem. Instead of asking &#8220;what am I trying to match&#8221; ask yourself &#8220;what do I need to <strong>avoid</strong> matching.&#8221;</li>
<li><a href="http://www.filbar.org/weblog/parsing_incomplete_or_malformed_urls_with_regular_expressions" target="_blank">Parsing Incomplete or Malformed URLs with Regular Expressions</a>, a definitive resource by Vince Filby.</li>
</ul>
<p>The first step was to develop a utility class to parse URLs. I was flailing around looking for a URL parsing regular expression before finding Vince Filby&#8217;s writeup. His method is to treat the URL as a set of components:</p>
<blockquote>
<pre>  foo://example.com:8042/over/there?name=ferret#nose
  \_/   \______________/\_________/ \_________/ \__/
   |           |            |            |        |
scheme     authority       path        query   fragment</pre>
</blockquote>
<p>And construct a regular expression that captures each component using capture groups.</p>
<blockquote>
<pre>^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?</pre>
</blockquote>
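<p>Here is that regular expression exercised in TypeScript on the example URL, reading each component from the same capture groups <code>ParseURL</code> uses (2, 4, 5, 7, and 9). Inside a regex literal the forward slashes have to be escaped, and the <code>?</code> that introduces the query needs a backslash so it is matched literally. <code>splitURI</code> is a hypothetical helper name.</p>

```typescript
// RFC 3986, Appendix B, written as a regex literal ("/" escaped, "\?" literal).
const URI_PATTERN = /^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/;

function splitURI(uri: string) {
  const m = URI_PATTERN.exec(uri);
  // Every component is optional, so in practice this pattern matches any string.
  if (m === null) throw new Error("Malformed URL - cannot parse");
  return { scheme: m[2], authority: m[4], path: m[5], query: m[7], fragment: m[9] };
}

const parts = splitURI("foo://example.com:8042/over/there?name=ferret#nose");
// parts.scheme === "foo", parts.authority === "example.com:8042",
// parts.path === "/over/there", parts.query === "name=ferret", parts.fragment === "nose"
```

<p>Components that are absent, such as the scheme of a relative link like <code>Photos.html</code>, come back as <code>undefined</code>, which is how a scraper can tell relative links from fully qualified ones.</p>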
<p>It was pretty simple to develop a utility class based on this regular expression. The <code>parentDirPath</code> getter returns the parent directory path of a given URL, which is required to work out the fully qualified URL of relative links found on web pages.</p>
<pre title="ParseURL.as" lang="actionscript">package com.as3dp.utils
{
  public class ParseURL
  {
    private var components : Object;

    public function ParseURL( anURL:String )
    {
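      // RFC 3986, Appendix B: groups 2, 4, 5, 7 and 9 hold scheme, authority, path, query and fragment
      var pattern:RegExp = /^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/i;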
      components = pattern.exec( anURL );
      if ( ! components ) throw new URIError( 'Malformed URL - cannot parse' );
    }

    public function get input():String
    {
      return components.input;
    }

    public function get scheme():String
    {
      return components[ 2 ];
    }

    public function get authority():String
    {
      return components[ 4 ];
    }

    public function get path():String
    {
      return components[ 5 ];
    }

    public function get query():String
    {
      return components[ 7 ];
    }

    public function get fragment():String
    {
      return components[ 9 ];
    }

    public function get parentDirPath():String
    {
      // match everything up to the last '/' when the final segment looks like a file name (e.g. page.html)
      var dirPat:RegExp = /.*\/(?=[^\/.]+\.[^\/.]+$)/i;
      var dircomp:Object = dirPat.exec( components[ 5 ] );
      var dirPath:String = ( dircomp ) ? dircomp[ 0 ] : components[ 5 ];
      return ( dirPath.search( /\/$/ ) &gt;= 0 )? dirPath : dirPath + '/';
    }

    public function toString():String
    {
      return String( components );
    }
  }
}</pre>
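<p>The <code>parentDirPath</code> logic is easy to check in isolation. Here is a TypeScript port of the getter written as a plain function (the function form is an assumption made for testability). It uses the same lookahead: if the last path segment looks like a file name with an extension, everything up to the preceding slash is kept; otherwise the whole path is treated as a directory, and a trailing slash is guaranteed.</p>

```typescript
// Keep everything up to the last "/" when the final path segment looks like a
// file name with an extension; otherwise treat the whole path as a directory.
function parentDirPath(path: string): string {
  const dirPat = /.*\/(?=[^\/.]+\.[^\/.]+$)/;
  const match = dirPat.exec(path);
  const dirPath = match ? match[0] : path;
  return /\/$/.test(dirPath) ? dirPath : dirPath + "/"; // always end with "/"
}
```

<p>One consequence of this heuristic: a directory whose name contains a dot (say <code>/v1.2</code>) is mistaken for a file, so its parent is returned instead.</p>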
<p><strong>New class hierarchy for version 2</strong></p>
<div id="attachment_632" class="wp-caption alignnone" style="width: 336px"><img class="size-full wp-image-632" title="New class hierarchy (Flash CS3 Project Panel)" src="http://www.as3dp.com/wp-content/uploads/2009/01/fig05.jpg" alt="New class hierarchy (Flash CS3 Project Panel)" width="326" height="324" /><p class="wp-caption-text">Flash CS3 Project Panel showing the new class hierarchy</p></div>
<p><strong>The new </strong><em><strong>Concrete Aggregate</strong></em><strong>: HTMLPage</strong></p>
<p>The primary difference here is that <code>HTMLPage</code> treats the web page as a text string as opposed to an <code>XML</code> structure. I also squashed the encapsulation issue by subclassing <code>EventDispatcher</code> and <em>composing</em> a <code>URLLoader</code> object to load the URL.</p>
<pre title="HTMLPage.as" lang="actionscript">package com.as3dp.patterns.iterator.scraper
{
  import flash.errors.*;
  import flash.events.*;
  import flash.net.*;

  import com.as3dp.interfaces.iterator.*;
  import com.as3dp.patterns.iterator.scraper.*;

  public class HTMLPage extends EventDispatcher implements IIterableAggregate
  {
    public static const HYPERLINKS  :String =  'Iterate over hyperlinks';
    public static const IMAGES    :String =  'Iterate over images';
    public static const LOADED    :String =  'page loaded';

    internal  var htmlsrc   : String = null;
    internal  var url     : String;

    public function HTMLPage( pageURL:String, callbackFn:Function )
    {
      url = pageURL;
      var urlRequest:URLRequest = new URLRequest( url );
      var loader:URLLoader = new URLLoader();
      loader.addEventListener( IOErrorEvent.IO_ERROR, ioErrorHandler );
      loader.addEventListener( Event.COMPLETE, loadComplete );
      loader.load( urlRequest );

      addEventListener( LOADED, callbackFn ); // register client as listener
    }

    private function loadComplete( evt:Event ) : void
    {
      htmlsrc = evt.target.data;
      dispatchEvent( new Event( LOADED ) ); // dispatch event to client
    }

    private function ioErrorHandler( evt:IOErrorEvent ) : void
    {
      trace( 'ioErrorHandler: ', evt );
    }

    public function createIterator( type : String = null ) : IIterator
    {
      if ( htmlsrc )
      {
        switch( type )
        {
          case null:
          case HYPERLINKS:
            return new HyperlinkIteratorRegex( this );
            break;
          case IMAGES:
            return new ImageIteratorRegex( this );
            break;
          default:
            throw new ArgumentError('Invalid iterator type specified');
        }
      } else {
        throw new IOError('Cannot create an iterator for a page that is not loaded');
      }
    }
  }
}</pre>
<p><strong>Two new <em>concrete iterators</em>: <code>HyperlinkIteratorRegex</code> and <code>ImageIteratorRegex</code></strong></p>
<p>The new iterators are very similar to each other. The constructor builds an array of all URLs using regular expression pattern matching. Note the repeated application of the <code><a href="http://livedocs.adobe.com/flash/9.0/ActionScriptLangRefV3/RegExp.html#exec()" target="_blank">exec</a></code> method in the <code>RegExp</code> class to find multiple matches. The <code>next()</code> method utilizes the <code>ParseURL</code> utility class to convert all relative and absolute URLs into fully qualified ones.</p>
<pre title="HyperlinkIteratorRegex.as" lang="actionscript">package com.as3dp.patterns.iterator.scraper
{
  import com.as3dp.interfaces.iterator.*;
  import com.as3dp.patterns.iterator.scraper.*;
  import com.as3dp.utils.*;

  public class HyperlinkIteratorRegex implements IIterator
  {
    private var col     : HTMLPage;
    private var aList   : Array;
    private var index   : Number;
    private var parsedURL : ParseURL;

    public function HyperlinkIteratorRegex( aCollection : HTMLPage )
    {
      col = aCollection;
      parsedURL = new ParseURL( col.url );

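      // NOTE (assumed reconstruction): the original regular expression and the
      // code down to the "absolute URL" test below were not preserved in this
      // listing. Capture group 1 of this pattern holds the href value.
      var pattern:RegExp = /&lt;a\s[^>]*\bhref\s*=\s*["']([^"']+)["']/gi;
      aList = [];
      var match:Object = pattern.exec( col.htmlsrc );
      while ( match != null ) // with the g flag, each exec() resumes at lastIndex
      {
        aList.push( match[1] );
        match = pattern.exec( col.htmlsrc );
      }
      reset();
    }

    public function reset() : void
    {
      index = -1;
    }

    public function next() : *
    {
      var hyperlink:String = aList[ ++index ];
      if ( /^[a-z][a-z0-9+.\-]*:/i.test( hyperlink ) ) // has a scheme: keep as-is
      {
        return hyperlink;
      }
      else if ( hyperlink.indexOf( '/' ) == 0 ) // absolute URL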
      {
        return parsedURL.scheme + '://' + parsedURL.authority + hyperlink;
      }
      else // relative URL
      {
        return parsedURL.scheme + '://' + parsedURL.authority +
              parsedURL.parentDirPath + hyperlink;
      }
    }

    public function hasNext() : Boolean
    {
      return ( index &lt; aList.length - 1 );
    }
  }
}</pre>
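<p>The listing for <code>ImageIteratorRegex</code> is not reproduced in this post. Since the two iterators are described as very similar, here is a minimal sketch of what it might look like, assuming it mirrors <code>HyperlinkIteratorRegex</code> but captures the <code>src</code> attribute of <code>&lt;img&gt;</code> tags, and assuming the scraper's <code>IIterator</code> declares the same <code>reset()</code>, <code>next()</code> and <code>hasNext()</code> methods as in the previous post. The regular expression and the three resolution rules in <code>next()</code> are assumptions; only the <code>ParseURL</code> properties (<code>scheme</code>, <code>authority</code>, <code>parentDirPath</code>) come from the listing above.</p>
<pre title="ImageIteratorRegex.as" lang="actionscript">package com.as3dp.patterns.iterator.scraper
{
  import com.as3dp.interfaces.iterator.*;
  import com.as3dp.utils.*;

  // Sketch only: the regex and the resolution rules in next() are assumptions.
  public class ImageIteratorRegex implements IIterator
  {
    private var col       : HTMLPage;
    private var imgList   : Array;
    private var index     : int;
    private var parsedURL : ParseURL;

    public function ImageIteratorRegex( aCollection : HTMLPage )
    {
      col = aCollection;
      parsedURL = new ParseURL( col.url );
      imgList = [];

      // Capture group 1 holds the src value of each &lt;img&gt; tag.
      var pattern:RegExp = /&lt;img\s[^>]*\bsrc\s*=\s*["']([^"']+)["']/gi;

      // With the g flag, each exec() call resumes at pattern.lastIndex,
      // so repeated calls visit every match until exec() returns null.
      var match:Object = pattern.exec( col.htmlsrc );
      while ( match != null )
      {
        imgList.push( match[1] );
        match = pattern.exec( col.htmlsrc );
      }
      reset();
    }

    public function reset() : void
    {
      index = -1;
    }

    public function next() : *
    {
      var src:String = imgList[ ++index ];
      if ( src.indexOf( '://' ) >= 0 ) // already fully qualified
      {
        return src;
      }
      else if ( src.charAt( 0 ) == '/' ) // path from the site root
      {
        return parsedURL.scheme + '://' + parsedURL.authority + src;
      }
      else // relative to the current page's directory
      {
        return parsedURL.scheme + '://' + parsedURL.authority +
               parsedURL.parentDirPath + src;
      }
    }

    public function hasNext() : Boolean
    {
      return ( index &lt; imgList.length - 1 );
    }
  }
}</pre>
<p><code>HTMLPage</code> declares <code>htmlsrc</code> and <code>url</code> as <code>internal</code>, which is why an iterator in the same package can read them directly.</p>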
<p><strong>The new Client</strong></p>
<p>The client can now do some serious scraping. The <code>pageLoaded</code> callback method recursively traverses each hyperlink and harvests the images on each web page.</p>
<pre title="Main.as" lang="actionscript">package
{
  import flash.display.Sprite;
  import flash.events.*;
  import flash.net.*;

  import com.as3dp.interfaces.iterator.*;
  import com.as3dp.patterns.iterator.scraper.*;

  /**
  * Main Class
  * @ purpose:    Document class for movie
  */
  public class Main extends Sprite
  {
    private var url:String = 'http://tester.as3dp.com/Welcome.html';

    private static const MAXPAGES:uint = 100; // max # of web pages to traverse
    private static var webPageList:Array = [];
    private static var imageList:Array = [];

    public function Main()
    {
      // load a concrete aggregate - an HTML page
      new HTMLPage( url, pageLoaded );
    }

    /**
    * pageLoaded method
    * @ purpose:    Event handler method - called when HTML page is loaded
    */
    private function pageLoaded( evt:Event ) : void
    {
      var webPage:IIterableAggregate = IIterableAggregate( evt.target );

      // get an iterator for hyperlinks
      var linkItr:IIterator = webPage.createIterator( HTMLPage.HYPERLINKS );

      // iterate over the hyperlinks
      while ( linkItr.hasNext() )
      {
        var hyperlink:String = linkItr.next();
        if ( ( ! inList( hyperlink, webPageList ))  &amp;&amp; ( webPageList.length &lt;= MAXPAGES ) )
        {
          // get an iterator for images
          var imageItr:IIterator = webPage.createIterator( HTMLPage.IMAGES );
          while ( imageItr.hasNext() )
          {
            // iterate over image links
            var imageLink:String = imageItr.next();
            if ( ! inList( imageLink, imageList ) )
            {
              trace ( imageLink );
            }
          }
          new HTMLPage( hyperlink, pageLoaded ); // load new HTML page
        }
      }
    }

    /**
    * inList Utility Method
    * @ purpose:    Check if passed String (URL) is in the passed Array
    *         If not, push it into the Array.
    */
    private function inList ( anURL:String, aList:Array ) : Boolean
    {
      for each ( var item:String in aList )
      {
        if ( anURL == item ) return true;
      }
      aList.push( anURL );
      return false;
    }

  }
}</pre>
<p>Limiting the recursive traversal is important. Ideally, the web scraper should be limited to pages on a single host, or should stop at a particular hierarchical depth on a site. However, I took the easy way out and limited the number of web pages traversed. The <code>inList</code> method serves an important purpose by keeping track of all hyperlinks and image links seen so far, so that pages already accessed are not traversed again and images are not reported twice. Because <code>inList</code> also adds every new hyperlink to <code>webPageList</code>, that list grows as the scraper runs, and the <code>webPageList.length &lt;= MAXPAGES</code> check stops the recursion once the <code>MAXPAGES</code> limit is reached.</p>
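<p>Restricting the scraper to a single host could look something like the following minimal sketch. <code>HostFilter</code>, <code>hostOf</code> and <code>sameHost</code> are hypothetical names and are not part of the downloadable source. The sketch extracts the authority (host and optional port) of a fully qualified URL with a regular expression and compares it with the authority of the starting URL; links without a <code>scheme://authority</code> prefix, such as <code>javascript:void(0)</code>, are rejected as a side effect.</p>
<pre title="HostFilter.as" lang="actionscript">package
{
  // Hypothetical helper (not in webscraper-ver2.zip): keeps the scraper on one host.
  public class HostFilter
  {
    // scheme "://" authority; capture group 1 is the authority (host[:port]).
    // No g flag: each exec() call searches from the start of the string.
    private static const AUTHORITY:RegExp = /^[a-z][a-z0-9+.\-]*:\/\/([^\/?#]+)/i;

    // Returns the lower-cased authority of a fully qualified URL, or null.
    public static function hostOf( url:String ) : String
    {
      var match:Object = AUTHORITY.exec( url );
      return ( match != null ) ? String( match[1] ).toLowerCase() : null;
    }

    // True only if both URLs are fully qualified and share the same authority.
    public static function sameHost( url:String, startURL:String ) : Boolean
    {
      var host:String = hostOf( url );
      return host != null &amp;&amp; host == hostOf( startURL );
    }
  }
}</pre>
<p>In <code>Main.pageLoaded</code>, the condition could test <code>HostFilter.sameHost( hyperlink, url )</code> before calling <code>inList</code>, so that off-host links are neither loaded nor counted against <code>MAXPAGES</code>.</p>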
<p><strong>Output</strong></p>

<div class="wp_codebox"><table><tr id="p1862"><td class="code" id="p186code2"><pre class="text" style="font-family:monospace;">http://tester.as3dp.com/Welcome_files/44143A3_a.jpg
http://tester.as3dp.com/Movie_files/movie_stripe.jpg
http://tester.as3dp.com/Photos_files/AA041434_a.jpg
http://tester.as3dp.com/Photos_files/image.png
ioErrorHandler: [IOErrorEvent type=&quot;ioError&quot; bubbles=false cancelable=false eventPhase=2
     text=&quot;Error #2032: Stream Error. URL: javascript:void(0)&quot;]</pre></td></tr></table></div>

<p>Now it traverses the whole site. However, there is an error: one of the hyperlinks is a JavaScript link (<code>javascript:void(0)</code>), which cannot be loaded, hence the stream error. Looks like we have to modify the regular expression to match only &lt;a&gt; tags that don&#8217;t have <code>onclick</code> attributes. I&#8217;ll leave that for one of you regular expression wizards. I&#8217;m sure there are more errors caused by edge cases.</p>
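<p>As one possible answer to that challenge, here is a sketch, not tested against the live site, of a link extractor whose regular expression rejects any &lt;a&gt; tag that carries an <code>onclick</code> attribute. It relies on a negative lookahead <code>(?!...)</code>, which ActionScript 3 regular expressions support. Because a <code>javascript:</code> link does not always come with an <code>onclick</code> handler, the sketch also skips <code>href</code> values that start with <code>javascript:</code>. <code>LinkExtractor</code> and <code>extractLinks</code> are hypothetical names.</p>
<pre title="LinkExtractor.as" lang="actionscript">package
{
  // Hypothetical helper: a sketch, not the code in webscraper-ver2.zip.
  public class LinkExtractor
  {
    // Matches &lt;a ...&gt; tags that have an href but no onclick attribute.
    // The negative lookahead fails the match if "onclick=" appears anywhere
    // before the closing > of the same tag. Group 1 is the href value.
    private static const LINK:RegExp =
      /&lt;a\s(?![^>]*\bonclick\s*=)[^>]*\bhref\s*=\s*["']([^"']+)["']/gi;

    public static function extractLinks( html:String ) : Array
    {
      var links:Array = [];
      LINK.lastIndex = 0; // the g flag makes exec() resume from lastIndex
      var match:Object = LINK.exec( html );
      while ( match != null )
      {
        var href:String = match[1];
        // Also skip javascript: pseudo-URLs that have no onclick attribute.
        if ( href.toLowerCase().indexOf( 'javascript:' ) != 0 )
        {
          links.push( href );
        }
        match = LINK.exec( html );
      }
      return links;
    }
  }
}</pre>
<p>An iterator such as <code>HyperlinkIteratorRegex</code> could then fill its list from <code>LinkExtractor.extractLinks( col.htmlsrc )</code> instead of running its own <code>exec</code> loop.</p>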
<p>I wanted to walk through a couple of iterations of developing an application that utilizes the iterator pattern. We managed to discuss some important OOP concepts along the way, which is a bonus. This is a more functional example than the ones we typically develop on this blog. It&#8217;s always hit-or-miss whether the detail and the length of the post get in the way of understanding the core pattern, so comments are welcome!</p>
<p><strong><a name="source"></a>Source</strong></p>
<ul>
<li><a href="http://www.as3dp.com/wp-content/uploads/2009/01/webscraper-ver1.zip">webscraper-ver1.zip</a></li>
<li><a href="http://www.as3dp.com/wp-content/uploads/2009/01/webscraper-ver2.zip">webscraper-ver2.zip</a></li>
</ul>
<p><strong>References:</strong><br />
Gamma, Erich; Richard Helm, Ralph Johnson, and John Vlissides (1995). <a href="http://www.amazon.com/dp/0201633612" target="_blank">Design Patterns: Elements of Reusable Object-Oriented Software</a>. Addison-Wesley.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.as3dp.com/2009/01/iterator-pattern-example-developing-a-webpage-scraper/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>The Iterator Pattern: Flexible implementation of collections</title>
		<link>http://www.as3dp.com/2008/09/the-iterator-pattern-flexible-implementation-of-collections/</link>
		<comments>http://www.as3dp.com/2008/09/the-iterator-pattern-flexible-implementation-of-collections/#comments</comments>
		<pubDate>Thu, 04 Sep 2008 22:05:02 +0000</pubDate>
		<dc:creator>Chandima Cumaranatunge</dc:creator>
				<category><![CDATA[Iterator]]></category>

		<guid isPermaLink="false">http://www.as3dp.com/?p=137</guid>
		<description><![CDATA[We tend to think of the iterator pattern as simply a mechanism to access a collection of items in some sort of order. However, the motivation for the iterator pattern goes quite a bit beyond this everyday requirement. It is specifically designed to hide how you decide to structure the collection, but still provide a [...]]]></description>
			<content:encoded><![CDATA[<p>We tend to think of the <em>iterator pattern</em> as simply a mechanism to access a collection of items in some sort of order. However, the motivation for the iterator pattern goes quite a bit beyond this everyday requirement. It is specifically designed to <strong>hide</strong> how you decide to structure the collection, but still provide a uniform <em>interface</em> to traverse and access its elements. We will look at a couple of examples where we change how the collection is implemented but keep the client oblivious to the change.</p>
<p>Let&#8217;s take a look at the class diagram to determine the relationships between the classes in the iterator pattern.</p>
<p><a href="http://www.as3dp.com/wp-content/uploads/2008/09/fig01.jpg"><img class="alignnone size-full wp-image-122" title="Class diagram of the iterator pattern" src="http://www.as3dp.com/wp-content/uploads/2008/09/fig01.jpg" alt="" width="410" height="205" /></a><br />
 <span id="more-137"></span><br />
 Gamma et al. refer to a collection of items as an <strong>aggregate</strong>. A <strong>concrete aggregate</strong> implements the <code>IIterableAggregate</code> <em>interface</em>, which has to define <strong>at least one</strong> method, called <code>createIterator</code> in this case. This is significant because the interface doesn&#8217;t specify anything about which data structure should be used to implement your collection of items. The only thing the client knows about the concrete aggregate is that its <code><em>createIterator()</em></code> method will return a <strong>concrete iterator</strong>.</p>
<p><strong><em>IIterableAggregate.as</em></strong></p>

<div class="wp_codebox"><table><tr id="p13711"><td class="code" id="p137code11"><pre class="actionscript" style="font-family:monospace;">package
<span style="color: #66cc66;">&#123;</span>
    <span style="color: #0066CC;">public</span> <span style="color: #0066CC;">interface</span> IIterableAggregate
    <span style="color: #66cc66;">&#123;</span>
        <span style="color: #000000; font-weight: bold;">function</span> createIterator<span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span> : IIterator
    <span style="color: #66cc66;">&#125;</span>
<span style="color: #66cc66;">&#125;</span></pre></td></tr></table></div>

<p>The <strong>concrete iterator</strong> is of type <em>IIterator</em>. The <em>IIterator interface</em> defines the basic methods to traverse and access elements of a collection. (I will use the more contemporary term &#8220;collection&#8221; to refer to a concrete aggregate, not to be confused with the <code>Collection</code> class in AS2.)</p>
<p><strong><em>IIterator.as</em></strong></p>

<div class="wp_codebox"><table><tr id="p13712"><td class="code" id="p137code12"><pre class="actionscript" style="font-family:monospace;">package
<span style="color: #66cc66;">&#123;</span>
    <span style="color: #0066CC;">public</span> <span style="color: #0066CC;">interface</span> IIterator
    <span style="color: #66cc66;">&#123;</span>
        <span style="color: #000000; font-weight: bold;">function</span> reset<span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span> : <span style="color: #0066CC;">void</span>
        <span style="color: #000000; font-weight: bold;">function</span> next<span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span> : <span style="color: #66cc66;">*</span>
        <span style="color: #000000; font-weight: bold;">function</span> hasNext<span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span> : <span style="color: #0066CC;">Boolean</span>
    <span style="color: #66cc66;">&#125;</span>
<span style="color: #66cc66;">&#125;</span></pre></td></tr></table></div>

<p>The <em>next()</em> method returns the next element in the collection. Its return type is <code>*</code> (untyped), so the collection can contain objects of any type. The <em>hasNext()</em> method returns <code>true</code> while there are elements left to visit and <code>false</code> once the end of the collection has been reached. The <em>reset()</em> method re-positions the iterator before the first element so that the collection can be traversed again.</p>
<p>The two interfaces make the intent of the iterator pattern very clear. They don&#8217;t specify what form or structure the collection takes: the collection can be an <code>Array</code>, a <code>Dictionary</code> or a composite like an <code>XML</code> node. However, in combination they do two essential things that enable flexible implementation of collections. First, the <em>Iterator interface</em> provides a uniform way to access elements in a collection. This means that, no matter what structure you use, the collection will be traversed and its elements accessed using the <code>next()</code>, <code>hasNext()</code>, and <code>reset()</code> methods. Second, the pattern offloads the iteration and access methods to a separate <strong><em>ConcreteIterator</em></strong> class. Take a look back at the <strong><em>ConcreteAggregate</em></strong> class in the class diagram and note how the <code><em>createIterator()</em></code> method is implemented. The returned <em>ConcreteIterator</em> instance has a parameterized constructor that takes the <em>ConcreteAggregate</em> instance as an argument. You can think of the iterator as a <strong>wrapper</strong> around the collection that hides its implementation details. This is what enables the iterator pattern to create extensible and reusable collections.</p>
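<p>To make the point concrete, here is a minimal sketch of a concrete iterator for a <code>Dictionary</code>, one of the structures mentioned above. <code>DictionaryIterator</code> is a hypothetical name; it implements the <code>IIterator</code> interface shown earlier. A <code>Dictionary</code> has no numeric index, so the sketch copies its keys into an <code>Array</code> in <code>reset()</code>, and <code>next()</code> returns the value stored under each key. The order of <code>for..in</code> enumeration is not guaranteed, so neither is the traversal order.</p>
<pre title="DictionaryIterator.as" lang="actionscript">package
{
  import flash.utils.Dictionary;

  // Hypothetical iterator over the values stored in a Dictionary.
  public class DictionaryIterator implements IIterator
  {
    private var dict  : Dictionary;
    private var keys  : Array;
    private var index : int;

    public function DictionaryIterator( aDict : Dictionary )
    {
      dict = aDict;
      reset();
    }

    // Takes a fresh snapshot of the keys and moves before the first one.
    public function reset() : void
    {
      keys = [];
      for ( var key:* in dict )
      {
        keys.push( key );
      }
      index = -1;
    }

    // Returns the value stored under the next key in the snapshot.
    public function next() : *
    {
      return dict[ keys[ ++index ] ];
    }

    public function hasNext() : Boolean
    {
      return ( index &lt; keys.length - 1 );
    }
  }
}</pre>
<p>A hypothetical <code>Dictionary</code>-backed aggregate would simply return <code>new DictionaryIterator( dict )</code> from <code>createIterator()</code>; a client looping with <code>hasNext()</code> and <code>next()</code> would not change.</p>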
<p><strong>The Simple List</strong></p>
<p>Let&#8217;s develop a minimalist example of an iterable aggregate. The simple list enables clients to create a list of items and access its elements sequentially.</p>
<p><strong><em>SimpleList.as</em></strong></p>

<div class="wp_codebox"><table><tr id="p13713"><td class="code" id="p137code13"><pre class="actionscript" style="font-family:monospace;">package
<span style="color: #66cc66;">&#123;</span>
    <span style="color: #0066CC;">public</span> <span style="color: #000000; font-weight: bold;">class</span> SimpleList <span style="color: #0066CC;">implements</span> IIterableAggregate
    <span style="color: #66cc66;">&#123;</span>
        <span style="color: #0066CC;">private</span> <span style="color: #000000; font-weight: bold;">var</span> arr : <span style="color: #0066CC;">Array</span>;
&nbsp;
        <span style="color: #0066CC;">public</span> <span style="color: #000000; font-weight: bold;">function</span> SimpleList<span style="color: #66cc66;">&#40;</span> ... <span style="color: #006600;">args</span> <span style="color: #66cc66;">&#41;</span>
        <span style="color: #66cc66;">&#123;</span>
            arr = <span style="color: #000000; font-weight: bold;">new</span> <span style="color: #0066CC;">Array</span><span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span>;
            <span style="color: #b1b100;">for</span> <span style="color: #66cc66;">&#40;</span> <span style="color: #000000; font-weight: bold;">var</span> i:uint = <span style="color: #cc66cc;">0</span>; i <span style="color: #66cc66;">&lt;</span> args.<span style="color: #0066CC;">length</span>; i++ <span style="color: #66cc66;">&#41;</span>
            <span style="color: #66cc66;">&#123;</span>
                arr<span style="color: #66cc66;">&#91;</span>i<span style="color: #66cc66;">&#93;</span> = args<span style="color: #66cc66;">&#91;</span>i<span style="color: #66cc66;">&#93;</span>;
            <span style="color: #66cc66;">&#125;</span>
        <span style="color: #66cc66;">&#125;</span>
&nbsp;
        <span style="color: #0066CC;">public</span> <span style="color: #000000; font-weight: bold;">function</span> createIterator<span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span> : IIterator
        <span style="color: #66cc66;">&#123;</span>
            <span style="color: #b1b100;">return</span> <span style="color: #000000; font-weight: bold;">new</span> SimpleListIterator<span style="color: #66cc66;">&#40;</span> arr <span style="color: #66cc66;">&#41;</span>;
        <span style="color: #66cc66;">&#125;</span>
    <span style="color: #66cc66;">&#125;</span>
<span style="color: #66cc66;">&#125;</span></pre></td></tr></table></div>

<p>The <code>SimpleList</code> class implements the <code>IIterableAggregate</code> interface. An <code>Array</code> holds the collection; the constructor inserts every argument passed to it (the elements in the list). The <code>createIterator()</code> method returns an iterator of type <code>SimpleListIterator</code>. Note how the <code>Array</code> that holds the collection is passed as an argument to the <code>SimpleListIterator</code> constructor.</p>
<p><strong><em>SimpleListIterator.as</em></strong></p>

<div class="wp_codebox"><table><tr id="p13714"><td class="code" id="p137code14"><pre class="actionscript" style="font-family:monospace;">package
<span style="color: #66cc66;">&#123;</span>
    <span style="color: #0066CC;">public</span> <span style="color: #000000; font-weight: bold;">class</span> SimpleListIterator <span style="color: #0066CC;">implements</span> IIterator
    <span style="color: #66cc66;">&#123;</span>
        <span style="color: #0066CC;">private</span> <span style="color: #000000; font-weight: bold;">var</span> <span style="color: #0066CC;">list</span>  : <span style="color: #0066CC;">Array</span>;
        <span style="color: #0066CC;">private</span> <span style="color: #000000; font-weight: bold;">var</span> <span style="color: #0066CC;">index</span> : <span style="color: #0066CC;">Number</span>;
&nbsp;
        <span style="color: #0066CC;">public</span> <span style="color: #000000; font-weight: bold;">function</span> SimpleListIterator<span style="color: #66cc66;">&#40;</span> aList : <span style="color: #0066CC;">Array</span> <span style="color: #66cc66;">&#41;</span>
        <span style="color: #66cc66;">&#123;</span>
            <span style="color: #0066CC;">list</span> = aList;
            reset<span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span>;
        <span style="color: #66cc66;">&#125;</span>
&nbsp;
        <span style="color: #0066CC;">public</span> <span style="color: #000000; font-weight: bold;">function</span> reset<span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span> : <span style="color: #0066CC;">void</span>
        <span style="color: #66cc66;">&#123;</span>
            <span style="color: #0066CC;">index</span> = -<span style="color: #cc66cc;">1</span>;
        <span style="color: #66cc66;">&#125;</span>
&nbsp;
        <span style="color: #0066CC;">public</span> <span style="color: #000000; font-weight: bold;">function</span> next<span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span> : <span style="color: #66cc66;">*</span>
        <span style="color: #66cc66;">&#123;</span>
            <span style="color: #b1b100;">return</span> <span style="color: #0066CC;">list</span><span style="color: #66cc66;">&#91;</span> ++<span style="color: #0066CC;">index</span> <span style="color: #66cc66;">&#93;</span>;
        <span style="color: #66cc66;">&#125;</span>
&nbsp;
        <span style="color: #0066CC;">public</span> <span style="color: #000000; font-weight: bold;">function</span> hasNext<span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span> : <span style="color: #0066CC;">Boolean</span>
        <span style="color: #66cc66;">&#123;</span>
            <span style="color: #b1b100;">return</span> <span style="color: #66cc66;">&#40;</span> <span style="color: #0066CC;">index</span> <span style="color: #66cc66;">&lt;</span> <span style="color: #0066CC;">list</span>.<span style="color: #0066CC;">length</span> - <span style="color: #cc66cc;">1</span> <span style="color: #66cc66;">&#41;</span>;
        <span style="color: #66cc66;">&#125;</span>
    <span style="color: #66cc66;">&#125;</span>
<span style="color: #66cc66;">&#125;</span></pre></td></tr></table></div>

<p>The concrete aggregate and concrete iterator classes are interdependent as indicated in the class diagram. The <em>dashed arrow</em> from the <code>ConcreteAggregate</code> class to the <code>ConcreteIterator</code> class indicates a <strong>dependency</strong>; <code>ConcreteAggregate</code> depends on <code>ConcreteIterator</code> to implement the <code>createIterator()</code> method. Conversely, the <em>solid arrow</em> from the <code>ConcreteIterator</code> class to the <code>ConcreteAggregate</code> class indicates a one-way <strong>association</strong> where <code>ConcreteIterator</code> can access the internal properties of the <code>ConcreteAggregate</code> class.</p>
<p>Now all we have left to do is implement a client that instantiates a <code>SimpleList</code> and iterates through it.</p>
<p><strong><em>Main.as</em></strong></p>

<div class="wp_codebox"><table><tr id="p13715"><td class="code" id="p137code15"><pre class="actionscript" style="font-family:monospace;">package
<span style="color: #66cc66;">&#123;</span>
    <span style="color: #0066CC;">import</span> flash.<span style="color: #006600;">display</span>.<span style="color: #006600;">Sprite</span>;
&nbsp;
    <span style="color: #808080; font-style: italic;">/**
    *   Main Class
    *   @ purpose:      Document class for movie
    */</span>
    <span style="color: #0066CC;">public</span> <span style="color: #000000; font-weight: bold;">class</span> Main <span style="color: #0066CC;">extends</span> Sprite
    <span style="color: #66cc66;">&#123;</span>
        <span style="color: #0066CC;">public</span> <span style="color: #000000; font-weight: bold;">function</span> Main<span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span>
        <span style="color: #66cc66;">&#123;</span>
&nbsp;
            <span style="color: #808080; font-style: italic;">// create an instance of a concrete aggregate</span>
            <span style="color: #000000; font-weight: bold;">var</span> groceryList:SimpleList = <span style="color: #000000; font-weight: bold;">new</span> SimpleList<span style="color: #66cc66;">&#40;</span> <span style="color: #ff0000;">'bread'</span>,<span style="color: #ff0000;">'butter'</span>,<span style="color: #ff0000;">'eggs'</span> <span style="color: #66cc66;">&#41;</span>;
&nbsp;
            <span style="color: #808080; font-style: italic;">// get an iterator for it</span>
            <span style="color: #000000; font-weight: bold;">var</span> itr:IIterator = groceryList.<span style="color: #006600;">createIterator</span><span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span>;
&nbsp;
            <span style="color: #808080; font-style: italic;">// iterate over the aggregate</span>
            <span style="color: #b1b100;">while</span> <span style="color: #66cc66;">&#40;</span> itr.<span style="color: #006600;">hasNext</span><span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">&#41;</span>
            <span style="color: #66cc66;">&#123;</span>
                <span style="color: #0066CC;">trace</span> <span style="color: #66cc66;">&#40;</span> itr.<span style="color: #006600;">next</span><span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">&#41;</span>;
            <span style="color: #66cc66;">&#125;</span>
        <span style="color: #66cc66;">&#125;</span>
    <span style="color: #66cc66;">&#125;</span>
<span style="color: #66cc66;">&#125;</span></pre></td></tr></table></div>

<p>The client creates a grocery list, gets an iterator for it, and traces the list. Pretty simple example, but let&#8217;s see why the iterator is such an interesting pattern by changing how the collection is implemented.</p>
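<p>Because each call to <code>createIterator()</code> returns a new <code>SimpleListIterator</code> with its own index, two iterators over the same list do not interfere with each other, and <code>reset()</code> lets an iterator traverse the list again. The following sketch illustrates both behaviors using only the classes shown so far; <code>IteratorDemo</code> is a hypothetical document class name.</p>
<pre title="IteratorDemo.as" lang="actionscript">package
{
  import flash.display.Sprite;

  // Hypothetical document class; relies on SimpleList, SimpleListIterator
  // and IIterator from this post.
  public class IteratorDemo extends Sprite
  {
    public function IteratorDemo()
    {
      var groceryList:SimpleList = new SimpleList( 'bread', 'butter', 'eggs' );

      // Two iterators over one list: each keeps its own index.
      var first:IIterator  = groceryList.createIterator();
      var second:IIterator = groceryList.createIterator();
      trace( first.next() );  // bread
      trace( first.next() );  // butter
      trace( second.next() ); // bread (unaffected by first)

      // reset() moves the iterator back before the first element.
      first.reset();
      var all:Array = [];
      while ( first.hasNext() )
      {
        all.push( first.next() );
      }
      trace( all.join( ', ' ) ); // bread, butter, eggs
    }
  }
}</pre>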
<p><strong>Changing the implementation</strong></p>
<p>Let&#8217;s say that we decide to implement the collection using an <code>XML</code> structure instead of an <code>Array</code>. Assume, for the sake of argument, that the new implementation is more efficient.</p>
<p><strong>Modified <em>SimpleList.as</em></strong></p>

<div class="wp_codebox"><table><tr id="p13716"><td class="code" id="p137code16"><pre class="actionscript" style="font-family:monospace;">package
<span style="color: #66cc66;">&#123;</span>
    <span style="color: #0066CC;">public</span> <span style="color: #000000; font-weight: bold;">class</span> SimpleList <span style="color: #0066CC;">implements</span> IIterableAggregate
    <span style="color: #66cc66;">&#123;</span>
        <span style="color: #0066CC;">private</span> <span style="color: #000000; font-weight: bold;">var</span> <span style="color: #0066CC;">xml</span> : <span style="color: #0066CC;">XML</span>;
&nbsp;
        <span style="color: #0066CC;">public</span> <span style="color: #000000; font-weight: bold;">function</span> SimpleList<span style="color: #66cc66;">&#40;</span> ... <span style="color: #006600;">args</span> <span style="color: #66cc66;">&#41;</span>
        <span style="color: #66cc66;">&#123;</span>
            <span style="color: #808080; font-style: italic;">// the root element name is an assumption; any name works here</span>
            <span style="color: #0066CC;">xml</span> = &lt;list/&gt;;
            <span style="color: #b1b100;">for</span> <span style="color: #66cc66;">&#40;</span> <span style="color: #000000; font-weight: bold;">var</span> i:uint = <span style="color: #cc66cc;">0</span>; i <span style="color: #66cc66;">&lt;</span> args.<span style="color: #0066CC;">length</span>; i++ <span style="color: #66cc66;">&#41;</span>
            <span style="color: #66cc66;">&#123;</span>
                <span style="color: #0066CC;">xml</span>.<span style="color: #0066CC;">appendChild</span><span style="color: #66cc66;">&#40;</span> &lt;fooditem&gt;<span style="color: #66cc66;">&#123;</span> args<span style="color: #66cc66;">&#91;</span>i<span style="color: #66cc66;">&#93;</span> <span style="color: #66cc66;">&#125;</span>&lt;/fooditem&gt; <span style="color: #66cc66;">&#41;</span>;
            <span style="color: #66cc66;">&#125;</span>
        <span style="color: #66cc66;">&#125;</span>
&nbsp;
        <span style="color: #0066CC;">public</span> <span style="color: #000000; font-weight: bold;">function</span> createIterator<span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span> : IIterator
        <span style="color: #66cc66;">&#123;</span>
            <span style="color: #b1b100;">return</span> <span style="color: #000000; font-weight: bold;">new</span> XMLListIterator<span style="color: #66cc66;">&#40;</span> <span style="color: #0066CC;">xml</span> <span style="color: #66cc66;">&#41;</span>;
        <span style="color: #66cc66;">&#125;</span>
    <span style="color: #66cc66;">&#125;</span>
<span style="color: #66cc66;">&#125;</span></pre></td></tr></table></div>

<p>A new type of iterator is needed now since the collection is implemented using a different data structure. An <code>XMLListIterator</code> class is implemented to iterate through an <code>XML</code> list.</p>
<p><strong><em>XMLListIterator.as</em></strong></p>

<div class="wp_codebox"><table><tr id="p13717"><td class="code" id="p137code17"><pre class="actionscript" style="font-family:monospace;">package
<span style="color: #66cc66;">&#123;</span>
    <span style="color: #0066CC;">public</span> <span style="color: #000000; font-weight: bold;">class</span> XMLListIterator <span style="color: #0066CC;">implements</span> IIterator
    <span style="color: #66cc66;">&#123;</span>
        <span style="color: #0066CC;">private</span> <span style="color: #000000; font-weight: bold;">var</span> <span style="color: #0066CC;">xml</span>   : <span style="color: #0066CC;">XML</span>;
        <span style="color: #0066CC;">private</span> <span style="color: #000000; font-weight: bold;">var</span> <span style="color: #0066CC;">index</span> : <span style="color: #0066CC;">Number</span>;
&nbsp;
        <span style="color: #0066CC;">public</span> <span style="color: #000000; font-weight: bold;">function</span> XMLListIterator<span style="color: #66cc66;">&#40;</span> xmlObject : <span style="color: #0066CC;">XML</span> <span style="color: #66cc66;">&#41;</span>
        <span style="color: #66cc66;">&#123;</span>
            <span style="color: #0066CC;">xml</span> = xmlObject;
            reset<span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span>;
        <span style="color: #66cc66;">&#125;</span>
&nbsp;
        <span style="color: #0066CC;">public</span> <span style="color: #000000; font-weight: bold;">function</span> reset<span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span> : <span style="color: #0066CC;">void</span>
        <span style="color: #66cc66;">&#123;</span>
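            <span style="color: #0066CC;">index</span> = -<span style="color: #cc66cc;">1</span>;   <span style="color: #808080; font-style: italic;">// nothing returned yet; next() pre-increments</span>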
        <span style="color: #66cc66;">&#125;</span>
&nbsp;
        <span style="color: #0066CC;">public</span> <span style="color: #000000; font-weight: bold;">function</span> next<span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span> : <span style="color: #66cc66;">*</span>
        <span style="color: #66cc66;">&#123;</span>
            <span style="color: #b1b100;">return</span> <span style="color: #0066CC;">xml</span>.<span style="color: #006600;">fooditem</span><span style="color: #66cc66;">&#91;</span> ++<span style="color: #0066CC;">index</span> <span style="color: #66cc66;">&#93;</span>;   <span style="color: #808080; font-style: italic;">// advance, then return that fooditem</span>
        <span style="color: #66cc66;">&#125;</span>
&nbsp;
        <span style="color: #0066CC;">public</span> <span style="color: #000000; font-weight: bold;">function</span> hasNext<span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span> : <span style="color: #0066CC;">Boolean</span>
        <span style="color: #66cc66;">&#123;</span>
            <span style="color: #b1b100;">return</span> <span style="color: #66cc66;">&#40;</span> <span style="color: #0066CC;">index</span> <span style="color: #66cc66;">&lt;</span> <span style="color: #0066CC;">xml</span>.<span style="color: #006600;">fooditem</span>.<span style="color: #0066CC;">length</span><span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span> - <span style="color: #cc66cc;">1</span> <span style="color: #66cc66;">&#41;</span>;   <span style="color: #808080; font-style: italic;">// any fooditem elements left?</span>
        <span style="color: #66cc66;">&#125;</span>
    <span style="color: #66cc66;">&#125;</span>
<span style="color: #66cc66;">&#125;</span></pre></td></tr></table></div>

<p>Note that we completely changed the implementation of the concrete aggregate and introduced a new concrete iterator as well. Even so, the <strong>client code still works as before without any modification.</strong></p>
<p>The iterator pattern allows collections to be implemented flexibly because it does not expose their internal structure to the client. It is also a good example of how a well-defined <em>interface</em> hides implementation details while exposing just enough for the client to use the code in a modular way.</p>
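<p>To see why the client is unaffected, here is a minimal sketch of client code written purely against the iterator interface. It assumes the article's <code>IIterator</code> interface declares the <code>reset()</code>, <code>hasNext()</code> and <code>next()</code> methods that <code>XMLListIterator</code> implements above; the helper name <code>printAll</code> is hypothetical.</p>

```actionscript
// Hypothetical client helper. It depends only on the IIterator
// interface (reset, hasNext, next), never on the concrete collection.
function printAll( it:IIterator ):void
{
    it.reset();               // rewind to the start of the collection
    while ( it.hasNext() )
    {
        trace( it.next() );   // prints each element in turn
    }
}
```

<p>Calling <code>printAll( new XMLListIterator( groceryXML ) )</code>, where <code>groceryXML</code> is a hypothetical <code>XML</code> object holding <code>fooditem</code> elements, prints each food item. Switching to an iterator for a different collection changes only the line that constructs the iterator, not <code>printAll</code>.</p>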
<p><strong>Why not use <em>for each&#8230;in</em> or <em>for&#8230;in</em>?</strong></p>
<p>What was that? A total waste of time, did you say? Why not just use the built-in iterator?</p>
<p>Pretty much all modern languages have built-in iterators for traversing and accessing collections, and ActionScript 3 is no exception. This is a testament to the utility of design patterns, since many of the GoF patterns are implemented natively in AS3. The grocery list could have been created and accessed this way:</p>

<div class="wp_codebox"><table><tr id="p13718"><td class="code" id="p137code18"><pre class="actionscript" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">var</span> groceryList:<span style="color: #0066CC;">Array</span> = <span style="color: #000000; font-weight: bold;">new</span> <span style="color: #0066CC;">Array</span><span style="color: #66cc66;">&#40;</span> <span style="color: #ff0000;">'bread'</span>,<span style="color: #ff0000;">'butter'</span>,<span style="color: #ff0000;">'eggs'</span> <span style="color: #66cc66;">&#41;</span>;
&nbsp;
<span style="color: #b1b100;">for</span> <span style="color: #b1b100;">each</span> <span style="color: #66cc66;">&#40;</span><span style="color: #000000; font-weight: bold;">var</span> item:<span style="color: #66cc66;">*</span> <span style="color: #b1b100;">in</span> groceryList<span style="color: #66cc66;">&#41;</span>
<span style="color: #66cc66;">&#123;</span>
    <span style="color: #0066CC;">trace</span> <span style="color: #66cc66;">&#40;</span> item <span style="color: #66cc66;">&#41;</span>;
<span style="color: #66cc66;">&#125;</span></pre></td></tr></table></div>

<p>The collection could have been implemented with any of the enumerable native structures, such as <code>XML</code>, <code>Dictionary</code> or even <code>Object</code>, and accessed just as easily with the <em>for each&#8230;in</em> loop. From a purist&#8217;s standpoint there is a downside: the client must know how the collection is implemented, because the collection itself is the operand of <em>for each&#8230;in</em>.</p>
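<p>A small hypothetical illustration of that downside: suppose the grocery list were stored as a <code>Dictionary</code> that maps item names to quantities (the quantities are made up for the example). The client now has to know that the items are the dictionary&#8217;s keys and switch to <em>for&#8230;in</em>, because <em>for each&#8230;in</em> would yield the quantities instead.</p>

```actionscript
import flash.utils.Dictionary;

// Hypothetical variant: the grocery list stored as a Dictionary
// (item name -> quantity) instead of an Array.
var groceryList:Dictionary = new Dictionary();
groceryList[ "bread" ]  = 1;
groceryList[ "butter" ] = 1;
groceryList[ "eggs" ]   = 12;

// for...in visits the keys (the item names); for each...in would
// visit the values (the quantities). The client has to know which.
for ( var item:String in groceryList )
{
    trace( item );   // Dictionary key order is not guaranteed
}
```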
<p>So, would we ever need to implement the iterator pattern when we have these built-in constructs? Sure! There are collections that AS3 does not implement natively, such as linked lists, trees, and custom structures that specific applications may require. There are also multiple ways of traversing a collection. For example, an application may need to visit a collection&#8217;s elements in random order rather than in sequence, and a tree can be traversed <a href="http://en.wikipedia.org/wiki/Breadth-first_search" target="_blank"><em>breadth-first</em></a> rather than <em>depth-first</em>. Implementing the iterator pattern provides a uniform interface for each of these traversal methods.</p>
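<p>As a sketch of the tree case, the class below walks a tree breadth-first behind the same interface. It assumes the article&#8217;s <code>IIterator</code> interface (<code>reset()</code>, <code>hasNext()</code>, <code>next()</code>) and, to stay short, represents each node as a plain <code>Object</code> with hypothetical <code>value</code> and <code>children</code> properties rather than a dedicated node class. The class name <code>BreadthFirstIterator</code> is also hypothetical.</p>

```actionscript
package
{
    // Sketch: breadth-first iterator over a tree of plain Objects.
    // Each node is assumed to look like { value: *, children: Array }.
    public class BreadthFirstIterator implements IIterator
    {
        private var root  : Object;
        private var queue : Array;   // FIFO queue of nodes still to visit

        public function BreadthFirstIterator( rootNode : Object )
        {
            root = rootNode;
            reset();
        }

        public function reset() : void
        {
            // Start over with only the root waiting to be visited.
            queue = ( root == null ) ? [] : [ root ];
        }

        public function hasNext() : Boolean
        {
            return queue.length > 0;
        }

        public function next() : *
        {
            if ( queue.length == 0 )
            {
                return undefined;                  // traversal finished
            }
            var node : Object = queue.shift();     // oldest queued node first
            if ( node.children != null )
            {
                for each ( var child : Object in node.children )
                {
                    queue.push( child );           // queued behind nodes already waiting
                }
            }
            return node.value;
        }
    }
}
```

<p>For a tree whose root <code>A</code> has children <code>B</code> and <code>C</code>, where <code>B</code> has a child <code>D</code>, this iterator returns A, B, C, D. Replacing the queue with a stack (calling <code>pop()</code> instead of <code>shift()</code>) would give a depth-first traversal instead, although siblings would then come out in reverse order unless the children are pushed last-to-first.</p>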
<p>Look out for a future article on the iterator pattern, where we will look at a more functional example that uses multiple traversal methods.</p>
<p><strong>Source:</strong></p>
<ul>
<li><a href="http://www.as3dp.com/wp-content/uploads/2008/09/iterator_example011.zip">iterator_example01</a></li>
<li><a href="http://www.as3dp.com/wp-content/uploads/2008/09/iterator_example02.zip">iterator_example02</a></li>
</ul>
<p><strong>References:</strong><br />
Gamma, Erich, Richard Helm, Ralph Johnson, and John Vlissides (1995). <a href="http://www.amazon.com/dp/0201633612" target="_blank">Design Patterns: Elements of Reusable Object-Oriented Software</a>. Addison-Wesley.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.as3dp.com/2008/09/the-iterator-pattern-flexible-implementation-of-collections/feed/</wfw:commentRss>
		<slash:comments>13</slash:comments>
		</item>
	</channel>
</rss>

