Documentation: Rules

Postby Beerhunter on Wed Oct 07, 2009 8:41 pm

[updated 2009-10-11 with missing PB_ConstructSample code]

Had a bit of a brain dump today about how the rule code works, and how to write and maintain it.

There are probably typos, some mistakes, and bits missing... however, it's probably better than having no documentation at all ;)

Overview

For each website supported by the toolbar there is a class which contains the "rules" to parse webpages.

Each class is in its own source file;

  • pb_rules_daft.js - class PB_DaftRules
  • pb_rules_espc.js - class PB_EspcRules
  • pb_rules_gspc.js - class PB_GspcRules
  • pb_rules_propertynews.js - class PB_PropertyNewsRules
  • pb_rules_rightmove.js - class PB_RightmoveRules
  • pb_rules_sspc.js - class PB_SspcRules

Each class provides the following public interface;

Code:
function PB_XXXXRules() {

   /*
      Arguments;
       - documentNode is a handle to the document to capture info from
   
      Return;
      - site identifies which site/rules were used
      - results contains the captured info as a hierarchy of objects (more on this later)
   
      If the capture method fails, an exception should be thrown.
   */
   this.capture = function(documentNode) {
      ....
      return { site:"xxxx", results: results};
   }
}
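
As a rough illustration of this interface, here is a minimal sketch of a rules class that can be run outside the toolbar. PB_ExampleRules and fakeDocument are hypothetical names made up for this example, and the plain object only stands in for a real document handle (the assumption being that only location.href is read).

```javascript
// Hypothetical rules class following the interface above; not part of the toolbar.
function PB_ExampleRules() {
   this.capture = function (documentNode) {
      if (!documentNode || !documentNode.location) {
         // The interface says a failed capture should throw an exception.
         throw new Error("capture failed: document has no location");
      }
      // A real implementation would evaluate its rules against the page here.
      var results = [];
      return { site: "example", results: results };
   };
}

// Stand-in for a document handle (assumption: only location.href is needed).
var fakeDocument = { location: { href: "http://www.example.co.uk/for-sale/" } };
var captured = new PB_ExampleRules().capture(fakeDocument);
```

The caller gets back the site identifier plus whatever was captured, and has to be prepared to catch an exception when capture fails.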


When the add-on is loaded, PB_Service.initialise() (in components/pb_service.js) creates an instance of each class and associates it with one or more website hostnames;

Code:
        // Register rules
        var rightmoveRules = new PB_RightmoveRules();
        registerHostRules("www.rightmove.co.uk", rightmoveRules);
        registerHostRules("rightmove.co.uk", rightmoveRules);
        var daftRules = new PB_DaftRules();
        registerHostRules("www.daft.ie", daftRules);
        registerHostRules("daft.ie", daftRules);
        registerHostRules("www1.daft.ie", daftRules);
        registerHostRules("www2.daft.ie", daftRules);
        registerHostRules("www3.daft.ie", daftRules);
        registerHostRules("www4.daft.ie", daftRules);
        registerHostRules("www5.daft.ie", daftRules);
        registerHostRules("www6.daft.ie", daftRules);
        registerHostRules("www7.daft.ie", daftRules);
        registerHostRules("www8.daft.ie", daftRules);
        registerHostRules("www9.daft.ie", daftRules);
        var propertyNewsRules = new PB_PropertyNewsRules();
        registerHostRules("www.propertynews.com", propertyNewsRules);
        registerHostRules("propertynews.com", propertyNewsRules);
        var sspcRules = new PB_SspcRules();
        registerHostRules("www.sspc.co.uk", sspcRules);
        registerHostRules("sspc.co.uk", sspcRules);
        var espcRules = new PB_EspcRules();
        registerHostRules("www.espc.com", espcRules);
        registerHostRules("espc.com", espcRules);
        registerHostRules("www.espc.co.uk", espcRules);
        registerHostRules("espc.co.uk", espcRules);
        var gspcRules = new PB_GspcRules();
        registerHostRules("www.gspc.co.uk", gspcRules);
        registerHostRules("gspc.co.uk", gspcRules);


ie the rightmove rules work with 'www.rightmove.co.uk' and 'rightmove.co.uk', whereas the daft rules work with 'www.daft.ie', 'daft.ie', 'www1.daft.ie'... etc.

When the toolbar detects that Firefox has loaded a webpage from a known host, it calls the capture() function of the associated rules class (ie if a rightmove page is loaded, PB_RightmoveRules.capture() is called) to get the current property info. It then adds info from the local database/server and displays the property's history on the webpage.
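
The registration and lookup described above could be sketched roughly as follows. This is only a guess at the shape of it: the real registerHostRules in pb_service.js may well differ, and hostRules, captureForDocument and stubRules are hypothetical names used just for this example.

```javascript
// Hypothetical registry mapping hostname -> rules object.
var hostRules = {};

function registerHostRules(hostname, rules) {
   hostRules[hostname] = rules;
}

// Returns the capture result for a known host, or null for an unknown host.
function captureForDocument(documentNode) {
   var host = documentNode.location.hostname;
   if (!Object.prototype.hasOwnProperty.call(hostRules, host)) {
      return null;
   }
   return hostRules[host].capture(documentNode);
}

// Stub standing in for a real rules class such as PB_RightmoveRules.
var stubRules = {
   capture: function (documentNode) {
      return { site: "rightmove", results: [] };
   }
};
registerHostRules("www.rightmove.co.uk", stubRules);
registerHostRules("rightmove.co.uk", stubRules);

var known = captureForDocument({ location: { hostname: "rightmove.co.uk" } });
var unknown = captureForDocument({ location: { hostname: "example.org" } });
```

Registering the same rules object under several hostnames is what lets one class serve 'www.' and bare variants of a site.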

Inside the Rules class

In essence, the following steps are performed by the class;

  • Work out which rule should be applied to the webpage based on the url
  • Apply the rule by;
    • Identify which sections of the webpage belong to each property
    • Within each section identify which html element represents the price, title, description etc
    • For each html element, work out which attribute or text contains the required information and whether the text needs to be transformed, and finally confirm the captured information is in the correct format.

Determining which rule to apply to the webpage

Each class has a very similar implementation of the capture() function;

Code:
   var rules =
    {
      'xyz.co.uk/for-sale/abroad': [ rule1() ], // This is a more specialised rule, so must occur before the following rule
      'xyz.co.uk/for-sale/': [ rule1(), rule2() ],
      'xyz.co.uk/to-rent/': [ rule3() ]
    };

   this.capture = function(documentNode) {
      for(var url in rules) {
         // Use the first url fragment which occurs in the page's address
         if (documentNode.location.href.indexOf(url) !== -1) {
            var results = [];
            // Try each rule in turn, stop at the first which captures something
            for(var i=0; i<rules[url].length; ++i) {
               results = rules[url][i].evaluate(documentNode);
               if (results.length > 0) {
                  return { site:'xyz', results: results };
               }
            }
            return { site: 'xyz', results: [] };
         }
      }
      return { site: 'xyz', results: [] };
   };


The rule is chosen as follows;
  • The first substring match in the url for 'xyz.co.uk/for-sale/abroad', 'xyz.co.uk/for-sale/' or 'xyz.co.uk/to-rent/'.
  • then the first rule on the right which returns some results.

For example if the webpage url was xyz.co.uk/for-sale/property-1234 then rule1() would be applied, if that doesn't return any results then rule2() is applied.
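
The selection logic can be tried out with stub rules, as in the sketch below. stubRule and fakeDoc are hypothetical helpers made up for this example, and the stubs return fixed results instead of parsing a page. Note that the matching relies on for...in visiting the keys in the order they were written.

```javascript
// Stub rule: evaluate() ignores the document and returns a fixed result list.
function stubRule(results) {
   return { evaluate: function (documentNode) { return results; } };
}

var rules = {
   'xyz.co.uk/for-sale/abroad': [ stubRule(['abroad']) ],   // specialised, so listed first
   'xyz.co.uk/for-sale/': [ stubRule([]), stubRule(['uk']) ],
   'xyz.co.uk/to-rent/': [ stubRule(['rent']) ]
};

function capture(documentNode) {
   for (var url in rules) {
      if (documentNode.location.href.indexOf(url) !== -1) {
         for (var i = 0; i < rules[url].length; ++i) {
            var results = rules[url][i].evaluate(documentNode);
            if (results.length > 0) {
               return { site: 'xyz', results: results };
            }
         }
         return { site: 'xyz', results: [] };
      }
   }
   return { site: 'xyz', results: [] };
}

// Stand-in for a document handle; only location.href is read.
function fakeDoc(href) {
   return { location: { href: href } };
}

// First rule for '/for-sale/' returns nothing, so the second rule is used.
var sale = capture(fakeDoc('http://www.xyz.co.uk/for-sale/property-1234'));
// Matches the more specialised '/for-sale/abroad' entry before '/for-sale/'.
var abroad = capture(fakeDoc('http://www.xyz.co.uk/for-sale/abroad/villa-9'));
```

If the '/for-sale/' entry were listed first, the abroad page would never reach its specialised rule, which is why the order of the entries matters.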

rule1(), rule2() and rule3() are functions which build a composite object defining the rules to capture information from the webpage.

Defining a rule

As mentioned above there are 3 parts to a rule;
  • Identify which sections of the webpage belong to each property
  • Within each section identify which html element represents the price, title, description etc
  • For each html element, work out which attribute or text contains the required information and whether the text needs to be transformed, and finally confirm the captured information is in the correct format.

Rules are implemented using functors; these are classes with the following interface;

Code:
function PB_Functor(...., fn) {
   this.evaluate = function (context) {
      // Transform the incoming context, then hand the result to the child functor
      var new_context = someFunction(context);
      return fn.evaluate(new_context);
   }
}


There are many functors defined in pb_rules.js, and we can chain these functors together.
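
To make the chaining idea concrete, here is a minimal sketch of how a couple of string functors might look. These are not the implementations from pb_rules.js; the Sketch_ names are hypothetical, and throwing on a failed validation is an assumption made for this example.

```javascript
// Hypothetical string functors; the real ones in pb_rules.js may differ.

// Terminal functor: returns the text if it matches, otherwise throws.
function Sketch_Validate(regex) {
   this.evaluate = function (text) {
      if (!regex.test(text)) {
         throw new Error("validation failed: " + text);
      }
      return text;
   };
}

// Passes the text between the first `start` and the next `end` to fn.
function Sketch_TextBetween(start, end, fn) {
   this.evaluate = function (text) {
      var from = text.indexOf(start);
      var to = text.indexOf(end, from + start.length);
      if (from === -1 || to === -1) {
         throw new Error("delimiters not found in: " + text);
      }
      return fn.evaluate(text.substring(from + start.length, to));
   };
}

// Chain: strip the brackets, then validate what is left.
var chain = new Sketch_TextBetween("[", "]",
   new Sketch_Validate(/^(For rent|For sale)$/i)
);
var extracted = chain.evaluate("[For sale]");
```

Each functor only knows about its own step and its child, so new steps can be slotted into a chain without touching the others.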

For example if we wanted to extract the text 'For sale' from the following webpage;
Code:
<html>
<head>
</head>
<body>
   <p>This is a property for sale</p>
   <p class='type'>[For sale]</p>
</body>
</html>


we could chain the functors together as follows;

Code:
var rule = new PB_FirstNodeMatchingXPath("/html/body/p[contains(@class,'type')]",   // Find the first occurrence of <p class='type'>....</p> on the webpage
   new PB_TextContent( // Get the paragraph's text
      new PB_TextBetween("[", "]", // Extract the text between the characters [ and ]
         new PB_Validate(/^(For rent|For sale)$/i) // Check the text is 'For sale' or 'For rent'
      )
   )
);


In the following sections, the same principles apply... however the complexity increases slightly as we start extracting 6+ bits of data for multiple properties.

Identify which sections of the webpage belong to each property

Firstly we identify each html block element which contains all the details for a single property.

For example, the webpage;

Code:
<html>
<head>
</head>
<body>
   <div class='search_results'>
      <div class='property_listing' id='property-1'>
         <!-- property 1 info -->
      </div>
      <div class='property_listing' id='property-2'>
         <!-- property 2 info -->
      </div>
      ....
      <div class='property_listing' id='property-10'>
         <!-- property 10 info -->
      </div>
   </div>
</body>
</html>


The xpath "/html/body/div[contains(@class,'search_results')]/div[contains(@class,'property_listing')]" will return the containing div node for each property, so we can start to write our rule for this page;

Code:
function exampleRule() {
   return new PB_NodesMatchingXPath(
      "/html/body/div[contains(@class,'search_results')]/div[contains(@class,'property_listing')]",
      new PB_ConstructProperty(
         ....
      )
   );
}


ie for each node matching "/html/body/div[contains(@class,'search_results')]/div[contains(@class,'property_listing')]", construct a property.

Within each section identify which html element represents the price, title, description etc

In the previous section, the .... is where we will need to add more code to get the property information.

PB_ConstructProperty takes 4 parameters;

  • referenceFn - a chain of functors that return the property's reference number
  • urlFn - a chain of functors that return the url used to view the property
  • appendToNodeFn - a chain of functors that return a handle to an html element into which the history is inserted
  • sampleFns - a PB_ConstructSample functor which gets the price, title, subtitle etc

Note that all xpaths used to obtain information about the property should be relative to the block html element containing the property's details.

Let's assume the property website displays property 1 as follows;

Code:
<div class='property_listing' id='property-1'>
   <div class='header'>
      <a href='http://www.xyz.co.uk/property-1'>
         <span class='title'>title</span>
         <span class='type'>subtitle</span>
         <span class='price'>price</span>
      </a>
   </div>
   <div class='info'>
      <p>description line 1</p>
      <p>description line 2</p>
   </div>
   <div class='agent'>
      <span class='name'>agents name</span>
   </div>
</div>


Now we can get the reference number from the id of the outermost div;

Code:
new PB_FirstNodeMatchingXPath(".", // The outermost div
   new PB_ID( // Get the id attribute
      new PB_TextAfter("property-", // Only use the text after 'property-', ie '1'
         new PB_Prefix("xyz", // Prefix this with 'xyz', ie 'xyz1'
            new PB_Validate(/xyz\d+/) // Check the found text is xyz followed by 1 or more digits
         )
      )
   )
)


Similarly we can get the url for the property;

Code:
new PB_FirstNodeMatchingXPath("./div[contains(@class,'header')]/a", // The <a href=''></a>
   new PB_HRef( // Get the href attribute, ie http://www.xyz.co.uk/property-1
      new PB_NoConversion() // This is a 'no-op' ie does nothing
   )
)


In addition to getting info about the property, the rule also defines where the history table should be inserted into the webpage; this is just another functor which returns a handle to an html block element.

For example if we wish to append the history to the end of the <div class='info'></div>;

Code:
new PB_FirstNodeMatchingXPath("./div[contains(@class,'info')]", // The <div class='info'></div>
   new PB_NoConversion() // This is a 'no-op' ie does nothing
)


Finally we create an associative array of functors to extract the property's price, title etc;

Code:
{
price:   new PB_FirstNodeMatchingXPath("./div[contains(@class,'header')]/a/span[contains(@class,'price')]",
         new PB_TextContent(
            new PB_Validate(/^POA|Sale by Tender|((From |Guide Price |Lease Hold ...... )(\u00A3|\u20AC|\$)\d{1,3}(,\d{3})*)( pcm| pw| per year| \(Fixed Price\)|)$/i)
         )
      ),
title:   new PB_FirstNodeMatchingXPath("./div[contains(@class,'header')]/a/span[contains(@class,'title')]",
         new PB_TextContent(
            new PB_NoConversion()
         )
      )
}


If we combine all these bits together using a PB_ConstructSample functor, we end up with a complete rule (albeit only the price/title are being captured).

Code:
function exampleRule() {
   return new PB_NodesMatchingXPath("/html/body/div[contains(@class,'search_results')]/div[contains(@class,'property_listing')]",
      new PB_ConstructProperty(

         // reference
         new PB_FirstNodeMatchingXPath(".",
            new PB_ID(
               new PB_TextAfter("property-",
                  new PB_Prefix("xyz",
                     new PB_Validate(/xyz\d+/)
                  )
               )
            )
         ),

         // url
         new PB_FirstNodeMatchingXPath("./div[contains(@class,'header')]/a",
            new PB_HRef(
               new PB_NoConversion()
            )
         ),

         // append history to node
         new PB_FirstNodeMatchingXPath("./div[contains(@class,'info')]",
            new PB_NoConversion()
         ),

         // sample details
         new PB_ConstructSample(
            new PB_Today(),
            {
               // price
               price:   new PB_FirstNodeMatchingXPath("./div[contains(@class,'header')]/a/span[contains(@class,'price')]",
                        new PB_TextContent(
                           new PB_Validate(/^POA|Sale by Tender|((From |Guide Price |Lease Hold ...... )(\u00A3|\u20AC|\$)\d{1,3}(,\d{3})*)( pcm| pw| per year| \(Fixed Price\)|)$/i)
                        )
                     ),

               // title
               title:   new PB_FirstNodeMatchingXPath("./div[contains(@class,'header')]/a/span[contains(@class,'title')]",
                        new PB_TextContent(
                           new PB_NoConversion()
                        )
                     )
            }
         )
      )
   );
}


The above is written out in "longhand"; in practice common functor combinations can be wrapped up into helper functions;

Code:
// Returns the href of the first node matching the xpath
function hrefOfNode(xpath) {
   return new PB_FirstNodeMatchingXPath(xpath,
      new PB_HRef(
         new PB_NoConversion()
      )
   );
}

// Returns the first node matching the xpath
function node(xpath) {
   return new PB_FirstNodeMatchingXPath(xpath,
      new PB_NoConversion()
   );
}


function exampleRule() {
   return new PB_NodesMatchingXPath("/html/body/div[contains(@class,'search_results')]/div[contains(@class,'property_listing')]",
      new PB_ConstructProperty(

         // reference
         new PB_FirstNodeMatchingXPath(".",
            new PB_ID(
               new PB_TextAfter("property-",
                  new PB_Prefix("xyz",
                     new PB_Validate(/xyz\d+/)
                  )
               )
            )
         ),


         hrefOfNode("./div[contains(@class,'header')]/a"),   // url
         node("./div[contains(@class,'info')]"),             // append history to node
         ....


Hints and tips about writing rules

In xpaths use [contains(@class,'xxx')] rather than [@class='xxx'].

Try to make the xpaths as explicit as possible, so if the site changes design the rule won't pick up incorrect data.

Avoid using class labels which are only used for layout purposes.

Use PB_Validate as much as possible!

Be wary of the same data being displayed in slightly different ways on different pages, for example an added . at the end on one page which doesn't appear on another page - the toolbar will log this as a change!
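
As a small illustration of the last two hints, the sketch below uses a deliberately simplified, hypothetical price pattern (not the one used by the toolbar) to show how an anchored validation rejects a stray trailing '.' instead of letting it through to be logged as a change.

```javascript
// Simplified, hypothetical price pattern: a pound sign followed by
// comma-grouped digits. The toolbar's own patterns allow many more forms.
var pricePattern = /^\u00A3\d{1,3}(,\d{3})*$/;

function isValidPrice(text) {
   return pricePattern.test(text);
}

var clean = isValidPrice("\u00A3250,000");         // well-formed price
var trailingDot = isValidPrice("\u00A3250,000.");  // same price with a stray '.'
```

Here clean is true and trailingDot is false, so the odd page variant shows up as a validation failure that can be investigated, rather than as a bogus price change.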
Beerhunter
Site Admin
 
Posts: 1788
Joined: Tue Jan 22, 2008 12:05 am

Re: Documentation: Rules

Postby Squidward on Thu Oct 08, 2009 3:37 am

What I found myself wanting when trying to get into the rule code was comments to state the actual types of the inputs and outputs of functions. The way the functors in pb_rules.js are used does vary subtly and often when I saw the word 'context' I forgot what I was dealing with.

I thought I'd just go ahead and add them to that file rather than talk about it and checked in a comment-only patch. (It took a bit longer than I expected; I got tired and stopped at PB_AllNodesMatchingXPath, but I'm sure I'll be back for those last few functions later.)

I know that Javascript is dynamically typed and there may be the odd case where I've said that an input parameter requires a particular type but strictly speaking you could pass another type in there which supports the same operations - but that's not being done here (as far as I've noticed) and I reckon a person who's just getting (re)acquainted with a code base wants to know the 'typical' mode of operation first - tricksy exceptions to the norm come later by which time you don't need the comments so much then anyway.

Yeah, it's certainly not the most productive way for me to spend my time, particularly in relation to releasing a fixed up version for all the peeps wanting to check up on house prices, and I'm slightly embarrassed to admit such geekery, but you know how it is, when you get in a comfortable, mundane, coder groove... right? :? ... :)
Squidward
 
Posts: 34
Joined: Mon Jun 09, 2008 1:28 pm

Re: Documentation: Rules

Postby Beerhunter on Thu Oct 08, 2009 11:37 am

Thanks Squidward, that's definitely helpful info to add.

Yes, there is a slight variation... (from memory) I think there are 5 types of functors;

  1. Those which take a string and pass a string to the child functor (ie all the string manipulation functors)
  2. Those which take a node and pass a string to the child functor (ie those that get text / attribute value associated with a node)
  3. Those which take a node and pass 2 nodes to the child functor (ie functors with an absolute xpath from the document's root node)
  4. Those which take 2 nodes and pass a node to the child functor (ie functors with a relative xpath from the second node)
  5. Those which take 2 nodes and pass 2 nodes to the child functor (ie functors to construct a property/sample)

Perhaps grouping similar functors in pb_rules.js might make things a bit easier.

Also on the subject of comments, anyone have experience of doxygen/javadoc like documentation tools (ie that take comments and parse the source code to generate html format documentation)?

I had been looking at jsdoc toolkit and it seems to work ok (an example of its output)

If it's something that people feel could be useful, it would need a lot of effort to get the comments into the correct format - maybe the best approach would be to update the comments for each function/file as changes are made. Eventually the complete code would be documented.
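
For anyone who hasn't seen the format, a JSDoc-style comment that records the input and output types of a functor might look something like the sketch below. Sketch_UpperCase is a hypothetical functor invented for this example (it is not in pb_rules.js); only the @constructor, @param and @returns tags are standard JSDoc.

```javascript
/**
 * Hypothetical functor, shown only to illustrate the comment format.
 * Takes a string and passes the upper-cased string to the child functor.
 *
 * @constructor
 * @param {Object} fn Child functor with an evaluate(string) method.
 */
function Sketch_UpperCase(fn) {
   /**
    * @param {string} text The text to transform.
    * @returns {*} Whatever the child functor returns.
    */
   this.evaluate = function (text) {
      return fn.evaluate(text.toUpperCase());
   };
}

// Identity child so the example can run on its own.
var identity = { evaluate: function (text) { return text; } };
var shouted = new Sketch_UpperCase(identity).evaluate("for sale");
```

Comments in this shape answer the "what is context here?" question at the point of use, and the same text can later be fed to a documentation generator.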
Beerhunter
Site Admin
 
Posts: 1788
Joined: Tue Jan 22, 2008 12:05 am

Re: Documentation: Rules

Postby rpp007 on Sun Oct 11, 2009 6:55 pm

excellent post.

I followed this to get an understanding of how the rules worked and attempted to do some basic rules for primelocation.com (Has this been requested/started?). Found it a useful learning exercise. A couple of bits I didn't follow (I may still be misunderstanding the design) and it would be useful if you can clarify so that I know I'm on the right track:

In the example of building a new PB_ConstructProperty (labelled combining it all together), I think your example doesn't use: new PB_ConstructSample() which seems to be used in the rightmove rules and I think the PB_ConstructProperty is expecting it as the last parameter is called sampleFns.

Also, once using the PB_ConstructSample, I set 'price', 'title' to be strings as they were not in the example. I assume they should be?

I also had to register a new js file in pb_service.js. I assume that is the only/correct way?

I also had trouble with the new debugging where the class of each xpath read is changed to include 'propertybee' but as you pointed out in another post switching to use 'contains' is a much better way.

As I said, an excellent post and I may just have missed bits.

On a side note; I'm finding that I'm not getting along with the dynamic typing of JS (having come from java) as it nobbles the IDE somewhat, but hey that's life I guess.

I've got the bare bones of primelocation working - is it worth pursuing or shall I switch to something else?

Final thought - I've setup a (legal) copy of fisheye (http://www.atlassian.com/software/fisheye/) on a linux box I have at home. It's a nice graphical view on SVN. I could publish if anyone is interested?
rpp007
 
Posts: 29
Joined: Fri Sep 25, 2009 8:40 pm

Re: Documentation: Rules

Postby Beerhunter on Sun Oct 11, 2009 7:49 pm

rpp007 wrote:excellent post.


Thanks, documentation is not my strong point :oops:

rpp007 wrote:I followed this to get an understanding of how the rules worked and attempted to do some basic rules for primelocation.com (Has this been requested/started?). Found it a useful learning exercise. A couple of bits I didn't follow (I may still be misunderstanding the design) and it would be useful if you can clarify so that I know I'm on the right track:


Yes has been asked for... Bug #33... not sure why it was closed, but I've reopened it and assigned it to you ;)

rpp007 wrote:In the example of building a new PB_ConstructProperty (labelled combining it all together), I think your example doesn't use: new PB_ConstructSample() which seems to be used in the rightmove rules and I think the PB_ConstructProperty is expecting it as the last parameter is called sampleFns.


Yes, this was a typo on my part, there should be a "new PB_ConstructSample(...)" there and I've corrected the post.

rpp007 wrote:Also, once using the PB_ConstructSample, I set 'price', 'title' to be strings as they were not in the example. I assume they should be?


In JavaScript

Code:
var obj = { title: 'this is a title', price: '£100' };
var obj = { 'title': 'this is a title', 'price': '£100' };


are equivalent, ie the key (left of the colon) is implicitly converted into a string, so title or 'title' is ok.
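
A quick way to convince yourself of this is the sketch below (the variable names are just for this example). The one case where the quotes are required is a key that isn't a valid identifier, such as one containing a hyphen.

```javascript
var unquoted = { title: 'this is a title', price: '\u00A3100' };
var quoted = { 'title': 'this is a title', 'price': '\u00A3100' };

// Both objects expose the same keys, and either access style works on either.
var sameKeys = Object.keys(unquoted).join(',') === Object.keys(quoted).join(',');
var sameTitle = unquoted['title'] === quoted.title;

// A key such as guide-price is not a valid identifier, so it must be quoted
// and read back with bracket notation.
var hyphenated = { 'guide-price': '\u00A3100' };
var guidePrice = hyphenated['guide-price'];
```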

rpp007 wrote:I also had to register a new js file in pb_service.js. I assume that is the only/correct way?


Yes.

I also included the rules files in pb_toolbar.xul as well, but not sure if this is needed... if it doesn't work without it, then add it ;-)

The scope rules for add-ons are a bit mad for security reasons; each window/component (ie each sidebar/toolbar and xpcom component) has its own global scope, so if you import into one but try and use from another global scope the class doesn't exist.

I've got an idea on how to tidy this up and improve the error reporting during development when there is a syntax error in a file (getService failed isn't the most helpful error I know!), but this will have to wait until 2.0.6.0 is out the door.

rpp007 wrote:I also had trouble with the new debugging where the class of each xpath read is changed to include 'propertybee' but as you pointed out in another post switching to use 'contains' is a much better way.


Yes, contains(@class, 'xxx') is a much better approach... must admit I didn't know you could do <p class='style1 style2'></p> until it caught me out!
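
The difference is easy to see with plain strings. The sketch below only emulates the xpath tests in JavaScript (it doesn't evaluate any xpath), and the function names are made up for this example. It also shows one catch: contains() is a substring test, so 'title' matches a class of 'subtitle' too; the concat/normalize-space form is a common XPath 1.0 idiom for matching whole class names.

```javascript
// Emulates [@class='name']: the whole attribute must equal the name.
function classEquals(classAttr, name) {
   return classAttr === name;
}

// Emulates [contains(@class,'name')]: a plain substring test.
function classContains(classAttr, name) {
   return classAttr.indexOf(name) !== -1;
}

// Emulates [contains(concat(' ', normalize-space(@class), ' '), ' name ')],
// which matches whole class tokens only.
function classHasToken(classAttr, name) {
   var normalised = classAttr.replace(/\s+/g, ' ').replace(/^ | $/g, '');
   return (' ' + normalised + ' ').indexOf(' ' + name + ' ') !== -1;
}

// A class attribute after the debugging code has added 'propertybee'.
var attr = 'subtitle propertybee';
var results = {
   equals: classEquals(attr, 'subtitle'),      // false: extra class breaks equality
   contains: classContains(attr, 'subtitle'),  // true
   substring: classContains(attr, 'title'),    // true: 'title' is inside 'subtitle'
   token: classHasToken(attr, 'title')         // false: no whole 'title' token
};
```

For most pages contains() is tight enough, but where one class name is a substring of another the token form avoids picking up the wrong element.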

rpp007 wrote:As I said, an excellent post and I may just have missed bits.


Thanks, no, they're all valid questions.

rpp007 wrote:On a side note; I'm finding that I'm not getting along with the dynamic typing of JS (having come from java) as it nobbles the IDE somewhat, but hey that's life I guess.


Yup from C++ background know what you mean... and wish there was the ability to strongly type vars!

Having said that I've raised Bug #152 to port assertions across from my 3.0 stream, which I think will help make the code more robust;

Code:
function square(x) {
   PB.assert("x must be a number", function() { return typeof x === 'number'; });
   return x*x;
}


If x isn't a number, then there's a pop up with the message 'x must be a number' plus a dump of the call stack, and the ability to send the info to the server for analysis by developers :-)
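
For anyone wanting to play with the idea before it's ported, here is a minimal stand-in. It is not the 3.0 implementation: this hypothetical PB.assert simply throws an Error (which carries a call stack on most engines) instead of showing a pop-up or sending anything to the server.

```javascript
// Hypothetical stand-in for PB.assert: throws instead of showing a pop-up.
var PB = {
   assert: function (message, predicate) {
      if (!predicate()) {
         throw new Error("Assertion failed: " + message);
      }
   }
};

function square(x) {
   PB.assert("x must be a number", function () { return typeof x === 'number'; });
   return x * x;
}

var nine = square(3);

// Passing a string trips the assertion before any arithmetic happens.
var failure = null;
try {
   square("3");
} catch (e) {
   failure = e.message;
}
```

Passing the check as a function means the condition is only evaluated inside assert, so a release build could skip the call cheaply.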

rpp007 wrote:I've got the bare bones of primelocation working - is it worth pursuing or shall I switch to something else?


Definitely worth pursuing, as a few people have asked for it.

I'll need to set up some stuff in the backend (each supported site has its own set of database tables / update the php submit script to allow primelocation data from version x of the toolbar), but that should be reasonably quick to do.

For the time being I'm assuming 2.0.6.0 is just updates to get the existing sites working; if primelocation is also supported that's a bonus.

rpp007 wrote:Final thought - I've setup a (legal) copy of fisheye (http://www.atlassian.com/software/fisheye/) on a linux box I have at home. It's a nice graphical view on SVN. I could publish if anyone is interested?


I'd be interested in having a look, this is the first time I've used svn... normally I'm clearcase based, which for all its faults does provide useful graphs!
Beerhunter
Site Admin
 
Posts: 1788
Joined: Tue Jan 22, 2008 12:05 am

Re: Documentation: Rules

Postby rpp007 on Sun Oct 11, 2009 9:17 pm

Thanks for the info (and the Javascript help). Think I'll be reading a JS tutorial again tonight :-)

If you can enable me to PM you, I'll give you the fisheye URL to have a look at as I don't want to publish it yet in case this forum is crawled. Or I could mail you - do you read/get admin@property-bee.com?
rpp007
 
Posts: 29
Joined: Fri Sep 25, 2009 8:40 pm

Re: Documentation: Rules

Postby Beerhunter on Sun Oct 11, 2009 9:52 pm

rpp007 wrote:Thanks for the info (and the Javascript help). Think I'll be reading a JS tutorial again tonight :-)

If you can enable me to PM you, I'll give you the fisheye URL to have a look at as I don't want to publish it yet in case this forum is crawled. Or I could mail you - do you read/get admin@property-bee.com?


I've enabled PMs for you (plus Squidward, s-p and illumination) - seems my cron job to enable PMs automatically to trusted users may be broken due to the upgrade from phpBB3 since I wrote the script :(

Just an aside, there's also a developers forum group which allows access to more forums (esp regarding the backend and new features)... I'm not adding people at the moment, only cause it might be a distraction (there's lots of interesting stuff to talk about!).

Once the next release is out I'll add you 4 to the developers forum group, as hopefully we'll get a bit of time to talk about ideas/features/implementation - that's assuming that such stuff is of interest and you'd like to stick around!
Beerhunter
Site Admin
 
Posts: 1788
Joined: Tue Jan 22, 2008 12:05 am

Re: Documentation: Rules

Postby s-p on Mon Oct 12, 2009 8:15 pm

Beerhunter wrote:
rpp007 wrote:Thanks for the info (and the Javascript help). Think I'll be reading a JS tutorial again tonight :-)

If you can enable me to PM you, I'll give you the fisheye URL to have a look at as I don't want to publish it yet in case this forum is crawled. Or I could mail you - do you read/get admin@property-bee.com?


I've enabled PMs for you (plus Squidward, s-p and illumination) - seems my cron job to enable PMs automatically to trusted users may be broken due to the upgrade from phpBB3 since I wrote the script :(

Just an aside, there's also a developers forum group which allows access to more forums (esp regarding the backend and new features)... I'm not adding people at the moment, only cause it might be a distraction (there's lots of interesting stuff to talk about!).

Once the next release is out I'll add you 4 to the developers forum group, as hopefully we'll get a bit of time to talk about ideas/features/implementation - that's assuming that such stuff is of interest and you'd like to stick around!


Sounds good BH :)
Scott
s-p
 
Posts: 125
Joined: Mon Jun 09, 2008 10:20 pm

Re: Documentation: Rules

Postby illumination on Thu Oct 15, 2009 9:06 pm

Beerhunter wrote:Hints and tips about writing rules

...

Try to make the xpaths as explicit as possible, so if the site changes design the rule won't pick up incorrect data

Avoid using class labels which are only used for layout purposes.

...


Just a quick question about this, when we say class labels used for layout purposes, what exactly do we mean? Things like 'brochureLeftArea', 'brochureDetailsContainer' and 'resultsCol'? At what point do we say it is a class meant for layout and just a class that also contains layout information? And in that case what should we do instead, use the position of the node in the xml layout?
illumination
 
Posts: 22
Joined: Tue Mar 24, 2009 1:14 pm

Re: Documentation: Rules

Postby Beerhunter on Thu Oct 15, 2009 9:39 pm

illumination wrote:Just a quick question about this, when we say class labels used for layout purposes, what exactly do we mean? Things like 'brochureLeftArea', 'brochureDetailsContainer' and 'resultsCol'? At what point do we say it is a class meant for layout and just a class that also contains layout information? And in that case what should we do instead, use the position of the node in the xml layout?


It's a difficult question, and probably has no exact answer.

For example on daft, the xpath to the link for for-sale properties is;

Code:
/html/body/div[@id='container']/div[@id='content']/table/tbody/tr/td[@class='valign_top' and position()=1]/ul[@id='search_results']/li/div[@class='content']/div[@class='title' and position()=1]/a/span[@class='line_height_19']


I'd say
* td[@class='valign_top']
* span[@class='line_height_19']

are presentational and hence write the xpath as

Code:
/html/body/div[@id='container']/div[@id='content']/table/tbody/tr/td[1]/ul[@id='search_results']/li/div[@class='content']/div[@class='title' and position()=1]/a/span


This is probably a clear cut example!

I think probably the best way of putting it is;
* if the @class doesn't have any semantic value, then don't use it (like those above)
* if there is "mixed" use (ie part semantic / part layout), which I'd put 'brochureLeftArea', 'brochureDetailsContainer' and 'resultsCol' in, then I'd say use it in preference over just positional predicates which may not be tight enough if the website design changes
* if the @class is 'address' or 'agent_name' - definitely use it!

Cheers
BH
Beerhunter
Site Admin
 
Posts: 1788
Joined: Tue Jan 22, 2008 12:05 am

