Tuesday, August 19, 2008

SEO Problems and Solutions to improve optimisation within phpBB3

Many people think that SEO simply means implementing Human Readable URLs. This is far from the whole of SEO; it is one small part of it, and one that is generally grossly misunderstood. Many people believe that Human Readable URLs are the only way Search Engines can correctly index a site, or that they are the best method for spiders to index a site, or that Dynamic URLs somehow hurt your ranking or performance within Search Engines. This is not the case. Human Readable URLs mainly benefit Search Engines as added keywords within the URL, which only reinforces the keywords already set by the page and topic title, keywords that Search Engines already use for their index.
Search Engines index dynamic URLs just as easily as they index Human Readable static URLs. The benefit is minimal.
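For illustration, here is a typical phpBB3 dynamic URL beside a hypothetical Human Readable rewrite of the same topic (the rewritten form is made up for this example):

    viewtopic.php?f=3&t=1234
    /community/seo-problems-and-solutions-t1234.html

Both lead to the same content; the second merely repeats the topic title’s keywords in the URL.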

SEO is Dead! Long Live SEO!
The other side of the coin is the people who think that you don’t need to do any Optimisation (SEO) at all, who don’t believe that any form of Optimisation can have a positive impact on your site. They have usually never been in the SEO industry, and have instead just heard all the hype from people talking about SEO. They believe that SEO no longer applies to today’s internet. But these people do not truly understand what SEO means or what it is meant to do, and they are speaking out of inexperience or ignorance.

This post is meant to address both sides of the argument by giving people a better understanding of what SEO is and its place within phpBB3. It also covers problems identified within phpBB3 itself with regard to Search Engine performance, along with solutions to address these issues.

What is SEO?
SEO (Search Engine Optimisation) is any action or adjustment, also known as an Optimisation, that you perform on your site for the purpose of improving the volume and quality of the traffic that search engines send to your site or board for targeted keywords. It can include one or many possible methods: implementing Human Readable URLs and keyword targeting, marketing, or disallowing a page or directory in your robots.txt file.
All of this is Search Engine Optimisation (SEO).
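As a small example of that last method, a robots.txt that keeps spiders out of pages that should never appear in an index might look like this (the paths are only illustrative; adjust them to your own board):

    User-agent: *
    Disallow: /search.php
    Disallow: /memberlist.php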

How does phpBB3 Handle SEO?
phpBB3, out of the box, has good Search Engine Optimisation (SEO) capabilities. It handles BOT sessions appropriately, and it hides content that is completely useless to spiders, such as forms, links to profiles, and links that spiders should not or could not access, also known as “dead links”, among a few other things. Some of this is simply meant to improve the performance of spiders indexing the site: not sending useless items such as forms cuts down on HTTP requests and bandwidth. But that’s about the extent of it. There is so much more that one can do to optimise phpBB3 for the best Search Engine capability, much of it through methods I believe people are not yet aware of.
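This hiding is driven by the S_IS_BOT switch that phpBB3 sets for every page. A minimal sketch of the technique, as it might appear in a template (the reply link is just an example of a “dead link” for a spider):

    <!-- IF not S_IS_BOT -->
        <a href="{U_POST_REPLY}">Post a reply</a>
    <!-- ENDIF -->

Anything inside the condition is simply never sent to a spider.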

What is the main issue of SEO within phpBB3?
Within phpBB3, the main issue with Optimisation is duplicate content. No, not the kind of duplicate content that will get you penalised or banned from Google (that’s a whole other post), but the kind that distorts search results and causes slightly higher bandwidth usage, because the spider indexes and re-indexes the exact same content as separate pages within its index. The search results for that single page then display as multiple results for the exact same page, which defeats the purpose of good search results and degrades their effectiveness.

How can this be improved? First, the problem must be completely understood. When a spider crawls your board from the index page, it looks at all the links: there are links to the Categories, Forums, and Subforums, and also to the last post of each particular forum. Upon entering a forum, it sees a list of links to topics, links to up to four pages within a single topic, and again the last post within each topic. Upon entering the topic, it will see a whole other copy of the page through the print option.

The idea is that we want the spider to index the topic in pages, and to do this, the spider has to see the exact same URL, dynamic or static, for each page of that topic. If it sees different URLs (such as those containing the p -post- variable), it considers the page a completely new page and indexes it as such. But the last post URLs on the index and category views, as well as the last post link within a specific topic, show the spider far more pages to your forum than there really are, causing the same content to be indexed multiple times.
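For example, a single topic can be reached through several different URLs (the ids here are made up):

    viewtopic.php?f=3&t=1234                (the topic itself, page one)
    viewtopic.php?f=3&t=1234&start=15       (page two of the same topic)
    viewtopic.php?f=3&t=1234&p=98765        (a “last post” link into the topic)
    viewtopic.php?f=3&t=1234&view=print     (the print view of the topic)

Each of these is indexed as a separate page, even though the content overlaps or is identical.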

Secondly, users will post links directly to a post within a topic, and these URLs may contain parameters other than the forum id, topic id, and start variable, which are the only parameters the spider should be recognising. It may see multiple parameters, including: sort key, sort direction, order, session id, post id, print view, and highlighting.
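A worst-case user-posted link might carry several of these at once (all values here are illustrative):

    viewtopic.php?f=3&t=1234&p=98765&sid=abc123def456&sk=t&sd=a&hilit=phpbb

To a spider, this is yet another brand-new page, even though it displays the very same topic.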

Each new variable that differs from the last time the spider indexed this topic and page means a completely new page, and thus another indexing pass by the spider, further diluting the search results for this page’s keywords and content, and again repeatedly consuming bandwidth over the same content.

The Solution for Duplicate Content within phpBB3
Now that we understand the problem, what is the solution?
There are two methods that can be used to improve the optimisation with regard to the duplicate content issue within phpBB3.
First, remove (hide) the links to the last post within the topics and forums; the same S_IS_BOT template technique shown earlier will hide them from spiders while leaving them intact for users.
Second, filter out all parameters except for the topic_id and the start variable for bots only, and perhaps enforce the forum_id as well; whichever you choose, it has to be consistent, one way or another. Remember that every variable added means a new page to the Search Engines, including Google. In practice this means redirecting spiders on viewtopic whenever the parameters are not those strictly allowed, as in the sketch below. This kind of redirect can increase the number of HTTP requests if overused, but it may be necessary to improve the optimisation, and because it cuts down on repeated indexing it may save as many requests as it adds, evening itself out in the end. This change should only be performed for spiders and bots, as it would be a detriment to a user attempting to navigate your board if they experienced these redirects.
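Here is a minimal sketch of that second method in PHP, written as it might sit near the top of viewtopic.php once the session has been started. It assumes phpBB3’s usual $user object and append_sid() function; treat it as an illustration of the technique rather than a finished MOD (a complete version would also translate a post id into the correct topic page before redirecting, instead of simply dropping it):

    if ($user->data['is_bot'])
    {
        // Only these query parameters are allowed for spiders.
        $allowed = array('f', 't', 'start');

        if (array_diff(array_keys($_GET), $allowed))
        {
            // Rebuild the URL from the allowed parameters only.
            $params = array();
            foreach ($allowed as $name)
            {
                if (isset($_GET[$name]))
                {
                    $params[] = $name . '=' . (int) $_GET[$name];
                }
            }
            $query = implode('&', $params);

            // Permanent redirect, so the spider indexes one canonical URL.
            header('Location: ' . append_sid("{$phpbb_root_path}viewtopic.$phpEx", ($query) ? $query : false, false), true, 301);
            exit;
        }
    }

Because the whole block is wrapped in the is_bot check, ordinary members never see these redirects.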

These changes will be far more beneficial for Search Engines than any Human Readable URLs change could be. In fact, implementing Human Readable URLs can make this issue worse if bots are still allowed to index the dynamic URLs, since that throws additional pages into what the Search Engines already see.
