
Permalinks and mod_rewrite

Published on October 03, 2010 under How-To, Web Development

Permalinks

Like it or not, Search Engine Optimization (SEO) is a reality, and anyone working on the internet must be familiar with the term.  If you develop or produce web sites, you more or less have to work, read, and study to become an expert in what brings results.  Worse, many of the tips and tricks that come out are soon made obsolete when search engines change the game.

Beyond a good hierarchy of information (headings, paragraph tags, lists, etc.), meta tags, and all the advice on using keywords in your content, one practice that has become a must in web design is the use of permalinks.

Permalinks are not a new thing.  The practice was developed back in 2000 by Jason Kottke and implemented by many of the first blogging web sites that sprang up out of the Web 2.0 movement.  In fact, pretty much any hyperlink is supposed to be a permalink, but when pages went from static to dynamic, the need arose for links that made sense not only to search engines, but to the normal person.  Links like these might make sense to a PHP script and a server, but not to a normal person:

www.yourwebsite.com/index.php?id=1
www.yourwebsite.com/index.php?id=2
www.yourwebsite.com/index.php?id=3

While links like this might make more sense:

www.yourwebsite.com/alpha
www.yourwebsite.com/beta
www.yourwebsite.com/gamma

Think about how much easier it is to remember a link like that than one with a querystring attached.  If you have run WordPress, Drupal, or one of the many other blog setups out there, you'll have noticed that every piece of content is plugged into the main index page through includes.  The main list of entries, the individual articles, and even standalone pages all end up on index.php, and the small bits of data passed to the page tell it what to display.  Without the permalinks feature, you would end up with loads of pages with near-identical addresses.  Thus the need.

Permalinks also help with SEO, partly because you can put keywords into the address, along with other information a reader might be interested in.  At the time of writing this article, one can tell from the web address alone that it was written on October 18, 2010, and that the article has to do with permalinks and mod_rewrite.

Mod_rewrite is the secret to it all.

Since most hosting is done on Apache servers, the ability to rewrite addresses has long been there, but it was mostly used by server administrators to direct users to set folders based on the address.  So my site here could be sitting on /server/file/path/to/amportfolio, but it's the rewrite engine that tells the server to send anyone who comes to www.amportfolio.com to the index page in that folder.

Only recently have hosts allowed users to use the rewrite engine to make their own permalinks.  Unfortunately, some hosts limit what you can and cannot do.  Two noted names are 1and1 and GoDaddy.  At the time of writing this entry, I'm hosted at 1and1.  I could have been frustrated by the restraints, but I saw them as challenges to solve.  I also found much of the literature online to be a bit too technical for the average person, so I hope I make it easier for you.

Build an htaccess File

You can make an htaccess file in Notepad or whatever text editor you choose.  If you can create htaccess files through your server control panel, then more power to you.  I did mine the old-fashioned way with Notepad.

However you create the file, save it as htaccess.txt, upload it into the main folder of your site, and then rename it .htaccess.  Yes, you will be removing the .txt suffix and adding a dot to the front.  If you want to apply the mod_rewrite rules to only a certain folder, put the htaccess file into that specific folder instead.  This can come in handy if you have loads of folders and want certain things happening in certain places.  However, I feel that part of the idea of dynamic scripting is to have fewer pages that do more.

Prepping Your Site

I'm going into this because I ended up having to overhaul a lot of code on my site when I tried this.  First and foremost, if you use relative links for images and scripts, change them all so they start with a / followed by the path from the site's main directory.

So if you have an image that's ../images/image.jpg, change it to /images/image.jpg.

The leading / tells the browser to start from the main directory and move inward, no matter what address the page appears to live at.  That matters because the browser resolves relative links against the permalink it sees, not the real file: on www.yourwebsite.com/my-page/alpha, a link to images/image.jpg would be requested as /my-page/images/image.jpg, which doesn't exist.  You should make this your practice from now on, even if you're not going to make permalinks.  Otherwise your images will fail to appear.

Any scripts you use that call on directories also need to be changed.  I had some form-processing scripts where I had to go in and add the / in places.  I also added a / to the path variable in the cookie I set up in the Creative section, which is why I mentioned it in the article about cookies.  Your cookie won't work with the permalinks if you don't do this.

Another big thing I learned is that if you plan to give a permalink a certain name, make sure none of your web pages share that name.  I had my contact page named contact.php, but in order to make the address www.amportfolio.com/contact work correctly, I had to change the file name to something else, in this case contct.php.  Otherwise the server gets confused about which one you mean.
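
One likely culprit, assuming your host has Apache's MultiViews option switched on, is that Apache sees the request for /contact, finds contact.php, and serves it on its own before your rule gets a say.  I haven't confirmed that this was the cause on my hosting, so treat it as a guess.  If your host lets you change options from htaccess (not all do), this sketch turns that behavior off as an alternative to renaming the file:

```apache
# Assumption: the host permits Options overrides in .htaccess
# Stops Apache from guessing contact.php when /contact is requested
Options -MultiViews
```

If the server answers with a 500 error after you add this line, the host doesn't allow it, and renaming the file is the way to go.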

On to the htaccess File

The first thing to do is turn on the engine and set the base it works from.

RewriteEngine on
RewriteBase /

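The first line merely says to turn the rewrite engine on.  The second line sets the base address that the rewritten targets are relative to: with /, a target like thePage.php means /thePage.php in the main directory.  The rules themselves apply to the directory the htaccess file is sitting in and anything within it, whatever the base says.  If you placed the file in a folder named "new", you would set the base to /new/ so the targets resolve inside that folder.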
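
To make the subfolder case concrete, here is a minimal sketch of an htaccess file sitting inside a folder named "new".  The page names are the same made-up ones used in this article, and the layout is only an assumption about how your site might be arranged:

```apache
# .htaccess placed inside the /new/ folder
RewriteEngine on

# Targets without a leading slash are resolved under /new/
RewriteBase /new/

# www.youraddress.com/new/my-page opens /new/thePage.php
RewriteRule ^my-page$ thePage.php [L]
```

Notice that the pattern says my-page and not new/my-page.  In an htaccess file, the pattern only sees the part of the address that comes after the folder the file lives in.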

From there, you then create what are known as rewrite rules.  Here's a basic one:

RewriteRule ^my-page$ thePage.php [L]

A rule starts with the RewriteRule keyword, followed by a pattern and then the file to open.  The ^ marks the beginning of the pattern.  In an htaccess file, the pattern is matched against the part of the address after the folder the file sits in, without the leading slash, so here it's looking for an address with my-page on the end as if it were a directory.

The $ after my-page marks the end of the pattern.  Together they say: seek out www.youraddress.com/my-page and nothing more.  After the pattern you put the actual file that's to be opened.

The [L] on the end (for "last") tells the server not to run any further rewrite rules in that htaccess file on this pass if this one is satisfied.  This keeps the server from accidentally applying more than one rule and making a mess of things.
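
Put together, the pieces so far could look like the sketch below.  The IfModule wrapper, the optional trailing slash (/?), and the second rule with its other-page and otherPage.php names are my own additions for illustration, not something your site needs to copy:

```apache
# Only read these lines if mod_rewrite is loaded on the server
<IfModule mod_rewrite.c>
    RewriteEngine on
    RewriteBase /

    # Matches /my-page and /my-page/ and nothing longer
    RewriteRule ^my-page/?$ thePage.php [L]

    # Only tried when the rule above did not match
    # (other-page and otherPage.php are hypothetical names)
    RewriteRule ^other-page/?$ otherPage.php [L]
</IfModule>
```

Rules are tried from top to bottom, so with [L] on each one, the first rule that matches is the only one that gets applied on that pass.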

How about dynamic pages?

That's the whole idea, isn't it?  Turning that querystring address into a cooler one that human beings can relate to.  Here's an example of a rewrite rule for this:

RewriteRule ^my-page/([^.]+)/?$ thePage.php?v=$1 [L]

What I have here is how you would have something like www.youraddress.com/my-page/alpha transfer itself to www.youraddress.com/thePage.php?v=alpha.  The rule starts off like the last example, but I added a slash and then ([^.]+).  The [^.]+ matches one or more characters that are not a period, and the parentheses capture whatever was matched.  In other words, it tells the server to take whatever comes after my-page/ and use it as the querystring for the actual page.  The $1 marks where that captured piece goes.
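
One quirk worth knowing: because [^.]+ also matches slashes, an address like /my-page/alpha/ hands "alpha/" to the page, trailing slash included.  A slightly stricter variant, offered here as a sketch and not as what I run on my own site, excludes the slash from the captured piece:

```apache
# [^/.]+ means one or more characters that are neither a slash nor a period,
# so /my-page/alpha and /my-page/alpha/ both pass v=alpha
# QSA keeps any querystring the visitor already had, e.g. ?page=2
RewriteRule ^my-page/([^/.]+)/?$ thePage.php?v=$1 [L,QSA]
```

Without QSA, a querystring on the incoming address is dropped whenever the target has its own, as thePage.php?v=$1 does here.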

What if you use numbers as the keys?

I usually use numbers as my primary keys, so my hope with permalinks was merely to add a permalink string to each database entry.  I would have the name of something, the content, and then an extra field stating what I wanted the permalink to be, thinking it would be purely cosmetic.

Now if you have the access to do it, you can use what's called a rewrite map to map everything on your site to addresses you designate.  Make sure you find out from your host whether they will allow you to use rewrite maps.  The map file should be laid out with the key first and then the actual address it stands for, separated by whitespace.  Here's an example:

alpha    thePage.php?v=1
beta     thePage.php?v=2
gamma    thePage.php?v=3
delta    thePage.php?v=4

So you could make a text file with it all written out, or if you're feeling savvy enough, make a PHP page that calls to the database and pulls down the permalinks and their respective ID numbers, with the file names placed in the page as you see fit.  Here's the thinking:

$Row[plink]    thePage.php?v=$Row[key]

Whatever you do, keep the output plain.  No fonts, design elements, or CSS.  Not even a doctype or html tag.  You're simply making a text file dynamically.

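The RewriteMap line that declares the map cannot go in the htaccess file.  Apache only accepts it in the main server or virtual host configuration, which is why you need your host's cooperation.  There, you would add the following code to declare the map:

RewriteMap myMap txt:/root/server/path/to/map.txt

The txt: prefix says the map is a plain text file (int: is reserved for Apache's built-in functions such as tolower).  You need the full server path to that file.  There is no exception to this.  Once the map is declared, the rewrite rules in your htaccess file can call on it.  This would be an example: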

RewriteRule ^(.*)$ ${myMap:$1} [L]

What's happening here is that (.*) takes whatever is on the end of the web address and sends it over as $1, as shown before.  This time, though, the rule calls up the map (like a database) and feeds in that item as a key to find out which address to show.

You can also simplify the map to hold just the data.  Instead of thePage.php?v=1 and so on in the map, put only the key and its ID number, then code the address into the htaccess:

RewriteRule ^(.*)$ thePage.php?v=${myMap:$1} [L]
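
Because ^(.*)$ catches every address, including your images and stylesheets, a catch-all rule like this usually wants a few guards in front of it.  The sketch below shows one way to arrange the two files, assuming the key-only map just described.  NOT_FOUND is a placeholder word I made up for the map's fallback value, and the map declaration has to live in the server or virtual host configuration, since Apache won't accept RewriteMap in htaccess:

```apache
# --- Main server or virtual host configuration (not .htaccess) ---
# Declare the map; txt: means a plain text file of "key value" lines
RewriteMap myMap txt:/root/server/path/to/map.txt

# --- .htaccess in the main folder of the site ---
RewriteEngine on
RewriteBase /

# Skip requests for files and folders that really exist
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# Skip addresses that have no entry in the map
RewriteCond ${myMap:$1|NOT_FOUND} !^NOT_FOUND$
RewriteRule ^(.*)$ thePage.php?v=${myMap:$1} [L]
```

The part after the | in ${myMap:$1|NOT_FOUND} is what the lookup returns when the key is missing, so addresses that aren't in the map fall through untouched instead of being sent to thePage.php with an empty v.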

But my host doesn't allow RewriteMap!

Now you've reached the problem I did.  I can't use a rewrite map on my hosting, so I can either code every single address of my web site into that htaccess, or come up with more clever ways to use the technology.

My first attempt was to try to put the ID number into the web address.  So I would have an address like www.youraddress.com/section/1/page-name.  I was ready to settle on that, but then came up with a better idea: use the permalink as a key.

For sections like Photography and Creative, I never planned on having loads of items in there, so there isn't a need for an intricate and solid key system.  I still have numeric primary keys in the database, but I simply take the permalink string from the address and pull the item by finding the entry with a matching permalink in the database table.  This does mean that I can never have two identical permalinks, but as I said, there aren't many items in these sections, so I can make sure that mistake never happens.

The Knowledge section was more of a challenge, since I know there will be hundreds of entries in this blog as time passes.  I wanted to do links similar to WordPress, with the date first and the permalink trailing on the end.

The answer was to pull entries from the database based on both date and permalink string.  Based on my own behavior, the chances of me having two entries with the same date and permalink string are practically nil.  So while a single primary key is the better plan, this one works just as well and does the job.  Now I can have the permalinks free of ID numbers and craft them as I see fit.

When I pull entries from the database to create links to the blog entries, I have the PHP set up the links as www.youraddress.com/section/$postdate/$plink.  I coded the PHP to convert the date into a string of YYYY/MM/DD with the slashes and all.  As you can see, the address lays out like a WordPress address.

Here's what I did in the htaccess:

RewriteRule ^section/([^.]+)/([^.]+)/([^.]+)/([^.]+)$ thePage.php?x=$1-$2-$3&v=$4 [L]

The variable x receives a MySQL-friendly date built from the first three blocks the rewrite rule captures.  Thus $1-$2-$3 becomes 2010-10-18.  The last item, $4, is the permalink string.  In the page that displays the entry, the PHP receives the two querystring values for the date and permalink, and I tell the database to pull the entry based on those two factors.
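
If you want the rule to be pickier about what counts as a date, the blocks can be narrowed to digits.  This is a stricter sketch of the same idea, not the rule I actually run:

```apache
# Four digits / two digits / two digits / permalink, optional trailing slash
# e.g. section/2010/10/18/my-post  ->  thePage.php?x=2010-10-18&v=my-post
RewriteRule ^section/([0-9]{4})/([0-9]{2})/([0-9]{2})/([^/.]+)/?$ thePage.php?x=$1-$2-$3&v=$4 [L]
```

An address like section/abc/de/fg/my-post no longer matches at all, so the server answers with its usual not-found page instead of handing a nonsense date to the database.  The PHP should still treat both values as untrusted input.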

Now I've only skimmed the surface here, but I hope I've made this a lot clearer to you than it was made to me.  If you have questions, please ask and I'll try my best to answer. I'm going to leave you with three links I found to be very useful in learning the basics of mod_rewrite.

Tags: permalinks, mod_rewrite, seo, development, apache
