Understanding Rewrite Rules, Part 1 of 3: Hotlink Protection

Webmaster Articles - System Administration
Written by Admin

A successful webmaster is always aware of the bandwidth his site uses and tries to reduce it any way he can to cut down on what he pays his host. One of the leading causes of excess bandwidth use is hot-linking and other forms of bandwidth stealing perpetrated by other webmasters, either evil or ignorant. Using rewrite rules is the most common tip given to webmasters for reducing hot-linking.

Note: Many novice web users will just say "Use .htaccess." Although you may use an .htaccess file to contain your rewrite rules, it is not the .htaccess file itself that blocks hot-linking, and that terminology is in error. It is the contents of the .htaccess file (which can be many different things), in this case rewrite rules, that perform the operation. In this tutorial I will not go into .htaccess files, as their usage is outside the scope of this article. But, in short: all the examples in this tutorial can be placed into a file named ".htaccess" and uploaded to the directory where you want the rewrite rules to apply.

Let's get right into it. These rewrite statements will block access to your files from any domain but your own. This is useful in situations where you want a certain directory to be reachable only from your own site.

Note: I'm using this example because the syntax seems to be floating around the webmaster community. It is overly precise and can be simplified, which I will do below.

RewriteEngine on
RewriteCond %{HTTP_REFERER} !^http://www.yourdomain.com.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://yourdomain.com.*$ [NC]
RewriteRule /* http://www.yourdomain.com/console.html [R,L]

This is about as simple as rewrite gets. Rewrite is a mechanism for changing the url in the surfer's browser. If the surfer requests a url that is wrong (or that you don't want him to have), the rewrite rule will change the url for him. This code looks at the referring url of the surfer, analyzes it, and then decides whether to leave the url that the surfer has requested or change it.

Let's break this down:

RewriteEngine on

If this statement baffles you, don't proceed! It just says, "Hey, let's rewrite some URLs."

RewriteRule /* http://www.yourdomain.com/console.html [R,L]

I'm explaining this one first because this is where the magic is done. The RewriteEngine statement and this statement are all you need to rewrite some urls. 'RewriteRule' tells the server to look at the url and adjust it accordingly. It accepts two arguments and then, optionally, some flags:

RewriteRule <test url> <new url> [flags]

The first argument, <test url>, is an expression that the url the surfer is requesting is tested against. If that url matches this expression, then the surfer is rewritten to the <new url>. In our example, the test url is '/*'. This matches any url, so we are saying, "Rewrite all urls to http://www.yourdomain.com/console.html".

So why does '/*' mean all urls? The '*' does not mean what you might think. In Windows it is a wildcard, but in RewriteRule it means "match 0 or more of the preceding character." So, in this case, the surfer's url matches as long as it contains 0 or more '/' characters, which effectively matches everything. Tip: The '+' character matches "1 or more".

The <new url> can be a full valid url, a relative url, or a file. For simplicity's sake, always write your url out completely. I will delve into this a bit more in my advanced rewriting article.

The flags at the end of the RewriteRule specify certain actions to be taken. Our example has the flags [R,L]. No… this does not mean right and left. The 'R' tells the server to "redirect" and, by default, uses the code 302 (MOVED TEMPORARILY). When the <new url> is a full url on a different host, the server would redirect anyway, so the R flag is not strictly needed there. When the <new url> points back at your own host, as in our example, keep the R flag so that the surfer really is redirected instead of being quietly served the new file under the old url. There are more uses of the R flag, which I will go into in the advanced article. The L flag tells the server "that is the Last rule." In cases where you might have more rewrite rules below, the server will drop out immediately at this point.
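
To pull the pieces of the RewriteRule statement together, here is a minimal annotated sketch of the same rule. The comments are my own reading of the explanation above, and the commented-out '/+' variant is only an illustration of the tip, not something you need in your file.

```apache
# Shape of the statement: RewriteRule <test url> <new url> [flags]
RewriteEngine on

# '/*' matches zero or more '/' characters, so every requested url qualifies.
# R = answer with a redirect (302 unless told otherwise).
# L = stop processing further rewrite rules at this point.
RewriteRule /* http://www.yourdomain.com/console.html [R,L]

# Left commented out for comparison: '/+' needs at least one '/'
# somewhere in the tested url before it matches.
# RewriteRule /+ http://www.yourdomain.com/console.html [R,L]
```

On its own, with no RewriteCond in front of it, this rule would send every request to console.html, which is why the conditions matter.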

I always recommend using the 'L' flag for novice webmasters.

RewriteCond %{HTTP_REFERER} !^http://www.yourdomain.com.*$ [NC]

So, you've seen the RewriteRule command and you know that it will rewrite URLs. That is useful, but what about situations where you want to rewrite some URLs but not others, and you want to base the decision on properties of the surfer? The RewriteCond statement is used to say, "If this statement is true, follow the RewriteRule listed below." The RewriteCond statement uses this format:

RewriteCond %{<test variable>} <test pattern> [flags]

The first argument is a variable that describes a property of the client or the server. This can be anything from cookies to time of day. Most webmasters will only use the referrer variable, so I'll stick to that for this article. The variable holding the referring url is always named 'HTTP_REFERER'.

The test pattern is an expression that the statement looks for in the test variable. In the example above, we are looking for "!^http://www.yourdomain.com.*$" in "HTTP_REFERER".

!^http://www.yourdomain.com.*$

Does it look like ancient hieroglyphics to you? It did to me the first time I looked at it, but it is easy to break down. First, let's consider the obvious: we are looking for occurrences of http://www.yourdomain.com. The '^' before the url means that it must occur at the beginning. For example, if the HTTP_REFERER started with anything other than http://www.yourdomain.com (or had anything in front of it), it would NOT match. The '$' sign means the opposite: the end of the variable.

'.*' is a little trickier. Remember what '*' means? Ok. But we are not looking for 0 or more occurrences of a literal '.'. A period has a special meaning: it means "any character". So, we are looking for 0 or more occurrences of anything. It is essentially a wildcard. So far, we have ^http://www.yourdomain.com.*$.
It means look for http://www.yourdomain.com at the beginning of the variable and anything following it until reaching the end of the variable. If the variable matches, the statement is true and the RewriteRule is executed.

I skipped over the '!' on purpose. It basically says, "Look for conditions where the following is NOT matched." If the '!' weren't there, we would be looking for conditions that DO match. So, this statement says in full: "If the referring url DOES NOT match http://www.yourdomain.com followed by anything up to the end of the variable, then execute the RewriteRule statement." The effect of this should be quite obvious: if the referring URL is not from yourdomain.com, then rewrite the url to something else.

The [NC] flag means "no case" or "ignore case." This tells the server not to care if the URL is YOURDOMAIN.COM or yourdomain.com. This is important to have in all rewrite conditions, as Unix servers don't see a capital letter the same as a lower-case letter.

WARNING: Apache itself ignores the case of directive names, so "rewritecond" would still be read, but the arguments that follow are often case-sensitive. Stick to the conventional spelling, "RewriteCond", and copy the rest of each line exactly as shown.

RewriteCond %{HTTP_REFERER} !^http://yourdomain.com.*$ [NC]

This statement is the same as the statement above except that it omits the www, in case someone is accessing your site without it.

These two RewriteCond statements are not really necessary to accomplish this task. As I said earlier, I just listed them because that code seems to be floating around. We can do away with the '^' and the '$' and the 'http://www' and combine the RewriteCond statements into one:

RewriteCond %{HTTP_REFERER} !.*yourdomain.com.* [NC]

Can you read this? It says, "Look for yourdomain.com in the referring url no matter what is before or after it, and if it DOES NOT match, execute the RewriteRule statement." Now, the full example above has been reduced to:

RewriteEngine on
RewriteCond %{HTTP_REFERER} !.*yourdomain.com.* [NC]
RewriteRule /* http://www.yourdomain.com/console.html [R,L]

Most webmasters use this rewrite code for their image or movie galleries.
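
For galleries, one common variation (not part of the code discussed in this article, so treat it as a sketch) is to narrow the <test url> so that only image and movie files are redirected. The list of file extensions below is an assumption; adjust it to whatever your gallery actually serves.

```apache
RewriteEngine on

# Same referrer check as the reduced example above.
RewriteCond %{HTTP_REFERER} !.*yourdomain.com.* [NC]

# Only urls ending in one of these extensions are redirected.
# The extension list is an assumption, not something from the original rule.
RewriteRule \.(gif|jpe?g|png|mpe?g|avi)$ http://www.yourdomain.com/console.html [R,NC,L]
```

A side effect of this choice is that console.html itself can never match the rule, even if it lives in the protected directory.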
It is placed in the directory with your images, and this keeps hot-linking at bay. If you are posting your images on other sites and would like to give them permission to hotlink, you can add a RewriteCond statement with their domain to your rewrite code. Because every RewriteCond must be true before the RewriteRule runs, a referrer that matches either domain is let through. For example:

RewriteEngine on

CORRECTION

It has recently been brought to my attention that my rewrite code has a potential loophole: webmasters could still hotlink your images if they wrote their referring url in a specific way. This is true, though very unlikely to be used. But since I've had so many webmasters bring it up, I decided to go ahead and append a note to this article.

RewriteCond %{HTTP_REFERER} !.*yourdomain.com.* [NC]

If a hotlinker used a directory or file name that contained yourdomain.com, he could get around your rewrite code. For example:

http://www.assholehotlinker.com/yourdomain.com/

To fix this, replace .*yourdomain.com.* with:

^http://w*.?yourdomain.com/

To explain: '^' means the match has to be at the beginning, '*' you should already know, and '?' means 0 or 1 of the preceding character. So the pattern allows any number of 'w' characters and at most one more character (the '.' after www) between http:// and yourdomain.com.

WARNINGS:
This article is basic and meant to break down rewrite code that you've probably already seen. The example in this article is useful but doesn't even scratch the surface of what rewrite code can do. I will jump into different RewriteCond statements in "Part 2: Intermediate", and will go over different RewriteRule statements as well as more expression syntax in "Part 3: Advanced."
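
As a recap of the two refinements discussed in this article (giving another site permission to hotlink, and the anchored pattern from the CORRECTION), here is a rough sketch of how they might be combined. The name partnerdomain.com is a hypothetical placeholder for the site you want to allow; it does not come from the original code.

```apache
RewriteEngine on

# Your own site, using the anchored pattern from the CORRECTION.
RewriteCond %{HTTP_REFERER} !^http://w*.?yourdomain.com/ [NC]

# Hypothetical partner site that is allowed to hotlink (placeholder name).
RewriteCond %{HTTP_REFERER} !^http://w*.?partnerdomain.com/ [NC]

# Runs only when BOTH conditions are true, that is, when the referrer
# matched neither your domain nor the partner's.
RewriteRule /* http://www.yourdomain.com/console.html [R,L]
```

Each extra site you want to allow gets its own RewriteCond line above the RewriteRule.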



