How To

How I moved all my content from comeacross.info to raoulpop.com

A place in this world

Background info

As midnight approached this past New Year’s Eve, I was busy working on a long-term project. I was about to move all of my content (every article and post I’d written) from comeacross.info to raoulpop.com. There were many reasons for this, but consolidation was the most readily apparent.

As detailed on my About page, I’d already combined my content from other sites of mine onto comeacross.info, but there was one more piece of the puzzle that needed to fall into place. I’d alluded to it already. I was thinking about doing it in 2006, believe it or not. As a matter of fact, when I sat down and thought about whether to start writing at comeacross.info or raoulpop.com, I knew deep down I should choose to start writing on my personal domain, but worried it might be too difficult for people to remember and type the name.

After a year or so at ComeAcross, I realized that the subjects I was writing about were much too varied for a standalone site. I was writing in a personal voice, using a lot of 1st person, and it only made sense to have that sort of content reside on my personal site. Plus, there were so many splogs (spam blogs) on the .info TLD, that I worried whether I would be taken seriously if I stayed on .info. I’d owned raoulpop.com for a long time, I wasn’t really putting it to good use, and it didn’t make sense not to.

I set a deadline of 12/31, and got to work on planning and research. What better time for such a big change as this than New Year’s, right?

I’m documenting this for you because someone else might need to know how to do it. And I figure the thought process that went on behind the scenes is also worth knowing.

Planning and research

My biggest challenge was to figure out how to redirect all of the traffic from comeacross.info to raoulpop.com, reliably and accurately. I needed to make sure that every one of my articles and posts would redirect to my new domain automatically, so that a URL like

http://comeacross.info/2007/12/30/my-photographic-portfolio/

would automatically change to

https://raoulpop.com/2007/12/30/my-photographic-portfolio/

and the redirect would work in such a way that search engines would be properly notified and I wouldn’t lose my page rank.

I knew about 301 redirects, but I wasn’t sure how to accomplish them in the Linux/WordPress environment the way that I wanted them to work. I had worked mainly with Microsoft web servers until recent times, and Linux was and still is fairly new to me. I was using John Godley’s Redirection plugin for WP (it’s an awesome plugin btw), and I knew it could do 301 redirects quite nicely. I had been using it heavily when I changed post slugs or deleted/consolidated posts at ComeAcross.

I worked out a regular expression that I could use to create a site-wide redirection, tested it, and it worked fine. If you’d like to try it out safely first, create it as a 307 (temporary) redirection instead of a 301 (permanent) one, then switch it to 301 once you’re happy with it. Here’s how to set it up:

Create a new 301 redirection where the source URL is

/(.*)

and the target URL is

http://www.example.com/$1

Make sure you check the Regex box, add it, and you’re done.

Just to make sure, I contacted John Godley to confirm whether it was the best way to do things. He said that would certainly do the job, but there was a MUCH easier and faster way to do it, one that saves a lot of the overhead that comes into play when WP gets used. It works through the .htaccess file. He was kind enough to provide me with the code, which is reproduced below.

<IfModule mod_rewrite.c>

RewriteEngine On

RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

</IfModule>

Just paste that into your .htaccess file (remove all other code but make sure you back it up somewhere in case you need it), save it, upload it, and you’re done.
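To see what that rule does to a given address, here is a rough JavaScript model of it. This is only a sketch of how I understand mod_rewrite to behave in an .htaccess file at the site root (the leading slash is stripped before matching, and the query string is carried over by default); rewriteTarget is a made-up helper name, not something Apache or WordPress provides.

```javascript
// Rough model of: RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
// placed in the .htaccess file at the root of the old site.
// rewriteTarget is a hypothetical name used only for this illustration.
function rewriteTarget(requestUrl, newBase = "http://www.example.com/") {
  // Split off the query string; the rule pattern never sees it.
  const queryStart = requestUrl.indexOf("?");
  const path = queryStart === -1 ? requestUrl : requestUrl.slice(0, queryStart);
  const query = queryStart === -1 ? "" : requestUrl.slice(queryStart);

  // In a per-directory (.htaccess) context the leading slash is removed
  // before the pattern is matched.
  const relativePath = path.replace(/^\//, "");

  // ^(.*)$ captures the whole relative path as $1.
  const match = /^(.*)$/.exec(relativePath);
  const captured = match ? match[1] : "";

  // The original query string is appended unchanged.
  return { status: 301, location: newBase + captured + query };
}
```

Calling rewriteTarget("/2007/12/30/my-photographic-portfolio/") gives a 301 pointing at http://www.example.com/2007/12/30/my-photographic-portfolio/, which is the behavior I was after.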

Don’t do anything yet though! Not before you’ve thoroughly backed up everything! Let me outline the steps for you, and keep in mind that I wanted to mirror all of my content from two separate WP sites using the same WP version, and to redirect from the first to the second. These two conditions have to be met in order for my advice to apply to your situation.

  1. Make sure both sites are on the latest and greatest version of WP, or at least they’re on the SAME version of WP
  2. Back up the database from the old domain
  3. Download all site files from the old domain
  4. Upload site files to new domain
  5. Restore database to new domain
  6. Make changes to .htaccess file as shown above
  7. Log into your new domain’s WP admin panel and change the site and blog URLs. Now you’re done! Check to make sure the redirection works properly and all of your content is there.

Upgrade your WP installs

The two sites have to be on the same version, or else things might not work as expected. Upgrade both sites to the latest and greatest, or at least make sure they’re on the SAME version before you do anything else. Go to WordPress, download and install the latest versions. There’s also an Automatic Upgrade plugin, but I haven’t tried it yet, so I can’t vouch for it.

BEFORE you do any sort of upgrade, you need to back up. Yes, you can’t get away from this… You’ll need to do two backups, one before you upgrade, and one after you upgrade, before you transfer the content.

Back up your content

This combines steps 2 and 3 listed above. Backing up your site files is easy. Use an FTP client to access the files on the web server and download them to your hard drive. I always keep a local copy of my site files. It just makes sense.

Backing up your database is a little more involved. Your database contains all of your site content (posts, links, comments, tags, categories, etc.) so you definitely don’t want to lose it. There are detailed instructions on backing up the database on the WordPress site. You can follow those, or you can go to your site’s Admin Panel >> Manage >> Export and download the WordPress WXR file, which you can import into your new site afterwards.

While this is great for backups, restores are another matter. I tried it and found that the import operation kept timing out at my web host. Given that I have thousands of posts, I didn’t want to sit there re-restoring the WXR file only to get a few posts done with every operation. I needed something quicker.

There is a plugin called WordPress Database Backup which lets you download a zipped SQL file of the database. You can use this to restore the database through the MySQL Admin Panel, if your webhost provides you access to it.

What I did was to simply point my new site install to my old database. This is a very handy and easy solution if you plan to host both sites with the same web host. But this still doesn’t excuse you from backing up the DB before you upgrade the WP install! 🙂

Restore your content to the new site

This is a two-step process (see #4 and #5 above) and involves reversing the steps you took during the backups. You will now upload your site files to the new domain, and you will restore the database to the new domain as well. If you’re in my situation, where you’re using the same web host, you can simply point the wp-config.php file on your new domain to the old database.

Make sure all your content is properly restored before going on to the next step!

Make changes to the .htaccess file

Leave the .htaccess file alone when you transfer it to your new domain. Only the .htaccess file on your old domain needs to change. Remember this, or you’ll be wondering what’s going on with the redirects afterwards…

Use the code I’ve given you above, in the Planning and Research section, to make changes to the .htaccess file on your old domain, after you’ve made absolutely sure that all of your content is now mirrored on the new domain. Once this is done, the redirects will occur automatically and seamlessly.

Final checks and tweaks

This is very important. Surf to your old URL. You should get re-directed to your new URL. Do a search in the search engines for content of yours that you know is easily found. Click on the search results and make sure the links get redirected to your new site. Because you’re using 301 redirects, the search engines will automatically change their search results to reflect the URL changes without affecting your page rank, so you shouldn’t lose any search engine traffic if you execute the content move correctly.
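If you have a long list of old URLs, checking them by hand gets tedious. Here is a minimal sketch of an automated check in JavaScript. It assumes Node 18 or later (where fetch is built in) and assumes that fetch with redirect set to "manual" hands back the 301 response and its Location header; the function names are my own and purely illustrative. The comparison ignores http versus https and only looks at the host, path and query string.

```javascript
// Hypothetical helpers for spot-checking redirects from the old domain.

// True when the response is a permanent redirect that lands on the same
// path (and query string) under the new host.
function isExpectedRedirect(status, location, oldUrl, newHost = "raoulpop.com") {
  if (status !== 301 || !location) return false;
  const from = new URL(oldUrl);
  const to = new URL(location, oldUrl); // also resolves a relative Location
  return to.hostname === newHost &&
         to.pathname === from.pathname &&
         to.search === from.search;
}

// Request the old URL without following the redirect, then compare.
// Needs network access, so it is not run automatically here.
async function checkRedirect(oldUrl) {
  const response = await fetch(oldUrl, { method: "HEAD", redirect: "manual" });
  return isExpectedRedirect(response.status, response.headers.get("location"), oldUrl);
}
```

You would call checkRedirect once per old URL and log the ones that come back false. A 302 or 307 fails the check on purpose, since only a 301 tells the search engines the move is permanent.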

There are a few more things you’ll need to check:

If you’d like to make changes to your site feed (and I did), you’ll need to handle that properly. I use FeedBurner, and there are people that subscribe to my content via RSS or via email. I needed to transfer both groups of subscribers to my new feed seamlessly. The FeedBurner folks helped me do just that, and I didn’t lose a single subscriber during the move. I detailed that process in this post.

What about internal links? If you’ve blogged for a while, you’ll have linked to older posts of yours. Those link URLs now contain the old domain, and you’ll need to change all of them at some point, or you’ll risk making those links invalid if you should ever stop renewing your old domain. Fortunately, there’s a Search and Replace plugin for WP that lets you do just that. It works directly with the database, it’s very powerful, and it’s very fast. That means you have to be VERY careful when you use it, because there’s no undo button. You can easily mess up all of your content if you don’t know what you’re doing.

What I did was to replace all instances of “.comeacross.info/” with “.raoulpop.com/”. That did the trick nicely. I then did a regular site search for all instances of ComeAcross and manually made any needed changes to those posts. (Here’s a thought: back up the DB before you start replacing anything. This way you can restore if something should go wrong.)
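Since there’s no undo, it can help to try the substitution on a copy of some post content before letting a plugin loose on the database. Here is a minimal JavaScript sketch of that idea. It is not what the Search and Replace plugin does internally; replaceOldDomain is a hypothetical helper, and dropping a leading www. is my assumption, made because the new address has none.

```javascript
// Hypothetical dry-run helper: rewrite links to the old domain inside a
// string of post content. It works on text only and never touches the DB.
function replaceOldDomain(content, oldDomain = "comeacross.info", newDomain = "raoulpop.com") {
  // Escape regex metacharacters so "." only matches a literal dot.
  const escaped = oldDomain.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Match the domain (optionally with www.) only right after "//" and only
  // when it ends there, so plain mentions of the site name and look-alike
  // domains are left alone.
  const pattern = new RegExp("//(www\\.)?" + escaped + "(?=[/\"'\\s<]|$)", "gi");
  return content.replace(pattern, "//" + newDomain);
}
```

Run it over a few exported posts and compare the before and after. If the output looks right, the same old and new strings should be safe to use for the real replacement.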

Finally, if you’re using the Google Sitemaps Generator plugin, you’ll want to make sure you manually rebuild your site map. You don’t want to have your old site information in the site map as Google and the other search engines start to crawl your new domain.

That’s about all I did for the site content transfer. It occupied half my New Year’s Eve night, but it was worth it. It’s quite a bit of work, but if you plan it out, it should only take you 4-5 hours or less to execute the transfer, depending on your familiarity with this sort of thing, and the speed of your internet connection (keep in mind that upload speeds are a LOT slower than download speeds on most broadband connections).

Given how much work is involved, I was a bit surprised to see Matthew Mullenweg (founding developer of WordPress) talk about doing his own switch to a new domain in “2 seconds”. I think what he referred to is the changes to the .htaccess file and the blog URLs, which are the fastest parts of the process. There is, however, quite a bit of work that needs to take place behind the scenes before those switches can get flipped. And I also believe (someone correct me if I’m wrong) that he pointed both domains to the same web files (in other words, re-used his existing WP install), so he bypassed a lot of the steps that are otherwise required.

Hope this proves helpful to someone!

Thoughts

A few feed changes for my site

Birds of a feather…

The transfer of all my content from comeacross.info to raoulpop.com has gone smoother than expected, which is great. I’ve been monitoring the feed usage stats, and it looks like everyone has migrated over to the new feed. Just in case, please check your bookmarks and feeds, and correct them as follows, where appropriate:

All of my other feeds have stayed the same. Here they are:

Of course, all URLs are getting automatically redirected (with a 301 status) from comeacross.info to raoulpop.com. That’s been working great, although some people reported issues during the first few days. Thanks for letting me know about those!

If you’re linking to my site in your sidebar, could you do me a big favor and check to make sure you’re no longer linking to comeacross.info but to raoulpop.com? And if you’re not linking to me, would you please?

A big thank you goes out to FeedBurner for migrating my email subscribers and helping with the feed redirect!

Thoughts

Catching a code injection hacker in the act

Several days ago, I installed the Redirection plugin from Urban Giraffe. It’s truly awesome, in more ways than one. John Godley, you are an amazing programmer! As I re-arranged the categories on my blog, I tracked the 404 errors through the plugin. On Saturday morning, I noticed the following bit of information in my log:

You can click on the thumbnail to view the screenshot at full size. Look at the entries for IP address 65.90.251.169. Notice something peculiar? That’s a hacker trying to inject malicious code into my pages. He was trying to call to code contained in a text file by the name ide.txt located on a possibly compromised domain.

First, I checked out his domain, new-fields.com. It looked legitimate. The text file was another story altogether. Have a look at the screenshots above. I also saved the code to my computer in case it ends up disappearing from the hacker’s website.

I tested the code, and it looks like some pages from the podPress plugin are targeted or affected, or at least that’s what the error message given by WP referenced when I ran the code. I had that plugin enabled at the time, and I’ve disabled it since. It seems that the code tries to modify one of the header.php pages, along with checking disk space (?). So I thought, let me find out who this hacker is. Apparently, he’s from Naperville, IL, US, or at least that’s where his IP address lives.

What’s more, I thought it’d be interesting to see who owns that domain name where his text file resides. It turns out to be one Samir Farajallah from Dubai.

So what we’ve got so far is some dude in Dubai who owns the domain where the malicious code resides, and some hacker in Naperville, IL, trying to exploit my blog using that malicious code.

Wait, it gets better… On Saturday evening, I had another look at my blog’s 404 log, and found that another hacker, this one from Vietnam (IP address: 203.171.31.19), was trying to hack into my blog using that exact same code, but this time the text file was located on some domain in Argentina. That last link leads directly to the text file with the malicious code, but it’s harmless if you browse it. It only works if you run it as PHP code, like these hackers are trying to do.

So far, it looks like I’ve got two hackers, who may or may not be working together, using the same malicious code, located on two different, possibly compromised domains, and trying to modify my header files, possibly to insert code in there that will display splog content or some other stuff.

Update: It looks like three more hackers are trying their luck today, on Sunday morning, 9/30/07. Their IP addresses are 65.98.14.194, 66.79.165.19 and 66.11.231.48.

What I can tell you is that they haven’t been successful. I checked all of my files, and none of them have been touched. Everything’s fine. At this point, I’m not going to waste any more of my time trying to hunt them down. If I see that the attacks continue, I’ll notify my web hosting provider, along with the hosting providers of the other domains, and I’ll also notify the ISPs who own the IP addresses used in the attacks.

My thanks go out to John Godley for the wonderful Redirection plugin. I wouldn’t have been able to catch these hackers without it. I don’t often check my 404 log files, although I should.

I’ve been working in IT for 13 years or so. Maybe I’m naive, maybe I’m too honest for my own good, but I’ve stayed away from this hacking business, and I’ll continue to do so. It’s just not a sustainable lifestyle. I believe that the bad stuff you do in life will catch up with you sooner or later. It’s inevitable. These hackers will get what’s coming to them, and I won’t even have to lift a finger beyond what I’ve done so far.

How To

Automatic redirect from HTTP to HTTPS

IIS (Internet Information Services) doesn’t automatically redirect HTTP traffic to HTTPS when SSL is required for a site. So if you’ve got a site that users are supposed to access by typing in https://www.example.com, but they type in http://www.example.com, www.example.com or just example.com, they’re going to get a pretty ugly error message that looks like this:

What can you do? Well, there are two ways of going about it, and both of them are hacks, but they do the job just fine. I prefer method 2 myself.

Method 1:

Make sure the original site (the one with SSL encryption) is listening only on port 443 for the IP address you’ve assigned to it. Now create a separate site using that same IP address, and make sure it only listens on port 80. Create a single file at the root level and call it default.htm or default.asp. If you want to use HTML, then use a meta refresh tag. If you want to use ASP, use a redirect. I’ll give you examples for both below.

<meta http-equiv="Refresh" content="0;URL=https://www.example.com" /> 

or

<% Response.Redirect("https://www.example.com") %>

Don’t forget to enclose each line in its proper brackets. This method works great, but it has one shortcoming. If the site visitor chooses to go to http://www.example.com/somepage.htm, they’re going to get forwarded to the root-level of the HTTPS site, because that’s the nature of the script. It doesn’t differentiate between the page addresses. So you may ask yourself, isn’t there some other way of doing this? Yes, there is.

Method 2:

This method doesn’t require the creation of an additional site. All that you need to do for this is to create an HTML file — I call mine SSLredirect.htm — then point IIS to it using a custom error capture. First, here’s the code that you need to paste in that HTML file:


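<script type="text/javascript">
<!-- begin hide

// Rebuild the address that was requested, using https://, and load it.
function goElseWhere()
{
// hostname leaves out the port, so the visitor isn't sent back to port 80;
// search and hash keep any query string and #anchor from the original address
var oldURL = window.location.hostname + window.location.pathname
  + window.location.search + window.location.hash;
var newURL = "https://" + oldURL;
window.location = newURL;
}
goElseWhere();

// end hide -->
</script>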

Once you’re done editing the file, save it to the root level of your site, or to the root level of IIS (c:\inetpub\wwwroot\). Saving it to that general location lets you use that same file to fix the HTTPS redirection problem for all of the sites you host on a single server.

Now, in IIS 6, right-click on the site in question, go to Properties >> Custom Errors, and double-click on 403;4. Select File for Message Type, then browse for the file you’ve just created and click on OK. In IIS 7, click on your site, then double-click on Custom Errors, locate the Add link in the top-right corner, and add an error for status code 403.4 (the same error that IIS 6 lists as 403;4), as shown in the image below.

IIS 7 Error Configuration

Once you’ve done this, your sites should automatically transfer HTTP traffic to HTTPS when it’s required, and the visitors won’t be forwarded to the root-level of the site. Instead, the URL will be remembered, and the page will simply be re-loaded using the HTTPS protocol. Come to think of it, you could write this in ASP as well, and avoid potential problems caused by browsers that have JavaScript turned off, but this code should work just fine for a lot of people.
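If you’d like to check the URL-building step without setting up IIS, here is a small sketch that pulls it out into a standalone JavaScript function. The name toHttpsUrl and the idea of passing in a location-like object are my own additions for illustration; the logic is the same string concatenation, with the query string and #anchor kept when they are present.

```javascript
// Hypothetical standalone form of the redirect logic, so it can be tried
// outside a browser. `loc` is any object shaped like window.location.
function toHttpsUrl(loc) {
  // hostname (unlike host) leaves out the port, so a request that came in
  // on port 80 is not sent to port 80 again over HTTPS.
  return "https://" + loc.hostname + loc.pathname +
         (loc.search || "") + (loc.hash || "");
}
```

In the error page itself you would call it as window.location.replace(toHttpsUrl(window.location)); using replace keeps the failed HTTP address out of the visitor’s back-button history.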
