I recently noticed that when I look at our indexed pages in Google, URLs are starting to show up with the https:// prefix. This causes pages to appear twice: once for http:// and once for https://.
I looked all over my site trying to figure out how Google got to an https:// link in the first place, and realized it was coming from the ShoppingCart.aspx page. The default robots.txt file you include with the cart disallows /shoppingcart.aspx, but robots.txt matching is case sensitive. I changed it to the correct case and now Google can't get there.
Now I need to get the pages that were indexed with https:// de-listed. I did some research, and this seems to be the best way to handle it:
Use URL rewrites to 301 redirect the https pages to the http pages.
OR
1. Add NOINDEX, NOARCHIVE meta tags to all https pages.
2. When Google re-crawls those pages, they'll drop out of the index.
3. Serve separate robots.txt files for the secure and non-secure sites, and disallow all bots in the SSL one (see the example below).
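For step 3, I'm assuming the robots.txt served on the SSL site would just be a blanket disallow, something like this (the non-secure site would keep its normal robots.txt):

User-agent: *
Disallow: /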
I am not sure how to use ISAPI_Rewrite to do this. Does anyone know how to do it that way? The rewrites I use are set up in this fashion:
RewriteRule /old-page.aspx /new-page.aspx [I,RP]
But if I put this in, it does not work:
RewriteRule https://www.site.com/old-page.aspx http://www.site.com/new-page.aspx
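From what I can tell, that fails because the RewriteRule pattern only matches the path, not the full URL, so the scheme has to be tested separately with a RewriteCond against the HTTPS server variable. I think the ISAPI_Rewrite 2 syntax would look roughly like this, though I haven't tested it (www.site.com is just a placeholder for the real domain):

RewriteCond %HTTPS ^on$
RewriteRule (.*) http\://www.site.com$1 [I,RP]

The idea is that the condition only lets the rule fire for requests that came in over SSL, and [RP] makes it a permanent (301) redirect like my other rules.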
For the second solution, I am not sure how to add the NOINDEX, NOARCHIVE meta tag to just the https pages. I have this code, but I don't know how to get it into the templates:
// Only write the robots meta tag when the request came in over SSL
if (Request.IsSecureConnection) {
    Response.Write("<meta name=\"robots\" content=\"noindex,nofollow,noarchive,nocache\" />");
}
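If the templates are master pages with a <head runat="server">, I'm guessing something like this in the master's code-behind would add the tag to every secure page (Site.master.cs is just a stand-in for whatever the template file is actually called):

// In the master page code-behind, e.g. Site.master.cs (hypothetical name)
protected void Page_Load(object sender, EventArgs e)
{
    if (Request.IsSecureConnection)
    {
        // Inject <meta name="robots" ...> into the <head runat="server"> section
        System.Web.UI.HtmlControls.HtmlMeta robotsMeta = new System.Web.UI.HtmlControls.HtmlMeta();
        robotsMeta.Name = "robots";
        robotsMeta.Content = "noindex,nofollow,noarchive,nocache";
        Page.Header.Controls.Add(robotsMeta);
    }
}

That way every page using the template gets the tag automatically when it's served over https, and nothing changes for the plain http pages.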