How to return 404 for unknown page
#1
My installation is not returning a 404 for a request to an unknown page. Instead, it displays an "Oops Page not found" message with a 200 status. I like the message, but I'd also like the server to return a 404 status code.

I tried the bug fix mentioned here ( http://code.google.com/p/get-simple-cms/...tail?r=194 ) but the server still returned a 200.

I'm using GS 2.03 plus that patch mentioned above. I use a modified .htaccess (I run other apps as well as GS on the site) but I'm not rewriting the headers as far as I can tell.

I wonder if this rule is doing it:
Code:
RewriteRule /?([A-Za-z0-9-]+)/?$ index.php?id=$1 [QSA,L]
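
For what it's worth, a rule like this only decides which script handles the request. Without an [R] flag it is an internal rewrite and does not set the response status by itself. Below is a minimal Python sketch of what the pattern captures. It assumes the rule lives in .htaccess, where Apache strips the leading slash before matching. Python's re module stands in for Apache's regex engine, the rewrite helper is a hypothetical name, and the QSA flag is not modelled.

```python
import re

# Same pattern as the RewriteRule above. It is not anchored at the start,
# so it matches the last path segment of whatever it is given.
PATTERN = re.compile(r"/?([A-Za-z0-9-]+)/?$")


def rewrite(path):
    """Return the internal target for a request path, or None if no match."""
    match = PATTERN.search(path)
    if match is None:
        return None
    # $1 in the RewriteRule corresponds to the first capture group.
    return f"index.php?id={match.group(1)}"
```

Under these assumptions, rewrite("fathead") gives index.php?id=fathead, so the unknown path is simply handed to index.php, which then decides the status.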

An example is (/fathead is non-existent):
Code:
(in browser) http://www.nickcoleman.org/fathead
(from apache log)  "GET /fathead HTTP/1.1" 200 2284 "-"

Thanks.
Reply
#2
Could you post a link to your site?

- Matt
Reply
#3
www.nickcoleman.org
Reply
#4
Change index.php:
Code:
# if page does not exist, throw 404 error
if ($url == '403') {
    header('HTTP/1.0 404 Not Found');
}

to be
Code:
# if page does not exist, throw 404 error
if ($url == '404') {
    header('HTTP/1.0 404 Not Found');
}

-Rob A>
Reply
#5
Yes, I already did that (it's the bug fix I mention in my first post).

[Later] README for hosted sites.

If you use a hosting company for your site, you may come across the problem I described. In summary, the cause is inaccurate logging. There is no bug in GS (apart from the one I mention above).

The issue was that I was checking my logs and noticed that a path I knew was invalid was logged as returning a 200. I also knew that Google frowns on an incorrect 200 for SEO purposes, so I wanted to fix it.

I then went into a long, convoluted and pointless testing session to try to find out why GS wasn't returning a 404 for invalid paths.

Eventually, I telnetted into the server and was able to see the exact output. Lo and behold, GS was returning a 404. So GS is doing the correct thing.
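
For anyone who wants to repeat the telnet check without typing the request by hand, here is a rough Python sketch of the same idea: open a plain socket, send a GET, and read the status line. The function names are made up for illustration, and it assumes plain HTTP on port 80.

```python
import socket


def build_request(host, path):
    """Return the bytes one would type into a telnet session for a GET."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_status_code(raw_response):
    """Extract the numeric status from the first line of a raw response."""
    status_line = raw_response.split(b"\r\n", 1)[0].decode("iso-8859-1")
    # A status line looks like: HTTP/1.1 404 Not Found
    parts = status_line.split(" ", 2)
    if len(parts) < 2 or not parts[0].startswith("HTTP/"):
        raise ValueError(f"not an HTTP status line: {status_line!r}")
    return int(parts[1])


def fetch_status(host, path, port=80, timeout=10.0):
    """Send a raw GET and return the status code the server actually sent."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(build_request(host, path))
        # The status line is at the very start, so the first chunk is enough.
        first_chunk = conn.recv(4096)
    return parse_status_code(first_chunk)
```

Calling fetch_status("www.nickcoleman.org", "/fathead") would report whatever the server sends on the wire, independent of what ends up in the access log.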

I still don't know exactly what is happening. I think that my hosting company's Apache logs are not correct. The 404 is not showing in either the access.log or the error.log.

My hosting company is DreamHost. I wonder what other DreamHost customers are seeing in their logs.
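
If you want to compare what your own logs say for a given path, something like the following Python sketch may help. It assumes log lines contain the usual quoted request followed by status and size, as in the excerpt from my first post. The regex and the logged_statuses name are illustrative only, and your host's log format may differ.

```python
import re

# Matches the request/status/size part of an access log line,
# e.g.  "GET /fathead HTTP/1.1" 200 2284
REQUEST_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)


def logged_statuses(lines, path):
    """Yield the status code logged for each request to `path`."""
    for line in lines:
        match = REQUEST_RE.search(line)
        if match and match.group("path") == path:
            yield int(match.group("status"))
```

Feeding it the lines of access.log and the path you tested lets you set the logged status next to the one seen over telnet.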
Reply
#6
Well, thanks for getting that sorted out, Nick. Glad it wasn't GS doing it, but I hope DreamHost fixes their problem. Thanks for investigating.
- Chris
Reply
#7
NickC Wrote:Yes, I already did that (it's the bug fix I mention in my first post).

Sorry - hadn't seen that.

Your headers look good to me too:
Code:
Response Headers - http://www.nickcoleman.org/fathead

Date: Thu, 03 Feb 2011 16:06:46 GMT
Server: Apache
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Vary: Accept-Encoding
Content-Encoding: gzip
Content-Length: 2000
Content-Type: text/html; charset=UTF-8

404 Not Found

So it must be host logs.

-Rob A>
Reply



