GetSimple Support Forum

Full Version: Excluding public pages from sitemap
I know this has been asked before (I searched for this and went over the answers) but I couldn't find the solution I was looking for. Maybe I missed something, in which case I apologize in advance.

I have some pages on my site that I don't want search engines to list, so I want to exclude them from the XML sitemap. I know I can set a page to private to exclude it, but that would also make it unavailable to the public. I need these pages public, but not listed.

For example, one of these pages is a "Thank you" message that users get when they confirm their subscription to my email newsletter. It would be weird to have this listed in my sitemap or showing up in search engine results.

I'm already using the MetaRobots plugin to set these pages to noindex & nofollow. Now I just need to exclude them from the sitemap.

Is there a way to exclude specific pages from the sitemap, without setting them private? Thanks in advance.
I guess I could just disable automatic sitemap creation, create one manually, and upload it. But I'd still like to know if there's a more practical way to do it (one that doesn't require me to manually exclude the pages from the sitemap).
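For anyone taking the manual route in the meantime, here is a minimal sketch of a standalone Python script that strips unwanted entries from an already generated sitemap. It is not a GetSimple plugin and uses no GetSimple API. The function name strip_urls and the example.com URLs are made up for illustration, and it assumes the sitemap uses the standard sitemaps.org namespace.

```python
import xml.etree.ElementTree as ET

# Standard sitemap namespace from sitemaps.org
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def strip_urls(sitemap_xml, excluded):
    """Return the sitemap XML with every <url> whose <loc> is in `excluded` removed.

    `strip_urls` is a hypothetical helper, not part of GetSimple.
    """
    # Keep the sitemap namespace as the unprefixed default on output
    ET.register_namespace("", SITEMAP_NS)
    root = ET.fromstring(sitemap_xml)
    # Iterate over a copy of the list, since we remove children while looping
    for url in list(root.findall(f"{{{SITEMAP_NS}}}url")):
        loc = url.find(f"{{{SITEMAP_NS}}}loc")
        if loc is not None and (loc.text or "").strip() in excluded:
            root.remove(url)
    return ET.tostring(root, encoding="unicode")


# Made-up sample data for illustration
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/</loc></url>
  <url><loc>http://example.com/thank-you</loc></url>
</urlset>"""

cleaned = strip_urls(sample, {"http://example.com/thank-you"})
print(cleaned)
```

You would still have to re-run it and upload the result each time the sitemap is regenerated, so it only saves the hand editing.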
I am surprised there is not a plugin for this, with a checkbox on each page to omit it from the sitemap.

You can make your own plugin to filter the sitemap if you know a little bit of PHP.

https://github.com/GetSimpleCMS/GS_unit_...sample.php
if you create a plugin, please share
While I'm learning a little PHP, I don't know enough yet to make a plugin. Would gladly do it if I knew how.

Maybe we could try suggesting this feature to the author of the MetaRobots Plugin.

Update: I just posted a message on the plugin's forum asking for this feature.
I posted the plugin already, you just have to change the filtering logic.
Hi Shawn.

I really appreciate your effort, but to be honest I'm a little lost as to what to do. My PHP knowledge is pretty limited (you could say almost nothing) and I'm not very familiar with GS PHP functions.

I'm going to need to polish my PHP skills a bit and familiarize myself with plugin development in GS. But I'll give it a try.
I just made a small plugin (attached). It requires the MetaRobots plugin, of course. Let me know if it works.
It does not work with the I18N plugin.
Ah yes, I forgot to say. That metaRobots_sitemap plugin requires GetSimple 3.3.x and doesn't work with the I18N (base/navigation) plugin as of version 3.2.8

I suppose it may work with that sitemap patch I posted (only if it's a one-language site that doesn't use 'lang' in URLs).
I have a short memory, with your patch it works :)
I was thinking to just add a tag check in the sample filter plugin or something temporary.
Does i18n break the sitemap filter?
My plugin is based on your sample plugin, but runs thru $pagesArray to check the metarobots field.

I18N (as of 3.2.8) doesn't (and doesn't let GS) run the 'sitemap' filter. A possible patch here (Oleg says it works)
Oh yeah, I forgot the sitemap won't have the slug in it, so you can't just cross-reference it.
Good job. This would definitely benefit from a route cache of sorts; that's a lot of processing to do.

I just had a quick look over the RFC, and this part is good to know:
"Extending the Sitemaps protocol
You can extend the Sitemaps protocol using your own namespace. Simply specify this namespace in the root element. "
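As a rough illustration of what that quote allows, here is a standalone Python sketch that writes a sitemap with an extra element in a custom namespace, so each entry carries its slug and can be cross-referenced directly. The namespace URI, the gs prefix, the slug element, and the build_sitemap helper are all hypothetical; this is not something GetSimple or any plugin in this thread produces.

```python
import xml.etree.ElementTree as ET

# Standard sitemap namespace from sitemaps.org
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Hypothetical custom namespace, invented for this example
GS_NS = "http://example.com/schemas/gs-sitemap/1.0"


def build_sitemap(pages):
    """Build a sitemap from (slug, url) pairs, adding a custom <gs:slug> per entry."""
    ET.register_namespace("", SITEMAP_NS)  # sitemap elements stay unprefixed
    ET.register_namespace("gs", GS_NS)     # custom elements get the gs: prefix
    root = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for slug, loc in pages:
        url = ET.SubElement(root, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        ET.SubElement(url, f"{{{GS_NS}}}slug").text = slug
    # ElementTree declares both namespaces on the root element when serializing
    return ET.tostring(root, encoding="unicode")


# Made-up pages for illustration
xml_out = build_sitemap([
    ("index", "http://example.com/"),
    ("thank-you", "http://example.com/thank-you"),
])
print(xml_out)
```

With the slug in the sitemap itself, a filter would not need to reverse-map URLs back to pages.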
Thanks guys for the plugin. I'll test it ASAP and share the results.
Yes, it's working great on my GS sites. Thanks Carlos and Shawn.

Note: I'm not using the i18n plugin on my sites, so I can't comment on that issue.
Anyone feel free to take the plugin I posted and adapt it to I18N multilanguage. It currently works properly with I18N only on one-language sites, as I already commented here.
I don't have the time for this right now. (But please PM or tell me if you intend to do it, just in case some day I can...)