GetSimple Support Forum

Full Version: UTF-8, still using HTML entities in navigation markup output?
Hi,

I'm a new user from Germany testing GetSimple. Liking it lots so far.

I noticed that German umlauts (ä, ü, ö, ...) in a navigation link's title attribute and link text are rendered as HTML entities in GetSimple's markup output. Since both the .htaccess file and my own encoding meta tag declare UTF-8, that seems redundant. It only seems to happen in the navigation here. I'm using GS 2.0.3.1. Edit: plus the transliteration plugin.

Code:
<li class="ueber current"><a href="http://domain.info/ueber/" title="&Uuml;ber mich">&Uuml;ber mich</a></li>

Keep up the good work! :)

Thorsten
When you create a new page, give it a URL without special characters,
or use this plugin: http://get-simple.info/extend/plugin/slu...ration/33/
Thanks for your reply,

I forgot to mention that I am indeed using the transliteration plugin which cleaned up the page slug itself fine. I'm using the default call for building the navigation.
I've checked this out, and I fail to see where the problem is. In the source you may see the encoded characters, but why is this a bad thing?

We've had discussions about this very thing before (encoded chars in the page's body) and no one has ever provided proof that this is a bad thing. From what I know, I think Google will be able to read something like this:

Code:
German &Auml;&auml;&Ouml;&ouml;&Uuml;&uuml;
ccagle8 Wrote: I've checked this out, and I fail to see where the problem is. In the source you may see the encoded characters, but why is this a bad thing?
...
Code:
German &Auml;&auml;&Ouml;&ouml;&Uuml;&uuml;

It makes a lot of operations on the page a bit more difficult, e.g.
  • Extracting the words for search - you have to decode the entities first
  • Extracting the first n characters for an excerpt - the number of characters will be off if you don't decode first
  • Splitting a long page into multiple pages.
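
To illustrate the excerpt point, here is a minimal PHP sketch. The sample string and the helper name excerpt_chars are made up for this example and are not part of GetSimple; it only assumes the text is UTF-8.

```php
<?php
// Hypothetical helper (not a GetSimple function): decode entities first,
// then cut by characters instead of bytes.
function excerpt_chars(string $html_text, int $length): string {
    $decoded = html_entity_decode($html_text, ENT_QUOTES, 'UTF-8');
    // The /u modifier makes "." match whole UTF-8 characters, not single bytes.
    preg_match('/^.{0,' . $length . '}/us', $decoded, $match);
    return $match[0] ?? '';
}

// Made-up sample text with entity-encoded umlauts.
$encoded = 'German &Auml;&auml;&Ouml;&ouml;&Uuml;&uuml; umlauts';

// Cutting the encoded string directly ends in the middle of an entity.
echo substr($encoded, 0, 12), "\n";        // prints: German &Auml

// Decoding first yields 12 real characters.
echo excerpt_chars($encoded, 12), "\n";    // prints: German ÄäÖöÜ
```

Without decoding, the 12-byte cut breaks an entity and shows only a fragment of the first umlaut, which is the "number of characters will be off" problem.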

Together with these terrible slashes, I had to get the content for I18N Search like this (it works, but is it correct?):

Code:
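// Undo the stored htmlspecialchars encoding and the slashes, drop the tags, then decode the remaining entities to UTF-8.
$content = html_entity_decode(strip_tags(stripslashes(htmlspecialchars_decode($pagedata->content))), ENT_QUOTES, 'UTF-8');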
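
For readers who want to follow the one-liner step by step, here it is unrolled as a rough sketch. The function name plain_text_from_stored and the sample string are hypothetical; the sample assumes the stored content is htmlspecialchars-encoded with added slashes, which is only inferred from the calls above, not checked against GetSimple's storage code.

```php
<?php
// Hypothetical helper: same calls, same order as the one-liner above.
function plain_text_from_stored(string $stored): string {
    // 1. Turn &lt; &gt; &amp; back into < > & (one pass only).
    $html = htmlspecialchars_decode($stored);
    // 2. Remove the backslashes added by addslashes-style escaping.
    $html = stripslashes($html);
    // 3. Drop the markup, keeping only the text.
    $text = strip_tags($html);
    // 4. Decode what is left (e.g. &Uuml;) into UTF-8 characters.
    return html_entity_decode($text, ENT_QUOTES, 'UTF-8');
}

// Assumed example of stored content: <p>&Uuml;ber mich, that's me</p>
$stored = "&lt;p&gt;&amp;Uuml;ber mich, that\\'s me&lt;/p&gt;";

echo plain_text_from_stored($stored), "\n"; // prints: Über mich, that's me
```

The order seems to matter: running strip_tags before the final html_entity_decode means a literal "<" that the author typed as text (still an entity at that point) is not mistaken for the start of a tag.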

I think it's impossible to get the page content with its tags intact but with every entity other than the htmlspecialchars ones decoded, so the pagify plugin doesn't bother.

BTW: I hope the person responsible for add/stripslashes and its "automagical" usage is never allowed to define a feature of a programming language again ;-)
I'll research this further from an encoding/SEO point-of-view when I find the time.

Thanks for chipping in on this from a programming point-of-view, mvlcek.
Someone brought up the same problem in the system's editor. Connie suggested a change to the editor config which turns off character replacement in the editor. That solves the editor problem at least, and makes it easier to edit text with lots of special characters.