GetSimple Support Forum
UTF-8, still using HTML entities in navigation markup output?


UTF-8, still using HTML entities in navigation markup output? - polyfragmented - 2011-03-08

Hi,

I'm a new user from Germany testing GetSimple. Liking it lots so far.

I noticed that German umlauts (ä, ü, ö, ...) in a navigation link's title attribute and link text are rendered as HTML entities in GetSimple's markup output. Since both the .htaccess file and my own encoding meta tag specify UTF-8, that seems redundant. It only seems to happen in the navigation here. Using GS 2.0.3.1. Edit: plus the transliteration plugin.

Code:
<li class="ueber current"><a href="http://domain.info/ueber/" title="&Uuml;ber mich">&Uuml;ber mich</a></li>

Keep up the good work! Smile

Thorsten


UTF-8, still using HTML entities in navigation markup output? - yojoe - 2011-03-08

When you create a new page, name its URL without special characters,
or use this plugin: http://get-simple.info/extend/plugin/slug-transliteration/33/


UTF-8, still using HTML entities in navigation markup output? - polyfragmented - 2011-03-08

Thanks for your reply,

I forgot to mention that I am indeed using the transliteration plugin which cleaned up the page slug itself fine. I'm using the default call for building the navigation.


UTF-8, still using HTML entities in navigation markup output? - polyfragmented - 2011-03-10

I filed a bug report at http://code.google.com/p/get-simple-cms/issues/detail?id=142


UTF-8, still using HTML entities in navigation markup output? - ccagle8 - 2011-03-13

I've checked this out, and I fail to see where the problem is. In the source you may see the encoded characters, but why is this a bad thing?

We've had discussions about this very thing before (encoded chars in the page's body) and no one has ever provided proof that it is a bad thing. As far as I know, Google will be able to read something like this:

Code:
German &Auml;&auml;&Ouml;&ouml;&Uuml;&uuml;



UTF-8, still using HTML entities in navigation markup output? - mvlcek - 2011-03-13

ccagle8 Wrote: I've checked this out, and I fail to see where the problem is. In the source you may see the encoded characters, but why is this a bad thing?
...
Code:
German &Auml;&auml;&Ouml;&ouml;&Uuml;&uuml;

It makes a lot of operations on the page a bit more difficult, e.g.
  • Extracting the words for search - you have to decode the entities first
  • Extracting the first n characters for an excerpt - the number of characters will be off, if you don't decode
  • Splitting a long page into multiple pages.
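
The first two points can be illustrated with a small sketch. The helper names `excerpt_decoded` and `search_words` and the sample string are made up for illustration and are not part of GetSimple or any plugin; the code also assumes the mbstring extension is available.

```php
<?php
// Hypothetical helpers for illustration; not part of GetSimple or I18N Search.
// Assumes the mbstring extension is loaded (needed for mb_substr).

// Cut an excerpt by characters, decoding entities first so that "&Uuml;"
// counts as one character instead of six.
function excerpt_decoded($html, $length)
{
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES, 'UTF-8');
    return mb_substr($text, 0, $length, 'UTF-8');
}

// Split the decoded text into words on anything that is not a letter or digit.
function search_words($html)
{
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES, 'UTF-8');
    return preg_split('/[^\p{L}\p{N}]+/u', $text, -1, PREG_SPLIT_NO_EMPTY);
}

$encoded = '<p>&Uuml;ber mich</p>';

// Without decoding, the cut lands inside the entity.
$naive = substr(strip_tags($encoded), 0, 4);

// With decoding, the cut lands after four real characters.
$clean = excerpt_decoded($encoded, 4);

$words = search_words($encoded);
```

Here `$naive` ends up as the broken fragment `&Uum`, while `$clean` is `Über` and `$words` contains `Über` and `mich`.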

Together with these terrible slashes, I had to get the content for I18N Search like this (it works, but is it correct?):

Code:
$content = html_entity_decode(strip_tags(stripslashes(htmlspecialchars_decode($pagedata->content))), ENT_QUOTES, 'UTF-8');

I think it's impossible to get the page content with its tags intact but with everything other than the htmlspecialchars characters decoded, so the pagify plugin doesn't attempt it.
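
For what it's worth, one conceivable approach is to build a decode table from PHP's own translation tables and leave the markup-significant characters out of it. This is only a sketch under assumptions: `decode_text_entities` is a hypothetical name, the input is assumed to have already gone through `stripslashes(htmlspecialchars_decode(...))`, and numeric references such as `&#252;` are not handled. Whether it fits GetSimple's stored content is not confirmed anywhere in this thread.

```php
<?php
// Hypothetical helper (not part of GetSimple): decode named entities such as
// &Uuml; while leaving &amp; &lt; &gt; &quot; and &#039; untouched, so the
// tags and attribute values of the markup stay valid.
function decode_text_entities($html)
{
    $flags = ENT_QUOTES | ENT_HTML401;

    // character => entity, for all named entities and for the special chars
    $all = get_html_translation_table(HTML_ENTITIES, $flags, 'UTF-8');
    $special = get_html_translation_table(HTML_SPECIALCHARS, $flags, 'UTF-8');

    // Drop the markup-significant characters, then invert the table so it
    // maps entity => character.
    $table = array_flip(array_diff_key($all, $special));

    // strtr replaces in a single pass, so "&amp;Uuml;" is not decoded twice.
    return strtr($html, $table);
}

$nav = decode_text_entities('<a title="&Uuml;ber &amp; mehr">&Uuml;ber mich</a>');
```

With this, `$nav` keeps the `<a>` tag and the `&amp;` as they are and only turns `&Uuml;` into `Ü`.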

BTW: I hope the person responsible for add/stripslashes and its "automagical" usage is never allowed to define the functionality of a programming language again ;-)


UTF-8, still using HTML entities in navigation markup output? - polyfragmented - 2011-03-14

I'll research this further from an encoding/SEO point-of-view when I find the time.

Thanks for chipping in on this from a programming point-of-view, mvlcek.


UTF-8, still using HTML entities in navigation markup output? - polyfragmented - 2011-03-31

Someone brought up the same problem in the system's editor. Connie suggested a change to the editor config which turns off character replacement in the editor. That solves the editor problem at least and makes edits easier with lots of special characters.