UTF-8, still using HTML entities in navigation markup output?
#1
Hi,

I'm a new user from Germany, testing GetSimple. I like it a lot so far.

I noticed that German umlauts (ä, ü, ö, ...) in a navigation link's title attribute and link text are rendered as HTML entities in GetSimple's markup output. Since both the .htaccess file and my own encoding meta tag declare UTF-8, the entities seem redundant. This only seems to happen in the navigation. I'm using GS 2.0.3.1 (edit: plus the transliteration plugin).

Code:
<li class="ueber current"><a href="http://domain.info/ueber/" title="&Uuml;ber mich">&Uuml;ber mich</a></li>
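Entity output like this is what PHP's `htmlentities()` produces, while `htmlspecialchars()` only escapes the characters that are special in HTML (& " ' < >) and leaves UTF-8 umlauts alone. A minimal sketch comparing the two; it assumes, without having checked GetSimple 2.0.3.1's source, that the navigation code escapes titles with `htmlentities()`:

```php
<?php
// Illustrative title; assumption: the nav code escapes it with htmlentities()
$title = 'Über mich';

// htmlentities() replaces every character that has a named entity, so Ü becomes &Uuml;
$withEntities = htmlentities($title, ENT_QUOTES, 'UTF-8');

// htmlspecialchars() only escapes & " ' < > and keeps the UTF-8 umlaut as-is
$specialOnly = htmlspecialchars($title, ENT_QUOTES, 'UTF-8');

echo $withEntities, "\n"; // &Uuml;ber mich
echo $specialOnly, "\n";  // Über mich
```

On a page served as UTF-8, the second form is still valid HTML and keeps the umlauts readable in the source.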

Keep up the good work! :)

Thorsten
#2
When you create a new page, give it a URL without special characters,
or use this plugin: http://get-simple.info/extend/plugin/slu...ration/33/
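For reference, the kind of mapping a transliteration plugin applies to a slug can be sketched like this. `make_slug()` and its umlaut table are hypothetical illustrations, not the plugin's actual code:

```php
<?php
// Hypothetical helper: transliterate German umlauts and ß to ASCII,
// then reduce everything else to a lowercase, hyphen-separated slug.
function make_slug(string $title): string
{
    $map = [
        'Ä' => 'Ae', 'Ö' => 'Oe', 'Ü' => 'Ue',
        'ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue',
        'ß' => 'ss',
    ];
    // strtr() with an array replaces whole substrings, so multi-byte UTF-8 keys work
    $ascii = strtolower(strtr($title, $map));
    // Collapse each run of characters outside a-z and 0-9 into a single hyphen
    $ascii = preg_replace('/[^a-z0-9]+/', '-', $ascii);
    return trim($ascii, '-');
}

echo make_slug('Über mich'), "\n"; // ueber-mich
```

This only affects the slug; the page title used for the navigation text and title attribute keeps its umlauts, which is where the entities from the first post appear.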
Addons: blue business theme, Online Visitors, Notepad
#3
Thanks for your reply,

I forgot to mention that I am indeed using the transliteration plugin, which cleaned up the page slug itself just fine. I'm using the default call for building the navigation.
#4
I filed a bug report at http://code.google.com/p/get-simple-cms/...ail?id=142
#5
I've checked this out, and I fail to see the problem. In the page source you may see the encoded characters, but why is that a bad thing?

We've discussed this very thing before (encoded characters in the page body), and no one has ever provided proof that it is a bad thing. As far as I know, Google should be able to read something like this:

Code:
German &Auml;&auml;&Ouml;&ouml;&Uuml;&uuml;
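A decoder (a browser or a crawler) does treat both spellings as the same text; the difference only shows up when code works on the raw stored string. A minimal PHP sketch using the line above:

```php
<?php
// The entity-encoded line above and its plain UTF-8 equivalent
$encoded = 'German &Auml;&auml;&Ouml;&ouml;&Uuml;&uuml;';
$plain   = 'German ÄäÖöÜü';

// After decoding the entities, the text is identical...
var_dump(html_entity_decode($encoded, ENT_QUOTES, 'UTF-8') === $plain); // bool(true)

// ...but the raw strings that PHP code sees are different
var_dump($encoded === $plain); // bool(false)
```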
- Chris
Thanks for using GetSimple! - Download

Please do not email me directly for help regarding GetSimple. Please post all your questions/problems in the forum!
#6
ccagle8 wrote: I've checked this out, and I fail to see where the problem is. From the source, you may see the encoded characters, but why is this a bad thing?
...
Code:
German &Auml;&auml;&Ouml;&ouml;&Uuml;&uuml;

It makes many operations on the page content more difficult, e.g.:
  • Extracting the words for search: you have to decode the entities first
  • Extracting the first n characters for an excerpt: the character count will be off unless you decode first
  • Splitting a long page into multiple pages.
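The excerpt problem is easy to reproduce. A minimal sketch with an illustrative title, assuming the mbstring extension is available:

```php
<?php
// The same heading, stored with an entity and as plain UTF-8
$encoded = '&Uuml;ber mich';
$plain   = 'Über mich';

// Character counts differ: "&Uuml;" alone counts as 6 characters
echo mb_strlen($encoded, 'UTF-8'), "\n"; // 14
echo mb_strlen($plain, 'UTF-8'), "\n";   // 9

// A naive 4-character excerpt cuts the entity in half
echo mb_substr($encoded, 0, 4, 'UTF-8'), "\n"; // &Uum

// Decoding first gives a sensible excerpt
$decoded = html_entity_decode($encoded, ENT_QUOTES, 'UTF-8');
echo mb_substr($decoded, 0, 4, 'UTF-8'), "\n"; // Über
```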

Together with those terrible slashes, this is how I had to get the content for I18N Search (it works, but is it correct?):

Code:
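// Unescape the stored markup, remove slashes and tags, then decode the remaining entities as UTF-8
$content = html_entity_decode(strip_tags(stripslashes(htmlspecialchars_decode($pagedata->content))), ENT_QUOTES, 'UTF-8');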

I think it's impossible to get the page content with the tags intact but with everything except the htmlspecialchars entities decoded, so the pagify plugin doesn't care about entities.
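If such a partial decode is wanted outside GetSimple, one possible approach uses PHP's own translation tables. This is a minimal sketch, not something GetSimple or the pagify plugin provides; `decode_except_specialchars()` is a hypothetical name, and it assumes only named HTML 4.01 entities need handling (numeric entities such as &#252; are left as they are):

```php
<?php
// Hypothetical helper: decode named entities like &Uuml; while keeping tags
// and the five htmlspecialchars entities (&amp; &lt; &gt; &quot; &#039;) escaped.
function decode_except_specialchars(string $html): string
{
    $flags = ENT_QUOTES | ENT_HTML401;
    // Both tables map character => entity
    $all     = get_html_translation_table(HTML_ENTITIES, $flags, 'UTF-8');
    $special = get_html_translation_table(HTML_SPECIALCHARS, $flags, 'UTF-8');
    // Remove the special characters, then flip to entity => character
    $map = array_flip(array_diff_key($all, $special));
    return strtr($html, $map);
}

echo decode_except_specialchars('<p>&Uuml;ber &amp; mich</p>'), "\n";
// <p>Über &amp; mich</p>
```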

BTW: I hope the person responsible for add/stripslashes and its "automagical" usage is never allowed to define the functionality of a programming language again ;-)
I18N, I18N Search, I18N Gallery, I18N Special Pages - essential plugins for multi-language sites.
#7
I'll research this further from an encoding/SEO point of view when I find the time.

Thanks for chipping in on this from a programming point of view, mvlcek.
#8
Someone brought up the same problem in the built-in editor. Connie suggested a change to the editor configuration that turns off its character replacement. That solves the problem in the editor at least, and makes editing text with lots of special characters easier.