Bug: htmlentities messes up code blocks
#1
Greek characters are supported by Unicode. If entered as regular text, they behave no differently from any other text.
If entered in a <code> block, however, they are displayed in their entity form. This results in
Code:
&pi; = 3.14
&phi; = 1.62

instead of

Code:
π = 3.14
φ = 1.62

The problem lies in the safe_slash_html() function in admin/inc/basic.php, which uses htmlentities(). htmlentities() is identical to htmlspecialchars() in every respect, except that htmlentities() translates every character that has an HTML character entity equivalent into that entity.
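A minimal PHP sketch of that difference, using the same flags and charset that safe_slash_html() passes. The sample string is made up for illustration, and the script assumes it is saved as UTF-8:

```php
<?php
// Sample input mixing Greek letters and accented Latin letters (UTF-8).
$input = "π = 3.14, φ = 1.62, á ä";

// htmlentities() converts every character that has a named HTML entity.
$entities = htmlentities($input, ENT_QUOTES, 'UTF-8');

// htmlspecialchars() only converts &, <, >, " and ' (the latter with ENT_QUOTES).
$special = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

echo $entities, "\n"; // &pi; = 3.14, &phi; = 1.62, &aacute; &auml;
echo $special, "\n";  // π = 3.14, φ = 1.62, á ä
```

Only the htmlentities() result contains &pi; and &phi;, which is what ends up stored and later shown inside the code block.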

Fix:

FIND
Code:
function safe_slash_html($text) {
    if (get_magic_quotes_gpc()==0) {
        $text = addslashes(htmlentities($text, ENT_QUOTES, 'UTF-8'));
    } else {
        $text = htmlentities($text, ENT_QUOTES, 'UTF-8');
    }
    return $text;
}

REPLACE WITH
Code:
function safe_slash_html($text) {
    if (get_magic_quotes_gpc()==0) {
        $text = addslashes(htmlspecialchars($text, ENT_QUOTES, 'UTF-8'));
    } else {
        $text = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
    }
    return $text;
}
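The patch keeps markup and quotes escaped, because htmlspecialchars() still converts &, <, >, " and '. The sketch below checks that. It uses a hypothetical stand-in named safe_slash_html_sketch() and assumes PHP 7 or later. get_magic_quotes_gpc() was removed in PHP 8.0, and magic quotes had been gone since PHP 5.4, so only the addslashes() branch is kept:

```php
<?php
// Hypothetical stand-in for the patched safe_slash_html(), assuming magic
// quotes are off (always the case on PHP 5.4+), so addslashes() always runs.
function safe_slash_html_sketch(string $text): string {
    return addslashes(htmlspecialchars($text, ENT_QUOTES, 'UTF-8'));
}

// Markup, ampersands and quotes are still neutralised...
$markup = safe_slash_html_sketch('<a href="x">O\'Reilly & co</a>');
// ...while non-ASCII letters are stored as plain UTF-8.
$greek = safe_slash_html_sketch('π = 3.14');

echo $markup, "\n"; // &lt;a href=&quot;x&quot;&gt;O&#039;Reilly &amp; co&lt;/a&gt;
echo $greek, "\n";  // π = 3.14
```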
#2
Thanks for the info.

Did you work with GS 3.0 or with the beta version, GS 3.1?
|--

Das deutschsprachige GetSimple-(Unter-)Forum:   http://get-simple.info/forums/forumdisplay.php?fid=18
#3
GetSimple 3.0
#4
Please test GetSimple 3.1B, as some changes have been made there.
#5
Just tested with version 3.1B r646 RC3.
The problem is still there.
#6
intelx86 Wrote: Greek characters are supported by Unicode. If entered as regular text, they behave no differently from any other text.
If entered in a <code> block, however, they are displayed in their entity form. This results in
Code:
&pi; = 3.14
&phi; = 1.62
instead of
[...]

You mean in the source code of a page?
I think that HTML entities are generated everywhere in the page content, not only between <code> and </code> tags. It also happens with characters like á (&aacute;), ä (&auml;), etc.

It's not a serious problem: the browser renders the page the same way, so the user doesn't notice.

...But anyway, I like your suggested patch: the page's source code is more readable if it is in German, French, Spanish, etc. (and I believe it wouldn't break anything)
#7
The problem arose because I use Markdown syntax when writing pages. Markdown is essentially a front end that parses the content of each page, so it is "silly" to store entities in the XML files for characters that Unicode fully supports.
#8
Ah, are you using Zegnåt's Markdown plugin...?
#9
Yes. The problem actually shows up with the plugin, but at root it is a structural problem in GetSimple (the use of htmlentities instead of htmlspecialchars).

If Greek characters are entered in a plain <pre><code></code></pre> block with Markdown disabled, they are displayed as Greek characters, not as entities, even though entities are used for storage.
#10
Strange, I get the same entities in the HTML source for Greek (or German, Spanish, etc.) characters both inside and outside <pre><code></code></pre> blocks.

(Anyway, as I said, I like your patch. I think that making that change in the core would be good.)
#11
@intelx86, thanks for the patch; I've only just seen it.

I've updated the SVN with it.
My Github Repos: Github
Website: DigiMute