What's wrong with encoding?
#1
The browser shows the site correctly, but when I open View -> Page Source, it shows:

http://www.kotos.kylos.pl/a/encode.png
kotos.net - webdesign / dtp / graphics / photography
#2
There is nothing wrong at all. It swaps characters with their corresponding HTML entities. What I do find odd though is how it only does it to some and not to all non-ASCII characters... I guess this means the editor has a simple list of letters that it changes to the (sometimes better) entities.

It should work in every browser that supports any version of HTML after 3.2 and it should also validate as valid HTML, so there is no problem, right?
“Don’t forget the important ˚ (not °) on the a,” says the Unicode lover.
Help us test a key change for the core! ¶ Problems with GetSimple? Be sure to enable debug mode!
#3
I think it is a problem because search engines and bots read the site's source code, not the page as it looks in a browser.
For example, this is how the site looks to the Google bot:
http://www.kotos.kylos.pl/a/encode2.png

It is also important for indexing the site's keywords and for its ranking.
#4
It should not matter for the crawlers. If it does, then the crawler is not supporting HTML and in that case you shouldn't even care for it.

In the case of Google, they are big and have very complex software to crawl the web. They support HTML entities. They even support them if you put them in the searchbox!

As early as 2004 the Search Engine Roundtable already stated that they don't hurt SEO, and back then the crawlers were less complex than today.

Add this up with the fact that certain signs must be encoded for your pages to be valid XHTML (like the ampersand &) and I think we can say that search engines do not care.
#5
... I hope you are right.
#6
I must disagree, this is a huge bug! And it is not caused by the editor: even when you turn the editor off, it still encodes the characters. It is wrong decoding of the XML files by the GetSimple engine.
#7
It's not a bug. It's a PHP/XML thing. Everything is perfectly fine.
http://nijikokun.com
random stuff. idk.
#8
A PHP/XML thing IS a bug... The solution is dead simple: PHP has a function called html_entity_decode. This needs to be solved, otherwise the CMS is SEO-unfriendly for multiple languages...
#9
someone Wrote:php/xml thing IS bug...
Not a bug. HTML entities are recommended for XML (as XML does not support a full set of characters). HTML entities, as the name might imply, are made to be used in HTML. Again, not a bug.
someone Wrote:solution is deadly simple php got function called html_entity_decode.
Looks like something that might take away the entities. They do make a point about UTF8 on the function page though, so that might need looking into.

If you do choose to use this, I hope it does not decode &. This will break XHTML.
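
For illustration, here is a minimal PHP sketch of one way to take the named entities away while keeping & escaped: decode everything with html_entity_decode, then re-escape only the markup characters with htmlspecialchars. The helper name decode_but_keep_specials is made up for this example and is not part of GetSimple, and the sketch assumes the stored value is a plain text field (such as a title), not HTML markup.

```php
<?php
// Hypothetical helper for illustration only; not part of GetSimple.
function decode_but_keep_specials($stored)
{
    // Turn every entity (named or numeric) back into a UTF-8 character.
    $decoded = html_entity_decode($stored, ENT_QUOTES, 'UTF-8');

    // Escape &, <, >, " and ' again so the result stays valid in XHTML.
    return htmlspecialchars($decoded, ENT_QUOTES, 'UTF-8');
}

// The e-acute entity becomes a real character, the ampersand stays escaped.
echo decode_but_keep_specials('Caf&eacute; &amp; bar'), "\n";
```

Run on real page content this would also escape the tags themselves, so it is only meant to show that & does not have to be lost when the other entities go.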
someone Wrote:it is necessary to solve this otherwise cms is SEO unfriendly for mupltiple languages...
Please show me where you get this from. After all these SEO complaints I would like to see a single link to a credible page informing me about this. I already made my case clear a bit earlier in this topic.
#10
Hi, I'd like to reopen this. It is a problem for me and my users because my language (Czech) uses diacritics, and the automatic conversion to entities causes a lot of trouble when editing the HTML source of articles, as it becomes practically unreadable :(

I know that this is your decision, but it would really make life simpler for me and my users if there was an option in gsconfig.php that would toggle the encoding of entities.

I know that &, < and > must be encoded according to the standard, but characters like šěřžýíé... are valid UTF-8 characters and need not be encoded. I therefore propose that the alternative encoding would encode only &, < and >.
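
To make the difference concrete, here is a small PHP sketch comparing the two standard functions on the same input. The sample string is made up for this example, and the file is assumed to be saved as UTF-8.

```php
<?php
// Sample input: two letters with diacritics plus characters that must be escaped.
// This file is assumed to be saved as UTF-8.
$text = 'šé & <b>';

// htmlentities() replaces every character that has a named HTML entity.
$all = htmlentities($text, ENT_QUOTES, 'UTF-8');

// htmlspecialchars() only touches &, <, > and the quote characters.
$only = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

echo $all, "\n";   // &scaron;&eacute; &amp; &lt;b&gt;
echo $only, "\n";  // šé &amp; &lt;b&gt;
```

With htmlspecialchars the diacritics stay readable in the stored source, while the characters that would break the markup are still escaped.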

What do you think? Thank you for your time and this wonderful little CMS!
#11
I have looked at the code and managed to fix the behavior for me. The fix consisted of the following 2 steps:

Replacing htmlentities with htmlspecialchars when saving the page - in admin/changedata.php

Setting the ckeditor to not encode diacritics as entities:
- to admin/template/js/ckeditor/config.js add
Code:
config.entities = false;
config.entities_greek = false;
config.entities_latin = false;
config.entities_processNumerical = false;


A patch against the current SVN version is here:

Code:
Index: admin/changedata.php
===================================================================
--- admin/changedata.php    (revision 9)
+++ admin/changedata.php    (working copy)
@@ -86,15 +86,15 @@
         $file = GSDATAPAGESPATH . $url .".xml";
        
         // format and clean the responses
-        if(isset($_POST['post-title'])) { $title = htmlentities($_POST['post-title'], ENT_QUOTES, 'UTF-8'); }
-        if(isset($_POST['post-metak'])) { $metak = htmlentities($_POST['post-metak'], ENT_QUOTES, 'UTF-8'); }
-        if(isset($_POST['post-metad'])) { $metad = htmlentities($_POST['post-metad'], ENT_QUOTES, 'UTF-8'); }
+        if(isset($_POST['post-title'])) { $title = htmlspecialchars($_POST['post-title'], ENT_QUOTES, 'UTF-8'); }
+        if(isset($_POST['post-metak'])) { $metak = htmlspecialchars($_POST['post-metak'], ENT_QUOTES, 'UTF-8'); }
+        if(isset($_POST['post-metad'])) { $metad = htmlspecialchars($_POST['post-metad'], ENT_QUOTES, 'UTF-8'); }
         if(isset($_POST['post-template'])) { $template = $_POST['post-template']; }
         if(isset($_POST['post-parent'])) { $parent = $_POST['post-parent']; }
-        if(isset($_POST['post-menu'])) { $menu = htmlentities($_POST['post-menu'], ENT_QUOTES, 'UTF-8'); }
+        if(isset($_POST['post-menu'])) { $menu = htmlspecialchars($_POST['post-menu'], ENT_QUOTES, 'UTF-8'); }
         if(isset($_POST['post-menu-enable'])) { $menuStatus = "Y"; } else { $menuStatus = ""; }
         if(isset($_POST['post-private'])) { $private = "Y"; } else { $private = ""; }
-        if(isset($_POST['post-content'])) { $content = htmlentities($_POST['post-content'], ENT_QUOTES, 'UTF-8'); }
+        if(isset($_POST['post-content'])) { $content = htmlspecialchars($_POST['post-content'], ENT_QUOTES, 'UTF-8'); }
        
         if(isset($_POST['post-menu-order']))
         {
Index: admin/template/js/ckeditor/config.js
===================================================================
--- admin/template/js/ckeditor/config.js    (revision 9)
+++ admin/template/js/ckeditor/config.js    (working copy)
@@ -8,4 +8,8 @@
     // Define changes to default configuration here. For example:
     // config.language = 'fr';
     // config.uiColor = '#AADC6E';
+    config.entities = false;
+    config.entities_greek = false;
+    config.entities_latin = false;
+    config.entities_processNumerical = false;
};
#12
Just use firefox , ...
#13
keff Wrote:an option [in] gsconfig.php that would toggle encoding of entities.

What do yo think?
I think we should be able to offer this as an option, that would make sense. On the other hand, as you said, it’s only a problem when you go and edit the page content by hand. That’s of course not something we had planned for a lot of users to be doing.

A note for everyone who’s thinking of putting that SVN patch in place: it does not offer you an option, it will always disable character encoding.
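
As a rough idea of what an actual option could look like, here is a minimal PHP sketch. Both the GSENCODEENTITIES constant and the encode_field helper are hypothetical names invented for this example; neither exists in GetSimple.

```php
<?php
// Hypothetical setting: GetSimple does not define GSENCODEENTITIES.
// A line like this is what a user would put in gsconfig.php.
if (!defined('GSENCODEENTITIES')) {
    define('GSENCODEENTITIES', false);
}

// Hypothetical helper that the save code could call for each posted field.
function encode_field($value, $encodeAll = null)
{
    // Fall back to the configured setting when no explicit flag is passed.
    if ($encodeAll === null) {
        $encodeAll = defined('GSENCODEENTITIES') && GSENCODEENTITIES;
    }

    return $encodeAll
        ? htmlentities($value, ENT_QUOTES, 'UTF-8')       // encode everything with a named entity
        : htmlspecialchars($value, ENT_QUOTES, 'UTF-8');  // encode only &, <, > and quotes
}

echo encode_field('é & x'), "\n";        // uses the configured setting
echo encode_field('é & x', true), "\n";  // forces full entity encoding
```

The save code would then call the helper once per field instead of calling htmlentities directly, so the behavior follows the setting rather than being hard-coded either way.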

xseoer Wrote:Just use firefox, …
How would that solve anything?