Include the slug transliteration plugin into the core of GS
#1
Since including essential plugins in the core is mentioned on the roadmap, I'd like to propose that the slug transliteration plugin be included.

I believe it's important for any non-English user whose alphabet contains special glyphs. In my experience, clients expect this to happen automagically.
#2
I've heard that this plugin has its own problems... Has there been any trouble with it in the past? What happens if we start using this plugin after not using it for so long?

E.g.: will a site running 3.0 have any residual problems if it upgrades to 3.1 with this plugin activated by default?
- Chris
Thanks for using GetSimple! - Download

Please do not email me directly for help regarding GetSimple. Please post all your questions/problems in the forum!
#3
CC: With the Polish translation for GS 3.0 I've been supporting the transliteration plugin, and it would be nice to hard-code its functionality into the GS core (I asked Zegnat to do this a long time ago). It has its own problems in some cases (I don't remember exactly which ones now), but when someone creates a new page with special characters in the page title (without setting the slug in the page options), the CMS should automatically transliterate those characters to plain ones when a translation file is in use.
I'd even insist on embedding a special-character table for all languages, provided by users, in the GS core to prevent the URI from being cut off during page creation.

I'm not sure whether Google offers an API for a character conversion table.
Addons: blue business theme, Online Visitors, Notepad
#4
http://code.google.com/p/get-simple-cms/...ail?id=195
- Chris
#5
Nice, Chris \o/
#6
CC: Give us a ping when you embed this function.
I'm not sure about other translations, but if the character conversion table won't be embedded in the translation file, tell us what to do.
#7
yojoe - I am confused looking at this plugin of Zegnat's. Does anyone feel comfortable helping me implement it in the core? I can always try to hunt down Zegnat too...

I see no reason to change where the TRANSLITERATION values come from (it looks like they come from the respective language file).
- Chris
#8
ccagle8 Wrote:I am confused looking at this plugin of Zegnåt’s. Does anyone feel comfortable helping me implement this into the core? I can always try and hunt down Zegnat too…
Oh joy, I’ve not been forgotten yet. It seems nobody has even been found to replace me ;-)

I’ll update the issue over at Google Code with an explanation of the code soonish.

Quote:UPDATE: An explanation of the plugin code and working has been posted.

ccagle8 Wrote:I’ve heard that this plugin has it’s own problems…
Problems were reported, but they were very specific edge cases. Most of them could never be reproduced by other users (including me), which made them hard to fix. Of course, some of the problems can be circumvented just by integrating this into the core: due to the nature of plugins, the transliteration code ran after GetSimple had already processed everything.

ccagle8 Wrote:It looks like [the transliteration values] are coming from the respective language file
Yes. That was the best way I could think of to make these values extendable by users. It’s not feasible to have hard-coded tables for all possible languages, especially because transliteration differs between languages. An example would be German and Swedish: in German, letters with tremata (¨) are often transliterated with an added -e (ö → oe), but in Swedish people often opt to just remove the tremata (ö → o).
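A minimal sketch of why per-language tables matter, assuming each language file supplies its own TRANSLITERATION array (only that key name comes from the thread; the variables $de and $sv and the helper transliterate() are hypothetical stand-ins for the German and Swedish language files). The same letter ö produces different slugs depending on the table:

```php
<?php
// Hypothetical excerpts from two language files.
$de = array('TRANSLITERATION' => array('ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue', 'ß' => 'ss'));
$sv = array('TRANSLITERATION' => array('å' => 'a', 'ä' => 'a', 'ö' => 'o'));

// strtr() with an array replaces whole (multi-byte) keys, trying the longest key first,
// so UTF-8 keys and multi-letter replacements both work.
function transliterate(string $text, array $lang): string
{
    return strtr($text, $lang['TRANSLITERATION']);
}

echo transliterate('Köln', $de), "\n";  // Koeln
echo transliterate('Malmö', $sv), "\n"; // Malmo
```

Keeping the table in the language file means a translator can change a single entry without touching any PHP.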

yojoe Wrote:I'm not sure if google doesn't offer an API for char conversion table.
Google offers an API as part of the Language API Family. I’m not sure whether we could implement it though, due to its JavaScript nature.
“Don’t forget the important ˚ (not °) on the a,” says the Unicode lover.
Help us test a key change for the core! ¶ Problems with GetSimple? Be sure to enable debug mode!
#9
I've been searching today for the simplest transliteration methods, as this is a must-have feature in GS: GS is search-engine friendly, but not 100% so without this option.

CC: Have a look at the chr() function,
but it would need to use the extended Latin character codes:
http://jrgraphix.net/research/unicode_blocks.php

I've also seen strtr() or a plain str_replace() used to swap UTF-8 characters for ASCII ones, for example:
Code:
$translit = array(
    'Š'=>'S', 'š'=>'s', 'Ž'=>'Z', 'ž'=>'z', 'À'=>'A', 'Á'=>'A', 'Â'=>'A', 'Ã'=>'A', 'Ä'=>'A', 'Å'=>'A', 'Æ'=>'A', 'Ç'=>'C', 'È'=>'E', 'É'=>'E',
    'Ê'=>'E', 'Ë'=>'E', 'Ì'=>'I', 'Í'=>'I', 'Î'=>'I', 'Ï'=>'I', 'Ñ'=>'N', 'Ò'=>'O', 'Ó'=>'O', 'Ô'=>'O', 'Õ'=>'O', 'Ö'=>'O', 'Ø'=>'O', 'Ù'=>'U',
    'Ú'=>'U', 'Û'=>'U', 'Ü'=>'U', 'Ý'=>'Y', 'Þ'=>'B', 'ß'=>'Ss', 'à'=>'a', 'á'=>'a', 'â'=>'a', 'ã'=>'a', 'ä'=>'a', 'å'=>'a', 'æ'=>'a', 'ç'=>'c',
    'è'=>'e', 'é'=>'e', 'ê'=>'e', 'ë'=>'e', 'ì'=>'i', 'í'=>'i', 'î'=>'i', 'ï'=>'i', 'ð'=>'o', 'ñ'=>'n', 'ò'=>'o', 'ó'=>'o', 'ô'=>'o', 'õ'=>'o',
    'ö'=>'o', 'ø'=>'o', 'ù'=>'u', 'ú'=>'u', 'û'=>'u', 'ü'=>'u', 'ý'=>'y', 'þ'=>'b', 'ÿ'=>'y'
);
// strtr() with an array replaces whole UTF-8 byte sequences, longest key first
$string = strtr($string, $translit);
Code:
function remove_accent($str)
{
  $a = array('À', 'Á', 'Â', 'Ã', 'Ä', 'Å', 'Æ', 'Ç', 'È', 'É', 'Ê', 'Ë', 'Ì', 'Í', 'Î', 'Ï', 'Ð', 'Ñ', 'Ò', 'Ó', 'Ô', 'Õ', 'Ö', 'Ø', 'Ù', 'Ú', 'Û', 'Ü', 'Ý', 'ß', 'à', 'á', 'â', 'ã', 'ä', 'å', 'æ', 'ç', 'è', 'é', 'ê', 'ë', 'ì', 'í', 'î', 'ï', 'ñ', 'ò', 'ó', 'ô', 'õ', 'ö', 'ø', 'ù', 'ú', 'û', 'ü', 'ý', 'ÿ', 'Ā', 'ā', 'Ă', 'ă', 'Ą', 'ą', 'Ć', 'ć', 'Ĉ', 'ĉ', 'Ċ', 'ċ', 'Č', 'č', 'Ď', 'ď', 'Đ', 'đ', 'Ē', 'ē', 'Ĕ', 'ĕ', 'Ė', 'ė', 'Ę', 'ę', 'Ě', 'ě', 'Ĝ', 'ĝ', 'Ğ', 'ğ', 'Ġ', 'ġ', 'Ģ', 'ģ', 'Ĥ', 'ĥ', 'Ħ', 'ħ', 'Ĩ', 'ĩ', 'Ī', 'ī', 'Ĭ', 'ĭ', 'Į', 'į', 'İ', 'ı', 'Ĳ', 'ĳ', 'Ĵ', 'ĵ', 'Ķ', 'ķ', 'Ĺ', 'ĺ', 'Ļ', 'ļ', 'Ľ', 'ľ', 'Ŀ', 'ŀ', 'Ł', 'ł', 'Ń', 'ń', 'Ņ', 'ņ', 'Ň', 'ň', 'ŉ', 'Ō', 'ō', 'Ŏ', 'ŏ', 'Ő', 'ő', 'Œ', 'œ', 'Ŕ', 'ŕ', 'Ŗ', 'ŗ', 'Ř', 'ř', 'Ś', 'ś', 'Ŝ', 'ŝ', 'Ş', 'ş', 'Š', 'š', 'Ţ', 'ţ', 'Ť', 'ť', 'Ŧ', 'ŧ', 'Ũ', 'ũ', 'Ū', 'ū', 'Ŭ', 'ŭ', 'Ů', 'ů', 'Ű', 'ű', 'Ų', 'ų', 'Ŵ', 'ŵ', 'Ŷ', 'ŷ', 'Ÿ', 'Ź', 'ź', 'Ż', 'ż', 'Ž', 'ž', 'ſ', 'ƒ', 'Ơ', 'ơ', 'Ư', 'ư', 'Ǎ', 'ǎ', 'Ǐ', 'ǐ', 'Ǒ', 'ǒ', 'Ǔ', 'ǔ', 'Ǖ', 'ǖ', 'Ǘ', 'ǘ', 'Ǚ', 'ǚ', 'Ǜ', 'ǜ', 'Ǻ', 'ǻ', 'Ǽ', 'ǽ', 'Ǿ', 'ǿ');
  $b = array('A', 'A', 'A', 'A', 'A', 'A', 'AE', 'C', 'E', 'E', 'E', 'E', 'I', 'I', 'I', 'I', 'D', 'N', 'O', 'O', 'O', 'O', 'O', 'O', 'U', 'U', 'U', 'U', 'Y', 's', 'a', 'a', 'a', 'a', 'a', 'a', 'ae', 'c', 'e', 'e', 'e', 'e', 'i', 'i', 'i', 'i', 'n', 'o', 'o', 'o', 'o', 'o', 'o', 'u', 'u', 'u', 'u', 'y', 'y', 'A', 'a', 'A', 'a', 'A', 'a', 'C', 'c', 'C', 'c', 'C', 'c', 'C', 'c', 'D', 'd', 'D', 'd', 'E', 'e', 'E', 'e', 'E', 'e', 'E', 'e', 'E', 'e', 'G', 'g', 'G', 'g', 'G', 'g', 'G', 'g', 'H', 'h', 'H', 'h', 'I', 'i', 'I', 'i', 'I', 'i', 'I', 'i', 'I', 'i', 'IJ', 'ij', 'J', 'j', 'K', 'k', 'L', 'l', 'L', 'l', 'L', 'l', 'L', 'l', 'L', 'l', 'N', 'n', 'N', 'n', 'N', 'n', 'n', 'O', 'o', 'O', 'o', 'O', 'o', 'OE', 'oe', 'R', 'r', 'R', 'r', 'R', 'r', 'S', 's', 'S', 's', 'S', 's', 'S', 's', 'T', 't', 'T', 't', 'T', 't', 'U', 'u', 'U', 'u', 'U', 'u', 'U', 'u', 'U', 'u', 'U', 'u', 'W', 'w', 'Y', 'y', 'Y', 'Z', 'z', 'Z', 'z', 'Z', 'z', 's', 'f', 'O', 'o', 'U', 'u', 'A', 'a', 'I', 'i', 'O', 'o', 'U', 'u', 'U', 'u', 'U', 'u', 'U', 'u', 'U', 'u', 'A', 'a', 'AE', 'ae', 'O', 'o');
  // replace each entry of $a with the entry at the same index in $b
  return str_replace($a, $b, $str);
}

But then GS would need a ready-to-go array of special characters, and anyone who'd like to use their own character conversion table could place it in the language file, where it would take priority over the array hard-coded in GS.
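A minimal sketch of that priority rule, assuming a hypothetical built-in $defaultTable in the core and an optional table from the language file (both names, and the helper effective_table(), are illustrative only). PHP's array union operator (+) keeps the left-hand keys, so entries from the language file win and anything missing falls back to the core table:

```php
<?php
// Hypothetical "ready to go" table shipped with the core.
$defaultTable = array('ä' => 'a', 'ö' => 'o', 'ü' => 'u', 'ß' => 'ss');

// Hypothetical table from a German language file; it overrides some defaults.
$languageTable = array('ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue');

// Union: keys present in $languageTable take precedence over $defaultTable.
function effective_table(array $languageTable, array $defaultTable): array
{
    return $languageTable + $defaultTable;
}

$table = effective_table($languageTable, $defaultTable);
echo strtr('Größe', $table), "\n"; // Groesse
```

The union operator is used rather than array_merge() because array_merge() renumbers integer keys, and PHP stores digit-only string keys such as '1' as integers.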
#10
I'd suggest using the 3-parameter form strtr(string, from, to) instead; then you only have to include two strings, TRANSLIT_FROM and TRANSLIT_TO, in the language files and don't need to define an extra array.
This makes customization much easier and uses the standard i18n features of GetSimple:

Code:
...
'TRANSLIT_FROM' => 'äöüÄÖÜ',
'TRANSLIT_TO' => 'aouAOU',
...
I18N, I18N Search, I18N Gallery, I18N Special Pages - essential plugins for multi-language sites.
#11
Lately I've been getting headaches with the strtr() and mkdir() functions and characters in 'abcd' format: they produce an awful sausage of characters in folder names.
That's why I prefer str_replace() with an array using => assignments.
But if you already use strtr successfully, I have nothing against that approach Smile

Hopefully the backend theme is UTF-8, so there shouldn't be problems converting character codes (I get chills every time I hear about iconv/mbstring).
#12
yojoe Wrote:Thus I'm more for str_replace with array using assignment operators.
But if you already use strtr succesfully, I have nothing against this way Smile

Now that you mention it: I haven't tried it, and I suppose it won't work with UTF-8 :-(

However, I would still define the translation as described above in the language files (no additional variable, translate plugin works, ...) and then split it into arrays using a UTF-8 function.
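A minimal sketch of that splitting step, assuming both strings hold exactly one character per position (the helper name translit_table() is hypothetical). preg_split() with the /u modifier splits on UTF-8 characters rather than bytes; on PHP 7.4+ mb_str_split() does the same. The resulting pairs are then fed to the array form of strtr():

```php
<?php
// Build a strtr() array from two equally long UTF-8 strings, position by position.
function translit_table(string $from, string $to): array
{
    $fromChars = preg_split('//u', $from, -1, PREG_SPLIT_NO_EMPTY);
    $toChars   = preg_split('//u', $to, -1, PREG_SPLIT_NO_EMPTY);
    if (count($fromChars) !== count($toChars)) {
        throw new InvalidArgumentException('TRANSLIT_FROM and TRANSLIT_TO differ in length');
    }
    return array_combine($fromChars, $toChars);
}

$table = translit_table('äöüÄÖÜ', 'aouAOU');
echo strtr('Müller Öl', $table), "\n"; // Muller Ol
```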

PHP: the greatest pains are UTF-8, the automatic escaping of quotes, and finding out whether the string/array is the first parameter or not in a specific string or array function. Feel free to add to this list ;-)
#13
mvlcek Wrote:
yojoe Wrote:Thus I'm more for str_replace with array using assignment operators.
But if you already use strtr succesfully, I have nothing against this way Smile

Now that you mention it: I didn't try it and I suppose it won't work with UTF-8 :-(

However, I would still define the translation as described above in the language files (no additional variable, translate plugin works, ...) and then split it into arrays using a UTF-8 function.

There's another argument for str_replace: Martijn's translit plugin uses it successfully Wink
From what I've found, strtr() needs characters in Latin-1 encoding (maybe from the extended character tables), so strings would first have to be converted from UTF-8. The working but harsh method is to convert using chr() along with preg_replace() <- a true pain in the ***.
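A small demonstration of the byte-level issue (illustration only, not code from any plugin): the three-argument strtr() maps single bytes to single bytes, which is why it only behaves with single-byte encodings such as Latin-1. In UTF-8, 'ö' is two bytes, so it cannot be mapped onto the one byte 'o'. The array form compares whole byte sequences and works fine with UTF-8:

```php
<?php
// 'ö' is encoded as the two bytes C3 B6 in UTF-8.
$word = 'Köln';

// Three-argument form: byte by byte; surplus bytes in the longer argument are ignored,
// so only C3 becomes 'o' and the orphaned B6 byte corrupts the string.
$broken = strtr($word, 'ö', 'o');
echo bin2hex($broken), "\n"; // 4b6fb66c6e

// Array form: whole byte sequences are replaced, so multi-byte keys are safe.
$fixed = strtr($word, array('ö' => 'o'));
echo $fixed, "\n"; // Koln
```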

Quote:PHP: the greatest pains are utf-8, the automatic escaping of quotes and finding out, if the string/array is the first parameter or not in a specific string or array function. Feel free to add to this list ;-)

...encoding/decoding, forcing, converting, switching between different character sets, setting locales... there are only problems with all those special characters. Add databases to that...

After looking at all the existing PHP functions for character encoding, I felt stupid.
Existing solutions barely work; every system needs its own approach <- the essence of multilingual web systems.
#14
yojoe Wrote:There’s another argument for str_replace: Martijn’s translit plugin uses it successfully Wink
Of course it does! How could I ever be wrong Wink
mvlcek Wrote:include two strings TRANSLIT_FROM and TRANSLIT_TO in the language files and don't need to define an extra array.
This makes customization much easier and uses the standard i18n features of GetSimple:
Code:
...
'TRANSLIT_FROM' => 'äöüÄÖÜ',
'TRANSLIT_TO' => 'aouAOU',
...
It also disables some languages from using the transliteration function. Japanese, Korean, Chinese, et al. come to mind. These languages need multiple Latin characters per symbol, which is impossible when you are just matching 2 strings.

As of now, the development copy of GetSimple on Google Code has built-in support for the TRANSLITERATION array. Please test it and let us know about any problems!
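A minimal sketch of how a slug routine could apply the TRANSLITERATION array before stripping everything else, assuming the table is already loaded into $i18n and that slugs are limited to lowercase a-z, 0-9 and hyphens (the function name make_slug() and that character rule are assumptions, not GetSimple's actual code). Because each value can be any string, one symbol can become several Latin letters, which also covers the Japanese case mentioned above:

```php
<?php
// Hypothetical language-file content; multi-letter targets are allowed.
$i18n = array(
    'TRANSLITERATION' => array('æ' => 'ae', 'ß' => 'ss', 'す' => 'su', 'し' => 'shi'),
);

function make_slug(string $title, array $i18n): string
{
    // 1. Transliterate first, so the characters are not simply dropped later.
    if (isset($i18n['TRANSLITERATION']) && is_array($i18n['TRANSLITERATION'])) {
        $title = strtr($title, $i18n['TRANSLITERATION']);
    }
    // 2. Lowercase (ASCII only) and turn every run of other bytes into one hyphen.
    $slug = preg_replace('/[^a-z0-9]+/', '-', strtolower($title));
    // 3. Trim leading/trailing hyphens.
    return trim($slug, '-');
}

echo make_slug('Große Straße', $i18n), "\n"; // grosse-strasse
echo make_slug('すし', $i18n), "\n";          // sushi
```

A character missing from the table is simply dropped by this sketch ('Über' becomes 'ber'), which is the cut-off problem described earlier in the thread.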
#15
Martijn,
I was just thinking... how does this affect someone who already has your translit plugin installed?

Also... it would be great if we could all be as lucky...
Quote:Been away, been busy, but currently trying to make a come back.
- Chris
#16
ccagle8 Wrote:I was just thinking… how does this effect someone that already has your translit plugin installed?
Doesn’t affect them at all.

This is actually worse than it sounds. It doesn’t affect them because the transliteration plugin will keep on doing what it does: the plugin works directly on $_POST and won’t know anything has changed. So when you upgrade to this newer version, it is best to say good-bye to the plugin. (No, nothing is affected by removing the plugin.)

ccagle8 Wrote:also.. this would be great if would all could be as lucky...
Quote:Been away, been busy, but currently trying to make a come back.
Very true. Well, summer break is almost here for me. After that comes (hopefully) my first year of university, and there’s no telling how that’s going to affect my online time.
#17
Zegnåt Wrote:
yojoe Wrote:
mvlcek Wrote:include two strings TRANSLIT_FROM and TRANSLIT_TO in the language files and don't need to define an extra array.
This makes customization much easier and uses the standard i18n features of GetSimple:
Code:
...
'TRANSLIT_FROM' => 'äöüÄÖÜ',
'TRANSLIT_TO' => 'aouAOU',
...
It also disables some languages from using the transliteration function. Japanese, Korean, Chinese, et al. come to mind. These languages need multiple Latin characters per symbol, which is impossible when you are just matching 2 strings.

These strings would be UTF-8, thus all languages are supported. The transliteration function would need to split the string with mb_substr (or similar) to create the arrays needed for strtr (or whatever). As this splitting is only done when saving pages, there is no performance hit.
#18
mvlcek Wrote:These strings would be UTF-8, thus all languages are supported. The transliteration function would need to split the string with mb_substr (or similar) to create the arrays needed for strtr (or whatever). As this splitting is only done when saving pages, there is no performance hit.
(Emphasis mine.)

Splitting is exactly the problem; take the following example:
Code:
...
'TRANSLITERATION' => array('æ'=>'ae','ꝛ'=>'r','ß'=>'ss'),
'TRANSLIT_FROM' => 'æꝛß',
'TRANSLIT_TO' => 'aerss',
...
The array works and will transliterate æ to ae and ß to ss. The strings will not work, they will turn æ into a (swapping the first characters) and ß into r (swapping the third characters). There is no way to teach a string splitter when 2 characters might go together.
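A quick demonstration of that failure with the same example values, assuming a character-wise split via preg_split() with the /u modifier (the pairing code is a sketch, not taken from any plugin). Pairing characters by position yields the wrong mapping, while the associative array keeps each replacement together:

```php
<?php
$from = 'æꝛß';
$to   = 'aerss';

// Split both strings into UTF-8 characters: 3 and 5 characters respectively.
$fromChars = preg_split('//u', $from, -1, PREG_SPLIT_NO_EMPTY);
$toChars   = preg_split('//u', $to, -1, PREG_SPLIT_NO_EMPTY);

// Pair by position, ignoring the surplus characters of the longer list.
$n = min(count($fromChars), count($toChars));
$pairs = array_combine(array_slice($fromChars, 0, $n), array_slice($toChars, 0, $n));
// $pairs now maps æ => a, ꝛ => e, ß => r

$transliteration = array('æ' => 'ae', 'ꝛ' => 'r', 'ß' => 'ss');
echo strtr('æꝛß', $pairs), ' vs ', strtr('æꝛß', $transliteration), "\n"; // aer vs aerss
```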
#19
Zegnåt Wrote:
mvlcek Wrote:These strings would be UTF-8, thus all languages are supported. The transliteration function would need to split the string with mb_substr (or similar) to create the arrays needed for strtr (or whatever). As this splitting is only done when saving pages, there is no performance hit.
(Emphasis mine.)

Spitting is exactly the problem, take the following example:
Code:
...
'TRANSLITERATION' => array('æ'=>'ae','ꝛ'=>'r','ß'=>'ss'),
'TRANSLIT_FROM' => 'æꝛß',
'TRANSLIT_TO' => 'aerss',
...
The array works and will transliterate æ to ae and ß to ss. The strings will not work, they will turn æ into a (swapping the first characters) and ß into r (swapping the third characters). There is no way to teach a string splitter when 2 characters might go together.

OK, I didn't think about replacing one character with several. You could of course use a comma-separated string like:
Code:
'TRANSLIT_FROM' => 'æ,ꝛ,ß',
'TRANSLIT_TO' => 'ae,r,ss',
Using an associative array makes it easier to match the from and to characters; it just introduces a special case in the translation file, which I personally don't like ;-)
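A minimal sketch of the comma-separated variant, assuming TRANSLIT_FROM and TRANSLIT_TO are plain strings in the language file as in the example above (the helper name translit_from_lists() is hypothetical). explode() on the ASCII comma is safe in UTF-8 because the comma byte never occurs inside a multi-byte character; the catch is that the comma itself can no longer be transliterated this way:

```php
<?php
$i18n = array(
    'TRANSLIT_FROM' => 'æ,ꝛ,ß',
    'TRANSLIT_TO'   => 'ae,r,ss',
);

// Build a strtr() array from two comma-separated lists of equal length.
function translit_from_lists(string $from, string $to): array
{
    $keys   = explode(',', $from);
    $values = explode(',', $to);
    if (count($keys) !== count($values)) {
        throw new InvalidArgumentException('TRANSLIT_FROM and TRANSLIT_TO have a different number of entries');
    }
    return array_combine($keys, $values);
}

$table = translit_from_lists($i18n['TRANSLIT_FROM'], $i18n['TRANSLIT_TO']);
echo strtr('æꝛß', $table), "\n"; // aerss
```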
#20
Chris has been kind enough to provide us with a new, pre-packaged beta version of GetSimple (r502), which includes slug transliteration. I’m going to close this topic now; any further discussion about this feature should take place on the Beta Testing board.

Remember that you should not use the plugin together with GetSimple 3.1! I updated the plugin page to make this clear.



