Epinions.com 

Ã disease

Nov 19 '04 (Updated Dec 27 '04)

The Bottom Line What’s going on & the workaround; I’m too sexy for my shirt. Now with Quick Overview of Common Codes.

“I’m too sexy for my shirt,
too sexy for my shirt
So sexy it hurts…”
- Right Said Fred

MESSED UP CHARACTERS

what is going wrong?
Simply put, your text is being messed up by epinions.com. This is the second time that epinions.com has messed up its members’ text.

The first time epinions.com messed with our texts was the Automatic Link Editing debacle. The havoc that the ill-conceived Automatic Link Editing Program (ALEP) wreaked wasn’t intentional, but the dumb decision to deploy it despite warnings not to do so, was.
The ALEP destroyed many perfectly fine URLs by turning them into erroneous non-URLs. It often turned just part of a URL into a link, leaving readers with utterly confusing, useless results. Authors of modified texts were not informed about these unauthorised changes, and requests to “Community Care” to repair the damage by undoing the destructive changes were ignored.

These last few weeks we’ve seen our texts being messed up again, but it is a different kind of problem. This time round there seems to be nothing deliberate about it. This time round, complaints have been posted on the Epinions.com Message Board for all other members to see, making it rather hard to ignore them.
Still, some two weeks after the first report of the defect, it has not been solved yet, and another thread about the continuing defect has just been started.

messed up how?
Just how is your text being messed up?
Simply put, any even slightly out-of-the-ordinary character is being changed into something else. Surprisingly, a single character is being changed into two different characters.

An example many epinionators have observed in more than one opinion, presumably because it is a popular word (now just why would that be?), is the word café. Instead of displaying «café» with an e with accent acute as it should, epinions.com displays «cafÃ©», replacing the lowercase accented e with an uppercase A with tilde followed by the copyright symbol.

In the first thread mentioned above, lorace reported that she got “a strange A with a line over it before the (r)symbol” and that the body of her text was “full of meaningless letters everywhere I had the (R) symbol”. In that same thread, tbhtorn reports that he gets “FrÃ©dÃ©rique” when he types “Frédérique”, and that, to avoid using the é, he types “Frederique with acutely accented e” instead.

In the second thread mentioned above, shilmafone tries to report that the lower case ae ligature, «æ», is being messed up, only to have that «æ» turned into «Ã¦», an uppercase A with tilde followed by a broken vertical bar.

WORKAROUND

how did you do that? the workaround
The defect still hasn’t been fixed, so an obvious question at this stage is “how did you do that?”.
How did I manage to write exactly what I wanted to write? How did I manage to describe the defect with an example without that example being messed up by the very defect it describes?

The simple answer to that question is that I took no chances, but used the correct codes to insert just the characters I wanted — and the best part of it all is that you can do the same.

Even better yet, an overview of the so-called character entity codes for any character you are ever likely to need, is available on epinions.com itself.
The second part of the OWF Trilogy, • real bullets • “true quotes” • ellipsis and… • actual sex symbols! •, lists the character entity codes for all ordinary characters, accented characters, monetary symbols, and then some.

That text is just what you need. You see, there are many overviews of characters on the Internet, but most of these are geared towards technical people, organising the characters by their codes. Quite a few of these include incorrect codes, codes that do not work in all browsers or codes that work in HTML but not in XHTML, often for the sake of completeness.

This text organises the characters into groupings geared towards writers instead of computers. It lists only correct codes that work in both HTML and XHTML and in any browser worthy of that name. It also lists quite a few characters most such tables do not, and as a bonus explains how to include any other character you fancy.

This handy dandy overview of character codes exists within the epi-approved epilinking space for easy reference, but the truly hot tip here is that you can easily locate that text again by googlewhacking “mobiprof sex symbol”.

WIDESPREAD

how widespread is this defect?
As far as a highly unscientific speedy scan of epinions.com indicates, all of epinions.com content is affected. The defect affects both new and existing opinions. It does not only affect opinions, but new and existing comments too. It does not just affect opinions and comments, but it affects the Message Board too. The defect seems to affect the whole of epinions.com. It affects pricetool.com in just the same way. The reposts of epinions.com reviews on the DealTime.com and shopping.com domains seem unaffected.

So, with all epinions.com content being affected by this defect, how is it possible that • real bullets • “true quotes” • ellipsis and… • actual sex symbols! • continues to display correctly?
What makes that opinion a rock-steady haven of character sanity in a sea of text turmoil? Why oh why wasn’t the “mobiprof sex symbol” text affected just like all other opinions have been affected?

Again, the simple answer is that that text has actually been painstakingly constructed using the very codes it lists, making it quite resistant to any character set meddling with it.

how not to do it

The continued unharmed existence of the sex symbol text proves that epinions.com is still quite capable of displaying all those characters, and the text itself tells you how to do it. That solves one issue, so let’s look at the other one now: how come it goes wrong anyway?

Well, not everyone is using those codes. If you simply include say a copyright symbol in your text right now, something quite a number of epinionators do near the end of their text, it will come out as Ã©, an uppercase A with tilde followed by a copyright symbol.
If you keep editing that text on epinions.com, it keeps getting worse and worse:
0. ©
1. é
2. é
3. é
4. é
5. é

These are the symptoms of the Ã disease.
All you need to do to get to see this sequence happen for yourself is enter a single copyright symbol to start with, choose to preview, choose to edit, choose to preview, etcetera, without ever really editing anything — yet your text will have changed every time you preview it again, and it will not have changed for the better.
The only way to just get that copyright symbol right now is to use the code listed in the aforementioned epinion.

SEMICOLON

what is going wrong?

I created the “mobiprof sex symbol” text for several reasons. One reason is explained in its first paragraph: pure popular demand. Ever since I had posted part 1 of the OWF Trilogy, I kept receiving questions about character codes. Most of these were of the “what is the code for…;” variety, and that text answers those.

Another reason was to counteract misinformation being spread by another member by providing correct information. What happened is that this member initially linked to the Objectionable Words Filter (OWF): Not Helpful text from her profile, but later created and posted a graphic image instead. That image showed codes for just A through Z and a through z. It did not provide codes for any accented characters, copyright symbols or such, but more important than that, the codes it did provide were incorrect codes.
I took the trouble to privately inform her that these codes were wrong. I, young idealist that I am (okay, okay, not really all that young anymore, but idealist anyway) was hoping that she would take the small trouble to correct the mistake and repost a corrected image, but she did not.
Although she now knew that image to be wrong, she continued to misinform other members. Providing a source of correct information seemed the best possible response. After all, the absence of that source was why I kept receiving these questions about character codes and why she created that erroneous image in the first place.

The particular mistake made in that image is that all the codes lack their terminating semicolon. All character entity codes start with an ampersand (&) and end with a semicolon (;). Removing the semicolon is not an option.

It is not hard to think of a situation where removing the semicolon messes up the result you are trying to achieve: any text that interlaces accented characters and semicolons. When some character is followed by a semicolon, and you try to use the erroneous code (sans semicolon) for that character, the semicolon that is supposed to follow it will mysteriously disappear, because the browser takes it to be the terminator of the code. Make no mistake about this: it really will be quite a mystery to you why it disappears. The incorrect character code table misinforms you that you used the right code, and you followed that with the semicolon too; the code is there and the semicolon is there, so how come the semicolon disappeared? You may find yourself wasting quite some time tearing your hair out before you even come to consider the possibility that the character code table you used is wrong.
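
The disappearing semicolon can be reproduced outside a browser. The following is a minimal sketch in Python, assuming that the standard library function html.unescape resolves character entity codes the way a forgiving browser does, including codes that lack their semicolon. The sample word and the variable names are made up for illustration.

```python
import html

# Intended display: the word "café" followed by a real semicolon.
correct = "caf&#233;;"      # complete code, then the semicolon we want to see
broken = "caf&#233" + ";"   # code sans semicolon, then the semicolon we want to see

# The complete code leaves the semicolon that follows it alone.
correct_result = html.unescape(correct)

# The incomplete code swallows the semicolon as its own terminator.
broken_result = html.unescape(broken)

print(correct_result)
print(broken_result)
```

In the second case the code is there and the semicolon is there, yet the semicolon never shows up in the result.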

EDITING

do not edit on epinions.com
The EpiHTML Editing Guide has a section titled DO NOT EDIT ON EPINIONS.COM.
The epi-editing advice I give there cannot be repeated too often:

“Do not edit on epinions.com. Do your editing in a word processor. Make all your changes in that word processor, and always resubmit the text from your word processor by copying & pasting it into epinions.com’s edit box.

Do not use the text epinions.com returns to you when you choose to edit your text.
The text you get back is not guaranteed to be the same as the text you put in.

Epinions.com may have changed character entities into characters, and characters into character entities. The result of these changes may very well be that text that was okay isn’t okay any longer.
That’s right, it is possible to submit text that epinions.com accepts, and see it turned into text that epinions.com does not accept when you choose to edit it…
Don’t ever try to fix the problems epinions.com creates, just make it a habit to always copy & paste afresh from your word processor.”

messing up characters
The details of just how exactly the edit mode messes up your opinion keep changing — and that is just another reason to not even try to deal with it, but to just keep editing in and copying & pasting from your word processor until your text submits correctly.

SANS SEMICOLON

One particular and completely unnecessary mistake I have observed for months now, and as recently as a few weeks ago, is that epinions.com turns simple, straight quotes into codes; you put straight quotes in, and you get codes back. That unnecessary modification of your text is weird and confusing at best, especially if you never use any codes yourself.
Alas, it is even worse than confusing, because the epinions.com programmer who thought it was a great idea to make this modification used incorrect codes. You probably guessed it by now: the character entity code sans semicolon. It makes you wonder what authoritative sources they use for reference…

As I have long since made it a habit to always copy & paste text from Word, and not edit on epinions.com, I would not have noticed this, if this had not actually changed my text. That’s right, because of this defect, my text had actually and noticeably been changed.

removing semicolons
The power of the character entity codes is that they mean just the same as the character they stand for. If epinions.com had just replaced the straight quote by its code (&#34;), I would not have noticed a thing. The problem is that this is not all that epinions.com did; epinions.com replaced the straight quote by its code and then removed the semicolon. You are not allowed to do that. That semicolon is an integral part of the code, and removing it is asking for trouble. Epinions.com removed it, I got the trouble.

Removing the semicolon turned the technically correct text into gibberish. Now, it is that programmer’s sheer luck that most browsers are quite forgiving and try to make the best out of any erroneous HTML code they encounter. That forgiving and flexible browser behaviour allowed the defect to exist without immediately breaking every other text, but that doesn’t mean it is harmless.
If the straight quote had been followed by a letter, the browser would have guessed correctly that the terminating semicolon behind the 4 was missing, would have rendered a straight quote, and continued on. However, it just so happened that this quote started with a number. The browser again did the best it could, interpreted several of the digits following the maimed character entity code as part of that code, rendered the Kanji character corresponding to the five-digit number (character entity &#34698;, Unicode character U+878A, 螊) and only then displayed the rest of the sentence, which now lacked its first three digits, thus changing the «"698,635» I wrote into «螊,635».
That’s what made me notice that they are still changing perfectly fine ordinary characters into codes and still removing semicolons.
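
For those who want to see the digits being swallowed for themselves, here is a small sketch in Python. It assumes that html.unescape from the standard library is a fair stand-in for the forgiving browser; the variable names are illustrative only.

```python
import html

intended = "&#34;698,635"     # complete code for the straight quote, then the number
maimed = "&#34698,635"        # the same text after the semicolon has been removed
before_letter = "&#34Hello"   # maimed code followed by a letter instead of a digit

# With its semicolon, the code ends where it should.
intended_result = html.unescape(intended)

# Without it, the digits 698 become part of the code: 34698 is U+878A.
maimed_result = html.unescape(maimed)

# A letter cannot be part of a numeric code, so the guess comes out right.
letter_result = html.unescape(before_letter)

print(intended_result, maimed_result, letter_result)
```

The maimed version loses both the straight quote and the first three digits, which is exactly the change described above.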

Kanji
By the way, if you do not see the Kanji character displayed correctly in the previous sentences, you may need to download and install a full Unicode font, reconfigure your browser, upgrade your browser, upgrade your OS, or all of the above. It’s an entirely different subject, but rest assured that there is an actual character there.

not being removed?
You may perhaps feel argumentative, feel like arguing that the semicolon is not being removed, that it is just never inserted. You could argue that it is the same difference, but the real point here is that the technically correct way of describing it helps you understand what’s actually happening and why things go wrong; epinions.com replaces a character by its character entity code and then removes the semicolon.

CHARACTER CODES

it is all about the character codes
This particular defect has been part of epinions.com for months, yet the problem with many characters suddenly changing into two different characters has been with us for just the last weeks. Obviously then, this isn’t the whole story. You could even wonder whether it is part of the story behind this changing one character into two different ones. It most definitely is. It is all about the character codes.

examples
Effectively, the defect replaces some characters by some others.
The following two examples were already mentioned:
é is replaced by Ã©
æ is turned into Ã¦
If you put these replacements in codes instead of characters, like this
&#233; is replaced by &#195;&#169;
&#230; is turned into &#195;&#166;
you notice that just as the difference between the codes for é and æ is 3, the difference between the codes for © and ¦ is 3. Both codes are replaced by wrong ones, but the difference between the wrong codes is still the same as the difference between the right codes.

64
More interesting, the difference between the codes for é and ©, &#233; and &#169;, is 64, as is the difference between the codes for æ and ¦, &#230; and &#166;. Sixty-four is one of those typical computer numbers, a power of two. It is two to the power of six.
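
The arithmetic is easy to check. A minimal sketch in Python, using the built-in ord function to look up the code values; the variable names are mine.

```python
# Code values of the four characters in the two examples.
e_acute = ord("\u00e9")         # é, 233
ae_ligature = ord("\u00e6")     # æ, 230
copyright_sign = ord("\u00a9")  # ©, 169
broken_bar = ord("\u00a6")      # ¦, 166

# The right codes and the wrong codes are the same distance apart.
right_gap = e_acute - ae_ligature
wrong_gap = copyright_sign - broken_bar

# Each right code sits 64 above the wrong code that replaces it.
shift_e = e_acute - copyright_sign
shift_ae = ae_ligature - broken_bar

print(right_gap, wrong_gap, shift_e, shift_ae, 2 ** 6)
```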

the replacement
Some testing with random characters suggests that the defect, which not only changes the character, but also keeps adding more and more characters in front of it, only affects characters with codes below 256, and that characters above that are safe from this code mangling.
Some of the first 256 characters are affected, but epinions.com does not mess up each and every character below code value 256, after all, A through Z and a through z display correctly, but it does mess up all the accented characters.

I’ve had a look at what it does to accented characters, and each time, the code it uses has a value 64 less than the value it should have, the semicolon has been removed and it keeps sticking those uppercase A’s with tilde in front of them.

It is always the uppercase A with tilde (Ã), never another character. And, when you edit and preview again, it is always the florin (ƒ) it keeps adding.
Did I mention that the code for an uppercase A with a tilde (Ã) is &#195;, the code for florin (ƒ) is &#131; and that the numerical difference is 64?
We are onto something here.

By the way, only when you use Windows code page 1252 does code 131 represent a florin. Codes in the range 128 through 159 should not be used at all.
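
A quick sketch in Python of what that means in practice. It decodes the single byte 131 twice, once as Windows code page 1252 and once as ISO 8859-1 (Latin-1), using Python's built-in codecs; the variable names are illustrative.

```python
import unicodedata

raw = bytes([131])

# In Windows code page 1252, byte 131 is the florin.
as_cp1252 = raw.decode("cp1252")

# In ISO 8859-1, the same byte is an invisible control character.
as_latin1 = raw.decode("latin-1")

print(unicodedata.name(as_cp1252))
print(unicodedata.category(as_latin1))  # "Cc" means control character
```

Same byte, different character set, different result, which is why codes in that range are best avoided.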

Ã…;
So here is what is happening to the lone lowercase e with accent acute we started with.

On the first pass through the edit and preview cycle, the é (&#233;) is changed into another character, and the Ã is prepended. The particular other character it is changed into is ©, the copyright symbol, because that happens to be the character at &#169;, the code that’s 64 less than &#233;.

On the next pass through the edit and preview cycle, there now are two characters to deal with, the Ã and the ©, and both get the same treatment, independent of each other. Thus, on this pass the Ã inserted in the previous pass is changed into Ãƒ; the Ã is changed into ƒ (numerically, the &#195; is changed into &#131;), and another Ã is prepended in front of it.

On the next pass, that Ã gets the same treatment again, and so on, and so on. That is why things keep getting worse with every edit and preview you go through.

Again, none of this will affect you if you follow the advice given in the EpiHTML Editing Guide and do not edit on epinions.com.

Have another good look at that list of edit and preview cycle results. On the second pass, there are two characters to deal with, the Ã and the ©.
On the second pass, the copyright symbol «©» is changed into «Â©»; the copyright symbol remains a copyright symbol, but an Â is prepended. Pay close attention; that is an Â (&#194;), an uppercase A with a circumflex accent, not an Ã, an uppercase A with a tilde. It is a small but significant difference.

It is always the A with tilde that gets prepended for the characters with codes 192 through 255 and it is always the A with circumflex that gets prepended for the characters with codes 128 through 191. And while the characters in code range 192 through 255 are replaced by a character with a code that’s 64 lower, the characters in range 128 through 191 remain unchanged.

SOME THEORIES

why Ã
Just why is epinions.com always adding an Ã or Â? What’s so special about these characters? Surely there is nothing special about that character, so it may just be random noise?
Then again, programmers often use hexadecimal code instead of decimal code. When we refer to Unicode characters by their code value, as I will do shortly, we do so using a capital U, and a plus symbol followed by the hexadecimal code value.
Could it be that some programmer had recognised that removing semicolons was a mistake, and was fiddling with the code to try and correct it, but, instead of appending a semicolon at the end, ended up prepending an Ã?

It’s not the unlikeliest explanation. The semicolon (&#059;) is U+003B, the uppercase A with tilde is U+00C3. It is not impossible that the semicolon’s code value was calculated by hand, an off-by-one error got 3C instead of 3B and an accidental transposition got C3 instead of 3C. That the semicolon-turned-Ã also moved from the end of the character code to the front of it, presumably at the hands of the same person, only strengthens the likelihood of 3C being transposed into C3. Off-by-one errors are a very common occurrence.

not unusual
None of these errors is unusual. Transposition and off-by-one errors are the kind of mistake that junior trainee programmers make all too often, but that even the most experienced programmers keep making all their lives. The important difference between the senior and junior programmer is that the seniors have made it a habit to keep checking their own code for just this kind of mistake, to keep most of these mistakes from even making it into test, let alone making it into production.

However, that this is a likely explanation does not mean it is the right one. For example, another explanation is that there are two programmers, one who decides that they need six places for a character entity code and another who calculates the character entity code to put in that space, but removes the terminating semicolon from that code, thus leaving only five characters to fill up a six-character space. These five characters are then placed in the last five of the six available positions, and the first position remains unfilled. However, each memory location always contains something, and it just so happens that the way all this is done, that first of six locations always contains the value for an Ã or an Â.

So, we understand now why it keeps getting worse and worse with each edit and preview cycle. We also have some theories on why epinions.com is adding an Ã.
We still need to explain why not every character is being mangled like this, and why there is a difference of 64.

A GOOD IDEA…;

why the mangling?
The question why not every character is being mangled is the wrong one. The real question is why some characters are being mangled at all. After all, not messing with the text is the normal behaviour, the default we tend to assume for and expect from epinions.com.

The answer to the question why some characters are being mangled is that epinions.com chooses to do so - and there is even a good reason why they do so.

character sets
You are using one particular character set, epinions.com may be using another and the reader of your opinion may be using yet another one. If you are using Windows, the text you submit with any such characters may look fine for most (not even all!) Windows users, yet look mangled on MacOS.
The saving grace is that most Western character sets, be they for Finnish, Greek or Russian, still have their most common characters in common. Those common characters do not need any conversion, and because of this, text often remains quite understandable when shown in the wrong character set. If you pay attention to this kind of thing, you may even learn to recognise particular opinions by some MacOS users by the particular wrong characters in the right places. They probably aren’t aware of that, because their text looked fine for them on their monitors.

When you are submitting text containing accented characters such as é, you are doing something you shouldn’t do, as you have no idea how it is going to look to your reader. I do know how the previous sentence is going to look to my reader, but then, I didn’t submit an «é», I submitted «&#233;» instead.

standards
The only way to make sure it works right for everyone is to use standards. The relevant standards for the Internet are HTML and Unicode. Unicode defines the character set and HTML defines how you can include any Unicode character using character entities.

Now, many epinions.com users have no idea about all this, they just type their text, submit it, and expect it to work. Epinions.com tries to accommodate them by converting any “offending” characters into the corresponding character entities. That transformation ensures that all viewers, whatever their operating system and whatever their browser, see the same characters you do.
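
Done right, such a conversion is not complicated. The sketch below is in Python and is not epinions.com's actual code, which I have not seen. It relies on the "xmlcharrefreplace" error handler that is built into Python's encoders, and the function name to_character_entities is made up for this example.

```python
def to_character_entities(text: str) -> str:
    """Replace every character outside plain ASCII by its numeric character
    entity code, complete with terminating semicolon; leave ASCII alone."""
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


converted = to_character_entities("caf\u00e9 \u00a9 2004")   # café © 2004
untouched = to_character_entities('"698,635"')                # straight quotes only

print(converted)
print(untouched)
```

Note that the straight quotes come back exactly as they went in, and that every code produced ends with its semicolon.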

looks just right
Epinions.com could actually try to detect the operating system you are using to try and guess the right conversion table, but it is probably not doing that. It probably just tried to make sure it works fine for Word for Windows users.
Just how they do it does not really matter right now, the fact remains that if you replace every problematic character by a character entity, the page should look the same for everyone. Thus, if the resulting page looks right for you, it will look right for everyone else. That’s definitely an improvement over the older situation where it always looked right for you, and you were kept unaware that it looked wrong for others. So, the replacement of any such special characters by character entities is actually a good thing.


not being done right
Well, it’s a good idea. The problem is that it’s not being done right;
• Characters that do not need to be replaced by character entities are being replaced anyway (straight quotes work just fine and do not need to be replaced by character entities at all).
• The terminating semicolon is removed from the codes. This ensures that the page will not validate as correct HTML. Browsers try to make the best of such erroneous HTML, but the results are not always as originally intended, and different browsers may guess differently. The only way to remain in control over the text is to provide the browser with correct and complete codes.
• Right now, a single character is being replaced by two characters.
And, oh..
• The edit box does not return what you put in.

Principle of Least Surprise
Let’s dwell on that last bullet point for a moment. Suppose the preview mode shows you a few errors and you decide to edit your text, but the text you get back isn’t the text you put in. So which text contains the errors? The text you put in, or the text you got back? My answer is that that is a question I do not want to have to deal with at all.

I do not want my straight quotes replaced by character entities, I do not want my character entities replaced by characters, and I certainly do not want all those mistakes inserted into my text. I already manage to make my own mistakes without epinions.com’s assistance, thank you very much.

Keep It Simple Stupid
I just want to get back what I put in, and it is not just me who wants that. Really now, every user wants that. In fact, giving back anything but what the user puts in is plain wrong. It violates the Principle of Least Surprise. Making changes to a user’s text when they did not ask you to do so definitely qualifies as surprising.
It really doesn’t matter to me how many transformations a text goes through before another user reads it, as a member I just want to be sure of two things:
• the preview mode displays texts identical to how the posted text will look
• the text I get back when I choose to edit is the text I put in, i.e. the text that resulted in that display, not a text that will result in yet another display
Anything that deviates from those two simple rules is a serious user interface design blunder.

REMAINING PUZZLES
There are still a few remaining puzzles, and they happen to be related;
• exactly which characters are being mangled, and which ones are not?
• why that 64 difference?

which characters
If you know even a tiny bit about character sets, you know that many of them have the first 128 characters in common. Most code pages are 256 characters large, and the differences between them are in the characters that have codes 128 through 255.
So, it makes sense for epinions.com to replace characters with these higher code values by character entities and to leave characters below 128 alone.

A few quick tests confirmed that that is indeed roughly what epinions.com is doing. Roughly, not exactly. All the accented characters that are being affected are in that 128 through 255 range. The Windows code page 1252 codes for smart quotes (“ and ”, &#147; and &#148;) are in that range, but the code for a straight quote (", &#34;) is not.
Moreover, characters with codes above 255, such as most of the monetary symbols, remain unconverted too, while these should definitely be converted into character entities to ensure identical display across operating systems and browsers (and the right codes to use for the smart quotes are &#8220; and &#8221;, not &#147; and &#148;).
Failure to handle these higher values suggests that the programmer of the defective code isn’t aware at all that there are characters with codes higher than 255.
Update: It seems that the epinions.com programmers have been reading this text (without rating or commenting) and all characters with codes 256 and higher are turned into character entities now.

the 64 difference
Failure to handle these higher values, as well as the unnecessary inclusion of the straight quote suggest to me that the conversion is handled through a table. A table that lists characters to be converted and the code they are to be converted into; if a character appears in that table, it is converted, and if it does not appear in that table, it remains unchanged.

Now, imagine a table like that, 256 entries long, for values 0 through 255, with each entry on a separate line. That’s a table 256 lines long, several pages of text. Unless you number the lines somehow, you are not going to be able to tell whether you are looking at entry 99, 100, or 101. Things get a lot easier, even without any numbering, when you add a few visual breaks, say a single empty line every 16 entries, and two empty lines every 64 entries. You might have opted for breaks at every 10 and 50 characters, but programmers typically opt for breaks at every 16 and 64 characters.

Typing such a large table would be a lot of repetitive work, but no smart programmer is going to type such a table line by line. After all, any editor worthy of the name supports copy & paste.

Of course, when you are editing such a table and copy a block of 64 codes, you may have the general layout and structure for another 64 entries right in no time, but you’ve still got the codes wrong. You must still edit those 64 character entities to be right for those entries…;

The table epinions.com is using now inserts exactly the same character entities sans semicolon preceded by an Ã for codes 192 through 255 as it inserts for codes 128 through 191. Verily, the entries for 192 through 255 are perfect copies of the entries for 128 through 191. So that’s the mistake responsible for this defect; the whole block of 64 codes has simply been copied…;

THE BEST EXPLANATION

theories so far
So far, I have presented two possible theories for the prepending of the Ã or Â and one theory for the replacements in the range 192 through 255 being perfect copies of those for 128 through 191.

one theory for two issues
It turns out that there is one theory that not only explains both issues at once, but also explains why you get either an Ã or an Â.
It was smjg who hit upon this theory after reading the earlier version of this text with the preceding theories. When I read his comment I could only think “Yes of course, that’s it”. It fits perfectly.

mapping
Consider the mapping I described again. Characters with codes 0 through 127 remain unchanged, while characters with codes 128 through 255 are mangled.
To be more precise, the 64 characters with codes 128 through 191 are replaced by Â followed by the character itself, and the 64 characters with codes 192 through 255 are replaced by Ã followed by a character with a code that’s 64 less than the character itself.
smjg recognised that mapping as UTF-8.

UTF-what?
UTF-8.
UTF-8 is a way to encode any Unicode character as a sequence of one or more 8-bit bytes. It is a cousin of UTF-7, UTF-16 and UTF-32.

the UTF-8 mapping
In the UTF-8 mapping, Unicode characters with a code in the range 128 through 2047 are represented by two bytes.
Here is a succinct description of the actual mapping in computerese: the first byte has value 0xC0 plus the bits of the character above its lower six (for the characters below 256 that concern us here, that is the high two bits), the second byte has value 0x80 plus the lower six bits of the character.

Here is a description of that same mapping in more common terms: a single value is replaced by two values, and we calculate those two values as follows:
• divide the value by 64 to get a result and remainder
• the first value is 192 plus that result
• the second value is 128 plus that remainder

For example, the code value of é is 233.
• 233 divided by 64 is 3, remainder 41
• the first value is 192 + 3 = 195
• and the second value is 128 + 41 = 169
Therefore, in the UTF-8 mapping, the single code value 233 is represented by the two consecutive bytes with values 195 and 169.
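
The divide-by-64 recipe above can be checked with a few lines of Python. This is only a sketch of the arithmetic as described in this text; the function name two_byte_utf8 is made up for the illustration, and the comparison uses Python's own UTF-8 encoder as the reference.

```python
def two_byte_utf8(code):
    """Return the two UTF-8 byte values for a code in the range 128..2047.

    Hypothetical helper: it follows the recipe in the text
    (divide by 64, add 192 to the result and 128 to the remainder).
    """
    if not 128 <= code <= 2047:
        raise ValueError("the two-byte form only covers codes 128 through 2047")
    result, remainder = divmod(code, 64)
    return 192 + result, 128 + remainder


# The example from the text: e with accent acute has code 233.
e_acute = two_byte_utf8(233)

# Python's built-in encoder, for comparison.
builtin = tuple("é".encode("utf-8"))
```

Both e_acute and builtin come out as the pair 195 and 169, the same two values calculated by hand above.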

Now, if you make the mistake of reading the UTF-8 encoded bytes as if they were just text, one character per byte, you end up interpreting the encoded é (&#233;) as Ã© (&#195;&#169;).
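
To see this misreading happen, here is a small Python sketch. It assumes the single-byte character set is ISO 8859-1 (Latin-1); that is my assumption for the illustration, not something epinions.com has confirmed. It encodes a word as UTF-8, reads the bytes back one character per byte, and then undoes the damage.

```python
original = "café"

# Step 1: encode as UTF-8; the é becomes the two bytes 195 and 169.
utf8_bytes = original.encode("utf-8")

# Step 2: the mistake, reading each byte as one Latin-1 character.
misread = utf8_bytes.decode("latin-1")

# Undoing the mistake: turn the characters back into bytes,
# then read those bytes as the UTF-8 they really are.
repaired = misread.encode("latin-1").decode("utf-8")
```

Here misread is «cafÃ©» and repaired is «café» again, which shows that a single misreading loses no information; it only displays the text wrongly.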

CHARACTER VERSUS CODE

character codes
It should be clear by now why things go wrong for some characters, while the corresponding character entities for the very same characters are not affected at all.

Each individual affected character is in the range 128 through 255, and will therefore be mangled by the current defect, but a character entity consists of several characters, all of which are themselves in the range 0 through 127, and will therefore remain untouched.

To use the example of the «é» again: it has code 233, that is a code in the range 128 through 255, and therefore it gets mangled. The corresponding character entity, «&#233;», consists of six characters, all of them with codes below 128.
It is your browser that recognises the character entity as such and then displays the corresponding character instead. The epinions.com program that replaces characters by character entity codes does not recognise anything but individual characters. All it sees are an &, a #, a 2, a 3, another 3 and a semicolon, all perfectly fine characters that do not need to be converted at all.
That is why using character entities in your opinions works fine while using the characters themselves does not.

That is also why, as many epinionators have already experienced, epinions.com keeps replacing the Ã, but does not replace the ƒ; the Ã is an actual character with code 195 (in the range 128 through 255), while the ƒ is a character entity (&#131;, just sans semicolon in epinions.com’s case) that your browser displays as ƒ.
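
The difference between a character and its character entity is easy to inspect in Python. The sketch below uses the standard html module as a stand-in for what a browser does with an entity; real browsers may differ in details, so take it as an illustration only.

```python
import html

entity = "&#233;"

# The six characters that make up the entity, and their codes.
codes = [ord(ch) for ch in entity]

# Every one of them is below 128, so the defect leaves them alone.
all_below_128 = all(code < 128 for code in codes)

# What a browser-like decoder turns the entity into.
shown = html.unescape(entity)

# The same decoder also accepts the entity without its semicolon,
# which is why the damaged entities still display.
shown_without_semicolon = html.unescape("&#131")
```

The codes are 38, 35, 50, 51, 51 and 59, shown is «é», and shown_without_semicolon is «ƒ».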

UPDATE LATE NOVEMBER

not fixed, but worse
If you follow the link to the first Epinions.com Message Board thread you will now see a message dated 2004 Nov 23 06:11 (epinions.com local time, not Internet time) by roheblius that “This should be fixed. Let me know if the fix shows up for everyone.”, followed three minutes later by a message from christal that “This bug [defect] is fixed and live!”, but not much more than an hour later these messages from epinions.com employees are followed by a message from trailhound: “I just posted a new review and the problem is still popping up. Anything posted with quote marks or apostrophes was especially bizarre looking!”.

I did some quick tests, and to my surprise, things have not been fixed at all, but have actually gotten worse. All the aforementioned defects are still in effect; epinions.com is still removing the semicolon from the character entity codes, is still prepending an Ã or Â and is still subtracting 64 from codes in the range 192 through 255.
However, a new and additional defect seems to have been added into this mix.

Æther
Consider the word «Aether» written with an AE-ligature (&#198;): «Æther».
When you try to edit your text, it has turned into «Ã&#134ther» (notice that the semicolon has been removed from &#134;), which displays as «Ã†ther». The AE-ligature has been replaced by an uppercase A with tilde followed by a dagger. On the next pass this turns into «Ã&#131&#134ther», which displays as «Ãƒ†ther». All that is exactly as already described, so nothing seems to have changed?
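
The two passes over «Æther» can be imitated with a short Python sketch. It is a model of the behaviour observed here, not epinions.com's actual program: mangle_once is a made-up name, and the assumption is that each character with a code of 128 through 255 becomes a literal lead character followed by a character entity without its semicolon.

```python
def mangle_once(text):
    """Imitate one edit and preview pass of the observed defect (a model only)."""
    pieces = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            # Plain characters, including the pieces of entities, pass through.
            pieces.append(ch)
        elif code < 256:
            lead = 192 + code // 64   # 194 (Â) or 195 (Ã)
            trail = 128 + code % 64   # the second UTF-8 byte value
            # Literal lead character, then the entity sans semicolon.
            pieces.append(chr(lead) + "&#%d" % trail)
        else:
            # Codes above 255 are outside what this sketch models.
            pieces.append(ch)
    return "".join(pieces)


first_pass = mangle_once("Æther")
second_pass = mangle_once(first_pass)
```

Under these assumptions first_pass is «Ã&#134ther» and second_pass is «Ã&#131&#134ther», matching what the edit box shows. Each further pass mangles the literal Ã again while the entities stay put, which is why the gibberish keeps growing.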

smart quotes
However, now consider «“smart quotes”» (&#147; and &#148;).
On the first pass, «“smart quotes”» turns into «â&#128&#156smart quotesâ&#128&#157», which displays as «â€œsmart quotesâ€».

What’s happening now? You soon notice that we did not get an uppercase A with tilde (Ã, &#195;), as before, but a lowercase a with circumflex (â, &#226;). The more interesting observation is that the smart quotes, themselves single characters, have not been replaced by two characters, but by three characters now.

the previous situation
Ignoring the semicolon defect, the other defects used to add up to the following replacements:
«’» was replaced by «Ã’»
«“» was replaced by «Ã“»
in codes instead of characters within guillemets:
&#146; was replaced by &#195;&#146;
&#147; was replaced by &#195;&#147;

the new situation
The other defects now add up to the following replacements:
«“» is replaced by «â€œ»
«”» is replaced by «â€»
in codes instead of characters within guillemets:
&#147; is replaced by &#226;&#128;&#156;
&#148; is replaced by &#226;&#128;&#157;

If you edit again, the â will have turned into Ã¢ (&#226; turned into &#195;&#162;), and if you edit again, it will have turned into ÃƒÃ¢ (&#195;&#131;&#195;&#162; minus the semicolons). These replacements are just as before.

The characters that follow the â are not subject to further replacements, because the actual replacements are
«“» is replaced by «â&#128&#156»
«”» is replaced by «â&#128&#157»
What epinions.com inserts into your text after the â are not the actual characters with code values of 128 and above, but character entities made up of characters with codes below 128.

two for the price of one? no, three for the price of one!
The new (as of 2004-11-23 epinions.com local time) situation is that many characters are still being replaced by two different characters exactly as before, but yet others are now being replaced by three different characters.

bad news
The bad news is that you are much more likely to run into it now.
When I tried to post the update of this text, I noticed that contractions such as «What’s» and «couldn’t» were not displaying correctly at all. These had previously been changed into «What&#146s» and «couldn&#146t», which, despite lacking the semicolon behind the 146, still displayed correctly. They are now being changed into «Whatâ&#128&#153s» and «couldnâ&#128&#153t», which display as «Whatâ€™s» and «couldnâ€™t» respectively.

Problems with such commonly used characters as smart double quotes, right single quotes in contractions and the ellipsis (…) now ensure that even members who seldom use any copyright symbols or accented characters are being hit by the defects.
Freak369 has started yet another Message Board thread about the new defect: “it seems that even using quotes” … “is creating a real headache”.
As the defects continue, members are getting cranky about it.
Rock_On gives his Brutally Honest™ opinion about it: “Yep, I kid you not. When I tried to post my review tonight. Aside from the 9character string for each " and ', in my title for Christmas with the Kranks, the apostrophe in Third Time's the Charm, it was replaced with a 22 string of characters. TWENTY-TWO!!! Dear mother, this can't go on. Come on epinions! Fix this crap!”.

We’re sorry, but there's been a problem… Please try again later
As is so often the case when epinions.com is fiddling with the preview and edit cycle, members get uninformative messages such as “We’re sorry, but there's been a problem… Please try again later.”.
Many members find their opinions stuck in draft mode with no indication what, if anything, is wrong with them, and no hint on what to do to get these posted.
Past experience with epinions.com preview and edit cycle problems suggests that you probably have some small mistake in your EpiHTML codes. Read The EpiHTML Editing Guide for advice on dealing with this.

As the mess continues, more members come to the epinions.com message board to report and complain.
iluvbirds started yet another thread about the worsening defects, noting that “The site has gone bonkers. I have reviews missing, ratings gone and comments wiped out”. DavidMac chimes in: “Me too… today,for a little while, nearly a hundred of my reviews vanished. but the worst part is I’ve been trying to post a review for the last two days, and I’m not allowed to get past the draft stage. How long is this going to last?”, and Rock_On pulls out the exclamation marks: “This sucks. I finally get motivation to write, and I can't even post my reviews because this site is FILLED with bugs. You hear that Epinions? IT'S FILLED, FIX THEM!!!!”.

site status
Throughout all this, epinions.com’s Site Status Details page has been displaying the Current Site Status as “No known system issues at this time” and the Epinions.com Site Status sidebar kept displaying “There are no known site issues at this time”.
In the second Message Board thread about these defects CyndiA remarks upon this: “As it stands, they have that no bug thing up when clearly they have bugs or a heck of a lot of hallucinating nuts around here.”.

good news
The good news is that although the situation has gotten worse, the solution to avoiding the effects of the defects continues to work fine. Every «That’s» and «it’s» in this text was written with the character entity code for the right single quote instead of the character itself, to have it display correctly. A phrase like «copy & paste» was written «copy &amp; paste», and all the smart quotes were done using their character entity codes as well. You can do the same; if you use the correct character entity codes, your opinion will display correctly.

pattern
There is probably some pattern to this replacement by either two or three characters, a pattern that gives a clue to what mistake is being made where. I did not look for any pattern yet, but couldn’t help noticing that the difference between 195 (uppercase A with tilde) and 226 (lowercase a with circumflex) isn’t 64, but 31…
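
One reading that fits the observed three-character replacements, offered as an assumption rather than a confirmed diagnosis: the codes 128 through 159 are now being treated as Windows-1252 characters, looked up as their real Unicode characters, and only then encoded as UTF-8. Those Unicode characters have codes above 2047, and UTF-8 needs three bytes for them. The Python sketch below checks the numbers; the function name is made up for the illustration.

```python
def utf8_values_for_cp1252_code(code):
    """Return the Windows-1252 character for a code plus its UTF-8 byte values.

    Hypothetical helper, used only to compare against the replacements
    reported in this text.
    """
    ch = bytes([code]).decode("cp1252")
    return ch, list(ch.encode("utf-8"))


left_quote = utf8_values_for_cp1252_code(147)   # left double quote
right_quote = utf8_values_for_cp1252_code(148)  # right double quote
apostrophe = utf8_values_for_cp1252_code(146)   # right single quote
```

The byte values come out as 226, 128, 156 for the left double quote, 226, 128, 157 for the right double quote and 226, 128, 153 for the right single quote. Those are the same numbers as in the «â&#128&#156», «â&#128&#157» and «â&#128&#153» replacements reported in this update.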

UPDATE LATE DECEMBER

the issue continues
On 2004 Dec 2, roheblius promises to address CyndiA’s complaint: “At the rate that I actually get to update the member center nowadays, you'd probably see it next week if we did. Though, it is something I should've put in the site standards message. But because it doesn't affect the entire site, I didn't. If this problem isn't fixed by tomorrow, I will do so.”.
The defects do affect the entire site. Not a single category or subcategory remains unaffected by the defects. Not a single user is exempt from the defects. They affect all users and all categories.
The Site Status Message was not updated the next day, nor the day after. In fact, it was never updated at all. It still says “There are no known site issues at this time.”. It hasn’t said anything else since the defect was first reported in early November and the Site Status Details page still does not even mention the defect…

“There are no known site issues at this time.”
For almost two weeks, “Community Care” seems to be in silent denial about the defect. Only on 2004 Dec 14 does roheblius tell us that it may or may not be fixed.
A reply from dragonfire88 seems to confirm everyone’s hopes: “I just posted a new review that I typed up in WordPerfect and copied in. I didn't get any of the long strings of weird characters for the 's anymore.”.
Alas, hist remarks that “Unless the fix isn't retroactive, then it's not fixed. I've been going through some of my reviews, and all of the accents (two "cliche" and one "blase") create the garbage characters.”.
Again, “Community Care” remains silent, and the Site Status Message continues to claim that “There are no known site issues at this time.”.

“the site issues at this time.”
When I posted a new opinion on 2004 Dec 21, I hoped to do so unaffected by these defects. Alas, I soon discovered that the Ã disease continues, and some testing suggests the situation has in fact become even more messed up, not less.
So far, I could just discuss the defects without any kind of qualification, as it is a site-wide issue. It is still a site-wide issue, but the issue has become multiple issues, and different sections of the site are affected by a different one.

three categories
The first thing I noticed after a quick test is that of all the characters that resulted in messed up text, some still result in messed up text and others do not. For example, the capital S with caron (Š, &#138;) remains unaffected, but the plus-minus (±, &#177;) still becomes Â±, then becomes Ã&#130Â± etcetera. The e with accent acute (é, &#233;) still becomes Ã©, etcetera. Obviously then, epinions.com is still suffering from Ã disease. The difference with the original situation is what characters it affects. The original issue affected all characters with codes 128 through 255. Now, characters with codes 128 through 159 seem to remain unaffected, and the characters with codes 160 through 255 now fall into one of three categories:
1. seemingly unaffected
2. affected just like before
3. affected, but not as bad as before
For example, after a few edit & preview cycles
1. Code 162, ¢ remains ¢
2. Code 191, ¿ becomes Ã,Ã,Ã,Ã,¿
3. Code 198, Æ becomes à ƒƒƒƒ†

Roughly, codes 128 through 159 are seemingly unaffected, 160 through 191 are affected as before and 192 through 255 are affected, but not as badly as before.
Truth is that 192 through 255 are affected just as before, just as 160 through 191 are, but after subtracting 64 from the code, you get into the range 128 through 159, which are characters that seem unaffected now. That explains why the results for codes 192 through 255 don’t look as bad after repeated edit and preview cycles as they do for the codes in range 160 through 191.

the ostrich pattern
Remarkably, there are a few codes in the range 192 through 255 that seem to remain unaffected, just like the codes in range 128 through 159.
It took me a while to recognise the pattern of just what codes seem unaffected and I was rather surprised when I did. The unaffected codes are examples mentioned in this text and the message boards; the ostrich reasoning behind it seems to be that if you hide the defect for every character that epinionators have mentioned in connection with the defect, the defect has been hidden from sight, and when it has been hidden from sight, it does not exist anymore.

not fixed at all
So here is what seems to be going on; the defect is still in full force and has not been fixed at all, but several characters seem unaffected because the resulting erroneous code is changed back to the correct one. As this isn’t done for all characters, the defect remains quite visible for all the other characters.
This “solution” isn’t really a solution at all. Not only does the defect remain unfixed, the “solution” only adds more programming code, and more code only provides more opportunity for defects.

deliberate editing
The defect may be an accident, but this “solution” is deliberate editing of your text. Didn’t epinions.com’s programmers learn anything from the Automatic Link Editing debacle? One small mistake in that program, and thousands of texts that did not need any fixing will become messed up. Another small mistake in that program, and even manually entering the codes will not work anymore…
This is very hard to understand. This approach is only making things even more complex. Epinions.com’s programmers should be undoing the changes that created the defect in the first place, not creating more defective code.

it gets worse
If you think this is bad, the above is a description of what happens to an opinion in Things to Know About Epinions.com, an old category supporting Traditional Opinions. When I updated two opinions to take advantage of the technique discussed in External Live Links Enhanced Epinion, I noticed that I had to replace every ’ by &#8217; for Oceans 1606: The Oldest Share in the World, an essay in Writer’s Corner: History Non-Fiction, yet did not have to do so for Unsenseoble Permanent Filter for your variable Senseo Fix, a Regular Opinion and a review of the Cafe-filter for Senseo Coffee Machines, a topic that Home & Garden CL pogomom added at my request.

So just what makes the difference here?
• The category?
• The kind of opinion?
• Whether it is a recent or old topic?
• Whether the topic was added manually?
Really now, do I really really want to know at all? No, I want to write opinions without worrying how perfectly ordinary characters are going to look and without having to deal with category-specific editing issues or anything like that. All I ask for Christmas is that epinions.com undoes the changes that created the issue in the first place.

CONCLUSION

three defects
If you are not used to dealing with all this on a regular basis, it is a lot to take in at once. You may want to think it over and reread some parts to get it all clear. I am rather tired after typing all of this, and just hoping that I didn’t get any codes wrong, but I will now try to summarise it all before I start stripping my shirt.

The Ã disease we are seeing is not the result of a single defect, but a combination of three, eh, four defects interacting with each other. These four defects are:
• replacing character codes 192 through 255 with the character entities for character codes 128 through 191
• adding an Ã or Â in front of the damaged character entity
• removing the terminating semicolon from character entities
• some fourth defect that causes replacement by three instead of two characters
As explained above, the first two of these defects are probably the result of a single mistake.

fifth problem
These three, eh four, defects are reinforced by a fourth, eh, fifth problem: that you do not get back what you put in. When you decide to edit your text again, you get a text modified in the described manner and such modifications will be applied again when you preview again, and you get an ever-expanding amount of gibberish in front of an erroneous character entity, while you only tried to use a single character.

how they came about
I have explained how each of these three defects may have come about. I originally remarked that whether these explanations are dead-on or just another possible explanation that happens to fit observed facts doesn’t really matter. What really matters to us is that the defects exist and are interfering with our regular epinionating.
I still believe that, but I also believe smjg’s single explanation for two of these defects to be smack dab in the middle of the bulls-eye: epinions.com is encoding text as UTF-8 and then reading that text again as if it was un-encoded text.

how to avoid the effects of the defects
Most importantly, you have been informed how to avoid the effects of these defects. You now know that the necessary codes are conveniently detailed in this text on epinions.com itself. You have even been told the secret insider Google search phrase that will get you to that handy dandy text in no time: “mobiprof sex symbol”.
“I feel lucky” indeed.
I feel sexy, oh so sexÿ.

Copyright © 2004 November by MobiProf, sex symbol.

COLLECTED LINKS

Objectionable Words Filter (OWF): Not Helpful
The EpiClassic about the Objectionable Words Filter. Part 1 of the OWF Trilogy.

• real bullets • “true quotes” • ellipsis and… • actual sex symbols! •
The “MobiProf Sex Symbol” text. Part 2 of the OWF Trilogy (in case you’re wondering, How the Objectionable Words Filter works is part 3.).

The EpiHTML Editing Guide
Tips and tricks, solutions to common problems.

Epinions.com Message Board Threads:
2004 Nov 3: A Registered and Copyright BUG, started by lorace
2004 Nov 18: Character Mashing?, started by shilmafone
2004 Nov 24: New Bug?, started by Freak369
2004 Nov 25: What's going on?, started by iluvbirds


QUICK OVERVIEW OF COMMON CODES
It would not make sense to repeat every code in • real bullets • “true quotes” • ellipsis and… • actual sex symbols! •, but the following overview lists some of the codes you are most likely to need right now.
& ← &amp; ampersand
‘ ← &#8216; left single quote
’ ← &#8217; right single quote
“ ← &#8220; left double quote
” ← &#8221; right double quote
… ← &#8230; ellipsis
• ← &#8226; bullet
— ← &#8212; em dash
ß ← &#223; small sz ligature
æ ← &#230; small ae ligature
ç ← &#231; lowercase c with cedilla
è ← &#232; lowercase e with grave accent
é ← &#233; lowercase e with acute accent
ö ← &#246; lowercase o with umlaut
½ ← &#189; fraction one half
© ← &#169; copyright symbol
® ← &#174; registered trademark
™ ← &#8482; trademark
€ ← &#8364; Euro sign

To get your text to look good, simply search & replace each messed up character by the corresponding code.
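
If you write your opinions in a word processor first, the search & replace can be automated. The Python sketch below is one possible way to do it, not an epinions.com tool: to_entity_codes is a made-up name, it produces the numeric Unicode codes (such as &#8217; for the right single quote), and it deliberately leaves plain characters such as & alone, so an ampersand still has to be written as &amp; by hand.

```python
def to_entity_codes(text):
    """Replace every character with a code above 127 by a numeric entity.

    Hypothetical helper built on Python's standard "xmlcharrefreplace"
    error handler; characters with codes 0 through 127 are left untouched.
    """
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


safe = to_entity_codes("That’s a café…")
```

The result in safe is «That&#8217;s a caf&#233;&#8230;», which consists only of characters with codes below 128 and can be pasted into the edit box as is.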
