ISO 8859-1 Encoding and umlauts

Topics: Issues
Aug 20, 2013 at 10:28 PM
I've been using ImapX to parse simple data from emails. One problem I ran into is emails with ISO-8859-1 (Latin-1) encoding: whenever I receive one, all the umlauts (e.g. äöüß) get lost. I have looked at the code, and I assume the cause is the StreamReader, whose encoding is fixed when it is created (UTF-8 in this case, I guess, since umlauts in UTF-8 emails come through fine). So whenever an ISO-8859-1 email comes in, StreamReader.ReadLine() interprets the bytes incorrectly and garbles the characters.
I played around with Encoding.Convert, but with no success (most likely because the data has already been decoded incorrectly by that point).
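
The effect described above can be reproduced outside of ImapX. The following is a minimal sketch in Python (not ImapX or .NET code) that assumes the reader decodes as UTF-8 and substitutes a replacement character for invalid bytes. The variable names are made up for illustration.

```python
# Body of the test email as it travels on the wire with charset=ISO-8859-1:
# every umlaut is a single byte >= 0x80.
raw = "öäüß ÖÄÜß".encode("iso-8859-1")

# Decoding with the wrong charset (UTF-8). None of the umlaut bytes forms a
# valid UTF-8 sequence here, so each is replaced by U+FFFD and the original
# byte value is thrown away.
garbled = raw.decode("utf-8", errors="replace")

# Converting afterwards cannot restore anything: the conversion only sees
# the replacement characters, not the original bytes.
recovered = garbled.encode("utf-8").decode("iso-8859-1")

# Decoding the untouched bytes with the declared charset works.
correct = raw.decode("iso-8859-1")

print(repr(garbled))
print(repr(recovered))
print(correct)
```

This is why converting the string after the fact does not help: the fix has to happen before the bytes are decoded, not after.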

Is this a known issue, or am I just doing something wrong?

Here are the important headers and the body of my test email:
Date: Tue, 20 Aug 2013 18:47:16 +0200
From: Me <...>
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/20130801 Thunderbird/17.0.8
MIME-Version: 1.0
To: Me <...>
Subject: Test Umlauts Plain ISO 8859-1
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 8bit

öäüß ÖÄÜß

Thanks for the support and keep up the great work - ImapX has been the only email library so far that works flawlessly with Unity3D and Mono, so I'm hoping there's a solution for this problem.

Regards,
Wolfgang
Coordinator
Aug 21, 2013 at 8:18 AM
Hi Wolfgang,

This issue has not been reported yet, so thank you very much for the feedback!

To help me fix this problem faster, could you please send the test message you used to p13a92@googlemail.com so I can take a look?



Kind regards,

Pavel Azanov



PS: You can also feel free to contact me in German.
Aug 21, 2013 at 8:57 AM
Hi Pavel,

Thanks for your quick reply. I sent an email with the same configuration to the address you specified. The first line contains some of the umlauts.

Thanks,
Wolfgang

P.S.: German - English - both are fine. Whatever you prefer. :)
Coordinator
Aug 21, 2013 at 9:45 AM
Hi Wolfgang,

Thank you, I received your message and will try to fix this issue as soon as possible!
Coordinator
Sep 19, 2013 at 8:40 AM
Hello Wolfgang,

I managed to fix the issue. The updated code will be available this weekend.


Kind regards,

Pavel Azanov
Sep 23, 2013 at 9:50 PM
Hi Pavel,

That's good news. How did you fix it? From what I saw, ImapX uses a StreamReader that needs to know the encoding right when it's created, at which point the email's encoding is not yet known. I'm curious. :)

Wolfgang
Coordinator
Sep 24, 2013 at 10:00 AM
Hi Wolfgang,

Over the past months I have rewritten the library from scratch, completely changing how messages are requested.
Now every part of a message can be downloaded separately, and what's more, you can choose to request only the message part structure and metadata, or the complete message parts.

The fix in this situation was quite simple: all I needed to do was determine the encoding used for the message or for a specific part of it. Then, when downloading the message parts, I simply create a new StreamReader with that encoding if it is different from UTF-8.
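
The approach described above can be sketched roughly as follows. This is a Python illustration, not the actual ImapX code, and the function names `charset_of` and `open_part_reader` are hypothetical. It assumes the charset is taken from the part's Content-Type header and that UTF-8 is the fallback when no usable charset is declared. In .NET, the corresponding step would be passing `Encoding.GetEncoding(name)` to the `StreamReader` constructor.

```python
import codecs
import io
from email.message import Message


def charset_of(content_type: str, default: str = "utf-8") -> str:
    """Return the charset declared in a Content-Type value, else the default."""
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset(default)
    try:
        codecs.lookup(charset)  # reject charsets this runtime cannot decode
    except LookupError:
        return default
    return charset


def open_part_reader(raw_part: bytes, content_type: str) -> io.TextIOWrapper:
    """Create a line reader over the raw bytes using the part's own charset."""
    encoding = charset_of(content_type)
    return io.TextIOWrapper(io.BytesIO(raw_part), encoding=encoding)


if __name__ == "__main__":
    header = "text/plain; charset=ISO-8859-1; format=flowed"
    body = "öäüß ÖÄÜß\r\n".encode("iso-8859-1")
    reader = open_part_reader(body, header)
    print(reader.readline().rstrip("\r\n"))
```

The key point is that the reader is created only after the part's headers are known, so the encoding no longer has to be guessed up front.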

The release is nearly ready to be published. Currently I only need to fix the methods for storing messages on the server.

Best regards,

Pavel Azanov