Channel: VBForums

VS 2010 Crash course in UTF encoding

Can you please help me with this code?

What's happening is that I'm trying to process international characters, but the code apparently falls through to the final Else branch (giving up?) and replaces the whole field with a blank (the screenshot is at the bottom). The customer recently asked for support for Polish characters and gave us a list of the ones he wants added.

This is the entire function:
Code:

    Protected Function chkExtchars(ByVal name As String) As String
        Dim j As Integer = 0
        Dim dt As New DataTable
        Dim c() As Char = Nothing
        Dim n As Integer
        Dim nc As Char
        Dim newname As String = ""
        dt = HttpContext.Current.Session("xChars")

        ' Create two different encodings.
        Dim utf8 As Encoding = Encoding.UTF8
        Dim [unicode] As Encoding = Encoding.Unicode

        ' Convert the string into a byte[].
        Dim unicodeBytes As Byte() = [unicode].GetBytes(name)


        ' Perform the conversion from one encoding to the other.
        Dim utf8Bytes As Byte() = Encoding.Convert([unicode], utf8, unicodeBytes)

        ' Convert the new byte[] into a char[] and then into a string.
        ' This is a slightly different approach to converting to illustrate
        ' the use of GetCharCount/GetChars.
        Dim utf8Chars(utf8.GetCharCount(utf8Bytes, 0, utf8Bytes.Length) - 1) As Char
        utf8.GetChars(utf8Bytes, 0, utf8Bytes.Length, utf8Chars, 0)
        Dim utf8String As New String(utf8Chars)
        c = utf8String.ToCharArray

        For i As Integer = 0 To c.Length - 1

            ' n = Asc(c(i))
            n = AscW(utf8Chars(i))
            'n = unicodeBytes(j)
            If n >= 32 And n < 127 Then 'printable ASCII only
                newname = newname & c(i)
            ElseIf n > 191 And n < 256 Then
                nc = dt.Rows(n - 192).Item(2).ToString
                newname = newname & nc
            Else
                'newname = newname & c(i)
                newname = ""
                Exit For
            End If
            j = j + 2
        Next
        Return newname
    End Function

I know what's important here is the session variable xChars. It is read from a file which is basically this:
DEC,Symbol,CVT Symbol,HTML Number,HTML Name,Description
192,À,A,&#192;,&Agrave;,Latin capital letter A with grave
193,Á,A,&#193;,&Aacute;,Latin capital letter A with acute
194,Â,A,&#194;,&Acirc;,Latin capital letter A with circumflex
...
252,ü,u,&#252;,&uuml;,Latin small letter u with diaeresis
253,ý,y,&#253;,&yacute;,Latin small letter y with acute
254,þ, ,&#254;,&thorn;,Latin small letter thorn
255,ÿ,y,&#255;,&yuml;,Latin small letter y with diaeresis

That file is 65 lines long, which makes sense given the comparison (codes 192 through 255 are 64 rows, presumably plus the header line): ElseIf n > 191 And n < 256 Then
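
To spell out how I read that lookup, here is a rough sketch of the arithmetic only. RowIndexFor is a made-up helper name, not something in our code base, and it assumes the DataTable holds the 64 data rows in file order with the header line not counted as a row (I haven't checked how the file gets loaded).
Code:

    Imports System

    Module RowIndexSketch
        ' Hypothetical helper that mirrors the dt.Rows(n - 192) arithmetic in chkExtchars.
        ' Returns -1 when the ElseIf range check (n > 191 And n < 256) would reject the code.
        Function RowIndexFor(ByVal n As Integer) As Integer
            If n > 191 AndAlso n < 256 Then
                Return n - 192
            End If
            Return -1
        End Function

        Sub Main()
            Console.WriteLine(RowIndexFor(192)) ' 0, the first data row
            Console.WriteLine(RowIndexFor(255)) ' 63, the last data row
            Console.WriteLine(RowIndexFor(260)) ' -1, never reaches the table at all
        End Sub
    End Module

If that reading is right, the row is picked purely by position, and the DEC column in the file is never actually consulted.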

So how do I add more conversions? I thought I could just take the new Polish characters the customer gave me and add them to the end of that xChars file:

260 A A Ą Latin capital letter a with ogonek
261 a a ą Latin small letter a with ogonek
262 C C Ć Latin capital letter c with acute
263 c c ć Latin small letter c with acute
280 E E Ę Latin capital letter e with ogonek

But maybe the entries need to be sequential (going by the numbers on the far left)?
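
In case it helps anyone answering, this is the kind of thing I'm wondering about as an alternative: looking the replacement up by code number instead of by row position, so gaps like 256 to 259 wouldn't matter. It is only a sketch under assumptions. BuildMap and Simplify are made-up names, the entries are hard-coded from the samples above rather than read from the xChars file, and it works on the String's characters directly, assuming the Unicode to UTF-8 round trip in my function hands back the same characters it was given.
Code:

    Imports System
    Imports System.Collections.Generic
    Imports System.Text

    Module KeyedLookupSketch
        ' Hypothetical table keyed by code number (the DEC column) rather than row index.
        ' Only a few sample entries; the real ones would come from the xChars file.
        Function BuildMap() As Dictionary(Of Integer, String)
            Dim map As New Dictionary(Of Integer, String)
            map.Add(192, "A") ' À
            map.Add(252, "u") ' ü
            map.Add(260, "A") ' Ą
            map.Add(261, "a") ' ą
            map.Add(262, "C") ' Ć
            map.Add(263, "c") ' ć
            map.Add(280, "E") ' Ę
            Return map
        End Function

        Function Simplify(ByVal name As String, ByVal map As Dictionary(Of Integer, String)) As String
            Dim sb As New StringBuilder()
            For Each ch As Char In name
                Dim n As Integer = AscW(ch)
                Dim replacement As String = Nothing
                If n >= 32 AndAlso n < 127 Then
                    sb.Append(ch)            ' printable ASCII passes through unchanged
                ElseIf map.TryGetValue(n, replacement) Then
                    sb.Append(replacement)   ' known extended character, use its CVT symbol
                Else
                    Return ""                ' same give-up behaviour as the current function
                End If
            Next
            Return sb.ToString()
        End Function

        Sub Main()
            ' ChrW(262) is Ć; built this way to avoid depending on the source file's encoding.
            Console.WriteLine(Simplify(ChrW(262) & "ma", BuildMap())) ' prints Cma
        End Sub
    End Module

With a keyed table like this the file order and any gaps in the numbering shouldn't matter, but I don't know whether that fits how the rest of the app uses xChars.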
Attached Images
 
