We recently overcame a problem while pulling data from Microsoft SQL Server into Ruby for some processing. The SQL Server we were connecting to was configured with a case-insensitive Latin1 collation. The collation also determines the code page used for non-Unicode (char/varchar) data. You can look up a database's collation by executing:
SELECT DATABASEPROPERTYEX('DBName', 'Collation') SQLCollation;
In our case the result was SQL_Latin1_General_CP1_CI_AS: code page 1252 (CP1), case-insensitive (CI), accent-sensitive (AS). We were using the fine Ruby ODBC library, and we tried to have it convert to UTF-8 for us by setting the ODBC::UTF8 constant to true before proceeding. However, we didn't get what we expected. Our algorithm ran fine until we tried to convert one of the VarChar columns to UTF-8, at which point we hit the following error:
Encoding::UndefinedConversionError: "\x96" from ASCII-8BIT to UTF-8
Looking at the string returned from the ODBC library, we could see a single character, "\x96". Querying the database through Management Studio, we found that the offending character looked like a hyphen. However, if we look up 0x96 (150 in decimal) in Windows-1252, the code page behind this collation, we find a dash (the en dash, U+2013), not a hyphen. The hyphen on our keyboards is 0x2D (45 in decimal). This can be verified in the interactive Ruby interpreter:
ruby-1.9.2-p290 :001 > "-".ord
 => 45
ruby-1.9.2-p290 :002 > "\x96".ord
 => 150
ruby-1.9.2-p290 :003 > 45.chr
 => "-"
ruby-1.9.2-p290 :004 > 150.chr
 => "\x96"
ruby-1.9.2-p290 :005 > "\x96".encode('UTF-8')
Encoding::UndefinedConversionError: "\x96" from ASCII-8BIT to UTF-8
No dash for you!
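The conversion fails because the bytes arrive tagged as ASCII-8BIT (Ruby's binary encoding), which assigns no character to 0x96. A minimal sketch, assuming the column really holds Windows-1252 data as the CP1 collation suggests: passing the source encoding explicitly to String#encode gives Ruby a mapping for that byte. The sample value is hypothetical and built by hand rather than fetched through ODBC, and String#b needs Ruby 2.0 or later.

```ruby
# Hypothetical sample value: the bytes "10", 0x96, "20", tagged ASCII-8BIT
# (raw binary) just like the VarChar value we got back from the driver.
raw = "10\x9620".b

# Tagged as binary, Ruby has no character mapping for 0x96, so this raises.
begin
  raw.encode('UTF-8')
rescue Encoding::UndefinedConversionError => e
  puts "binary -> UTF-8 failed: #{e.message}"
end

# Telling encode that the source bytes are Windows-1252 (the code page
# behind SQL_Latin1_General_CP1_CI_AS) lets it map 0x96 correctly.
utf8 = raw.encode('UTF-8', 'Windows-1252')

puts utf8                           # prints "10–20"
puts format('U+%04X', utf8[2].ord)  # prints "U+2013" (EN DASH)
```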
The easiest way to get rid of this problem was simply to replace the dash with a real hyphen:
bad_string.gsub(150.chr, '-')
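The gsub above only handles the en dash. If a column may also contain other Windows-1252 "smart" punctuation (curly quotes, em dashes, ellipses), one option is to transcode first and then fold those characters to ASCII in a single pass. This is a rough sketch: the table, the constant names and clean_varchar are our own illustrative choices, not part of the ODBC library, and it assumes the raw values are Windows-1252 bytes.

```ruby
# Hypothetical table: Windows-1252 "smart" punctuation (keyed by its code
# point after transcoding to UTF-8) folded to plain ASCII equivalents.
ASCII_PUNCTUATION = {
  "\u2013" => '-',   # byte 0x96, EN DASH (our culprit)
  "\u2014" => '-',   # byte 0x97, EM DASH
  "\u2018" => "'",   # byte 0x91, LEFT SINGLE QUOTATION MARK
  "\u2019" => "'",   # byte 0x92, RIGHT SINGLE QUOTATION MARK
  "\u201C" => '"',   # byte 0x93, LEFT DOUBLE QUOTATION MARK
  "\u201D" => '"',   # byte 0x94, RIGHT DOUBLE QUOTATION MARK
  "\u2026" => '...'  # byte 0x85, HORIZONTAL ELLIPSIS
}.freeze

# One regexp matching any key; Regexp.union escapes each string for us.
PUNCTUATION_PATTERN = Regexp.union(ASCII_PUNCTUATION.keys)

# Hypothetical helper: read raw VarChar bytes as Windows-1252, transcode
# them to UTF-8, then fold smart punctuation to ASCII. undef: :replace
# substitutes U+FFFD for any byte the converter cannot map instead of raising.
def clean_varchar(raw)
  utf8 = raw.encode('UTF-8', 'Windows-1252', undef: :replace)
  utf8.gsub(PUNCTUATION_PATTERN, ASCII_PUNCTUATION)
end

# Usage with a hand-built sample value (not real data from our database).
sample = "Smith \x96 Jones, \x93quoted\x94".b
puts clean_varchar(sample)  # prints: Smith - Jones, "quoted"
```

Keeping the transcoding step means other non-ASCII characters, such as accented letters in names, survive as proper UTF-8 instead of being stripped or triggering the same error.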