Re: Encoding question; example code for strange situations
Steve Murphy wrote:
> Question:
>
> The files I am reading can have several encodings; at least the
> textual parts can. The main syntax elements, like the keywords, etc. are
> all ASCII. How would I best convert, say, an ANSEL character set to
> UTF-8, or vice versa in the parser? "String"s are supposed to be raw
> Unicode? When I write them to a file, they seem to be in UTF-8 or
> some such 8-bit standard. How do I tweak the in/out encodings, and how
> would this affect the parser?
The lexer expects Unicode characters (as a character stream or a string). You
might want to look at the encoding-related APIs in the JDK libraries: in
java.io, InputStreamReader and OutputStreamWriter both accept a charset name,
and class java.lang.String also has encoding-related methods (the
String(byte[], String charsetName) constructor and getBytes(String charsetName)).
If you want to manually implement a very specific encoding, you can do as I did
for the Java lexer: add a preprocessing stage that decodes the input before it
reaches the lexer (and do the same, in reverse, in the back-end/writing stage).
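For an encoding the JDK already supports, wrapping the input in `new InputStreamReader(in, charsetName)` is enough. ANSEL is not, as far as I know, among the JDK's standard charsets, so it is a case for the preprocessing stage described above. The following is a minimal sketch under stated assumptions, not a complete decoder: `AnselReader` is a hypothetical class made up for this example. It decodes ANSEL bytes into Unicode chars, and `toUtf8` writes the result back out as UTF-8 through an `OutputStreamWriter`. Only ASCII plus five combining diacritics are mapped, with byte values assumed from ANSI Z39.47 and not checked against the full table. Each diacritic is moved after its base letter, because ANSEL puts combining marks before the letter while Unicode puts them after. In a typical SableCC setup the generated lexer reads from a `java.io.PushbackReader`, so the decoder would be wrapped as `new PushbackReader(new AnselReader(in), 1024)`.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical preprocessing stage: ANSEL bytes in, Unicode chars out (partial mapping). */
public class AnselReader extends Reader {
    // Illustrative subset of ANSEL combining marks -> Unicode combining characters.
    // Byte values are assumptions; verify them against the standard before use.
    private static final Map<Integer, Character> COMBINING = new HashMap<>();
    static {
        COMBINING.put(0xE1, '\u0300'); // grave
        COMBINING.put(0xE2, '\u0301'); // acute
        COMBINING.put(0xE3, '\u0302'); // circumflex
        COMBINING.put(0xE4, '\u0303'); // tilde
        COMBINING.put(0xE8, '\u0308'); // diaeresis
    }

    private final InputStream in;
    private final StringBuilder pending = new StringBuilder(); // decoded, not yet returned

    public AnselReader(InputStream in) { this.in = in; }

    // Decodes one base character plus the marks that preceded it; false at end of input.
    private boolean fill() throws IOException {
        StringBuilder marks = new StringBuilder();
        int b;
        while ((b = in.read()) != -1) {
            Character mark = COMBINING.get(b);
            if (mark != null) {          // ANSEL: mark comes before the base letter
                marks.append(mark);
                continue;
            }
            if (b >= 0x80) {
                throw new IOException("unmapped ANSEL byte 0x" + Integer.toHexString(b));
            }
            pending.append((char) b).append(marks); // Unicode: base letter first
            return true;
        }
        pending.append(marks);           // keep any dangling marks at end of input
        return pending.length() > 0;
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        if (len == 0) return 0;
        if (pending.length() == 0 && !fill()) return -1;
        int n = Math.min(len, pending.length());
        pending.getChars(0, n, cbuf, off);
        pending.delete(0, n);
        return n;
    }

    @Override
    public void close() throws IOException { in.close(); }

    // Front end: decode a whole ANSEL byte array into a Java (UTF-16) String.
    public static String decodeAll(byte[] ansel) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new AnselReader(new ByteArrayInputStream(ansel))) {
            char[] buf = new char[64];
            int n;
            while ((n = r.read(buf, 0, buf.length)) != -1) sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    // Back end: encode text as UTF-8 bytes, as a writing stage would.
    public static byte[] toUtf8(String text) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            w.write(text);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // "caf" + ANSEL acute (0xE2) + "e" decodes to "cafe" followed by U+0301
        byte[] ansel = {'c', 'a', 'f', (byte) 0xE2, 'e'};
        String text = decodeAll(ansel);
        System.out.println(text.equals("cafe\u0301")); // true
        System.out.println(toUtf8(text).length);       // 6: four ASCII bytes + two for U+0301
    }
}
```

Keeping the decoding in a `Reader` means the lexer itself never sees the source encoding; going the other way (UTF-8 to ANSEL) would need the mirror-image stage on the writing side, moving each combining mark back in front of its base letter.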
> Conclusions:
>
> I guess I do some unique things in this parser that probably no one
> else on earth would do. But, just in case someone else has the same
> problems I did, here is what I did:
Thanks a lot for your report. It is always nice to hear about successful
projects and learn new solutions to difficult problems.
Have fun!
Etienne
--
Etienne M. Gagnon http://www.info.uqam.ca/~egagnon/
SableVM: http://www.sablevm.org/
SableCC: http://www.sablecc.org/