TDICsvParser.SkipBlankRows property.
TDICsvParser.ReadNextData improvements:
DIUtils.pas
Unicode functions to Unicode 14.0.0.
Extend character support to the full range of Unicode code points from $000000 to $10FFFF.
Until now, DIUnicode stored code points as WideChars. This limited Unicode support to the Basic Multilingual Plane (BMP), from $0000 to $FFFF. Code points from the supplementary planes were converted to the $FFFD replacement character. This worked well for a great number of languages, but less common scripts did not work, and neither did the increasingly popular emojis from the Symbols and Pictographs Unicode blocks.
DIUnicode 7.0.0 overcomes these limitations and now covers the complete Unicode range. Changes are almost entirely internal and maintain backwards compatibility as much as possible. Existing applications should compile with no changes or only minor ones. WideChar routines are marked as deprecated and point to their new complementary UCP routines.
TDIUnicodeReader.Data is still a WideChar buffer. However, its content is now fully UTF-16 encoded. This means that it may contain code points > $FFFF, which take up two WideChars (a surrogate pair). As a result, indexed access to the buffer is no longer guaranteed to return a complete code point. Methods related to TDIUnicodeReader.Data, like TDIUnicodeReader.DataAsStrTrimW, are adjusted accordingly.
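To illustrate why a WideChar index and a code point index can differ in a UTF-16 buffer, here is a minimal sketch. It uses only the standard RTL and none of the DIUnicode routines; the function name NextCodePoint and the program name are hypothetical and chosen for this example only.

```pascal
program SurrogateDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

{ Hypothetical helper, not part of DIUnicode.
  Decodes the code point starting at Buf[Index] in a UTF-16 buffer of
  Len WideChars and advances Index past it (by 1 or 2 WideChars).
  An unpaired surrogate is returned as the $FFFD replacement character. }
function NextCodePoint(const Buf: PWideChar; const Len: NativeInt;
  var Index: NativeInt): Cardinal;
var
  Hi, Lo: Cardinal;
begin
  Hi := Ord(Buf[Index]);
  Inc(Index);
  { A high surrogate followed by a low surrogate forms one code point. }
  if (Hi >= $D800) and (Hi <= $DBFF) and (Index < Len) then
  begin
    Lo := Ord(Buf[Index]);
    if (Lo >= $DC00) and (Lo <= $DFFF) then
    begin
      Inc(Index);
      Result := $10000 + ((Hi - $D800) shl 10) + (Lo - $DC00);
      Exit;
    end;
  end;
  if (Hi >= $D800) and (Hi <= $DFFF) then
    Result := $FFFD { unpaired surrogate }
  else
    Result := Hi;   { ordinary BMP code point }
end;

var
  S: UnicodeString;
  I: NativeInt;
  Count: Integer;
begin
  { 'A', U+1F600 stored as a surrogate pair, 'B' }
  SetLength(S, 4);
  S[1] := 'A';
  S[2] := WideChar($D83D);
  S[3] := WideChar($DE00);
  S[4] := 'B';

  I := 0;
  Count := 0;
  while I < Length(S) do
  begin
    WriteLn('U+', IntToHex(Integer(NextCodePoint(PWideChar(S), Length(S), I)), 4));
    Inc(Count);
  end;
  WriteLn(Length(S), ' WideChars, ', Count, ' code points');
end.
```

The test string occupies four WideChars but holds only three code points, so S[3] on its own is the low half of a surrogate pair rather than a character.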
UnicodeString utility routines are rewritten to handle full UTF-16, including surrogate pairs. Most of them are in DIUtils.pas. YuUtf.pas also contains new utility routines for UTF-16 testing, encoding, and decoding. Where possible, string handling routines now take NativeInt type parameters for the buffer length.
Other noteworthy changes:
TDIUnicodeReader.UCP complements TDIUnicodeReader.Char.
DI_No_Classes and DI_No_Unicode_Component. TDIUnicodeReader always descends from TComponent and the Classes unit is always used. Source code only.
DI.inc include file. Directly link in DICompilers.inc instead. Source code only.
TDIUnicodeWriter memory leak if TDIUnicodeWriteMethods.Init allocates its own memory.
TDIUnicodeWriter.Clear calls TDIUnicodeWriteMethods.Flush to reset encoder state.
Read_iso_2022_jp_ms read methods and Write_iso_2022_jp_ms write methods.
TDIUnicodeReader.SourceStream, the size of the internal source buffer was not correctly calculated. Depending on the decoding, this slowed down reading or even stopped it before the end of the stream was reached.
TDIUnicodeReader.SkipEmptyLines consumed additional chars after the line break.
TDIUnicodeReader.FillSourceBuffer (source code edition only).
TDIUnicodeWriter.WriteStr8 and WriteBuf8 methods.
TDIUnicodeReader.DataAsStrTrim8 method.
TDIUnicodeReader when a pushed source was popped at the end of a nested document.
TDIUnicodeReader.ReadBOM function which returns the Byte Order Mark (BOM) found at the current position and advances the position accordingly.
TDIUnicodeReader.SourceFile property as a simple means to read from a file.
WriteByteOrderMark parameter to TDIUnicodeReader.SaveDataToFile and TDIUnicodeReader.SaveDataToStream which controls whether a UTF-16/UCS-2 little-endian byte order mark is written in front of the data.
(Write_UTF_7 / Read_UTF_7)
(Write_UTF_7_ODC / reads as Read_UTF_7)
With (Write_UTF_7) or without (Write_UTF_7_ODC) encoding optional direct characters. UTF-7 reading (Read_UTF_7) works equally well for both writing methods.
TDIUnicodeReader and TDIUnicodeWriter to allow data buffering between consecutive reads and writes.
TDIUnicodeReader.PushSource and TDIUnicodeReader.PopSource methods added, which allow inserting one source into another, as with the Pascal {$INCLUDE …} directive.
TDIUnicodeReader can optionally free its source stream when reading has reached the end of the stream. This is especially useful when reading nested files using the TDIUnicodeReader.PushSource and TDIUnicodeReader.PopSource methods. The protected property TDIUnicodeReader.AutoFreeSourceStreams may be used by descendant classes which implement specialized reading / parsing.
TDIUnicodeReader, as well as for retrieving data as trimmed strings.