Reading text files backwards is really tricky unless you're using a fixed-size encoding. When you've got a variable-size encoding (such as UTF-8) you have to keep checking whether or not you're in the middle of a character when you fetch data.

There's nothing built into the framework, and I suspect you'd have to do separate hard-coding for each variable-width encoding.

EDIT: This has been somewhat tested - but that's not to say it doesn't still have some subtle bugs lurking. It uses StreamUtil from MiscUtil, but I've included just the necessary (new) method from there at the bottom. Oh, and it needs refactoring - there's one pretty hefty method, as you'll see:

    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Text;

    /// <summary>
    /// Takes an encoding (defaulting to UTF-8) and a function which produces a seekable stream
    /// (or a filename for convenience) and yields lines from the end of the stream backwards.
    /// Only single byte encodings, and UTF-8 and Unicode, are supported. The stream
    /// returned by the function must be seekable.
    /// </summary>
    public sealed class ReverseLineReader : IEnumerable<string>
    {
        /// <summary>
        /// Buffer size to use by default. Classes with internal access can specify
        /// a different buffer size - this is useful for testing.
        /// </summary>
        private const int DefaultBufferSize = 4096;

        /// <summary>
        /// Means of creating a Stream to read from.
        /// </summary>
        private readonly Func<Stream> streamSource;

        /// <summary>
        /// Encoding to use when converting bytes to text
        /// </summary>
        private readonly Encoding encoding;

        /// <summary>
        /// Size of buffer (in bytes) to read each time we read from the
        /// stream. This must be at least as big as the maximum number of
        /// bytes for a single character.
        /// </summary>
        private readonly int bufferSize;

        /// <summary>
        /// Function which, when given a position within a file and a byte, states whether
        /// or not the byte represents the start of a character.
        /// </summary>
        private readonly Func<long, byte, bool> characterStartDetector;

        /// <summary>
        /// Creates a LineReader from a stream source. The delegate is only
        /// called when the enumerator is fetched. UTF-8 is used to decode
        /// the stream into text.
        /// </summary>
        public ReverseLineReader(Func<Stream> streamSource)
            : this(streamSource, Encoding.UTF8)
        {
        }

        /// <summary>
        /// Creates a LineReader from a filename. The file is only opened
        /// (or even checked for existence) when the enumerator is fetched.
        /// UTF8 is used to decode the file into text.
        /// </summary>
        public ReverseLineReader(string filename)
            : this(filename, Encoding.UTF8)
        {
        }

        /// <summary>
        /// Creates a LineReader from a filename. The file is only opened
        /// (or even checked for existence) when the enumerator is fetched.
        /// </summary>
        /// <param name="encoding">Encoding to use to decode the file into text</param>
        public ReverseLineReader(string filename, Encoding encoding)
            : this(() => File.OpenRead(filename), encoding)
        {
        }

        /// <summary>
        /// Creates a LineReader from a stream source. The delegate is only
        /// called when the enumerator is fetched.
        /// </summary>
        /// <param name="encoding">Encoding to use to decode the stream into text</param>
        public ReverseLineReader(Func<Stream> streamSource, Encoding encoding)
            : this(streamSource, encoding, DefaultBufferSize)
        {
        }

        internal ReverseLineReader(Func<Stream> streamSource, Encoding encoding, int bufferSize)
        {
            this.streamSource = streamSource;
            this.encoding = encoding;
            this.bufferSize = bufferSize;
            if (encoding.IsSingleByte)
            {
                // For a single byte encoding, every byte is the start (and end) of a character
                characterStartDetector = (pos, data) => true;
            }
            else if (encoding is UnicodeEncoding)
            {
                // For UTF-16, even-numbered positions are the start of a character.
                // TODO: This assumes no surrogate pairs. More work required.
                characterStartDetector = (pos, data) => (pos & 1) == 0;
            }
            else if (encoding is UTF8Encoding)
            {
                // For UTF-8, bytes with the top bit clear or the second bit set are the start of a character
                characterStartDetector = (pos, data) => (data & 0x80) == 0 || (data & 0x40) != 0;
            }
            else
            {
                throw new ArgumentException("Only single byte, UTF-8 and Unicode encodings are permitted");
            }
        }

        /// <summary>
        /// Returns the enumerator reading strings backwards. If this method discovers that
        /// the returned stream is either unreadable or unseekable, a NotSupportedException is thrown.
        /// </summary>
        public IEnumerator<string> GetEnumerator()
        {
            Stream stream = streamSource();
            if (!stream.CanSeek)
            {
                stream.Dispose();
                throw new NotSupportedException("Unable to seek within stream");
            }
            if (!stream.CanRead)
            {
                stream.Dispose();
                throw new NotSupportedException("Unable to read within stream");
            }
            return GetEnumeratorImpl(stream);
        }

        private IEnumerator<string> GetEnumeratorImpl(Stream stream)
        {
            long position = stream.Length;

            if (encoding is UnicodeEncoding && (position & 1) != 0)
            {
                throw new InvalidDataException("UTF-16 encoding provided, but stream has odd length.");
            }

            // Allow up to two bytes for data from the start of the previous
            // read which didn't quite make it as full characters
            byte[] buffer = new byte[bufferSize + 2];
            char[] charBuffer = new char[encoding.GetMaxCharCount(buffer.Length)];
            int leftOverData = 0;

            // TextReader doesn't return an empty string if there's line break at the end
            // of the data. Therefore we don't return an empty string if it's our *first*
            // return.
            bool firstYield = true;

            // A line-feed at the start of the previous buffer means we need to swallow
            // the carriage-return at the end of this buffer - hence this needs declaring
            // way up here!
            bool swallowCarriageReturn = false;

            while (position > 0)
            {
                int bytesToRead = Math.Min(position > int.MaxValue ? bufferSize : (int)position, bufferSize);

                position -= bytesToRead;
                stream.Position = position;
                StreamUtil.ReadExactly(stream, buffer, bytesToRead);

                // If we haven't read a full buffer, but we had bytes left
                // over from before, copy them to the end of the buffer
                if (leftOverData > 0 && bytesToRead != bufferSize)
                {
                    // Buffer.BlockCopy doesn't document its behaviour with respect
                    // to overlapping data: we *might* just have read 7 bytes instead of
                    // 8, and have two bytes to copy
                    Array.Copy(buffer, bufferSize, buffer, bytesToRead, leftOverData);
                }

                // We've now *effectively* read this much data.
                bytesToRead += leftOverData;

                int firstCharPosition = 0;
                while (!characterStartDetector(position + firstCharPosition, buffer[firstCharPosition]))
                {
                    firstCharPosition++;
                    // Bad UTF-8 sequences could trigger this. For UTF-8 we should always
                    // see a valid character start in every 3 bytes, and if this is the start of the file
                    // so we've done a short read, we should have the character start
                    // somewhere in the usable buffer.
                    if (firstCharPosition == 3 || firstCharPosition == bytesToRead)
                    // ... (the rest of the listing is not included here)
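To make the "middle of a character" check concrete for UTF-8, here is a minimal standalone sketch of the same bit test the answer uses for its UTF-8 detector. The names `Utf8BoundaryDemo`, `IsUtf8CharacterStart` and `FirstCharacterStart` are made up for this illustration and are not part of MiscUtil. It assumes well-formed UTF-8 input.

```csharp
using System;
using System.Text;

public static class Utf8BoundaryDemo
{
    // A byte can begin a UTF-8 character if it is ASCII (top bit clear)
    // or a lead byte (top two bits set). Continuation bytes are 10xxxxxx.
    public static bool IsUtf8CharacterStart(byte b)
    {
        return (b & 0x80) == 0 || (b & 0x40) != 0;
    }

    // Hypothetical helper: index of the first character start at or after
    // 'offset', or data.Length if there is none.
    public static int FirstCharacterStart(byte[] data, int offset)
    {
        int i = offset;
        while (i < data.Length && !IsUtf8CharacterStart(data[i]))
        {
            i++;
        }
        return i;
    }

    public static void Main()
    {
        // "a€b" encodes as 61 E2 82 AC 62: the euro sign takes three bytes.
        byte[] data = Encoding.UTF8.GetBytes("a\u20ACb");

        // Pretend a backwards read began at byte 2, inside the euro sign.
        int start = FirstCharacterStart(data, 2);
        string decoded = Encoding.UTF8.GetString(data, start, data.Length - start);

        Console.WriteLine(start);   // 4
        Console.WriteLine(decoded); // b
    }
}
```

The two skipped bytes (indexes 2 and 3) cannot be decoded on their own. A backwards reader has to hold on to them and prepend the next chunk it reads, which is the role of the spare bytes at the end of the buffer in the answer's code.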