Mega Code Archive

 
Intelligently reading a file one line at a time

Question: Have you ever needed to read a file one line at a time?

Answer: You can simply declare a TextFile and call ReadLn, but on a large file that's rather inefficient. You can also use the TStrings.LoadFromFile method, which is great for a small text file, but it holds the entire file in memory, and with a larger file that can be a real headache.

Introducing TabLineStreamer! This object accepts any stream as input and intelligently buffers it in 8K (adjustable) chunks; reading a chunk of that size from a hard disk drive takes about the same time as reading a single line. Each chunk is then split into lines, which you retrieve through the GetLine function. GetLine reads from the internal buffer (a TStringList) until the buffer is empty, and then the object reads the next chunk from the stream. The object is forward-only; it simply allows you to process data one line at a time. In a future article I'll introduce a helper object that can read a CSV file efficiently.

A simple example:

    var
      Streamer : TabLineStreamer;
      fStream : TFileStream;
      Line : String;
    begin
      fStream := TFileStream.Create('c:\filetoread.csv', fmOpenRead or fmShareDenyNone);
      try
        Streamer := TabLineStreamer.Create(fStream);
        try
          // pbMax is assumed to be a TProgressBar on the form
          pbMax.Max := Streamer.Size;
          while not Streamer.EOF do
          begin
            // Get the line
            Line := Streamer.GetLine;
            // Do something with the line
            pbMax.Position := Streamer.Position;
            Application.ProcessMessages;
          end; // while
        finally
          Streamer.Free;
        end;
      finally
        fStream.Free;
      end;
    end;

Properties and functions:

Create(Stream : TStream; OwnStream : Boolean = false);
    The constructor accepts any TStream descendant, so the object can be used with nearly any source (a memory stream, etc.). The Boolean value indicates whether the object should take ownership of the stream; if it is true, freeing the object also frees the stream it's maintaining.
GetLine : String;
    Returns the next available line. It raises an exception if it is called after EOF has returned true.

Position : Integer;
    Returns the position of the underlying stream. This is useful for displaying a progress bar; because data is read in blocks, the value advances one block at a time rather than line by line.

Size : Integer;
    Returns the size of the stream.

EOF : Boolean;
    Returns true when every line has been returned and the stream has reached the end of the file.

I hope you found this article and function to be useful; I'd love to hear your comments, suggestions, etc.

-David Lederman
dlederman@InternetToolsCorp.com

The following is the source code for the functions described above. Feel free to use the code in your own programs, but please leave my name and address intact!

    // ---------------------------ooo------------------------------ \\
    // 2000 David Lederman
    // dlederman@internettoolscorp.com
    // ---------------------------ooo------------------------------ \\
    unit abStreams;

    interface

    uses
      Classes, SysUtils;

    const
      BlockSize = 8192;

    type
      TabLineStreamer = class
      private
        DataStream : TStream;
        DataOwner : Boolean;
        CurrentLine, MaxLine, CurrentBufferLine : Integer;
        Buffer : String;
        BufferList : TStringList;
        procedure InternalBufferData;
        function GetUsableLines(DataToParse : String; var StringList : TStringList) : String;
      public
        constructor Create(Stream : TStream; OwnStream : Boolean = false);
        destructor Destroy; override;
        function GetLine : String;
        function Position : Integer;
        function Size : Integer;
        function EOF : boolean;
      end;

    implementation

    { TabLineStreamer }

    constructor TabLineStreamer.Create(Stream : TStream; OwnStream : Boolean = false);
    begin
      inherited Create;
      DataStream := Stream;
      CurrentLine := 0;
      MaxLine := 0;
      Buffer := '';
      BufferList := TStringList.Create;
      DataOwner := OwnStream;
      // Now prepare the stream for usage
      InternalBufferData;
    end;

    destructor TabLineStreamer.Destroy;
    begin
      BufferList.Free;
      if DataOwner then
        FreeAndNil(DataStream);
      inherited;
    end;

    function TabLineStreamer.EOF: boolean;
    begin
      // Lines are still waiting in the buffer, so we are not at the end
      if CurrentLine < MaxLine then
      begin
        Result := False;
        Exit;
      end;
      // Otherwise we are done once the stream is exhausted and no
      // unterminated final line is left over in Buffer
      Result := (Position = Size) and (Buffer = '');
    end;

    function TabLineStreamer.GetLine: String;
    begin
      // Refill the line buffer once every buffered line has been returned
      if CurrentLine = MaxLine then
      begin
        // See if more data can be read
        if not EOF then
          InternalBufferData
        else
          raise Exception.Create('EOF: Out-of-range');
      end;
      // Now return the data
      Result := BufferList[CurrentBufferLine];
      Inc(CurrentBufferLine);
      Inc(CurrentLine);
    end;

    function TabLineStreamer.GetUsableLines(DataToParse: String; var StringList: TStringList): String;
    var
      StartPos : Integer;
      Line : String;
    begin
      // ---------------------------ooo------------------------------ \\
      // This function looks for each #10 line terminator, trims the
      // line in front of it (which also removes the #13 of a #13#10
      // pair) and adds it to the stringlist. Any text left after the
      // last terminator is returned and becomes the new buffer
      // ---------------------------ooo------------------------------ \\
      while Pos(#10, DataToParse) > 0 do
      begin
        StartPos := Pos(#10, DataToParse);
        Line := Copy(DataToParse, 1, StartPos);
        Line := Trim(Line);
        StringList.Add(Line);
        Delete(DataToParse, 1, StartPos);
      end;
      Result := DataToParse;
    end;

    procedure TabLineStreamer.InternalBufferData;
    var
      DataRead : Integer;
      // Byte-sized characters: the stream is treated as single-byte (ANSI) text
      BufferData : array[0..BlockSize - 1] of AnsiChar;
      Chunk : AnsiString;
    begin
      BufferList.Clear;
      CurrentBufferLine := 0;
      // Step 1. Read blocks and chop them into the stringlist until at least
      // one complete line is available or the stream is exhausted
      // (a single line may be longer than one block)
      repeat
        DataRead := DataStream.Read(BufferData, SizeOf(BufferData));
        // Copy exactly the bytes that were read
        SetString(Chunk, PAnsiChar(@BufferData[0]), DataRead);
        // Concat the buffers and keep the unterminated remainder
        Buffer := GetUsableLines(Buffer + string(Chunk), BufferList);
      until (BufferList.Count > 0) or (DataRead = 0);
      // Step 2. At the end of the stream, text without a final line break
      // is still a line
      if (BufferList.Count = 0) and (Buffer <> '') then
      begin
        BufferList.Add(Trim(Buffer));
        Buffer := '';
      end;
      // Step 3. Update the numbers
      Inc(MaxLine, BufferList.Count);
    end;

    function TabLineStreamer.Position: Integer;
    begin
      Result := DataStream.Position;
    end;

    function TabLineStreamer.Size: Integer;
    begin
      Result := DataStream.Size;
    end;

    end.
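
Because the constructor accepts any stream, the object can also be tried out without a file. The following is a minimal sketch rather than part of the original article: it assumes the abStreams unit above compiles in your Delphi version and that the data is single-byte text. The program name LineStreamerDemo and the sample strings are made up for illustration. It feeds an in-memory TStringStream to the streamer and passes true for OwnStream, so the stream is released together with the streamer.

```pascal
program LineStreamerDemo;

{$APPTYPE CONSOLE}

uses
  Classes, SysUtils,
  abStreams; // the unit listed above

var
  Streamer : TabLineStreamer;
  LineNo : Integer;
begin
  // An in-memory stream holding three CRLF-terminated lines stands in
  // for a file. Passing True as OwnStream hands the stream over to the
  // streamer, which frees it in its destructor.
  Streamer := TabLineStreamer.Create(
    TStringStream.Create('alpha'#13#10'beta'#13#10'gamma'#13#10), True);
  try
    LineNo := 0;
    while not Streamer.EOF do
    begin
      Inc(LineNo);
      // GetLine returns each line with its line break trimmed off
      WriteLn(LineNo, ': ', Streamer.GetLine);
    end;
  finally
    // Also frees the TStringStream, because OwnStream was True
    Streamer.Free;
  end;
end.
```

The loop should visit alpha, beta and gamma in that order. Ending the sample data with a line break keeps the result independent of how a final unterminated line is handled.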