Delphi Pages Forums  

Go Back   Delphi Pages Forums > Delphi Forum > General

Lost Password?

Reply
 
Thread Tools Display Modes
  #1  
Old 12-10-2008, 01:15 PM
Ouiji Ouiji is offline
Senior Member
 
Join Date: Nov 2001
Location: US of A
Posts: 492
Default Read specified line of large text file (80+ MB)

I am trying to get the value of a specified line in a very large text file. The file is over 80mb, so I do not want to load it into a TStringlist. I am able to read it line by line using this code:

[DELPHI]
AssignFile(mLog, aLogFileName);
Reset(mLog);

while NOT EOF(aLogFile) do begin
ReadLn(mLog, mLine);

// Do something with mLine Here..
end;
CloseFile(mLog);

[/DELPHI]

However I cannot figure out how to use that type of proceedure and have it read only a certain line. Is it possible? Or is there an easy quick way to do it with a TFileStream?

Any thoughts?

-ouiji

[Link=http://www.ouiji.net/DelphiPages/] [/Link]
[q]"not quite smart enough to be dumb"[/q]
Reply With Quote
  #2  
Old 12-10-2008, 06:12 PM
chris_w chris_w is offline
Senior Member
 
Join Date: Jan 2004
Posts: 1,397
Default RE: Read specified line of large text file (80+ MB)

[pre]
Easier: No. Faster: Probably.

Not tested...

[/pre][DELPHI]
function FSGetLine(const Filename: string; Index: cardinal; var s: string): boolean;
const
BUF_SZ = $10000;
var
FS : TFileStream;
buf : string;
i, red : integer;
a, b, dx, fpos : int64;
begin
result := FileExists(Filename);
if result then begin
result := false;

FS := TFileStream.Create(Filename, fmOpenRead or fmShareDenyNone);
try
SetLength(buf, BUF_SZ);
fpos := 0;
dx := 0;
a := 0;
b := 0;

red := FS.Read(buf[1], BUF_SZ);
while (red > 0) and not result do begin

i := 0;
while (i < red) and not result do begin
if buf[i +1] = #10 then begin
b := fpos +i;
if dx = Index then begin
SetLength(s, b -a -1);
result := true;
if Length(s) > 0 then begin
FS.Position := a;
FS.Read(s[1], b -a -1);

while (s <> '') and (s[1] in [#10, #13]) do
Delete(s, 1, 1);
end;
end else begin
a := b;
Inc(dx);
end;
end;

Inc(i);
end;

Inc(fpos, red);
red := FS.Read(buf[1], BUF_SZ);
end;

if (b <= a) and (dx +1 = Index) and (a < FS.Size) then begin
//last line, no terminating linefeed char...
b := FS.Size -1;
SetLength(s, b -a -1);
result := true;
if Length(s) > 0 then begin
FS.Position := a;
FS.Read(s[1], b -a -1);

while (s <> '') and (s[1] in [#10, #13]) do
Delete(s, 1, 1);
end;
end;
finally
FS.Free;
end;
end;
end;

[/DELPHI][pre]
"There is a theory which states that if ever anybody discovers
exactly what the Universe is for and why it is here, it will
instantly disappear and be replaced by something even more
bizarre and inexplicable. There is another theory which states
that this has already happened."
-- Douglas Adams
[/pre]

Chris
Reply With Quote
  #3  
Old 12-11-2008, 11:40 PM
ErgonomixIT ErgonomixIT is offline
Member
 
Join Date: Jun 2007
Posts: 98
Default RE: Read specified line of large text file (80+ MB)

I once wrote some classes to deal with large text files by indexing them with assembler and generating an index file. I haven't touched this code for quite a while but used to work... Note: It loads the text file into RAM as binary, so unlike Chris' example, you need enough RAM.

Unlicensed, public domain:

[delphi]
type
{Future-proof variable defines}
INT_8 = -127..127;
UINT_8 = 0..255;
INT_16 = -32767..32767;
UINT_16 = 0..65535;
INT_32 = -2147483647..2147483647;
UINT_32 = 0..4294967295;
INT = INT_32;
UINT = UINT_32;

const
{Quantity defines}
KILOBYTE = 1024;
MEGABYTE = KILOBYTE * 1024;
GIGABYTE = MEGABYTE * 1024;
CR = #13;
LF = #10;

{ ************************************************** ************************** }
{ * Documentation: * }
{ * * }
{ * This is like a TMemoryStream, but it's not a TStream descendant and it * }
{ * addresses one of the main pitfalls of using TMemoryStream for batch * }
{ * writing - performance. This maintains a write-ahead buffer (the size of * }
{ * the buffer is designated in AOverflow in the Create() constructor). The * }
{ * purpose is to reduce the amount of ReallocMem() calls needed. When you * }
{ * have finished writing, you can call Shrink() to remove the write-ahead * }
{ * buffer (it then uses the same amount of memory as TMemoryStream. * }
{ * * }
{ * Inserting: * }
{ * This works like Write(), except that data is moved to make space for * }
{ * the inserted data. Inserting near the start of a large buffer can * }
{ * take some time. The buffer will be resized if it is too small. * }
{ * * }
{ * Deleting: * }
{ * This moves all data after ASize to Position - overwriting the old * }
{ * data. Deleting near the start of a large buffer can take some time. * }
{ * Delete never resizes the buffer. If you use Delete() a lot, you may * }
{ * wish to Shrink() periodically. * }
{ * * }
{ ************************************************** ************************** }

type
EEnhancedMemoryError = class(Exception);
TEnhancedMemory = class(TObject)
private
FHandle :Pointer;
FHandleSize :Integer;
FPosition :Integer;
FSize :Integer;
FOverflow :Integer;
procedure Resize(ANew :Integer = 0);
procedure SetPosition(AValue :Integer);
procedure SetSize(AValue :Integer);
public
constructor Create(AOverflow :Integer = MEGABYTE);
destructor Destroy; override;
property Position :Integer read FPosition write SetPosition;
property Size :Integer read FSize write SetSize;
procedure Write(const ABuffer; ASize :Integer);
procedure Insert(const ABuffer; ASize :Integer);
procedure Delete(ASize :Integer);
function Read(var ABuffer; ASize :Integer) :Integer;
procedure Shrink;
procedure LoadFromStream(AStream :TStream; ASize :Integer);
procedure SaveToStream(AStream :TStream);
{Special Reads}
function Readln :String;
function ReadByte :Byte;
function ReadWord :Word;
function ReadDWord :Cardinal;
end;

{ ************************************************** ************************** }
{ * Documentation: * }
{ * * }
{ * This is not designed to replace your TStringList. This is designed to * }
{ * provide a way of storing a large amount of text, in a binary form, for * }
{ * exceptional performance in loading, saving and manipulation. This IS * }
{ * null safe specifically for use with TStringCompress. * }
{ * * }
{ * LoadFromStream() in ASCII mode will ignore all CR characters, and * }
{ * register a new line on all LF characters. This makes it compatable with * }
{ * Windows, DOS, Unix, Linux and Mac OS X. It is not compatable with Mac * }
{ * OS 9 or earlier. * }
{ * * }
{ * SaveToStream() in ASCII mode will output all new lines as CR/LF pairs. * }
{ * * }
{ * Shrink() rebuilds the index and data, stripping any empty blocks caused * }
{ * by Delete() and any invalid entries, and then calls Shrink() on the * }
{ * TEnhancedMemory objects. This doesn't need doing very often, maybe * }
{ * after a few hundred Delete() calls, or when your application is bored. * }
{ * * }
{ ************************************************** ************************** }

TCRC32 = class; {forward}
TEnhancedStringListProgress = procedure(APercent :Byte); {NOT re-entrant}
TEnhancedStringListCompare = function(AFirst, ASecond :String) :Integer;
TEnhancedStringList = class(TObject)
private
FIndex :TEnhancedMemory;
FData :TEnhancedMemory;
FLock :TCriticalSection;
FOnChange :TNotifyEvent;
FCRC32 :TCRC32;
function GetString(AIndex :Integer) :String;
procedure PutString(AIndex :Integer; AValue :String);
function GetCount :Integer;
procedure QuickSort(L, R: Integer; ACompare: TEnhancedStringListCompare);
procedure _MakeEntry(AStart, AFinish :Pointer);
procedure _CreateIndexFromBuffer(const ABuffer; ASize :Cardinal);
public
constructor Create;
destructor Destroy; override;
{General Read}
property Strings[AIndex :Integer] :String read GetString write PutString; default;
property Count :Integer read GetCount;
{General Modification}
procedure Add(AValue :String);
procedure AddStrings(AValue :TStrings);
procedure Insert(AIndex :Integer; AValue :String);
procedure Delete(AIndex :Integer);
{Advanced}
function Find(ANeedle :String; ACaseSensitive :Boolean = True;
AWildcard :Boolean = False; ACallback :TEnhancedStringListProgress = nil;
ARangeStart :Integer = -1; ARangeEnd :Integer = -1) :Integer;
procedure Exchange(AFirst, ASecond :Integer);
procedure Move(AFrom, ATo :Integer);
procedure Sort(ACompare :TEnhancedStringListCompare = nil);
property OnChange: TNotifyEvent read FOnChange write FOnChange;
procedure Shrink;
{Loading and Saving}
procedure LoadFromStream(AStream :TStream; ABinary :Boolean = True);
procedure LoadFromFile(AName :String; ABinary :Boolean = True);
procedure SaveToStream(AStream :TStream; ABinary :Boolean = True);
procedure SaveToFile(AName :String; ABinary :Boolean = True);
{Threading}
procedure Lock;
procedure Unlock;
end;

{ ************************************************** ************************** }
{ * Documentation: * }
{ * * }
{ * This is a PKZIP compatable CRC32 hash calculator. To use, create the * }
{ * object, use the AddFrom* routines and call Hash*. The Hash* functions * }
{ * finalize the CRC32, but you can still add to it afterwards. NewHash() * }
{ * clears the CRC32 for you to start again. * }
{ * * }
{ ************************************************** ************************** }

TCRC32 = class(TObject)
private
FCRC32 :Cardinal;
FFinalized :Boolean;
function GetHashCardinal :Cardinal;
function GetHashString :String;
public
constructor Create;
procedure NewHash;
procedure AddFromStream(AStream :TStream);
procedure AddFromBuffer(var ABuffer; ASize :Integer);
procedure AddFromFile(AFilename :String);
procedure AddFromString(AString :String);
procedure FinalizeHash;
property HashCardinal :Cardinal read GetHashCardinal;
property HashString :String read GetHashString;
end;

{Specialist Binary Search Routines}
function BinaryPos(const ANeedle; ANeedleSize :Integer; const AHaystack; AHaystackSize :Integer) :Integer;
{ Finds ANeedle in AHaystack, the same as Pos() but binary }
function FindIntegerInArray(AValue :Integer; const ABuffer; ASize :Integer) :Integer;
{ Finds AValue in an ABuffer integer array. ASize is SizeOf(), +1 of index.
and the result is -1 or Low() + Result. Do partial searches with pointer
arithmetic. }
function PosR(ANeedle :Char; AHaystack :String) :Cardinal;
{ Pos(Char,String), but right to left }

implementation

function BinaryPos(const ANeedle; ANeedleSize :Integer; const AHaystack; AHaystackSize :Integer) :Integer;
type
PByte = ^Byte;
PInteger = ^Integer;
var
NeedleD, HaystackD :PInteger;
NeedleB, HaystackB :PByte;
Match :Boolean;

procedure Check(AOffset :Pointer);
var
NeedleB, HaystackB :^Byte;
Count :Integer;
LMatch :Boolean;
begin
NeedleB := @ANeedle;
HaystackB := AOffset;
LMatch := True;
for Count := 0 to ANeedleSize-1 do
if PByte(Integer(NeedleB) + Count)^ <> PByte(Integer(HaystackB) + Count)^ then
LMatch := False;
if LMatch then
begin
Match := True;
Result := Integer(AOffset) - Integer(@AHaystack);
end;
end;

begin
Match := False;
Result := -1;
if ANeedleSize >= 4 then
begin
NeedleD := @ANeedle;
HaystackD := @AHaystack;
while (Integer(HaystackD) < (Integer(@AHaystack) + AHaystackSize) - ANeedleSize) and (not Match) do
begin
if NeedleD^ = HaystackD^ then
Check(HaystackD);
Inc(HaystackD);
end;
end
else
begin
NeedleB := @ANeedle;
HaystackB := @AHaystack;
while (Integer(HaystackB) < (Integer(@AHaystack) + AHaystackSize) - ANeedleSize) and (not Match) do
begin
if NeedleB^ = HaystackB^ then
Check(HaystackB);
Inc(HaystackB);
end;
end;
end;

function FindIntegerInArray(AValue :Integer; const ABuffer; ASize :Integer) :Integer;
asm
push edi
push ebx
mov ebx,ecx
mov edi,edx
cmp ecx,00000000h
je @notfound
repne scasd
{calc result into eax}
jnz @notfound
mov eax,ebx
sub eax,ecx
dec eax
jmp @done
@notfound:
mov eax,-1
@done:
pop ebx
pop edi
end;

function PosR(ANeedle :Char; AHaystack :String) :Cardinal;
asm
push edi

mov ecx,[edx-4]
cmp ecx,00000000h
je @zerolength

mov edi,edx
add edi,ecx
inc ecx
std
repne scasb
jnz @zerolength
mov eax,ecx
inc eax
jmp @done
@zerolength:
xor eax,eax
@done:
pop edi
cld
end;

{ TEnhancedMemory }

constructor TEnhancedMemory.Create(AOverflow: Integer);
begin
FHandle := nil;
FHandleSize := 0;
FPosition := 0;
FSize := 0;
FOverflow := AOverflow;
Resize;
end;

procedure TEnhancedMemory.Delete(ASize: Integer);
var
Amount :Integer;
begin
Amount := FSize - (FPosition + ASize);
Move(
Pointer(Integer(FHandle) + FPosition + ASize)^,
Pointer(Integer(FHandle) + FPosition)^,
Amount
);
Dec(FSize,ASize);
end;

destructor TEnhancedMemory.Destroy;
begin
inherited;
FreeMem(FHandle,FHandleSize);
end;

procedure TEnhancedMemory.Insert(const ABuffer; ASize: Integer);
begin
Resize(ASize);
Move(
Pointer(Integer(FHandle) + FPosition)^,
Pointer(Integer(FHandle) + FPosition + ASize)^,
ASize
);
Move(ABuffer,Pointer(Integer(FHandle) + FPosition)^,ASize);
Inc(FPosition,ASize);
if FPosition > FSize then
FSize := FPosition;
end;

procedure TEnhancedMemory.LoadFromStream(AStream: TStream; ASize :Integer);
begin
FSize := ASize;
FHandleSize := FSize;
ReallocMem(FHandle,FHandleSize);
if FPosition > FSize then
FPosition := FSize;
AStream.Read(FHandle^,FHandleSize);
end;

function TEnhancedMemory.Read(var ABuffer; ASize: Integer): Integer;
var
ReadSize :Integer;
begin
if FPosition + ASize > FSize then
ReadSize := FSize - FPosition
else
ReadSize := ASize;
Move(Pointer(Integer(FHandle) + FPosition)^,ABuffer,ReadSize);
Inc(FPosition,ReadSize);
Result := ReadSize;
end;

function TEnhancedMemory.Readln: String;
var
Ch :Char;
begin
Result := '';
while (Read(Ch,SizeOf(Char)) = SizeOf(Char)) and (Ch <> LF) do
if Ch <> LF then
Result := Concat(Result,Ch);
end;

function TEnhancedMemory.ReadByte: Byte;
begin
if Read(Result,SizeOf(Byte)) <> SizeOf(Byte) then
EEnhancedMemoryError.Create('End of buffer');
end;

function TEnhancedMemory.ReadWord: Word;
begin
if Read(Result,SizeOf(Word)) <> SizeOf(Word) then
EEnhancedMemoryError.Create('End of buffer');
end;

function TEnhancedMemory.ReadDWord: Cardinal;
begin
if Read(Result,SizeOf(Cardinal)) <> SizeOf(Cardinal) then
EEnhancedMemoryError.Create('End of buffer');
end;

procedure TEnhancedMemory.Resize(ANew :Integer = 0);
begin
if FPosition > FSize then
FPosition := FSize;
if FPosition + ANew > FHandleSize then
begin
FHandleSize := FPosition + ANew + FOverflow;
ReallocMem(FHandle,FHandleSize);
end;
end;

procedure TEnhancedMemory.SaveToStream(AStream: TStream);
begin
AStream.Write(FHandle^,FSize);
end;

procedure TEnhancedMemory.SetPosition(AValue: Integer);
begin
if AValue > FSize then
FPosition := FSize
else
FPosition := AValue;
end;

procedure TEnhancedMemory.SetSize(AValue: Integer);
begin
FSize := AValue;
FHandleSize := FSize;
ReallocMem(FHandle,FHandleSize);
if FPosition > FSize then
FPosition := FSize;
end;

procedure TEnhancedMemory.Shrink;
begin
ReallocMem(FHandle,FSize);
FHandleSize := FSize;
end;

procedure TEnhancedMemory.Write(const ABuffer; ASize: Integer);
begin
Resize(ASize);
Move(ABuffer,Pointer(Integer(FHandle) + FPosition)^,ASize);
Inc(FPosition,ASize);
if FPosition > FSize then
FSize := FPosition;
end;

{ TEnhancedStringList }

type
TEnhancedStringListIndexEntry = packed record
Offset :Integer;
Size :Integer;
Hash :Cardinal;
end;

procedure TEnhancedStringList.Add(AValue: String);
var
Entry :TEnhancedStringListIndexEntry;
begin
FIndex.Position := FIndex.Size;
FData.Position := FData.Size;
Entry.Offset := FData.Position;
Entry.Size := Length(AValue);
FCRC32.NewHash;
FCRC32.AddFromString(AValue);
Entry.Hash := FCRC32.GetHashCardinal;
FIndex.Write(Entry,SizeOf(TEnhancedStringListIndex Entry));
FData.Write(AValue[1],Entry.Size);
if Assigned(FOnChange) then FOnChange(Self);
end;

procedure TEnhancedStringList.AddStrings(AValue: TStrings);
var
Count :Integer;
TempOnChange :TNotifyEvent;
begin
TempOnChange := FOnChange;
FOnChange := nil; {Stop it triggering for every add here}
for Count := 0 to AValue.Count-1 do
Add(AValue[Count]);
FOnChange := TempOnChange;
if Assigned(FOnChange) then FOnChange(Self);
end;

constructor TEnhancedStringList.Create;
begin
FLock := TCriticalSection.Create;
OnChange := nil;
FIndex := TEnhancedMemory.Create(4*KILOBYTE);
FData := TEnhancedMemory.Create(MEGABYTE);
FCRC32 := TCRC32.Create;
end;

destructor TEnhancedStringList.Destroy;
begin
FCRC32.Free;
FIndex.Free;
FData.Free;
FLock.Free;
inherited;
end;

procedure TEnhancedStringList.Delete(AIndex: Integer);
var
Buffer :Pointer;
Entry :TEnhancedStringListIndexEntry;
begin
{Overwrite the old data incase it's sensitive}
FIndex.Position := AIndex * SizeOf(TEnhancedStringListIndexEntry);
FIndex.Read(Entry,SizeOf(TEnhancedStringListIndexE ntry));
FData.Position := Entry.Offset;
GetMem(Buffer,Entry.Size);
FillChar(Buffer^,Entry.Size,0);
FData.Write(Buffer^,Entry.Size);
FreeMem(Buffer);
{Now Delete}
FIndex.Position := AIndex * SizeOf(TEnhancedStringListIndexEntry);
FIndex.Delete(SizeOf(TEnhancedStringListIndexEntry ));
if Assigned(FOnChange) then FOnChange(Self);
end;

procedure TEnhancedStringList.Exchange(AFirst, ASecond: Integer);
var
Entry :array[0..1] of TEnhancedStringListIndexEntry;
begin
{Read}
FIndex.Position := AFirst * SizeOf(TEnhancedStringListIndexEntry);
FIndex.Read(Entry[0],SizeOf(TEnhancedStringListIndexEntry));
FIndex.Position := ASecond * SizeOf(TEnhancedStringListIndexEntry);
FIndex.Read(Entry[1],SizeOf(TEnhancedStringListIndexEntry));
{Write}
FIndex.Position := AFirst * SizeOf(TEnhancedStringListIndexEntry);
FIndex.Write(Entry[1],SizeOf(TEnhancedStringListIndexEntry));
FIndex.Position := ASecond * SizeOf(TEnhancedStringListIndexEntry);
FIndex.Write(Entry[0],SizeOf(TEnhancedStringListIndexEntry));
if Assigned(FOnChange) then FOnChange(Self);
end;

function TEnhancedStringList.Find(ANeedle: String; ACaseSensitive, AWildcard: Boolean; ACallback: TEnhancedStringListProgress; ARangeStart, ARangeEnd: Integer): Integer;
var
Count :Integer;
RangeStart, RangeEnd :Integer;
PC, OldPC :Byte;
CurrentStr :String;
Needle :String;
begin
if (ARangeStart >= GetCount) or (ARangeStart < -1) then
Result := -1
else if (ARangeEnd >= GetCount) or (ARangeEnd < ARangeStart) then
Result := -1
else if ANeedle = '' then
Result := -1
else
begin
Result := -1;
if ARangeStart = -1 then RangeStart := 0 else RangeStart := ARangeStart;
if ARangeEnd = -1 then RangeEnd := GetCount-1 else RangeEnd := ARangeEnd;
Count := RangeStart;
if ACaseSensitive then
Needle := ANeedle
else
Needle := LowerCase(ANeedle);
OldPC := 255;
while (Result = -1) and (Count <= RangeEnd) do
begin
if ACaseSensitive then
CurrentStr := Strings[Count]
else
CurrentStr := LowerCase(Strings[Count]);
if AWildcard then
if Pos(Needle,CurrentStr) > 0 then
Result := Count
else
else
if Needle = CurrentStr then
Result := Count;
{Percentage Callback}
if Assigned(ACallback) then
begin
if Count-RangeStart < 1 then
PC := 0
else
PC := Round( ( (Count-RangeStart) / (RangeEnd-RangeStart) ) * 100 );
if OldPC <> PC then
begin
OldPC := PC;
ACallback(PC);
end;
end;
Inc(Count);
end;
end;
end;

function TEnhancedStringList.GetCount: Integer;
begin
if FIndex.Size = 0 then
Result := 0
else
Result := FIndex.Size div SizeOf(TEnhancedStringListIndexEntry);
end;

function TEnhancedStringList.GetString(AIndex: Integer): String;
var
EntryBuffer :TEnhancedStringListIndexEntry;
IndexOffset :Integer;
begin
{32814 error}
IndexOffset := AIndex * SizeOf(TEnhancedStringListIndexEntry);
if IndexOffset > FIndex.Size then
raise EStringListError.Create('List index out of bounds ('+IntToStr(AIndex)+')');
FIndex.Position := IndexOffset;
FIndex.Read(EntryBuffer,SizeOf(TEnhancedStringList IndexEntry));
SetLength(Result,EntryBuffer.Size);
if (EntryBuffer.Offset + EntryBuffer.Size) > FData.FSize then
raise EStringListError.Create('List index corrupted');
FData.Position := EntryBuffer.Offset;
FData.Read(Result[1],EntryBuffer.Size);
end;

procedure TEnhancedStringList.Insert(AIndex: Integer; AValue: String);
var
Entry :TEnhancedStringListIndexEntry;
begin
FIndex.Position := AIndex * SizeOf(TEnhancedStringListIndexEntry);
FData.Position := FData.Size;
Entry.Offset := FData.Position;
Entry.Size := Length(AValue);
FCRC32.NewHash;
FCRC32.AddFromString(AValue);
Entry.Hash := FCRC32.GetHashCardinal;
FIndex.Insert(Entry,SizeOf(TEnhancedStringListInde xEntry));
FData.Write(AValue[1],Entry.Size);
if Assigned(FOnChange) then FOnChange(Self);
end;

procedure TEnhancedStringList.LoadFromFile(AName: String; ABinary: Boolean);
var
FileStream :TFileStream;
begin
FileStream := TFileStream.Create(AName,fmOpenRead or fmShareDenyNone);
try
LoadFromStream(FileStream,ABinary);
finally
FileStream.Free;
end;
end;

procedure TEnhancedStringList._MakeEntry(AStart, AFinish :Pointer);
var
Entry :TEnhancedStringListIndexEntry;
begin
Entry.Offset := Cardinal(AStart) - Cardinal(FData.FHandle);
Entry.Size := Cardinal(AFinish) - Cardinal(AStart);
FCRC32.NewHash;
FCRC32.AddFromBuffer(AStart^,Entry.Size);
Entry.Hash := FCRC32.GetHashCardinal;
FIndex.Position := FIndex.Size;
FIndex.Write(Entry,SizeOf(TEnhancedStringListIndex Entry));
end;

procedure TEnhancedStringList._CreateIndexFromBuffer(const ABuffer; ASize :Cardinal);
var
LastOffset :Pointer;
OriginalEDI :Cardinal;
OriginalEAX :Cardinal;
asm
mov OriginalEAX,eax
mov OriginalEDI,edi
jcxz @done
mov edi,edx
mov LastOffset,edx

@continue:
mov al,13
repne scasb
jcxz @done

push edi
push ecx
mov edx,LastOffset
mov ecx,edi
dec ecx

mov eax,OriginalEAX
call TEnhancedStringList[EAX]._MakeEntry
pop ecx
pop edi
mov LastOffset,edi
jmp @continue

@done:
mov edi,OriginalEDI
end;


procedure TEnhancedStringList.LoadFromStream(AStream: TStream; ABinary: Boolean);
var
IntBuffer :Integer;
Buffer :array[0..4095] of Char;
BufferSize :Integer; {For incomplete reads}
StrBuffer :String;
Count :Integer;
begin
if ABinary then
begin
AStream.Read(IntBuffer,SizeOf(Integer));
FIndex.LoadFromStream(AStream,IntBuffer);
AStream.Read(IntBuffer,SizeOf(Integer));
FData.LoadFromStream(AStream,IntBuffer);
end
else
begin
FData.Size := 0;
FIndex.Size := 0;
FData.LoadFromStream(AStream,AStream.Size);
_CreateIndexFromBuffer(FData.FHandle^,FData.Size);
{ StrBuffer := '';
while AStream.Position < AStream.Size do
begin
BufferSize := AStream.Read(Buffer[0],SizeOf(Buffer));
for Count := 0 to BufferSize-1 do
case Buffer[Count] of
#13: ;
#10: begin Add(StrBuffer); StrBuffer := ''; end;
else StrBuffer := Concat(StrBuffer,Buffer[Count]);
end;
end;
if StrBuffer <> '' then
Add(StrBuffer);}
end;
if Assigned(FOnChange) then FOnChange(Self);
end;

procedure TEnhancedStringList.Lock;
begin
FLock.Acquire;
end;

procedure TEnhancedStringList.Move(AFrom, ATo: Integer);
var
Buffer :String;
TempOnChange :TNotifyEvent;
begin
TempOnChange := FOnChange;
FOnChange := nil;
Buffer := Strings[AFrom];
Delete(AFrom);
Insert(ATo,Buffer);
if AFrom > ATo then
Delete(AFrom+1)
else
Delete(AFrom);
FOnChange := TempOnChange;
if Assigned(FOnChange) then FOnChange(Self);
end;

procedure TEnhancedStringList.PutString(AIndex: Integer; AValue: String);
var
Buffer :Pointer;
Entry :TEnhancedStringListIndexEntry;
begin
{Overwrite the old data incase it's sensitive}
FIndex.Position := AIndex * SizeOf(TEnhancedStringListIndexEntry);
FIndex.Read(Entry,SizeOf(TEnhancedStringListIndexE ntry));
FData.Position := Entry.Offset;
GetMem(Buffer,Entry.Size);
FillChar(Buffer^,Entry.Size,0);
FData.Write(Buffer^,Entry.Size);
FreeMem(Buffer);
{There may be space to store the new data in the old location...}
if Entry.Size >= Length(AValue) then
FData.Position := Entry.Offset
else
FData.Position := FData.Size;
{Now write the data wherever, and the index}
Entry.Offset := FData.Position;
FIndex.Position := AIndex * SizeOf(TEnhancedStringListIndexEntry);
Entry.Size := Length(AValue);
FCRC32.NewHash;
FCRC32.AddFromString(AValue);
Entry.Hash := FCRC32.GetHashCardinal;
FIndex.Write(Entry,SizeOf(TEnhancedStringListIndex Entry));
FData.Write(AValue[1],Entry.Size);
if Assigned(FOnChange) then FOnChange(Self);
end;

procedure TEnhancedStringList.SaveToFile(AName: String; ABinary: Boolean);
var
FileStream :TFileStream;
begin
DeleteFile(PChar(AName));
FileStream := TFileStream.Create(AName,fmCreate or fmShareDenyWrite);
try
SaveToStream(FileStream,ABinary);
finally
FileStream.Free;
end;
end;

procedure TEnhancedStringList.SaveToStream(AStream: TStream; ABinary: Boolean);
var
IntBuffer :Integer;
Count :Integer;
StrBuffer :String;
begin
if ABinary then
begin
IntBuffer := FIndex.Size;
AStream.Write(IntBuffer,SizeOf(Integer));
FIndex.SaveToStream(AStream);
IntBuffer := FData.Size;
AStream.Write(IntBuffer,SizeOf(Integer));
FData.SaveToStream(AStream);
end
else
for Count := 0 to GetCount-1 do
begin
StrBuffer := Strings[Count] + #13#10;
AStream.Write(StrBuffer[1],Length(StrBuffer));
end;
end;

function EnhancedStringListCompareDefault(AFirst, ASecond :String) :Integer;
begin
Result := AnsiCompareText(AFirst,ASecond);
end;

procedure TEnhancedStringList.QuickSort(L, R: Integer; ACompare: TEnhancedStringListCompare);
var
I, J, P: Integer;
begin
repeat
I := L;
J := R;
P := (L + R) shr 1;
repeat
while ACompare(Strings[I], Strings[P]) < 0 do Inc(I);
while ACompare(Strings[J], Strings[P]) > 0 do Dec(J);
if I <= J then
begin
Exchange(I, J);
if P = I then
P := J
else if P = J then
P := I;
Inc(I);
Dec(J);
end;
until I > J;
if L < J then QuickSort(L, J, ACompare);
L := I;
until I >= R;
end;

procedure TEnhancedStringList.Sort(ACompare: TEnhancedStringListCompare);
var
CompareProc :TEnhancedStringListCompare;
TempOnChange :TNotifyEvent;
begin
TempOnChange := FOnChange;
FOnChange := nil;
if Assigned(ACompare) then
CompareProc := ACompare
else
CompareProc := EnhancedStringListCompareDefault;
QuickSort(0,GetCount-1,CompareProc);
FOnChange := TempOnChange;
if Assigned(FOnChange) then FOnChange(Self);
end;

procedure TEnhancedStringList.Unlock;
begin
FLock.Release;
end;

procedure TEnhancedStringList.Shrink;
var
Count :Integer;
OldEntry :TEnhancedStringListIndexEntry;
NewEntry :TEnhancedStringListIndexEntry;
NewIndex :TEnhancedMemory;
NewData :TEnhancedMemory;
Buffer :Pointer;
begin
{Rebuild index into NewIndex, NewData}
FIndex.Position := 0;
NewIndex := TEnhancedMemory.Create(4*KILOBYTE);
NewData := TEnhancedMemory.Create(MEGABYTE);
Buffer := nil;
if FIndex.Size > 0 then
try
for Count := 0 to FIndex.Size div SizeOf(TEnhancedStringListIndexEntry)-1 do
if (FIndex.Read(OldEntry,SizeOf(TEnhancedStringListIn dexEntry)) = SizeOf(TEnhancedStringListIndexEntry)) then
begin
ReallocMem(Buffer,OldEntry.Size);
FData.Position := OldEntry.Offset;;
if (FData.Position = OldEntry.Offset) then
if (FData.Read(Buffer^,OldEntry.Size) = OldEntry.Size) then
begin
NewEntry.Offset := NewData.Position;
NewEntry.Size := OldEntry.Size;
NewEntry.Hash := OldEntry.Hash;
NewIndex.Write(NewEntry,SizeOf(TEnhancedStringList IndexEntry));
NewData.Write(Buffer^,NewEntry.Size);
end;
end;
finally
FreeMem(Buffer);
end;
{Shrink and swap objects}
NewIndex.Shrink;
NewData.Shrink;
FIndex.Free;
FData.Free;
FIndex := NewIndex;
FData := NewData;
if Assigned(FOnChange) then FOnChange(Self);
end;

{ TCRC32 }

const
CRC32Hash :array[0..255] of Cardinal =
($00000000, $77073096, $EE0E612C, $990951BA,
$076DC419, $706AF48F, $E963A535, $9E6495A3,
$0EDB8832, $79DCB8A4, $E0D5E91E, $97D2D988,
$09B64C2B, $7EB17CBD, $E7B82D07, $90BF1D91,
$1DB71064, $6AB020F2, $F3B97148, $84BE41DE,
$1ADAD47D, $6DDDE4EB, $F4D4B551, $83D385C7,
$136C9856, $646BA8C0, $FD62F97A, $8A65C9EC,
$14015C4F, $63066CD9, $FA0F3D63, $8D080DF5,
$3B6E20C8, $4C69105E, $D56041E4, $A2677172,
$3C03E4D1, $4B04D447, $D20D85FD, $A50AB56B,
$35B5A8FA, $42B2986C, $DBBBC9D6, $ACBCF940,
$32D86CE3, $45DF5C75, $DCD60DCF, $ABD13D59,
$26D930AC, $51DE003A, $C8D75180, $BFD06116,
$21B4F4B5, $56B3C423, $CFBA9599, $B8BDA50F,
$2802B89E, $5F058808, $C60CD9B2, $B10BE924,
$2F6F7C87, $58684C11, $C1611DAB, $B6662D3D,

$76DC4190, $01DB7106, $98D220BC, $EFD5102A,
$71B18589, $06B6B51F, $9FBFE4A5, $E8B8D433,
$7807C9A2, $0F00F934, $9609A88E, $E10E9818,
$7F6A0DBB, $086D3D2D, $91646C97, $E6635C01,
$6B6B51F4, $1C6C6162, $856530D8, $F262004E,
$6C0695ED, $1B01A57B, $8208F4C1, $F50FC457,
$65B0D9C6, $12B7E950, $8BBEB8EA, $FCB9887C,
$62DD1DDF, $15DA2D49, $8CD37CF3, $FBD44C65,
$4DB26158, $3AB551CE, $A3BC0074, $D4BB30E2,
$4ADFA541, $3DD895D7, $A4D1C46D, $D3D6F4FB,
$4369E96A, $346ED9FC, $AD678846, $DA60B8D0,
$44042D73, $33031DE5, $AA0A4C5F, $DD0D7CC9,
$5005713C, $270241AA, $BE0B1010, $C90C2086,
$5768B525, $206F85B3, $B966D409, $CE61E49F,
$5EDEF90E, $29D9C998, $B0D09822, $C7D7A8B4,
$59B33D17, $2EB40D81, $B7BD5C3B, $C0BA6CAD,

$EDB88320, $9ABFB3B6, $03B6E20C, $74B1D29A,
$EAD54739, $9DD277AF, $04DB2615, $73DC1683,
$E3630B12, $94643B84, $0D6D6A3E, $7A6A5AA8,
$E40ECF0B, $9309FF9D, $0A00AE27, $7D079EB1,
$F00F9344, $8708A3D2, $1E01F268, $6906C2FE,
$F762575D, $806567CB, $196C3671, $6E6B06E7,
$FED41B76, $89D32BE0, $10DA7A5A, $67DD4ACC,
$F9B9DF6F, $8EBEEFF9, $17B7BE43, $60B08ED5,
$D6D6A3E8, $A1D1937E, $38D8C2C4, $4FDFF252,
$D1BB67F1, $A6BC5767, $3FB506DD, $48B2364B,
$D80D2BDA, $AF0A1B4C, $36034AF6, $41047A60,
$DF60EFC3, $A867DF55, $316E8EEF, $4669BE79,
$CB61B38C, $BC66831A, $256FD2A0, $5268E236,
$CC0C7795, $BB0B4703, $220216B9, $5505262F,
$C5BA3BBE, $B2BD0B28, $2BB45A92, $5CB36A04,
$C2D7FFA7, $B5D0CF31, $2CD99E8B, $5BDEAE1D,

$9B64C2B0, $EC63F226, $756AA39C, $026D930A,
$9C0906A9, $EB0E363F, $72076785, $05005713,
$95BF4A82, $E2B87A14, $7BB12BAE, $0CB61B38,
$92D28E9B, $E5D5BE0D, $7CDCEFB7, $0BDBDF21,
$86D3D2D4, $F1D4E242, $68DDB3F8, $1FDA836E,
$81BE16CD, $F6B9265B, $6FB077E1, $18B74777,
$88085AE6, $FF0F6A70, $66063BCA, $11010B5C,
$8F659EFF, $F862AE69, $616BFFD3, $166CCF45,
$A00AE278, $D70DD2EE, $4E048354, $3903B3C2,
$A7672661, $D06016F7, $4969474D, $3E6E77DB,
$AED16A4A, $D9D65ADC, $40DF0B66, $37D83BF0,
$A9BCAE53, $DEBB9EC5, $47B2CF7F, $30B5FFE9,
$BDBDF21C, $CABAC28A, $53B39330, $24B4A3A6,
$BAD03605, $CDD70693, $54DE5729, $23D967BF,
$B3667A2E, $C4614AB8, $5D681B02, $2A6F2B94,
$B40BBE37, $C30C8EA1, $5A05DF1B, $2D02EF8D);

procedure TCRC32.AddFromBuffer(var ABuffer; ASize: Integer);
type
PByte = ^Byte;
var
Count :Cardinal;
begin
{If the CRC was finalized earlier, reverse it and allow updates}
if FFinalized then
begin
FCRC32 := not FCRC32;
FFinalized := False;
end;
if ASize > 0 then
for Count := 0 to ASize-1 do
FCRC32 := (FCRC32 shr 8) xor CRC32Hash[PByte(Cardinal(@ABuffer) + Count)^ xor (FCRC32 and $000000FF)];
end;

procedure TCRC32.AddFromFile(AFilename: String);
var
Handle :TFileStream;
begin
Handle := TFileStream.Create(AFilename,fmOpenRead or fmShareDenyNone);
try
AddFromStream(Handle);
finally
Handle.Free;
end;
end;

procedure TCRC32.AddFromStream(AStream: TStream);
var
Handle :Pointer;
begin
if AStream is TCustomMemoryStream then
AddFromBuffer(TCustomMemoryStream(AStream).Memory^ ,AStream.Size)
else
begin
GetMem(Handle,AStream.Size);
try
AStream.Seek(0,soFromBeginning);
AStream.Read(Handle^,AStream.Size);
AddFromBuffer(Handle^,AStream.Size);
finally
FreeMem(Handle);
end;
end;
end;

procedure TCRC32.AddFromString(AString: String);
begin
AddFromBuffer(AString[1],Length(AString));
end;

constructor TCRC32.Create;
begin
NewHash;
end;

procedure TCRC32.FinalizeHash;
begin
FCRC32 := not FCRC32;
FFinalized := True;
end;

function TCRC32.GetHashCardinal: Cardinal;
begin
if not FFinalized then
FinalizeHash;
Result := FCRC32;
end;

function TCRC32.GetHashString: String;
begin
Result := IntToHex(GetHashCardinal,8);
end;

procedure TCRC32.NewHash;
begin
FFinalized := False;
FCRC32 := $FFFFFFFF;
end;
[/delphi]
Reply With Quote
  #4  
Old 12-16-2008, 04:45 PM
Ouiji Ouiji is offline
Senior Member
 
Join Date: Nov 2001
Location: US of A
Posts: 492
Default RE: Read specified line of large text file (80+ MB)

I could not seem to get this code to work correctly, but I did stumble upon one of your older posts which worked like a charm after a little tweaking.

[LINK]http://www.delphipages.com/threads/thread.cfm?ID=193531&G=193515[/LINK]

[Link=http://www.ouiji.net/DelphiPages/] [/Link]
[q]"not quite smart enough to be dumb"[/q]
Reply With Quote
  #5  
Old 12-16-2008, 08:19 PM
chris_w chris_w is offline
Senior Member
 
Join Date: Jan 2004
Posts: 1,397
Default RE: Read specified line of large text file (80+ MB)

[pre]
Cleaner code, tested on 60MB file...

[/pre][DELPHI]

// Following functions handle ASCII files only, not unicode.
//
// Files containing stray line feed characters (x0A)
// or mismatched CRLFs (x0A0D0A) may cause the line
// indexing/count returned to be different than that
// reported by a file editing application which ignores
// these errors.


//Index is zero based index of line
function FSGetLine(const Filename: string; Index: cardinal; var s: string): boolean;

procedure TrimCntrlChars(var s: string);
var
a, b, len : integer;
begin
len := Length(s);
if len > 0 then begin
a := 1;
while (a <= len) and (s[a] in [#0..#8, #10..#27]) do
Inc(a);

if a <= len then begin
b := len;
while (b > a) and (s[b] in [#0..#8, #10..#27]) do
Dec(b);

s := Copy(s, a, b -a +1)
end else
s := '';
end;
end;

const
BUF_SZ = $10000;
var
FS : TFileStream;
buf : string;
i, red : integer;
a, b, dx, fpos : int64;
c : char;
begin
result := FileExists(Filename);
if result then begin
result := false;

FS := TFileStream.Create(Filename, fmOpenRead or fmShareDenyNone);
try
if FS.Size > 0 then begin
SetLength(buf, BUF_SZ);
fpos := 0;
dx := 0;
a := 0;

red := FS.Read(buf[1], BUF_SZ);
while (red > 0) and not result do begin

i := 0;
while (i < red) and not result do begin
if buf[i +1] = #10 then begin
b := fpos +i;
if dx = Index then begin
result := true;
if b -a -1 > 0 then begin
SetLength(s, b -a -1);
FS.Position := a;
FS.Read(s[1], b -a -1);

TrimCntrlChars(s);
end else
s := '';
end else begin
a := b;
Inc(dx);
end;
end;

Inc(i);
end;

Inc(fpos, red);
red := FS.Read(buf[1], BUF_SZ);
end;

if not result then begin
FS.Seek(-1, soFromEnd);
FS.Read(c, 1);
if (c <> #10) and (dx +1 = Index) then begin
//last line not terminated with linefeed char...
b := FS.Size -1;
result := true;
if b -a > 0 then begin
SetLength(s, b -a);
FS.Position := a;
FS.Read(s[1], b -a);

TrimCntrlChars(s);
end else
s := '';
end;
end;

end;
finally
FS.Free;
end;
end;
end;


function FSLineCount(const Filename: string): integer;
const
BUF_SZ = $10000;
var
FS : TFileStream;
buf : string;
i, red : integer;
c : char;
begin
result := 0;
if FileExists(Filename) then begin

FS := TFileStream.Create(Filename, fmOpenRead or fmShareDenyNone);
try
if FS.Size > 0 then begin

SetLength(buf, BUF_SZ);
red := FS.Read(buf[1], BUF_SZ);
while red > 0 do begin

for i := 1 to red do
if buf[i] = #10 then
Inc(result);

red := FS.Read(buf[1], BUF_SZ);
end;

FS.Seek(-1, soFromEnd);
FS.Read(c, 1);
if c <> #10 then
//last line not terminated with linefeed char...
Inc(result);

end;
finally
FS.Free;
end;
end;
end;

[/DELPHI][pre]
"There is a theory which states that if ever anybody discovers
exactly what the Universe is for and why it is here, it will
instantly disappear and be replaced by something even more
bizarre and inexplicable. There is another theory which states
that this has already happened."
-- Douglas Adams
[/pre]

Chris
Reply With Quote
Reply

Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is On

Forum Jump


All times are GMT. The time now is 09:52 AM.


Powered by vBulletin® Version 3.8.8
Copyright ©2000 - 2019, vBulletin Solutions, Inc.