Read (big) text file into a string


---
What is the correct way to read an entire (big) text file into a string in PCM?

I tried something like this:

File="C:\test.txt"
TestString=File.asFilename.contentsOfEntireFile
display(TestString)

The first two lines work, but if the text file exceeds a certain size, the display() function fails, always complaining about a "missing second quotation mark", followed by various other errors. The same happens with other string operations, e.g. len(TestString). Is this a PCM bug, or am I doing something wrong?

I know there is readListFile(), but in the end I need a string. And as soon as I convert the list to a string, the same errors occur.

---

It seems to work for me. That test used only 2,000 characters, though; how big is your file? EDIT: It also worked with 20,000 characters (Calypso 2023 7.6).

I tested with randomly generated text from ChatGPT 😁

---

It's an ordinary hand-written text file, so it may contain all sorts of characters from the ASCII range (nothing below asc(32) except line feeds).

The file in the example above was 29 KB. Not that I would really need to read a file THAT big. I just noticed that PCM starts to act up when a file gets too big and used a random one for tests. I haven't yet figured out the actual limit.

It's not a limitation of the display() function or the display window. I have displayed thousands of characters at once without a problem, as long as I created the string solely from within PCM (in a loop, for instance). This only happens when I read from a file with Smalltalk extensions. And as mentioned above, it also happens when I use other functions like len() on the resulting string.

---

Very strange. I use the same Calypso version, and it also happens with 2024. I think 20,000 characters is well above the size that triggered the error for me. Maybe try something without quote characters? I haven't checked my files for those.

---

I made the first and last characters quotation marks (") and added a few more quotes throughout; it still worked here.

 

---

This is insane! Today I created a 30 KB text file containing a repetition of all the ASCII characters from 32 to 127 without any linefeeds. And it works! 😮😮

Size doesn't seem to be the critical factor then.

So what the heck is the difference between this and some random hand-typed text file of the same size that makes PCM go crazy? 🤯

And what is even more insane: In the parameter list I can see that the string variable really contains everything from the file in question. So PCM has no problem displaying it there.

I just don't get it.

---

HA! Found it!

It's the double slash ('//') that makes PCM hiccup. File size doesn't matter at all. You don't even need a file. Every string containing double slashes causes the error.

I guess it interferes with PCM's comment recognition somehow.
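
That guess can be illustrated with a toy model. The Python sketch below is not PCM and says nothing certain about how PCM works internally; it only assumes a hypothetical interpreter that cuts every line at the first `//` before it checks the quotation marks. The helper names `strip_line_comment` and `has_balanced_quotes` and the sample statement are made up for illustration.

```python
def strip_line_comment(line: str) -> str:
    """Naively cut a line at the first '//' without looking at string literals."""
    index = line.find("//")
    return line if index == -1 else line[:index]


def has_balanced_quotes(line: str) -> bool:
    """Return True if the line contains an even number of double quotes."""
    return line.count('"') % 2 == 0


# Hypothetical statement as an interpreter might see it once the
# variable's content has been substituted into the expression.
statement = 'display("see http://example.com for details")'
stripped = strip_line_comment(statement)

print(stripped)
print(has_balanced_quotes(statement), has_balanced_quotes(stripped))
```

In this model the closing quotation mark is lost together with the supposed comment, which would match an error about a "missing second quotation mark". Whether PCM really behaves this way is an assumption.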

Sooo, how can I prevent that from happening? Removing those characters is not an option.

---

How do you work with that text? Could you read the file piece by piece, as a stream or line by line?

I would guess that the failure is not caused by reading into the variable as such, but only by the way the file is opened.

Edit:

Would it be possible with the basic PCM command "readListFile", converting the result into a string?

---

I already tried a list file and used .asString as well as some custom PCM code to extract the string from the list. But it's always the same: as soon as I access the string variable with a PCM function, it fails.

---

Cont=readListFile("C:\test.txt")
for A=1 to Cont.size
    if A==1 then
        for B=1 to getTechnologySegment(Cont,A).size
            Buchst=mid(getTechnologySegment(Cont,A),B,1)
            if Buchst=="/" then
                Buchst="°"
            endif
            if B==1 then
                Zeile=Buchst
            else
                Zeile=Zeile+Buchst
            endif
        next B
        Text=Zeile+cr()
    else
        for B=1 to getTechnologySegment(Cont,A).size
            Buchst=mid(getTechnologySegment(Cont,A),B,1)
            if Buchst=="/" then
                Buchst="°"
            endif
            if B==1 then
                Zeile=Buchst
            else
                Zeile=Zeile+Buchst
            endif
        next B
        Text=Text+Zeile+cr()
    endif
next A
message(Text)

 

The code reads the file character by character and replaces "/" with "°". I don't know how fast it runs on very large files, though. Unfortunately, blank lines are not detected.

---

Thanks for the code! The only problem is that I must not change the string. The "/" characters have to stay in. So I would need a different method to handle the unmodified string, or rather the string variable, bypassing the PCM interpreter, so to speak. I may experiment with executeCode().

---

What do you need the string for? There are ways to replace the "°" with "/" again afterwards, for example when the string or parts of it are to be written to a file.
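
The idea of masking the slashes and restoring them later can be sketched as a round trip. This is a language-neutral Python illustration, not PCM code; `mask_slashes` and `unmask_slashes` are hypothetical helper names. The sketch also makes one caveat explicit: the round trip is only lossless if the placeholder character does not already occur in the text.

```python
PLACEHOLDER = "°"  # stand-in character used in this thread


def mask_slashes(text: str, placeholder: str = PLACEHOLDER) -> str:
    """Replace every '/' with the placeholder, refusing ambiguous input."""
    if placeholder in text:
        raise ValueError("placeholder already occurs in the text")
    return text.replace("/", placeholder)


def unmask_slashes(text: str, placeholder: str = PLACEHOLDER) -> str:
    """Turn every placeholder back into '/'."""
    return text.replace(placeholder, "/")


original = "path: C:/temp // note\n\nsecond line"
masked = mask_slashes(original)
restored = unmask_slashes(masked)

print(masked)
print(restored == original)
```

Blank lines and all other characters pass through untouched, so only the placeholder check decides whether the original text can be recovered exactly.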

I have revised the code again; it now handles Ä, ä, Ö, ö, Ü, ü and blank lines correctly:

vC=0
Cont=readListFile("C:\test.txt")
for A=1 to Cont.size
    if A==1 then
        if getTechnologySegment(Cont,A).size==0 then
            Zeile=cr()
        else
            for B=1 to getTechnologySegment(Cont,A).size
                Buchst=mid(getTechnologySegment(Cont,A),B,1)
                if B<getTechnologySegment(Cont,A).size then
                    if text(mid(getTechnologySegment(Cont,A),B,1)+mid(getTechnologySegment(Cont,A),B+1,1))=="Ä" then
                        Buchst="Ä"
                        B=B+1
                        vC=1
                    endif
                    if text(mid(getTechnologySegment(Cont,A),B,1)+mid(getTechnologySegment(Cont,A),B+1,1))=="ä" then
                        Buchst="ä"
                        B=B+1
                        vC=1
                    endif
                    if text(mid(getTechnologySegment(Cont,A),B,1)+mid(getTechnologySegment(Cont,A),B+1,1))=="Ö" then
                        Buchst="Ö"
                        B=B+1
                        vC=1
                    endif
                    if text(mid(getTechnologySegment(Cont,A),B,1)+mid(getTechnologySegment(Cont,A),B+1,1))=="ö" then
                        Buchst="ö"
                        B=B+1
                        vC=1
                    endif
                    if text(mid(getTechnologySegment(Cont,A),B,1)+mid(getTechnologySegment(Cont,A),B+1,1))=="Ü" then
                        Buchst="Ü"
                        B=B+1
                        vC=1
                    endif
                    if text(mid(getTechnologySegment(Cont,A),B,1)+mid(getTechnologySegment(Cont,A),B+1,1))=="ü" then
                        Buchst="ü"
                        B=B+1
                        vC=1
                    endif
                endif
                if Buchst=="/" then
                    Buchst="°"
                endif
                if B==1 then
                    Zeile=Buchst
                else
                    if (B==2) and (vC==1) then
                        Zeile=Buchst
                    else
                        Zeile=Zeile+Buchst
                    endif
                endif
            next B
        endif
        if getTechnologySegment(Cont,A).size==0 then
            Text=Zeile
        else
            Text=Zeile+cr()
        endif
    else
        if getTechnologySegment(Cont,A).size==0 then
            Zeile=cr()
        else
            for B=1 to getTechnologySegment(Cont,A).size
                Buchst=mid(getTechnologySegment(Cont,A),B,1)
                if B<getTechnologySegment(Cont,A).size then
                    if text(mid(getTechnologySegment(Cont,A),B,1)+mid(getTechnologySegment(Cont,A),B+1,1))=="Ä" then
                        Buchst="Ä"
                        B=B+1
                    endif
                    if text(mid(getTechnologySegment(Cont,A),B,1)+mid(getTechnologySegment(Cont,A),B+1,1))=="ä" then
                        Buchst="ä"
                        B=B+1
                    endif
                    if text(mid(getTechnologySegment(Cont,A),B,1)+mid(getTechnologySegment(Cont,A),B+1,1))=="Ö" then
                        Buchst="Ö"
                        B=B+1
                    endif
                    if text(mid(getTechnologySegment(Cont,A),B,1)+mid(getTechnologySegment(Cont,A),B+1,1))=="ö" then
                        Buchst="ö"
                        B=B+1
                    endif
                    if text(mid(getTechnologySegment(Cont,A),B,1)+mid(getTechnologySegment(Cont,A),B+1,1))=="Ü" then
                        Buchst="Ü"
                        B=B+1
                    endif
                    if text(mid(getTechnologySegment(Cont,A),B,1)+mid(getTechnologySegment(Cont,A),B+1,1))=="ü" then
                        Buchst="ü"
                        B=B+1
                    endif
                endif
                if Buchst=="/" then
                    Buchst="°"
                endif
                if B==1 then
                    Zeile=Buchst
                else
                    if (B==2) and (vC==1) then
                        Zeile=Buchst
                    else
                        Zeile=Zeile+Buchst
                    endif
                endif
            next B
        endif
        if A==(Cont.size) or (getTechnologySegment(Cont,A).size==0) then
            Text=Text+Zeile
        else
            Text=Text+Zeile+cr()
        endif
    endif
next A

display(Text)
 

---

Among other things, the string is processed further with executeCode, and at that point I no longer have any way to replace anything beforehand (at least none that I know of). As I said above, I will try to implement it that way from the start and essentially bypass PCM. I just need to dig a little deeper into how to formulate that.

---

I think my problem is solved. Unfortunately, it only works if you bypass PCM's string processing completely here. Many thanks for your input!

@Zeiss: It would be good if this bug could be fixed at some point.
