This is part of the HicEst documentation

EDIT: Lots of String Operations in one Single Statement


For strings as long as Shakespeare's Complete Works: change case, search, replace, count, parse, extract, sort, and build lexicons and inverted indexes.



Optional keywords:


(Syntax of optional keywords)
APpendTo Begin CoPyTo ContInueiF Count DO Delete ELSE ENDIF ERror EXTRA EXit End GetPosition ID1 ID2 IDentification IF ITeM Insert InvertedIndex LeXicon Left LowerCase Mark1 Mark2 Marks MatriX OFfSet Option Parse RaNge RangEend RangeBegin RePLaceby Right ScaNnot ScaNnotLeft ScanFor ScanForLeft SePaRators SetPosition SorTColumn1 SorTSequence SorTedtext SorTtoIndex SortDelDbls SortFromIndex TabiFyfont Text UpperCase Word WordEnd
keyword type mini example

Text= or Matrix= must always be the first keyword of an EDIT call:

Text txt tx=sample The string to be worked on by all following options
MatriX txt mx=filename The name of an open MatrixExplorer file. Possible options are Delete=row_nr, APpendTo=row_nr, DO
keyword type mini example

Keywords to change or remember the current location Loc:

Right txt r=$CR move Loc right to next $CR
  • n = EDIT(T=sample, Option=2, Right='jump', Ins='X')
  • ! n=0: the word "jump" was not found and 'X' therefore not inserted
[1] r=72 move Loc right 72 bytes
Left txt l=$TAB move Loc left to next $TAB
[1] l move Loc left 1 character
Begin --- b set Loc to 1st nonblank in string
End --- e set Loc to byte after last nonblank in string
  • EDIT(T=sample, End, Left, Parse=LastNonBlank)
  • ! LastNonBlank is now "g" (no M1, M2 set)
Mark1 --- M1 remember start position for parse / delete / replace / copy / lowercase / uppercase string
  • both M1 and M2 are reset to 0 after a Parse, Delete, Insert, CopyTo, APpendTo, or RePLaceby
Mark2 --- M2 remember end position M2 for string operations
Word num word=n
  • set Loc and Mark1 to 1st byte of n-th word
  • set Mark2 to last byte of n-th word
  • right (n>0) or left (n<=0) of current Loc.
  • Word boundaries can be adjusted by SePaRators
  • SePaRators="xy" (2 bytes): Only expressions right of "x" and left of "y" will be found, e.g.
    • txt = "<title>XML kind of work</title>"
    • word1 = EDIT(Text=txt, SPR="><", Word=1) ! = "XML kind of work"
    • word2 = EDIT(Text=txt, SPR="<>", Word=2) ! = "/title"
    • word3 = EDIT(Text=txt, Word=3) ! = kind
    • word4 = EDIT(Text=txt, W=123) ! = 0 = not found
  • 2 identical SePaRators=\""\
    (Note: "\" is used here as a string delimiter)
    • txt = \"CSV = ","Comma Separated Variables"\
    • EDIT(T=txt, SPR=\""\, Word=1, Parse=w1, Word=2, Parse=w2) ! w1 is CSV, w2 is Comma Separated Variables
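The two-byte SePaRators form can be pictured as a search for text lying between a left and a right delimiter. The following Python sketch only emulates the results documented above; it is not HicEst code, the helper name `words_between` is hypothetical, and it assumes that delimited words do not nest.

```python
import re

def words_between(text, separators):
    """Return all substrings found right of separators[0] and left of separators[1]."""
    left, right = separators
    # Non-greedy match so each word ends at the nearest right delimiter.
    pattern = re.escape(left) + "(.*?)" + re.escape(right)
    return re.findall(pattern, text)

txt = "<title>XML kind of work</title>"
print(words_between(txt, "><"))  # ['XML kind of work']
print(words_between(txt, "<>"))  # ['title', '/title']
```

With "<>" the second list entry is "/title", which corresponds to Word=2 in the example above.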
WordEnd txt we implicitly set Mark1 and Mark2 to the 1st and last byte of the current word. If txt is given, it is used as the separator for the APpendTo command.
GetPosition NUM gp=j
vec gp=pVec pVec(DOindex) is set to byte location
  • DIMENSION vec(10)
  • EDIT(T=sample, Right=' ', GetPosition=vec, Right, do)
  • ! vec is set to (4, 10, 16, 20, 26, 31, 35, 40, 44, 0)
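To make the vector result easier to check, here is a small Python emulation (not HicEst code). It assumes that sample is the 44-byte pangram with one trailing blank, which is inferred from the position 44 in the documented vector; the helper name `blank_positions` is hypothetical.

```python
def blank_positions(text, size):
    """1-based byte positions of blanks, zero-padded to `size` entries."""
    found = [i + 1 for i, ch in enumerate(text) if ch == " "]
    found = found[:size]
    # Unused vector elements stay 0, as in the documented result.
    return found + [0] * (size - len(found))

# Assumption: one trailing blank, so the string is 44 bytes long.
sample = "The quick brown fox jumps over the lazy dog "
print(blank_positions(sample, 10))
# [4, 10, 16, 20, 26, 31, 35, 40, 44, 0]
```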
SetPosition num sp=j set Loc to byte j, with j>0 and j<=LEN(string)
  • EDIT(T=sample, SetPosition=13, Insert='XXX')
  • ! The quick brXXXown fox jumps over the lazy dog
vec sp=pVec move Loc to pVec(DOindex)
ScanFor txt sf=".:!" scan right for next "." or ":" or "!"
  • n = EDIT(T=sample, ScanFor='jump', Ins='*', Right=2, DO=10)
  • ! "The q*uick brown fox *j*u*m*ps over the lazy" (truncated)
ScaNnot txt sn='123' scan for next byte that is neither 1 nor 2 nor 3
ScanForLeft txt sfl='xy' scan backward for next x or y
ScaNnotLeft txt snl=' ' scan backward for next non-blank
ITeM num itm=5 set Loc to item start (count SePaRators hits)
  • EDIT(T=sample, SePaRators='abc', ITeM=2, Insert='<', ITeM=3, Insert='>')
  • ! The quic<k brown fox jumps over the la>zy dog
RaNge txt RN=$CRLF repeat all commands in each of the ranges 1...$CRLF...$CRLF...; each new range also restarts DO
  • EDIT(T=sample, RaNge=' ', Del=1)
  • ! he uick rown ox umps ver he azy og
num RN=4 character ranges 1..4, 5..8, 9..12, ...
  • EDIT(T=sample, RaNge=4, RangEend, Insert='.')
  • ! The .quic.k br.own .fox .jump.s ov.er t.he l.azy .dog
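The numeric RaNge form can be read as "cut the string into fixed-width ranges and run the commands once per range". The Python sketch below emulates only the RaNge=4, RangEend, Insert='.' result shown above; it is not HicEst code. It assumes a sample without trailing blank, so the final short range "dog" receives no mark, and the helper name `insert_between_ranges` is hypothetical.

```python
def insert_between_ranges(text, width, mark):
    """Split text into width-byte ranges and put mark at each range boundary."""
    chunks = [text[i:i + width] for i in range(0, len(text), width)]
    return mark.join(chunks)

# Assumption: 43-byte pangram without trailing blank.
sample = "The quick brown fox jumps over the lazy dog"
print(insert_between_ranges(sample, 4, "."))
# The .quic.k br.own .fox .jump.s ov.er t.he l.azy .dog
```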
RangeBegin --- rb set Loc to begin of range
txt rb='SUBR' move script to next 'SUBR' (Count=lineNR and/or Option=... may follow)
RangEend --- re set Loc to end of range (end of string if no range is set)

Keywords to change the original string:

Delete --- d delete string(m1:m2) if m1 and m2 are set
  • EDIT(T=sample, Ri='x', M1, Ri='y', M2, Del)
  • ! sample is now 'The quick brown fo dog '
txt d=' ' move Loc to next ' ' and delete this blank
num d=3 delete 3 characters starting at current location Loc
Insert txt i=$CRLF $CRLF at Loc.
Note: Characters exceeding the length of string are dropped
RePLaceby txt rpl='x' : use after any of Right, Left, ScanFor, ScanForLeft, ScaNnot, ScaNnotLeft, or after M1 and M2
  • EDIT(T=sample, ScanFor='aeiou', RePLaceby='*', DO=5)
  • ! sample="Th* q**ck br*wn f*x jumps over the lazy dog "
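The combination of ScanFor, RePLaceby and DO=5 behaves like "replace the first five bytes that belong to a character set". A Python emulation of that result (not HicEst code; the helper name `replace_first` is hypothetical, and the trailing blank in sample is taken from the documented output):

```python
import re

def replace_first(text, charset, replacement, limit):
    """Replace up to `limit` bytes that belong to `charset`, scanning left to right."""
    pattern = "[" + re.escape(charset) + "]"
    # A function replacement avoids any backslash interpretation of `replacement`.
    return re.sub(pattern, lambda match: replacement, text, count=limit)

sample = "The quick brown fox jumps over the lazy dog "
print(replace_first(sample, "aeiou", "*", 5))
# Th* q**ck br*wn f*x jumps over the lazy dog 
```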
UpperCase --- UC : capitalize characters in the range M1 and M2:
  • EDIT(Text=sample, ITeM=3, M1, WordEnd, M2, UpperCase)
  • ! the quick BROWN fox jumps over the lazy dog
If nothing is marked, only the character at the current location is uppercased
  • EDIT(Text=sample, UpperCase)
  • ! The quick brown fox jumps over the lazy dog
[1] uc=3 capitalize 3 characters
  • EDIT(T=sample, RaNge=' ', UpperCase=2)
  • ! THe QUick BRown FOx JUmps OVer THe LAzy DOg
LowerCase works like UpperCase, but converts the characters to lowercase
TabiFyfont txt TF=3 :
  • replaces all double blanks in string by 1 or more tabs to allow for aligned table columns.
  • FontNr=3 is the current dialog font (see Options menu > Fonts).
  • Use RaNge=line_sep (normally $CRLF)
  • EDIT(T=multi_line_string, RaNge=$CRLF, TabiFyfont=3)

Keywords to extract information:

Parse txt p=subtxt to individual substrings:
M1>0: subtxt + string(m1:Loc) share memory
  • EDIT(T=sample, Right='e', Mark1, Right='i', Parse=xyz)
  • ! xyz == 'e qui' shares memory with sample
M1==0: string(>= Loc), free of SePaRators
  • EDIT(T=sample, R='r', Parse=word1, P=word2, p=w3)
  • ! word1=rown, word2=fox, w3=jumps
moves Loc to next SePaRator
  • string = %"a, b" some text "c--d"%
  • word = EDIT(T=string, SePaRator=%"%, Word=2)
sets word to "c--d"
CoPyTo TXT cpt=word set word to string(m1:m2) if m1 and m2 are set
APpendTo TXT apt=str to the variable str. Default separator is a blank, it can be changed with WordEnd=new_separator
Count --- c : If the word-option is set (Option=2) the total number of words in string is returned:
  • words = EDIT(T=sample, Opt=2, Count) ! sets words to 9
Without the word-option: Set count start position at current location (default = 1)
num c=byte1 set count start position to byte1
txt c=txt n = number of occurrences of txt in string(start:Loc). Note: set Loc to the end position first, e.g. with End
  • n = EDIT(T=sample, End, Count=' ') ! n = 9 blanks
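Both Count results can be cross-checked with a few lines of Python. This is an emulation, not HicEst code; it assumes the 44-byte sample with one trailing blank and whitespace-separated words, and the helper names are hypothetical.

```python
def count_words(text):
    """Number of blank-separated words (compare Option=2 with Count)."""
    return len(text.split())

def count_occurrences(text, needle):
    """Number of non-overlapping occurrences of needle (compare Count=' ')."""
    return text.count(needle)

# Assumption: one trailing blank, as in the other examples.
sample = "The quick brown fox jumps over the lazy dog "
print(count_words(sample))             # 9
print(count_occurrences(sample, " "))  # 9
```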

Keywords to control command execution

Option num o=1 : case=1, word=2, scan=8, trailing blanks=16, best match=32, verify=64, RegEx=128, Count=256, Alphabetical order=512.
SePaRators txt spr='.!' default is ' .,(+-*/^;="\<>!:)', bar, $CR, $LF, RangeBegin, RangeEnd
num spr=d separate string in d bytes for sorting etc
DO num do [=n]
  • loop over all EDIT arguments maximum n times
  • DO without argument loops until an error occurs
  • with or without argument, the loop is terminated when an error occurs
  • Tip: Error provocation can be useful, e.g. writing "Right=2,Left" instead of a simple "Right" will stop the loop at the last nonblank character
  • Tip: To display the progress of long running loops, activate the ⇒ status bar CPU readout
  • RaNge can sometimes be preferable to DO. RaNge may also be combined with DO.
  • EDIT(T=sample, Right=' ', Ins='X', DO=10)
  • ! Loc unchanged: TheXXXXXXXXXX quick brown fox jumps over the
IF --- IF, R=8 execute next commands up to ELSE or ENDIF only if R=8 succeeds
  • EDIT(T=sample, IF, Right='XY', End, Insert=' XY OK', ELSE, End, Insert=' NO XY')
  • ! The quick brown fox jumps over the lazy dog NO XY
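The IF ... ELSE example above amounts to "append one tag if the search succeeds, another if it fails". A Python emulation of that control flow (not HicEst code): the helper name `tag_if_found` is hypothetical, the search is assumed to be case sensitive, and the fixed-length truncation of HicEst strings is ignored.

```python
def tag_if_found(text, needle):
    """Append ' <needle> OK' after the last nonblank if needle occurs, else ' NO <needle>'."""
    body = text.rstrip()  # End = byte after the last nonblank
    if needle in text:
        return body + " " + needle + " OK"
    return body + " NO " + needle

sample = "The quick brown fox jumps over the lazy dog "
print(tag_if_found(sample, "XY"))
# The quick brown fox jumps over the lazy dog NO XY
```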
txt IF='Q' continue (at most up to ELSE) only if string(Loc) == 'Q'
ELSE [T] ELSE next commands only if error after IF
ContInueiF txt cif='az' next keyword only IF a <= string(loc) <= z
ENDIF --- endif closes IF..ELSE... clause if needed
EXit [T] ex exit range scan
ERror LBL er=99 (on error jump to label 99)

lexical commands: see LeXicon for details
LeXicon txt LX=lex Set/Update lexicon: Query lexicon:
  • result = EDIT(Text=search_string, [Option=opts,] [$Marks=leftrite,] LeXicon=lex)
result can be
  • text: to receive search_string words in lex
  • numeric: bit-sum of search_string-word-numbers in lex
  • vector: byte positions of search_string words in lex
$Marks txt $M="AZ" marks result-string finds left="A", rite="Z"

inverted index commands: see InvertedIndex for details
IDentification txt ID=name include a name (e.g. file name) in InvIdx (default is "noID")
ID1 --- id1 marks start position of an ID-string in the "ORIGINAL"
ID2 --- id2 marks its end position
OFfSet num ofs=p1 added to InvIdx positions. Useful if ORIGINAL is just a separate clipping of the complete document generated by another statement.
SorTSequence num sts=32 (32 means: first column3, then column2)
EXTRA txt extra="."//$LF includes non-alphanumeric 1-character words in InvIdx, e.g. "." or linefeed, for use in later query results (e.g. to return results with the enclosing linefeeds)
InvertedIndex txt ii=invidx receives the inverted index. If the name ID is already indexed, inv_idx updates the existing index.
Set/Update InvertedIndex:
  • EDIT(Text=txt, [ID=...,] [EXTRA=...,] [ID1=...,] [ID2=...,] [Mark1,] [Mark2,] [OFfSet=...,] [Option=...,] InvertedIndex=invidx [, DO])
see InvertedIndexQuery:
  • result = EDIT(Text=txt, [ID=...,] [EXTRA=...,] [Option=...,] [SorTSequence=...,] InvertedIndex=invidx)

sort string commands

SorTtoIndex num sti=vec indices of sorted words go to vec
  • EDIT(T=sample, Option=1, SorTtoIndex=vec)
  • ! vec -> (3,9,4,5,8,6,2,7,1,0) with Option=1 (keep case)
SorTedtext txt st=lex
  • lex gets sorted words (words are defined by current setting of SePaRators) or n-byte chunks
  • with SePaRators=$LF: sort lines (the EDIT-string can be a complete file).
  • with SePaRators=n: string is sorted in n-byte chunks
    • EDIT(Text=sample, SePaRators=1, SorTedtext=sortedSample)
    • ! " Tabcdeeefghhijklmnoooopqrrstuuvwxyz"
  • Without Option=1+...(keep case) the EDIT-string will be lowercased
    • EDIT(T=sample, SorTedtext=sample)
    • ! brown dog fox jumps lazy over quick the the
  • With Option=...+ 4+...(backward) Sort will be descending
    • EDIT(Text=sample, Option=4+1, SorTedtext=sample)
    • ! the quick over lazy jumps fox dog brown The
SortDelDbls txt sdd=lex lex is sorted and double entries are removed
  • EDIT(T=sample, SortDelDbls=sample)
  • ! brown dog fox jumps lazy over quick the
SortFromIndex num sfi=vec sort words in string along vec (no lower case conversion)
  • vec = (2, 1, 4)
  • EDIT(T=sample, SortFromIndex=vec, SorTedtext=sample)
  • ! sample is now "quick The fox"
SorTColumn1 num stc1=2 sort starts at character position 2 of each word (disregards the 1st character)
  • EDIT(T=sample, SorTColumn1=2, SorTedtext=sample)
  • ! lazy the the dog fox brown quick jumps over
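The word-sort variants above differ only in case handling, direction, duplicate removal and the starting column of the sort key. The Python sketch below reproduces the four documented result strings; it is an emulation, not HicEst code. The helper `sorted_words` and its parameter names are hypothetical, and words are assumed to be blank-separated.

```python
def sorted_words(text, keep_case=False, descending=False, unique=False, column=1):
    """Sort blank-separated words, emulating the SorTedtext variants."""
    words = text.split()
    if not keep_case:
        words = [w.lower() for w in words]   # default: string is lowercased
    if unique:
        words = list(dict.fromkeys(words))   # drop double entries (SortDelDbls)
    # column=2 ignores the 1st character of each word (SorTColumn1=2)
    words.sort(key=lambda w: w[column - 1:], reverse=descending)
    return " ".join(words)

sample = "The quick brown fox jumps over the lazy dog "
print(sorted_words(sample))
# brown dog fox jumps lazy over quick the the
print(sorted_words(sample, keep_case=True, descending=True))
# the quick over lazy jumps fox dog brown The
print(sorted_words(sample, unique=True))
# brown dog fox jumps lazy over quick the
print(sorted_words(sample, column=2))
# lazy the the dog fox brown quick jumps over
```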



©2000-2016 Georg Petrich, HicEst Instant Prototype Computing. All rights reserved.