This is part of the HicEst documentation

EDIT: Lots of String Operations in one Single Statement

For strings as long as Shakespeare's complete works, EDIT does case changes, searches, replaces, counts, parses, extractions, sorts, lexicons, and inverted indexes.

EDIT executes its keywords in the given sequence
EDIT allows multiple occurrences of its keywords
The current edit location Loc is initially set to Loc=1
RESULT = EDIT(options...)
- see Count or Lexicon or Invertedindex for the setting of RESULT
- RESULT is assigned the numeric or the string value if it is prefixed to EDIT(...)
- number of changes in string if string was changed
- the word at the last position of the location pointer Loc if no changes occurred
When a condition cannot be matched, EDIT is terminated without an error message. Check RESULT, or use the GetPosition option to find the error position.
cLoc == string(Loc) is the character at location Loc
$CR, $CRLF and $TAB are global symbols ($SystemVariables) in HicEst
- CHARACTER sample='The quick brown fox jumps over the lazy dog ' ! used in the examples

keyword	type	mini example	Text= or MatriX= must always be the first keyword of an EDIT call:
Text	txt	tx=sample	The string to be worked on by all following options
MatriX	txt	mx=filename	The name of an open MatrixExplorer file. Possible options are Delete=row_nr, APpendTo=row_nr, DO

keyword	type	mini example	Keywords to change or remember the current location Loc:
Right	txt	r=$CR	move Loc right to the next $CR
			n = EDIT(T=sample, Option=2, Right='jump', Ins='X') ! n=0: the word "jump" was not found, so 'X' was not inserted
	[1]	r=72	move Loc right 72 bytes
Left	txt	l=$TAB	move Loc left to next $TAB
	[1]	l	move Loc left 1 character
Begin	---	b	set Loc to 1st nonblank in string
End	---	e	set Loc to the byte after the last nonblank in string
			EDIT(T=sample, End, Left, Parse=LastNonBlank) ! LastNonBlank is now "g" (no M1, M2 set)
Mark1	---	M1	remember start position for parse / delete / replace / copy / lowercase / uppercase string. Both M1 and M2 are reset to 0 after a Parse, Delete, Insert, CoPyTo, APpendTo, or RePLaceby
Mark2	---	M2	remember end position M2 for string operations
Word	num	word=n	set Loc and Mark1 to the 1st byte of the n-th word and Mark2 to its last byte, counting right (n>0) or left (n<=0) of the current Loc. Word boundaries can be adjusted with SePaRators
			SePaRators="xy" (2 bytes): only expressions right of "x" and left of "y" will be found, e.g.
			txt = "<title>XML kind of work</title>"
			word1 = EDIT(Text=txt, SPR="><", Word=1) ! = "XML kind of work"
			word2 = EDIT(Text=txt, SPR="<>", Word=2) ! = "/title"
			word3 = EDIT(Text=txt, Word=3) ! = kind
			word4 = EDIT(Text=txt, W=123) ! = 0 = not found
			2 identical SePaRators=\""\ (note: "\" is used here as a string delimiter):
			txt = \"CSV = ","Comma Separated Variables"\
			EDIT(T=txt, SPR=\""\, Word=1, Parse=w1, Word=2, Parse=w2) ! w1 is CSV, w2 is Comma Separated Variables
WordEnd	txt	we	implicitly set Mark1 and Mark2 to the 1st and last byte of the current word. If txt is set it will be a separator for the APpendTo command.
GetPosition	NUM	gp=j	return the current byte location in j; j=-1 on error (use for string_edit_error_control). Not possible after the IF keyword
	vec	gp=pVec	pVec(DOindex) is set to the byte location
			DIMENSION vec(10)
			EDIT(T=sample, Right=' ', GetPosition=vec, Right, do) ! vec is set to (4, 10, 16, 20, 26, 31, 35, 40, 44, 0)
SetPosition	num	sp=j	set byte location Loc to j (Loc>0 and Loc<=LEN(string))
			EDIT(T=sample, SetPosition=13, Insert='XXX') ! The quick brXXXown fox jumps over the lazy dog
	vec	sp=pVec	move Loc to pVec(DOindex)
ScanFor	txt	sf=".:!"	scan right for the next "." or ":" or "!"
			n = EDIT(T=sample, ScanFor='jump', Ins='', Right=2, DO=10) ! "The quick brown fox jumps over the lazy" (truncated)
ScaNnot	txt	sn='123'	scan for the next byte that is neither 1 nor 2 nor 3
ScanForLeft	txt	sfl='xy'	scan backward for next x or y
ScaNnotLeft	txt	snl=' '	scan backward for next non-blank
ITeM	num	itm=5	set Loc to item start (counts SePaRators hits)
			EDIT(T=sample, SePaRators='abc', ITeM=2, Insert='<', ITeM=3, Insert='>') ! The quic<k brown fox jumps over the la>zy dog
RaNge	txt	RN=$CRLF	repeat all in ranges 1...$CRLF...$CRLF.., each new range also restarts DO
			EDIT(T=sample, RaNge=' ', Del=1) ! he uick rown ox umps ver he azy og
	num	RN=4	character ranges 1..4 5..8 9..12 ..
			EDIT(T=sample, RaNge=4, RangEend, Insert='.') ! The .quic.k br.own .fox .jump.s ov.er t.he l.azy .dog
RangeBegin	---	rb	set Loc to begin of range
	txt	rb='SUBR'	MOVE SCRIPT to next 'SUBR' (Count=lineNR and/or Option=... may follow)
RangEend	---	re	set Loc to end of range (string end if no range)
			Keywords to change the original string:
Delete	---	d	delete_substring (m1:m2) if m1, m2 are set
			EDIT(T=sample, Ri='x', M1, Ri='y', M2, Del) ! sample is now 'The quick brown fo dog '
	txt	d=' '	move Loc to next ' ' and delete this blank
	num	d=3	delete 3 characters starting at current location Loc
Insert	txt	i=$CRLF	insert_string $CRLF at Loc. Note: Characters exceeding the length of string are dropped
RePLaceby	txt	rpl='x'	replace_string: after any of Right, Left, ScanFor, ScanForLeft, ScaNnot, ScaNnotLeft, or M1 and M2
			EDIT(T=sample, ScanFor='aeiou', RePLaceby='*', DO=5) ! sample="Th* q**ck br*wn f*x jumps over the lazy dog "
UpperCase	---	UC	uppercase_string: capitalize the characters in the range M1 to M2
			EDIT(Text=sample, ITeM=3, M1, WordEnd, M2, UpperCase) ! the quick BROWN fox jumps over the lazy dog
			If nothing is marked, only the character at the current location is uppercased:
			EDIT(Text=sample, UpperCase) ! The quick brown fox jumps over the lazy dog
	[1]	uc=3	capitalize 3 characters
			EDIT(T=sample, RaNge=' ', UpperCase=2) ! THe QUick BRown FOx JUmps OVer THe LAzy DOg
LowerCase			lowercase_string corresponding to uppercase string
TabiFyfont	txt	TF=3	tabify_string: replaces all double blanks in string by 1 or more tabs to allow for aligned table columns. FontNr=3 is the current dialog font (see Options menu > Fonts). Use RaNge=line_sep (normally $CRLF)
			EDIT(T=multi_line_string, RaNge=$CRLF, TabiFyfont=3)
			Keywords to extract information:
Parse	txt	p=subtxt	parse_string to individual substrings
			M1>0: subtxt and string(m1:Loc) share memory
			EDIT(T=sample, Right='e', Mark1, Right='i', Parse=xyz) ! xyz == 'e qui' shares memory with sample
			M1==0: string(>= Loc), free of SePaRators
			EDIT(T=sample, R='r', Parse=word1, P=word2, p=w3) ! word1=rown, word2=fox, w3=jumps
			moves Loc to next SePaRator
			string = %"a, b" some text "c--d"%
			word = EDIT(T=string, SePaRator=%"%, Word=2) ! sets word to "c--d"
CoPyTo	TXT	cpt=word	copy_string: set word to string(m1:m2) if m1 and m2 are set
APpendTo	TXT	apt=str	append_separator_and_string to the variable str. The default separator is a blank; it can be changed with WordEnd=new_separator
Count	---	c	count_occurences: if the word option is set (Option=2), the total number of words in string is returned
			words = EDIT(T=sample, Opt=2, Count) ! sets words to 9
			Without the word option: set the count start position at the current location (default = 1)
	num	c=byte1	set count start position to byte1
	txt	c=txt	n = number of txt's in string(start:Loc). NOTE: set Loc to the end position first, e.g. with End
			n = EDIT(T=sample, End, Count=' ') ! n = 9 blanks
			Keywords to control command execution
Option	num	o=1	search_options: case=1, word=2, scan=8, trailing blanks=16, best match=32, verify=64, RegEx=128, Count=256, Alphabetical order=512
SePaRators	txt	spr='.!'	default is ' .,(+-*/^;="\<>!:)', bar, $CR, $LF, RangeBegin, RangeEnd
	num	spr=d	separate string into d-byte chunks for sorting etc.
DO	num	do [=n]	loop over all EDIT arguments at most n times. DO without an argument loops until an error occurs. The loop is always terminated when an error occurs
			Tip: Error provocation can be useful, e.g. writing "Right=2,Left" instead of a simple "Right" will stop the loop at the last nonblank character
			Tip: To display the progress of long-running loops, activate the status bar CPU readout
			RaNge can sometimes be preferable to DO. RaNge may also be combined with DO.
			EDIT(T=sample, Right=' ', Ins='X', DO=10) ! Loc unchanged: TheXXXXXXXXXX quick brown fox jumps over the
IF	---	IF, R=8	execute the next commands up to ELSE or ENDIF only if R=8 succeeds
			EDIT(T=sample, IF, Right='XY', End, Insert=' XY OK', ELSE, End, Insert=' NO XY') ! The quick brown fox jumps over the lazy dog NO XY
	txt	IF='Q'	continue (at most up to ELSE) only if string(Loc) == Q
ELSE	[T]	ELSE	next commands only if error after IF
ContInueiF	txt	cif='az'	next keyword only IF a <= string(loc) <= z
ENDIF	---	endif	closes IF..ELSE... clause if needed
EXit	[T]	ex	exit range scan
ERror	LBL	er=99	(on error jump to label 99)
			lexical commands: see LeXicon for details
LeXicon	txt	LX=lex	Set/Update lexicon:
			EDIT(Text=original, [Option=opts,] [SePaRators=spr,] LeXicon=lex)
			Query lexicon:
			result = EDIT(Text=search_string, [Option=opts,] [$Marks=leftrite,] LeXicon=lex)
			result can be
			text: to receive search_string words in lex
			numeric: bit-sum of search_string-word-numbers in lex
			vector: byte positions of search_string words in lex
$Marks	txt	$M="AZ"	marks result-string finds left="A", rite="Z"
			inverted index commands : see Invertedindex for details
IDentification	txt	ID=name	include a name (e.g. file name) in InvIdx (default is "noID")
ID1	---	id1	marks start position of an ID-string in the "ORIGINAL"
ID2	---	id2	marks its end position
OFfSet	num	ofs=p1	added to InvIdx positions. Useful if ORIGINAL is just a separate clipping of the complete document generated by another statement.
SorTSequence	num	sts=32	sorted_query_result (32 means: first column 3, then column 2)
EXTRA	txt	extra="."//$LF	includes non-alphanumeric 1-character words in InvIdx, e.g. "." or linefeed, for use in later query results (e.g. to return with embracing linefeeds)
InvertedIndex	txt	ii=invidx	receives the inverted index. If the name ID is already indexed, inv_idx updates the existing index.
			Set/Update InvertedIndex:
			EDIT(Text=txt, [ID=...,] [EXTRA=...,] [ID1=...,] [ID2=...,] [Mark1,] [Mark2,] [OFfSet=...,] [Option=...,] InvertedIndex=invidx [, DO])
			Query (see InvertedIndexQuery):
			result = EDIT(Text=txt, [ID=...,] [EXTRA=...,] [Option=...,] [SorTSequence=...,] InvertedIndex=invidx)
			sort string commands
SorTtoIndex	num	sti=vec	indices of the sorted words go to vec
			EDIT(T=sample, Option=1, SorTtoIndex=vec) ! vec -> (3,9,4,5,8,6,2,7,1,0) with Option=1 (keep case)
SorTedtext	txt	st=lex	lex gets the sorted words (words are defined by the current setting of SePaRators) or n-byte chunks
			with SePaRators=$LF: sort lines (the EDIT-string can be a complete file)
			with SePaRators=n: string is sorted in n-byte chunks
			EDIT(Text=sample, SePaRators=1, SorTedtext=sortedSample) ! " Tabcdeeefghhijklmnoooopqrrstuuvwxyz"
			Without Option=1+... (keep case) the EDIT-string will be lowercased
			EDIT(T=sample, SorTedtext=sample) ! brown dog fox jumps lazy over quick the the
			With Option=...+4+... (backward) the sort will be descending
			EDIT(Text=sample, Option=4+1, SorTedtext=sample) ! the quick over lazy jumps fox dog brown The
SortDelDbls	txt	sdd=lex	lex is sorted, remove_double_entries
			EDIT(T=sample, SortDelDbls=sample) ! brown dog fox jumps lazy over quick the
SortFromIndex	num	sfi=vec	sort words in string along vec (no lowercase conversion)
			vec = (2, 1, 4)
			EDIT(T=sample, SortFromIndex=vec, SorTedtext=sample) ! sample is now "quick The fox"
SorTColumn1	num	stc1=2	sort starts in word position 2 (disregards the 1st character)
			EDIT(T=sample, SorTColumn1=2, SorTedtext=sample) ! lazy the the dog fox brown quick jumps over
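
Four of the documented results for sample can be cross-checked outside HicEst. The following is a minimal Python sketch, not HicEst code. It models only what the GetPosition, Count, SorTedtext and SortDelDbls examples report, and it assumes that words in sample are separated by blanks alone (true for this string, not for the full SePaRators default). The variable names are illustrative and do not come from HicEst.

```python
# Python model of four documented EDIT results; this is not HicEst code.
sample = 'The quick brown fox jumps over the lazy dog '

# GetPosition with Right=' ' and DO: 1-based byte positions of every blank.
blank_positions = [i + 1 for i, ch in enumerate(sample) if ch == ' ']

# End, Count=' ': End puts Loc one byte after the last nonblank,
# Count then counts the blanks in sample(1:Loc).
loc = len(sample.rstrip()) + 1
blank_count = sample[:loc].count(' ')

# SorTedtext without Option=1: words are lowercased, then sorted.
# Assumption: blanks are the only separators that occur in sample.
sorted_words = ' '.join(sorted(sample.lower().split()))

# SortDelDbls: the same sort with duplicate words removed.
unique_words = ' '.join(sorted(set(sample.lower().split())))

print(blank_positions)  # [4, 10, 16, 20, 26, 31, 35, 40, 44]
print(blank_count)      # 9
print(sorted_words)     # brown dog fox jumps lazy over quick the the
print(unique_words)     # brown dog fox jumps lazy over quick the
```

The blank positions agree with the non-zero entries of vec in the GetPosition example; the trailing 0 there is the unused tenth element of vec(10).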