Data Structures: Strings (C Language Version)

Data Structures: Strings (C Language Version)

1. Concept of Strings

A string, referred to as a string, is a special type of linear list where its data elements consist solely of characters.

2. Definition of Strings

A string (String) is a finite sequence composed of zero or more characters, also known as a string.

Here, s is the name of the string, and the character sequence enclosed in double quotes is the string value, but the quotes themselves are not part of the string content. ai (1<=i<=n) is an arbitrary character, known as an element of the string, which is the basic unit that makes up the string; i is its index in the entire string; n is the length of the string, indicating the number of characters contained in the string.

3. Terminology Description

(1) Length – The number of characters in the string is referred to as the length of the string.

(2) Empty String – A string with a length of zero is called an empty string.

(3) Whitespace String – A string composed of one or more consecutive spaces is called a whitespace string.

(4) String Equality – Two strings are equal if their lengths are equal and their corresponding characters are also equal.

(5) Substring – Any continuous sequence of characters in a string is called a substring of that string.

(6) Main String – The string that contains the substring is called the main string of that substring.

(7) Pattern Matching – The operation of locating a substring is also known as pattern matching, which is an operation to find the index of the substring’s first occurrence in the main string. The matched main string is called the target string, and the substring is called the pattern.

Example 1: The length of the string and the position of the substring.

String

S1=”SHANG”

S2=”HAI”

S3=”SHANGHAI”

S4=”SHANG HAI”

S1 is a substring of S3 and S4, with a position of 1 in both S3 and S4.

S2 is also a substring of S3 and S4, with a position of 6 in S3 and 7 in S4.

4. Representation and Implementation of Strings

Since a string is a linear list with a character-type data element, the storage methods used for linear lists are also suitable for strings. However, due to the single-byte nature of the data elements in a string, its storage method has its peculiarities.

4-1. Fixed-Length Sequential Storage

Similar to a linear list, a group of contiguous storage units can be used to store the character sequence of the string, utilizing the order of the storage unit addresses to represent the adjacency of characters in the string.

4-1-1 Description of Fixed-Length Storage in C Language

In C language, the sequential storage of strings can be represented using a character array and an integer variable, where the character count is sufficient to store the string value, and the integer variable represents the length of the string.

#define MAXLEN 10

typedef struct

{

char vec[MAXLEN];

int len;

} Str; // You can use Str to define this type of structure variable

4-1-2 Storage Method

When the computer addresses by byte (Byte), one storage unit can store exactly one character, and adjacent characters in the string are stored in adjacent storage units.

When the computer addresses by word (for example, 1 word = 32 bits), one storage unit can consist of 4 bytes. In this case, the sequential storage structure has both non-compact and compact formats.

(1) Non-Compact Format

Assuming S=”String Structure”, if the computer word length is 32 bits (4 bytes), using the non-compact format, one address can only store one character, as shown in Figure 5-1. The advantage is that the computation is simple, but the disadvantage is that storage space is wasted.

(2) Compact Format

Similarly, storing S=”String Structure” in a compact format allows one address to store four characters, as shown in Figure 5-2. The advantage of compact storage is high space utilization, while the disadvantage is low efficiency for processing characters in the string.

4-2. Linked Storage

For inputting strings of uncertain length, if fixed-length string storage is used, the following problem may arise: if the storage space is fixed and the actual input string length is small, it causes waste of memory space; conversely, if the storage space is fixed and the actual input string length is large, the storage space is insufficient. In this case, linked storage can be used.

4-2-1 Description of Linked Storage

Strings are stored using linked lists, where each node has two fields: one is the data field (data) and one is the pointer field (next).

Data Field (data) – Stores the characters in the string.

Pointer Field (next) – Stores the address of the next node.

Using S=”String Structure” as an example, the linked storage structure is shown in the figure.

(1) Advantages of Linked Storage – Insertion and deletion operations;

(2) Disadvantages of Linked Storage – Lower storage and retrieval efficiency.

Due to the special nature of strings, using linked lists as a storage method for strings is not very practical, and thus used less frequently.

4-3. Heap-Allocated Storage Structure for Strings

In practical applications, many strings need to be defined, and the lengths of these strings cannot be determined before definition. In this case, heap-allocated storage (also known as indexed storage) can be used, which is a dynamic storage structure.

4-3-1 Method of Heap Allocation

(1) Allocate a block of contiguous storage space for storing the values of each string; this storage space is called the “heap” (also known as the free storage area).

(2) Additionally, create an index table to store the string names (name), lengths (length), and the starting addresses of these strings in the “heap”.

(3) During program execution, whenever a string is generated, the system allocates a block of contiguous space in the “heap” that is the same size as the length of the string to store its value, and an index entry is added to the index table to register the string’s name, length, and starting address.

4-3-2 Example of Indexed Storage

Assuming strings: A=”Red” B=”Yellow” C=”Blue” D=”White”

Use a pointer free to point to the starting address of the unused space in the heap.

The index table is shown in Figure 5-5

Considering the insertion and deletion operations on strings, which may cause changes in string length, when allocating space for string values in the “heap”, appropriate space should be reserved. At this point, the index entry in the index table should add a field to store the actual number of storage units owned by the string in the “heap”. When the string length equals the actual number of storage units for that string, insertion operations on the string cannot be performed.

4-3-3 C Language Description of Index Table with Length

As shown in Figure 5-5, the node type of the index entry is:

#define MAXLEN 10

typedef struct

{

char name[MAXLEN]; // String Name

int length; // String Length

char *Stradr; // Starting Address

} LNode

5. Basic Operations on Strings

(1) Calculate String Length LenStr(s)

Operation Condition: String s exists.

Operation Result: Calculate the length of string s.

(2) String Concatenation ConcatStr(s1,s2)

Operation Condition: Strings s1, s2 exist.

Operation Result: The new string s1 is the new string formed by concatenating strings s1 and s2, while the original string s2 remains unchanged, and the value of string s1 changes.

Example: Let s1=”Micsosoft”, s2=”Office”.

The operation result is s1=”MicsosoftOffice”; s2=”Office”.

(3) Calculate Substring SubStr(s,i,len)

Operation Condition: String s exists.

Operation Result: Returns the substring of length len starting from the i-th character of string s. If len=0, an empty string is obtained.

Example: SubStr(“abcdefghi”,3,4)=”cdef”.

(4) String Comparison EqualStr(s1,s2)

Operation Condition: Strings s1, s2 exist.

Operation Result: If s1 equals s2, the return value is 0; if s1 is less than s2, the return value is less than 0; if s1 is greater than s2, the return value is greater than 0.

(5) Substring Search IndexStr(s,t)

Find the first occurrence position of substring t in the main string s (also known as pattern matching).

Operation Condition: Strings s, t exist.

Operation Result: If t is a substring of s, return the position of its first occurrence in s; otherwise, return 0.

Example: Substring positioning

IndexStr(“abcdebda”,”bc”)=2;

IndexStr(“abcdebda”,”ba”)=0;

(6) String Insertion InsStr(s,t,i)

Operation Condition: Strings s, t exist.

Operation Result: Insert string t before the i-th character in string s, changing the value of s.

(7) String Deletion DelStr(s,i,len)

Operation Condition: String s exists.

Operation Result: Delete the substring of length len starting from the i-th character of string s, changing the value of s.

6. Implementation of Operations Using Fixed-Length Sequential String Structure Code

6-1. String Structure Definition

#define MAXLEN 10 // Define the maximum length of strings

typedef struct

{

char vec[MAXLEN];

int len; // Actual length of the string

} Str; // You can use Str to define this type of structure variable

Store a special character that will not appear in the string at the end of the string as a terminator to indicate the end of the string. For example, the method of handling fixed-length strings in C language is to use ‘\0’ to indicate the end of the string, as shown in Figure 5-6.

(1) Calculate the Length of the String

To determine whether the string has ended by checking if the current character is ‘\0’, if not ‘\0’, it indicates that the string length i increases by 1; if it is ‘\0’, it indicates that the string has ended, breaking the loop, and i is the length of the string.

int LenStr(Str *r){

int i=0;

while(r->vec[i]!=’\0′){

i++;

}

return i;

}

Testing code is as follows:

void ShowStr(Str *r){

printf(“\n\t\tThe string value is: “);

if(r->vec[0]==’\0′){

printf(“Empty string! \n”);

}else{

puts(r->vec); // Use puts function to output the string, formatted as puts(string array name)

}

}

int main()

{

Str a;

Str *r=&a

r->vec[0]=’\0′;

char choice;

int ch=1;

while(ch!=0){

printf(“\n”);

printf(“\n\t\t String Subsystem *”);

scanf(“%c”,&choice);

getchar();

if(choice==’1′){

printf(“\n\t\tPlease enter a string: “);

gets(r->vec); // Use get function to input string, formatted as gets(string array name)

r->len=LenStr(r);

}else if(choice==’8′){

ShowStr(r);

int n=LenStr(r);

printf(“String length is:%d”,n);

}

}

return 0;

}

Testing results:

(2) String Concatenation

Concatenate two strings r1 and r2 into a new string r1, i.e., r1=r1+r2.

void ConcatStr(Str *r1,Str *r2){

int i;

printf(“\n\t\t r1=%s r2=%s\n”,r1->vec,r2->vec);

if(r1->len+r2->len>MAXLEN){

printf(“The two strings are too long, overflow!”); // The length of the concatenated string exceeds the maximum length

}else{

for(i=0;ilen;i++){

r1->vec[r1->len+i]=r2->vec[i]; // Perform concatenation

}

r1->vec[r1->len+i]=’\0′;

r1->len=r1->len+r2->len; // Modify the length of the new concatenated string

}

printf(“\n\t\t r1=%s r2=%s\n”,r1->vec,r2->vec);

}

Testing code is as follows:

int main()

{

Str a,b;

Str *r=&a,*r1;

r->vec[0]=’\0′;

char choice;

int ch=1;

while(ch!=0){

printf(“\n”);

printf(“\n\t\t String Subsystem *”);

scanf(“%c”,&choice);

getchar();

if(choice==’1′){

printf(“\n\t\tPlease enter a string: “);

gets(r->vec); // Use get function to input string, formatted as gets(string array name)

r->len=LenStr(r);

}else if(choice==’2′){

printf(“\n\t\tPlease enter the string to concatenate: “);

r1=CreateStr(&b);

printf(“\n\t\tr1 is: “);

puts(r1->vec);

ConcatStr(r,r1);

printf(“\n\t\tThe new concatenated string value is: “);

puts(r->vec);

int n=LenStr(r);

printf(“New string length is:%d”,n);

}

else if(choice==’8′){

ShowStr(r);

int n=LenStr(r);

printf(“String length is:%d”,n);

}

}

return 0;

}

Testing results:

(3) Calculate Substring

In the given string r, extract j consecutive characters starting from the specified position i to form a substring r1.

void SubStr(Str *r,Str *r1,int i ,int j){

if(i+j-1>MAXLEN){

printf(“Substring out of bounds!”);

}else{

for(int k=0;k

r1->vec[k]=r->vec[i+k-1]; // Extract substring from r

}

r1->len=j;

r1->vec[r1->len]=’\0′;

}

printf(“\n\t\t Extracted characters are r1=%s “,r1->vec);

}

Testing code is as follows:

else if(choice==’3′){

printf(“\n\t\tPlease enter the starting character position: “);

scanf(“%d”,&i);

getchar();

printf(“\n\t\tPlease enter the number of consecutive characters to extract: “);

scanf(“%d”,&j);

getchar();

SubStr(r,&a,i ,j);

}

Testing results:

(4) String Comparison

Two strings are equal only if they have the same length and the characters at each corresponding position are the same.

int EqualStr(Str *r1,Str *r2){

printf(“\n\t\t r1=%s r2=%s\n”,r1->vec,r2->vec);

int i=0;

while(r1->vec[i]==r2->vec[i]&&r1->vec[i]!=’\0’&&r2->vec[i]!=’\0″)

i++;

if(r1->vec[i]==r2->vec[i])

return 0;

else if(r1->vec[i]>r2->vec[i])

return 1;

else

return -1;

}

Testing code is as follows:

else if(choice==’7′){

printf(“\n\t\tPlease enter the first string: “);

gets(c.vec);

printf(“\n\t\tPlease enter the second string: “);

gets(d.vec);

int k=EqualStr(&c,&d);

if(k>0){

printf(“\n\t\tThe first string is larger! \n”);

}else if(k<0){

printf(“\n\t\tThe second string is larger! \n”);

}else {

printf(“\n\t\tThe two strings are equal! \n”);

}

}

Testing results:

(5) Insert Substring

Insert substring r1 at the specified position i in string r.

Str *InsStr(Str *r,Str *r1,int i){

printf(“\n\t\t r=%s r1=%s\n”,r->vec,r1->vec);

if(i>r->len||r->len+r1->len>MAXLEN){

printf(“Cannot insert!”);

}else{

for(int k=r->len-1;k>=i;k–){

r->vec[r1->len+k]=r->vec[k]; // Move back to create space

}

for(int k=0;klen;k++){

r->vec[i+k]=r1->vec[k]; // Insert substring

}

r->len=r->len+r1->len;

r->vec[r->len]=’\0′;

}

printf(“\n\t\t The new string after insertion r=%s \n”,r->vec);

return r;

}

Testing code is as follows:

else if(choice==’5′){

printf(“\n\t\tPlease enter the position before which to insert: “);

scanf(“%d”,&i);

getchar();

printf(“\n\t\tPlease enter the string to insert: “);

r1=CreateStr(&b);

Str *newStr=InsStr(r,r1,i-1);

printf(“\n\t\tNew string value is: “);

puts(newStr->vec);

}

Testing results:

(6) Delete Substring

Delete j consecutive characters starting from the specified position i in the given string r.

void DelStr(Str *r,int i ,int j){

if(i+j-1>r->len){

printf(“The string to be deleted is out of bounds!”);

}else{

for(int k=i+j;klen;k++,i++){

r->vec[i]=r->vec[k]; // Move the subsequent string forward to cover

}

r->len=r->len-j;

r->vec[r->len]=’\0′;

}

printf(“\n\t\t The new string after deletion r=%s \n”,r->vec);

}

Testing code is as follows:

else if(choice==’4′){

printf(“\n\t\tPlease enter the starting position: “);

scanf(“%d”,&i);

getchar();

printf(“\n\t\tPlease enter the number of consecutive characters to delete: “);

scanf(“%d”,&j);

getchar();

DelStr(r,i ,j);

}

Testing results:

(7) Find Substring

int IndexStr(Str *r,Str *r1){

printf(“\n\t\t r=%s r1=%s\n”,r->vec,r1->vec);

int i,j,k;

for(i=0;r->vec[i];i++){

for(j=i,k=0;r->vec[j]==r1->vec[k];j++,k++){

if(!r1->vec[k+1]){

return i;

}

return -1;

}

}

}

Testing code is as follows:

else if(choice==’6′){

printf(“\n\t\tPlease enter the string to find: “);

r1=CreateStr(&b);

i=IndexStr(r,r1);

if(i!=-1){

printf(“\n\t\tThe first occurrence is at position %d.\n “,i+1);

}else{

printf(“\n\t\tThe substring is not present!”);

}

}

Testing results:

7. Complete Code for String Subsystem

#include<iostream>

using namespace std;

#define MAXLEN 100 // Define the maximum length of strings

typedef struct

{

char vec[MAXLEN];

int len; // Actual length of the string

} Str; // You can use Str to define this type of structure variable

int LenStr(Str *r){

int i=0;

while(r->vec[i]!=’\0′){

i++;

}

return i;

}

Str *CreateStr(Str *r){

gets(r->vec);

r->len=LenStr(r);

return r;

}

void ShowStr(Str *r){

printf(“\n\t\tThe string value is: “);

if(r->vec[0]==’\0′){

printf(“Empty string! \n”);

}else{

puts(r->vec); // Use puts function to output the string, formatted as puts(string array name)

}

}

void ConcatStr(Str *r1,Str *r2){

int i;

printf(“\n\t\t r1=%s r2=%s\n”,r1->vec,r2->vec);

if(r1->len+r2->len>MAXLEN){

printf(“The two strings are too long, overflow!”); // The length of the concatenated string exceeds the maximum length

}else{

for(i=0;ilen;i++){

r1->vec[r1->len+i]=r2->vec[i]; // Perform concatenation

}

r1->vec[r1->len+i]=’\0′;

r1->len=r1->len+r2->len; // Modify the length of the new concatenated string

}

printf(“\n\t\t r1=%s r2=%s\n”,r1->vec,r2->vec);

}

void SubStr(Str *r,Str *r1,int i ,int j){

if(i+j-1>MAXLEN){

printf(“Substring out of bounds!”);

}else{

for(int k=0;k

r1->vec[k]=r->vec[i+k-1]; // Extract substring from r

}

r1->len=j;

r1->vec[r1->len]=’\0′;

}

printf(“\n\t\t Extracted characters are r1=%s “,r1->vec);

}

int EqualStr(Str *r1,Str *r2){

printf(“\n\t\t r1=%s r2=%s\n”,r1->vec,r2->vec);

int i=0;

while(r1->vec[i]==r2->vec[i]&&r1->vec[i]!=’\0’&&r2->vec[i]!=’\0″)

i++;

if(r1->vec[i]==r2->vec[i])

return 0;

else if(r1->vec[i]>r2->vec[i])

return 1;

else

return -1;

}

Str *InsStr(Str *r,Str *r1,int i){

printf(“\n\t\t r=%s r1=%s\n”,r->vec,r1->vec);

if(i>r->len||r->len+r1->len>MAXLEN){

printf(“Cannot insert!”);

}else{

for(int k=r->len-1;k>=i;k–){

r->vec[r1->len+k]=r->vec[k]; // Move back to create space

}

for(int k=0;klen;k++){

r->vec[i+k]=r1->vec[k]; // Insert substring

}

r->len=r->len+r1->len;

r->vec[r->len]=’\0′;

}

printf(“\n\t\t The new string after insertion r=%s \n”,r->vec);

return r;

}

void DelStr(Str *r,int i ,int j){

if(i+j-1>r->len){

printf(“The string to be deleted is out of bounds!”);

}else{

for(int k=i+j;klen;k++,i++){

r->vec[i]=r->vec[k]; // Move the subsequent string forward to cover

}

r->len=r->len-j;

r->vec[r->len]=’\0′;

}

printf(“\n\t\t The new string after deletion r=%s \n”,r->vec);

}

int IndexStr(Str *r,Str *r1){

printf(“\n\t\t r=%s r1=%s\n”,r->vec,r1->vec);

int i,j,k;

for(i=0;r->vec[i];i++){

for(j=i,k=0;r->vec[j]==r1->vec[k];j++,k++){

if(!r1->vec[k+1]){

return i;

}

return -1;

}

}

}

int main()

{

Str a,b,c,d;

Str *r=&a,*r1;

r->vec[0]=’\0′;

char choice,p;

int i,j, ch=1;

while(ch!=0){

printf(“\n”);

printf(“\n\t\t String Subsystem *”);

printf(“\n\t\t************************************”);

printf(“\n\t\t* 1——Input String *”);

printf(“\n\t\t* 2——Concatenate String *”);

printf(“\n\t\t* 3——Extract Substring *”);

printf(“\n\t\t* 4——Delete Substring *”);

printf(“\n\t\t* 5——Insert Substring *”);

printf(“\n\t\t* 6——Find Substring *”);

printf(“\n\t\t* 7——Compare Strings *”);

printf(“\n\t\t* 8——Display String *”);

printf(“\n\t\t* 0——Return *”);

printf(“\n\t\t************************************”);

printf(“\n\t\tPlease choose a menu number (0-8): *”);

scanf(“%c”,&choice);

getchar();

if(choice==’1′){

printf(“\n\t\tPlease enter a string: “);

gets(r->vec); // Use get function to input string, formatted as gets(string array name)

r->len=LenStr(r);

}else if(choice==’2′){

printf(“\n\t\tPlease enter the string to concatenate: “);

r1=CreateStr(&b);

printf(“\n\t\tr1 is: “);

puts(r1->vec);

ConcatStr(r,r1);

printf(“\n\t\tThe new concatenated string value is: “);

puts(r->vec);

int n=LenStr(r);

printf(“New string length is:%d”,n);

}else if(choice==’3′){

printf(“\n\t\tPlease enter the starting character position: “);

scanf(“%d”,&i);

getchar();

printf(“\n\t\tPlease enter the number of consecutive characters to extract: “);

scanf(“%d”,&j);

getchar();

SubStr(r,&a,i ,j);

}else if(choice==’4′){

printf(“\n\t\tPlease enter the starting position: “);

scanf(“%d”,&i);

getchar();

printf(“\n\t\tPlease enter the number of consecutive characters to delete: “);

scanf(“%d”,&j);

getchar();

DelStr(r,i ,j);

}else if(choice==’5′){

printf(“\n\t\tPlease enter the position before which to insert: “);

scanf(“%d”,&i);

getchar();

printf(“\n\t\tPlease enter the string to insert: “);

r1=CreateStr(&b);

Str *newStr=InsStr(r,r1,i-1);

printf(“\n\t\tNew string value is: “);

puts(newStr->vec);

}else if(choice==’6′){

printf(“\n\t\tPlease enter the string to find: “);

r1=CreateStr(&b);

i=IndexStr(r,r1);

if(i!=-1){

printf(“\n\t\tThe first occurrence is at position %d.\n “,i+1);

}else{

printf(“\n\t\tThe substring is not present!”);

}

}else if(choice==’7′){

printf(“\n\t\tPlease enter the first string: “);

gets(c.vec);

printf(“\n\t\tPlease enter the second string: “);

gets(d.vec);

int k=EqualStr(&c,&d);

if(k>0){

printf(“\n\t\tThe first string is larger! \n”);

}else if(k<0){

printf(“\n\t\tThe second string is larger! \n”);

}else {

printf(“\n\t\tThe two strings are equal! \n”);

}

}else if(choice==’8′){

ShowStr(r);

int n=LenStr(r);

printf(“String length is:%d”,n);

}else if(choice==’0′){

break;

}else{

printf(“\n\t\tPlease note: Input error! \n”);

if(choice!=’X’&&choice!=’x’){

printf(“\n\t\tPress Enter to continue, press any key to return to the main menu \n”);

p=getchar();

if(p!=’\xA’){

getchar();

break;

}

}

}

}

return 0;

}

1. Definition of String Type

Non-numeric processing objects on computers are basically string data.

A string (or character string) is a finite sequence composed of zero or more characters, and the value of the string can be letters, numbers, or other characters. The number of characters n in the string is called the length of the string, and a string with zero characters is called an empty string, with a length of zero.

Any continuous sequence of characters in a string is called a substring of that string. The string containing the substring is referred to as the main string. The position of a character in the sequence is typically referred to as its position in the string, and the position of the substring in the main string is indicated by the position of the first character of the substring in the main string.

If two strings are equal, it is only when their values are equal, meaning that both strings have the same length and the characters at each corresponding position are also the same.

A string composed of one or more spaces is called a whitespace string, and its length is the number of space characters in the string.

In the basic operations of linear lists, most operations are performed on individual elements, while in the basic operations of strings, the entire string is typically the object of the operation.

Example 1: Implementing the Locating Function Index(S,T,pos)

The basic idea is to take the substring from the main string S starting from the i-th character, which has the same length as string T, and compare it with string T. If they are equal, the function value is i; otherwise, i increments until no substring in S is equal to T. The algorithm implementation is as follows:

int Index(String S,String T,int pos){

// T is a non-empty string, if there exists a substring in the main string S after the pos-th character that is equal to T

// it returns the position of the first such substring in S, otherwise returns 0

if(pos>0){

n=StrLength(S);m=Strlength(T);i=pos;

while(i<=n-m+1){

SubString(sub,S,i,m); // Use sub to return the substring of S starting from the i-th character with length m

if(StrCompare(Sub,T)!=()) ++i;

else return i; // Return the position of the substring in the main string

}

}

return 0; // No substring in S is equal to T

}

2. Representation and Implementation of Strings

If strings only appear as constants for input and output in programming languages, only the string value, i.e., the character sequence, needs to be stored. However, in most non-numeric processing programs, strings also appear in variable form. There are three methods of representing strings, as follows:

1. Fixed-Length Sequential Storage Representation

Similar to the sequential storage structure of linear lists, a group of contiguous storage units is used to store the character sequence of string values. In the fixed-length sequential storage structure of strings, a fixed-length storage area is allocated for each defined string variable according to a predefined size, which can be described as follows:

/* Fixed-Length Sequential Storage Representation of Strings */

# define MAXSTARLEN 255 // Users can define the maximum string length within 255

typedef unsigned char SString[MAXSTRLEN+1] // The 0-th unit stores the length of the string

The actual length of the string can vary within this predefined length, and any string value exceeding the predefined length will be truncated.

(1) String Concatenation Concat(&T,S1,S2)

Assuming S1, S2, and T are all SString type string variables, and string T is obtained by concatenating strings S1 and S2, i.e., the value of string T’s front section equals the value of string S1, and the back section equals the value of string S2, then the appropriate “string value copy” operation can be performed. However, it should be noted that for overly long portions, the “truncation” operation should be applied. Based on the different lengths of strings S1 and S2, the value of string T may be produced in the following three scenarios: 1) S[0]+S2[0]<=MAXSTRLEN 2) S1[0]<=MAXSTRLEN, and S1[0]+S2[0]>MAXSTRLEN, then part of string S2 is truncated, resulting in string T containing only a substring of string S2 3) S1[0]=MAXSTRLEN, then the resulting string T is not the concatenation result but is equal to string S1, the algorithm description is as follows:

Status Concat(SString &T,SString S1,SString S2){

// Use T to return the new string formed by concatenating S1 and S2, if not truncated, return TRUE, otherwise return FALSE

if(S1[0]+S2[0]<=MAXSTRLEN){ // Not truncated

T[1..S1[0]] = S1[1..S1[0]];

T[S1[0]+1..S1[0]+S2[0]]=S2[1..S2[0]];

T[0] = S1[0]+S[0]; uncut=True;

}

else if (S1[0]

T[1..S1[0]] = S1[1..S1[0]];

T[S1[0]+1..MAXSTRLEN] = S2[1..MAXSTRLEN-S1[0]]; uncut=FALSE;

}

else { // Truncated

T[0..MAXSTRLEN] = S1[0..MAXSTRLEN]; uncut=False;

}

return uncut;

}

(2) Calculate Substring SubString(⋐,S,pos,len)

The process of calculating a substring is essentially the process of copying character sequences, copying the character sequence from string S starting from the pos-th character with length len into string sub. It is evident that this operation will not have truncation situations, but it may produce user-given parameters that do not meet the initial conditions for the operation, in which case it returns ERROR. The algorithm description is as follows:

Status SubString(SString ⋐,SString S,int pos,int len){

// Use Sub to return the substring starting from the pos-th character of string S with length len

// where 1<=pos<=StrLength and 0<=len<=StrLength(S)-pos+1

if(pos<1 || pos>S[0]||len<0||len>S[0]-pos+1)

return ERROR;

Sub[1..Len] = S[pos..pos+len-1];

Sub[0]=len; return OK;

}

From the above two operations, it can be seen that in the sequential storage structure, the original operation for implementing string operations is “copying character sequences”, and the time complexity of the operation is based on the length of the copied character sequence. Another characteristic of this operation is that if the length of the string sequence exceeds the upper limit MAXSTRLEN during the operation, it is agreed to handle it using truncation. This situation may occur not only when calculating the concatenated string but also in other string operations, such as insertion and replacement. To overcome this drawback, the maximum length of strings should not be limited, i.e., dynamically allocating storage space for string values.

2. Heap-Allocated Storage Representation

This storage representation is characterized by still using a group of contiguous storage units to store the character sequence of string values, but their storage space is dynamically allocated during program execution. In C language, there exists a free storage area called “heap”, managed by the dynamic allocation functions malloc() and free() in C language. Using the function malloc() to allocate a block of storage space equal to the actual length required for each new string, if the allocation is successful, it returns a pointer pointing to the starting address, serving as the base address of the string. To facilitate future processing, it is agreed that the string length is also part of the storage structure.

/* Heap-Allocated Storage Representation of Strings */

typedef struct{

char *ch; // If it is a non-empty string, allocate storage space according to the string length; otherwise, ch is NULL

int length; // String length

} HString;

This storage structure indicates that string operations are still based on “copying character sequences”.

1. The implementation algorithm for the string copy operation StrCopy(&T,S) is as follows: if string T already exists, first release the space occupied by string T. When string S is not empty, first allocate storage space equal to the length of string S for string T, and then copy the value of string S into string T;

2. The implementation algorithm for the string insertion operation StrInsert(&S,pos,T) is as follows: reallocate storage space equal to the sum of the lengths of strings S and T, and then perform the value copying operation. The algorithm implementation is as follows:

Status StrInsert(HString &S,int pos,HString T){

// 1<=POS<=StrLength(S)+1, insert string T before the pos-th character of string S

if(pos<1||pos>S.length+1) return ERROR; // pos is invalid

if(T.length){ // T is non-empty, reallocate space and insert T

if(!(S.ch=(char *)realloc(S.ch,(S.length+T.length)*sizeof(char))))

exit(OVERFLOW);

for(i=S.length-1;i>=pos-1;–i) // Create space for inserting T

S.ch[i+T.length]= S.ch[i];

S.ch[pos-1..pos+T.length-2] = T.ch[0..T.length-1]; // Input T

S.length +=T.length;

}

return OK;

}

The above two storage representations are usually adopted by advanced programming. Since the heap-allocated storage structure has the characteristics of sequential storage structure, it is convenient for processing, and there is no restriction on the string length during operations, making it more flexible. Therefore, it is often chosen in application programs for string processing.

3. Description of Basic Operation Algorithms

Status StrAssign(HString &T,char *chars){

// Generate a string T whose value equals the string constant chars

if(T.ch) free(T.ch); // Free the existing space of T

for(i=0,c=chars;c;++i,++c); // Calculate the length i of chars

if(!i){ T.ch=NULL; T.length=0;}

else{

if(!(T.ch=(char *)malloc(i*sizeof(char))))

exit(OVERFLOW);

T.ch[0..i-1] = chars[0..i-1]

T.length = i;

}

return OK;

}

int StrLength(HString S){

// Return the number of elements in S, called the length of the string

return S.length;

}

int StrCompare(HString S,HString T){

// If S>T, return value>0; if S=T, return value=0; if S

for(i=0;i

if(S.ch[i]!=T.ch[i]) return S.ch[i]-T.ch[i];

return S.length-T.length;

}

Status ClearString(HString &S){

// Clear S to an empty string

if(S.ch) {free(S.ch); S.ch=NULL;}

S.length=0;

return OK;

}

Status Concat(HString &T,HString S1,HString S2){

// Use T to return the new string formed by concatenating S1 and S2

if(T.ch) free(T.ch); // Free old space

if(!(T.ch=(char *)malloc((S1.length+S2.length)*sizeof(char))))

exit(OVERFLOW);

T.ch[0..S1.length-1] = S1[0..S1.length-1];

T.length = S1.length+S2.length;

T.ch[S1.length..T.length-1] = S2[0..S2.length-1];

return OK;

}

Status SubString(HString ⋐,HString S,int pos,int len){

// Use sub to return the substring of S starting from the pos-th character with length len

if(pos<1||pos>S.length||len<0||len>S.length-pos+1)

return ERROR;

if(Sub.ch) free(Sub.ch); // Free old space

if(!len){ Sub.ch=NULL; Sub.length=0; } // Empty substring

else{

Sub.ch = (char *)malloc(len*sizeof(char));

Sub.ch[0..len-1] = S.ch[pos-1..pos+len-2];

Sub.length= len;

}

return OK;

}

3. Block Chain Storage Representation of Strings

Similar to the linked storage structure of linear lists, strings can also be stored using linked lists. Due to the special nature of the string structure – where each data element is a character, when storing string values using linked lists, there is a size issue for each node, i.e., each node can store one character or multiple characters.

To facilitate string operations, when storing string values using linked lists, in addition to the head pointer, a tail pointer can also be provided to indicate the last node in the linked list, with the current length of the string given, which is referred to as the block chain structure defined in this way.

String Block Chain Storage Representation

#define CHUNKSIZE 80 // User-defined block size

typedef struct Chunk{

char ch[CHUNKSIZE];

struct Chunk *next;

} Chunk;

typedef struct{

Chunk *head,*tail; // Head and tail pointers of the string

int curlen; // Current length of the string

} LString;

Since in general cases, string operations only require scanning from head to tail, there is no need to establish a doubly linked list for the string values. The purpose of the tail pointer is to facilitate linking operations, but care should be taken to handle the invalid character at the end of the first string during concatenation.

In the linked storage method, the choice of node size is equally important as the format selection in sequential storage methods, as it directly affects the efficiency of string processing. In various string processing systems, the strings being processed are often very long or numerous, such as millions of characters in a book or thousands in intelligence data. This requires consideration of the storage density of string values, which can be defined as

Storage Density = Storage Bits Occupied by String Values / Actual Allocated Storage Bits

Clearly, when the storage density is small (e.g., when the node size is 1), the operation is easier, but the storage occupies more space. If the string processing requires frequent swapping between internal and external storage, excessive swapping operations may affect the overall processing efficiency. It should also be noted that the size of the character set for string values is an important factor. Generally, a smaller character set means shorter internal encoding for characters, which also affects the selection of storage methods for string values.

The linked storage structure for string values has certain conveniences for some string operations, such as concatenation operations, but overall, it is less flexible than the other two storage structures. It occupies a larger storage space, making operations more complex. Additionally, the implementation of string operations in linked storage structures is similar to operations in linked list storage structures for linear lists.

Leave a Comment