shell - How do I split a string on a delimiter in Bash?

ID : 297

viewed : 118

Tags : bash, shell, split, scripting

Top 5 Answers for shell - How do I split a string on a delimiter in Bash?


98

You can set the internal field separator (IFS) variable and then let read parse the input into an array. When the assignment is written as a prefix to a command, it applies only to that single command's environment (here, read). read then splits the input on the characters in IFS and stores the fields in an array, which we can iterate over.

This example will parse one line of items separated by ;, pushing it into an array:

    IFS=';' read -ra ADDR <<< "$IN"
    for i in "${ADDR[@]}"; do
      echo "$i"   # process "$i" here
    done

This other example processes the whole content of $IN, reading one line at a time and splitting each line on ;:

    while IFS=';' read -ra ADDR; do
      for i in "${ADDR[@]}"; do
        echo "$i"   # process "$i" here
      done
    done <<< "$IN"
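As a rough illustration of the multi-line loop, here is a run against a made-up two-line value of IN. The sample data and the collecting array `out` are my own assumptions, not part of the answer. Each line is read in turn and split on `;`, and a space inside a field survives:

```shell
#!/usr/bin/env bash
# Hypothetical sample input: two lines, each with ';'-separated fields.
IN=$'bla@some.com;Full Name\njohn@home.com;other'

out=()   # illustrative array collecting every field seen
while IFS=';' read -ra ADDR; do
  for i in "${ADDR[@]}"; do
    out+=("$i")
  done
done <<< "$IN"

printf '[%s]\n' "${out[@]}"
```

Because the loop is fed by a here-string rather than a pipe, it runs in the current shell, so `out` is still populated after `done`.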

83

Taken from Bash shell script split array:

    IN="bla@some.com;john@home.com"
    arrIN=(${IN//;/ })
    echo "${arrIN[1]}"                  # Output: john@home.com

Explanation:

This construction replaces all occurrences of ';' (the initial // means global replace) in the string IN with ' ' (a single space), then interprets the space-delimited string as an array (that's what the surrounding parentheses do).

The syntax used inside of the curly braces to replace each ';' character with a ' ' character is called Parameter Expansion.

There are some common gotchas:

  1. If the original string has spaces, you will need to use IFS (shown here with ':' as the delimiter):
  • IFS=':'; arrIN=($IN); unset IFS;
  2. If the original string has spaces and the delimiter is a new line, you can set IFS with:
  • IFS=$'\n'; arrIN=($IN); unset IFS;
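To make the first gotcha concrete, here is a small sketch with a made-up input containing spaces. The names `naive`, `fixed` and `oldIFS` are only for illustration, and IFS is saved and restored instead of unset, which is my own variation:

```shell
#!/usr/bin/env bash
IN="Full Name <fulnam@other.org>;john@home.com"   # hypothetical input with spaces

# Replacing ';' with ' ' also splits on the spaces already in the string.
naive=(${IN//;/ })
echo "${#naive[@]}"   # 4

# Splitting on ';' only keeps each field whole.
oldIFS=$IFS
IFS=';'
fixed=($IN)
IFS=$oldIFS
echo "${#fixed[@]}"   # 2
```

Restoring the saved value keeps a non-default IFS intact, whereas `unset IFS` always falls back to the default behaviour.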

75

If you don't mind processing them immediately, I like to do this:

    for i in $(echo $IN | tr ";" "\n")
    do
      echo "$i"   # process "$i" here
    done

You could use this kind of loop to initialize an array, but there's probably an easier way to do it. Hope this helps, though.
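For what it's worth, initializing an array from this kind of loop could look roughly like the sketch below. The array name `arr` is hypothetical, and note that the unquoted command substitution splits on any whitespace, so fields containing spaces would be broken up:

```shell
#!/usr/bin/env bash
IN="bla@some.com;john@home.com"

arr=()   # hypothetical array name
for i in $(echo "$IN" | tr ";" "\n")
do
  arr+=("$i")   # append each piece as a new element
done

echo "${#arr[@]}"    # 2
echo "${arr[1]}"     # john@home.com
```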


61

Compatible answer

There are a lot of different ways to do this in bash.

However, it's important to first note that bash has many special features (so-called bashisms) that won't work in any other shell.

In particular, arrays, associative arrays, and pattern substitution, which are used in the solutions in this post as well as others in the thread, are bashisms and may not work under other shells that many people use.

For instance: on my Debian GNU/Linux, there is a standard shell called dash; I know many people who like to use another shell called ksh; and there is also a special tool called busybox with its own shell interpreter (ash).

Requested string

The string to be split in the above question is:

IN="bla@some.com;john@home.com" 

I will use a modified version of this string to ensure that my solution is robust to strings containing whitespace, which could break other solutions:

IN="bla@some.com;john@home.com;Full Name <fulnam@other.org>" 

Split string based on delimiter in bash (version >=4.2)

In pure bash, we can create an array with elements split by a temporary value for IFS (the internal field separator). The IFS, among other things, tells bash which character(s) it should treat as a delimiter between elements when defining an array:

    IN="bla@some.com;john@home.com;Full Name <fulnam@other.org>"

    # save original IFS value so we can restore it later
    oIFS="$IFS"
    IFS=";"
    declare -a fields=($IN)
    IFS="$oIFS"
    unset oIFS
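One caveat with this form: the unquoted `$IN` is also subject to pathname expansion, so a field such as `*.txt` could be replaced by matching file names. A hedged variation, assuming globbing was enabled beforehand, switches it off around the assignment (the sample input is made up):

```shell
#!/usr/bin/env bash
IN="*.txt;john@home.com"   # hypothetical input containing a glob character

oIFS="$IFS"
IFS=";"
set -f                     # turn off pathname expansion for the unquoted $IN
fields=($IN)
set +f                     # turn it back on (assumes it was on before)
IFS="$oIFS"
unset oIFS

declare -p fields
```

The `read -a` form shown next does not need this, since `read` does not perform pathname expansion on its input.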

In bash, prefixing a command with an IFS assignment changes IFS for that command only; the previous value is in effect again immediately afterwards. This means we can do the above in just one line:

    IFS=\; read -ra fields <<<"$IN"   # -r keeps backslashes in the input literal
    # after this command, the IFS resets back to its previous value (here, the default):
    set | grep ^IFS=
    # IFS=$' \t\n'

We can see that the string IN has been stored into an array named fields, split on the semicolons:

    set | grep ^fields=\\\|^IN=
    # fields=([0]="bla@some.com" [1]="john@home.com" [2]="Full Name <fulnam@other.org>")
    # IN='bla@some.com;john@home.com;Full Name <fulnam@other.org>'

(We can also display the contents of these variables using declare -p:)

    declare -p IN fields
    # declare -- IN="bla@some.com;john@home.com;Full Name <fulnam@other.org>"
    # declare -a fields=([0]="bla@some.com" [1]="john@home.com" [2]="Full Name <fulnam@other.org>")

Note that read is the quickest way to do the split because there are no forks or external resources called.

Once the array is defined, you can use a simple loop to process each field (or, rather, each element in the array you've now defined):

    # `"${fields[@]}"` expands to every element of the `fields` array as a separate argument
    for x in "${fields[@]}"; do
        echo "> [$x]"
    done
    # > [bla@some.com]
    # > [john@home.com]
    # > [Full Name <fulnam@other.org>]

Or you could drop each field from the array after processing using a shifting approach, which I like:

    while [ "$fields" ]; do
        echo "> [$fields]"
        # slice the array
        fields=("${fields[@]:1}")
    done
    # > [bla@some.com]
    # > [john@home.com]
    # > [Full Name <fulnam@other.org>]
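Note that `[ "$fields" ]` tests the first element, so the shifting loop stops as soon as it meets an empty field. If empty fields are possible, a variation that tests the element count instead might look like this (the sample array and the `seen` counter are assumptions for illustration):

```shell
#!/usr/bin/env bash
fields=("bla@some.com" "" "john@home.com")   # hypothetical data with an empty field

seen=0   # illustrative counter of processed elements
while [ "${#fields[@]}" -gt 0 ]; do
    echo "> [${fields[0]}]"
    seen=$((seen + 1))
    fields=("${fields[@]:1}")   # drop the first element
done
# > [bla@some.com]
# > []
# > [john@home.com]
```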

And if you just want a simple printout of the array, you don't even need to loop over it:

    printf "> [%s]\n" "${fields[@]}"
    # > [bla@some.com]
    # > [john@home.com]
    # > [Full Name <fulnam@other.org>]

Update: recent bash >= 4.4

In newer versions of bash, you can also play with the command mapfile:

mapfile -td \; fields < <(printf "%s\0" "$IN") 

This syntax preserves special characters, newlines and empty fields!
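A quick way to check the empty-field behaviour is to feed it an input with an empty middle field. The sample string below is made up, and `mapfile -d` needs bash 4.4 or later:

```shell
#!/usr/bin/env bash
# Requires bash >= 4.4 for mapfile -d.
# Hypothetical input with an empty field between the two ';'.
IN="bla@some.com;;Full Name <fulnam@other.org>"

mapfile -td \; fields < <(printf "%s\0" "$IN")

echo "${#fields[@]}"              # number of elements, empty one included
printf '[%s]\n' "${fields[@]}"    # one bracketed element per line
```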

If you don't want to include empty fields, you could do the following:

    mapfile -td \; fields <<<"$IN"
    fields=("${fields[@]%$'\n'}")   # drop '\n' added by '<<<'

With mapfile, you can also skip declaring an array and implicitly "loop" over the delimited elements, calling a function on each:

    myPubliMail() {
        printf "Seq: %6d: Sending mail to '%s'..." "$1" "$2"
        # mail -s "This is not a spam..." "$2" </path/to/body
        printf "\e[3D, done.\n"
    }

    mapfile < <(printf "%s\0" "$IN") -td \; -c 1 -C myPubliMail

(Note: the \0 at end of the format string is useless if you don't care about empty fields at end of the string or they're not present.)

    mapfile < <(echo -n "$IN") -td \; -c 1 -C myPubliMail

    # Seq:      0: Sending mail to 'bla@some.com', done.
    # Seq:      1: Sending mail to 'john@home.com', done.
    # Seq:      2: Sending mail to 'Full Name <fulnam@other.org>', done.

Or you could use <<<, and in the function body include some processing to drop the newline it adds:

    myPubliMail() {
        local seq=$1 dest="${2%$'\n'}"
        printf "Seq: %6d: Sending mail to '%s'..." "$seq" "$dest"
        # mail -s "This is not a spam..." "$dest" </path/to/body
        printf "\e[3D, done.\n"
    }

    mapfile <<<"$IN" -td \; -c 1 -C myPubliMail

    # Renders the same output:
    # Seq:      0: Sending mail to 'bla@some.com', done.
    # Seq:      1: Sending mail to 'john@home.com', done.
    # Seq:      2: Sending mail to 'Full Name <fulnam@other.org>', done.

Split string based on delimiter in shell

If you can't use bash, or if you want to write something that can be used in many different shells, you often can't use bashisms -- and this includes the arrays we've been using in the solutions above.

However, we don't need to use arrays to loop over "elements" of a string. There is a syntax used in many shells for deleting substrings of a string from the first or last occurrence of a pattern. Note that * is a wildcard that stands for zero or more characters:

(The lack of this approach in any solution posted so far is the main reason I'm writing this answer ;)

    ${var#*SubStr}  # drops substring from start of string up to first occurrence of `SubStr`
    ${var##*SubStr} # drops substring from start of string up to last occurrence of `SubStr`
    ${var%SubStr*}  # drops substring from last occurrence of `SubStr` to end of string
    ${var%%SubStr*} # drops substring from first occurrence of `SubStr` to end of string

As explained by Score_Under:

# and % delete the shortest possible matching substring from the start and end of the string respectively, and

## and %% delete the longest possible matching substring.
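Applied to a concrete value, the four forms behave as in this small sketch. The sample string and the variable names are hypothetical, and only POSIX parameter expansion is used, so it should run in any of the shells mentioned:

```shell
#!/bin/sh
var="a;b;c"   # hypothetical sample value

first_dropped=${var#*;}     # shortest match from the start removed -> b;c
last_kept=${var##*;}        # longest match from the start removed  -> c
last_dropped=${var%;*}      # shortest match from the end removed   -> a;b
first_kept=${var%%;*}       # longest match from the end removed    -> a

printf '%s\n' "$first_dropped" "$last_kept" "$last_dropped" "$first_kept"
```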

Using the above syntax, we can create an approach where we extract substring "elements" from the string by deleting the substrings up to or after the delimiter.

The codeblock below works well in bash (including Mac OS's bash), dash, ksh, and busybox's ash:

(Thanks to Adam Katz's comment, which made this loop a lot simpler!)

    IN="bla@some.com;john@home.com;Full Name <fulnam@other.org>"
    while [ "$IN" != "$iter" ]; do
        # extract the substring from start of string up to delimiter.
        iter=${IN%%;*}
        # delete this first "element" AND next separator, from $IN.
        IN="${IN#$iter;}"
        # Print (or do anything with) the first "element".
        echo "> [$iter]"
    done
    # > [bla@some.com]
    # > [john@home.com]
    # > [Full Name <fulnam@other.org>]
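Because this loop stops when `$IN` equals `$iter`, it appears to end one element early when the last field repeats the one before it (for example `a;a`). A possible variation, sketched with my own variable names `rest`, `item` and `count`, appends a trailing delimiter and loops until nothing is left:

```shell
#!/bin/sh
IN="a;a;b"      # hypothetical input with a repeated value
rest="$IN;"     # append one delimiter so every element is terminated
count=0         # illustrative counter of elements seen
while [ -n "$rest" ]; do
    item=${rest%%;*}    # text before the first ';'
    rest=${rest#*;}     # remove that element and its ';'
    count=$((count + 1))
    echo "> [$item]"
done
# > [a]
# > [a]
# > [b]
```

It also leaves `$IN` untouched, at the cost of one extra variable.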

Have fun!


53

I've seen a couple of answers referencing the cut command, but they've all been deleted. It's a little odd that nobody has elaborated on that, because I think it's one of the more useful commands for doing this type of thing, especially for parsing delimited log files.

In the case of splitting this specific example into a bash script array, tr is probably more efficient, but cut can be used, and is more effective if you want to pull specific fields from the middle.

Example:

    $ echo "bla@some.com;john@home.com" | cut -d ";" -f 1
    bla@some.com
    $ echo "bla@some.com;john@home.com" | cut -d ";" -f 2
    john@home.com

You can obviously put that into a loop, and iterate the -f parameter to pull each field independently.
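One way such a loop might look is sketched below. It counts the delimiters first so it knows when to stop; `nfields`, `i` and `field` are names I made up for the example:

```shell
#!/bin/sh
IN="bla@some.com;john@home.com"

# Number of fields = number of ';' characters + 1.
nfields=$(( $(printf '%s' "$IN" | tr -cd ';' | wc -c) + 1 ))

i=1
while [ "$i" -le "$nfields" ]; do
    field=$(printf '%s\n' "$IN" | cut -d ';' -f "$i")
    echo "field $i: $field"
    i=$((i + 1))
done
# field 1: bla@some.com
# field 2: john@home.com
```

Each iteration starts a new `cut` process, so this is fine for a handful of fields but slower than the pure-shell approaches for long strings.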

This gets more useful when you have a delimited log file with rows like this:

2015-04-27|12345|some action|an attribute|meta data 

cut makes it easy to read such a file and select a particular field for further processing.
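For the sample row above, pulling out the third field, or the first and third together, could be done along these lines (the variable names are only for illustration):

```shell
#!/bin/sh
line='2015-04-27|12345|some action|an attribute|meta data'

# A single field from the middle of the row.
action=$(printf '%s\n' "$line" | cut -d '|' -f 3)

# Several fields at once; cut joins them with the input delimiter.
date_and_action=$(printf '%s\n' "$line" | cut -d '|' -f 1,3)

echo "$action"            # some action
echo "$date_and_action"   # 2015-04-27|some action
```

For a whole log file the same options apply directly, with the file name given as an argument to `cut`.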
