A string replacement function, replace_str(), for the C programming language

Updated 05 October 2014: A new optimisation has been introduced into the function. Both the new (replace_str2) and original (replace_str) versions are presented below.

The standard C library doesn't include a string replacement function, which it might have called strrep(), strrepl(), strreplace() or even, less likely, str_replace(). We might call our stand-in function replace_str() - the Standard requires that we not name it strrep(), strrepl(), strreplace() or str_replace() because those names impose on the reserved function namespace, str*. The purpose of the would-be-strrep()-renamed-replace_str() function is to replace in the input string all occurrences of a source/search (old) substring with a target/replace (new) substring, and to then return a post-replacement string.

Such a function became the topic of a thread in comp.lang.c in late 2005 in which I participated, titled "how to replace a substring in a string using C?". Below, I present the optimised, standards-compatible function (i.e. can be compiled with any C Standard-compliant compiler) that emerged out of that thread.

For those familiar with PHP, this function is a C equivalent for PHP's str_replace() function, except for its order of parameters, its inability to accept array parameters, the possibility that it will return NULL, and the need to free the memory of the char pointer that it returns.

Performance

In case you are curious enough to want to compare this string replacement function implementation with alternative implementations, you are welcome to compile and run this benchmarking code: the below functions are renamed replace_opt2 and replace_opt in that code. See the comments at the top of the file for information on how to compile and run the code. The top answer to the stackoverflow question, What is the function to replace string in C?, is included (as replace_jmucc), and yes, replace_str2/replace_opt2 outperforms it (compiling with GCC on 64-bit Linux) - very modestly in most cases, but replace_str2/replace_opt2 is about twice as fast in the case where the old and new strings are identical in length. It outperforms all of the other included implementations too (again, compiling with GCC on 64-bit Linux).

A C str_replace() function better named replace_str()

Description:Replaces in the string str all the occurrences of the source string old with the destination string new. The lengths of the strings old and new may differ. The string new may be of any length, but the string "old" must be of non-zero length - the penalty for providing an empty string for the "old" parameter is an infinite loop. In addition, none of the three parameters may be NULL.
Returns:The post-replacement string, or NULL if memory for the new string could not be allocated. Does not modify the original string. The memory for the returned post-replacement string may be deallocated with the standard library function free() when it is no longer required.
Dependencies:For this function to compile, you will need to also #include the following files: <string.h>, <stdlib.h> and <stddef.h>.
Licence:Public domain. You may use this code in any way you see fit, optionally crediting its author (me, Laird Shaw, with assistance and inspiration from comp.lang.c and stackoverflow).

char *replace_str2(const char *str, const char *old, const char *new)
			{
				char *ret, *r;
				const char *p, *q;
				size_t oldlen = strlen(old);
				size_t count, retlen, newlen = strlen(new);
				int samesize = (oldlen == newlen);
			
				if (!samesize) {
					for (count = 0, p = str; (q = strstr(p, old)) != NULL; p = q + oldlen)
						count++;
					/* This is undefined if p - str > PTRDIFF_MAX */
					retlen = p - str + strlen(p) + count * (newlen - oldlen);
				} else
					retlen = strlen(str);
			
				if ((ret = malloc(retlen + 1)) == NULL)
					return NULL;
			
				r = ret, p = str;
				while (1) {
					/* If the old and new strings are different lengths - in other
					 * words we have already iterated through with strstr above,
					 * and thus we know how many times we need to call it - then we
					 * can avoid the final (potentially lengthy) call to strstr,
					 * which we already know is going to return NULL, by
					 * decrementing and checking count.
					 */
					if (!samesize && !count--)
						break;
					/* Otherwise i.e. when the old and new strings are the same
					 * length, and we don't know how many times to call strstr,
					 * we must check for a NULL return here (we check it in any
					 * event, to avoid further conditions, and because there's
					 * no harm done with the check even when the old and new
					 * strings are different lengths).
					 */
					if ((q = strstr(p, old)) == NULL)
						break;
					/* This is undefined if q - p > PTRDIFF_MAX */
					ptrdiff_t l = q - p;
					memcpy(r, p, l);
					r += l;
					memcpy(r, new, newlen);
					r += newlen;
					p = q + oldlen;
				}
				strcpy(r, p);
			
				return ret;
			}

Original version

char *replace_str(const char *str, const char *old, const char *new)
			{
				char *ret, *r;
				const char *p, *q;
				size_t oldlen = strlen(old);
				size_t count, retlen, newlen = strlen(new);
			
				if (oldlen != newlen) {
					for (count = 0, p = str; (q = strstr(p, old)) != NULL; p = q + oldlen)
						count++;
					/* this is undefined if p - str > PTRDIFF_MAX */
					retlen = p - str + strlen(p) + count * (newlen - oldlen);
				} else
					retlen = strlen(str);
			
				if ((ret = malloc(retlen + 1)) == NULL)
					return NULL;
			
				for (r = ret, p = str; (q = strstr(p, old)) != NULL; p = q + oldlen) {
					/* this is undefined if q - p > PTRDIFF_MAX */
					ptrdiff_t l = q - p;
					memcpy(r, p, l);
					r += l;
					memcpy(r, new, newlen);
					r += newlen;
				}
				strcpy(r, p);
			
				return ret;
			}

Notes

  • Sometimes - see, for example, user Rhomboid's post in this reddit thread - it is suggested that the calculation of the original string's length in the 12th line (13th line in the latest version) would be better simplified from p - str + strlen(p) to strlen(str). Whilst this would indeed improve the code's simplicity and readability, it would also remove an optimisation whose effect is often measurable (even if not large), and which is why the code has been written as it has been. The code as it is requires strlen() to iterate over the string only from the last match onwards, whereas the simplified code would require strlen() to iterate over the entire string - a redundant waste most of the time (i.e. when there is at least one match) seeing that strstr() has done some, and often even a lot, of that iteration already. Especially in cases where the last match is very near the end of the string, and where the string is long, this is a significant enough saving for the optimisation to be worth retaining despite the slight loss of simplicity and readability it entails - at least in my view.
  • Other efficiencies/optimisations include:
    • [This applies to replace_str2 only] strstr() is called only as many times as are strictly necessary. When the old and new strings differ in length, then, taking the number of matches of the old string to be "count", this equates to count x 2 + 1 times: one iteration to count the number of matches so as to determine memory requirements, but which also includes a final call returning NULL - hence the "+ 1" - and another iteration whilst replacing matches, but this time avoiding the final NULL-returning call since we know how many matches there are. When the old and new strings are the same length, then this equates to count + 1 times: we only need iterate once whilst replacing matches (the iteration to determine memory requirements is unnecessary because the memory requirements of the post-replacement string are identical to those of the pre-replacement string), but including a final call returning NULL.
    • strcpy() is never called when memcpy() is sufficient - i.e. we avoid superfluously copying '\0's.
  • If you'd like the function to handle NULL arguments and empty "old" arguments by returning NULL rather than failing, then you can add these lines to the top of the function (immediately below the variable declarations):
    	if (str == NULL || old == NULL || new == NULL || old[0] == '\0')
    					return NULL;