A PHP strpos variant taking an array of needles: strposm

The standard PHP library function, strpos, accepts only a string for its $needle parameter. Often, it would be useful for it to accept an array of strings too. Here's a drop-in replacement function, strposm (which could also have been named strpos_array), that accepts arrays but also accepts a string, for compatibility with the standard strpos. It is built on preg_match.

So, what are the advantages of this function over all of the other similar functions strewn across the web? There are two. Firstly, it returns the position of the earliest occurrence in the haystack of any of the needles, not just the position of whichever needle it happens to find first. Secondly, it optionally returns the matching needle, so you know which one matched. I have been unable to find an existing function on the web that has either of these features.
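
To make the first point concrete, here is a minimal sketch contrasting the two behaviours. Both function names are hypothetical and exist only for this illustration; neither is part of strposm, and the second uses a plain loop over strpos rather than the preg_match approach taken below.

```php
// Hypothetical helper: returns the position of the first needle *in list
// order* that occurs anywhere in the haystack.
function first_listed_needle_pos($haystack, array $needles) {
	foreach ($needles as $needle) {
		$pos = strpos($haystack, $needle);
		if ($pos !== false) return $pos;
	}
	return false;
}

// Hypothetical helper: returns the smallest position at which any of the
// needles occurs, or false if none occur.
function earliest_needle_pos($haystack, array $needles) {
	$earliest = false;
	foreach ($needles as $needle) {
		$pos = strpos($haystack, $needle);
		if ($pos !== false && ($earliest === false || $pos < $earliest)) {
			$earliest = $pos;
		}
	}
	return $earliest;
}

$haystack = 'bananas and mangoes';
$needles  = array('mangoes', 'bananas');

var_dump(first_listed_needle_pos($haystack, $needles)); // int(12)
var_dump(earliest_needle_pos($haystack, $needles));     // int(0)
```

The first helper reports "mangoes" at position 12 simply because that needle comes first in the array, even though "bananas" occurs earlier in the haystack.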

I've included several versions. The first is fully featured and can also perform case-insensitive searches. For performance reasons, it wraps around strpos and stripos when $needles is not an array. The second is that same fully featured version with extensive comments. The third is a minimal version for those who want to avoid bloat: it doesn't support case-insensitive searching, nor does it wrap around strpos/stripos when $needles is a scalar string. The fourth is an even more minimal version, which is shorter because it returns an array rather than passing one of the return values back through a by-reference parameter.

The function: strposm

Description: Finds the position of the first occurrence in the string $haystack of any of the supplied substrings in $needles, from the offset $offset onwards. $needles need not be an array of strings; it may also be an individual (scalar) string. Matches case-insensitively if $flags includes STRPOSM_CI. Returns the matching needle in $match if that parameter is supplied. This is in $haystack case by default, but if $flags includes STRPOSM_NC, then it is in $needles case. Alternatively, if $flags includes STRPOSM_MATCH_AS_INDEX, then $match is set to the index into $needles of the matching needle (0 if $needles is a scalar string).
Returns: The position of the first occurrence of any of the $needles in $haystack, or false if none of them occur. If supplied, $match returns either the matching needle or its index as described above. If nothing matches, then $match is set to either false or null depending, respectively, on whether the STRPOSM_MATCH_AS_INDEX flag is set or not.
Author: Me (Laird Shaw).
Licence: Public domain.
Attribution: Optional. If you choose to indicate attribution when using this function, feel free to link to this page.

/*
 * A strpos variant that accepts an array of $needles - or just a string,
 * so that it can be used as a drop-in replacement for the standard strpos,
 * and in which case it simply wraps around strpos and stripos so as not
 * to reduce performance.
 *
 * The "m" in "strposm" indicates that it accepts *m*ultiple needles.
 *
 * Finds the earliest match of *any* of the needles. Returns the position of
 * this match or false if none found, as does the standard strpos. Optionally
 * also returns via $match either the matching needle as a string (by default)
 * or the index into $needles of the matching needle (if the
 * STRPOSM_MATCH_AS_INDEX flag is set).
 *
 * Case-insensitive searching can be specified via the STRPOSM_CI flag.
 * Note that for case-insensitive searches, when the STRPOSM_MATCH_AS_INDEX flag
 * is not set, so that $match is to contain the matching needle rather than its
 * index, then $match will be in the haystack's case by default, but the default
 * can be overridden by also setting the STRPOSM_NC flag (per example below), in
 * which case $match will be in the needle's case.
 *
 * Flags can be combined using the bitwise or operator,
 * e.g. $flags = STRPOSM_CI|STRPOSM_NC
 */
define('STRPOSM_CI'            , 1); // CI => "case insensitive".
define('STRPOSM_NC'            , 2); // NC => "needle case".
define('STRPOSM_MATCH_AS_INDEX', 4);
function strposm($haystack, $needles, $offset = 0, &$match = null, $flags = 0) {
	if (!is_array($needles)) {
		$func = $flags & STRPOSM_CI ? 'stripos' : 'strpos';
		$pos = $func($haystack, $needles, $offset);
		if ($pos !== false) {
			$match = (($flags & STRPOSM_MATCH_AS_INDEX)
			          ? 0
			          : (($flags & STRPOSM_NC)
			             ? $needles
			             : substr($haystack, $pos, strlen($needles))
			            )
			         );
			return $pos;
		} else	goto strposm_no_match;
	}

	$needles_esc = array_map('preg_quote', $needles);
	if (($flags & STRPOSM_NC) || ($flags & STRPOSM_MATCH_AS_INDEX)) {
		$needles_esc = array_map(
			function($needle) {return '('.$needle.')';},
			$needles_esc
		);
	}
	$pattern = '('.implode('|', $needles_esc).')';
	if ($flags & STRPOSM_CI) $pattern .= 'i';
	if (preg_match($pattern, $haystack, $matches, PREG_OFFSET_CAPTURE, $offset)) {
		$found = array_shift($matches);
		if (($flags & STRPOSM_NC) || ($flags & STRPOSM_MATCH_AS_INDEX)) {
			$index = array_search($found, $matches);
		}
		$match = (($flags & STRPOSM_MATCH_AS_INDEX)
			  ? $index
			  : (($flags & STRPOSM_NC)
			     ? $needles[$index]
			     : $found[0]
			    )
			 );
		return $found[1];
	}

strposm_no_match:
	$match = ($flags & STRPOSM_MATCH_AS_INDEX) ? false : null;
	return false;
}

Example usage:

$haystack = 'At the fruit monger we bought bananas, cherries and mangoes: good fruit.';
$needles = array('apples', 'mangoes', 'cherries', 'bananas', 'oranges');
$needles_uc = array('APPLES', 'MANGOES', 'CHERRIES', 'BANANAS', 'ORANGES');

// Use as a strpos replacement.
$pos = strposm($haystack, /*$needle*/'fruit');
echo "\$pos == $pos\n"; // Outputs: $pos == 7

// Use as a strpos replacement with an offset.
$pos = strposm($haystack, /*$needle*/'fruit', /*$offset*/8);
echo "\$pos == $pos\n"; // Outputs: $pos == 66

// Basic use with multiple needles.
$pos = strposm($haystack, $needles);
echo "\$pos == $pos\n"; // Outputs (matching "bananas"): $pos == 30

// Use with multiple needles and an offset.
$pos = strposm($haystack, $needles, /*$offset*/31);
echo "\$pos == $pos\n"; // Outputs (matching "cherries"): $pos == 39

// Use with multiple needles, returning the matching needle.
// Outputs:
// The earliest occurrence of any of the needles was at position 30 for the needle 'bananas'.
$pos = strposm($haystack, $needles, /*$offset*/0, $match);
echo "The earliest occurrence of any of the needles was at position $pos for the needle '$match'.\n";

// Use with multiple needles, returning the index of the matching needle.
// Outputs:
// The earliest occurrence of any of the needles was at position 30, corresponding to the needle at index 3, 'bananas'.
$pos = strposm($haystack, $needles, /*$offset*/0, $match, /*$flags*/STRPOSM_MATCH_AS_INDEX);
echo "The earliest occurrence of any of the needles was at position $pos, ".
     "corresponding to the needle at index $match, '{$needles[$match]}'.\n";

// Use with multiple needles, case-insensitive matching and an offset,
// returning the matching needle in haystack case.
// Outputs (note the lowercase needle in $match even though $needles_uc contains uppercase needles;
//          this is due to the lack of the STRPOSM_NC flag):
// The earliest occurrence of any of the needles was at position 39 for the needle 'cherries'.
$pos = strposm($haystack, $needles_uc, /*$offset*/31, $match, /*$flags*/STRPOSM_CI);
echo "The earliest occurrence of any of the needles was at position $pos for the needle '$match'.\n";

// Use with multiple needles, case-insensitive matching and an offset,
// returning the matching needle in needle case.
// Outputs (note the uppercase needle in $match due to the STRPOSM_NC flag):
// The earliest occurrence of any of the needles was at position 39 for the needle 'CHERRIES'.
$pos = strposm($haystack, $needles_uc, /*$offset*/31, $match, /*$flags*/STRPOSM_CI|STRPOSM_NC);
echo "The earliest occurrence of any of the needles was at position $pos for the needle '$match'.\n";

With comments:

function strposm($haystack, $needles, $offset = 0, &$match = null, $flags = 0) {
	// In the special case where $needles is not an array, simply wrap
	// strpos and stripos for performance reasons.
	if (!is_array($needles)) {
		$func = $flags & STRPOSM_CI ? 'stripos' : 'strpos';
		$pos = $func($haystack, $needles, $offset);
		if ($pos !== false) {
			$match = (($flags & STRPOSM_MATCH_AS_INDEX)
			          ? 0
			          : (($flags & STRPOSM_NC)
			             ? $needles
			             : substr($haystack, $pos, strlen($needles))
			            )
			          );
			return $pos;
		} else	goto strposm_no_match;
	}

	// $needles is an array. Proceed appropriately, initially by...
	// ...escaping regular expression meta characters in the needles.
	$needles_esc = array_map('preg_quote', $needles);
	// If either of the "needle case" or "match as index" flags are set,
	// then create a sub-match for each escaped needle by enclosing it in
	// parentheses. We use these later to find the index of the matching
	// needle.
	if (($flags & STRPOSM_NC) || ($flags & STRPOSM_MATCH_AS_INDEX)) {
		$needles_esc = array_map(
			function($needle) {return '('.$needle.')';},
			$needles_esc
		);
	}
	// Create the regular expression pattern to search for all needles.
	$pattern = '('.implode('|', $needles_esc).')';
	// If the "case insensitive" flag is set, then modify the regular
	// expression with "i", meaning that the match is "caseless".
	if ($flags & STRPOSM_CI) $pattern .= 'i';
	// Find the first match, including its offset.
	if (preg_match($pattern, $haystack, $matches, PREG_OFFSET_CAPTURE, $offset)) {
		// Pull the first entry, the overall match, out of the matches array.
		$found = array_shift($matches);
		// If we need the index of the matching needle, then...
		if (($flags & STRPOSM_NC) || ($flags & STRPOSM_MATCH_AS_INDEX)) {
			// ...find the index of the sub-match that is identical
			// to the overall match that we just pulled out.
			// Because sub-matches are in the same order as needles,
			// this is also the index into $needles of the matching
			// needle.
			$index = array_search($found, $matches);
		}
		// If the "match as index" flag is set, then return in $match
		// the matching needle's index, otherwise...
		$match = (($flags & STRPOSM_MATCH_AS_INDEX)
			  ? $index
			  // ...if the "needle case" flag is set, then index into
			  // $needles using the previously-determined index to return
			  // in $match the matching needle in needle case, otherwise...
			  : (($flags & STRPOSM_NC)
			     ? $needles[$index]
			     // ...by default, return in $match the matching needle in
			     // haystack case.
			     : $found[0]
			    )
			 );
		// Return the captured offset.
		return $found[1];
	}

strposm_no_match:
	// Nothing matched. Set appropriate return values.
	$match = ($flags & STRPOSM_MATCH_AS_INDEX) ? false : null;
	return false;
}
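
The index lookup described in the comments above can be seen in isolation in the following sketch. It hard-codes a small pattern of the same shape that strposm builds when STRPOSM_NC or STRPOSM_MATCH_AS_INDEX is set (one parenthesised sub-pattern per needle); the haystack and needles are made up for illustration.

```php
$haystack = 'we bought bananas and cherries';
$needles  = array('apples', 'cherries', 'bananas');

// One capturing group per needle, in needle order. The outermost '(' and ')'
// are the pattern delimiters, as in strposm.
$pattern = '((apples)|(cherries)|(bananas))';

preg_match($pattern, $haystack, $matches, PREG_OFFSET_CAPTURE);

// $matches[0] is the overall match as array(text, offset); the entries after
// it are the sub-matches, one per group up to and including the group that
// actually matched.
$found = array_shift($matches);          // remaining entries are re-indexed from 0
$index = array_search($found, $matches); // the sub-match equal to the overall match

var_dump($found[0]);        // string(7) "bananas"
var_dump($found[1]);        // int(10)
var_dump($index);           // int(2)
var_dump($needles[$index]); // string(7) "bananas"
```

Because array_shift re-indexes the remaining entries from zero, the position of the matching sub-match lines up with the position of the corresponding needle in $needles.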

The first minimal version:

function strposm_min1($haystack, $needles, $offset = 0, &$match = null) {
	$pattern = '('.implode('|', array_map('preg_quote', (array)$needles)).')';
	if (preg_match($pattern, $haystack, $matches, PREG_OFFSET_CAPTURE, $offset)) {
		$match = $matches[0][0];
		return $matches[0][1];
	} else {
		$match = null;
		return false;
	}
}

Example usage:

$haystack = 'At the fruit monger we bought bananas, cherries and mangoes.';
$needles = array('apples', 'mangoes', 'cherries', 'bananas', 'oranges');

// Outputs:
// The earliest occurrence of any of the needles was at position 52 for the needle 'mangoes'.
$pos = strposm_min1($haystack, $needles, /*$offset*/40, $match);
echo "The earliest occurrence of any of the needles was at position $pos for the needle '$match'.\n";
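
Because each needle is passed through preg_quote, needles containing regular expression metacharacters are matched literally. The parentheses used here as pattern delimiters are among the characters that preg_quote escapes by default, so no delimiter argument is needed. A small sketch, reusing the pattern construction from strposm_min1 inline, with a haystack and needles made up for illustration:

```php
$haystack = 'Total: $5.00 (approx.) for a+b';
$needles  = array('a+b', '(approx.)', '$5.00');

// Same pattern construction as in strposm_min1.
$pattern = '('.implode('|', array_map('preg_quote', $needles)).')';

// Outputs:
// Matched '$5.00' at position 7.
if (preg_match($pattern, $haystack, $matches, PREG_OFFSET_CAPTURE)) {
	echo "Matched '{$matches[0][0]}' at position {$matches[0][1]}.\n";
}
```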

The most minimal version, returning an array rather than passing a return parameter by reference:

function strposm_min2($haystack, $needles, $offset = 0) {
	$pattern = '('.implode('|', array_map('preg_quote', (array)$needles)).')';
	return preg_match($pattern, $haystack, $matches, PREG_OFFSET_CAPTURE, $offset)
	         ? array_reverse($matches[0])
	         : array(false, null);
}

Example usage:

$haystack = 'At the fruit monger we bought bananas, cherries and mangoes.';
$needles = array('apples', 'mangoes', 'cherries', 'bananas', 'oranges');

// Use without an offset.
// Outputs:
// The earliest occurrence of any of the needles was at position 30 for the needle 'bananas'.
list($pos, $match) = strposm_min2($haystack, $needles);
echo "The earliest occurrence of any of the needles was at position $pos for the needle '$match'.\n";

// Use with an offset.
// Outputs:
// The earliest occurrence of any of the needles was at position 52 for the needle 'mangoes'.
list($pos, $match) = strposm_min2($haystack, $needles, /*$offset*/40);
echo "The earliest occurrence of any of the needles was at position $pos for the needle '$match'.\n";

// To get just the position:
list($pos) = strposm_min2($haystack, $needles, /*$offset*/40); // $pos == 52

// To get just the matching needle:
list(,$match) = strposm_min2($haystack, $needles, /*$offset*/40); // $match == 'mangoes'
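
As with the standard strpos, a match at the very start of the haystack yields position 0, which is falsy, so the position should be compared against false with the strict === operator. The following sketch shows both the no-match case and the position-0 case; strposm_min2 is repeated (guarded by function_exists) only so that the snippet runs on its own, and the haystacks are made up for illustration.

```php
// Repeated from above so that this snippet is self-contained.
if (!function_exists('strposm_min2')) {
	function strposm_min2($haystack, $needles, $offset = 0) {
		$pattern = '('.implode('|', array_map('preg_quote', (array)$needles)).')';
		return preg_match($pattern, $haystack, $matches, PREG_OFFSET_CAPTURE, $offset)
		         ? array_reverse($matches[0])
		         : array(false, null);
	}
}

// No needle occurs: $pos is false and $match is null.
list($pos, $match) = strposm_min2('no fruit here', array('apples', 'oranges'));
if ($pos === false) {
	echo "None of the needles were found.\n";
}

// A needle occurs at the very start: $pos is 0, which a loose check such as
// if (!$pos) would wrongly treat as "not found".
list($pos, $match) = strposm_min2('apples and pears', array('pears', 'apples'));
if ($pos !== false) {
	echo "Found '$match' at position $pos.\n"; // Outputs: Found 'apples' at position 0.
}
```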