有关php内建函数复杂度的一点探究

在用php实现CtCI里面的习题时,算法中常常要用到array_key_exists()方法。然而通常在数组中检索的复杂度为O(n),那么这是否会对php实现的算法产生比较大的影响呢?

在php的 manual 中有人提到,isset()array_key_exists()快的多,但是两者对null的处理方式是不一样的。

<?php //test.php
$t['a'] = null;
if (isset($t['a']))
    echo "a is set\n";
else
    echo "a is not set\n";

if (array_key_exists('a', $t))
    echo "a exists\n";
else
    echo "a doesn't exists\n";

// 运行上面的代码,会输出:
// a is not set
// a exists

在开发中可以采用如下方法:

if (isset(..) || array_key_exists(...))
{
...
}

大大提升了运行速度又保证了结果。

Benchmark (100000 runs): array_key_exists() : 205 ms is_set() : 35ms isset() || array_key_exists() : 48ms

另外 stackoverflow 上有人对array_*的复杂度做过一番调查:

  • array_key_exists() O(n)但是非常接近O(1)。 - this is because of linear polling in collisions, but because the chance of collisions is very small, the coefficient is also very small. I find you treat hash lookups as O(1) to give a more realistic big-O. For example the different between N=1000 and N=100000 is only about 50% slow down.

  • isset() O(n)但是非常接近O(1)。 - it uses the same lookup as array_key_exists. Since it’s language construct, will cache the lookup if the key is hardcoded, resulting in speed up in cases where the same key is used repeatedly.

  • in_array() O(n) - this is because it does a linear search though the array until it finds the value.

  • array_search() O(n) - it uses the same core function as in_array but returns value.

  • array_push O(∑ var_i, for all i)

  • array_pop() O(1)

  • array_shift() O(n) - it has to reindex all the keys

  • array_unshift() O(n + ∑ var_i, for all i) - it has to reindex all the keys

  • array_intersect_key() if intersection 100% do O(Max(param_i_size)*∑param_i_count, for all i), if intersection 0% intersect O(∑param_i_size, for all i)

  • array_intersect() if intersection 100% do O(n^2*∑param_i_count, for all i), if intersection 0% intersect O(n^2)

  • array_intersect_assoc() if intersection 100% do O(Max(param_i_size)*∑param_i_count, for all i), if intersection 0% intersect O(∑param_i_size, for all i)

  • array_diff() O(π param_i_size, for all i) - That’s product of all the param_sizes

  • array_diff_key() O(∑ param_i_size, for i != 1) - this is because we don’t need to iterate over the first array.

  • array_merge() O( ∑ array_i, i != 1 ) - doesn’t need to iterate over the first array

  • + (union) O(n), where n is size of the 2nd array (ie array_first + array_second) - less overhead than array_merge since it doesn’t have to renumber

  • array_replace() O( ∑ array_i, for all i )

  • shuffle() O(n)

  • array_rand() O(n) - Requires a linear poll.

  • array_fill() O(n)

  • array_fill_keys() O(n)

  • range() O(n)

  • array_splice() O(offset + length)

  • array_slice() O(offset + length) or O(n) if length = NULL

  • array_keys() O(n)

  • array_values() O(n)

  • array_reverse() O(n)

  • array_pad() O(pad_size)

  • array_flip() O(n)

  • array_sum() O(n)

  • array_product() O(n)

  • array_reduce() O(n)

  • array_filter() O(n)

  • array_map() O(n)

  • array_chunk() O(n)

  • array_combine() O(n)

很佩服原文作者的探究精神,文章很值得仔细读一读。链接附在下面。

参考链接

List of Big-O for PHP functions array_key_exists - User Contributed Notes

comments powered by Disqus