2009-02-01, 02:46 AM
(2009-02-01, 01:05 AM)frostschutz Wrote: You're confusing data structures of a built-in function with PHP data structures.
Well, your first point says that the compiled code is fast, and your second point says it's slow. Which is it exactly?
I'm sure that internally the function has a whole array of memory structures that no one wants to know about. It even has to compile the regular expression before it can do any work. None of this matters from PHP's point of view, because it's a built-in function, written in C and compiled to machine code, so it's blazingly fast compared to the scripting language. In the end, preg_replace builds a single PHP string, which in this case is already your ready-to-use end result. All the work was done by the built-in function.
Compared to that, preg_split has to build a data structure that is n times more complex than what preg_replace builds, i.e. not one string, but possibly dozens of PHP strings wrapped in a PHP array/hash. That alone makes it definitely slower than preg_replace. What makes PHP data structures so expensive is that no one knows what they will be used for, so they have to be allocated, initialized, tracked, refcounted, and garbage collected.
preg_replace does all the work by putting it into a string. Similarly, preg_split does "all the work" by putting it into an array.
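To make the difference in return values concrete, here is a minimal sketch. The input string and the pattern are made up purely for illustration and are not taken from the code discussed in this thread.

```php
<?php
// Hypothetical input and pattern, chosen only for illustration.
$text = "one, two,three ,  four";
$pattern = '/\s*,\s*/';

// preg_replace hands back a single PHP string.
$replaced = preg_replace($pattern, ';', $text);

// preg_split hands back a PHP array holding one PHP string per piece.
$pieces = preg_split($pattern, $text);

var_dump($replaced);
var_dump($pieces);
```

Both calls run the same pattern over the same input. The only difference visible from PHP is the shape of the result: one string versus an array of strings.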
(2009-02-01, 01:05 AM)frostschutz Wrote: But then that isn't even your ready-to-use end result. You have to use PHP logic, call other PHP functions that build new structures, and so on, and finally build the end string. Since PHP is slow, you have by then ended up several orders of magnitude slower than the original preg_replace solution.
Not always the case. Do some benchmarks yourself: preg_* functions are extremely expensive, and in various cases much more expensive than using a fair bit of PHP logic to do the task. The reason is that regular expressions are very difficult to evaluate at a lower level, much more difficult than parsing PHP code and evaluating that.
In scripting languages, optimizing works by offloading as much work as possible to as few built-in function calls as possible, usually regardless of how those functions work internally. They're built in, so even if they do something internally that isn't actually necessary for your problem, it doesn't matter: it's still several orders of magnitude faster than writing better logic in the scripting language itself, because the scripting language is just so slow in comparison.
If you're replacing a single built-in function call with dozens of lines of PHP code and function calls, you're going in the wrong direction. The only thing that makes this negligible is the sheer amount of processing power offered by machines nowadays, so no one notices if something is a helluvalot more expensive than it needs to be.
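Since the disagreement here comes down to measurement, a rough benchmark sketch may help. Everything in it is an assumption made for illustration: the task (collapsing runs of spaces), the function names collapse_with_regex, collapse_with_loop and time_it, the input, and the round count. It is not the code from this thread, and the numbers will vary with PHP version, pattern, and input.

```php
<?php
// Hypothetical task: collapse every run of spaces into a single space.

// Variant 1: one built-in call does all the work.
function collapse_with_regex(string $s): string
{
    return preg_replace('/ +/', ' ', $s);
}

// Variant 2: the same result built character by character in PHP.
function collapse_with_loop(string $s): string
{
    $out = '';
    $prevSpace = false;
    $len = strlen($s);
    for ($i = 0; $i < $len; $i++) {
        $c = $s[$i];
        if ($c === ' ') {
            // Emit only the first space of each run.
            if (!$prevSpace) {
                $out .= ' ';
            }
            $prevSpace = true;
        } else {
            $out .= $c;
            $prevSpace = false;
        }
    }
    return $out;
}

// Returns the wall-clock seconds needed to call $fn($input) $rounds times.
function time_it(callable $fn, string $input, int $rounds): float
{
    $start = microtime(true);
    for ($i = 0; $i < $rounds; $i++) {
        $fn($input);
    }
    return microtime(true) - $start;
}

$input = str_repeat("lorem   ipsum  dolor    sit amet ", 200);
$rounds = 200;

printf("regex: %.4f s\n", time_it('collapse_with_regex', $input, $rounds));
printf("loop:  %.4f s\n", time_it('collapse_with_loop', $input, $rounds));
```

Swap in your own pattern and your own PHP replacement before drawing conclusions. A single timing run like this is noisy, so repeat it a few times and compare the spread rather than one pair of numbers.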