MyBB Community Forums

Full Version: Fulltext Search and Stopwords
I hope you consider this a bug but if not it's an improvement.

MySQL ships with a default list of fulltext stopwords. These are words that are left out of the index.

By default, MyBB's search adds a + to each word you search for, so a search for "my setup" is actually "+my +setup".

So any MyBB forum with fulltext search enabled, including this one, will show zero results any time a search contains a stopword.

In addition, if ft_min_word_len is set in MySQL (my.cnf), any search containing a word shorter than that length will also return 0 results.
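To make the failure mode concrete, here is a minimal sketch of how a query string with every term required gets built. The function name build_boolean_query is hypothetical and this is not MyBB's actual code; the table and column names in the comment assume a default MyBB schema.

```php
<?php
// Hypothetical helper, not MyBB's actual implementation: prefixes every
// whitespace-separated word with "+", which marks the word as required in
// MySQL's boolean fulltext mode.
function build_boolean_query(string $keywords): string
{
    $words = preg_split('/\s+/', trim($keywords), -1, PREG_SPLIT_NO_EMPTY);
    $required = array_map(function ($word) {
        return '+' . $word;
    }, $words);
    return implode(' ', $required);
}

echo build_boolean_query('my setup'), "\n"; // +my +setup

// The string would then end up in a query along these lines (assumed schema):
//   SELECT pid FROM mybb_posts
//   WHERE MATCH(message) AGAINST('+my +setup' IN BOOLEAN MODE)
// As described in this post, a required word that was never indexed
// leaves the whole search with zero results.
```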

Here is the default list of stopwords (36 words):
https://dev.mysql.com/doc/refman/5.7/en/...ch-indexes

I propose that MyBB does 2 things.
1. Hard-code an array into the cleanwords functions inside functions_search.php that simply removes any default stopwords. The list could also be put into a setting or a table, with the $cache, so words can be added or removed.

2. Remove from the search any words shorter than the "Minimum Search Word Length" setting, which is already part of MyBB. Just remove the word. A search for "my setup" should then just do a search for "setup".
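A rough sketch of both proposals together might look like the following. The function name, the parameter names and the three-word stopword list are assumptions for illustration only; the real list is the one at the link above, and MyBB's own cleaning functions may be structured differently.

```php
<?php
// Hypothetical sketch of both proposals; names and the short stopword
// list are illustrative assumptions, not MyBB code.
function strip_unsearchable_words(string $keywords, array $stopwords, int $minLength): string
{
    // Build a lookup table so each stopword check is a single isset().
    $blocked = array_flip(array_map('strtolower', $stopwords));

    $kept = [];
    foreach (preg_split('/\s+/', trim($keywords), -1, PREG_SPLIT_NO_EMPTY) as $word) {
        if (strlen($word) < $minLength) {
            continue; // proposal 2: shorter than the minimum word length
        }
        if (isset($blocked[strtolower($word)])) {
            continue; // proposal 1: the word is a stopword
        }
        $kept[] = $word;
    }
    return implode(' ', $kept);
}

// Illustrative subset only, not the full default list.
$stopwords = ['the', 'for', 'with'];

echo strip_unsearchable_words('my setup', $stopwords, 3), "\n";            // setup
echo strip_unsearchable_words('setup for the forum', $stopwords, 3), "\n"; // setup forum
```

Note that strlen counts bytes, so a multibyte-aware length check would be needed for non-ASCII words. If every word gets removed, the caller would probably want to show a message rather than run an empty search.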

Returning zero results is bad. Especially if you have 40 million posts.

This should be relatively easy to fix and will improve search for MyBB.

Thank you.

EDIT: Btw, this is an issue with default MySQL and default MyBB. No special setup or alteration is required for this behavior.
(2018-12-20, 04:56 AM)labrocca Wrote: [ -> ]I hope you consider this a bug but if not it's an improvement.

Which version do you consider to be the 'default MySQL'? 

Not all websites use the same version as shown in this screenshot. (mine uses 5.6.41)

[Image: e66a5a75d077f5ccb7d9adcbf280b9bd.png]

So, I am wondering if your suggestion will apply to ALL versions or only to one version of MySQL.

My hosting company uses the InnoDB storage engine with their version of MySQL, but I've seen some other website hosting companies use MariaDB.

How will your suggestion be affected by the website hosting company choice of MySQL version and storage engine?
It's all modern versions of MySQL and MariaDB for InnoDB.

Because InnoDB added fulltext searching, and InnoDB is the default engine of MySQL and MariaDB, the stopword list is on by default. InnoDB is the default storage engine of MyBB too (which it should be).

This will affect forums with fulltext searching enabled and using InnoDB, which would likely be any new install.
Definitely makes sense to me to handle the stop words. I'm not sure if Postgres and SQLite act the same, so it might be best to add a table to allow the management of words to remove from search.

It's worth noting that the stop words for MyISAM are different to those for InnoDB, and there are still likely a bunch of installs using MyISAM and there are definitely plugins still using MyISAM. I'm not sure how much crossover there is between the two lists (I haven't taken the time to check yet), so it might be nice to have the upgrader check the engine being used and insert the default values based on that?
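The engine check suggested here could look something like the sketch below. The function name is hypothetical, both word lists are placeholders rather than the real MySQL defaults, and the table name in the comment assumes the default MyBB prefix.

```php
<?php
// Hypothetical sketch: choose the initial stopword list from the storage
// engine of the searched table. Both lists are placeholder subsets, not
// the real MySQL defaults.
function default_stopwords_for_engine(string $engine): array
{
    $defaults = [
        'innodb' => ['the', 'for', 'with'],       // placeholder subset
        'myisam' => ['the', 'for', 'with', 'my'], // placeholder subset
    ];
    $key = strtolower($engine);
    // Unknown engines (or other databases) start with an empty list that
    // an admin could fill in later.
    return $defaults[$key] ?? [];
}

// The engine name could be read with a query such as (assumed table name):
//   SELECT ENGINE FROM information_schema.TABLES
//   WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = 'mybb_posts'
print_r(default_stopwords_for_engine('InnoDB'));
```

Starting other engines with an empty list keeps the upgrader from guessing at behaviour nobody has checked yet.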
(2018-12-20, 10:16 PM)Euan T Wrote: [ -> ] I'm not sure how much crossover there is between the two lists (I haven't taken the time to check yet), so it might be nice to have the upgrader check the engine being used and insert the default values based on that?

Yep, that sounds like the best plan of action for that issue. 

If the installer/upgrader does not have that option, then you will see many admins complaining about the lack of a fix for dealing with stopwords.
Handling the stopwords in the cache would be best imho, allowing admins to add/remove words at will in the ACP.

This should also be easy:

2. Remove any words from search under the setting of "Minimum Search Word Length" which is part of MyBB already. Just remove the word. A search for "my setup" should then just do a search for "setup".


It's just one line in the clean functions of functions_search.php:

//Removes whole words shorter than 3 characters, including a short last word
$keywords = preg_replace('/(?<!\S)\S{1,2}(?!\S)\s*/', '', $keywords);
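As a sketch of the same idea with the length taken from a parameter instead of being hard-coded, something like this could work. The function name remove_short_words is hypothetical, and how the "Minimum Search Word Length" value would be passed in is left as an assumption.

```php
<?php
// Hypothetical variant of the one-liner: the minimum length is a
// parameter instead of a hard-coded 3.
function remove_short_words(string $keywords, int $minLength): string
{
    if ($minLength < 2) {
        return trim($keywords); // no word can be shorter than 1 character
    }
    $max = $minLength - 1;
    // (?<!\S) and (?!\S) require whitespace or the string edge on both
    // sides, so only whole words are removed, including the last one.
    $pattern = '/(?<!\S)\S{1,' . $max . '}(?!\S)\s*/';
    return trim(preg_replace($pattern, '', $keywords));
}

echo remove_short_words('my setup', 3), "\n";       // setup
echo remove_short_words('setup of my pc', 3), "\n"; // setup
```

Without the u modifier the pattern counts bytes rather than characters, so multibyte text would need extra care.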
Hi,

Thank you for your report. We have pushed this issue to our GitHub repository for further analysis, where you can track our commits and progress with fixing this bug. Discussions regarding this bug may also take place there.

Follow this link to visit the issue on Github: https://github.com/mybb/mybb/issues/3609

Thanks for contributing to MyBB!

Regards,
The MyBB Group