2018-09-02, 07:07 PM
(2018-07-23, 10:49 AM)linguist Wrote: [ -> ]Just asking: since they basically _look_ the same, did you make sure that the <je> in the Cyrillic string are Serbian Cyrillic letters and not Latin letters? They look alike, but have different Unicode code points and will therefore be treated as different characters in a database search etc.:
Cyrillic: u0458 u0435 : је
Latin: u006A u0065 : je
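To check which letters a string really contains, you can print the code point and Unicode name of each character. This is only a minimal Python sketch; `describe` is a made-up helper name, not part of any forum software.

```python
import unicodedata

def describe(text):
    """Return (character, code point, Unicode name) for each character in text."""
    return [(ch, f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

# Escapes are used for the Cyrillic string so the two cannot be confused on screen.
for label, text in (("Cyrillic", "\u0458\u0435"), ("Latin", "je")):
    for ch, code_point, name in describe(text):
        print(label, ch, code_point, name)
```

The two strings render almost identically, but they compare as unequal, which is why a filter entry typed in one script does not catch the other.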
If you want to be 100% safe, you'd need to have four patterns to exclude, because people use these lookalike letters all the time to circumvent filters:
<jeb> (all Latin)
<јеb> (Cyrillic је plus Latin b)
<јеб> (all Cyrillic)
<jeб> (Latin je, Cyrillic б)
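Instead of maintaining four separate entries, one pattern can accept either script for each letter. The following is a rough Python sketch of that idea, not the forum software's actual filter code; the `PAIRS` table and the `mixed_script_pattern` function are hypothetical names introduced only for illustration.

```python
import re

# Hypothetical table: Latin letter -> Cyrillic letter used in the same position.
# "b" and "б" are not visual lookalikes; they are paired only because the
# patterns listed above pair them.
PAIRS = {"j": "\u0458", "e": "\u0435", "b": "\u0431"}

def mixed_script_pattern(latin_word):
    """Compile a regex that matches the word with each letter in either script."""
    parts = []
    for ch in latin_word:
        alt = PAIRS.get(ch)
        # A character class such as [j\u0458] accepts the Latin or the Cyrillic form.
        parts.append(f"[{re.escape(ch)}{alt}]" if alt else re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)

pattern = mixed_script_pattern("jeb")
# The four variants listed above, written with escapes for the Cyrillic letters.
variants = ["jeb", "\u0458\u0435b", "\u0458\u0435\u0431", "je\u0431"]
print([bool(pattern.search(v)) for v in variants])
```

Because every letter gets its own character class, this also catches mixes inside <je> itself (for example Latin j followed by Cyrillic е), which the four-pattern list does not cover.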
Sorry for the delayed reply. They are similar, it's true, but not the same. Thanks.
I appreciate the recent changes to the bad words filter, but the filter is not working in 1.8.18 for Serbian Cyrillic letters.
I hope this will be corrected in version 1.8.19.