Regexp in exclude_patterns not working?

Hi,

I have created a web collection and I am trying to exclude some WordPress locations, but the exclusions don't seem to be working.

In my collection.cfg I have the following, but URLs with wp-json in them still show up:

exclude_patterns=/cgi-bin,/vti,/_vti,calendar,SQ_DESIGN_NAME=print,SQ_ACTION=logout,SQ_PAINT_LAYOUT_NAME=,%3E%3C/script%3E,google-analytics.com,regexp:.*/wp-json/.*,regexp:.*/feed$

Is my syntax incorrect?

Thanks

Unfortunately, the configuration syntax only allows a single regular expression, so regexp: can't be used as one item in a comma-separated list of patterns.

Please see the documentation page here:
https://docs.squiz.net/funnelback/docs/latest/collections/collection-types/web/included-and-excluded-web-content.html#regular-expressions-in-includeexclude-patterns
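One way to picture why the config above has no effect is the rough model below. It is a simplified sketch, not Funnelback's actual code: it assumes a value that starts with regexp: is treated as one regular expression, and any other value is split on commas into plain substrings. Under that assumption, the regexp: items in the middle of the list are compared as literal text, so they never match a real URL. The function name is_excluded and the example.com URL are hypothetical.

```python
import re

def is_excluded(url: str, exclude_patterns: str) -> bool:
    """Hypothetical, simplified model of exclude_patterns (an assumption).

    A value starting with "regexp:" is one regular expression searched
    anywhere in the URL; any other value is a comma-separated list of
    plain substrings.
    """
    if exclude_patterns.startswith("regexp:"):
        return re.search(exclude_patterns[len("regexp:"):], url) is not None
    # Plain patterns: every item, even one that happens to start with
    # "regexp:", is only checked as a literal substring of the URL.
    return any(p and p in url for p in exclude_patterns.split(","))

mixed = "/cgi-bin,calendar,regexp:.*/wp-json/.*,regexp:.*/feed$"
single = "regexp:.*/wp-json/.*|/feed$"
url = "https://example.com/wp-json/wp/v2/posts"  # hypothetical URL

print(is_excluded(url, mixed))   # False: "regexp:.*/wp-json/.*" is literal text here
print(is_excluded(url, single))  # True: the whole value is one regex
```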

However, in this case you may not need a regular expression.

/wp-json/,/feed

may be sufficient. This would exclude any URL containing either of those exact strings. The /wp-json/ rule is likely to be fine as-is, but the /feed rule may have false positives: for example, /news-articles/feeding-goats-on-my-farm.html contains /feed and would be excluded.
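The false positive comes from plain patterns matching as substrings anywhere in the URL. Here is a minimal sketch of that behaviour, assuming simple substring matching and using made-up example.com URLs:

```python
# Plain (non-regexp) exclude patterns, treated as substrings of the URL.
patterns = ["/wp-json/", "/feed"]

urls = [
    "https://example.com/wp-json/wp/v2/posts",                          # intended
    "https://example.com/blog/feed",                                    # intended
    "https://example.com/news-articles/feeding-goats-on-my-farm.html",  # false positive
    "https://example.com/about/",                                       # kept
]

# A URL is excluded if any pattern appears anywhere in it.
excluded = [u for u in urls if any(p in u for p in patterns)]
for u in excluded:
    print(u)
```

The goat article ends up in the excluded list only because "/feeding" starts with "/feed".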

To avoid that, a single simplified regexp covering both cases would be:

regexp:.*\/wp-json\/.*|\/feed$
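A quick way to sanity-check the pattern is to run it against a few sample URLs before putting it in exclude_patterns. The sketch below drops the regexp: prefix and the unnecessary \/ escapes, and assumes the URL is searched for a match anywhere (like Python's re.search), which is why the $ anchor is what keeps /feed from matching mid-path. Python's re treats this particular pattern the same way as Java-style regex engines; the URLs are hypothetical.

```python
import re

# Combined pattern from the reply, without the "regexp:" prefix.
pattern = re.compile(r".*/wp-json/.*|/feed$")

# Hypothetical sample URLs mapped to whether we expect them to be excluded.
cases = {
    "https://example.com/wp-json/wp/v2/posts": True,
    "https://example.com/blog/feed": True,
    "https://example.com/news-articles/feeding-goats-on-my-farm.html": False,
    "https://example.com/blog/feed/": False,  # trailing slash: "$" no longer matches
}

for url, expected in cases.items():
    matched = pattern.search(url) is not None
    print(("exclude " if matched else "keep    ") + url)
    assert matched == expected
```

Note that a feed URL ending in /feed/ (with a trailing slash, a common WordPress form) is kept by this pattern; if your site uses that form, /feed/?$ would cover both.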

The rest of the default terms in exclude_patterns are of questionable value, and I generally delete them to simplify the pattern. /cgi-bin/, /vti and /_vti aren't used nowadays, and patterns like google-analytics.com don't need to be excluded because those URLs shouldn't match your include_patterns in the first place.


Hi,

I changed the exclude box to the regex you suggested and ran an update of the collection, but the results still show pages with that address in them.

regexp:.*/wp-json/.*|/feed$