The Boolean NOT problem

A well-known difficulty in online search is finding records on a topic that aren’t about the ‘usual suspects’. For instance, imagine that you’re researching renewable energy, and you want articles about renewable energy sources other than wind, wave and solar.

In principle, most search engines let you do this with an advanced search: you ask for articles that include the words renewable and energy, but do not include wind, wave or solar. This is known as a Boolean NOT search.

The problem is that a Boolean NOT only finds documents that never mention any of the excluded keywords. In practice, this excludes almost all the relevant documents. An article about other renewable energy sources will almost certainly mention wind, wave and solar somewhere, typically in an opening section that surveys renewable energy sources, so it is thrown out along with the articles you were trying to avoid.
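To make the problem concrete, here is a minimal Python sketch of a Boolean NOT filter. The three toy documents are invented for illustration, and splitting text into lowercase words is a simplifying assumption; real search engines tokenise and index text far more elaborately.

```python
import re

def tokens(text: str) -> set[str]:
    """Lowercase set of words; a deliberately simple tokenizer (an assumption)."""
    return set(re.findall(r"[a-z]+", text.lower()))

def boolean_not_search(docs: dict[str, str], include: list[str],
                       exclude: list[str]) -> list[str]:
    """Return ids of documents containing every `include` word
    and none of the `exclude` words (a Boolean AND ... NOT query)."""
    hits = []
    for doc_id, text in docs.items():
        words = tokens(text)
        if all(w in words for w in include) and not any(w in words for w in exclude):
            hits.append(doc_id)
    return hits

# Hypothetical toy corpus.
docs = {
    "tidal_review": "Renewable energy sources include wind, wave and solar. "
                    "This review focuses on tidal energy and tidal barrages.",
    "geothermal_note": "Geothermal heat is a renewable energy source.",
    "solar_guide": "A guide to solar panels as renewable energy.",
}

print(boolean_not_search(docs, ["renewable", "energy"], ["wind", "wave", "solar"]))
# ['geothermal_note']
```

Only the geothermal note survives the filter. The tidal review, which is exactly the kind of article we were looking for, is excluded because its introduction names the three sources we asked to leave out.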

The image opposite shows a different way of tackling the problem. It is a Search Visualiser search for wind, wave and solar, showing five documents. The one on the right has an interesting structure. All three keywords occur close to each other at the start of the document, presumably in an introduction that sets the scene. Next comes a section with repeated mentions of one keyword (black squares). Then there is a gap, followed by a band of numerous mentions of another keyword (red squares), immediately followed by a band of mentions of the third keyword (green squares).

This looks like a well-structured document, with a series of sections on different energy sources. We know what three of them are (wind, wave and solar), but there also seems to be a section on another energy source. This is in fact the case: the source discussed in the gap between the black squares and the red squares is tidal energy.
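The same visual reasoning can be roughed out in code. The Python sketch below is a hypothetical illustration, not how Search Visualiser works internally: it records the word positions of each keyword and reports any stretch of at least `min_gap` words (an arbitrary threshold chosen here) in which none of the keywords appears. The sample document is synthetic, standing in for the structure of the document on the right.

```python
import re

def keyword_positions(text: str, keywords: list[str]) -> list[int]:
    """Word indices at which any of the keywords occurs, in document order."""
    words = re.findall(r"[a-z]+", text.lower())
    wanted = {k.lower() for k in keywords}
    return [i for i, w in enumerate(words) if w in wanted]

def find_gaps(text: str, keywords: list[str], min_gap: int) -> list[tuple[int, int]]:
    """Return inclusive (start, end) word-index spans of at least min_gap words
    that lie between two keyword mentions and contain no keyword themselves."""
    hits = keyword_positions(text, keywords)
    gaps = []
    for prev, nxt in zip(hits, hits[1:]):
        if nxt - prev - 1 >= min_gap:
            gaps.append((prev + 1, nxt - 1))
    return gaps

# Synthetic document: an introduction naming all three sources, a 'wind'
# section, twenty words on something else, then 'wave' and 'solar' sections.
doc = ("wind wave solar " + "wind " * 5 + "tidal " * 20
       + "wave " * 5 + "solar " * 5)

print(find_gaps(doc, ["wind", "wave", "solar"], min_gap=10))  # [(8, 27)]
```

The reported span, words 8 to 27, is the keyword-free stretch between the wind band and the wave band. In a real document, that is the part you would go and read to find out what the unmentioned topic is.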

In summary, this visualisation provides a way of solving the Boolean NOT problem, at least some of the time.

It is also a useful way of checking the structure of a long document while you are writing it, because it shows the relative balance and distribution of the key terms.
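For checking balance and distribution while writing, a rough numerical equivalent is to split the text into equal-sized segments and count each key term per segment. This is a minimal Python sketch; the number of segments and the synthetic draft are assumptions for illustration.

```python
import re
from collections import Counter

def term_distribution(text: str, terms: list[str], n_segments: int) -> list[Counter]:
    """Split the text into n_segments runs of (nearly) equal word count and
    count how often each term occurs in each run."""
    words = re.findall(r"[a-z]+", text.lower())
    wanted = {t.lower() for t in terms}
    # Integer boundaries give exactly n_segments slices, sizes differing by at most one.
    bounds = [len(words) * i // n_segments for i in range(n_segments + 1)]
    return [Counter(w for w in words[bounds[i]:bounds[i + 1]] if w in wanted)
            for i in range(n_segments)]

terms = ["wind", "wave", "solar"]
draft = ("wind wave solar " + "wind " * 6 + "tidal " * 6
         + "wave " * 6 + "solar " * 6)

segments = term_distribution(draft, terms, n_segments=3)
for i, seg in enumerate(segments):
    print(i, [seg[t] for t in terms])
print("totals:", [sum(s[t] for s in segments) for t in terms])
# 0 [7, 1, 1]
# 1 [0, 3, 0]
# 2 [0, 3, 6]
# totals: [7, 7, 7]
```

Equal totals can hide a lopsided layout: each term appears seven times overall, but the per-segment rows show wind concentrated at the start, solar at the end, and a middle segment that is mostly about something else.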

Wind, wave, solar and a significant absence

Material on this website is copyright of Hyde & Rugg unless otherwise stated.