Neo4j Cypher: Find common nodes between a set of matched nodes

Question

Very similar to the question posted here

I have two kinds of nodes: Article and Word. Each article is connected to the words it contains by a MENTIONED relationship.

I need to query all articles that have words in common, where the list of common words is dynamic. From the client's perspective, I pass in a list of words and expect back the articles that have those words in common.

The following query does the job

WITH ["orange", "apple"] as words
MATCH (w:Word)<-[:MENTIONED]-(a:Article)-[:MENTIONED]->(w2:Word)
WHERE w.name IN words AND w2.name IN words
RETURN a, w, w2

but it does not work with a word list of one. How can I make it handle any number of words? Is there a better way to do this?


Tags: database, neo4j, graph, cypher | Asked 2017-01-04 22:01 | 1 answer

Answers ( 1 )

  1. 2017-01-05 03:01

    Yes. There are two approaches I can think of:

    1. Finding all articles that contain some subset of those words, and then returning only articles where the number of words mentioned is the number of words you supplied in your wordlist.

    2. Getting the :Word nodes for the given list of words, and then getting articles where all words are mentioned in the article.

    Here's an example graph to test this on:

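    // MERGE takes a single pattern, so each node and relationship gets its own clause.
    MERGE (a1:Article {name:'a1'})
    MERGE (a2:Article {name:'a2'})
    MERGE (a3:Article {name:'a3'})
    MERGE (w1:Word {name:'orange'})
    MERGE (w2:Word {name:'apple'})
    MERGE (w3:Word {name:'pineapple'})
    MERGE (w4:Word {name:'banana'})
    // a1 mentions all four words
    MERGE (a1)-[:MENTIONED]->(w1)
    MERGE (a1)-[:MENTIONED]->(w2)
    MERGE (a1)-[:MENTIONED]->(w3)
    MERGE (a1)-[:MENTIONED]->(w4)
    // a2 mentions orange and banana
    MERGE (a2)-[:MENTIONED]->(w1)
    MERGE (a2)-[:MENTIONED]->(w4)
    // a3 mentions orange, apple and pineapple
    MERGE (a3)-[:MENTIONED]->(w1)
    MERGE (a3)-[:MENTIONED]->(w2)
    MERGE (a3)-[:MENTIONED]->(w3)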
    

    Approach 1, comparing the wordlist size to the number of words mentioned in the article, looks like this:

    WITH ["orange", "apple"] as words
    MATCH (word:Word)<-[:MENTIONED]-(article:Article)
    WHERE word.name IN words
    WITH words, article, COUNT(word) as wordCount
    WHERE wordCount = SIZE(words)
    RETURN article
    

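    This only works if there is at most one :MENTIONED relationship between an article and any given word, no matter how many times the article mentions that word. It also assumes the supplied word list contains no duplicates, since the count is compared against SIZE(words).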
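
    If duplicate :MENTIONED relationships are possible in your data, a minimal variation is to count distinct :Word nodes instead of matched rows. This is a sketch, not a tested drop-in: it assumes each word name belongs to exactly one :Word node and that the supplied list has no repeated entries.

    ```cypher
    WITH ["orange", "apple"] AS words
    MATCH (word:Word)<-[:MENTIONED]-(article:Article)
    WHERE word.name IN words
    // DISTINCT collapses repeated relationships to the same :Word node
    WITH words, article, COUNT(DISTINCT word) AS wordCount
    WHERE wordCount = SIZE(words)
    RETURN article
    ```

    On the example graph above this should return a1 and a3, and with a single-word list such as ["orange"] it should return all three articles.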

    Approach 2 uses ALL() on the collection of :Word nodes to ensure that we only match articles that mention every one of those words:

    WITH ["orange", "apple"] as words
    MATCH (word:Word) 
    WHERE word.name in words
    WITH COLLECT(word) as words
    MATCH (article:Article)
    WHERE ALL (word in words WHERE (word)<-[:MENTIONED]-(article))
    RETURN article
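
    One edge case to be aware of with this approach: if a word in the list has no :Word node, COLLECT() just returns fewer nodes and ALL() checks only those. If none of the words exist, the collection is empty, ALL() is true for an empty list, and every article comes back. A possible guard, sketched here under the assumption that word names are unique and the list has no duplicates, is to carry the original list along and compare sizes (wordNodes is a name introduced for this example):

    ```cypher
    WITH ["orange", "apple"] AS words
    MATCH (word:Word)
    WHERE word.name IN words
    // keep the input list so the number of nodes found can be checked
    WITH words, COLLECT(word) AS wordNodes
    WHERE SIZE(wordNodes) = SIZE(words)
    MATCH (article:Article)
    WHERE ALL (w IN wordNodes WHERE (w)<-[:MENTIONED]-(article))
    RETURN article
    ```

    With this guard, a list containing a word that is not in the graph returns no articles rather than a partial match.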
    

    You can try using PROFILE with each of these to figure out which works best with your data set.
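
    To profile a query, put PROFILE in front of it and run it as usual. For example, with Approach 1:

    ```cypher
    PROFILE
    WITH ["orange", "apple"] AS words
    MATCH (word:Word)<-[:MENTIONED]-(article:Article)
    WHERE word.name IN words
    WITH words, article, COUNT(word) AS wordCount
    WHERE wordCount = SIZE(words)
    RETURN article
    ```

    The query runs normally and also returns its execution plan, including the rows and database hits for each operator, so you can compare the two approaches on your own data. An index on the name property of :Word will likely help both; the syntax for creating one depends on your Neo4j version, so it is not shown here.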
