Neo4j backs new graph query standard for AI era

Interview. AI searches that query any part of an organization’s data will need to look at structured as well as unstructured data, and the structured data won’t just be in relational databases, where SQL rules. There are also graph databases, which store relationships between entities. The entries in a graph database can’t be vectorized, which rules out GenAI similarity search as a way of answering natural language queries about such content, and SQL can’t be used either.

Neo4j reckons it has a way for GenAI to access its graph database records, and we interviewed Andreas Kollegger, lead for GenAI Innovation, to find out more.

Blocks & Files: Where should graph databases be used instead of relational or other databases? What do graph databases do that other databases don’t?

Andreas Kollegger: Graph databases take a different approach to data modelling and querying compared to relational databases, focusing instead on use cases where the relationships between data points are just as important as the data itself. They don’t just store data, they encode the semantics of how things relate. That’s why graphs excel at fraud detection, recommendation engines, supply chain bottleneck analysis, and other scenarios where uncovering hidden patterns can simplify complexity and surface insights critical to decision-making.

On the other hand, relational databases are designed to store structured data in tables and perform aggregations, sums, or filters, which are more likely to be used in accounting systems, inventory management, customer records and other transactional applications. In relational databases, uncovering complex insights often requires complicated JOINs and multiple queries, which become cumbersome and inefficient as the connections multiply. Graphs, however, allow you to traverse relationships directly, revealing insights that would otherwise remain buried. 

Put simply, and despite what the name implies, relational databases tell you what exists, while graph databases tell you how those things are connected, leading users to creative solutions for pressing problems.
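To make the contrast between JOINs and traversal concrete, here is a minimal Python sketch. It is not Neo4j code: the `FOLLOWS` data, the table layout, and both function names are hypothetical, the relational side uses Python's built-in `sqlite3` module, and the graph side is a plain adjacency dictionary standing in for a graph store.

```python
import sqlite3

# Hypothetical data: who follows whom.
FOLLOWS = [("alice", "bob"), ("bob", "carol"), ("carol", "dave")]


def two_hops_sql(start):
    """Relational style: each extra hop needs another self-JOIN in the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE follows (src TEXT, dst TEXT)")
    conn.executemany("INSERT INTO follows VALUES (?, ?)", FOLLOWS)
    rows = conn.execute(
        "SELECT f2.dst FROM follows AS f1 "
        "JOIN follows AS f2 ON f1.dst = f2.src "
        "WHERE f1.src = ? ORDER BY f2.dst",
        (start,),
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


def n_hops_graph(start, hops):
    """Graph style: follow stored edges directly, for any number of hops."""
    adjacency = {}
    for src, dst in FOLLOWS:
        adjacency.setdefault(src, []).append(dst)
    frontier = {start}
    for _ in range(hops):
        frontier = {nxt for node in frontier for nxt in adjacency.get(node, [])}
    return sorted(frontier)


print(two_hops_sql("alice"))     # two hops via one self-JOIN
print(n_hops_graph("alice", 2))  # the same two hops via traversal
print(n_hops_graph("alice", 3))  # a third hop needs no change to the code
```

The SQL version has the hop count baked into the query text, so a third hop means rewriting it with another JOIN, whereas the traversal only changes a parameter.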

Blocks & Files: The widely used relational databases have their SQL query language, which is pretty standard and cross-supplier. Graph databases are nowhere near as widely used. Is there a cross-supplier query language? Is one likely to evolve? 

Andreas Kollegger: It’s fair to say that SQL is the standard language of relational databases, which is why relational systems have such broad adoption. For many years, graph databases didn’t have a single, universal query language. However, as of April 2024, that changed with the announcement of the ISO-approved Graph Query Language (GQL), a concrete standard for querying graphs across platforms with broad industry backing. It’s closely aligned with Cypher and familiar to SQL users, which makes adoption straightforward. With all major graph vendors moving toward GQL compliance, this marks a significant step towards wider adoption of graph technologies across enterprises.

Blocks & Files: What is Neo4j’s graph database query language? Could you provide a simple example of what it looks like and how it works? 

Andreas Kollegger: Neo4j uses Cypher as its query language, which has organically evolved to become a fully GQL-compliant implementation. That way, developers can keep using Cypher as they always have, while knowing it aligns with the new ISO standard for graph querying. Most notably, Cypher is designed to be readable and intuitive. By matching patterns of nodes with relationships, users can navigate connected data easily, without needing to write complicated JOINs. It also scales naturally to more complex queries, whether analyzing social networks, supply chains, or recommendation engines. 

Cypher’s similarity with SQL makes it easier for developers with SQL experience to interpret Cypher queries, therefore supporting a smoother transition from relational to graph querying.

For example, one can create a query to find all movies connected to Tom Hanks and the type of relationship, like this: 

MATCH (tom:Person {name: 'Tom Hanks'})-[r]->(m:Movie)
RETURN type(r) AS type, m.title AS movie

This query finds the Person node for Tom Hanks, follows all outgoing relationships [r] to Movie nodes, and returns both the relationship type (e.g., ACTED_IN or DIRECTED) and the movie titles. This illustrates how Cypher makes exploring and querying connected data simple and intuitive. 
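For readers who think more readily in general-purpose code, the pattern match above can be approximated in a few lines of Python. This is only an illustrative sketch of what the query asks for, not how Neo4j executes it. The `NODES` and `EDGES` structures and the `match_outgoing` function are hypothetical, and the relationship types follow the ACTED_IN and DIRECTED examples mentioned above.

```python
# Hypothetical in-memory graph: nodes keyed by id, edges as
# (start node id, relationship type, end node id) triples.
NODES = {
    "tom": {"label": "Person", "name": "Tom Hanks"},
    "meg": {"label": "Person", "name": "Meg Ryan"},
    "m1": {"label": "Movie", "title": "Cast Away"},
    "m2": {"label": "Movie", "title": "That Thing You Do!"},
    "m3": {"label": "Movie", "title": "You've Got Mail"},
}
EDGES = [
    ("tom", "ACTED_IN", "m1"),
    ("tom", "ACTED_IN", "m2"),
    ("tom", "DIRECTED", "m2"),
    ("meg", "ACTED_IN", "m3"),
]


def match_outgoing(person_name):
    """Mimic MATCH (p:Person {name: ...})-[r]->(m:Movie) RETURN type(r), m.title."""
    rows = []
    for start, rel_type, end in EDGES:
        person, movie = NODES[start], NODES[end]
        if (
            person["label"] == "Person"
            and person["name"] == person_name
            and movie["label"] == "Movie"
        ):
            rows.append({"type": rel_type, "movie": movie["title"]})
    return rows


for row in match_outgoing("Tom Hanks"):
    print(row["type"], "|", row["movie"])
```

The Cypher version expresses the same filter declaratively in one pattern, which is the readability point being made.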

Blocks & Files: How skilled do ordinary users have to be to use it? As skilled as an SQL coder? Do they need access to a graph database query building person? 

Andreas Kollegger: Historically, while there are similarities between Cypher and SQL, users needed some understanding of graph structures – nodes and relationships – to write queries effectively. This often meant relying on developers to write queries and interpret results, making graph databases feel like a specialist domain. However, today, graph technology is being democratized. Modern tools provide drag-and-drop workflows, ready-made algorithms, and integrations with familiar formats like spreadsheets, meaning you don’t have to be a Cypher expert to start exploring graphs. 

With graph technology now appearing in forms that business users can explore and understand for themselves, you don’t need SQL-level expertise. Developers now serve more as facilitators than gatekeepers, enabling a broader ecosystem of users to leverage graph data effectively. 

Blocks & Files: Could GenAI act as a natural language interface to Neo4j’s graph database, constructing a query from a user’s input request? How would that work? 

Andreas Kollegger: Absolutely, GenAI language models can map natural language requests to Cypher queries by first interpreting the intent of your question and then translating it into the right query structure. For instance, if you were to ask, “Which customers bought both product X and product Y in the last month?” a GenAI system can automatically generate the appropriate MATCH and WHERE clauses in Cypher.

The process typically works in three steps. First, intent extraction, where the system interprets what the user actually wants, followed by query generation, which turns that intent into Cypher. Finally, execution and post-processing run the query in our graph databases and format the results for the user.

This removes the need for the user to know the language, letting them interact with graph data conversationally. For example, with Neo4j’s LLM Knowledge Graph Builder, users can drag in documents, web pages, videos and more to create a queryable graph, then use a natural language interface to ask questions, with the LLM automatically extracting nodes, relationships, and generating queries. This makes it easy for anyone, regardless of technical expertise, to explore complex connected data. 
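The three steps described above (intent extraction, query generation, execution and post-processing) can be sketched as follows. Everything here is an assumption made for illustration: a regular expression stands in for the language model, the `Customer`, `Product` and `BOUGHT` schema and the `date` property are invented, and `run_query` is a caller-supplied function standing in for a real database driver call.

```python
import re


def extract_intent(question):
    """Step 1 (stub): a regex stands in for the language model."""
    found = re.search(r"bought both (.+?) and (.+?) in the last month", question)
    if found is None:
        raise ValueError("question not understood by this toy extractor")
    # "P1M" is an ISO 8601 duration meaning one month.
    return {"product_a": found.group(1), "product_b": found.group(2), "period": "P1M"}


def generate_cypher(intent):
    """Step 2: turn the intent into a parameterized Cypher query (hypothetical schema)."""
    query = (
        "MATCH (c:Customer)-[b1:BOUGHT]->(:Product {name: $product_a}), "
        "(c)-[b2:BOUGHT]->(:Product {name: $product_b}) "
        "WHERE b1.date >= date() - duration($period) "
        "AND b2.date >= date() - duration($period) "
        "RETURN DISTINCT c.name AS customer"
    )
    params = {
        "product_a": intent["product_a"],
        "product_b": intent["product_b"],
        "period": intent["period"],
    }
    return query, params


def answer(question, run_query):
    """Step 3: execute via the supplied runner and format the rows for the user."""
    query, params = generate_cypher(extract_intent(question))
    rows = run_query(query, params)
    return ", ".join(row["customer"] for row in rows) or "no matching customers"


def fake_run(query, params):
    """Stand-in for a database call; returns made-up rows."""
    return [{"customer": "Ada"}, {"customer": "Grace"}]


print(answer("Which customers bought both product X and product Y in the last month?", fake_run))
```

Passing user-derived values as query parameters rather than splicing them into the query text is a deliberate choice in this sketch, since generated queries are otherwise an easy route to injection bugs.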

Blocks & Files: Could you describe the output from such a natural language query? 

Andreas Kollegger: The output could be tabular, visual or both, depending on the tool. Sometimes, results are returned as a simple table of rows and columns, similar to SQL, showing the nodes and their properties. In other cases, it could be a graph visualization that highlights how nodes connect and what relationships exist between them. Neo4j’s tools, like Bloom or Browser, can also overlay aggregated metrics or insights, including node counts or centrality scores, derived from algorithms run on the graph databases.

Since graphs encode relationships natively, the visual representation is often the most intuitive – letting users see the connections directly rather than having to infer them from rows and columns. 

Blocks & Files: I’m guessing it does not make sense to suggest a graph database could be vectorized. Why not? 

Andreas Kollegger: Vectorising a graph by itself may seem appealing, but it can’t capture the structured, navigable nature that makes graph databases powerful. Vectors convert data into numerical forms, making them great for similarity search, machine learning or embedding documents and images. However, graphs encode explicit relationships between nodes, which can be traversed and analysed. Converting a graph entirely into vectors would lose that native structure and semantics. 

That said, Neo4j does integrate native vector search as part of its core database capabilities to help capture implicit patterns and relationships based on items with similar data characteristics rather than exact matches. 

This allows users to perform similarity searches or embed ML features while still preserving the graph structure. This approach combines the strengths of vector-based methods for AI/ML tasks with the rich, traversable relationships that make graph databases inherently powerful.
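A minimal sketch of that combination, in plain Python rather than Neo4j's own vector index, might look like this. The embeddings, document names, and `MENTIONS` relationships are all made up for illustration: similarity search picks the closest items, and the explicit relationships are then followed from those items.

```python
import math

# Hypothetical 3-dimensional embeddings attached to document nodes.
EMBEDDINGS = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 1.0, 0.0],
}
# Hypothetical explicit relationships: document -[:MENTIONS]-> company.
MENTIONS = {
    "doc_a": ["Acme"],
    "doc_b": ["Acme", "Globex"],
    "doc_c": ["Initech"],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similar_then_traverse(query_vec, k):
    """Pick the k most similar documents, then follow their explicit relationships."""
    ranked = sorted(
        EMBEDDINGS,
        key=lambda doc: cosine(query_vec, EMBEDDINGS[doc]),
        reverse=True,
    )
    return {doc: MENTIONS[doc] for doc in ranked[:k]}


print(similar_then_traverse([1.0, 0.0, 0.0], k=2))
```

The vector step finds candidates by approximate likeness, while the second step relies on relationships that are stored explicitly and so survive intact.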

Blocks & Files: An AI agent could receive a query about an organization’s data, some of which is in relational databases, some in a graph database, and some held in unstructured files and object storage systems. Am I right in thinking that the agent could deconstruct the query into three sub-queries? One would use traditional SQL to search the relational database, one would use an LLM sub-agent to look at the vector embeddings of the unstructured data, and a third would take its part of the query and build a graph database query and execute it. Then the overall agent would combine the results from the three types of search and generate a response for the user. Does this make sense? 

Andreas Kollegger: Yes, that is one approach that makes sense when an organisation’s data is spread across multiple types of storage. Modern AI agents can act like orchestrators, breaking a complex query into specialised sub-queries tailored for each data store. For your example, the three sub-queries would be: 

  • Relational databases – an agent can generate SQL queries to filter, join and aggregate structured tables.
  • Graph databases – an agent can translate the relevant part of the query into Cypher to uncover patterns or connections that wouldn’t be obvious from a flat table.
  • Unstructured data (documents, emails, etc.) – an LLM-powered agent can use vector embeddings to find semantically relevant information, even when the exact words don’t match.

The agent then aggregates the results, using retrieval mechanisms such as GraphRAG, which leverages the structure of knowledge graphs to pull relevant nodes and relationships before synthesizing a coherent response. This hybrid approach enables users to leverage the strengths of each data paradigm without forcing one system to do all the work, delivering faster, richer insights than any single database could provide. By using GraphRAG, AI agents can generate answers that are not only accurate but also contextual and explainable.
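The orchestration pattern described in this answer can be sketched as below. This is a toy outline under stated assumptions, not GraphRAG and not any vendor's agent framework: the three search functions are stubs returning invented findings where a real agent would run SQL, Cypher, and a vector search, and `synthesize` merely formats text where a real system would call a language model.

```python
def search_relational(question):
    """Stub: a real agent would generate and run SQL here."""
    return ["hypothetical row from an orders table"]


def search_graph(question):
    """Stub: a real agent would generate and run Cypher here."""
    return ["hypothetical path linking a customer to a supplier"]


def search_documents(question):
    """Stub: a real agent would run a vector similarity search here."""
    return ["hypothetical passage from a contract PDF"]


def orchestrate(question, backends):
    """Fan the question out to every backend and tag each finding with its source."""
    findings = []
    for source, search in backends.items():
        for fact in search(question):
            findings.append((source, fact))
    return findings


def synthesize(question, findings):
    """Stub for the final LLM step: here it only formats the tagged findings."""
    lines = [f"Q: {question}"]
    lines += [f"- [{source}] {fact}" for source, fact in findings]
    return "\n".join(lines)


BACKENDS = {
    "relational": search_relational,
    "graph": search_graph,
    "unstructured": search_documents,
}

question = "Which suppliers are tied to delayed orders?"
print(synthesize(question, orchestrate(question, BACKENDS)))
```

Tagging each finding with its source is one simple way to keep the combined answer explainable, since the user can see which store each statement came from.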

Bootnote

GraphRAG combines knowledge graphs with Retrieval-Augmented Generation (RAG), enabling you to build GenAI applications that deliver better results.

Here is a brief history of Neo4j:

  • 2000: The concept of Neo4j began when founders Emil Eifrem, Johan Svensson, and Peter Neubauer, while working on a content management system, identified the need for a database to handle complex relationships more effectively than relational databases.
  • 2007: Neo4j, Inc. (originally Neo Technology) was founded in Malmö, Sweden, and the first version of the Neo4j graph database was released as an open-source project.
  • 2010: Neo4j gained traction with the release of version 1.0, introducing features like the Cypher query language, a declarative language for querying graph data, which simplified development.
  • 2011-2014: Neo4j grew in popularity, with enterprise adoption increasing. The company raised funding to expand, and Neo4j transitioned to a dual-licensing model (open-source Community Edition and commercial Enterprise Edition).
  • 2018: Neo4j 3.4 introduced native graph processing improvements and cloud integrations.
  • 2020: Neo4j launched Aura, a fully-managed cloud service, making it easier for organizations to deploy and scale graph databases.
  • Present (2025): Neo4j provides advancements in AI integration, scalability, and cloud-native features. It has more than 80 Fortune 100 customers, 170-plus partners and 300,000 developers in its ecosystem.

Contact Neo4j to find out more.