Merged
Changes from 62 commits
57 commits
b754a4d
[wip] Add postgres backend for graphs.
DanielNScott Mar 24, 2026
3282ffa
[wip] Raise where postgres graphDB incompatible with graphDB consumers
DanielNScott Mar 24, 2026
7ee2e6e
[pass] Add tests and fixes for postgres graph adapter.
DanielNScott Mar 25, 2026
6952dd8
[pass] Apply formatting guidelines via ruff.
DanielNScott Mar 25, 2026
e5e1232
[pass] Update documentation
DanielNScott Mar 25, 2026
61c0b05
[wip] Add postgres hybrid adapter combining graph and vector backends.
DanielNScott Mar 26, 2026
34d144a
[wip] Extend pghybrid incompatibility checks to match postgres ones.
DanielNScott Mar 26, 2026
47794eb
[pass] Add unit tests for postgres hybrid adapter.
DanielNScott Mar 26, 2026
36c4520
[wip] Add postgres backend for graphs.
DanielNScott Mar 24, 2026
f0396d5
[wip] Raise where postgres graphDB incompatible with graphDB consumers
DanielNScott Mar 24, 2026
4347a95
[pass] Add tests and fixes for postgres graph adapter.
DanielNScott Mar 25, 2026
b6a533a
[pass] Apply formatting guidelines via ruff.
DanielNScott Mar 25, 2026
a69929f
[pass] Update documentation
DanielNScott Mar 25, 2026
dea9d50
[pass] Add USE_UNIFIED_PROVIDER flag and pghybrid e2e test.
DanielNScott Mar 26, 2026
0781aaf
[wip] Batch SQL operations in postgres graph adapter.
DanielNScott Mar 30, 2026
fe1c3b0
[pass] Unify schema definition into SQLAlchemy Core table objects.
DanielNScott Mar 30, 2026
1a846b6
[pass] Address PR #2506 review comments on postgres graph adapter.
DanielNScott Apr 1, 2026
36837f7
[pass] Address new PR review nitpicks.
DanielNScott Apr 1, 2026
7d637ad
Merge commit '7cfb55ff' into feature/cog-4469-graph-aware-embeddings-…
DanielNScott Apr 1, 2026
b3203b3
Merge branch 'feature/cog-4463-graph-aware-embeddings-implement-minim…
DanielNScott Apr 1, 2026
e5c0e5e
[pass] Move postgres-dependent tests to new e2e/postgres directory.
DanielNScott Apr 2, 2026
ea5efcd
[pass] Defensive tweaks throughout pghybrid system.
DanielNScott Apr 2, 2026
ffc246c
[pass] More tweaks in pghybrid sub-system.
DanielNScott Apr 2, 2026
c42cf17
[pass] Document get_triplets_batch as optional method on GraphDBInter…
DanielNScott Apr 2, 2026
03625fa
[pass] Fix pghybrid test CI integration.
DanielNScott Apr 2, 2026
2a7e388
[pass] Use hybrid adds in add_data_points when possible.
DanielNScott Apr 2, 2026
3317006
[pass] Add equivalence test, fix edge payload in hybrid adapter.
DanielNScott Apr 2, 2026
4c216aa
[pass] Fix CI env vars, interface contract, vector upsert, and triple…
DanielNScott Apr 2, 2026
f19e7d2
Merge branch 'dev' into feature/cog-4463-graph-aware-embeddings-imple…
lxobr Apr 7, 2026
70fe9ff
[pass] Fix hybrid adapter deadlocks, pgvector overhead, recreation.
DanielNScott Apr 7, 2026
986cfd7
[pass] Decouple postgres graph adapter from relational layer.
DanielNScott Apr 8, 2026
5335918
Merge branch 'feature/cog-4463-graph-aware-embeddings-implement-minim…
DanielNScott Apr 8, 2026
5f6ae51
[pass] Implement get_neighborhood and fix interface compliance after …
DanielNScott Apr 8, 2026
6f33e6f
Merge branch 'dev' into feature/cog-4463-graph-aware-embeddings-imple…
lxobr Apr 8, 2026
3a2b785
[pass] Reformat via pre-commit hooks
DanielNScott Apr 8, 2026
eafce01
Merge branch 'dev' into feature/cog-4463-graph-aware-embeddings-imple…
lxobr Apr 8, 2026
34a4ffe
[pass] Move postgres-dependent tests to e2e/postgres directory.
DanielNScott Apr 8, 2026
da6a001
Merge remote-tracking branch 'upstream/feature/cog-4463-graph-aware-e…
DanielNScott Apr 9, 2026
f784a7a
[pass] Implement AND logic for node_name_filter_operator in get_nodes…
DanielNScott Apr 10, 2026
6b2af3f
[pass] Address CodeRabbit review items on PR #2584.
DanielNScott Apr 10, 2026
1941db7
[pass] Tweaks: batch dedup, node tuple shape, and triplet ordering.
DanielNScott Apr 10, 2026
ac55c48
Merge branch 'feature/cog-4463-graph-aware-embeddings-implement-minim…
DanielNScott Apr 10, 2026
3f659fb
[pass] Fix PostgresAdapter constructor calls after 4463 merge.
DanielNScott Apr 10, 2026
14fb30b
Merge remote-tracking branch 'upstream/feature/cog-4469-graph-aware-e…
DanielNScott Apr 10, 2026
71d9b47
Merge remote-tracking branch 'upstream/dev' into feature/cog-4469-gra…
DanielNScott Apr 10, 2026
935ff16
[pass] Fix add_data_points test mock taking hybrid write path.
DanielNScott Apr 10, 2026
5e23c76
[pass] Fix add_data_points test mock and add hybrid write path test.
DanielNScott Apr 10, 2026
5065c8e
[pass] Fix lockfile, unified provider gate, edge guard, and test isol…
DanielNScott Apr 10, 2026
606e57f
[pass] Move pghybrid env check out of lru_cache in graph and vector f…
DanielNScott Apr 10, 2026
e1ed4d4
[pass] Fix LIMIT applied to join fan-out in search_graph_with_distances.
DanielNScott Apr 10, 2026
ad8c140
[pass] Fix hybrid adapter interface compliance and test fixture.
DanielNScott Apr 10, 2026
57e9665
[pass] Fix embeddable field lookup in add_nodes_with_vectors.
DanielNScott Apr 10, 2026
5145cba
[pass] Fix ruff formatting in PGVectorAdapter.
DanielNScott Apr 10, 2026
5f63a26
Merge branch 'dev' of https://github.com/topoteretes/cognee into feat…
DanielNScott Apr 10, 2026
1efdc0a
[pass] Clear PGVector metadata cache on prune, fix hybrid init docstr…
DanielNScott Apr 10, 2026
d66f152
Merge branch 'dev' of https://github.com/topoteretes/cognee into feat…
DanielNScott Apr 11, 2026
575da72
[pass] Add postgres extra to CI setup for graph adapter tests.
DanielNScott Apr 11, 2026
37 changes: 12 additions & 25 deletions .github/workflows/approve_dco.yaml
@@ -11,40 +11,27 @@ jobs:
       - name: Validate Developer Certificate of Origin statement
         uses: actions/github-script@v8
         with:
-          # If using the built-in GITHUB_TOKEN, ensure it has 'read:org' permission.
-          # In GitHub Enterprise or private orgs, you might need a PAT (personal access token) with read:org scope.
           github-token: ${{ secrets.GITHUB_TOKEN }}
           script: |
-            const orgName = 'YOUR_ORGANIZATION_NAME'; // Replace with your org
             const prUser = context.payload.pull_request.user.login;
             const prBody = context.payload.pull_request.body || '';
+            const authorAssociation = context.payload.pull_request.author_association;

             // Exact text you require in the PR body
             const requiredStatement = "I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin";

-            // 1. Check if user is in the org
-            let isOrgMember = false;
-            try {
-              // Attempt to get membership info
-              const membership = await github.rest.orgs.getMembershipForUser({
-                org: orgName,
-                username: prUser,
-              });
-              // If we get here without an error, user is in the org
-              isOrgMember = true;
-              console.log(`${prUser} is a member of ${orgName}. Skipping DCO check.`);
-            } catch (error) {
-              // If we get a 404, user is NOT an org member
-              if (error.status === 404) {
-                console.log(`${prUser} is NOT a member of ${orgName}. Enforcing DCO check.`);
-              } else {
-                // Some other error—fail the workflow or handle accordingly
-                core.setFailed(`Error checking organization membership: ${error.message}`);
-              }
-            }
+            // 1. Check if user is an org member via author_association
+            //    (available on the PR payload without extra API calls or permissions)
+            //    OWNER, MEMBER, and COLLABORATOR are trusted roles.
+            const trustedRoles = ['OWNER', 'MEMBER', 'COLLABORATOR', 'CONTRIBUTOR'];
+            const isTrusted = trustedRoles.includes(authorAssociation);
Comment on lines +23 to +27
@coderabbitai (coderabbitai bot · Contributor) · Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

In GitHub pull_request payloads, what does author_association = "CONTRIBUTOR" mean, and is it considered equivalent to OWNER/MEMBER/COLLABORATOR for trust decisions?

💡 Result:

Web search failed: Server error: no LLM provider could handle the message


🏁 Script executed:

# First, locate and read CONTRIBUTING.md to understand DCO requirements
find . -name "CONTRIBUTING.md" -type f | head -5

Repository: topoteretes/cognee

Length of output: 79


🌐 Web query:

GitHub pull request author_association field values meaning OWNER MEMBER COLLABORATOR CONTRIBUTOR documentation

💡 Result:

In GitHub’s author_association field (an enum named CommentAuthorAssociation), these values mean:

  • OWNER: Author is the owner of the repository. [1]
  • MEMBER: Author is a member of the organization that owns the repository. [1]
  • COLLABORATOR: Author has been invited to collaborate on the repository. [1]
  • CONTRIBUTOR: Author has previously committed to the repository. [1]

Source: GitHub GraphQL enum docs for CommentAuthorAssociation. [1]

References
[1] https://docs.github.com/en/enterprise-cloud@latest/graphql/reference/enums#commentauthorassociation


🏁 Script executed:

# Read the actual file under review to confirm current state
cat -n .github/workflows/approve_dco.yaml | head -40

Repository: topoteretes/cognee

Length of output: 2117


🏁 Script executed:

# Read CONTRIBUTING.md to verify DCO policy requirements
cat -n CONTRIBUTING.md

Repository: topoteretes/cognee

Length of output: 7165


Remove CONTRIBUTOR from trusted roles to enforce DCO policy

Line 26 includes CONTRIBUTOR in trustedRoles, allowing external contributors who have previously committed to bypass DCO enforcement (lines 29–35). However, CONTRIBUTOR in GitHub's author_association field means "has previously committed to the repository," not organizational membership or trusted collaborator status. This contradicts the comment on line 25 ("OWNER, MEMBER, and COLLABORATOR are trusted roles") and violates the DCO requirement stated in CONTRIBUTING.md section 5, which mandates DCO affirmation for all contributions.

Suggested fix
-            //    OWNER, MEMBER, and COLLABORATOR are trusted roles.
-            const trustedRoles = ['OWNER', 'MEMBER', 'COLLABORATOR', 'CONTRIBUTOR'];
+            //    OWNER, MEMBER, and COLLABORATOR are trusted roles.
+            const trustedRoles = ['OWNER', 'MEMBER', 'COLLABORATOR'];
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-            // 1. Check if user is an org member via author_association
-            //    (available on the PR payload without extra API calls or permissions)
-            //    OWNER, MEMBER, and COLLABORATOR are trusted roles.
-            const trustedRoles = ['OWNER', 'MEMBER', 'COLLABORATOR', 'CONTRIBUTOR'];
-            const isTrusted = trustedRoles.includes(authorAssociation);
+            // 1. Check if user is an org member via author_association
+            //    (available on the PR payload without extra API calls or permissions)
+            //    OWNER, MEMBER, and COLLABORATOR are trusted roles.
+            const trustedRoles = ['OWNER', 'MEMBER', 'COLLABORATOR'];
+            const isTrusted = trustedRoles.includes(authorAssociation);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/approve_dco.yaml around lines 23 - 27, The trustedRoles
array currently includes 'CONTRIBUTOR', which incorrectly treats GitHub
author_association "CONTRIBUTOR" as a trusted role; remove 'CONTRIBUTOR' from
the trustedRoles declaration so only OWNER, MEMBER, and COLLABORATOR are
considered trusted, and ensure any logic using isTrusted (computed from
trustedRoles.includes(authorAssociation)) continues to enforce DCO checks for
contributors; also update the nearby comment if needed to match the corrected
trustedRoles list.

DanielNScott (Contributor, Author)

Not our contribution, outside scope.

@coderabbitai (coderabbitai bot · Contributor)

@DanielNScott, understood — I'll drop this for the current PR. Would you like me to open a follow-up GitHub issue to track removing CONTRIBUTOR from the trustedRoles array in .github/workflows/approve_dco.yaml, so it doesn't get lost?


🧠 Learnings used
Learnt from: CR
Repo: topoteretes/cognee PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-02-05T15:51:58.120Z
Learning: Sign commits and affirm the DCO as described in CONTRIBUTING.md

Learnt from: DanielNScott
Repo: topoteretes/cognee PR: 2584
File: cognee/infrastructure/databases/graph/postgres/adapter.py:35-44
Timestamp: 2026-04-10T03:13:45.843Z
Learning: In `topoteretes/cognee`, the factory function `get_graph_engine()` in `cognee/infrastructure/databases/graph/get_graph_engine.py` always calls `await graph_client.initialize()` (via a `hasattr` guard) before returning the adapter. This means all graph adapters, including `PostgresAdapter`, have their schema/tables created before any caller can invoke graph operations. Concerns about missing schema at runtime (e.g., "relation does not exist") do not apply to code paths that go through the factory.


+            if (isTrusted) {
+              console.log(`${prUser} has association '${authorAssociation}'. Skipping DCO check.`);
+            } else {
+              console.log(`${prUser} has association '${authorAssociation}'. Enforcing DCO check.`);
+
-            // 2. If user is not in the org, enforce the DCO statement
-            if (!isOrgMember) {
+              // 2. If user is not trusted, enforce the DCO statement
               if (!prBody.includes(requiredStatement)) {
                 core.setFailed(
                   `DCO check failed. The PR body must include the following statement:\n\n${requiredStatement}`
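The revised `script` gate reduces to a small pure function, which makes the practical effect of the `CONTRIBUTOR` discussion easy to see. The minimal Python sketch below mirrors the JavaScript logic. It assumes the check is exactly "trusted role, or the body contains the statement". `dco_check_passes`, `ROLES_AS_MERGED`, and `ROLES_SUGGESTED` are hypothetical names, and the second role set is the one CodeRabbit proposed.

```python
from typing import FrozenSet, Optional

# Statement the workflow requires in the PR body (copied from requiredStatement).
REQUIRED_STATEMENT = (
    "I affirm that all code in every commit of this pull request conforms to "
    "the terms of the Topoteretes Developer Certificate of Origin"
)

# Role set as merged in this PR.
ROLES_AS_MERGED: FrozenSet[str] = frozenset({"OWNER", "MEMBER", "COLLABORATOR", "CONTRIBUTOR"})
# Stricter set from the review suggestion: CONTRIBUTOR only means "has committed before".
ROLES_SUGGESTED: FrozenSet[str] = frozenset({"OWNER", "MEMBER", "COLLABORATOR"})


def dco_check_passes(
    author_association: str, pr_body: Optional[str], trusted_roles: FrozenSet[str]
) -> bool:
    """Return True if the PR would pass the DCO gate (hypothetical model of the workflow).

    Trusted roles skip the check. Everyone else must include the exact
    statement somewhere in the body (a substring match, like prBody.includes).
    """
    if author_association in trusted_roles:
        return True
    # The workflow treats a missing body as '' (context.payload.pull_request.body || '').
    return REQUIRED_STATEMENT in (pr_body or "")


print(dco_check_passes("CONTRIBUTOR", "", ROLES_AS_MERGED))  # True
print(dco_check_passes("CONTRIBUTOR", "", ROLES_SUGGESTED))  # False
```

Under either set, associations such as `FIRST_TIME_CONTRIBUTOR` or `NONE` still fall through to the statement check. The only behavioral difference is whether a returning contributor must include the statement.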
40 changes: 39 additions & 1 deletion .github/workflows/graph_db_tests.yml
@@ -77,7 +77,7 @@ jobs:
     if: ${{ inputs.databases == 'all' || contains(inputs.databases, 'postgres') }}
     services:
       postgres:
-        image: postgres:16
+        image: pgvector/pgvector:pg17
         env:
           POSTGRES_USER: cognee
           POSTGRES_PASSWORD: cognee
@@ -133,6 +133,44 @@ jobs:
           EMBEDDING_API_VERSION: ${{ secrets.EMBEDDING_API_VERSION }}
         run: uv run python ./cognee/tests/e2e/postgres/test_graphdb_postgres.py

+      - name: Run Postgres Hybrid Adapter Tests
+        env:
+          ENV: 'dev'
+          DB_PROVIDER: postgres
+          DB_HOST: localhost
+          DB_PORT: 5432
+          DB_USERNAME: cognee
+          DB_PASSWORD: cognee
+          DB_NAME: cognee_db
+          EMBEDDING_DIMENSIONS: 300
+          EMBEDDING_MODEL: ${{ secrets.EMBEDDING_MODEL }}
+          EMBEDDING_ENDPOINT: ${{ secrets.EMBEDDING_ENDPOINT }}
+          EMBEDDING_API_KEY: ${{ secrets.EMBEDDING_API_KEY }}
+          EMBEDDING_API_VERSION: ${{ secrets.EMBEDDING_API_VERSION }}
+        run: uv run pytest cognee/tests/e2e/postgres/test_postgres_hybrid_adapter.py -v
+
+      - name: Run Postgres Hybrid E2E Test
+        env:
+          ENV: 'dev'
+          USE_UNIFIED_PROVIDER: "pghybrid"
+          DB_PROVIDER: postgres
+          DB_HOST: localhost
+          DB_PORT: 5432
+          DB_USERNAME: cognee
+          DB_PASSWORD: cognee
+          DB_NAME: cognee_db
+          ENABLE_BACKEND_ACCESS_CONTROL: 'false'
+          LLM_MODEL: ${{ secrets.LLM_MODEL }}
+          LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
+          LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
+          LLM_API_VERSION: ${{ secrets.LLM_API_VERSION }}
+          EMBEDDING_DIMENSIONS: 300
+          EMBEDDING_MODEL: ${{ secrets.EMBEDDING_MODEL }}
+          EMBEDDING_ENDPOINT: ${{ secrets.EMBEDDING_ENDPOINT }}
+          EMBEDDING_API_KEY: ${{ secrets.EMBEDDING_API_KEY }}
+          EMBEDDING_API_VERSION: ${{ secrets.EMBEDDING_API_VERSION }}
+        run: uv run python ./cognee/tests/e2e/postgres/test_pghybrid.py
+
   run-neo4j-tests:
     name: Neo4j Tests
     runs-on: ubuntu-22.04
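The hybrid steps pass connection details as separate `DB_*` variables, while the graph factory hands `PostgresAdapter` a single `connection_string` taken from the relational engine's `db_uri`. The minimal sketch below shows how such variables could be assembled into one async SQLAlchemy URL. It assumes the `postgresql+asyncpg` driver, and `build_postgres_url` is a hypothetical helper, not cognee's actual relational-config code.

```python
from typing import Mapping
from urllib.parse import quote_plus


def build_postgres_url(env: Mapping[str, str]) -> str:
    """Assemble a postgresql+asyncpg URL from DB_* settings (hypothetical helper).

    User name and password are percent-encoded so that characters such as
    '@' or ':' cannot break the URL structure.
    """
    user = quote_plus(env["DB_USERNAME"])
    password = quote_plus(env["DB_PASSWORD"])
    host = env.get("DB_HOST", "localhost")
    port = int(env.get("DB_PORT", "5432"))
    name = env["DB_NAME"]
    return f"postgresql+asyncpg://{user}:{password}@{host}:{port}/{name}"


# Values mirror the env blocks of the CI steps above.
ci_env = {
    "DB_HOST": "localhost",
    "DB_PORT": "5432",
    "DB_USERNAME": "cognee",
    "DB_PASSWORD": "cognee",
    "DB_NAME": "cognee_db",
}
print(build_postgres_url(ci_env))
# postgresql+asyncpg://cognee:cognee@localhost:5432/cognee_db
```

Percent-encoding is irrelevant for the fixed `cognee` credentials in CI. It matters once real passwords containing reserved characters are used.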
22 changes: 21 additions & 1 deletion cognee/infrastructure/databases/graph/get_graph_engine.py
@@ -40,6 +40,18 @@ def create_graph_engine(
     Wrapper function to call create graph engine with caching.
     For a detailed description, see _create_graph_engine.
     """
+    # Check USE_UNIFIED_PROVIDER outside the cache so it's always re-read
+    import os
+
+    unified_provider = os.environ.get("USE_UNIFIED_PROVIDER", "")
+    if unified_provider == "pghybrid":
+        from .postgres.adapter import PostgresAdapter
+        from cognee.infrastructure.databases.relational.get_relational_engine import (
+            get_relational_engine,
+        )
+
+        return PostgresAdapter(connection_string=get_relational_engine().db_uri)
+
     return _create_graph_engine(
         graph_database_provider,
         graph_file_path,
@@ -205,7 +217,15 @@ def _create_graph_engine(
             graph_id=graph_identifier,
         )

+    all_providers = list(supported_databases.keys()) + [
+        "neo4j",
+        "kuzu",
+        "kuzu-remote",
+        "postgres",
+        "neptune",
+        "neptune_analytics",
+    ]
     raise EnvironmentError(
         f"Unsupported graph database provider: {graph_database_provider}. "
-        f"Supported providers are: {', '.join(list(supported_databases.keys()) + ['neo4j', 'kuzu', 'kuzu-remote', 'postgres', 'neptune', 'neptune_analytics'])}"
+        f"Supported providers are: {', '.join(all_providers)}"
     )
14 changes: 14 additions & 0 deletions cognee/infrastructure/databases/graph/graph_db_interface.py
@@ -347,3 +347,17 @@ async def set_edge_feedback_weights(
         Returns per-id update success.
         """
         raise NotImplementedError("set_edge_feedback_weights is not implemented for this adapter")
+
+    async def get_triplets_batch(self, offset: int, limit: int) -> List[Dict[str, Any]]:
+        """Retrieve a batch of triplets (source, edge, target).
+
+        Optional extension — implemented by PostgresAdapter, Neo4jAdapter,
+        and KuzuAdapter but not NeptuneGraphDB.
+
+        Parameters:
+        -----------
+
+        - offset: Number of triplets to skip.
+        - limit: Maximum number of triplets to return.
+        """
+        raise NotImplementedError("get_triplets_batch is not implemented for this adapter")
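`get_triplets_batch` is optional, and the base class raises `NotImplementedError`. A caller that wants every triplet therefore has to page with `offset`/`limit` and handle adapters that lack the method. The minimal async sketch below assumes that a batch shorter than `limit` marks the end of the data. `iter_all_triplets`, `collect`, `InMemoryAdapter`, and `NoBatchAdapter` are hypothetical, and the triplet dict keys are illustrative only.

```python
import asyncio
from typing import Any, AsyncIterator, Dict, List


class InMemoryAdapter:
    """Toy adapter holding triplets in a list (hypothetical, for illustration)."""

    def __init__(self, triplets: List[Dict[str, Any]]):
        self._triplets = triplets

    async def get_triplets_batch(self, offset: int, limit: int) -> List[Dict[str, Any]]:
        return self._triplets[offset : offset + limit]


class NoBatchAdapter:
    """Toy adapter that, like the base interface, does not implement the method."""

    async def get_triplets_batch(self, offset: int, limit: int) -> List[Dict[str, Any]]:
        raise NotImplementedError("get_triplets_batch is not implemented for this adapter")


async def iter_all_triplets(adapter: Any, batch_size: int = 100) -> AsyncIterator[Dict[str, Any]]:
    """Yield every triplet by paging; stop after a short (or empty) batch."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    offset = 0
    while True:
        batch = await adapter.get_triplets_batch(offset=offset, limit=batch_size)
        for triplet in batch:
            yield triplet
        if len(batch) < batch_size:
            return
        offset += batch_size


async def collect(adapter: Any, batch_size: int) -> List[Dict[str, Any]]:
    try:
        return [t async for t in iter_all_triplets(adapter, batch_size)]
    except NotImplementedError:
        # Optional method missing; the fallback is caller-specific (here: an empty list).
        return []


data = [{"source": i, "edge": "rel", "target": i + 1} for i in range(5)]
print(len(asyncio.run(collect(InMemoryAdapter(data), 2))))  # 5
```

Offset paging is only consistent if the adapter returns triplets in a stable order across calls. Implementations therefore need a deterministic ordering for this access pattern to be safe.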
Empty file.