-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Release candidate v0.5.6 - dev branch #2507
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
3e121a4
351f4e0
fbe94d4
cc9452a
4b38042
f15d41f
1e42ea0
182ba1b
6cee1e7
f38c8fd
67675b8
b2269f6
bf302c1
32d0903
0839f4d
7f2976f
9c6d617
95cfc36
69d23bc
8b9630d
8bd3590
c630bd8
694814d
4da584c
a612c79
4fa0257
f3d0c00
611df62
e517bf5
37252c4
f481842
9bb69cf
ac87032
c506d3a
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Large diffs are not rendered by default.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,51 @@ | ||
| from typing import List, Union | ||
| from cognee.shared.logging_utils import setup_logging | ||
|
|
||
| logger = setup_logging() | ||
|
|
||
|
|
||
| def is_embeddable(s: str) -> bool: | ||
| """ | ||
| Check if input string is embeddable, if not it will be replaced with a dummy value to prevent API errors. | ||
| Empty strings and a string with only a space character are not embeddable. | ||
| If input string contains at least one alphanumeric character, it is considered embeddable. | ||
| """ | ||
| if not isinstance(s, str): | ||
| return False | ||
| # Strip whitespace to check if the string is empty or only contains spaces | ||
| s = s.strip() | ||
| if len(s) >= 1: | ||
| return True | ||
|
Comment on lines
+8
to
+18
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Align the docstring with the actual embeddable rule. The implementation accepts any non-whitespace string, but Lines 10-11 say the input must contain an alphanumeric character. That already changes how ✏️ Suggested docstring fix- Empty strings and a string with only a space character are not embeddable.
- If input string contains at least one alphanumeric character, it is considered embeddable.
+ Empty or all-whitespace strings are not embeddable.
+ Any non-whitespace string is considered embeddable.🤖 Prompt for AI Agents |
||
| logger.debug( | ||
| "Input string was not embeddable. Skipping embedding and using dummy value instead." | ||
| ) | ||
| return False | ||
|
|
||
|
|
||
| def sanitize_embedding_text_inputs(text: Union[str, List[str]]) -> List[str]: | ||
| """ | ||
| Transform invalid/empty inputs into a safe dummy to prevent API 422 embedding errors while | ||
| keeping list length consistent. | ||
| """ | ||
| # Ensure we are working with a list | ||
| text_list = [text] if isinstance(text, str) else text | ||
| dummy_value = "." | ||
|
|
||
| return [t if is_embeddable(t) else dummy_value for t in text_list] | ||
|
|
||
|
|
||
| def handle_embedding_response( | ||
| original_texts: Union[List[str], str], embeddings: List[List[float]], dimensions: int | ||
| ) -> List[List[float]]: | ||
| """ | ||
| Compare the original input strings against the results. | ||
| If the original string was 'junk' that was not embeddable, overwrite its vector with zeros. | ||
| """ | ||
| if isinstance(original_texts, str): | ||
| original_texts = [original_texts] | ||
|
|
||
| zero_vector = [0.0] * dimensions | ||
| return [ | ||
| embeddings[i] if is_embeddable(original_texts[i]) else zero_vector | ||
| for i in range(len(original_texts)) | ||
| ] | ||
|
Comment on lines
+37
to
+51
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Don't assume
🛠️ Proposed fix-from typing import List, Union
+from typing import List, Optional, Union
@@
def handle_embedding_response(
- original_texts: Union[List[str], str], embeddings: List[List[float]], dimensions: int
+ original_texts: Union[List[str], str],
+ embeddings: List[List[float]],
+ dimensions: Optional[int],
) -> List[List[float]]:
@@
if isinstance(original_texts, str):
original_texts = [original_texts]
+ if len(embeddings) != len(original_texts):
+ raise ValueError(
+ f"Expected {len(original_texts)} embeddings, received {len(embeddings)}."
+ )
- zero_vector = [0.0] * dimensions
+ vector_size = dimensions if dimensions is not None else len(embeddings[0]) if embeddings else 0
return [
- embeddings[i] if is_embeddable(original_texts[i]) else zero_vector
+ embeddings[i] if is_embeddable(original_texts[i]) else [0.0] * vector_size
for i in range(len(original_texts))
]As per coding guidelines, "Prefer explicit, structured error handling in Python code." 🤖 Prompt for AI Agents |
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Critical: Pin to cognee==0.5.6 to match the main package version.
This line pins
cogneeto version 0.5.5, but the main package inpyproject.tomlline 4 is being bumped to 0.5.6 in this same PR. This creates a version skew where:For a consistent release, this should be updated to
cognee[postgres,docs,neo4j]==0.5.6.📦 Proposed fix to align versions
After making this change, regenerate the lock file:
📝 Committable suggestion
🤖 Prompt for AI Agents