
dice-group/ASTRA


ASTRA: Adaptive Structure-Aware Post-Hoc Alignment of Knowledge Graph Embeddings

ASTRA is a post-hoc framework for aligning entity embeddings produced by independently trained Knowledge Graph Embedding (KGE) models.

It lets embeddings from different knowledge graphs (KGs) be used in downstream tasks such as entity alignment and link prediction, without joint training or KG merging.


Motivation

Traditional alignment methods (e.g., MTransE, BootEA, KDCoE) have several limitations:

  • Require joint training
  • Need merged graphs
  • Do not scale well to large KGs

ASTRA's Approach

ASTRA follows a different paradigm:

  • Align embeddings after training (post-hoc)
  • Preserve original semantic information
  • Inject graph structure via R-GCN
  • Learn non-linear alignment

Code and Data

Embeddings: Trained using the DICE Embedding Framework

Each embedding directory should contain:

model.pt
entity_to_idx.p
relation_to_idx.p
configuration.json
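
Before running anything, it can help to confirm that an embedding directory contains the four files listed above. The following is a minimal sketch, not part of ASTRA itself; the helper name `missing_embedding_files` is hypothetical.

```python
from pathlib import Path

# File names this README lists for each embedding directory.
REQUIRED_FILES = (
    "model.pt",
    "entity_to_idx.p",
    "relation_to_idx.p",
    "configuration.json",
)


def missing_embedding_files(directory):
    """Return the required file names that are absent from `directory`."""
    root = Path(directory)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

An empty list means the directory has everything the README asks for; any returned names are the files still to be produced by the embedding training step.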

Datasets: OpenEA benchmark

Zenodo – DBpedia-Wikidata

Required Files

Triples

rel_triples_1_train.txt
rel_triples_2_train.txt
rel_triples_test_merged.txt (merged test triples from both datasets)
rel_triples_train_merged.txt (merged train triples from both datasets)

Alignment Links

train_links
valid_links
test_links
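
To sanity-check the triple and link files before a run, a small reader such as the one below may be useful. It is a sketch under one assumption: that the files are tab-separated, with three fields per line for triples (head, relation, tail) and two per line for links (entity in KG1, entity in KG2). The function names are hypothetical and are not taken from the ASTRA code.

```python
def read_tsv_rows(path, width):
    """Read a tab-separated file and return its rows as tuples of `width` fields.

    Assumes one record per line; blank lines are skipped.
    """
    rows = []
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.rstrip("\r\n")
            if not line:
                continue
            fields = line.split("\t")
            if len(fields) != width:
                raise ValueError(
                    f"{path}:{line_number}: expected {width} fields, got {len(fields)}"
                )
            rows.append(tuple(fields))
    return rows


def load_triples(path):
    """Load (head, relation, tail) triples, assuming three tab-separated fields."""
    return read_tsv_rows(path, 3)


def load_links(path):
    """Load (kg1_entity, kg2_entity) pairs, assuming two tab-separated fields."""
    return read_tsv_rows(path, 2)
```

A malformed line raises a `ValueError` that names the file and line number, which tends to be easier to debug than a failure deep inside the pipeline.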

Requirements

Install dependencies using:

pip install -r requirements.txt

Installation Steps

  1. Train KGE embeddings. Train entity and relation embeddings using the DICE Embedding Framework (or any compatible KGE model such as TransE or ComplEx).

  2. Prepare datasets. Download or prepare:

    • Knowledge graph triples (train/test)
    • Alignment links (train / validation / test)

    Supported sources include:

    • OpenEA benchmark datasets
    • DBpedia–Wikidata datasets

Run Pipeline

python3 -m modules.pipeline \
  --directory_1 <KG1_embeddings> \
  --directory_2 <KG2_embeddings> \
  --train_triples_path_1 <KG1_train_triples> \
  --train_triples_path_2 <KG2_train_triples> \
  --test_triples_path <merged_test_triples> \
  --triple_paths <merged_train_triples> \
  --train_links <train_links> \
  --val_links <validation_links> \
  --test_links <test_links> \
  --output_dir <output_directory>
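
When the pipeline is launched from a script, for example to sweep over several dataset pairs, the same command can be assembled in Python. The sketch below uses only the flags shown above; the helper `build_pipeline_command` and the `paths` dictionary are hypothetical conveniences, not part of ASTRA.

```python
import sys

# Flags accepted by `modules.pipeline`, in the order shown in this README.
PIPELINE_FLAGS = (
    "directory_1",
    "directory_2",
    "train_triples_path_1",
    "train_triples_path_2",
    "test_triples_path",
    "triple_paths",
    "train_links",
    "val_links",
    "test_links",
    "output_dir",
)


def build_pipeline_command(paths, python=sys.executable):
    """Build the argument list for `python -m modules.pipeline`.

    `paths` maps each flag name (without the leading dashes) to its value.
    """
    missing = [flag for flag in PIPELINE_FLAGS if flag not in paths]
    if missing:
        raise KeyError(f"missing pipeline arguments: {', '.join(missing)}")
    command = [python, "-m", "modules.pipeline"]
    for flag in PIPELINE_FLAGS:
        command += [f"--{flag}", str(paths[flag])]
    return command
```

The returned list can be passed to `subprocess.run(command, check=True)` from the repository root, so that a non-zero exit status from the pipeline raises an exception instead of passing silently.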

What the Pipeline Does

The pipeline performs the following steps:

  1. Loads pretrained embeddings for both KGs
  2. Loads triples and alignment links
  3. Builds a merged graph structure
  4. Computes R-GCN structural embeddings
  5. Applies adaptive fusion (structure + base embeddings)
  6. Trains the alignment model
  7. Evaluates entity alignment (Hits@k, MRR)
  8. Injects aligned embeddings into a KGE model
  9. Performs fine-tuning (KvsAll training)
  10. Evaluates link prediction performance
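
For readers unfamiliar with the metrics in step 7, the sketch below shows how Hits@k and MRR are conventionally computed from a similarity matrix. It illustrates the standard definitions and is not ASTRA's evaluation code; the function name, the input layout (the true match for source entity `i` sits in column `i`), and the optimistic handling of ties are all assumptions made for this example.

```python
def alignment_metrics(scores, ks=(1, 10)):
    """Compute Hits@k and MRR from a square similarity matrix.

    scores[i][j] is the similarity between source entity i and target
    candidate j. The correct target for source i is assumed to be column i.
    """
    ranks = []
    for i, row in enumerate(scores):
        true_score = row[i]
        # Rank = 1 + number of candidates scoring strictly higher than the
        # true match (ties are resolved in favour of the true match).
        ranks.append(1 + sum(1 for score in row if score > true_score))

    count = len(ranks)
    metrics = {f"hits@{k}": sum(rank <= k for rank in ranks) / count for k in ks}
    metrics["mrr"] = sum(1.0 / rank for rank in ranks) / count
    return metrics
```

Hits@k is the fraction of source entities whose true counterpart appears among the top k candidates, and MRR is the mean of the reciprocal ranks, so both lie in (0, 1] and higher is better.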


Output

Results are saved in:

output_dir/
├── alignment_results.json
├── link_prediction_results.json
├── aligned_embeddings/        # aligned entity embeddings
└── fine_tuned_model/          # final fine-tuned KGE model
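
The two JSON result files can be read back with the standard library. The sketch below only parses them; this README does not document the keys inside each file, so none are assumed here, and the helper name `load_results` is hypothetical.

```python
import json
from pathlib import Path

# Result file names listed in this README.
RESULT_FILES = ("alignment_results.json", "link_prediction_results.json")


def load_results(output_dir):
    """Return the parsed contents of whichever result files exist.

    Keys of the returned dict are the file names without the `.json` suffix.
    """
    results = {}
    for name in RESULT_FILES:
        path = Path(output_dir) / name
        if path.is_file():
            with path.open(encoding="utf-8") as handle:
                results[path.stem] = json.load(handle)
    return results
```

Files that are missing, for instance because a run stopped before link prediction, are simply left out of the returned dictionary.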
