InfoHarvest

A simple contact info scraping tool written in Python and available as an executable file. It has a CLI & a GUI. This scripts main function is also integrated into ClatScope Info Tool.

Website Contact Info Scraper

#

Overview

This Python script is a comprehensive tool designed to extract and display contact information from websites. It retrieves phone numbers, fax numbers, email addresses, and social media profile URLs directly from a given website's HTML content.

Purpose

The primary use of this script is for Open Source Intelligence (OSINT), digital marketing, customer relations management, competitive analysis, or investigative tasks that require identification and cataloguing of contact details from various online sources.

Key Features

Automatic Dependency Installation: Automatically checks and installs required Python libraries (requests, BeautifulSoup).
Detailed Information Extraction: Precisely extracts phone numbers, fax numbers, emails, and social media URLs.
Regular Expression-Based Parsing: Uses robust regex patterns to reliably identify contact information.
Contextual Identification: Differentiates between phone and fax numbers by context within the webpage.
Duplicate and Spam Reduction: Filters redundant or repetitive contact information.

Usage

Requirements

Python 3.x installed on your system.

Installation

Save the script file to your local machine.
Ensure Python is installed and available via the command line.

Execution

Run the script from a command line or terminal

Interaction

Upon execution, the script prompts the user for a full URL.
Enter the complete URL (e.g., https://example.com) to scrape contact details.
Pressing Enter without entering a URL will terminate the script gracefully.

Output

The extracted contact information is clearly formatted as JSON:

{
  "Phone numbers": "555-123-4567, 555-765-4321",
  "Fax number": "555-111-2222",
  "Email addresses": "info@example.com, contact@example.org",
  "Social media profiles": "https://twitter.com/example, https://linkedin.com/company/example"
}

Error Handling

The script includes basic error handling and provides feedback if the URL is invalid or inaccessible.

Limitations

Relies on publicly visible HTML content; will not extract data hidden behind JavaScript-heavy pages or authenticated portals.
Sensitive to changes in website HTML structures.

License

This project is licensed under the Apache 2.0 open source license.

Author

Joshua M Clatney (Clats97)

Ethical Pentesting Enthusiast

DISCLAIMER: This project comes with no warranty, express or implied. The author is not responsible for abuse, misuse, or vulnerabilities. Please use responsibly and ethically in accordance with relevant laws, regulations, legislation and best practices.

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
InfoHarvest CLI (1.00).py		InfoHarvest CLI (1.00).py
InfoHarvest CLI.exe		InfoHarvest CLI.exe
InfoHarvest GUI (1.00).py		InfoHarvest GUI (1.00).py
InfoHarvest GUI.exe		InfoHarvest GUI.exe
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

InfoHarvest

Website Contact Info Scraper

Overview

Purpose

Key Features

Usage

Requirements

Installation

Execution

Interaction

Output

Error Handling

Limitations

License

Author

About

Uh oh!

Releases 1

Packages

Uh oh!

Contributors 1

Languages

Folders and files

Latest commit

History

Repository files navigation

InfoHarvest

Website Contact Info Scraper

Overview

Purpose

Key Features

Usage

Requirements

Installation

Execution

Interaction

Output

Error Handling

Limitations

License

Author

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors 1

Languages

Packages