SemanticNetworkTool

A Python-based tool for analyzing keyword co-occurrence networks in social media text corpora. Enables semantic relationship mapping, network visualization, and insight extraction from unstructured data.

Status: open source, under development

Overview

SemanticNetworkTool analyzes co-occurrence networks of keywords extracted from posts on the Bluesky social media platform. It collects data using the Bluesky API, constructs a network graph based on keyword co-occurrences, computes various network metrics, and visualizes the results.

The aim is to explore the relationships between keywords and identify communities within the network, providing insights into trending topics and their interconnections.

Features

  • Automated Data Collection: Fetch posts from Bluesky API with customizable keyword filters
  • Network Analysis: Compute comprehensive local and global network metrics using NetworkX
  • Community Detection: Identify thematic clusters using the Louvain algorithm
  • Visualization: Generate network graphs and metric distributions
  • Run Archiving: Automatic versioning of analysis runs with UUID-based directories
  • Multiple Analysis Modes: Run different keyword configurations in parallel
  • Docker Support: Containerized environment for reproducible analysis
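As an illustration of the run-archiving idea, here is a minimal sketch of UUID-based run directories using only the standard library. The function name `create_run_dir` and the directory layout are assumptions made for this example, not the tool's actual code.

```python
import tempfile
import uuid
from pathlib import Path


def create_run_dir(base_dir):
    """Create a fresh directory named after a random UUID (hypothetical helper)."""
    run_id = uuid.uuid4().hex  # 32 hex characters, e.g. "3f2b..."
    run_dir = Path(base_dir) / run_id
    # exist_ok=False: fail loudly rather than overwrite an earlier run
    run_dir.mkdir(parents=True, exist_ok=False)
    return run_dir


# Usage: two runs under the same base directory never share a folder.
with tempfile.TemporaryDirectory() as base:
    first = create_run_dir(base)
    second = create_run_dir(base)
    print(first.name != second.name)
```

Because every run gets its own directory, outputs from earlier analyses are preserved instead of being overwritten.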

Network Metrics

Local Metrics (Per Node)

  • Degree distribution: Number of connections per keyword
  • Strength distribution: Sum of edge weights per keyword
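  • Betweenness centrality: How often a keyword lies on the shortest paths between other keywords, i.e. its importance as a bridge
  • Closeness centrality: Inverse of the average shortest-path distance to all other keywords (higher means more central)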
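The local metrics above can be sketched with NetworkX roughly as follows. The edge list is made-up example data, and whether the tool uses edge weights for betweenness and closeness is not stated, so this sketch assumes the unweighted versions (NetworkX interprets a weight as a distance, which would invert the meaning of a co-occurrence count).

```python
import networkx as nx

# Hypothetical co-occurrence counts: (keyword_a, keyword_b, weight)
edges = [
    ("ai", "ethics", 3),
    ("ai", "privacy", 1),
    ("ethics", "privacy", 2),
    ("privacy", "law", 4),
]

G = nx.Graph()
G.add_weighted_edges_from(edges)

# Degree: number of distinct keywords each keyword co-occurs with
degree = dict(G.degree())
# Strength: sum of co-occurrence weights on a keyword's edges
strength = dict(G.degree(weight="weight"))
# Betweenness and closeness, computed on the unweighted topology (assumption)
betweenness = nx.betweenness_centrality(G)
closeness = nx.closeness_centrality(G)

for node in G.nodes:
    print(node, degree[node], strength[node],
          round(betweenness[node], 3), round(closeness[node], 3))
```

In this toy graph "privacy" is the only keyword that bridges "law" to the rest, so it is the only node with non-zero betweenness.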

Global Metrics (Entire Network)

  • Average degree: Mean number of connections
  • Graph density: Ratio of actual edges to possible edges
  • Global clustering coefficient: Overall tendency of keywords to form tightly connected groups (triangles)
  • Graph diameter: Length of the longest shortest path between any two keywords
  • Modularity value: Quality measure of community structure
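A minimal sketch of the global metrics with NetworkX, under a few assumptions: the example edges are invented, the clustering coefficient is taken to be transitivity (the tool may use average clustering instead), the diameter is computed on the largest connected component because it is undefined for disconnected graphs, and the Louvain seed is fixed only to make the example repeatable.

```python
import networkx as nx
from networkx.algorithms import community

# Hypothetical co-occurrence counts: (keyword_a, keyword_b, weight)
edges = [
    ("ai", "ethics", 3),
    ("ai", "privacy", 1),
    ("ethics", "privacy", 2),
    ("privacy", "law", 4),
]
G = nx.Graph()
G.add_weighted_edges_from(edges)

# Average degree: every edge contributes to the degree of two nodes
avg_degree = 2 * G.number_of_edges() / G.number_of_nodes()
# Density: actual edges divided by the n * (n - 1) / 2 possible edges
density = nx.density(G)
# Global clustering coefficient as transitivity (assumption)
clustering = nx.transitivity(G)
# Diameter on the largest connected component (assumption)
largest_cc = G.subgraph(max(nx.connected_components(G), key=len))
diameter = nx.diameter(largest_cc)
# Louvain communities and the modularity of that partition
communities = community.louvain_communities(G, weight="weight", seed=42)
modularity = community.modularity(G, communities, weight="weight")

print(avg_degree, round(density, 3), round(clustering, 3), diameter)
print(communities, round(modularity, 3))
```

`louvain_communities` requires NetworkX 2.8 or later; older setups often relied on the separate `python-louvain` package instead.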

Analysis Workflow

  1. Data Collection: Posts are fetched from Bluesky API, filtered by configurable keywords (case-insensitive)
  2. Co-occurrence Analysis: For each post, the script counts keyword pairs that appear together. Co-occurring relationships are logged and weighted by frequency
  3. Network Building: Keywords become nodes; edges connect pairs that co-occur, weighted by frequency. Strong connections indicate frequently co-discussed topics
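The co-occurrence step can be sketched in a few lines of standard-library Python. This is an illustration, not the project's code: the function name `count_cooccurrences`, the sample posts, and the whole-word matching rule are assumptions (the tool may match keywords differently, for example by substring or for multi-word phrases).

```python
import re
from collections import Counter
from itertools import combinations


def count_cooccurrences(posts, keywords):
    """Count how many posts contain each unordered pair of keywords."""
    wanted = {k.lower() for k in keywords}  # case-insensitive matching
    counts = Counter()
    for text in posts:
        tokens = set(re.findall(r"\w+", text.lower()))
        present = sorted(wanted & tokens)  # sorted so (a, b) == (b, a)
        counts.update(combinations(present, 2))
    return counts


posts = [
    "AI ethics and privacy are linked",
    "New privacy law announced",
    "AI and Privacy again",
]
edge_weights = count_cooccurrences(posts, ["AI", "ethics", "privacy", "law"])

for (a, b), weight in sorted(edge_weights.items()):
    print(a, b, weight)
```

Each key of `edge_weights` is an edge between two keyword nodes and each value is its weight, so the result can be fed directly into a graph library, e.g. `G.add_weighted_edges_from((a, b, w) for (a, b), w in edge_weights.items())` in NetworkX.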

Outputs

Each analysis run generates:

  • bluesky_posts_complex.csv: Collected posts with metadata
  • keyword_network_edges.txt: Edge list with co-occurrence weights
  • node_metrics.csv: Per-node metrics (degree, strength, betweenness, closeness, community)
  • community_assignments.csv: Community detection results
  • global_metrics.csv: Global network metrics summary
  • keyword_network.graphml: Graph format for Gephi/Cytoscape
  • keyword_network.png: Network visualization (spring layout)
  • keyword_network_circular.png: Network visualization (circular layout)
  • network_metrics.png: Metrics histograms
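For reference, a weighted graph can be exported to GraphML and a plain edge list with NetworkX as sketched below. The file names match the list above, but the exact columns and layout the tool writes are not documented here, so the formats produced by this sketch are assumptions.

```python
import tempfile
from pathlib import Path

import networkx as nx

# Hypothetical co-occurrence graph
G = nx.Graph()
G.add_weighted_edges_from([
    ("ai", "ethics", 3),
    ("ai", "privacy", 1),
    ("ethics", "privacy", 2),
])

with tempfile.TemporaryDirectory() as out_dir:
    graphml_path = Path(out_dir) / "keyword_network.graphml"
    edges_path = Path(out_dir) / "keyword_network_edges.txt"

    # GraphML keeps node names and edge weights; Gephi and Cytoscape can open it
    nx.write_graphml(G, str(graphml_path))
    # One "source target weight" line per edge
    nx.write_weighted_edgelist(G, str(edges_path))

    # Read the GraphML file back to confirm the round trip
    reloaded = nx.read_graphml(str(graphml_path))
    edge_lines = edges_path.read_text().splitlines()

print(reloaded.number_of_nodes(), reloaded.number_of_edges(), len(edge_lines))
```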

Quick Start

docker compose up --build

Local Development

python -m venv venv
.\venv\Scripts\Activate.ps1
pip install -r scripts/requirements.txt
cd scripts
python main.py

Requires Python 3.11+ and Bluesky API credentials in a .env file.