Monitoring

Benjamin Paige edited this page Sep 16, 2025 · 4 revisions

Appian Connector Monitoring and Alerting


Monitoring Architecture

graph TB
    subgraph "Data Sources"
        CONNECTOR[Kafka Connect Container<br/>ECS Fargate logs]
        LAMBDA[Lambda Functions<br/>Function logs & metrics]
        ORACLE[Oracle Database<br/>Connection metrics]
    end

    subgraph "Monitoring Stack"
        CW_LOGS[CloudWatch Logs<br/>Centralized logging]
        CW_METRICS[CloudWatch Metrics<br/>Performance data]
        CW_ALARMS[CloudWatch Alarms<br/>Threshold monitoring]
        DASHBOARD[Custom Dashboard<br/>appian-connector-{stage}]
    end

    subgraph "Alerting"
        SNS[SNS Topics<br/>Failure notifications]
        SLACK[Slack Integration<br/>#cms-bigmac channel]
        EMAIL[Email Alerts<br/>Operations team]
    end

    CONNECTOR --> CW_LOGS
    LAMBDA --> CW_LOGS
    LAMBDA --> CW_METRICS
    
    CW_METRICS --> CW_ALARMS
    CW_METRICS --> DASHBOARD
    
    CW_ALARMS --> SNS
    SNS --> SLACK
    SNS --> EMAIL

    classDef source fill:#e1f5fe,stroke:#01579b,stroke-width:2px
    classDef monitor fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
    classDef alert fill:#ffebee,stroke:#d32f2f,stroke-width:2px

    class CONNECTOR,LAMBDA,ORACLE source
    class CW_LOGS,CW_METRICS,CW_ALARMS,DASHBOARD monitor
    class SNS,SLACK,EMAIL alert

CloudWatch Monitoring

Log Groups

| Log Group | Purpose | Retention | Expected Volume |
|---|---|---|---|
| /aws/ecs/appian-connector-{stage} | Kafka Connect container logs | 30 days | Medium |
| /aws/lambda/appian-connector-{stage}-configureConnectors | Connector configuration | 7 days | Low |
| /aws/lambda/appian-connector-{stage}-testConnectors | Health check results | 7 days | High |
| /aws/lambda/appian-connector-{stage}-createTopics | Topic management | 7 days | Low |
| /aws/lambda/appian-connector-{stage}-cleanupKafka | Resource cleanup | 7 days | Low |

Key Metrics

Connector Performance Metrics

graph LR
    subgraph "JDBC Connector Metrics"
        CONN_STATUS[Connector Status<br/>0 = ✅ Running<br/>1 = ❌ Failed]
        PKG_RATE[Package Processing Rate<br/>Packages/minute from MCP_SPA_PCKG]
        POLL_TIME[Oracle Poll Duration<br/>Query execution time]
        ERROR_RATE[Error Rate<br/>Failed polls/total polls]
    end
    
    subgraph "Oracle Database Metrics"
        DB_CONN[Active Connections<br/>Current Oracle connections]
        QUERY_TIME[Query Response Time<br/>Average Oracle query duration]
        RECORD_COUNT[Record Count<br/>Packages processed per poll]
    end
    
    subgraph "ECS Container Metrics"
        CPU_UTIL[CPU Utilization<br/>Container CPU usage]
        MEM_UTIL[Memory Utilization<br/>Container memory usage]
        NETWORK[Network I/O<br/>Oracle + Kafka traffic]
    end

Custom Metrics Published

// Published by testConnectors Lambda every minute
const packageDataMetrics = {
  [`${connectorName}_failures`]: connectorStatus === 'RUNNING' ? 0 : 1,
  [`${connectorName}_task_failures`]: allTasksRunning ? 0 : 1,
  [`${connectorName}_package_processing_rate`]: packagesPerMinute,
  [`${connectorName}_oracle_query_duration`]: avgOracleQueryTimeMs,
  [`${connectorName}_connection_pool_active`]: activeOracleConnections
};
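
Before a metric map like the one above can be published, it has to be converted into the `MetricData` array that CloudWatch's PutMetricData operation expects. The following is a minimal sketch of that conversion, not the Lambda's actual code: `toMetricData` is a hypothetical helper name, and the `'None'` unit is an assumption because this page does not state units.

```javascript
// Hypothetical helper: convert a { metricName: value } map into the
// MetricData array shape accepted by CloudWatch PutMetricData.
function toMetricData(metrics, timestamp = new Date()) {
  return Object.entries(metrics)
    // Drop anything that is not a finite number; CloudWatch rejects NaN and Infinity.
    .filter(([, value]) => Number.isFinite(value))
    .map(([name, value]) => ({
      MetricName: name,
      Value: value,
      Unit: 'None', // assumption: units are not documented on this page
      Timestamp: timestamp,
    }));
}

const connectorName = 'source.jdbc.appian-connector-dbo-1';
const sample = toMetricData({
  [`${connectorName}_failures`]: 0,
  [`${connectorName}_task_failures`]: 1,
  [`${connectorName}_package_processing_rate`]: undefined, // dropped by the filter
});
console.log(sample.map((m) => `${m.MetricName}=${m.Value}`));
```

The resulting array would be sent together with the `appian-connector-{stage}` namespace. Filtering non-numeric values first keeps one missing measurement from failing the whole batch.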

CloudWatch Alarms

Critical Alarms

| Alarm Name | Threshold | Purpose | Response |
|---|---|---|---|
| appian-connector-{stage}-connector-failure | 1 failure | JDBC connector stopped | Immediate restart |
| appian-connector-{stage}-task-failure | 1 task failure | Connector task crashed | ECS service restart |
| appian-connector-{stage}-oracle-connection-failure | Connection lost | Oracle database unreachable | Check database + network |
| appian-connector-{stage}-ecs-service-failure | ECS service stopped | Container orchestration failure | Check ECS cluster health |

Warning Alarms

| Alarm Name | Threshold | Purpose | Response |
|---|---|---|---|
| appian-connector-{stage}-processing-rate-low | <5 packages/min for 10 min | Package processing issues | Investigate Oracle performance |
| appian-connector-{stage}-oracle-query-slow | >30 seconds avg | Oracle query performance | Check database load |
| appian-connector-{stage}-ecs-cpu-high | >80% CPU for 5 min | Resource pressure | Scale up ECS task |
| appian-connector-{stage}-ecs-memory-high | >85% memory for 5 min | Memory pressure | Increase task memory |
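
The warning thresholds above can also be checked locally against a single metrics snapshot, for example when deciding whether a reading would trip an alarm. This is an illustrative sketch only: the snapshot field names (`packagesPerMinute`, `avgOracleQuerySeconds`, `cpuPercent`, `memoryPercent`) and the `breachedWarnings` helper are assumptions, and the evaluation windows ("for 10 min", "for 5 min") are not modelled.

```javascript
// Warning thresholds copied from the table above.
// Assumption: the snapshot field names below are illustrative, not real metric names.
const WARNING_THRESHOLDS = [
  { alarm: 'processing-rate-low', metric: 'packagesPerMinute', breached: (v) => v < 5 },
  { alarm: 'oracle-query-slow', metric: 'avgOracleQuerySeconds', breached: (v) => v > 30 },
  { alarm: 'ecs-cpu-high', metric: 'cpuPercent', breached: (v) => v > 80 },
  { alarm: 'ecs-memory-high', metric: 'memoryPercent', breached: (v) => v > 85 },
];

// Return the full alarm names whose threshold is crossed by this snapshot.
// Metrics missing from the snapshot are ignored rather than treated as breaches.
function breachedWarnings(stage, snapshot) {
  return WARNING_THRESHOLDS
    .filter((t) => typeof snapshot[t.metric] === 'number' && t.breached(snapshot[t.metric]))
    .map((t) => `appian-connector-${stage}-${t.alarm}`);
}

const breached = breachedWarnings('dev', {
  packagesPerMinute: 3,
  avgOracleQuerySeconds: 12,
  cpuPercent: 91,
  memoryPercent: 85, // exactly 85 is not above the >85% threshold
});
console.log(breached);
```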

Health Checks

Automated Health Monitoring

testConnectors Lambda Validation

// Comprehensive MACPRO package data health check (runs every minute)
const macproHealthChecks = {
  connectorHealth: async () => {
    const status = await fetch(`http://${ecsIP}:8083/connectors/source.jdbc.appian-connector-dbo-1/status`);
    const connectorData = await status.json();
    return connectorData.connector.state === 'RUNNING';
  },
  
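  taskHealth: async () => {
    // Task state is reported by /status; the /tasks endpoint returns task configs without state
    const status = await fetch(`http://${ecsIP}:8083/connectors/source.jdbc.appian-connector-dbo-1/status`);
    const statusData = await status.json();
    // An empty task list must not count as healthy ([].every() is always true)
    return statusData.tasks.length > 0 && statusData.tasks.every(task => task.state === 'RUNNING');
  },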
  
  oracleConnectivity: async () => {
    // Test Oracle database connectivity
    const testQuery = "SELECT 1 FROM appian_schema.MCP_SPA_PCKG WHERE ROWNUM <= 1";
    return await testOracleQuery(testQuery);
  },
  
  recentPackageActivity: async () => {
    // Check for recent package processing
    const status = await fetch(`http://${ecsIP}:8083/connectors/source.jdbc.appian-connector-dbo-1/status`);
    const connectorStatus = await status.json();
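    // Note: Kafka Connect populates `trace` only for failed tasks, so parseLastPollTime must handle undefined
    const lastPollTime = parseLastPollTime(connectorStatus.tasks[0].trace);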
    return (Date.now() - lastPollTime) < 5 * 60 * 1000; // Within 5 minutes
  }
};
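
A map of named async checks like the one above still needs a runner that executes every check and records a failure when one throws. The following is a minimal sketch of such a runner under stated assumptions: `runHealthChecks` is a hypothetical name, and the stub checks stand in for the real connector and Oracle calls.

```javascript
// Run every check in a { name: asyncFn } map and report pass/fail per check.
// A check that throws or rejects is recorded as failed instead of aborting the run.
async function runHealthChecks(checks) {
  const names = Object.keys(checks);
  const settled = await Promise.allSettled(
    // Promise.resolve().then(...) also captures synchronous throws.
    names.map((name) => Promise.resolve().then(() => checks[name]()))
  );
  const results = {};
  settled.forEach((outcome, i) => {
    results[names[i]] = outcome.status === 'fulfilled' && outcome.value === true;
  });
  return { healthy: Object.values(results).every(Boolean), results };
}

// Stub checks (assumptions) standing in for the real network and database calls.
const stubChecks = {
  connectorHealth: async () => true,
  taskHealth: async () => false,
  oracleConnectivity: async () => { throw new Error('connection refused'); },
};

runHealthChecks(stubChecks).then((report) => console.log(report));
```

Using `Promise.allSettled` rather than `Promise.all` means one unreachable dependency does not hide the results of the other checks.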

Manual Health Verification

Comprehensive Health Check Script

#!/bin/bash
# Appian Connector Comprehensive Health Check

STAGE="${1:?Usage: $0 <stage>}"  # abort with a usage message when no stage is given
echo "Appian Connector Health Check - Stage: $STAGE"

# 1. Check ECS service status
echo "=== ECS Service Status ==="
aws ecs describe-services --cluster appian-connector-$STAGE-connect \
  --services kafka-connect \
  --query 'services[0].{Service:serviceName,Status:status,Running:runningCount,Desired:desiredCount}'

# 2. Check connector status
echo "=== JDBC Connector Status ==="
CONN_IP=$(aws ecs describe-tasks --cluster appian-connector-$STAGE-connect \
  --tasks $(aws ecs list-tasks --cluster appian-connector-$STAGE-connect --query 'taskArns[0]' --output text) \
  --query 'tasks[0].attachments[0].details[?name==`privateIPv4Address`].value' --output text)

# --output text prints "None" when no task or IP is found
if [ -n "$CONN_IP" ] && [ "$CONN_IP" != "None" ]; then
  echo "Available connectors:"
  curl -s "http://$CONN_IP:8083/connectors" | jq .
  echo "Appian connector status:"
  curl -s "http://$CONN_IP:8083/connectors/source.jdbc.appian-connector-dbo-1/status" | jq .
else
  echo "Connector container not available"
fi

# 3. Check recent metrics
echo "=== Recent Performance Metrics ==="
aws cloudwatch get-metric-statistics \
  --namespace "appian-connector-$STAGE" \
  --metric-name "source.jdbc.appian-connector-dbo-1_failures" \
  --start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%S) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
  --period 300 --statistics Sum

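# 4. Verify recent package data processing
# (AWS CLI v2 treats --payload as base64 by default; add --cli-binary-format raw-in-base64-out when passing raw JSON)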
echo "=== Recent Package Data ==="
aws lambda invoke --function-name {bigmac-debugger} \
  --payload '{"topic":"aws.appian.cmcs.MCP_SPA_PCKG","numRecords":1}' \
  --region us-east-1 /dev/stdout | jq .

echo "Appian Connector health check complete!"

Log Analysis

Important Log Patterns

Successful Operation Patterns

# CloudWatch log groups are not local files: export the last hour of connector logs, then grep the export
aws logs filter-log-events --log-group-name "/aws/ecs/appian-connector-{stage}" \
  --start-time $(date -d '1 hour ago' +%s)000 --query 'events[].[message]' --output text > connector.log
# Look for these patterns in connector logs:
grep "Successfully started" connector.log
grep "source.jdbc.appian-connector-dbo-1.*RUNNING" connector.log
grep "Finished executing.*query" connector.log

Error Patterns

# These searches use the connector.log export created above

# Oracle connectivity issues
grep -i "oracle.*error\|connection.*failed\|timeout\|refused" connector.log

# JDBC connector errors
grep -i "jdbc.*error\|sql.*exception\|numeric.*error" connector.log

# Kafka connectivity issues
grep -i "kafka.*error\|broker.*connection\|ssl.*error" connector.log
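
The grep patterns above can also be applied programmatically, for example to count errors per category across a batch of log lines. This is a rough sketch: the regular expressions mirror the grep patterns, while `countErrorsByCategory` and the sample lines are made up for illustration.

```javascript
// Case-insensitive equivalents of the three grep error patterns above.
const ERROR_CATEGORIES = [
  { category: 'oracle', pattern: /oracle.*error|connection.*failed|timeout|refused/i },
  { category: 'jdbc', pattern: /jdbc.*error|sql.*exception|numeric.*error/i },
  { category: 'kafka', pattern: /kafka.*error|broker.*connection|ssl.*error/i },
];

// Count how many lines match each category; one line may count in several categories.
function countErrorsByCategory(lines) {
  const counts = { oracle: 0, jdbc: 0, kafka: 0 };
  for (const line of lines) {
    for (const { category, pattern } of ERROR_CATEGORIES) {
      if (pattern.test(line)) counts[category] += 1;
    }
  }
  return counts;
}

// Made-up sample lines, not real connector output.
const sampleLines = [
  'ERROR Oracle listener error: ORA-12541',
  'WARN Connection to broker failed after timeout',
  'java.sql.SQLException: Numeric Overflow',
  'INFO Finished executing query',
];
const errorCounts = countErrorsByCategory(sampleLines);
console.log(errorCounts);
```

Note that the Oracle pattern is broad: any line containing "timeout" or "connection ... failed" is counted there, even when the failing connection is to a Kafka broker.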

CloudWatch Insights Queries

Package Data Processing Analysis

# MACPRO package processing performance
fields @timestamp, @message
| filter @message like /source.jdbc.appian-connector-dbo-1/
| filter @message like /processing.*records|query.*completed/
| sort @timestamp desc
| limit 100

# Oracle connection analysis
fields @timestamp, @message
| filter @message like /oracle|jdbc|connection/
| filter @message like /MCP_SPA_PCKG/
| sort @timestamp desc
| limit 50

# Error pattern analysis
fields @timestamp, @message
| filter @message like /ERROR|WARN|Exception/
| filter @message like /appian|connector/
| stats count() by bin(5m)
| sort @timestamp desc
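
The error pattern analysis query groups matching messages into 5-minute bins. The same bucketing can be sketched in plain JavaScript for log events that have already been exported. The event shape (`timestamp` in epoch milliseconds plus `message`) and the `countErrorsPerBin` name are assumptions made for this illustration.

```javascript
// Group error events into 5-minute buckets, newest bucket first.
const BIN_MS = 5 * 60 * 1000;

function countErrorsPerBin(events) {
  const bins = new Map();
  for (const { timestamp, message } of events) {
    // Same two filters as the query: an error marker AND an appian/connector mention.
    if (!/ERROR|WARN|Exception/.test(message)) continue;
    if (!/appian|connector/.test(message)) continue;
    const binStart = Math.floor(timestamp / BIN_MS) * BIN_MS;
    bins.set(binStart, (bins.get(binStart) || 0) + 1);
  }
  return [...bins.entries()]
    .sort(([a], [b]) => b - a)
    .map(([binStart, count]) => ({ bin: new Date(binStart).toISOString(), count }));
}

// Made-up events around 2025-09-16T14:30:00Z.
const base = Date.UTC(2025, 8, 16, 14, 30, 0);
const binned = countErrorsPerBin([
  { timestamp: base + 10 * 1000, message: 'ERROR connector task failed' },
  { timestamp: base + 200 * 1000, message: 'WARN appian poll slow' },
  { timestamp: base + 400 * 1000, message: 'Exception in connector' },
  { timestamp: base + 401 * 1000, message: 'INFO connector ok' },
]);
console.log(binned);
```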

Alert Management

SNS Topic Configuration

Alert Topic Setup

  • Topic Name: Alerts-appian-connector-alerts-{stage}
  • Encryption: KMS encrypted with automatic key rotation
  • Permissions: EventBridge, CloudWatch, Lambda can publish
  • Subscriptions: Manually managed (email addresses, Slack webhooks)

Alert Message Format

{
  "AlarmName": "appian-connector-{stage}-connector-failure",
  "AlarmDescription": "Appian JDBC connector has failed",
  "AWSAccountId": "{account-id}",
  "Region": "us-east-1",
  "AlarmArn": "arn:aws:cloudwatch:us-east-1:{account}:alarm:appian-connector-{stage}-connector-failure",
  "OldStateValue": "OK",
  "NewStateValue": "ALARM",
  "StateChangeTime": "2025-09-16T14:30:00.000Z",
  "StateReason": "Threshold Crossed: 1 datapoint [1.0] was greater than or equal to the threshold (1.0).",
  "MetricName": "source.jdbc.appian-connector-dbo-1_failures",
  "Namespace": "appian-connector-{stage}"
}

Slack Integration

Alert Format

{
  "text": "🚨 Appian Connector Alert",
  "attachments": [{
    "color": "danger",
    "fields": [
      {"title": "Service", "value": "appian-connector", "short": true},
      {"title": "Stage", "value": "{stage-name}", "short": true},
      {"title": "Issue", "value": "JDBC connector failure", "short": false},
      {"title": "Data Source", "value": "Appian Oracle (MCP_SPA_PCKG)", "short": true},
      {"title": "Time", "value": "{timestamp}", "short": true}
    ]
  }]
}
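
The two formats above imply a mapping step from the CloudWatch alarm message to the Slack payload. Below is a minimal sketch of that mapping, assuming the alarm JSON has already been parsed (in a Lambda subscribed to SNS it arrives as a JSON string in `Records[0].Sns.Message`). `toSlackPayload` is a hypothetical name, and extracting the stage from the alarm name assumes stage names contain no hyphen.

```javascript
// Hypothetical mapper from a parsed CloudWatch alarm message to the Slack format above.
function toSlackPayload(alarm) {
  // Assumption: alarm names follow "appian-connector-{stage}-..." and the stage has no hyphen.
  const match = /^appian-connector-([^-]+)-/.exec(alarm.AlarmName || '');
  return {
    text: '🚨 Appian Connector Alert',
    attachments: [{
      // "danger" and "good" are standard Slack attachment colours.
      color: alarm.NewStateValue === 'ALARM' ? 'danger' : 'good',
      fields: [
        { title: 'Service', value: 'appian-connector', short: true },
        { title: 'Stage', value: match ? match[1] : 'unknown', short: true },
        { title: 'Issue', value: alarm.AlarmDescription || alarm.AlarmName, short: false },
        { title: 'Data Source', value: 'Appian Oracle (MCP_SPA_PCKG)', short: true },
        { title: 'Time', value: alarm.StateChangeTime, short: true },
      ],
    }],
  };
}

const slackPayload = toSlackPayload({
  AlarmName: 'appian-connector-dev-connector-failure',
  AlarmDescription: 'Appian JDBC connector has failed',
  NewStateValue: 'ALARM',
  StateChangeTime: '2025-09-16T14:30:00.000Z',
});
console.log(JSON.stringify(slackPayload, null, 2));
```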

Performance Monitoring

MACPRO Package Data Metrics

Business Intelligence Metrics

graph TB
    subgraph "Package Processing Metrics"
        PKG_TOTAL[Total Packages<br/>Cumulative count]
        PKG_RATE[Processing Rate<br/>Packages per minute]
        PKG_STATES[State Distribution<br/>Packages by state code]
        PKG_STATUS[Status Distribution<br/>Package statuses]
    end
    
    subgraph "System Performance"
        ORACLE_PERF[Oracle Performance<br/>Query execution time]
        CONNECTOR_PERF[Connector Performance<br/>Polling efficiency]
        KAFKA_PERF[Kafka Performance<br/>Message publishing rate]
    end
    
    subgraph "Error Tracking"
        CONN_ERRORS[Connection Errors<br/>Oracle connectivity issues]
        PROC_ERRORS[Processing Errors<br/>Data transformation failures]
        KAFKA_ERRORS[Kafka Errors<br/>Publishing failures]
    end

Custom CloudWatch Metrics

# View custom metrics for Appian connector
aws cloudwatch list-metrics --namespace "appian-connector-{stage}"

# Get connector status over time
aws cloudwatch get-metric-statistics \
  --namespace "appian-connector-{stage}" \
  --metric-name "source.jdbc.appian-connector-dbo-1_failures" \
  --start-time $(date -u -d '24 hours ago' +%Y-%m-%dT%H:%M:%S) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
  --period 3600 --statistics Sum,Average

# Get package processing rate
aws cloudwatch get-metric-statistics \
  --namespace "appian-connector-{stage}" \
  --metric-name "source.jdbc.appian-connector-dbo-1_package_processing_rate" \
  --start-time $(date -u -d '24 hours ago' +%Y-%m-%dT%H:%M:%S) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
  --period 3600 --statistics Average,Maximum

Health Monitoring

Automated Health Checks

testConnectors Lambda Health Validation

// Comprehensive Appian package data health check (runs every minute)
const appianHealthChecks = {
  connectorStatus: async () => {
    // Check JDBC connector state
    const response = await fetch(`http://${ecsIP}:8083/connectors/source.jdbc.appian-connector-dbo-1/status`);
    const status = await response.json();
    return status.connector.state === 'RUNNING' && status.tasks.every(task => task.state === 'RUNNING');
  },
  
  oracleConnectivity: async () => {
    // Test Oracle database connection
    const testQuery = `
      SELECT COUNT(*) as active_packages 
      FROM appian_schema.MCP_SPA_PCKG 
      WHERE REPLICA_TIMESTAMP > SYSDATE - INTERVAL '1' DAY
    `;
    return await executeOracleHealthQuery(testQuery);
  },
  
  packageDataFreshness: async () => {
    // Verify recent package data processing
    const response = await fetch(`http://${ecsIP}:8083/connectors/source.jdbc.appian-connector-dbo-1/status`);
    const status = await response.json();
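    // Note: Kafka Connect populates `trace` only for failed tasks, so extractLastPollTime must handle undefined
    const lastPollTime = extractLastPollTime(status.tasks[0].trace);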
    return (Date.now() - lastPollTime) < 300000; // Within 5 minutes
  },
  
  kafkaPublishing: async () => {
    // Verify messages are being published to BigMAC
    const bigmacResponse = await invokeBigmacDebugger({
      topic: "aws.appian.cmcs.MCP_SPA_PCKG",
      numRecords: 1
    });
    return bigmacResponse.success;
  }
};

Manual Health Verification

Package Data Validation Script

#!/bin/bash
# Appian Package Data Validation

STAGE="${1:?Usage: $0 <stage>}"  # abort with a usage message when no stage is given
echo "Validating Appian package data for stage: $STAGE"

# 1. Check Oracle source data
echo "=== Oracle Source Validation ==="
sqlplus ${APPIAN_DB_USER}/${APPIAN_DB_PASSWORD}@${APPIAN_DB_HOST}:${APPIAN_DB_PORT}/${APPIAN_DB_NAME} << EOF
SELECT 
  COUNT(*) as total_packages,
  MAX(PCKG_ID) as max_package_id,
  MAX(REPLICA_TIMESTAMP) as latest_timestamp,
  COUNT(CASE WHEN REPLICA_TIMESTAMP > SYSDATE - 1 THEN 1 END) as recent_updates
FROM appian_schema.MCP_SPA_PCKG;
EXIT;
EOF

# 2. Check connector processing position
echo "=== Connector Processing Position ==="
CONN_IP=$(aws ecs describe-tasks --cluster appian-connector-$STAGE-connect \
  --tasks $(aws ecs list-tasks --cluster appian-connector-$STAGE-connect --query 'taskArns[0]' --output text) \
  --query 'tasks[0].attachments[0].details[?name==`privateIPv4Address`].value' --output text)

curl -s "http://$CONN_IP:8083/connectors/source.jdbc.appian-connector-dbo-1/status" | \
  jq '.tasks[0].trace' | grep -i "timestamp\|pckg_id\|offset"

# 3. Check BigMAC topic data
echo "=== BigMAC Topic Validation ==="
aws lambda invoke --function-name {bigmac-debugger} \
  --payload '{"topic":"aws.appian.cmcs.MCP_SPA_PCKG","numRecords":3}' \
  --region us-east-1 /dev/stdout | grep -o '"PCKG_ID":[^,]*' | head -3

echo "Package data validation complete!"

Troubleshooting Monitoring Issues

Alert Not Firing

# 1. Check SNS topic configuration
aws sns get-topic-attributes --topic-arn "arn:aws:sns:us-east-1:{account}:Alerts-appian-connector-alerts-{stage}"

# 2. Check EventBridge rule
aws events describe-rule --name "appian-connector-{stage}-ecs-task-failure"

# 3. Test alert manually
aws events put-events --entries '[{
  "Source": "test.appian",
  "DetailType": "ECS Task State Change", 
  "Detail": "{\"test\": \"manual alert test\"}"
}]'

CloudWatch Metrics Missing

# 1. Check testConnectors Lambda execution
aws logs filter-log-events \
  --log-group-name "/aws/lambda/appian-connector-{stage}-testConnectors" \
  --start-time $(date -d '1 hour ago' +%s)000 \
  --filter-pattern "ERROR"

# 2. Verify Lambda permissions
aws iam get-role-policy --role-name appian-connector-{stage}-{region}-lambdaRole \
  --policy-name appian-connector-{stage}-lambda-policy | grep cloudwatch:PutMetricData

# 3. Check connector API accessibility
./run connect --stage {stage} --service connector
# Inside container:
curl -X GET http://localhost:8083/connectors

Operational Monitoring

Package Data Flow Monitoring

Data Lineage Verification

# End-to-end data flow verification
echo "=== End-to-End Package Data Flow Test ==="

# 1. Create test package in Oracle (if permitted)
sqlplus ${APPIAN_DB_USER}/${APPIAN_DB_PASSWORD}@${APPIAN_DB_HOST}:${APPIAN_DB_PORT}/${APPIAN_DB_NAME} << EOF
-- Note: Only use in development environments
INSERT INTO appian_schema.MCP_SPA_PCKG (
  PCKG_ID, REPLICA_TIMESTAMP, STATE_CD, SPA_ID, CRNT_STUS
) VALUES (
  999999, SYSTIMESTAMP, 'TEST', 'TEST-001', 'TEST_STATUS'
);
COMMIT;
EOF

# 2. Wait for connector processing
sleep 10

# 3. Verify message in BigMAC
aws lambda invoke --function-name {bigmac-debugger} \
  --payload '{"topic":"aws.appian.cmcs.MCP_SPA_PCKG","numRecords":5}' \
  --region us-east-1 /dev/stdout | grep "999999"

# 4. Cleanup test data (if created)
sqlplus ${APPIAN_DB_USER}/${APPIAN_DB_PASSWORD}@${APPIAN_DB_HOST}:${APPIAN_DB_PORT}/${APPIAN_DB_NAME} << EOF
DELETE FROM appian_schema.MCP_SPA_PCKG WHERE PCKG_ID = 999999;
COMMIT;
EOF

echo "Data flow verification complete!"
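
The fixed `sleep 10` in step 2 only works if the connector polls and publishes within ten seconds. A polling loop with a deadline is one alternative. The sketch below shows the idea in JavaScript with a stubbed check; `waitFor`, its default timings, and the stub are assumptions, and a real check would query the topic for the test `PCKG_ID`.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Poll `check` until it resolves to true or the timeout elapses.
// Resolves to true when the check passed, false when the deadline was reached.
async function waitFor(check, { timeoutMs = 60000, intervalMs = 5000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    if (await check()) return true;
    // Stop when another wait would run past the deadline.
    if (Date.now() + intervalMs > deadline) return false;
    await sleep(intervalMs);
  }
}

// Stub (assumption): pretend the test record becomes visible on the third poll.
let polls = 0;
const recordVisible = async () => { polls += 1; return polls >= 3; };

waitFor(recordVisible, { timeoutMs: 1000, intervalMs: 10 })
  .then((found) => console.log(`found=${found} after ${polls} polls`));
```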

Resource Utilization Monitoring

# Monitor container resource usage over time
aws cloudwatch get-metric-statistics \
  --namespace "AWS/ECS" \
  --metric-name "CPUUtilization" \
  --dimensions Name=ServiceName,Value=kafka-connect Name=ClusterName,Value=appian-connector-{stage}-connect \
  --start-time $(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%S) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
  --period 86400 --statistics Average,Maximum

# Check memory trends
aws cloudwatch get-metric-statistics \
  --namespace "AWS/ECS" \
  --metric-name "MemoryUtilization" \
  --dimensions Name=ServiceName,Value=kafka-connect Name=ClusterName,Value=appian-connector-{stage}-connect \
  --start-time $(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%S) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
  --period 86400 --statistics Average,Maximum

Related Documentation & Resources

Quick Monitoring Commands

# Health check
curl -s http://{ecs-ip}:8083/connectors/source.jdbc.appian-connector-dbo-1/status | jq .

# Recent performance
aws cloudwatch get-metric-statistics \
  --namespace "appian-connector-{stage}" \
  --metric-name "source.jdbc.appian-connector-dbo-1_failures" \
  --start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%S) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
  --period 300 --statistics Sum

Next: Explore Services for detailed component architecture or Operations for daily monitoring procedures.