Monitoring
Benjamin Paige edited this page Sep 16, 2025 · 4 revisions
Appian Connector Monitoring and Alerting
graph TB
subgraph "Data Sources"
CONNECTOR[Kafka Connect Container<br/>ECS Fargate logs]
LAMBDA[Lambda Functions<br/>Function logs & metrics]
ORACLE[Oracle Database<br/>Connection metrics]
end
subgraph "Monitoring Stack"
CW_LOGS[CloudWatch Logs<br/>Centralized logging]
CW_METRICS[CloudWatch Metrics<br/>Performance data]
CW_ALARMS[CloudWatch Alarms<br/>Threshold monitoring]
DASHBOARD[Custom Dashboard<br/>appian-connector-{stage}]
end
subgraph "Alerting"
SNS[SNS Topics<br/>Failure notifications]
SLACK[Slack Integration<br/>#cms-bigmac channel]
EMAIL[Email Alerts<br/>Operations team]
end
CONNECTOR --> CW_LOGS
LAMBDA --> CW_LOGS
LAMBDA --> CW_METRICS
CW_METRICS --> CW_ALARMS
CW_METRICS --> DASHBOARD
CW_ALARMS --> SNS
SNS --> SLACK
SNS --> EMAIL
classDef source fill:#e1f5fe,stroke:#01579b,stroke-width:2px
classDef monitor fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
classDef alert fill:#ffebee,stroke:#d32f2f,stroke-width:2px
class CONNECTOR,LAMBDA,ORACLE source
class CW_LOGS,CW_METRICS,CW_ALARMS,DASHBOARD monitor
class SNS,SLACK,EMAIL alert
| Log Group | Purpose | Retention | Expected Volume |
|---|---|---|---|
| /aws/ecs/appian-connector-{stage} | Kafka Connect container logs | 30 days | Medium |
| /aws/lambda/appian-connector-{stage}-configureConnectors | Connector configuration | 7 days | Low |
| /aws/lambda/appian-connector-{stage}-testConnectors | Health check results | 7 days | High |
| /aws/lambda/appian-connector-{stage}-createTopics | Topic management | 7 days | Low |
| /aws/lambda/appian-connector-{stage}-cleanupKafka | Resource cleanup | 7 days | Low |
graph LR
subgraph "JDBC Connector Metrics"
CONN_STATUS[Connector Status<br/>0 = ✅ Running<br/>1 = ❌ Failed]
PKG_RATE[Package Processing Rate<br/>Packages/minute from MCP_SPA_PCKG]
POLL_TIME[Oracle Poll Duration<br/>Query execution time]
ERROR_RATE[Error Rate<br/>Failed polls/total polls]
end
subgraph "Oracle Database Metrics"
DB_CONN[Active Connections<br/>Current Oracle connections]
QUERY_TIME[Query Response Time<br/>Average Oracle query duration]
RECORD_COUNT[Record Count<br/>Packages processed per poll]
end
subgraph "ECS Container Metrics"
CPU_UTIL[CPU Utilization<br/>Container CPU usage]
MEM_UTIL[Memory Utilization<br/>Container memory usage]
NETWORK[Network I/O<br/>Oracle + Kafka traffic]
end
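The connector metrics in the diagram above are published as custom CloudWatch metrics. As a minimal sketch, a flat map of metric names to values could be shaped into the `MetricData` entries that CloudWatch `PutMetricData` accepts. The helper names `toMetricData` and `unitFor`, and the mapping from name suffix to unit, are assumptions for illustration and are not taken from the connector's source.

```javascript
// Hypothetical helper: converts a flat { metricName: value } map into the
// MetricData array shape accepted by CloudWatch PutMetricData.
function toMetricData(metrics, timestamp = new Date()) {
  return Object.entries(metrics)
    // Skip values that are not finite numbers (for example a poll that returned nothing).
    .filter(([, value]) => Number.isFinite(value))
    .map(([name, value]) => ({
      MetricName: name,
      Value: value,
      Unit: unitFor(name),
      Timestamp: timestamp,
    }));
}

// Assumed mapping from metric name suffix to a CloudWatch unit.
function unitFor(name) {
  if (name.endsWith('_failures')) return 'Count';
  if (name.endsWith('_duration')) return 'Milliseconds';
  return 'None';
}

const connectorName = 'source.jdbc.appian-connector-dbo-1';
const metricData = toMetricData({
  [`${connectorName}_failures`]: 0,
  [`${connectorName}_oracle_query_duration`]: 420,
  [`${connectorName}_package_processing_rate`]: NaN, // dropped: not a finite number
}, new Date('2025-09-16T14:30:00Z'));

console.log(metricData.map(m => `${m.MetricName}=${m.Value} ${m.Unit}`));
```

Dropping non-finite values keeps a failed measurement from being recorded as a misleading zero. Whether the real Lambda does this is not stated on this page.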
// Published by testConnectors Lambda every minute
const packageDataMetrics = {
[`${connectorName}_failures`]: connectorStatus === 'RUNNING' ? 0 : 1,
[`${connectorName}_task_failures`]: allTasksRunning ? 0 : 1,
[`${connectorName}_package_processing_rate`]: packagesPerMinute,
[`${connectorName}_oracle_query_duration`]: avgOracleQueryTimeMs,
[`${connectorName}_connection_pool_active`]: activeOracleConnections
};

| Alarm Name | Threshold | Purpose | Response |
|---|---|---|---|
| appian-connector-{stage}-connector-failure | 1 failure | JDBC connector stopped | Immediate restart |
| appian-connector-{stage}-task-failure | 1 task failure | Connector task crashed | ECS service restart |
| appian-connector-{stage}-oracle-connection-failure | Connection lost | Oracle database unreachable | Check database + network |
| appian-connector-{stage}-ecs-service-failure | ECS service stopped | Container orchestration failure | Check ECS cluster health |
| Alarm Name | Threshold | Purpose | Response |
|---|---|---|---|
| appian-connector-{stage}-processing-rate-low | <5 packages/min for 10 min | Package processing issues | Investigate Oracle performance |
| appian-connector-{stage}-oracle-query-slow | >30 seconds avg | Oracle query performance | Check database load |
| appian-connector-{stage}-ecs-cpu-high | >80% CPU for 5 min | Resource pressure | Scale up ECS task |
| appian-connector-{stage}-ecs-memory-high | >85% memory for 5 min | Memory pressure | Increase task memory |
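Thresholds such as "<5 packages/min for 10 min" only fire when the breach is sustained. The sketch below shows that idea in isolation, assuming one datapoint per minute and requiring every datapoint in the window to breach. It is a simplification: CloudWatch's own evaluation also handles missing data and "M out of N" settings, and the names `breaches` and `isInAlarm` are hypothetical.

```javascript
// Compares a single datapoint with a threshold using the given operator.
function breaches(value, operator, threshold) {
  switch (operator) {
    case '<': return value < threshold;
    case '>': return value > threshold;
    case '>=': return value >= threshold;
    default: throw new Error(`Unsupported operator: ${operator}`);
  }
}

// Returns true only when the most recent `periods` datapoints all breach.
function isInAlarm(datapoints, { operator, threshold, periods }) {
  if (datapoints.length < periods) return false; // not enough data yet
  return datapoints.slice(-periods).every(v => breaches(v, operator, threshold));
}

// Warning alarm from the table: <5 packages/min for 10 min.
const processingRateLow = { operator: '<', threshold: 5, periods: 10 };

const briefDip = [7, 6, 4, 3, 4, 2, 3, 4, 1, 0];  // first two minutes are healthy
const sustained = [4, 3, 4, 3, 4, 2, 3, 4, 1, 0]; // all ten minutes below 5

console.log(isInAlarm(briefDip, processingRateLow));  // false
console.log(isInAlarm(sustained, processingRateLow)); // true
```

The critical alarms in the first table use a single datapoint (`periods: 1`), which is why one failed check is enough to trigger them.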
// Comprehensive MACPRO package data health check (runs every minute)
const macproHealthChecks = {
connectorHealth: async () => {
const status = await fetch(`http://${ecsIP}:8083/connectors/source.jdbc.appian-connector-dbo-1/status`);
const connectorData = await status.json();
return connectorData.connector.state === 'RUNNING';
},
taskHealth: async () => {
const tasks = await fetch(`http://${ecsIP}:8083/connectors/source.jdbc.appian-connector-dbo-1/tasks`);
const taskData = await tasks.json();
return taskData.every(task => task.state === 'RUNNING');
},
oracleConnectivity: async () => {
// Test Oracle database connectivity
const testQuery = "SELECT 1 FROM appian_schema.MCP_SPA_PCKG WHERE ROWNUM <= 1";
return await testOracleQuery(testQuery);
},
recentPackageActivity: async () => {
// Check for recent package processing
const status = await fetch(`http://${ecsIP}:8083/connectors/source.jdbc.appian-connector-dbo-1/status`);
const connectorStatus = await status.json();
const lastPollTime = parseLastPollTime(connectorStatus.tasks[0].trace);
return (Date.now() - lastPollTime) < 5 * 60 * 1000; // Within 5 minutes
}
};

#!/bin/bash
# Appian Connector Comprehensive Health Check
STAGE=$1
echo "Appian Connector Health Check - Stage: $STAGE"
# 1. Check ECS service status
echo "=== ECS Service Status ==="
aws ecs describe-services --cluster appian-connector-$STAGE-connect \
--services kafka-connect \
--query 'services[0].{Service:serviceName,Status:status,Running:runningCount,Desired:desiredCount}'
# 2. Check connector status
echo "=== JDBC Connector Status ==="
CONN_IP=$(aws ecs describe-tasks --cluster appian-connector-$STAGE-connect \
--tasks $(aws ecs list-tasks --cluster appian-connector-$STAGE-connect --query 'taskArns[0]' --output text) \
--query 'tasks[0].attachments[0].details[?name==`privateIPv4Address`].value' --output text)
if [ -n "$CONN_IP" ] && [ "$CONN_IP" != "None" ]; then
echo "Available connectors:"
curl -s "http://$CONN_IP:8083/connectors" | jq .
echo "Appian connector status:"
curl -s "http://$CONN_IP:8083/connectors/source.jdbc.appian-connector-dbo-1/status" | jq .
else
echo "Connector container not available"
fi
# 3. Check recent metrics
echo "=== Recent Performance Metrics ==="
aws cloudwatch get-metric-statistics \
--namespace "appian-connector-$STAGE" \
--metric-name "source.jdbc.appian-connector-dbo-1_failures" \
--start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%S) \
--end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
--period 300 --statistics Sum
# 4. Verify recent package data processing
echo "=== Recent Package Data ==="
# With AWS CLI v2, add --cli-binary-format raw-in-base64-out so the raw JSON payload is accepted
aws lambda invoke --function-name {bigmac-debugger} \
--payload '{"topic":"aws.appian.cmcs.MCP_SPA_PCKG","numRecords":1}' \
--region us-east-1 /dev/stdout | jq .
echo "Appian Connector health check complete!"

# Look for these patterns in connector logs (the path is the CloudWatch log group name):
grep "Successfully started" /aws/ecs/appian-connector-{stage}
grep "source.jdbc.appian-connector-dbo-1.*RUNNING" /aws/ecs/appian-connector-{stage}
grep "Finished executing.*query" /aws/ecs/appian-connector-{stage}

# Oracle connectivity issues
grep -i "oracle.*error\|connection.*failed\|timeout\|refused" /aws/ecs/appian-connector-{stage}
# JDBC connector errors
grep -i "jdbc.*error\|sql.*exception\|numeric.*error" /aws/ecs/appian-connector-{stage}
# Kafka connectivity issues
grep -i "kafka.*error\|broker.*connection\|ssl.*error" /aws/ecs/appian-connector-{stage}

# MACPRO package processing performance
fields @timestamp, @message
| filter @message like /source.jdbc.appian-connector-dbo-1/
| filter @message like /processing.*records|query.*completed/
| sort @timestamp desc
| limit 100
# Oracle connection analysis
fields @timestamp, @message
| filter @message like /oracle|jdbc|connection/
| filter @message like /MCP_SPA_PCKG/
| sort @timestamp desc
| limit 50
# Error pattern analysis
fields @timestamp, @message
| filter @message like /ERROR|WARN|Exception/
| filter @message like /appian|connector/
| stats count() by bin(5m)
| sort @timestamp desc

- Topic Name: Alerts-appian-connector-alerts-{stage}
- Encryption: KMS encrypted with automatic key rotation
- Permissions: EventBridge, CloudWatch, Lambda can publish
- Subscriptions: Manually managed (email addresses, Slack webhooks)
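Because subscriptions are managed manually, it can help to see how an alarm notification published to this topic might be turned into a Slack message. The sketch below maps the CloudWatch alarm fields (`AlarmName`, `AlarmDescription`, `NewStateValue`, `StateChangeTime`) onto a Slack attachment payload. The function name `toSlackPayload`, the recovery wording, and the assumption that the stage name contains no hyphen are illustrative and are not taken from the actual integration.

```javascript
// Hypothetical mapper from an SNS-delivered CloudWatch alarm message (JSON string)
// to a Slack webhook payload.
function toSlackPayload(snsMessageJson) {
  const alarm = JSON.parse(snsMessageJson);
  // Assumes alarm names follow appian-connector-{stage}-{issue} with a hyphen-free stage.
  const match = /^appian-connector-([^-]+)-(.+)$/.exec(alarm.AlarmName) || [];
  const isAlarm = alarm.NewStateValue === 'ALARM';
  return {
    text: isAlarm ? '🚨 Appian Connector Alert' : '✅ Appian Connector Recovered',
    attachments: [{
      color: isAlarm ? 'danger' : 'good',
      fields: [
        { title: 'Service', value: 'appian-connector', short: true },
        { title: 'Stage', value: match[1] || 'unknown', short: true },
        { title: 'Issue', value: alarm.AlarmDescription || alarm.AlarmName, short: false },
        { title: 'Time', value: alarm.StateChangeTime, short: true },
      ],
    }],
  };
}

const exampleMessage = JSON.stringify({
  AlarmName: 'appian-connector-dev-connector-failure',
  AlarmDescription: 'Appian JDBC connector has failed',
  NewStateValue: 'ALARM',
  StateChangeTime: '2025-09-16T14:30:00.000Z',
});

const payload = toSlackPayload(exampleMessage);
console.log(JSON.stringify(payload, null, 2));
```

Keying the color and text on `NewStateValue` lets the same mapper report both the failure and the later return to `OK`.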
{
"AlarmName": "appian-connector-{stage}-connector-failure",
"AlarmDescription": "Appian JDBC connector has failed",
"AWSAccountId": "{account-id}",
"Region": "us-east-1",
"AlarmArn": "arn:aws:cloudwatch:us-east-1:{account}:alarm:appian-connector-{stage}-connector-failure",
"OldStateValue": "OK",
"NewStateValue": "ALARM",
"StateChangeTime": "2025-09-16T14:30:00.000Z",
"StateReason": "Threshold Crossed: 1 datapoint [1.0] was greater than or equal to the threshold (1.0).",
"MetricName": "source.jdbc.appian-connector-dbo-1_failures",
"Namespace": "appian-connector-{stage}"
}

{
"text": "🚨 Appian Connector Alert",
"attachments": [{
"color": "danger",
"fields": [
{"title": "Service", "value": "appian-connector", "short": true},
{"title": "Stage", "value": "{stage-name}", "short": true},
{"title": "Issue", "value": "JDBC connector failure", "short": false},
{"title": "Data Source", "value": "Appian Oracle (MCP_SPA_PCKG)", "short": true},
{"title": "Time", "value": "{timestamp}", "short": true}
]
}]
}

graph TB
subgraph "Package Processing Metrics"
PKG_TOTAL[Total Packages<br/>Cumulative count]
PKG_RATE[Processing Rate<br/>Packages per minute]
PKG_STATES[State Distribution<br/>Packages by state code]
PKG_STATUS[Status Distribution<br/>Package statuses]
end
subgraph "System Performance"
ORACLE_PERF[Oracle Performance<br/>Query execution time]
CONNECTOR_PERF[Connector Performance<br/>Polling efficiency]
KAFKA_PERF[Kafka Performance<br/>Message publishing rate]
end
subgraph "Error Tracking"
CONN_ERRORS[Connection Errors<br/>Oracle connectivity issues]
PROC_ERRORS[Processing Errors<br/>Data transformation failures]
KAFKA_ERRORS[Kafka Errors<br/>Publishing failures]
end
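The diagram lists both a cumulative package total and a per-minute processing rate. One plausible way to relate the two is to derive the rate from two cumulative samples, as sketched below. How the dashboard actually computes the rate is not stated here, so the function name `packagesPerMinute` and the sample shape `{ total, timestamp }` are assumptions.

```javascript
// Derives packages per minute from two cumulative samples.
// Each sample is { total: <cumulative count>, timestamp: <epoch milliseconds> }.
function packagesPerMinute(previous, current) {
  const elapsedMs = current.timestamp - previous.timestamp;
  if (elapsedMs <= 0) return 0; // samples out of order or identical
  const delta = current.total - previous.total;
  // A negative delta suggests the counter was reset (for example after a restart);
  // report 0 rather than a negative rate.
  if (delta < 0) return 0;
  return delta / (elapsedMs / 60000);
}

const earlier = { total: 1200, timestamp: Date.parse('2025-09-16T14:25:00Z') };
const later = { total: 1260, timestamp: Date.parse('2025-09-16T14:30:00Z') };

console.log(packagesPerMinute(earlier, later)); // 12 (60 packages over 5 minutes)
```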
# View custom metrics for Appian connector
aws cloudwatch list-metrics --namespace "appian-connector-{stage}"
# Get connector status over time
aws cloudwatch get-metric-statistics \
--namespace "appian-connector-{stage}" \
--metric-name "source.jdbc.appian-connector-dbo-1_failures" \
--start-time $(date -u -d '24 hours ago' +%Y-%m-%dT%H:%M:%S) \
--end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
--period 3600 --statistics Sum,Average
# Get package processing rate
aws cloudwatch get-metric-statistics \
--namespace "appian-connector-{stage}" \
--metric-name "source.jdbc.appian-connector-dbo-1_package_processing_rate" \
--start-time $(date -u -d '24 hours ago' +%Y-%m-%dT%H:%M:%S) \
--end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
--period 3600 --statistics Average,Maximum

// Comprehensive Appian package data health check (runs every minute)
const appianHealthChecks = {
connectorStatus: async () => {
// Check JDBC connector state
const response = await fetch(`http://${ecsIP}:8083/connectors/source.jdbc.appian-connector-dbo-1/status`);
const status = await response.json();
return status.connector.state === 'RUNNING' && status.tasks.every(task => task.state === 'RUNNING');
},
oracleConnectivity: async () => {
// Test Oracle database connection
const testQuery = `
SELECT COUNT(*) as active_packages
FROM appian_schema.MCP_SPA_PCKG
WHERE REPLICA_TIMESTAMP > SYSDATE - INTERVAL '1' DAY
`;
return await executeOracleHealthQuery(testQuery);
},
packageDataFreshness: async () => {
// Verify recent package data processing
const response = await fetch(`http://${ecsIP}:8083/connectors/source.jdbc.appian-connector-dbo-1/status`);
const status = await response.json();
const lastPollTime = extractLastPollTime(status.tasks[0].trace);
return (Date.now() - lastPollTime) < 300000; // Within 5 minutes
},
kafkaPublishing: async () => {
// Verify messages are being published to BigMAC
const bigmacResponse = await invokeBigmacDebugger({
topic: "aws.appian.cmcs.MCP_SPA_PCKG",
numRecords: 1
});
return bigmacResponse.success;
}
};

#!/bin/bash
# Appian Package Data Validation
STAGE=$1
echo "Validating Appian package data for stage: $STAGE"
# 1. Check Oracle source data
echo "=== Oracle Source Validation ==="
sqlplus ${APPIAN_DB_USER}/${APPIAN_DB_PASSWORD}@${APPIAN_DB_HOST}:${APPIAN_DB_PORT}/${APPIAN_DB_NAME} << EOF
SELECT
COUNT(*) as total_packages,
MAX(PCKG_ID) as max_package_id,
MAX(REPLICA_TIMESTAMP) as latest_timestamp,
COUNT(CASE WHEN REPLICA_TIMESTAMP > SYSDATE - 1 THEN 1 END) as recent_updates
FROM appian_schema.MCP_SPA_PCKG;
EXIT;
EOF
# 2. Check connector processing position
echo "=== Connector Processing Position ==="
CONN_IP=$(aws ecs describe-tasks --cluster appian-connector-$STAGE-connect \
--tasks $(aws ecs list-tasks --cluster appian-connector-$STAGE-connect --query 'taskArns[0]' --output text) \
--query 'tasks[0].attachments[0].details[?name==`privateIPv4Address`].value' --output text)
curl -s "http://$CONN_IP:8083/connectors/source.jdbc.appian-connector-dbo-1/status" | \
jq '.tasks[0].trace' | grep -i "timestamp\|pckg_id\|offset"
# 3. Check BigMAC topic data
echo "=== BigMAC Topic Validation ==="
aws lambda invoke --function-name {bigmac-debugger} \
--payload '{"topic":"aws.appian.cmcs.MCP_SPA_PCKG","numRecords":3}' \
--region us-east-1 /dev/stdout | grep -o '"PCKG_ID":[^,]*' | head -3
echo "Package data validation complete!"

# 1. Check SNS topic configuration
aws sns get-topic-attributes --topic-arn "arn:aws:sns:us-east-1:{account}:Alerts-appian-connector-alerts-{stage}"
# 2. Check EventBridge rule
aws events describe-rule --name "appian-connector-{stage}-ecs-task-failure"
# 3. Test alert manually
aws events put-events --entries '[{
"Source": "test.appian",
"DetailType": "ECS Task State Change",
"Detail": "{\"test\": \"manual alert test\"}"
}]'

# 1. Check testConnectors Lambda execution
aws logs filter-log-events \
--log-group-name "/aws/lambda/appian-connector-{stage}-testConnectors" \
--start-time $(date -d '1 hour ago' +%s)000 \
--filter-pattern "ERROR"
# 2. Verify Lambda permissions
aws iam get-role-policy --role-name appian-connector-{stage}-{region}-lambdaRole \
--policy-name appian-connector-{stage}-lambda-policy | grep cloudwatch:PutMetricData
# 3. Check connector API accessibility
./run connect --stage {stage} --service connector
# Inside container:
curl -X GET http://localhost:8083/connectors

# End-to-end data flow verification
echo "=== End-to-End Package Data Flow Test ==="
# 1. Create test package in Oracle (if permitted)
sqlplus ${APPIAN_DB_USER}/${APPIAN_DB_PASSWORD}@${APPIAN_DB_HOST}:${APPIAN_DB_PORT}/${APPIAN_DB_NAME} << EOF
-- Note: Only use in development environments
INSERT INTO appian_schema.MCP_SPA_PCKG (
PCKG_ID, REPLICA_TIMESTAMP, STATE_CD, SPA_ID, CRNT_STUS
) VALUES (
999999, SYSTIMESTAMP, 'TEST', 'TEST-001', 'TEST_STATUS'
);
COMMIT;
EOF
# 2. Wait for connector processing
sleep 10
# 3. Verify message in BigMAC
aws lambda invoke --function-name {bigmac-debugger} \
--payload '{"topic":"aws.appian.cmcs.MCP_SPA_PCKG","numRecords":5}' \
--region us-east-1 /dev/stdout | grep "999999"
# 4. Cleanup test data (if created)
sqlplus ${APPIAN_DB_USER}/${APPIAN_DB_PASSWORD}@${APPIAN_DB_HOST}:${APPIAN_DB_PORT}/${APPIAN_DB_NAME} << EOF
DELETE FROM appian_schema.MCP_SPA_PCKG WHERE PCKG_ID = 999999;
COMMIT;
EOF
echo "Data flow verification complete!"

# Monitor container resource usage over time
aws cloudwatch get-metric-statistics \
--namespace "AWS/ECS" \
--metric-name "CPUUtilization" \
--dimensions Name=ServiceName,Value=kafka-connect Name=ClusterName,Value=appian-connector-{stage}-connect \
--start-time $(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%S) \
--end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
--period 86400 --statistics Average,Maximum
# Check memory trends
aws cloudwatch get-metric-statistics \
--namespace "AWS/ECS" \
--metric-name "MemoryUtilization" \
--dimensions Name=ServiceName,Value=kafka-connect Name=ClusterName,Value=appian-connector-{stage}-connect \
--start-time $(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%S) \
--end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
--period 86400 --statistics Average,Maximum

- Architecture Overview - Understanding the 2-service architecture for monitoring setup
- Services Documentation - Lambda function monitoring and ECS container health
- Configuration Reference - Monitoring configuration and performance tuning
- Operations Manual - Daily monitoring tasks and troubleshooting procedures
- Integration Guide - End-to-end monitoring and data flow validation
- Deployment Guide - Post-deployment monitoring setup and verification
- CloudWatch Dashboard - Real-time connector metrics and performance
- ECS Console - Container health and resource monitoring
- Lambda Console - Function performance and error monitoring
- SNS Console - Alert topic management and subscriptions
- CloudWatch Documentation - Complete monitoring service guide
- ECS Monitoring Best Practices - Container monitoring guidance
- Kafka Connect Monitoring - Connector-specific monitoring
# Health check
curl -s http://{ecs-ip}:8083/connectors/source.jdbc.appian-connector-dbo-1/status | jq .
# Recent performance
aws cloudwatch get-metric-statistics \
--namespace "appian-connector-{stage}" \
--metric-name "source.jdbc.appian-connector-dbo-1_failures" \
--start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%S) \
--end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
--period 300 --statistics Sum

Next: Explore Services for detailed component architecture or Operations for daily monitoring procedures.