Skip to content

Operations

Benjamin Paige edited this page Sep 16, 2025 · 5 revisions

Operations

Appian Connector Daily Operations Guide

Page Navigation: ← Home | Daily Tasks | Service Management | Database Operations | Troubleshooting | Next: Monitoring →


Daily Operational Tasks

MACPRO Data Health Check Routine

graph TB
    START[Daily Health Check] --> CONNECTOR[Check JDBC Connector]
    CONNECTOR --> APPIAN[Verify Appian Database Connection]
    APPIAN --> TOPICS[Check Package Data Flow]
    TOPICS --> BIGMAC[Validate BigMAC Integration]
    BIGMAC --> MONITOR[Review Monitoring Metrics]
    MONITOR --> ALERTS[Check Alert Status]
    ALERTS --> COMPLETE[Health Check Complete ✅]
    
    CONNECTOR -.->|Issues Found| TROUBLESHOOT[Troubleshooting Procedures]
    APPIAN -.->|Issues Found| TROUBLESHOOT
    TOPICS -.->|Issues Found| TROUBLESHOOT
    TROUBLESHOOT --> RESOLUTION[Issue Resolution]
    RESOLUTION --> COMPLETE
Loading

Morning Checklist (3 minutes)

Task Check Method Expected Result
[ ] JDBC Connector Status CloudWatch metrics or connector API RUNNING state
[ ] ECS Service Health ECS console service status ACTIVE with 1/1 running
[ ] Package Data Flow BigMAC debugger Lambda Recent MCP_SPA_PCKG messages
[ ] Alert Review Slack #cms-bigmac channel No critical alerts
[ ] Oracle Connectivity Connector trace logs Successful database polls

Service Management

JDBC Connector Operations

Check Connector Status

# Method 1: CloudWatch Metrics
aws cloudwatch get-metric-statistics \
  --namespace "appian-connector-{stage}" \
  --metric-name "source.jdbc.appian-connector-dbo-1_failures" \
  --start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%S) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
  --period 300 --statistics Sum

# Method 2: Direct Connector API Access
./run connect --stage {stage} --service connector
# Inside container:
curl -X GET http://localhost:8083/connectors/source.jdbc.appian-connector-dbo-1/status

Monitor Package Data Processing

# Check recent package data in BigMAC
aws lambda invoke --function-name {bigmac-debugger} \
  --payload '{
    "topic": "aws.appian.cmcs.MCP_SPA_PCKG",
    "partition": 0,
    "numRecords": 5
  }' \
  --region us-east-1 /dev/stdout

# Verify connector is polling Oracle
curl -X GET http://localhost:8083/connectors/source.jdbc.appian-connector-dbo-1/status | \
  jq '.tasks[0].trace' | grep -i "poll\|query\|rows"

Kafka Connect UI Access

For graphical connector management and monitoring:

# 1. Get port forwarding command
./run connect --stage {stage} --service connector

# 2. Copy and run the 'ecs-exec-pf' command from output
# This forwards Kafka Connect UI to localhost:8000

# 3. Access UI in browser
open http://localhost:8000

# UI provides:
# - Visual connector status monitoring
# - Configuration management interface  
# - Real-time task and worker metrics
# - Package data flow visualization

Restart JDBC Connector

# Method 1: Restart entire ECS service
aws ecs update-service --cluster appian-connector-{stage}-connect \
  --service kafka-connect --force-new-deployment

# Method 2: Restart specific connector
./run connect --stage {stage} --service connector
# Inside container:
curl -X POST http://localhost:8083/connectors/source.jdbc.appian-connector-dbo-1/restart

# Method 3: Restart connector task only
curl -X POST http://localhost:8083/connectors/source.jdbc.appian-connector-dbo-1/tasks/0/restart

Database Connection Management

Oracle Database Operations

Connection Testing

# Test Oracle connectivity from connector
./run connect --stage {stage} --service connector

# Inside container:
# Create test connection file
cat > test-oracle.sql << EOF
SELECT COUNT(*) as package_count 
FROM appian_schema.MCP_SPA_PCKG 
WHERE REPLICA_TIMESTAMP > SYSDATE - 1;
EOF

# Test connection
sqlplus ${APPIAN_DB_USER}/${APPIAN_DB_PASSWORD}@${APPIAN_DB_HOST}:${APPIAN_DB_PORT}/${APPIAN_DB_NAME} @test-oracle.sql

Package Data Validation

-- Verify package data integrity
SELECT 
  COUNT(*) as total_packages,
  COUNT(DISTINCT PCKG_ID) as unique_packages,
  MAX(REPLICA_TIMESTAMP) as latest_update,
  COUNT(CASE WHEN REPLICA_TIMESTAMP > SYSDATE - 1 THEN 1 END) as recent_updates
FROM appian_schema.MCP_SPA_PCKG;

-- Check for missing REPLICA_TIMESTAMP or PCKG_ID
SELECT COUNT(*) as null_replica_timestamp
FROM appian_schema.MCP_SPA_PCKG 
WHERE REPLICA_TIMESTAMP IS NULL;

SELECT COUNT(*) as null_pckg_id
FROM appian_schema.MCP_SPA_PCKG 
WHERE PCKG_ID IS NULL;

Oracle Connection Management

Password Management

# Update Oracle credentials in Secrets Manager
aws secretsmanager update-secret \
  --secret-id "appian-connector/{stage}/dbInfo" \
  --secret-string '{
    "ip": "appian-oracle-host.cms.gov",
    "port": "1521",
    "db": "appian-db",
    "user": "appian_reader",
    "password": "NEW_PASSWORD",
    "schema": "appian_schema"
  }'

# Restart connector to pick up new credentials
aws ecs update-service --cluster appian-connector-{stage}-connect \
  --service kafka-connect --force-new-deployment

Performance Monitoring

MACPRO Package Data Metrics

Metric Normal Range Alert Threshold Action Required
Connector Status RUNNING FAILED/PAUSED Restart connector
Package Processing Rate 10-100 packages/minute <1/minute for >10 min Check database connectivity
ECS CPU Usage 10-50% >80% Scale up resources
ECS Memory Usage 20-60% >85% Increase memory allocation
Oracle Connection Pool <5 active >10 active Check query performance

Performance Optimization

Connector Performance Tuning

{
  "config": {
    "poll.interval.ms": 1000,      # Increase frequency for real-time processing
    "batch.max.rows": 2000,       # Increase batch size for higher throughput
    "connection.attempts": 3,      # Database retry attempts
    "connection.backoff.ms": 10000, # Backoff between retries
    "query.timeout.ms": 60000     # 1-minute query timeout
  }
}

Resource Scaling

# High-volume processing configuration
high_volume_params:
  taskCpu: "4096"                 # Double CPU for high load
  taskMemory: "8192"              # Double memory for large batches
  connectContainerCpu: "4096"
  connectContainerMemory: "6144"  # Leave headroom for Oracle client

Troubleshooting Procedures

Common Issues

Oracle Connection Issues

# 1. Verify Oracle database accessibility
./run connect --stage {stage} --service connector

# Inside container:
tnsping ${APPIAN_DB_HOST}:${APPIAN_DB_PORT}/${APPIAN_DB_NAME}
sqlplus ${APPIAN_DB_USER}/${APPIAN_DB_PASSWORD}@${APPIAN_DB_HOST}:${APPIAN_DB_PORT}/${APPIAN_DB_NAME}

# 2. Check security groups
aws ec2 describe-security-groups --group-ids {connector-security-group} \
  --query 'SecurityGroups[0].IpPermissionsEgress[?IpProtocol==`tcp` && FromPort==`1521`]'

# 3. Verify credentials
aws secretsmanager get-secret-value --secret-id "appian-connector/{stage}/dbInfo" | jq .

Connector Configuration Problems

# 1. Check connector configuration
curl -X GET http://localhost:8083/connectors/source.jdbc.appian-connector-dbo-1/config | jq .

# 2. Validate table access
curl -X GET http://localhost:8083/connectors/source.jdbc.appian-connector-dbo-1/status | \
  jq '.tasks[0].trace' | grep -i "table\|schema\|permission"

# 3. Test JDBC driver
java -cp /usr/share/confluent-hub-components/confluentinc-kafka-connect-jdbc/lib/ojdbc10.jar \
  oracle.jdbc.OracleDriver \
  "jdbc:oracle:thin:@${APPIAN_DB_HOST}:${APPIAN_DB_PORT}:${APPIAN_DB_NAME}" \
  "${APPIAN_DB_USER}" "${APPIAN_DB_PASSWORD}"

Package Data Flow Issues

# 1. Check recent Oracle data changes
sqlplus ${APPIAN_DB_USER}/${APPIAN_DB_PASSWORD}@${APPIAN_DB_HOST}:${APPIAN_DB_PORT}/${APPIAN_DB_NAME} << EOF
SELECT COUNT(*) as recent_changes 
FROM appian_schema.MCP_SPA_PCKG 
WHERE REPLICA_TIMESTAMP > SYSDATE - INTERVAL '1' HOUR;
EXIT;
EOF

# 2. Verify topic message flow
aws lambda invoke --function-name {bigmac-debugger} \
  --payload '{"topic":"aws.appian.cmcs.MCP_SPA_PCKG","numRecords":3}' \
  --region us-east-1 /dev/stdout

# 3. Check connector offset progression
curl -X GET http://localhost:8083/connectors/source.jdbc.appian-connector-dbo-1/status | \
  jq '.tasks[0].trace' | grep -i "offset\|timestamp"

Operational Scripts

Daily Operations

Complete Health Check

#!/bin/bash
# Appian Connector Health Check Script

STAGE=$1
echo "Appian Connector Health Check - Stage: $STAGE"

# 1. Check ECS service status
echo "=== ECS Service Status ==="
aws ecs describe-services --cluster appian-connector-$STAGE-connect \
  --services kafka-connect \
  --query 'services[0].{Service:serviceName,Status:status,Running:runningCount,Desired:desiredCount}'

# 2. Check connector status
echo "=== JDBC Connector Status ==="
CONN_IP=$(aws ecs describe-tasks --cluster appian-connector-$STAGE-connect \
  --tasks $(aws ecs list-tasks --cluster appian-connector-$STAGE-connect --query 'taskArns[0]' --output text) \
  --query 'tasks[0].attachments[0].details[?name==`privateIPv4Address`].value' --output text)

if [ ! -z "$CONN_IP" ]; then
  curl -s "http://$CONN_IP:8083/connectors" | jq .
  curl -s "http://$CONN_IP:8083/connectors/source.jdbc.appian-connector-dbo-1/status" | jq .
else
  echo "Connector container not available"
fi

# 3. Check recent CloudWatch metrics
echo "=== Recent Metrics ==="
aws cloudwatch get-metric-statistics \
  --namespace "appian-connector-$STAGE" \
  --metric-name "source.jdbc.appian-connector-dbo-1_failures" \
  --start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%S) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
  --period 300 --statistics Sum

# 4. Verify recent package data
echo "=== Recent Package Data ==="
aws lambda invoke --function-name {bigmac-debugger} \
  --payload '{"topic":"aws.appian.cmcs.MCP_SPA_PCKG","numRecords":1}' \
  --region us-east-1 /dev/stdout | grep -o '"value":"[^"]*"' | head -1

echo "Health check complete!"

Emergency Recovery

#!/bin/bash
# Emergency Appian Connector Recovery

STAGE=$1
echo "Emergency recovery for Appian Connector - Stage: $STAGE"

# 1. Force restart ECS service
echo "Restarting ECS service..."
aws ecs update-service --cluster appian-connector-$STAGE-connect \
  --service kafka-connect --force-new-deployment

# 2. Wait for service stabilization
echo "Waiting for service to stabilize..."
sleep 120

# 3. Verify connector status
echo "Verifying connector recovery..."
CONN_IP=$(aws ecs describe-tasks --cluster appian-connector-$STAGE-connect \
  --tasks $(aws ecs list-tasks --cluster appian-connector-$STAGE-connect --query 'taskArns[0]' --output text) \
  --query 'tasks[0].attachments[0].details[?name==`privateIPv4Address`].value' --output text)

curl -s "http://$CONN_IP:8083/connectors/source.jdbc.appian-connector-dbo-1/status" | jq .

echo "Emergency recovery complete - monitor for stability"

Maintenance Windows

Routine Maintenance

Oracle Database Maintenance Coordination

  1. Pre-maintenance: Coordinate with Appian DBA team
  2. Connector Preparation: Monitor package processing queue
  3. Maintenance Window: Oracle database updates, table maintenance
  4. Post-maintenance: Resume connector, verify package data flow
  5. Validation: Confirm MACPRO workflow data integrity

Container and Configuration Updates

# Update connector container image
aws secretsmanager update-secret \
  --secret-id "ecr/images/appian-connector/connector" \
  --secret-string "confluentinc/cp-kafka-connect:7.0.0"

# Update JDBC connector version
./run connect --stage {stage} --service connector
# Inside container:
confluent-hub install confluentinc/kafka-connect-jdbc:11.0.0 --no-prompt

# Restart to apply updates
./run deploy --stage {stage}

Emergency Procedures

MACPRO Workflow Data Outage

  1. Immediate: Check BigMAC cluster health via Slack #cms-bigmac
  2. Connector: Verify ECS service status and restart if needed
  3. Database: Confirm Appian Oracle database accessibility
  4. Escalation: Contact Appian DBA team if database issues
  5. Recovery: Monitor package data catch-up after restoration

Container Resource Issues

# Scale up ECS service resources if needed
aws ecs update-service --cluster appian-connector-{stage}-connect \
  --service kafka-connect \
  --task-definition {updated-task-definition-with-more-resources}

# Monitor resource usage
aws cloudwatch get-metric-statistics \
  --namespace "AWS/ECS" \
  --metric-name "CPUUtilization" \
  --dimensions Name=ServiceName,Value=kafka-connect Name=ClusterName,Value=appian-connector-{stage}-connect \
  --start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%S) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
  --period 300 --statistics Average,Maximum

Data Quality Issues

Package Data Validation

# Compare Oracle source vs Kafka target
# 1. Check Oracle package count
sqlplus ${APPIAN_DB_USER}/${APPIAN_DB_PASSWORD}@${APPIAN_DB_HOST}:${APPIAN_DB_PORT}/${APPIAN_DB_NAME} << EOF
SELECT COUNT(*) as total_packages FROM appian_schema.MCP_SPA_PCKG;
EXIT;
EOF

# 2. Check Kafka topic message count
aws lambda invoke --function-name {bigmac-debugger} \
  --payload '{"action": "health"}' \
  --region us-east-1 /dev/stdout

# 3. Verify recent processing
curl -X GET http://localhost:8083/connectors/source.jdbc.appian-connector-dbo-1/status | \
  jq '.tasks[0].trace' | grep -i "last.*timestamp\|last.*id"

Troubleshooting Guide

Issue Classification

graph TB
    ISSUE[Issue Detected] --> CLASSIFY{Issue Type?}
    
    CLASSIFY -->|Connector Down| CONN_DOWN[JDBC Connector Failure<br/>Priority: High]
    CLASSIFY -->|Oracle Connection| DB_CONN[Database Connection Issue<br/>Priority: High]
    CLASSIFY -->|Performance| PERF[Performance Degradation<br/>Priority: Medium]
    CLASSIFY -->|Data Quality| DATA[Package Data Issues<br/>Priority: Medium]
    
    CONN_DOWN --> RESTART_CONN[1. Restart ECS service<br/>2. Check connector logs<br/>3. Verify JDBC configuration]
    DB_CONN --> CHECK_ORACLE[1. Test Oracle connectivity<br/>2. Verify credentials<br/>3. Check Oracle status]
    PERF --> TUNE_PERF[1. Check resource utilization<br/>2. Tune polling parameters<br/>3. Scale ECS resources]
    DATA --> VALIDATE_DATA[1. Compare source vs target<br/>2. Check query logic<br/>3. Validate package schema]
Loading

Common Problem Resolution

JDBC Connector Failed to Start

# 1. Check ECS service events
aws ecs describe-services --cluster appian-connector-{stage}-connect \
  --services kafka-connect --query 'services[0].events'

# 2. Check container logs
aws logs get-log-events \
  --log-group-name "/aws/ecs/appian-connector-{stage}" \
  --log-stream-name "ecs/kafka-connect/{task-id}" | grep -i error

# 3. Verify Oracle JDBC driver
./run connect --stage {stage} --service connector
# Inside container:
ls -la /usr/share/confluent-hub-components/confluentinc-kafka-connect-jdbc/lib/ojdbc10.jar
java -cp /usr/share/confluent-hub-components/confluentinc-kafka-connect-jdbc/lib/ojdbc10.jar \
  oracle.jdbc.OracleDriver

Oracle Database Access Issues

# 1. Test basic connectivity
telnet {appian-oracle-host} 1521

# 2. Test authentication
sqlplus ${APPIAN_DB_USER}/${APPIAN_DB_PASSWORD}@${APPIAN_DB_HOST}:${APPIAN_DB_PORT}/${APPIAN_DB_NAME}

# 3. Verify table access
sqlplus ${APPIAN_DB_USER}/${APPIAN_DB_PASSWORD}@${APPIAN_DB_HOST}:${APPIAN_DB_PORT}/${APPIAN_DB_NAME} << EOF
SELECT COUNT(*) FROM appian_schema.MCP_SPA_PCKG WHERE ROWNUM <= 1;
EXIT;
EOF

Package Data Processing Problems

# 1. Check connector query execution
curl -X GET http://localhost:8083/connectors/source.jdbc.appian-connector-dbo-1/status | \
  jq '.tasks[0].trace' | grep -A10 -B10 "query\|sql"

# 2. Verify timestamp and ID progression
curl -X GET http://localhost:8083/connectors/source.jdbc.appian-connector-dbo-1/status | \
  jq '.tasks[0].trace' | grep -i "timestamp\|pckg_id"

# 3. Test manual query execution
sqlplus ${APPIAN_DB_USER}/${APPIAN_DB_PASSWORD}@${APPIAN_DB_HOST}:${APPIAN_DB_PORT}/${APPIAN_DB_NAME} << EOF
SELECT CAST(PCKG_ID AS NUMERIC(8,0)) AS PCKG_ID, REPLICA_TIMESTAMP, STATE_CD, SPA_ID
FROM appian_schema.MCP_SPA_PCKG 
WHERE REPLICA_TIMESTAMP > SYSDATE - INTERVAL '1' HOUR
ORDER BY REPLICA_TIMESTAMP DESC, PCKG_ID DESC
FETCH FIRST 10 ROWS ONLY;
EXIT;
EOF

Service Simplicity Benefits

Operational Advantages

graph LR
    subgraph "Simple Architecture Benefits"
        SINGLE_SOURCE[Single Data Source<br/>One Oracle table<br/>Focused scope]
        MINIMAL_SERVICES[Minimal Services<br/>2 services only<br/>Reduced complexity]
        DIRECT_INTEGRATION[Direct Integration<br/>Oracle → BigMAC<br/>No intermediate processing]
    end
    
    subgraph "Operational Benefits"
        FAST_TROUBLESHOOT[Fast Troubleshooting<br/>Clear problem isolation<br/>Single point of failure]
        EASY_MAINTENANCE[Easy Maintenance<br/>Simple dependencies<br/>Quick updates]
        LOW_OVERHEAD[Low Overhead<br/>Minimal resources<br/>Cost effective]
    end

    SINGLE_SOURCE --> FAST_TROUBLESHOOT
    MINIMAL_SERVICES --> EASY_MAINTENANCE
    DIRECT_INTEGRATION --> LOW_OVERHEAD
Loading

Maintenance Simplicity

  • Single Table Monitoring: Only MCP_SPA_PCKG table requires attention
  • Simple Dependencies: alerts → connector (linear dependency)
  • Direct Troubleshooting: Issues easily traced to Oracle or Kafka connectivity
  • Quick Recovery: Single service restart usually resolves most issues

Emergency Response

Critical Contacts & Response Times

Contact Purpose Response Time Link
Platform Team BigMAC infrastructure issues <15 minutes 💬 Slack #cms-bigmac
Appian Team Database and workflow issues <30 minutes 💬 Slack #C04K1444K89
On-Call Engineer Urgent technical escalation <15 minutes Check rotation in team channels

Emergency Commands

# Emergency restart sequence
aws ecs update-service --cluster appian-connector-{stage}-connect \
  --service kafka-connect --force-new-deployment

# Health verification  
aws lambda invoke --function-name {bigmac-debugger} \
  --payload '{"topic":"aws.appian.cmcs.MCP_SPA_PCKG","numRecords":1}' \
  --region us-east-1 /dev/stdout

Related Documentation

Operational Resources

Implementation Resources

Next: Review Monitoring for comprehensive observability or Integration for BigMAC platform details.

Clone this wiki locally