aws-samples/sample-sagemaker-mlflow-embedded-ui

Sample: Embedded Amazon SageMaker Managed MLflow UI

A custom React web application with the Amazon SageMaker managed MLflow tracking UI embedded via iframe. A Flask reverse proxy running on Amazon EC2 authenticates all requests to the SageMaker MLflow endpoint using AWS Signature Version 4 (SigV4) signing, enabling transparent access to both the MLflow UI and REST APIs.

The entire infrastructure is provisioned using AWS CDK in TypeScript, deploying four stacks: networking (VPC), SageMaker domain, MLflow resources (IAM role and S3 bucket), and the Flask application with an Application Load Balancer (ALB). The serverless MLflow App is created via the AWS CLI as part of an automated deployment script.

Architecture

[Figure: architecture diagram]

Prerequisites

  • AWS account with IAM permissions to create the VPC, IAM, S3, EC2, ALB, and SageMaker resources described above
  • AWS CLI v2.34.5 or later (required for create-mlflow-app commands)
  • AWS CDK v2 installed and bootstrapped (cdk bootstrap)
  • Node.js 18.x or later
  • Python 3.13 or later on your local machine (the deployment script uses it to parse JSON)
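
The AWS CLI version requirement can be checked before running anything else. The sketch below is not part of the repository; it assumes the usual `aws-cli/<major>.<minor>.<patch> ...` prefix that `aws --version` prints, and the function names are hypothetical. Versions are compared as integer tuples because a plain string comparison would rank 2.9.x above 2.34.x.

```python
import re

# Minimum AWS CLI version stated in the prerequisites.
MIN_AWS_CLI = (2, 34, 5)


def parse_aws_cli_version(version_output: str) -> tuple:
    """Extract (major, minor, patch) from the text printed by `aws --version`."""
    match = re.search(r"aws-cli/(\d+)\.(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError(f"unrecognized version string: {version_output!r}")
    return tuple(int(part) for part in match.groups())


def meets_minimum(version_output: str, minimum: tuple = MIN_AWS_CLI) -> bool:
    """Return True when the reported CLI version is at least `minimum`."""
    return parse_aws_cli_version(version_output) >= minimum
```

To use it against the installed CLI, pass in the output of `subprocess.run(["aws", "--version"], capture_output=True, text=True).stdout`.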

Project Structure

├── bin/app.ts                      # CDK app entry point
├── lib/
│   ├── networking-stack.ts         # VPC, subnets, NAT gateway, VPC endpoints
│   ├── sagemaker-domain-stack.ts   # SageMaker domain and execution role
│   ├── managed-mlflow-stack.ts     # MLflow IAM role and S3 artifacts bucket
│   └── flask-app-stack.ts          # EC2, ALB, IAM roles, S3 helper upload
├── helpers/
│   ├── app/
│   │   ├── main.py                # Flask reverse proxy with SigV4 signing
│   │   ├── aws_utils.py           # SigV4 signing utilities
│   │   └── requirements.txt       # Python dependencies
│   ├── frontend/
│   │   ├── src/App.js             # React app with MLflow iframe
│   │   ├── build/                 # Pre-built React static files
│   │   └── package.json           # React dependencies
│   ├── install_python13.sh        # Python 3.13 installer for EC2
│   ├── setup_mlflow_proxy_app.sh  # Flask app setup script for EC2
│   └── mlflowproxy.service        # systemd service definition
├── deploy.sh                       # Automated deployment script
└── cleanup.sh                      # Automated teardown script

Deployment

1. Install CDK dependencies

npm install

2. Pre-build the React frontend

cd helpers/frontend
npm install
npm run build
cd ../..

3. Set environment variables

export CDK_DEFAULT_ACCOUNT=<your-aws-account-id>
export CDK_DEFAULT_REGION=<your-aws-region>

If you want to deploy to a region other than us-east-1, also set these to override any default region in ~/.aws/config:

export AWS_DEFAULT_REGION=<your-aws-region>
export AWS_REGION=<your-aws-region>

Note: If you previously deployed to a different region, delete the cached context file before redeploying: rm cdk.context.json
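
A mismatch between these variables is easy to miss. As a minimal sketch (not part of the repository; the function name `check_deploy_env` is hypothetical), the following check reports missing CDK variables and any disagreement between the region variables named in this step:

```python
import os

# Region variables named in this step; all of them should agree when set.
REGION_VARS = ("CDK_DEFAULT_REGION", "AWS_DEFAULT_REGION", "AWS_REGION")


def check_deploy_env(env=None):
    """Return a list of problems found in the deployment environment variables."""
    env = os.environ if env is None else env
    problems = []
    for required in ("CDK_DEFAULT_ACCOUNT", "CDK_DEFAULT_REGION"):
        if not env.get(required):
            problems.append(f"{required} is not set")
    # Collect only the region variables that are actually set.
    regions = {name: env[name] for name in REGION_VARS if env.get(name)}
    if len(set(regions.values())) > 1:
        problems.append(f"region variables disagree: {regions}")
    return problems
```

An empty list means the variables are consistent; anything else is worth fixing before bootstrapping.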

4. Bootstrap CDK (one-time per account/region)

npx cdk bootstrap aws://<your-aws-account-id>/<your-aws-region>

5. Run the deployment script

bash deploy.sh

This single script:

  1. Deploys the networking, SageMaker domain, and MLflow resources stacks via CDK
  2. Creates the serverless MLflow App via aws sagemaker create-mlflow-app
  3. Deploys the Flask App stack with the MLflow App ARN passed as CDK context
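
The hand-off between steps 2 and 3 can be sketched as follows. This is an illustration only, not the contents of deploy.sh: the JSON field name `Arn`, the stack name, and the context key are all assumptions, so check the actual CLI output and the CDK app before relying on them.

```python
import json


def extract_arn(cli_json: str, field: str = "Arn") -> str:
    """Pull an ARN out of JSON printed by the AWS CLI.

    `field` is an assumed key name; the real create-mlflow-app output may differ.
    """
    payload = json.loads(cli_json)
    arn = payload.get(field)
    if not isinstance(arn, str) or not arn.startswith("arn:"):
        raise ValueError(f"no ARN found under key {field!r}")
    return arn


def cdk_deploy_command(stack: str, context_key: str, arn: str) -> list:
    """Build a `cdk deploy` argument list that passes the ARN as CDK context."""
    return [
        "npx", "cdk", "deploy", stack,
        "--context", f"{context_key}={arn}",
        "--require-approval", "never",
    ]
```

For example, `cdk_deploy_command("FlaskAppStack", "mlflowAppArn", arn)` yields a list suitable for `subprocess.run`; both names in that call are placeholders.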

Note the ALB URL from the output.

6. Set up the application on EC2

Connect to the EC2 instance via AWS Systems Manager Session Manager, then run:

sudo bash /root/install_python13.sh
sudo bash /root/setup_mlflow_proxy_app.sh

7. Verify

Open the ALB URL in your browser. You should be redirected to /app, showing the React dashboard with the MLflow UI embedded in an iframe.

Test the health endpoint:

curl http://<ALB-URL>/health
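
The same check can be scripted, for instance to wait until the service is up after setup. The sketch below uses only the Python standard library and assumes that the /health endpoint answers with HTTP 200 when the proxy is healthy; the function name and the `opener` parameter are hypothetical.

```python
import urllib.error
import urllib.request


def is_healthy(base_url: str, opener=urllib.request.urlopen, timeout: float = 5.0) -> bool:
    """Return True if GET <base_url>/health answers with HTTP 200."""
    url = base_url.rstrip("/") + "/health"
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Covers connection failures, timeouts, and non-2xx responses
        # (urlopen raises HTTPError, a URLError subclass, for those).
        return False
```

Call it as `is_healthy("http://<ALB-URL>")`, optionally in a loop with a short sleep between attempts.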

Testing MLflow APIs

Create an experiment through the proxy:

curl -X POST http://<ALB-URL>/api/2.0/mlflow/experiments/create \
  -H "Content-Type: application/json" \
  -d '{"name": "my-first-experiment"}'
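
The equivalent call from Python, as a minimal sketch using only the standard library. The client sends no AWS credentials because the proxy signs the request before forwarding it. The function name is hypothetical.

```python
import json
import urllib.request


def build_create_experiment_request(base_url: str, name: str) -> urllib.request.Request:
    """Build the POST request that the curl command above sends."""
    body = json.dumps({"name": name}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/2.0/mlflow/experiments/create",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To send it, pass the result to `urllib.request.urlopen(...)` and decode the JSON response body.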

Cleanup

bash cleanup.sh

This destroys all CDK stacks in the correct order and deletes the serverless MLflow App. The MLflow artifacts S3 bucket has a RETAIN policy and must be manually deleted if no longer needed.

Key Technical Details

  • SigV4 signing: The Flask proxy signs requests with service name sagemaker and includes the x-sm-mlflow-app-arn header
  • MLflow endpoint: https://mlflow.sagemaker.<region>.app.aws
  • IAM: Least-privilege roles; the EC2 instance role assumes a dedicated FlaskMlflowRole for MLflow API access
  • Iframe embedding: The proxy strips X-Frame-Options and gzip-related headers from upstream responses
  • VPC endpoints: Included for SageMaker API, STS, S3, CloudWatch, ECR, and KMS
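
To show where the service name enters SigV4, the sketch below derives the signing key with the standard HMAC-SHA256 chain, using only the Python standard library. It is an illustration of the algorithm, not the code in aws_utils.py (which may well rely on botocore instead). It also leaves out building the canonical request and the string to sign. Because the service name is part of both the credential scope and the key, a different name produces a different signature, which the endpoint would be expected to reject.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, message: str) -> bytes:
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def credential_scope(date_stamp: str, region: str, service: str = "sagemaker") -> str:
    """Scope string that appears in the Authorization header (date is YYYYMMDD)."""
    return f"{date_stamp}/{region}/{service}/aws4_request"


def signing_key(secret_key: str, date_stamp: str, region: str,
                service: str = "sagemaker") -> bytes:
    """Derive the SigV4 signing key: secret -> date -> region -> service -> aws4_request."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def sign(string_to_sign: str, key: bytes) -> str:
    """Hex signature over an already-built SigV4 string to sign."""
    return hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()
```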

Troubleshooting

Issue                                  | Solution
---------------------------------------|------------------------------------------------------------------
deploy.sh fails at MLflow App creation | Ensure AWS CLI v2.34.5+ is installed (aws --version)
ALB returns 502 Bad Gateway            | Check Flask service: systemctl status mlflowproxy on the EC2 instance
MLflow UI shows blank page             | Verify gzip headers are stripped in main.py
MLflow REST API returns 403            | Check SigV4 service name is sagemaker in aws_utils.py
Stack deletion fails                   | Check CloudWatch logs for the VPC cleanup Lambda function
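
For the blank-page case, the header filtering can be sketched as below. This is not the code in main.py: the exact header set is an assumption (the source only says X-Frame-Options and gzip-related headers), and the function name is hypothetical. The likely failure mode is that the proxy's HTTP client has already decompressed the body, so forwarding Content-Encoding: gzip makes the browser try to decompress plain content.

```python
# Assumed set of response headers to drop; compared in lowercase.
STRIPPED_HEADERS = frozenset({
    "x-frame-options",    # would block rendering inside the iframe
    "content-encoding",   # body is assumed to be already decompressed by the proxy
    "content-length",     # no longer matches the decompressed body
    "transfer-encoding",  # hop-by-hop header, not meaningful to forward
})


def strip_upstream_headers(headers, blocked=STRIPPED_HEADERS):
    """Return the (name, value) pairs whose name is not in `blocked`."""
    return [(name, value) for name, value in headers if name.lower() not in blocked]
```

Header names are compared case-insensitively, since upstream casing is not guaranteed.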

Security Considerations

  • The ALB is deployed with HTTP (port 80). For production, add HTTPS with an SSL/TLS certificate.
  • No authentication is configured on the ALB. For production, integrate with your SSO provider using ALB authentication with Amazon Cognito or OIDC.
  • The EC2 instance runs in a private subnet with no direct internet access.

License

This project is licensed under the MIT-0 License. See the LICENSE file for details.

Contributors

This project was developed and is maintained by:

  • Manish Garg
  • Ashish Bhatt
  • Ram Yennapusa
