Orchestrating MLOps with Bedrock AgentCore and SageMaker
Building an autonomous agent that handles the entire ML lifecycle: from exploratory data analysis in a Code Interpreter sandbox to Bayesian hyperparameter tuning and model deployment on AWS SageMaker.

When you build machine learning pipelines, you realize quickly that the modeling itself is only a small fraction of the effort. The real work is in the plumbing: cleaning datasets, exploring stationarity and seasonality, engineering lagging or rolling window features, running hyperparameter search, tracking jobs, building reports, and deploying endpoints.
Traditionally, data scientists write messy Jupyter notebooks to do this manually. But what if an autonomous AI agent could handle this entire lifecycle?
I set out to build exactly that: a production-ready forecasting system that takes a raw CSV in S3, performs advanced statistical analysis, tunes hyperparameters, trains ARIMA/LSTM models, and deploys a live endpoint, all managed by an LLM-driven orchestrator.
Here is how I built a three-layer architecture combining AWS Bedrock AgentCore, Code Interpreter sandboxes, and AWS SageMaker.
The Three-Layer Architecture
An autonomous system needs to split concerns between reasoning, executing code, and running heavy machine learning jobs. I designed a three-layer architecture:
graph TD
    subgraph Layer1["Layer 1: Orchestration (AgentCore)"]
        Agent["Intelligent Agent<br/>(16 Tools)"]
        Boto3["AWS SDK (boto3)"]
    end
    subgraph Layer2["Layer 2: Data Science (Code Interpreter)"]
        Sandbox["Isolated Python Sandbox"]
        EDA["EDA / Statistical Tests"]
        Features["Feature Engineering"]
        Plotly["Plotly HTML Reports"]
    end
    subgraph Layer3["Layer 3: Scalable ML (SageMaker)"]
        Tuning["Bayesian HPO Jobs"]
        Training["Scale Model Training"]
        Endpoint["Production REST Endpoint"]
    end
    Agent --> Sandbox
    Agent --> Boto3
    Boto3 --> Tuning
    Boto3 --> Training
    Boto3 --> Endpoint
Let's break down how these layers interact.
Layer 1: Orchestration with AWS Bedrock AgentCore
The orchestrator is an agent written with the Strands library and run on AWS Bedrock AgentCore. It has access to 16 specialized tools ranging from statistical analysis to SageMaker deployment commands.
In agent.py, the agent is initialized with a list of Python tool references:
from strands import Agent

from agents.advanced_eda_agent import run_advanced_eda
from agents.intelligent_feature_engineering_agent import recommend_features, create_features
from agents.sagemaker_simple import (
    create_sagemaker_training_job,
    get_training_job_status,
    deploy_sagemaker_model,
    invoke_sagemaker_endpoint,
)

agent = Agent(
    name="IntelligentForecastingAgent",
    description="Intelligent time series forecasting with ARIMA and LSTM model comparison.",
    tools=[
        run_advanced_eda,
        recommend_features,
        create_features,
        create_sagemaker_training_job,
        get_training_job_status,
        deploy_sagemaker_model,
        invoke_sagemaker_endpoint,
        # ... other tools
    ],
)
The agent uses these tools to execute a 7-step forecasting workflow:
- Advanced EDA: Runs ADF/KPSS tests and seasonal decomposition.
- Feature Recommendations: The agent examines stats and recommends features.
- Feature Engineering: Generates rolling stats, lag columns, and calendar features.
- Bayesian Tuning: Launches SageMaker HPO to find optimal hyperparameters.
- Model Training: Trains classical models (ARIMA) and deep learning models (LSTM).
- Comparison: Evaluates metrics on hold-out test sets.
- Report Generation: Deploys the best model and outputs a presigned URL to an interactive Plotly report.
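The comparison step in the list above can be sketched in a few lines of plain Python. The metric choice (lowest RMSE wins, with MAE reported alongside), the function names, and the sample numbers are assumptions for illustration; the workflow description does not say which metrics the agent actually uses.

```python
import math
from typing import Dict, Sequence


def _check(actual: Sequence[float], predicted: Sequence[float]) -> None:
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and equally long")


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error over the hold-out window."""
    _check(actual, predicted)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error over the hold-out window."""
    _check(actual, predicted)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def compare_forecasts(actual: Sequence[float], forecasts: Dict[str, Sequence[float]]) -> dict:
    """Score every candidate on the same hold-out data and pick the lowest RMSE."""
    scores = {
        name: {"rmse": rmse(actual, pred), "mae": mae(actual, pred)}
        for name, pred in forecasts.items()
    }
    best = min(scores, key=lambda name: scores[name]["rmse"])
    return {"best_model": best, "scores": scores}


# Hypothetical hold-out values and forecasts, for illustration only.
holdout = [10.0, 12.0, 14.0, 16.0]
result = compare_forecasts(
    holdout,
    {"arima": [11.0, 12.0, 13.0, 16.0], "lstm": [10.0, 12.0, 14.0, 18.0]},
)
print(result["best_model"])
```

Scoring both models on the same hold-out window keeps the comparison fair. Both candidates here have the same MAE, so the RMSE tie-break matters: it penalizes the single large LSTM miss more heavily.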
Layer 2: Safe Execution via Code Interpreter
A major challenge for LLM agents is handling arbitrary code execution. I cannot run unverified code written by an LLM directly on my application server or inside my main database container.
My solution is to offload all data science computations (EDA, feature engineering, Plotly charting) to an isolated Code Interpreter sandbox. This sandbox has pre-installed scientific libraries (pandas, scipy, statsmodels, plotly) but is locked down with limited CPU, memory, and network permissions.
Here is the design pattern I used for the Advanced EDA tool:
from typing import Optional

from strands import tool

# parse_s3_uri, sandbox_client, and extract_json_from_stdout are project helpers defined elsewhere.

@tool
def run_advanced_eda(
    dataset_s3_path: str,
    time_column: Optional[str] = None,
    value_column: Optional[str] = None,
) -> str:
    """
    Runs stationarity tests (ADF, KPSS), seasonal decomposition, and ACF/PACF analysis.
    """
    # 1. Parse the target S3 paths
    bucket, key = parse_s3_uri(dataset_s3_path)

    # 2. Build the Python script to run in the sandbox
    #    (the script body stays at column 0 so it is valid Python inside the sandbox)
    code = f'''
import pandas as pd
import numpy as np
import json
import subprocess
from statsmodels.tsa.stattools import adfuller, kpss
from statsmodels.tsa.seasonal import seasonal_decompose

# Downloader helper - Sandbox has pre-configured read-only AWS CLI credentials
subprocess.run(['aws', 's3', 'cp', 's3://{bucket}/{key}', '/tmp/data.csv'], check=True)
df = pd.read_csv('/tmp/data.csv')

# ... Auto-detect time/value columns and clean data ...
series = df[value_col].dropna()

# Run ADF and KPSS tests
adf_stat, adf_pvalue, _, _, _, _ = adfuller(series)
kpss_stat, kpss_pvalue, _, _ = kpss(series, regression='c')

# Return results as JSON
print("===JSON_START===")
print(json.dumps({{
    "adf": {{"statistic": adf_stat, "p_value": adf_pvalue}},
    "kpss": {{"statistic": kpss_stat, "p_value": kpss_pvalue}}
}}))
print("===JSON_END===")
'''

    # 3. Send script to the Code Interpreter sandbox and parse stdout
    stdout = sandbox_client.execute(code)
    return extract_json_from_stdout(stdout)
By packaging Python code as a text block and passing it to the sandbox, I keep the agent's host server secure. The sandbox pulls the data from S3, computes stats, and returns clean, structured JSON back to the agent.
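The EDA tool leans on two helpers that are not shown, parse_s3_uri and extract_json_from_stdout. A minimal sketch of what they might look like follows. The bodies are assumptions, written only against the ===JSON_START=== / ===JSON_END=== markers that the sandbox script prints and the tool's str return type.

```python
import json
from typing import Tuple

JSON_START = "===JSON_START==="
JSON_END = "===JSON_END==="


def parse_s3_uri(uri: str) -> Tuple[str, str]:
    """Split 's3://bucket/some/key.csv' into ('bucket', 'some/key.csv')."""
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"Not an S3 URI: {uri!r}")
    bucket, _, key = uri[len(prefix):].partition("/")
    if not bucket or not key:
        raise ValueError(f"S3 URI needs both a bucket and a key: {uri!r}")
    return bucket, key


def extract_json_from_stdout(stdout: str) -> str:
    """Return the JSON printed between the sandbox markers as a string.

    Any other output (warnings, progress logs) around the markers is ignored.
    Failures come back as a JSON error object so the agent can react to them.
    """
    start = stdout.find(JSON_START)
    if start == -1:
        return json.dumps({"success": False, "error": "no JSON start marker in sandbox output"})
    end = stdout.find(JSON_END, start + len(JSON_START))
    if end == -1:
        return json.dumps({"success": False, "error": "no JSON end marker in sandbox output"})
    payload = stdout[start + len(JSON_START):end].strip()
    try:
        # Round-trip through json to confirm the payload is well formed.
        return json.dumps(json.loads(payload))
    except json.JSONDecodeError as exc:
        return json.dumps({"success": False, "error": f"invalid JSON from sandbox: {exc}"})
```

The markers matter because libraries such as statsmodels can emit warnings on stdout or stderr. Slicing between fixed delimiters keeps that noise out of the result the agent reasons over.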
Layer 3: Scalable ML Training and Deployment on SageMaker
While Code Interpreter is great for light calculations, it lacks the memory and compute capacity to train deep learning models (like LSTMs) or run large-scale hyperparameter searches. For that, I delegate to AWS SageMaker.
When the agent reaches the model training step, it uses the AWS SDK (boto3) to launch real SageMaker training and HPO (Hyperparameter Optimization) jobs using preconfigured Scikit-Learn or PyTorch ECR containers. The training-job tool looks like this:
import json

import boto3
from strands import tool

sagemaker = boto3.client('sagemaker')

# BUCKET (the output bucket name) is assumed to be defined elsewhere in this module.

@tool
def create_sagemaker_training_job(
    job_name: str,
    dataset_s3_path: str,
    role_arn: str
) -> str:
    """
    Launches a Scikit-Learn ECR container on SageMaker for training.
    """
    try:
        response = sagemaker.create_training_job(
            TrainingJobName=job_name,
            AlgorithmSpecification={
                'TrainingImage': '683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn:1.2-1-cpu-py3',
                'TrainingInputMode': 'File'
            },
            RoleArn=role_arn,
            InputDataConfig=[{
                'ChannelName': 'train',
                'DataSource': {
                    'S3DataSource': {
                        'S3DataType': 'S3Prefix',
                        'S3Uri': dataset_s3_path,
                    }
                },
                'ContentType': 'text/csv'
            }],
            OutputDataConfig={
                'S3OutputPath': f's3://{BUCKET}/sagemaker/models/'
            },
            ResourceConfig={
                'InstanceType': 'ml.m5.xlarge',
                'InstanceCount': 1,
                'VolumeSizeInGB': 10
            },
            # SageMaker passes hyperparameters to the container as strings
            HyperParameters={
                'p': '2',
                'd': '1',
                'q': '3',
                'seasonal-p': '1',
            },
            StoppingCondition={'MaxRuntimeInSeconds': 3600}
        )
        return json.dumps({'success': True, 'job_arn': response['TrainingJobArn']})
    except Exception as e:
        # Return the error as JSON so the agent can read it instead of crashing
        return json.dumps({'success': False, 'error': str(e)})
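The tool above starts a single training job with fixed ARIMA orders. For the Bayesian tuning step, the same job definition can be wrapped in a tuning request and passed to the boto3 SageMaker client's create_hyper_parameter_tuning_job. The sketch below only builds that request. The function name, the metric name and its regex, the search ranges, and the job limits are illustrative assumptions, not values from the original project.

```python
from typing import Any, Dict


def build_arima_tuning_request(
    job_name: str,
    dataset_s3_path: str,
    output_s3_path: str,
    role_arn: str,
    training_image: str,
) -> Dict[str, Any]:
    """Build keyword arguments for create_hyper_parameter_tuning_job (Bayesian search)."""
    metric_name = "validation:rmse"  # assumed objective; the training script must log it
    return {
        "HyperParameterTuningJobName": job_name,  # SageMaker caps this at 32 characters
        "HyperParameterTuningJobConfig": {
            "Strategy": "Bayesian",
            "HyperParameterTuningJobObjective": {"Type": "Minimize", "MetricName": metric_name},
            "ResourceLimits": {"MaxNumberOfTrainingJobs": 20, "MaxParallelTrainingJobs": 2},
            "ParameterRanges": {
                # Range bounds are strings in this API, even for integer parameters.
                "IntegerParameterRanges": [
                    {"Name": "p", "MinValue": "0", "MaxValue": "5"},
                    {"Name": "d", "MinValue": "0", "MaxValue": "2"},
                    {"Name": "q", "MinValue": "0", "MaxValue": "5"},
                ]
            },
        },
        "TrainingJobDefinition": {
            "StaticHyperParameters": {"seasonal-p": "1"},
            "AlgorithmSpecification": {
                "TrainingImage": training_image,
                "TrainingInputMode": "File",
                # Assumed log format: the script prints e.g. "validation_rmse=1.234"
                "MetricDefinitions": [
                    {"Name": metric_name, "Regex": r"validation_rmse=([0-9\.]+)"}
                ],
            },
            "RoleArn": role_arn,
            "InputDataConfig": [{
                "ChannelName": "train",
                "DataSource": {
                    "S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": dataset_s3_path}
                },
                "ContentType": "text/csv",
            }],
            "OutputDataConfig": {"S3OutputPath": output_s3_path},
            "ResourceConfig": {
                "InstanceType": "ml.m5.xlarge",
                "InstanceCount": 1,
                "VolumeSizeInGB": 10,
            },
            "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
        },
    }
```

With a boto3 client, the call would be sagemaker.create_hyper_parameter_tuning_job(**request). Keeping MaxParallelTrainingJobs low gives the Bayesian strategy more completed trials to learn from before it picks the next candidates.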
Because SageMaker jobs run asynchronously and can take 5 to 60 minutes, the agent uses a polling loop tool (get_training_job_status) to track the execution. Once the job succeeds, the agent triggers deploy_sagemaker_model to package the model artifact (model.tar.gz) from S3 and host it behind a real-time SageMaker endpoint.
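A status check and a bounded polling loop along these lines would cover that wait. This is a sketch, not the project's get_training_job_status: the function names, the 30-second interval, and the timeout are assumptions. The client is passed in, so any object with a describe_training_job method works, such as boto3.client('sagemaker').

```python
import json
import time
from typing import Any, Callable

TERMINAL_STATES = {"Completed", "Failed", "Stopped"}


def training_job_status(client: Any, job_name: str) -> str:
    """Return a JSON summary of one SageMaker training job (a single check)."""
    desc = client.describe_training_job(TrainingJobName=job_name)
    status = desc["TrainingJobStatus"]
    summary = {"job_name": job_name, "status": status, "done": status in TERMINAL_STATES}
    if "FailureReason" in desc:
        summary["failure_reason"] = desc["FailureReason"]
    if status == "Completed":
        # Location of model.tar.gz, needed later for deployment
        summary["model_artifact"] = desc["ModelArtifacts"]["S3ModelArtifacts"]
    return json.dumps(summary)


def wait_for_training_job(
    client: Any,
    job_name: str,
    poll_seconds: float = 30.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the job reaches a terminal state or the timeout elapses."""
    waited = 0.0
    while True:
        summary = training_job_status(client, job_name)
        if json.loads(summary)["done"] or waited >= timeout_seconds:
            return summary  # on timeout, "done" is still False in the summary
        sleep(poll_seconds)
        waited += poll_seconds
```

Exposing the single check as the agent-facing tool, and keeping the loop as an optional helper, lets the agent do other work between checks instead of blocking for the whole job.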
The Two-Phase Data Pattern for Reports
Generating and serving rich reports presents a unique security challenge. The Code Interpreter sandbox (which generates the Plotly interactive HTML files) should never have write access to my production S3 buckets, nor should it hold long-term AWS credentials.
To solve this, I implemented a Two-Phase Data Handling pattern:
sequenceDiagram
    participant Agent as AgentCore (Has S3 Write Credentials)
    participant CI as Code Interpreter Sandbox (No S3 Write Credentials)
    participant S3 as AWS S3
    Agent->>CI: Execute charting script
    CI->>CI: Generate Plotly chart & serialize to HTML
    CI-->>Agent: Print HTML string to stdout
    Agent->>Agent: Extract HTML string from stdout
    Agent->>S3: Upload HTML content using boto3
    S3-->>Agent: Return secure presigned URL
By keeping the AWS write credentials strictly in the Orchestration Layer (AgentCore) and treating the Code Interpreter as a pure calculation engine, I maintain least-privilege security across my cloud infrastructure.
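A minimal sketch of the second phase, on the agent side: pull the HTML out of the sandbox's stdout, upload it with boto3, and return a presigned URL. The marker strings, function names, and one-hour expiry are assumptions. The S3 client is passed in (for example boto3.client('s3')), so it lives only in the orchestration layer.

```python
from typing import Any

# Hypothetical markers, mirroring the JSON markers used by the EDA tool.
HTML_START = "===HTML_START==="
HTML_END = "===HTML_END==="


def extract_html(stdout: str) -> str:
    """Return the HTML document printed between the markers."""
    start = stdout.find(HTML_START)
    if start == -1:
        raise ValueError("no HTML start marker in sandbox output")
    end = stdout.find(HTML_END, start + len(HTML_START))
    if end == -1:
        raise ValueError("no HTML end marker in sandbox output")
    return stdout[start + len(HTML_START):end].strip()


def publish_report(
    s3_client: Any,
    sandbox_stdout: str,
    bucket: str,
    key: str,
    expires_in: int = 3600,
) -> str:
    """Upload the sandbox-generated report and return a time-limited download URL."""
    html = extract_html(sandbox_stdout)
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=html.encode("utf-8"),
        ContentType="text/html; charset=utf-8",  # so browsers render it instead of downloading
    )
    return s3_client.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,
    )
```

Because the sandbox only ever prints text, a compromised or buggy chart script can at worst produce a bad report. It cannot overwrite or delete objects in the bucket.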
Conclusion & Lessons Learned
Building an autonomous ML pipeline taught me several key lessons:
- Decouple Compute Profiles: Keep light calculations (cleaning, simple charts) inside rapid, cheap sandboxes, and reserve expensive cloud instances (SageMaker ml.m5 CPU or ml.p3 GPU nodes) for heavy model training.
- Handle Asynchrony Gracefully: LLMs are naturally sequential. To prevent the agent from locking up during a 30-minute SageMaker job, use explicit state-machine trackers and periodic polling tools.
- Sandbox Everything: Never let an LLM-generated script run natively. Use a secure container runtime with strict network isolation.
By combining AWS Bedrock AgentCore's reasoning capabilities with SageMaker's massive scaling power, I successfully took a standard time-series pipeline and turned it into a fully autonomous MLOps platform.


