Security¶
MLflow Secrets Auth implements a comprehensive security model designed to protect credentials and prevent unauthorized access while maintaining usability and performance.
Threat Model¶
Assets¶
Primary Assets: - Authentication credentials (tokens, passwords, API keys) - Secret management system access - MLflow server access - Application configuration
Secondary Assets: - Network communications - Cache contents - Log data - System configuration
Threats¶
Credential Exposure¶
T1: Credential Leakage - Description: Credentials sent to unauthorized servers - Impact: Unauthorized access to MLflow servers - Mitigation: Host allowlisting, URL validation
T2: Log Exposure - Description: Credentials visible in application logs - Impact: Credential compromise through log access - Mitigation: Automatic credential redaction
T3: Memory Dumps - Description: Credentials visible in process memory dumps - Impact: Credential extraction from memory - Mitigation: Secure memory handling, limited credential lifetime
Network Attacks¶
T4: Man-in-the-Middle - Description: Interception of credential transmission - Impact: Credential theft during network transit - Mitigation: TLS enforcement, certificate validation
T5: DNS Poisoning - Description: Redirection to malicious servers - Impact: Credentials sent to attacker-controlled servers - Mitigation: Host allowlisting with exact hostnames
Configuration Attacks¶
T6: Environment Variable Exposure - Description: Exposure of configuration through environment variables - Impact: Access to secret paths and authentication parameters - Mitigation: Principle of least privilege, secure configuration management
T7: Cache Poisoning - Description: Injection of malicious credentials into cache - Impact: Use of compromised credentials - Mitigation: Cache integrity validation, TTL limits
Security Controls¶
Host Allowlisting¶
Host allowlisting is the primary defense against credential leakage to unauthorized servers.
Implementation¶
def _is_host_allowed(self, hostname: str) -> bool:
"""Validate hostname against allowlist patterns."""
allowed_hosts = get_allowed_hosts()
if not allowed_hosts:
return True # No restrictions if not configured
return any(
fnmatch.fnmatch(hostname, pattern)
for pattern in allowed_hosts
)
Configuration Patterns¶
Exact Hostnames:
Wildcard Patterns:
# Subdomain wildcards
MLFLOW_SECRETS_ALLOWED_HOSTS="*.company.com"
# Prefix wildcards
MLFLOW_SECRETS_ALLOWED_HOSTS="mlflow-*"
# Complex patterns
MLFLOW_SECRETS_ALLOWED_HOSTS="*.prod.company.com,*.staging.company.com,localhost"
Security Considerations¶
- Default Behavior: If not configured, allows all hosts (development-friendly but insecure)
- Wildcard Risks: Overly broad patterns can allow unintended hosts
- Pattern Validation: Patterns are validated using shell-style globbing
- Case Sensitivity: Hostname matching is case-insensitive
Best Practices¶
-
Always Enable in Production:
-
Use Specific Patterns:
-
Environment-Specific Configuration:
Credential Redaction¶
All sensitive data is automatically redacted from logs and error messages to prevent accidental exposure.
Redaction Patterns¶
The system recognizes and redacts common credential patterns:
_REDACT_PATTERNS = [
# Bearer tokens: "Bearer <token>"
re.compile(r"(Bearer\s+)([A-Za-z0-9._\-]+)"),
# Basic auth: "Basic <b64>"
re.compile(r"(Basic\s+)([A-Za-z0-9+/=]+)"),
# JWT tokens
re.compile(r"(eyJ[A-Za-z0-9._\-]+\.[A-Za-z0-9._\-]+\.[A-Za-z0-9._\-]+)"),
# JSON fields: "token" / "password" / "secret" / "key"
re.compile(r'("(?:token|password|secret|key)"\s*:\s*")([^"]+)(")'),
# URL credentials: https://user:pass@host
re.compile(r"(https?://[^:]+:)([^@]+)(@)"),
]
Masking Strategy¶
Sensitive values are masked while preserving context:
def mask_secret(value: str, mask_char: str = "*", show_chars: int = 4) -> str:
"""Mask secret while showing prefix/suffix for debugging."""
if len(value) <= show_chars * 2:
return mask_char * max(3, len(value))
return f"{value[:show_chars]}{mask_char * 8}{value[-show_chars:]}"
# Example output: "abcd********1234"
Log Output Examples¶
Before Redaction:
Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...
Secret data: {"token": "abc123def456", "password": "secret123"}
After Redaction:
Authorization: Bearer eyJh********cCI6
Secret data: {"token": "abc1********def4", "password": "secr********123"}
Secure Caching¶
The caching layer implements security measures to protect cached credentials.
Cache Security Features¶
In-Memory Only: - No persistence to disk or external storage - Automatic cleanup on process termination - Memory-only storage prevents credential recovery
TTL Enforcement:
class CacheEntry:
def __init__(self, data: dict, ttl: int):
self.data = data
self.expiry = time.time() + ttl
def is_expired(self) -> bool:
return time.time() > self.expiry
Cache Isolation: - Per-configuration cache keys prevent cross-contamination - Different providers maintain separate cache spaces - Cache keys include provider-specific identifiers
Automatic Cache Busting:
def _handle_auth_failure(self, response: requests.Response):
"""Clear cache on authentication failure."""
if response.status_code in (401, 403):
cache_key = self._get_cache_key()
self._cache.delete(cache_key)
logger.info("Cache cleared due to authentication failure")
Cache Security Best Practices¶
-
Appropriate TTL Values:
-
Regular Cache Rotation:
-
Memory Protection:
- Use memory-mapped regions when available
- Implement secure memory wiping (platform-dependent)
- Limit cache size to prevent memory exhaustion
Network Security¶
TLS Enforcement¶
All communications with secret management systems use TLS by default:
# Vault
VAULT_ADDR="https://vault.company.com" # HTTPS required
# AWS
# TLS enforced by boto3 by default
# Azure
# TLS enforced by Azure SDK by default
Certificate Validation¶
Certificate validation is enabled by default with options for testing:
# Production: Always validate certificates
VAULT_SKIP_VERIFY=false
# Development: May skip validation (NOT recommended)
VAULT_SKIP_VERIFY=true
Network Timeouts¶
Reasonable timeouts prevent hanging connections:
# Default timeouts
NETWORK_TIMEOUT = 30 # seconds
CONNECT_TIMEOUT = 10 # seconds
READ_TIMEOUT = 20 # seconds
Authentication Security¶
Provider-Specific Security¶
HashiCorp Vault: - Token-based authentication with expiration - AppRole authentication for automation - Policy-based access control - Audit logging support
AWS Secrets Manager: - IAM-based access control - Cross-account access via roles - Resource-based policies - CloudTrail audit logging
{
"Version": "2012-10-17",
"Statement": [{
"Effect": "Allow",
"Action": ["secretsmanager:GetSecretValue"],
"Resource": "arn:aws:secretsmanager:region:account:secret:mlflow/*"
}]
}
Azure Key Vault: - Azure AD authentication - Managed identity support - Role-based access control (RBAC) - Activity logging support
# Key Vault access policy
az keyvault set-policy \
--name vault-name \
--object-id user-id \
--secret-permissions get
Credential Rotation¶
The plugin supports credential rotation through cache invalidation:
- Automatic Detection: Authentication failures trigger cache clearing
- Manual Rotation: CLI commands can clear cache
- TTL-Based Rotation: Regular cache expiration forces re-fetch
# Manual cache clearing (future feature)
mlflow-secrets-auth cache clear
# TTL-based rotation
MLFLOW_VAULT_TTL_SEC=300 # Rotate every 5 minutes
Principle of Least Privilege¶
IAM Permissions¶
Vault Example:
AWS Example:
{
"Version": "2012-10-17",
"Statement": [{
"Effect": "Allow",
"Action": ["secretsmanager:GetSecretValue"],
"Resource": ["arn:aws:secretsmanager:us-east-1:123456789012:secret:mlflow/auth-*"],
"Condition": {
"StringEquals": {
"secretsmanager:ResourceTag/Environment": "production"
}
}
}]
}
Azure Example:
# Custom role with minimal permissions
az role definition create --role-definition '{
"Name": "MLflow Secrets Reader",
"Description": "Read MLflow secrets only",
"Actions": [],
"DataActions": [
"Microsoft.KeyVault/vaults/secrets/getSecret/action"
],
"AssignableScopes": ["/subscriptions/{subscription-id}"]
}'
Network Restrictions¶
- VPC/VNet Restrictions: Limit access to secret managers from specific networks
- IP Allowlisting: Configure secret managers to allow specific IP ranges
- Private Endpoints: Use private connectivity when available
Security Monitoring¶
Audit Logging¶
Enable audit logging on secret management systems:
Vault:
AWS:
Azure:
Application Logging¶
The plugin provides security-relevant logging:
# Security events logged:
# - Authentication attempts
# - Host allowlist violations
# - Cache operations
# - Configuration changes
# - Error conditions
logger.warning("Host not allowed: %s", hostname)
logger.info("Secret fetched successfully", extra={"provider": "vault"})
logger.error("Authentication failed", extra={"provider": "aws", "error_code": 403})
Monitoring Recommendations¶
- Failed Authentication Attempts: Monitor for repeated failures
- Unusual Access Patterns: Alert on access from unexpected sources
- Configuration Changes: Track environment variable modifications
- Performance Anomalies: Monitor secret fetch latency
Security Testing¶
Security Validation¶
# Test host allowlisting
MLFLOW_SECRETS_ALLOWED_HOSTS="trusted.com" \
mlflow-secrets-auth doctor --dry-run https://untrusted.com
# Test credential redaction
MLFLOW_SECRETS_LOG_LEVEL=DEBUG mlflow-secrets-auth doctor
# Test authentication failure handling
# (Requires invalid credentials)
Penetration Testing Considerations¶
- Network Segmentation: Test network-level restrictions
- Credential Validation: Verify credential requirements
- Host Validation: Test allowlist bypass attempts
- Cache Security: Verify cache isolation and cleanup
Security Compliance¶
Standards Alignment¶
The plugin design aligns with common security standards:
- NIST Cybersecurity Framework: Identify, Protect, Detect, Respond, Recover
- OWASP Top 10: Addresses authentication and sensitive data exposure
- SOC 2 Type II: Supports security and availability controls
- ISO 27001: Implements information security management controls
Compliance Features¶
- Data Classification: Treats all credentials as sensitive data
- Access Controls: Implements role-based access through secret managers
- Audit Trails: Provides comprehensive logging for compliance reporting
- Encryption: Uses TLS for data in transit, relies on secret managers for data at rest
Security Best Practices¶
Deployment Security¶
-
Environment Isolation:
-
Credential Rotation:
-
Network Security:
-
Monitoring and Alerting:
Development Security¶
- Secure Defaults: Plugin defaults to secure configurations
- Developer Training: Educate developers on security implications
- Code Review: Include security review in development process
- Testing: Include security testing in CI/CD pipelines
Operational Security¶
- Regular Updates: Keep plugin and dependencies updated
- Vulnerability Scanning: Scan for known vulnerabilities
- Incident Response: Have procedures for credential compromise
- Backup and Recovery: Maintain secure backup procedures
Next Steps¶
- Caching and Retries - Performance and reliability mechanisms
- Architecture - Overall system design and components
- Provider Documentation - Provider-specific security considerations
- Troubleshooting - Security-related troubleshooting