Self-Hosted Management
A guide to the ongoing management, maintenance, and optimization of your self-hosted Vantage Compute instance.
Daily Operations
Health Monitoring
Regular checks to ensure system health and performance:
# Check service status
sudo systemctl status vantage nginx postgresql redis
# Monitor resource usage
htop
df -h
free -h
# Check application logs
tail -f /opt/vantage/logs/vantage.log
# Monitor API health
curl -s https://your-domain.com/api/health/ | jq .
Performance Monitoring
Key metrics to monitor daily:
- CPU Usage: Should typically be under 70%
- Memory Usage: Watch for steady growth over time, which can indicate a memory leak
- Disk Space: Ensure adequate free space (>20% recommended)
- Database Performance: Track query latency and the number of open connections
- Response Times: Track latency for both the API and the web interface
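The two numeric thresholds in the list above can be checked automatically. The following is a minimal sketch, not part of Vantage Compute: the function names and the sample readings are illustrative assumptions, and only the 70% CPU and 20% free-disk figures come from this guide.

```python
import shutil


def disk_free_percent(path: str = "/") -> float:
    """Return the percentage of free space on the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total * 100


def evaluate_metrics(cpu_percent: float, free_disk_percent: float) -> list[str]:
    """Compare readings against the guide's thresholds and return warnings."""
    warnings = []
    # "Should typically be under 70%": warn at 70% or above.
    if cpu_percent >= 70:
        warnings.append(f"CPU usage is {cpu_percent:.1f}% (threshold: under 70%)")
    # ">20% recommended": warn at 20% free or below.
    if free_disk_percent <= 20:
        warnings.append(f"Free disk space is {free_disk_percent:.1f}% (threshold: above 20%)")
    return warnings


# Hypothetical readings; real values would come from your monitoring agent.
for warning in evaluate_metrics(cpu_percent=85.0, free_disk_percent=12.5):
    print("WARNING:", warning)
```

In practice, `disk_free_percent("/opt/vantage")` could supply the disk reading, while the CPU figure would come from whatever collector you already run.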
Log Management
# Rotate logs to prevent disk space issues
sudo logrotate /etc/logrotate.d/vantage
# Compress rotated logs older than 30 days (skipping files that are already compressed)
find /opt/vantage/logs -name "*.log.*" ! -name "*.gz" -mtime +30 -exec gzip {} \;
# Clean up old compressed logs
find /opt/vantage/logs -name "*.gz" -mtime +90 -delete
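The retention policy behind the find commands above (compress rotated logs after 30 days, delete compressed logs after 90 days) can be expressed as a single decision function. This is an illustrative sketch only; `plan_log_action` and the sample file names are assumptions, and it only reports what would happen without touching any files.

```python
def plan_log_action(filename: str, age_days: float) -> str:
    """Return "compress", "delete", or "keep" for one log file.

    Mirrors `find -mtime +N`, which ignores fractional days and matches
    only when the whole-day age is strictly greater than N.
    """
    whole_days = int(age_days)
    is_compressed = filename.endswith(".gz")
    if is_compressed and whole_days > 90:
        return "delete"
    if not is_compressed and ".log." in filename and whole_days > 30:
        return "compress"
    return "keep"


# Hypothetical file names and ages, for illustration only.
samples = [
    ("vantage.log", 0.2),
    ("vantage.log.3", 45.0),
    ("vantage.log.7.gz", 60.0),
    ("vantage.log.12.gz", 120.0),
]
for name, age in samples:
    print(f"{name}: {plan_log_action(name, age)}")
```

A dry run like this is a cheap way to review a retention change before pointing `-delete` at a production log directory.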
Weekly Maintenance
System Updates
# Update system packages
sudo apt update && sudo apt list --upgradable
sudo apt upgrade -y
# Update Vantage Compute (if updates available)
cd /opt/vantage/app
sudo -u vantage git pull origin main
sudo -u vantage pip3 install -r requirements.txt --upgrade
# Restart services if needed
sudo systemctl restart vantage
Database Maintenance
# Database vacuum and analyze
sudo -u postgres psql vantage -c "VACUUM ANALYZE;"
# Check database size and growth
sudo -u postgres psql vantage -c "SELECT pg_size_pretty(pg_database_size('vantage'));"
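# Monitor slow queries (requires the pg_stat_statements extension; PostgreSQL 13+ names the column mean_exec_time, older releases use mean_time)
sudo -u postgres psql vantage -c "SELECT query, mean_exec_time, calls FROM pg_stat_statements ORDER BY mean_exec_time DESC LIMIT 10;"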
Backup Verification
# Verify recent backups exist
ls -la /opt/vantage/backups/
# Test backup restoration (on test system)
gunzip -c /opt/vantage/backups/vantage_latest.sql.gz | psql -h test-server -U vantage vantage_test
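Listing the backup directory shows that files exist, but not that they are recent or readable. The sketch below adds both checks. It is an assumption-laden example rather than a Vantage Compute tool: the function names, the `*.sql.gz` pattern, and the 26-hour freshness window are all illustrative, and it does not replace a real test restore.

```python
import gzip
import tempfile
import time
import zlib
from pathlib import Path


def newest_backup(backup_dir, pattern: str = "*.sql.gz"):
    """Return the most recently modified matching file, or None if there is none."""
    files = sorted(Path(backup_dir).glob(pattern), key=lambda p: p.stat().st_mtime)
    return files[-1] if files else None


def is_valid_gzip(path) -> bool:
    """Decompress the whole archive; truncated or corrupt files raise an error."""
    try:
        with gzip.open(path, "rb") as handle:
            while handle.read(1024 * 1024):
                pass
        return True
    except (OSError, EOFError, zlib.error):
        return False


def backup_is_fresh(path, max_age_hours: float = 26.0) -> bool:
    """Return True if the file was modified within the allowed window."""
    age_seconds = time.time() - Path(path).stat().st_mtime
    return age_seconds <= max_age_hours * 3600


# Demonstration against a throwaway directory rather than /opt/vantage/backups.
with tempfile.TemporaryDirectory() as demo_dir:
    sample = Path(demo_dir) / "db_sample.sql.gz"
    with gzip.open(sample, "wb") as handle:
        handle.write(b"SELECT 1;\n")
    latest = newest_backup(demo_dir)
    print(latest.name, is_valid_gzip(latest), backup_is_fresh(latest))
```

To use it for real, pass `/opt/vantage/backups` as the directory and alert when either check returns False.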
Monthly Tasks
Capacity Planning
Review and analyze capacity metrics:
- Storage Growth: Monitor data and log storage growth trends
- User Growth: Track user registration and activity growth
- Performance Trends: Analyze response time and throughput trends
- Resource Utilization: Review CPU, memory, and disk utilization patterns
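For storage growth in particular, a trend can be turned into a rough runway estimate. The following sketch fits a straight line through periodic disk-usage samples and extrapolates to the disk's capacity. The function name and the sample figures are hypothetical, and a linear fit is a simplifying assumption that real growth may not follow.

```python
def days_until_full(samples, capacity_gb):
    """Estimate days until the disk is full from (day, used_gb) samples.

    Fits a least-squares line through the samples (expected in time order)
    and extrapolates from the most recent one. Returns None when usage is
    flat or shrinking.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    days = [day for day, _ in samples]
    used = [gb for _, gb in samples]
    mean_day = sum(days) / len(days)
    mean_used = sum(used) / len(used)
    spread = sum((d - mean_day) ** 2 for d in days)
    if spread == 0:
        raise ValueError("samples must cover more than one day")
    growth_per_day = (
        sum((d - mean_day) * (u - mean_used) for d, u in zip(days, used)) / spread
    )
    if growth_per_day <= 0:
        return None
    return (capacity_gb - used[-1]) / growth_per_day


# Hypothetical weekly readings of used space on a 200 GB volume.
weekly_samples = [(0, 100.0), (7, 107.0), (14, 114.0), (21, 121.0)]
print(days_until_full(weekly_samples, capacity_gb=200.0))
```

Recording one `df` reading per week is enough input for this kind of estimate.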
Security Reviews
# Review user accounts and permissions
sudo -u postgres psql vantage -c "SELECT username, is_active, is_superuser, last_login FROM auth_user;"
# Check for failed login attempts
grep "authentication failed" /opt/vantage/logs/vantage.log | tail -20
# Review SSL certificate expiration
openssl x509 -in /etc/ssl/certs/vantage.crt -noout -dates
# System security updates
sudo unattended-upgrade --dry-run
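The certificate check above prints `notBefore=` and `notAfter=` lines that still have to be read by eye. A small parser can convert that output into days remaining. This is a sketch under stated assumptions: the function name is illustrative, the sample text is made up, and the parser expects the usual OpenSSL date layout ending in `GMT`.

```python
from datetime import datetime, timezone


def days_until_expiry(openssl_dates_output: str, now: datetime) -> int:
    """Return whole days between `now` and the notAfter date (negative if expired)."""
    for line in openssl_dates_output.splitlines():
        if line.startswith("notAfter="):
            value = line.split("=", 1)[1].strip()
            # Typical form: "Jun  1 12:00:00 2030 GMT" (day may be space-padded).
            expires = datetime.strptime(value.removesuffix(" GMT"), "%b %d %H:%M:%S %Y")
            expires = expires.replace(tzinfo=timezone.utc)
            return (expires - now).days
    raise ValueError("no notAfter= line found")


# Made-up output in the format produced by `openssl x509 -noout -dates`.
sample_output = "notBefore=Jan  1 00:00:00 2024 GMT\nnotAfter=Jan 31 00:00:00 2024 GMT\n"
print(days_until_expiry(sample_output, now=datetime(2024, 1, 1, tzinfo=timezone.utc)))
```

In a scheduled check, the input would be the captured output of the `openssl` command and `now` would be `datetime.now(timezone.utc)`.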
Performance Optimization
# Analyze database performance
sudo -u postgres psql vantage -c "SELECT schemaname, relname, seq_scan, seq_tup_read, idx_scan, idx_tup_fetch FROM pg_stat_user_tables WHERE seq_scan > 1000;"
# Check for missing indexes (idx_scan is NULL for tables without any index)
sudo -u postgres psql vantage -c "SELECT schemaname, relname FROM pg_stat_user_tables WHERE seq_scan > COALESCE(idx_scan, 0) AND seq_tup_read > 1000;"
# Optimize database if needed
sudo -u postgres psql vantage -c "REINDEX DATABASE vantage;"
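The missing-index check above applies a simple rule: flag a table when sequential scans outnumber index scans and those scans have read more than 1000 rows. The sketch below restates that rule on rows already fetched from `pg_stat_user_tables`. The function name and the sample rows are hypothetical, and a flagged table is only a candidate for review, not proof that an index is missing.

```python
def index_candidates(table_stats, min_rows_read: int = 1000):
    """Return names of tables that are scanned sequentially more than by index.

    `table_stats` is an iterable of dicts with the keys relname, seq_scan,
    idx_scan, and seq_tup_read, as returned by pg_stat_user_tables.
    """
    flagged = []
    for row in table_stats:
        # idx_scan is NULL (None) for a table that has no indexes at all.
        idx_scan = row["idx_scan"] or 0
        if row["seq_scan"] > idx_scan and row["seq_tup_read"] > min_rows_read:
            flagged.append(row["relname"])
    return flagged


# Hypothetical statistics rows, for illustration only.
stats = [
    {"relname": "jobs", "seq_scan": 5200, "idx_scan": 40, "seq_tup_read": 9_800_000},
    {"relname": "auth_user", "seq_scan": 12, "idx_scan": 88_000, "seq_tup_read": 600},
    {"relname": "audit_log", "seq_scan": 300, "idx_scan": None, "seq_tup_read": 45_000},
]
print(index_candidates(stats))
```

Small tables are often scanned sequentially on purpose, so compare the flagged names against table sizes before adding indexes.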
User Management
Adding Users
# Create user via command line
cd /opt/vantage/app
python manage.py shell
# In Python shell:
from django.contrib.auth.models import User
user = User.objects.create_user('username', 'email@domain.com', 'password')
user.first_name = 'First'
user.last_name = 'Last'
user.save()
Managing Permissions
# List all users and their permissions
python manage.py shell -c "
from django.contrib.auth.models import User
for user in User.objects.all():
    print(f'{user.username}: active={user.is_active}, staff={user.is_staff}, superuser={user.is_superuser}')
"
# Deactivate user
python manage.py shell -c "
from django.contrib.auth.models import User
user = User.objects.get(username='username')
user.is_active = False
user.save()
"
Password Management
# Reset user password
python manage.py changepassword username
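# Force password reset on next login (password_reset_required is not a field on Django's stock User model; this assumes your Vantage Compute user model defines it)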
python manage.py shell -c "
from django.contrib.auth.models import User
user = User.objects.get(username='username')
user.password_reset_required = True
user.save()
"
Configuration Management
Environment Configuration
Keep track of configuration changes:
# Backup current configuration
cp /opt/vantage/app/.env /opt/vantage/backups/.env.$(date +%Y%m%d)
# Version control for configuration (.env usually holds secrets, so commit it only to a private, access-controlled repository)
cd /opt/vantage/app
git add .env
git commit -m "Configuration update: $(date)"
Feature Flags and Settings
# List current settings
python manage.py shell -c "
from django.conf import settings
print('DEBUG:', settings.DEBUG)
print('ALLOWED_HOSTS:', settings.ALLOWED_HOSTS)
print('DATABASE:', settings.DATABASES['default']['NAME'])
"
# Update settings dynamically (if supported)
python manage.py shell -c "
from django.core.management import call_command
call_command('set_setting', 'feature_flag_name', 'True')
"
Scaling and Optimization
Horizontal Scaling
For high-traffic deployments:
# Add additional API servers (docker-compose v1 deprecated the standalone scale command in favor of up --scale)
docker-compose up -d --scale api=3
# Configure load balancer (Nginx example; this block goes in the http context of the Nginx configuration, not in the shell)
upstream vantage_backend {
    server 127.0.0.1:8000;
    server 127.0.0.1:8001;
    server 127.0.0.1:8002;
}
Database Optimization
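# Optimize PostgreSQL configuration
# Edit /etc/postgresql/14/main/postgresql.conf
# Key settings for performance (example values; size them to the host's RAM):
shared_buffers = 256MB # Rule of thumb: about 25% of RAM
effective_cache_size = 1GB # Rule of thumb: about 75% of RAM; a planner hint, not an allocation
work_mem = 4MB # Per sort or hash operation, so it multiplies across concurrent queries
maintenance_work_mem = 64MB # For maintenance operations such as VACUUM and CREATE INDEX
checkpoint_completion_target = 0.9 # Smooth checkpoints
wal_buffers = 16MB # WAL buffers
# shared_buffers and wal_buffers take effect only after a restart: sudo systemctl restart postgresql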
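The 25% and 75% rules of thumb in the comments above are simple arithmetic on the host's total RAM. The sketch below works that arithmetic out for a given memory size. The function name is an assumption, and the results are starting points for tuning, not recommendations for any specific workload.

```python
def suggest_postgres_memory(total_ram_mb: int) -> dict[str, str]:
    """Apply the 25% / 75% rules of thumb to a host's total RAM in megabytes."""
    if total_ram_mb <= 0:
        raise ValueError("total_ram_mb must be positive")
    return {
        "shared_buffers": f"{total_ram_mb * 25 // 100}MB",
        "effective_cache_size": f"{total_ram_mb * 75 // 100}MB",
    }


# Print postgresql.conf style lines for a few example host sizes.
for ram_mb in (1024, 4096, 16384):
    settings = suggest_postgres_memory(ram_mb)
    print(f"# Host with {ram_mb} MB RAM")
    for name, value in settings.items():
        print(f"{name} = {value}")
```

If PostgreSQL shares the host with the application, Nginx, and Redis, basing the calculation on the memory left over for the database is the more cautious choice.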
Caching Configuration
# Configure Redis for caching
# Edit /etc/redis/redis.conf
maxmemory 512mb
maxmemory-policy allkeys-lru
save 900 1
save 300 10
save 60 10000
Monitoring and Alerting
System Monitoring Setup
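# Install monitoring tools (on Debian/Ubuntu the node exporter package is named prometheus-node-exporter)
sudo apt install prometheus prometheus-node-exporter
# Grafana is typically installed from the Grafana APT repository, which must be added first
sudo apt install grafana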
# Configure Prometheus to scrape metrics (writing under /etc requires root)
sudo tee /etc/prometheus/prometheus.yml > /dev/null << 'EOF'
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: 'node'
    static_configs:
      - targets: ['localhost:9100']
  - job_name: 'vantage'
    static_configs:
      - targets: ['localhost:8000']
    metrics_path: '/metrics'
EOF
Application Monitoring
# Custom health check script
cat > /opt/vantage/health-check.sh << 'EOF'
#!/bin/bash
# Check API health
API_STATUS=$(curl -s -o /dev/null -w "%{http_code}" https://your-domain.com/api/health/)
if [ "$API_STATUS" != "200" ]; then
    echo "CRITICAL: API health check failed (HTTP $API_STATUS)"
    exit 2
fi
# Check database connectivity
DB_STATUS=$(python3 -c "
import psycopg2
try:
    conn = psycopg2.connect('postgresql://vantage:password@localhost:5432/vantage')
    conn.close()
    print('OK')
except Exception:
    print('CRITICAL')
")
if [ "$DB_STATUS" != "OK" ]; then
    echo "CRITICAL: Database connectivity failed"
    exit 2
fi
echo "OK: All health checks passed"
exit 0
EOF
chmod +x /opt/vantage/health-check.sh
Alerting Setup
# Email alerting script
cat > /opt/vantage/alert.sh << 'EOF'
#!/bin/bash
ALERT_EMAIL="admin@your-domain.com"
SUBJECT="Vantage Alert: $1"
MESSAGE="$2"
echo "$MESSAGE" | mail -s "$SUBJECT" "$ALERT_EMAIL"
EOF
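chmod +x /opt/vantage/alert.sh
# Cron job for monitoring. This appends to the existing crontab instead of replacing it,
# and escapes the date command so it is evaluated when the job runs, not when it is installed.
(crontab -l 2>/dev/null; echo "*/5 * * * * /opt/vantage/health-check.sh || /opt/vantage/alert.sh 'Health Check Failed' \"Vantage health check failed at \$(date)\"") | crontab -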
Backup and Recovery
Automated Backup Strategy
# Enhanced backup script
cat > /opt/vantage/backup-full.sh << 'EOF'
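#!/bin/bash
# Abort on the first failed command so a partial backup is never reported as successful
set -euo pipefail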
BACKUP_DIR="/opt/vantage/backups"
DATE=$(date +%Y%m%d_%H%M%S)
RETENTION_DAYS=30
# Create backup directory
mkdir -p "$BACKUP_DIR"
# Database backup
echo "Starting database backup..."
pg_dump -h localhost -U vantage vantage | gzip > "$BACKUP_DIR/db_$DATE.sql.gz"
# Application data backup
echo "Starting application data backup..."
tar -czf "$BACKUP_DIR/data_$DATE.tar.gz" \
    /opt/vantage/app/.env \
    /opt/vantage/data \
    /etc/nginx/sites-available/vantage \
    /etc/ssl/certs/vantage.crt \
    /etc/ssl/private/vantage.key
# Configuration backup
echo "Starting configuration backup..."
tar -czf "$BACKUP_DIR/config_$DATE.tar.gz" \
    /opt/vantage/app \
    /etc/systemd/system/vantage.service
# Cleanup old backups
echo "Cleaning up old backups..."
find "$BACKUP_DIR" -name "*.gz" -mtime +$RETENTION_DAYS -delete
echo "Backup completed successfully: $DATE"
EOF
chmod +x /opt/vantage/backup-full.sh
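# Schedule daily backups (appends to the vantage user's existing crontab instead of replacing it)
(sudo crontab -u vantage -l 2>/dev/null; echo "0 2 * * * /opt/vantage/backup-full.sh >> /opt/vantage/logs/backup.log 2>&1") | sudo crontab -u vantage -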
Disaster Recovery Procedures
# Recovery script template
cat > /opt/vantage/recover.sh << 'EOF'
#!/bin/bash
# Stop at the first failed step instead of continuing with a partial restore
set -euo pipefail
BACKUP_FILE="${1:-}"
RECOVERY_DATE=$(date +%Y%m%d_%H%M%S)
if [ -z "$BACKUP_FILE" ]; then
    echo "Usage: $0 <backup_file>"
    exit 1
fi
echo "Starting recovery from $BACKUP_FILE at $RECOVERY_DATE"
# Stop services
sudo systemctl stop vantage nginx
# Restore database
echo "Restoring database..."
gunzip -c "$BACKUP_FILE" | sudo -u postgres psql vantage
# Restart services
sudo systemctl start postgresql redis nginx vantage
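# Verify recovery (curl -f returns a non-zero status on HTTP errors)
sleep 10
if curl -fsS https://your-domain.com/api/health/; then
    echo "Recovery completed"
else
    echo "Recovery finished, but the API health check failed" >&2
    exit 1
fi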
EOF
chmod +x /opt/vantage/recover.sh
Troubleshooting
Common Issues and Solutions
High CPU Usage
# Identify CPU-intensive processes
top -p $(pgrep -d',' -f vantage)
# Check for inefficient database queries
sudo -u postgres psql vantage -c "SELECT query, state, query_start FROM pg_stat_activity WHERE state = 'active';"
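# Optimize if needed (optimize_queries is not a built-in Django command; run "python manage.py help" to confirm your release provides it)
python manage.py optimize_queries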
Memory Issues
# Check memory usage by process
ps aux --sort=-%mem | head
# Monitor for memory leaks
free -m
cat /proc/meminfo
# Restart services if needed
sudo systemctl restart vantage
Database Performance Issues
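# Check active sessions and lock waits (wait_event_type = 'Lock' means the session is blocked on a lock)
sudo -u postgres psql vantage -c "SELECT pid, usename, wait_event_type, wait_event, state, query FROM pg_stat_activity WHERE state != 'idle';"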
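# Analyze slow queries (requires the pg_stat_statements extension; PostgreSQL 13+ names the column mean_exec_time, older releases use mean_time)
sudo -u postgres psql vantage -c "SELECT query, mean_exec_time, calls FROM pg_stat_statements ORDER BY mean_exec_time DESC LIMIT 10;"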
# Run maintenance
sudo -u postgres psql vantage -c "VACUUM ANALYZE;"
Log Analysis
# Search for errors
grep -i error /opt/vantage/logs/vantage.log | tail -20
# Monitor real-time logs
tail -f /opt/vantage/logs/vantage.log | grep -i "error\|warning\|critical"
# Analyze access patterns
awk '{print $1}' /var/log/nginx/access.log | sort | uniq -c | sort -nr | head -10
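The access-pattern pipeline above counts requests per client address. The same analysis can be sketched in Python, which is easier to extend with filters such as status codes or time windows. The function name and the sample log lines are illustrative, and the code assumes the client address is the first whitespace-separated field, as in Nginx's default combined log format.

```python
from collections import Counter


def top_clients(log_lines, limit: int = 10):
    """Return (client_address, request_count) pairs, busiest first."""
    counts = Counter(
        line.split(maxsplit=1)[0] for line in log_lines if line.strip()
    )
    return counts.most_common(limit)


# Made-up access log lines, for illustration only.
sample_lines = [
    '10.0.0.1 - - [01/Jan/2024:00:00:01 +0000] "GET /api/health/ HTTP/1.1" 200 15',
    '10.0.0.2 - - [01/Jan/2024:00:00:02 +0000] "GET /api/jobs/ HTTP/1.1" 200 512',
    '10.0.0.1 - - [01/Jan/2024:00:00:03 +0000] "POST /api/jobs/ HTTP/1.1" 201 64',
    "",
]
for address, count in top_clients(sample_lines, limit=5):
    print(f"{count:>6} {address}")
```

Passing an open file object for `/var/log/nginx/access.log` works as well, because the function only iterates over lines.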
Best Practices
Security Maintenance
- Regular Updates: Keep all components updated with security patches
- Access Reviews: Regularly review user access and permissions
- Audit Logs: Monitor and analyze security events
- Vulnerability Scanning: Regular security scans and assessments
Performance Optimization
- Monitor Trends: Track performance metrics over time
- Capacity Planning: Plan for growth and scaling needs
- Query Optimization: Regularly analyze and optimize database queries
- Caching Strategy: Implement appropriate caching mechanisms
Operational Excellence
- Documentation: Maintain up-to-date operational documentation
- Change Management: Follow formal change control processes
- Incident Response: Have procedures for handling incidents
- Recovery Testing: Regularly test backup and recovery procedures
Professional Services
For complex management needs, consider professional services:
- Managed Services: Fully managed hosting and maintenance
- Consulting: Optimization and architecture consulting
- Support: 24/7 technical support and monitoring
- Training: Advanced administrator training programs
For immediate assistance or professional services, contact our support team through your enterprise portal or sales representative.