MIKE ANDERSON

MACHINE LEARNING SPECIALIST

Mike is a dynamic, highly skilled Machine Learning Engineer and Data Scientist with over 5 years of experience who was promoted three times at National Guardian Life for exceptional technical expertise and business acumen. He has a proven track record in machine learning, data analysis, and AI integration and is currently pursuing an M.S. in Artificial Intelligence at Johns Hopkins University. He is a top-10% performer in machine learning competitions and a founding member of NGL's AI Committee. Mike is proficient in Python, SQL, and AWS tools, with a strong background in quantitative analysis, data mining, and data visualization, and has demonstrated leadership on cross-functional teams, delivering impactful, data-driven insights.

SKILLSET

    • Supervised learning: regression (linear, logistic), classification, and model tuning

    • Unsupervised learning: clustering (KMeans, GMM, DBSCAN), dimensionality reduction (PCA, t-SNE, UMAP)

    • Ensemble methods: Random Forest, Gradient Boosting, XGBoost, CatBoost, LightGBM

    • Time series modeling: ARIMA, Prophet, rolling statistics, and seasonality decomposition (sketch after this list)

    • Evaluation metrics: ROC-AUC, F1 score, confusion matrices, precision/recall tradeoffs

    • Model selection: cross-validation, grid/randomized search, hyperparameter tuning (sketch after this list)

    • Feature engineering: transformation pipelines, interaction terms, encoding, scaling

    • Explainability: SHAP, permutation importance, partial dependence plots (sketch after this list)

    • Fully connected (dense) networks: feedforward architectures, dropout, batch normalization

    • Convolutional Neural Networks (CNNs): for image recognition, object detection, edge cases

    • Recurrent Neural Networks (RNNs): LSTM, GRU for sequential tasks like time series and text

    • Embeddings: learned representations via Word2Vec, Doc2Vec, and TensorFlow/Keras Embedding layers

    • Optimization: backpropagation, gradient descent variants (Adam, RMSProp, SGD)

    • Overfitting strategies: regularization, dropout, data augmentation

    • Frameworks: PyTorch, TensorFlow, Keras — including building from scratch for deeper understanding

    • Generative AI: prompt engineering, fine-tuning large language models (LLMs), text summarization

    • NLP: transformers (BERT, RoBERTa, T5), sentiment analysis, named entity recognition, embeddings

    • Search + retrieval: vector search with FAISS, embedding-based similarity, RAG (retrieval-augmented generation) (sketch after this list)

    • Reinforcement learning: policy gradients, Q-learning, basic agent-environment frameworks

    • Ethics and alignment: fairness, bias detection, interpretability, responsible deployment

    • AI-driven systems: end-to-end ML/AI architectures that solve real business problems

    • Pipeline orchestration: training, validation, deployment, monitoring

    • Drift detection: custom solutions for feature and prediction distribution shift (sketch after this list)

    • Model versioning: MLflow, DVC, model registries, reproducibility

    • Automated retraining: scheduled jobs, model decay triggers, batch/online updates

    • Monitoring: prediction confidence tracking, input schema validation, latency thresholds (sketch after this list)

    • Integration with business systems: embedding models into APIs, web tools, or BI layers

    • AWS: S3, EC2, SageMaker, Lambda, Glue, Athena, Redshift, CloudWatch

    • Azure: Azure ML, Blob Storage, Synapse, Azure DevOps, Key Vault

    • GCP (familiarity): BigQuery, Vertex AI, GCS

    • Infrastructure-as-Code: Terraform for provisioning repeatable, secure cloud environments

    • Cost optimization: data lifecycle policies, autoscaling, spot instances

    • GitHub Actions: secure workflows for testing, packaging, deployment

    • DevSecOps: gated merges, environment secrets, PR labeling, unit tests

    • Package automation: Poetry-based versioning, wheel builds, wheelhouse packaging

    • Multi-environment deployment: staging → UAT → prod rollouts, rollback plans

    • Notebook → production transitions: converting notebooks to robust Python modules and services

    • ETL pipelines: ingesting from APIs, databases, flat files, streaming sources

    • Data wrangling: pandas, PySpark, SQL joins, window functions, cleaning pipelines

    • Storage design: normalized + denormalized schemas, Delta Lake, data lakes vs. warehouses

    • Job scheduling: Airflow, AWS Step Functions, cron-based tasks

    • Graph databases: Neo4j design for knowledge graphs and entity relationship mapping

    • Data quality checks: validation, profiling, anomaly detection models for pipeline health

    • Business Intelligence: Tableau, Power BI, AWS QuickSight

    • Python viz: Matplotlib, Seaborn, Plotly, Dash, Streamlit (custom tools + prototypes)

    • Storytelling: crafting narratives through data with audience-specific framing

    • Interactive dashboards: filters, drilldowns, parameterized views for stakeholder control

    • Model explainability UI: SHAP plots, feature importance summaries, prediction explanations

    • Real-time insights: dashboards connected to streaming/near-real-time data pipelines
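
Model selection (from the skillset above): a minimal sketch of cross-validated hyperparameter tuning with scikit-learn; the synthetic data, estimator, and parameter grid are illustrative assumptions, not details of any specific project.

    # Cross-validated grid search over a small hyperparameter grid.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV

    # Synthetic data stands in for a real feature matrix and labels.
    X, y = make_classification(n_samples=500, n_features=20, random_state=0)

    # Candidates are scored with 5-fold cross-validation on ROC-AUC.
    param_grid = {"n_estimators": [100, 300], "max_depth": [None, 10]}
    search = GridSearchCV(
        RandomForestClassifier(random_state=0),
        param_grid,
        cv=5,
        scoring="roc_auc",
    )
    search.fit(X, y)
    print(search.best_params_, round(search.best_score_, 3))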
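
Explainability (from the skillset above): a minimal sketch of permutation importance with scikit-learn on a public dataset; the gradient-boosting model is an illustrative choice, and SHAP values could be substituted where per-prediction explanations are needed.

    # Permutation importance: shuffle each feature on held-out data and
    # measure the drop in score; larger drops mean heavier reliance.
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.inspection import permutation_importance
    from sklearn.model_selection import train_test_split

    data = load_breast_cancer()
    X_train, X_test, y_train, y_test = train_test_split(
        data.data, data.target, random_state=0
    )
    model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

    result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
    ranked = sorted(zip(data.feature_names, result.importances_mean), key=lambda t: -t[1])
    for name, score in ranked[:5]:
        print(f"{name}: {score:.3f}")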
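
Time series modeling (from the skillset above): a minimal sketch of seasonality decomposition with statsmodels; the monthly series is synthetic, with an assumed trend, yearly cycle, and noise.

    # Decompose a monthly series into trend, seasonal, and residual parts.
    import numpy as np
    import pandas as pd
    from statsmodels.tsa.seasonal import seasonal_decompose

    idx = pd.date_range("2019-01-01", periods=48, freq="MS")
    rng = np.random.default_rng(0)
    months = idx.month.to_numpy()
    values = (
        np.linspace(100.0, 160.0, len(idx))        # upward trend
        + 10.0 * np.sin(2 * np.pi * months / 12)   # yearly seasonality
        + rng.normal(scale=2.0, size=len(idx))     # noise
    )
    series = pd.Series(values, index=idx)

    result = seasonal_decompose(series, model="additive", period=12)
    print(result.seasonal.head(12))   # repeating 12-month seasonal pattern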
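
Drift detection (from the skillset above): a minimal sketch of a custom feature-drift check using a two-sample Kolmogorov-Smirnov test from scipy; the significance threshold and the synthetic reference/production samples are assumptions.

    # Flag drift when a production feature's distribution differs
    # significantly from the training-time (reference) distribution.
    import numpy as np
    from scipy.stats import ks_2samp

    def detect_drift(reference, current, alpha=0.01):
        statistic, p_value = ks_2samp(reference, current)
        return p_value < alpha

    rng = np.random.default_rng(0)
    reference = rng.normal(loc=0.0, scale=1.0, size=5_000)  # training-time sample
    current = rng.normal(loc=0.3, scale=1.0, size=5_000)    # shifted production sample
    print(detect_drift(reference, current))                 # True: the mean has shifted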
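
Search + retrieval (from the skillset above): a minimal sketch of embedding-based similarity search with FAISS using an exact L2 index; the random vectors stand in for real document and query embeddings produced by an upstream model.

    # Exact nearest-neighbour search over dense embeddings.
    import faiss
    import numpy as np

    dim = 128
    rng = np.random.default_rng(0)
    corpus = rng.random((10_000, dim), dtype=np.float32)  # document embeddings
    query = rng.random((1, dim), dtype=np.float32)        # query embedding

    index = faiss.IndexFlatL2(dim)     # exact search; swap for IVF/HNSW at scale
    index.add(corpus)
    distances, ids = index.search(query, 5)
    print(ids[0], distances[0])        # ids and distances of the 5 nearest documents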
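
Monitoring (from the skillset above): a minimal sketch of input schema validation with pydantic, rejecting malformed requests before they reach a model; the field names and bounds are hypothetical.

    # Validate incoming prediction requests against a declared schema.
    from pydantic import BaseModel, Field, ValidationError

    class PredictionRequest(BaseModel):
        customer_age: int = Field(ge=18, le=120)   # hypothetical fields and bounds
        annual_premium: float = Field(gt=0)
        policy_type: str

    try:
        PredictionRequest(customer_age=17, annual_premium=1200.0, policy_type="term")
    except ValidationError as exc:
        print(exc.errors()[0]["msg"])   # out-of-schema input is rejected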