Data Science

Auto Added by WPeMatico

How to Design Python-First Interactive Dashboards with Prefab Reactive UI Components and Static HTML Export

In this tutorial, we build a Prefab application that demonstrates how to create interactive dashboards entirely in Python. We use Prefab’s component-based Python interface to design a polished operations dashboard with reactive state, charts, tables, filters, forms, tabs, alerts, metrics, and client-side actions. We generate realistic pipeline monitoring data, connect it to live UI controls, […]

How to Design Python-First Interactive Dashboards with Prefab Reactive UI Components and Static HTML Export Read More »

✅

How to Build a Forecasting Pipeline with TimeCopilot Using Foundation Models and Automated Anomaly Detection

In this tutorial, we build an end-to-end forecasting workflow with TimeCopilot. We prepare a panel dataset containing real airline passenger data and a synthetic seasonal series with injected anomalies, then evaluate a diverse collection of statistical, foundation, and optional GPU-based forecasting models. We use rolling cross-validation and multiple error metrics to identify the strongest model,

How to Build a Forecasting Pipeline with TimeCopilot Using Foundation Models and Automated Anomaly Detection Read More »

A Coding Hands-On on FineWeb for Streaming, Filtering, Deduplication, Tokenization, and Large-Scale Web Corpus Analytics

In this tutorial, we explore the FineWeb dataset through an advanced hands-on workflow. We stream a manageable sample of the dataset without downloading the full multi-terabyte corpus, inspect its schema and metadata, and analyze key fields such as URL, language, language score, and token count. We also reproduce simplified versions of FineWeb’s quality-filtering pipeline, apply

A Coding Hands-On on FineWeb for Streaming, Filtering, Deduplication, Tokenization, and Large-Scale Web Corpus Analytics Read More »

✅

Building a Code Dataset Pipeline from NVIDIA Nemotron-Pretraining-Code-v3 Metadata with Streaming, Pandas, and tiktoken

In this tutorial, we work with NVIDIA’s Nemotron-Pretraining-Code-v3 dataset as a large-scale metadata index for code pretraining research. Instead of downloading the full multi-gigabyte dataset, we stream it, inspect its schema, and build a manageable sample for analysis. We then explore the dataset by studying languages, file extensions, repository frequency, and directory depth, which helps

Building a Code Dataset Pipeline from NVIDIA Nemotron-Pretraining-Code-v3 Metadata with Streaming, Pandas, and tiktoken Read More »

✅

A Coding Guide to Implement a pgvector-Powered Semantic, Hybrid, Sparse, and Quantized Vector Search System

In this tutorial, we build a complete pgvector playground inside Google Colab and explore how PostgreSQL can work as a powerful vector database for modern AI applications. We start by installing PostgreSQL, compiling the pgvector extension, connecting through Psycopg, and registering vector types for smooth Python integration. Then, we create embeddings with SentenceTransformers, store them

A Coding Guide to Implement a pgvector-Powered Semantic, Hybrid, Sparse, and Quantized Vector Search System Read More »

✅

How to Build Knowledge Graph Generation Pipelines From Text With kg-gen, NetworkX Analytics, and Interactive Visualizations

In this tutorial, we will generate knowledge graphs from plain text, conversations, and multiple source documents using kg-gen. We start by setting up the required dependencies and configuring an LLM through LiteLLM, then we extract entities, predicates, and relationships from simple text. As we move forward, we work with longer passages using chunking and clustering,

How to Build Knowledge Graph Generation Pipelines From Text With kg-gen, NetworkX Analytics, and Interactive Visualizations Read More »

↔

A Coding Guide Implementing SHAP Explainability Workflows with Explainer Comparisons, Maskers, Interactions, Drift, and Black-Box Models

In this tutorial, we implement SHAP workflows as a practical framework for interpreting machine learning models beyond basic feature-importance plots. We start by training tree-based models and then compare different SHAP explainers, including Tree, Exact, Permutation, and Kernel methods, to understand how accuracy and runtime change across model-aware and model-agnostic approaches. We also examine how

A Coding Guide Implementing SHAP Explainability Workflows with Explainer Comparisons, Maskers, Interactions, Drift, and Black-Box Models Read More »

A Coding Implementation to Master GPU Computing with CuPy, Custom CUDA Kernels, Streams, Sparse Matrices, and Profiling

In this tutorial, we delve into CuPy as a powerful GPU-accelerated alternative to NumPy for high-performance numerical computing in Python. We start by inspecting the available CUDA device, checking the CuPy version, runtime details, GPU memory, and compute capability so that we understand the hardware environment before running heavy computations. Then, we compare NumPy and

A Coding Implementation to Master GPU Computing with CuPy, Custom CUDA Kernels, Streams, Sparse Matrices, and Profiling Read More »

A Coding Implementation to Portfolio Optimization with skfolio for Building Testing, Tuning, and Comparing Modern Investment Strategies

In this tutorial, we explore skfolio, a scikit-learn compatible portfolio optimization library that helps us build, compare, and evaluate different investment strategies in a structured Python workflow. We start by loading S&P 500 price data, converting it into returns, and creating a time-based train-test split suitable for financial analysis. From there, we build simple baseline

A Coding Implementation to Portfolio Optimization with skfolio for Building Testing, Tuning, and Comparing Modern Investment Strategies Read More »