MLOps Course

This isn't the right course for me, but I've started, so I'm going to keep going for a bit.

Focused on Data Prep and building the prompt.

First 2 videos are very basic definitions and overview.

Gets started in the 3rd video:

  • Jupyter Notebook setup for Python
  • Covers SQL, BigQuery, and the Stack Overflow public dataset

SQL: use LIMIT to restrict the number of rows returned.
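A quick sketch of the LIMIT idea. The course uses BigQuery, but that needs credentials, so this assumes an in-memory SQLite table as a stand-in. The `posts` table and the `preview_rows` helper are made up for illustration.

```python
import sqlite3


def preview_rows(conn, table, limit=5):
    """Return at most `limit` rows from `table`."""
    # The table name is interpolated directly, so only pass trusted names.
    # The LIMIT value is bound as a query parameter.
    query = f"SELECT * FROM {table} LIMIT ?"
    return conn.execute(query, (limit,)).fetchall()


# Hypothetical table standing in for a large warehouse table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO posts (title) VALUES (?)",
    [(f"question {i}",) for i in range(1000)],
)

# Only 3 of the 1000 rows come back.
sample = preview_rows(conn, "posts", limit=3)
print(len(sample))
```

Handy for peeking at a table before pulling anything big into a notebook.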

Pandas is a must -- need to refresh.

What data in the warehouse is too large to fit in memory on a laptop?

Data lineage: where did the data come from?

File formats?

  • JSONL (JSON Lines). Ideal for small to medium datasets
  • TFRecord. For large datasets
  • Parquet. Good for large and complex datasets
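To make JSONL concrete: one JSON object per line, so a file can be written and read one record at a time. A minimal sketch using only the standard library. The record fields and helper names are my own placeholders, not from the course. TFRecord and Parquet need extra libraries, so they are not shown here.

```python
import json
import os
import tempfile


def write_jsonl(path, records):
    """Write each record as one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


def read_jsonl(path):
    """Yield records one at a time, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)


# Hypothetical records; the field names are assumptions for illustration.
records = [
    {"input_text": "How do I sort a list?", "output_text": "Use sorted()."},
    {"input_text": "How do I reverse a string?", "output_text": "Use s[::-1]."},
]

path = os.path.join(tempfile.mkdtemp(), "sample.jsonl")
write_jsonl(path, records)
loaded = list(read_jsonl(path))
print(len(loaded))
```

Because the reader is a generator, it never holds the whole file in memory at once.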

Versioning of datasets is important: a prefix combined with a date-time stamp works well.
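One way the prefix plus timestamp naming could look. This is a sketch under my own assumptions: the `versioned_name` helper, the stamp format, and the example prefix are not from the course.

```python
from datetime import datetime, timezone


def versioned_name(prefix, ext="jsonl", now=None):
    """Build a file name like <prefix>_<YYYYMMDD_HHMMSS>.<ext>."""
    # Default to the current UTC time so names sort chronologically.
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%d_%H%M%S")
    return f"{prefix}_{stamp}.{ext}"


# Fixed timestamp so the result is reproducible.
fixed = datetime(2024, 1, 15, 9, 30, 0, tzinfo=timezone.utc)
example = versioned_name("tune_data_stack_overflow", now=fixed)
print(example)  # tune_data_stack_overflow_20240115_093000.jsonl
```

Putting the year first in the stamp means a plain alphabetical listing also shows the files oldest to newest.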