This is all the more so with the advent of vector embeddings. Semantic search engines (e.g., qmd) chunk documents by token count before embedding: even a massive single markdown file gets sliced into overlapping token-window chunks, each with its own embedding vector. When you query, you get back the relevant section of a large doc, not the entire thing; a 50,000-word file and a 500-word file are searched at the same granularity (albeit without any meaningful metadata attached). I think humans should decide the chunk size, or at least understand the chunking logic, instead of letting the embedding pipeline impose arbitrary cut-offs.
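The token-window chunking described above can be sketched roughly like this. This is a simplified illustration, not qmd's actual implementation: the window and overlap sizes are hypothetical, and real systems use a proper tokenizer rather than whitespace splitting.

```python
def chunk_by_tokens(text, window=256, overlap=32):
    """Slice text into overlapping token windows, one chunk per window.

    Whitespace splitting stands in for a real tokenizer here; the
    window/overlap defaults are illustrative, not qmd's settings.
    """
    tokens = text.split()
    if not tokens:
        return []
    chunks = []
    step = window - overlap  # each window starts `overlap` tokens early
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + window]))
        if start + window >= len(tokens):
            break  # last window already covers the tail
    return chunks

# A 600-token document: every chunk is embedded separately, so a query
# retrieves the matching window, not the whole file.
doc = " ".join(f"word{i}" for i in range(600))
chunks = chunk_by_tokens(doc, window=256, overlap=32)
```

Note how the chunk boundaries fall wherever the token arithmetic says, with no regard for headings or paragraphs, which is exactly the arbitrariness the paragraph above objects to.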
