Hoping for the end of content marketing

September 6, 2022

Using the internet to retrieve information has become increasingly unpleasant, with various business and other interests trying to influence that process - often by flooding it with content of questionable quality. With large language model driven text generation recently starting to work reasonably well, and an onslaught of startups trying to capitalize on it, one has to worry that the quality of everything will go further downhill.

My somewhat contrarian hope is that this phase will be short-lived and will take all of content marketing down with it. The reasoning behind that hope is twofold. First, while search engines do have an incentive to somewhat distort the information retrieval process in favor of their paying advertisers, there has to be some baseline quality - otherwise users will stop bothering, and usage, and with it profits, will go down (I believe Amazon, after years of over-optimizing product search for profit over quality, is already feeling this or soon will). Developing automated ways of evaluating the information density of content should also weed out most lengthy, low-quality content marketing pieces. Second, the hope is that an abundance of generated, low-information drivel will gradually train the wider public to dismiss any long-winded, information-devoid piece by its length and slow start alone.

Thinking about content density and usefulness from an information-theoretic perspective suggests an observation that could also be relevant to the development of language models: there is an interesting tension in that, for text to make sense and flow nicely, you want the most obvious continuation as the next word, yet the densest signal or information is carried by the least expected continuation. My assumption is that this has to be split across abstraction levels: at the low level of words or sentences one tends towards obviousness, whereas further out, at the level of paragraphs and beyond, the non-obvious continuation is preferable to actually transmit information. Being able to guide language models more directly at that higher abstraction level seems very helpful.
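To make the low-level half of that tension concrete, here is a minimal sketch (my own toy example, not anything from a real language model): a bigram model over a tiny made-up corpus, where the surprisal -log2 P(word | previous word) quantifies how unexpected a continuation is. The "obvious" continuation has low surprisal and reads smoothly; the rarer continuation has higher surprisal and, in Shannon's sense, carries more information.

```python
import math
from collections import Counter

# Tiny made-up corpus; "the cat" is a more common continuation than "the rug".
corpus = ("the cat sat on the mat . the cat sat on the rug . "
          "the dog sat on the mat .").split()

# Count bigram and preceding-word frequencies.
bigrams = Counter(zip(corpus, corpus[1:]))
prev_counts = Counter(corpus[:-1])
vocab_size = len(set(corpus))

def surprisal(prev, word):
    """-log2 P(word | prev) under the bigram model, with add-one smoothing."""
    p = (bigrams[(prev, word)] + 1) / (prev_counts[prev] + vocab_size)
    return -math.log2(p)

# The frequent continuation is less surprising (fewer bits of information)
# than the rare one.
print(surprisal("the", "cat"))  # lower: the obvious, smooth continuation
print(surprisal("the", "rug"))  # higher: the unexpected, informative one
```

The same measurement applies at higher abstraction levels in principle, but there the "units" would be whole sentences or paragraphs rather than words, which is exactly where current models are harder to steer directly.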