I always had a hard time trying to understand the Big Bang Theory. The fact that I write all three words of its name in Camel notation shows that I'm unfamiliar with it to the point of ignorance. I mean, come on: at the beginning there's an atom (or whatever particle the scientists will eventually identify in a race for the source that's far from being even close to its finish line), and that atom is going to split into smaller parts forever and ever, creating planets and stars and suns and universes and galaxies and mountains and seas and dinosaurs and Napoleon and Goethe and my son's two cats, Tang and Plum... Coo-ome o-on!
Until a few days ago when, preoccupied by a mundane task, that of creating simulated data for a simple linear regression algorithm (some still call this a "simple task"), I stumbled over the term "procedural generation", and it lit up the bulb in my mind: Eureka!
The full article is at https://en.wikipedia.org/wiki/Procedural_generation, and I will start with its introductory sentence, which to me nailed it:
In computing, procedural generation is a method of creating data algorithmically as opposed to manually.
That's what I was looking for all along. Ok, I admit, due to my lack of training in mathematics I wasn't able to articulate, not the problem, but the ask. I even used the term "data tunnels", by analogy with the wind tunnels of aeronautics, where new aircraft are tested by submitting them to all kinds of controlled stresses. In my case the challenge was different. I had started to study some machine learning algorithms, and the first axiom common to all of them is that the more training data they have, the more accurate they get. Bingo! Give me one terabyte of business data from a giant company that's ready to throw away its privacy and expose the data reflecting its inner workings, if possible for the last ten years. Once I have that, the rest is simple: I will tell them, with a pretty high degree of confidence, backed up by a strong mathematical apparatus, what the predictions for the next year are. Simple, right?
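The "mundane task" that started all this can be sketched in a few lines. This is a minimal example of my own (the function name, coefficients, and noise level are all invented for illustration): simulate data that follows a known line, add noise, then check that a least-squares fit rediscovers the seeded slope and intercept.

```python
import numpy as np

# Seeded generator, so the simulated "universe" is reproducible.
rng = np.random.default_rng(seed=42)

def make_linear_data(n, slope=2.5, intercept=-1.0, noise_std=0.3):
    """Simulate n points of y = slope*x + intercept plus Gaussian noise."""
    x = rng.uniform(0, 10, size=n)
    y = slope * x + intercept + rng.normal(0, noise_std, size=n)
    return x, y

x, y = make_linear_data(1000)

# A least-squares fit should recover the pattern we seeded into the data.
fitted_slope, fitted_intercept = np.polyfit(x, y, deg=1)
```

With a thousand points and modest noise, the fitted coefficients land very close to the seeded ones, which is exactly the point: we know the ground truth, so we can measure how well the algorithm learns it.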
You wish! What's simple is the idea of creating simulated data: one terabyte, two terabytes, the computer storage is the sky and the sky is the limit. And how to do that? Through procedural generation. Apparently this approach has been used for quite a while in the computer games industry, to create environments, and textures, and even characters... Why not data with pre-defined patterns? Business data that shows a roller-coaster in the company's performance based on crude oil prices, and the number of hurricanes happening in the Gulf of Mexico, and the technological shift from big appliances manufacturing to small communication devices...
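One of those patterns can be seeded procedurally in a few lines. This is a sketch, not a real dataset: every figure below is invented, and the "crude oil" driver is just a smooth simulated curve, but it shows the idea of generating business data whose ups and downs come from patterns we planted on purpose.

```python
import numpy as np

# Seeded generator: the same seed always produces the same fictitious history.
rng = np.random.default_rng(seed=7)

months = np.arange(120)  # ten years of monthly data

# A simulated crude-oil price: a slow swing around $60 plus some jitter.
oil_price = 60 + 15 * np.sin(months / 9.0) + rng.normal(0, 3, size=120)

# A yearly seasonal cycle in the company's performance.
seasonality = 10 * np.sin(2 * np.pi * months / 12)

# Fictitious monthly revenue: hurt by high oil prices, helped by the season,
# with a little noise on top. These are the "seeded patterns".
revenue = 500 - 2.0 * oil_price + seasonality + rng.normal(0, 5, size=120)
```

The resulting `revenue` series carries a roller-coaster that a machine learning algorithm can then be asked to rediscover, and since we wrote the patterns in ourselves, we can grade its answer.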
I can do all that and more with procedural generation. My fictitious company will have fictitious data. This is my game... It has no animation, no characters, and no action, of course, but it has something to give: a training and test area for machine learning algorithms.
This is my soft big bang theory. Give me a function and I will create a universe of business data with seeded patterns. And I'm not even God.