Sample
This task generates a new table containing a random sample of the source table. Sampling is widely used when creating predictive models, because applying techniques such as Neural Networks to a large set of data is computationally very heavy. Working with the full data is also inefficient: with less data, it is possible to run more techniques with more parameterizations and therefore find a better model. Furthermore, a good sample is enough to understand the universe under study.
There are two ways to set the size of the sample:
Choose a percentage of rows from the source table.
Choose a specific number of rows that will be in the generated table.
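Outside the tool, the two options can be illustrated with a minimal Python sketch. The function names `sample_by_percentage` and `sample_by_count`, the list-of-dictionaries table, and the choice of sampling without replacement are assumptions made for illustration; they do not describe how the Sample task is implemented.

```python
import random

def sample_by_count(rows, n, rng=random):
    """Return n rows chosen at random, without replacement (assumed)."""
    if not 0 <= n <= len(rows):
        raise ValueError("n must be between 0 and the number of rows")
    return rng.sample(rows, n)

def sample_by_percentage(rows, percent, rng=random):
    """Return about `percent` percent of the rows, chosen at random."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    # Convert the percentage into a row count, then reuse the count-based sampler.
    n = round(len(rows) * percent / 100)
    return sample_by_count(rows, n, rng)

# Hypothetical source table: 1000 rows with two columns.
source = [{"id": i, "value": i * 10} for i in range(1000)]

ten_percent = sample_by_percentage(source, 10)  # 100 of the 1000 rows
fixed_size = sample_by_count(source, 250)       # exactly 250 rows
```

In both cases every sampled row keeps all of its columns; only the number of rows changes.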
Every time this task is run again, a new random set of data will be generated.
In many cases, the goal is to generate a random table once and work with it for a longer period. If so, generate the table with random data and immediately delete the Sample task, so that the random table cannot be generated again.
All columns from the source table will be present in the random table. Only the number of rows will be smaller.
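The behavior described above can be mimicked in a small Python sketch. Here `run_sample_task` is a hypothetical stand-in for the Sample task, not the tool's own code: each call draws a fresh sample, so the only way to keep working with the same rows is to keep the first result and not sample again.

```python
import random

# Hypothetical source table, reduced to a list of row identifiers.
source = list(range(10_000))

def run_sample_task(rows, n):
    # Each call draws a new random sample, like re-running the task.
    return random.sample(rows, n)

first_run = run_sample_task(source, 100)
second_run = run_sample_task(source, 100)  # almost certainly a different set of rows

# Keep a copy of the first result and work from it from now on,
# which plays the role of deleting the task after the first run.
frozen_sample = list(first_run)
```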