I have around 7,000 CSV files, all containing similar coordinate data. I need to classify this data using unsupervised learning to prepare it for training a supervised algorithm. Can I do this by clustering on a file-by-file basis, or do I need to reduce the data to one table by summarizing each file somehow? A sample of the data:
| Position [mm] | Density [kg/m³] |
|---|---|
| -0.924789 | 164.548694 |
| -0.914737 | 164.752771 |
| -0.904685 | 164.969708 |
| -0.894633 | 165.199505 |
| -0.884581 | 165.421192 |
| -0.874529 | 165.63477 |
| -0.864477 | 165.840239 |
| -0.854425 | 166.054321 |
| -0.844373 | 166.277018 |
| -0.834321 | 166.508329 |
| -0.824269 | 166.748254 |
| -0.814217 | 166.996794 |
| -0.804165 | 167.253948 |
| -0.794113 | 167.519716 |
| -0.784061 | 167.794099 |
| -0.774009 | 168.077095 |
| -0.763956 | 168.425204 |
| -0.753904 | 168.838424 |
| -0.743852 | 169.316756 |
| -0.7338 | 169.864012 |
| -0.723748 | 170.480193 |
| -0.713696 | 171.165298 |
| -0.703644 | 171.919328 |
| -0.693592 | 172.742282 |
| -0.68354 | 173.63416 |
| -0.673488 | 174.594963 |
| -0.663436 | 175.62469 |
| -0.653384 | 176.723342 |
| -0.643332 | 177.870035 |
| -0.63328 | 179.057714 |
| -0.623228 | 180.28638 |
| -0.613176 | 181.560452 |
etc.
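To make the "reduce each file to one row" option concrete, here is a minimal sketch of what I mean by summarizing. It assumes each file has the two columns shown above; the feature choices (mean, spread, peak position) and the `data/*.csv` path are just placeholders. The demo runs on a synthetic curve rather than a real file:

```python
import glob
import numpy as np
import pandas as pd

def summarize(df: pd.DataFrame) -> dict:
    """Reduce one file's density curve to a fixed-length feature vector."""
    dens = df.iloc[:, 1]  # density column
    return {
        "mean": dens.mean(),
        "std": dens.std(),
        "min": dens.min(),
        "max": dens.max(),
        # position at which the density peaks
        "argmax_pos": df.iloc[dens.values.argmax(), 0],
    }

# Demo on a synthetic curve instead of a real file:
pos = np.linspace(-0.92, 0.92, 200)
dens = 165 + 20 * np.exp(-(pos / 0.3) ** 2)
demo = pd.DataFrame({"Position [mm]": pos, "Density [kg/m³]": dens})
features = pd.DataFrame([summarize(demo)])
print(features)

# For the real data set, the idea would be to stack one row per file:
# rows = [summarize(pd.read_csv(f)) for f in glob.glob("data/*.csv")]
# X = pd.DataFrame(rows)  # ~7000 x n_features matrix, ready for e.g. KMeans
```

The resulting matrix has one row per file, so a standard clustering algorithm could then assign one cluster label per file. Is that the right approach, or can clustering be applied to the raw per-file curves directly?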