Are there any small data sets publicly available for running benchmarks on software such as LHC@home?

A little background on the question: I come from an enterprise big data background, with work ranging from extending open source tools to optimizing performance (at the scale of PBs of data and TBs of RAM).

That being said, I think there are many other software engineers/architects out there who get just as hyped about physics, so it would be interesting to see how much the software can be optimized or improved to get the most out of all the distributed computing.

TL;DR: is there any sample data set that would let software engineers poke at and extend these tools? LHC software is 2+ years out of date at this point, which in distributed computing is huge (it predates Spark/in-memory processing and LXC/virtualization).