CERN Releases Large Hadron Collider Data To Public: 300 Terabytes Of Experimental Data Now Available For Download

CERN has released experimental data obtained from the Large Hadron Collider (LHC). Over 300 TB (terabytes) of data from the LHC is now open for public access.

The European Organization for Nuclear Research (CERN) has made a staggering amount of raw data from experiments conducted at the LHC available to anyone with high-speed internet access. About 300 TB of data can be downloaded and studied by particle physics enthusiasts. By putting the data on the internet, the researchers hope to extend its longevity and ensure it remains accessible to people who have an interest in the LHC or simply want to know about it. It was in the interest of preserving data from the countless particle collisions the LHC has produced so far that the CMS Collaboration at CERN decided to release over 300 TB of experimental data to the public.

A third of the data, or roughly 100 TB, is from "proton collisions at 7 TeV, making up half the data collected at the LHC by the CMS detector in 2011." Incidentally, this isn't the first time CERN has released datasets. In 2014, the European agency released similar experimental data obtained from the LHC. Back then, about 17 TB of LHC data, covering experiments conducted in 2010, was published, The Verge reported.

Who will access the 300 TB worth of data? Not everyone can, or will want to, go through such a vast amount of raw data. However, CERN scientists hope anyone with an avid interest in the LHC or particle physics, notably students and others from the scientific community, will jump at the opportunity. The data is primarily in raw form, and many may not even understand it. But those who do could gain interesting, cutting-edge insights into the nature of the universe.

CERN has released the datasets in three parts. The first part consists of raw data, which hasn't been processed at all. This is the data that CERN's own scientists have been using, and it routinely turns up fascinating, if very complex, results. The second and third parts consist of data that's "derived," or processed, and is much simpler to work with. The scientists hope high school science enthusiasts will respond well to such datasets.

Essentially, the first set is called the "primary datasets" and is used by CERN researchers, while the lighter set is called the "derived datasets" and is intended for a wider audience. The derived datasets "require a lot less computing power [to process] and can be readily analyzed by university or high-school students," noted a press release from CERN. To help make sense of the data, CERN has also released a virtual machine image based on CernVM. Just like the data, it is free to download and use. It comes preloaded with the software environment needed to analyze the data generated during the collisions of the proton beams inside the Large Hadron Collider.

Anyone who is interested in the Large Hadron Collider data should head over to the CERN Open Data Portal. The data is completely free to access. Interestingly, the data covers roughly half of what the LHC's CMS detector collected in 2011 alone. A press release from CERN explained that the release includes about 2.5 inverse femtobarns of data, or around 250 trillion particle collisions.
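For readers wondering how "inverse femtobarns" turns into a collision count, the arithmetic can be sketched roughly as follows. The number of collisions is the amount of data (the integrated luminosity) multiplied by the probability of a collision (the cross-section). The function names below are made up for illustration, and the cross-section of about 100 millibarns is not a figure from CERN's press release; it is simply the value that the two quoted numbers imply when divided.

```python
# Rough sketch: relate integrated luminosity to a collision count.
# Assumption: the function names are hypothetical, and the ~100 mb
# cross-section is inferred from the article's two figures, not quoted.

# 1 barn = 1e15 femtobarns, so 1 millibarn = 1e12 femtobarns.
FB_PER_MB = 1e12


def expected_collisions(luminosity_inv_fb, cross_section_mb):
    """Collisions = integrated luminosity (fb^-1) x cross-section (fb)."""
    return luminosity_inv_fb * cross_section_mb * FB_PER_MB


def implied_cross_section_mb(collisions, luminosity_inv_fb):
    """Cross-section (in millibarns) implied by a collision count."""
    return collisions / luminosity_inv_fb / FB_PER_MB


if __name__ == "__main__":
    luminosity = 2.5      # inverse femtobarns, from the press release
    collisions = 250e12   # 250 trillion, from the press release

    sigma = implied_cross_section_mb(collisions, luminosity)
    print(f"Implied cross-section: {sigma:.0f} mb")
    print(f"Collisions at that cross-section: "
          f"{expected_collisions(luminosity, sigma):.3g}")
```

In other words, the two figures in the press release are consistent with each other if each proton-proton crossing has a cross-section on the order of 100 millibarns.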

Besides helping budding scientists, CERN researchers also hope their findings can be corroborated or validated by outside groups. A few have gone further, adding that some brilliant minds could end up taking the research in directions they didn't initially anticipate, Gizmodo reported.

Judging by this release, CERN could keep publishing more datasets from the Large Hadron Collider once its researchers have finished working through them.

[Photo by Fabrice Coffrini/Getty Images]