Difference between revisions of "HvABigDataVisualisation"
From PDP/Grid Wiki
= About the data analytics cluster =
[[File:RTM-2D-busy-2007.jpg|200px|thumb|right|Jobs distribution across EGEE and WLCG]]
But where does all that data go? In 2015, HvA student experts from the technical informatics group set up a search engine over the log files produced by the data servers, based on big data analytics techniques: it leverages Elasticsearch and Logstash, with a basic analysis front-end using Kibana and custom (Java-coded) queries. This "ELK" stack is hosted next to the data processing facility and holds the last month's worth of log file data in a distributed search cluster: four Elasticsearch servers, a data ingest and processing server, and a query gateway proxy.
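As a rough illustration of the kind of query such a cluster can answer, the sketch below builds an Elasticsearch search body that counts log events from the last month per group. It is a minimal sketch under stated assumptions, not the queries actually used here: `@timestamp` is the timestamp field Logstash adds by default, while the field name `src_site` and the function `build_last_month_query` are hypothetical. The body would be sent to the search endpoint of whatever index holds the logs; no index name or gateway address is assumed.

```python
import json


def build_last_month_query(group_field="src_site", top_n=10):
    """Build an Elasticsearch search body for the last month of log events.

    group_field is a hypothetical field name; substitute whichever field
    the ingest pipeline actually extracts from the data server logs.
    """
    return {
        # size 0: return only the aggregation buckets, not individual log lines
        "size": 0,
        # restrict to events from the last month, using Elasticsearch date math
        "query": {
            "range": {"@timestamp": {"gte": "now-1M/d", "lte": "now"}}
        },
        # count events per distinct value of group_field, largest buckets first
        "aggs": {
            "by_group": {"terms": {"field": group_field, "size": top_n}}
        },
    }


if __name__ == "__main__":
    # Print the JSON body that would be sent to the cluster's search endpoint
    print(json.dumps(build_last_month_query(), indent=2))
```

Keeping the query as a plain dictionary makes it easy to inspect, and to reuse from a visualization front-end that only needs the per-group counts.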
But there's much more to analysis than just collecting the data. Understanding what is happening in the data network, noticing anomalies in the system, and detecting and understanding fault conditions all need visualization: a picture says more than a thousand words (which means this text is much too long already!), and keeping track of data flows in the LHC computing grid, with over 300 participating sites, is not done by reading long lists. Nikhef also likes to explain what we do, why we do it, and how: making both sub-atomic physics and the experimental techniques used understandable not only to experts, but also to the general public. It is essential to visualize ongoing activity and its meaning in a way that is both appealing and conveys the "gist" of what LHC data traffic is about.