This page is a repository of videos, pictures, scripts, and data to complement my PhD dissertation.
The scripts are provided only as a proof of concept and are intended solely to demonstrate the ideas.
Useful Scripts
To handle some daily activities and frequent analyses, I created a toolbox of tiny (but handy) Python and Bash scripts.
Here you can take a look at them. Feel free to use them as you wish.
Python/Bash script
Repetitive patterns of system logs
Looking at the system logs in 1-hour time windows reveals a strong resemblance between the windows.
Frame-by-frame
Identifiable symptoms before failures
In many cases, there are visible symptoms before the occurrence of node failures.
More examples
Extracting syslog patterns
System logs are automatically clustered according to their similarity, and the relevant syslog patterns are extracted.
Try it
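As an illustration of the idea only (not the script linked above), here is a minimal Python sketch of similarity-based clustering. It assumes a token-level similarity from the standard library's `difflib`, a greedy assignment to the first matching cluster, and a hypothetical threshold of 0.6; the helper names are made up for this example.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Token-level similarity between two messages, in the range [0, 1]."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()


def cluster(messages, threshold=0.6):
    """Greedy clustering: a message joins the first cluster whose
    representative (its first member) is similar enough."""
    clusters = []
    for msg in messages:
        for members in clusters:
            if similarity(msg, members[0]) >= threshold:
                members.append(msg)
                break
        else:
            clusters.append([msg])
    return clusters


def extract_pattern(members):
    """Keep tokens that are identical in every member; mask the rest with '*'.
    Assumes all members have the same token count; otherwise the first
    member is returned unchanged."""
    token_lists = [m.split() for m in members]
    if len({len(t) for t in token_lists}) != 1:
        return members[0]
    return " ".join(col[0] if len(set(col)) == 1 else "*"
                    for col in zip(*token_lists))


logs = [
    "sshd accepted password for alice from 10.0.0.1",
    "sshd accepted password for bob from 10.0.0.2",
    "kernel out of memory killed process 4242",
]
patterns = [extract_pattern(c) for c in cluster(logs)]
```

The threshold controls how aggressively messages are merged; a real syslog corpus would likely need tuning and a more robust alignment for messages of different lengths.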
Generating regular expressions
It is also possible to generate regular expressions that represent syslog patterns with minimum errors.
Version 1: semantic-based
Version 2: fully automated
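To give a rough idea of what such a generator does, here is a small Python sketch. It is neither Version 1 nor Version 2; it simply assumes that the messages of one pattern have the same number of whitespace-separated tokens, keeps constant tokens literally, and generalizes varying tokens to `\d+` or `\S+`. The function names are hypothetical.

```python
import re


def generalize(tokens):
    """Return a regex fragment for one token position across all messages."""
    if len(set(tokens)) == 1:
        return re.escape(tokens[0])   # constant token: match it literally
    if all(t.isdigit() for t in tokens):
        return r"\d+"                 # varying, purely numeric
    return r"\S+"                     # varying, anything without whitespace


def build_regex(messages):
    """Build one regex from messages that share the same token count."""
    columns = zip(*(m.split() for m in messages))
    return r"\s+".join(generalize(col) for col in columns)


samples = [
    "Failed password for root from 10.0.0.1 port 22",
    "Failed password for admin from 10.0.0.7 port 2222",
]
pattern = re.compile(build_regex(samples))
```

The resulting expression can be checked against unseen messages with `pattern.fullmatch(...)`; this sketch makes no attempt to minimize matching errors.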
The golden interval
This interactive chart demonstrates a real example of golden intervals on Taurus.
Demo
Turning Privacy Constraints into Syslog Analysis Advantage
Using an irreversible encoding method (hashing), the users' privacy can be guaranteed, the log size is reduced, and the data quality remains adequate for further analysis.
Read more
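A minimal Python sketch of token-wise hashing follows. The choice of SHA-256 and the 8-character truncation are assumptions made for this example, not necessarily the encoding used in the dissertation, and `anonymize` is a hypothetical helper name.

```python
import hashlib


def anonymize(message, digest_chars=8):
    """Replace every whitespace-separated token with a truncated SHA-256 digest.
    Identical tokens always map to identical codes, so recurring patterns
    remain visible after anonymization."""
    return " ".join(
        hashlib.sha256(token.encode("utf-8")).hexdigest()[:digest_chars]
        for token in message.split()
    )


first = anonymize("session opened for user alice")
second = anonymize("session closed for user alice")
```

Because the mapping is deterministic, the two encoded messages above differ only in their second token. Note that unsalted hashes of short or predictable tokens can be guessed by hashing candidate values, so a keyed or salted variant may be preferable in practice.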
Propagation of failures
The failures in one component may propagate to other parts of the system. The video below shows an example of such propagation (starting at 00:30) in Taurus.
More examples
To play the videos, use the Gource command followed by the name of the file.
Word statistics
These statistics show the length of ALL words used in syslog messages on Taurus during 2017.
Raw data
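For readers who want to compute a similar statistic on their own logs, here is a short Python sketch. It assumes that a "word" is any whitespace-separated token; the function name is hypothetical.

```python
from collections import Counter


def word_length_histogram(lines):
    """Count how many words of each length occur in the given log lines."""
    return Counter(len(word) for line in lines for word in line.split())


example_lines = [
    "kernel: eth0 link up",
    "sshd session opened",
]
histogram = word_length_histogram(example_lines)
```

The result maps each word length to its number of occurrences and can be plotted directly as a bar chart.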
Daily Statistics of Syslog and Daemons in 2017
These statistics show the appearance frequency of each daemon and system log on Taurus during 2017.
Raw data
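A possible way to derive such daily counts is sketched below in Python. It assumes the traditional syslog line layout (`Mon DD HH:MM:SS host daemon[pid]: message`), which may differ from the format of the raw data; lines that do not match are skipped. The names `LINE` and `daily_daemon_counts` are made up for this example.

```python
import re
from collections import Counter

# Assumed layout: "Jan  5 10:00:01 node1 sshd[123]: message"
LINE = re.compile(
    r"^(\w{3}\s+\d{1,2})\s+\d{2}:\d{2}:\d{2}\s+\S+\s+([^\s:\[]+)(?:\[\d+\])?:"
)


def daily_daemon_counts(lines):
    """Count log entries per (day, daemon) pair."""
    counts = Counter()
    for line in lines:
        match = LINE.match(line)
        if match:
            day = " ".join(match.group(1).split())  # normalize "Jan  5" to "Jan 5"
            counts[(day, match.group(2))] += 1
    return counts


sample = [
    "Jan  5 10:00:01 node1 sshd[123]: Accepted password",
    "Jan  5 10:05:00 node2 sshd[456]: Failed password",
    "Jan  5 11:00:00 node1 kernel: out of memory",
    "Jan  6 00:00:01 node1 sshd[789]: Accepted password",
    "this line does not follow the assumed layout",
]
counts = daily_daemon_counts(sample)
```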
Pattern recognition in anonymized system logs
To detect normal patterns in anonymized system logs, the syslog entries collected in one day are treated as one long sentence.
The reappearances of each substring are then analyzed.
Demo
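The substring analysis can be approximated with fixed-length n-grams, as in the following Python sketch. It assumes that the day's log has already been turned into a sequence of symbols (for example, one hashed code per entry) and that only substrings of a chosen length `n` are of interest; this is a simplification, not the method behind the demo.

```python
from collections import Counter


def repeated_ngrams(sequence, n, min_count=2):
    """Return all length-n substrings that occur at least min_count times."""
    grams = Counter(
        tuple(sequence[i:i + n]) for i in range(len(sequence) - n + 1)
    )
    return {gram: count for gram, count in grams.items() if count >= min_count}


# Each letter stands for one (anonymized) syslog entry of the day.
day = ["a", "b", "c", "a", "b", "d", "a", "b", "c"]
pairs = repeated_ngrams(day, 2)
triples = repeated_ngrams(day, 3)
```

Running the function for several values of `n` gives a rough picture of which entry sequences recur during a day.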
SAM files
Among other methods, DNA-based visualization tools were also tested. Here you can find an example of a SAM file generated from Taurus system logs.
SAM file
Calculating the SG parameter
The following Python script demonstrates an easy method to calculate the SG parameter.
Python script
Job failures
There was no meaningful correlation between job failures and node failures.
This graph shows the status of successful/failed jobs on Taurus in 2017.
PDF