Competent crew

After some delays due to cold temperatures, I was finally able to complete my Competent Crew certification with MOSS sailing school, based in Medemblik in the Netherlands. In five days, I learned the basics of sailing, enjoyed great weather conditions and visited some of the many marinas of the IJsselmeer and the Waddenzee: Medemblik, Stavoren, Hindeloopen, Harlingen, Texel and Den Helder. It was a very instructive week, and I am really glad I reached my objectives!

Tribute to Dun Laoghaire

More than a year ago, I started sailing in Dun Laoghaire, at the Irish National Sailing School, on a 1720 keelboat. Besides discovering the harbour and its surroundings from Dublin Bay, which was great fun, I also took the study of seamanship seriously.

Python with Datacamp

My free trial subscription on Datacamp has ended. I learned a little about Python applied to data analysis and obtained a certificate for the first course.

The course consisted of 11 lessons giving an overview of variables and their types, lists, functions, methods and packages, and of how to use NumPy and its array objects for easy data computation and analysis.
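To make the benefit of NumPy arrays concrete, here is a minimal sketch contrasting plain list arithmetic with element-wise array arithmetic and boolean indexing. The height and weight values are made up for illustration and are not taken from the course.

```python
import numpy as np

# Plain Python lists (made-up example values)
heights_m = [1.73, 1.68, 1.71, 1.89]
weights_kg = [65.4, 59.2, 63.6, 88.4]

# With lists, "* 2" repeats the list instead of doubling each value
doubled_list = heights_m * 2      # 8 elements

# Converting to NumPy arrays enables element-wise (vectorised) arithmetic
heights = np.array(heights_m)
weights = np.array(weights_kg)
doubled_array = heights * 2       # 4 elements, each value doubled

# Body mass index computed for all entries at once
bmi = weights / heights ** 2

# A boolean mask selects only the elements matching a condition
light = bmi[bmi < 21]

print(np.round(bmi, 1))
print(light)
print(bmi.mean())
```

The same expression `weights / heights ** 2` would raise a `TypeError` on plain lists, which is the main reason arrays are so convenient for data computation.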

I then started the next course (intermediate level) and followed 3 more lessons on matplotlib (data visualisation, basic plots such as scatter plots and histograms, and chart customisation). However, I had to stop there: my free trial had ended, and the next lessons required a premium account.
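The basic plots mentioned above can be sketched roughly as follows. This assumes randomly generated data and an arbitrary output file name (`plots.png`); the colours, labels and grid are only examples of chart customisation.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Random example data: y loosely depends on x
rng = np.random.default_rng(seed=0)
x = rng.normal(loc=50, scale=10, size=200)
y = 0.5 * x + rng.normal(scale=5, size=200)

fig, (ax_scatter, ax_hist) = plt.subplots(1, 2, figsize=(10, 4))

# Scatter plot: relationship between two variables
ax_scatter.scatter(x, y, s=10, alpha=0.6)
ax_scatter.set_xlabel("x")
ax_scatter.set_ylabel("y")
ax_scatter.set_title("Scatter plot")

# Histogram: distribution of a single variable; hist() returns counts and bin edges
counts, bin_edges, _ = ax_hist.hist(x, bins=20, color="steelblue", edgecolor="white")
ax_hist.set_xlabel("x")
ax_hist.set_ylabel("Count")
ax_hist.set_title("Histogram")
ax_hist.grid(True, alpha=0.3)

fig.tight_layout()
fig.savefig("plots.png", dpi=100)  # in an interactive session, plt.show() instead
plt.close(fig)
```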

Learning on Datacamp was a pleasure and quite entertaining. The lessons are far from boring: a little theory (a video of around 5 min) is followed by exercises and live programming in an integrated, web-based interface with automated checks. Points are awarded along the way, and an official certificate is granted when a full course is completed. However, continuing requires a premium account at around $30/month, which is not cheap. More comprehensive reference books and manuals probably go into more detail, but this web-based training offers the same benefits as classroom or instructor-led training, with the added advantage that you can follow it at your own pace.

Python data structures

Yesterday, I completed another course of a specialisation cycle dedicated to Python for data analytics.

Python Data Structures by University of Michigan on Coursera. Certificate earned on October 1, 2016

While the first course dealt with the very basics (variables, conditional statements, loops and iterations, functions), this course builds further on data structures such as strings, files, lists, dictionaries and tuples. In general, there are multiple ways to perform a task on data, but only a few of them are simple and smart ("pythonic"). Selecting the right data structure is of utmost importance. Assimilating Python idioms takes a little time, but it is a fundamental step to build on, and it allows for very short, efficient code that can perform complex tasks.
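To illustrate the point about several ways of doing the same task, here is a small sketch of a classic exercise of this kind, counting word frequencies, written three ways from least to most idiomatic. The sample text is made up.

```python
from collections import Counter

text = """the quick brown fox jumps over the lazy dog
the dog sleeps and the fox runs"""

# Non-idiomatic: explicit membership test before incrementing
counts_manual = {}
for word in text.split():
    if word in counts_manual:
        counts_manual[word] = counts_manual[word] + 1
    else:
        counts_manual[word] = 1

# More idiomatic: dict.get() with a default value
counts_get = {}
for word in text.split():
    counts_get[word] = counts_get.get(word, 0) + 1

# Most concise: the standard library's Counter (a dict subclass)
counts = Counter(text.split())

# Tuples make sorting by value easy: build (count, word) pairs, largest first
top = sorted(((n, w) for w, n in counts.items()), reverse=True)[:3]
print(top)
```

All three produce the same dictionary; the last two simply let a well-chosen data structure do the bookkeeping. `Counter.most_common()` offers a shortcut for the final sorting step.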

Getting started with Python

This week, I have completed the first course of a specialisation cycle dedicated to Python for data analytics.

Programming for Everybody (Getting Started with Python) by University of Michigan on Coursera. Certificate earned on August 25, 2016

Throughout my career as a process engineer, I have constantly faced issues that had to be investigated and understood, most often using whatever data was available. Unfortunately, in my experience only a tiny fraction of that data is actually uploaded into organised, well-structured databases ready to be queried, and raw data is generally not user-friendly at all, often almost unreadable. Processes (especially those linked to metrology operations) generate a great deal of raw data, stored in many different ways and formats. Hence, raw data treatment and preparation is a necessary step in data analysis and inference, but a tedious and low-value-added one, unless one comes up with the right tools.

The first tool that comes to mind is the traditional Excel spreadsheet, where data can be imported, filtered and analysed. It is very popular, widespread and versatile. Excel comes with a scripting language, VBA (Visual Basic for Applications), in which macros can be written to automate tasks. It is a very decent tool that belongs in any data analytics survival kit when nothing else is available.

However, for manipulating very large datasets, a much better choice is JMP, a licensed statistical discovery software from SAS with an intuitive, Excel-like interface. JMP provides unique features to transform and combine multiple datasets with hundreds of thousands of rows in the blink of an eye, namely summarising, concatenating, splitting, stacking and subsetting… A range of advanced modules supports complex analyses, my favourite being the profiler, which immediately displays multi-variable, multi-response trends along with the regression parameters of the underlying model; this is invaluable for experimental design. Repetitive tasks can be automated thanks to an integrated scripting language (JSL), with powerful macros able to generate fragments of code.
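For readers who know JMP, the table operations listed above have rough counterparts in Python's pandas library (a third-party package not otherwise discussed in this post). The sketch below uses a small, entirely hypothetical dataset: the column names `sample`, `lot`, `site_1`, `site_2` and `thickness` are made up, and the mapping between JMP operations and pandas calls is an approximation, not an official equivalence.

```python
import pandas as pd

# Hypothetical measurements from two lots, in wide format (one column per site)
lot_a = pd.DataFrame({
    "sample": ["S1", "S2"],
    "lot": "A",
    "site_1": [10.1, 10.3],
    "site_2": [10.4, 10.2],
})
lot_b = pd.DataFrame({
    "sample": ["S3", "S4"],
    "lot": "B",
    "site_1": [9.8, 10.0],
    "site_2": [9.9, 10.1],
})

# Concatenate: append the rows of several tables
data = pd.concat([lot_a, lot_b], ignore_index=True)

# Stack: wide -> long, one row per (sample, site) measurement
long = data.melt(id_vars=["sample", "lot"], var_name="site", value_name="thickness")

# Split: long -> wide again, one column per site
wide = long.pivot(index="sample", columns="site", values="thickness")

# Subset: keep only the rows matching a condition
lot_a_only = long[long["lot"] == "A"]

# Summarise: statistics per group
summary = long.groupby("lot")["thickness"].agg(["mean", "std", "count"])
print(summary)
```

Unlike JSL, such a script runs anywhere Python and pandas are installed, and its inputs are not limited to datasets opened in a single application.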

While JMP is a really great piece of software for data manipulation and on-the-fly analysis, its scripting language lacks portability: it is restricted to the JMP environment (which is licensed), and its inputs are essentially limited to datasets.

A programming language like Python can overcome these shortcomings. It is a powerful high-level language: easy to learn, general-purpose, open-source, free and portable. With Python, virtually the only limits are hardware resources and programming skills. A very active community continuously develops advanced modules and libraries, enabling highly productive programming with endless potential applications. For all these reasons, I consider Python a smart choice and a natural extension of professional software specialised in data analytics, and my plan is to go through the full specialisation cycle.

Course on cryptography

Today, I successfully completed a course on cryptography.

Cryptography I by Stanford University on Coursera. Certificate earned on August 21, 2016

Cryptography is the cornerstone of information security and modern communications, and I have used it on a daily basis throughout my life and career, most often without even being aware of it! Its applications have been expanding at an unprecedented pace with the advent of Internet technologies. While encrypting data is anything but new (ciphers date back to Antiquity), the last few decades have transformed cryptography from an art into a genuine science. Formal definitions and assumptions are now rigorously established, from which ciphers can be constructed whose security is mathematically proven under those assumptions, drawing on algebra and number theory. Cryptography is a field of intense, active research, focused on bullet-proofing existing protocols and creating new ones for new applications.
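The simplest example of a cipher with mathematically proven security is the one-time pad, which Shannon proved to be perfectly secret provided the key is uniformly random, at least as long as the message, and never reused. The sketch below is for illustration only, not for real use, and its function names are hypothetical. It also shows why key reuse is fatal: XORing two ciphertexts produced with the same key cancels the key out.

```python
import secrets

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings of equal length."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))

def otp_encrypt(key: bytes, message: bytes) -> bytes:
    """One-time pad encryption: ciphertext = message XOR key."""
    return xor_bytes(key, message)

def otp_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    """XOR is its own inverse: (m ^ k) ^ k == m."""
    return xor_bytes(key, ciphertext)

message = b"ATTACK AT DAWN"
key = secrets.token_bytes(len(message))  # random, as long as the message, used once
ciphertext = otp_encrypt(key, message)
recovered = otp_decrypt(key, ciphertext)

# Reusing the key ("two-time pad"): the key cancels out and
# the XOR of the two plaintexts leaks to anyone holding both ciphertexts
other = b"RETREAT AT TEN"
leak = xor_bytes(otp_encrypt(key, message), otp_encrypt(key, other))
print(leak == xor_bytes(message, other))
```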

The exponential increase in computing performance has made many protocols obsolete. The Data Encryption Standard (DES) became notoriously insecure in 1999, when its 56-bit key proved vulnerable to brute-force attacks, and it had to be replaced by the Advanced Encryption Standard (AES). Other encryption schemes were poorly designed, either because the science of cryptography was not as advanced as it is today, or simply because the designers made mistakes. This was probably the case with Wired Equivalent Privacy (WEP), whose multiple weaknesses are now taught to students as a case study of what not to do. Besides design, implementation is equally important and can turn a provably secure cipher into a totally insecure protocol. Many real-life examples exist, such as the padding oracle attack against CBC-mode encryption in MAC-then-encrypt schemes (for instance in older versions of SSL/TLS).
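One of WEP's well-known weaknesses shows how a design flaw can be quantified. WEP prepends a 24-bit initialisation vector (IV) to the secret key to form the per-packet RC4 key, so with a fixed key the keystream repeats whenever an IV repeats. The sketch below applies the birthday problem, assuming for simplicity that IVs are drawn uniformly at random; the packet counts are arbitrary.

```python
import math

def collision_probability(n_items: int, space_size: int) -> float:
    """Probability that at least two of n uniformly random values drawn
    from a space of the given size are equal (exact birthday formula)."""
    p_all_distinct = 1.0
    for i in range(n_items):
        p_all_distinct *= (space_size - i) / space_size
    return 1.0 - p_all_distinct

IV_SPACE = 2 ** 24  # WEP uses a 24-bit initialisation vector

for packets in (1_000, 5_000, 10_000, 20_000):
    p = collision_probability(packets, IV_SPACE)
    print(f"{packets:>6} packets -> P(some IV reused) = {p:.3f}")

# Rule of thumb: about 50% chance of a collision after ~1.18 * sqrt(N) draws
print(round(1.1774 * math.sqrt(IV_SPACE)))
```

A few thousand packets, a matter of seconds or minutes on a busy network, are enough to expect keystream reuse, which leads straight to the two-time pad problem.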

In practice, the best advice for reliable encryption is to always use public, open-source and up-to-date crypto libraries from reliable, well-established providers. However, it is worth keeping in mind that the security of a cipher erodes over time, as computing performance and attackers' skills both increase, which represents a real challenge for cryptographers. In fact, the right question when selecting an encryption scheme is not whether the cipher will be broken, but when, and whether that amount of time is acceptable for the application. Keeping a secret for a long time is more costly than keeping it for a short time, and is not always needed. Moreover, the answer to the aforementioned question is only an estimate. The rise of a disruptive technology like quantum computing could wreak havoc on existing secret documents much sooner than expected…
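The "not whether, but when" question can be turned into a back-of-the-envelope estimate. The sketch below is a deliberately crude model that considers pure brute force only. The attacker speed, the doubling period and the required secrecy lifetime are all hypothetical assumptions, not measured values, and the function names are made up.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600

def exhaustive_search_years(key_bits: int, keys_per_second: float) -> float:
    """Worst-case time, in years, to try every possible key at a constant rate."""
    return 2 ** key_bits / keys_per_second / SECONDS_PER_YEAR

def years_until_below(key_bits: int, keys_per_second: float,
                      doubling_period_years: float, target_years: float) -> float:
    """Years from now until a full key search takes less than `target_years`,
    assuming (hypothetically) that attacker speed doubles every
    `doubling_period_years` and that no cryptanalytic shortcut exists."""
    current = exhaustive_search_years(key_bits, keys_per_second)
    if current <= target_years:
        return 0.0
    # Solve current / 2**(t / d) = target for t
    return doubling_period_years * math.log2(current / target_years)

RATE = 1e12             # hypothetical attacker speed today, in keys per second
DOUBLING_PERIOD = 2.0   # hypothetical: attacker speed doubles every 2 years
SECRET_LIFETIME = 30.0  # how long the data must stay confidential, in years

for bits in (56, 128):
    now = exhaustive_search_years(bits, RATE)
    later = years_until_below(bits, RATE, DOUBLING_PERIOD, SECRET_LIFETIME)
    print(f"{bits}-bit key: full search takes {now:.3g} years today; "
          f"under {SECRET_LIFETIME:g} years in about {later:.0f} years")
```

The numbers matter less than the shape of the answer: the estimate is only as good as its assumptions, and a disruptive breakthrough would invalidate the doubling assumption altogether.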