At Dalia, our amazing data science team came together to form a new Meetup group that places data analytics and machine learning in context. Through short, bite-sized workshops that demonstrate how to address real challenges, the meetup welcomes everyone from absolute beginners to seasoned data scientists. Our first meetup was on June 18th, and we had a great time going over some introductory topics with the participants.

Topics we covered

As a Senior Data Engineer, I had the pleasure of explaining the basics of data science and providing a guided walkthrough of data analysis with Python. Next, Irati R. Saez de Urabain, Senior Data Scientist, explained what Dalia’s data science team does on a daily basis, shared some general information on our larger projects and showed how we put machine learning algorithms into production. We saved some time for a Q&A, snacks and general networking at the end of the session.

Walking through the data process with Kostas

We started the walkthrough by helping the group to get ready:

1. The group downloaded Anaconda, an open-source Python distribution that includes all the basic libraries needed for data processing and data analysis.

2. Next, we set up a Jupyter Notebook by running the following in our terminals:

conda create -n dalia-meetup python=3.6

source activate dalia-meetup    # on conda 4.4 or later: conda activate dalia-meetup

conda install anaconda

jupyter notebook

Then we started a new Python 3 notebook in the directory where Jupyter was launched.

3. Next, we downloaded two datasets from Kaggle, a great resource for free datasets, data science exercises and competitions. We used the Superhero dataset and the International Football Results from 1872 to 2017.

4. We discussed how to read, clean and transform data using our downloaded datasets.

5. Finally, we explored tools like groupby and shape to get a better feel for the data before moving on to drawing line plots, scatter plots and histograms.

If you want to follow along, you can complete the first three steps and then download our notebook and files from GitHub to see the code used in our data analysis.
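To give a rough idea of what steps 4 and 5 look like in practice, here is a minimal sketch using pandas, which ships with Anaconda. It is not the code from our notebook: the rows are made up for illustration, and the column names (date, home_team, away_team, home_score, away_score) are only assumed to resemble those in the football results dataset.

```python
import io

import pandas as pd

# Made-up rows for illustration only; not taken from the Kaggle dataset.
# Column names are an assumption about what such a file might contain.
RAW_CSV = """date,home_team,away_team,home_score,away_score
2001-03-01,Team A,Team B,2,1
2001-03-08,Team B,Team C,0,0
2002-05-12,Team A,Team C,,3
2002-06-20,Team C,Team A,1,4
"""

# Read: parse the CSV text into a DataFrame and convert the date column.
# With a real file you would pass its path instead of a StringIO object.
matches = pd.read_csv(io.StringIO(RAW_CSV), parse_dates=["date"])

# Clean: drop rows with a missing score, then store scores as integers.
clean = matches.dropna(subset=["home_score", "away_score"]).copy()
clean["home_score"] = clean["home_score"].astype(int)
clean["away_score"] = clean["away_score"].astype(int)

# Transform: derive new columns from existing ones.
clean["total_goals"] = clean["home_score"] + clean["away_score"]
clean["year"] = clean["date"].dt.year

# Explore: shape is a (rows, columns) tuple; groupby aggregates per group.
n_rows, n_cols = clean.shape
goals_per_year = clean.groupby("year")["total_goals"].sum()

print(f"{n_rows} rows, {n_cols} columns")
print(goals_per_year)
```

From here, a call such as goals_per_year.plot(kind="line") would draw a line plot, assuming matplotlib is installed in the environment.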

How we use data science at Dalia Research with Irati

Irati explained that most of the data science team's work at Dalia falls into two main groups. The first is ad hoc analyses that support the needs of other departments. These analyses are usually business related, are done in Python or R, and are delivered as analytical reports.

The second type of work that Dalia’s DS team focuses on is pure data science projects, which involve implementing algorithms that support different parts of our business. Sometimes this means building a prototype, and other times it means implementing algorithms that go to production. The data science team’s current projects include dealing with fraud, MRP (an estimation method to better predict user responses), algorithms that help us determine the trustworthiness of users, improvements to the algorithm that matches users to the appropriate survey, and data accessibility and visualization for business teams.

Irati continued the presentation by introducing the way the team uses machine learning algorithms to improve our survey platform, and how machine learning works in production. Check out the full presentation for all the details!

Our upcoming meetups

In our upcoming meetups we’ll focus on business-related challenges that Dalia’s DS team tackles every day, as well as detailed explanations of our solutions. All levels are welcome to attend, but those with a general to more developed knowledge of data science may find the sessions easier to follow than absolute beginners.

How to join us at our next meetup

You can join our data science meetup by signing up here. Our next meetup will be on July 16th, and we’ll send you a reminder before it starts! We look forward to seeing you there 🙂