Portfolio

Academic and artistic projects built on solid coding in R, Stata, Python and PHP.

 

Data Scraping

From online sources data is brought home to do our bidding.

Graph Eulerisation


A programmatic approach to the Chinese postman problem.

 

Birthday paradox

Monte Carlo testing of a classic problem in combinatorics.

 

From networks to art

Unusual applications of network visualisation tools.

 

Networks that organised competition

Doctoral thesis in Economic History - Umeå University.


Österled / Eastwards

My podcast - a backpacker's literary journey through time and space.

Data scraping

I have taught data scraping at companies, study circles and university workshops. Web scraping is the automated process of pulling structured data from publicly available sources where manual retrieval would be time-consuming and monotonous for humans.

I've also mined data to build comprehensive databases of political speeches, erotic short stories, Swedish schools, Swedish house prices, EU-institution documents, American patents and more. If data is publicly available, its retrieval can be automated.

Scraped Steam traffic speeds to Belgium, 2016-2019

Results from daily scraping of the online gaming platform Steam. The scraper is written in PHP and deployed in the cloud for regular capture of live data.

I also scrape more static sources in Python, using simple XPath queries on requested DOM trees or, more robustly, Beautiful Soup. If data is published, there is always a way to get it, even if it takes time lags, VPN rotation or other black magic. Below are results (anonymised for GDPR reasons) from the runners' event the 20k of Brussels.
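A minimal sketch of the XPath approach, assuming a hypothetical results page whose table has `id="results"` and the columns bib, sex and time. To stay within Python's standard library it swaps in `xml.etree.ElementTree`, which supports only a limited XPath subset and only parses well-formed markup. Real-world HTML often is not well-formed, which is where Beautiful Soup or an HTML-tolerant parser earns its keep. Fetching the page is left out so the example runs offline.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed XHTML standing in for a downloaded results page.
PAGE = """
<html><body>
  <table id="results">
    <tr><th>Bib</th><th>Sex</th><th>Time</th></tr>
    <tr><td>101</td><td>F</td><td>1:42:10</td></tr>
    <tr><td>102</td><td>M</td><td>1:31:05</td></tr>
  </table>
</body></html>
"""


def parse_results(xhtml):
    """Return one dict per runner from the (hypothetical) results table."""
    root = ET.fromstring(xhtml.strip())
    rows = []
    # ElementTree understands a limited XPath subset, including [@attr='value'].
    for tr in root.findall(".//table[@id='results']/tr"):
        cells = [td.text.strip() for td in tr.findall("td")]
        if cells:  # the header row has only <th> cells, so it yields no <td> values
            rows.append(dict(zip(["bib", "sex", "time"], cells)))
    return rows


print(parse_results(PAGE))
```

Against a live site, the same XPath idea carries over to a proper HTML parser. Only the parsing step changes.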

Mean finishing times in the 20k of Brussels, by sex

Running times have been scraped from the event's web page, then cleaned and coded to allow multivariate analyses or, as here, basic descriptive statistics of men's and women's finishing times.
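Scraped finishing times arrive as text, so cleaning means converting them to numbers before any comparison. This is a sketch of that step, assuming times formatted as `H:MM:SS` and a `sex` field per row. The rows are invented for illustration and are not the anonymised 2021 data.

```python
from collections import defaultdict
from statistics import mean


def to_seconds(hms):
    """Convert an 'H:MM:SS' (or 'MM:SS') finishing time to seconds."""
    seconds = 0
    for part in hms.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def format_hms(seconds):
    """Format a number of seconds back as 'H:MM:SS'."""
    h, rest = divmod(round(seconds), 3600)
    m, s = divmod(rest, 60)
    return f"{h}:{m:02d}:{s:02d}"


def mean_time_by_sex(rows):
    """Mean finishing time per value of the 'sex' field."""
    times = defaultdict(list)
    for row in rows:
        times[row["sex"]].append(to_seconds(row["time"]))
    return {sex: format_hms(mean(ts)) for sex, ts in times.items()}


rows = [  # hypothetical, anonymised rows
    {"sex": "F", "time": "1:42:10"},
    {"sex": "F", "time": "1:50:20"},
    {"sex": "M", "time": "1:31:05"},
    {"sex": "M", "time": "1:39:15"},
]
print(mean_time_by_sex(rows))  # {'F': '1:46:15', 'M': '1:35:10'}
```

Working in seconds keeps the numbers ready for multivariate models. Formatting back to `H:MM:SS` is only for display.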

Check out the relevant scripts on my git.

The 20k of Brussels is a yearly runners' event where finishing times are neatly published together with some auxiliary variables that invite multivariate comparisons between you and other runners. :) This data from 2021 has been anonymised.

Check out my git repositories for data-mining tutorials from workshops I have given.

Networks that organised competition

My 2019 doctoral thesis (published here) studies emerging market structures and corporate strategies related to different kinds of interfirm corporate relations and resource sharing. It is a historical case study of 75 years of property insurance and proposes a new history of that industry in Sweden.

The study proposes a network perspective on the organisation of competition and collaboration. It finds that networks lowered firms’ cost threshold for underwriting diversification, causing well-connected firms to expand into new markets more easily. An essential resource to underwriters was information, and information exchange motivated several interfirm rapprochements. The driving forces for the organisational shift towards increased networking were, however, complex, and included both socioeconomic and strategic factors.

Market shares of leading Swedish property underwriters 1875-1950

Bump chart of the leading Swedish property underwriters and their market shares, measured by insurance premiums.

Through networks of mutual resource sharing, the consolidation that occurred in the industry after 1950 was preceded by a long historical process in which firms that would later merge developed measurably clustered network structures as early as the 1910s. In the 1920s the networks already contributed to a high, but partly hidden, market concentration. Networks thereby conditioned the underwriting operations of individual firms as well as the structural evolution of the Swedish insurance market as a whole.
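This text does not say how concentration was measured in the thesis. As a hypothetical illustration of how network ties can hide concentration, here is a sketch using the Herfindahl-Hirschman index (HHI, the sum of squared market shares in percent) on made-up shares. Five equal firms look moderately concentrated until three of them are treated as one networked group.

```python
def hhi(shares):
    """Herfindahl-Hirschman index: sum of squared market shares given in percent."""
    return sum(s ** 2 for s in shares)


def grouped_hhi(shares, groups):
    """HHI after pooling the shares of firms that belong to the same network group.

    shares: {firm: share in percent}; groups: {firm: group label}.
    Firms without a group label are treated as independent.
    """
    pooled = {}
    for firm, share in shares.items():
        key = groups.get(firm, firm)
        pooled[key] = pooled.get(key, 0) + share
    return hhi(pooled.values())


# Hypothetical market: five firms with 20 % each; A, B and C share resources in a network.
shares = {"A": 20, "B": 20, "C": 20, "D": 20, "E": 20}
network = {"A": "net1", "B": "net1", "C": "net1"}
print(hhi(shares.values()))          # 2000
print(grouped_hhi(shares, network))  # 4400
```

The firm-level figure is what a naive count of competitors shows. The grouped figure is the concentration the networks may actually represent.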

The birthday paradox

The birthday paradox refers to the (often intuitively underestimated) s-shaped relation between group size and the probability that at least two persons in a group share the same birthday. As group size increases, the likelihood that members share a birthday increases too. In groups of more than 22 people, it is more likely than not that at least two people share the same birthday. This is a paradox in that most people would intuitively think that a much larger group would be needed for a shared birthday within the group to be likely.

The probability of a shared birthday (date only) within a group can be verified both with old-school probability calculations and with randomized simulations of groups of different sizes. This graph illustrates the rising probability of a shared birthday, starting at 1/365 for a group of two.

Probability of same birthday by group size and method of calculation

The probabilistic and iterative solutions converge well already at 100 iterations per group size. R is good at this kind of operation, and even 10,000 iterations per group size (700,000 simulations in total) run very fast.
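The page's own script is in R. Here the same two methods are sketched in Python for illustration. The exact function multiplies the probabilities that each new person misses every earlier birthday. The simulation draws random birthdays and counts the groups containing a repeat. Both use the simplifying assumptions described in this section: 365 equally likely, independent birthdays.

```python
import random


def exact_shared_birthday(n, days=365):
    """P(at least two of n people share a birthday), uniform independent birthdays."""
    p_all_distinct = 1.0
    for k in range(n):
        # person k+1 must avoid the k birthdays already taken
        p_all_distinct *= (days - k) / days
    return 1 - p_all_distinct


def simulated_shared_birthday(n, iterations=10_000, days=365, seed=None):
    """Monte Carlo estimate: share of simulated groups with at least one repeated birthday."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(iterations):
        birthdays = [rng.randrange(days) for _ in range(n)]
        if len(set(birthdays)) < n:
            hits += 1
    return hits / iterations


for n in (2, 22, 23):
    print(n, round(exact_shared_birthday(n), 4), simulated_shared_birthday(n, seed=n))
```

Calling `simulated_shared_birthday` with 10,000 iterations for each of 70 group sizes gives the 700,000 simulations mentioned above.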

The breaking point in the birthday paradox is unexpectedly low because the number of pairs who could share a birthday (on any of the 365 days of the year) grows much faster than the group itself. In any group of n people, the number of unique pairwise birthday comparisons is n(n-1)/2.
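A worked version of this reasoning for n = 23 follows. The last line uses the common pairwise approximation, which treats the 253 pairs as if they were independent (they are not quite). It still shows why about two dozen people suffice:

```latex
\binom{n}{2} = \frac{n(n-1)}{2}, \qquad \binom{23}{2} = \frac{23 \cdot 22}{2} = 253

P(\text{shared birthday}) = 1 - \prod_{k=0}^{n-1} \frac{365 - k}{365} \approx 0.507 \qquad (n = 23)

P(\text{shared birthday}) \approx 1 - \left(\frac{364}{365}\right)^{253} \approx 0.500
```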

The conditions of the example above are simplified so that a year always contains 365 possible birthdays. No kinship, seasonal variation or other biases are assumed to affect the independence of each person's birthday or the equal likelihood of each of the 365 days being a birthday.

Check out my git repositories for a pedagogical script in R.

Eastwards / Österled

Eastwards is an economic-historical backpacker's podcast journeying through time and space in the Russian-speaking world. The ongoing, low-priority project has released 30 episodes of around half an hour each, in Swedish, using material recorded during about one year of backpacking in the countries that once formed the Soviet Union.

Eastwards

Photo: Josef Lilljegren © 2016

Historical or literary characters accompany each leg of my journey as I philosophise on themes of Russian history, literature and economy over the past 200 years. Given the ongoing war against Ukraine, I am currently contemplating a new angle for the podcast.

All incoming donations now go to the Ukrainian war effort and to my friends in Ukraine who are fighting for their very existence.

The podcast lives on iTunes and Spotify. You can also listen to and explore the project on the Österled web page. Слава Україні! (Glory to Ukraine!)

Chinese postman problem

In network analysis, a graph contains an Eulerian cycle if one can traverse every one of its edges exactly once and return to the starting node. The classic example is the Seven Bridges of Königsberg, solved by the mathematical father figure Leonhard Euler in the 18th century. He answered the question of whether one could walk through the neighbourhoods of Prussian Königsberg crossing each of its bridges exactly once. The answer was no. The related Chinese postman problem asks for the shortest closed route that covers every edge at least once, which amounts to adding the fewest (or cheapest) extra edges needed to make such a cycle possible. So what is the minimum number of bridges one would have had to build to enable an Eulerian cycle?

Old bridge Euler's new bridge

Carl Hierholzer (1873) showed that Eulerian cycles exist in graphs that are 1) connected and 2) contain only vertices of even degree. In a pedagogical script answering a Stack Overflow question, I wrote a function in R that takes any graph and attempts the minimal manipulations needed to reach a structure where an Eulerian path is possible.
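The R function itself is linked rather than shown here, so the following is an independent Python sketch and not that function. It assumes new edges (bridges) may connect any two vertices, in which case pairing up the odd-degree vertices is enough. The classical Chinese postman solution instead duplicates existing edges along shortest paths between odd vertices chosen by a minimum-weight perfect matching. Hierholzer's algorithm then traces the cycle. The vertex labels for Königsberg are my own.

```python
from collections import Counter, defaultdict


def degrees(edges):
    """Degree of every vertex in an undirected multigraph given as an edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def is_connected(edges):
    """True if all vertices appearing in the edge list form one component."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    if not adj:
        return True
    start = next(iter(adj))
    seen, stack = {start}, [start]
    while stack:
        for nb in adj[stack.pop()]:
            if nb not in seen:
                seen.add(nb)
                stack.append(nb)
    return len(seen) == len(adj)


def eulerise(edges):
    """Pair up odd-degree vertices with new edges so that every degree becomes even.

    Assumes a new edge may join any two vertices. There is always an even number of
    odd vertices, and each new edge fixes the parity of at most two, so len(odd) / 2
    new edges are both sufficient and necessary.
    """
    odd = sorted(v for v, d in degrees(edges).items() if d % 2 == 1)
    new_edges = [(odd[i], odd[i + 1]) for i in range(0, len(odd), 2)]
    return edges + new_edges, new_edges


def euler_cycle(edges):
    """Hierholzer's algorithm: a closed walk using every edge exactly once."""
    adj = defaultdict(list)
    for i, (u, v) in enumerate(edges):
        adj[u].append((v, i))
        adj[v].append((u, i))
    used = [False] * len(edges)
    stack, cycle = [edges[0][0]], []
    while stack:
        node = stack[-1]
        while adj[node] and used[adj[node][-1][1]]:
            adj[node].pop()  # skip edges already walked from the other end
        if adj[node]:
            nb, i = adj[node].pop()
            used[i] = True
            stack.append(nb)
        else:
            cycle.append(stack.pop())
    return cycle[::-1]


# Königsberg: A is the central island, B and C the two river banks, D the eastern land mass.
konigsberg = [("A", "B"), ("A", "B"), ("A", "C"), ("A", "C"),
              ("A", "D"), ("B", "D"), ("C", "D")]
print(degrees(konigsberg))  # every land mass has odd degree
graph, added = eulerise(konigsberg)
print(added)  # [('A', 'B'), ('C', 'D')]
print(is_connected(graph), euler_cycle(graph))
```

All four land masses have odd degree, so under this assumption two new bridges are enough for an Eulerian cycle. A single new bridge would leave two odd vertices and only allow an open Eulerian path.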

Visit my Stack Exchange answer or download the scripts written in R.

Network visualisations

Eurovision voting patterns

Since the late 1990s, the Eurovision Song Contest has applied a system of per-country telephone votes. The voting patterns have been argued to contain predictable themes, such as mutually high points awarded between Greece and Cyprus, or between the Scandinavian countries.

When a community-detection algorithm groups countries according to the voting networks between 1999 and 2019, an interesting pattern emerges: the sum of points exchanged between some pairs of participating countries is much larger than between others.

Clusters of countries by voting-patterns 1999-2019

The cluster analysis is based on a cut-off algorithm that identifies clusters in which mutual voting within the community is significantly stronger than voting outside it. The visualisation uses the Fruchterman-Reingold algorithm and draws countries of the same community close to each other. Country size represents the total number of points received, 1999-2019.
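The exact cut-off algorithm is not named here. Many community-detection methods, for example the Louvain method available in igraph, search for the partition that maximises modularity. Modularity compares the edge weight inside communities with what would be expected if the same weights were spread at random. This is a standard-library sketch of weighted modularity on a hypothetical four-country toy network, not the Eurovision data.

```python
from collections import defaultdict


def modularity(weights, community):
    """Weighted modularity Q of a partition of an undirected network.

    weights:   {(u, v): w}, one entry per unordered pair (e.g. total mutual points)
    community: {node: community label}
    Q = sum over communities c of L_c / m - (d_c / 2m)^2, where m is the total edge
    weight, L_c the weight inside c and d_c the summed strength of c's nodes.
    """
    m = sum(weights.values())
    internal = defaultdict(float)
    strength_sum = defaultdict(float)
    for (u, v), w in weights.items():
        strength_sum[community[u]] += w
        strength_sum[community[v]] += w
        if community[u] == community[v]:
            internal[community[u]] += w
    return sum(internal[c] / m - (strength_sum[c] / (2 * m)) ** 2 for c in strength_sum)


# Hypothetical toy network: two pairs exchanging many points, one weak cross-link.
points = {("A", "B"): 10, ("C", "D"): 10, ("A", "C"): 1}
two_blocs = {"A": 0, "B": 0, "C": 1, "D": 1}
one_bloc = {"A": 0, "B": 0, "C": 0, "D": 0}
print(round(modularity(points, two_blocs), 3))  # 0.452
print(modularity(points, one_bloc))  # 0.0
```

A partition into voting blocs scores well above zero only when points really do concentrate inside the blocs. That is the property a modularity-based clustering rewards.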

Telephone voters in culturally close countries are found to appreciate the musical contributions of their in-group members. The country groups found in the voting patterns show that cultural and historical clusters of countries, such as Scandinavia, former Yugoslavia or the former Soviet states, still generally prefer music from each other's countries.

This visualisation uses the igraph package for R and places community members near each other with the Fruchterman-Reingold algorithm, which takes into account edge weights defined by the total mutual points awarded between each pair of countries in the Eurovision Song Contest finals of the past decades.


Alternative geography

This network visualises the land borders of the world's countries. It truthfully depicts relative population sizes and honours neighbour relations, but otherwise disregards geographic space. How does our view of the world exist in our minds?

This alternative world map was made to encourage the rethinking of concepts like countries and continents, and to trick beholders into turning their attention inwards, towards themselves as holders of such concepts and towards the cognitive functions we rely on to order and navigate the world. We can recognise the world in it, but need to rediscover and realign the alternative map with our established perception in order to read the image.

Network-generated world-map layout

A world map visualised by algorithmic placement of country nodes in a network of countries joined by land borders.
Download the edited PDF and the original R script, which uses igraph and relational data to create the world-map network.

Infographics communicate factual information about the world and help us form our understanding of it. Once acquired, though, our understanding of the world is sticky. Humans rely on habits and acquired patterns to function, since constantly re-evaluating our surroundings would be cognitively exhausting. Still, clinging to outdated information is a cognitive bias and a problematic trait in human behaviour. Public health super-lecturer Hans Rosling noted that the world view of his students corresponded roughly to the socioeconomic state of the world at the time when their teachers were children.

I am fascinated by waking up from perceptual slumber: the moment when captured attention or kindled interest precipitates rethinking. Here, the statistician and the artist meet and converge over both objectives and means to, as Picasso said, wash away from the soul the dust of everyday life.


Networks of the mind

For an event at the Mind Foundation in Berlin in 2020, I was invited to collaborate with other computer scientists, artists and neuroscientists on an exhibition with EDGE neuroart.

These are parts of the piece our group generated for the exhibition in Berlin, using raw data collected from neurons in mouse brains, network tools and a tweaked random network generator.


 

The poster is based on structural differences between purely random networks and the real-world structure of the neural networks photographed in developing mouse brains.


Animating real-world network data

The animation draws on the principle of describing a network's structure with basic structural measurements and then generating random networks with a similar structure.

As with the poster above, the network methodology has been put in the service of artistic rather than analytical goals.
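The measure-then-randomise principle can be sketched in a few lines. This version assumes an Erdős-Rényi G(n, m) generator, which keeps the node and edge counts, and average clustering as the structural measurement. The "observed" network here is a hypothetical toy, and the group's tweaked generator is not reproduced. Real networks typically cluster more than their random counterparts, which is the kind of difference the poster plays with.

```python
import itertools
import random


def density(nodes, edges):
    """Share of all possible undirected links that are present."""
    n = len(nodes)
    return 2 * len(edges) / (n * (n - 1))


def average_clustering(nodes, edges):
    """Mean local clustering coefficient; nodes with degree < 2 count as 0."""
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    total = 0.0
    for v in nodes:
        k = len(adj[v])
        if k < 2:
            continue
        closed = sum(1 for a, b in itertools.combinations(adj[v], 2) if b in adj[a])
        total += 2 * closed / (k * (k - 1))
    return total / len(nodes)


def random_like(nodes, edges, seed=None):
    """Erdős-Rényi G(n, m): same nodes, same number of edges, placed uniformly at random."""
    rng = random.Random(seed)
    return rng.sample(list(itertools.combinations(nodes, 2)), len(edges))


# Hypothetical "observed" network: two triangles joined by one edge.
nodes = list(range(6))
observed = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
simulated = [random_like(nodes, observed, seed=s) for s in range(100)]
print(density(nodes, observed), average_clustering(nodes, observed))
print(sum(average_clustering(nodes, g) for g in simulated) / len(simulated))
```

Swapping in a degree-preserving generator would match more of the observed structure. The trade-off is that the random counterpart becomes less "uninhibited".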


Résumé

Curriculum Vitæ

  • (2022) Database manager at Umeå University (CEDAR), for a research project relying on digitised historical printed data sources.
  • (2020-2021) Data Wizard at Neveo, a Belgian SaaS startup.
  • (2019) Investigator at Transport Analysis, the Swedish government agency for transport policy advice.
  • (2019) PhD in Economic history, Umeå University
  • (2017-2019) Research assistant, Umeå Center for Gender Studies
  • (2018-2019) Freelance researcher and lecturer in informatics.
  • (2018 - ) Podcaster for Österled: recording, scripting, audio-editing and web development.

Publications

Contact

Email me at contact at Lilljegren dot com or use the contact form to reach me. I am based in Brussels, Belgium.
