Liang Xu

This is Liang's blog, an archive of life and work.

A user-friendly approximate Bayesian computation package in Python with an application to the coronavirus outbreak in the Netherlands

Approximate Bayesian computation (ABC) is a likelihood-free method that uses computational power to generate a huge number of simulations with randomly chosen parameters, aiming to hit the target: the observations. The generating parameters that produce the results most similar to the observations are taken as the best inference of the truth. This is analogous to the idea behind neural networks in machine learning. The difference is that ABC requires process-based modeling, which is based on knowledge of the underlying mechanism behind the data. In contrast, neural networks skip such process-based modeling and only need a network structure that is trained to fit the data. Thus, the advantage of neural networks is that they provide a general inference approach without any need to know the mechanisms. The disadvantage lies in the same place: the interpretation of the processes behind the data is lacking. This is why neural networks are widely used in pattern recognition but not in the interpretation of biological processes. Both methods have a limited number of structure types, although these can be combined in infinitely many ways. Therefore, coding a general framework to fit any type of model is feasible. While Google's TensorFlow and Facebook's PyTorch stand out in machine learning as examples of such general frameworks, not many groups are focusing on ABC development. Only a recently released Python package by Dutta et al. sets a good example. Here I would like to share a simple version with an application to the recent coronavirus outbreak in the Netherlands, just for fun and to keep my coding skills warm during the quarantine period.
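To make the idea concrete, here is a minimal sketch of the plain rejection flavor of ABC. It is not the package from the post: the exponential-growth simulator, the uniform prior, the log-scale distance and all function names are assumptions chosen only for illustration.

```python
import numpy as np

def simulate(rate, n_days, rng):
    """Hypothetical process-based model: daily new cases are Poisson draws
    whose mean starts at 5 and grows exponentially at `rate` per day."""
    days = np.arange(n_days)
    return rng.poisson(5.0 * np.exp(rate * days))

def distance(simulated, observed):
    """Euclidean distance between log-transformed case counts."""
    return np.sqrt(np.sum((np.log1p(simulated) - np.log1p(observed)) ** 2))

def abc_rejection(observed, n_simulations, keep_fraction, rng):
    """Draw parameters from the prior, simulate, and keep the closest draws."""
    n_days = len(observed)
    # Assumed prior: growth rate uniform between 0 and 0.5 per day.
    rates = rng.uniform(0.0, 0.5, size=n_simulations)
    distances = np.array(
        [distance(simulate(r, n_days, rng), observed) for r in rates]
    )
    # Keep the fraction of draws whose simulations are closest to the data.
    n_keep = max(1, int(keep_fraction * n_simulations))
    return rates[np.argsort(distances)[:n_keep]]

rng = np.random.default_rng(42)
true_rate = 0.2  # used only to create synthetic "observations"
observed = simulate(true_rate, n_days=20, rng=rng)
posterior = abc_rejection(observed, n_simulations=5000, keep_fraction=0.02, rng=rng)
print(f"posterior mean growth rate: {posterior.mean():.3f}")
```

The accepted draws approximate the posterior of the growth rate. Keeping a fixed fraction of the closest simulations, rather than setting an absolute distance threshold, is just one convenient choice for a sketch like this.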

GUI your model - a way to sell theoretical models to empiricists

Thesis submitted! Finally, I have a bit of time to update this blog before getting feedback from the reviewers of two submitted papers. So, I will dig several pits as I did before and see if I can fill them in the future :-) The first pit stems from my thinking about how we, biological theoreticians, can sell our work to empiricists. Our work and interest lie in establishing complex mathematical models that mimic biological processes, in order to reveal the underlying mechanisms that form the observed phenomena. We can code the processes, do statistical analysis to select the most prominent mechanism, and infer the likely generating parameters. However, empiricists may have problems even setting up an environment for the programming language in which we deploy our models, as their focus is on field or lab experiments, of which I have not a clue (exaggerated). Therefore, to bridge the worlds of empiricists and theoreticians, a concrete tool is imperative. A GUI (Graphical User Interface) for an abstract model is the right way. (This is the way! – Mandalorian). In this blog, I introduce one GUI example of my model and show how it works within a few clicks.

Approximate Bayesian Computation: standard version and its variant ABC-SMC

It is unimaginable how busy the last few months of a PhD project can be. You need to focus on finishing the current work (analyzing data, writing papers, structuring the final thesis, etc.) whilst thinking about the future: going to industry or staying in academia. As for me, I enjoy the process of researching, learning new techniques, and tackling challenges. This is my essential need. The outer environment is there to support and decorate that need, making it more attractive. It is like clothes to a human: no matter what kind they are, their essential function is to cover the body and keep it warm and comfortable. But undoubtedly, when they are available, people chase the fanciest clothes that meet their additional requirements beyond the basic need. Having thought this through, once I know clearly what I want, the kind of job I want becomes clear: one that meets my basic need, with as much luxury as fits me.

The words above are more like a conclusion that I came to over the last few months, and an excuse/explanation for not posting. Here, I would like to record some evolutionary algorithms that I used in my projects, together with animations of them. I still have no time to go into the details and cannot post at a normal frequency for a while (at least not before my graduation). But I promise I will flesh them out in the future.

An animation to show how traits of species evolve with their abundance

Do you like comics? I am a comics fan, mostly favoring Japanese comics and, with the rise of the Chinese comics industry in recent years, some Chinese comics too. Why do I mention this? I mean to say that drawings are usually the most direct way to express abstract ideas. This is why I want to show an animation of our model to explain such a complex mechanism instead of using tons of pages filled with equations.

Bash & R: Dealing with big data; read it! don't source it!

Recently, I have been focusing on my 3rd PhD project, which involves mathematical modeling. It is about an extension to the neutral theory. I am not gonna reveal it now but will come back to it in the future. So I was away from the blog for a while and had to put the machine learning series aside, although the derivation of the recurrent neural network is done. It is coming soon as well. Finally, the mathematical model is done. While waiting for the simulation results, I have some time to talk about something I've been doing besides the modeling. In this post, I am gonna record a silly issue about loading data in R after extracting a subset of a big data set with Bash.

Machine learning on biology S1: model classification via ML

In my 2nd PhD project, I constructed a complex trait-population model to describe how species evolve and assemble in a community under natural selection. However, people may naturally ask whether such complexity is really needed. What is the discrepancy between the model with and without population dynamics and variable trait variance? To answer this question, model selection/classification should be performed on data generated under those candidate models. As in our parameter inference, an Approximate Bayesian Computation-Sequential Monte Carlo (ABC-SMC) method can handle this task. However, the huge computational demand is the bottleneck. To avoid wasting time fitting each empirical data set with ABC-SMC, a method that uses only the features of the data sets would be a better choice. This reminds me of an existing and fast-developing tool, i.e. machine learning. So, from this post on, I would like to share my experience of learning machine learning: how to construct a neural network, how to derive the backpropagation algorithm for learning the parameters of a neural network, and how to use TensorFlow to apply machine learning, or even deep learning (a multi-layer neural network algorithm), to model selection/classification on the topic mentioned above.
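As a rough illustration of classifying data sets by their features, here is a minimal sketch. It swaps in a plain logistic regression (effectively a one-layer network, written in NumPy) for the neural network and TensorFlow setup the series is about. The two simulators are hypothetical stand-ins for candidate models with fixed and variable trait variance, not the trait-population model itself, and all names are made up for the example.

```python
import numpy as np

def simulate_dataset(model, rng, n=50):
    """Hypothetical candidate models: model 0 has a fixed trait variance,
    model 1 draws a larger, variable standard deviation per data set."""
    if model == 0:
        return rng.normal(0.0, 1.0, size=n)
    return rng.normal(0.0, rng.uniform(1.5, 3.0), size=n)

def features(data):
    """Summary features of one data set (an assumed, illustrative choice)."""
    return np.array([data.mean(), data.std(), np.abs(data).max()])

def make_set(n_per_model, rng):
    """Simulate labeled data sets under both models and extract features."""
    X = [features(simulate_dataset(m, rng)) for m in (0, 1) for _ in range(n_per_model)]
    y = np.repeat([0.0, 1.0], n_per_model)
    return np.array(X), y

def train_logistic(X, y, lr=0.1, n_steps=2000):
    """Fit a logistic regression by gradient descent on standardized features."""
    mu, sigma = X.mean(axis=0), X.std(axis=0)
    Z = (X - mu) / sigma
    w, b = np.zeros(Z.shape[1]), 0.0
    for _ in range(n_steps):
        p = 1.0 / (1.0 + np.exp(-(Z @ w + b)))  # predicted P(model 1)
        grad = p - y                            # gradient of the log loss
        w -= lr * Z.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b, mu, sigma

def predict(X, w, b, mu, sigma):
    """Return the predicted model label (0 or 1) for each feature row."""
    return (((X - mu) / sigma) @ w + b > 0).astype(float)

rng = np.random.default_rng(0)
X_train, y_train = make_set(500, rng)
X_test, y_test = make_set(200, rng)
params = train_logistic(X_train, y_train)
accuracy = np.mean(predict(X_test, *params) == y_test)
print(f"test accuracy: {accuracy:.2f}")
```

Once trained on simulated data, such a classifier labels a new data set almost instantly, which is the appeal compared with running ABC-SMC on every empirical data set.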