Paper Review: Software Engineering Challenges of Deep Learning



A set of 12 main challenges of ML development in big companies.

Problems: Experimental code paths. Configuration management (seemingly no new problems?)

Some other problems stem from the distributed computations needed and also the inherent non-determinism.
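A minimal sketch of the non-determinism point, using only the Python standard library. The function `noisy_training_run` is a hypothetical toy stand-in for a training run, not anything from the paper: it has a random initialization and random per-step noise, so two runs agree only when the seed is fixed. In real distributed or GPU training, fixing a seed is often not sufficient on its own, which is part of why this is listed as a challenge.

```python
import random

def noisy_training_run(seed=None, steps=100):
    """Toy stand-in for a training run: random init plus random per-step noise."""
    rng = random.Random(seed)  # seed=None uses OS entropy, so unseeded runs differ
    weight = rng.uniform(-1.0, 1.0)  # random initialization
    for _ in range(steps):
        noise = rng.gauss(0.0, 0.1)  # stands in for mini-batch sampling noise
        weight -= 0.1 * (weight - 3.0) + 0.01 * noise  # noisy step toward 3.0
    return weight

run_a = noisy_training_run(seed=42)
run_b = noisy_training_run(seed=42)  # same seed: identical result
run_c = noisy_training_run(seed=7)   # different seed: slightly different result
print(run_a == run_b, run_a == run_c)
```

Both seeded runs end close to 3.0, but only runs with the same seed are bit-for-bit identical, which is what makes comparisons between experiments tricky.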


The author investigated a few seemingly representative real-world ML projects, including interviews, and then proposed 12 challenges associated with the SE of ML projects.

A table that matches project with the corresponding SE challenge categories is presented.


Transfer learning is ML’s way of reuse. Model composition is ML’s way of composition.
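To make the reuse/composition analogy concrete, here is a toy sketch with plain Python functions. All names (`compose`, `pretrained_features`, `new_task_head`) are hypothetical and only illustrate the idea: a "pretrained" stage is reused unchanged, and a small new stage is composed on top of it.

```python
def compose(*stages):
    """Chain models so the output of one stage feeds the next."""
    def pipeline(x):
        for stage in stages:
            x = stage(x)
        return x
    return pipeline

def pretrained_features(pixels):
    """Hypothetical frozen feature extractor, reused as-is (the 'transfer' part)."""
    return [sum(pixels) / len(pixels), max(pixels) - min(pixels)]

def new_task_head(features):
    """Hypothetical small head for the new task, built on the reused features."""
    mean, _contrast = features  # this toy head only looks at the mean brightness
    return "bright" if mean > 0.5 else "dark"

classifier = compose(pretrained_features, new_task_head)
label = classifier([0.9, 0.8, 0.7])
print(label)
```

Unlike ordinary function reuse, the reused stage here carries learned behavior, so a change in its training data silently changes every model composed on top of it.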

It has been argued that one of the most difficult components to keep track of is the data component, and the cost of storing versioned data can be very high.

Or, put another way, the concept of “versioning” in the ML world is very different from that in the traditional software world.
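One common way to version data without storing every copy is to identify a dataset snapshot by a hash of its contents. The sketch below is an assumption-laden illustration, not something the paper proposes: `dataset_version` is a hypothetical helper, and records are assumed to be short strings without embedded newlines.

```python
import hashlib

def dataset_version(records):
    """Return a short content hash identifying this exact dataset snapshot."""
    digest = hashlib.sha256()
    for record in sorted(records):  # sort so row order does not change the version
        digest.update(record.encode("utf-8"))
        digest.update(b"\n")  # separator so record boundaries affect the hash
    return digest.hexdigest()[:12]

v1 = dataset_version(["cat,1", "dog,0"])
v2 = dataset_version(["dog,0", "cat,1"])           # same rows, different order
v3 = dataset_version(["cat,1", "dog,0", "fox,1"])  # one extra row
print(v1 == v2, v1 == v3)
```

The hash tells you whether a model was trained on the same data, but it does not give you the data back, so the storage cost noted above remains if old snapshots must be reproducible.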

In computer vision, the DL revolution is actually a trade-off between accuracy and transparency (or debuggability).

If the problem is complex and the model is inherently irreducible, then an accurate explanation of the model will be as complex as the model itself.

Problems in the real world projects:

  1. A configuration flag caused performance degradation
  2. It is difficult to interpret the decision-making process of the model, which makes teams more risk-averse
  3. A bug where pooling (sub-sampling) was too aggressive, leading to a loss of resolution before the data was actually encoded
  4. It is difficult to estimate the time and resources needed, which makes the project hard to manage (what date can we deliver this product to the clients, etc.)
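The pooling problem in item 3 is easy to see with a little arithmetic. This sketch is not the actual bug from the paper; it assumes a 224-pixel input and non-overlapping pooling with stride 2 (both chosen for illustration), and `resolution_after_pooling` is a hypothetical helper.

```python
def resolution_after_pooling(size, num_pool_layers, stride=2):
    """Spatial size after repeated non-overlapping pooling (floor division each time)."""
    for _ in range(num_pool_layers):
        size //= stride
    return size

# Resolution after 0 to 6 pooling layers, starting from an assumed 224-pixel input.
sizes = [resolution_after_pooling(224, n) for n in range(7)]
print(sizes)  # [224, 112, 56, 28, 14, 7, 3]
```

Each pooling layer halves the resolution, so a couple of extra layers is enough to discard fine detail, and nothing crashes: the model just gets quietly worse.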

Practical Value

What can you learn from this to make your research better?

The big issue in ML is the difficulty of understanding and maintaining systems, rather than constructing them.

Details and Problems

From the presenters’ point of view, what questions might the audience ask?

The (time) complexity of an ML project is hard to predict, since many things are controlled through the data rather than the code. (Time to convergence?)
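A toy illustration of "controlled through the data rather than code": the same gradient-descent code, with the same learning rate, needs very different numbers of steps depending only on the scale of the input data. The function `steps_to_converge` and both datasets are made up for this sketch.

```python
def steps_to_converge(xs, ys, lr=0.01, tol=1e-6, max_steps=100_000):
    """Fit y = w * x by gradient descent on mean squared error; return steps used."""
    w = 0.0
    n = len(xs)
    for step in range(1, max_steps + 1):
        grad = sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad
        if abs(grad) < tol:  # stop once the gradient is effectively zero
            return step
    return max_steps

# Same underlying relation (y = 2x), same code, only the input scale differs.
fast_steps = steps_to_converge([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
slow_steps = steps_to_converge([0.1, 0.2, 0.3], [0.2, 0.4, 0.6])
print(fast_steps, slow_steps)
```

The second dataset takes far more steps even though no line of code changed, which is the kind of effect that makes delivery dates hard to estimate up front.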