My photo at Universal Studios Hollywood in LA
Note-level automatic music transcription (AMT) by Single-chord Fourier Approximation (GitHub), a model-based AMT pipeline that performs well at transcribing linear instruments.
Jazzify (GitHub), an open-source rule-based computational musicology system that detects chord symbols and keys from a chord sequence, and generates jazz-style comping.
And here are a few of my pieces!
If my music fails to play here, you can click the links and listen on the external pages. You can also find my music on my SoundCloud Personal Page, YouTube Channel, Bilibili Personal Page or Netease Music (网易云音乐) Personal Page
- Jazz Trio: Cold Clean Autumn; Links: SoundCloud | Bilibili
- Piano Work: Etude in F Major - Ripple; Links: SoundCloud | Bilibili
Pre-requisites:
This is my lecture note for the natural language processing (NLP) course at ETH Zurich. Lecturer: Ryan Cotterell. This lecture tells you what the backpropagation algorithm really is.
Well, I have to say that it is surprising that backpropagation (BP) is taught in an NLP course instead of a machine learning course.
To view this article, please download:
Lecture_note1_BP.pdf
If I have time, I will implement BP with Python without PyTorch.
Pre-requisites:
After a happy summer vacation, it is time to start my master's program at ETH Zurich. This is my lecture note for my first course there, Neural Network Theory.
Lecturer: Prof. Helmut Elbrächter
This first note is about the capacity of a single-layer neural network in approximating an arbitrary function whose domain is in ${\rm \pmb{R}}^d$.
To view this note, please download:
NNT_note1_tongyulu.pdf
Pre-requisites:
This is a note on Zafar Rafii’s REpeating Pattern Extraction Technique (REPET) algorithm in its basic form. REPET is a very simple algorithm that can separate the human voice from accompanied music, although it is effective only in the typical case where the background music is highly repetitive.
To view this article, please download:
REPET_Algorithm_study.pdf
Pre-requisites:
This article presents a method for modeling a regression problem whose output is bounded and supposed to be continuous.
To view this article, please download:
Beta_Regression_vs_Logistic_Regression.pdf
There is a .ipynb notebook for a toy experiment, which can be downloaded here: BoundedRegression.ipynb
This is a reflection on PyTorch: be careful with tensor shapes, even the difference between shape [B,1] and shape [B].
Have you encountered such a problem when training a neural network: you are sure that your model is perfectly correct, and your training procedure is also correct, but you cannot get the expected result. Your model outputs seem blurred, and the loss curve shows no obvious descent. What is wrong?
Have you encountered a Python warning like this? UserWarning: Using a target size (torch.Size([12])) that is different to the input size (torch.Size([12, 1])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
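The situation can be reproduced in a few lines (a minimal sketch of my own, not code from the article; the [12, 1] vs [12] shapes mirror the warning above):

```python
import torch
import torch.nn as nn

pred = torch.randn(12, 1)    # model output of shape [B, 1]
target = torch.randn(12)     # labels of shape [B]

# Broadcasting silently expands [12, 1] vs [12] into a [12, 12] difference,
# so the loss averages 144 pairwise errors instead of 12 per-sample errors.
loss_bad = nn.MSELoss()(pred, target)               # triggers the UserWarning
loss_good = nn.MSELoss()(pred.squeeze(1), target)   # shapes agree: [12] vs [12]
```

Squeezing (or unsqueezing) so that prediction and target shapes match exactly makes the warning, and the blurred-output symptom, go away.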
There is a lesson behind this. To view this article, please download:
PyTorch_lesson_1D_dimension_leading_to_bad_loss.pdf
Pre-requisites:
This article tells you the basic ideas behind the Beta distribution and its basic applications. In particular, the Beta distribution can be used to model random variables that represent proportions or probabilities.
In this article, you will see why the Beta distribution is called the “distribution of distributions”, and get an intuition for the concept of a “conjugate distribution”.
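As a tiny illustration of that conjugacy intuition (a sketch of mine with made-up coin-toss counts, using SciPy):

```python
from scipy import stats

# Conjugacy sketch (the counts are made up): a Beta(2, 2) prior on a coin's
# head-probability, after observing 7 heads and 3 tails, is again a Beta —
# just add the heads to the first parameter and the tails to the second.
prior = stats.beta(2, 2)
posterior = stats.beta(2 + 7, 2 + 3)   # Beta(9, 5)
```

The posterior mean 9/14 sits between the prior mean 1/2 and the observed frequency 7/10, which is exactly the "distribution over probabilities" behaviour discussed in the article.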
To view this article, please download:
From_Beta_Distribution_to_Conjugate_Distributions.pdf
If I have spare time in the future, I will complete the English version.
by lucainiaoge
This article assumes that readers are familiar with the concept of a computation graph.
Can we use a computation graph to calculate derivatives of arbitrary order? The answer is yes, we can. But how do we leverage this in PyTorch, and how should we understand the whole thing?
To view this article, please download:
PyTorch: Use create_graph to Compute Second-order Derivative
And the illustrations in that passage can be downloaded from create_graph_true_PyTorch_tutorial.ipynb
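As a minimal sketch of the idea (my own toy example, not the notebook itself): `create_graph=True` keeps the computation graph of the first derivative, so that graph can itself be differentiated once more:

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = x ** 3

# create_graph=True builds a graph for the first derivative itself,
# so autograd can differentiate through it a second time.
(g,) = torch.autograd.grad(y, x, create_graph=True)  # dy/dx   = 3x^2
(g2,) = torch.autograd.grad(g, x)                    # d2y/dx2 = 6x
```

At x = 2 both derivatives happen to equal 12, which makes the example easy to check by hand.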
by lucainiaoge
Suppose we have a series of vectors $V=[v_n],n=1,2,…,N$, where $v_n\in \pmb{R}^K$. If we want to cluster similar vectors together, what should we do?
To view the whole passage, please download:
An Introduction to Deep Clustering
by lucainiaoge
I came up with this idea on October 9th, 2020, while I was considering how to find the overtone peaks in a spectrogram. I happened to be playing LIMBO the previous day, and it inspired my idea of linear upper coverage (LUC), which converts the peak-finding problem into linear optimization subtasks.
How to find the peaks in the following curve?
If we manually mark the peaks, the result will normally look like the following figure:
However, how do we design an algorithm that robustly finds the desired peaks (local maxima)?
To answer this question, we have to ask ourselves: what is a desired peak?
If we simply take every local maximum, we may get the following result:
And according to our current definition of “desired peak”, this ugly result is correct.
Therefore, we may want to design an algorithm that adapts to our varying definitions of a “desired peak”.
Well, in addition to the inspiring peak-finding problem itself, another inspiration for my idea was a wheel rolling on a rough road:
As you can see, the centroids of the wheels form a much smoother curve (although still a little bumpy) than the original one. However, several insignificant local maxima remain troublesome. It is also a little hard to model a wheel rolling on a rough road in a few lines of code. So how can we improve this idea?
I was lucky to be playing LIMBO the previous day, and one scene in the video game LIMBO just struck me:
Notice the red arrows: there are tiles covering the roof/hill (whatever it is; it is dark).
So, why not cover our curve with tiles? I drew the following picture to show my idea of tile-covering:
First, divide the x-axis into intervals; second, cover the curve in each interval with a tile; third, find the centroid of each tile; finally, use the centroid scatter points to find the local maxima.
Notice that we can cover our curve with tiles of different lengths, which corresponds to our varying definitions of a “desired peak”: if we use short tiles, we will be more sensitive to small local maxima, while long tiles will ignore local maxima that are insignificant on a smaller scale of $x$.
Now comes the question: how to implement this idea?
As the title of this section suggests, I utilized linear optimization to solve the problem. Among the four steps in the previous section, you may notice that the phrase “cover the curve in each interval with a tile” is not clearly defined. How can we cover a curve with a tile? Here is where linear optimization comes into play:
Suppose we have an interval $x \in [x_i,x_{i+1}]$ on which a 1-D curve $f(x)$ is defined. The tile is described as $l_i(x)=a_i x+b_i$. Obviously, we must have $f(x) \leq l_i(x), \forall x \in [x_i,x_{i+1}]$. Under this constraint, we want to find the parameters $(a_i,b_i)$ such that the tile settles on the curve.
Newton’s laws suggest that an object in a gravitational field is stable when its gravitational potential energy reaches its minimum. If we assume that the density of our tiles is uniform, the tile’s gravitational potential energy can be computed as if its mass were concentrated at its centroid, which is $(\frac{x_i+x_{i+1}}{2},l_i(\frac{x_i+x_{i+1}}{2}))$. Therefore, we get our objective function: $l_i(\frac{x_i+x_{i+1}}{2})=a_i\frac{x_i+x_{i+1}}{2}+b_i$.
To sum up, we get the linear optimization problems:
$\forall i=1,2,…$, solve the linear programming (LP) problem within interval $[x_i,x_{i+1}]$:
$$\min_{a_i,b_i} \quad a_i\frac{x_i+x_{i+1}}{2}+b_i $$ $$\text{s.t.}\quad f(x) \leq a_i x+b_i, \forall x \in [x_i,x_{i+1}]$$
And I call it the linear-upper-coverage (LUC) algorithm.
Notice that there are infinitely many constraints, which is intractable. One intuitive remedy is to sample the interval $x \in [x_i,x_{i+1}]$ to get $x_{i}^{k} \in [x_i,x_{i+1}], \forall k=1,2,…,K$. Consequently, the linear constraints become: $$f(x_{i}^{k}) \leq a_i x_{i}^{k}+b_i, \forall k=1,2,…,K$$
Now we are able to solve this linear optimization problem with finitely many inequality constraints! Classical linear programming algorithms (e.g. the simplex method) are well-equipped to solve it!
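As an illustration, one such subproblem could be handed to an off-the-shelf LP solver like SciPy's `linprog` (a sketch of mine; the helper name `luc_tile` is not from the article):

```python
import numpy as np
from scipy.optimize import linprog

def luc_tile(x, f):
    """Solve one LUC subproblem: the line l(x) = a*x + b must stay above
    every sample (x_k, f_k) while its centre point sinks as low as possible."""
    mid = 0.5 * (x[0] + x[-1])
    c = np.array([mid, 1.0])                        # objective: a*mid + b
    A_ub = np.column_stack([-x, -np.ones_like(x)])  # -(a*x_k + b) <= -f_k
    res = linprog(c, A_ub=A_ub, b_ub=-np.asarray(f),
                  bounds=[(None, None), (None, None)])  # a, b unbounded
    return res.x                                    # optimal (a, b)
```

For a convex curve such as $f(x)=x^2$, the solver returns the chord through the endpoints, which matches the closed-form solution derived later for the convex case.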
You may have noticed that although I divided the x-axis equally, the tiles do not all have the same length. Is it necessary to make the tile lengths all the same? This is really a question of choosing the segmentation strategy, i.e. the procedure for obtaining the intervals $[x_i,x_{i+1}], i=1,2,…$
Therefore, if we restrict ourselves to linear coverage, the most configurable aspect is the segmentation strategy. Not only can we segment into equal lengths, we can also segment on a log scale, or even sample randomly. Different segmentation strategies lead to different definitions of “desired peak”.
Also, it is not necessary to assume that the intervals do not overlap. Instead, we can sample the curve arbitrarily and construct an interval for each of those sample points, even if the intervals overlap. The more general definition becomes:
Definition 1 (normal LUC): Given $f:D\rightarrow \pmb{R}$, sample data points $x_i\in D,i=1,2,…$. Define intervals $I_i = [x_i^l,x_i^r]$ where $x_i \in I_i$, and define linear segments $l_i(x)=a_i x+b_i, \forall x\in I_i$.
Define the linear programming problems $LUC_i$ within intervals
$$LUC_i:\quad \min_{a_i,b_i} \quad l_i(\frac{1}{2}(x_i^l+x_i^r))$$ $$\text{s.t.}\quad f(x) \leq l_i(x), \forall x \in I_i$$ The LUC solution is the set of all optimum linear segments, i.e. $l_i^{\star}(x)$
From now on, $x_i$ no longer means the left endpoint of interval $I_i$, but the centroid of interval $I_i$. The left endpoint of interval $I_i$ is written as $x_i^l$ instead. (If I occasionally break this convention, please bear with me: those cases should be obvious.)
Moreover, we can cover the curve with essentially whatever shape we like: circles, ovals, etc. I chose lines because they are simpler and more intuitive. What about other elements? They might be worth trying, but I am not going to talk about them at this moment…
If we want to prove that an algorithm can smooth a curve, we have to prove that the algorithm dampens the high-frequency components in a certain way.
The main challenge in proving this for LUC is that a linear-programming calculation occurs in each interval, which is non-trivial to analyze. I find a fully general proof difficult, so I begin with the case where convexity is assumed.
I start from the (lower) convex segments defined in $[x^l,x^r]$:
$$f(\lambda x^l+(1-\lambda) x^r)\leq \lambda f(x^l)+(1-\lambda) f(x^r), \forall 0\leq \lambda \leq 1$$
Intuitively, we can reach the following conclusion:
Lemma 1 (LUC for convex function): if $f:[x^l,x^r]\rightarrow \pmb{R}$ is convex in $[x^l,x^r]$, then the optimum LUC function becomes $l^{\star}(x)=\frac{f(x^l)-f(x^r)}{x^l-x^r} (x-\frac{x^l+x^r}{2}) + \frac{f(x^l)+f(x^r)}{2}$
Proof:
①First prove that the given $l^{\star}$ obeys the constraints of LUC linear programming.
That constraint is: $f(x) \leq l(x), \forall x \in [x^l,x^r]$. If we apply $l^{\star}$ to that constraint, we just need to prove: $f(x) \leq l^{\star}(x), \forall x \in [x^l,x^r]$.
Plug in the convex assumption on $f(x)$ and we get: $f(\lambda x^l+(1-\lambda) x^r)\leq \lambda f(x^l)+(1-\lambda) f(x^r), \forall 0\leq \lambda \leq 1$.
As $f(x^l) \leq l^{\star}(x^l)$ and $f(x^r) \leq l^{\star}(x^r)$, we get $\lambda f(x^l)+(1-\lambda) f(x^r)\leq \lambda l^{\star}(x^l)+(1-\lambda) l^{\star}(x^r) = l^{\star}(\lambda x^l+(1-\lambda) x^r)$.
Therefore, we proved that $f(x) \leq l^{\star}(x), \forall x \in [x^l,x^r]$ is true.
②Then prove that $l^{\star}$ is the optimum.
Recall that the objective function is $l(\frac{x^l+x^r}{2}) = a_i\frac{x^l+x^r}{2}+b_i$.
If we want to decrease the objective value $l(\frac{x^l+x^r}{2})=\frac{l(x^l)+l(x^r)}{2}$, either $l(x^l)$ or $l(x^r)$ should be decreased.
However, according to the constraints, $f(x^l) \leq l(x^l)$ and $f(x^r) \leq l(x^r)$: once we have $f(x^l) = l^{\star}(x^l)$ and $f(x^r) = l^{\star}(x^r)$, we can assert that $l^{\star}$ reaches the optimum, for neither $l^{\star}(x^l)$ nor $l^{\star}(x^r)$ can be decreased. □
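As a quick numerical sanity check of Lemma 1 (my own example with $f(x)=x^2$ on $[0,2]$):

```python
import numpy as np

# Lemma 1 check for f(x) = x^2 on [xl, xr] = [0, 2]: the claimed optimum
# is the chord through the endpoints, written in the lemma's form.
def f(t):
    return t ** 2

xl, xr = 0.0, 2.0
a = (f(xl) - f(xr)) / (xl - xr)   # chord slope (= 2 here)

def l_star(t):
    return a * (t - 0.5 * (xl + xr)) + 0.5 * (f(xl) + f(xr))

xs = np.linspace(xl, xr, 201)
covered = bool(np.all(f(xs) <= l_star(xs) + 1e-12))  # the tile covers the curve
```

The chord touches the curve exactly at both endpoints and stays above it everywhere in between, which is precisely the zero-slackness picture of the proof.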
On the other hand, if we assume that $f(x)$ is bounded in interval $[x^l,x^r]$, it is also intuitive to reach the following conclusion:
Lemma 2 (existence of LUC solutions and tightness): if $f:[x^l,x^r]\rightarrow \pmb{R}$ is bounded, then:
- There are at least two zero-slackness LUC constraints at $x^{-}$ and $x^{+}$ ($x^{-}\leq x^{+}$); the term zero-slackness indicates that $f(x^{-})=l^{\star}(x^{-})$ and $f(x^{+})=l^{\star}(x^{+})$.
- Equality of $x^{-}\leq x^{+}$ is reached only when $x^{-}=x^{+}=x^l$ or $x^{-}=x^{+}=x^r$
Proof:
① First prove that there is at least one optimum solution for the LUC problem.
As $f(x)$ is bounded in interval $[x^l,x^r]$, assume that $m<f(x)<M_1<M_2$. Then there exists $l(x)=a_i x+b_i$ such that $M_2>l(x^l)>M_1$ and $M_2>l(x^r)>M_1$, which gives $l(x)>M_1>f(x)$ for all $x$ in the interval. Therefore, the feasible region is not empty.
On the other hand, every feasible $l(x)=a_i x+b_i$ satisfies $m<f(x^l)\leq l(x^l)$ and $m<f(x^r)\leq l(x^r)$. That is, the objective $\frac{l(x^l)+l(x^r)}{2}>m$, so the objective cannot be arbitrarily small.
In sum, there is at least one optimum solution for the LUC problem.
② Then prove that the optimum solution leads to zero slackness. Recall the complementary slackness theorem:
Given a standard LP problem $$\min\quad cx, \quad \text{s.t.} \quad Ax \geq b$$ and its dual problem $$\max\quad b^T y, \quad \text{s.t.} \quad A^T y \leq c^T$$ The slackness of the $k^{th}$ constraint is defined as $s_k = A_k x - b_k$.
The complementary slackness theorem says that if a feasible solution $x_0$ is optimum (which exists because of ①), and its corresponding dual solution is $y_0$ is also optimum, then $x_0^T (c-A^T y_0)=0$ and $y_0^T (A x_0 - b)=0$.
Therefore, for those $y_0^k \neq 0$, we must have $A_k x_0 - b_k = 0, k \in \lbrace k:y_0^k \neq 0 \rbrace$. As a result, $s_k = A_k x_0 - b_k = 0,k \in \lbrace k:y_0^k \neq 0 \rbrace$.
Translate this into the LUC context: $x_0=(a_{i}^{\star},b_{i}^{\star})$, $A=[x^{[1:K]},ones(1,K)^T]$, $c=[\frac{x^l+x^r}{2},1]$ and $b=[f(x^1),…,f(x^K)]^T$. The “$y_0^k$” in LUC cannot all be zero, for otherwise $c=0$, which is not true. Therefore, there always exists $k$ such that “$A_k x_0 - b_k = 0$”, which is actually $f(x^k)=l^{\star}(x^k)$ (the $k^{th}$ constraint), leading to zero slackness.
③ Then prove that there are at least two different tight constraints.
According to the convexity of the linear feasible region (which is a polyhedron), an optimum solution is attained at a vertex of that region. As we know, a vertex of the region is an intersection of two hyperplanes (in LUC, they are actually two 2-D lines, because there are only two variables), corresponding to two different constraints. Therefore, there are at least two zero-slackness constraints, and they are different.
④ Finally prove the property when $x^{-}=x^{+}$.
We know that each constraint can be linked with an $x$ in $[x^l,x^r]$. Moreover, the statement $x\in [x^l,x^r]$ carries two more underlying constraints, $x\geq x^l$ and $x\leq x^r$. Recall ③: the two tight constraints at $x=x^{-}$ and $x=x^{+}$ are different. If a constraint at $x=x^{-}$ is tight and $x^{-}=x^{+}$, this can only happen at $x=x^l$ or $x=x^r$, because any $x\in (x^l,x^r)$ is linked with only one constraint $f(x)\leq l(x)$, whereas at $x=x^l$ there exists one more constraint, $x\geq x^l$, which can also be tight (and the case $x=x^r$ is similar). In conclusion, $x^{-}=x^{+}$ means two tight constraints share the same point, which can only happen at $x=x^l$ or $x=x^r$; consequently, $x^{-}=x^{+}=x^l$ or $x^{-}=x^{+}=x^r$.□
Now, we have all the ingredients! What is left is to describe our problem in a proper way. Consider a uniformly-sampled sequence of $f(x)$, written as $[f_i] = [f_1,f_2,…] = [f(x_1),f(x_2),…]$.
Without downsampling, we do LUC for each interval centered at the $i^{th}$ point, i.e. the interval $[x_i-\Delta x,x_i+\Delta x]$. It is also fine to write this interval as $[x_{i-\Delta i},x_{i+\Delta i}]$ because of the equal partition.
Therefore, we have the LUC problem formally:
$$\min_{a_i,b_i} \quad a_i x_i+b_i $$ $$\text{s.t.}\quad f(x) \leq a_i x+b_i, \forall x \in [x_{i-\Delta i},x_{i+\Delta i}]$$
According to Lemma 2, there are at least two points $x_i^{-}$ and $x_i^{+}$ such that $f(x_i^{-})=l_i^{\star}(x_i^{-})$ and $f(x_i^{+})=l_i^{\star}(x_i^{+})$. We can subsequently determine $l_i^{\star}$ with those two points: $$a_i^{\star} = \frac{f(x_i^{-})-f(x_i^{+})}{x_i^{-}-x_i^{+}}$$ $$l_i^{\star}(x) = a_i^{\star}(x-\frac{x_i^{-}+x_i^{+}}{2})+\frac{f(x_i^{-})+f(x_i^{+})}{2} \quad (*)$$
Consequently, I write the LUC version of the uniformly-sampled sequence as $[f_i^{\star}] = [f_1^{\star},f_2^{\star},…] = [l_1^{\star}(x_1),l_2^{\star}(x_2),…]$
Assume that $f(x)$ is convex in each interval $[x_{i-\Delta i},x_{i+\Delta i}]$ (it need not be convex over the whole domain). Then calculate the DTFT of both $[f_i]$ and $[f_i^{\star}]$:
$$F(\omega) = {\rm DTFT}[f_i] = \sum_i f_i e^{\sqrt{-1}\omega i} \quad ①$$ $$F^{\star}(\omega) = {\rm DTFT}[f_i^{\star}] = \sum_i f_i^{\star} e^{\sqrt{-1}\omega i} = \sum_i [a_i^{\star}(x_i-\frac{x_i^{-}+x_i^{+}}{2})+\frac{f(x_i^{-})+f(x_i^{+})}{2}] e^{\sqrt{-1}\omega i} \quad ②$$
According to the convexity assumption, we can apply Lemma 1, which says that $l_i^{\star}(x)=\frac{f(x_{i-\Delta i})-f(x_{i+\Delta i})}{x_{i-\Delta i}-x_{i+\Delta i}} (x-\frac{x_{i-\Delta i}+x_{i+\Delta i}}{2}) + \frac{f(x_{i-\Delta i})+f(x_{i+\Delta i})}{2}$
Compare this with $(*)$ and we will get $x_i^{-}=x_{i-\Delta i}$ and $x_i^{+}=x_{i+\Delta i}$.
Plug this conclusion into $②$ and apply the equality $x_i = \frac{x_{i-\Delta i}+x_{i+\Delta i}}{2}$, and we get:
$F^{\star}(\omega) = {\rm DTFT}[f_i^{\star}] = \sum_i \frac{f(x_{i-\Delta i})+f(x_{i+\Delta i})}{2} e^{\sqrt{-1}\omega i}$
$= \sum_i \frac{f_{i-\Delta i}+f_{i+\Delta i}}{2} e^{\sqrt{-1}\omega i}$
$= \frac{1}{2}({\rm DTFT}[f_{i-\Delta i}] + {\rm DTFT}[f_{i+\Delta i}])\quad ③$
We know that ${\rm DTFT}[f_{i-\Delta i}]=\sum_i f_{i-\Delta i}e^{\sqrt{-1}\omega i}=\sum_i f_{i}e^{\sqrt{-1}\omega (i+\Delta i)}=e^{\sqrt{-1}\omega \Delta i}\sum_i f_{i}e^{\sqrt{-1}\omega i}=e^{\sqrt{-1}\omega \Delta i}{\rm DTFT}[f_{i}]$
Therefore, $③$ becomes $$F^{\star}(\omega) = {\rm DTFT}[f_i^{\star}] = \frac{1}{2}[e^{\sqrt{-1}\omega \Delta i}+e^{-\sqrt{-1}\omega \Delta i}]F(\omega) = F(\omega){\rm cos}(\omega \Delta i) \quad ④$$
Finally, we can reach from $④$ that $\vert F^{\star}(\omega) \vert = \vert F(\omega)\vert \vert {\rm cos}(\omega \Delta i)\vert \leq \vert F(\omega)\vert$
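Relation $④$ can also be checked numerically if the DTFT is replaced by a DFT over one period (a sketch of mine with $\Delta i = 1$, using a circular sequence so the index shifts wrap around):

```python
import numpy as np

rng = np.random.default_rng(0)
f = rng.standard_normal(64)
g = 0.5 * (np.roll(f, 1) + np.roll(f, -1))  # centred two-point average, Delta i = 1
F, G = np.fft.fft(f), np.fft.fft(g)
w = 2 * np.pi * np.fft.fftfreq(64)          # angular frequency of each DFT bin
ok = bool(np.allclose(G, F * np.cos(w)))    # G(w) = F(w) * cos(w * Delta i)
```

The shift theorem turns each one-sample shift into a phase factor, and averaging the two opposite shifts produces exactly the $\cos(\omega \Delta i)$ dampening of $④$.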
Summing up the discussions above:
Theorem 1 (smooth effect of convex LUC): given $\Delta i > 0$ and bounded $f(x):D\rightarrow R$, sample a sequence $x_i, i=1,2,…$ from $D$, whose corresponding intervals are $[x_{i-\Delta i},x_{i+\Delta i}], i=1,2,…$; the LUC centroids form a sequence $f_i^{\star}=l_i^{\star}(x_i), i=1,2,…$ whose DTFT is $F^{\star}(\omega)$, while the sampled curve becomes $f_i=f(x_i), i=1,2,…$ whose DTFT is $F(\omega)$. Then we have:
if $f(x)$ is convex in intervals $[x_{i-\Delta i},x_{i+\Delta i}], i=1,2,…$ separately, then $F^{\star}(\omega)=F(\omega){\rm cos}(\omega \Delta i)$
By the Taylor expansion of ${\rm cos}(\omega \Delta i)$: when $\omega$ is near zero, $F^{\star}(\omega)$ is near $F(\omega)$; but once $\omega$ moves a little away from zero, $F^{\star}(\omega)$ is dampened. Of course, the choice of $\Delta i$ is also a factor controlling this effect.
To understand this effect more intuitively, here is an example, in which $\Delta i = 1/2$, and $f(x)$ is convex in every separate interval.
The red line (downsampled results) is more rugged than the green line (LUC centroids), which conforms to my previous analysis.
In the previous section, we discussed LUC for a convex segment. More importantly, we proved Lemma 2, which guarantees the existence of a solution and, via the two zero-slackness points, even gives its exact form: $$a_i^{\star} = \frac{f(x_i^{-})-f(x_i^{+})}{x_i^{-}-x_i^{+}}$$ $$l_i^{\star}(x) = a_i^{\star}(x-\frac{x_i^{-}+x_i^{+}}{2})+\frac{f(x_i^{-})+f(x_i^{+})}{2} \quad (*)$$
But it did not touch the essential case: a tile covering a rugged curve whose convexity is ill-defined. How can we get a clearer picture of the rugged case? My plan is to find candidate points on the rugged curve, from which we select two as $x_i^{-}$ and $x_i^{+}$ to determine the final result.
Before delving into this, we look at several examples…
There are different curves and we will cover each of those with a line.
Case (c) is convex; its $x_i^{-}$ and $x_i^{+}$ are exactly the endpoints, as has already been proven.
Cases (a) and (b) are concave, which are intuitively different from the convex case.
Cases (d), (e) and (f) are rugged.
Cases (g), (h) and (i) are combinations of convex and concave segments. (h) and (i) have local maxima, while (g) is monotonic.
Think for a while about the results…
And then, I give my intuitions:
My finding is that the candidate points (marked in orange) are the following typical points:
My intuition leads to an algorithm to find the candidate points:
First, deal with the problem of finding local maxima:
Algorithm 1 (finding local maxima): $m=locmax(y)$
(1-based indexing is used) Given sample sequence $y = [y_1,y_2,…,y_k,…,y_K]_{1\times K}$
- initialize a buffer $b = [1,0,…,0,…,0,0]_{1\times (K+1)}$
- initialize the result $m = [0,0,…,0,…,0]_{1\times K}$
- for $k=1,2,…,K-1$:
- $\quad$ $b[k+1] = bool(y[k] \leq y[k+1])$
- for $k=1,2,…,K$:
- $\quad$ $m[k] = b[k]·\overline{b[k+1]}$
- return $m$
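A NumPy sketch of Algorithm 1, using 0-based indexing instead of the 1-based pseudocode:

```python
import numpy as np

def locmax(y):
    """Algorithm 1, 0-based: flag indices where the sequence stops ascending.
    b[k] records whether y was (weakly) ascending into index k."""
    K = len(y)
    b = np.zeros(K + 1, dtype=bool)
    b[0] = True                    # treat the left boundary as ascending
    b[1:K] = y[:-1] <= y[1:]       # b[k+1] = (y[k] <= y[k+1])
    return b[:K] & ~b[1:]          # ascended in, did not ascend out
```

For example, `locmax(np.array([1, 3, 2, 5, 4]))` flags indices 1 and 3, the two local maxima.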
Then, deal with the problem of finding $P$ and $Q$:
Algorithm 2 (finding $P$ and $Q$): $k_P,k_Q=findPQ(x,y)$
(1-based indexing is used) Given sample sequence $y_{1:K}$ and its corresponding x-axis $x_{1:K}$
- $p = (y[2:K]-y[1])/(x[2:K]-x[1])$
- $q = (y[1:K-1]-y[K])/(x[1:K-1]-x[K])$
- $k_P = 1+{\rm argmax}_k p[k]$
- $k_Q = {\rm argmin}_k q[k]$
- return $k_P,k_Q$
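A matching NumPy sketch of Algorithm 2 (0-based; the returned indices point into the original arrays):

```python
import numpy as np

def findPQ(x, y):
    """Algorithm 2, 0-based: P maximises the slope of the chord from the
    first sample; Q minimises the slope of the chord to the last sample."""
    p = (y[1:] - y[0]) / (x[1:] - x[0])      # slopes seen from (x[0], y[0])
    q = (y[:-1] - y[-1]) / (x[:-1] - x[-1])  # slopes seen from (x[-1], y[-1])
    k_P = 1 + int(np.argmax(p))              # shift back to an original index
    k_Q = int(np.argmin(q))
    return k_P, k_Q
```

Note the asymmetry: the `p` slopes skip the first point, so their argmax must be shifted by one, while the `q` slopes already start at index 0.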
Next, we reach the algorithm for getting the candidate set:
Algorithm 3 (finding candidate set): $I_c=findCandidate(x,y)$
(1-based indexing is used) Given sample sequence $y_{1:K}$ and its corresponding x-axis $x_{1:K}$
- initialize $I_c = \lbrace 1,K \rbrace$
- $m=locmax(y)$
- $I_m = find(m==1)$
- $k_P,k_Q=findPQ(x,y)$
- $I_c = I_c \cup I_m \cup \lbrace k_P,k_Q \rbrace$
- return $I_c$
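Putting Algorithms 1–3 together, a self-contained NumPy sketch (0-based indexing):

```python
import numpy as np

def find_candidate(x, y):
    """Algorithm 3, 0-based: endpoints, local maxima, and the P/Q points."""
    K = len(y)
    asc = np.concatenate(([True], y[:-1] <= y[1:], [False]))
    I_m = np.flatnonzero(asc[:K] & ~asc[1:])                  # local maxima
    k_P = 1 + int(np.argmax((y[1:] - y[0]) / (x[1:] - x[0])))
    k_Q = int(np.argmin((y[:-1] - y[-1]) / (x[:-1] - x[-1])))
    return sorted({0, K - 1, k_P, k_Q, *map(int, I_m)})
```

The set union removes duplicates, since a local maximum can coincide with an endpoint or with P/Q.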
Now, take a look at examples (d), (e) and (f) in the last section: as those curves are rugged, the candidate points in $\lbrace (x_{i_c},y_{i_c}): i_c \in I_c \rbrace$ still form a rugged curve. Why not feed those points into Algorithm 3 again? Consequently, we get $\hat{I_c}=findCandidate(x[I_c],y[I_c])$, as illustrated in the following figure:
This leads to the ultimate algorithm:
Algorithm 4 (finding the smallest candidate set): $I_c=findCandidateSmallest(x,y,N)$
(1-based indexing is used) Given sample sequence $y_{1:K}$ and its corresponding x-axis $x_{1:K}$;
$N$ denotes the maximum number of iterations;
- initialize $I_c = \lbrace 1,2,…,K \rbrace, \tilde{I}=I_c, \hat{x} = x[I_c], \hat{y} = y[I_c]$
- for $n$ in $range(N)$:
- $\quad$ $\hat{I_c}=findCandidate(\hat{x},\hat{y})$
- $\quad$ if $\vert \hat{I_c} \vert == \vert I_c \vert$:
- $\quad\quad$ break
- $\quad$ else:
- $\quad\quad$ $I_c=\hat{I_c}$
- $\quad\quad$ $\tilde{I} = \tilde{I}[I_c], \hat{x} = \hat{x}[I_c], \hat{y} = \hat{y}[I_c]$
- return $\tilde{I}$
Finally, we reduce the number of constraints to those indexed by $I_c$.
Why does the candidate-set algorithm work? Obviously, my intuitive explanations are far from compelling. To develop the following sections and to account for the effectiveness of the candidate-set algorithm, I need to introduce a hypothesis, which is not proved at this moment:
Hypothesis 1 (equivalence of candidate-set in terms of LUC):
if the candidate set for bounded $f(x):I_i\rightarrow R$ is $I_{ci}$, then the normal LUC problem reduces to: $$LUC_i:\quad \min_{a_i,b_i} \quad l_i(x_i)$$ $$\text{s.t.}\quad f(x) \leq l_i(x), \forall x \in I_{ci}$$
Or in the language of Lemma 2: $x_i^{-}\in I_{ci}$ and $x_i^{+}\in I_{ci}$
In addition, if you are careful enough, you may find that my definition of the candidate set is flawed: it does not lead to a result equivalent to the normal LUC. This is shown by the figure in the previous section illustrating the candidate-set algorithm, which provides a counterexample: the final LUC computed from the blue points actually violates the LUC constraints of the initial black curve!
Luckily, although this effect sometimes occurs, the LUC results do not change seriously. I am going to demonstrate this in the experiment section.
Still, I have a solution which can completely fix this flawed definition. The key is to modify the $3^{rd}$ criterion into a tentative $4^{th}$ criterion:
- For type 1 candidate points $X_k = (x_k,f(x_k))$ (which are local maxima), find curve points $P_k = (x_{k}^{+},f(x_{k}^{+})),Q_k = (x_{k}^{-},f(x_{k}^{-}))$ for each $X_k$, such that
- $x_{k}^{+}>x_k$ and the slope of $X_k P_k$ reaches maximum
- $x_{k}^{-}<x_k$ and the slope of $Q_k X_k$ reaches minimum
Consequently, the candidate-set finder algorithm should be modified according to the $4^{th}$ criterion. Fortunately, this only requires changing Algorithms 2 and 3 a little. I will summarize the whole candidate-set algorithm in the next section.
Definition 2 (full definition of the candidate set): Given bounded $f(x):I\rightarrow R$; the endpoints are $x^l=inf(I),x^r=sup(I)$.
Then, the candidate set for $I$ is written as $I_{c}=I_m \cup I_{e} \cup I_{e}^a \cup I_{m}^a$, whose components are defined by the following 4 criteria:
- $I_m = \lbrace x: x\in I \quad AND \quad \exists \delta >0 \quad s.t.\quad f(x)\geq f(x+\epsilon), \forall \vert \epsilon \vert < \delta \rbrace$, named “Maxima Set”
- $I_{e} = \lbrace x^l,x^r \rbrace$, named “Endpoints”
- $I_{e}^a = \lbrace \arg\max_{x\in I} \frac{f(x)-f(x^l)}{x-x^l},\arg\min_{x\in I} \frac{f(x)-f(x^r)}{x-x^r} \rbrace$, named “Endpoint Accompany Set”
- $I_{m}^a = \lbrace \arg\max_{x\in I,x>x_m} \frac{f(x)-f(x_m)}{x-x_m}, \forall x_m\in I_m \rbrace \cup \lbrace \arg\min_{x\in I,x<x_m} \frac{f(x)-f(x_m)}{x-x_m}, \forall x_m\in I_m \rbrace$, named “Accompany Set”
As described, I need to modify the Algorithm 2 and 3 a little here:
Algorithm 2.2 (finding the accompany set): $I_m^a=findAccompany(x,y,I_m)$
(1-based indexing is used) Given sample sequence $y_{1:K}$, its corresponding x-axis $x_{1:K}$ and its maxima index set $I_m$
- Initialize $I_m^a = \emptyset$
- for $k_m$ in $I_m$:
- $\quad I^{-} = [1:(k_m-1)], I^{+} = [(k_m+1):K]$
- $\quad a^{+} = (y[k_m]-y[I^{+}])/(x[k_m]-x[I^{+}])$
- $\quad a^{-} = (y[k_m]-y[I^{-}])/(x[k_m]-x[I^{-}])$
- $\quad k^{a+} = I^{+}[\arg\max_{k}a^{+}[k]]$
- $\quad k^{a-} = I^{-}[\arg\min_{k}a^{-}[k]]$
- $\quad I_m^a = I_m^a \cup \lbrace k^{a+},k^{a-} \rbrace $
- return $I_m^a$
Algorithm 3.2 (finding the full candidate set): $I_c=findCandidate(x,y)$
(1-based indexing is used) Given sample sequence $y_{1:K}$ and its corresponding x-axis $x_{1:K}$
- initialize $I_c = \lbrace 1,K \rbrace$
- $m=locmax(y)$
- $I_m = find(m==1)$
- $k_P,k_Q=findPQ(x,y)$
- $I_m^a=findAccompany(x,y,I_m)$
- $I_c = I_c \cup I_m \cup \lbrace k_P,k_Q \rbrace \cup I_m^a$
- return $I_c$
Under Hypothesis 1, there seems to be no need to bother calling linear programming in order to get $l_i^{\star}$: as the candidate set is small, we can simply search the finitely many pairs in order to locate $x_i^{-}$ and $x_i^{+}$.
This motivates me to think about accompany pairs. What is an accompany pair? Well, I define it as follows:
Definition 3 (accompany pair and accompany pair set): Given bounded $f(x):I\rightarrow R$; the endpoints are $x^l=inf(I),x^r=sup(I)$; the maxima set is $I_m$.
Then, an accompany pair is such a tuple $p_m = (x_m,x_m^a)$, where:
- $x_m \in I_m \cup \lbrace x^l,x^r\rbrace$;
- $x_m^a = \arg\max_{x\in I,x>x_m} \frac{f(x)-f(x_m)}{x-x_m}$ or $x_m^a = \arg\min_{x\in I,x<x_m} \frac{f(x)-f(x_m)}{x-x_m}$
Correspondingly, if $x_m^a<x_m$, we can also write this pair as $p_m^{-} = (x_m,x^{a-}_m)$, which is called the left-side accompany pair; if $x_m^a>x_m$, we can also write this pair as $p_m^{+} = (x_m,x^{a+}_m)$, which is called the right-side accompany pair.
The set of all accompany pairs is written as $P^a(f,I)$, or just $P^a$
It is easy to see that $x^l$ has no left-side accompany pair and $x^r$ has no right-side accompany pair. If there are $M$ elements in $I_m$, then the number of accompany pairs is no greater than $2M+2$.
Lastly, as each accompany pair determines a linear function, there must exist a best pair in terms of the LUC objective function. Let us define the best accompany pair step by step:
Definition 4 (linear accompany function): Given bounded $f(x):I\rightarrow R$; the accompany pair set is $P^a$.
Then, for each accompany pair $p=(x_m,x_m^a)\in P^a$, define its linear accompany function as $l_{p_m}(x)=\frac{f(x_m^a)-f(x_m)}{x_m^a-x_m}(x-x_m)+f(x_m), x\in I$.
Definition 5 (feasible accompany pairs and their set): Given bounded $f(x):I\rightarrow R$; the maxima set is $I_m$; the accompany pair set is $P^a$.
Then, an accompany pair $p\in P^a$ is feasible iff $l_p(x)\geq f(x), \forall x\in I_m \cup \lbrace x^l,x^r \rbrace$
Correspondingly, all feasible accompany pairs form the feasible accompany pair set, written as $P^{a_f}$
Definition 6 (best accompany pair): Given bounded $f(x):I\rightarrow R$; the endpoints are $x^l=inf(I),x^r=sup(I)$; the maxima set is $I_m$; the feasible accompany pair set is $P^{a_f}$.
Then, the best accompany pair $p^{\star}$ is defined as $p^{\star}=\arg\min_{p\in P^{a_f}}\quad l_p (\frac{1}{2} (x^l+x^r))$
Equipped with the concept of accompany pairs, I introduce another hypothesis, which is a vital hint for my next steps.
Hypothesis 2 (at least one accompany pair is a zero-slackness pair):
Given $f(x):I_i\rightarrow R$ whose feasible accompany pair set is $P_i^{a_f}$; then, there is at least one accompany pair $(x_i^{-},x_i^{+})\in P_i^{a_f}, s.t.\quad l_i^{\star}(x_i^{-})=f(x_i^{-}),l_i^{\star}(x_i^{+})=f(x_i^{+})$, where $l_i^{\star}$ is the optimum LUC in interval $I_i$ for $f(x)$.
The definitions regarding accompany pairs and Hypothesis 2 are actually paving the way for the following algorithm, which I expect to be equivalent to the LP-based LUC:
Algorithm 5 (LUCA: finding the best accompany pair): $p^{\star}=LUCBestAccompany(x,y)$
(1-based indexing is used) Given sample sequence $y_{1:K}$ and its corresponding x-axis $x_{1:K}$.
- Initialize $P^{a} = \emptyset, y^{\star}=+\infty, x^c = \frac{1}{2} (x[1]+x[K])$
- $m=locmax(y)$
- $I_m = find(m==1)$
- $k_P,k_Q=findPQ(x,y)$
- $P^{a} = P^{a}\cup \lbrace (1,k_P),(K,k_Q) \rbrace$
- for $k_m$ in $I_m$:
- $\quad I^{-} = [1:(k_m-1)], I^{+} = [(k_m+1):K]$
- $\quad a^{+} = (y[k_m]-y[I^{+}])/(x[k_m]-x[I^{+}])$
- $\quad a^{-} = (y[k_m]-y[I^{-}])/(x[k_m]-x[I^{-}])$
- $\quad k^{a+} = I^{+}[\arg\max_{k}a_k^{+}]$
- $\quad k^{a-} = I^{-}[\arg\min_{k}a_k^{-}]$
- $\quad P^{a} = P^{a}\cup \lbrace (k_m,k^{a+}),(k_m,k^{a-}) \rbrace$
- $y_m = y[I_m], x_m = x[I_m]$
- for $(k,k^a)$ in $P^{a}$:
- $\quad y^c = \frac{y[k^a]-y[k]}{x[k^a]-x[k]}(x^c-x[k])+y[k]$
- $\quad \hat{y_m} = \frac{y[k^a]-y[k]}{x[k^a]-x[k]}(x_m-x[k])+y[k]$
- $\quad$if $y^c<y^{\star}$ and $\hat{y_m}\geq y_m$:
- $\quad\quad y^{\star} = y^c$
- $\quad\quad p^{\star} = (k,k^a)$
- return $p^{\star}$
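A self-contained NumPy sketch of Algorithm 5 (0-based indexing; the small feasibility tolerance and the boundary guards on the accompany search are my additions):

```python
import numpy as np

def luc_best_accompany(x, y):
    """Algorithm 5 (LUCA), 0-based: among all accompany pairs, return the
    feasible one whose line is lowest at the interval centre."""
    K = len(y)
    xc = 0.5 * (x[0] + x[-1])
    asc = np.concatenate(([True], y[:-1] <= y[1:], [False]))
    I_m = np.flatnonzero(asc[:K] & ~asc[1:])               # local maxima
    # endpoint pairs: (first sample, P) and (last sample, Q)
    pairs = [(0, 1 + int(np.argmax((y[1:] - y[0]) / (x[1:] - x[0])))),
             (K - 1, int(np.argmin((y[:-1] - y[-1]) / (x[:-1] - x[-1]))))]
    for k in map(int, I_m):
        if k + 1 < K:                                      # right-side accompany
            pairs.append((k, k + 1 + int(np.argmax(
                (y[k] - y[k + 1:]) / (x[k] - x[k + 1:])))))
        if k > 0:                                          # left-side accompany
            pairs.append((k, int(np.argmin((y[k] - y[:k]) / (x[k] - x[:k])))))
    anchors = np.concatenate((I_m, [0, K - 1])).astype(int)
    best, y_star = None, np.inf
    for k, ka in pairs:
        a = (y[ka] - y[k]) / (x[ka] - x[k])                # slope of the pair's line
        yc = a * (xc - x[k]) + y[k]                        # line height at the centre
        feasible = np.all(a * (x[anchors] - x[k]) + y[k] >= y[anchors] - 1e-12)
        if feasible and yc < y_star:
            y_star, best = yc, (k, ka)
    return best
```

On the toy sequence y = [1, 3, 2, 5, 4], the winner is the pair of the two local maxima, whose line t + 2 also solves the full LP over all five constraints.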
See? The result $p^{\star}$ is actually the final decision!
Even so, it remains open whether LUCA is equivalent to the normal LUC, i.e. whether Hypothesis 2 can be verified.
To finish the discussion, I have to recapitulate several unexplained phenomena and unjustified hypotheses:
Lemma 3 (candidate reconstruction): if $f:[x^l,x^r]\rightarrow \pmb{R}$ has LUC solution $l^{\star}(x)$, and the equation $l^{\star}(x)=f(x),x\in I$ has solutions $x_n^s,n=1,2,…,N$, then:
there exists function $f_c(x)$ such that:
- $f(x)<f_c(x)\leq l^{\star}(x), \forall x\in I-\lbrace x_n^s,n=1,2,…,N \rbrace$
- $f_c(x)$ is convex respectively in $[x^l,x_1^s],[x_1^s,x_2^s],…,[x_N^s,x^{r}]$
Proof:
For any interval $[x_i^s,x_{i+1}^s]$:
① if $f(x)$ is convex in $[x_i^s,x_{i+1}^s]$: set $f_c(x)=\frac{1}{2}(l^{\star}(x)+f(x))$ in this interval, which conforms to the conditions;
② if $f(x)$ is not convex in $[x_i^s,x_{i+1}^s]$: set $f_c(x)=l^{\star}(x)$ in this interval, which conforms to the conditions. □
After the reconstruction of Lemma 3, we get $f_c(x)$, a piecewise-convex version of $f(x)$ that is somewhat like the case in Lemma 1, although the intervals are not equally partitioned. This intuition hints that even though the curve $f(x)$ is rugged, the smoothing effect may still be similar to the convex case in Lemma 1 (although not exactly the same). Then, it is easy to see that $f_c(x)$ is smoother than $f(x)$ because of down-sampling. Therefore, as $l^{\star}(x)$ is smoother than $f_c(x)$ and $f_c(x)$ is smoother than $f(x)$, $l^{\star}(x)$ smooths $f(x)$, even better than a mere down-sampling.
Lemma 4 (LUC for concave function): if $f:[x^l,x^r]\rightarrow \pmb{R}$ is concave in $I=[x^l,x^r]$, then the optimum LUC function $l^{\star}(x),x\in I$ becomes one of the tangents of $f(x)$ at $x^l$ or $x^r$
Proof:
① first prove that $l^{\star}(x)$ must be tangent to $f(x)$ in interval $I$.
Assume that $l^{\star}(x)$ is not tangent to $f(x)$ in $I$, which indicates that the equation $l^{\star}(x)=f(x)$ has two different solutions $x^{-}$ and $x^{+}$ where $x^{-}<x^{+}$. Then, for each $x\in (x^{-},x^{+})$, there exists $0<\lambda<1$ such that $x=(1-\lambda) x^{-} + \lambda x^{+}$, and then $f(x)=f((1-\lambda) x^{-} + \lambda x^{+})>(1-\lambda) f(x^{-}) + \lambda f(x^{+})=(1-\lambda) l^{\star}(x^{-}) + \lambda l^{\star}(x^{+})=l^{\star}((1-\lambda) x^{-} + \lambda x^{+})$, which violates the LUC constraints.
② then prove that $l^{\star}(x)$ is tangent to $f(x)$ at $x^l$ or $x^r$.
Let $l^{\star}(x)=f'(x_0)(x-x_0)+f(x_0)$; therefore the LUC min objective function is $obj(x_0)=l^{\star}(\frac{1}{2}(x^l+x^r))=f'(x_0)(\frac{1}{2}(x^l+x^r)-x_0)+f(x_0)$.
Take the first derivative of $obj(x_0)$ to get $obj'(x_0)=f''(x_0)(\frac{1}{2}(x^l+x^r)-x_0)$. When $obj'(x_0)=0$, then $x_0=\frac{1}{2}(x^l+x^r)$. Unfortunately, when we take the second derivative of $obj(x_0)$ and get $obj''(x_0)=f'''(x_0)(\frac{1}{2}(x^l+x^r)-x_0)-f''(x_0)$, we find that $obj''(\frac{1}{2}(x^l+x^r))<0$, which indicates that $obj(x_0)$ reaches a maximum at $\frac{1}{2}(x^l+x^r)$. But we want the minimum.
Luckily, we find that $obj(x_0)$ is monotonically increasing in $I^l=[x^l,\frac{1}{2}(x^l+x^r)]$ and monotonically decreasing in $I^r=[\frac{1}{2}(x^l+x^r),x^r]$. Therefore, $obj(x^l)<obj(x)$ for $x\in I^l$ and $obj(x^r)<obj(x)$ for $x\in I^r$. Therefore, we can simply compare $obj(x^l)$ and $obj(x^r)$ to determine whether the final $l^{\star}(x)$ is tangent at $x^l$ or $x^r$.
③ finally give the condition for whether $x^l$ or $x^r$ is the tangent point.
Solve the inequality $obj(x^r)>obj(x^l)$ and we shall get $\frac{f(x^r)-f(x^l)}{x^r-x^l}>\frac{1}{2}[f'(x^r)+f'(x^l)]$. If this inequality holds, we choose $x^l$ as the tangent point; otherwise, we choose $x^r$. □
In terms of task 3, it is OK to regard LUC as a low-pass filter. But even if the LUC coverage is relatively smooth, it still has infinite frequency components, for it is not perfectly sinusoidal. As a result, Fourier analysis may not be enough to explain the phenomenon that "LUC ignores the rugged surface, no matter how rugged the surface is". I think the potential explanation should be based on candidate sets.
In terms of task 4, my intuition is to segment the domain into element cells. Correspondingly, the function is a 2-D manifold in 3-D space. The 2-D LUC task is to cover the manifold with a linear plate, which is defined within each cell.
My intuition tells me that Lemma 2 still holds for the multi-dimensional version. However, it may not be easy to find the centroid of each element cell, which makes the statements regarding "element cell division" complicated. One example is that there are no longer "endpoints" in the 2-D case, but margins instead. Also, convexity is more complex.
Finally, I am going to show you the effectiveness of LUC with two examples: one applies LUC to a hand-crafted curve, and the other applies LUC to audio CQT spectrograms. In these examples, timing measurements and comparisons are included. Several challenges subsequently emerge. In this section, I will introduce those experiments and the detailed problems.
In this example, the function $f(x):D\rightarrow \pmb{R}$ is designed as: $f(x)=10\sin (0.1x)+\cos (4x) + \mathcal{N}(0,2^2), x\in [0,200]$, where $\mathcal{N}(\mu,\sigma^2)$ represents Gaussian Noise parameterized by mean $\mu$ and std $\sigma$. This curve is shown in the following figure:
(a) is the whole plot, and (b) zooms in $[0,20]$.
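The curve above can be generated with a short NumPy sketch (the random seed is an arbitrary choice, for reproducibility only):

```python
import numpy as np

rng = np.random.default_rng(0)          # arbitrary seed
x = np.arange(0, 200)                   # sample rate 1 on [0, 200]
f = 10 * np.sin(0.1 * x) + np.cos(4 * x) + rng.normal(0, 2, size=x.shape)
```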
For simplicity, I use sample rate $1$ to sample the interval $[0,200]$, forming $\pmb{x} = x[0],x[1],…,x[199]$. Notice that in this section, Python-styled 0-based indexing is used, for easy correspondence with the code.
Before implementing the algorithms, the whole problem still needs more specific definition: what is the hop length and the interval size?
When talking about hop length, I assume that the sampling strategy is equally stepped. In this experiment, data points are equally sampled.
If we denote the hop length as $\Delta_h$ and the interval size as $\Delta_I$, and each index interval as $I_i$, then: $I_i[0]+\Delta_h=I_{i+1}[0]$ and $I_i[-1]-I_i[0]=\Delta_I$. For example, the first index interval is $I_0 =[0,1,…,\Delta_I - 1] = [0:0+\Delta_I]$, the second index interval is $I_1 = [\Delta_h:\Delta_h+\Delta_I]$, the third index interval is $I_2 = [2\Delta_h:2\Delta_h+\Delta_I]$, and so forth.
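This indexing can be sketched in a few lines of Python (here I take $\Delta_I$ as the slice length, matching the $[0:\Delta_I]$ notation above; the helper name is illustrative):

```python
def index_intervals(K, hop, size):
    """All index intervals I_i = [i*hop : i*hop + size] fitting inside 0..K-1."""
    return [range(i * hop, i * hop + size)
            for i in range((K - size) // hop + 1)]
```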
Because we can use a for-loop to implement LUC for each interval, let’s just focus on implementing LUC in one interval. When it comes to one interval, then its LP, candidate set and candidate pairs come into use. In this experiment, I compared four algorithms, which are:
I used SciPy to implement the LP-based methods by calling scipy.optimize.linprog.
One of the problems is to translate the LUC problem into the standard LP problem. The proof for Lemma 2 has already shown how to achieve this.
Another problem is to choose a proper method for LP solution. I tried different methods and concluded that the revised simplex method is most efficient.
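The translation can be sketched as follows, assuming the construction from the proof of Lemma 2: the decision variables are the line parameters $(a,b)$ of $l(x)=ax+b$, the objective is the line's value at the interval centre, and the upper-cover constraints become linear inequalities. (The helper name `luc_lp` is mine; the default solver is used here rather than revised simplex.)

```python
import numpy as np
from scipy.optimize import linprog

def luc_lp(x, y):
    """LUC on one interval as an LP over the line l(x) = a*x + b:
    minimise l(xc) subject to a*x_k + b >= y_k for all sample points."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc = 0.5 * (x[0] + x[-1])
    c = np.array([xc, 1.0])                             # objective: a*xc + b
    A_ub = np.column_stack([-x, -np.ones_like(x)])      # -a*x_k - b <= -y_k
    res = linprog(c, A_ub=A_ub, b_ub=-y,
                  bounds=[(None, None), (None, None)])  # a, b unbounded
    return res.x                                        # optimal (a, b)
```

Note that linprog's default variable bounds are $(0,+\infty)$, so the explicit unbounded `bounds` argument matters here.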
For the (non-full) candidate-set algorithm, I applied Algorithms 1, 2, 3 and 4 to find the candidate set, and then used LP (whose constraints are based on the candidate set) to find the solutions.
The full-candidate-set algorithm modifies the candidate-set algorithm by replacing Algorithm 3 with Algorithm 3.2.
For the accompany-pair algorithm, I apply Algorithm 5 only, which takes the role of LP.
In this experiment, I set $\Delta_h = 10$ and $\Delta_I = 10$.
First I extract the first two intervals and do LUC for them. Here is the result: The yellow lines indicate candidate-sets, while the red lines indicate the original curve.
Those results verify the correctness of Lemma 2 by showing that at least two zero-slackness constraints are reached. They also show that applying the full definition of the candidate set does not lead to seriously different results compared with those obtained by the non-full definition.
Then, I test the whole curve, which is shown in the following figure: The green-star marks indicate local maxima of the centroid sequence, most of which are marked by blue marks.
From the whole-curve result we can see several phenomena:
Phenomenon 1 verifies my Hypotheses 1 and 2, because the full candidate-set algorithm and the accompany-pair algorithm are grounded on Hypotheses 1 and 2.
Phenomenon 2 gives us a hint that we can change the full candidate-set algorithm into the non-full candidate-set algorithm without losing much accuracy. In the next section I will compare the time consumption of those two algorithms.
Phenomenon 3 verifies the effectiveness of the LUC algorithm: after smoothing by LUC, it is OK to directly apply the strict $locmax$ function to the centroid series and get the desired peaks.
Next, I will compare the algorithms in more depth regarding their time cost and correctness.
For each algorithm, I adjusted the interval size $\Delta_I$ and the LP method (except for the accompany-pair algorithm). I used the Monte-Carlo method to reduce the variance of the time cost estimation. In this test, I ran each case 100 times. Results are shown in the following figure:
Time cost experiment results.
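Such a repeated-timing harness can be sketched like this (the helper name is illustrative, not the exact script used):

```python
import timeit

def mean_runtime(fn, *args, n_runs=100):
    """Run fn(*args) n_runs times and return the mean seconds per call."""
    total = timeit.timeit(lambda: fn(*args), number=n_runs)
    return total / n_runs
```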
This experiment was run on my old laptop, whose CPU is an Intel Core i5.
From the results, we can observe that:
It is not a must to apply $locmax$ directly after LUC. Instead, we can do another LUC after LUC, which can be called second-order LUC. An example is shown in the following figure:
The first LUC, namely the first-order LUC, adopts $\Delta_{h1}=1$ and $\Delta_{I1}=10$. This smooths the original curve, as shown in (a). The second-order LUC adopts $\Delta_{h2}=5$ and $\Delta_{I2}=10$, and then we get the correct peaks.
Notice that LUC is good at finding peaks with a proper x-axis span. For example, in the figure above, the index width of each peak is around 20; when $\Delta_{h2}=5$, it tends to find those peaks because 5 is on the same scale as 20, no matter how low those peaks are; when $\Delta_{h1}=1$, however, it is easier to find peaks with a shorter x-axis span.
To demonstrate the potential of LUC in audio analysis, I did another experiment, in which I applied LUC to find peaks in spectrograms.
A spectrogram consists of time-varying spectra. Those spectra may be FFT, CQT or mel spectra, etc. Normally, we use color to show the intensity at each time-frequency bin. An example of a CQT spectrogram is shown as follows:
If this CQT stuff does not make sense to you, no need to worry. Just remember that our task is to find the peaks (bright stripes) in that spectrogram, ignoring noisy points.
Sometimes, we do not want our model to detect low peaks, although those peaks are wide enough. This leads to a problem: what exactly is our definition of a desired peak? On the one hand, we want a desired peak to be wide enough (otherwise, it is a fake peak); on the other hand, we want to ignore the peaks which are too low compared with the higher peaks. The first condition can be guaranteed by LUC, but LUC does not provide a solution to meet the second condition. An example is shown in the following figure:
Therefore, we have to come up with a new way to discriminate the low peaks from the high peaks. My solution is to weight the peaks with salience.
If a spectrogram $\pmb{S}=[s_{ft}]_{F\times T}$ is input to a first-order LUC, I hope my model can output a peak-weight map $\pmb{W}=[w_{it}]_{L\times T}$. The relationship between $\pmb{S}$ and $\pmb{W}$ is as follows:
There are all kinds of existing smoothing methods, among which the most common ones include: low-pass filtering, kernel smoothing, moving average smoothing and local regression.
In order to fairly compare those methods, I use the same pipeline: signal → smoothing (to be tested) → locmax.
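That pipeline, with one of the baselines (moving-average smoothing), can be sketched on a synthetic noisy signal; the signal and window size here are illustrative assumptions:

```python
import numpy as np

def locmax(y):
    """Strict interior local maxima mask (my assumed definition)."""
    m = np.zeros(len(y), dtype=bool)
    m[1:-1] = (y[1:-1] > y[:-2]) & (y[1:-1] > y[2:])
    return m

def moving_average(y, w):
    """Length-w moving average, same output length as the input."""
    return np.convolve(y, np.ones(w) / w, mode="same")

# pipeline: signal -> smoothing (to be tested) -> locmax
rng = np.random.default_rng(1)
x = np.linspace(0, 20, 400)
signal = 10 * np.sin(0.5 * x) + rng.normal(0, 0.5, x.shape)
peaks = np.flatnonzero(locmax(moving_average(signal, 15)))
```

Swapping `moving_average` for another smoother, while keeping `locmax` fixed, keeps the comparison fair.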
If you have something to add, or if you simply disagree with me, feel free to comment below!
What is the music sample space? The largest (whole) music sample space is the set of all possible combinations of musical elements. Actually, the whole musical set is far too large. Great composers can extract the exact subsets which contain good compositions.
When composing music, we try to reduce our sample space, or eliminate the uncertainty in our mind. This process is like finding a winding path towards a beautiful hidden corner. For example, theory of harmony tells us what kind of chords are recommended to be used in certain musical contexts, and this reduces the chord sample space to a few so-called consonant chords.
Now, we may ask: what is the property of those good subsets, and how to reach them? Well, a musician can write a thousand-paged essay to explain that, maybe from the point of view of melody, harmony, counterpoint and music structures… I am not going to expand those topics here.
Instead, if we are inspired by information theory, we can find the common trait of good musical works: they bring something unexpected. The unexpected can be good melodies, satisfying chord progressions or impressive performance skills, etc.
This may be a little hard to understand. I will explain this “unexpectation theory” in the next section.
Appreciating music is like travelling: we aim to encounter the unknown world. From my point of view (and also an information-theoretical point of view), appreciation is centered on acquiring knowledge, eliminating uncertainty, or adding up regularity. (And this holds no matter what we appreciate: music, sculpture, maths, etc.)
Therefore, just like famous attractions have landmarks which can "surprise" the tourists (e.g. the spectacular landscape of Yellowstone Park, which tourists have never experienced before), musical works should contain something beyond listeners' expectations. This is the intuition from information theory, which defines information as that which eliminates uncertainty.
When appreciating music, we try to accumulate more information by decoding the music signal. Again, from information theory's point of view, we are eliminating the uncertainty of the world. Our validated music sample space is then enlarged.
You may be curious about the case where we listen to a piece of music again and again, enjoying an impressive musical moment many times even if we are familiar with that moment. Why are we so keen on a moment which seemingly cannot surprise us? Well… I explain this with our limited capacity for music decoding.
Remember one of the music pieces which did not impress you when you first heard it, but later, when you happened to listen to it again, it captivated you. You may have listened to a segment in that piece over and over again, and each time you listened to the piece, you expected to reach that segment. This was because, when you first listened to that music, you did not successfully decode the music signal into exciting information. Later, when your decoder worked better, you captured the moment when important information was deciphered, and you wanted to revisit that moment over and over again until you could easily decipher similar information.
Your appreciation skills are improved during this process!
Then comes the question: how to evaluate “appreciation skills”?
Different people are sensitive to different things because of the diversity in the capacities of their (music) decoders. For example, many listeners cannot be moved by atonal music because they focus on the jarring sound and cannot acquire the artistic information which is impressive to musicians, who may be struck by the organization of the piece or fantastic chord tensions. This is similar to how illiterate people cannot appreciate the beauty of Euler's $e^{i\pi}+1=0$ because they do not have the prior knowledge associated with this formula.
Again, our appreciation skills are in fact the capacity of our decoders.
How to train those skills? First, we need training data. The key is trying to appreciate unfamiliar music and looking for the surprising moments, which is like visiting an unknown place on your own. Another choice is to check music reviews (some of which are logs of the beautiful moments of a piece), and this process is similar to visiting an attraction with a tour guide.
Of course, even if we are well-trained to appreciate music, we cannot despise the music we are already familiar with. After all, we upgraded our decoder system thanks to it. And more importantly, there may remain something undiscovered in the music, even if we think we know it well.
We have established the point that listeners are to be surprised when appreciating music. Therefore, composers have to surprise the listeners in order to create impressive works.
What does surprising the listeners mean? It does not necessarily mean that they have to use "$\pmb{pppp} \rightarrow \pmb{ffff}$" to startle the listeners. It means that they have to convey important information which is beyond listeners' expectations, just as we have discussed.
For listeners who have not received much appreciation training, a fairly normal chord progression can grasp their attention and make them feel surprised and happy. Hence, most pop music composers apply easy-listening chord progressions and music structures to their music, and focus on writing "surprising" melodies, which are normally easy to appreciate.
In this case, the music sample space of the composer may be far larger than the listeners'. Composers should find the music samples at the edge of listeners' music sample spaces, and this is enough to entertain them.
The intersection of the music sample spaces of most people forms the set of universal pop music. What about the rest (which may be a lot larger)?
For listeners who are well trained, they may seek an unexpected combination of music elements (remember, it is actually a sample point of the music sample space). This challenges composers to explore their music sample spaces, and this is fairly interesting: because not everyone shares the same music sample space, the hidden corner you visited is not likely to be visited by others.
Before surprising the listeners, composers had better first surprise themselves, and this requires composers to be skillful at appreciation, which means that they have to dig out more and more information. In other words, composers themselves have to own powerful decoders.
But where do the surprising things come from? You know that even if you have a good decoder, you may not have musical codes to feed into the decoder. And this is where generative models are crucial!
To be frank, we do generation by implementing randomness. We randomly sample music data points from our music sample spaces. The main differences between composers lie in 1. the structure of their music sample spaces, 2. their sampling strategies and 3. their appreciation skills. Appreciation skills were analyzed in the previous section. Therefore, I mainly talk about the other two.
First, let's talk about music sample spaces. Remember the first section? I put it here again…
When composing music, we try to reduce our sample space, or eliminate the uncertainty in our mind. This process is like finding a winding path towards a beautiful hidden corner.
Yeah, good composers know how to reduce their sample spaces. They may use harmony theory to exclude the bad chords, so that the number of remaining music data points is smaller. They also have other constraints on their sample spaces, like melody alignments, music structures, instruments, etc. You know, it is fairly easy to sample a data point from a small sample space according to one's will!
In order to be skillful at reducing the sample space, composers need to know what compact sample spaces look like. Therefore, they had better learn harmony theory, music structures, counterpoint and different music styles. All aspects lead to certain compact sample spaces, and great composers are skilled in pinpointing the required compact sample space.
Now let’s go to another question: even if we have a compact sample space, how to sample a music data point from that space?
Notice that the famous generative models (e.g. HMM, VAE, GAN) all depend on randomness, and they achieve generation by sampling their sample spaces in various ways. With a good sampling strategy (e.g. conforming to the thinking pattern of first determining the music structure), you have a good starting point for converging your chaotic mind into a relatively small sample space.
Music is contextualized. Therefore, we sample according to context. Most composers may sample their music measure by measure, and the result might be that their music does not have a good structure. This is similar to the strategy of RNNs.
Another strategy is to first determine some global constraints and then do sampling over time. This is the case for most composers because it safeguards a good music structure. This is similar to the strategy of transformers.
What about music with unexpected beautiful organizations? Chances are that composers see the music structures as non-deterministic, and they have a unique subset of sample space for music organizations. They first sample a music organization, and then follow the traditional way as described before.
Of course, there are many other ways I have not mentioned. For example, some avant-garde composers may record natural sounds and reorganize them as a music work, which is totally unconventional. This is why composers have to explore various ways of sampling, and this is how to expand the border of art!
A last topic: how to practice composing music?
We have to know that, when practicing composing or performing music, we try to revisit the winding paths towards beautiful hidden corners. This is because we are humans rather than GODs (who know the global maximum), and we have to struggle through optimization, just like training machine learning models.
On the one hand, if you find an unvisited "winding path" (e.g. a unique melody), you can revisit it over and over again until you can introduce it to others, and then you become a composer! You must train yourself to be familiar with the process of encoding the "winding path" into audio or music language. Remember, not all people have visited those winding paths. Therefore, practicing to efficiently encode your musical ideas into music signals lets a composer quickly impress the audience.
On the other hand, your decoder should also be sensitive enough to discover the winding paths in your chaotic mind or the colorful world! In this case, you should try to listen to more music you have not tried, and push yourself to dig out surprising information from it, or even imitate it, just as I explained previously.
Another thing you can do is to optimize your sampling strategy, i.e. the composition process. You have to try a lot in order to find a desirable process. For me, I am kind of conventional in that I use draft scores to compose music instead of using a DAW. What about you? Try, try, try!
Finally, your chaotic mind, the generation source. Well, your mind can be influenced by external conditions like changing moods or spirits. Therefore, try to find different environments for composing.
Go practice, think more, and appreciate more!
In fact, all of my discussions above can serve as instructions for designing algorithmic composition models.
Algorithmic composition models face the same challenges: to constrain the sample space, to have a sampling strategy, to have a good decoder and a good discriminator, to be optimized in order to be familiar with the "winding paths"…
Interestingly, you can find corresponding topics in machine learning for each of the aspects I mentioned above. Let's go and see!
[Updated on 30-April-2021]
I might have taken it for granted that we tend to appreciate music which brings us surprise. How, then, to explain the phenomenon that music which sounds common is popular? For example, experiments have shown that car-radio listeners tend to prefer music which sounds typical (the so-called "sticky music"), rather than classical music which has abundant artistic information. And how can we explain that we listen to a piece of music again and again, even if we know that we are quite familiar with it? For example, after finishing composing a piece of music, I tend to listen to it again and again even though I am the composer and know a lot about it; and sometimes I listen to a pop song again and again. How to explain these?
In my previous opinion, I defined happiness as knowing something unexpected. However, a friend of mine inspired me: happiness might also come to us when some of our expectations are met. (We were discussing why people get unhappy, and we found that a great reason is that their expectations, whether subconscious or conscious, are not met.)
What is expectation? Can we explain it with a definition, like "surprise is the elimination of uncertainty, which is measured by entropy"? Well… For me, it is a big problem to be solved. But I have an idea: habit is a kind of expectation. Most of our habits work in our subconscious; therefore, when we find ourselves appreciating a familiar old song, we may say that our subconscious is feeding on that old song, which meets its expectation.
Here, the two theories seem to fight with each other: when we define happiness as eliminating uncertainty, we may say that things are happening out of our expectation (if not, we may not be surprised), but we cannot explain the case when we are appreciating a piece of familiar music and enjoying one particular moment again and again; when we define happiness as meeting our expectation, we cannot explain the euphoria of listening to a great symphony. There must exist a theory which is able to blend those two into one (and I think the great theory of happiness exists because of Hegel’s dialectics).
I think the main difference between those two theories lies in the quality of happiness: the "surprise theory" depicts a "tired but happy" case, while the "expectation theory" depicts a "safe and sound" case. Our habits lead us to listen to music which is familiar to us, and this does not consume much of the computational resources in our minds; when we find ourselves able to appreciate a symphony, we have consumed a lot in our minds, and we find that we did not consume it in vain because we found something unexpected.
How to explain that when doing experiments, everything gets ruined (e.g. an unexpected error in coding), and we get mad? How to explain that when listening to a piece of music, we get startled by a wrong note played by the performer and get angry? In those cases, our expectations are not met at all, and we consumed a lot of mental energy. It seems that, when learning something unexpected, we get either happy or unhappy, and we really have to tell the difference between those two cases.
To sum up, we have four cases:
We have not discussed cases 2 and 3; but we demonstrated that case 1 brings happiness and case 4 can bring either happiness or unhappiness. Thinking about our experiences, we can find that case 3 can bring either happiness or unhappiness, while case 2 is complicated…
It seems that happiness is a rather complicated thing to explain. I believe that it could be measured and decomposed into more elementary concepts. However, my assumption that "elimination of uncertainty brings happiness" might be incorrect; or I should endow this statement with a more suitable definition.
by lucainiaoge
In a word, statistical inference deals with the problem of "trying to reach the God". In other words, we observe data, and we try to fit the distribution of that data. A little bit hard to understand? I hope this article can help you understand it…
Suppose you are studying digital pictures $x \in \pmb{R}^{M\times M}$. Accordingly, their distribution is $p_G(x): \pmb{X_{G}} \rightarrow \pmb{R}^+\cup\lbrace 0\rbrace$. One of our goals is to find exactly the subspace $\pmb{X_{G}}\subseteq\pmb{R}^{M\times M}$ where the pictures $x \in \pmb{X_{G}}$ look like hand-written numbers (correspondingly, $\pmb{X_{G}}$ is the space where the ${M\times M}$ hand-written number pictures live). Again, our mission is to find $p_G(x): \pmb{X_{G}} \rightarrow \pmb{R}^+\cup\lbrace 0\rbrace$ given a hand-written number dataset $\pmb{X_{s}}$.
We take $\pmb{X_{G}}$ as the “God/Ground-truth dataset” and $\pmb{X_{s}} \subseteq \pmb{X_{G}}$ as the “observation dataset” (e.g. MNIST).
Now, the question is: how to fit that God/Ground-truth distribution? Well… For each question, we have to represent it before we solve it. In this article, I try to represent it clearly.
This article does not try to solve those questions explicitly. Its aim is to remind you what you are really doing when you get lost in some annoying math work.
By the way, why is it important to estimate $p_G(\pmb{x})$? Because once we know $p_G(\pmb{x})$, we can do such things: 1. judge whether a given data point $x_0$ belongs to our distribution $p_G(\pmb{x})$, and then we can do classification; 2. sample data points according to $p_G(\pmb{x})$, and then we can do generation!
We are already familiar with this form: given a set $X$ (e.g. $X=\lbrace 0,1,2 \rbrace$) and a function $f$ (e.g. $f(x)=x^2$), we want to solve
$$\mathrm{argmax}_{x∈X} f(x)$$
What about the argmax problem which finds the optimum function? i.e. given a function space $\mathcal{F}$ (e.g. $\mathcal{F}=\lbrace f: f(x)=x^2+c,c\in \pmb{R} \rbrace$) and a functional $F: \mathcal{F}→\pmb{R}$ (e.g. $F[f]=\int_0^1 f^2(x)dx$), we want to calculate:
$$\mathrm{argmin}_{f∈\mathcal{F}} F[f]$$
This problem is called a variational problem, which aims to find a function $f∈\mathcal{F}$ which minimizes/maximizes a functional $F$.
We have already encountered this kind of problem in information theory!
For example, we have already found the “discrete maximum entropy”
$$\mathrm{argmax}_{p∈\mathcal{P}} H[p]$$
If $\mathcal{P}= \lbrace \text{all discrete distributions with } M \text{ values} \rbrace$, and $H[p(x)]=\sum_{x}p(x) \mathrm{log} \frac{1}{p(x)}$, then
$$\mathrm{argmax}_{p∈\mathcal{P}} H[p]=U(x)$$
where, $U(x)$ is the uniform distribution on $M$ possible discrete values.
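This claim can be spot-checked numerically (a sanity check, not a proof): among randomly drawn distributions on $M$ values, none exceeds the uniform distribution's entropy $\mathrm{log}\,M$.

```python
import numpy as np

def entropy(p):
    """Shannon entropy H[p] = sum_x p(x) log(1/p(x)), in nats."""
    p = np.asarray(p, float)
    p = p[p > 0]                         # 0 log(1/0) is taken as 0
    return float(-np.sum(p * np.log(p)))

M = 5
uniform = np.full(M, 1.0 / M)
rng = np.random.default_rng(0)
samples = rng.dirichlet(np.ones(M), size=1000)   # random distributions on M values
best = max(entropy(q) for q in samples)          # never beats entropy(uniform)
```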
What we want to do is: given data $\pmb{x}$, we want to approximate its distribution $p_G(\pmb{x})$. i.e. we want to solve this variational problem:
$$p^*(\pmb{x})=\mathrm{argmax}_{p∈\mathcal{F}} D[p(\pmb{x}),p_G (\pmb{x})]$$
The term $p_G (\pmb{x})$ means the God’s distribution of data $\pmb{x}$, or the “Ground truth distribution”. Well, I have to say that we assumed that there is a ground truth (or GOD!), which is our philosophical point of view.
Now is the problem:
- Problem 1: How to define the functional (distance between distributions) $D[p(\pmb{x}),p_G (\pmb{x})]$?
- Problem 2: How to define the function space $\mathcal{F}$?
Problem 1: how to find the functional (distance between distributions) $D[p(\pmb{x}),p_G (\pmb{x})]$? Note that we do not know the GOD distribution i.e. $p_G(\pmb{x})$. Thus, it is impossible to calculate this functional by directly calculating the distance between $p(\pmb{x})$ and GOD distribution. However, there is an easy way to define $D[p(\pmb{x}),p_G (\pmb{x})]$ as:
$$D[p(\pmb{x}),p_G (\pmb{x})]≜\mathrm{log}\,p(\pmb{x})$$
(Other ways include estimating the moments or other sufficient statistics. Those are out of the current scope.)
For i.i.d. data $\pmb{x}$, we can even write $\mathrm{log}p(\pmb{x})$ as $\mathrm{log}p(\pmb{x})=\sum_{i=1}^N \mathrm{log}p(x_i)$
The term $\mathrm{log}\,p(\pmb{x})$ is called the "log-likelihood", and the variational problem $p^*(\pmb{x})=\mathrm{argmax}_{p∈\mathcal{F}} D[p(\pmb{x}),p_G (\pmb{x})]$ is called "maximum likelihood", with which we are already familiar.
(Note that in the scheme of "maximum likelihood", we did not consider $p_G (\pmb{x})$ explicitly. We assumed that if $\pmb{x}$ is from the ground truth, $p_G (\pmb{x})$ should always be larger than or equal to any other $p(\pmb{x})$. We tend to think that "what we see is GOD". Or in other words, "data (what we observed) are all we know, and we cannot do better unless we know more information (more data, or more observations)". The moment estimation method directly applies this philosophical assumption, i.e. calculating the distances between moments in order to make $p(\pmb{x})$ as near as possible to $p_G (\pmb{x})$.)
Well… For problem 2, we have two ways: to parameterize or not.
What is parameterization? I use an example to illustrate that:
Suppose the data $\pmb{x}$ are 1D, and we assume the function space is Gaussian, i.e. $\mathcal{F}=\lbrace \text{all 1-D Gaussian PDFs} \rbrace$. We can easily write $\mathcal{F}$ in the form of the space of its sufficient statistics (i.e. $\mu$ and $\sigma$). In other words, the space $\Theta=\pmb{R}\times\pmb{R^+}$ represents the function space $\mathcal{F}$: there exists a unique map $P:\Theta→\mathcal{F}$ such that for any $(\mu,\sigma)∈\Theta$, $P(\mu,\sigma)=f(x\vert \mu,\sigma)∈\mathcal{F}$.
In this case, we parameterized $\mathcal{F}$ with $\Theta$ via the distribution assumption $P:\Theta→\mathcal{F}$. In other words, $\Theta$ contains the parameters which control the function space. Thus, the variational problem
$$p^*(\pmb{x})=\mathrm{argmax}_{p∈\mathcal{F}} D[p(\pmb{x}),p_G (\pmb{x})]$$
can be written as an argmax problem for parameters:
$$\theta^*=\mathrm{argmax}_{\theta∈\Theta} D[f(\pmb{x}\vert\theta),p_G (\pmb{x})]$$
$$p^* (\pmb{x})=P(\theta^* )$$
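To make the parameterized argmax concrete, here is a minimal Python sketch (the function names and toy data are mine, purely for illustration): for the 1-D Gaussian family, the maximum-likelihood $\theta^*$ has a closed form.

```python
import math

def gaussian_log_likelihood(xs, mu, sigma):
    # i.i.d. log-likelihood: log p(x) = sum_i log N(x_i | mu, sigma)
    return sum(-0.5 * math.log(2 * math.pi * sigma ** 2)
               - (x - mu) ** 2 / (2 * sigma ** 2) for x in xs)

def fit_gaussian_mle(xs):
    # argmax over Theta = R x R+ has a closed form for the Gaussian family:
    # mu* = sample mean, sigma* = sqrt of the (biased) sample variance
    n = len(xs)
    mu = sum(xs) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / n)
    return mu, sigma

xs = [1.0, 2.0, 3.0, 4.0]
mu, sigma = fit_gaussian_mle(xs)
best = gaussian_log_likelihood(xs, mu, sigma)
other = gaussian_log_likelihood(xs, mu + 0.5, sigma)  # any other mu does worse
```

The closed form is exactly the “just compute the mean and variance of the observations” recipe: searching $\Theta$ replaces searching the whole function space $\mathcal{F}$.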
Of course, we can search $\mathcal{F}$ directly without introducing $\Theta$ and $P:\Theta→\mathcal{F}$. How to search $\mathcal{F}$ given $\pmb{x}$? There are various ways (e.g. $k$-means). However, we are not talking about those methods right now.
How do we choose a nice function space $\mathcal{F}$ such that $p^* (\pmb{x})∈\mathcal{F}$ is easy to find and $D[p(\pmb{x}),p_G (\pmb{x})]$ is relatively small? In other words, the function space $\mathcal{F}$ (or the parameter space $\Theta$ in the parameterized case) should satisfy the following properties in order to be a good one:
- there exists $p^* (\pmb{x})∈\mathcal{F}$ such that $D[p(\pmb{x}),p_G (\pmb{x})]$ is small, i.e. $\mathrm{E}_{\pmb{x}\in\pmb{X_G}} [D[p(\pmb{x}),p_G (\pmb{x})]]$ is small;
- $D[p(\pmb{x}),p_G (\pmb{x})]$ cannot be too unstable, i.e. $\mathrm{Var}_{\pmb{x}\in\pmb{X_G}} [D[p(\pmb{x}),p_G (\pmb{x})]]$ is small.
If $\mathrm{E}_{\pmb{x}\in\pmb{X_G}} [D[p(\pmb{x}),p_G (\pmb{x})]]$ is small, we say that the function space $\mathcal{F}$ has small bias compared with the GOD’s function $p_G (\pmb{x})$;
If $\mathrm{Var}_{\pmb{x}\in\pmb{X_G}} [D[p(\pmb{x}),p_G (\pmb{x})]]$ is small, we say that the function space $\mathcal{F}$ is of small variance compared with the GOD’s function $p_G (\pmb{x})$, or that our model $p^* (\pmb{x})$ has a good ability to generalize. The term “generalize” means that no matter what data $\pmb{x}$ is given, the difference between $p^* (\pmb{x})$ and $p_G (\pmb{x})$ will not change too much.
Intuition from Hung-yi Lee: see the pictures in
http://speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/Bias%20and%20Variance%20(v2).pdf
Each “sample point $p_s^* (\pmb{x})$” is actually an instance of $p_s^*(\pmb{x})=\mathrm{argmax}_{p∈\mathcal{F}} D[p(\pmb{x}),p_G (\pmb{x})]$, where $\pmb{x}\in\pmb{X_s}$, and $\pmb{X_s}$ (e.g. MNIST) is a subset of the God data set $X_G$ (e.g. all possible pictures of handwritten numbers).
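That picture can be mimicked numerically with a small self-contained sketch (the “GOD” generator, subset sizes and seed are all invented for illustration): each subset $X_s$ yields one fitted “sample point”, and the spread of those fits is the variance side of the story.

```python
import random

random.seed(0)

def god_sample(n):
    # pretend the GOD distribution is Gaussian with mu=0, sigma=1
    return [random.gauss(0.0, 1.0) for _ in range(n)]

def fit_mean(xs):
    # each "sample point p*_s" is the fit obtained on one subset X_s
    return sum(xs) / len(xs)

# 200 subsets X_s of size 50, one fitted model per subset
fits = [fit_mean(god_sample(50)) for _ in range(200)]
mean_of_fits = sum(fits) / len(fits)
var_of_fits = sum((f - mean_of_fits) ** 2 for f in fits) / len(fits)
# mean_of_fits near 0 -> small bias; var_of_fits near 1/50 -> the variance side
```

Larger subsets shrink `var_of_fits`; a misspecified family would instead show up as a stubborn offset in `mean_of_fits`.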
You have already got an intuition of our challenges. Now, time to solve the challenges! Of course, for several specific problems, simple models are enough! For example, if we want to fit a Gaussian distribution, the most convenient way is to calculate the mean and variance of the dataset (observation). However, most problems are complex in reality. We have to design models which can deal with such complexity.
An ingenious way to make our models more powerful is to introduce latent variables. We assume that the latent variables reveal the underlying modes of our observations. How to calculate the latent variables and find the argmax? We will discuss this later when introducing HMM (Hidden Markov Model), GMM (Gaussian Mixture Model), the EM (Expectation Maximization) algorithm and variational inference. The latent-variable trick is in fact a statistical assumption.
Another ingenious way is to use neural networks in order to fit arbitrary functions. The neural-network trick is in fact a structural (connectionism) assumption.
In order to reduce the variance, more tricks can be considered (e.g. early stopping, dropout, normalization, residual connection, parameter sharing, simplified models…). Details are overwhelming. Not going to introduce them here!
(2020) I was trying to summarize labs and researchers around the world who are dedicated to music computation. I wrote this list because I really wanted to do research on MIR and music generation. Of course, this list is far from complete and will be out of date in a few years. If you have recommendations, feel free to comment below!
Here is a superficial introduction for starters and for people who are interested in music computation.
MIR means Music Information Retrieval. Its relevant topics are broad! When I talk about MIR, I am referring to the topics which try to analyze everything about music in a technological fashion. In many cases, MIR has the same meaning as “music computation”. By the way, the term “music technology” mainly covers synthesizers, digital music formats (e.g. MIDI, mp3) and instruments, which has a different emphasis from “MIR”.
You may be curious about how to identify a song by humming, which is an MIR topic called Query by Humming (wiki). You may also be curious about how to transcribe a musical performance into a score, which is an MIR topic called automatic music transcription (AMT) (paper). Other interesting topics include music recommendation (blog), computational musicology (blog), sound synthesis (blog) and algorithmic composition (wiki), etc., which are all relevant topics.
Search “MIR” in wiki: https://en.wikipedia.org/wiki/Music_information_retrieval.
Those listed above are all tasks/goals. But what about the methodology to achieve them? Remember: music is created by people. Therefore, tasks about understanding musical rules and phenomena are actually tasks about understanding human thoughts. How do we understand them?
Well… On the one hand, the nature of music is audio and the nature of musical understanding is human cognition; thus, we have to deal with audio properties and human perception. Relevant technologies include Digital Signal Processing (wiki), Synthesizers (wiki) and Cognitive Musicology (paper).
On the other hand, music is a kind of rule-based art, and we are interested in such rules. But how do we understand and apply them? Relevant methods include Music as Language (paper) and Natural Language Processing (NLP) (wiki). We know that the “rules” in rule-based arts are far from simple. Therefore, we have to use systems of great complexity to fit those rules. One such method is machine learning (paper).
With such tools, we can deal with problems directly relevant to music. For example, we can design our own “generative music language” and apply it to algorithmic composition. Or we can apply DSP methods to identify chords from audio. Of course, every topic has its unique side. Therefore, never stop learning!
The links in the rest of this page are all relevant to what I introduced above.
- Research Centers Summarized by SMC:
http://www.smcnetwork.org/centers.html
This is a comprehensive list, which includes groups mainly in Europe and America.
- Archives, Journals and Societies about “Science & Music” by University of Cambridge, Center for Music and Science:
https://cms.mus.cam.ac.uk/links
This page is a summary of research groups and societies, mainly in Europe. You may find links to research groups as well as academic resources on this page.
- Research Centers Summarized by Beici Liang:
https://mp.weixin.qq.com/s/2nDWikda9fh2x5o093F6HA
The groups mentioned in this list are mostly at universities. By the way, the WeChat official account (in Chinese 中文) in this link is a tutorial for beginners who are interested in music tech.
I give those comprehensive links first. After all, I am reinventing the wheel here because I cannot remember all of the groups unless I write them down and read about them all.
Here is a summary of research group websites I have visited. My list is far from complete. I am just writing them down in order to review the MIR community I have explored and straighten out my thoughts. After all, I felt dizzy when I first began to search for MIR groups around the world…
If you are a starter, I hope the following links can help you. If you are already a researcher, well… I hope my naive list does not bother you, and that you can give some suggestions!
Note that the information here is limited. I may have left out or misunderstood a lot. Therefore, the information in my list is inevitably biased! Again, this list is created mainly for myself and for starters to get familiar with the MIR community. If I wrote something improper, please contact me for corrections!
…To be expanded
- Center for Digital Music (C4DM), Queen Mary University of London (QMUL)
http://c4dm.eecs.qmul.ac.uk/index.html
A very big lab with many groups.
- Digital and Cognitive Musicology Lab (DCML), École polytechnique fédérale de Lausanne (EPFL)
https://www.epfl.ch/labs/dcml/
Led by Prof. Martin Alois Rohrmeier.
- Music Technology Group (MTG), Universitat Pompeu Fabra (UPF), Barcelona
https://www.upf.edu/web/mtg/
- Center for Music and Science (CMS), Faculty of Music, University of Cambridge.
Faculty of Music: https://www.mus.cam.ac.uk/; CMS: https://cms.mus.cam.ac.uk/
- Prof. Remco Veltkamp, Utrecht University
http://www.cs.uu.nl/centers/give/multimedia/music/index.html
- Laboratory of Audio and Music Technology (FD-LAMT), Fudan University
http://homepage.fudan.edu.cn/weili/fd-lamt/
Led by Prof. Wei Li.
- Music X Lab, New York University Shanghai (NYU Shanghai)
http://www.musicxlab.com/#/index
Led by Prof. Gus Xia.
- Sound & Music Computing Lab, National University of Singapore (NUS)
https://smcnus.comp.nus.edu.sg/
Currently Led by Prof. Ye Wang.
- Audio, Music and AI Lab (AMAAI), Singapore University of Technology and Design (SUTD)
https://dorienherremans.com/team
Currently Led by Prof. Dorien Herremans.
- Center for Music Technology, Georgia Tech (GaTech).
https://gtcmt.gatech.edu
Currently Led by Prof. Alexander Lerch.
- Center for Computer Research in Music and Acoustics (CCRMA), Stanford University
https://ccrma.stanford.edu/
This is a big lab. There are many groups in CCRMA. See here.
- Prof. Julian McAuley, Computer Science Department, University of California San Diego (UCSD)
https://cseweb.ucsd.edu/~jmcauley/
- Music and Audio Research Laboratory (MARL), Dept. of Music and Performing Arts Professions, New York University (NYU)
https://research.steinhardt.nyu.edu/marl/
This is a big group. Its music informatics group is currently led by Prof. Juan Pablo Bello.
- Prof. Roger B. Dannenberg, CMU
http://www.cs.cmu.edu/~rbd/
A great computer music researcher, a composer and a trumpet player. However, he is not accepting students.
- Centre For Interdisciplinary Research in Music Media And Technology (CIRMMT), McGill University
https://www.cirmmt.org/
A big lab.
…Pending
An outline: Google Magenta, Spotify, OpenAI, NVIDIA, Jukedeck - TikTok London, Tencent Music (TME), Netease Music…
- ISMIR (International Society for Music Information Retrieval)
https://ismir.net/
“The ISMIR conference is held annually and is the world’s leading research forum on processing, searching, organising and accessing music-related data.”
- MIREX (Music Information Retrieval Evaluation eXchange)
https://www.music-ir.org/mirex/wiki/MIREX_HOME
“The Music Information Retrieval Evaluation eXchange (MIREX) is an annual evaluation campaign for MIR algorithms, coupled to the ISMIR conference.”
- SMC (Sound and Music Computing Network)
http://www.smcnetwork.org/
“The SMC Conference is a double-blind peer-reviewed international scientific conference around the core interdisciplinary topics of Sound and Music Computing.”
- NLP4MusA
https://sites.google.com/view/nlp4musa
“In this context, we propose the First Workshop on NLP for Music and Audio, a forum for bringing together academic and industrial scientists and stakeholders interested in exploring synergies between NLP and music and audio.” “Accepted papers will be published in the ACL anthology.”
- CSMT (Conference on Sound and Music Technology)
http://www.csmcw-csmt.cn/
CSMT is a great platform for Chinese researchers in music technology to exchange ideas, where you can meet many peers from industry and academia. CSMT promotes the development of, and collaboration in, music technology in China, and it is also attracting academic attention worldwide.
Of course, information here is limited due to my limited knowledge… More links pending…
- Datasets summarized by Prof. Alexander Lerch, Gatech
https://www.audiocontentanalysis.org/data-sets/
A grrrrrreat list of useful music datasets. Brief tags are attached there (e.g. MIDI or not? Labelled or not? Rhythm or melody? Monophonic or Polyphonic…)
- Datasets summarized by UPF Compmusic
https://compmusic.upf.edu/datasets
Mainly about Indian art music, Turkish Makam music and Beijing Opera
- Magenta Datasets
https://magenta.tensorflow.org/datasets
Mainly about Bach Doodle Dataset, Groove MIDI Dataset, MAESTRO and NSynth
- Computer Music Conferences Deadline by Yixiao Zhang’s Blog
https://yixiao-music.github.io/?sub=SYM,AI,OTHER,AUO
See also: his zhihu
- Chinese Blog of rogerkeane
http://blog.sina.com.cn/rogerkeane
A somewhat old Sina blog. He studied at GaTech. A blog with a great taste for everyday life!
- Intelligent sound engineering by Prof Joshua D Reiss
Intelligent sound engineering
- Chris Donahue
https://chrisdonahue.com/
- …To be expanded
- And… Hey! And my blog here!
- dblp: computer science bibliography
https://dblp.uni-trier.de/
“The dblp computer science bibliography provides open bibliographic information on major computer science journals and proceedings.”
- Papers With Code: The latest in machine learning
https://paperswithcode.com/
“Papers With Code highlights trending ML research and the code to implement it.”
- IMSLP: download sheet music
https://imslp.org/wiki/Main_Page
“The International Music Score Library Project (IMSLP), also known as the Petrucci Music Library after publisher Ottaviano Petrucci, is a subscription-based project for the creation of a virtual library of public-domain music scores.” - wiki
- word2tex and tex2word
https://www.chikrii.com/
- Google scholar, Media, Wiki, Zhihu, CSDN…
Welcome! I am Tongyu Lu (路通宇)! This is an introduction to my explorations in composing and to my musical pieces! I am far from professional. Instead, I hope that my explorations in composition can inspire more music lovers to try bravely!
If my music fails to play here, it may be because of copyright barriers. You can click the links and listen on outside webpages. You can also find my music on my SoundCloud Page or Netease Music Personal Page.
I tried composing orchestral works in 2018 just for fun:
Collection of 6 pieces: The Walking of Insects (惊蛰)
Here are two pieces from this collection:
The Walking of Insects (惊蛰) was performed at the Shenzhen College Music Festival in 2019.
Furthermore, I tried to compose a march and an overture in Chinese style:
In 2020, I realized that I lacked professional skills :) So I began studying harmony on my own… (I will revise many of my pieces in the future because I am not satisfied with my past composing skills… After relistening to my works, I found a lot of flaws. Anyway, I am kind of surprised that I composed them even though I had not learned harmony and other composing basics!) By the way, I am currently fascinated by Brahms’ Symphony No. 4!
Meanwhile, I tried to compose solo piano pieces and chamber music. In early 2019, I composed a few pieces trying to depict my feelings about winter:
Two Quartets and a piano Etude: Charming winter
Here are the two Quartets:
I collected my early piano solo works. Although I play the piano, it is still challenging to write pianist-friendly pieces (especially when I want to express the beauty of technique). Many of my piano solo works (old and new) are collected here:
Piano solo collection of 7 pieces: Lucainiao’s Trial on Piano Pieces
Here are two of them:
After reflecting on my early chamber music, I found that cooperation between different voices is important. So I tried more:
Chamber music collection of 4 pieces: Lucainiao’s Trial on Chamber Music
Here are two of them:
In late 2020, I planned to explore orchestration topics, and I tried to orchestrate the 2nd movement of Beethoven’s Piano Sonata No. 30.
- Orchestration - Beethoven Piano Sonata No.30 - 2 Prestissimo; Links: SoundCloud | Netease
In 2018, inspired by a friend who plays the saxophone, I tried to write several works in jazz style. After trying this, I found myself hooked! I began to add something jazzy to my works in late 2019 and early 2020. I created my first collection of my jazzy works from 2018 and early 2019.
I then began to read The Jazz Piano Book in early 2020, and learned several tricks! Then I composed a few jazz piano solos (and even played and recorded some of them):
In late 2020, I began trying to compose pieces for brass ensembles. Here are several such pieces:
- Jazz Sextet: 6+4; Links: SoundCloud | Netease
- Jazz Sextet: Walking Freely; Links: SoundCloud | Netease
- Brass Quintet in E flat major - A Happy Festival; Links: SoundCloud | Netease
In 2021, I realized that my explorations in jazz had actually been about “jazz style”. What is real jazz? Improvisation!
In the summer of 2021, I began studying jazz piano with Xin. He is really an outstanding music educator. I began learning how to comp, how to interpret the Real Book, how to solo, how to compose a jazz piece, etc.
During my jazz learning, I also started writing my own real jazz pieces. My methodology: first write a lead-sheet draft at the piano, then use music scoring software to compose the full score for the band (including solos), and finally use a DAW to turn the MIDI into audio with the help of VSTs.
For example, I composed an album, Jazz Seasons, in 2022; Links: Youtube | Bilibili
One day I listened to the music I had composed 2 years earlier, and I realized: I now compose with more professional skill, but fewer ideas.
This log currently stops at Mar 2023. I am still learning jazz and composing music. I will carry on, and report back when I make great changes.
by lucainiaoge
This is a detailed derivation and summary of the backpropagation (BP) algorithm in fully-connected neural networks and CNNs:
neural network concepts, CNN concepts, vector calculus
Just a review, not a tutorial! How to derive the formulas? Please go learn the neural network basics.
Give a link of tutorial from Andrew Ng
by lucainiaoge
This is a fundamental course for those who wanna have a thorough understanding of Bayesian Networks:
When studying the course, I was somewhat confused, partly because the lectures go too fast. I had to frequently pause and digest the slides. Luckily, the slides are very well arranged.
Through the first several lectures, we are led to discover the mathematical essence of Bayesian Networks (BN).
Knowing what sets are, knowing basic Boolean logic, and having a mathematical mind
Set up the foundation of probability calculus.
Link logical calculation with set calculation.
Motivation: there will be a great many definitions. My understanding is that they are like the foundations of the magnificent building of probabilistic inference, since what we interpret as “events” is explained by propositional logic.
Definition (propositional variable and sentence/event): a propositional variable $p$ is defined on a discrete set $P$; a sentence (or event) $\alpha$ is an assignment of propositional variables. $\alpha$ can hold true or false.
In other words, $(p=p_0)=\alpha \in \lbrace \pmb{true},\pmb{false} \rbrace$
In the following discussions, we take True as T and False as F.
e.g. propositional variables: B=”Burglary”, E=”Earthquake”, A=”Alarm”; a sentence: $\alpha$=(B=T, A=F)
When we have many propositional variables, the number of possible circumstances explodes exponentially. Dealing with this efficiently is one of our tasks.
First we define such possible circumstances with the term “world”:
Definition (world): a world $w$ is an assignment for all propositional variables $p_i$.
We can treat worlds as smallest elements of the whole set $\Omega$.
e.g. $p_B$=”Burglary”, $p_E$=”Earthquake”, $p_A$=”Alarm”
$w_1=[(p_B,p_E,p_A)=(T,T,T)]$
$w_2=[(p_B,p_E,p_A)=(T,T,F)]$
$w_3=[(p_B,p_E,p_A)=(T,F,T)]$
$w_4=[(p_B,p_E,p_A)=(T,F,F)]$
$w_5=[(p_B,p_E,p_A)=(F,T,T)]$
$w_6=[(p_B,p_E,p_A)=(F,T,F)]$
$w_7=[(p_B,p_E,p_A)=(F,F,T)]$
$w_8=[(p_B,p_E,p_A)=(F,F,F)]$
$\Omega=\lbrace w_1,w_2,…,w_8 \rbrace$
Definition ($w \models \alpha$): we write $w \models \alpha$ (read “$w$ satisfies $\alpha$”) if the sentence $\alpha$ holds true under the assignment given by the world $w$.
e.g. given sentences: $B=[p_B=T]$ (meaning a burglary happens), $E=[p_E=T]$, $A=[p_A=T]$, $\neg B=[p_B=F]$ and so forth
Then, $w_1 \models B$, $w_5 \models \neg B$
Definition (model of sentence $\alpha$): $Mods(\alpha)=\lbrace w:w \models \alpha ,w\in \Omega \rbrace$
The model of a sentence is a set. Obviously, the model of a sentence is unique; and given a set of worlds, there is a sentence (unique up to equivalence) whose model is that set.
e.g. $Mods(B)=\lbrace w_1,w_2,w_3,w_4 \rbrace$
Until now, we know that:
Hope you are getting excited: we are linking the logical world of True or False with the world of sets. We can now really define logical operations.
Definition (Basic Propositional Logic Calculation): given sentence $\alpha$ and $\beta$
$\alpha \wedge \beta $ is such a sentence that $Mods(\alpha \wedge \beta)=Mods(\alpha)\cap Mods(\beta)$
$\alpha \vee \beta $ is such a sentence that $Mods(\alpha \vee \beta)=Mods(\alpha)\cup Mods(\beta)$
$\neg \alpha $ is such a sentence that $Mods(\neg \alpha)=\Omega \setminus Mods(\alpha)$
You see? Your familiar logical operations “and”, “or” and “not” are actually the set operations “intersection”, “union” and “complement”.
Definition (relationships of sentences): given sentence $\alpha$ and $\beta$
$\alpha$ is consistent or satisfiable $\Leftrightarrow$ $Mods(\alpha)\neq \emptyset$
$\alpha$ is valid $\Leftrightarrow$ $Mods(\alpha)= \Omega$ $\Leftrightarrow$ $\models \alpha$ $\Leftrightarrow$ $\alpha = True$
$\alpha$ and $\beta$ are equivalent $\Leftrightarrow$ $Mods(\alpha)=Mods(\beta)$
$\alpha$ and $\beta$ are mutually exclusive $\Leftrightarrow$ $Mods(\alpha)\cap Mods(\beta)=\emptyset$
$\alpha$ and $\beta$ are exhaustive $\Leftrightarrow$ $Mods(\alpha)\cup Mods(\beta)=\Omega$
$\alpha$ implies $\beta$ $\Leftrightarrow$ $Mods(\alpha)\subseteq Mods(\beta)$ $\Leftrightarrow$ $\alpha \models \beta$
I think those definitions are there to remind people of the importance of such relationships, and also to help people communicate in a more logical manner…
A few more definitions of Propositional Logic calculation:
Definition (More Propositional Logic Calculation): given sentence $\alpha$ and $\beta$
$\alpha \rightarrow \beta $ $\Leftrightarrow$ $\neg \alpha \vee \beta$
$\alpha \leftrightarrow \beta $ $\Leftrightarrow$ $(\alpha \rightarrow \beta)\wedge (\beta \rightarrow \alpha)$
These definitions are worth noticing.
$\alpha \rightarrow \beta $ is in fact telling you that $\alpha$ implies $\beta$ (equivalently, its contrapositive). $\alpha \rightarrow \beta = True$ means: whenever $\alpha$ is true, $\beta$ is true. So $Mods(\alpha)\subseteq Mods(\beta)$. (Think in terms of worlds!)
$\alpha \leftrightarrow \beta $ is actually XNOR. $\alpha \leftrightarrow \beta = True$ means that $\alpha$ and $\beta$ are equivalent.
Several examples:
given sentences: $B=[p_B=T]$, $E=[p_E=T]$, $A=[p_A=T]$, $\neg B=[p_B=F]$ and so forth
$\alpha=(E \vee B)\rightarrow A$
$Mods(\alpha)=\lbrace w_1,w_3,w_5,w_7,w_8 \rbrace$
$\beta=(E \rightarrow B)$
$Mods(\beta)=\lbrace w_1,w_2,w_3,w_4,w_7,w_8 \rbrace$
then, $Mods(\alpha \wedge \beta)=\lbrace w_1,w_3,w_7,w_8 \rbrace$
Computing the intersection is in fact adding information: we throw away world elements by doing so. This is compatible with information theory!
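The worlds-and-Mods machinery above is easy to make executable. A minimal Python sketch (variable names are mine), reproducing the burglary example:

```python
from itertools import product

# enumerate the 8 worlds over (B, E, A); True first, matching w1..w8 above
worlds = {f"w{i + 1}": dict(zip("BEA", vals))
          for i, vals in enumerate(product([True, False], repeat=3))}

def mods(sentence):
    # Mods(alpha) = { w : w |= alpha }
    return {name for name, w in worlds.items() if sentence(w)}

implies = lambda a, b: (not a) or b  # alpha -> beta  ==  (not alpha) or beta

alpha = lambda w: implies(w["E"] or w["B"], w["A"])  # (E v B) -> A
beta = lambda w: implies(w["E"], w["B"])             # E -> B

# conjunction corresponds to set intersection:
both = mods(alpha) & mods(beta)
# mods(alpha) == {"w1", "w3", "w5", "w7", "w8"}
```

Enumerating all worlds is of course exponential in the number of variables, which is exactly the blow-up that Bayesian networks are designed to tame.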
It is assumed that readers know the concepts of the harmonic (overtone) structure of musical notes. Readers are also recommended to know the non-negative matrix factorization (NMF) algorithm. But anyway, I briefly elaborate on it in this article.
This article gives a brief review of harmonic constraints on the timbre dictionary. Such harmonic constraints help the non-negative matrix factorization (NMF) algorithm converge to a musically meaningful result, which enables automatic music transcription (AMT) and music source separation (MSS). Noting the disadvantages of the traditional NMF algorithm and of hard harmonic constraints, this article proposes a dB-trick for NMF which loosens the non-negativity constraints, and a soft harmonic-constraint method based on regularization which gives the parameters more freedom than the existing hard harmonic constraints.
To view this article, please download:
Harmonic_Constraint_in_Music_Spectrogram_Reconstruction
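The article itself is in the PDF above; as a rough illustration of the general idea (not the article's actual algorithm), here is a sketch of NMF with a hard harmonic constraint, where each dictionary column is masked to the harmonic bins of a hypothetical fundamental:

```python
import numpy as np

rng = np.random.default_rng(0)

def harmonic_mask(n_bins, f0_bin, n_harmonics=5, width=1):
    # hard harmonic constraint: 1 near integer multiples of f0_bin, 0 elsewhere
    mask = np.zeros(n_bins)
    for h in range(1, n_harmonics + 1):
        lo, hi = h * f0_bin - width, h * f0_bin + width
        mask[max(lo, 0):min(hi + 1, n_bins)] = 1.0
    return mask

def nmf_harmonic(V, masks, n_iter=200, eps=1e-9):
    # V ~= W @ H with each column of W confined to its harmonic mask
    M = np.array(masks).T                     # (n_bins, n_sources)
    W = rng.random(M.shape) * M
    H = rng.random((M.shape[1], V.shape[1]))
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)  # multiplicative updates
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        W *= M                                # re-apply the hard constraint
    return W, H

# synthetic spectrogram built from two harmonic "timbres"
masks = [harmonic_mask(64, 5), harmonic_mask(64, 7)]
V = np.array(masks).T @ rng.random((2, 30))
W, H = nmf_harmonic(V, masks)
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

Because the mask is re-applied after every multiplicative update, the non-negativity and the harmonic support are both preserved throughout; the soft, regularization-based variant discussed in the article would replace that hard masking step with a penalty term.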
This article assumes readers have grasped the basic concepts of linear algebra (e.g. inner product spaces, subspaces, rank of a matrix, solving linear equations with Gaussian elimination, etc.).
When I was learning about convex optimization, I encountered a statement about feasibility: given the constraint $Ax=b$, if $b\notin R(A)$, then the problem is infeasible. I wondered what $R(A)$ was. After reading this article, I hope the answer will be trivial to you!
To view this article, please download:
Reflections on Linear Algebra: Range and Nullspace
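That feasibility statement is easy to check numerically: $b\in R(A)$ exactly when appending $b$ as an extra column does not increase the rank. A small numpy sketch (the matrices are toy examples of mine):

```python
import numpy as np

def in_range(A, b, tol=1e-10):
    # b lies in the column space R(A) iff rank([A | b]) == rank(A)
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float).reshape(-1, 1)
    return (np.linalg.matrix_rank(np.hstack([A, b]), tol)
            == np.linalg.matrix_rank(A, tol))

A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])                  # R(A) = the z=0 plane in R^3
feasible = in_range(A, [2.0, 3.0, 0.0])     # Ax=b is solvable
infeasible = in_range(A, [0.0, 0.0, 1.0])   # b has a component outside R(A)
```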
by lucainiaoge
This post walks through our solution to Problem B of the 2019 CUMCM (China Undergraduate Mathematical Contest in Modeling). This is Part 2.
As told last time, in order to solve the first two questions we built a realistic physics engine and designed a feedback mechanism for the forces in the vertical direction. Link to the previous post: 2019数模国赛B题解析Part1 (Analysis of 2019CUMCM Question-B Part1)
But how do we make our model work in the 3D world? What have we not done yet? Going through the list, quite a lot is still missing! In summary:
Let us solve them one by (slooow) one!
Here we still use the feedback idea, but this time it is not a problem that can be solved by simply drawing a monotonically decreasing function. This time, we control in real time, in the spirit of control theory!
What is the idea of control theory? It is this: first, we have a target value, called the expected value (setpoint). Suppose we want the position $\pmb{r}(t+\Delta t)$ of the controlled object to become $\pmb{r_0}$ after time $\Delta t$; then we say the expected value is $\pmb{r_0}$.
Here, allow me to borrow the expectation notation from probability theory: $E[\pmb{r}(t+\Delta t)]=\pmb{r_0}$
Note: here $E[·]$ acts on the time-varying quantities we want to adjust, and we assume $E[·]$ is linear in these quantities; the reason for this assumption will be explained later.
Next, we do everything we can to reach this expected value. What can we do? Adjust the forces, of course! In Newtonian mechanics, force is what changes an object's state of motion!
Next, I use PID (proportional-integral-derivative) control. What is PID? You can look it up yourself; here it is just the name of a method, and there is no need to know how it is used in engineering. For this problem, proportional control alone is enough.
If only proportional control is used, then $E[x]=K_p x$, where $K_p$ is the proportional gain; that is, the next output is the expected value multiplied by a coefficient. Below, "taking the expectation" means obtaining the expected (target) value of a quantity, not the expectation in statistics!
① is easy: each time a collision is about to happen (the distance from the ball's center to the drum surface equals the ball's radius), run a check; if the ball's projection onto the horizontal plane does not fall inside the drum's projection, the round is lost.
The main part is how to implement ②: control the horizontal component of the net pulling force so that the drum always follows the ball.
Force and position are a second-order and a zeroth-order quantity respectively, so we cannot directly write down a control equation for the force (i.e. the formula that determines the output at the next step). One option is cascaded PID, but I will not use that here.
Another option: push the derivation down two orders, and we obtain the control equation for the force.
At the next step, the expected value of the drum's horizontal position is the ball's horizontal position:
$E[\pmb{r_d}(i+1)]=\pmb{r_b}(i)$ (note: all vectors here are 2D)
Consider the equation of motion $\pmb{r_d}(i+1)=\pmb{r_d}(i)+\pmb{v_d}(i)\Delta t$
Since the real world is continuous, our iteration step can be made small enough that $|\pmb{v_d}(i+1)-\pmb{v_d}(i)|<\epsilon$
So the equation of motion can also be written as $\pmb{r_d}(i+1)=\pmb{r_d}(i)+\pmb{v_d}(i+1)\Delta t$
Taking expectations on both sides of the equation of motion gives $E[\pmb{v_d}(i+1)]=(E[\pmb{r_d}(i+1)]-\pmb{r_d}(i))/\Delta t=(\pmb{r_b}(i)-\pmb{r_d}(i))/\Delta t$
Write $\pmb{r_{db}}(i)=\pmb{r_b}(i)-\pmb{r_d}(i)$
Then $E[\pmb{v_d}(i+1)]=\pmb{r_{db}}(i)/\Delta t$
Using only proportional control, we can drop the expectation operator: $\pmb{v_d}(i+1)=\pmb{K_{pv}}\pmb{r_{db}}(i)/\Delta t$
where $\pmb{K_{pv}}=\mathrm{diag}(k_{vx},k_{vy})$
By the impulse theorem, $\pmb{v_d}(i+1)=\pmb{v_d}(i)+\pmb{F_{xy}}(i)\Delta t/m_d$
Combining the equations above, simple substitution and rearrangement yield:
$\pmb{F_{xy}}(i)=\pmb{K_{pv}}\pmb{r_{db}}(i)m_d/(\Delta t)^2-\pmb{v_d}(i)m_d/\Delta t $
This is the control equation we want: setting the net horizontal force according to this formula makes the drum's horizontal position track the ball's horizontal position!
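As a sanity check of this control equation (a toy sketch with made-up masses, gains and step size, not the contest parameters), the drum indeed converges to a stationary ball's horizontal position:

```python
import numpy as np

dt, m_d = 0.01, 3.6           # step size and drum mass (illustrative values)
K_pv = np.diag([0.5, 0.5])    # proportional gains k_vx, k_vy

r_b = np.array([1.0, -2.0])   # ball's horizontal position, held fixed here
r_d = np.array([0.0, 0.0])    # drum horizontal position
v_d = np.array([0.0, 0.0])    # drum horizontal velocity

for _ in range(100):
    r_db = r_b - r_d
    # the control equation: F_xy = K_pv r_db m_d / dt^2 - v_d m_d / dt
    F_xy = K_pv @ r_db * m_d / dt ** 2 - v_d * m_d / dt
    v_d = v_d + F_xy * dt / m_d    # impulse theorem
    r_d = r_d + v_d * dt           # equation of motion (continuity form)

tracking_error = np.linalg.norm(r_b - r_d)
```

With these updates the position error contracts by the factor $1-k_{v}$ per step, so pure proportional control already converges here without the I and D terms.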
The purpose of ③ is to keep the drum level at all times. Why keep it level? One reason is to make the ball easier to catch; another is to limit rotation, preventing uneven forces from flipping the drum over before long.
The derivation for the torque is similar, but more involved. Concretely: see below!
We expect the drum's normal vector $\pmb{n}(i+1)$ to be the unit $z$ vector $\pmb{e_z}$, i.e. $E[\pmb{n}(i+1)]=\pmb{e_z}$
By the rotation theorem and continuity, $\pmb{n}(i+1)=\pmb{n}(i)+\pmb{\omega}(i+1)\times\pmb{n}(i)\Delta t$
Taking expectations on both sides gives $E[\pmb{\omega}(i+1)]\times\pmb{n}(i)\Delta t=E[\pmb{n}(i+1)]-\pmb{n}(i)$
Substituting $E[\pmb{n}(i+1)]=\pmb{e_z}$ gives $E[\pmb{\omega}(i+1)]\times\pmb{n}(i)\Delta t=\pmb{e_z}-\pmb{n}(i)$
Using only proportional control, we again drop the expectation operator, obtaining:
$\pmb{\omega}(i+1)\times\pmb{n}(i)\Delta t=\pmb{K_{pr}}(\pmb{e_z}-\pmb{n}(i))$
where $\pmb{K_{pr}}=\mathrm{diag}(k_{rx},k_{ry},k_{rz})$
Write down the rotational dynamics $J\pmb{\omega}(i+1)=J\pmb{\omega}(i)+\pmb{M}(i)\Delta t$
Cross both sides of the dynamics equation with $\pmb{n}(i)$, combine with the equations already obtained, and derive
$\pmb{M}(i)\times\pmb{n}(i)\Delta t=J\pmb{K_{pr}}(\pmb{e_z}-\pmb{n}(i))/\Delta t-J\pmb{\omega}(i)\times\pmb{n}(i)$
What we ultimately want is $\pmb{M}(i)$, so we just need to solve this cross-product equation. How?
First, move the known quantities independent of the setpoint $\pmb{M}(i)$ (i.e. the motion variables at step $i$) to one side:
let $\pmb{S}(i)=J[\pmb{K_{pr}}(\pmb{e_z}-\pmb{n}(i))/\Delta t-\pmb{\omega}(i)\times\pmb{n}(i)]/\Delta t$
Then the equation becomes $\pmb{M}(i)\times\pmb{n}(i)=\pmb{S}(i)$
The solution process is shown in the figure below.
We finally obtain:
$\pmb{M}(i)=(-S_y(i)/n_z(i),\ S_x(i)/n_z(i),\ 0)^T$
This is the torque control equation we want; iterating with this formula keeps the drum level at all times.
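This closed-form torque can be verified numerically: construct an $\pmb{S}$ satisfying the solvability condition $\pmb{S}\cdot\pmb{n}=0$ (necessary, since $\pmb{M}\times\pmb{n}$ is always perpendicular to $\pmb{n}$) and check that $\pmb{M}\times\pmb{n}=\pmb{S}$. A small numpy sketch with an arbitrary tilted normal:

```python
import numpy as np

n = np.array([0.1, -0.2, 0.97])
n = n / np.linalg.norm(n)      # drum's unit normal, slightly tilted

# pick an S with S . n = 0 (the right-hand side must be perpendicular to n)
S = np.array([0.3, 0.5, 0.0])
S[2] = -(S[0] * n[0] + S[1] * n[1]) / n[2]

# the closed-form solution with M_z = 0
M = np.array([-S[1] / n[2], S[0] / n[2], 0.0])

residual = np.linalg.norm(np.cross(M, n) - S)  # should vanish
```

Fixing $M_z=0$ picks one solution out of the one-parameter family (any multiple of $\pmb{n}$ may be added to $\pmb{M}$ without changing $\pmb{M}\times\pmb{n}$), which is a sensible choice since spinning the drum about its own normal is useless here.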
Problems are getting solved one by one, and more than half of the theory is done! Now, how do we describe collisions in 3D, and how do we characterize the condition for the ball to bounce back vertically?
First, the collision iteration formulas.
The figure above illustrates some of the physical quantities at the instant of collision. We need to study the velocities of the two objects at point N. This involves: the drum's linear velocity at N (center-of-mass velocity plus rotational linear velocity), the drum's angular velocity, the position of the collision point, and the ball's center-of-mass velocity (as assumed before, the ball's spin is ignored). Let us write the equations one by one!
Let the drum's thickness be $h$ and the ball's radius be $R$, and let a single letter denote the vector from the origin to the point that letter names. Then
$\pmb{PQ}=h\pmb{n}/2$, $\pmb{N}=\pmb{O}-h\pmb{n}/2$, $\pmb{Q}=\pmb{P}+\pmb{PQ}$
We know the plane $\sigma:n_x(x-x_Q)+n_y(y-y_Q)+n_z(z-z_Q)=0$
where $n_x,n_y,n_z$ are the components of the drum surface's unit normal.
Then the distance from the ball to the drum surface is $d_{ON}=|n_x(x_O-x_Q)+n_y(y_O-y_Q)+n_z(z_O-z_Q)|$
At each iteration, check whether $d_{ON}\le R$; if so, iterate the collision:
Let the drum's center-of-mass velocity be $\pmb{v_d}$, the drum's angular velocity $\pmb{\omega}$, the drum's rotational linear velocity at point $N$ be $\pmb{v_{\tau}}$, the velocity of the drum's material point at $N$ be $\pmb{v_{N}}$, and the ball's center-of-mass velocity $\pmb{v_b}$; then:
$\pmb{v_{\tau}}=\pmb{\omega}\times\pmb{PN}$
$\pmb{v_{N}}=\pmb{v_d}+\pmb{v_{\tau}}$
Next, write the dynamics to update the motion variables of the drum and the ball at the next step.
For the bodies, apply the impulse theorem and the definition of the restitution coefficient:
$\pmb{P}(i)=m_{b} \pmb{v_b}(i)+m_{d} \pmb{v_d}(i) $ (momentum at the current step)
$\pmb{P}(i+1)=m_{b} \pmb{v_b}(i+1)+m_{d} \pmb{v_d}(i+1) $ (momentum at the next step)
$\pmb{I}(i)=\sum \pmb{F}(i) \Delta t-\left(m_{b}+m_{d}\right) g \Delta t \pmb{e_z} $ (impulse of external forces)
$\pmb{P}(i)+\pmb{I}(i)=\pmb{P}(i+1) $ (impulse theorem)
$e\left[\pmb{v_b}(i+1)-\pmb{v_d}(i+1)\right]=\pmb{v_d}(i)-\pmb{v_b}(i) $ (definition of the restitution coefficient)
Above, $\pmb{\sum F}$ is the net pulling force of the team members.
From these, the center-of-mass velocities of both bodies at the next step can be solved.
For the system, conservation of angular momentum gives
$J\pmb{\omega}(i)+\pmb{PO}\times m_b\pmb{v_b}(i)=J\pmb{\omega}(i+1)+\pmb{PO}\times m_b\pmb{v_b}(i+1)$
Rearranging gives the drum's angular velocity at the next step:
$\pmb{\omega}(i+1)=\pmb{\omega}(i)+\pmb{PO}\times m_b(\pmb{v_b}(i)-\pmb{v_b}(i+1))/J$
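Along the contact normal, the impulse theorem plus the restitution definition form a 2×2 linear system for the post-collision velocities. A small sketch (masses and $e$ are illustrative values of mine, not necessarily the contest data):

```python
import numpy as np

def collide_1d(v_b, v_d, m_b=0.27, m_d=3.6, e=0.9, impulse=0.0):
    # Solve, along the drum normal, the two equations from the text:
    #   m_b v_b' + m_d v_d' = m_b v_b + m_d v_d + impulse   (impulse theorem)
    #   e (v_b' - v_d') = v_d - v_b                         (restitution, as defined above)
    A = np.array([[m_b, m_d],
                  [e,   -e]])
    rhs = np.array([m_b * v_b + m_d * v_d + impulse,
                    v_d - v_b])
    v_b_next, v_d_next = np.linalg.solve(A, rhs)
    return v_b_next, v_d_next

# ball falling at 3 m/s onto a drum rising at 1 m/s
vb2, vd2 = collide_1d(v_b=-3.0, v_d=1.0)
```

Both outputs satisfy the two defining equations exactly, so momentum (plus the external impulse) is conserved and the relative velocity reverses according to $e$.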
Next, we investigate what condition at the instant of collision makes the ball bounce back vertically.
If the ball bounces back vertically at the next step, then necessarily:
$\pmb{v_b}(i+1)\times\pmb{e_z}=0$
From the formula for the drum's center-of-mass velocity $\pmb{v_d}(i+1)$ derived in the previous part, ignoring gravity, we get
Cross both sides with $\pmb{e_z}$, rearrange and simplify to get:
Write it in components and evaluate the cross products to get:
This is the condition the ball must satisfy to bounce back vertically. Note that if friction during the collision is ignored and only the restitution coefficient is used, the ball's rebound direction at the next step is determined solely by the horizontal center-of-mass velocities of the drum and the ball, and the two velocities are collinear with opposite directions.
Write the condition derived above as a 2D vector expression:
In the derivation below we drop the subscript $xy$; all vectors are 2D, and the net horizontal force $\pmb{F_{xy}}=\pmb{\sum T}$.
Take $\pmb{v_d}(i)$ in this formula as the reference value for the drum's velocity $\pmb{v_d}(i+1)$ at the next step. Rearranging, we obtain:
$E[\pmb{v_d}(i+1)]=-(\frac{\pmb{\sum T(i)}-(m_b+m_d)g\pmb{e_z}}{(1+e)m_d}\Delta t + \frac{m_b-em_d}{(1+e)m_d}\pmb{v_b}(i))$
Using only proportional control, we can drop the expectation operator:
$\pmb{v_d}(i+1)=-\pmb{K_{pf}}(\frac{\pmb{\sum T(i)}-(m_b+m_d)g\pmb{e_z}}{(1+e)m_d}\Delta t + \frac{m_b-em_d}{(1+e)m_d}\pmb{v_b}(i))$
where $\pmb{K_{pf}}=\mathrm{diag}(k_{fx},k_{fy})$
Combined with the impulse theorem $m_d\pmb{v_d}(i+1)-m_d\pmb{v_d}(i)=\pmb{\sum T(i)}\Delta t$,
let $\lambda=(em_d-m_b)/\Delta t$; then, using continuity and simplifying, we obtain:
$\pmb{F_{xy}}(i)=\pmb{\sum T(i)}=\mathrm{diag}(\frac{\lambda k_{fx}-1}{1+k_{fx}},\frac{\lambda k_{fy}-1}{1+k_{fy}})\pmb{v_b}(i)$
This is the iteration formula for the net horizontal force at the next step when meeting the ball so that it bounces back vertically.
Careful readers will notice that this formula conflicts with the earlier iteration formula for ②. No matter: we use piecewise PID, and adopt this expression only when the ball is judged to be about to collide.
First, a figure of the results:
How was this implemented? Our preparation has been quite thorough! Without further ado, we now only need to list the iteration formulas once more:
- Section 0: get several values
$d_{ON}=|n_x(x_O-x_Q)+n_y(y_O-y_Q)+n_z(z_O-z_Q)|$ (球到鼓面的距离)
set $\pmb{K_{pv}},\pmb{K_{pr}},\pmb{K_{pf}},\epsilon_d,\epsilon_v,\epsilon_{collide}$
- Section 1: set $\pmb{M}(i),\pmb{F}(i), status \quad X$
- 角动量设置
$\pmb{S}(i)=J[\pmb{K_{pr}}(\pmb{e_z}-\pmb{n}(i))/\Delta t-\pmb{\omega}(i)\times\pmb{n}(i)]/\Delta t$
$\pmb{M}(i)=diag(-S_y(i)/n_z(i),S_x(i)/n_z(i),0)$- 水平合力设置
$if \quad d_{ON}>\epsilon_d$ ……(距离较远,鼓追球)
$\quad\pmb{F_{xy}}(i)=\pmb{K_{pv}}\pmb{r_{db}}(i)m_d/(\Delta t)^2-\pmb{v_d}(i)m_d/\Delta t $ ……(二维向量)
$else$ ……(距离较近,鼓迎球)
$\quad\lambda=(em_d-m_b)/\Delta t$
$\quad\pmb{F_{xy}}(i)=diag(\frac{\lambda k_{fx}-1}{1+k_{fx}},\frac{\lambda k_{fy}-1}{1+k_{fy}})\pmb{v_b}(i)$…… (二维向量)- 竖直合力设置
$F_z(i)=min(F_X(i),T_{lim}(z_d(i)))$,$F_X(i)\in \lbrace F_I,F_{II},F_{III},F_{IV} \rbrace$- 根据第一问的状态机更新状态X(此处略去)
- Section 2: physics system
- rotation
$\pmb{\omega}(i+1)=\pmb{\omega}(i)+\pmb{M}(i)\Delta t/J$……(转动定理)
$\pmb{\Delta n}(i)=\pmb{n(i)}\times\pmb{\omega}(i)\Delta t$……(角速度造成的法向量瞬时变化量)
$\pmb{n}(i+1)=\pmb{n}(i)+\pmb{\Delta n}(i)$……(更新法向量)
$\pmb{n}(i+1)=\pmb{n}(i+1)/ | \pmb{n}(i+1) |$……(归一化消除长度误差)
$\pmb{R_k^o}(i+1)=\pmb{R}(i+1)\pmb{R_k^o}(1)$……(更新鼓的径向,R为旋转矩阵)- translation
drum:
$\pmb{f_{d_{air}}}(i) = 0.5 C_d \rho S_d \pmb{v_d}(i) |\pmb{v_d}(i)|$
$m_d \pmb{v_d}(i+1) = (-m_d g\pmb{e_z} + \rho V_d g\pmb{e_z} - \pmb{f_{d_{air}}}(i) + \pmb{F}(i))\Delta t + m_d \pmb{v_d(i)}$
$\pmb{r_d}(i+1) = \pmb{r_d}(i) + \pmb{v_d}(i)\Delta t$
ball:
$\pmb{f_{b_{air}}}(i) = 0.5 C_b \rho S_b \pmb{v_b}(i) |\pmb{v_b}(i)|$
$m_b \pmb{v_b}(i+1) = (-m_b g\pmb{e_z} + \rho V_b g\pmb{e_z} - \pmb{f_{b_{air}}}(i))\Delta t + m_b \pmb{v_b}(i)$
$\pmb{r_b}(i+1) = \pmb{r_b}(i) + \pmb{v_b}(i)\Delta t$
- collision
$if \quad d_{ON}<\epsilon_{collide}$
$\quad\pmb{P}(i)=m_{b} \pmb{v_b}(i)+m_{d} \pmb{v_d}(i) $
$\quad I(i)=\pmb{F}(i) \Delta t-\left(m_{b}+m_{d}\right) g \Delta t \pmb{e_z} $
$\quad\pmb{v_b}(i+1)=\frac{\pmb{P}(i)+\pmb{I}(i)-em_d(\pmb{v_b}(i)-\pmb{v_d}(i))}{m_b+m_d}, \quad \pmb{v_d}(i+1)=\frac{\pmb{P}(i)+\pmb{I}(i)+em_b(\pmb{v_b}(i)-\pmb{v_d}(i))}{m_b+m_d}$
$\quad\pmb{\omega}(i+1)=\pmb{\omega}(i)+\pmb{r_{db}}(i)\times m_b(\pmb{v_b}(i)-\pmb{v_b}(i+1))$
- Section 3: judge system
$if \quad d_{ON}<\epsilon_{collide}$……(check whether the ball is missed)
$\quad if \quad (x_d-x_b)^2+(y_d-y_b)^2>R_d^2$……(simplified treatment)
$\quad\quad fail=2$
$if \quad v_b(i)<\epsilon_{v}$……(ball at its apex)
$\quad if \quad d_{ON}<0.4$……(lose if the ball peaks too low)
$\quad\quad fail=1$
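To make the update equations concrete, here is a minimal Python sketch of the translation and collision steps above. Rotation, the controller of Section 1, and the judge system are omitted, the $d_{ON}$ test is simplified to a flat-drum vertical gap, and every physical constant is a placeholder I made up, not a value from our paper.

```python
import numpy as np

g, rho, dt = 9.8, 1.225, 1e-3     # gravity, air density, time step (placeholders)
m_b, m_d = 0.27, 3.6              # ball / drum mass (assumed)
C_b, S_b, V_b = 0.5, 0.03, 3e-3   # ball drag coefficient, cross-section, volume
C_d, S_d, V_d = 1.1, 0.13, 8e-3   # drum counterparts
e = 0.85                          # restitution coefficient
eps_collide = 0.01
ez = np.array([0.0, 0.0, 1.0])

def drag(C, S, v):
    """Air resistance f = 0.5 C rho S v |v|; subtracted from the net force below."""
    return 0.5 * C * rho * S * v * np.linalg.norm(v)

def step(r_b, v_b, r_d, v_d, F):
    """One Euler step: gravity + buoyancy + drag for both bodies, then a restitution collision."""
    v_d_new = v_d + ((rho * V_d - m_d) * g * ez - drag(C_d, S_d, v_d) + F) * dt / m_d
    v_b_new = v_b + ((rho * V_b - m_b) * g * ez - drag(C_b, S_b, v_b)) * dt / m_b
    r_d_new = r_d + v_d * dt
    r_b_new = r_b + v_b * dt
    if r_b_new[2] - r_d_new[2] < eps_collide:           # flat-drum stand-in for d_ON
        P = m_b * v_b + m_d * v_d                       # total momentum P(i)
        I = F * dt - (m_b + m_d) * g * dt * ez          # external impulse I(i)
        dv = v_b - v_d
        v_b_new = (P + I - e * m_d * dv) / (m_b + m_d)  # momentum + restitution
        v_d_new = (P + I + e * m_b * dv) / (m_b + m_d)
    return r_b_new, v_b_new, r_d_new, v_d_new
```

Looping `step` and applying the Section 3 failure checks each iteration reproduces the overall simulation structure.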
Q4: When the drum surface tilts, the ball no longer bounces vertically, so the team must adjust its rope-pulling strategy. Suppose there are 10 players, the ropes are 2 m long, the ball rebounds to a height of 60 cm, the drum is tilted 1 degree from the vertical, and the projection of the tilt direction onto the horizontal plane points between two particular players, dividing the angle between them in a 1:2 ratio. Assuming the forces can be controlled precisely, give the timing and magnitude of every player's pull needed to restore a vertical bounce, and analyze how effective this adjustment strategy would be in reality.
The initial conditions, under our interpretation, are modeled as follows:
$\pmb{r_b}(1) =\left(\cos \left(\left(\frac{2 \pi}{N}\right)(k-1)+\frac{2 \pi}{3 N}\right) x, \sin \left(\left(\frac{2 \pi}{N}\right)(k-1)+\frac{2 \pi}{3 N}\right) x, \Delta H\right)$
$\pmb{v_b}(1) =\left(\cos \left(\left(\frac{2 \pi}{N}\right)(k-1)+\frac{2 \pi}{3 N}\right) v_{x O y}, \sin \left(\left(\frac{2 \pi}{N}\right)(k-1)+\frac{2 \pi}{3 N}\right) v_{x O y}, 0\right)$
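The two initial-condition formulas can be evaluated directly; a sketch, where $N$ is the number of players, $k$ indexes the player pair, and the numeric defaults for $x$, $v_{xOy}$ and $\Delta H$ are placeholders of mine rather than the values implied by the 1-degree tilt:

```python
import numpy as np

def q4_initial_state(N=10, k=1, x=0.02, v_xOy=0.05, dH=-0.001):
    """Initial ball position r_b(1) and velocity v_b(1) for the Q4 tilt scenario."""
    # 2*pi/(3N) offsets the direction to split the angle between players k and k+1 in a 1:2 ratio
    theta = (2 * np.pi / N) * (k - 1) + 2 * np.pi / (3 * N)
    r_b1 = np.array([np.cos(theta) * x, np.sin(theta) * x, dH])
    v_b1 = np.array([np.cos(theta) * v_xOy, np.sin(theta) * v_xOy, 0.0])
    return r_b1, v_b1
```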
Substituting these initial conditions, we again search for the three pairs of proportional gains via goal programming, while the z direction directly reuses the optimal result from Problem 1. (We ran the search many times; since it takes so long, it never finished, so here we show just one representative result. The paper uses a different one.) Problem 4's decision results and game process:
This problem was solved by our teammate Xiang; here is his summary, posted verbatim:
Decomposing the force by this rule and substituting Problem 4's initial conditions, the result looks like this:
As you can see, most of the work is already done; only the final verification remains. To measure closeness we use the correlation coefficient. After recombining the decomposed per-player forces and feeding them back into the system, we found that of the planned 25,000 steps, the game was lost after only 11,584. Presumably this is because the recombined force carries no feedback, so errors accumulate. Still, the result is quite good.
Computing the closeness (rows marked "part" use only the first 11,584 steps):
Even though the forces are extremely close (all correlation coefficients equal 1), the trajectories are not. This shows just how important feedback is.
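The closeness measure is just a correlation coefficient between two force time series; a minimal sketch with synthetic stand-ins for the two logs (the real comparison used our simulation output):

```python
import numpy as np

# Synthetic stand-ins: a planned force sequence and a slightly perturbed
# recombined one, mimicking small decomposition error without feedback.
planned = np.sin(np.linspace(0, 10, 500))
recombined = planned + np.random.default_rng(0).normal(0, 0.01, 500)

# Pearson correlation coefficient: off-diagonal entry of the 2x2 matrix.
r = np.corrcoef(planned, recombined)[0, 1]
print(f"correlation coefficient: {r:.4f}")
```

As the post notes, a correlation near 1 between the forces does not guarantee close trajectories, since open-loop errors accumulate over thousands of steps.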
One regret: with limited time and ability, we implemented neither real-time feedback on the decomposed forces fed back into the system, nor a force decomposition that accounts for torque.
Whew, that was a lot of writing! For the robustness analysis I will just post the figures.
The result: the system tolerates variation of the restitution coefficient over at least 0.68 < e < 0.99, and within this interval the ball can be juggled indefinitely.
I will stop here. Many details have been skipped; with so much to summarize, one has to focus on the essentials.
I can't help feeling that after this modeling contest I could go develop 3D games at a game studio!
Writing the simulation myself was genuinely satisfying; I got a taste of the god's-eye view of deterministic philosophy! Whenever I work with control or 3D world descriptions in the future, this will serve me well.
Mathematical modeling really does take a lot of energy, especially when you seriously study the problem rather than coast toward a prize. In the end, I did not use any particularly advanced modern algorithms, nor any recent mathematics. But there is no need to overreach; grow step by step, and time will fill in the gaps in knowledge.
I wish everyone success in their studies; may the modeling participants among you meet problems you love, and may those aiming for research meet topics you love (myself included)!