I see there has been some use of so-called AI to convert audio to
MIDI. Actually, for solo piano in a controlled environment, this
has been a laboratory exercise for several decades.
I have been playing around with the fundamentals for years. Warren
Trachtman built Bayesian estimators and statistical error correction
into the RollScan converter program that converts roll perforations
to MIDI.
I also noticed that my programs for creating pipe organ definition
files to play scanned pipe organ rolls use fuzzy logic and other
statistical means so that rolls scanned on one instrument can play
on a different one -- more often than not an emulator such as
Hauptwerk or jOrgan.
The only thing new about the current so-called AI fad is that it
is the buzzword du jour for getting banks to loan money or for winning
research grants. A few years back it was NFTs and bitcoin. Some
will even remember when collecting antique instruments was a sure
way to beat the stock market returns.
This is just a new name for old wine. Many fields, such as medicine,
have been using expert systems for decades. Elizabeth Holmes went
to prison for misusing this tech. (Her claims are probably valid,
but she was too soon to make them.) I used some of the same blood
count image analysis to find centers of overlapping circles in piano
roll scans.
I looked at 30-year-old textbooks and they could have been written
yesterday, or this very instant, or tomorrow. What we now have is
the effect of Moore's law on these formulae -- some of which are
centuries old.
The cams that drove the automatons and early computing devices
(especially the fortune-telling magicians and slot machines), and
that allowed dolls to write and draw, are based on the same
mathematics used by Bernoulli and Fourier. I call these phase circle
infinities: the sums of sine waves to infinity. Some describe them
with imaginary numbers (as they are based on the square root of -1).
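
As a rough illustration of summing sine waves toward infinity, here is
a minimal Python sketch. It is only an assumed example, not anything
taken from the automatons themselves: the function name and the choice
of a square wave are mine. Each term is one rotating "phase circle"
(a complex exponential), and the imaginary part of the sum is the sum
of sines.

```python
import cmath
import math

def square_wave_partial_sum(t, n_terms):
    """Approximate a unit square wave (period 1) at time t.

    Adds the first n_terms odd harmonics. Each harmonic is a point
    moving around a circle of radius 4/(pi*n); the hypothetical name
    and the square-wave target are illustrative choices only.
    """
    total = 0j
    for k in range(n_terms):
        n = 2 * k + 1  # odd harmonics 1, 3, 5, ...
        radius = 4 / (math.pi * n)
        total += radius * cmath.exp(1j * 2 * math.pi * n * t)
    # The imaginary part of the summed phasors is the sum of sines.
    return total.imag

if __name__ == "__main__":
    for terms in (1, 10, 1000):
        print(terms, square_wave_partial_sum(0.25, terms))
```

With one term the value at the quarter period overshoots the target
of 1; adding more circles pulls it closer, which is all that "to
infinity" means in practice.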
What we are seeing, hearing and feeling is illusion, although any
electrical engineer will tell you that the shock from a charged
capacitor/condenser is no illusion! The real question is what happens
to the MIDI when it is edited in Cakewalk [MIDI file editor]? Can the
different sections be extracted?
In some respects this is not much different from Mozart's musical dice
game, or the Panharmonicon I saw and heard demonstrated in one of the
European museums. (I wonder what happens when AI gets a gambling
addiction attempting to predict lottery numbers?)
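
For anyone who has not seen a dice game of this kind, the mechanism
can be sketched in a few lines of Python. This is an assumed
illustration only: the lookup table below is a placeholder I made up
(11 dice totals by 16 bars), not the published table, and the function
names are hypothetical.

```python
import random

def build_placeholder_table(n_bars=16):
    """HYPOTHETICAL lookup table: dice total (2..12) -> one measure id per bar.

    The ids are just sequential numbers standing in for pre-composed
    measures; a real game would use its own published table.
    """
    return {
        total: [(total - 2) * n_bars + bar + 1 for bar in range(n_bars)]
        for total in range(2, 13)
    }

def roll_minuet(table, n_bars=16, rng=None):
    """Roll two dice for each bar and look up which measure to play."""
    rng = rng or random.Random()
    measures = []
    for bar in range(n_bars):
        total = rng.randint(1, 6) + rng.randint(1, 6)
        measures.append(table[total][bar])
    return measures

if __name__ == "__main__":
    print(roll_minuet(build_placeholder_table()))
```

The point is that the "composition" is only a random walk through
existing material: nothing comes out that was not already in the table.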
Most of my work (which I gave a lecture on last year at the Second
Global Piano Roll Conference) was in using PostScript to directly
process MIDI. This had the effect of being able to render MIDI as
sheet music, which for the most part is unplayable as the data is
overfit to the page.
This is an area of concern. There is risk of bias in the training set.
The results get old quickly as they only respect the existing data.
I was also surprised how many people there did not know that there are
apps for playing grooved recordings photographed with an iPhone (or
other smartphone). The recovery of event data from such programs was
one of my predictions. Due to time limitations I had to cut most of
this out of my presentation.
I was most interested when I rendered the scanned roll music as a
waveform (using what are called sound font samples): how the discrete
samples combine to come together as a performance recording. The
Apple sound engine also renders MIDI sounds this way, which is one
reason the built-in Apple MIDI player does not work in real time as it
used to do.
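
A minimal Python sketch of how discrete samples add up into one
waveform follows. It is an assumed illustration, not my rendering
code or Apple's: a decaying sine stands in for a real sound font
sample, and the sample rate, names and event format are hypothetical.

```python
import math

SAMPLE_RATE = 8000  # assumed rate, kept low so the example stays small

def make_sample(freq_hz, duration_s=0.5):
    """Stand-in for one sound font sample: a sine with exponential decay."""
    n = int(SAMPLE_RATE * duration_s)
    return [
        math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) * math.exp(-3.0 * i / n)
        for i in range(n)
    ]

def render(events, total_s):
    """Mix (start_seconds, frequency_hz) note events into one buffer."""
    out = [0.0] * int(SAMPLE_RATE * total_s)
    for start_s, freq_hz in events:
        offset = int(start_s * SAMPLE_RATE)
        for i, value in enumerate(make_sample(freq_hz)):
            if offset + i < len(out):
                out[offset + i] += value  # overlapping notes simply add
    return out

if __name__ == "__main__":
    wave = render([(0.0, 440.0), (0.25, 660.0)], total_s=1.0)
    print(len(wave), max(wave))
```

Because every note has to be summed into the buffer before it can be
heard, this style of rendering favors working ahead of time rather
than note by note.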
One of the things I would like to do is to recover the equivalent of
sound fonts from wave recordings. An emulator such as jOrgan or
Hauptwerk is only as good as the underlying sound samples.
It is also interesting that the resultant MIDI from the GiantMIDI-Piano
project has problems with the pedaling cut off. This was an issue with
the solenoid floppy player I worked on in the early 2000s, mostly as
the note event fades to infinity.
I also noticed that the pedal solenoid (and the player) uses gravity
and time to catch the dampers. This is often used to compensate for
temperature and the room acoustics, which vary.
It will be interesting to see what all this leads to.
Julie Porter
Martinez, California