45-Degree Slope, Language, and Information Theory
  • Just came across a remarkable article. To sum it up: A scientist named Laurance Doyle decided to apply information theory to language. The idea is simple: Language is different from a random signal in a very special way.

    Create a chain of completely random words, say 10,000 words, and then plot on a graph how often each word shows up. The resulting graph would be roughly flat, a 0-degree horizontal slope, because any one word has the exact same likelihood as any other word.

    However, intelligent signals, such as language, are different. Some words we use quite often (the, of, a, to, I), some words are less common (tomorrow, happy, house, dinner), and other words are quite uncommon (oblique, transistor, intercept). If you plot human language by the frequency of words used, a 45-degree slope results. The common words (the, of, a) will show up often...the less common words...less often.

    Interestingly, this slope results when you test ANY human language. Therefore, even if you don't recognize a certain language as even being a language...graph the signal, and its slope will determine if it's random noise...or an intelligent language.

    So, they tested dolphin sounds. They graphed each distinct phoneme (basic unit) of dolphin vocalization...and found it renders a perfect 45-degree slope. Even though we have absolutely no way of telling what dolphins are communicating...we can be certain it is an intelligent language...not just random sounds. Indeed, information theory can even determine how complex the language is. Dolphin language rates at a '4' on the scale...whereas human language rates around '8' or '9'.

    So...they had an idea: what if we test EVERY signal there is in the universe? You could measure the subtle, seemingly random fluctuations in the magnetosphere of the sun. If distinct patterns matched up to the 45-degree slope it would be a huge discovery...and would force us to come to terms with some big ideas.

    SETI is currently attempting to process signals from deep space in just this way. For a signal from space, or anywhere on Earth even, to fit a 45-degree slope would be so absurdly improbable that it would be de facto proof of intelligent communication...even if we didn't recognize the signal as language at all.

    Super interesting stuff. It makes me wonder if there could be a way to run certain mathematical descriptions of fundamental forces (gravity, electromagnetism) to see if the very foundation of reality is random...or an intelligently encoded language. When you take into account the fact that mathematics is a symbolic language that describes reality itself...the idea isn't that far out there.

    Here be the link..........
    CLICK ME o_O
    "Follow your inner moonlight; don't hide the madness."
    - Allen Ginsberg
  • Holy shit dude, thanks for posting this. The next time I get high I am sure there will be a massive freakout. This is one of the coolest pieces of info I have ever seen.
    Much love
  • From the article:
    "Think of all the different sounds human beings make as they speak to each other, the different letters and pronunciations. Some, such as the letters ‘e’ and ‘t’ or words such as ‘and’ or ‘the’ will occur far more frequently than ‘q’ or ‘z’ or longer words such as ‘astrobiology’. Plot these on a graph, in order of the most frequently occurring letters or sounds, and the points form a slope with a –1 gradient."

    Doesn't make sense to me... Frequency is the y-axis, what is the x-axis, a unitless frequency rank? You need 2 inputs to form a slope. It sounds tabular, with the x-axis representing unique symbols/sounds/frequencies which aren't quantifiable or plottable in themselves.

    The concept makes sense, and is applicable to a wide range of data... Certain inputs have a higher likelihood of happening, while others hardly happen, and an entire gradient exists in between. There still is the human influence in defining variables, and ranges of variables, in applications like radiation, tonal frequency, etc.

    It seems like the article is playing with the reader's desire to unveil some universal constant that all of life is writ by. There very likely may be...
    Post edited by Gubermensch at 2012-08-27 14:45:31
  • Very cool. I was reading about Claude Shannon's work in The Information by James Gleick and it blew my mind. Awesome article.

    Trying to understand all this is a real mind fuck.

    http://en.wikipedia.org/wiki/Entropy_(information_theory)
    Post edited by jimmybob at 2012-08-27 15:08:12
    "Out beyond ideas of wrongdoing and rightdoing there is a field. I will meet you there." Rumi
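For anyone who follows that entropy link and wants to poke at it, here is a minimal Python sketch of first-order Shannon entropy over letters. The function name `shannon_entropy` is just made up for illustration, and this is only the simplest version of the idea; it is not necessarily the measure the article uses when it rates dolphin and human language.

```python
import math
from collections import Counter


def shannon_entropy(text):
    """First-order Shannon entropy of the letters in text, in bits per letter."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    if not letters:
        return 0.0
    total = len(letters)
    counts = Counter(letters)
    # H = sum over letters of p * log2(1 / p), where p is the letter's share.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


print(shannon_entropy("aaaa"))      # one symbol only: no uncertainty
print(shannon_entropy("abab"))      # two equally likely symbols: 1 bit
print(shannon_entropy("abcdabcd"))  # four equally likely symbols: 2 bits
```

A string that only ever repeats one letter carries no surprise, while a string that spreads evenly over more letters scores higher.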

  • "Doesn't make sense to me... Frequency is the y-axis, what is the x-axis, a unitless frequency rank? You need 2 inputs to form a slope. It sounds tabular, with the x-axis representing unique symbols/sounds/frequencies which aren't quantifiable or plottable in themselves."

    I'm not sure I follow you. Wouldn't the x-axis simply be each discrete word/phoneme/etc.? Say you have an essay with 10,000 words. First, count up how many unique words there are. Maybe there are 2,000. So, you plot each unique instance on the x-axis, count up the frequency of each and let that be the y-axis. Obviously, a random set of data would result in a horizontal slope.

    Actually, just for the fun of it, having no idea what the result would be, I decided to plot the above quote from you. The theory posits you can use any discrete letter/phoneme/word/sound/etc. as long as you're consistent. Heck, I bet Morse code works as well. I chose to plot it by letter frequency. Then, as a control, I found a random letter generator online and had it spit out a chain of letters to plot as well. I put the symbol ")" next to each letter once for every time it showed up, and then listed the letters by frequency. I think the result speaks for itself. Obviously, the larger the sample, the better. There's a reason people always call "s" and "t" in Wheel of Fortune. ;)

    ----------------------------------------
    "Doesn't make sense to me... Frequency is the y-axis, what is the x-axis, a unitless frequency rank? You need 2(two) inputs to form a slope.
    It sounds tabular, with the x-axis representing unique symbols/sounds/frequencies which aren't quantifiable or plotable in themselves."

    ----------------------------------------

    e )))))))))))))))))))))))))))))
    s ))))))))))))))))))))))
    t ))))))))))))))))))
    n )))))))))))))))))
    i ))))))))))))))))
    a ))))))))))))))
    o ))))))))))))
    u ))))))))))))
    r ))))))))))
    h ))))))))
    l ))))))))
    y ))))))
    f )))))
    m )))))
    x )))))
    q )))))
    c ))))
    d ))))
    p ))))
    w ))))
    b ))))
    k ))
    g )
    v )


    ----------------------------------------
    "hgbadnkojagfipwfqtskqojlovoiuemmsxhdcehwfrndkbwncxpclstrlumvlgxgujpimnmtkjvcnblkjerhyoa"
    ----------------------------------------

    n )))))
    j )))))
    k )))))
    l )))))
    m )))))
    o )))))
    g ))))
    h ))))
    c ))))
    r )))
    s )))
    t )))
    u )))
    b )))
    a )))
    d )))
    x )))
    e )))
    f )))
    i )))
    p )))
    v )))
    w )))
    q ))
    y )
    Post edited by PrimalPurpleMoons at 2012-08-27 20:27:53
    "Follow your inner moonlight; don't hide the madness."
    - Allen Ginsberg
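The hand tally in the post above can be reproduced with a few lines of Python. This is a minimal sketch, not the poster's actual method: the names `letter_tally` and `print_chart` are made up here, and ties are broken alphabetically, so letters with equal counts may come out in a different order than in the charts above.

```python
from collections import Counter


def letter_tally(text):
    """Count letters (case-insensitive) and sort from most to least frequent."""
    counts = Counter(ch for ch in text.lower() if ch.isalpha())
    # Sort by descending count; break ties alphabetically (an assumption).
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def print_chart(text, mark=")"):
    """Print one row per letter, with one mark for each occurrence."""
    for letter, count in letter_tally(text):
        print(letter, mark * count)


print_chart("Hello, World")
```

Pasting in a longer sample (a whole essay rather than two sentences) should make the difference between real text and random letters much easier to see.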
  • This is the equation of a line:

    y=mx+b

    The y-coordinate is equal to the slope of a line times the x-coordinate, plus the y-coordinate where the line crosses the y-axis.

    m (the slope) is what they're claiming is 45 degrees, or -1. To find this slope, there have to be both an x-coordinate and a y-coordinate, not a y-coordinate and frequency-ordered facets of a category; that is a bar chart.

    False statement: "Plot these on a graph, in order of the most frequently occurring letters or sounds, and the points form a slope with a –1 gradient". Unless he's assigning each facet a unit value of 1 and using some differential equation to produce linearity. But, the article doesn't tell us....

    Sounds like shenanigans to me.
    Post edited by Gubermensch at 2012-08-27 20:58:00
  • @Gubermensch if there's anything wrong (which I don't think there is) it's a problem with the way it is perhaps described in the article, but it's not shenanigans. Read up on the subject independently, outside of this article.
    "Out beyond ideas of wrongdoing and rightdoing there is a field. I will meet you there." Rumi
  • Yeah...there's lots more info online about this. I suspect they haven't overlooked such a glaring problem as the one you're pointing out. SETI is using this to process signals now. I just can't imagine them overlooking a simple high school geometry issue.
    "Follow your inner moonlight; don't hide the madness."
    - Allen Ginsberg
  • Haha, k.
  • The x-axis represents their rank. So, starting with the most common word, each one can be represented by an ordered pair (rank, frequency): (1, however many times it was used), then the next most common would be (2, however many times it was used), and so on.
    "No one knows enough to worry." - Terence McKenna

    http://highdeas.com/users/NotHilarious
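To make the (rank, frequency) idea concrete, here is a small Python sketch that turns a list of words into plottable points. The function name `rank_frequency` and the sample sentence are made up for illustration. Taking the logarithm of both coordinates is an assumption about how the article's slope is meant to be read, since the article itself does not say.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Return (rank, count) pairs, where rank 1 is the most common token."""
    counts = sorted(Counter(tokens).values(), reverse=True)
    return list(enumerate(counts, start=1))


words = "the cat and the dog and the bird".split()
points = rank_frequency(words)
print(points)  # [(1, 3), (2, 2), (3, 1), (4, 1), (5, 1)]

# Both coordinates are plain numbers, so they can go on Cartesian axes.
# Log-scaled versions of the same points (assumed to be what gets fitted):
log_points = [(math.log10(rank), math.log10(count)) for rank, count in points]
print(log_points)
```

The rank is just a position in the sorted list, which is why the x-axis ends up numeric even though the words themselves are not.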
  • @moozy I understand. That isn't plottable. That is a bar chart. Check the source: is SETI a reputable institution?
  • @gubermensch what was done above is a bar chart, yes, but the information is definitely plottable using Cartesian coordinates. I have absolutely no idea how reputable SETI is.
    "No one knows enough to worry." - Terence McKenna

    http://highdeas.com/users/NotHilarious
  • Found it:
    http://www2.eve.ucdavis.edu/gpatricelli/McCowan et al 1999.pdf
    pg.4

    Article explained it poorly.

    Differential equations, eckh.
    Post edited by Gubermensch at 2012-08-28 12:56:03
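For what it's worth, the -1 gradient falls out of a short bit of algebra if you assume the relationship is Zipf's law and that both axes are logarithmic. The symbols here are my own notation, not the article's: f(r) is how often the item of rank r occurs, C is a constant, and alpha is the exponent.

```latex
f(r) = \frac{C}{r^{\alpha}}
\quad\Longrightarrow\quad
\log f(r) = \log C - \alpha \log r
```

That has the same shape as y = mx + b, with y = log f(r), x = log r, m = -alpha, and b = log C. When alpha is about 1 the slope is about -1, which is a 45-degree downward line if both log axes use the same scale.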
  • Claude Shannon pretty much knocked down all the walls when he showed, in such elegant work, how to tell information apart from chaotic/random noise.

    As for language being neither perfectly chaotic nor perfectly ordered, you can find a power law relationship between frequency of usage and the number of words fitting into that bucket.

    You can tell Bach from Beethoven just by looking at the power law's exponent / slope of log plot.

    Power laws tend to show up wherever there is a fractal or self-similar nature; music, language, city size, business age. It's a beautiful dance between complexity and complicatedness, the procedural and the declarative. Little in life is like the bell curve, which implies a single target with perfectly random and independently determined noise.
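Here is a rough Python sketch of how the exponent (the slope of the log-log plot) could be estimated. It is plain least squares on log10(rank) versus log10(frequency), which is one common way to do it and an assumption on my part, not necessarily what the researchers used. The function name `loglog_slope` and both test inputs are made up.

```python
import math


def loglog_slope(frequencies):
    """Least-squares slope of log10(frequency) against log10(rank)."""
    ranked = sorted((f for f in frequencies if f > 0), reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two positive frequencies")
    xs = [math.log10(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log10(f) for f in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic data: frequency proportional to 1 / rank (an idealized power law).
zipf_like = [1000 / rank for rank in range(1, 51)]
# Synthetic data: every symbol equally common, like the random-letter control.
uniform = [20] * 50

print(loglog_slope(zipf_like))  # close to -1
print(loglog_slope(uniform))    # close to 0
```

Real samples will be noisier than these, and short ones (like a two-sentence quote) can give slopes that wander a fair way from the idealized value.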
  • I wish I fully understood what you just said, because the significant part of that was quite interesting. I think I follow you. Thanks for the explanation :)
    "Follow your inner moonlight; don't hide the madness."
    - Allen Ginsberg
  • Does this have anything to do with the occult? It just seems like it should somehow.
  • "Does this have anything to do with the occult? It just seems like it should somehow."

    Well, 'occult' simply means 'hidden.' So...not really sure if that would relate to this specifically. Why do you think it does? Just curious. I don't know all that much about occult subjects...but find it interesting.

    "Follow your inner moonlight; don't hide the madness."
    - Allen Ginsberg
  • It's embarrassing to have that out-loud wondering captured. What a poorly formed question I asked.

    I remember being really excited about this discovery and wanting more follow-up information. Plus it seems like some really high-level mystery school math. It's just so easy for me to imagine this coming out of some Pythagorean philosophy. All is number. This bit of code we can express with numbers describes this illusion of reality. That sort of thing.

    We need a portable microphone computer that we can carry around and test stuff everywhere. Like a Geiger counter for intelligence.
