Hacker News
Wolfram Alpha is Coming -- and It Could be as Important as Google (twine.com)
97 points by toffer on March 8, 2009 | 73 comments


For a system like this, I've been trained through repeated disappointments to ignore hype and only look at results. I might also be influenced by knowing a bit about AI.

  it doesn't use natural language processing, it *computes* the answer. 
Gibberish


It wouldn't be the first time the world was promised something from Wolfram that went basically nowhere.


On the other hand, Mathematica is pretty cool. Wolfram is a clever guy with a lot of money. That's worth something.


Mathematica is a reimplementation of similar systems that were already proven to work. Alpha is a reimplementation of similar systems that were proven not to work.


Ah, the "witty put down" at its finest! While nothing of this sort has worked to date, the same could have been said of powered flight on December 16, 1903. In any event... Thanks for making me smile :-)


A new kind of hype.


The Wolf who cried Alpha?


Even if it worked as advertised, what fraction of your search queries is of the factual type that this system would be able to answer?


Easy: Today, zero percent.

But the more interesting question is "why is that?" And the answer is that you've used Google for so long, and internalized its capabilities so thoroughly, that you think in Google and can't easily see beyond those capabilities.

In the programming language context, we call this the Blub paradox, but Blub happens everywhere, and is hardest to see when there is nothing to step up to.

If this works (a big if, IMHO, but I'm willing to give it a try), what might happen is that it might entirely recalibrate what queries you can conceive of. Until we see the system, though, it's hard to even begin to imagine what those queries might be.

In 1980, how many queries did people wish they could put to Google? Nearly none, not because people didn't have questions Google could answer, but because only a bare handful of visionaries could even conceive of Google.


Awesome point. It's also why any serious contender to Google won't really look like Google at all.

Because if it looks like Google, then users will interface with it like they're used to interfacing with Google, and then you really have to win by beating Google at its own game. Much better to pick your own place & time.


You can already ask such questions on Google and get back answers in many cases in the web page snippets. Try this query for example: how many bones are there in the human body?

So if you don't normally issue queries of this sort on Google even though you would probably get back an answer in the web page snippets, then perhaps you would not find this new system all that useful.


Google is great for finding random data, but the results generally require human perusal. You couldn't feed them into a running program without at minimum a hand-written screen scraper.


"Gibberish" - why? Many of the commercially used products that deal with representation of natural language use naive Bayesian inference or statistical classification to return results.

Further, one of the promising approaches in this area involves using generative grammars (or other generative, non-parametric approaches) to approximate natural language representations.

Both these approaches 'compute' answers without a notion of natural language grammars that are usually associated with natural language processing.
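
To make "computing without a grammar" concrete, here is a minimal sketch of a naive Bayes text classifier with add-one smoothing. The training sentences, the two labels ("fact" and "math"), and the function names are all invented for illustration; this is not how any particular product works, just the general shape of the statistical approach.

```python
import math
from collections import Counter, defaultdict

# Toy training set, invented for illustration only.
TRAIN = [
    ("population of france", "fact"),
    ("capital of nebraska", "fact"),
    ("height of everest", "fact"),
    ("integrate x squared", "math"),
    ("derivative of sin x", "math"),
    ("solve x squared equals four", "math"),
]


def train(examples):
    """Count words per label; no grammar or parse tree is involved."""
    word_counts = defaultdict(Counter)  # label -> word -> count
    label_counts = Counter()            # label -> number of examples
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.split())
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, label_counts, vocab


def classify(text, model):
    """Return the label with the highest log prior plus log likelihood."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total)
        # Add-one (Laplace) smoothing so unseen words do not zero out a label.
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.split():
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAIN)
print(classify("capital of france", model))
print(classify("integrate sin x", model))
```

The classifier only counts words, so it has no idea what a noun phrase is. Whether that counts as "natural language processing" is exactly the terminology question being argued here.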


   usually associated
  
I've definitely read papers about using those methods you describe on processing text. They are all just algorithms attacking a problem, so the distinction between NLP and computing is gibberish.


Yes, I agree that all NLP must include some computational model. However, it is an interesting distinction to the reader of this article that the natural language engine is not based on a linguistically derived grammar, unlike Powerset and many of the larger, more notable NLP efforts.

This is the point the author was making. Had you not paraphrased the article to skew the intended meaning of this observation, it would have taken on a different meaning.

The actual text: "It doesn't simply parse natural language and then use that to retrieve documents, like Powerset... Instead, Wolfram Alpha actually computes the answers to a wide range of questions"

Computing the answer for 'what does a string of natural language mean' and 'what is the intended answer of the question being asked' are 2 different things.


I see the distinction, but I think the review is light on content, so the criticism is fair. Why, pray tell, do we need to be hyped about a great new piece of software?

If it is even close to real, the results will certainly speak for themselves. And the results might even be really awesome! I just would advise people to not pay attention until they see them.


I'm sort of assuming that it's going to be an extension of the online data sources that Mathematica introduced a little while back, with perhaps a natural language interface.


I hope it fares better than Numenta.


Numenta makes software that is used in (prototype) missile systems. The vision systems in missiles and aeronautics generally are awesome. They are much closer to practical, deployable software than this sounds.


Agreed. Plus, I doubt the companies using Numenta's sdk want it advertised that they are using it.

If you have not tried their SDK, I highly recommend it. It is open source (not free, though). You can play around with it for free, but if you want to use it commercially you need to pay.


I have taken a look at the website. Apart from hype there is nothing that I have seen that makes me want to use it for any AI (which I do a lot of commercially).

How does it compare to other architectures? What sort of problems does it excel at? Where are its weaknesses? What is the training like compared to other methods?

I wouldn't really take them seriously until a paper is linked to from their front page explaining the above. At the moment it just looks like hype.


Are we still talking about Numenta? Their website links to several papers that address all of your questions:

http://numenta.com/for-developers/education.php


Yes, we are talking about Numenta.

http://numenta.com/for-developers/education/htm-related-pape...

Unfortunately the three (why aren't there more?) independent papers didn't use HTM, and certainly don't answer all my questions.


I had a chance to see this in action a while back. While neither I nor any of the people I saw this with were at all impressed by NKS, this project blew our minds. We watched as it pulled up and manipulated everything from Egyptian fraction expansions to historic weather data to the human genome. If the author of this article is exaggerating, it's not by a whole lot. While Wolfram may not be bringing about the revolution in science he hoped to, don't forget that he and his crew made Mathematica, and are very capable of creating impressive software.


I want to believe you, but the fact that you signed up just to leave this comment anonymously doesn't do much to alleviate my skepticism.


Yeah, I know it looks suspicious. I've been lurking around here for a while, but hadn't really been compelled to comment before. I'm happy to try and answer questions about what I saw, but it was a while ago, so I don't remember the details that well.


I'm very curious if, during this demonstration, you were actually allowed to directly interact with the system or not?


No, but he gave a pretty thorough demonstration, and took some requests for input.


How limited was the input? Did someone type in whatever queries you guys asked?


Reminds me of the demo of Freebase Parallax: http://mqlx.com/~david/parallax/

The demo was very impressive, but in practice they just couldn't match the scope of Wikipedia or the wider Web. It may be the same way with Wolfram|Alpha.


Sounds like another Cuil-type hype campaign. When Google came out they didn't make any claims. Only factual performance counts.


> When Google came out they didn't make any claims. Only factual performance counts.

A cite to some Google claims when they came out would be nice. (Academic papers don't count.)

I was in the SF Bay Area at the time and the "publicity" that I remember was friends saying "check this out". I don't recall Google saying anything beyond "here's how many pages we indexed".


> I don't recall Google saying anything beyond "here's how many pages we indexed".

I think that's the point shimonamit is making.


If all Wolfram will be spitting back is an answer, I'm wondering how the user will determine the answer's correctness. Will there be a "proof" of some sort, or a list of references for the underlying facts and assumptions? With information found via Google, at the very least you'll be able to assess the reliability of the author/source (random message board vs. NYTimes article). I'm not saying this is perfect, but it's a good measure.


Let's not get too excited here. Generally, stuff that's hyped up before it launches tends to suck.

Was Twitter hyped up like this before it launched? Facebook? Google? Microsoft? Apple? TechCrunch? Hacker News? Wikipedia? Heck, pretty much ANYTHING that's successful now? (Even small stuff like Balsamiq that's currently very successful in a small way wasn't hyped before it launched).

Now think of stuff that was hyped massively before launch. Cuil. Powerset. Yeah.

Stuff that ultimately becomes super successful becomes successful over quite a long period of time and due to the excitement of users after launch - not the bleatings of gurus before launch.


Wolfram Alpha is Coming -- and It Could be as Important as WolframTones!

http://tones.wolfram.com/


There's already a company beta-testing this core technology: TrueKnowledge, based in Cambridge (UK).

http://www.trueknowledge.com/technology/

It's an interesting concept, and has much broader applications through their API.


TrueKnowledge has been doing this for a while now... And they do it well. Did it change the world? No. I call BS on this article.


Thanks for the link! I checked it out and TrueKnowledge is pretty interesting!


Or it could be as unimportant as Powerset. It's best not to hype it up too much, since the odds are that it won't be a panacea for everyone and a lot of people will be disappointed.

Products that tend to be modest initially and improve and prove themselves rapidly tend to do better than products that are hyped up beyond all proportions.


Indeed, why hype a product that is so insanely great the news will spread practically by itself? I expect a product to be worthless if its hype goes beyond a certain threshold.


I said something similar about A New Kind of Science, and that was ridiculously underwhelming. I respect Wolfram like crazy, but I want my money back on that thing :)


Do you feel that Wolfram can rightly claim the thesis offered and explored in the book? Regardless, how well do you think he supported the thesis?

The thesis of A New Kind of Science is something like "systems comprised of a small number of simple rules can perform arbitrarily complex computations."

The book proceeds to support the thesis. The content is comprised of descriptions of such systems, corresponding Mathematica execution trace diagrams, and analysis. These analyses are related to an ambitiously large scope of natural phenomena and scientific knowledge.


Knowing no more about this than the PR thus far, I am pessimistic, but for not quite the same reason as some other commenters.

I think a couple of things are clear.

(1) We are at the point where something impressive can likely be produced, and Wolfram may very well have the resources to do it.

(2) We are not at the point where the be-all-end-all version of this can be produced.

Compare this with the symbolic computation packages (Mathematica, Maple, etc.). Around 1990, we were at the point where we could produce a very good one. Several were written. They have been improved since, but only marginally. We're still pretty much using 1990 technology.

And that's fine. We knew how to make a really good symbolic computation package. We did. End of story.

But consider the proposed packages (Alpha, etc.). We might produce something impressive. But we are not ready to produce something really good and useful. Our initial efforts will require lots of improving.

And Wolfram is definitely not the one to do that improving. He runs an aggressively closed shop. Always has. I predict, therefore, that the cathedral-bazaar effect is going to mean his product will be difficult to improve, and so will never become truly useful.


This too is my concern. The guy is brilliant, but I really feel technology like this can only reach its full potential by being open and extendible by domain experts. Hopefully Wolfram realizes this as well -- it sounds like he has put forth significant effort and money bootstrapping the engine with knowledge and so on, hopefully he passes the torch to the rest of the world and doesn't propose his company be the only source of information into this engine.

If he provides not just technology and data but also the means to extend that technology and data by following his example he might be contributing something truly revolutionary.


Excellent article describing what could be the biggest advance in the web since the launch of Google. However, I wonder if it will be inundated with 99% of the questions being about who Miley Cyrus's current boyfriend is - or some other frivolous usage.

Seems to me that this technology should have been released for some other scientific usage first (if it is indeed that powerful). It could be valuable as an engine for other applications as well in this manner.

I would also argue that one of Google's advantages is that it enables discovery of new information instead of just giving you the one page for which you are looking.


Successful search engines are inundated with questions about what their users care about.

> Seems to me that this technology should have been released for some other scientific usage first

Why?

Technologists need to get over the idea that technology is for science/technology.

> I would also argue that one of Google's advantages is that it enables discovery of new information instead of just giving you the one page for which you are looking.

Huh? If there isn't a page that states which city is the fourth largest in eastern Montana, how will Google help you answer that question? (No fair going to the "populations of cities in eastern montana" page.)

Google doesn't (yet) do join queries.
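
As a rough sketch of what such a query needs, here is the filter, sort, and pick-the-nth step in Python. The rows below are made-up placeholders (not real Montana cities or census figures), and `nth_largest` is a hypothetical helper, not anything Google or Wolfram exposes.

```python
# Hypothetical rows: (city, region, population). Names and numbers are
# placeholders for illustration, not real census data.
CITIES = [
    ("City A", "eastern", 52000),
    ("City B", "western", 75000),
    ("City C", "eastern", 9000),
    ("City D", "eastern", 31000),
    ("City E", "eastern", 12000),
    ("City F", "eastern", 4000),
]


def nth_largest(rows, region, n):
    """Return the name of the n-th largest city (1-based) in a region, or None."""
    in_region = [row for row in rows if row[1] == region]
    ranked = sorted(in_region, key=lambda row: row[2], reverse=True)
    return ranked[n - 1][0] if 0 < n <= len(ranked) else None


print(nth_largest(CITIES, "eastern", 4))
```

The point is that the answer is derived from structured rows at query time. No page anywhere has to contain the sentence "the fourth largest city is ...".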


Sounds a lot like what Cyc (http://www.cyc.com/cyc/company/about) has been trying to do for the last 25 years, and actually the holy grail of AI since the 50s.

I think it's still way out of reach for non-trivial data sets. Something like this doesn't just show up out of the blue; it's not a problem amenable to some single new algorithm or breakthrough.


I don't think a formal system with symbolic inference is useful for describing the knowledge of any reasonably complex domain that doesn't have a mathematical model. And most human knowledge tends to be like this.

I'd love to be proven wrong...


Your statement is a tautology.

A domain that doesn't have a mathematical model would not be describable by a formal system.

And likewise, any domain describable by a formal system would have a mathematical model.


I'll add another tautology to counteract your argument: isn't it true that any domain that can be described by a formal system is essentially equivalent to mathematics?

Which brings us back to the original point: human stuff is not very math-friendly. I want to deal with emotions, politics, etc.

Think Bush and torture: can this system give me any definitive answers?


Your logic is not correct here.

You would be correct if you said:

"Isn't it true that any domain that can be described by a formal system is also describable by mathematics?"

If I have a friend Joe who is largely predictable then in certain situations, he is describable by a mathematical system (a logical system).

His full set of actions goes beyond mathematics, and if Joe realized how predictable he was, he might stop being so predictable.

Describable by mathematics does not mean "essentially equivalent" to mathematics.


And when Joe started doing something new, you'd just extend your formal system to include all of Joe's new behaviors. Given enough symbolic content, no matter what Joe comes up with, you can model it.

Just like we extended our formal number system into negatives, imaginaries, quaternions, etc.

Math is a terrifically abstract, self-consistent model of reality. But that's all it is: a model. Sometimes the model tells us things we didn't know before, and sometimes we have to change the model to make it work with what we're observing.


I don't think so. My assumption is that Joe is predictable by a mathematical equation. I find that this is only true of certain people and only true in certain circumstances.

People, in general, operate the opposite of computers. We don't think about what we should do, we think about what we shouldn't do. So, it is very hard, if not impossible, to represent human behavior by a mathematical system.

This argument is explored in great detail in the book Gödel, Escher, Bach.


I think that's where the "restricted to domains where everyone agrees that there is a factual answer" comes in. (I'd like to have a system answer me if I really violated any traffic laws when I got a ticket last week, but I don't think that's going to be very easy...)


"Wolfram Alpha is not HAL 9000, and it wasn't intended to be. It doesn't have a sense of self or opinions or feelings."

Too bad, because right away I was thinking "Wow! It's a sentient version of Google, only a bazillion times better!" But then I realized it's just a parser that turns natural language questions into queries against a large dataset, and I became all sad and disappointed.


How will they prevent malicious questions, such as prime number factorization or NP problems, from eating all the processing time? I'm asking seriously! At the very least they would have to enumerate all such questions to prevent system abuse. Funnily enough, Asimov's short story "The Last Question" also comes to mind.


I wonder how useful this will be if you can only ask a single question, or a set of questions that can be easily expressed in a single-line text field?

If I'm asking something like, "What is the capital of Nebraska?" why not just get directed to the Wikipedia entry, where I can learn a lot more than the one fact that answers what I just asked?

If Alpha is actually going to do computation, I'd rather be able to use it for something more complex than a single natural language query.


I don't think it's gibberish.

I would love to gather good questions and discuss the results when they are available. I think it is important to find questions to which Google, Yahoo, Powerset or Wikipedia don't provide a straight answer.

How about:

1) What is the smallest unknown prime number? ;)

2) Where on earth is the rainy days to sunny days ratio the lowest?

3) How many languages does the average person from the Benelux countries speak?


I predict this product will score at the very least 3 cuils.

But heck, I wouldn't even be surprised if they push the scale, 7 cuils anyone?

http://cuiltheory.wikidot.com/what-is-cuil-theory


Please, please stop with the "cuil theory" meme. It's dumb.


The other big question: will they hardwire it to come up with "42" for "the answer to life, the universe, and everything" or will it actually source that "fact" from the web?


Doesn't Google do a pretty good job of answering factual questions?

eg: http://is.gd/mnQw


If you ask Google "What is the capital city of the country that has the 15th highest average rainfall in the world?" then you don't get a straight answer.

This software should be able to look up rainfall data from Stephen Wolfram's Bumper Book of Trivia, work out average rainfall for each country, work out which country has the 15th-highest rainfall from that result, then look up the capital city for that country.

All determinable facts with a straight answer; you should simply get the name of the city as a result.
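
The chain of steps described above (average per country, rank, take the 15th, look up its capital) can be sketched in a few lines of Python. Everything here is an assumption for illustration: the 20 countries, their rainfall readings, and their capitals are fictional placeholders, and `capital_of_nth_wettest` is a hypothetical function name.

```python
from statistics import mean

# Placeholder data, invented for illustration: 20 fictional countries, each
# with two yearly rainfall readings (mm) and a fictional capital.
RAINFALL = {f"Country {i:02d}": [1000 + 50 * i, 1100 + 50 * i] for i in range(1, 21)}
CAPITALS = {f"Country {i:02d}": f"Capital {i:02d}" for i in range(1, 21)}


def capital_of_nth_wettest(rainfall, capitals, n):
    """Average rainfall per country, rank descending, return the n-th capital."""
    averages = {country: mean(readings) for country, readings in rainfall.items()}
    ranked = sorted(averages, key=averages.get, reverse=True)
    return capitals[ranked[n - 1]]


print(capital_of_nth_wettest(RAINFALL, CAPITALS, 15))
```

The hard part of the real system is presumably not this arithmetic but getting from the English sentence to this pipeline, and having trustworthy data to run it on.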


I tried the query and the top of the list is the parent comment. (Google seems to be paying close attention to HN.) The second was an article about Uruguay which said it was colonized from Spain in the 15th century. Google replied with any old 15th. (The article is actually wrong. It was settled in 1536.)

The parent posting is correct. Google is particularly poor at complex questions, especially if you aren't sure about the exact phrasing. I have had several queries in the past couple of weeks where I have had to spend 30 to 90 minutes trying to get the right set of terms to return the right set of documents to look at. This is particularly true if the name of the product is a common English word.


This is, as you say, a fact, something you can look up in a geographical info database. As I understand it, Wolfram's goal is to also answer questions which require computation and application of laws or formulas, or inference.


First question: What is the airspeed velocity of an unladen swallow?


If there is only one answer to one factual question, why not have it already written and available like Wikipedia?

Practically speaking, static content that doesn't change often is better served by models like Wikipedia.

If Wolfram knows all the answers, write them all in static HTML for the world to use, search, browse, replicate and extend, instead of storing them in semantic databases or ethereal brains.

I am not pissing on their parade, I know the scientific work is commendable, but practically speaking it can't compete with more efficient models.


> If there is only one answer to one factual question, why not have it already written and available like Wikipedia?

Wikipedia isn't all that useful for storing all of the sums of integers.

In other words, you can't enumerate all of the questions that have one answer.
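
A toy contrast in Python, with hypothetical names (`PRECOMPUTED`, `lookup_sum`, `computed_sum`) chosen only for this sketch: a table of pre-written answers versus a function that computes the answer on demand.

```python
# A static page can only answer what someone wrote down in advance.
# This table stands in for that page; its entries are illustrative.
PRECOMPUTED = {(2, 2): 4, (3, 5): 8}


def lookup_sum(a, b):
    """Return the stored answer, or None if nobody wrote this one down."""
    return PRECOMPUTED.get((a, b))


def computed_sum(a, b):
    """Compute the answer for any pair of integers."""
    return a + b


print(lookup_sum(2, 2))
print(lookup_sum(123456789, 987654321))
print(computed_sum(123456789, 987654321))
```

The table runs out almost immediately; the function never does.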


Google 2+2

Not to be a dickhead, I know what you mean.

Questions that involve some kind of processing power can be a good target for Wolfram, but then, how marketable is that outside academia?

The answer to the population of X country/city/town = Wikipedia, plus more facts you may be interested in while doing your research paper.

Maybe I just need 10 different questions/implementations of such a service to get it.


Couldn't you just do this:

askwolfram "What are all of the questions that will ever be asked?" | askwolfram "How do I format this answer for wikipedia?" > wikipedia.html


Gödel might have said something about the possibility of a universal inference engine.

Like TrueKnowledge and the Freebase answers in Powerset, this system will likely be good at answering a small subset of very direct questions. Having access to Mathematica's symbolic solver algorithms would definitely help in building this system.

If it's successful it will either be faster than current inference engines, or capable of solving more complex queries. Or perhaps both. We'll see.


This would be slightly more impressive if it actually had a... demo.



