Hacker News
Wizard for Mac – new kind of statistics program (wizardmac.com)
125 points by deathtrader666 on April 7, 2018 | hide | past | favorite | 62 comments


Generally speaking, IMHO, making the interface easier doesn’t, in practice, make it any easier to use statistical methods correctly, and that’s scary.

SPSS is a classic example - and the social sciences have had a series of discredited papers in recent years due to poor application of statistical methods. Just because you can put a dataset in and get values out - even values that look significant - doesn’t mean they are. Did all of the assumptions and requirements of the methods/tests you used actually hold for your data?
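To make the "did the assumptions hold" point concrete, here's a minimal sketch in Python (scipy assumed; the data is simulated): before reporting a t-test, check the normality assumption it relies on.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a = rng.normal(10.0, 2.0, size=40)   # plausibly normal sample
b = rng.exponential(10.0, size=40)   # strongly skewed sample

# The two-sample t-test assumes (roughly) normal data. Shapiro-Wilk
# gives a quick sanity check: a tiny p-value means the normality
# assumption is not plausible for that sample.
for name, sample in [("a", a), ("b", b)]:
    stat, p = stats.shapiro(sample)
    print(f"sample {name}: Shapiro-Wilk p = {p:.3g}")

# If normality fails, a rank-based test avoids that assumption:
u_stat, p_mw = stats.mannwhitneyu(a, b)
```

None of this is automatic in a point-and-click tool; the user still has to know which assumption belongs to which test.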

So, while I haven’t looked closely at this tool, all I saw was talk of the interface and the ease of getting results even if “you don’t know where to start”. That scares me. Especially when you start talking about applications in domains like medicine. That could be lives in the balance. Would we want civil engineers using tools like this to build bridges? Or people designing nuclear reactors?

To me, this is worse, not better, unless it somehow helps you actually understand when, why and how to correctly use these tools.


Agreed. It's odd to charge money for this at all, given that there are such robust statistical tools available for free (read: R). But on top of what you said, the fact that trial versions can't export p-values is a tacit endorsement of the "p < 0.0x is all that matters" camp, further promoting toxic statistical thinking.

Promoting ease of use of tools and simplification of harder problems is great, but this is a really, really dangerous thing to make easy and oversimplified.

Nope nope nope nope nope.


To be fair, R isn't really that accessible to most social scientists. Heck, R isn't really that accessible to most programmers either. The tidyverse and R notebooks have definitely improved things by leaps and bounds, but R at its core is an unusual language. (I say this as a somewhat proficient R user) Its programmatic nature does aid in coming up with reproducible data analyses though.

SPSS has, shall we say, a less than savory reputation.

All this to say that there is a market for something much friendlier than R. R is used by pure statisticians, data scientists and the like, but most social scientists prefer Stata, which has pretty legit statistical routines as well as a point-and-click UI.


As a proficient R user, though, you probably agree that it makes a certain sense when you get into a rhythm.

I think it surprisingly makes a lot of sense to social scientists, in spite of seeming backwards to computer scientists. I remember being in school and using Stata, and just plugging in numbers to get through exercises too quickly to bother understanding what the labs were about.

R seems to make you stop to realize what you are doing at the right moments, then uses a lot of magic to abstract away almost everything else.


I'm fairly fond of JMP, which in my mind manages to thread the needle between "Easy UI" and "Push buttons until you get a p-value" decently.


Yeah I quite like JMP. JMP for the easy routine stuff. R for when shit gets real. I don't do much stats these days though, so not having a JMP license, I just reach straight for R on the rare occasions I need it.


Personally I found that pandas was easier to use than R, especially since you can use the rest of the Python ecosystem for, say, interfacing with databases and data format conversions.
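For instance, the database-to-analysis-to-export round trip is only a few lines (a sketch with an invented in-memory table, using Python's built-in sqlite3 alongside pandas):

```python
import sqlite3
import pandas as pd

# Build a throwaway in-memory table (names and numbers are invented).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, amount REAL)")
con.executemany("INSERT INTO sales VALUES (?, ?)",
                [("north", 120.0), ("south", 80.0), ("north", 95.0)])
con.commit()

# Query straight into a DataFrame, summarize, and re-export.
df = pd.read_sql_query("SELECT * FROM sales", con)
summary = df.groupby("region")["amount"].agg(["mean", "count"])
csv_text = summary.to_csv()   # or to_parquet/to_json for other formats
```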


Absolutely. I'm sort of hoping to see Python/Pandas eclipse R at some point in my career. But for now, my field is only reluctantly in R, and advising colleagues and students into Python is... non-trivial.


> Heck, R isn't really that accessible to most programmers either.

R is objectively a bad programming language. However, it is by no means inaccessible. I have no statistics background whatsoever, and I managed to learn enough R to be dangerous in a mere week. Other than the 1-based indexing and the utterly disgusting dynamic dispatch mechanism (you can simply not use the latter), R is surprisingly pleasant to use. What I enjoyed the most is that vectors and matrices are first-class values, not objects that are referred to through pointers. It's probably copy-on-write under the hood, but I don't need to care. Hallelujah!
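The value-semantics point is easiest to see by contrast with NumPy, where plain assignment shares the underlying buffer (a small Python sketch):

```python
import numpy as np

# In R, `b <- a; b[1] <- 99` leaves `a` untouched: vectors behave like
# values. NumPy arrays are references, so the same move mutates both
# names unless you copy explicitly.
a = np.array([1, 2, 3])
b = a                # b is another name for the same buffer
b[0] = 99            # ...so this changes a as well
a_first = int(a[0])  # 99

c = a.copy()         # an explicit copy restores R-like value behavior
c[0] = 1
```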


It's not really an objectively bad language, it's just heavily optimised for one thing - the types of linear algebra calculations used for statistical analysis. For everything else (i.e. as a general purpose programming language) it's awful. Which is why many languages have R bridges, so you can do the grunt work somewhere less insane.


What does that even mean, anyway? I mean, I can write down the p-value it reports. But that's useless to me without the associated statistic and whatever model I eventually build that was motivated by the spurious correlations I found in the first place.


R is a trainwreck...


This. I tooled around with it a little bit using a data set I often use for teaching (from Klein and Moeschberger's survival analysis text) and it was remarkably easy to get an answer without any clue as to what was happening, and that's...not reassuring.

As you note, getting an answer and getting the statistically correct answer aren't the same thing.


Yes, sadly having lots of fancy sounding stats creates a smoke-screen of believability.


This is a well known problem in cognitive systems engineering, which sometimes seems to promote designs that might seem backwards to an interaction designer focused on creating for everyday consumption.

You don't want to design the control room for a nuclear reactor like you design a webpage, because you can under no circumstances sacrifice correctness for ease. A lot of times, it's about making the wrong thing really hard to do, with multiple layers of fail-safes. The potential gain in productivity when designing for ease is vastly overshadowed by the risks associated with making an error.

When we're not talking about nuclear reactors, but—for example—statistics, where consequences are abstract, it might get tempting to err on the side of ease. In fact, any domain you don't understand well enough will tempt you to design for ease over correctness.

Ideally, the designer should understand the domain at hand well enough to create a design that makes it easy to be correct, and hard to make an error.

There's a lot of stuff out there that has been designed neither for ease nor for correctness, however, but which has a design arbitrarily dictated by the table layout of a database, or some other random technical constraint that has nothing to do with the problem domain.


Thank you - I think you said much of what I was trying to say in a much better way.


Making tools easier to use frees up mental capacity for understanding what you are doing.

Just because someone has spent a lot of time learning how to use a difficult tool doesn't mean they understand what they are doing any better.


Making tools easier to use also allows you to get to a statistical result without the help of an expert who knows how to properly interpret it. People will launch an A/B test and start checking p-values for 20 different variables within hours when they have easy tooling available.
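That danger is easy to quantify. Assuming 20 independent null effects each tested at α = 0.05 (a back-of-the-envelope sketch):

```python
# Probability of at least one spurious "significant" p-value when
# checking 20 independent null variables at alpha = 0.05.
alpha, k = 0.05, 20
p_any_false_positive = 1 - (1 - alpha) ** k   # ~0.64

# The crude fix is a Bonferroni correction, which keeps the
# family-wise error rate at alpha by tightening each test:
alpha_per_test = alpha / k   # 0.0025
```

So roughly two times out of three, an eager dashboard will show at least one "significant" result on pure noise.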


I’ll give you a counter example. I’m a barely stats-literate biology graduate. I recognise that misuse of stats is dangerous but would like to become more expert. If this program can help me by handling the mechanical bits in an elegant fashion, while leaving me to worry about the appropriate tests and methodologies, I’m all for it.


I use Wizard all the time. I'm not a data scientist, but I often need to make engineering decisions based on trends hidden in large datasets. Wizard makes it easy to find what I'm looking for. I'm generally using it for a first pass at data, and when I find the trend, outlier, correlation, etc, that I'm looking for, I'll then move on to a tool with more features like Tableau or JMP to really dig in.

What is wrong with having a fast and easy to use tool which makes data analysis accessible? If people misuse their data or their tools, that's their fault.


Excuse a brief bit of hyperbole, but what’s your stance on gun control? How do you feel about seatbelt and helmet laws?

All of those things and more exist for the protection of the users and/or general public - but many people hold the attitude that it should be up to the user to do it right or not.

We have a history of banning or socially stigmatizing dangerous products and practices - see 3 wheel ATVs and lawn darts, for easy examples.

Similarly, and less extreme, standards and best practices exist exactly because it’s often pragmatically dangerous to expect the end user of a product to be expert enough to avoid hidden or infrequent but high-risk dangers, much less very common and high-risk dangers. (See examples everywhere from medicine to aviation to the national electrical code.)

Making it easier to shoot yourself in the foot isn’t a virtue on its own...


wow, this attitude is really patronizing and completely misguided.

The difficulty of learning the software doesn't say much about the quality of the ideas, of the research design, of the data gathering and interpretation.

Like it or not, SPSS made science much easier. And we should work in that direction. Not creating more complex tools for no reason. Software can and should also be a pedagogic tool, guiding people through...


>Like it or not, SPSS made science much easier.

No - it didn’t. Science isn’t running stats tools. It’s knowing what tools to use and using them properly so that you get valid answers.

If you haven’t been paying attention, the social sciences have been plagued by an epidemic of reproducibility failures - see: https://www.nature.com/news/over-half-of-psychology-studies-... for reference. Only 39% of studies were found to be replicable, and much of the blame falls on poor methods and techniques like p-hacking.

> And we should work in that direction.

Yes, we should - but dumbing down interfaces in ways that don’t actually provide good guardrails/handholding to help with proper application is not making good science easier.

> Not creating more complex tools for no reason.

Nowhere have I or others suggested that. But a tool that makes it simple to get an answer is not the same as one that makes it simple to get a correct and valid answer.

Science is a means to an end in many cases. Medicine, for example, is about better human lives. Shoddy research methods could do anything from giving false hope to actually endangering lives.


I own a copy of Wizard and have found it valuable on numerous occasions, from digging around databases to tinkering with models. It handles non-trivial amounts of data with relative ease, allows you to do joins in the UI, and has nice graphical representations that can change based on the type of the column. The list goes on. It does quite a lot of stuff.

Everyone I show this software to goes: "Whoa, what is this?" I would recommend checking it out before dismissing it.


I use Wizard all the time at work for analyzing manufacturing data to quickly check for trends and correlations. I find it better, easier, and faster than the internal tools purpose-built for the same task. I also prefer Wizard over JMP, Tableau, or any R/numpy/gnuplot methods, specifically for one-off tasks and analyzing new issues.

I’m not going to write a script or configure complex software for a quick check. Wizard is perfect for that. It’s also super fast at scanning through really enormous CSV files, then generating plots for every parameter.

Every coworker who sees me using it for these tasks wants to know what it is, or how I send out relevant plots so quickly when new datasets are available. It’s really a fantastic tool for quick work.

There seems to be a lot of negativity in this comments section about misuse of statistics. I think people are missing the point. Easy tools make data analysis more accessible, but misuse of data is the fault of the user, not the tool.


I somehow cannot read these positive comments without believing that they are astroturf.


Robust statistical analyses require knowledge, judgement, and increasingly, specialist expertise. To market a product as a way to jettison the statistician is so shortsighted as to be intellectual malpractice.

Forgive me if I seem overly aggressive but I have grown weary of my and my colleagues' profession being side-lined and belittled. Politicians, administrators, and even some other scientists see statistics as merely a badge to be placed atop their own work for validation. Well, ladies and gentlemen, statistics is more than that. It is an empirical science in its own right.

Of course, it doesn't always take a statistician to do the necessary statistical work. I am no physicist yet I can certainly apply the Clausius-Clapeyron equation as needed. Likewise, I expect many (perhaps most) scientists to be able to apply an ANOVA or simple regression as the need arises.

HOWEVER, the lack of intellectual humility on the part of so many non-statisticians when applying statistical tools to their own work is maddening.


I'm having a hard time understanding how statistics could be considered an empirical science, and really what bearing that term has on statistics at all. Can you explain?


Not OP, but I would say that when statistics acts like a subfield of mathematics, then it is an art.

However, if we consider judging whether a statistical method will be useful for the world as a part of statistics, then that part sometimes is an empirical science.

That is, the statement "I should use X technique because it performs Y% well" is sometimes an empirical statement.


One simple example is that the power of a statistical test is usually modeled using mock experiments.
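For the curious, a minimal Python sketch of that idea (scipy assumed; effect size and sample size are arbitrary): simulate many mock experiments with a known true effect and count how often the test detects it.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n, effect, alpha, n_sims = 30, 0.8, 0.05, 2000

# Monte Carlo power estimate for a two-sample t-test: the fraction of
# mock experiments (with a real difference of 0.8 SD) reaching p < alpha.
hits = 0
for _ in range(n_sims):
    a = rng.normal(0.0, 1.0, n)
    b = rng.normal(effect, 1.0, n)
    _, p = stats.ttest_ind(a, b)
    hits += p < alpha
power = hits / n_sims   # should land in the mid-0.8s for these settings
```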


I don't understand the hostility to this application. Progress comes from trying different ways to do things. I tried this program a few years back, and thought it was ok. It is easier to use than JMP, which some others have mentioned, though not as powerful.

There is nothing about easy-to-use that precludes understanding, and certainly nothing about difficult-to-use that promotes it. They are largely orthogonal. Using R doesn't make you a statistician, any more than using C++ makes you a software engineer. If anything, a simple interface can reduce the number of ways you can shoot yourself, and leaves more time to focus on the problem.

Being easy-to-use may be the difference between some analysis and no analysis, or at best, analysis by spreadsheet.

And finally, this is Hacker News. The author wrote this software and makes some money off it. Great. Isn't that what this place is all about?


I can see a thousand ways I can use this app. R is cool and stuff, but it is very hard to get any chart if you don't know the language. And it might take a while to write down any code.

This is much simpler. You just input the data and it gives you charts right away.


Wizard is an awesome tool for initial investigation and initial slicing and dicing of incoming data. For trying out ideas and seeing if what you're "seeing" in the data might be worthy of further investigation and hypothesis testing.

It isn't R, SPSS or Minitab. It's brilliant at what it does and I love it. I've been using it for 3 or 4 years and wouldn't swap it for any other tool.


I don't understand this program. It states it is a statistics program, but the only test named on the front page is Shapiro-Wilk. I don't know how many here are familiar with statistics, but that is basically the hello-world of statistical tests. Also, the pricing puts this program directly in the range of GraphPad's Prism, which is widely used in academic fields other than statistics, and quite intuitive.

Making good-looking figures is not a selling point anymore. If one's willing to script rather than click the mouse, Prism, Igor Pro, Origin Pro, and Matlab (pricing from low to high) can all produce great figures and solid statistical tests for people outside the statistics field. But nothing these days beats R for versatility of statistical tests.


If you are willing to defer dismissing this tool out of hand as a p-value generator, you can get a better feel for how the author (Evan Miller) thinks about stats from his web site [1] and a presentation he gave [2]. I think you'll find that "Wizard" is not the product of whimsy.

[1] http://www.evanmiller.org/ [2] https://www.youtube.com/watch?v=TzJMFxj7GRI


> Trial versions never expire. They do not report p-values, and cannot save or export. Requires OS X 10.10 or later.

Maybe this makes the trial versions better?


Apart from being literally the opposite of where I think statistics should be headed (i.e. I see the notion of "removal of 'complicated statistics knowledge'" to be more dangerous than helpful), I also had some practical feedback from watching the 12 minute intro video.

- Firstly, how does the visualization know the respective population or sample size from which the summary statistics and intervals are to be drawn?

- The demo used a pie chart to try to display summary stats and confidence intervals from the General Social Survey. Aside from statisticians' general dislike of pie charts, you cannot plot confidence intervals in this way into a pie chart just by inserting 'more white space between the slices'. There's only 100% of the area of the circle to play with, so any attempt to increase the 'white space' between the slices necessarily warps the real estate remaining to represent each actual slice.

- Honestly, I see this tool likely to be used by people who participate in the practice of p-hacking, whether deliberately or not. The ability to throw lots of simple models quickly at lots of data, mindlessly reporting some notion of statistical significance, is dangerous. I'm assuming your stats are not (and cannot be) adjusted in any fashion to reflect what you're really doing by using an automated model-building/reporting regime in this way (potentially running heaps of models on heaps of data until you find one that appears 'significant' based on a statistical test designed under the assumption that this is NOT what you're doing). Nowhere did I see any application of train/test or sample/resample type methods to try to control for over-fitting in the prediction application, or to truly estimate how predictive/replicable such a technique would be in the real world.
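A tiny holdout sketch in Python (pure NumPy, invented noise data) shows why the train/test point matters: a model fit to 30 noise predictors looks respectable in-sample and falls apart out of sample.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 30))   # 30 candidate predictors, all noise
y = rng.normal(size=200)         # outcome unrelated to any of them

# Fit least squares on the first 120 rows, evaluate on the held-out 80.
train, test = slice(0, 120), slice(120, 200)
coef, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)

def r_squared(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

r2_train = r_squared(y[train], X[train] @ coef)   # flatters the model
r2_test = r_squared(y[test], X[test] @ coef)      # near or below zero
```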

While I appreciate the work required to put something like this together (a lot of it looks like a GUI interface to my own exploratory functions/scripts in R, for example), I genuinely believe this approach is more dangerous/likely to lead to false conclusions than helpful.
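On the confidence-interval point above: intervals on survey proportions are perfectly plottable, just not as gaps between pie slices. A bar chart with error bars works, and the interval itself is a few lines (a sketch using the Wilson score interval; the counts are invented):

```python
import math

def wilson_ci(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half, center + half

# e.g. 420 of 1000 respondents chose an answer: roughly 39%-45%,
# which an error bar can show honestly and a pie chart cannot.
lo, hi = wilson_ci(420, 1000)
```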


I agree, in this case, you’re right on (and I have another comment saying as much) but couldn’t there be a UI that guides you safely through some of those dangers? I have yet to see anything close to one but I could imagine a system that would at least ask a few questions that would quickly disqualify a dataset or intended analysis for some use cases. Imagine, if you will, turbo-tax for (basic) statistical analyses.

As I said elsewhere, a simple interface doesn’t necessarily mean a correct outcome - In short, most stats software is solving the wrong problem. Many (especially this one) make it easy to get an answer, right or wrong. I’d rather see them make it hard to get a wrong answer - or, perhaps, hard to get an answer when you’re using it wrong.


Hmmm. I understand that by answering in the negative (no, it's not possible), I would put myself into the "640k is enough for anybody" category of comments. Which is to say, since it's not a logical impossibility, it's not a well defined concept, and our technology/understanding is increasing, odds are we'll at least make headway towards it, to the point that it's already pretty good. One could argue that we're already there compared to the statistical environments of the '80s and '90s.

Honestly, I think this is a tricky human problem, not a tech problem.

My reasoning is thus. I've been known to argue that even R is bad from this perspective: I view its success (apart from its free/OS nature) as due to the fact that it carries with it libraries and a functional flavour, tied around a core engine/philosophy of implicit actions, preferring result production over bothering the user.

These enable a person (with just enough knowledge to be dangerous) to load an externally authored package (that has been neither tested nor verified) with one line, load a dataset with one line (which silently corrupted or changed something during import), and apply a function in one line (which silently coerced objects/values in the background and unreliably expressed/suppressed errors and warnings). Where it does express warnings/errors, it does so unreliably and unhelpfully, so amateurs are led down a path where warnings/errors pile up while things continue on regardless, and the lesson learned is: ignore them.
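The silent-import failure mode isn't unique to R. A Python/pandas sketch of the same trap (an invented CSV with one malformed cell): the whole column quietly becomes strings, and every downstream calculation on it is suspect.

```python
import io
import pandas as pd

# One stray token ("N/A." is not in pandas' default NA list) and the
# numeric column silently imports as object dtype, i.e. strings.
csv = io.StringIO("id,dose\n1,10\n2,20\n3,N/A.\n")
df = pd.read_csv(csv)
dose_kind = df["dose"].dtype.kind   # 'O' for object, not 'f' for float
```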

To many, what I've written up there is not a bad thing: you get statistical analysis in three simple lines for free, all while copying and pasting scripts from stack overflow or the internet.

Now, I'm actually working on my own hobby project of designing a language/library for my own use which is designed around fixing those principles: still function based, interactive, fast, no implicit coercion, allowing flexibility while imposing restraints and guarantees.

But I'm under no illusion that it would necessarily be popular if I ever released it to the public. The attraction of R is that you get a model out in three lines, rather than 14 errors and no result, telling you that there are issues involving realms of thought you didn't know you were ignorant of and you'll have to go away and study before you continue, or even that your data might not be suitable for what you're doing. It might not be a good piece of work, but cynically, for the type of person buying into such a mindset of quick analysis via "darts thrown at an analytical wall", I'm not convinced that quality genuinely matters to them (even though it matters in the effects it has on the public further down the line).

And I've not even gotten into the problem that in the real world 90% of the work is not analysis/modelling but data cleaning/munging, critical thinking and technical problem solving, and that there's another layer of problems below that one which most academics and professionals rarely engage with: questioning the systematic/contextual nature of the data before it got to your dataset (which no software/interface I'm aware of currently addresses and most people just blithely ignore).


A video of it in action would be really helpful. Just reading the landing page, it's hard to know how I'm supposed to use it. "Just click and explore" isn't very convincing.


They link to a demo video near the bottom of the page: https://www.youtube.com/watch?v=IcA9YG9yJgs


Since other comments on this thread have complained about missing screenshots, some can be found here: https://itunes.apple.com/us/app/wizard-statistics-analysis/i...


There is a video link at the very bottom of the page; you'd practically have to stumble onto it by accident. I'm not going to download a program and figure out how to use it.

Why not have the video at the top? Maybe have it pop right in my face. Actually, these are the moments when I wouldn't mind a popup that takes focus away from a page.


SPSS user here, I run a bunch of surveys in Qualtrics and SurveyMonkey, and frequently use their export to SPSS file functionality.

Often, I have questions that take the format of "for each of the following categories, please rank them between Strongly disagree and Strongly agree". These questions typically end up in the SPSS files as a different variable for each row, with a 1, 2, 3, 4, or 5 as the measurement along with the labels.

Frequently, I want to pivot those types of questions by a variable like Region or Number of Employees (categorical), and then see the resultant tables. This is never fun, and inevitably takes a lot of time.

As others have said, statistics is a careful business that doesn't necessarily warrant ease of access to all mathematical functions, BUT handling what SPSS calls "Multiple Response Sets" better would be a godsend, just for the data prep and visualization step. I still ultimately fall back on recoding these or leveraging the MRS functionality in SPSS to get this done (sometimes this is better than just using pivot tables in Excel).

It would be great to be able to specify this kind of thing in this program, since without it, you can't really use/trust the computed percentages in some question configurations. Take a real look at the SPSS Tables feature, the Multiple Response Sets, and then visualization of them, and consider how that data is actually coded in SPSS files (the common export of survey tools) and maybe you can improve on that feature (It shouldn't be hard, MRS is a pretty bad setup, but it gets the job done).
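For what it's worth, the reshape described above is a melt-then-pivot in pandas (a sketch with made-up column names and Likert codes):

```python
import pandas as pd

# Survey export shape: one column per Likert item, coded 1-5.
df = pd.DataFrame({
    "Region": ["East", "East", "West", "West"],
    "Q1_support": [4, 5, 2, 3],
    "Q1_pricing": [1, 2, 4, 4],
})

# Melt to long form, then cross-tabulate item scores by Region.
long_form = df.melt(id_vars="Region", var_name="item", value_name="score")
table = long_form.pivot_table(index="item", columns="Region",
                              values="score", aggfunc="mean")
```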


Just had a quick skim of the site and saw no screenshots, just a load of marketese. Next I head to the HN comments: some preach FUD, others are staunch defenders. It seems very few speak from experience of actually using this application. A lot of the discussion seems to stem more from the poor marketing than anything else.


no screenshots?


Was thinking the same thing; there is a link to a video at the bottom though.


and the video is blurry in 480p (maybe because it was uploaded 6 years ago?)


I came across this program about 6 years ago, and the webpage hasn't changed. I have a suspicion that the program may not have been updated since, but it's hard to tell.


No, it has received frequent updates.


The app should use clearly distinguishable colors in the visualizations to communicate the right and intended information.


I'd recommend some screenshots or the video before you get to pricing.

Some of the graphics need to be improved. It would also be great to see why this is better than traditional BI tools (Tableau etc.) and what your unique value proposition is.


Is this closed source? I'm not using a statistics package I don't trust or that I can't take apart to see what's wrong when it's returning weird results.


How much are you willing to pay to use something like this? Would you still pay if it was open source?


I use python and R for all of my research. Why would I pay for anything when those tools allow me to do these tasks for free?


Please check your home page on 4” devices. Headline is clipped.


> PS- Wizard is only available for Mac, but if you’re reading this on a PC, consider this: for the price of high-end statistics software, you can buy Wizard and still have enough money left over for a top-of-the-line MacBook Air or MacBook Pro. Amazing, isn’t it?

Wow, that's a great way to kill off any sympathy for him. Especially given that he's apparently had 4-5 years to dig himself out of that hole.


It's especially strange since there are good free multi-platform statistics programs. There's R (as other commenters have pointed out) and JASP (https://jasp-stats.org/) which has a similar point-and-click interface for those who are used to SPSS.


It's not about "sympathy". It's about pointing out that similarly-targeted apps like Analytica can cost $1000+ per license.


But just think, for the price of this app, you could buy a nice dinner for two and an infinite number of copies of R.


$200 for this + $1000 for the cheapest macbook > your $1000 analytica license. And that's before considering that you're now stuck lugging around an extra paperweight everywhere, or that R is available for free.


> or that R is available for free.

Along with Python and Julia.



