Highest Rated Comments


quohr (45 karma)

... from your GitHub:

“  • Split data randomly into training and testing sets.
  • Train classifiers on training data for 1 minute.
  • Take the best classifier and note its result on test data.
  • Repeat above steps 20 times.
  • Record median result on test data.”
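Reading those steps literally, this is roughly the procedure they describe. It's my own sketch with placeholder data, models, and split ratio, not your actual code:

```python
# Literal reading of the quoted steps. Placeholder data/models/split ratio,
# not OP's actual code. Each candidate model is assumed to expose
# scikit-learn-style fit() and score() methods.
import random
import statistics
import time

def run_once(data, labels, candidate_models, budget_seconds=60):
    # "Split data randomly into training and testing sets."
    indices = list(range(len(data)))
    random.shuffle(indices)
    cut = int(0.8 * len(indices))
    train_idx, test_idx = indices[:cut], indices[cut:]
    train_X = [data[i] for i in train_idx]
    train_y = [labels[i] for i in train_idx]
    test_X = [data[i] for i in test_idx]
    test_y = [labels[i] for i in test_idx]

    # "Train classifiers on training data for 1 minute."
    trained = []
    start = time.time()
    for model in candidate_models:
        if time.time() - start > budget_seconds:
            break
        model.fit(train_X, train_y)
        trained.append(model)

    # "Take the best classifier and note its result on test data" --
    # as written, the test set is what picks the winner.
    return max(model.score(test_X, test_y) for model in trained)

# "Repeat above steps 20 times. Record median result on test data."
# results = [run_once(data, labels, make_candidates()) for _ in range(20)]
# print(statistics.median(results))
```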

I certainly hope you understand that you CANNOT use the test data in the process of determining which model to use. You go even further and “repeat 20 times”, then take the MEDIAN??

  • Why choose 20 times and not 10 or 30?
  • How can you claim your method avoids overfitting?
  • Have you tried using a validation AND test set instead?
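To be concrete about the validation AND test set question, model selection would look something like this. I'm using scikit-learn's toy breast cancer dataset and two arbitrary candidate models as stand-ins, not necessarily whatever OP actually uses:

```python
# Proper train / validation / test protocol, sketched with scikit-learn.
# Dataset and candidate models are stand-ins, not OP's actual setup.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# One split into 60% train, 20% validation, 20% test.
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

candidates = [DecisionTreeClassifier(max_depth=3), KNeighborsClassifier()]

# Model selection uses ONLY the validation set...
best = max(candidates, key=lambda m: m.fit(X_train, y_train).score(X_val, y_val))

# ...and the test set is touched exactly once, for the final number you report.
print("held-out test accuracy:", best.score(X_test, y_test))
```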

EDIT: OP’s approach is an ensemble of learners and doesn’t touch the test set during training. Thanks for clarifying, OP.

quohr (27 karma)

Not using other libraries has nothing to do with whether what you’ve developed is or isn’t DL, though.

quohr (11 karma)

Not the point I was making. OP said that their method isn’t DL because they “don’t use any ML/AI libraries”. I get that it’s a handmade ensemble of... regression trees (??), but OP should explain why (or when) this approach is advantageous compared to DL, especially since we have wonderful, user-friendly, completely open-source Python libraries for this.
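For scale, this is the kind of thing I mean by user-friendly open-source libraries: a boosted ensemble of regression trees is a few lines in scikit-learn (the dataset here is just an example):

```python
# Off-the-shelf ensemble of regression trees via scikit-learn.
# Gradient boosting fits a sequence of shallow regression trees;
# the toy dataset is only an example, not OP's data.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

model = GradientBoostingClassifier(n_estimators=100, max_depth=3)
print("5-fold CV accuracy:", cross_val_score(model, X, y, cv=5).mean())
```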

quohr (11 karma)

If you plan on publishing this, I recommend doing a formal test of repetition versus accuracy. I’d imagine the result would plateau after some number of repetitions, depending on the factors involved (particular application, training set size, etc.).
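Something like this sweep would make the choice defensible. Here `run_once` is a hypothetical stand-in for one complete split/train/score cycle of OP's pipeline, not a function from their repo:

```python
# Repetition-versus-accuracy sweep. run_once is a hypothetical stand-in
# for one complete split/train/evaluate cycle of the pipeline.
import statistics

def repetition_sweep(run_once, counts=(5, 10, 20, 40, 80)):
    # Median test score as a function of how many repetitions are aggregated.
    results = {}
    for n in counts:
        scores = [run_once() for _ in range(n)]
        results[n] = statistics.median(scores)
    return results

# Tabulate or plot the output and pick the point where the median stops
# moving, rather than hard-coding 20 repetitions.
```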

I get what you mean, but computers don’t operate on equivalent timescales. Imagine training your method for a minute using AiMOS versus on a 1990s Macintosh haha.

Plus, the less subjective the better :)

quohr (6 karma)

OP’s GitHub has a one-to-one mapping test result of 100% (under “breast cancer classification mistake”), so they covered the simplest case, at least.