Hello,
I built this tool to help me find original content on HN. It is impossible to go through everything that gets posted each day, and I'm not interested in all of the news content. So this tool tries its best to filter out 'non-original content'. Here are the rules it uses to filter out content:
- If one of these terms is present in the text: 'trump', 'google', 'facebook', 'reasonml', 'graphql', 'amd', 'nvidia', 'ai', 'linkedin', 'apple', 'tesla', 'go', 'rust', 'severless', 'firefox', 'amazon', '(2005)', '(2006)', '(2007)', '(2008)', '(2009)', '(2010)', '(2011)', '(2012)', '(2013)', '(2014)', '(2015)', '(2016)', '(2017)', '(2018)', '(2019)', 'chatbot', 'react', 'kubernetes', 'aws', 'ml', 'webassembly', 'coronavirus', 'docker', 'heroku', 'typescript', 'python', 's3', 'swift', 'bitcoin', 'serverless', 'javascript', 'hiring', 'books', 'git', 'tutorial', 'cve', 'c++', 'medium.com', 'github', 'java', 'haskell', 'covid-19', 'covid19', 'covid', 'covid 19', 'show hn'
- If it comes from these domains : 'www.nature.com', 'www.vox.com', 'www.pnas.org', 'www.apnews.com', 'developers.google.com', 'capitalandgrowth.org', 'www.politico.com', 'www.scmp.com', 'bostonreview.net', 'slate.com', 'www.sciencedaily.com', 'www.economist.com', 'appleinsider.com', 'www.psypost.org', 'dev.to', 'yorfriends.com', 'www.marketwatch.com', 'motherboard.vice.com', 'news.mit.edu', 'a16z.com', 'itunes.apple.com', 'yellrobot.com', 'arxiv.org', 'www.revyuh.com', 'ssnet.org', 'www.kickstarter.com', 'papers.ssrn.com', 'spectrum.ieee.org', 'stackoverflow.blog', 'finance.yahoo.com', 'analyticsindiamag.com', 'www.dailymail.co.uk', 'hbr.org', 'nesslabs.com', 'dzone.com', 'play.google.com', 'science.sciencemag.org', 'aws.amazon.com', 'chrome.google.com', 'www.schneier.com', 'www.ancient-origins.net', 'www.phoronix.com', '9to5mac.com', 'nwn.blogs.com', 'edition.cnn.com', 'www.nasa.gov', 'www.bbc.com', 'www.reuters.com', 'www.anandtech.com', 'en.wikipedia.org', 'puri.sm', 'www.theatlantic.com', 'web.archive.org', 'www.technologyreview.com', 'www.inc.com', 'mobile.twitter.com', 'www.geekwire.com', 'docs.google.com', 'www.buzzfeednews.com', 'techcrunch.com', 'theconversation.com', 'www.microsoft.com', 'www.linkedin.com', 'www.citylab.com', 'www.btcwires.com', 'www.entrepreneur.com', 'www.tecmint.com', 'www.tomshardware.com', 'insights.dice.com', 'www.artificiallawyer.com', 'www.smithsonianmag.com', 'krebsonsecurity.com', 'www.quora.com', 'foreignpolicy.com', 'sociable.co', 'www.hollywoodreporter.com', 'www.japantimes.co.jp', 'www.ncbi.nlm.nih.gov', 'cntechpost.com', 'bigthink.com', 'developer.apple.com', 'www.psychologytoday.com', 'www.axios.com', 'github.com', 'www.scientificamerican.com', 'semiengineering.com', 'news.yahoo.com', 'www.digitalphablet.com', 'bit.ly', 'www.theverge.com', 'www.bloomberg.com', 'www.eff.org', 'www.namasteui.com', 'www.seattletimes.com', 'www.businessinsider.com', 'twitter.com', 'thenextweb.com', 'imgur.com', 'uk.reuters.com', 
'www.cnn.com', 'blog.mozilla.org', 'www.forbes.com', 'www.techrepublic.com', 'www.facebook.com', 'www.npr.org', 'stratechery.com', 'sciencemag.org', 'lwn.net', 'blogs.scientificamerican.com', 'youtube.com', 'prakhartechviz.blogspot.com', 'mindmatters.ai', 'apps.apple.com', 'blog.usejournal.com', 'psmag.com', 'www.howtogeek.com', 'thebulletin.org', 'gist.github.com', 'www.nybooks.com', 'www.mercurynews.com', 'jobs.lever.co', 'www.lyricsongation.com', 'www.nationalgeographic.com', 'habr.com', 'electrek.co', 'www.ebay.com', 'getpocket.com', 'nypost.com', 'newatlas.com', 'www.cbc.ca', 'www.zdnet.com', 'youtu.be', 'www.theregister.co.uk', 'japantoday.com', 'wccftech.com', 'www.telegraph.co.uk', 'www.nextplatform.com', 'www.vanityfair.com', 'www.computerworld.com', 'reason.com', 'mashable.com', 'www.bleepingcomputer.com', 'link.springer.com', 'aeon.co', 'www.reddit.com', 'gizmodo.com', 'qz.com', 'decryptmedia.com', 'arstechnica.com', 'neurosciencenews.com', 'futurism.com', 'www.wired.com', 'www.popsci.com', 'www.wired.co.uk', 'thehill.com', 'fortune.com', 'hackernoon.com', 'www.vice.com', 'www.theinformation.com', 'www.redhat.com', 'www.amazon.com', 'heartbeat.fritz.ai', 'nautil.us', 'www.iafrikan.com', 'venturebeat.com', 'theintercept.com', 'www.cnbc.com', 'www.indiehackers.com', 'www.theguardian.com', 'www.outsideonline.com', 'www.bbc.co.uk', 'www.independent.co.uk', 'www.washingtonpost.com', 'leimao.github.io', 'www.newscientist.com', 'melmagazine.com', 'phys.org', 'www.engadget.com', 'www.usatoday.com', 'techerati.com', 'medium.com', 'www.sfchronicle.com', 'www.cnet.com', 'news.ycombinator.com', 'www.sciencealert.com', 'www.wsj.com', 'www.npmjs.com', 'link.medium.com', 'itnext.io', 'www.newyorker.com', 'www.nytimes.com', 'www.youtube.com', 'avc.com', 'www.infoq.com', 'slatestarcodex.com', 'www.nbcnews.com', 'blog.acolyer.org', 'sloanreview.mit.edu', 'apnews.com', 'economictimes.indiatimes.com', 'ai.facebook.com', 'www.newsweek.com', 'nymag.com', 'about.gitlab.com', 
'www.thespec.com', 'arinsider.co', 'www.fastcompany.com', 'www.confluent.io', 'decrypt.co', 'kotaku.com', 'www.livescience.com', 'www.cbsnews.com', 'www.techdirt.com', 'www.sciencemag.org', 'wire.mpelembe.net', 'slashdot.org', 'blog.floydhub.com', 'www.latimes.com', 'webwriterspotlight.com', 'm.youtube.com', 'www.xda-developers.com', 'www.ft.com',
- If the user is flagged
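
To make the three rules above concrete, here is a rough sketch in Python. It is not the tool's actual code. The names `BLOCKED_TERMS`, `BLOCKED_DOMAINS` and `is_filtered` are made up for illustration, the lists are shortened samples, and two behaviours are assumptions: terms match case-insensitively as whole words (so 'go' does not hit "going"), and domains match the exact hostname (which would explain why 'youtube.com', 'www.youtube.com' and 'm.youtube.com' are all listed separately).

```python
import re
from urllib.parse import urlparse

# Shortened samples of the lists in the post, not the full lists.
BLOCKED_TERMS = ["trump", "ai", "go", "c++", "(2019)", "show hn"]
BLOCKED_DOMAINS = {"medium.com", "www.nytimes.com", "github.com"}

# Assumption: a term only counts when it is not glued to other word
# characters, so "go" matches "Go 1.14" but not "going".
TERM_PATTERN = re.compile(
    "|".join(r"(?<!\w)" + re.escape(term) + r"(?!\w)" for term in BLOCKED_TERMS),
    re.IGNORECASE,
)


def is_filtered(text: str, url: str, user_flagged: bool = False) -> bool:
    """Return True if a submission should be hidden by any of the three rules."""
    if user_flagged:
        return True
    if TERM_PATTERN.search(text):
        return True
    # Assumption: exact hostname match, no subdomain or suffix matching.
    host = urlparse(url).hostname or ""
    return host in BLOCKED_DOMAINS
```

With these sample lists, `is_filtered("Go 1.14 released", "https://example.org/x")` is filtered by the term rule, while a post titled "Going off grid in a van" on the same site is kept.
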
A user is flagged if:
- They posted more than 2 links in the last hour
- They posted more than 5 links in the last 5 days
- They posted more than 5 links in the last 30 days and 30% of those posts were flagged
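
The user rules above could be sketched like this. Again, this is only an illustration and not the tool's real code: `Post`, `count_since` and `is_user_flagged` are hypothetical names, a "flagged" post is assumed to mean one that was itself filtered out, and the 30% rule is read as "at least 30%".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Post:
    posted_at: datetime
    flagged: bool = False  # assumption: True if this post was itself filtered out


def count_since(posts: List[Post], now: datetime, window: timedelta) -> int:
    """Count posts submitted within `window` before `now`."""
    return sum(1 for p in posts if now - window <= p.posted_at <= now)


def is_user_flagged(posts: List[Post], now: datetime) -> bool:
    # Rule 1: more than 2 links in the last hour.
    if count_since(posts, now, timedelta(hours=1)) > 2:
        return True
    # Rule 2: more than 5 links in the last 5 days.
    if count_since(posts, now, timedelta(days=5)) > 5:
        return True
    # Rule 3: more than 5 links in the last 30 days, with a flagged share
    # of at least 30% (the "at least" is an assumption).
    recent = [p for p in posts if now - timedelta(days=30) <= p.posted_at <= now]
    if len(recent) > 5:
        flagged_share = sum(p.flagged for p in recent) / len(recent)
        return flagged_share >= 0.30
    return False
```

For example, a user with three posts in the past hour is flagged by the first rule, and a user with six posts spread over the past month, two of them flagged, is caught by the third.
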
The idea behind all of this is to reduce the number of links per day enough that it is feasible to go through the full list of remaining links every day, while giving 'non-professional' posters the best possible chance.