Thursday 30 June 2022

New top story on Hacker News: Ask HN: Forensic analysis for 50 yr old tape
14 by passer_byer | 10 comments on Hacker News.
I'm posting for a friend who moved recently. He found a magnetic tape that was written over 50 years ago. He says, if he recalls correctly, that the tape would have contained 2 files, EBCDIC encoded, written by an IBM utility using a tape subsystem attached to an S/360 processor running MVS. Assume the IBM guys have this setup in some dusty basement in upstate New York. This is a 2-part question. First, what is the best method for attempting to read this mag tape? That’s the main question. Second, what is the probability of success here? Assume the tape was kept in a climate-controlled home.
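If the raw bytes can be recovered from the tape, turning EBCDIC into readable text is the easy part. The sketch below is only an illustration: the code page (cp037, US/Canada) and the 80-byte fixed record length are assumptions, since the post does not say which code page or record format the IBM utility used. The helper names are hypothetical.

```python
def decode_ebcdic(raw: bytes, codepage: str = "cp037") -> str:
    """Decode EBCDIC bytes to text. cp037 is an assumed code page."""
    # errors="replace" keeps going if a byte has no mapping or the data is damaged
    return raw.decode(codepage, errors="replace")


def split_fixed_records(raw: bytes, record_length: int = 80) -> list[bytes]:
    """Split a byte dump into fixed-length records (80 bytes is an assumption)."""
    return [raw[i:i + record_length] for i in range(0, len(raw), record_length)]


# Hand-built EBCDIC sample: "HELLO 360"
sample = bytes([0xC8, 0xC5, 0xD3, 0xD3, 0xD6, 0x40, 0xF3, 0xF6, 0xF0])
print(decode_ebcdic(sample))  # HELLO 360
```

If the output looks mostly right but punctuation is off, trying another EBCDIC code page such as cp500 is a reasonable next step.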

Wednesday 29 June 2022

New top story on Hacker News: Show HN: Ploomber Cloud (YC W22) – run notebooks at scale without infrastructure
23 by idomi | 3 comments on Hacker News.
Hi, we’re Ido & Eduardo, the founders of Ploomber. We’re launching Ploomber Cloud today, a service that allows data scientists to scale their work from their laptops to the cloud. Our open-source users ( https://ift.tt/TAJS2Vt ) usually start their work on their laptops; however, their local environment often falls short, and they need more resources. Typical cases are running out of memory or tuning models to squeeze out the best performance. Ploomber Cloud eases this transition by allowing users to quickly move their existing projects into the cloud without extra configuration. Furthermore, users can request custom resources for specific tasks (vCPUs, GPUs, RAM).

Both of us experienced this challenge firsthand. Analysis usually starts in a local notebook or script, and whenever we wanted to run our code on larger infrastructure we had to refactor the code (e.g., rewrite our notebooks using Kubeflow’s SDK) and add a bunch of cloud configuration. Ploomber Cloud is a lot simpler: if your notebook or script runs locally, you can run it in the cloud with no code changes and no extra configuration. Furthermore, you can go back and forth between your local/interactive environment and the cloud.

We built Ploomber Cloud on top of AWS. Users only need to declare their dependencies via a requirements.txt file, and Ploomber Cloud will take care of building the Docker image and storing it on ECR. Part of this implementation is open-source and available at: https://ift.tt/0ksTY6L Once the Docker image is ready, we spin up EC2 instances to run the user’s pipeline in a distributed fashion (for example, to run hundreds of ML experiments in parallel) and store the results in S3. Users can monitor execution through the logs and download artifacts. If the source code hasn’t changed for a given pipeline task, we use cached artifacts and skip redundant computations, significantly cutting each run's cost, especially for pipelines that require GPUs.

Users can sign up for Ploomber Cloud for free and get started quickly. We made a significant effort to simplify the experience ( https://ift.tt/7vuUtcO ). There are three plans ( https://ift.tt/BSCn3uT ): the Community plan, which is free with limited computing; the Teams plan, which has a flat $50 monthly fee plus usage-based billing; and the Enterprise plan, which includes SLAs and custom pricing.

We’re thrilled to share Ploomber Cloud with you! So if you’re a data scientist who has experienced these endless cycles of getting a machine and going through an ops team, an ML engineer who helps data scientists scale their work, or you have any feedback, please share your thoughts! We love discussing these problems since exchanging ideas sparks exciting discussions and brings our attention to issues we haven’t considered before! You may also reach out to me at ido@ploomber.io.
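The caching idea mentioned in the post (skip a task when its source code has not changed) can be sketched as follows. This is not Ploomber's implementation: the `TaskCache` class, the SHA-256 fingerprint, and the in-memory dict standing in for artifact storage such as S3 are all assumptions made for illustration.

```python
import hashlib


def source_hash(source: str) -> str:
    """Fingerprint a task's source code."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


class TaskCache:
    """Hypothetical cache: task name -> (source hash, artifact)."""

    def __init__(self):
        self._store = {}  # stands in for remote artifact storage

    def run(self, name, source, compute):
        """Return (artifact, was_cached); recompute only if the source changed."""
        digest = source_hash(source)
        cached = self._store.get(name)
        if cached is not None and cached[0] == digest:
            return cached[1], True
        artifact = compute()
        self._store[name] = (digest, artifact)
        return artifact, False


calls = []

def train():
    calls.append(1)  # count how often the expensive step really runs
    return "model-v1"

cache = TaskCache()
first = cache.run("train", "fit(X, y)", train)             # computed
second = cache.run("train", "fit(X, y)", train)            # unchanged source: cached
third = cache.run("train", "fit(X, y, epochs=5)", train)   # changed source: recomputed
print(first, second, third, len(calls))
```

A real system would also need to invalidate a task when its upstream inputs change, which this sketch leaves out.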

Tuesday 28 June 2022

Monday 27 June 2022

Sunday 26 June 2022

Saturday 25 June 2022

New top story on Hacker News: Show HN: Feather – 90 percent of Bloomberg terminal, for 5 percent of the price
38 by akrai | 64 comments on Hacker News.
Hey, I wanted to share what my friend and I built: Feather. It provides investors with all imaginable financial data, without breaking the bank. Effectively 90 percent of the Bloomberg Terminal, at 5 percent of the price. We just opened sign-ups for early access; all you need to sign up is your email address. We’ll open access to the software in order of sign-ups, and we’d love to have you onboard. Check it out! https://try-feather.com

Friday 24 June 2022

Thursday 23 June 2022

Wednesday 22 June 2022

New top story on Hacker News: Show HN: Data Diff – compare tables of any size across databases
32 by hichkaker | 0 comments on Hacker News.
Gleb, Alex, Erez and Simon here – we are building an open-source tool for comparing data within and across databases at any scale. The repo is at https://ift.tt/fasmrb3 , and our home page is https://datafold.com/ .

As a company, Datafold builds tools for data engineers to automate the most tedious and error-prone tasks falling through the cracks of the modern data stack, such as data testing and lineage. We launched two years ago with a tool for regression-testing changes to ETL code https://ift.tt/Vk3ZWpg . It compares the produced data before and after the code change and shows the impact on values, aggregate metrics, and downstream data applications.

While working with many customers on improving their data engineering experience, we kept hearing that they needed to diff their data across databases to validate data replication between systems. There were 3 main use cases for such replication:
(1) To perform analytics on transactional data in an OLAP engine (e.g. PostgreSQL > Snowflake)
(2) To migrate between transactional stores (e.g. MySQL > PostgreSQL)
(3) To leverage data in a specialized engine (e.g. PostgreSQL > ElasticSearch).

Despite multiple vendors (e.g., Fivetran, Stitch) and open-source products (Airbyte, Debezium) solving data replication, there was no tooling for validating the correctness of such replication. When we researched how teams were going about this, we found that most have been either:
(1) Running manual checks: e.g., starting with COUNT(*) and then digging into the discrepancies, which often took hours to pinpoint the inconsistencies.
(2) Using distributed MPP engines such as Spark or Trino to download the complete datasets from both databases and then comparing them in memory – an expensive process requiring complex infrastructure.

Our users wanted a tool that could:
(1) Compare datasets quickly (seconds/minutes) at a large (millions/billions of rows) scale across different databases
(2) Have minimal network IO and database workload overhead.
(3) Provide straightforward output: basic stats and what rows are different.
(4) Be embedded into a data orchestrator such as Airflow to run right after the replication process.

So we built Data Diff as an open-source package available through pip. Data Diff can be run in a CLI or wrapped into any data orchestrator such as Airflow, Dagster, etc. To solve for speed at scale with minimal overhead, Data Diff relies on checksumming the data in both databases and uses binary search to identify diverging records. That way, it can compare arbitrarily large datasets in logarithmic time and IO – only transferring a tiny fraction of the data over the network. For example, it can diff tables with 25M rows in ~10s and 1B+ rows in ~5m across two physically separate PostgreSQL databases while running on a typical laptop.

We've launched this tool under the MIT license so that any developer can use it, and to encourage contributions of other database connectors. We didn't want to charge engineers for such a fundamental use case. We make money by charging a license fee for advanced solutions such as column-level data lineage, CI workflow automation, and ML-powered alerts.
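The checksum-plus-binary-search approach described in the post can be illustrated with a toy version over in-memory tables. This is a sketch under stated assumptions, not Data Diff's actual code or API: the function names are hypothetical, rows are `(key, value)` tuples with integer keys, and `rows_in_range` stands in for what would be a range query with the checksum computed inside each database.

```python
import hashlib

Row = tuple[int, str]


def checksum(rows: list[Row]) -> str:
    """Checksum of a segment; a real tool would compute this inside the database."""
    h = hashlib.sha256()
    for key, value in rows:
        h.update(f"{key}:{value};".encode("utf-8"))
    return h.hexdigest()


def rows_in_range(table: list[Row], lo: int, hi: int) -> list[Row]:
    """Rows with lo <= key < hi (stand-in for a SQL range query)."""
    return [row for row in table if lo <= row[0] < hi]


def diff_tables(a: list[Row], b: list[Row], lo: int, hi: int, threshold: int = 4) -> list[Row]:
    """Return rows present in only one table, bisecting key ranges whose checksums differ."""
    seg_a, seg_b = rows_in_range(a, lo, hi), rows_in_range(b, lo, hi)
    if checksum(seg_a) == checksum(seg_b):
        return []  # identical segment: no rows need to be transferred
    if hi - lo <= max(threshold, 1):
        # Segment is small enough to compare row by row
        return sorted(set(seg_a) ^ set(seg_b))
    mid = (lo + hi) // 2
    return diff_tables(a, b, lo, mid, threshold) + diff_tables(a, b, mid, hi, threshold)


source = [(i, f"v{i}") for i in range(100)]
replica = [row for row in source if row[0] != 77]  # row 77 missing
replica[42] = (42, "changed")                      # row 42 modified
print(diff_tables(source, replica, 0, 100))
# [(42, 'changed'), (42, 'v42'), (77, 'v77')]
```

Only the segments whose checksums disagree are explored further, so when few rows differ, most of the key space is dismissed after a single checksum comparison per segment.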

Tuesday 21 June 2022

Monday 20 June 2022

Sunday 19 June 2022

Saturday 18 June 2022

New top story on Hacker News: Ask HN: Why Are Git Submodules So Bad?
10 by ghoward | 6 comments on Hacker News.
I have been a Git user for a long time, but I've never used Subversion or any other VCS more than a little. I also hardly use Git submodules, but when I do, I don't struggle. Yet people talk about Git submodules as though they are really hard. I presume I'm just not using them as much as other people, or that my use case for them happens to be on their happy path. So why are Git submodules so bad?

New top story on Hacker News: Ask HN: What was the actual impact of Microsoft anti trust case on the industry?
21 by hardware2win | 4 comments on Hacker News.
Which projects were affected: which stopped existing, and which, on the contrary, were able to grow? How was the atmosphere at Microsoft at the time? How were ordinary devs affected? What implications did it have for other companies? In general, what changed after the threat of a breakup? How big a topic was it? Did people outside IT even know about it?

Friday 17 June 2022

New top story on Hacker News: Ask HN: Best Dev Tool pitches of all time?
8 by swyx | 7 comments on Hacker News.
Hey folks! I'm trying to actively get better at pitching developer tools, so I had the idea of collecting an inspiration list of the "best of all time". I would like to crowdsource this! The vibe I'm going for is pitches that left you with a clear "before" and "after" division in your life, where you not only "got it" but also keep referring to it from that point onward. Obvious candidates are DHH's 15-minute Rails demo (I've been told the Elixir LiveView demo is similar) and Solomon Hykes' Docker demo. What other pitch is like that? (Or successfully pitches a developer tool in a different way, up to your interpretation.)

Thursday 16 June 2022

Wednesday 15 June 2022

Tuesday 14 June 2022

Monday 13 June 2022

Sunday 12 June 2022

Saturday 11 June 2022

Friday 10 June 2022

Thursday 9 June 2022

Wednesday 8 June 2022

Tuesday 7 June 2022

Monday 6 June 2022

Sunday 5 June 2022

Saturday 4 June 2022

Friday 3 June 2022

New top story on Hacker News: Show HN: Plasmo – a framework for building modern Chrome extensions
29 by coldsauce | 7 comments on Hacker News.
Hey HN, we're excited to have people try out our framework! When we built out a Chrome extension earlier this year, we noticed that the config was too imperative. You had to constantly tell Chrome via the manifest.json file where your files were, what your permissions should be, etc. So we thought it might be interesting to build a more declarative framework. When we built a proof of concept, we enjoyed working with it and decided to invest more time into making it usable and adding more features. We're still pretty early in building it out, and there's a bunch more we want to add, but this feels like a good time to showcase it and hear what people think!

Thursday 2 June 2022

New top story on Hacker News: I'm Afraid We're Shutting Down
297 by RBBronson123 | 46 comments on Hacker News.
So it’s with deep professional and personal sadness that I must announce my plans to shut down 70 Million Resources, Inc., the parent company of 70 Million Jobs (the first national, for-profit employment platform for people with criminal records) and Commissary Club (the first mobile social network for this population).

When I launched 70MR in 2016, I was motivated to build a company that could short-circuit the pernicious cycles of recidivism in this country: cycles that destroy lives, tear apart families and decimate communities. I sought to disrupt the sleepy reentry industry by applying technology, focusing on data, employing an aggressive, accountable team, and moving with some urgency. And for the first time, approaching the challenge as a national, for-profit venture. This approach, which I named “RaaS” (Reentry as a Service), turned out to be wildly effective, and by the beginning of 2020, we were delivering on our mission of driving “double bottom line returns”: build a big, successful business and do massive social good.

With the help of Y Combinator and nearly 1,500 investors, I assembled a team and got to work. We succeeded in facilitating employment for thousands of deserving men and women and became operationally profitable.

However, the pandemic had other plans for us. When it hit in force in March 2020, companies made wholesale terminations of nearly all our people, and continued their halt in hiring for two years. Our revenue dropped like a rock to almost nothing. I immediately responded by paring our expenses to the bone and began letting team members go. There was no opportunity to raise additional funding, so I began injecting my own money into the company (money I barely have) just to keep the lights on.

When the economy and job market began storming back, we were inundated with inbound requests for our services. Our perseverance seemed to be paying off. Except now we were hit with a new gut punch: “The Great Resignation.” Now our workers were reluctant to come back to work. And if they did accept a job, they’d often leave after only a few days. It became obvious that we lacked the resources to weather this new storm while hoping and praying the world would normalize soon. (It still hasn’t.)

Our coffers are empty. We’ve incurred a relatively small amount of debt (that I personally guaranteed) that I hope to negotiate down. All employees have been paid what they were owed (except for me). I will explore the sale of assets we hold.

On a personal note, I can’t tell you how grateful and humbled I’ve been that many would entrust their investment or business with me. For a person who’s done time in prison (me), it’s almost impossible to ask for someone’s trust. I have not yet forgiven myself for things I did which ultimately got me into trouble. But I will be eternally grateful to those that assisted me in my efforts to settle the score and win back my karma.

From the beginning I was blessed by an unbelievable team of smart, funny, passionate young people who shared my ambition to cause change. They stuck with me/us until the very end. I’m most saddened by the millions of formerly incarcerated men and women who we won’t be able to help. These are some of the most sincere, honest, and heroic people I’ve ever met. It was my life’s honor to work with them.

I’m pretty sure I’ll continue my reentry work. Several prominent organizations have indicated their interest in my assuming a leadership role. I need to work, and I need to continue my work. I’m so sorry for this outcome, despite the good we’ve done. I’m not sure we could have done anything differently or better, but ultimately, I take full responsibility. Needless to say, if you have any thoughts or suggestions, please don’t hesitate to reach out, here or at Richard@70MillionJobs.com.

This has been the greatest experience of my life; it couldn’t have happened without my getting a second chance.

Richard

Wednesday 1 June 2022

New top story on Hacker News: Ask HN: Is the stock market's growth largely anything more than inflation?
50 by coned88 | 43 comments on Hacker News.
It seems like the (US) stock market's growth is strongly correlated with inflation. The more money the government prints, the more the stock market goes up. So the market doesn't actually represent value in a company but instead debt owed to somebody else. What are the implications of this, and is it a bad thing? What would the market look like if we corrected for the money supply? PS: I'm asking here because when I asked elsewhere I was told to just not worry about it.