Builds
Technical projects and implementations.
- ChatGPT-integrated Smartshell
I recently started playing with the idea of a “smart” command-line shell integrated with ChatGPT and implemented in Python. The code for my rudimentary implementation is freely available here, and should be simple enough to install and run on a Linux system.
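  The repository has the full details; as a rough illustration of the core loop (not the actual Smartshell code), something like the following Python sketch captures the idea, assuming the official `openai` package and an `OPENAI_API_KEY` set in the environment:

  ```python
  # Minimal sketch of a ChatGPT-assisted shell loop (illustrative, not the
  # real Smartshell implementation): translate a natural-language request
  # into one shell command, confirm with the user, then run it.
  import subprocess
  from openai import OpenAI

  client = OpenAI()  # reads OPENAI_API_KEY from the environment

  def suggest_command(request: str) -> str:
      """Ask the model for a single shell command matching the request."""
      response = client.chat.completions.create(
          model="gpt-3.5-turbo",
          messages=[
              {"role": "system",
               "content": "Reply with a single Linux shell command and nothing else."},
              {"role": "user", "content": request},
          ],
      )
      return response.choices[0].message.content.strip()

  while True:
      request = input("smartshell> ")
      if request in ("exit", "quit"):
          break
      command = suggest_command(request)
      # Never execute model output without user confirmation.
      if input(f"Run `{command}`? [y/N] ").lower() == "y":
          subprocess.run(command, shell=True)
  ```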
- XBee Protocol Design (from scratch!)
This post was generated entirely by Anthropic's Claude. I did this because I've gotten very busy with college and my research projects, and I'd rather put out AI-generated content than no content at all. For the record, my friend is not named "Julia"; Claude hallucinated that part. I also added some Markdown formatting to make the presentation a little nicer. For those interested in prompt engineering (or those who just want to read human content), the prompt I used is at the end of the article and contains all the same information.
- Insights from Playing with Language Models
Ever since the groundbreaking release of ChatGPT, I've been wanting to look into these "large language models" (referred to from here on as LLMs). LLMs, at their core, are autoregressive transformer-based machine learning models scaled up to learn from vast collections of data scraped from the internet. The central concession of an autoregressive model is that it cannot have infinite memory; instead, it takes the prior $n$ tokens as input to generate the $(n+1)$st token, then discards the earliest token in memory and appends the newly generated one in a sliding-window fashion before passing the result back into the model to generate the $(n+2)$nd token. While one might not expect that intentionally forgetting the earliest inputs would make for an effective language model, results ever since OpenAI's Generative Pre-trained Transformer (GPT) have proven otherwise. Combined with major advancements in other areas of NLP, like Google's SentencePiece tokenization, researchers have been able to achieve record-breaking performance on many natural language tasks using autoregressive language models. The most recent iteration of OpenAI's GPT, GPT-4, can even outperform most human test-takers on legal and medical exams.
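  To make the sliding-window mechanism concrete, here is a toy Python sketch; the `model` callable is a stand-in for a real transformer's forward pass over the token window, and all names are illustrative:

  ```python
  # Toy sliding-window autoregressive generation loop. A real LLM would
  # replace `model` with a transformer forward pass plus sampling.
  from collections import deque

  def generate(model, prompt_tokens, window_size, num_new_tokens):
      """Generate tokens autoregressively, keeping only the last `window_size` tokens."""
      window = deque(prompt_tokens[-window_size:], maxlen=window_size)
      output = []
      for _ in range(num_new_tokens):
          next_token = model(list(window))  # predict token n+1 from the prior n tokens
          output.append(next_token)
          window.append(next_token)         # deque drops the earliest token automatically
      return output

  # Example with a trivial "model" that returns the last token plus one:
  print(generate(lambda toks: toks[-1] + 1, [1, 2, 3],
                 window_size=3, num_new_tokens=4))
  # -> [4, 5, 6, 7]
  ```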
- Bots will Win the Detection War
For nearly the internet's entire history, websites have been at war with automated ("bot") requests that attempt to exploit their services. This war has become an arms race: websites develop increasingly effective Turing tests, and bots get increasingly good at defeating them. From the first use of distorted characters on AltaVista's search engine to confound optical character recognition systems and protect the service from bots, to the original CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) with its distorted, marked-up text, to today's Google reCAPTCHA v3, every advance in bot-detection technology has been met and defeated by a competing advance in bot technology.
- Finishing This Website
I've finally done it: months after starting work on this website, I've completed all of its essential components. Notably, there's a new My Projects page with links to long-form descriptions of all the major projects I've taken part in over the years. The hardest part was actually writing those pages, but once that was done, Jekyll did the heavy lifting and generated a website to my liking. I also got dark mode working!
- TJ Space Program
The name “Space Program” is actually something I created towards the end of my senior year; for most of my high school career, the club was called TJ NanoSat. This was because its primary mission was to build small cube satellites (known as CubeSats) and launch them to orbit. One of the club’s biggest accomplishments is the successful assembly and integration of its first independent satellite, TJ REVERB (many years prior to this, there had been a separate satellite built by a different team in partnership with Orbital Sciences—read more about that here).
- Kubuntu Media Player and NAS
For well over a year now, I've had my old laptop sitting on my desk, completely unused. It's actually a decently performant HP Pavilion with a four-generations-old i5. The reason I had to let it go and get an upgrade was physical damage to its screen (my own fault, really; I'll never open a laptop by its bottom corner again). I was able to keep the screen from falling apart with tape, but the laptop is in no condition to be my daily driver.
- Starting This Website
This is it: my first post here, before my website is even close to complete. I thought it'd be worth taking a break from web development for a minute to explain what I've accomplished so far, because getting to this point is certainly an accomplishment. Currently, I have a live (albeit work-in-progress) website with a complete structure for adding everything that I want to add.
- Independent Research
The following page contains information that might not be up to date with the most current research in the bioinformatics field. It’s intended as a record of my story, not a valid scientific source of any kind. Feel free to contact me with any questions.
- Vex Robotics
I competed in the Vex Robotics Competition (VRC) in all three of my middle school years. VRC played a pivotal role in developing my love for STEM, and served as the foundation for most of the work I’d later do in high school. This is where I learned the principles of programming, engineering, and strategic thinking. My school curriculum barely touched on any of these ideas; I had to learn everything on my own through trial and error and lots of Googling.
- EDIT ML
In the summer after my junior year of high school, I took part in the EDIT ML internship at Dartmouth College. The program gave me the opportunity to work with experts in computational pathology on real cancer research that could potentially lead to faster and more accurate diagnoses for cancer patients. I worked on a team of three high school students under Dr. Joshua Levy.