Welcome to the third vocabotics AI magazine. It’s been a slightly quieter week in the world of AI on the consumer front, but in the world of models and dev there’s been a lot going on. Expand the sections below to read more.
Hardware
DJI FlyCart 30
Here’s one we missed from last week. DJI, announced their first delivery drone – the FlyCart 30. This drone is designed with a long range and heavy payload. It can carry up to 30kg over a distance of 16km at 20 m/s. This is clearly no toy. In emergency single-battery modem it can carry up to 40kg over 8km. With IP55 protection and temperature range of -20° to 45° C (-4° to 122° F) there’s not many places you won’t be able to fly it. The DJU Delivery Hub system makes the operational planning, monitoring, and resource management a doddle. Of course, such horse power doesn’t come cheap, starting at $17,000. So watch out for your next parcel delivery.
Google Gemini in the Samsung Galaxy S24
Google and Samsung have announced that the new Samsung Galaxy S24 series will incorporate Google’s Gemini. The announcement seems to suggest there will be a combination of Gemini Pro in the cloud, and Gemini Nano running locally on the device. This makes perfect sense and should lead to an enhanced user experience.
rabbit r1 update
After phenomenal success during launch week, orders have kept rolling in. Jesse Lyu has confirmed that there have now been 50,000 orders of the AI fanboy’s latest toy. He also announced a team up with Perplexity AI to bring their web connected LLM’s to the rabbit r1. There’s also a $200 Perplexity AI Pro credit for the first 100,000 buyers.
Apple Vision Pro update
So the pre-order came and went without any major hitches. A few people complained that they missed out because they didn’t have an iPhone, so couldn’t scan their face. But early rumors suggest they sold somewhere in the region of 160-180k units over the weekend.
Global News
Mark Zuckerberg interview with the Verge
Mark Zuckerberg took the time to sit down with Alex Heath from the Verge this week to give an update on what’s going on at Meta. The key points that set tongues wagging include:
- Meta’s new goal is to create Artificial General Intelligence (AGI)
- Llama 3 the next gen open source large language model will remain open source.
- Meta is investing in over 340 thousand NVIDIA H100 GPUs. Expecting to have almost 600k GPUs by the end of the year.
- AI glasses are on the roadmap.
- Meta will combine the FAIR research group with the gen AI team.
You can read the full article at The Verge.
Elon Musk’s Ambitious Vision for Space and Beyond
SpaceX’s Stellar Year in Review: Elon Musk, in a recent address to staff, highlighted SpaceX’s unprecedented achievements in the past year. Leading the space race, the company has outperformed all expectations with a record-breaking 96 launches, surpassing even the Soviet Soyuz. The Falcon rockets have not only set new benchmarks in frequency but also in reusability, with a single Falcon 9 booster being used for an astounding 19 flights.
Starlink’s Global Connectivity Leap: Shifting the focus to Starlink, Musk’s ambition of global internet coverage is turning into reality. With the introduction of V2 mini satellites, SpaceX is not just enhancing network capacity but also reducing latency below 20ms, crucial for a superior online experience. Impressively, Starlink has managed to transmit text messages directly from mobile phones to satellites, pioneering a new era of remote connectivity.
Starship: A Behemoth in Making: The Starship program, central to Musk’s Mars colonization dream, has made significant strides. Surpassing the Saturn 5, it stands as the largest flying object ever constructed. With plans to increase its size and capabilities further, Starship aims to conduct multiple launches per day, revolutionizing space travel and cargo transport.
A Glimpse into Future Mars and Moon Missions: Elon Musk’s vision extends beyond Earth’s orbit, with ambitious plans for establishing human presence on Mars and the Moon. The creation of a sustainable Moon base and Mars city encapsulates SpaceX’s long-term goals. These plans aren’t just about exploration; they’re about ensuring humanity’s survival as a multi-planetary species.
Urgency in Achievement: In his address, Musk underscored the urgency of these goals in the backdrop of potential existential threats to civilization. With advancements like orbital refueling and payload deployment in the pipeline, SpaceX is not just chasing dreams but is on a mission to redefine humanity’s future in space.
You can watch the full speech here.
Decoding AI’s Present and Future: Insights from Sam Altman’s Panel event in Davos
The Evolution and Impact of AI – A Balanced Perspective: In an insightful panel discussion, Sam Altman, the CEO of OpenAI, delved into the current state and the future of artificial intelligence. Amidst the growing dichotomy where AI is either seen as humanity’s doom or a panacea for all problems, Altman presented a balanced viewpoint. He acknowledged AI’s profound yet flawed capabilities, highlighting its utility in enhancing productivity and creativity while cautioning against over reliance in critical scenarios.
Building Trust in AI – The Road Ahead: Trust in AI, a cornerstone of its widespread adoption, was a focal point of discussion. Altman emphasized the need for AI systems to explain their reasoning in a human-comprehensible manner, facilitating a deeper understanding and trust in their capabilities. This approach, he argues, will be more effective than attempting to decipher the intricate workings of AI’s neural networks.
AI in Society – A Tool, Not a Replacement: Contrary to the dystopian view of AI rendering humans obsolete, Altman envisions AI as an advanced tool augmenting human capabilities. He predicts that AI’s role will primarily be to elevate human efficiency and creativity, not to replace the innate emotional intelligence and empathy that define humanity. This perspective resonates with the continued relevance of human judgment and interaction in an AI-enhanced world.
The Role of AI in Public Services: The potential of AI to revolutionize public services was also discussed, with an optimistic outlook on its ability to improve efficiency and reduce operational costs. The panel underscored the importance of governmental involvement in AI regulation, advocating for a balanced approach that fosters innovation while ensuring responsible use.
Data Usage and AI Training – Navigating the Complex Landscape: Altman touched upon the contentious issue of using public data for training AI models. He suggested the future might see a shift towards using smaller, high-quality data sets for more meaningful learning. The conversation pointed towards the need for innovative economic models that fairly compensate content creators, aligning with the evolving nature of AI training.
AI on the Global Stage – Collaboration Over Competition: The panel highlighted the significance of setting global AI standards that mirror democratic values while encouraging international dialogue and cooperation. The goal is to harness AI as a unifying force, steering clear of using it as a tool in geopolitical power struggles.
Personal Reflections – Resilience in the Face of Adversity: Sam Altman’s personal experiences, including his recent boardroom challenges, offered a glimpse into the resilience and adaptive leadership required in the rapidly evolving AI landscape. He reflected on the strength and capability of the OpenAI team, underscoring the importance of preparation and adaptability in navigating the unpredictable journey of AI development.
In conclusion, the panel, led by Altman’s insights, painted a picture of AI as a powerful yet imperfect tool. It’s a technology that demands responsible stewardship, thoughtful regulation, and a collective effort to realize its full potential for societal good. The conversation echoed a sentiment of cautious optimism, where AI is a partner to humanity’s progress, not a replacement.
You can watch the full interview here.
Software & Models
Runway ML Motion Brush
Runway ML has launched its Motion Brush. This highly anticipated feature allows users to markup areas of an image then apply specific directional controls on them. This feature brings us one step closer to feature length AI movie productions. You can see the settings used for our cover image below.
Try it here.
Copilot Pro
Copilot Pro is a premium subscription version Microsoft Copilot launched this week for $20 per month. It offers a single AI experience that runs across devices, understanding context on the web, PC, and across apps, and mobile devices.
The main features of Copilot Pro include:
- Enhanced image creation
- The ability to create custom GPTs for specialized tasks
- Priority access to the latest AI models
- AI features in Office apps, such as Powerpoint, Word, Excel, Teams, and Outlook
- Code interpreter
- Faster performance
The ability to use AI directly within office is surely going to boost productivity for many.
CGDream AI
CGDream.ai is a cutting-edge app that combines 3D models with generative AI to create stunning 2D images. It is designed to be accessible to everyone, regardless of their 3D experience, and offers many features for various creative industries. Key features of CGDream.ai include:
- 3D to image generation: Users can generate images using 3D models with just a few text prompts.
- Filters: CGDream.ai offers a variety of filters for artists to create beautiful AI art.
- 3D visualizers: CGDream.ai allows users to rapidly create visualizations with their 3D models.
- Architects & Interior designers: The app enables professionals to visualize and alter their design concepts.
- Product designers: CGDream.ai helps product designers experiment with product variations.
You can try CGDream.ai for free today here.
testimonial.to
Testimonial.to is a service designed to streamline the process of collecting and showcasing customer testimonials. It allows users to gather and manage testimonials efficiently, and then embed them into websites with ease. This tool is particularly useful for businesses looking to display customer feedback and endorsements on their sites, providing an effective way to enhance credibility and trust with potential clients or customers.
lmstudio.ai
LM Studio is a desktop app that allows users to experiment with local and open-source Large Language Models (LLMs). It supports the openai compatible API and is completely open-source. The app enables users to discover, download, and run LLMs on their local machines. The app is designed to facilitate various tasks such as reviewing written content, creating summaries, and extracting keywords and entities. Users can also use it for reductive tasks and other language model experiments. The app is available for Mac and Windows platforms, and it is designed to be user-friendly for experimenting with LLMs.
Download it here.
Product Hunt finds
TidyCal 3.0 – https://www.producthunt.com/posts/tidycal-3-0
TidyCal 3.0 is a simple calendar and booking solution that allows users to manage their appointments and meetings more efficiently. Key features of TidyCal 3.0 include:
- Multiple booking types: Users can create free and paid meetings, as well as one-on-one or group meetings.
- Customizable availability: Users can customize their availability by day with gap times to optimize their time.
- Integration with Zoom and Google Meet: TidyCal is integrated with popular video conferencing platforms, making it easier to organize and manage online video conferences.
- Sharing booking pages and collecting client information: TidyCal allows users to easily share booking pages via personalized links or embeddable widgets, making it convenient for clients or customers to book appointments without leaving the website or sending emails back and forth.
TidyCal is designed to be user-friendly and reliable, with a focus on ease of use and value for money. It has been praised for its powerful alternative to Calendly at a fraction of the price. However, some users have outgrown its functionality and moved to other platforms due to its lack of integration features. Overall, TidyCal is a popular choice for businesses looking for a simple and efficient calendar and booking solution.
Photify AI – https://www.producthunt.com/posts/photify-ai
Photify AI is a photo editing app that uses artificial intelligence to transform selfies into various styles and looks. The app allows users to change hairstyles, clothes, and even gender, and transport themselves to different eras and locations. Photify AI is available on the Android platform and as a Telegram bot and has gained popularity for its ability to generate vivid AI portraits in just a few seconds. The app has been praised for its ability to unleash the user’s imagination and elevate their selfie game.
Notion Calendar – https://www.producthunt.com/posts/notion-calendar
Notion Calendar is a productivity tool that integrates and syncs with all Google Calendar events. It simplifies time management and is fully integrated with the Notion workspace. Users can view their commitments in one place and customize their calendar view by day, week, or month. Notion Calendar is a database that allows users to organize information by date and add other properties such as text, numbers, single-select menus, multi-select menus, dates, people, and more. Users can also create custom-built table views, lists, boards, and switch seamlessly between them by clicking on their tabs. Notion Calendar is a popular choice for businesses and individuals looking for a simple and efficient calendar and productivity solution.
AI Assist – https://www.producthunt.com/posts/ai-assist-4
AI Assist is a next-generation tool designed to help users query, analyze, and report on data in a spreadsheet. It is inspired by Open AI’s Assistants API and powered by GPT-4 Turbo, enabling it to write, edit, and fix queries, formulas, charts, and more. The tool is included for free with paid plans while in beta, and it aims to provide advanced assistance in working with spreadsheet data.
Al Picasso – AI dance – https://www.producthunt.com/posts/ai-picasso-ai-dance
The AI Picasso is a new product launched on Product Hunt, which allows users to create full-body AI dance videos from photos. The app is primarily aimed those making TikTok, Instagram Stories/Reels, and Youtube Shorts.
AI Lawyer 2.0 – https://www.producthunt.com/posts/ai-lawyer-2-0
AI Lawyer 2.0 is a tool designed to provide instant legal help for consumers and act as a co-pilot for lawyers. It utilizes artificial intelligence to assist with legal research, document analysis, and contract review. This is one of the first steps into a regulated industry using AI. Do we think it will be accepted?
GitHub Trending
The GitHub Trending page is an amazing resource for those looking to play with the latest AI tools and models.
A few of the trending AI related repos this week included:
bespoke_automata
https://github.com/C0deMunk33/bespoke_automata
“bespoke_automata” is a GUI and deployment pipeline for creating complex AI agents locally and offline. It provides a platform for developing and deploying artificial intelligence agents.
Maybe: Open-source personal finance app
https://github.com/maybe-finance/maybe
Maybe is a personal finance and wealth management app built using React, Node.js, TypeScript, and PostgreSQL. The app is licensed under the AGPL-3.0 license.
xiaolai/everyone-can-use-english
https://github.com/xiaolai/everyone-can-use-english
AI Gateway
https://github.com/Portkey-AI/gateway
AI Gateway allows you to route LLM API requests from your application to a wide range of hosted LLM’s using a unified API.
Open Interpreter
https://github.com/KillianLucas/open-interpreter
Open Interpreter is a tool for letting LLMs execute code such as Python and Javascript locally. This means you can perform tasks on you PC using natural language such as creating and editing videos, PDFs, controlling a browser window etc.
LibreChat
https://github.com/danny-avila/LibreChat
LibreChat is an advanced chatbot platform that allows the integration of multiple AI models and enhances original client features like conversation search, message search, prompt templates, and plugins. LibreChat eliminates the need for ChatGPT Plus by offering free or pay-per-call API options.
supervision
https://github.com/roboflow/supervision
Computer vision toolkit for trackers, zones, annotators etc.
Dev news
AlphaCodium
In the recent publication “Code Generation with AlphaCodium: From Prompt Engineering to Flow Engineering”, researchers Tal Ridnik, Dedy Kredo, and Itamar Friedman from CodiumAI introduce AlphaCodium, an innovative approach designed to enhance the abilities of Language Models (LLMs) in code generation. This method, a test-based, multi-stage, code-oriented iterative flow, significantly outperforms traditional techniques in handling the unique complexities of code generation, such as precise syntax and edge case identification. In tests using the challenging CodeContests dataset, AlphaCodium dramatically improved GPT-4’s coding accuracy, showcasing its potential to revolutionize code generation tasks in various programming environments.
You can try it out here.
codeqai
codeqai allows you to chat with your code. It achieves this by parsing the code with treesitter to extract all the methods and documentation, then embedding it into a local FAISS vector database. llama.cpp or Ollama is used to chat with the code.
You can try it here.
LangGraph
LangGraph is a Python library built on top of LangChain, designed for building stateful, multi-actor applications with LLMs (Large Language Models) by enabling the creation of cyclical graphs, which are often needed for agent runtimes. It extends the LangChain Expression Language with the ability to coordinate multiple chains or actors across multiple steps of computation in a cyclic manner. LangGraph is inspired by Pregel and Apache Beam and provides an interface inspired by NetworkX. It is used for adding cycles to LLM applications and is not a DAG (Directed Acyclic Graph) framework. LangGraph is intended for scenarios where cycles are important for agent-like behaviors, such as calling an LLM in a loop to determine the next action. It is recommended to use LangGraph when cycles are needed, as LangChain Expression Language allows the easy definition of chains (DAGs) but does not have a good mechanism for adding in cycles.
Try it here.
Wavecoder
WaveCoder-Ultra-6.7B has been introduced in the paper “WaveCoder: Widespread And Versatile Enhanced Instruction Tuning with Refined Data Generation”. This model focuses on improving the instruction tuning of LLMs for improved code development.
Read the paper here.
LLMLingua
LLMLingua is a project developed by Microsoft to speed up the inference of large language models (LLMs) and enhance their perception of key information. The project aims to achieve up to 20x compression with minimal performance loss by compressing the prompt and KV-Cache. The main goal is to improve the efficiency and effectiveness of LLMs in processing and understanding information.
The paper and demo can be found here.
Proxy-tuning
The paper “Tuning Language Models by Proxy” introduces a method called proxy-tuning, which allows for the efficient customization of large language models (LMs) using small tuned LMs during decoding time. This method has shown promise in improving coding benchmarks over the base model by 17% – 32% absolute improvement. The approach involves modifying the logits of the target LM during decoding time, without directly modifying the model’s weights. The paper discusses the potential of this method for low-resource language cases and its implications for knowledge distillation.
Read more here.
Papers
As usual there have been a lot of interesting papers published on HuggingFace. Read more about some of this weeks at the links below.
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads https://huggingface.co/papers/2401.10774
Synthesizing Moving People with 3D Control
https://huggingface.co/papers/2401.10889
Self-Rewarding Language Models
https://huggingface.co/papers/2401.10020
DiffusionGPT: LLM-Driven Text-to-Image Generation System
https://huggingface.co/papers/2401.10061
CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects
https://huggingface.co/papers/2401.09962
ReFT: Reasoning with Reinforced Fine-Tuning
https://huggingface.co/papers/2401.08967
E^2-LLM: Efficient and Extreme Length Extension of Large Language Models
https://huggingface.co/papers/2401.06951
Tuning Language Models by Proxy
https://huggingface.co/papers/2401.08565
TrustLLM: Trustworthiness in Large Language Models
https://huggingface.co/papers/2401.05561
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
https://huggingface.co/papers/2401.05566
Towards Conversational Diagnostic AI
https://huggingface.co/papers/2401.05654
Back pages
This week a few videos have caught my eye:
“Sister of Battle” an AI film from Brett Stuart @bstuartTI
@ScottBatemanArt has been making this amazing Chuckie Egg scene on the ZX Spectrum using Blender. This brings back fond memories. Scott chose to build this in public and I’ve been keeping a close eye on it over the last few days.
@yoheinakajima’s TED talk is finally published. A great story from the inventor of BabyAGI.
In the crypto world, infamous bitcoin inventor – @satoshi reappeared briefly on Jan 18th with a “Hello World” post. There is a lot of speculation however as to whether he ever really owned this account. The post was quickly deleted.
@elonmusk also gave us an update this week as to a new skill Optimus has learnt – how to fold a shirt!
Hugging Face has launched a competitions platform. If you want to create your own competition, find out more here:
That’s all for this week. To receive this Newsletter to your inbox please subscribe in the links below.