MiniG is an AI model trained on a synthesis dataset of over 120 million entries. That dataset was generated by applying state-of-the-art language models that use large context windows. MiniG itself accepts both text and images as input.
1. Where can we use CausalLM MiniG?
MiniG is a model capable of handling a wide range of tasks, from conversations with users to computer vision integrations:
- Content generation: MiniG can generate text of very good quality, including articles or even stories, based on user instructions or even on images;
- Multimodal tasks: the model is capable of computer vision and can be used for image captioning, image-grounded conversations, image analysis, and image classification;
- Conversations: the model understands natural language and can be used for classification, text analysis, dialogue, and question answering.
One of MiniG's major advantages is its 1-million-token context window, which allows it to process a very large volume of data.
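As a quick check, the advertised context length can usually be read from the model configuration. The sketch below probes a few common field names; which one a given architecture actually uses is an assumption, so treat missing fields as normal:

```python
from transformers import AutoConfig

# Download only the configuration, not the 28 GB of weights.
config = AutoConfig.from_pretrained("CausalLM/miniG", trust_remote_code=True)

# Field names differ between model families; probe the usual suspects.
for field in ("max_position_embeddings", "seq_length", "model_max_length"):
    print(field, getattr(config, field, "not set"))
```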
CausalLM-type AI models are a class of models that typically contain only the decoder part of the architecture. Models of this type generate responses exclusively on the basis of the previous tokens. Hence the name "causal" AI models: the preceding tokens cause the generation of the following tokens.
A working example is generating the sentence "the cat sits at the window and purrs", with the model making the following predictions:
- the cat –> sits
- the cat sits –> at the window
- the cat sits at the window –> purrs
Each step can be seen as a snapshot of the same sentence from a different vantage point. During training, each sentence is automatically split into individual tokens following the pattern above and reassembled at the end.
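To make this concrete, here is a minimal sketch of greedy next-token prediction. It uses GPT-2 as a small stand-in model (an illustrative assumption; MiniG works on the same principle but is far larger):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

text = "The cat sits at the window and"
input_ids = tokenizer(text, return_tensors="pt").input_ids

# Predict tokens one at a time: each new token depends only on the tokens before it.
for _ in range(3):
    with torch.no_grad():
        logits = model(input_ids).logits      # shape: (1, seq_len, vocab_size)
    next_id = logits[0, -1].argmax()          # greedy: pick the most probable next token
    input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=1)
    print(tokenizer.decode(input_ids[0]))
```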
2. Testing the MiniG model
We set out to test its text-generation capability on a theme supplied by the user. On such tasks, MiniG goes up against LongWriter or against LLMs dedicated to story writing, such as Nexus StoryWriter.
For the test we will use the following Python code:
```python
import os

# Must be set before importing torch to avoid duplicate-OpenMP crashes on some systems.
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from huggingface_hub import login

# Log in with your Hugging Face access token (required to download the model).
login()

device = "cuda"

tokenizer = AutoTokenizer.from_pretrained("CausalLM/miniG", trust_remote_code=True)

query = """Write an article about the development of home-use robots, from the novels of Isaac Asimov from the middle of the 20th century to the cheap humanoid robots of the year 2024.
Themes to explore:
- how can robots help people in household activities
- how can robots help lonely people
- what are the risks of using robots
Genre:
Article
Tone:
Useful information
"""

# Build the chat-formatted prompt and tokenize it in one step.
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": query}],
    add_generation_prompt=True,
    tokenize=True,
    return_tensors="pt",
    return_dict=True,
)
inputs = inputs.to(device)

model = AutoModelForCausalLM.from_pretrained(
    "CausalLM/miniG",
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
).to(device).eval()

# do_sample=True with top_k=1 keeps only the single most probable token, i.e. near-greedy
# decoding; max_length counts the prompt tokens plus the generated ones.
gen_kwargs = {"max_length": 2500, "do_sample": True, "top_k": 1}

with torch.no_grad():
    outputs = model.generate(**inputs, **gen_kwargs)
    # Drop the prompt tokens so only the generated continuation is decoded.
    outputs = outputs[:, inputs["input_ids"].shape[1]:]
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
We asked the AI model to generate a text about the use of household robots, in the form of an informative article.
Before running the code, the following libraries need to be installed:
```bash
pip install torch
pip install git+https://github.com/huggingface/transformers
pip install accelerate huggingface_hub bitsandbytes torchvision
pip install tiktoken charset_normalizer
```
The miniG model is about 28 GB in size, and our tests showed that it cannot be run on 32 GB of VRAM.
We therefore ran the code on a virtual machine equipped with an NVIDIA RTX A6000 Ada card with 48 GB of VRAM.
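For GPUs with less memory, a quantized load via bitsandbytes (already in the install list above) could be attempted. This is only a sketch: we did not test it, and the model's compatibility with 4-bit quantization is an assumption:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization roughly quarters the memory footprint of the weights.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "CausalLM/miniG",
    quantization_config=bnb_config,
    device_map="auto",   # requires accelerate; spreads layers across available devices
    trust_remote_code=True,
)
```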
Using the model requires obtaining a Hugging Face access token and logging in with it (the login() call in the code above).
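For non-interactive environments, the token can also be passed directly; the token string below is a placeholder you must replace with your own:

```python
from huggingface_hub import login

# Token generated at https://huggingface.co/settings/tokens (placeholder value).
login(token="hf_xxx")
```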
Running the code produced the following result:
## From Science Fiction to Reality: The Evolution of Home Use Robots
The dream of robotic companions capable of performing household chores and aiding the infirm has long been a staple of science fiction, appearing in the pages of Isaac Asimov's *I, Robot* and the vivid imaginations of countless authors and filmmakers. While these machines once existed solely in the realm of imagination, the rapid advancements of artificial intelligence (AI), robotics, and miniaturization technology have propelled us into an era where home-use robots are no longer science fiction, but a reality rapidly evolving into our daily lives.
**From cleaning to companionship: A journey through the evolution of home robots**
The journey towards the present landscape of home robots began with Asimov's *I, Robot* in 1950. In this seminal work, Asimov outlined his "Three Laws of Robotics," which continue to influence the development of ethical and safe robotic systems. While the robots depicted in the novel were primarily focused on aiding humans in dangerous and physically demanding tasks, the core concept of intelligent machines performing domestic duties resonated with readers and set the stage for future innovations.
Early attempts at creating home robots were often clunky, expensive, and limited in functionality. The 1980s saw the emergence of dedicated vacuuming robots, inspired by the "Roomba" concept popularized in *I, Robot*. These early models, while groundbreaking, were often plagued by technical limitations and a lack of adaptability to changing environments.
The turn of the millennium marked a significant turning point in the development of home robots. The rise of AI, particularly machine learning and computer vision, provided the necessary intelligence for robots to navigate complex environments and interact with their surroundings in a more sophisticated manner. This paved the way for the development of more advanced robots capable of performing a wider range of tasks, from vacuuming and mopping to cooking, laundry, and even companionship.
**The present and future of home robots: A world of possibilities**
Today, the market is flooded with a diverse range of home robots, each designed to cater to specific needs and preferences. From the ubiquitous vacuuming robots to advanced robovacuum models equipped with mapping capabilities and app-controlled schedules, the possibilities are seemingly endless.
Beyond the mundane tasks, home robots are increasingly being integrated into the lives of individuals with disabilities or limited mobility. These robots can assist with tasks such as opening doors, retrieving items, and even providing companionship through voice-activated assistants and interactive displays.
However, the integration of robots into our homes is not without its challenges and ethical considerations. Concerns regarding privacy, data security, and the potential for misuse of personal information collected by these devices are paramount. Additionally, the question of job displacement due to automation remains a significant concern, requiring careful consideration and proactive measures to ensure a smooth transition for workers in affected industries.
**A glimpse into the future: A world where robots and humans coexist**
As technology continues to advance at an unprecedented pace, the future of home robots is brimming with exciting possibilities. The integration of artificial intelligence, robotics, and the Internet of Things (IoT) is paving the way for a future where homes are not just equipped with intelligent appliances, but rather, become part of a seamlessly interconnected ecosystem.
Imagine a world where your robot vacuum not only cleans your floors but also learns your preferences and adjusts its cleaning patterns accordingly. Where your robot cook prepares meals based on your dietary needs and preferences, while your robot companion provides companionship and emotional support through interactive conversations and activities.
While these scenarios may seem like science fiction, they are not far from becoming reality. As researchers and developers continue to push the boundaries of what is possible, the line between fiction and reality is blurring, and the future of home robots is poised to transform the way we live, work, and interact with technology.
In conclusion, the evolution of home robots from Asimov's fictional companions to the sophisticated machines of today and tomorrow highlights the transformative power of technology. These machines have the potential to revolutionize the way we live, work, and interact with our environment, offering solutions to everyday challenges, aiding those in need, and creating a future where humans and machines can coexist harmoniously in a world of unprecedented convenience, efficiency, and companionship.
We then asked it to generate an article on another topic, this time a much more technical one, with an emphasis on computer vision:
query = """Write an 7000 tokens article about computer vision an LLM in artificial intelligence context.
Themes to explore:
- what is computer vision?
- where can computer vision be used?
- what known LLM have computer vision?
- an python code example for using Phi3.5-vision-instruct from Microsoft ai model for describing the content of an image
Genre:
Technical article
Tone:
Useful information
"""
The generated article is the following:
Unveiling the Powerhouse: Computer Vision Meets LLMs in AI
In the ever-evolving landscape of artificial intelligence, two behemoths have emerged as potent forces, shaping the way we interact with technology and understand the world around us: computer vision and large language models (LLMs). This article delves into the heart of these technologies, exploring their individual strengths, symbiotic potential, and transformative impact on various sectors.
Decoding Computer Vision:
Imagine being able to imbue machines with the ability to "see" and "understand" the world like humans do. That, in essence, is the power of computer vision. Born from the intersection of artificial intelligence, image processing, and photography, computer vision enables computers to analyze and interpret images and videos, extracting meaningful information from visual data.
Think of it as giving machines a digital eye, capable of recognizing objects, faces, scenes, and even actions within images and videos. This technology is far from a mere curiosity; it’s a game-changer with applications that are as diverse as they are impactful.
Where Computer Vision Reigns Supreme:
The possibilities of computer vision are as vast as they are exciting. From the mundane to the groundbreaking, this technology is leaving its mark across numerous sectors:
- Healthcare: Detecting diseases like cancer from medical images, analyzing patient vital signs, and assisting in surgeries are just a few ways computer vision is revolutionizing healthcare.
- Transportation: Self-driving cars rely heavily on computer vision for object detection, lane keeping, and traffic interpretation, paving the way for a safer and more efficient future on the roads.
- Retail: Enhancing online shopping experiences through product search, personalized recommendations, and inventory management based on visual data is transforming the way we buy and sell.
- Security and Surveillance: Face recognition, object tracking, and behavior analysis are used for security purposes, from identifying suspects to ensuring public safety.
- Entertainment: From creating realistic special effects in movies to developing interactive gaming experiences, computer vision adds a new dimension to our entertainment.
The Synergy with LLMs:
While computer vision excels at understanding visual data, it often faces challenges when it comes to interpreting the context and nuances captured within images and videos. This is where LLMs come into play. These powerful language models, trained on massive datasets of text and code, possess an uncanny ability to understand and generate human language, providing a bridge between the visual and textual worlds.
The combination of computer vision and LLMs creates a powerful synergy, unlocking new possibilities and expanding the horizons of AI. LLMs can leverage computer vision to enhance their understanding of visual data, while computer vision can benefit from the textual insights provided by LLMs.
Notable LLMs with Computer Vision Capabilities:
Several LLMs have started integrating computer vision capabilities, blurring the lines between text and images:
- Google’s Imagen: This text-to-image generation model can create stunning visuals from textual descriptions, pushing the boundaries of creative expression.
- OpenAI’s DALL-E 2: Similar to Imagen, DALL-E 2 allows users to generate images from text prompts, demonstrating impressive image synthesis capabilities.
- Microsoft’s Bing AI: Bing AI utilizes computer vision to enhance search results, providing users with more informative and visually engaging experiences.
Demystifying Phi3.5-Vision-Instruct:
Let’s delve into the world of Phi3.5-Vision-Instruct, a powerful tool that combines the might of Phi3.5, a cutting-edge LLM, with the prowess of computer vision.
What is Phi3.5-Vision-Instruct?
Phi3.5-Vision-Instruct is a specialized AI model developed by Microsoft AI. It seamlessly integrates the advanced capabilities of Phi3.5, an LLM known for its impressive text generation and understanding capabilities, with the power of computer vision. This hybrid approach allows the model to analyze and interpret images with remarkable accuracy, then leverage its LLM prowess to provide detailed and insightful descriptions.
How Does It Work?
To harness the power of Phi3.5-Vision-Instruct, you simply need to provide it with an image. The model will then utilize its computer vision algorithms to analyze the image, identifying objects, scenes, and other visual elements. It will then employ the strength of Phi3.5 to generate a comprehensive and informative description, transforming visual data into textual insights.
Example Python Code:
```python
import requests
import json
# Image URL
image_url = "https://example.com/image.jpg"
# API endpoint
api_url = "https://api.microsoft.com/phi-vision-instruct"
# API key (replace with your own key)
api_key = "YOUR_API_KEY"
# Make API request
response = requests.get(api_url, params={
"apiVersion": "1.0",
"image": image_url,
"API_KEY": api_key
})
# Parse response
data = json.loads(response.text)
# Print description
print(data["description"])
```
This code demonstrates how to interact with the Phi3.5-Vision-Instruct API. You need to replace "YOUR_API_KEY" with your actual API key. Once you have the API key, the code sends a request to the API endpoint with the image URL and your API key. The response contains a JSON object with the description of the image.
Future Directions and Challenges:
The future of computer vision and LLMs is brimming with exciting possibilities. As these technologies continue to evolve, we can expect to see even more sophisticated applications emerge. However, several challenges remain:
- Bias and Fairness: Ensuring that computer vision and LLMs are fair and unbiased is crucial to prevent discrimination and promote inclusivity.
- Explainability and Trust: Making these complex models more transparent and understandable is essential for building trust and ensuring responsible use.
- Data Privacy and Security: Protecting the privacy and security of the vast amounts of data required to train and refine these models is paramount.
Despite these challenges, the potential benefits of computer vision and LLMs are immense. By addressing these hurdles, we can unlock the full potential of this powerful duo, ushering in a new era of AI innovation that benefits society as a whole.
Conclusion:
Computer vision and LLMs are two sides of the same coin, each formidable in its own right, and even more powerful when combined. Their potential to transform industries, enhance our lives, and push the boundaries of what’s possible in AI is undeniable.
As we move forward, it’s not just about developing more accurate models or creating more impressive applications. It’s about ensuring that these technologies are developed and deployed ethically, responsibly, and for the benefit of all. By embracing this vision, we can harness the true potential of computer vision and LLMs, shaping a future where technology empowers and enriches our lives.
3. Conclusions
The MiniG AI model is impressive in its ability to generate high-quality texts that stay faithful to the requirements specified by the user.
However, its hardware requirements are quite high, which means the model cannot be run directly on a typical local device.