Info Tutorial: python pandas data analysis library quickstart introduction: 1. read data (pandas IO), 2. database style merge (inner join indexing on date+time), 3. write data

Sunday, October 30, 2016


I discovered python pandas recently.
I am using it to read in ifconfig logs and add up total network traffic (VM network) across multiple hosts.
I needed to combine files from multiple hosts, sort the values and combine them by date and time.
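
To give a feel for that kind of job, here is a minimal sketch. The sample rows and the column names (date, time, rx_bytes, tx_bytes) are made up for illustration and are not my real ifconfig log format.

```python
import io
import pandas as pd

# Hypothetical per-host samples: date, time, bytes received, bytes sent.
host_a = io.StringIO(
    "2016-10-30,18:00,100,10\n"
    "2016-10-30,18:05,150,20\n"
)
host_b = io.StringIO(
    "2016-10-30,18:05,300,30\n"
    "2016-10-30,18:00,200,40\n"
)

cols = ["date", "time", "rx_bytes", "tx_bytes"]  # assumed column names
frames = [pd.read_csv(f, header=None, names=cols) for f in (host_a, host_b)]

# Stack the rows from every host, then add up traffic per date+time.
combined = pd.concat(frames, ignore_index=True)
totals = combined.groupby(["date", "time"], as_index=False)[
    ["rx_bytes", "tx_bytes"]
].sum()
print(totals)
```

groupby sorts by the grouping keys by default, so the totals come out in date and time order even though host_b's rows were not.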

I had been sorting and parsing into python dicts.
I also looked at perl PDL and other data manipulation libs.
But it looks like python pandas wins, because it combines VERY FLEXIBLE file IO with a lot of methods for selecting rows/columns and manipulating the data.

I made a simple test to make sure I knew how it all worked and described it here:
http://stackoverflow.com/questions/19222043/parse-two-files-and-merge-lines-if-time-stamp-matches/33100756#33100756

Python pandas is a flexible and general way of manipulating data. It is worth mentioning here as it really is the right tool for the job. It allows spreadsheet or database style merges/joins/concats on selected index rows or columns.
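
A small sketch of the difference between those operations, using two tiny made-up tables (the names key, x and y are only for illustration):

```python
import pandas as pd

left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "y": [20, 30, 40]})

# Inner join: only keys present in both tables (b, c).
inner = pd.merge(left, right, on="key", how="inner")

# Outer join: keys present in either table (a, b, c, d), gaps become NaN.
outer = pd.merge(left, right, on="key", how="outer")

# Concat: rows of one table appended after the other, no key matching.
stacked = pd.concat([left, right], ignore_index=True)

print(inner)
print(outer)
print(stacked)
```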

Two example files to illustrate how it works:

$ cat File1
date0,time0,data01,data02,data03
date1,time1,data11,data12,data13
date2,time2,data21,data22,data23
date3,time3,data31,data32,data33
date4,time4,data41,data42,data43
date5,time5,data51,data52,data53
$ cat File2
date1,time1,data14
date4,time4,data44
date2,time2,data24

Run python . . .

Use pandas read_csv to slurp the files into pandas table structures (DataFrames). (read_csv is very clever and can read many delimited text formats, not just csv.)
Use pandas merge to do an inner join (intersection of keys), using date+time as the join keys (column list=[0,1]).
Use pandas to_csv to write the output.
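
As a quick aside on read_csv handling more than commas, here is a sketch using its sep parameter. The whitespace-separated and pipe-separated samples are invented for this example.

```python
import io
import pandas as pd

# Made-up rows separated by runs of spaces instead of commas.
text = "date0 time0 data01\ndate1   time1   data11\n"
ws = pd.read_csv(io.StringIO(text), sep=r"\s+", header=None)

# Made-up pipe-separated row; a single-character sep is taken literally.
piped = pd.read_csv(io.StringIO("date0|time0|data01\n"), sep="|", header=None)

print(ws)
print(piped)
```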

THE IMPORTANT BIT:

$ python
>>> from pandas import merge, read_csv
>>> f1=read_csv("File1",header=None)
>>> f2=read_csv("File2",header=None)
>>> merged = merge(f1, f2, how="inner", left_on=[0,1], right_on=[0,1])
>>> merged.to_csv("Out", na_rep=0, index=False, header=False)
>>> [Ctrl-D]

Job done!

$ cat Out
date1,time1,data11,data12,data13,data14
date2,time2,data21,data22,data23,data24
date4,time4,data41,data42,data43,data44

Easy as 1, 2, 3.

1. read data (pandas IO)

2. database style merge (inner join indexing on date+time)

3. write data.
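
The same three steps as a standalone script, as a rough sketch: the File1 and File2 contents are held in memory with io.StringIO so it runs without any files on disk. It also tries how="left", which is my addition, to show where na_rep actually does something (I pass it as the string "0" here, since the pandas docs describe na_rep as a string).

```python
import io
import pandas as pd

FILE1 = """date0,time0,data01,data02,data03
date1,time1,data11,data12,data13
date2,time2,data21,data22,data23
date3,time3,data31,data32,data33
date4,time4,data41,data42,data43
date5,time5,data51,data52,data53
"""
FILE2 = """date1,time1,data14
date4,time4,data44
date2,time2,data24
"""

# 1. read data
f1 = pd.read_csv(io.StringIO(FILE1), header=None)
f2 = pd.read_csv(io.StringIO(FILE2), header=None)

# 2. merge on columns 0 and 1 (date + time)
inner = pd.merge(f1, f2, how="inner", left_on=[0, 1], right_on=[0, 1])
left = pd.merge(f1, f2, how="left", left_on=[0, 1], right_on=[0, 1])

# 3. write data (no path given, so to_csv returns the text)
inner_csv = inner.to_csv(na_rep="0", index=False, header=False)
left_csv = left.to_csv(na_rep="0", index=False, header=False)

print(inner_csv)
print(left_csv)
```

With the inner join there are no gaps, so na_rep changes nothing. With the left join every File1 row is kept, and rows with no match in File2 get the na_rep value in the last column.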

VERY clean, no messing. I really do love bash/grep/sed/awk, and also perl and python for manipulating data in structures, BUT the right tool for the job makes the job much easier and gives much more potential for use of the data.

Breakdown:

1. read_csv: A bog-standard (plain, unadorned) read_csv("File1") treats the first line as header names, so the date0 row would not be read as data. So we use header=None.

>>> f1=read_csv("File1")
>>> f1
   date0  time0  data01  data02  data03
0  date1  time1  data11  data12  data13
1  date2  time2  data21  data22  data23
2  date3  time3  data31  data32  data33
3  date4  time4  data41  data42  data43
4  date5  time5  data51  data52  data53
>>> f1=read_csv("File1",header=None)
>>> f1
       0      1       2       3       4
0  date0  time0  data01  data02  data03
1  date1  time1  data11  data12  data13
2  date2  time2  data21  data22  data23
3  date3  time3  data31  data32  data33
4  date4  time4  data41  data42  data43
5  date5  time5  data51  data52  data53
>>> f2=read_csv("File2",header=None)
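
If the numeric column labels feel awkward, one option is to give read_csv a names list and merge on those names. This is a sketch with shortened sample rows; the names date, time, a and b are my own choice, not anything pandas requires.

```python
import io
import pandas as pd

# names= supplies column labels for a file that has no header row.
f1 = pd.read_csv(
    io.StringIO("date1,time1,data11\ndate2,time2,data21\n"),
    header=None,
    names=["date", "time", "a"],
)
f2 = pd.read_csv(
    io.StringIO("date2,time2,data24\n"),
    header=None,
    names=["date", "time", "b"],
)

# Both tables share the key names, so on= replaces left_on/right_on.
merged = pd.merge(f1, f2, how="inner", on=["date", "time"])
print(merged)
```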

pandas DataFrame describe() gives a useful summary, especially for big tables. For numeric data you also get count, mean, std, min, max, etc.

>>> f1.describe()
0 1
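
As a rough sketch of what describe() returns, here is a made-up numeric table (the column names rx_bytes and tx_bytes are assumptions, not taken from my logs), followed by a text column for comparison.

```python
import pandas as pd

# Made-up numeric samples.
df = pd.DataFrame(
    {"rx_bytes": [100, 150, 200, 350], "tx_bytes": [10, 20, 40, 30]}
)
summary = df.describe()  # count, mean, std, min, quartiles, max per column
print(summary)

# Text (object) columns get count, unique, top and freq instead.
text_summary = pd.DataFrame({"host": ["a", "b", "a"]}).describe()
print(text_summary)
```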






Posted by Unknown at 6:59 PM














