How to convert an XML file to a nice pandas DataFrame?

codeshow 2023. 9. 19. 21:30

Let's assume I have an XML like this:

<author type="XXX" language="EN" gender="xx" feature="xx" web="foobar.com">
    <documents count="N">
        <document KEY="e95a9a6c790ecb95e46cf15bee517651" web="www.foo_bar_exmaple.com"><![CDATA[A large text with lots of strings and punctuations symbols [...]
]]>
        </document>
        <document KEY="bc360cfbafc39970587547215162f0db" web="www.foo_bar_exmaple.com"><![CDATA[A large text with lots of strings and punctuations symbols [...]
]]>
        </document>
        <document KEY="19e71144c50a8b9160b3f0955e906fce" web="www.foo_bar_exmaple.com"><![CDATA[A large text with lots of strings and punctuations symbols [...]
]]>
        </document>
        <document KEY="21d4af9021a174f61b884606c74d9e42" web="www.foo_bar_exmaple.com"><![CDATA[A large text with lots of strings and punctuations symbols [...]
]]>
        </document>
    </documents>
</author>

I would like to read this XML file and convert it to a pandas DataFrame:

key                                         type     language    feature            web                         data
e95324a9a6c790ecb95e46cf15bE232ee517651      XXX        EN          xx      www.foo_bar_exmaple.com     A large text with lots of strings and punctuations symbols [...]
bc360cfbafc39970587547215162f0db             XXX        EN          xx      www.foo_bar_exmaple.com     A large text with lots of strings and punctuations symbols [...]
19e71144c50a8b9160b3cvdf2324f0955e906fce     XXX        EN          xx      www.foo_bar_exmaple.com     A large text with lots of strings and punctuations symbols [...]
21d4af9021a174f61b8erf284606c74d9e42         XXX        EN          xx      www.foo_bar_exmaple.com     A large text with lots of strings and punctuations symbols [...]

This is what I have already tried, but I am getting some errors, and there is probably a more efficient way of doing this task:

from lxml import objectify
import pandas as pd

path = 'file_path'
xml = objectify.parse(open(path))
root = xml.getroot()
root.getchildren()[0].getchildren()
df = pd.DataFrame(columns=('key','type', 'language', 'feature', 'web', 'data'))

for i in range(0,len(xml)):
    obj = root.getchildren()[i].getchildren()
    row = dict(zip(['key','type', 'language', 'feature', 'web', 'data'], [obj[0].text, obj[1].text]))
    row_s = pd.Series(row)
    row_s.name = i
    df = df.append(row_s)

Could anybody suggest a better approach to this problem?

You can easily use xml (from the Python standard library) to convert to a pandas.DataFrame. Here is what I would do (when reading from a file, replace xml_data with the name of your file or file object):

import pandas as pd
import xml.etree.ElementTree as ET
import io

def iter_docs(author):
    author_attr = author.attrib
    for doc in author.iter('document'):
        doc_dict = author_attr.copy()
        doc_dict.update(doc.attrib)
        doc_dict['data'] = doc.text
        yield doc_dict

xml_data = io.StringIO(u'''YOUR XML STRING HERE''')

etree = ET.parse(xml_data) #create an ElementTree object 
doc_df = pd.DataFrame(list(iter_docs(etree.getroot())))
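The frame built this way keeps the attribute names exactly as they appear in the XML (for example KEY and gender) and leaves the whitespace around the CDATA text in place. The following is a minimal sketch of how the rows could be tidied to match the column layout asked for in the question. It assumes a shortened copy of the question's XML with placeholder text; the helper name `iter_clean_docs` and the column list are illustrative choices, not part of the original answer.

```python
import io
import xml.etree.ElementTree as ET

import pandas as pd

# Shortened stand-in for the question's XML (assumed sample data).
SAMPLE_XML = '''<author type="XXX" language="EN" gender="xx" feature="xx" web="foobar.com">
    <documents count="2">
        <document KEY="e95a9a6c790ecb95e46cf15bee517651" web="www.foo_bar_exmaple.com"><![CDATA[First text
]]>
        </document>
        <document KEY="bc360cfbafc39970587547215162f0db" web="www.foo_bar_exmaple.com"><![CDATA[Second text
]]>
        </document>
    </documents>
</author>'''


def iter_clean_docs(author):
    """Yield one dict per <document>, merged with the <author> attributes."""
    for doc in author.iter('document'):
        row = dict(author.attrib)       # author-level attributes first
        row.update(doc.attrib)          # document attributes win on clashes (e.g. web)
        row['key'] = row.pop('KEY')     # lowercase name, as in the desired output
        row['data'] = (doc.text or '').strip()  # drop whitespace around the CDATA text
        yield row


root = ET.parse(io.StringIO(SAMPLE_XML)).getroot()
columns = ['key', 'type', 'language', 'feature', 'web', 'data']
# Passing columns= selects and orders the fields, dropping gender.
clean_df = pd.DataFrame(list(iter_clean_docs(root)), columns=columns)
```

Because the document attributes are applied after the author attributes, the web column holds the value from each document element rather than the one from the author element.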

If there are multiple authors in your original document, or the root of your XML is not an author element, then I would add the following generator:

def iter_author(etree):
    for author in etree.iter('author'):
        for row in iter_docs(author):
            yield row

and change doc_df = pd.DataFrame(list(iter_docs(etree.getroot()))) to doc_df = pd.DataFrame(list(iter_author(etree)))
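To illustrate the multi-author case, here is a small self-contained sketch. The `<authors>` wrapper element, the attribute values, and the document texts are made up for the example; only the two generators follow the answer above.

```python
import io
import xml.etree.ElementTree as ET

import pandas as pd

# Hypothetical document: two <author> elements under a made-up <authors> root.
MULTI_XML = '''<authors>
    <author type="A" language="EN">
        <documents><document KEY="k1">one</document></documents>
    </author>
    <author type="B" language="ES">
        <documents>
            <document KEY="k2">two</document>
            <document KEY="k3">three</document>
        </documents>
    </author>
</authors>'''


def iter_docs(author):
    """Yield one dict per <document>, carrying the author's attributes."""
    author_attr = author.attrib
    for doc in author.iter('document'):
        doc_dict = author_attr.copy()
        doc_dict.update(doc.attrib)
        doc_dict['data'] = doc.text
        yield doc_dict


def iter_author(etree):
    """Walk every <author> in the tree and yield its document rows."""
    for author in etree.iter('author'):
        yield from iter_docs(author)


tree = ET.parse(io.StringIO(MULTI_XML))
multi_df = pd.DataFrame(list(iter_author(tree)))
```

Each row carries the attributes of the author it belongs to, so the two documents of the second author both get type "B".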

Have a look at the ElementTree tutorial provided in the xml library documentation.

As of pandas v1.3, you can simply use:

pandas.read_xml(path_or_file)
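As a rough sketch of what a call can look like, the snippet below reads a small attribute-only document. The XML content is an assumed sample, and the `xpath` and `parser` arguments are optional choices made for this example: `xpath` selects the elements that become rows, and `parser="etree"` uses the standard library instead of the default lxml backend.

```python
import io

import pandas as pd

# Assumed sample: each <data> element becomes one row, its attributes become columns.
XML_TEXT = '''<body>
  <data id="0" name="All Categories" type="category"/>
  <data id="13" name="RealEstate.com.au [H]" type="publication"/>
</body>'''

# Wrapping the string in StringIO passes it as a file-like object.
df = pd.read_xml(io.StringIO(XML_TEXT), xpath=".//data", parser="etree")
```

For a file on disk, the path can be passed in place of the StringIO object.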

Here is another way to convert XML to a pandas DataFrame. The example parses XML from a string, but the same logic works when reading from a file.

import pandas as pd
import xml.etree.ElementTree as ET

xml_str = '<?xml version="1.0" encoding="utf-8"?>\n<response>\n <head>\n  <code>\n   200\n  </code>\n </head>\n <body>\n  <data id="0" name="All Categories" t="2018052600" tg="1" type="category"/>\n  <data id="13" name="RealEstate.com.au [H]" t="2018052600" tg="1" type="publication"/>\n </body>\n</response>'

etree = ET.fromstring(xml_str)
dfcols = ['id', 'name']

# Collect the rows first and build the frame once;
# DataFrame.append was removed in pandas 2.0 and was slow inside a loop.
rows = [[node.get('id'), node.get('name')] for node in etree.iter(tag='data')]
df = pd.DataFrame(rows, columns=dfcols)

df.head()

I would recommend the xmltodict library. It handled your XML text quite well, and I have used it to ingest an XML file with almost a million records.

You can also build a dictionary of the elements and then convert it directly to a data frame:

import xml.etree.ElementTree as ET
import pandas as pd

# Contents of test.xml
# <?xml version="1.0" encoding="utf-8"?> <tags>   <row Id="1" TagName="bayesian" Count="4699" ExcerptPostId="20258" WikiPostId="20257" />   <row Id="2" TagName="prior" Count="598" ExcerptPostId="62158" WikiPostId="62157" />   <row Id="3" TagName="elicitation" Count="10" />   <row Id="5" TagName="open-source" Count="16" /> </tags>

root = ET.parse('test.xml').getroot()

tags = {"tags":[]}
for elem in root:
    tag = {}
    tag["Id"] = elem.attrib['Id']
    tag["TagName"] = elem.attrib['TagName']
    tag["Count"] = elem.attrib['Count']
    tags["tags"].append(tag)

df_users = pd.DataFrame(tags["tags"])
df_users.head()
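When some rows lack attributes, as rows 3 and 5 do in test.xml, a variant is to pass each element's whole attribute dictionary to pandas and let it fill the gaps. This is only a sketch under that assumption: the XML is inlined as a string (two rows taken from the file contents shown above) so the snippet runs without test.xml, and the names `TEST_XML`, `rows`, and `df_tags` are illustrative.

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Two rows from test.xml, inlined so the example does not need the file.
TEST_XML = '''<tags>
  <row Id="1" TagName="bayesian" Count="4699" ExcerptPostId="20258" WikiPostId="20257" />
  <row Id="3" TagName="elicitation" Count="10" />
</tags>'''

root = ET.fromstring(TEST_XML)

# One dict per <row>; attributes missing from a row become NaN in the frame.
rows = [dict(elem.attrib) for elem in root]
df_tags = pd.DataFrame(rows)

# XML attribute values are always strings, so convert numeric columns explicitly.
df_tags["Count"] = pd.to_numeric(df_tags["Count"])
```

This avoids listing every attribute by hand, at the cost of getting whatever columns the file happens to contain.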

Reference URL: https://stackoverflow.com/questions/28259301/how-to-convert-an-xml-file-to-nice-pandas-dataframe
