Saturday, 25 May 2019

Read and process data from a URL in Python

I am trying to read data from a URL. The format of the data is shown below.

What I am trying to do:
1) Read the data line by line and check whether each line contains the desired keyword.
2) If it does, store the previous line's content ("GETCONTENT") in a list.

<http://www.example.com/XYZ/a-b-c/w#>DONTGETCONTENT    
 a       <http://www.example.com/XYZ/mount/v1#NNNN> , 
<http://www.w3.org/2002/w#Individual> ;
        <http://www.w3.org/2000/01/rdf-schema#label>
                "some content , "some url content ;
        <http://www.example.com/XYZ/log/v1#hasRelation>
                <http://www.example.com/XYZ/data/v1#Change> ;
        <http://www.example.com/XYZ/log/v1#ServicePage>
                <https://dev.org.net/apis/someLabel> ;
        <http://www.example.com/XYZ/log/v1#Description>
                "Some API Content .

<http://www.example.com/XYZ/model/v1#GETBBBBBB>
a       <http://www.w3.org/01/07/w#BBBBBB> ;
        <http://www.w3.org/2000/01/schema#domain>
                <http://www.example.com/XYZ/data/v1#xyz> ;
        <http://www.w3.org/2000/01/schema#label1>
               "some content , "some url content ;
        <http://www.w3.org/2000/01/schema#range>
                <http://www.w3.org/2001/XMLSchema#boolean> ;
       <http://www.example.com/XYZ/log/v1#Description>
            "Some description .

<http://www.example.com/XYZ/datamodel-ee/v1#GETAAAAAA>
 a       <http://www.w3.org/01/07/w#AAAAAA> ;
        <http://www.w3.org/2000/01/schema#domain>
                <http://www.example.com/XYZ/data/v1#Version> ;
        <http://www.w3.org/2000/01/schema#label>
                "some content ;
        <http://www.w3.org/2000/01/schema#range>
            <http://www.example.com/XYZ/data/v1#uuu> .

<http://www.example.com/XYZ/datamodel/v1#GETCCCCCC>
 a       <http://www.w3.org/01/07/w#CCCCCC , 
<http://www.w3.org/2002/07/w#Name> 
        <http://www.w3.org/2000/01/schema#domain>
                <http://www.example.com/XYZ/data/v1#xyz> ;
        <http://www.w3.org/2000/01/schema#label1>
              "some content , "some url content ;
        <http://www.w3.org/2000/01/schema#range>
               <http://www.w3.org/2001/XMLSchema#boolean> ;
        <http://www.example.com/XYZ/log/v1#Description>
               "Some description .

Below is the code I have tried so far, but it prints the entire content of the file:

    import re

    def read_from_url():
        try:
            from urllib.request import urlopen
        except ImportError:
            from urllib2 import urlopen
        url_link = "examle.com"
        html = urlopen(url_link)
        previous = None
        for line in html:
            previous = line
            line = re.search(r"^(\s*a\s*)|\#GETBBBBBB|#GETAAAAAA|#GETCCCCCC\b",
                             line.decode('UTF-8'))
            print(previous)

    if __name__ == '__main__':
        read_from_url()

Expected output:

GETBBBBBB , GETAAAAAA , GETCCCCCC 

Thanks in advance!!
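
For reference, here is a minimal sketch of the two steps described above. It is one possible approach, not a definitive one. In the code above, `previous` is overwritten with the current line before the check, the result of `re.search` is never tested, and `print(previous)` runs on every iteration, which would explain why every line is printed. The sketch assumes Python 3, assumes the "keyword" is a line that starts with the `a` predicate, and assumes the wanted name is the text between `#` and the closing `>` on the line before it. The names `collect_names`, `TYPE_LINE` and `FRAGMENT` are hypothetical, and the sample lines are shortened from the data shown earlier.

```python
import re
from urllib.request import urlopen

# Assumption: the "keyword" line is one that starts with the "a" predicate.
TYPE_LINE = re.compile(r"^\s*a\s")
# Assumption: the wanted name sits between '#' and the closing '>' at line end.
FRAGMENT = re.compile(r"#(\w+)>\s*$")


def collect_names(lines):
    """Return the '#NAME' part of each line that directly precedes an 'a' line."""
    names = []
    previous = ""
    for line in lines:
        if TYPE_LINE.match(line):
            found = FRAGMENT.search(previous)
            if found:  # a subject such as '...w#>DONTGETCONTENT' has no name
                names.append(found.group(1))
        previous = line  # update only after the current line has been checked
    return names


def read_from_url(url_link):
    """Fetch url_link (must include the scheme, e.g. 'http://') and scan it."""
    with urlopen(url_link) as response:
        return collect_names(raw.decode("utf-8") for raw in response)


# Shortened sample lines taken from the data format shown in the question.
sample = [
    "<http://www.example.com/XYZ/a-b-c/w#>DONTGETCONTENT    \n",
    " a       <http://www.example.com/XYZ/mount/v1#NNNN> , \n",
    "<http://www.w3.org/2002/w#Individual> ;\n",
    "\n",
    "<http://www.example.com/XYZ/model/v1#GETBBBBBB>\n",
    "a       <http://www.w3.org/01/07/w#BBBBBB> ;\n",
    "\n",
    "<http://www.example.com/XYZ/datamodel-ee/v1#GETAAAAAA>\n",
    " a       <http://www.w3.org/01/07/w#AAAAAA> ;\n",
    "\n",
    "<http://www.example.com/XYZ/datamodel/v1#GETCCCCCC>\n",
    " a       <http://www.w3.org/01/07/w#CCCCCC , \n",
]

print(" , ".join(collect_names(sample)))  # GETBBBBBB , GETAAAAAA , GETCCCCCC
```

With a real address, `read_from_url("http://...")` would return the same kind of list. Because the data looks like RDF in Turtle syntax, a dedicated RDF parser may be a more robust option than line matching if the real file is well-formed.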


